Sample size determination for open-ended questions or qualitative interviews relies primarily on custom and finding the point where little new information is obtained (thematic saturation). Here, we propose and test a refined definition of saturation as obtaining the most salient items in a set of qualitative interviews (where items can be material things or concepts, depending on the topic of study) rather than attempting to obtain all the items. Salient items have higher prevalence and are more culturally important. To do this, we explore saturation, salience, sample size, and domain size in 28 sets of interviews in which respondents were asked to list all the things they could think of in one of 18 topical domains. The domains—like kinds of fruits (highly bounded) and things that mothers do (unbounded)—varied greatly in size. The datasets comprise 20–99 interviews each (1,147 total interviews). When saturation was defined as the point where less than one new item per person would be expected, the median sample size for reaching saturation was 75 (range = 15–194). Thematic saturation was, as expected, related to domain size. It was also related to the amount of information contributed by each respondent but, unexpectedly, was reached more quickly when respondents contributed less information. In contrast, a greater amount of information per person increased the retrieval of salient items. Even small samples (n = 10) produced 95% of the most salient ideas with exhaustive listing, but only 53% of those items were captured with limited responses per person (three). For most domains, item salience appeared to be a more useful concept for thinking about sample size adequacy than finding the point of thematic saturation. Thus, we advance the concept of saturation in salience and emphasize probing to increase the amount of information collected per respondent to increase sample efficiency.

Document Type


Publication Date


Notes/Citation Information

Published in PLOS ONE, v. 13, no. 6, e0198606, p. 1-18.

© 2018 Weller et al.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Digital Object Identifier (DOI)


Funding Information

This project was partially supported by the Agency for Healthcare Research and Quality (R24HS022134). Funding for the original data sets was from the National Science Foundation (#BCS-0244104) for Gravlee et al. (2013), from the National Institute on Drug Abuse (R29DA10640) for Brewer et al. (2002), and from the Air Force Office of Scientific Research for Brewer (1995).

Related Content

All relevant data are available as an Excel file in the Supporting Information files.

journal.pone.0198606.s001.xlsx (267 kB)
S1 Appendix. The original data for the 28 examples.

journal.pone.0198606.s002.docx (15 kB)
S2 Appendix. GLM statistical model results for the 28 examples.