Background: The Internet is becoming an increasingly common tool for survey research, particularly among “hidden” or vulnerable populations, such as men who have sex with men (MSM). Web-based research has many advantages for participants and researchers, but fraud can present a significant threat to data integrity.

Objective: The purpose of this analysis was to evaluate fraud detection strategies in a Web-based survey of young MSM and describe new protocols to improve fraud detection in Web-based survey research.

Methods: This study involved a cross-sectional Web-based survey that examined individual- and network-level risk factors for HIV transmission and substance use among young MSM residing in 15 counties in Central Kentucky. Each survey entry that was at least 50% complete was evaluated by the study staff for fraud using an algorithm involving 8 criteria based on a combination of geolocation data, survey data, and personal information. Entries were classified as fraudulent, potentially fraudulent, or valid. Descriptive analyses were performed to describe each fraud detection criterion among entries.
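
As a rough illustration of how a multi-criterion screen of this kind could be organized, the following Python sketch counts violated criteria per entry and maps the count to one of the three categories. The field names, the five example flags, the override for verified locations, and the threshold of two flags are all hypothetical assumptions made for illustration; the abstract does not list the 8 criteria or the decision rules the study staff actually applied.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One survey entry. All field names are hypothetical, not the study's."""
    geolocation_outside_area: bool = False
    invalid_phone: bool = False
    mismatched_names: bool = False
    unusual_email: bool = False
    matches_previous_entry: bool = False
    # Assumed override: location confirmed by other means, e.g. a mailing
    # address inside the study area or attendance at a local event.
    location_verified: bool = False


def count_flags(entry: Entry) -> int:
    """Count violated criteria, ignoring geolocation if location is verified."""
    flags = [
        entry.geolocation_outside_area and not entry.location_verified,
        entry.invalid_phone,
        entry.mismatched_names,
        entry.unusual_email,
        entry.matches_previous_entry,
    ]
    return sum(flags)


def classify(entry: Entry, fraud_threshold: int = 2) -> str:
    """Map the flag count to a category. The threshold is an assumption."""
    n = count_flags(entry)
    if n >= fraud_threshold:
        return "fraudulent"
    if n == 1:
        return "potentially fraudulent"
    return "valid"


if __name__ == "__main__":
    print(classify(Entry(invalid_phone=True, unusual_email=True)))
    print(classify(Entry(geolocation_outside_area=True, location_verified=True)))
```

Counting flags rather than rejecting on any single one reflects the point made in the Conclusions that no single indicator, such as geolocation, should be relied on alone.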

Results: Of the 414 survey entries, the final categorization resulted in 119 (28.7%) entries identified as fraud, 42 (10.1%) as potential fraud, and 253 (61.1%) as valid. Geolocation outside of the study area (164/414, 39.6%) was the most frequently violated criterion. However, 33.3% (82/246) of the entries that had ineligible geolocations belonged to participants who were in eligible locations (as verified by their request to mail payment to an address within the study area or participation at a local event). The second most frequently violated criterion was an invalid phone number (94/414, 22.7%), followed by mismatching names within an entry (43/414, 10.4%) and unusual email addresses (37/414, 8.9%). Fewer than 5% (18/414) of the entries had some combination of personal information items matching that of a previous entry.
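
The category shares reported above can be recomputed from the counts. The short Python sketch below does so for the three final categories only; the variable and function names are illustrative and not taken from the study.

```python
# Counts for the three final categories, as reported in the Results.
TOTAL_ENTRIES = 414
category_counts = {"fraud": 119, "potential fraud": 42, "valid": 253}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage share of each category among all 414 entries.
shares = {name: percent(n, TOTAL_ENTRIES) for name, n in category_counts.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {category_counts[name]}/{TOTAL_ENTRIES} = {share}%")
```

The three counts sum to 414, while the rounded percentages sum to 99.9% because each is rounded independently.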

Conclusions: This study suggests that researchers conducting Web-based surveys of MSM should be vigilant about the potential for fraud. Researchers should have a fraud detection algorithm in place prior to data collection and should not rely on the Internet Protocol (IP) address or geolocation alone, but should rather use a combination of indicators.

Notes/Citation Information

Published in JMIR Public Health and Surveillance, v. 5, issue 1, e12344, p. 1-10.

©April M Ballard, Trey Cardwell, April M Young. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 04.02.2019.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.

Funding Information

This research was supported by the National Institute on Drug Abuse (NIH NIDA R03 DA039740).