Catching Careless Responders in Online Studies
Psychologists are increasingly turning to online data collection methods when conducting research, but such methods can degrade data quality. We have all heard anecdotal accounts of malingering participants on M-Turk and in online HSP studies and, as a result, many of us have probably developed our own hunches about which methods to employ when trying to keep our online data clean. But what are the best methods available?
A new study in Psychological Methods (Meade & Craig, 2012) empirically examined methods for identifying careless survey responding in online data collection. Nearly 400 undergrads completed an 11-page online survey with roughly 50 items per page—yes, my eyes would hurt, too—and participants were exposed to three different instruction types: anonymity (i.e., your responses are completely anonymous), confidential/identified (i.e., your responses are confidential but you must enter your name at the bottom of each page so we can examine your data), and stern warning (i.e., honest/effortful responding is part of the university ethics code and we’ll be watching you). The researchers subsequently employed and analyzed many techniques for identifying careless responders, including bogus items (e.g., please choose “strongly agree”), a few measures of self-reported engagement/attention/honesty/etc., the Marlowe-Crowne and BIDR social desirability scales, and computational methods (e.g., Mahalanobis D, examining strings of consecutive identical responses).
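For readers who want to try the computational screens on their own data, here is a minimal sketch of two of them: Mahalanobis distance from the sample centroid and the longest run of identical consecutive responses (often called "longstring"). This is not the authors' code; the DataFrame layout, column handling, and cutoffs are illustrative assumptions.

```python
# Sketch of two computational screens for careless responding.
# Assumes a pandas DataFrame with one row per respondent and one
# column per survey item (numeric Likert responses).
import numpy as np
import pandas as pd


def mahalanobis_d2(responses: pd.DataFrame) -> pd.Series:
    """Squared Mahalanobis distance of each respondent from the sample centroid."""
    X = responses.to_numpy(dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv tolerates singular covariance
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return pd.Series(d2, index=responses.index, name="mahalanobis_d2")


def longstring(responses: pd.DataFrame) -> pd.Series:
    """Length of the longest run of identical consecutive responses per person."""
    def longest_run(row: pd.Series) -> int:
        vals = row.to_numpy()
        best = run = 1
        for prev, cur in zip(vals, vals[1:]):
            run = run + 1 if cur == prev else 1
            best = max(best, run)
        return best
    return responses.apply(longest_run, axis=1).rename("longstring")


# Usage (cutoffs are judgment calls, not values from the paper):
# df = pd.read_csv("survey_items.csv")
# suspicious = (mahalanobis_d2(df) > 100) | (longstring(df) > 10)
```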
Results were as follows. First, although the instructions did not exert a strong effect across methods, participants in the confidential/identified and stern warning conditions responded incorrectly to fewer bogus items than participants in the anonymous condition, suggesting that a slight change in our instructions could improve data quality. Second, results of different techniques (e.g., self-reports vs. bogus items) were at best modestly correlated, suggesting that any one technique is insufficient to properly clean a large data set. Third, self-reported engagement/attention/honesty/etc. and the Marlowe-Crowne/BIDR correlated near-zero with objective and computational methods, suggesting that these measures might not be valid indices of data quality.
Finally, the researchers simulated thousands of data sets with varying parameters of carelessness (e.g., 5% vs. 10% of respondents) and compared techniques in their efficacy for correctly classifying respondents as careless. Without going into too much detail, one method that emerged as relatively effective was the even-odd consistency measure, in which the items of each unidimensional scale or subscale (e.g., on the NPI) are split into even- and odd-numbered halves, and a within-person correlation is then calculated between the resulting pairs of half-scores.
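Here is a rough sketch of how one might compute that index, assuming several unidimensional scales (or subscales) whose item columns you can name; the scale names, item names, and the 0.3 cutoff below are hypothetical, not taken from the paper.

```python
# Even-odd consistency: split each scale into odd/even halves, average each
# half, then correlate the two sets of half-scores within each respondent.
# Low or negative correlations suggest inconsistent (possibly careless) responding.
import numpy as np
import pandas as pd


def even_odd_consistency(df: pd.DataFrame, scales: dict) -> pd.Series:
    odd_scores, even_scores = [], []
    for items in scales.values():
        odd_items = items[0::2]   # 1st, 3rd, 5th, ... items of the scale
        even_items = items[1::2]  # 2nd, 4th, 6th, ... items of the scale
        odd_scores.append(df[odd_items].mean(axis=1))
        even_scores.append(df[even_items].mean(axis=1))
    odd_mat = pd.concat(odd_scores, axis=1).to_numpy()
    even_mat = pd.concat(even_scores, axis=1).to_numpy()
    # Per-person Pearson correlation between the odd and even half-scores
    r = [np.corrcoef(o, e)[0, 1] for o, e in zip(odd_mat, even_mat)]
    return pd.Series(r, index=df.index, name="even_odd_r")


# Usage (illustrative scale and item names):
# scales = {"extraversion":  ["e1", "e2", "e3", "e4"],
#           "agreeableness": ["a1", "a2", "a3", "a4"],
#           "neuroticism":   ["n1", "n2", "n3", "n4"]}
# r = even_odd_consistency(df, scales)
# careless = r < 0.3   # cutoff is a judgment call
```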
Further discussion of the efficacy and implementation of the even-odd technique and various other computational techniques can be found in the paper, along with a nifty two-page “recommendations” section for anyone with more of a life than me. In short, it seems we can improve our data quality by a) not promising anonymity; b) including some bogus items; c) running one or two basic computational checks of data quality; and d) not relying on funnel debriefing-style self-reports of honest responding. Hope this helps everyone!