Stamping Out Survey Sinners

Shift Insight research experts put their knowledge to the test

Shift Insight shares how survey design can help to stamp out poor-quality responses by identifying ‘click farm’ bots, cheaters, and disengaged respondents in survey data.

So, you’ve hit your target number of responses for your survey, but is it safe to assume that every one of these responses is an honest and accurate portrayal of a real person’s views? Realistically, there will always be some bad apples contaminating the data. These typically fall into three categories: bots, cheaters, and disengaged respondents.

‘Click farm’ type fraud is a huge problem for online research because the data that is produced is effectively 100% fake. These are bots that cheat the system to make money on a large scale. They register as multiple panellists using spoof IP addresses and profiling information, and their answers are completely automated.

Cheaters, on the other hand, are usually real people who only complete the survey for the incentive and therefore speed through as quickly as possible to claim the prize. They may have no relevant experience or knowledge of the subject matter being investigated, which gives rise to responses that are not representative of the target population’s views.

Finally, we have disengaged respondents, who could be exactly who you are looking for, and even start out being diligent, but lose interest and become fatigued as they move through the survey. They are less likely to read or acknowledge survey questions in full detail, and are therefore unlikely to provide carefully considered and accurate answers.

The extent to which survey sinners are a problem for online research is difficult to measure. The validity of the data could be called into question as false and inaccurate responses could produce skewed statistical estimates and significantly damage the research.

A global survey respondent panel estimates a 5% rate of poor-quality responses in the data they provide, though this is likely to be an underestimate, given that data cleaning and identification of bad apples rely on researchers flagging these to the provider. Luckily, there are a few tricks that can help to prevent false responses and weed out survey sinners from the data sample; and we put some of these to the test in our own survey…

1. Double questions

One method – which is likely to catch cheaters or bots – is to include two questions whose answers can contradict each other. For example, you could ask someone whether they are a parent and then ask how many children they have (including a ‘none’ option). This works particularly well if the questions are at different points in the survey.
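As a rough illustration of the idea, the parent/children example above could be checked automatically after fieldwork. This is only a sketch: the field names ("is_parent", "num_children") and answer codes are hypothetical, not taken from any real survey platform.

```python
# Sketch: flag respondents whose answers to two related questions
# contradict each other. Field names and answer codes are
# hypothetical examples.

def double_question_flag(response: dict) -> bool:
    """Return True if the parent/children answers contradict."""
    is_parent = response["is_parent"]        # "yes" / "no"
    num_children = response["num_children"]  # integer, 0 means "none"
    if is_parent == "yes" and num_children == 0:
        return True
    if is_parent == "no" and num_children > 0:
        return True
    return False

# Example usage
print(double_question_flag({"is_parent": "no", "num_children": 2}))   # True: contradiction
print(double_question_flag({"is_parent": "yes", "num_children": 2}))  # False: consistent
```

Placing the two questions far apart in the survey makes it harder for a cheater to keep their story straight, but the check itself stays this simple.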

2. Contradictory grid statements

Along similar lines, using a grid with agreement statements that directly contradict one another is a useful method. Agreeing with both statements would then be inaccurate or ‘impossible’. This especially helps to distinguish respondents who have flatlined (i.e. given the same response for every statement) from those who just genuinely agree with all your statements!
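Both failure modes described above can be screened for with a couple of lines of code. A minimal sketch, assuming a 1–5 Likert scale (1 = strongly disagree, 5 = strongly agree) and hypothetical statement names:

```python
# Sketch: detect flatlining and contradictory agreement in a grid of
# Likert ratings (1 = strongly disagree ... 5 = strongly agree).
# Statement keys and the >= 4 "agreement" cut-off are assumptions.

def flatlined(ratings: dict) -> bool:
    """True if every statement received the identical rating."""
    return len(set(ratings.values())) == 1

def contradictory(ratings: dict, pair: tuple) -> bool:
    """True if the respondent agreed (rating >= 4) with both of two
    statements that directly contradict each other."""
    a, b = pair
    return ratings[a] >= 4 and ratings[b] >= 4

# Example usage with hypothetical statements
grid = {"climate_is_urgent": 4, "climate_is_hoax": 4, "i_recycle": 3}
print(flatlined(grid))                                                # False
print(contradictory(grid, ("climate_is_urgent", "climate_is_hoax")))  # True
```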

3. Open questions

Open questions are an easy place to spot poor-quality respondents – writing a logical answer can just be too much effort for cheaters, bots, and the disengaged! Have a scan of people’s answers for any nonsense, irrelevant, or unusually repetitive (i.e. mass automated) answers. This could include numerical outliers.
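Manual scanning scales poorly, so a first pass can be automated. A sketch of two simple screens – keyboard-mashing (very short or vowel-free text) and verbatim answers repeated across the sample, which can indicate automation. The length and repetition thresholds are illustrative assumptions, and flagged answers would still need a human look:

```python
# Sketch: simple automated screens for open-text answers.
# Thresholds are illustrative assumptions, not fixed rules.

import re
from collections import Counter

def looks_like_gibberish(text: str) -> bool:
    """Very short or vowel-free answers often indicate mashing."""
    cleaned = text.strip().lower()
    if len(cleaned) < 3:
        return True
    return not re.search(r"[aeiou]", cleaned)

def duplicated_answers(answers: list, min_count: int = 3) -> set:
    """Answers repeated verbatim min_count+ times (possible automation)."""
    counts = Counter(a.strip().lower() for a in answers)
    return {a for a, n in counts.items() if n >= min_count}

print(looks_like_gibberish("sdfgh"))                           # True
print(looks_like_gibberish("I worry about rising sea levels"))  # False
```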

4. Attention checking question

One simple way to catch poor quality respondents is to throw in a question for the sole purpose of tripping them up. Check if respondents are still paying attention by asking them to select a particular answer (ideally keep it on a similar topic to the rest so it’s not too obvious). Someone didn’t select answer B when you told them to? Get rid of them!
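Filtering on an attention check is mechanical once the data is in. A minimal sketch, with a hypothetical field name and expected answer:

```python
# Sketch: drop respondents who failed an embedded attention check
# ("please select 'Agree' for this statement"). The field name
# "attention_q" and expected answer are hypothetical.

def passed_attention_check(response: dict,
                           check_field: str = "attention_q",
                           expected: str = "agree") -> bool:
    """Case-insensitive comparison against the instructed answer."""
    return response.get(check_field, "").strip().lower() == expected

# Example usage
responses = [
    {"id": 1, "attention_q": "Agree"},
    {"id": 2, "attention_q": "Disagree"},
]
kept = [r for r in responses if passed_attention_check(r)]
print([r["id"] for r in kept])  # [1]
```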

Putting these tricks to the test

We created a survey exploring attitudes towards climate change, with the different identification methods peppered throughout with the aim of testing whether these were effective in catching poor quality respondents. Using a leading respondent panel, we received 89 responses to the survey.

After carefully combing through and cleaning the data, we found that 20.2% of the total number of respondents were flagged as providing poor-quality responses – four times the amount estimated by the panel! The table below shows the number of poor-quality responses that were identified for each method:

Method                          Poor-quality responses
Contradictory grid statements   10
Open questions                  8
Attention checking question     6
Double question                 0
Contradictory statements were found to be the most effective method, revealing 10 incidents of poor-quality responses, whilst the “double question” found none.

This amounted to 18 individual respondents flagged in total – several of these were caught out on more than one test.

Now, we could easily just get rid of all these respondents and try to replace them. However, sometimes the last thing you want to do is throw out responses unnecessarily (it may have been a bit of a slog to get them!). It’s often worth taking the extra time to scrutinize your data.

For each of these methods, you need to differentiate between respondents who are definitely unreliable and those who only possibly are. We recommend a flagging system in which those who fall into the grey area are only removed if they fail multiple criteria.
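The flagging system described above can be sketched as a simple score-and-threshold rule. The check names and the threshold of two flags are illustrative assumptions, not a fixed recommendation – the right threshold depends on the project:

```python
# Sketch of the multi-criteria flagging idea: each check contributes
# a flag, and a respondent is removed only when the number of flags
# crosses a threshold. Check names and the threshold of 2 are
# illustrative assumptions.

def count_flags(response: dict, checks: list) -> int:
    """Run each check against the response; count failures."""
    return sum(1 for check in checks if check(response))

def should_remove(response: dict, checks: list, threshold: int = 2) -> bool:
    """Remove only respondents failing multiple criteria."""
    return count_flags(response, checks) >= threshold

# Hypothetical checks: each returns True when the respondent fails.
checks = [
    lambda r: r.get("failed_attention", False),
    lambda r: r.get("flatlined_grid", False),
    lambda r: r.get("gibberish_open_text", False),
]

borderline = {"failed_attention": True}                          # one flag
clear_cut = {"failed_attention": True, "flatlined_grid": True}   # two flags
print(should_remove(borderline, checks))  # False: grey area, keep
print(should_remove(clear_cut, checks))   # True: remove
```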

In addition to the methods we tested, there are other ways you can increase the reliability of your data, and prevent drop-outs. These can range from making your survey as clear and efficient as possible to keep respondents engaged, to restricting access to stop multiple attempts by bots:

  • Match the question wording to the scale labels so that even respondents who are scanning can still comprehend the questions.
  • Use consistent language, particularly across rating scales.
  • Keep the survey as short and concise as possible.
  • Make it conversational – build rapport!
  • Make it interactive and visual – the survey should be enjoyable.
  • If you are using a potentially untrustworthy sample source, try not to make it too obvious who you want to speak to in the introduction and screening questions.
  • Don’t allow multiple attempts at the survey – but do allow people to save and finish their response later if the survey exceeds 15 minutes.
  • Remove your ‘back’ button – don’t allow respondents to edit their answers to the ‘right’ response.

As technology improves there are also more rigorous checks you can undertake. This includes facial recognition, identification checks (e.g. reviewing passports), and recording geolocations. At the moment, these are more often used for qualitative research, but as bots become more sophisticated, it’s likely quantitative checks will need to be tighter.

All of these approaches will likely take a bit of tailoring to fit your specific project, and the decision to remove respondents can be subjective, but hopefully these tips will help you to make your survey data squeaky clean.

About Shift Insight

Shift Insight are a market research agency specialising in education, sustainability, and membership. To find out more, please visit their website: www.shift-insight.co.uk
