How to Detect Fraudulent and Low-Quality Survey Responses

Guide · by Licrat

Bad survey responses don't announce themselves. A bot, a click-farm worker, and a respondent rushing through for the incentive all submit answers that look complete — and every one of them quietly degrades your dataset. By the time a fraudulent or careless response reaches your analysis, it has already shifted your averages, padded your sample, and nudged you toward the wrong conclusion.

This guide walks through the signals that separate genuine answers from junk, how to detect each one, and where automated scoring earns its place.

Why low-quality responses cost more than you think

The damage is rarely proportional to the number of bad responses; it is often worse. A handful of straight-liners can flip a borderline significance test. Duplicate submissions inflate your apparent sample size and distort proportions. And if you pay incentives, every fraudulent completion is money spent on data you'll have to throw away. Worst of all is the reputational hit: if a client or reviewer discovers the dataset was dirty, the finding goes with it, and so does your credibility.

The goal isn't to be paranoid. It's to make removal decisions you can defend.

The signals that give bad responses away

Speeding

A respondent who finishes faster than anyone could plausibly read and answer the questions almost certainly didn't engage with them. Detect it by comparing completion time against a realistic minimum for the questionnaire's length: flag anything below a sensible floor (for example, under half the median time, or beneath a hard minimum-seconds threshold you set per survey).
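As a rough illustration, the floor described above can be computed in a few lines. This is a minimal sketch, not a definitive implementation: the function name, the 0.5 median fraction, and the 60-second hard minimum are all assumptions you would tune per survey.

```python
from statistics import median

def flag_speeders(durations, median_fraction=0.5, hard_floor_seconds=60.0):
    """Return the indices of responses completed faster than the floor.

    The floor is whichever is stricter: a fraction of the median
    completion time, or a fixed minimum number of seconds.
    Both defaults are illustrative assumptions.
    """
    floor = max(median_fraction * median(durations), hard_floor_seconds)
    return [i for i, seconds in enumerate(durations) if seconds < floor]

# Completion times in seconds for six hypothetical respondents.
durations = [412.0, 385.0, 97.0, 440.0, 51.0, 398.0]
print(flag_speeders(durations))  # → [2, 4]
```

Using the median rather than the mean keeps the floor from being dragged down by the speeders themselves.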

Straight-lining and patterned answering

Selecting the same option down an entire grid ("Strongly agree" to everything) is the classic tell of someone not reading. Detect it through low variance within matrix questions, and by checking reverse-coded items: a genuine respondent answers them in opposite directions; a straight-liner doesn't.
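Both checks are easy to sketch. The example below assumes a 1–5 agreement scale and that you know which item pairs are reverse-coded versions of each other; the function name, the flag labels, and the `max_gap` tolerance are hypothetical choices, not a standard.

```python
from statistics import pvariance

def check_grid(answers, reverse_pairs=(), scale_min=1, scale_max=5, max_gap=2):
    """Flag a matrix question answered with no variation or with
    contradictory answers on reverse-coded item pairs.

    answers: numeric answers for one respondent, one per grid row.
    reverse_pairs: (i, j) index pairs where row j is the reverse-coded
                   counterpart of row i (an assumption about your survey).
    """
    flags = []
    if len(answers) > 1 and pvariance(answers) == 0:
        flags.append("zero_variance")
    for i, j in reverse_pairs:
        # Recode the reversed item onto the same direction as item i.
        recoded = scale_min + scale_max - answers[j]
        if abs(answers[i] - recoded) > max_gap:
            flags.append(f"reverse_mismatch:{i},{j}")
    return flags

print(check_grid([5, 5, 5, 5, 5, 5], reverse_pairs=[(0, 3)]))
# → ['zero_variance', 'reverse_mismatch:0,3']
print(check_grid([5, 4, 5, 2, 4, 5], reverse_pairs=[(0, 3)]))
# → []
```

Note that a respondent who picks the scale midpoint on every row passes the reverse-coded check, which is why the variance check is kept alongside it.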

Attention-check failures

An instructed-response item ("For quality control, please select 'Disagree' here") is trivial for an attentive human and easy for a bot or a skimmer to miss. Whether the trap was answered correctly is one of the cleanest single signals you have.
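Checking a trap item is a simple comparison. In this sketch the question ID `q7_trap` and the dictionary layout are made up for illustration; a skipped trap is treated as a failure, which is an assumption you may want to relax.

```python
def failed_attention_checks(response, expected):
    """Return the IDs of instructed-response items that were answered
    incorrectly or skipped.

    response: dict mapping question ID to the respondent's answer.
    expected: dict mapping each trap question ID to the instructed option.
    """
    failed = []
    for question_id, instructed in expected.items():
        given = str(response.get(question_id, "")).strip().lower()
        if given != instructed.strip().lower():
            failed.append(question_id)
    return failed

expected = {"q7_trap": "Disagree"}
print(failed_attention_checks({"q1": "Agree", "q7_trap": "disagree"}, expected))
# → []
print(failed_attention_checks({"q1": "Agree", "q7_trap": "Strongly agree"}, expected))
# → ['q7_trap']
```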

Duplicates

The same person — or the same script — submitting more than once is common, often with small variations to dodge naïve checks. Catch it with fingerprinting (device and answer-pattern fingerprints) and by checking for duplicates both within a single batch and across batches collected over time.
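One way to sketch this is to hash a normalized copy of the answers and keep a running set of fingerprints that persists across batches. Everything here is illustrative: the record layout (`id`, `device_id`, `answers`) is assumed, and treating a repeated device ID as a duplicate will over-flag shared devices, so you may prefer to route those to manual review.

```python
import hashlib
import json

def answer_fingerprint(answers):
    """Hash the answers after normalizing case and whitespace, so that
    trivially edited resubmissions produce the same fingerprint."""
    normalized = {key: " ".join(str(value).lower().split())
                  for key, value in answers.items()}
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def find_duplicates(batch, seen=None):
    """Return the IDs of responses whose device or answer fingerprint
    was already seen, in this batch or an earlier one.

    seen: a set that the caller keeps and passes in again for later
          batches; it is updated in place.
    """
    seen = set() if seen is None else seen
    duplicates = []
    for response in batch:
        prints = {("answers", answer_fingerprint(response["answers"]))}
        if response.get("device_id"):
            prints.add(("device", response["device_id"]))
        if prints & seen:
            duplicates.append(response["id"])
        seen |= prints
    return duplicates

seen = set()
batch_1 = [
    {"id": "r1", "device_id": "d1", "answers": {"q1": "Yes", "q2": "Great  product"}},
    {"id": "r2", "device_id": "d2", "answers": {"q1": "yes", "q2": "great product "}},
    {"id": "r3", "device_id": "d3", "answers": {"q1": "No", "q2": "Too pricey"}},
]
batch_2 = [
    {"id": "r4", "device_id": "d3", "answers": {"q1": "Maybe", "q2": "Unsure"}},
]
print(find_duplicates(batch_1, seen))  # → ['r2']
print(find_duplicates(batch_2, seen))  # → ['r4']
```

Exact hashing only catches variations that normalization removes. Near-duplicates with reworded text need a similarity measure, which this sketch does not attempt.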

Gibberish in open-text

Open-ended answers are where fraud is easiest to spot by eye and easiest to miss at scale: random characters, copy-pasted boilerplate, off-topic text, or AI-generated filler. Character and entropy checks, repetition detection, and relevance checks all help separate a real answer from noise.
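A few of these checks can be approximated with the standard library alone. Treat the sketch below as a starting point under stated assumptions: the thresholds are guesses, the vowel-ratio check only makes sense for English text, and the keyword match is a crude stand-in for real relevance scoring. None of these heuristics will catch fluent AI-generated filler.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy of the character distribution, in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def open_text_flags(text, topic_keywords=()):
    """Return heuristic flags for one open-text answer.
    All thresholds are illustrative assumptions."""
    cleaned = text.strip().lower()
    letters = [c for c in cleaned if c.isalpha()]
    words = cleaned.split()
    if len(letters) < 3:
        return ["too_short"]
    flags = []
    # Very few distinct characters, e.g. "aaaaaaa" or "hahahaha".
    if char_entropy(cleaned) < 2.0:
        flags.append("low_entropy")
    # Keyboard mashing tends to contain few vowels (English-only check).
    if sum(c in "aeiou" for c in letters) / len(letters) < 0.2:
        flags.append("few_vowels")
    # One word making up most of the answer.
    if len(words) >= 4 and Counter(words).most_common(1)[0][1] / len(words) > 0.5:
        flags.append("repetitive")
    # Crude relevance: none of the expected topic words appear.
    if topic_keywords and not any(k.lower() in cleaned for k in topic_keywords):
        flags.append("off_topic")
    return flags

print(open_text_flags("aaaaaaaaaa"))       # → ['low_entropy']
print(open_text_flags("asdfghjkl qwrtp"))  # → ['few_vowels']
print(open_text_flags("The checkout page kept timing out on my phone.",
                      topic_keywords=("checkout", "payment")))  # → []
```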

Uniform response timing

A real person speeds up on easy items and slows down on hard ones. A script tends to answer every item in suspiciously constant time. Low variance in per-question timing is a quiet but powerful flag — especially combined with the others.
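One simple way to measure this is the coefficient of variation (standard deviation divided by mean) of the per-question times. The sketch below assumes you have per-question timings at all, which not every survey platform exports; the 0.15 cutoff and the five-item minimum are illustrative, not established thresholds.

```python
from statistics import mean, pstdev

def timing_cv(seconds_per_question):
    """Coefficient of variation of per-question answer times."""
    average = mean(seconds_per_question)
    return pstdev(seconds_per_question) / average if average > 0 else 0.0

def is_uniform_timing(seconds_per_question, min_cv=0.15, min_items=5):
    """Flag responses whose per-question timing barely varies.
    Very short surveys are skipped because the estimate is too noisy."""
    if len(seconds_per_question) < min_items:
        return False
    return timing_cv(seconds_per_question) < min_cv

print(is_uniform_timing([2.0, 2.1, 2.0, 1.9, 2.0, 2.0]))    # → True
print(is_uniform_timing([3.2, 11.5, 4.0, 25.1, 6.3, 2.2]))  # → False
```

Dividing by the mean makes the measure comparable between fast and slow respondents, so a careful but steady reader is not judged on absolute speed.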

Doing it by hand vs. automating it

With enough syntax, you can catch much of this in Excel or SPSS — for one survey. The trouble is that it doesn't scale, it's error-prone, and every researcher ends up with slightly different thresholds, so results aren't comparable across projects or teammates. Manual cleaning is also opaque to the people who depend on it: "we removed 8% of responses" invites the obvious question — "based on what, exactly?"

Where deterministic, explainable scoring fits

An automated layer applies the same rules to every response and returns a quality score together with the specific flags that triggered it — so the decision is consistent and reproducible. The word that matters is deterministic: the same response always gets the same score, and you can see precisely why. There's no AI black box making an unexplainable guess, which is exactly what makes the result auditable for a client, a reviewer, or an ethics board.

This is what Winnow does. Each response comes back with a 0–100 quality score, a clear recommendation (accept, review, or reject), and the exact flags behind it — speeding, straight-lining, failed attention checks, duplicates, gibberish, uniform timing. You stay in control of the thresholds; the engine just makes the judgment explainable.
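To make "deterministic" concrete, here is a toy scorer that turns a set of flags into a 0–100 score and a recommendation. It is not Winnow's scoring logic: the penalty weights, the flag names, and the accept/reject cutoffs are invented for illustration. The point is only that fixed rules give the same output for the same input and expose exactly which flags drove it.

```python
# Hypothetical penalty weights; not taken from any real product.
PENALTIES = {
    "speeding": 30,
    "straight_lining": 25,
    "failed_attention_check": 35,
    "duplicate": 50,
    "gibberish": 25,
    "uniform_timing": 15,
}

def score_response(flags, accept_at=80, reject_below=50):
    """Map a collection of flags to a score, a recommendation,
    and the sorted list of flags that produced them.

    Unknown flag names raise KeyError rather than being ignored,
    so a typo cannot silently change a score.
    """
    unique = sorted(set(flags))
    score = max(0, 100 - sum(PENALTIES[flag] for flag in unique))
    if score >= accept_at:
        recommendation = "accept"
    elif score < reject_below:
        recommendation = "reject"
    else:
        recommendation = "review"
    return {"score": score, "recommendation": recommendation, "flags": unique}

print(score_response(["speeding"]))
# → {'score': 70, 'recommendation': 'review', 'flags': ['speeding']}
print(score_response(["speeding", "straight_lining"]))
# → {'score': 45, 'recommendation': 'reject', 'flags': ['speeding', 'straight_lining']}
```

Because the thresholds are plain parameters, two teammates who agree on them will always reach the same removal decision for the same response.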

Try it on your own data

You can score your first responses for free. Grab an API key, send a batch, and see the flags for yourself in a couple of minutes.
