How to Clean Low-Quality Responses from Qualtrics and SurveyMonkey Exports

Guide · by Licrat

You've closed fieldwork, you've downloaded the file, and now you have a few thousand rows that range from excellent to garbage. This is the least glamorous part of any survey project and the one most often skipped — a surprising share of do-it-yourself survey runners never clean their data at all. It's also the part that decides whether your conclusions are real.

This guide covers what each platform gives you, where those tools stop, and how to get a single consistent quality standard across whatever you've exported.

What Qualtrics gives you

Qualtrics has decent built-in quality features, but you have to know their edges.

Speeders are flagged relative to your sample — responses more than two standard deviations below the median duration, once you have collected around 100 responses. Because the threshold moves as data arrives, Qualtrics itself advises against deleting speeders until collection is finished.
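
If you want to reproduce a comparable speeder check on an exported file, the rule can be approximated in a few lines. This is a minimal sketch, not Qualtrics' actual implementation: it assumes the cutoff is the median minus two standard deviations of all durations collected so far, and the function name and parameters are made up for illustration.

```python
from statistics import median, pstdev


def flag_speeders(durations, min_responses=100, k=2.0):
    """Return one boolean per response, True where it looks like a speeder.

    Assumption: cutoff = median - k * standard deviation of the durations.
    Qualtrics' exact computation may differ.
    """
    if len(durations) < min_responses:
        # Too few responses for a stable threshold, so flag nothing yet.
        return [False] * len(durations)
    cutoff = median(durations) - k * pstdev(durations)
    return [d < cutoff for d in durations]
```

One thing to watch: completion times tend to be right-skewed, so a handful of very slow respondents can inflate the standard deviation and push the cutoff below zero, in which case nothing gets flagged. Run the check once on the final dataset, for the same reason Qualtrics gives for not deleting speeders mid-collection.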

Bot detection uses Google's invisible reCAPTCHA v3 and writes a Q_RecaptchaScore; a score below 0.5 is treated as a likely bot. Note that imported responses aren't checked.
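
Applied to an exported row, the 0.5 threshold might look like the sketch below. It assumes each row is a dict of strings, as you would get from Python's csv.DictReader, and that the column is named Q_RecaptchaScore as described above. The helper name is hypothetical. A missing score is reported as unknown rather than as a pass, so imported responses don't slip through as clean.

```python
def likely_bot(row, threshold=0.5):
    """Return True if the reCAPTCHA score is below the threshold,
    False if it is at or above it, and None if there is no usable score
    (for example, an imported response that was never checked)."""
    raw = row.get("Q_RecaptchaScore")
    if raw is None or str(raw).strip() == "":
        return None
    try:
        return float(raw) < threshold
    except ValueError:
        # Non-numeric content in the column: treat as unknown.
        return None
```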

Duplicate and fraud detection changed recently. As of mid-2025, Qualtrics deprecated its long-standing RelevantID integration and the associated fraud score, moving to a newer Q_DuplicateRespondent flag. If you have old workflows keyed on Q_RelevantIDFraudScore or Q_RelevantIDDuplicate, those fields are no longer populated with live values for new responses. It's a useful reminder that platform-side fraud tooling is a moving target: what you build on today may quietly stop working next year.

What SurveyMonkey gives you

SurveyMonkey's Response Quality tool uses machine learning to flag poor-quality responses — gibberish open-ends, straight-lining (answering the same option on every question), and speeding. It's useful, but it's fenced in: it's a paid-plan feature, English-only, available only in the US data centre, and it runs only on complete responses. When you export, the flags arrive in a separate file keyed by respondent ID, which you then have to join back to your data.
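
The join itself is simple once both files are loaded. The sketch below is one way to do it with the standard library. The column names respondent_id and quality_flag are placeholders, not SurveyMonkey's actual headers, so swap in whatever your export uses. It does a left join, because flags exist only for complete responses, and it marks unmatched rows as not assessed instead of leaving them looking clean.

```python
import csv


def read_rows(path):
    """Load a CSV export as a list of dicts (utf-8-sig strips a leading BOM)."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))


def join_flags(responses, flags, key="respondent_id"):
    """Left-join quality flags onto responses by respondent ID.

    Responses with no matching flag row are labelled "not_assessed".
    The input rows are not modified.
    """
    flags_by_id = {f[key]: f for f in flags}
    joined = []
    for row in responses:
        merged = dict(row)
        match = flags_by_id.get(row[key])
        if match is None:
            merged["quality_flag"] = "not_assessed"
        else:
            merged["quality_flag"] = match.get("quality_flag", "")
        joined.append(merged)
    return joined
```

Typical use would be join_flags(read_rows("responses.csv"), read_rows("flags.csv")), with both file names standing in for your own.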

It's also ML-based, which means each verdict is a probability you can't fully inspect — fine for triage, harder to defend line by line.

Where the platform tools leave a gap

Put together, the limitations form a pattern:

They're partial. Each platform covers some signals and not others, and the coverage changes over time (see RelevantID).
They're conditional. Paid tier, region, language, completeness, "imported responses excluded" — the fine print decides whether you're actually protected.
They don't travel. The moment your data leaves the platform — merged with another source, handed to an analyst, re-run next quarter — the platform's checks don't come with it.
They're often not explainable. An ML "poor quality" flag or a reCAPTCHA probability isn't something you can walk a client through.

If you run more than one platform, or you need the same standard applied to every dataset, you end up doing the consistent part by hand anyway.

Applying one consistent standard

This is where an external, deterministic layer earns its place. Once you have your export — CSV, XLS, or SPSS — you map its columns to a common shape and score every response the same way, regardless of where it came from.

With Winnow that's two steps:

Map your columns. Exports never share a schema — Qualtrics names its duration field one thing, SurveyMonkey another. The API's mapping parameter lets you point your export's field names at the fields the scorer expects, so you don't have to reshape the file first.
Score the batch. Send the responses to /v1/score/batch and get a quality_score, an accept / review / reject recommendation, and explicit flags for each row. For a whole-dataset view, /v1/report returns the score distribution, how often each flag fired, an estimated count of clean responses, and an overall quality grade — the summary you'd otherwise assemble by hand in a pivot table.
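
As a rough illustration of the two steps, here is what a client might look like. Only the /v1/score/batch and /v1/report paths and the existence of a mapping parameter come from the description above. Everything else is an assumption to check against the API documentation: the base URL, the bearer-token header, the "responses" and "mapping" payload keys, the direction of the mapping (scorer field to export column), and the scorer field names.

```python
import json
import urllib.request

# Placeholder: replace with the real base URL from the API documentation.
BASE_URL = "https://api.example.com"

# Hypothetical mappings from scorer field name to export column name.
QUALTRICS_MAPPING = {"id": "ResponseId", "duration": "Duration (in seconds)"}
SURVEYMONKEY_MAPPING = {"id": "Respondent ID", "duration": "your_duration_column"}


def build_batch_payload(rows, mapping):
    """Bundle raw export rows with a column mapping (assumed payload shape).

    The rows are sent as exported; the mapping tells the scorer where to look.
    """
    return {"responses": rows, "mapping": mapping}


def post_json(path, payload, api_key):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With that in place, post_json("/v1/score/batch", build_batch_payload(rows, QUALTRICS_MAPPING), api_key) would cover the scoring step. Whether /v1/report accepts the same payload is also an assumption. Keeping one mapping per platform is what lets the same scoring call serve both exports.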

Because the six signals Winnow scores on are rule-based, the result is deterministic and explainable: the same row always scores the same, and every rejection comes with the reason attached. The same standard applies to a Qualtrics export and a SurveyMonkey export alike, which is the whole point when your insights have to be trustworthy and consistent.
