Friday, November 13, 2015

So I ran a Google Consumer Survey

So I finally ran an actual GCS survey yesterday.  The question: "Which of these candidates in the Republican Presidential Primary do you most support?"  The answer choices: Trump, Carson, Cruz, Rubio, "Other", or "I am not a Republican".

Why did I do it that way?

  • Apparently the version with a screening question cost 20 times as much per response.  Even with the "I am not a Republican" option getting over 50% of the responses, so that more than half of the single-question answers are "wasted" on non-Republicans, the effective cost per usable answer only roughly doubles, which still leaves the single-question version about 10 times cheaper (and almost free with a coupon).
    • The pricing has definitely gotten more complicated.  At launch, if I remember correctly, a two-question survey with a screening question (one that >10% of respondents pass) was 50 cents per response.  Apparently the pricing now works out such that you're better off either having a long series of questions or having a single question (which is as easy to answer as it is to close the window).  So I think that for exploratory polling, the single-question approach will almost always be better.
  • There's a limit of seven answer choices per GCS question.  I suspect that the clown-car listing of GOP candidates is distorting the results of GOP polls.  Even if nobody actually supported Rand Paul, he might get 2% just from respondents hearing his name (the "Deez Nuts" phenomenon).
  • There is no "decline to state" option, since I assume those people either won't answer at all or will volunteer it in the "Other" box.
  • It's very hard to tell what the impact of partial responses is.  Is 14.4% a good response rate?  Most of the example surveys show 15-20% response rates on the first screening question, so the response bias apparently isn't too large.  But there is some.  Hopefully it's the non-voters who were more likely not to engage.
  • Also, there weren't separate options for "I am not a voter" and "I am a Democratic voter".  So it's hard to tell how many of the 52% are non-citizens, non-voters, or simply Hillary Clinton supporters, or how many of the Hillary supporters would also pick Trump.  This is a poll of Americans, not of Republicans.
  • The difference between "this result is statistically significant even if you weren't specifically looking for it going in" and "this result is statistically significant only if you had a reason to be testing this specific hypothesis going in" is very difficult to communicate.  (There's a quick simulation of why after this list.)
  • The difference between "the error we know we have from having a biased sample" and "the error we expect to have based on taking a sample at all" is also difficult to communicate.
  • A tool that tells you "to get a statistically significant sample on this cross-tab result, you would need a survey with about 4,000 responses" would also be interesting to have (or even "4,000 responses: +/- 4%; 6,000 responses: +/- 3%").  The arithmetic behind this is sketched after this list.
  • Polls are iterative things.  You have to see what kinds of results you can get before you know what the interesting ones mean.  In particular, it would be interesting to prompt with a bunch of "moderate" second-tier candidates (Bush / Kasich / Christie / Fiorina) and see how many people volunteer the front-runners in the "Other" box.  Alternatively, prompt with a bunch of "conservative" second-tier candidates (Jindal / Santorum / Huckabee / Paul).
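
One way to see the significance point: if you slice a survey into enough cross-tabs, roughly 1 in 20 of them will clear the usual p < 0.05 bar by pure chance.  Here's a minimal Python sketch of that idea (the 1,000 respondents, 20 cross-tabs, and coin-flip answers are all made-up illustration, not my actual survey data):

    import random

    random.seed(0)

    # Simulate a survey with NO real effects: every cross-tab question is
    # answered by a fair coin flip, so the true "yes" rate is exactly 50%.
    n_respondents = 1000
    n_crosstabs = 20
    flagged = 0
    for _ in range(n_crosstabs):
        yes = sum(random.random() < 0.5 for _ in range(n_respondents))
        p_hat = yes / n_respondents
        # Rough two-sided z-test against the true 50% rate; |z| > 1.96 is "p < 0.05".
        z = (p_hat - 0.5) / (0.5 * 0.5 / n_respondents) ** 0.5
        if abs(z) > 1.96:
            flagged += 1

    print(flagged, "of", n_crosstabs, "purely random cross-tabs look 'significant' at the 5% level")

A result you went hunting for in the cross-tabs needs a much higher bar than a result the survey was designed to test in the first place.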
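
And the sample-size tool: the underlying math is just the normal-approximation margin of error for a proportion, scaled to however many respondents actually land in the cross-tab you care about.  A minimal sketch of what such a calculator might compute (the 25% cross-tab share, the z = 1.96 cutoff, and the function names are my own illustrative assumptions, not anything GCS exposes):

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Approximate 95% sampling margin of error for a proportion,
        # using the normal approximation (worst case at p = 0.5).
        return z * math.sqrt(p * (1 - p) / n)

    def responses_needed(target_moe, p=0.5, z=1.96):
        # Roughly how many responses are needed to hit a given margin of error.
        return math.ceil((z / target_moe) ** 2 * p * (1 - p))

    # If only ~25% of respondents land in the cross-tab you care about,
    # the whole survey has to be about 4x bigger than the cross-tab alone needs.
    for total in (1000, 4000, 6000):
        in_crosstab = int(total * 0.25)   # hypothetical cross-tab share
        print(total, "total responses -> ~", in_crosstab, "in the cross-tab, +/-",
              round(100 * margin_of_error(in_crosstab), 1), "%")

    print("For +/- 3% on the cross-tab alone you'd need",
          responses_needed(0.03), "cross-tab responses")

None of this accounts for the bias in the previous points; it only covers the "error we expect from taking a sample at all" half.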
