Sampling – the most important
and least understood part of survey research
Question: What is the most important musical event of the last 10 years?
This question was in a recent internet poll. Most people will be surprised to learn that the most popular answer was
the musical episode of Buffy the Vampire Slayer. Elton John’s reworking of “Candle in the Wind”
at Princess Diana’s funeral was second. (Source: tvguide.com). Do you believe that result? It’s nonsense. But why is it nonsense?
The question was put on the website. Anyone who came to the website could answer it, but there was no control over who answered
or even over who did or didn’t come to the site. The people who answered the question were
representative of nobody. The advent of the web has made this type of useless information easier to get. But
how do you get good information?
The important part is getting a statistical sample of the population. A statistical sample is one in which
every member of the entire target population has, excuse the geek-speak, a
“known probability of selection.” That means that we know the odds that each member will be
included in the sample, which doesn’t necessarily mean that each member has
the same probability of being selected.
For example, an association has 40,000 members and wants to survey them. It selects 1,000 members
for the survey. In this case, each member has a 1 in 40 chance of being in the survey sample. Now, let’s say that it will be an e-mail
survey and the association only has valid e-mail addresses for 30,000 of its
members, so it selects 1,000 e-mail addresses. Now, members with a valid e-mail address
have a 1 in 30 chance of being in the sample and members without one have zero chance.
This is still a statistical sample, but it has a built-in bias. The association needs to check that
members with valid e-mail addresses do not differ from those without
(this could be done with a quick phone survey of a few questions to a small
number of those without a valid e-mail address).
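The arithmetic in the association example can be sketched in a few lines of Python. This is only an illustration of the numbers above; the variable names are my own, and `coverage_gap` (the share of members who can never be selected) is an extra figure derived from the same counts.

```python
from fractions import Fraction

members_total = 40_000       # all association members
members_with_email = 30_000  # members with a valid e-mail address on file
sample_size = 1_000

# Sampling from the full membership list: every member has the same chance.
p_full_list = Fraction(sample_size, members_total)

# Sampling from the e-mail list only: the chance depends on having an address.
p_with_email = Fraction(sample_size, members_with_email)
p_without_email = Fraction(0)

# Share of the membership that the e-mail list cannot reach at all.
coverage_gap = Fraction(members_total - members_with_email, members_total)

print(p_full_list)      # 1/40
print(p_with_email)     # 1/30
print(p_without_email)  # 0
print(coverage_gap)     # 1/4
```

Every probability is still known, which is what makes this a statistical sample. The bias comes from the quarter of the membership whose probability is zero.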
Another example of unequal probabilities: I did a survey of long-haul trucking
companies for the Federal Highway Administration. About 80 percent of trucking companies
are owner/operators, that is, the company has one truck that the owner
drives. A simple random sample of trucking
companies could consist almost entirely of owner/operators and include few or none of
the larger companies, which have thousands of trucks.
To ensure that we included the largest companies in our sample, the list of
trucking companies was sorted into three groups: large companies,
mid-size companies, and small companies including owner/operators. We then took a sample from each group. There are two methods we could choose to
pull these samples. We could sample
so that each member of each group had the same probability of selection,
or we could choose unequal probabilities, for example by drawing the same number of sample cases
from each group. Notice the
difference between these methods in the tables below:
Method 1: Equal Probability of Selection

| | Total Number of Firms | Number in Sample | Probability of Selection |
|---|---|---|---|
| Large companies | 2,000 | 40 | 1 in 50 |
| Mid-size companies | 10,000 | 200 | 1 in 50 |
| Small companies | 48,000 | 960 | 1 in 50 |
| Total | 60,000 | 1,200 | 1 in 50 |

Method 2: Equal Number in Sample Groups

| | Total Number of Firms | Number in Sample | Probability of Selection |
|---|---|---|---|
| Large companies | 2,000 | 400 | 1 in 5 |
| Mid-size companies | 10,000 | 400 | 1 in 25 |
| Small companies | 48,000 | 400 | 1 in 120 |
| Total | 60,000 | 1,200 | 1 in 50 |

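For readers who want to check the two tables, here is a minimal Python sketch that recomputes them from the group sizes. The function names (`proportional_allocation`, `equal_allocation`, `one_in_n`) are my own labels for illustration, not standard library calls, and the integer division only works cleanly because the figures in this example divide evenly.

```python
# Number of firms in each group, as in the tables.
strata = {"Large": 2_000, "Mid-size": 10_000, "Small": 48_000}
total_sample = 1_200


def proportional_allocation(strata, total_sample):
    """Method 1: each group's sample is proportional to its size."""
    population = sum(strata.values())
    return {name: total_sample * size // population
            for name, size in strata.items()}


def equal_allocation(strata, total_sample):
    """Method 2: the same number of sample cases from every group."""
    per_group = total_sample // len(strata)
    return {name: per_group for name in strata}


def one_in_n(strata, allocation):
    """Probability of selection expressed as '1 in N' for each group."""
    return {name: strata[name] // allocation[name] for name in strata}


method_1 = proportional_allocation(strata, total_sample)
method_2 = equal_allocation(strata, total_sample)

print(method_1)                    # {'Large': 40, 'Mid-size': 200, 'Small': 960}
print(one_in_n(strata, method_1))  # {'Large': 50, 'Mid-size': 50, 'Small': 50}
print(method_2)                    # {'Large': 400, 'Mid-size': 400, 'Small': 400}
print(one_in_n(strata, method_2))  # {'Large': 5, 'Mid-size': 25, 'Small': 120}
```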
What are the advantages of the two methods?
With the first method, we can easily generalize to the
total population (and with reduced variance). But if we wanted to make comparisons
between the small and the large companies, we may not have the statistical power, since
there are only 40 large companies in our sample.
With the second method, we would have the power to make group comparisons. However, in order to generalize
to the total population, the results of each group would need to be
weighted to account for the unequal probability of selection.
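To see why the weighting matters, here is a small Python sketch using the Method 2 sample sizes. The survey answers in `share_yes` are hypothetical numbers I made up purely for illustration; they are not results from the trucking survey. Each weight is simply the number of firms in the group divided by the number sampled, that is, the inverse of the probability of selection.

```python
# Group sizes and Method 2 sample sizes, as in the tables above.
population = {"Large": 2_000, "Mid-size": 10_000, "Small": 48_000}
sample_n = {"Large": 400, "Mid-size": 400, "Small": 400}

# HYPOTHETICAL result (invented for this sketch): share of sampled firms
# in each group answering "yes" to some survey question.
share_yes = {"Large": 0.60, "Mid-size": 0.40, "Small": 0.10}

# Weight = how many firms in the population each sampled firm stands for.
weights = {g: population[g] / sample_n[g] for g in population}

# Unweighted: treats all 1,200 responses alike, so large firms count too much.
unweighted = (sum(sample_n[g] * share_yes[g] for g in population)
              / sum(sample_n.values()))

# Weighted: each response counts in proportion to the firms it represents.
weighted = (sum(weights[g] * sample_n[g] * share_yes[g] for g in population)
            / sum(weights[g] * sample_n[g] for g in population))

print(weights)                          # {'Large': 5.0, 'Mid-size': 25.0, 'Small': 120.0}
print(f"unweighted: {unweighted:.3f}")  # unweighted: 0.367
print(f"weighted:   {weighted:.3f}")    # weighted:   0.167
```

With these made-up answers, skipping the weights would more than double the estimate for the total population, because the large companies make up a third of the sample but only about 3 percent of the firms.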
Sampling from a list can be straightforward, but there are common mistakes. One of
the most common is not getting a random sample. I know of an organization that pulled a
sample from its database. The database administrator was not given guidance, so he pulled the
first 1,000 cases from the membership list.
Unfortunately, the list was sorted by zip code, so the organization
got a nice census of its members in New England and New Jersey.
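The zip code mistake is easy to reproduce. The sketch below builds a hypothetical membership list (the records and the even spread across zip prefixes are assumptions for illustration only), then compares the "first 1,000" with a random draw using Python's standard `random.sample`. Zip codes that begin with 0 cover New England and New Jersey, which is why the top of a zip-sorted list looks the way it did.

```python
import random

# HYPOTHETICAL membership list: 40,000 records already sorted by zip code.
# zip_prefix is the first digit of the zip code (0 through 9), spread evenly
# here only to keep the illustration simple.
members = [{"id": i, "zip_prefix": i // 4_000} for i in range(40_000)]

# The mistake: take the first 1,000 records from the sorted list.
first_1000 = members[:1_000]

# The fix: draw 1,000 records at random, without replacement.
rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
random_1000 = rng.sample(members, 1_000)


def prefixes(sample):
    """Return the distinct zip prefixes that appear in a sample."""
    return sorted({m["zip_prefix"] for m in sample})


print(prefixes(first_1000))   # [0]
print(prefixes(random_1000))  # prefixes from across the whole list
```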
Sampling when there is no list is even more complex. One way to get around this is to sample
“parent” organizations and build a list from those organizations to
sample from. For example, to interview
hospital nurses, you may select a small number of hospitals. You ask each
hospital to supply a list of its nurses and you choose a certain number from
each hospital. There are statistical
costs for this type of sample.
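The hospital approach is usually called two-stage sampling, and a rough Python sketch of it follows. Everything in it is hypothetical: the function name `two_stage_sample`, the 20 made-up hospitals, and the choice of 4 hospitals with 10 nurses each are assumptions for illustration, not a recommended design.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick some parent organizations, then pick members within each one."""
    rng = random.Random(seed)
    # Stage 1: select the parent organizations (here, hospitals).
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]  # the list the organization supplies
        # Stage 2: select members from that list (all of them if it is short).
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# HYPOTHETICAL data: 20 hospitals with between 60 and 250 nurses each.
hospitals = {
    f"Hospital {i}": [f"H{i}-nurse{j}" for j in range(50 + 10 * i)]
    for i in range(1, 21)
}

picked = two_stage_sample(hospitals, n_clusters=4, n_per_cluster=10, seed=1)
for hospital, nurses in picked.items():
    print(hospital, len(nurses))
```

One of the statistical costs shows up here: nurses in the same hospital tend to resemble each other, so 40 nurses from 4 hospitals generally tell you less than 40 nurses drawn from a full list would. Taking the same number from hospitals of different sizes also means unequal probabilities of selection, which calls for weighting.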
Surveying the general public has its own issues.
First, there is no way at the present time to do a general public
survey using e-mail addresses. There
are companies that are trying to make it work, some of them very good companies, but
the methodology is not yet there.
And if you try it, you are setting yourself up for some crazy
results such as the Buffy the Vampire Slayer musical result.