There are multiple reasons why you might want to collect race/ethnic data on a survey, and each of those reasons has different implications for what you ask and how. In this post, I will clarify the different purposes that might be served by asking about race/ethnicity and provide some suggestions for creating a question that gathers usable data. This post focuses on American race/ethnic categories; in other parts of the world there are even more options and considerations.
First, what is the difference between race and ethnicity, and why do I keep writing "race/ethnic"? Here's a great overview of the topic from Stanford's Gendered Innovations. Basically, race is a social construct based on oppression and colonization; ethnicity is a shared language or culture.
Even though race is a social construct rather than a biological fact, there are some good reasons to track it. One is that you may want to know whether outcomes differ for members of different population groups, or whether participants in your program from different race/ethnic groups have different needs.
You might be concerned that some groups have more unmet need than others, or that your program needs to culturally and linguistically reflect its participants. Or you may want to be able to articulate a disparity to funders and allies in order to support a program that addresses it. For example, there are documented disparities in health care and education. If that's your aim, then you may want to conduct some research in advance to identify the groups that you're particularly concerned about and make sure that your data collection strategy is aligned with that goal. I've written on presenting data by race here and Urban Institute presents some best practices here.
Another reason you might be collecting race/ethnic data is that you want to ensure that your program is serving a population that matches your community, or that nobody is being systematically excluded from participating in your program or services. If that's your purpose, you'll be comparing your sample to data from the US Census, and you may want to use their race/ethnic categories -- which were updated for the 2020 Census.
Here is how the US Census asks about race/ethnicity in the 2020 Census.
But even the Census Bureau researchers have agreed that asking about race and ethnicity separately is a little confusing and doesn't really reflect how some populations think about their racial backgrounds. When they get to the question that asks for their race, some people who identify as Hispanic/Latino will select "other".
Most Americans don't think about race and ethnicity separately, so you might want to ask about race and ethnicity in a single question like this:
In this version, respondents can check multiple boxes, so your percentages can add up to more than 100% -- just add a footnote to your charts. You will be able to make comparisons with this data: for example, do individuals who identify with one group have higher graduation rates than people who identify with a different group? With a little coding and cleanup, you can transform the data to reflect the census's categories if you need to compare to census data.
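To make the tallying and recoding a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample responses, the answer-choice labels, and the function names `percent_selecting` and `to_census_style` are made up. The recoding rule (treat Hispanic/Latino origin as its own yes/no flag, and turn more than one remaining box into "Two or more races") is a simplified stand-in for the census's two-question structure, not the Census Bureau's actual procedure.

```python
from collections import Counter

# Hypothetical check-all-that-apply responses: one set of checked boxes per respondent.
responses = [
    {"Black or African American"},
    {"Hispanic or Latino", "White"},
    {"Asian", "White"},
    {"White"},
]

HISPANIC = "Hispanic or Latino"

def percent_selecting(responses):
    """Percent of respondents who checked each box (can total more than 100)."""
    counts = Counter(choice for checked in responses for choice in checked)
    n = len(responses)
    return {choice: 100 * count / n for choice, count in counts.items()}

def to_census_style(checked):
    """Assumed rule: split one combined answer into (hispanic flag, single race label)."""
    hispanic = HISPANIC in checked
    races = checked - {HISPANIC}
    if not races:
        race = "Not reported"
    elif len(races) == 1:
        race = next(iter(races))
    else:
        race = "Two or more races"
    return hispanic, race

shares = percent_selecting(responses)
print(shares)
print(sum(shares.values()))  # 150.0 with this sample data, hence the chart footnote

recoded = [to_census_style(r) for r in responses]
print(Counter(race for _, race in recoded))
```

Keeping the original checked boxes and writing the recoded values to new fields means you can still report the detailed answers later.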
If you want to have respondents check only one box, you'll need to add a multi-racial or bi-racial answer choice like this, which comes from Versta Research:
This option is nice and tidy and can easily be recoded to match the census categories. But sometimes, these answer choices don't capture the full ethnic diversity of our neighborhoods.
I once presented the demographic data from a neighborhood survey that captured race/ethnic data only in the categories used on the census. Some audience members pointed out that by including populations such as Cape Verdean and Haitian in the Black/African-American category, my data didn't fully represent the cultural and linguistic diversity of the neighborhood. If you want to use your survey to make sure that your programming truly reflects your community's cultural diversity, you might want to include national origin. The census does this with the open text field below each answer choice in the example above. Alternatively, you could add a separate national origin question to your survey. To speed up the coding, you might write this as a multiple choice question where the neighborhood's major national origins are given as answer choices.
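If you do end up with a mix of listed choices and write-in answers, the coding step might look something like the sketch below. The list of origins, the sample answers, and the `code_origin` function are all hypothetical; a real list would come from what you know about your own neighborhood.

```python
from collections import Counter

# Hypothetical answer choices for a national origin question.
LISTED_ORIGINS = ["Cape Verdean", "Haitian", "Dominican", "Vietnamese"]

def code_origin(answer):
    """Match an answer to a listed choice, ignoring case and extra spaces."""
    cleaned = " ".join(answer.split()).lower()
    if not cleaned:
        return "No answer"
    for origin in LISTED_ORIGINS:
        if cleaned == origin.lower():
            return origin
    # Anything not on the list is kept as a write-in for manual review.
    return "Other (write-in)"

# Made-up raw answers, including messy capitalization and a blank.
answers = ["Haitian", "cape verdean ", "Jamaican", "", "haitian"]
tally = Counter(code_origin(a) for a in answers)
print(tally)
```

The write-ins still need a person to read them, but the listed choices are counted automatically, which is where most of the time savings comes from.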
Ultimately, how you collect this data should reflect what you're going to do with it. There's no need to create a long question with 20 answer choices if you don't really have a plan to analyze or use that data. And you don't want to create a question that alienates or offends respondents. While your funders and stakeholders might dictate how you report the data, they usually don't dictate how you ask it. It is OK to recode data later if you are reporting to an agency that uses categories that don't meet all of your needs.