How to Download Percent Non-White Population Data by Census Tract for GIS

In order to map non-white population per census tract, you must first perform some steps to prepare the data. In this tutorial, we will practice:

Downloading high-quality data
Cleaning the data to prepare it for mapping

Downloading the data

You can achieve these steps using a number of software environments, including R and Python. In this tutorial we will download the data manually from NHGIS.org, and show how to clean the data using 3 different spreadsheet tools: Excel, LibreOffice, and Google Sheets.

Visit NHGIS.org . Create a free academic use account, if you haven’t already, and log in.
From the NHGIS.org home screen, choose START HERE: Get Data in the middle of the page, or SELECT DATA in the navigation bar menu at the top of the page.
Click TOPICS.
Scroll down and choose Race from the RACE, ETHNICITY, AND ORIGINS tab. Check off this item using the check-box furthest to the left.
Click SUBMIT.
Under YEARS select whichever American Community Survey 5-year range is most applicable to your project.
Under GEOGRAPHIC LEVELS choose CENSUS TRACT

Why are we using census tracts?

Censuses are collected at the individual level. When you fill out a census form, you fill it out for yourself, or for your household. Survey results for each household are generally not released to the public, as a way of protecting people's privacy. Instead, individual results are aggregated to specific units of geography. These units have been pre-drawn by the United States Census Bureau. You can learn more about the hierarchical census geographies on census.gov.

Census tracts are a popular unit for geospatial analysis, because counties, the level above, can be too vague to show spatial specificity, whereas more granual geographies, such as block groups or blocks can have a higher likelihood for sampling errors.

In the SELECT DATA table, click the table heading POPULARITY. This will sort the resulting datasets from most to least commonly downloaded.
Select the first result, “Race” by clicking the green check-box to the left of the result. You will see the source table reflected in your your DATA CART: 1 SOURCE TABLE.
From your DATA CART choose CONTINUE and CONTINUE and SUBMIT to begin the download. If you are not logged in you will be prompted to do so now.
Wait a moment or so for your extract to finish processing, and then download it by selecting TABLES.

Preparing the data for mapping

Editing the scale of the data

We have downloaded rows for every census tract in the entire United States. This is not ideal because we are planning to upload this data to visualization software. Reducing our data at the outset will save us time and space.

The data cleaning portion of this tutorial is written for three spreadsheet software options: Microsoft Excel, an open (free) desktop spredsheet tool called LibreOffice , or Google Sheets.

Excel instructions

Open the .CSV in Microsoft Excel. If you are prompted to remove leading zeroes, choose Don't Convert.
Highlight the column titled STATE.
From the menu ribbon choose Sort and Filter.
Choose Filter. Click the drop-down arrow at the top of the column titled STATE.
Uncheck Select All.
Check off New York and close the filter window by clicking the small red x in the upper left-hand corner.
CTRL + A on a Windows, Command + A on a Mac, or from the menu Edit → Select All to highlight all the records.
Copy the records. CTRL + C; Command + C; Edit→ Copy.
Open a new sheet File → New.
Paste the records into the new sheet. Save this sheet with your other datasets as new-york-percent-nonwhite.csv

Do not use .XLSX

Make sure to always save your spreadsheet data for mapping in .CSV format. .CSV is an open-data format, and has a higher likelihood of being interoperable with most GIS software.

LibreOffice instructions

Open the dataset in LibreOffice.
Click the AutoFilter button in the menu ribbon.
This will add drop-down arrows to the tops of the column headers. Choose the drop-down arrow next to the column titled STATE.
Uncheck All. Scroll down and check off New York.
CTRL + A on a Windows, Command + A on a Mac, or from the menu Edit then Select All to highlight all the records.
Copy the records. CTRL + C; Command + C; Edit, Copy.
Open a new sheet File → New → Spreadsheet.
Paste the records into the new sheet. Save this sheet with your other datasets as new-york-percent-nonwhite.csv

Google Sheets instructions

Type sheets.new into a browser to open a new Google sheet.
Choose File→ Open→ Upload→ Browse to bring in the census data .csv.
For the STATE column, click or hover over the column header until a drop-down arrow appears.
Click the drop-down arrow to open a column menu.
Choose Create a Filter.
Now, next to the header name for the column, there will be a button with three horizontal lines.
Click the button next to the header name with the horizontal lines to open a filtering wizard.
Under Filter by Values, next to Select All, click the text that says Clear.
Scroll down to where it says New York and click New York. Select OK.

Scaling data more granularly by using the table

We have reduced our records to only represent census tracts in New York State. If we wanted to further scale down our data, we could filter the records by a particular county. For instance, we could include only the counties associated with New York City.

Creating a mappable percent column

Open the NHGIS codebook. The file should read something like nhgis00xx_ds262_20225_tract_codebook.txt.

nhgis codebook

Observe that the variables are structured as follows:

AQNGE001: Total population
AQNGE002: White alone
AQNGE003 to AQNGE008: Various categories of nonwhite populations

AQNGE009 and AQNGE010 are explanations of how AQNGE008 breaks down further; we will not include these last two categories in our sum, because they are duplicate values.

To create a mappable column which is a percent nonwhite, we have to sum all of the counts for each nonwhite category, and divide by the total population.

Percent Nonwhite Equation

Percent Nonwhite= Sum of categories AQNGE003 to AQNGE010 / AQNGE001 x 100

Create a new column called PCT_NONWHITE.
In the second cell of the new column, enter the equation
= IFERROR((AS2+AT2+AU2+AV2+AW2+AX2+AY2+AZ2)/ AQ2* 100, 0)
‘IFERROR’ catches null values in the source census data. We ask the function to return 0 instead of an error message if the function can’t carry out.
Fill down to complete the column by double-clicking the plus sign on the cell with the formula.
File → Save. Make sure to save as a .CSV.

You now have a dataset you can use to map percent nonwhite. Use the GEOID column to join this data with a census tract shapefile, and use the newly computed PCT_NONWHITE field to power the visualization.

Follow the next tutorial in the workshop to learn how to create a map from this data.