How to Download and Prepare Census Data for GIS Analysis

Jan 18, 2023 · 7 mins read

To effectively map and analyze social, economic, or demographic patterns using Census data in a GIS interface, this tutorial will walk you through the process of downloading and preparing data manually.

This tutorial focuses on manual data preparation using spreadsheet software. While every step in this guide details manual processes, most of these tasks – including downloading, filtering, and calculating derived variables – could also be efficiently performed using programming languages like Python or R, especially for larger datasets or automated workflows.

For those interested in programmatic access:

We will cover:

  • Downloading high-quality, free data from NHGIS.org.
  • Understanding and selecting appropriate geographic units.
  • Cleaning and scaling the data using common spreadsheet software.
  • Calculating derived variables (like percentages or rates).
  • Mastering a methodology you can adapt for various Census variables.

1. Downloading High-Quality Data

Visit NHGIS.org. Create a free academic use account if you haven’t already, and log in.

From the NHGIS.org home screen, choose START HERE: Get Data in the middle of the page, or SELECT DATA in the navigation bar menu at the top.

Click TOPICS. You can browse broad categories (like "Poverty" or "Race, Ethnicity, and Origins") or use the search bar to find specific variables relevant to your project.

Once you've found the variables you need (e.g., specific population counts, income levels), select them using the check-box furthest to the left. Click SUBMIT.

Under YEARS, select the American Community Survey 5-year range or Decennial Census year most applicable to your project.

Under GEOGRAPHIC LEVELS, choose CENSUS TRACT.

Why are we using census tracts?

Censuses are collected at the individual level, but results are aggregated to specific units of geography to protect privacy. Census tracts are a popular unit for geospatial analysis because counties (the level above) can be too vague to show spatial specificity, whereas more granular geographies (such as block groups or blocks) can have a higher likelihood for sampling errors. You can learn more about the hierarchical Census geographies on census.gov. If you need help determining the best geographic level for your analysis, schedule a consultation with us.

After selecting your variables, years, and geographic levels, your DATA CART will reflect your selections. From your DATA CART, choose CONTINUE, CONTINUE, and SUBMIT to begin the download. If you are not logged in, you will be prompted to do so.

Wait a moment or so for your extract to finish processing, and then download it by selecting TABLES.


2. Preparing Your Data for Mapping

You will have downloaded rows for every census tract in the entire United States, which is often more data than you need. Reducing your data at the outset will save time and space, especially when importing to visualization software.

Editing the Scale of the Data (Filtering by State)

This data cleaning portion of this tutorial is written for three spreadsheet software options: Microsoft Excel, an open (free) desktop spreadsheet tool called LibreOffice, or Google Sheets.

Microsoft Excel instructions:
  • Select the header row (row 1).
  • Go to the Data tab on the ribbon and click Filter.
  • Click the down arrow next to the column header containing state names (e.g., 'STATEA').
  • Uncheck (Select All) and then check the box next to the desired state(s). Click OK.
LibreOffice instructions:
  • Select any cell within your data range.
  • Go to Data in the top menu and select AutoFilter.
  • Click the down arrow next to the column header containing state names.
  • Uncheck All and then select the desired state(s). Click OK.
Google Sheets instructions:
  • Select any cell within your data range.
  • Go to Data in the top menu and select Create a filter.
  • Click the filter icon (upside-down triangle) in the column header for state names.
  • Click Clear and then select the desired state(s). Click OK.

Scaling Data More Granularly (Filtering by County/City)

Once you've reduced your records to a specific state (e.g., New York State), you can further scale down your data by filtering records by a particular county or a group of counties (e.g., the counties associated with New York City). You will use the same filtering steps as above, applying them to the county-level column in your dataset. If you need assistance with specific filtering strategies, schedule a consultation with us.


3. Calculating Derived Variables: A Step-by-Step Example (Poverty Rate)

Many valuable Census variables, like percentages or rates, need to be derived from raw population counts. This process is often called normalization in data and GIS analysis. When seeking spatial patterns in mapped and visualized data, understanding ratios (normalized values) is often more useful than looking at raw counts, as counts can be misleading where populations are dense. However, this approach does not apply to all variables; readers should think critically about whether their variable should be represented as a count or a ratio to truly reflect what they intend to show. For assistance in determining the best representation for your data, please consult our team at the Harvard Map Collection.

From your downloaded folder, open the codebook. The file should read something like nhgis00xx_ds262_20225_tract_codebook.txt. This codebook details the meaning of each variable.

For Poverty Rate, your goal is to divide the number of people below a certain poverty level by the total population for whom poverty status is determined.

  • AMZME002 is the number of people with income under 0.50 of the poverty level.
  • AMZME003 is the number of people with income between 0.50 and 0.99 of the poverty level.
  • AMZME001 is the total population for whom poverty status is determined.

Poverty Rate Equation:
Poverty Rate = (AMZME002 + AMZME003) / AMZME001 * 100

Steps to create the column:

  • Create a new column in your spreadsheet software called POVERTY_RATE.
  • In the second cell of the new column (e.g., B2), enter the equation (example for Excel/Sheets): = IFERROR((AR2 + AS2)/AQ2 * 100, 0)
    • IFERROR catches null values in the source Census data. We ask the function to return 0 instead of an error message if the function can’t carry out the calculation.
  • Fill down to complete the column by double-clicking the plus sign on the cell with the formula, or dragging the fill handle.
  • Save your file, making sure to choose the .CSV format.

4. Adapting for Other Census Variables & Calculations

The methodology demonstrated above is flexible. You can apply the same principles to calculate other derived variables (like percentages of specific populations) from raw Census counts.

To create a derived percentage or rate for your specific variable:

  1. Consult the Codebook: Open the codebook.txt file from your NHGIS download. Carefully identify:
    • The variable representing your total population or denominator (e.g., AQNGE001 for Total Population in the Race data).
    • The variables representing the specific population segment(s) you want to include in your numerator. (e.g., for "Percent Non-White," you would sum AQNGE003 through AQNGE008 and potentially other relevant categories). Pay close attention to any notes about duplicate values or subtotals, as sometimes certain categories are already included in others (e.g., AQNGE009 and AQNGE010 in Race data are often breakdowns of AQNGE008 and should not be double-counted).
  2. Construct Your Equation: Use the sum of your chosen numerator variables, divide by your total population/denominator, and multiply by 100 to get a percentage.
    • Example (Percent Non-White): (AQNGE003 + AQNGE004 + AQNGE005 + AQNGE006 + AQNGE007 + AQNGE008) / AQNGE001 * 100
  3. Apply in Spreadsheet Software: Create a new column and enter your custom formula, then fill down.
  4. Save as CSV: Always save your final prepared data in a .CSV format for easy import into GIS software.

5. Finalizing Your Data for GIS

You now have a dataset ready for GIS mapping. Use the GEOID column (which uniquely identifies each Census tract) to join this data with a Census tract shapefile in your GIS software. The newly computed derived field (e.g., POVERTY_RATE or PCT_NONWHITE) will then power your visualization.

For a step-by-step guide on how to join tabular data in QGIS and create a map, please refer to our tutorial on Performing a Table Join.


This site is a project of the Harvard Map Collection. Visit our website to learn more.