Working With Large Data Sets

EKs covered: 3.1.2F, 3.2.1A, 3.2.1B, 3.2.1C, 3.2.2A, 3.2.2B, 3.2.2C, 3.2.2D, 3.2.2G

What about data with millions of pieces of information, instead of a few hundred? Large data sets present challenges and opportunities for discovering new information.

Baby Name Voyager
  1. Using a Web browser, open the Baby Name Voyager. This visualization shows the 1000 most popular names of boys and girls born in the United States for every year from 1880 to 2014.
  2. If the graph becomes unresponsive or blank, reload it.
  3. What was the most popular girl's name in the 1900s? In the 1960s?
  4. What boys' names are much less popular today than they were in 1880?
  5. Type in what you think is the most popular name in your school. Is this name still popular for new babies?
  6. What else can you find? Find some interesting information in the data, then prepare to show it to your class.
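Questions 3 and 4 ask about popularity across many years, which means combining counts from many yearly files. The sketch below answers a question like question 3. It assumes the data has already been loaded as (year, name, sex, count) tuples, which is a shape chosen only for this illustration. It treats "the 1900s" as the decade 1900 to 1909, adds up each name's counts over that range, and returns the name with the largest total. The sample records are made up and are not real Social Security counts.

```python
from collections import Counter

def most_popular(records, sex, first_year, last_year):
    """Return the name with the highest total count for `sex`
    over the years first_year..last_year (inclusive), or None.

    `records` is a list of (year, name, sex, count) tuples; this
    layout is an assumption made for the sketch.
    """
    totals = Counter()
    for year, name, s, count in records:
        if s == sex and first_year <= year <= last_year:
            totals[name] += count  # sum each name's births over the range
    if not totals:
        return None
    name, _ = totals.most_common(1)[0]
    return name

# Hypothetical records, made up for illustration (not real SSA counts).
sample = [
    (1900, "Ada", "F", 50), (1901, "Ada", "F", 40),
    (1900, "Bea", "F", 60), (1905, "Bea", "F", 20),
    (1910, "Ada", "F", 999),  # outside 1900-1909, so it is ignored
    (1903, "Cal", "M", 70),
]
print(most_popular(sample, "F", 1900, 1909))  # Ada (90 vs. 80)
```

Question 4 works the same way. Compute totals for an early range and a recent range, then compare each name's two totals.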

The Baby Name Voyager is an impressive visualization of a large data set. Its data comes from the Social Security Administration, which provides a text file for each year from 1880 to 2014. Few of the insights in this data could be learned just from reading these files!
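A program needs to turn each yearly text file into structured records before it can visualize anything. The sketch below assumes that each non-blank line has the form name,sex,count (for example `Ada,F,50`) with no header row. That is the layout used by the SSA's national per-year files, but check the files you download. The sample text is hypothetical.

```python
def parse_year_file(text):
    """Parse one year's names file into (name, sex, count) tuples.

    Assumes each non-blank line looks like "Name,F,1234": name, sex,
    and count separated by commas, with no header row.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sex, count = line.split(",")
        rows.append((name, sex, int(count)))
    return rows

# Hypothetical file contents, made up for illustration.
example = "Ada,F,50\nBea,F,60\n\nCal,M,70\n"
rows = parse_year_file(example)
print(len(rows))                       # 3
print(rows[1])                         # ('Bea', 'F', 60)
print(sum(count for _, _, count in rows))  # 180
```

To read a real file, pass its contents to the function after opening it with Python's built-in `open`. The file name depends on where you saved the download.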

Large data sets present unique challenges and opportunities:

Visualizations and interactive tools are especially valuable when working with large data sets, giving people the opportunity to study what might otherwise be incomprehensible. This map from YesYesNo was generated from runners contributing their tracking data.

[Figure: Running Data in NYC]

  1. Work with the Social Security birth data to produce a visualization of your own. You can start with this file; data for other years is available here to download.
  2. Think of a large data source you've produced, and visualize it using Snap!. Remember, large data sets can include text, sound, images, and video.
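A visualization does not need to be elaborate to reveal a trend. The sketch below draws a plain-text bar chart of one name's yearly counts. It assumes the counts are already collected into a dictionary that maps year to count, and it scales the bars so the largest count gets `width` characters. The `text_chart` name and the sample counts are hypothetical and were made up for illustration.

```python
def text_chart(counts_by_year, width=40):
    """Return a text bar chart with one line per year.

    counts_by_year maps year -> count. Bars are scaled so the
    largest count is drawn with `width` '#' characters.
    """
    if not counts_by_year:
        return ""
    biggest = max(counts_by_year.values())
    lines = []
    for year in sorted(counts_by_year):
        count = counts_by_year[year]
        # Scale relative to the largest count; avoid dividing by zero.
        bar_len = round(count / biggest * width) if biggest else 0
        lines.append(f"{year} {'#' * bar_len} {count}")
    return "\n".join(lines)

# Hypothetical yearly counts for one name (made up, not SSA data).
print(text_chart({1880: 10, 1881: 20, 1882: 40}, width=8))
# 1880 ## 10
# 1881 #### 20
# 1882 ######## 40
```

Scaling to the largest value keeps every chart the same width, so a name given to a few dozen babies and a name given to tens of thousands can be compared by shape.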