Visualizing Data

DAT-2.D.2, DAT-2.D.6 bullet 4

Tables, diagrams, text, charts, graphs, and other visual tools help extract, modify, and communicate information from data.

On this page, you will create a visualizations to help you analyze and communicate information from your dataset.

Grouping Data

DAT-2.E.3 classifying only

Classifying data means distributing data into groups based on common characteristics.

Another thing that's often done in data science is grouping (or classifying) data. For example, here is the cars data grouped by vehicle make (column 14):
group (data of table (cars)) by field (14) by intervals of ( )) reporting a table with three columns; column A contains each make of car from the original data set; column B shows the total number of cars in the original data set that are of that make; column C shows a picture of a list icon in each row of the table.

The by intervals of input to the group table block should be left empty when, as in this example, the field on which you're grouping is text rather than numbers. Later on this page, you'll see how to use intervals in graphing.

  1. Open your U5L3-Data-Processing project if it isn't open already.
  2. Talk with Your Partner Determine one question you can answer by grouping your data, and build code to answer it.
    Click for example questions for which grouping is helpful.
    • How many Toyotas are in the database?
    • Which brand in the table has the most models listed?
    • How many 2010 Hyundais are in the database? (This requires looking inside one of the lists in column C, so you'd need two keep functions.)

    Pipe may be useful for questions that require looking inside the inner lists of the grouped data (in column C).

Plotting Data

The bar chart function works like the group function, but with special features for numeric data: it allows you to select upper and lower limits of the data; you can have a range of values in one bucket, such as values 6–10, values 11–15, and so on; and it sorts the groups. For example, here is the cars data grouped by city MPG (column 9):
bar chart of table (data of table (cars)) groups by field: (9) from: (5) to: (50) interval: (5) reporting a table with three columns and 10 rows; the values in the first column are multiples of 5 from 5 to 50; the values in the second column are 0, 311, 1554, 2057, 879, 267, 6, 2, 0, 0; the values in the third column are each a list icon indicating that a sublist is stored in that field

The number in column A is the largest value included in each group. If the values aren't all integers, the next group includes anything larger. For example, the group numbered 15 includes values from 10.0001 (or anything more than 10) to exactly 15.

You can plot the data from bar chart to visualize them:
plot bar chart (bar chart of table (data of table (cars)) grouped by field: (9) from: (5) to: (50) interval: (5)) bars at x: (-200) y: (-100) width: (400) height: (200) bar graph running from 0 to 50 on the horizontal axis and from 0 to 2050 on the vertical axis with bars indicating 0 between 0 and 5, about 300 between 5 and 10, about 1500 between 10 and 15, about 2000 between 15 and 30, about 800 between 20 and 25, about 200 between 25 and 30, and 0 beyond 30

  1. Plot a few bar charts of some fields from your dataset and make at least one new observation about your data.
  2. The mode of a data set is the value that appears most often in it.

    Here is a bar chart of field 11 from the cars data set (highway MPG) with MPG values from 5 to 50, using an interval of 3. Identify the mode. (It will be a range of values such as 13–15 or 16–18.)
    [same plot bar chart instruction as above, but with grouped by field: (11) and interval: (3)] plot bar chart (bar chart of table (data of table (cars)) grouped by field: (11) from: (5) to: (50) interval: (3)) bars at x: (-200) y: (-100) width: (400) height: (200) bar graph running from 0 to 51 on the horizontal axis and from 0 to 1050 on the vertical axis with bars indicating 0 for less than 9, about 30 between 9 and 12, about 220 between 12 and 15, about 550 between 15 and 18, about 900 between 18 and 21, about 900 between 21 and 24, about 1000 between 24 and 27, about 550 between 27 and 30, about 300 between 30 and 33, about 250 between 33 and 36, about 20 between 36 and 39, about 20 between 39 and 42, and 0 beyond 42
  3. Here is another bar chart with all the inputs the same as before, but with an interval of 6. Identify the mode.
    [same plot bar chart instruction as above, but with grouped by field: (11) and interval: (6)] plot bar chart (bar chart of table (data of table (cars)) grouped by field: (11) from: (5) to: (50) interval: (6)) bars at x: (-200) y: (-100) width: (400) height: (200) bar graph running from 0 to 54 on the horizontal axis and from 0 to 1800 on the vertical axis with bars indicating 0 between 0 and 6, about 30 between 6 and 12, about 800 between 12 and 18, about 1800 between 18 and 24, about 1600 between 24 and 30, about 700 between 30 and 36, about 50 between 36 and 42, and 0 beyond 42
  4. Talk with Your Partner How can these results both be correct? (There's nothing wrong with the graphs.)
  5. Talk with Your Partner Why would you ever use an interval larger than 1?
  1. Research the question of why would you ever use an interval larger than 1.