On this page, you will ask and answer more demanding questions about your dataset and learn new tools.
Filtering is a powerful technique for finding information and recognizing patterns in data. For example, filtering can help you answer questions like "What is the average city MPG for just the Subarus in this dataset?"
Here, we keep all the records from cars for which the 14th field is "Subaru." Then, we take column 9 of those records (the "City MPG") and find their average.
Notice that there are many digits in the answer above. How many digits are given in the table for each car's MPG? An important rule in data science is not to claim more precision in a result than is warranted by the given data, so this answer should be rounded to 19.
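If it helps to see the same steps written as text rather than as blocks, here is a minimal Python sketch. It assumes a small, hypothetical set of records instead of the real cars dataset. It also uses named fields ("make", "city_mpg") in place of Snap!'s field 14 and column 9. It shows the three steps (keep, column, average) and the rounding rule from above.

```python
from statistics import mean

# Hypothetical sample records standing in for the "cars" table.
# In the Snap! version, the make is field 14 and City MPG is column 9;
# named keys are used here instead of positions.
cars = [
    {"make": "Subaru", "city_mpg": 21},
    {"make": "Toyota", "city_mpg": 25},
    {"make": "Subaru", "city_mpg": 18},
    {"make": "Honda", "city_mpg": 24},
    {"make": "Subaru", "city_mpg": 19},
]

# Like keep: only the records whose make is "Subaru"
subarus = [car for car in cars if car["make"] == "Subaru"]

# Like column (9) of table: just the City MPG values of those records
subaru_mpgs = [car["city_mpg"] for car in subarus]

# Like average: the mean of those values (a float with many digits)
avg = mean(subaru_mpgs)

# The MPG values are whole numbers, so round to match that precision.
# For this sample data, round(avg) is 19.
print(avg, round(avg))
```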
You might find expressions with many nested function calls easier to build by using the pipe function. What would this look like using pipe?
You can use the pipe function from the "Bar Charts" library to work through your data analysis one function at a time:
The pipe function sends the data from table (cars) through keep to filter it for just the Subarus, through column (9) of table to get just the "City MPG" for those Subarus, and finally through average to get the average of those Subaru MPG values.
Recall that the empty input slots in each function are filled by the output of the previous function (or by the starting dataset, in the case of the first function): the empty slots in the keep function are both filled by data from table (cars); the empty slot in column (9) of table is filled by the output of keep; and the empty slot in average is filled by the output of column (9) of table. It's like a pipe made of pieces connected together: the data goes in one end and works through each function, computing a new value at each step.
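To see the "pipe" idea outside of Snap!, here is a minimal Python sketch. The pipe function below is a hypothetical helper written for this example, not part of Python or the Bar Charts library. It passes a value through a sequence of functions, so each function's output becomes the next function's input. The records are hypothetical sample data.

```python
from functools import reduce
from statistics import mean


def pipe(value, *functions):
    """Send value through each function in order.

    Each function's output becomes the next function's input, much like
    the empty input slots that Snap!'s pipe fills in.
    """
    return reduce(lambda result, f: f(result), functions, value)


# Hypothetical sample records (not the real cars dataset).
cars = [
    {"make": "Subaru", "city_mpg": 21},
    {"make": "Toyota", "city_mpg": 25},
    {"make": "Subaru", "city_mpg": 18},
    {"make": "Subaru", "city_mpg": 19},
]

subaru_average = pipe(
    cars,
    lambda rows: [row for row in rows if row["make"] == "Subaru"],  # like keep
    lambda rows: [row["city_mpg"] for row in rows],                 # like column of table
    mean,                                                            # like average
)

# The same computation written as nested calls, read from the inside out.
nested = mean([row["city_mpg"] for row in [r for r in cars if r["make"] == "Subaru"]])
print(subaru_average, nested)
```

Both versions compute the same value. The pipe version lists the steps in the order the data flows through them, while the nested version must be read from the innermost call outward.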
You learned about loading libraries and exporting/importing blocks on Unit 2 Lab 4 Page 2: Making a Mathematical Library.
…maximum of list, minimum of list, sum of list, and average of list blocks from your U2L4-MathTools project.
… pipe. (Or try both!)
… average.)
… minimum.)
Notice that the column you use to filter the data (such as year) doesn't have to be the column you are asking about (such as transmission).
Sometimes, you want to keep a subset of your data (such as "Which cars were made in 2010?"), but other times, you just want one item that matches your requirement. Often this is because what you really want to know is whether any items match, and as soon as you find one, the answer is "yes" (such as "Were any electric cars made in 2010?"). Snap! has a higher order function that works similarly to keep, but it reports only the first item that's found. Because it can stop searching as soon as it finds a match, it can be faster.
Find first is equivalent to item (1) of (keep). It is a higher order function like keep, map, and combine because it takes a function (a predicate) as input.
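Here is a minimal Python sketch of the difference between keep and find first. The find_first helper is hypothetical and written for this example. Returning None when nothing matches is a Python choice made here, not a description of what Snap!'s block reports. The records are also hypothetical. The predicate records each car it checks, so you can see that find first stops early while keep examines every item.

```python
checked = []  # names of the cars the predicate has looked at


def is_electric_2010(car):
    """Predicate: was this car electric and made in 2010? Logs each check."""
    checked.append(car["model"])
    return car["year"] == 2010 and car["fuel"] == "Electric"


def find_first(predicate, items):
    """Report the first item satisfying predicate, or None if none does.

    The generator is consumed only until the first match, so later items
    are never checked.
    """
    return next((item for item in items if predicate(item)), None)


# Hypothetical sample records.
cars = [
    {"model": "A", "year": 2010, "fuel": "Gasoline"},
    {"model": "B", "year": 2010, "fuel": "Electric"},
    {"model": "C", "year": 2010, "fuel": "Electric"},
    {"model": "D", "year": 2011, "fuel": "Gasoline"},
]

first = find_first(is_electric_2010, cars)
print(first["model"], len(checked))  # stopped after checking 2 cars

checked.clear()
kept = [car for car in cars if is_electric_2010(car)]  # like keep
print(len(kept), len(checked))  # keep checked all 4 cars

# item (1) of (keep) gives the same record that find first found.
print(kept[0] is first)
```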
You can access or change data to create new information by using:
Map to transform every element of a data set (such as doubling every element in a list, or extracting the parent’s email from every student record)
Keep or find first to filter a data set (such as keeping only the positive numbers from a list, or keeping only students who signed up for band from a database of all students)
Combine to combine or compare data in some way (such as adding up a list of numbers, or finding the student who has the highest GPA)
… bar chart, which you will learn on the next page)
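The three kinds of tools above have close counterparts in Python. This minimal sketch uses map, filter (Python's version of keep), and functools.reduce (similar to combine) on hypothetical data. The names and GPAs are invented for illustration.

```python
from functools import reduce

numbers = [3, -1, 4, -5, 9]

# Map: transform every element (double each number)
doubled = list(map(lambda n: n * 2, numbers))

# Keep: filter the data set (only the positive numbers)
positives = list(filter(lambda n: n > 0, numbers))

# Combine: add up the list
total = reduce(lambda a, b: a + b, numbers)

# Combine: compare records to find the student with the highest GPA
# (hypothetical student records)
students = [
    {"name": "Ana", "gpa": 3.6},
    {"name": "Ben", "gpa": 3.9},
    {"name": "Cy", "gpa": 3.2},
]
top = reduce(lambda a, b: a if a["gpa"] >= b["gpa"] else b, students)

print(doubled, positives, total, top["name"])
```

Combining with a comparison works the same way as combining with +. At each step, the function reduces two items to one, here by keeping whichever student has the higher GPA.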