Computers can be useful for processing data to gain information, but your ability to process data depends both on your capabilities and the tools you have available.
On this page, you will build various tools (specifically, selectors) to help you answer questions about data that interests you.
You may be familiar with tables from spreadsheet applications (such as Google Sheets, Apple Numbers, or Microsoft Excel) which store tabular data in different file formats. Spreadsheet programs help efficiently organize information, and they can find trends in data automatically (such as the line shown at right). CSV is an open spreadsheet format that works in any of these applications and in Snap!.
Although spreadsheets are a common kind of data, they aren't the only kind. Consider the list of words that you used in Lab 1 to check spelling. It's not a list of lists with two dimensions of data (rows and columns); it's just a list with one dimension of data. "Spreadsheet" refers to two-dimensional data organized in rows and columns. "Dataset" is the more general term for any collection of data, including simple, one-dimensional lists; spreadsheets (two-dimensional lists of lists); and more complicated datasets such as spreadsheets with multiple tabs.
How does the first row (item (1) of the table) differ from the other items?
A table is represented in Snap! as a list of lists. If you right-click (or control-click on a Mac) a table, you can switch to "list view" and see how the data (and column headings) are stored. See examples of table view and list view.
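As a rough text-based analogy (written in Python rather than Snap!, with made-up car data that is not the lab's dataset), a table can be pictured as a list of lists whose first item holds the column headings:

```python
# Hypothetical data for illustration only; not the lab's actual dataset.
table = [
    ["model", "mpg", "cylinders"],  # first item: the column headings
    ["Alpha", 31, 4],               # every later item: one row of data
    ["Bravo", 18, 8],
    ["Comet", 25, 6],
]

headings = table[0]   # Python counts from 0, so this matches item (1) in Snap!
records = table[1:]   # everything after the headings
```

The first item contains labels (text), while every other item contains the actual values for one row.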
Record and column are selectors for a table abstract data type. We don't need a constructor for this abstract data type because we are importing the data from the Internet, but the selectors will be useful. (Field is a selector for record, which is, itself, an abstract data type.)
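A minimal Python sketch of what the record and field selectors might look like is given here. The function names, the 1-based numbering, and the choice to skip the heading row are assumptions made for illustration, and the data is invented:

```python
# Hypothetical table; the first item holds the column headings.
table = [
    ["model", "mpg"],
    ["Alpha", 31],
    ["Bravo", 18],
]

def record_of_table(n, table):
    """Report the nth record (counting from 1), skipping the heading row."""
    # table[0] is the heading row, so record 1 is table[1].
    return table[n]

def field_of_record(n, record):
    """Report the nth field (counting from 1) of a single record."""
    return record[n - 1]

second = record_of_table(2, table)
model_name = field_of_record(1, second)
```

Code that uses these selectors never has to know that a table is stored as a list of lists, which is the point of treating table and record as abstract data types.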
Notice that these suggested block names include the word "table" or "record" before the second input. Including the expected input data type in the block name can help you avoid bugs caused by using a selector that doesn't match the input you want to use.
Try to figure out how to report just one column, but click if you really need a hint.
Map performs the same function on every item in a list.
Have I seen map before?
Map will perform the same function on every record in your dataset. You will need to determine what function to map over the dataset. You learned about map on Unit 3 Lab 2 Page 5: Transforming Every List Item.
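If you want to compare your Snap! solution with a text-based version afterward, here is one possible Python sketch of a column selector built with map. The function name, the 1-based column number, and the sample data are assumptions, not part of the lab:

```python
# Hypothetical table; the first item holds the column headings.
table = [
    ["model", "mpg"],
    ["Alpha", 31],
    ["Bravo", 18],
    ["Comet", 25],
]

def column_of_table(n, table):
    """Report the nth column (counting from 1), leaving out the heading."""
    records = table[1:]
    # The function being mapped picks one field out of each record.
    return list(map(lambda record: record[n - 1], records))

mpg_values = column_of_table(2, table)
```

The function mapped over the dataset is simply "take field n of this record."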
You can see the column number by holding your mouse pointer over the letter at the top of the column in table view.
You may need to use map, keep, or combine to answer your question. Click to see where you learned about these higher-order functions.
map on Unit 3 Lab 2 Page 5: Transforming Every List Item
keep on Unit 2 Lab 3 Page 5: Keeping Items from a List
combine on Unit 2 Lab 4 Page 3: Other Mathematical Reporters
Click for example questions to ask about a single column.
Examples include a question you could answer with the average block and one you could answer with the minimum block.
Notice that all of these examples only require data from one column. If you want to ask a question that requires looking at another column (for example, "What's the model of the car with the highest MPG?"), you can do the Take It Further Activity below.
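As an illustration of single-column questions, the following Python sketch uses rough counterparts of keep (filter) and combine (reduce) on one invented column of MPG values. The numbers and the threshold of 24 are hypothetical:

```python
from functools import reduce

# Hypothetical column of MPG values, already pulled out of a table.
mpg_values = [31, 18, 25, 22]

# combine with +, then divide by the number of items: the average.
total = reduce(lambda a, b: a + b, mpg_values)
average = total / len(mpg_values)

# combine with "smaller of two": the minimum.
minimum = reduce(lambda a, b: a if a < b else b, mpg_values)

# keep only the items that pass a test.
efficient = list(filter(lambda mpg: mpg > 24, mpg_values))
```

Each answer needs only the one column, so no other part of the table has to be examined.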
Researchers often face challenges with data before they even begin analysis. Suppose you are combining data from different countries about distances between cities, and you discover that the distance data from the U.S. is measured in miles, but the distance data from Europe is measured in kilometers; to make meaningful comparisons, you need uniform data (all in miles or all in kilometers). As another example, if you use an online survey to collect data, the way participants abbreviate, spell, or capitalize their entries may vary. Data may also be incomplete (if some people didn't complete the survey) or invalid (if some people made mistakes).
Cleaning data is the process of making the data uniform without changing its meaning (such as replacing abbreviations, spellings, and capitalizations with the intended word or converting miles to kilometers). Programmers can use programs to filter and clean digital data, thereby gaining insight and knowledge.
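A small Python sketch of this kind of cleaning, using invented survey-style entries: it normalizes how the unit was typed and converts miles to kilometers. The entries, the alias table, and the rounding to one decimal place are all assumptions made for illustration:

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile

# Hypothetical raw entries: (label, distance, unit as someone typed it).
raw = [
    ("A-B", 100.0, "mi"),
    ("C-D", 250.0, "KM"),
    ("E-F", 10.0, " Miles "),
]

# Map each spelling we expect onto one standard unit name.
UNIT_ALIASES = {
    "mi": "mi", "mile": "mi", "miles": "mi",
    "km": "km", "kilometers": "km", "kilometres": "km",
}

def clean(entry):
    """Report the same measurement expressed uniformly in kilometers."""
    label, distance, unit = entry
    unit = UNIT_ALIASES[unit.strip().lower()]  # fix spacing and capitalization
    if unit == "mi":
        distance = distance * KM_PER_MILE
    return (label, round(distance, 1), "km")

cleaned = list(map(clean, raw))
```

An entry with an unexpected unit spelling raises a KeyError here rather than being silently guessed at, which is one way to notice invalid data.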
Imagine you read in the news that people who eat a lot of broccoli are less likely to get cancer. The conclusion that broccoli prevents cancer could be a result of bias. It could be that people who eat a lot of broccoli tend to be the same people who also get a lot of exercise, and it's actually the exercise that makes the difference. In research, the term "bias" doesn't have to mean prejudice; it's about reasons the data might not mean what they seem to mean.
People sometimes think that the way to overcome bias is to use a bigger sample (asking more people if they eat broccoli and have cancer). But if the bigger sample has the same problem (people getting more exercise also eat more broccoli), then a bigger sample won't eliminate the bias.
Click for a hint about how to build row with maximum in column () of table ().
One way to build a maximum function for a simple list (that isn't a table) is to build a block that compares two inputs and reports the larger one, and then use it with combine to find the maximum of a whole list.
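In Python, the same idea might be sketched with reduce standing in for combine. The function names are hypothetical:

```python
from functools import reduce

def max_of_two(a, b):
    """Report whichever of the two inputs is larger."""
    return a if a > b else b

def maximum_of_list(values):
    """Combine the whole list using max_of_two."""
    return reduce(max_of_two, values)

print(maximum_of_list([3, 9, 4]))  # prints 9
```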
You can use a similar approach here by first building a block that compares a specific field (column) for two rows and reports the row with the higher value in the specified column.
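One possible Python sketch of that approach follows. The names, the 1-based column number, the tie rule (keep the earlier row), and the data are assumptions for illustration:

```python
from functools import reduce

# Hypothetical table; the first item holds the column headings.
table = [
    ["model", "mpg"],
    ["Alpha", 31],
    ["Bravo", 18],
    ["Comet", 25],
]

def row_with_larger_field(column, row_a, row_b):
    """Report whichever row has the higher value in the given column."""
    return row_a if row_a[column - 1] >= row_b[column - 1] else row_b

def row_with_maximum_in_column(column, table):
    """Combine all data rows, keeping the one with the highest field."""
    records = table[1:]  # leave out the heading row
    return reduce(lambda a, b: row_with_larger_field(column, a, b), records)

best = row_with_maximum_in_column(2, table)
```

Because the whole row is reported, a question such as "What's the model of the car with the highest MPG?" can then be answered by selecting the model field of that row.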