ST EK List:
3.1.1A Computers are used in an iterative and interactive way when processing digital information to gain insight and knowledge.
3.1.1B Digital information can be filtered and cleaned by using computers to process information.
3.1.1C Combining data sources, clustering data, and data classification are part of the process of using computers to process information.
3.1.1D Insight and knowledge can be obtained from translating and transforming digitally represented information.
3.1.1E Patterns can emerge when data is transformed using computational tools.
3.1.2A Collaboration is an important part of solving data-driven problems.
3.1.2B Collaboration facilitates solving computational problems by applying multiple perspectives, experiences, and skill sets.
3.1.2C Communication between participants working on data-driven problems gives rise to enhanced insights and knowledge.
3.1.2D Collaboration in developing hypotheses and questions, and in testing hypotheses and answering questions, about data helps participants gain insight and knowledge.
3.1.2E Collaborating face-to-face and using online collaborative tools can facilitate processing information to gain insight and knowledge.
3.1.2F Investigating large data sets collaboratively can lead to insight and knowledge not obtained when working alone.
3.2.1A Large data sets provide opportunities and challenges for extracting information and knowledge.
3.2.1B Large data sets provide opportunities for identifying trends, making connections in data, and solving problems.
3.2.1C Computing tools facilitate the discovery of connections in information within large data sets.
3.2.1D Search tools are essential for efficiently finding information.
3.2.1E Information filtering systems are important tools for finding information and recognizing patterns in the information.
3.2.1F Software tools, including spreadsheets and databases, help to efficiently organize and find trends in information.
3.2.2A Large data sets include data such as transactions, measurements, text, sound, images, and video.
3.2.2B The storing, processing, and curating of large data sets is challenging.
3.2.2C Structuring large data sets for analysis can be challenging.
3.2.2D Maintaining privacy of large data sets containing personal information can be challenging.
3.2.2E Scalability of systems is an important consideration when data sets are large.
3.2.2F The size or scale of a system that stores data affects how that data set is used.
3.2.2H Analytical techniques to store, manage, transmit, and process data sets change as the size of data sets scale.
Self-Check: Big Data
On this page, you will prepare for data questions on the AP exam.
Here are two BJC videos about data from the University of California, Berkeley.
These questions are similar to those you will see on the AP CSP exam.
Scientists studying birds often attach tracking tags to migrating birds. For each bird, the following data is collected at frequent, regular intervals:
Date and time
Latitude and Longitude
Altitude
Temperature
Which of the following questions about a particular bird could not be answered using only the data gathered from the tracking tags?
Approximately how much time does the bird spend in the air and on the ground?
This could be determined from the “Altitude” data.
Does the bird travel in groups with other tracked birds?
This could be determined by comparing the “Date and time” and “Latitude and Longitude” data of different tracked birds.
Is the migration path of the bird affected by temperature patterns?
This could be determined by matching the “Temperature” data against the “Latitude and Longitude” data along the migration path.
What are the effects of industrial pollution on the migration path of the bird?
Correct. No data is collected on pollution in the bird’s environment.
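The first answer choice (time in the air versus on the ground) can be sketched in code. This is a minimal illustration, not an official solution: it assumes each tag reading is stored as a hypothetical `TagReading` record, that readings arrive at a fixed interval, and that altitude is measured as height above ground. The 5-meter cutoff for "in the air" is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TagReading:
    # Hypothetical record for one tag reading; field names are illustrative.
    time: datetime
    latitude: float
    longitude: float
    altitude_m: float      # assumed: height above ground level, in meters
    temperature_c: float

AIRBORNE_THRESHOLD_M = 5.0  # assumed cutoff between "on the ground" and "in the air"

def time_in_air_and_on_ground(readings, interval):
    """Estimate (airborne time, grounded time), treating each reading as one interval."""
    airborne = sum(1 for r in readings if r.altitude_m > AIRBORNE_THRESHOLD_M)
    grounded = len(readings) - airborne
    return airborne * interval, grounded * interval

# Made-up sample: six readings taken every 10 minutes.
start = datetime(2017, 7, 10, 6, 0)
altitudes = [0.0, 1.2, 150.0, 320.0, 280.0, 2.0]
readings = [
    TagReading(start + timedelta(minutes=10 * i), 42.36, -71.06, alt, 18.0)
    for i, alt in enumerate(altitudes)
]

air, ground = time_in_air_and_on_ground(readings, timedelta(minutes=10))
print("In the air:", air)        # → In the air: 0:30:00
print("On the ground:", ground)  # → On the ground: 0:30:00
```

Because the readings are taken at regular intervals, counting readings is enough to approximate elapsed time. No pollution field exists in `TagReading`, so no similar function could be written for the last answer choice.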
Using computers, researchers often search large data sets to find interesting patterns in the data. Which of the following is not an example where searching for patterns is needed to gather desired information?
An online shopping company analyzing customers’ purchase history to recommend new products.
This is an example of searching for patterns to gather desired information.
A high school analyzing student attendance records to determine which students should receive a disciplinary warning.
Correct. No pattern analysis is needed here; sorting the data is enough to produce a list of students with poor attendance records.
A credit scoring company analyzing the purchase history of clients to identify cases of identity theft.
This is an example of searching for patterns to gather desired information.
A college analyzing high school students’ GPA and SAT scores to assess their potential college success.
This is an example of searching for patterns to gather desired information.
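The attendance scenario needs only filtering and sorting, not pattern discovery. A minimal sketch, assuming attendance is summarized as a hypothetical mapping from student name to number of absences, and that the school's warning rule is "more than 10 absences" (an illustrative value, not part of the question):

```python
# Hypothetical attendance summary: student name -> number of absences this term.
absences = {
    "Avery": 2,
    "Blake": 14,
    "Casey": 7,
    "Devon": 11,
}

WARNING_CUTOFF = 10  # assumed policy: more than 10 absences earns a warning

def students_to_warn(absences, cutoff):
    """Filter to students over the cutoff, then sort by absences (most first)."""
    flagged = [(name, count) for name, count in absences.items() if count > cutoff]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

print(students_to_warn(absences, WARNING_CUTOFF))
# → [('Blake', 14), ('Devon', 11)]
```

The rule is known in advance, and the computer simply applies it to each record; nothing about the data has to be discovered.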
A car-hailing company uses an app to track the travel trends of its customers. The data collected can be filtered and sorted by geographic location, time and date, miles travelled, and fare charged for the trip. Which of the following is least likely to be answerable using only the trends feature?
What time of day is the busiest for the company in a given city?
Filtering by geographic location and then counting trips by time of day would yield this information.
From which geographic location do the longest rides originate?
Sorting by miles travelled and noting the geographic location of the longest rides would yield this information.
How is competition with the local cab companies affecting business in a given district?
Correct. There is no information about the competition in the data collected.
How much money did the company earn in a given month?
Filtering by date and summing the fares charged would yield this information.
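The three answerable questions above each reduce to a filter followed by a sort, count, or sum. The sketch below assumes a hypothetical `Trip` record with made-up sample data; the company's real schema is not given in the question. Fares are stored in cents to avoid floating-point rounding.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Trip:
    # Hypothetical trip record built from the four fields named in the question.
    city: str
    start: datetime
    miles: float
    fare_cents: int  # fare in cents to keep sums exact

def busiest_hour(trips, city):
    """Filter trips to one city, then count trips per starting hour."""
    hours = Counter(t.start.hour for t in trips if t.city == city)
    if not hours:
        return None  # no trips recorded for that city
    return hours.most_common(1)[0][0]

def longest_ride_origin(trips):
    """Find the ride with the most miles travelled and report where it began."""
    return max(trips, key=lambda t: t.miles).city

def monthly_revenue_cents(trips, year, month):
    """Filter trips by date, then sum the fares charged."""
    return sum(t.fare_cents for t in trips
               if t.start.year == year and t.start.month == month)

# Made-up sample data for illustration only.
trips = [
    Trip("Chicago", datetime(2017, 7, 3, 8, 15), 4.2, 1250),
    Trip("Chicago", datetime(2017, 7, 3, 8, 40), 2.0, 725),
    Trip("Chicago", datetime(2017, 7, 4, 17, 5), 6.8, 1800),
    Trip("Boston", datetime(2017, 7, 5, 8, 30), 12.5, 3175),
    Trip("Boston", datetime(2017, 8, 1, 9, 0), 3.1, 950),
]

print(busiest_hour(trips, "Chicago"))                         # → 8
print(longest_ride_origin(trips))                             # → Boston
print(f"${monthly_revenue_cents(trips, 2017, 7) / 100:.2f}")  # → $69.50
```

Every function works only with the four collected fields; none of them could measure competition from cab companies.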
An online music download company stores information about song purchases made by its customers. Every day, the following information is made publicly available in a database on the company website:
The day and date of each song purchased.
The title of the song.
The cities where customers purchased each song.
The number of times each song was purchased in a given city.
An example portion of the database is shown below. The database is sorted by date and song title.
Day and Date  | Song Title | City          | Number of Times Purchased
------------- | ---------- | ------------- | -------------------------
Mon 07/10/17  | Despacito  | Boston, MA    | 117
Mon 07/10/17  | Malibu     | Chicago, IL   | 53
Mon 07/10/17  | Malibu     | New York, NY  | 197
Mon 07/10/17  | Bad Liar   | Anchorage, AK | 11
Tue 07/11/17  | Despacito  | San Diego, CA | 241
Which of the following cannot be determined using only the information in the database?
The song that is purchased the most in a given week.
This information can be found by summing the purchases of each song over the given week and choosing the song with the largest total.
The city with the fewest purchases on a particular day.
This information can be found by summing the purchases in each city on the given day and choosing the city with the smallest total.
The total number of cities in which a certain song was purchased in a given month.
This information can be found by listing the distinct cities where the song was purchased during that month and counting them.
The total number of songs purchased by a particular customer during the course of a given year.
Correct. There is no data stored on individual customers.
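The three answerable queries can be run directly on the example rows from the database in this question. This is a minimal sketch: the function names are hypothetical, and `fewest_purchases_city` only considers cities that appear in the database for that day.

```python
from collections import Counter
from datetime import date, datetime

# Example rows from the database in this question:
# (day and date, song title, city, number of times purchased)
rows = [
    ("Mon 07/10/17", "Despacito", "Boston, MA", 117),
    ("Mon 07/10/17", "Malibu", "Chicago, IL", 53),
    ("Mon 07/10/17", "Malibu", "New York, NY", 197),
    ("Mon 07/10/17", "Bad Liar", "Anchorage, AK", 11),
    ("Tue 07/11/17", "Despacito", "San Diego, CA", 241),
]

def parse_day(day_and_date):
    """Turn 'Mon 07/10/17' into a date, ignoring the weekday abbreviation."""
    return datetime.strptime(day_and_date.split()[1], "%m/%d/%y").date()

def top_song(rows, first_day, last_day):
    """Sum purchases per song over a date range and return (song, total) for the best seller."""
    totals = Counter()
    for day, song, _city, count in rows:
        if first_day <= parse_day(day) <= last_day:
            totals[song] += count
    return totals.most_common(1)[0]

def fewest_purchases_city(rows, day):
    """Sum purchases per city on one day and return (city, total) with the smallest total."""
    totals = Counter()
    for d, _song, city, count in rows:
        if parse_day(d) == day:
            totals[city] += count
    return min(totals.items(), key=lambda item: item[1])

def cities_for_song(rows, song, year, month):
    """Count the distinct cities where a song was purchased in a given month."""
    return len({city for d, s, city, _count in rows
                if s == song and parse_day(d).year == year and parse_day(d).month == month})

print(top_song(rows, date(2017, 7, 10), date(2017, 7, 16)))  # → ('Despacito', 358)
print(fewest_purchases_city(rows, date(2017, 7, 10)))        # → ('Anchorage, AK', 11)
print(cities_for_song(rows, "Despacito", 2017, 7))           # → 2
```

Each function groups by a column the database actually has (song, city, or date). The last answer choice would need a customer column, which the database does not store.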