Self-Check: Big Data

On this page, you will prepare for data questions on the AP exam.

DAT-2.C.6, DAT-2.C.7, DAT-2.C.8

The size of a data set affects the amount of information that can be extracted from it. The data sets you work with in this course are small compared to the "big data" used to look at trends in Internet searches, environmental research, or financial technology. Large data sets (billions or trillions of entries) are difficult to process using a single computer and may require parallel computing across multiple systems. Scalability becomes an issue because the computational capacity of a system affects how data sets can be processed and stored.
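The idea of splitting a large data set into pieces and processing the pieces at the same time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the threshold of 50, and the `readings` list are invented for the example, and threads on one computer stand in for the separate systems a real big-data job would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(entries, chunk_size):
    """Split a list into consecutive pieces of at most chunk_size items."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


def count_matches(chunk):
    """Work done on one piece: count the entries greater than 50 (arbitrary threshold)."""
    return sum(1 for value in chunk if value > 50)


def parallel_count(entries, chunk_size=1000, workers=4):
    """Count matching entries by handing each chunk to a separate worker."""
    chunks = split_into_chunks(entries, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_matches, chunks))
    # Combine the partial answers into one result.
    return sum(partial_counts)


# Hypothetical data, invented for this example: 10,000 small integers.
readings = list(range(100)) * 100
total = parallel_count(readings)
```

Each worker sees only its own chunk, so adding more workers (or more machines) lets the same code handle more data. That is the sense in which a solution "scales."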

    DAT-2.A
  1. These questions are similar to those you will see on the AP CSP exam.
    Scientists studying birds often attach tracking tags to migrating birds. For each bird, the following data is collected regularly at frequent intervals:
    • Date and time
    • Latitude and Longitude
    • Altitude
    • Temperature
    Which of the following questions about a particular bird could not be answered using only the data gathered from the tracking tags?
    Approximately how much time does the bird spend in the air and on the ground?
    Does the bird travel in groups with other tracked birds?
    Is the migration path of the bird affected by temperature patterns?
    What are the effects of industrial pollution on the migration path of the bird?
    Using computers, researchers often search large data sets to find interesting patterns in the data. Which of the following is not an example where searching for patterns is needed to gather desired information?
    An online shopping company analyzing customers' purchase history to recommend new products.
    A high school analyzing student attendance records to determine which students should receive a disciplinary warning.
    A credit scoring company analyzing purchase history of clients to identify cases of identity theft.
    A college analyzing high school students’ GPA and SAT scores to assess their potential college success.
    A ride-hailing company uses an app to track the travel trends of its customers. The data collected can be filtered and sorted by geographic location, time and date, miles traveled, and fare charged for the trip. Which of the following is least likely to be answerable using only the trends feature?
    What time of day is busiest for the company in a given city?
    From which geographic location do the longest rides originate?
    How is competition with local cab companies affecting business in a given district?
    How much money did the company earn in a given month?
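Filtering and sorting by the four fields named in the question (location, time and date, miles traveled, fare) might look like the sketch below. The trip records, city names, and function names are all hypothetical, invented only to show what kind of computation these fields support; the real app's data format is not described on this page.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records, invented for illustration only.
trips = [
    {"city": "Springfield", "start": datetime(2024, 3, 1, 8, 15), "miles": 3.2, "fare": 9.50},
    {"city": "Springfield", "start": datetime(2024, 3, 1, 8, 40), "miles": 12.0, "fare": 27.00},
    {"city": "Springfield", "start": datetime(2024, 3, 2, 17, 5), "miles": 5.5, "fare": 14.25},
    {"city": "Riverton", "start": datetime(2024, 3, 2, 8, 30), "miles": 20.1, "fare": 41.00},
    {"city": "Riverton", "start": datetime(2024, 4, 1, 9, 0), "miles": 2.0, "fare": 7.00},
]


def busiest_hour(trips, city):
    """Filter by location, then count trips per hour of the day."""
    hours = Counter(t["start"].hour for t in trips if t["city"] == city)
    return hours.most_common(1)[0][0]


def longest_ride_origin(trips):
    """Find the location of the trip with the most miles traveled."""
    return max(trips, key=lambda t: t["miles"])["city"]


def monthly_earnings(trips, year, month):
    """Filter by date, then add up the fares."""
    return sum(
        t["fare"] for t in trips
        if t["start"].year == year and t["start"].month == month
    )
```

Every function here reads only fields that the app records. A question that depends on information outside those fields has nothing in the data to filter or sort.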
    An online music download company stores information about song purchases made by its customers. Every day, the following information is made publicly available on a company website database.
    • The day and date of each song purchased.
    • The title of the song.
    • The cities where customers purchased each song.
    • The number of times each song was purchased in a given city.
    An example portion of the database is shown below. The database is sorted by date and song title.
    Day and Date   Song Title   City            Number of Times Purchased
    Mon 07/10/17   Despacito    Boston, MA      117
    Mon 07/10/17   Malibu       Chicago, IL     53
    Mon 07/10/17   Malibu       New York, NY    197
    Mon 07/10/17   Bad Liar     Anchorage, AK   11
    Tue 07/11/17   Despacito    San Diego, CA   241
    Which of the following cannot be determined using only the information in the database?
    The song that is purchased the most in a given week.
    The city with the fewest purchases on a particular day.
    The total number of cities in which a certain song was purchased in a given month.
    The total number of songs purchased by a particular customer during the course of a given year.
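One way to reason about a question like this is to treat each database row as a record with exactly the four published columns and ask what can be computed from them. The sketch below uses the five example rows from the question; the function names are made up for illustration, and a real database would hold many more rows.

```python
from collections import defaultdict

# The example rows from the question.
# Columns: (day and date, song title, city, number of times purchased)
rows = [
    ("Mon 07/10/17", "Despacito", "Boston, MA", 117),
    ("Mon 07/10/17", "Malibu", "Chicago, IL", 53),
    ("Mon 07/10/17", "Malibu", "New York, NY", 197),
    ("Mon 07/10/17", "Bad Liar", "Anchorage, AK", 11),
    ("Tue 07/11/17", "Despacito", "San Diego, CA", 241),
]


def purchases_per_song(rows):
    """Add up the purchase counts for each song title."""
    totals = defaultdict(int)
    for _date, song, _city, count in rows:
        totals[song] += count
    return dict(totals)


def city_with_fewest_purchases(rows, date):
    """Among cities listed on the given date, return the one with the lowest total."""
    totals = defaultdict(int)
    for row_date, _song, city, count in rows:
        if row_date == date:
            totals[city] += count
    return min(totals, key=totals.get)


def cities_where_purchased(rows, song):
    """Collect the distinct cities in which a song was purchased."""
    return {city for _date, title, city, _count in rows if title == song}
```

Each function uses only the date, title, city, and count columns. Before deciding whether a question can be answered, check whether every piece of information it needs appears in one of those four columns.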