Lab 1: Search Algorithms and Efficiency
In a reversal of their Unit 2 number guessing game, students design an algorithm that identifies the player's secret number by guessing, asking the player whether each guess is too high or too low, and guessing again until it guesses correctly. Students use this as the basis for developing and analyzing list searching algorithms for ordered and unordered lists.
An efficient list searching algorithm depends on the fact that numbers have order. Searching an ordered list for the location of a given item is most efficiently done with essentially the same algorithm as the guessing game; searching for an item in an unordered list cannot be done the same way. The difference in running time between these two algorithms hardly matters when the lists are small, but matters a great deal as the lists get large. This is students' first experience analyzing computational complexity.
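The guessing-game strategy is the same halving idea behind binary search. A minimal sketch in Python (the function name and example list are illustrative; the lab itself is taught in blocks):

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or None if absent."""
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2      # "guess" the middle element
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            low = mid + 1            # guess was too low
        else:
            high = mid - 1           # guess was too high
    return None

# Each guess eliminates half of the remaining candidates,
# so a million-item list needs at most about 20 guesses.
print(binary_search([2, 5, 8, 13, 21, 34], 13))  # → 3
```

Note that this only works because the list is sorted; on an unordered list, eliminating "half the candidates" per guess is impossible.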
From Dan:
- Given a ton of card decks, have the class figure out the fastest INDIVIDUAL time to go from a shuffled deck to a sorted deck (students can choose what "in order" means for that, but it has to be a unique order, so "reds in front of blacks" isn't enough — people always choose to sort by either rank or suit). Talk about the different algorithms that evolved, and which were fastest.
- Tell the class they can work as a TEAM to sort a shuffled deck the fastest. The teams can be any size. Let them continue to adjust their algorithms, team sizes, and team membership to achieve the fastest scores. Talk about the different algorithms that evolved, and which were fastest.
This is a great introduction to concurrent algorithms, and allows the kids to come up with algorithms on their own, and builds team spirit. It's really fun and useful to plot the fastest times vs the number of players, and talk about Amdahl's law. (Added in by MF) Brian suggested the placement in the middle of Lab 3.
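When plotting fastest times vs. number of players, Amdahl's law predicts the shape of the curve: speedup is capped by whatever fraction of the work can't be parallelized. A short sketch (the 20% sequential fraction is an assumed example, not classroom data):

```python
def amdahl_speedup(sequential_fraction, workers):
    """Ideal speedup when only the parallel portion is split among workers."""
    return 1 / (sequential_fraction + (1 - sequential_fraction) / workers)

# With 20% inherently sequential work (e.g., dealing out the cards),
# speedup levels off near 1 / 0.2 = 5 no matter how many players join.
for n in (1, 2, 4, 8, 16, 1000):
    print(n, "players:", round(amdahl_speedup(0.2, n), 2), "x speedup")
```

Comparing this idealized curve against the class's measured times makes a nice discussion: real teams also pay coordination costs that the formula ignores.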
From L3:
Students compare and contrast algorithms based on their code and running times. Students write a program that times a computation in milliseconds, then use it to time several processes and compare the results. Students also learn the differences between algorithms that run in reasonable time and algorithms that do not. Algorithms that run in reasonable time are further divided into constant time (computing time not affected by the size of the input), linear time (computing time proportional to the size of the input), and polynomial time (slower than linear; computing time proportional to some polynomial function of the input size).
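A millisecond timer like the one students build can be sketched with Python's standard `time` module (the helper name, list size, and searched value are illustrative choices, not from the lab):

```python
import time

def time_ms(function, *args):
    """Return how many milliseconds one call to function takes."""
    start = time.perf_counter()
    function(*args)
    return (time.perf_counter() - start) * 1000

def linear_search(items, target):
    """Check each element in order until target is found."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None

big_list = list(range(1_000_000))
# Worst case for linear search: the target is last, so every element is checked.
print("linear search:", time_ms(linear_search, big_list, 999_999), "ms")
# Constant time: indexing one element doesn't depend on the list's size.
print("indexing:", time_ms(big_list.__getitem__, 999_999), "ms")
```

Timing the same function on lists of growing size (and plotting the results) is what lets students classify it as constant, linear, or polynomial.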
Pacing:
The 7 required lab pages could be split across 4–7 days (150–300 minutes). Expected times to complete follow:
Lab Pages
Solutions
Correlation with 2020 AP CS Principles Framework
Computational Thinking Practices: Skills
- 1.A: Investigate the situation, context or task.
- 1.D: Evaluate solution options.
Learning Objectives:
- AAP-2.P: For binary search algorithms:
  - Determine the number of iterations required to find a value in a data set. (1.D)
  - Explain the requirements necessary to complete a binary search. (1.A)
- AAP-4.A: For determining the efficiency of an algorithm:
  - Explain the difference between algorithms that run in reasonable time and those that do not. (1.D)
  - Identify situations where a heuristic solution may be more appropriate. (1.D)
- AAP-4.B: Explain the existence of undecidable problems in computer science. (1.A)
- CSN-2.A: For sequential, parallel, and distributed computing:
  - Compare problem solutions. (1.D)
  - Determine the efficiency of solutions. (1.D)
- CSN-2.B: Describe benefits and challenges of parallel and distributed computing. (1.D)
Essential Knowledge:
- AAP-2.O.1: Traversing a list can be a complete traversal, where all elements in the list are accessed, or a partial traversal, where only a portion of elements are accessed.
- AAP-2.O.5: Linear search or sequential search algorithms check each element of a list, in order, until the desired value is found or all elements in the list have been checked.
- AAP-2.P.1: The binary search algorithm starts at the middle of a sorted data set of numbers and eliminates half of the data; this process repeats until the desired value is found or all elements have been eliminated.
- AAP-2.P.2: Data must be in sorted order to use the binary search algorithm.
- AAP-2.P.3: Binary search is often more efficient than sequential/linear search when applied to sorted data.
- AAP-4.A.1: A problem is a general description of a task that can (or cannot) be solved algorithmically. An instance of a problem also includes specific input. For example, sorting is a problem; sorting the list (2,3,1,7) is an instance of the problem.
- AAP-4.A.2: A decision problem is a problem with a yes/no answer (e.g., is there a path from A to B?). An optimization problem is a problem with the goal of finding the "best" solution among many (e.g., what is the shortest path from A to B?).
- AAP-4.A.3: Efficiency is an estimation of the amount of computational resources used by an algorithm. Efficiency is typically expressed as a function of the size of the input.
- AAP-4.A.4: An algorithm's efficiency is determined through formal or mathematical reasoning.
- AAP-4.A.5: An algorithm's efficiency can be informally measured by determining the number of times a statement or group of statements executes.
- AAP-4.A.6: Different correct algorithms for the same problem can have different efficiencies.
- AAP-4.A.7: Algorithms with a polynomial efficiency or lower (constant, linear, square, cube, etc.) are said to run in a reasonable amount of time. Algorithms with exponential or factorial efficiencies are examples of algorithms that run in an unreasonable amount of time.
- AAP-4.A.8: Some problems cannot be solved in a reasonable amount of time because there is no efficient algorithm for solving them. In these cases, approximate solutions are sought.
- AAP-4.A.9: A heuristic is an approach to a problem that produces a solution that is not guaranteed to be optimal but may be used when techniques that are guaranteed to always find an optimal solution are impractical.
- AAP-4.B.1: A decidable problem is a decision problem for which an algorithm can be written to produce a correct output for all inputs (e.g., Is the number even?).
- AAP-4.B.2: An undecidable problem is one for which no algorithm can be constructed that is always capable of providing a correct yes-or-no answer.
- AAP-4.B.3: An undecidable problem may have some instances that have an algorithmic solution, but there is no algorithmic solution that could solve all instances of the problem.
- CSN-2.A.1: Sequential computing is a computational model in which operations are performed in order one at a time.
- CSN-2.A.2: Parallel computing is a computational model where the program is broken into multiple smaller sequential computing operations, some of which are performed simultaneously.
- CSN-2.A.3: Distributed computing is a computational model in which multiple devices are used to run a program.
- CSN-2.A.4: Comparing efficiency of solutions can be done by comparing the time it takes them to perform the same task.
- CSN-2.A.5: A sequential solution takes as long as the sum of all of its steps.
- CSN-2.A.6: A parallel computing solution takes as long as its sequential tasks plus the longest of its parallel tasks.
- CSN-2.A.7: The "speedup" of a parallel solution is measured in the time it took to complete the task sequentially divided by the time it took to complete the task when done in parallel.
- CSN-2.B.1: Parallel computing consists of a parallel portion and a sequential portion.
- CSN-2.B.2: Solutions that use parallel computing can scale more effectively than solutions that use sequential computing.
- CSN-2.B.3: Distributed computing allows problems to be solved that could not be solved on a single computer because of either the processing time or storage needs involved.
- CSN-2.B.4: Distributed computing allows much larger problems to be solved quicker than they could be solved using a single computer.
- CSN-2.B.5: When increasing the use of parallel computing in a solution, the efficiency of the solution is still limited by the sequential portion. This means that at some point, adding parallel portions will no longer meaningfully increase efficiency.
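The timing rules in CSN-2.A.5 through CSN-2.A.7 can be checked with a small worked example (the task times below are made up for illustration):

```python
# Hypothetical task times in seconds: two steps that must run in order,
# and three independent tasks that could run at the same time.
sequential_steps = [3, 2]
parallel_tasks = [4, 6, 5]

# CSN-2.A.5: a fully sequential solution takes the sum of ALL its steps.
sequential_time = sum(sequential_steps) + sum(parallel_tasks)

# CSN-2.A.6: a parallel solution takes its sequential steps
# plus the LONGEST of its parallel tasks.
parallel_time = sum(sequential_steps) + max(parallel_tasks)

# CSN-2.A.7: speedup = sequential time / parallel time.
speedup = sequential_time / parallel_time
print(sequential_time, parallel_time, round(speedup, 2))  # → 20 11 1.82
```

This also illustrates CSN-2.B.5: no matter how many of the parallel tasks run at once, the 5 seconds of sequential steps always remain, bounding the speedup.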