Removing Duplicates

On this page, you will learn a recursive technique to remove duplicates from a list.
distinct items from (list (apples) (carrots) (bananas) (oranges) (bananas) (bread) (carrots)) reporting { apples, oranges, bananas, bread, carrots}

Suppose you have a list of items and want to know if the elements of the list are distinct. For example, you might want to make sure that you don't have anything on your shopping list twice.

As a first step, we'll just answer the yes/no question: are there any duplicates on your list?
are the items of (list (apples) (carrots) (bananas) (oranges) (bananas) (bread) (carrots)) distinct? reporting false

  1. Experiment with the all but first of 'list input slot' block using a few different input lists. Talk with Your Partner What does this block report?
  2. Finish building the are the items distinct? predicate, which is started below. Since you're writing a predicate, your procedure should always report true or false.
    Try doing it without this hint.
    If the program gets to the third report block at the bottom, what does that tell you about the items in the list?
    are the items of (data) distinct?
{
	if (is (data) empty?)
	{
		report ()
	}
	if ((all but first of (data)) contains ())
	{
		report ()
	}
	report (are the items of () distinct?)
}
    Notice that this procedure calls itself at the end. (It is recursive.) This won't work if the input to that call is the same as the original input. So you can't say report (are the items of (data) distinct?). Instead, you have to use a smaller input value.
  3. Talk with Your Partner If you doubled the length of the list, would this algorithm take the same amount of time? Twice as long? More than twice as long?

For your grocery list, you wouldn't just want to know whether or not there are duplicates. You'd want a new list with the duplicates removed.

  1. Experiment with the () in front of 'list input slot' block using a few different inputs. Talk with Your Partner What does this block report?
  2. Build a distinct items from reporter using a structure similar to that of the are the items distinct? predicate.
    distinct items form (data)
{
	if (is (data) empty?)
	{
		report ()
	}
	if ((all but first of (data)) contains ())
	{
		report ()
	}
	report ()
}

    The algorithm for this block will make the same decisions as in are the items distinct?. But that was a predicate. This one has to report a list. So look at the three report blocks in your code for the are the items distinct? predicate, and decide what they should report for the distinct items from reporter.

    Need another hint?

    • If the list is empty, what should distinct items from report?
    • If the first item in the list appears in the rest of the list, it doesn't matter which copy you leave out. Is there an easy way to get a version of the list without one of those copies?
      • What if there are other duplicates in that list?
    • If computer makes it to the third report block, what does that tell you about the first item in the list? Do you want the first item as part of the list you report?

  3. Test it. Be sure to pick good test cases: More than one pair of duplicates, more than two items of the same value, duplicates right next to each other in the list, etc.
  4. "U5L1-Removing-Duplicates"Save your work as U5L1-Removing-Duplicates