Counting Words

One approach to classifying messages is to separate them into individual words. In this activity, you'll do that and then compare the average number of words in ham messages with the average number in spam messages. In the next activity, you'll write a classifier based on what you've learned.

The following script might help you get started: it puts all the spam messages into one variable and all the ham messages into another.

Making separate lists of spam and ham messages
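
For readers who find text code easier to follow, the same split can be sketched in Python. This is only a rough illustration, not the activity's Snap! script: the `messages` list of (label, text) pairs, its sample contents, and the function name `split_by_label` are all assumptions made up for this sketch.

```python
# Hypothetical sample data: (label, text) pairs.
# The real data set in the Snap! project may be stored differently.
messages = [
    ("ham", "Are we still on for lunch today"),
    ("spam", "WINNER claim your free prize now"),
    ("ham", "Call me when you get home"),
    ("spam", "Free entry reply WIN to enter"),
]


def split_by_label(labeled_messages):
    """Return (spam_texts, ham_texts) from a list of (label, text) pairs."""
    spam = [text for label, text in labeled_messages if label == "spam"]
    ham = [text for label, text in labeled_messages if label == "ham"]
    return spam, ham


# One variable holds the spam messages, the other holds the ham messages.
spam_messages, ham_messages = split_by_label(messages)
```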

With those separate variables, you should be able to calculate the average number of words in each kind of message more easily.

  1. What's the average length, in words, of a spam message?
  2. Of a ham message?
  3. Because the math isn't the point of the exercise, here is a Snap! block that calculates the average of a list of numbers:
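
As a text-based point of comparison, the averaging step might look something like the Python sketch below. It is not the Snap! block itself. The function names `average` and `average_word_count` and the sample messages are hypothetical, and the sketch assumes that a "word" is any run of characters separated by whitespace and that the list is not empty.

```python
def average(numbers):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)


def average_word_count(texts):
    """Average number of whitespace-separated words per message."""
    word_counts = [len(text.split()) for text in texts]
    return average(word_counts)


# Hypothetical sample: one 3-word message and one 6-word message.
sample = ["free prize now", "call me when you get home"]
print(average_word_count(sample))  # prints 4.5
```

Calling `average_word_count` once on the spam list and once on the ham list gives the two numbers the questions above ask for.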