One approach to classifying messages is to split them into individual words. In this activity, you'll do that and then compare the average number of words in ham messages with the average in spam messages. In the next activity, you'll write a classifier based on what you've learned.
The following script might help you get started: it puts all the spam messages into one variable and all the ham messages into another.
With the messages separated this way, you should be able to calculate the average number of words in each kind of message more easily.
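
If you get stuck on the averaging step, here is one rough sketch of how it could look in Python. It assumes the two variables are lists of strings called `ham_messages` and `spam_messages`; those names, the helper `average_word_count`, and the sample messages are made up for illustration, so adapt them to whatever the starter script actually provides. It also assumes a "word" is anything separated by whitespace, which is only one of several reasonable definitions.

```python
def average_word_count(messages):
    """Return the mean number of whitespace-separated words per message."""
    if not messages:
        # Avoid dividing by zero when there are no messages.
        return 0.0
    total_words = sum(len(message.split()) for message in messages)
    return total_words / len(messages)


# Hypothetical sample data standing in for the real message lists.
ham_messages = ["see you at lunch", "ok"]
spam_messages = ["WIN a FREE prize now", "call now to claim your cash reward"]

ham_average = average_word_count(ham_messages)
spam_average = average_word_count(spam_messages)

print("Average words per ham message:", ham_average)
print("Average words per spam message:", spam_average)
```

Calling `split()` with no arguments splits on any run of whitespace, so punctuation stays attached to its word. That is usually fine for counting words, but you may want something stricter once you start building the classifier.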