## Thursday, October 13, 2011

### Let's Sort This Out, Part IV

I like to watch detective shows when I am not creating elementary math curriculum. I prefer Columbo, Rockford, Hawaii 5-O, Sherlock Holmes, Lovejoy, Murder She Wrote, and similar adventure / mystery stories.

Why do I bring up my detective interest in a series about sorting data?

Because in any detective show there are potential suspects that the detective must discover, question and then eliminate or pursue. Some people answer his questions truthfully. Others avoid, evade, mislead or obstruct his investigation. What must the detective do to solve the case?

A detective must sort out the liars. Find the deceptive, law-breaking people. Then Book 'Em.

In math, we have a similar task. We must sort the outliers.

"What do you mean by that?" you ask.

An outlier is a data point that is markedly different from the majority of our samples. We don't have a precise mathematical definition of an outlier; we have to make that decision ourselves. But you'll know one when you see one, as I outline the process of outing the outliers.

Let's look at the data from our 25 countries.

First I sorted our countries by population. The average population is shown by a line in RED. It's in BLUE when we exclude China, and in GREEN when we exclude the top two and bottom two samples. The average population is probably about 7.5 million.

Now I have re-sorted the data by area in square kilometers. The average area is shown in RED. It's in BLUE when we exclude China, and in GREEN when we exclude the top two and bottom two samples. The average area is probably about 215,000 sq km.
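If you'd like to try the three averages yourself, here is a small Python sketch of the idea. The numbers in `sample` are made up for illustration and are not the data from our 25 countries, and the function name `three_averages` is my own. The "blue" average is modeled by dropping the single largest value, which assumes the excluded country is the largest one in the list.

```python
def mean(values):
    """Ordinary average: add the values up and divide by how many there are."""
    return sum(values) / len(values)


def three_averages(values):
    """Return (red, blue, green) averages for a list of numbers.

    red   = average of every value
    blue  = average after dropping the single largest value
    green = average after dropping the top two and bottom two values
    """
    ordered = sorted(values)          # smallest to largest
    red = mean(ordered)
    blue = mean(ordered[:-1])         # everything except the largest
    green = mean(ordered[2:-2])       # trim two from each end
    return red, blue, green


# Hypothetical populations in millions (NOT the real chart data).
sample = [1, 2, 4, 5, 6, 8, 9, 10, 15, 1340]

red, blue, green = three_averages(sample)
print("All values:           ", round(red, 1))
print("Without the largest:  ", round(blue, 1))
print("Trimmed two each end: ", round(green, 1))
```

With one huge value in the list, the plain average lands far above almost every sample, while the other two averages sit close to where most of the values are. The trimming needs at least five values to leave anything to average.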

Outliers show up most clearly when you plot data points on a graph. Here's the same data plotted on charts. See the obvious outlier(s), outlined in red?

If people living in China hadn't visited this site recently, our outliers would be defined differently. Let's take a look. Here are the new outliers, in green.

This looks dramatically different. The outliers are Burma (population) and Libya (area), and the scales on our graph cover a much smaller range of numbers.

We'll finish our sorting tomorrow.