Why do I bring up my detective interest in a series about **sorting** data?

Because in any detective show there are potential suspects that the detective must discover, question and then eliminate or pursue. Some people answer his questions truthfully. Others avoid, evade, mislead or obstruct his investigation. What must the detective do to solve the case?

A detective must **sort out the liars**. Find the deceptive, law-breaking people. Then Book 'Em.

In math, we have a similar task. We must **sort the outliers**.

*What do you mean by that?* you ask.

An **outlier** is a data point that is markedly different from the majority of our samples. We don't have a precise mathematical definition of an **outlier**; we have to make those decisions ourselves. But you'll know one when you see one, as I outline the process of outing the outliers.

Let's look at the data from our 25 countries.

First I sorted our countries by population. The average population is shown by a line in **RED**. It's in **BLUE** when we exclude China, and in **GREEN** when we exclude the top two and bottom two samples. The average population is probably about 7.5 million.

Now I have re-sorted the data by area in square kilometers. The average area is shown in **RED**. It's in **BLUE** when we exclude China, and in **GREEN** when we exclude the top two and bottom two samples. The average area is probably about 215,000 sq km.
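If you'd like to try the three averages yourself, here is a minimal Python sketch. The function name `three_averages` and the sample numbers are my own inventions for illustration; they are not the data from our 25 countries. I also assume the one excluded entry is simply the largest value, which is the role China plays in the population chart.

```python
from statistics import mean


def three_averages(values, trim=2):
    """Return (plain mean, mean without the largest value, trimmed mean).

    The trimmed mean drops the `trim` smallest and `trim` largest values
    before averaging, like the GREEN line described above.
    """
    ordered = sorted(values)
    if trim < 0 or len(ordered) <= 2 * trim:
        raise ValueError("need more than 2 * trim values")
    plain = mean(ordered)                                # RED: everything
    without_largest = mean(ordered[:-1])                 # BLUE: drop the biggest
    trimmed = mean(ordered[trim:len(ordered) - trim])    # GREEN: drop both ends
    return plain, without_largest, trimmed


# Made-up numbers for illustration only, with one obvious outlier (1000).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
plain, without_largest, trimmed = three_averages(sample)
print(plain, without_largest, trimmed)
```

Notice how a single extreme value drags the plain average far away from where most of the samples sit, while the other two averages stay close to the pack.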

Outliers show up most clearly when you plot data points on a graph. Here's the same data plotted on charts. See the obvious outlier(s), outlined in **red**?

If people living in China hadn't visited this site recently, our outliers would be defined differently. Let's take a look. Here are the new outliers, in **green**.

This looks dramatically different. The outliers are Burma (population) and Libya (area), and the scales on our graph cover a much smaller range of numbers.

We'll finish our **sorting** tomorrow.