Sunday, 21 December 2014

Finding Outliers - Simple Statistical Way

Outliers? Why do we need them?
In statistics, a value that is unusually large or unusually small compared with the rest of the data is called an outlier. We need to consider outliers separately to make good business decisions and to draw beautiful graphs. Business decisions are mostly made around (a) the most frequent case, or the center of the population, and (b) the exceptions, i.e. outliers, which are points far away from the center. My take is that outliers should never be rejected from the data; they should be treated as special cases for further analysis. Outliers may open new doors of possibilities.
The most common way to find an outlier is by using quartiles or percentiles: Quartile 1 (Q1) = 25th percentile, Quartile 2 (Q2) = median = 50th percentile, and Quartile 3 (Q3) = 75th percentile. To find the accepted data range we need to measure the spread of the data. Most of the time this is done either with the standard deviation or, more simply, with the IQR (interquartile range), i.e. Q3 - Q1. The figure below gives the simple formula for finding outliers.
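Here is a minimal Python (3.8+) sketch of the quantities just defined, using only the standard library's `statistics` module. The function names `quartiles` and `spread` and the sample list are hypothetical, chosen for illustration. There are several conventions for interpolating quartiles; this sketch assumes the `"inclusive"` method, so other tools may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3), i.e. the 25th, 50th and 75th percentiles.

    Assumes the 'inclusive' interpolation method of statistics.quantiles.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3

def spread(data):
    """Return (IQR, sample standard deviation), two common measures of spread."""
    q1, _, q3 = quartiles(data)
    return q3 - q1, statistics.stdev(data)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data
print(quartiles(sample))               # (3.0, 5.0, 7.0)
iqr, sd = spread(sample)
print(iqr)                             # 4.0
```

Here Q2 (5.0) is also the median, and the IQR of 4.0 covers the middle half of the sample.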
In the simplest approach, any data point that falls outside the range from (Q1 - 1.5 * IQR) to (Q3 + 1.5 * IQR) can be treated as a candidate for outlier analysis. There are many other methods, but this one makes the most sense for common business analysis. The factor 1.5 is a conventional, agreed-upon constant (the one used for the whiskers of Tukey's box plot) rather than a value computed from the data.
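The rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the author's own implementation: the function name `iqr_outliers`, the parameter `k` (the 1.5 constant) and the order-count data are hypothetical, and quartiles are again assumed to use the `"inclusive"` method of `statistics.quantiles`.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Split data into (inliers, outliers, (lower, upper)).

    A point is flagged when it lies below Q1 - k*IQR or above Q3 + k*IQR.
    k=1.5 is the conventional constant; larger k flags fewer points.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return inliers, outliers, (lower, upper)

# Hypothetical daily order counts with one unusual day.
orders = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
inliers, outliers, fences = iqr_outliers(orders)
print(fences)    # (9.375, 24.375)
print(outliers)  # [95]
```

In keeping with the advice above, the function returns the outliers as a separate list instead of discarding them, so they can be studied as special cases.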
Here are a couple of examples from a web search :)
