02.2 - Missing Data Treatment - DATS

Recall what an outlier is and how to find outliers in our data

Outlier

According to [[Douglas Hawkins]]:

An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.

Types of outliers

Mild outlier

$x_0$ is a mild [[#Outlier]] if:
$$x_0 \notin [Q_1 - 1.5\,IQR;\; Q_3 + 1.5\,IQR]$$
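
As a minimal sketch of the 1.5·IQR fence rule, the check can be written in Python with the standard library only. The function names `iqr_fences` and `outside_fences`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for illustration. Other quartile conventions give slightly different fences.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile convention is an assumption: "inclusive" interpolation.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outside_fences(values, k=1.5):
    """Return the observations lying outside the fences."""
    low, high = iqr_fences(values, k)
    return [x for x in values if x < low or x > high]


# Made-up sample: nine regular values and one suspicious observation.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(iqr_fences(sample))
print(outside_fences(sample))
```

The factor `k` is left as a parameter so that the same helper can be reused with a wider fence.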

Severe outlier

Outlier detection

Univariate and multivariate outlier detection

When a data sample is given, we can often assume that it was generated by a single basic mechanism (e.g. a probability distribution such as the logistic one). Often, however, the same data are generated by two or more mechanisms, so that one data set contains observations from two completely different classes of similar objects. When the data are viewed through only one generating mechanism, the other class appears as a large group of outliers.

Univariate outlier detection

Outliers in boxplots

The boxplot is a graphical display for Exploratory Data Analysis in which outliers appear tagged. A boxplot can show both [[#Types of outliers]].

Outlier 2024-11-14 10.37.32.excalidraw.png

Multivariate outlier detection

Outliers can be detected by computing the Mahalanobis distance between each observation $x_i$ and the central point $G$ of the data, by means of an iterative algorithm:
$$D_M^2(i, G) = (x_i - G)^\top V^{-1} (x_i - G)$$
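
One possible reading of the iterative algorithm is sketched below in Python for the two-variable case. The note does not fix the details, so the following are assumptions: $G$ is taken as the mean and $V$ as the sample covariance of the currently "clean" observations, an observation is flagged when its squared distance exceeds the 0.975 quantile of a chi-square distribution with 2 degrees of freedom, and the loop stops when the clean set no longer changes. All function names and the data are hypothetical.

```python
import math


def mean_and_cov(points):
    """Centroid G and sample covariance V (as sxx, sxy, syy) of 2-D points."""
    n = len(points)
    gx = sum(p[0] for p in points) / n
    gy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - gx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - gy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - gx) * (p[1] - gy) for p in points) / (n - 1)
    return (gx, gy), (sxx, sxy, syy)


def mahalanobis_sq(p, g, cov):
    """Squared distance (p - G)' V^{-1} (p - G), closed form for a 2x2 V."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # assumed non-zero (V invertible)
    dx, dy = p[0] - g[0], p[1] - g[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def iterative_outliers(points, quantile=0.975, max_iter=20):
    """Return indices of points flagged as outliers (assumed cutoff rule)."""
    # Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - q).
    threshold = -2.0 * math.log(1.0 - quantile)
    clean = list(range(len(points)))
    for _ in range(max_iter):
        # Re-estimate G and V from the observations currently kept.
        g, cov = mean_and_cov([points[i] for i in clean])
        new_clean = [i for i, p in enumerate(points)
                     if mahalanobis_sq(p, g, cov) <= threshold]
        if new_clean == clean:
            break
        clean = new_clean
    kept = set(clean)
    return [i for i in range(len(points)) if i not in kept]


# Made-up data: a 5x5 grid of regular points plus one distant observation.
data = [(float(i), float(j)) for i in range(5) for j in range(5)]
data.append((20.0, 20.0))
print(iterative_outliers(data))
```

Re-estimating $G$ and $V$ without the flagged observations keeps an extreme point from inflating the covariance and masking itself, which is the usual motivation for iterating.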

Once we find outliers, there are several options we might consider: