# What to do with not normally distributed data

Many statistical tools assume that the data is normally distributed. This page gives some guidance on how to deal with data that is not normally distributed.

## Step 1

Check normality with the Anderson-Darling normality test. With a high p-value you can assume normality of the data. Develve treats data with a p-value above 0.10 as normally distributed. This is on the safe side; some people say that a p-value above 0.05 is enough to assume normality.
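The check can be sketched outside Develve as follows. This is a minimal illustration using only the Python standard library, not Develve's own implementation: the function name `anderson_darling_normal` is hypothetical, the p-value uses the commonly cited D'Agostino and Stephens approximation (an assumption, Develve may compute it differently), and the sample values are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A2, p_value) for an Anderson-Darling test of normality.

    Mean and standard deviation are estimated from the sample. The p-value
    uses the D'Agostino and Stephens approximation (an assumption here).
    """
    x = sorted(data)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    cdf = [min(max(dist.cdf(v), eps), 1.0 - eps) for v in x]

    total = 0.0
    for i in range(n):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    a2 = -n - total / n

    # Small-sample correction for estimated mean and standard deviation.
    a = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    if a < 0.2:
        p = 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    elif a < 0.34:
        p = 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    elif a < 0.6:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    else:
        # The approximation is only meant for moderate values; cap the input.
        a = min(a, 10.0)
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    return a2, p


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0, 10.1, 9.9]
a2, p = anderson_darling_normal(sample)
print(f"A2 = {a2:.3f}, p = {p:.3f}")
print("assume normality" if p > 0.10 else "normality doubtful")
```

The 0.10 threshold in the last line follows the limit mentioned above; replace it with 0.05 if that is the convention you use.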

## Step 2

Find out why the data might not be normally distributed.

### Mixture of various distributions

• Samples from different batches
• Samples from different dates
• Samples from different mold cavities

Try to sort the data into subgroups. This is possible in the DOE mode in Develve.
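Sorting into subgroups can also be sketched in a few lines of Python. The helper `split_by_subgroup` and the production line readings are hypothetical and only illustrate the idea; each subgroup can then be checked for normality on its own.

```python
from collections import defaultdict
from statistics import mean, stdev


def split_by_subgroup(records):
    """Group (label, value) pairs into a dict of label -> list of values."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return dict(groups)


# Hypothetical measurements from two production lines, for illustration only.
records = [
    ("line 1", 10.1), ("line 2", 12.0), ("line 1", 9.9), ("line 2", 12.3),
    ("line 1", 10.0), ("line 2", 11.8), ("line 1", 10.2), ("line 2", 12.1),
    ("line 1", 9.8), ("line 2", 11.9),
]

groups = split_by_subgroup(records)
for label, values in sorted(groups.items()):
    print(f"{label}: n={len(values)}, mean={mean(values):.2f}, sd={stdev(values):.2f}")
```

Clearly different subgroup means, as in this made-up data, are a hint that the combined data is a mixture of distributions.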

#### Example

In this example the original data is in column A. The data is sorted by production line (1 and 2); after sorting, the data of each production line (columns B and C) is normally distributed. Data file

#### Example 2

Sometimes a mixture of two different distributions is not clearly visible in the histogram, but the normality plot shows a bend in the line (see graph below). Data file

### Extreme values (outliers)

Too many outliers will result in non-normality. If the outliers have special causes, it is wise to filter these data points. But be aware that even in a normally distributed data set you can expect some outliers: an outlier is not always the result of a special cause.

When filtering the data, you should analyze and explain why these outliers can be removed.
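To get a feeling for how many outliers to expect in normally distributed data, the sketch below computes the chance that a single value is flagged. It assumes the common boxplot rule (more than 1.5 times the interquartile range beyond the quartiles); this rule is an assumption for illustration and not necessarily the outlier definition Develve uses.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Boxplot rule (an assumption here): a point is an outlier when it lies more
# than 1.5 * IQR below the first quartile or above the third quartile.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Probability that a single normally distributed value falls outside the fences.
p_outlier = 2 * (1 - std_normal.cdf(upper_fence))

for n in (30, 100, 1000):
    print(f"n={n}: expected outliers = {n * p_outlier:.2f}")
```

Under this rule roughly 0.7 % of perfectly normal values are flagged, so a few outliers in a large data set do not by themselves point to a special cause.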

#### Example

In this example column A contains the original data, column B the filtered data, and column C the outliers. After filtering, the data is normally distributed. Data file

### Drift in measurement system

Look at the Time graph for a trend or shift over time. Data file
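A numeric companion to the time graph is the slope of the values against their measurement order. The sketch below is a plain least-squares slope; the function name `trend_slope` and the readings are hypothetical, and the slope is only an indication, not a significance test.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of the values against their measurement order."""
    order = range(len(values))
    mean_t = mean(order)
    mean_y = mean(values)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(order, values))
    sxx = sum((t - mean_t) ** 2 for t in order)
    return sxy / sxx


# Hypothetical readings in time order, for illustration only.
readings = [10.0, 10.1, 10.1, 10.3, 10.2, 10.4, 10.5, 10.5, 10.7, 10.6]
slope = trend_slope(readings)
print(f"change per measurement: {slope:.3f}")
```

A slope that is large compared with the spread of the data suggests drift; the data must be in the order in which it was measured for this to be meaningful.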

### Cases that are not solvable by rearranging the data

#### Sorted data

The data set is only a part of all the data, and all the data outside the tolerance limits has been filtered out. Data file

From left to right: the original data, the data without the values above the upper tolerance limit, the data without the values outside the minimum and maximum tolerance limits, and only the data above the upper tolerance limit.

##### This can happen when analyzing
• Field returns
• Line rejects
• Data without the rejects

#### Data is close to zero or another limit

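Data close to zero or another natural limit cannot extend past that limit, so the distribution becomes skewed, with its tail pointing away from the limit (for a lower limit such as zero, the tail is on the right).

#### Low resolution of the measurement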

Due to the low resolution of the measurement, the data is rounded to the nearest digit. As a result the data is grouped into a small number of discrete values (see graph). To solve this, try to increase the measurement resolution. Use the histogram or the individual dot plot to see whether there is a rounding effect in the data. Data file
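A rounding effect can also be spotted by counting the distinct values. The sketch below is a simple illustration with made-up gauge readings; the helper `rounding_summary` is hypothetical.

```python
from collections import Counter


def rounding_summary(values):
    """Return (number of distinct values, smallest gap between distinct values)."""
    distinct = sorted(set(values))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    smallest_gap = min(gaps) if gaps else None
    return len(distinct), smallest_gap


# Hypothetical readings from a gauge that rounds to 0.1, for illustration only.
readings = [2.1, 2.2, 2.2, 2.3, 2.2, 2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.1]
n_distinct, step = rounding_summary(readings)
print(f"{len(readings)} readings, {n_distinct} distinct values")
for value, count in sorted(Counter(readings).items()):
    print(f"{value:4.1f} {'*' * count}")  # crude text dot plot
```

When there are only a handful of distinct values and the smallest gap equals the gauge resolution, the measurement resolution is probably too coarse for the spread of the data.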

#### Data follows another distribution

• Lifetime data (wear-out) is often not normally distributed. It often follows the Weibull or lognormal distribution. For this data use Weibull analysis.
• Data is close to zero or another limit
• Proportional data

#### Example

Use the distribution fitting function (Tools => Distribution fitting). The graph with the highest correlation coefficient (r²) shows the best-fitting distribution. Data file
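The idea of comparing distributions by r² can be sketched with probability plots: the sorted data (or its logarithm) is plotted against theoretical quantiles, and the straighter the line, the higher r². The sketch below is an illustration only and not Develve's implementation; the function names, the choice of median-rank plotting positions and the lifetime values are assumptions.

```python
import math
from statistics import NormalDist, mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def probability_plot_fit(data):
    """Return {distribution name: r squared} for strictly positive data."""
    x = sorted(data)
    n = len(x)
    # Median-rank plotting positions (Benard's approximation), an assumption.
    p = [(i + 1 - 0.3) / (n + 0.4) for i in range(n)]
    z = [NormalDist().inv_cdf(pi) for pi in p]        # normal quantiles
    w = [math.log(-math.log(1.0 - pi)) for pi in p]   # Weibull plot scale
    log_x = [math.log(v) for v in x]
    return {
        "normal": r_squared(z, x),
        "lognormal": r_squared(z, log_x),
        "weibull": r_squared(w, log_x),
    }


# Hypothetical lifetimes in hours, for illustration only.
lifetimes = [105, 180, 240, 310, 390, 470, 560, 680, 820, 1100]
fits = probability_plot_fit(lifetimes)
for name, r2 in sorted(fits.items(), key=lambda item: -item[1]):
    print(f"{name:10s} r2 = {r2:.4f}")
```

With small samples the r² values of different distributions are often close together, so treat the ranking as a hint rather than proof.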

## Step 3

If the case is not solvable by rearranging the data, there are two options: transform the data, or use a test that is not based on a normality assumption.

### Transform

With the Box-Cox transformation it is possible to transform non-normally distributed data into a more normally distributed data set (see Box-Cox transformation).

Graphs: before transformation, after transformation.
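For reference, the Box-Cox transformation of a positive value x is (x^λ - 1) / λ for λ ≠ 0 and ln(x) for λ = 0. The sketch below applies it and picks λ from a grid by maximizing the usual profile log-likelihood. The function names and the grid from -2 to 2 are assumptions for illustration; Develve may select λ differently.

```python
import math
from statistics import mean


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    y = box_cox(values, lam)
    n = len(y)
    my = mean(y)
    variance = sum((v - my) ** 2 for v in y) / n
    log_sum = sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + (lam - 1.0) * log_sum


def best_lambda(values, candidates=None):
    """Pick the candidate lambda with the highest log-likelihood."""
    if candidates is None:
        candidates = [k / 10 for k in range(-20, 21)]  # -2.0 ... 2.0
    return max(candidates, key=lambda lam: box_cox_log_likelihood(values, lam))


# Hypothetical right-skewed data, for illustration only.
skewed = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.2, 6.0, 9.5]
lam = best_lambda(skewed)
print(f"chosen lambda: {lam}")
print([round(v, 3) for v in box_cox(skewed, lam)])
```

The transformation only works on strictly positive data, and any specification limits must be transformed with the same λ before they are compared with the transformed data.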

### Test not based on normal assumption

| No normal assumption | Based on normal assumption |
|---|---|
| 1 sample Wilcoxon median test | One sample t-test |
| 2 sample Mann-Whitney median test | 2 sample t-test |
| Variation Levene test | Variation F-test |
| Kruskal-Wallis test | One way Anova |
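As an illustration of a test without a normality assumption, the sketch below computes the Mann-Whitney U statistic for two samples with a p-value from the normal approximation. It is a simplified version: the function name `mann_whitney_u` is hypothetical, no tie or continuity correction is applied, and the result may differ slightly from what Develve or another package reports.

```python
import math
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from the normal approximation).

    No tie or continuity correction is applied, so treat the p-value as rough
    for small samples or heavily tied data.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    combined = sorted(
        [(v, "a") for v in sample_a] + [(v, "b") for v in sample_b],
        key=lambda pair: pair[0],
    )
    # Assign ranks 1..N, giving tied values the average of their ranks.
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0
        count_a = sum(1 for k in range(i, j) if combined[k][1] == "a")
        rank_sum_a += average_rank * count_a
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical samples, for illustration only.
before = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
after = [12.6, 12.9, 12.5, 13.1, 12.7, 12.8]
u, p = mann_whitney_u(before, after)
print(f"U = {u}, p = {p:.4f}")
```

A small p-value indicates that one sample tends to have larger values than the other, without assuming that either sample is normally distributed.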