The interquartile range (IQR) is the difference between the 3rd and the 1st quartile. This interval contains the middle 50% of all values of a quantity. At least 12 measured values are required to compute the interquartile range.
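A minimal sketch of this computation with Python's standard library follows. The quartile convention (`method="inclusive"`), the function name and the sample are assumptions for illustration; the text does not state which interpolation rule is used, and different rules give slightly different quartiles for small samples.

```python
import statistics

MIN_VALUES = 12  # minimum number of measured values stated in the text

def interquartile_range(values):
    """Return Q3 - Q1, interpolating linearly between order statistics."""
    if len(values) < MIN_VALUES:
        raise ValueError("at least 12 measured values are required")
    # Assumption: "inclusive" quartile convention; the source does not name one.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical sample of 13 values (not taken from the text).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(interquartile_range(sample))
```

With this convention the quartiles of the sample are 4 and 10, so the middle half of the values spans a range of 6.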
The skewness
indicates how symmetrical or asymmetrical the measurement distribution is.
The excess kurtosis
indicates how thin or thick the tails of the measurement distribution are in comparison to the Gaussian normal distribution.
Skewness | distribution | excess kurtosis | distribution |
---|---|---|---|
< 0 | left-skewed | < 0 | thin-tailed (blunt peaked) |
= 0 | symmetrical | = 0 | normal shape (Gaussian bell) |
> 0 | right-skewed | > 0 | thick-tailed (sharp peaked) |
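The two shape measures can be sketched as follows. The estimators used here are the plain moment ratios without bias correction; this is an assumption, since the exact formulas are not reproduced in the text. Both samples are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4, each divided by n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian normal distribution
    return skewness, excess_kurtosis

# Hypothetical samples (not taken from the text).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

The symmetric sample has skewness 0 and a negative excess kurtosis (thin-tailed), while the single large value in the second sample produces a positive skewness (right-skewed).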
This plot provides a fast graphical method to assess the deviation of a sample distribution from normality. The measured values are plotted on the horizontal axis against the normal order statistic medians on the vertical axis. For a Gaussian normal distribution all points in the plot should be close to collinear. Deviations from collinearity may indicate non-symmetric (skewed), thick-tailed (i.e. peak-shaped) or thin-tailed distributions. Isolated points at the bottom left or top right of the plot are suspected outliers.
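The plot coordinates can be sketched as below. The normal order statistic medians are approximated with Filliben's formula, which is an assumption; the text does not say how the medians are obtained. The function names and the sample are hypothetical.

```python
from statistics import NormalDist

def normal_order_statistic_medians(n):
    """Approximate medians of the order statistics of n standard normal variates."""
    # Assumption: Filliben's approximation of the uniform order statistic medians.
    probs = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    probs[-1] = 0.5 ** (1.0 / n)
    probs[0] = 1.0 - probs[-1]
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in probs]

def probability_plot_points(values):
    """Pair sorted values (horizontal axis) with the medians (vertical axis)."""
    medians = normal_order_statistic_medians(len(values))
    return list(zip(sorted(values), medians))

# Hypothetical sample (not taken from the text).
for x, m in probability_plot_points([10.2, 9.8, 10.0, 10.5, 9.9]):
    print(f"{x:6.2f}  {m:+.3f}")
```

If the sample is close to normally distributed, these points fall near a straight line; a point far off the line at either end is a candidate outlier.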
The measured values are treated as independent, identically Gaussian normally distributed random variates. If the measured values follow a normal distribution only approximately, some tests still yield correct results, provided that the number of measurements is not too small. Correlations between the measured values would falsify the results.
All tests that are theoretically computable are computed, even if a different test would be less likely to lead to a false decision. Example: If the standard deviation of the measurements is correctly known a priori, then the w-test after Baarda is less likely to lead to a false decision than the Pope test, and the Z-test is less likely to lead to a false decision than the t-test.
Acceptance region : The null hypothesis Ho is accepted if the test statistic
N(μ,σ) | Gaussian normal distribution with expectation μ and standard deviation σ |
t(r) | t-distribution with r degrees of freedom |
τ(r) | τ-distribution with r degrees of freedom |
χ²(r) | χ²-distribution with r degrees of freedom |
F(r,r') | F-distribution with r and r' degrees of freedom |
This test checks the hypothesis of a Gaussian normal distribution. The test statistic measures the difference between the empirical distribution of the sample and the normal distribution, giving more weight to the tails of the distribution than similar tests, e.g. the Cramér–von Mises criterion.
The Anderson-Darling test is computed in up to four variants:
The first three variants work only if the required values have been given.
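As a rough sketch, the Anderson-Darling statistic A² can be computed as below. The function name, the handling of parameters that are not given (they are estimated from the sample) and the sample itself are assumptions for illustration. Critical values are deliberately left out, because they depend on which parameters are estimated.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_statistic(values, mu=None, sigma=None):
    """A^2 for the hypothesis of normality; parameters given as None are estimated."""
    x = sorted(values)
    n = len(x)
    if sigma is None:
        if mu is None:
            sigma = stdev(x)  # mean unknown: n - 1 in the denominator
        else:
            sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if mu is None:
        mu = mean(x)
    dist = NormalDist(mu, sigma)
    total = 0.0
    for i in range(1, n + 1):
        # The i-th smallest value is paired with the i-th largest value.
        log_lower = math.log(dist.cdf(x[i - 1]))
        log_upper = math.log(1.0 - dist.cdf(x[n - i]))
        total += (2 * i - 1) * (log_lower + log_upper)
    return -n - total / n

# Hypothetical sample (not taken from the text).
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
print(anderson_darling_statistic(sample))                      # both parameters estimated
print(anderson_darling_statistic(sample, mu=10.5, sigma=0.2))  # both parameters given
```

A given expectation that does not fit the sample inflates the statistic considerably, because the weighting emphasizes the tails where the empirical and the hypothetical distribution then disagree most.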
Every year, in the third semester of the Surveying/Geoinformatics curriculum, every student has to determine the positions of the same points by tacheometry. These determinations can be treated as independent repeated measurements. In the years 2010 and 2011 the following results were obtained for a selected point:
Year | Determined point heights, unit=metre |
---|---|
2010 | 116.774 116.755 116.755 116.751 116.742 116.745 116.760 116.754 116.753 116.739 116.752 116.747 116.732 116.752 116.736 116.764 116.738 116.765 116.757 116.750 116.741 116.759 116.751 116.753 116.734 116.737 116.757 116.730 116.755 |
2011 | 116.764 116.748 116.758 116.743 116.757 116.659 116.744 116.754 116.761 116.762 116.769 116.741 116.747 116.738 116.744 116.750 116.746 116.736 116.760 116.762 116.760 116.756 116.739 116.754 116.728 116.745 116.737 116.750 |
In the exercise network this point has a nominal height of 116.767 m. Despite their lack of routine, the students can be expected to achieve a standard deviation of σo=0.01 m for a single determination. The repeated measurements shall be tested statistically with a probability of type I decision error of α=0.05.
The normal probability plot
shows the distribution of the measured values (dots) relative to a best-fitting Gaussian normal distribution (straight line).
It is immediately apparent that there is an isolated red (=2011) dot at the bottom left. This clearly indicates an outlier.
The remaining dots scatter around the blue (=2010) line.
In the middle zone of the dots (around the median), the blue (=2010) dots lie to the right of the red (=2011) dots.
Consequently, the median of the 2010 measurements is larger by 3 mm.
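The difference of the medians can be checked directly from the tabulated heights. The sketch below uses only Python's standard library; the variable names are chosen for illustration.

```python
import statistics

# Point heights in metres, copied from the table above.
heights_2010 = [
    116.774, 116.755, 116.755, 116.751, 116.742, 116.745, 116.760, 116.754,
    116.753, 116.739, 116.752, 116.747, 116.732, 116.752, 116.736, 116.764,
    116.738, 116.765, 116.757, 116.750, 116.741, 116.759, 116.751, 116.753,
    116.734, 116.737, 116.757, 116.730, 116.755,
]
heights_2011 = [
    116.764, 116.748, 116.758, 116.743, 116.757, 116.659, 116.744, 116.754,
    116.761, 116.762, 116.769, 116.741, 116.747, 116.738, 116.744, 116.750,
    116.746, 116.736, 116.760, 116.762, 116.760, 116.756, 116.739, 116.754,
    116.728, 116.745, 116.737, 116.750,
]

median_2010 = statistics.median(heights_2010)
median_2011 = statistics.median(heights_2011)
difference_mm = (median_2010 - median_2011) * 1000.0

print(f"median 2010: {median_2010:.3f} m")
print(f"median 2011: {median_2011:.3f} m")
print(f"difference:  {difference_mm:.0f} mm")
```

The median is hardly affected by the single outlier in the 2011 series, which is why the comparison can be made before any value is eliminated.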
First of all, it can be tested whether the required measurement precision has been achieved, i.e. whether Ho:σ≤σo can be assumed. This is done by the right-tailed global test, which is rejected for the measurement series of Year 2011. This means that the required measurement precision has probably not been achieved. Strictly speaking, the probability that the global test is rejected although the measurement precision has been achieved is α=0.05.
The w-test after Baarda also detects an outlier for Year 2011. This is the measured value 116.659, which deviates most from the mean. If you eliminate this value and repeat the computation, both the right-tailed global test and the w-test are accepted. However, it may be irritating that the left-tailed global test Ho:σ≥σo is accepted as well. The reason for this phenomenon is as follows: If α is chosen small enough, the null hypotheses of all statistical tests are always accepted. And for a small number of measurements, α=0.05 is small in practice.
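The decisions for Year 2011 can be reproduced approximately with the following sketch. Several parts are assumptions rather than the documented procedure: the χ² quantile uses the Wilson-Hilferty approximation (the standard library has no exact one), the w statistic is taken as the residual divided by σo·sqrt((n−1)/n), and the critical value of the w-test is corrected for n simultaneous tests in the manner of Šidák. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

SIGMA_0 = 0.01  # a priori standard deviation in metres (from the text)
ALPHA = 0.05    # probability of a type I decision error (from the text)

heights_2011 = [
    116.764, 116.748, 116.758, 116.743, 116.757, 116.659, 116.744, 116.754,
    116.761, 116.762, 116.769, 116.741, 116.747, 116.738, 116.744, 116.750,
    116.746, 116.736, 116.760, 116.762, 116.760, 116.756, 116.739, 116.754,
    116.728, 116.745, 116.737, 116.750,
]

def chi2_quantile(p, r):
    """Wilson-Hilferty approximation of the chi-square quantile (not exact)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * r)
    return r * (1.0 - c + z * math.sqrt(c)) ** 3

def global_test(values):
    """Return T = (n-1) s^2 / sigma_0^2 and the decisions for both one-sided Ho."""
    r = len(values) - 1
    t = r * variance(values) / SIGMA_0 ** 2
    accept_sigma_le = t <= chi2_quantile(1.0 - ALPHA, r)  # Ho: sigma <= sigma_0
    accept_sigma_ge = t >= chi2_quantile(ALPHA, r)        # Ho: sigma >= sigma_0
    return t, accept_sigma_le, accept_sigma_ge

def w_statistics(values):
    """Residuals divided by their standard deviation under known sigma_0."""
    n = len(values)
    m = mean(values)
    sigma_v = SIGMA_0 * math.sqrt((n - 1) / n)
    return [(x - m) / sigma_v for x in values]

def w_critical(n):
    """Two-sided critical value, corrected for n simultaneous tests (assumption)."""
    alpha_single = 1.0 - (1.0 - ALPHA) ** (1.0 / n)
    return NormalDist().inv_cdf(1.0 - alpha_single / 2.0)

w = w_statistics(heights_2011)
suspect = max(range(len(w)), key=lambda i: abs(w[i]))
cleaned = heights_2011[:suspect] + heights_2011[suspect + 1:]

print("suspected outlier:", heights_2011[suspect], "w =", round(w[suspect], 2))
print("with outlier:   ", global_test(heights_2011))
print("without outlier:", global_test(cleaned))
print("max |w| after elimination:", round(max(abs(v) for v in w_statistics(cleaned)), 2))
```

Under these assumptions the value 116.659 has by far the largest |w|. With it, Ho:σ≤σo is rejected; without it, both one-sided hypotheses are accepted, in line with the description above.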
If σo=0.01 had been unknown, the outlier could have been detected by the τ-test after Pope.
The posterior standard deviations of a single determination of the point height are estimated as 0.0107 m for Year 2010 and 0.0102 m for Year 2011. Whether the measurement precision of both years should be treated as identical after elimination of the outlier is answered by the two-tailed F-test with Ho:σx=σy. This hypothesis is accepted. Therefore, the precision in Year 2011 was not significantly higher.
The means amount to 116.7496 m for Year 2010 and 116.7501 m for Year 2011. Whether the expected values of both series coincide after elimination of the outlier is answered by the two-sample Z-test with Ho:μx=μy. The hypothesis is accepted. Therefore, the expected values should be treated as identical.
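A minimal sketch of this two-sample Z-test is given below. It assumes that both series share the known standard deviation σo=0.01 m and uses the sample sizes counted from the table (29 values for 2010, 27 values for 2011 after elimination of the outlier). The function name is hypothetical.

```python
import math
from statistics import NormalDist

SIGMA_0 = 0.01  # known standard deviation in metres (from the text)
ALPHA = 0.05    # probability of a type I decision error (from the text)

def two_sample_z(mean_x, n_x, mean_y, n_y, sigma=SIGMA_0):
    """Z statistic for Ho: mu_x = mu_y with a common, known standard deviation."""
    return (mean_x - mean_y) / (sigma * math.sqrt(1.0 / n_x + 1.0 / n_y))

# Means from the text; sample sizes counted from the table.
z = two_sample_z(116.7496, 29, 116.7501, 27)
z_critical = NormalDist().inv_cdf(1.0 - ALPHA / 2.0)  # two-tailed
accepted = abs(z) <= z_critical

print(f"Z = {z:.3f}, critical value = {z_critical:.3f}, Ho accepted: {accepted}")
```

The difference of the means is only 0.5 mm, far below what the scatter of the measurements could resolve, so the statistic stays well inside the acceptance region.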
The hypothesis that the nominal value of that point, μo=116.767, is identical to the expected value of the measurements is tested by the two-tailed single-sample Z-test with Ho:μ=μo. This hypothesis is rejected for both years. The conclusion could be that the nominal value is not correct, or that one control height used in all determinations of instrument heights was not correct and this was not detected during station setup.
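The single-sample Z-test can be sketched in the same way. The means, σo and μo are taken from the text; the sample sizes (29 and 27) are counted from the table, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

SIGMA_0 = 0.01   # known standard deviation in metres (from the text)
MU_0 = 116.767   # nominal height in metres (from the text)
ALPHA = 0.05     # probability of a type I decision error (from the text)

def one_sample_z(sample_mean, n, mu_0=MU_0, sigma=SIGMA_0):
    """Z statistic for Ho: mu = mu_0 with known standard deviation."""
    return (sample_mean - mu_0) / (sigma / math.sqrt(n))

z_critical = NormalDist().inv_cdf(1.0 - ALPHA / 2.0)  # two-tailed
z_2010 = one_sample_z(116.7496, 29)
z_2011 = one_sample_z(116.7501, 27)  # outlier eliminated

for year, z in (("2010", z_2010), ("2011", z_2011)):
    print(f"Year {year}: Z = {z:.2f}, Ho rejected: {abs(z) > z_critical}")
```

Both means lie about 17 mm below the nominal height, while the standard deviation of each mean is below 2 mm, which explains the clear rejection.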
Since both measurement series do not show significant differences, it is possible to merge them into one series and repeat the computation:
As in the previous cases, the nominal height value of μo=116.767 is rejected. This value is either incorrect or the heights are biased. Moreover, the hypothesis of a normal distribution with the parameter μo=116.767 is rejected. Without this parameter the Anderson-Darling test is accepted.
Two series of Gaussian normally distributed pseudo-random numbers N(53.06;16.10) from www.random.org/gaussian-distributions are investigated.
One series of Laplace-distributed and one series of χ²(1)-distributed pseudo-random numbers, 100 values each, computed by GNU Octave are investigated. The true parameters of the distributions are:
series | expectation | median | standard deviation | interquartile range | skewness | excess kurtosis |
---|---|---|---|---|---|---|
Laplace | 0 | 0 | 1.414 | 1.386 | 0 | 3 |
χ²(1) | 1 | 0.455 | 1.414 | 1.222 | 2.828 | 12 |
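The tabulated true parameters can be reproduced from closed-form expressions. The sketch below assumes a Laplace distribution with location 0 and scale b=1 (inferred from the standard deviation √2·b = 1.414) and uses the fact that the square of a standard normal variate is χ²(1)-distributed. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def laplace_parameters(b=1.0):
    """True parameters of a Laplace distribution with location 0 and scale b."""
    return {
        "expectation": 0.0,
        "median": 0.0,
        "stddev": math.sqrt(2.0) * b,
        "iqr": 2.0 * b * math.log(2.0),
        "skewness": 0.0,
        "excess_kurtosis": 3.0,
    }

def chi2_1_parameters():
    """True parameters of the chi-square distribution with 1 degree of freedom."""
    std_normal = NormalDist()

    def quantile(p):
        # If Z is standard normal, P(Z^2 <= q) = 2 * Phi(sqrt(q)) - 1.
        return std_normal.inv_cdf((1.0 + p) / 2.0) ** 2

    k = 1  # degrees of freedom
    return {
        "expectation": float(k),
        "median": quantile(0.5),
        "stddev": math.sqrt(2.0 * k),
        "iqr": quantile(0.75) - quantile(0.25),
        "skewness": math.sqrt(8.0 / k),
        "excess_kurtosis": 12.0 / k,
    }

for name, params in (("Laplace", laplace_parameters()), ("chi2(1)", chi2_1_parameters())):
    print(name, {key: round(value, 3) for key, value in params.items()})
```

Rounded to three decimals, these expressions agree with the values in the table, which gives a reference against which the sample estimates of the two series can be judged.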