Perform goodness of fit test between an observation and a known distribution

Goodness of Fit Test

Explanations & examples:

In a Goodness-of-Fit (GOF) test, which is a special version of the chi-squared test, we test whether an observed distribution of data is equal to an already known distribution, in other words whether the observed data follows the known distribution or not. The known distribution is entered in the first table and the observed distribution in the second table.

The known distribution that the observed data is compared with can be entered either as percentages or as number values, by clicking the respective option in the menu above. If the known distribution is given as percentages, the percentages entered in the first table should sum to 100, for example 20%, 15%, 30%, 10%, 25%. If, on the other hand, the known distribution is given as number values, then number values should be entered into the first table, for example 95, 71, 143, 48, 119.

Values for the observed data, entered into the second table below, must always be number values and not percentages, regardless of whether the known distribution was entered as percentages or number values.
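As an illustration of how the two input styles relate, the following minimal Python sketch turns a known distribution into expected counts for a sample of a given size. The function name `expected_counts`, its parameters, and the sample sizes 200 and 476 are hypothetical and chosen only for this example. Rescaling number values proportionally to the sample size is an assumption here, not a description of how the calculator works internally.

```python
def expected_counts(known, observed_total, as_percent):
    """Scale a known distribution to the size of the observed sample.

    known          -- percentages (summing to 100) or number values
    observed_total -- total number of observations in the second table
    as_percent     -- True if `known` holds percentages
    """
    if any(k <= 0 for k in known):
        raise ValueError("every category needs a positive known value")
    total = sum(known)
    if as_percent and abs(total - 100.0) > 1e-6:
        raise ValueError("percentages should sum to 100, got %g" % total)
    # Assumption: number values are rescaled proportionally to the sample size.
    return [observed_total * k / total for k in known]


# The two example inputs from the text, with hypothetical sample sizes.
from_percent = expected_counts([20, 15, 30, 10, 25], 200, as_percent=True)
from_numbers = expected_counts([95, 71, 143, 48, 119], 476, as_percent=False)
print(from_percent)
print(from_numbers)
```

When the number values already sum to the sample size, as in the second call, the expected counts are simply the entered numbers.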

To see the formulas used in the calculations of the GOF-test and the corresponding p-value, please see the page formulas.
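For orientation, the statistic commonly used for a chi-squared goodness-of-fit test is Pearson's statistic, written out below. The symbols are labels chosen here and may differ from the notation on the formulas page: \( O_i \) is the observed count in category \( i \), \( p_i \) the known proportion, \( n \) the total number of observations, \( E_i \) the expected count and \( k \) the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, p_i,
\qquad \mathrm{DF} = k - 1
```

The p-value is then the probability that a chi-squared variable with \( k - 1 \) degrees of freedom is at least as large as the computed \( \chi^2 \).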

Example 1: Known distribution as percentage values.

In a certain country, the latest election between 10 political parties resulted in the following distribution of the votes:

Political Party	A	B	C	D	E	F	G	H	I	J
Percentual share of the votes	26.3%	4.6%	3.4%	4.2%	7.5%	0.8%	21.1%	19.5%	7.8%	4.8%

In a survey three years later, a sample of 1029 people was asked which party they would vote for if there were an election tomorrow. The survey gave the following distribution:

Political Party	A	B	C	D	E	F	G	H	I	J
No. of persons that would vote for that party	283	60	31	40	88	9	192	179	90	57

We want to investigate whether the distribution in the survey sample corresponds to the known distribution from the election, or whether the voters (taken as a whole) have changed their minds since the election. We therefore write down the null hypothesis (H0):
The distribution of the voters of the parties is unchanged since the election.
This null hypothesis can of course be phrased in more than one way. An alternative would be "there is no statistically significant difference in the way voters vote today compared with the way they voted in the election." Both the known distribution and the sample observations are entered into the tables:

Input tables over the distributions from a goodness of fit test

The result and p-value from a goodness of fit test with known percentual distribution

The p-value is 12.83% (0.1283), which is more than 5% (0.05). So we cannot reject the null hypothesis, which says that the distributions are not significantly different, at the five percent significance level. In other words, the distribution in the survey could be the same as the known distribution of the election result. The differences between the observed values in the survey and the values we would have expected, had the distribution in the survey been exactly the same as in the election, are not significant. The survey therefore gives no evidence that the people in the country have changed their political standpoint; if there were an election tomorrow, the result could well be about the same as at the last election. Note that the chi-square statistic (the number χ²) is smaller than the critical value in this case, since the p-value is larger than 0.05. If p had been below 0.05, the chi-square statistic would have been larger than the critical value (at the five percent significance level).
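The calculation behind Example 1 can be sketched in plain Python using only the standard library. This is a minimal sketch, not the calculator's own code: the helper names `chi2_sf` and `critical_value` are hypothetical. The upper-tail probability is computed from the standard recurrence for the chi-squared distribution with integer degrees of freedom, and the critical value is found by bisection.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                  # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))       # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(-z) * z ** a / math.gamma(a + 1.0)
        a += 1.0
    return q


def critical_value(df, alpha=0.05):
    """Find x with chi2_sf(x, df) == alpha by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:                # grow the bracket
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Data from Example 1: election shares (%) and survey counts for parties A..J.
percent = [26.3, 4.6, 3.4, 4.2, 7.5, 0.8, 21.1, 19.5, 7.8, 4.8]
observed = [283, 60, 31, 40, 88, 9, 192, 179, 90, 57]

n = sum(observed)                                  # 1029 respondents
expected = [n * p / 100.0 for p in percent]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
crit = critical_value(df)

print("chi2 = %.3f, df = %d, critical = %.3f, p = %.4f" % (chi2, df, crit, p_value))
print("reject H0" if p_value < 0.05 else "cannot reject H0")
```

Run on the data above, the sketch should reproduce a p-value of about 0.1283 and a chi-square statistic below the critical value, in line with the result described in the text.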

Example 2: Known distribution as number values.

A candy factory produces gummy bears in the colours red, blue, green, yellow and purple. We assume that the machinery in the factory is programmed to produce the same number of gummy bears of each colour, in other words 20% of each colour. To test this hypothesis, we pick a sample of 200 gummy bears from the factory and count the number of red, blue, green, yellow and purple gummy bears in the sample. We find the following distribution:

Colour	Red	Blue	Green	Yellow	Purple
No. of bears	33	36	47	30	54

The null hypothesis (H0) in this case is: The factory produces the same number of gummy bears of each colour. Or, phrased alternatively: there is no statistically significant difference in the percentages of each colour gummy bear produced. To test the hypothesis we enter the data into the tables, both the hypothetical numbers from the known distribution and the numbers from the sample.

Input tables over the distributions from a goodness of fit test. Both the known distribution and the sample are with data values

The result and p-value from a goodness of fit test with the known distribution as number values instead of percentages

The p-value in this case is 0.0364, which is below 0.05. So in this case we reject the null hypothesis, which claimed that there is no statistically significant difference in the percentages produced of each colour gummy bear. Having rejected it, we conclude that there is a difference in the percentage distribution of the colours. The factory doesn't produce the same percentage of red, blue, green, yellow and purple gummy bears, but rather somewhat more purple and green ones.
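To see which colours drive the result in Example 2, the following sketch breaks the chi-square statistic down per colour. The definition of the percentage deviation as 100 * (observed - expected) / expected is an assumption made for this illustration, and the constant `CRITICAL_5PCT_DF4` is the usual table value for 4 degrees of freedom at the five percent level, rounded to three decimals.

```python
colours = ["Red", "Blue", "Green", "Yellow", "Purple"]
observed = [33, 36, 47, 30, 54]

n = sum(observed)                                   # 200 gummy bears
expected = [n / len(observed)] * len(observed)      # 40 of each colour under H0

rows = []
for colour, o, e in zip(colours, observed, expected):
    deviation_pct = 100.0 * (o - e) / e             # assumed definition
    contribution = (o - e) ** 2 / e                 # this colour's share of chi2
    rows.append((colour, e, deviation_pct, contribution))

chi2 = sum(row[3] for row in rows)
CRITICAL_5PCT_DF4 = 9.488                           # table value, DF = 4, 5% level

for colour, e, dev, contrib in rows:
    print("%-7s expected %5.1f  deviation %+6.1f%%  contribution %.3f"
          % (colour, e, dev, contrib))
print("chi2 = %.2f, critical value = %.3f" % (chi2, CRITICAL_5PCT_DF4))
```

Green and purple are the only colours with a positive deviation, and purple makes the largest single contribution to the statistic, which fits the conclusion that the factory produces somewhat more purple and green gummy bears.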

Known distribution is given by: Percentual values | Expected values

Show: Expected values, Percentual deviation, Contributions to \( \chi^2 \)

Other settings: Columns, Decimals

*Enter the distribution in the corresponding cells* (first table, with a Total cell)

*Enter the observations in the corresponding cells* (second table, with a Total cell)

Results

Degrees of Freedom (DF)	Critical Value (5%)	Chi Square (\( \chi^2 \)) Value	P Value