Instructor’s Note:
Focus on the basics of this topic; the probability of being tested on the details is low.
In tests concerning the variance of a single normally distributed population, we use the chi-square test statistic, denoted by χ².
Properties of the chi-square distribution
The chi-square distribution is asymmetrical and, like the t-distribution, is a family of distributions: a different distribution exists for each possible value of the degrees of freedom, n – 1. Since the variance is a squared term, its minimum value is 0; hence, the chi-square distribution is bounded below by 0. The graph below shows the shape of a chi-square distribution:
After drawing a random sample from a normally distributed population, we calculate the test statistic, which has n – 1 degrees of freedom, using the following formula:

χ² = (n – 1)s² / σ₀²

where:
n = sample size
s² = sample variance
σ₀² = hypothesized value of the population variance
We then determine the critical value using the level of significance and the degrees of freedom; the critical value is looked up in the chi-square distribution table.
Example
Consider Fund Alpha, which we discussed in an earlier example. This fund has been in existence for 20 months. During this period, the standard deviation of monthly returns was 5%. You want to test a claim by the fund manager that the standard deviation of monthly returns is less than 6%.
Solution:
The null and alternate hypotheses are: H0: σ² ≥ 36 versus Ha: σ² < 36

Note that the claim is stated in terms of the standard deviation, 6%. Since we are dealing with the population variance, we square this number to arrive at a hypothesized variance of 36.
We then calculate the value of the chi-square test statistic:

χ² = (n – 1)s² / σ₀² = 19 × 25 / 36 = 13.19
Next, we determine the rejection point based on df = 19 and a significance level of 0.05. Since Ha: σ² < 36 makes this a one-tailed test on the left side, we look up the lower 5% point in the chi-square table and find that this number is 10.117.

Since the test statistic (13.19) is higher than the rejection point (10.117), we cannot reject H0. In other words, the sample standard deviation is not small enough to validate the fund manager's claim that the population standard deviation is less than 6%.
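The calculation can be verified with a short Python sketch; this is purely illustrative and assumes the scipy library is available:

```python
# Minimal sketch of the one-tailed chi-square variance test above,
# using the Fund Alpha numbers.
from scipy.stats import chi2

n = 20              # sample size (months)
s2 = 5 ** 2         # sample variance (standard deviation of 5%)
sigma0_2 = 6 ** 2   # hypothesized population variance (standard deviation of 6%)
alpha = 0.05
df = n - 1

# Test statistic: (n - 1) * s^2 / sigma0^2
chi2_stat = df * s2 / sigma0_2        # 19 * 25 / 36 = 13.19

# Left-tail critical value for Ha: sigma^2 < 36
critical_value = chi2.ppf(alpha, df)  # ~10.117

print(f"chi-square statistic = {chi2_stat:.2f}")
print(f"critical value       = {critical_value:.3f}")
print("Reject H0" if chi2_stat < critical_value else "Fail to reject H0")
```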
In order to test the equality or inequality of two variances, we use an F-test, whose test statistic is the ratio of the two sample variances.

The assumptions for an F-test to be valid are:
- The two samples are independent of each other.
- The populations from which the samples are drawn are normally distributed.
Properties of the F-distribution
The F-distribution, like the chi-square distribution, is a family of asymmetrical distributions bounded from below by 0. Each F-distribution is defined by two values of degrees of freedom, called the numerator and denominator degrees of freedom. As shown in the figure below, the F-distribution is skewed to the right and is truncated at zero on the left-hand side.

As shown in the graph, the rejection region is always in the right tail of the distribution.
When working with F-tests, there are three hypotheses that can be formulated:

- H0: σ₁² = σ₂² versus Ha: σ₁² ≠ σ₂² (two-tailed)
- H0: σ₁² ≤ σ₂² versus Ha: σ₁² > σ₂² (one-tailed, right side)
- H0: σ₁² ≥ σ₂² versus Ha: σ₁² < σ₂² (one-tailed, left side)

The term σ₁² represents the population variance of the first population and σ₂² represents the population variance of the second population.
The formula for the test statistic of the F-test is:

F = s₁² / s₂²

where:
s₁² = the sample variance of the first population with n₁ observations
s₂² = the sample variance of the second population with n₂ observations
A convention is to put the larger sample variance in the numerator and the smaller sample variance in the denominator.
df₁ = n₁ – 1 numerator degrees of freedom
df₂ = n₂ – 1 denominator degrees of freedom
The test statistic is then compared with the critical values found using the two degrees of freedom and the F-tables.
Finally, a decision is made whether to reject or not to reject the null hypothesis.
Example
You are investigating whether the population variance of the Indian equity market changed after the deregulation of 1991. You collect 120 months of data before and after deregulation.
The variance of returns before deregulation was 13; the variance of returns after deregulation was 18. Test your hypothesis at a confidence level of 99%.
Solution:
Null and alternate hypotheses: H0: σ₁² = σ₂² versus Ha: σ₁² ≠ σ₂²

Following the convention of putting the larger sample variance in the numerator:

F = 18 / 13 = 1.38

df = 119 for both the numerator and the denominator

α = 0.01, which means 0.005 in each tail. From the F-table: critical value = 1.6

Since the F-stat (1.38) is less than the critical value (1.6), do not reject the null hypothesis.
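As a check, the same test in Python (a minimal sketch, assuming scipy is available):

```python
# Minimal sketch of the two-tailed F-test on the deregulation example.
from scipy.stats import f

s2_after, s2_before = 18, 13   # sample variances; larger goes in the numerator
n1 = n2 = 120
df1, df2 = n1 - 1, n2 - 1      # 119 and 119
alpha = 0.01

f_stat = s2_after / s2_before                    # 18 / 13 = 1.38
critical_value = f.ppf(1 - alpha / 2, df1, df2)  # upper 0.5% point, ~1.61

print(f"F-statistic    = {f_stat:.2f}")
print(f"critical value = {critical_value:.2f}")
print("Reject H0" if f_stat > critical_value else "Fail to reject H0")
```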
Summary of types of test statistics
| Hypothesis test of | Use |
| --- | --- |
| One population mean | t-statistic or z-statistic |
| Two population means | t-statistic |
| One population variance | Chi-square statistic |
| Two population variances | F-statistic |
Hypothesis tests concerning correlation

The strength of the linear relationship between two variables is assessed through the correlation coefficient. The significance of a correlation coefficient is tested by using hypothesis tests concerning correlation.
There are two hypotheses that can be formulated (ρ represents the population correlation coefficient):

H0: ρ = 0 versus Ha: ρ ≠ 0

This test is used when we believe the population correlation is not equal to 0, or that it is different from the hypothesized correlation. It is a two-tailed test.
As long as the two variables are distributed normally, we can use the sample correlation, r, for our hypothesis testing. The formula for the t-test is:

t = r√(n – 2) / √(1 – r²)

which has n – 2 degrees of freedom if H0 is true.
The magnitude of r needed to reject the null hypothesis H0: ρ = 0 decreases as sample size n increases, due to the following:
- As n increases, the degrees of freedom increase and the absolute critical value of the t-statistic decreases.
- As n increases, the √(n – 2) term grows, so the test statistic becomes larger for a given value of r.
In other words, as n increases, the probability of a Type II error decreases, all else equal. The sketch after this list illustrates the effect.
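A small Python illustration (a sketch only; scipy is assumed to be available, and the sample sizes are arbitrary):

```python
# Illustration: the minimum |r| needed to reject H0: rho = 0
# (two-tailed, alpha = 0.05) shrinks as the sample size n grows.
# Solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r at the critical t
# gives r = t / sqrt(df + t^2).
import math
from scipy.stats import t

alpha = 0.05
for n in (10, 30, 60, 120, 500):
    df = n - 2
    t_crit = t.ppf(1 - alpha / 2, df)
    r_min = t_crit / math.sqrt(df + t_crit ** 2)
    print(f"n = {n:4d}: reject H0 when |r| > {r_min:.3f}")
```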
Example
The sample correlation between oil prices and the monthly returns of energy stocks in Country A is 0.7986 for the period from January 2014 through December 2018. Can we reject a null hypothesis that the underlying or population correlation equals 0 at the 0.05 level of significance?
Solution:
H0: ρ = 0 → the true correlation in the population is 0.
Ha: ρ ≠ 0 → the correlation in the population is different from 0.
From January 2014 through December 2018, there are 60 months, so n = 60. We use the following statistic to test the above:

t = r√(n – 2) / √(1 – r²) = 0.7986 × √58 / √(1 – 0.7986²) = 10.1052
At the 0.05 significance level, the critical value for this test statistic is 2.00 (n = 60, degrees of freedom = 58). When the test statistic is either larger than 2.00 or smaller than –2.00, we can reject the hypothesis that the correlation in the population is 0. The test statistic is 10.1052, so we can reject the null hypothesis.
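The same numbers in a minimal Python sketch (scipy assumed):

```python
# Reproduces the correlation t-test above: r = 0.7986, n = 60.
import math
from scipy.stats import t

r, n = 0.7986, 60
df = n - 2

t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # ~10.1052
t_crit = t.ppf(1 - 0.05 / 2, df)                    # ~2.00

print(f"t-statistic    = {t_stat:.4f}")
print(f"critical value = {t_crit:.2f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```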
The hypothesis-testing procedures we have discussed so far have two characteristics in common:
- They are concerned with parameters, such as the mean or the variance.
- Their validity depends on a definite set of assumptions about the population (for example, that it is normally distributed).
Any procedure which has either of the two characteristics is known as a parametric test.
Nonparametric tests are not concerned with a parameter and/or make few assumptions about the population from which the sample is drawn. We use nonparametric procedures in three situations:
- When the data do not meet distributional assumptions (for example, the sample is small and the population is not normally distributed).
- When the data are given in ranks (ordinal data).
- When the hypothesis being addressed does not concern a parameter (for example, testing whether a sample is random).
The Spearman rank correlation coefficient test is a popular nonparametric test. The coefficient is calculated based on the ranks of two variables within their respective samples.
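A brief Python sketch of the idea (illustrative only; the data below are made up, and scipy is assumed to be available):

```python
# Spearman rank correlation: rank each sample, then correlate the ranks.
from scipy.stats import spearmanr

oil_prices     = [52, 48, 61, 57, 66, 70, 63, 75]            # hypothetical levels
energy_returns = [1.2, -0.4, 2.1, 1.5, 2.8, 3.0, 2.9, 3.4]   # hypothetical %

# spearmanr also returns a p-value for H0: rank correlation = 0
coeff, p_value = spearmanr(oil_prices, energy_returns)
print(f"Spearman coefficient = {coeff:.3f}, p-value = {p_value:.4f}")
```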