A sample is a subset of a population. We can study a sample to infer conclusions about the population itself. For example, if all the stocks trading in the US are considered a population, then indices such as the S&P 500 are samples. We can look at the performance of the S&P 500 and draw conclusions about how all stocks in the US are performing. This process is known as sampling and estimation.
There are various methods for obtaining information on a population through samples. The information we obtain usually concerns a parameter, a quantity used to describe a population. To estimate a parameter, we use sample statistics. A statistic is a quantity used to describe a sample.
Sampling is used for two main reasons: examining every member of a population is often too time-consuming, and it is often too costly.
Simple random sampling is the process of selecting a sample from a larger population in such a way that each member of the population has the same probability of being included in the sample.
If we draw samples of the same size several times and calculate the sample statistic, the sample statistic will be different each time. The distribution of values of the sample statistic is called a sampling distribution.
For example, say you select 100 stocks from a universe of 10,000 stocks and calculate the average annual returns of these 100 stocks. Let’s say you get an average return of 15%. You repeat this process with a second sample of 100 stocks. This time, you get an average return of 14%. You keep repeating this process and each time you get a different average return. The distribution of these sample average returns is called a sampling distribution.
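The repeated-sampling procedure described above can be sketched in Python. This is a minimal illustration, not real market data: the universe of returns is simulated from a normal distribution with a mean of 12% and a standard deviation of 10% (an assumption made only for the example), and the helper name `sample_means` is hypothetical.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Return the means of repeated simple random samples (drawn without replacement)."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical universe of 10,000 annual returns in percent (simulated, not real data).
universe_rng = random.Random(42)
universe = [universe_rng.gauss(12, 10) for _ in range(10_000)]

# Draw 500 samples of 100 stocks each and record each sample's average return.
means = sample_means(universe, sample_size=100, num_samples=500)

print(f"population mean: {statistics.mean(universe):.2f}%")
print(f"first three sample means: {[round(m, 2) for m in means[:3]]}")
print(f"mean of sample means: {statistics.mean(means):.2f}%")
```

Each sample mean differs from the next; the list `means` is an empirical approximation of the sampling distribution of the mean.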
Sampling error is the difference between a sample statistic and the corresponding population parameter. The sampling error of the mean is given by:
Sampling error of the mean = sample mean − population mean (X̄ − µ)
For example, let’s say you want to estimate the average returns of 10,000 stocks. You draw a sample of 100 stocks and calculate the average return of these 100 stocks as 15%. However, the actual average of the 10,000 stocks was 12%. Then the sampling error = 15% – 12% = 3%.
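The arithmetic in this example can be checked with a short sketch. The function name `sampling_error` is hypothetical, and returns are expressed in percentage points.

```python
def sampling_error(sample_mean, population_mean):
    """Sampling error of the mean: sample statistic minus population parameter."""
    return sample_mean - population_mean


# Values from the example: sample mean of 15%, population mean of 12%.
error = sampling_error(15.0, 12.0)
print(f"sampling error: {error}%")
```

A positive result means the sample overstated the population mean; a negative result means it understated it.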
In stratified random sampling, the population is divided into subgroups based on one or more distinguishing characteristics. Samples are then drawn from each subgroup, with sample size proportional to the size of the subgroup relative to the population. Finally, samples from each subgroup are pooled together to form a stratified random sample.
The advantage of stratified random sampling is that the sample will have the same distribution of key characteristics as the overall population. This can help reduce the sampling error. Stratified random sampling therefore produces more precise parameter estimates than simple random sampling.
For example, you divide the universe of 10,000 stocks as per their market capitalization such that you have 5,000 large cap stocks, 3,000 mid cap stocks, and 2,000 small cap stocks. In stratified random sampling, to select a total sample of 100 stocks, you will randomly select 50 large cap stocks, 30 mid cap stocks, and 20 small cap stocks and pool all these samples together to form a stratified random sample.
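A minimal sketch of proportional stratified sampling for this example follows. The stock identifiers are made up, and the function name `stratified_sample` is hypothetical. The sketch assumes the proportional allocations come out as whole numbers, as they do here; with other sizes, rounding could make the pooled sample slightly larger or smaller than the target.

```python
import random
from collections import Counter


def stratified_sample(strata, total_size, seed=0):
    """Draw from each stratum in proportion to its share of the population, then pool."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    pooled = []
    for members in strata.values():
        # Proportional allocation: stratum share of the population times the target size.
        k = round(total_size * len(members) / population_size)
        pooled.extend(rng.sample(members, k))
    return pooled


# Hypothetical universe: each stock is a (market-cap group, index) pair.
strata = {
    "large": [("large", i) for i in range(5000)],
    "mid": [("mid", i) for i in range(3000)],
    "small": [("small", i) for i in range(2000)],
}

sample = stratified_sample(strata, total_size=100)
counts = Counter(group for group, _ in sample)
print(dict(counts))
```

The counts per group match the 50/30/20 split in the example, so the pooled sample mirrors the market-cap composition of the universe.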
Paul wants to categorize publicly listed stocks for his research project. He first divides the stocks into 15 industries. Then, within each industry, he categorizes companies into three size groups: small, medium, and large. Finally, he divides each of these groups into value and growth stocks. How many cells or strata does the sampling plan entail?
C is correct. This is an application of the multiplication rule of counting. The total number of cells is the product of the number of categories at each level: 15 × 3 × 2 = 90.
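The count can be verified by enumerating every combination of the three classifications. The industry labels below are placeholders, since the question does not name the industries.

```python
from itertools import product

industries = [f"industry_{i}" for i in range(1, 16)]  # 15 placeholder industries
sizes = ["small", "medium", "large"]
styles = ["value", "growth"]

# Each cell (stratum) is one combination of industry, size, and style.
cells = list(product(industries, sizes, styles))
print(len(cells))
```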
Time-series data consists of observations for a single subject taken at specific and equally spaced intervals of time. An example is the monthly returns on Microsoft stock from January 1995 to January 2005.
Cross-sectional data consists of observations for multiple subjects taken at a specific point in time. An example is the reported earnings per share of all NASDAQ companies for 2005.
For both time-series and cross-sectional data, the random sample must be representative of the population we wish to study.
‘Longitudinal’ or ‘panel’ data follows the same sample of subjects over time, so it combines time-series and cross-sectional observations.
The sample mean is a random variable with a probability distribution known as the statistic’s sampling distribution. To understand this concept, consider the following population: last year’s returns on every stock traded in the United States. We are interested in the mean return of all stocks but do not have time to calculate the population mean. Hence, we draw a sample of 50 stocks and compute the sample mean. We then draw another sample of 50 stocks and compute the sample mean. This exercise can be repeated several times giving us a distribution of sample means. This distribution is called the statistic’s sampling distribution. The central limit theorem, explained below, helps us understand the sampling distribution of the mean.
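According to the central limit theorem, if we draw a sample of size n from a population with a mean µ and a variance σ², then the sampling distribution of the sample mean is approximately normal when n is large (a common rule of thumb is n ≥ 30), with mean µ and variance σ²/n.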
For example, suppose the average return of the universe of 10,000 stocks is 12% and its standard deviation is 10%. By the central limit theorem, if we keep drawing samples of 100 stocks and plot their average returns, we will get a sampling distribution that is approximately normal with mean = 12% and variance = 10²/100 = 1 (in squared percentage points), which corresponds to a standard deviation of 1%.
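A small simulation can illustrate these predictions. As an assumption for the sketch, the population is taken to be uniform rather than normal, with its endpoints chosen so that the mean is 12 and the standard deviation is 10, to show that the result does not depend on the population itself being normal. The seed and the number of repetitions are arbitrary.

```python
import math
import random
import statistics

MU, SIGMA, N = 12.0, 10.0, 100  # population mean (%), population sd (%), sample size

# A uniform distribution on [low, high] has mean (low + high) / 2
# and standard deviation (high - low) / sqrt(12).
half_width = SIGMA * math.sqrt(3)
low, high = MU - half_width, MU + half_width

rng = random.Random(7)
clt_means = [
    statistics.fmean([rng.uniform(low, high) for _ in range(N)])
    for _ in range(2000)
]

observed_mean = statistics.fmean(clt_means)
observed_variance = statistics.variance(clt_means)

print(f"mean of sample means: {observed_mean:.3f} (theory: {MU})")
print(f"variance of sample means: {observed_variance:.3f} (theory: {SIGMA**2 / N})")
```

The observed mean and variance of the sample means should land close to 12 and 1, although they will not match exactly because the simulation uses a finite number of samples.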
Standard error of the sample mean
The standard deviation of the distribution of the sample means is known as the standard error of the sample mean.
When we know the population standard deviation (σ), the standard error of the sample mean can be calculated as σ/√n, where n is the sample size.
When we do not know the population standard deviation (σ), we can use the sample standard deviation (s) to estimate the standard error of the sample mean as s/√n.
The mean of a population is 12 and the standard deviation is 3. Given that the sample comprises 64 observations, what is the standard error of the sample mean?
A is correct. Standard Error = σ/√n = 3/√64 = 0.375
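Both versions of the standard error calculation can be sketched as follows. The function names are hypothetical, and the four-observation sample is made up purely to show the call. `statistics.stdev` computes the sample standard deviation using an n − 1 denominator.

```python
import math
import statistics


def standard_error_known_sigma(sigma, n):
    """Standard error of the sample mean when the population sd is known."""
    return sigma / math.sqrt(n)


def standard_error_from_sample(sample):
    """Estimated standard error using the sample sd in place of the population sd."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))


# Practice question above: sigma = 3, n = 64.
print(standard_error_known_sigma(3, 64))

# Made-up sample of four returns (in %), used only to illustrate the second case.
print(round(standard_error_from_sample([10, 12, 14, 16]), 4))
```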