| Definitions -
Parameter: A characteristic or measure obtained by using ALL the data values in a population (the mean, for example).
Point estimate: A specific numerical estimate of a parameter.
Interval estimate: An interval or range of values used to estimate a parameter.
Confidence level: The probability that an interval estimate will contain the parameter. Put another way, it is the proportion of intervals built this way, over many repeated samples, that would contain the parameter.
Confidence interval: A specific interval estimate of a parameter, determined by using data obtained from a sample and the specific confidence level of the estimate.
|
| Remember that the Central Limit Theorem states that as the sample size increases, the shape of the distribution of sample means approaches a normal distribution. This means it has essentially the same properties as the normal distribution. For example, about 68% of the sample means will fall within 1 standard deviation (the standard error, $\sigma/\sqrt{n}$) of the population mean.
Looking at Table 3, you see that 95% of the sample means will fall within 1.96 standard errors of the population mean. The value 1.96 corresponds to an area of 0.475 on each side of the mean, and the two areas add to 0.95, or 95%, of the total. Mathematically, this is the same as saying that, 95% of the time, the sample mean ($\bar{X}$) falls in the range:
$$\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}$$
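The following minimal Python sketch shows the Central Limit Theorem at work. It assumes a hypothetical, deliberately skewed population that does not come from the text: an exponential distribution with mean 1 and standard deviation 1. It draws many samples of size 40 and counts how often the sample mean lands within 1 and within 1.96 standard errors of the population mean. The seed and the number of trials are arbitrary choices. The two shares should come out close to 68% and 95%.

```python
import random
from math import sqrt

def fraction_within(n, k, trials=10_000, seed=1):
    """Share of simulated sample means within k standard errors of the population mean.

    Hypothetical population: exponential with mean 1 and standard deviation 1,
    chosen because it is clearly NOT normal.
    """
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    mu, sigma = 1.0, 1.0            # mean and SD of an exponential(1) population
    se = sigma / sqrt(n)            # standard error of the mean
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) <= k * se:
            hits += 1
    return hits / trials

within_1 = fraction_within(n=40, k=1.0)     # should be roughly 0.68
within_196 = fraction_within(n=40, k=1.96)  # should be roughly 0.95
print(within_1, within_196)
```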
|
| Remember from Lesson 7 that the z-value for sample means was given by:
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
Using some algebra, you can solve this formula for $\mu$ to get the formula for a confidence interval. The math is somewhat involved, but what matters is what the resulting equation tells us:
“With the chosen level of confidence, the population mean ($\mu$) will be contained within $\pm z_{\alpha/2}(\sigma/\sqrt{n})$ of the sample mean $\bar{X}$, where $z_{\alpha/2}$ is the z-value (from Table E) that corresponds to that confidence level.”
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The term $E = z_{\alpha/2}(\sigma/\sqrt{n})$ is called the maximum error of the estimate, and is defined as follows:
Maximum error of the estimate (E): The maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
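The algebra mentioned above can be sketched as follows. Start from the z formula and require z to fall between $-z_{\alpha/2}$ and $z_{\alpha/2}$. Then:

1. Multiply through by $\sigma/\sqrt{n}$.
2. Subtract $\bar{X}$.
3. Multiply by $-1$, which reverses the inequalities.

```latex
\begin{aligned}
-z_{\alpha/2} &< \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \\
-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &< \bar{X}-\mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \\
-\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &< -\mu < -\bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \\
\bar{X}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &< \mu < \bar{X}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\end{aligned}
```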
|
|
Let’s look at a run-down of how this works... Say you wish to know the average age of the students at Peru State, and you want to be 95% confident that your interval contains the true average. You take a sample of 40 students and find that the mean age of these students is 28.5 years. From previous studies, you learn that the population standard deviation of age is 2 years.
1. You want the 95% confidence interval that contains the true mean of the entire population (based on your 40-student sample). The first thing you need to do is calculate $z_{\alpha/2}$:
a. $\alpha$ = 1 − the confidence level desired (95%, or 0.95, in this case), so $\alpha$ = 0.05
b. $\alpha/2$ = 0.05/2 = 0.025
c. Subtract $\alpha/2$ from 0.5 (half the area under the normal distribution curve): 0.5 − 0.025 = 0.475
d. Go to Table E and find the z-value that corresponds to 0.475 (1.96)
2. Now we can solve the problem. Since $\sigma$ = 2 years and n = 40, we substitute these into the formula to get: $z_{\alpha/2}(\sigma/\sqrt{n}) = (1.96)(2/\sqrt{40}) \approx 0.62$
3. Our sample mean $\bar{X}$ was 28.5 years. Putting this into our formula (rounded to one decimal place, like the mean) yields:
$(28.5 - 0.6) < \mu < (28.5 + 0.6)$, or $27.9 < \mu < 29.1$, or more simply, $28.5 \pm 0.6$ years.
We can now say, with 95% confidence, that the average age of the students at Peru State is between 27.9 and 29.1 years, based on our 40-student sample!
**Note: When computing a confidence interval from a sample mean and a standard deviation, as we do here, always round to the same number of decimal places as the given mean.
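The steps of the worked example can be sketched in Python as follows. This is one possible approach, not the textbook's procedure. Instead of reading Table E, it assumes the standard library's `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$. That function returns 1.95996... rather than the table's rounded 1.96, which does not change the rounded answer.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the z-value that leaves an area of alpha/2 in each tail."""
    alpha = 1 - confidence                 # e.g. 1 - 0.95 = 0.05
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: xbar +/- E."""
    E = z_critical(confidence) * sigma / sqrt(n)  # maximum error of the estimate
    return xbar - E, xbar + E, E

# Peru State example: xbar = 28.5 years, sigma = 2 years, n = 40
low, high, E = z_interval(xbar=28.5, sigma=2, n=40)
print(round(E, 2), round(low, 1), round(high, 1))  # 0.62 27.9 29.1
```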
|
|
On the other side of the coin, suppose we know the confidence level we wish to achieve, the population standard deviation, and the maximum error of the estimate we can accept. We can then ask: “What minimum sample size will I have to use to achieve that confidence level?”
What do we know?
1. We have our confidence level... this lets us calculate $z_{\alpha/2}$
2. We have the population standard deviation ($\sigma$)
3. We have the maximum error of the estimate (E)
4. We also know the formula for the maximum error of the estimate, $E = z_{\alpha/2}(\sigma/\sqrt{n})$. We just have to solve it for n, the required minimum size of our sample:
a. Multiply both sides of the equation by $\sqrt{n}$
b. Divide both sides by E
c. Square both sides; this yields the formula for n:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
If n is not a whole number, always round it up. (See Example 8-4.)
When the standard deviation is known and the variable is normally distributed, the process described above will work. It will also work if the standard deviation is not known, as long as the sample size is greater than 30; the sample standard deviation s is then used in place of $\sigma$. But what if the standard deviation is not known and the sample size is less than 30? In these cases, we must use a slightly different distribution, known as the t distribution. (See the green box at the top of page 330 for the characteristics of this distribution.)
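The sample-size formula can be sketched in Python as shown below. As in the earlier sketch, this assumes `statistics.NormalDist().inv_cdf` stands in for Table E. The target error of 0.5 years is a hypothetical value, not from the text. `math.ceil` implements the "always round up" rule.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, E, confidence=0.95):
    """Minimum n so that the maximum error of the estimate is at most E."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    n = (z * sigma / E) ** 2                  # n = (z_{alpha/2} * sigma / E)^2
    return ceil(n)                            # always round UP to a whole number

# Hypothetical target: estimate the mean age to within 0.5 years (sigma = 2).
print(min_sample_size(sigma=2, E=0.5))  # 62
```

As a consistency check, asking for E = 0.62 with $\sigma$ = 2 returns n = 40. That matches the sample size in the Peru State example.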
|
| Confidence Intervals for the Mean - $\sigma$ unknown and n < 30
The t-distribution actually describes a “family” of
curves, which differ according to a specific variable, known as the “degrees
of freedom”. These degrees of freedom are the number of values in
a sample that are free to vary after a statistic (such as the mean) for
the sample has been computed.
For example, take a sample of 5 values: 4, 6, 8, 10, 12. The mean of this sample is 8. Now that we’ve calculated that, throw out the 5 values and start putting in new numbers; say the first is 7.
Can we still build a 5-value sample with a mean of 8? Sure!
But the five values must add up to 5 × 8 = 40. We can choose four of them freely (these are the “free” values), but then the fifth is forced. If the four free values add up to 30, we must use the number 10, since only that gives a total of 40 and a mean of 8 for this new sample. So we had FOUR degrees of freedom (d.f.) for this sample of FIVE numbers. For the intervals in this lesson, the degrees of freedom are always found by subtracting 1 from your sample size (d.f. = n − 1). You must take the degrees of freedom into account when using the t-distribution.
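The degrees-of-freedom argument can be sketched in Python as follows. The function `last_value` is a hypothetical helper written for illustration. In the example call, the first free value (7) comes from the text, while 5, 9 and 9 are hypothetical picks that add up to 30 together with it.

```python
def last_value(target_mean, n, free_values):
    """Once the mean of n values is fixed and n - 1 values are chosen freely,
    the remaining value is forced: it must bring the total to n * target_mean."""
    if len(free_values) != n - 1:
        raise ValueError("choose exactly n - 1 free values")
    return n * target_mean - sum(free_values)

# First free value 7 is from the text; 5, 9 and 9 are hypothetical picks.
print(last_value(target_mean=8, n=5, free_values=[7, 5, 9, 9]))  # 10
```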
The formula for finding a specific confidence interval when the population standard deviation is unknown and your sample size is less than 30 is given in the green box at the top of page 331. The values for $t_{\alpha/2}$ are given in Appendix A, Table F. Notice you’ll need to know the desired confidence level (95%, e.g.) and the degrees of freedom. (Disregard the “One tail” and “Two tails” rows at the top of the chart... we’ll get to them in Chapter 9.)
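For reference, the t interval has the form $\bar{X} - t_{\alpha/2}(s/\sqrt{n}) < \mu < \bar{X} + t_{\alpha/2}(s/\sqrt{n})$, with d.f. = n − 1. Python's standard library has no t-distribution quantile function, so this minimal sketch takes $t_{\alpha/2}$ as an input read from Table F. The ten data values are hypothetical. 2.262 is the 95% value for d.f. = 9.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval for mu when sigma is unknown and n < 30.

    t_crit is t_{alpha/2} read from Table F with d.f. = n - 1.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                 # sample standard deviation (divides by n - 1)
    E = t_crit * s / sqrt(n)        # maximum error of the estimate
    return xbar - E, xbar + E

# Hypothetical sample of n = 10 values, so d.f. = 9; t_{alpha/2} = 2.262 at 95%.
sample = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
low, high = t_interval(sample, t_crit=2.262)
print(round(low, 1), round(high, 1))  # 12.4 14.6
```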
If you have trouble knowing when to use the z-values
in the Normal Distribution or the t-values in the t-distribution,
follow the flow chart at the top of page 333.
|
| Confidence Intervals and
Sample Size for Proportions
When we work with proportions (12% of housewives, 20% of doctors, etc.), we use a different method for finding confidence intervals. We obtain the proportions from samples or populations, and proportions have a special set of symbols to help identify them. Those symbols can be found in the green box at the bottom of page 335. Here’s an example of how to find the values for $\hat{p}$ and $\hat{q}$:
In a recent survey of 500 Americans, 190 were upset at the way the media was handling the current White House scandal. Find $\hat{p}$ and $\hat{q}$.
$\hat{p} = X/n = 190/500 = 0.38$ and $\hat{q} = 1 - \hat{p} = 0.62$
|
| To compute a confidence interval when using proportions, we use a slightly modified form of the formula for E:
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$
where we have an additional set of criteria similar
to what we saw earlier:
**The same method is used when computing $z_{\alpha/2}$ as was discussed earlier. The only difference here is in the rounding: round off to three decimal places when computing the confidence interval for a proportion.
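The proportion interval $\hat{p} - E < p < \hat{p} + E$ can be sketched in Python and applied to the survey above (190 of 500). As before, this assumes `statistics.NormalDist().inv_cdf` replaces the Table E lookup. The endpoints are rounded to three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Confidence interval for a population proportion p: p_hat +/- E."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, found the same way as before
    E = z * sqrt(p_hat * q_hat / n)           # maximum error for a proportion
    # Round to three decimal places, per the rounding rule for proportions.
    return round(p_hat - E, 3), round(p_hat + E, 3)

print(proportion_interval(x=190, n=500))  # (0.337, 0.423)
```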
As before, computing the necessary sample size for a given confidence level and maximum error is simply a matter of rearranging the formula above to solve for n. The result, given in the green box at the top of page 339, is $n = \hat{p}\,\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2$.
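A minimal Python sketch of the sample-size formula for proportions follows. It again assumes `NormalDist` in place of Table E and rounds up to the next whole number. The target error of 0.03 (3 percentage points) is a hypothetical value. The estimate $\hat{p}$ = 0.38 is taken from the survey example.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_proportion(p_hat, E, confidence=0.95):
    """Minimum n so that the maximum error for a proportion is at most E."""
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    return ceil(p_hat * q_hat * (z / E) ** 2)  # round UP to the next whole number

# Hypothetical target: within 3 percentage points, using p_hat = 0.38 from the survey.
print(min_sample_size_proportion(p_hat=0.38, E=0.03))  # 1006
```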
|
| Confidence Intervals for
Variances and Standard Deviations
Variances and standard deviations are used all the time in industry, the medical professions, farming, etc., so it is important that we know how to compute confidence intervals and sample sizes for these as well. But to do that we need yet another type of statistical distribution: the chi-square ($\chi^2$) distribution. Note: $\chi$ is the Greek letter "chi".
It’s similar to the t-distribution in that it is a family of curves based on degrees of freedom. Table G in Appendix A gives values for the chi-square distribution. You’ll notice that it looks a little different from the tables we’ve seen. There are five columns on the left and five on the right, and each side is used separately to find the two values we want.
Here’s how it works:
To find the values we need for a 95% confidence interval:
1. Get $\alpha$ by subtracting the confidence level from 1 (here 1 − 0.95 = 0.05)
2. Compute $\alpha/2$: 0.05 / 2 = 0.025. This is the column on the right side of the table, and will be used to determine $\chi^2_{\text{right}}$
3. Subtract $\alpha/2$ from 1 to get 0.975. This is the column on the left side of the table, and will be used to determine $\chi^2_{\text{left}}$
4. Find the appropriate number of degrees of freedom (remember d.f. = n − 1)
5. Then simply plug these values into the formula listed in the green box in the middle of page 344:
$$\frac{(n-1)s^2}{\chi^2_{\text{right}}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\text{left}}}$$
For the standard deviation, take the square root of each part. Examples 8-13 and 8-14 will help guide you through the process if you get stuck. Round to the same number of decimal places as the given variance or standard deviation.
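The five steps above can be sketched in Python. The standard library has no chi-square quantile function, so this sketch takes the two critical values as inputs read from Table G. The sample (n = 10, $s^2$ = 2.5) is hypothetical. For d.f. = 9, 19.023 is the Table G value in the 0.025 column and 2.700 is the value in the 0.975 column.

```python
from math import sqrt

def variance_interval(n, s2, chi2_right, chi2_left):
    """Confidence interval for sigma^2 using chi-square critical values.

    chi2_right: Table G value in the alpha/2 column (right side), d.f. = n - 1
    chi2_left:  Table G value in the 1 - alpha/2 column (left side), d.f. = n - 1
    """
    lower = (n - 1) * s2 / chi2_right   # the larger critical value gives the lower bound
    upper = (n - 1) * s2 / chi2_left
    return lower, upper

# Hypothetical sample: n = 10 (d.f. = 9) with s^2 = 2.5; 95% confidence.
var_low, var_high = variance_interval(n=10, s2=2.5, chi2_right=19.023, chi2_left=2.700)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)  # square roots give the interval for sigma
print(round(var_low, 1), round(var_high, 1))  # 1.2 8.3
print(round(sd_low, 1), round(sd_high, 1))    # 1.1 2.9
```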
|
| HOMEWORK:
Read the rest of chapter 6 in the text. |