We often use inferential statistics to make decisions or judgments about the value of a parameter, such as a population mean. One of the most commonly used methods for making such decisions or judgments is to perform a hypothesis test. A hypothesis is a statement that something is true. Typically, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis (or research hypothesis), which we define as follows.
- Null hypothesis: A hypothesis to be tested. We use the symbol H0 to represent the null hypothesis.
- Alternative hypothesis: A hypothesis to be considered as an alternative to the null hypothesis. We use the symbol Ha to represent the alternative hypothesis.
- Hypothesis test: The problem in a hypothesis test is to decide whether the null hypothesis should be rejected in favor of the alternative hypothesis.
The first step in setting up a hypothesis test is to decide on the null hypothesis and the alternative hypothesis. The following are some guidelines for choosing these two hypotheses. Although the guidelines refer specifically to hypothesis tests for one population mean, μ, they apply to any hypothesis test concerning one parameter.
The null hypothesis for a hypothesis test concerning a population mean, μ, always specifies a single value for that parameter. Hence we can express the null hypothesis as H0: μ = μ0, where μ0 is some number. The choice of the alternative hypothesis depends on and should reflect the purpose of the hypothesis test. Three choices are possible for the alternative hypothesis:
- Two-tailed test: If the primary concern is deciding whether a population mean, μ, is different from a specified value μ0, we express the alternative hypothesis as Ha: μ ≠ μ0.
- Left-tailed test: If the primary concern is deciding whether a population mean, μ, is less than a specified value μ0, we express the alternative hypothesis as Ha: μ < μ0.
- Right-tailed test: If the primary concern is deciding whether a population mean, μ, is greater than a specified value μ0, we express the alternative hypothesis as Ha: μ > μ0.
A hypothesis test is called a one-tailed test if it is either left tailed or right tailed. It is not uncommon for a sample mean to fall within the area of acceptance for a two-tailed test but within the area of rejection for a one-tailed test. Therefore, a researcher who wishes to reject the null hypothesis may sometimes find that using a one-tailed rather than a two-tailed test allows a previously nonsignificant result to become significant. For this reason, the choice of a one-tailed test must depend on the nature of the hypothesis being tested and should therefore be made at the outset of the research, rather than afterward according to how the results turn out. One-tailed tests can be used only when there is a directional alternative hypothesis. This means that they may not be used unless results in only one direction are of interest and the possibility of the results being in the opposite direction is of no interest or consequence to the researcher.
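The point that the same result can be nonsignificant in a two-tailed test yet significant in a one-tailed test can be illustrated with a minimal sketch of the critical-value approach. The observed z-value of 1.80 and the function names are hypothetical, chosen only for illustration; the sketch assumes the test statistic is standard normal under H0.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return (lower, upper) critical z-values; None means no rejection region on that side."""
    std_normal = NormalDist()  # standard normal distribution
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tail must be 'two', 'left', or 'right'")

def rejects(z, alpha, tail):
    """True if the observed z-value falls in the rejection region."""
    lower, upper = critical_values(alpha, tail)
    in_lower = lower is not None and z <= lower
    in_upper = upper is not None and z >= upper
    return in_lower or in_upper

z_observed = 1.80  # hypothetical observed test statistic
alpha = 0.05

# Two-tailed critical values are about -1.96 and 1.96, so 1.80 is not rejected.
print("two-tailed rejects H0:  ", rejects(z_observed, alpha, "two"))
# Right-tailed critical value is about 1.645, so 1.80 is rejected.
print("right-tailed rejects H0:", rejects(z_observed, alpha, "right"))
```

Because the tail is an argument fixed before the data are examined, the sketch mirrors the requirement that the direction of the test be decided at the outset.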
PS: Results from Wiki
A two-tailed test is appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products.
The basic logic of hypothesis testing is as follows: Take a random sample from the population. If the sample data are consistent with the null hypothesis, do not reject the null hypothesis; if the sample data are inconsistent with the null hypothesis and supportive of the alternative hypothesis, reject the null hypothesis in favor of the alternative hypothesis. Suppose that a hypothesis test is conducted at a small significance level: If the null hypothesis is rejected, we conclude that the data provide sufficient evidence to support the alternative hypothesis. If the null hypothesis is not rejected, we conclude that the data do not provide sufficient evidence to support the alternative hypothesis. Another way of viewing the use of a small significance level is as follows: The null hypothesis gets the benefit of the doubt; the alternative hypothesis has the burden of proof.
When the null hypothesis is rejected in a hypothesis test performed at the significance level α, we frequently express that fact with the phrase "the test results are statistically significant at the α level." Similarly, when the null hypothesis is not rejected in a hypothesis test performed at the significance level α, we often express that fact with the phrase "the test results are not statistically significant at the α level."
One-Mean z-Test (σ known)
The one-mean z-test is also known as the one-sample z-test and the one-variable z-test. We prefer "one-mean" because it makes clear the parameter being tested. Procedure 9.1 provides a step-by-step method for performing a one-mean z-test. As you can see, Procedure 9.1 includes options for either the critical-value approach or the P-value approach.
Properties and guidelines for use of the one-mean z-test are similar to those for the one-mean z-interval procedure. In particular, the one-mean z-test is robust to moderate violations of the normality assumption but, even for large samples, can sometimes be unduly affected by outliers because the sample mean is not resistant to outliers.
PS: By saying that the hypothesis test is exact, we mean that the true significance level equals α; by saying that it is approximately correct, we mean that the true significance level only approximately equals α.
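A one-mean z-test using the P-value approach can be sketched as follows. This is not a transcription of Procedure 9.1, only a minimal illustration under the usual assumptions (simple random sample, σ known, normal population or large sample). The function name, its parameters, and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_mean_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-mean z-test with known sigma; returns (z, p_value, reject_h0)."""
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    elif tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    # Reject H0 when the P-value does not exceed the significance level.
    return z, p_value, p_value <= alpha

# Hypothetical data: H0: mu = 25 versus Ha: mu > 25, sigma = 4, n = 36.
z, p, reject = one_mean_z_test(sample_mean=26.4, mu0=25.0, sigma=4.0, n=36,
                               alpha=0.05, tail="right")
print(f"z = {z:.2f}, P-value = {p:.4f}, reject H0: {reject}")
```

With these made-up numbers the test statistic is z = 2.10 and the P-value is roughly 0.018, so the result would be statistically significant at the 5% level.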
Type II Error
Hypothesis tests do not always yield correct conclusions; they have built-in margins of error. An important part of planning a study is to consider both types of errors that can be made and their effects. Recall that two types of errors are possible with hypothesis tests. One is a Type I error: rejecting a true null hypothesis. The other is a Type II error: not rejecting a false null hypothesis. Also recall that the probability of making a Type I error is called the significance level of the hypothesis test and is denoted α, and that the probability of making a Type II error is denoted β.
Computing Type II Error Probabilities
The probability of making a Type II error depends on the sample size, the significance level, and the true value of the parameter under consideration.
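To make this dependence concrete, the following minimal sketch computes β for a one-mean z-test. It assumes σ is known and the sample mean is normally distributed; the function name, parameters, and numbers are hypothetical. The idea is to find the nonrejection region for the sample mean under H0, then compute the probability that the sample mean lands in that region when the true mean is mu_true.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_prob(mu0, mu_true, sigma, n, alpha=0.05, tail="two"):
    """Probability of not rejecting H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)                     # standard deviation of the sample mean
    sampling_dist = NormalDist(mu_true, se)  # distribution of x-bar when mu = mu_true
    z = NormalDist().inv_cdf                 # standard normal quantile function
    if tail == "right":
        # Do not reject when x-bar is below the upper critical value.
        return sampling_dist.cdf(mu0 + z(1 - alpha) * se)
    if tail == "left":
        # Do not reject when x-bar is above the lower critical value.
        return 1 - sampling_dist.cdf(mu0 - z(1 - alpha) * se)
    if tail == "two":
        # Do not reject when x-bar lies between the two critical values.
        half_width = z(1 - alpha / 2) * se
        return sampling_dist.cdf(mu0 + half_width) - sampling_dist.cdf(mu0 - half_width)
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Hypothetical setting: H0: mu = 25 versus Ha: mu > 25, sigma = 4, n = 36.
for mu_true in (25.5, 26.0, 27.0):
    beta = type_ii_error_prob(25.0, mu_true, 4.0, 36, alpha=0.05, tail="right")
    print(f"true mean {mu_true}: beta = {beta:.4f}")
```

In this setting β shrinks as the true mean moves farther from μ0; rerunning with a larger n or a larger α also lowers it.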
Power Curve for a One-Mean z-Test
In modern statistical practice, analysts generally use the probability of not making a Type II error, called the power, to appraise the performance of a hypothesis test. Once we know the Type II error probability, β, obtaining the power is simple: we just subtract β from 1. The power of a hypothesis test is between 0 and 1 and measures the ability of the hypothesis test to detect a false null hypothesis. If the power is near 0, the hypothesis test is not very good at detecting a false null hypothesis; if the power is near 1, the hypothesis test is extremely good at detecting a false null hypothesis.
In reality, the true value of the parameter in question will be unknown. Consequently, constructing a table of powers for various values of the parameter consistent with the alternative hypothesis is helpful in evaluating the overall effectiveness of a hypothesis test. Even more helpful is a visual display of the effectiveness of the hypothesis test, obtained by plotting points of power against various values of the parameter and then connecting the points with a smooth curve. The resulting curve is called a power curve. In general, the closer a power is to 1, the better the hypothesis test is at detecting a false null hypothesis. Procedure 9.5 provides a step-by-step method for obtaining a power curve for a one-mean z-test.
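A table of powers of the kind described above can be sketched in a few lines. This is not Procedure 9.5 itself, only an illustration for a two-tailed one-mean z-test with σ known; the function name and all numbers are hypothetical. A crude text bar stands in for the plotted curve.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu_true, mu0, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-mean z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under mu_true, the test statistic is normal with this mean and standard deviation 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Probability of landing in the upper or the lower rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Hypothetical setting: H0: mu = 25, sigma = 4, n = 36, alpha = 0.05.
mu0, sigma, n, alpha = 25.0, 4.0, 36, 0.05
mu_values = [23.0 + 0.5 * k for k in range(9)]  # 23.0, 23.5, ..., 27.0
power_table = [(mu, power_two_tailed(mu, mu0, sigma, n, alpha)) for mu in mu_values]

print("  mu   power")
for mu, pw in power_table:
    bar = "#" * round(pw * 40)  # text stand-in for the height of the power curve
    print(f"{mu:5.1f}  {pw:.4f}  {bar}")
# At mu = mu0 the null hypothesis is true, so the computed value there is
# the significance level alpha rather than a power in the strict sense.
```

Plotting the pairs in power_table and joining them with a smooth curve would give the power curve; for a two-tailed test it is symmetric about μ0 and rises toward 1 on both sides.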
Sample Size and Power
For a fixed significance level, increasing the sample size increases the power. By using a sufficiently large sample size, we can obtain a hypothesis test with as much power as we want. However, in practice, larger sample sizes tend to increase the cost of a study. Consequently, we must balance, among other things, the cost of a large sample against the cost of possible errors. As we have indicated, power is a useful way to evaluate the overall effectiveness of a hypothesis-testing procedure. Additionally, power can be used to compare different procedures. For example, a researcher might decide between two hypothesis-testing procedures on the basis of which test is more powerful for the situation under consideration.
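The effect of sample size on power can be checked numerically. The sketch below assumes a right-tailed one-mean z-test with σ known and a true mean greater than μ0; the function names, the search limit n_max, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of a right-tailed one-mean z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Probability that the test statistic exceeds the critical value.
    return 1 - std_normal.cdf(z_alpha - shift)

def smallest_n_for_power(target, mu_true, mu0, sigma, alpha=0.05, n_max=100_000):
    """Smallest sample size whose power reaches target, or None if n_max is too small."""
    for n in range(1, n_max + 1):
        if power_right_tailed(mu_true, mu0, sigma, n, alpha) >= target:
            return n
    return None

# Hypothetical setting: H0: mu = 25 versus Ha: mu > 25, true mean 26, sigma = 4.
for n in (10, 20, 40, 80, 160):
    print(f"n = {n:3d}: power = {power_right_tailed(26.0, 25.0, 4.0, n):.4f}")

n_needed = smallest_n_for_power(0.80, mu_true=26.0, mu0=25.0, sigma=4.0)
print("smallest n for power of at least 0.80:", n_needed)
```

A linear search is used only for clarity; the same answer follows from solving the power equation for n. The resulting sample size is the quantity to weigh against the cost of the study.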