## Type I and Type II Error in Statistics

We often use inferential statistics to make decisions or judgements about the value of a parameter, such as a population mean. For example, we might need to decide whether the mean weight, 𝜇, of all bags of pretzels packaged by a particular company differs from the advertised weight of 454 grams, or we might want to determine whether the mean age, 𝜇, of all cars in use has increased from the year 2000 mean of 9.0 years. One of the most commonly used methods for making such decisions or judgments is to perform a hypothesis test. A hypothesis is a statement that something is true. For example, the statement “the mean weight of all bags of pretzels packaged differs from the advertised weight of 454 g” is a hypothesis. Typically, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis (or research hypothesis), which we define as follows. For instance, in the pretzel packaging example, the null hypothesis might be “the mean weight of all bags of pretzels packaged equals the advertised weight of 454 g,” and the alternative hypothesis might be “the mean weight of all bags of pretzels packaged differs from the advertised weight of 454 g.”

The first step in setting up a hypothesis test is to decide on the null hypothesis and the alternative hypothesis. The null hypothesis for a hypothesis test concerning a population mean, 𝜇, always specifies a single value for that parameter. Hence, we can express the null hypothesis as

H0: 𝜇 = 𝜇0

The choice of the alternative hypothesis depends on and should reflect the purpose of the hypothesis test. Three choices are possible for the alternative hypothesis.

• If the primary concern is deciding whether a population mean, 𝜇, is different from a specified value 𝜇0, we express the alternative hypothesis as Ha: 𝜇 ≠ 𝜇0. A hypothesis test whose alternative hypothesis has this form is called a two-tailed test.
• If the primary concern is deciding whether a population mean, 𝜇, is less than a specified value 𝜇0, we express the alternative hypothesis as Ha: 𝜇 < 𝜇0. A hypothesis test whose alternative hypothesis has this form is called a left-tailed test.
• If the primary concern is deciding whether a population mean, 𝜇, is greater than a specified value 𝜇0, we express the alternative hypothesis as Ha: 𝜇 > 𝜇0. A hypothesis test whose alternative hypothesis has this form is called a right-tailed test.

PS: A hypothesis test is called a one-tailed test if it is either left tailed or right tailed. After we have chosen the null and alternative hypotheses, we must decide whether to reject the null hypothesis in favor of the alternative hypothesis. The procedure for deciding is roughly as follows: take a random sample from the population; if the sample data are consistent with the null hypothesis, do not reject it, and if the sample data are inconsistent with the null hypothesis, reject it in favor of the alternative hypothesis. In practice, of course, we must have a precise criterion for deciding whether to reject the null hypothesis. That criterion involves a test statistic, that is, a statistic calculated from the sample data that is used as the basis for deciding whether the null hypothesis should be rejected.
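To make the idea of a test statistic concrete, here is a minimal sketch of the standardized statistic used later in this document for the one-mean z-test; the pretzel numbers (sample mean, population standard deviation, sample size) are hypothetical:

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """Standardized test statistic for a one-mean z-test (sigma known):
    z = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Pretzel example with hypothetical numbers:
# H0: mu = 454 g, sample of n = 25 bags, xbar = 450 g, sigma = 7.8 g
z = z_statistic(xbar=450, mu0=454, sigma=7.8, n=25)
# z measures how far xbar falls from mu0 in standard-error units (about -2.56 here)
```

A value of z far from 0 is evidence against the null hypothesis; how far is "far enough" is exactly what the rejection criterion below pins down.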

Type I and Type II Errors

In statistics, a type I error is rejecting the null hypothesis when it is in fact true, whereas a type II error is not rejecting the null hypothesis when it is in fact false. The probabilities of both type I and type II errors are useful (and essential) for evaluating the effectiveness of a hypothesis test, which involves analyzing the chances of making an incorrect decision. A type I error occurs if a true null hypothesis is rejected. The probability of that happening, the type I error probability, commonly called the significance level of the hypothesis test, is denoted 𝛼. A type II error occurs if a false null hypothesis is not rejected. The probability of that happening, the type II error probability, is denoted 𝛽. Ideally, both type I and type II errors should have small probabilities; then the chance of making an incorrect decision would be small, regardless of whether the null hypothesis is true or false. We can design a hypothesis test to have any specified significance level. So, for instance, if not rejecting a true null hypothesis is important, we should specify a small value for 𝛼. However, in making our choice for 𝛼, we must keep Key Fact 9.1 in mind: for a fixed sample size, the smaller we specify 𝛼, the larger the type II error probability, 𝛽, will be. Consequently, we must always assess the risks involved in committing both types of errors and use that assessment as a method for balancing the type I and type II error probabilities.

The significance level, 𝛼, is the probability of making a type I error, that is, of rejecting a true null hypothesis. Therefore, if the hypothesis test is conducted at a small significance level (e.g., 𝛼 = 0.05), the chance of rejecting a true null hypothesis will be small. Thus, if we do reject the null hypothesis, we can be reasonably confident that the null hypothesis is false. In other words, if we do reject the null hypothesis, we conclude that the data provide sufficient evidence to support the alternative hypothesis.

However, we usually do not know the probability, 𝛽, of making a type II error, that is, of not rejecting a false null hypothesis. Consequently, if we do not reject the null hypothesis, we simply reserve judgement about which hypothesis is true. In other words, if we do not reject the null hypothesis, we conclude only that the data do not provide sufficient evidence to support the alternative hypothesis; we do not conclude that the data provide sufficient evidence to support the null hypothesis. In short, there might be a true difference, but the power of the statistical procedure might not be high enough to detect it.
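A small simulation can make the two error probabilities tangible. The sketch below, with all numbers hypothetical, assumes a two-tailed one-mean z-test with σ known: it repeatedly draws samples and estimates 𝛼 when the null hypothesis is true, and the power, 1 − 𝛽, when it is false:

```python
import random
from math import sqrt

random.seed(1)

def one_mean_z_rejects(sample, mu0, sigma, z_crit=1.96):
    """Two-tailed z-test decision at alpha = 0.05: True means 'reject H0'."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z) >= z_crit

def rejection_rate(true_mu, mu0, sigma=10.0, n=30, trials=5000):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        rejections += one_mean_z_rejects(sample, mu0, sigma)
    return rejections / trials

alpha_hat = rejection_rate(true_mu=100, mu0=100)  # H0 true: estimates alpha (~0.05)
power_hat = rejection_rate(true_mu=105, mu0=100)  # H0 false: estimates power = 1 - beta
```

When the true mean equals 𝜇0, the rejection rate hovers near the nominal 0.05; when the true mean differs, the rejection rate is the power, and 1 minus it is 𝛽.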

## Statistical Procedures – Hypothesis Tests for One Population Mean

We often use inferential statistics to make decisions or judgments about the value of a parameter, such as a population mean. One of the most commonly used methods for making such decisions or judgments is to perform a hypothesis test. A hypothesis is a statement that something is true. Typically, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis (or research hypothesis), which we define as follows.

• Null hypothesis: A hypothesis to be tested. We use the symbol H0 to represent the null hypothesis.
• Alternative hypothesis: A hypothesis to be considered as an alternative to the null hypothesis. We use the symbol Ha to represent the alternative hypothesis.
• Hypothesis test: The problem in a hypothesis test is to decide whether the null hypothesis should be rejected in favor of the alternative hypothesis.

The first step in setting up a hypothesis test is to decide on the null hypothesis and the alternative hypothesis. The following are some guidelines for choosing these two hypotheses. Although the guidelines refer specifically to hypothesis tests for one population mean, μ, they apply to any hypothesis test concerning one parameter.

The null hypothesis for a hypothesis test concerning a population mean, μ, always specifies a single value for that parameter. Hence we can express the null hypothesis as H0: μ = μ0, where μ0 is some number. The choice of the alternative hypothesis depends on and should reflect the purpose of the hypothesis test. Three choices are possible for the alternative hypothesis:

• If the primary concern is deciding whether a population mean, μ, is different from a specified value μ0, we express the alternative hypothesis as Ha: μ ≠ μ0. A hypothesis test whose alternative hypothesis has this form is called a two-tailed test.
• If the primary concern is deciding whether a population mean, μ, is less than a specified value μ0, we express the alternative hypothesis as Ha: μ < μ0. A hypothesis test whose alternative hypothesis has this form is called a left-tailed test.
• If the primary concern is deciding whether a population mean, μ, is greater than a specified value μ0, we express the alternative hypothesis as Ha: μ > μ0. A hypothesis test whose alternative hypothesis has this form is called a right-tailed test.

A hypothesis test is called a one-tailed test if it is either left tailed or right tailed. It is not uncommon for a sample mean to fall within the area of acceptance for a two-tailed test but within the area of rejection for a one-tailed test. Therefore, a researcher who wishes to reject the null hypothesis may sometimes find that using a one-tailed rather than a two-tailed test allows a previously nonsignificant result to become significant. For this reason, the choice of a one-tailed test must depend on the nature of the hypothesis being tested and should be decided at the outset of the research, rather than afterward according to how the results turn out. One-tailed tests can be used only when there is a directional alternative hypothesis. This means that they may not be used unless results in only one direction are of interest and the possibility of the results being in the opposite direction is of no interest or consequence to the researcher.
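The danger described above can be checked numerically. The sketch below, using a hypothetical observed test statistic, shows a result that is significant under a right-tailed test at α = 0.05 but not under a two-tailed test:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

z = 1.75                       # hypothetical observed test statistic
p_right = 1 - phi(z)           # right-tailed p-value, about 0.040
p_two = 2 * (1 - phi(abs(z)))  # two-tailed p-value, about 0.080

# At alpha = 0.05 the right-tailed test rejects H0 but the two-tailed test
# does not, which is why the choice of tails must be fixed before seeing the data.
```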

PS: Results from Wikipedia

A two-tailed test is appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products.

The basic logic of hypothesis testing is this: take a random sample from the population. If the sample data are consistent with the null hypothesis, do not reject the null hypothesis; if the sample data are inconsistent with the null hypothesis and supportive of the alternative hypothesis, reject the null hypothesis in favor of the alternative hypothesis. Suppose that a hypothesis test is conducted at a small significance level. If the null hypothesis is rejected, we conclude that the data provide sufficient evidence to support the alternative hypothesis. If the null hypothesis is not rejected, we conclude that the data do not provide sufficient evidence to support the alternative hypothesis. Another way of viewing the use of a small significance level is as follows: the null hypothesis gets the benefit of the doubt; the alternative hypothesis has the burden of proof.

When the null hypothesis is rejected in a hypothesis test performed at the significance level α, we frequently express that fact with the phrase "the test results are statistically significant at the α level." Similarly, when the null hypothesis is not rejected in a hypothesis test performed at the significance level α, we often express that fact with the phrase "the test results are not statistically significant at the α level."

One-Mean z-Test (σ known)

The one-mean z-test is also known as the one-sample z-test and the one-variable z-test. We prefer "one-mean" because it makes clear the parameter being tested. Procedure 9.1 provides a step-by-step method for performing a one-mean z-test. As you can see, Procedure 9.1 includes options for either the critical-value approach or the P-value approach.

Properties and guidelines for use of the one-mean z-test are similar to those for the one-mean z-interval procedure. In particular, the one-mean z-test is robust to moderate violations of the normality assumption but, even for large samples, can sometimes be unduly affected by outliers because the sample mean is not resistant to outliers. PS: By saying that the hypothesis test is exact, we mean that the true significance level equals α; by saying that it is approximately correct, we mean that the true significance level only approximately equals α.
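As a sketch of the P-value approach (the exact steps of Procedure 9.1 are not reproduced here), the function below carries out a one-mean z-test with σ known; the pretzel numbers are hypothetical:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def one_mean_z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-mean z-test (sigma known); returns (z, p_value, reject_H0)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if tail == "two":
        p = 2 * (1 - phi(abs(z)))   # two-tailed p-value
    elif tail == "left":
        p = phi(z)                  # left-tailed p-value
    else:
        p = 1 - phi(z)              # right-tailed p-value
    return z, p, p <= alpha

# Hypothetical pretzel data: H0: mu = 454 vs Ha: mu != 454
z, p, reject = one_mean_z_test(xbar=450, mu0=454, sigma=7.8, n=25)
# z is about -2.56 and p is about 0.010, so H0 is rejected at the 0.05 level
```

The critical-value approach reaches the same decision by comparing z with ±z(α/2) instead of comparing p with α.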

One-Mean z-Test: Type II Error

Hypothesis tests do not always yield correct conclusions; they have built-in margins of error. An important part of planning a study is to consider both types of errors that can be made and their effects. Recall that two types of errors are possible with hypothesis tests. One is a Type I error: rejecting a true null hypothesis. The other is a Type II error: not rejecting a false null hypothesis. Also recall that the probability of making a Type I error is called the significance level of the hypothesis test and is denoted α, and that the probability of making a Type II error is denoted β.

Computing Type II Error Probabilities

The probability of making a Type II error depends on the sample size, the significance level, and the true value of the parameter under consideration.

Power Curve for a One-Mean z-Test

In modern statistical practice, analysts generally use the probability of not making a Type II error, called the power, to appraise the performance of a hypothesis test. Once we know the Type II error probability, β, obtaining the power is simple – we just subtract β from 1. The power of a hypothesis test is between 0 and 1 and measures the ability of the hypothesis test to detect a false null hypothesis. If the power is near 0, the hypothesis test is not very good at detecting a false null hypothesis; if the power is near 1, the hypothesis test is extremely good at detecting a false null hypothesis.
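Under the normality assumption, β can be computed directly from the sampling distribution of the sample mean. The sketch below, using hypothetical pretzel numbers and a two-tailed test at α = 0.05, computes β and the power for one particular true mean:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def beta_two_tailed(mu_true, mu0, sigma, n, z_crit=1.96):
    """Type II error probability of a two-tailed one-mean z-test:
    the chance that xbar lands in the nonrejection region when mu = mu_true."""
    se = sigma / sqrt(n)
    return phi((mu0 + z_crit * se - mu_true) / se) - \
           phi((mu0 - z_crit * se - mu_true) / se)

# Hypothetical: H0: mu = 454, sigma = 7.8, n = 25, but the true mean is 450
beta = beta_two_tailed(mu_true=450, mu0=454, sigma=7.8, n=25)
power = 1 - beta
# power is about 0.73: roughly a 73% chance of detecting this false null hypothesis
```

Repeating this calculation over a grid of true means yields exactly the table of powers, and hence the power curve, described next.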

In reality, the true value of the parameter in question will be unknown. Consequently, constructing a table of powers for various values of the parameter consistent with the alternative hypothesis is helpful in evaluating the overall effectiveness of a hypothesis test. Even more helpful is a visual display of the effectiveness of the hypothesis test, obtained by plotting points of power against various values of the parameter and then connecting the points with a smooth curve. The resulting curve is called a power curve. In general, the closer a power curve is to 1, the better the hypothesis test is at detecting a false null hypothesis. Procedure 9.5 provides a step-by-step method for obtaining a power curve for a one-mean z-test.

Sample Size and Power

For a fixed significance level, increasing the sample size increases the power. By using a sufficiently large sample size, we can obtain a hypothesis test with as much power as we want. However, in practice, larger sample sizes tend to increase the cost of a study. Consequently, we must balance, among other things, the cost of a large sample against the cost of possible errors. As we have indicated, power is a useful way to evaluate the overall effectiveness of a hypothesis-testing procedure. Additionally, power can be used to compare different procedures. For example, a researcher might decide between two hypothesis-testing procedures on the basis of which test is more powerful for the situation under consideration.
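The effect of sample size on power can be tabulated with the same normal-theory formula; all numbers below are hypothetical:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_tailed(mu_true, mu0, sigma, n, z_crit=1.96):
    """Power of a two-tailed one-mean z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)
    return (1 - phi((mu0 + z_crit * se - mu_true) / se)) + \
           phi((mu0 - z_crit * se - mu_true) / se)

# Fixed alpha = 0.05 and a hypothetical true mean of 452 vs mu0 = 454:
powers = {n: round(power_two_tailed(452, 454, 7.8, n), 3)
          for n in (25, 50, 100, 200)}
# Power climbs toward 1 as n grows, at the cost of collecting more data
```

This is the quantitative form of the trade-off in the paragraph above: each step up in n buys more power, so the analyst must weigh that gain against the cost of the larger sample.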