Research Methods - Hypothesis Testing Pt2 - Hypothesis Testing Examples
Bonus Video on Hypothesis Testing
Introduction to the Video
- This video serves as a follow-up to the previous discussion on hypothesis testing, introducing some statistical concepts not yet covered.
- The focus is on understanding how statistics programs calculate P-values and determine whether to reject the null hypothesis.
Recap of Previous Example: MemX Drug
- The example involves testing if a new drug, MemX, increases memory scores compared to the general population. The null hypothesis (H0) states that the mean score of MemX users is equal to or less than that of the general population (mean = 50).
- The alternative hypothesis (H1) posits that MemX users have a higher mean score than the general population.
Statistical Distribution and Sample Means
- A distribution of sample means is constructed from all possible samples of size 36 drawn from the general population under H0. It is centered at 50, and its standard deviation (the standard error) is the population standard deviation of 10 divided by √36, i.e. 10/6 ≈ 1.6667.
- The actual sample mean for MemX users is reported as 55, prompting an analysis of how extreme this value is within the context of our distribution.
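The idea of a distribution of sample means can be checked with a small simulation. This is only a sketch: it assumes the general population is normally distributed (the notes give only its mean and standard deviation), and the seed and the number of simulated samples (5000) are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable


def simulate_sample_means(mu, sigma, n, num_samples):
    """Draw num_samples samples of size n and return the mean of each."""
    return [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(num_samples)
    ]


# General population under H0: mean 50, SD 10; samples of size 36.
sample_means = simulate_sample_means(mu=50, sigma=10, n=36, num_samples=5000)

print(f"mean of sample means: {mean(sample_means):.2f}")
print(f"SD of sample means:   {stdev(sample_means):.2f}")
```

The two printed values should land close to 50 and to 10/√36 ≈ 1.67, which is the standard error used in the notes.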
Calculating Z-scores and P-values
- To assess where a sample mean of 55 falls in a distribution centered at 50, we calculate its Z-score: Z = (55 - 50)/1.6667 = 3, i.e. three standard errors above the mean. This indicates how unusual the result would be if H0 were true.
- Consulting a Z-table reveals that only about 0.13% of samples would yield a mean score as extreme as or more extreme than 55 when H0 holds true; thus, this low probability suggests rejecting H0 in favor of H1.
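The Z-score and tail probability for the MemX example can be reproduced in a few lines. This is a minimal sketch using Python's standard-library `NormalDist` in place of a Z-table; the function name `z_for_sample_mean` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(sample_mean, mu0, sigma, n):
    """Z-score of a sample mean under H0, given a known population SD."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# MemX example: H0 mean 50, population SD 10, n = 36, observed mean 55.
z_memx = z_for_sample_mean(55, 50, 10, 36)

# Right-tailed test: probability of a sample mean at least this high under H0.
p_memx = 1 - NormalDist().cdf(z_memx)

print(f"z = {z_memx:.2f}, p = {p_memx:.4f}")  # z = 3.00, p = 0.0013
```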
Conclusion from MemX Example
- Given the extremely low P-value associated with obtaining such an extreme sample mean under H0, it seems likely that MemX does indeed improve memory scores compared to the general population. Thus, we reject H0 and accept H1 tentatively until further studies are conducted for validation.
New Example: Game of Thrones Character Heights
- A hypothetical scenario introduces analyzing character heights from "Game of Thrones," where it's assumed that average height across characters is approximately 66 inches.
- Viewers are encouraged to formulate their own null (H0) and alternative hypotheses (H1), speculating whether Lannister characters might be shorter than average based on observed data from specific episodes.
Hypothesis Testing in Statistics
Setting Up Hypotheses
- The alternative hypothesis (H1) posits that the Lannister population mean height is below 66 inches, while the null hypothesis (H0) states it is greater than or equal to 66 inches.
- An alternative representation of hypotheses includes H1 as mu of Lannisters being less than mu of the entire cast, with H0 being that Lannisters are not shorter on average.
Conducting a Statistical Test
- To validate these hypotheses, a statistical test can be performed using sample data. A statistics program can facilitate this process.
- Assuming the heights of the entire cast follow a normal distribution with a mean of 66 inches and standard deviation of 8 inches helps establish parameters for testing.
Sample Mean Calculation
- A sample size of four Lannisters is taken to calculate an average height, which might yield a sample mean (M) of 56 inches.
- The question posed is about the proportion of size-four samples from the entire cast that would yield a mean height less than or equal to 56 inches under the null hypothesis.
Distribution and Standard Error
- If H0 holds true, all possible size-four sample means form a normal curve centered at 66 inches. The standard deviation of this distribution, the standard error, is 8/√4 = 4 inches.
- The standard error of the mean (SEM) describes how far a size-four sample's mean typically falls from 66 inches, which makes it the yardstick for judging the observed sample mean.
Z-Scores and Conclusion
- The sample mean of 56 inches is 10 inches below 66; in units of SEM, that is -10/4 = -2.5. This z-score indicates how far the sample mean lies from the value expected under H0.
- Consulting a z-table shows that a z-score of -2.5 or lower occurs in only about 0.62% of samples if H0 is true, so a result like this is very unlikely to arise by chance alone, which is strong evidence against H0.
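The same answer can be obtained without converting to a z-score first, by modelling the distribution of size-four sample means directly. This is a sketch using the standard-library `NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

cast_mean, cast_sd, n = 66, 8, 4
observed_mean = 56

# Distribution of sample means under H0: same centre as the cast,
# with the SD shrunk by sqrt(n) to give the standard error (8 / 2 = 4).
sampling_dist = NormalDist(mu=cast_mean, sigma=cast_sd / sqrt(n))

# z-score of the observed mean, in standard-error units.
z_lannister = (observed_mean - sampling_dist.mean) / sampling_dist.stdev

# Left-tailed test: proportion of size-four samples with a mean <= 56.
p_lannister = sampling_dist.cdf(observed_mean)

print(f"z = {z_lannister:.2f}, p = {p_lannister:.4f}")  # z = -2.50, p = 0.0062
```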
Quality Control in Food Production
Monitoring Package Weights
- As head of quality control for food production, monitoring cherry tomato package weights ensures compliance with customer expectations regarding product weight.
- Each package should ideally weigh an average of 227 grams; deviations could lead to customer complaints or increased shipping costs due to overfilled packages.
Formulating New Hypotheses
- The null hypothesis here states that packages average exactly 227 grams, meaning no difference from what was promised. The test then asks whether the observed weights depart from this enough to warrant recalibrating the machine.
Hypothesis Testing and Z-Scores in Statistical Analysis
Understanding Null and Alternative Hypotheses
- The null hypothesis (H0) posits that the machine is functioning correctly, producing packages with an average weight of 227 grams. The alternative hypothesis (H1) suggests that the average weight is not 227 grams, indicating a need for recalibration.
- The alternative hypothesis is represented by a "not equals" sign, while the null hypothesis uses an equals sign. This pairing is essential for proper statistical testing.
Sample Size and Mean Calculation
- A sample size of 36 packages (n = 36) is chosen to test the hypotheses. The sample mean might be calculated as 225.5 grams, prompting the need for a statistical test to determine if this indicates a malfunctioning machine.
Setting Alpha Level and Distribution
- An alpha level of 0.05 (5%) is selected as the cutoff for significance in this analysis.
- The distribution of sample means under the null hypothesis must be established to understand what results would look like if the machine were operating correctly.
Critical Regions and Z-Scores
- Using a z-table, the critical z-scores are determined: for a two-tailed test at alpha = 0.05, ±1.96 cuts off 2.5% in each tail of the normal distribution (5% in total).
- If our sample mean falls outside these critical values, we can reject the null hypothesis, suggesting that recalibration may be necessary.
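Instead of a z-table, the critical values can be looked up with the inverse CDF of the standard normal distribution. The sketch below assumes a two-tailed test; the helper names `two_tailed_critical_z` and `in_rejection_region` are hypothetical.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """Return (lower, upper) critical z-scores, with alpha/2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper


def in_rejection_region(z, alpha):
    """True if z falls in either tail beyond the critical values."""
    lower, upper = two_tailed_critical_z(alpha)
    return z <= lower or z >= upper


lower_crit, upper_crit = two_tailed_critical_z(0.05)
print(f"{lower_crit:.2f}, {upper_crit:.2f}")  # -1.96, 1.96
```

A sample z-score is then passed to `in_rejection_region` together with the chosen alpha to decide whether H0 is rejected.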
Calculating Z-Scores from Sample Data
- To find out how far our sample mean (225.5 grams) deviates from the hypothesized mean (227 grams), we calculate its z-score based on standard error.
- With a known standard deviation of 3 grams divided by √36 (the square root of our sample size), we find that our standard error is 0.5 grams.
Interpreting Results: Rejection or Acceptance?
- The calculated z-score shows how many standard errors the sample mean lies from the hypothesized mean: z = (225.5 - 227)/0.5 = -3, i.e. three standard errors below.
- Since -3 lies beyond the critical value of -1.96, it falls in the rejection region, so we reject H0 and conclude that recalibration is needed.
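The arithmetic for the cherry tomato example is sketched below. As an extra step not in the notes, it also converts the ±1.96 cutoffs back into grams, which shows the range of sample means that would not lead to rejecting H0.

```python
from math import sqrt

mu0, sigma, n = 227, 3, 36
sample_mean = 225.5
z_crit = 1.96  # two-tailed critical value for alpha = 0.05

standard_error = sigma / sqrt(n)          # 3 / 6 = 0.5 grams
z = (sample_mean - mu0) / standard_error  # -1.5 / 0.5 = -3.0

# The same cutoffs expressed in grams rather than in z units.
lower_cutoff = mu0 - z_crit * standard_error
upper_cutoff = mu0 + z_crit * standard_error

reject_h0 = abs(z) >= z_crit

print(f"z = {z:.2f}, reject H0: {reject_h0}")  # z = -3.00, reject H0: True
print(f"non-rejection range: {lower_cutoff:.2f} g to {upper_cutoff:.2f} g")
# non-rejection range: 226.02 g to 227.98 g
```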
Summary of Findings
- By assuming H0 was true and then finding such an unlikely result (well under 5% probability), we have strong evidence that something has gone wrong with package production.
- Because the test is two-tailed, alpha is split between the two tails, with cutoffs at z = ±1.96; the observed z-score is judged against those cutoffs to see how extreme it is under H0.
Understanding Hypothesis Testing and P-Values
Rejection of the Null Hypothesis
- The sample mean lies deep in the rejection region, giving strong grounds to reject the null hypothesis that the machine is functioning correctly. With alpha = 0.05 split as 2.5% in each tail, the area beyond our sample mean's z-score is far smaller, at less than 1%.
P-Value Interpretation
- The tail area beyond z = -3 is about 0.0013 (0.13%, not 13%). Because this test is two-tailed, the p-value is twice that, about 0.0027: if the null hypothesis were true, a sample mean this far from 227 grams in either direction would occur only about 0.27% of the time. This low probability supports rejecting the null hypothesis.
One-Tailed vs Two-Tailed Tests
- If there were a suspicion that packages weigh less than 227 grams (a directional, one-tailed hypothesis), the p-value for this left-tailed test is the left-tail area alone, about 0.0013 (0.13%). That is also far below alpha = 0.05, so the result is statistically significant under either hypothesis.
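For a sample z-score of -3, the left-tail area and the two-tailed p-value can be computed directly rather than read from a table. A minimal sketch with the standard-library `NormalDist`:

```python
from statistics import NormalDist

z = -3.0  # sample z-score from the package-weight example
std_normal = NormalDist()

# One-tailed (left): area at or below z.
left_tail_p = std_normal.cdf(z)

# Two-tailed: area at least |z| away from zero in either direction.
two_tailed_p = 2 * std_normal.cdf(-abs(z))

print(f"left-tailed p = {left_tail_p:.4f}")  # left-tailed p = 0.0013
print(f"two-tailed p = {two_tailed_p:.4f}")  # two-tailed p = 0.0027
```

Both values are far below an alpha of 0.05, so H0 is rejected whether the test is set up as one-tailed or two-tailed.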
Steps for Conducting One-Tailed Tests
- For one-tailed tests, after stating the hypotheses and choosing an alpha level (commonly 0.05), the entire alpha goes into one tail (e.g., the left). The critical z-score for that tail (about -1.64 for the left tail at alpha = 0.05) marks the start of the rejection region, and the decision depends on whether the sample z-score falls inside it.
Calculating Z-Scores and Making Decisions
- To determine whether to reject or fail to reject the null hypothesis, calculate the sample's z-score using the standard error of the mean (SEM). For a left-tailed test, reject the null hypothesis if the z-score is at or below the critical value (for example, z ≤ -1.64). Otherwise, fail to reject it: no significant effect was detected, but that does not show the null hypothesis is true.
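The one-tailed steps can be collected into a single function. This is a sketch for the left-tailed case only, assuming a known population standard deviation; the function name `left_tailed_z_test` is made up for illustration, and the second call uses a hypothetical sample mean of 226.5 grams to show a non-significant result.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Run a left-tailed z-test; return (z, critical z, p-value, decision)."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # distance in SEM units
    z_crit = std_normal.inv_cdf(alpha)           # all of alpha in the left tail
    p_value = std_normal.cdf(z)                  # area at or below z
    decision = "reject H0" if z <= z_crit else "fail to reject H0"
    return z, z_crit, p_value, decision


# Package-weight example from the notes: sample mean 225.5 g.
z1, crit1, p1, decision1 = left_tailed_z_test(225.5, 227, 3, 36)
print(f"z = {z1:.2f}: {decision1}")  # z = -3.00: reject H0

# Hypothetical sample mean of 226.5 g (not from the notes).
z2, crit2, p2, decision2 = left_tailed_z_test(226.5, 227, 3, 36)
print(f"z = {z2:.2f}: {decision2}")  # z = -1.00: fail to reject H0
```

Note that the second outcome is reported as "fail to reject H0" rather than "accept H0", matching the wording in the notes.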