Using R for Hypothesis Testing
This tutorial demonstrates step by step how to use R and Jupyter Notebook to conduct a two-tailed (two-sided) hypothesis test. The approach begins with formulating the null and alternative hypotheses and ends by stating what the results of the test mean in plain English.
What is hypothesis testing?
Hypothesis testing is one of the cornerstones of inferential statistics. It is generally used to test whether some phenomenon observed in a sample is likely the result of random chance or is "real", that is, statistically significant.
What are the steps of hypothesis testing?
Hypothesis testing in statistics is usually challenging at first, but a step-by-step approach can help keep everything straight. We always start with a hypothesis we would like to find support for, e.g. human activity is contributing to global warming. This is called the alternative hypothesis. Its opposite, humans are not contributing to global warming, is the null hypothesis. To support the alternative hypothesis we collect data, and if the sample data deviates from what the null hypothesis predicts by more than we can attribute to normal variation, we reject the null in favor of the alternative. Strictly speaking, this does not prove the alternative; it shows that the observed data would be unlikely if the null were true. The global warming question is devilishly hard to test with a simple textbook procedure, but it does serve as a nice vignette.
When presented with the sorts of questions found in introductory statistics, you can use the following steps:
- State the null and alternative hypotheses - the null is the status quo, and it takes sufficient evidence to reject it. It is often easier to start with the alternative hypothesis and then present its opposite as the null.
- Choose the level of significance at which you would reject the null - at what point would you be reasonably sure that the difference between the sample mean and the status quo mean is not the result of random chance? The level of significance is usually 0.05, but not always.
- Choose a sample size to test your hypothesis. The textbook problem usually does this for you, but you will need to use the sample size to calculate your test statistic.
- Determine the appropriate statistical technique to use, e.g. a Z-test or a t-test.
- Determine the critical value(s) that separate the rejection region from the non-rejection region. You will always need to either reject or fail to reject the null hypothesis after you calculate your test statistic. The critical values are the line or lines in the sand that allow you to do that.
- Collect data and calculate the test statistic. Again, this step is usually done for you in textbook problems.
- Compare the test statistic to the critical value(s). Is the test statistic beyond your line in the sand?
- State the statistical decision - if the test statistic is beyond a boundary established by a critical value, then you reject the null hypothesis, otherwise you do not reject the null.
- State your decision in terms of the original question, i.e. the results either do or do not support the alternative hypothesis.
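The steps above can be sketched in a few lines of R. This is only an illustrative example, not the problem worked in the video: every number below (the null mean, the known standard deviation, the sample size, and the sample mean) is a made-up, hypothetical value, and the sketch assumes a two-tailed Z-test is the appropriate technique.

```r
# Hypothetical textbook-style numbers (assumptions, not from the tutorial)
mu0   <- 100   # mean claimed by the null hypothesis (status quo)
sigma <- 15    # population standard deviation, assumed known
n     <- 36    # sample size
xbar  <- 106   # observed sample mean
alpha <- 0.05  # level of significance

# State the hypotheses: H0: mu = 100, H1: mu != 100 (two-tailed)
# Technique: sigma is assumed known, so a Z-test is used

# Critical values that bound the rejection regions in both tails
z_crit <- qnorm(1 - alpha / 2)  # about 1.96, so the lines are at -1.96 and +1.96

# Test statistic: distance of the sample mean from mu0 in standard errors
z_stat <- (xbar - mu0) / (sigma / sqrt(n))  # (106 - 100) / (15 / 6) = 2.4

# Statistical decision: is the test statistic beyond a critical value?
reject_null <- abs(z_stat) > z_crit

# Decision in terms of the original question
if (reject_null) {
  cat("Reject the null: the data support a mean different from", mu0, "\n")
} else {
  cat("Do not reject the null: the data do not support a mean different from", mu0, "\n")
}
```

With these made-up numbers the test statistic of 2.4 falls beyond the upper critical value of roughly 1.96, so the null is rejected at the 0.05 level. If the population standard deviation were not known, a t-test with `qt()` in place of `qnorm()` would be the usual substitute.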
The video tutorial presented here is an example of a so-called two-tailed hypothesis test, and walks through a problem using these steps.
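To make "two-tailed" concrete, here is a minimal sketch in R. It is not taken from the video or its notebook: the test statistic and the sample values are hypothetical, chosen only for illustration. A two-tailed test counts extreme results in both directions, so the p-value adds the area in both tails of the distribution.

```r
# Hypothetical Z test statistic (an assumption for illustration)
z_stat <- 2.4

# Area in one tail beyond the observed statistic
p_one_tail <- pnorm(abs(z_stat), lower.tail = FALSE)

# A two-tailed test doubles it to cover both directions
p_two_tailed <- 2 * p_one_tail  # about 0.016

# Base R's t.test() runs a two-tailed test by default;
# alternative = "two.sided" is spelled out here for clarity.
# The sample values are made up.
sample_values <- c(104, 98, 110, 107, 101, 112, 99, 108)
result <- t.test(sample_values, mu = 100, alternative = "two.sided")
print(result$p.value)
```

A p-value below the chosen level of significance leads to the same decision as a test statistic that falls beyond a critical value.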
You can download the Jupyter Notebook used in the video here