Statistical Significance Summary

This post gives the most important points to understand about statistical significance.  If you want to see the rest of my content for statistics, please go to this table of contents.

What Is Hypothesis Testing – In 3 Sentences

Hypothesis testing is a way of determining if some measured effect is a real difference, or likely just statistical noise.  The baseline belief is always that any difference in measurements is just statistical randomness.  We use the hypothesis testing equations to demonstrate that any measured differences are large enough that they are very unlikely to be merely random variations.

 

Hypothesis Testing – As An Image

Hypothesis testing is essentially placing an error band (the bell curve below) around a point that you measured (the orange dot), using a modified version of the normal curve.  You then determine where another point would fall on that chart, and see how much area lies under the modified normal curve up to that location.

basic hypothesis testing

The width of the curve can change as you get more data

hypothesis testing with a narrower width

And sometimes you have error bands around both points

two sample t test error bands

When Is Hypothesis Testing Used?

Hypothesis testing is used in scientific studies of all kinds to determine if an effect exists.  An effect that passes the test is described as having “Statistical Significance”.   It is used, for instance, to show the difference between a real medicine and a placebo.  It is also used in A/B tests for advertising to determine which ads are most effective.

Hypothesis Testing In More Detail

  • Hypothesis testing always has two sets of measurements. (i.e. measure 10 samples from over here, and 15 samples from over there) Each of those two sets of measurements will have some average value.  So there are always two averages.
  • Those two averages will always have some difference between them.
    • Sometimes that difference is very large, i.e. if I measure the maximum weight lifting ability of a group of people before they start training vs. after they spend a year training
    • Sometimes the difference is very small. Sometimes the difference can be so small it is within the precision limits of the data and shows up as zero.
  • Hypothesis testing is determining if
    • There is some systematic cause which results in the observed difference between the two averages or
    • If it is likely that the observed difference is solely due to statistical noise, i.e. the typical fluctuations in results you get when you take measurements. This is the “Null Hypothesis”
      • Example – I have a coin that I know is a fair coin and will thus come up heads 50% of the time. I flip it 100 times and get 53 heads.  The difference between 53 heads and the expected 50 heads is small enough that it is probably statistical noise rather than “Someone gave me a weighted coin”.  Hypothesis testing is a way of putting concrete numbers to the statement “Probably statistical noise”.
    • Our default assumption is the “Null Hypothesis”. We assume that any difference in the average results is merely statistical noise until we show that to be more unlikely than a certain threshold.
      • We get to decide what we want that threshold to be. A typical value is that there must be less than a 5% chance of seeing a difference this large from random noise alone before we conclude that the results reflect a systematic difference.  (Less than 1% chance, and less than 0.1% chance are also common thresholds used)
    • Note, even if we determine that there is a systematic difference, our calculations won’t tell us what is causing the difference in the average value between the two sets of data, just that there is at least one systematic difference
      • i.e. if we are confident that certain groups of people are stronger after a year spent lifting weights, that doesn’t tell us if the training actually caused the difference. It could have been something different like secret steroid usage.
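The coin example above can be turned into concrete numbers.  This is a minimal sketch that uses an exact binomial calculation (not one of the Z or T tests discussed in this post) and only the Python standard library.  The function name is made up for this illustration.

```python
from math import comb

def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Chance that a fair coin lands at least as far from flips/2 as `heads` did."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every sequence of flips whose head count is at least as extreme as ours.
    extreme_outcomes = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - expected) >= observed_gap
    )
    # Each of the 2**flips sequences is equally likely for a fair coin.
    return extreme_outcomes / 2 ** flips

p_value = two_sided_binomial_p_value(heads=53, flips=100)
threshold = 0.05  # the 5% threshold mentioned above
print(f"p-value = {p_value:.3f}")
print("systematic difference" if p_value < threshold else "consistent with statistical noise")
```

For 53 heads the result is far above the 5% threshold, so we keep the Null Hypothesis.  Trying something like 65 heads pushes the value below the threshold.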

 

 Hypothesis Testing Equations

  • There are 5 different types of hypothesis tests, each with their own equations. However, don’t get hung up on that yet; there are only small differences between all 5 of the tests.
  • All hypothesis tests compare two sets of data. Call them the baseline set and the test set.
  • Each of those two sets of data has 3 attributes, for a total of 6 attributes.
    • The first attribute is the average of that set
    • The second is the standard deviation of that set
    • The third is the number of measurements in the data set
  • There are 5 different types of hypothesis tests (1 Z-test, and 4 T-Tests) and the only reason there is more than 1 type of hypothesis test is that they all make different assumptions about the 6 attributes. It isn’t important to know all the different assumptions yet, but here are some examples
    • One of the tests assumes that you don’t know anything about any of the 6 attributes other than what you measured
    • One of the tests assumes that the only thing you know is that both sets have the same standard deviation
    • Another test assumes that you have infinite measurements of the baseline set. For instance, you know the average height of people in a certain state with certainty because you looked it up in the government census results
    • Other tests have different assumptions. It isn’t important to know any of these yet other than to know that the equations are doing the same thing with different assumptions
  • In most cases, all 5 different types of hypothesis tests will give a similar answer. This is a good thing, and it means that if you understand how any of them work, you basically understand all of them
  • This free PDF cheat sheet has the equations for the 5 different types of hypothesis tests, as well as an example of when you would use each one.
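As a rough illustration of the 6 attributes described above, here is a short sketch that computes the average, standard deviation, and number of measurements for a baseline set and a test set.  The data values and the `summarize` helper are hypothetical, made up for this example.

```python
from statistics import mean, stdev

def summarize(measurements):
    """Return the three attributes of one data set: average, standard deviation, count."""
    return {
        "average": mean(measurements),
        "standard_deviation": stdev(measurements),  # sample standard deviation (divides by n - 1)
        "count": len(measurements),
    }

# Hypothetical measurements, made up for this illustration.
baseline_set = [20.1, 19.8, 21.0, 20.4, 19.6, 20.7]
test_set = [21.3, 20.9, 22.1, 21.6, 20.8, 21.9, 21.2]

baseline = summarize(baseline_set)
test = summarize(test_set)
print("baseline:", baseline)
print("test:    ", test)
print("difference in averages:", test["average"] - baseline["average"])
```

Every one of the 5 tests starts from some subset of these six numbers.  What differs is which ones you measure and which ones you assume or look up.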

 

Hypothesis Testing And The Normal Curve

  • It is pretty important to have some knowledge of the normal curve (i.e. the bell curve), as well as a general understanding of what a standard deviation is.   See this blog post for an overview.   (blog post TBD)
  • Remember that we have a baseline set of data, and a test set of data. Each of those sets has an average, a standard deviation, and a number of measurements.
  • What hypothesis testing is all about is “How well do we know the average values of the populations that we took our measurements from?”
    • i.e. even though we have an average value of our measurements, that is just the average of the samples we took, not the full population
    • There will always be some difference between our sample average and the true population average
    • Our average has a range of uncertainty, and we can use the normal curve and standard deviation to quantify that uncertainty
  • This blog post shows how to quantify the uncertainty you have in your average value. The examples given are dice outcomes that you are already familiar with.  (i.e. that the most likely roll using 2 dice is a 7) http://www.fairlynerdy.com/intuitive-guide-statistical-significance/
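One common way to quantify the uncertainty in an average is the standard error, which is the standard deviation divided by the square root of the number of measurements.  The sketch below assumes the normal curve is a good enough approximation, which is only reasonable for larger samples (with few measurements a T distribution gives a wider range).  The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def average_with_uncertainty(measurements, confidence=0.95):
    """Return the sample average, its standard error, and an approximate interval."""
    n = len(measurements)
    average = mean(measurements)
    standard_error = stdev(measurements) / sqrt(n)  # shrinks as n grows
    # Normal-curve multiplier (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return average, standard_error, (average - z * standard_error, average + z * standard_error)

# Hypothetical measurements, made up for this illustration.
sample = [5.1, 4.9, 5.6, 5.0, 4.7, 5.3, 5.2, 4.8]
average, standard_error, (low, high) = average_with_uncertainty(sample)
print(f"average = {average:.3f}, standard error = {standard_error:.3f}")
print(f"approximate 95% range for the population average: {low:.3f} to {high:.3f}")
```

Collecting more measurements with the same spread makes the standard error smaller, which is the narrowing of the curve shown in the images near the top of this post.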

 

T-Test Degrees Of Freedom

  • Compared to the Z-test, T-tests have an additional equation, where you calculate the degrees of freedom.
  • Degrees of freedom is just a way of determining how many overall measurements you have in your data set, which is used to determine how accurate your calculated standard deviation likely is.
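For reference, here is a sketch of the textbook degrees of freedom formulas for the T-tests.  The function names are made up for this illustration, and the cheat sheet may write the equations in a slightly different form.

```python
def df_one_sample(n):
    """1 sample or paired t-test: one degree of freedom is used up by the average."""
    return n - 1

def df_pooled(n1, n2):
    """2 sample t-test assuming equal variance: one is used up by each average."""
    return n1 + n2 - 2

def df_welch(s1, n1, s2, n2):
    """2 sample t-test with unequal variance (Welch-Satterthwaite approximation)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical sample sizes and standard deviations.
print(df_one_sample(10))
print(df_pooled(10, 15))
print(df_welch(2.0, 10, 3.5, 15))
```

The unequal variance result is usually not a whole number, and it falls between the smaller sample's `n - 1` and the pooled value `n1 + n2 - 2`.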

 

Hypothesis Testing Examples

 

 

 Level 2

That’s it for the first block of information.  If you work through a couple of examples and understand most of those points you will have a good grasp of hypothesis testing.  I recommend coming back and learning this second section in a few days or a week.

 

T Distribution vs Z Distribution

What is the difference between a T-Distribution and a Z distribution?

  • The point of having any distribution at all is that there is a range of values that your average could be, i.e. having a normal distribution accounts for the fact that you don’t know your average exactly.  But you also don’t know the standard deviation of your data exactly.
    • To be more precise, you know the standard deviation of your measured data. However, you don’t know the exact standard deviation of the population it was drawn from
  • A Z distribution uses a normal curve and ignores any uncertainty in your standard deviation. This is because it assumes you have enough data that your standard deviation is quite accurate.
  • A T distribution takes into account the fact that you have a range of error in your standard deviation. It does this by changing the shape of the distribution
  • It bases the shape of the curve on the degrees of freedom, which is a way of calculating how many measurements you have.
  • With a T-distribution, instead of the typical normal curve, you get a curve with fatter tails
    • This applet lets you play with the shape of a T distribution vs a Z distribution assuming different number of samples http://rpsychologist.com/d3/tdist/
      • To use it, slide the slider above the charts to the right or the left to change the degrees of freedom assumed in the T-Distribution
      • This will change the shape of the T-distribution
      • You will see that with a low number of degrees of freedom, the T distribution has much fatter tails than the normal distribution. However, as the number of degrees of freedom (i.e. the number of measurements, i.e. the confidence you have in your measured standard deviation) increases, the T distribution becomes nearly identical to the standard normal distribution
    • Once you get around 30 data points or so, the difference between the T Distribution and the Z distribution mostly goes away, which is why 30 degrees of freedom is a common rule of thumb on when to switch to a Z-test instead of a T-test. (Note, there are other important considerations as well, such as whether you have a baseline measurement or are measuring both the baseline and the test sets of data)
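If you would rather see the fatter tails as numbers than play with the applet, the sketch below compares the area beyond 2 standard deviations for a T distribution and for the normal curve.  It uses the standard formula for the T density and a simple trapezoid rule for the area, with only the Python standard library.  The function names and the choice of a cutoff of 2 are just for illustration.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_tail(cutoff, df, steps=4000):
    """Area of the t distribution farther than `cutoff` from zero (trapezoid rule)."""
    width = 2 * cutoff / steps
    # Integrate the density between -cutoff and +cutoff, then take what is left over.
    inside = sum(t_pdf(-cutoff + i * width, df) for i in range(1, steps))
    inside += 0.5 * (t_pdf(-cutoff, df) + t_pdf(cutoff, df))
    return 1 - inside * width

cutoff = 2.0
normal_tail = 2 * (1 - NormalDist().cdf(cutoff))
for df in (2, 5, 10, 30, 100):
    print(f"df={df:>3}: t tail = {t_two_sided_tail(cutoff, df):.4f}, normal tail = {normal_tail:.4f}")
```

The tail area is much larger than the normal value at 2 degrees of freedom, and the gap shrinks steadily as the degrees of freedom increase.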

 

When To Use Each Test

This block summarizes when you would use any given test.  As we go down this list we know less and less information and have to rely on what we measure.   I.e. instead of looking up from the government census what the average age of a region is  (i.e. knowing it), we go ask 100 people what their age is (measure it)

 

Z Test

  • You know baseline average
  • You measure sample average
  • You know baseline standard deviation
  • You assume sample standard deviation is the same as the baseline standard deviation

1 Sample t-test

  • You know baseline average
  • You measure sample average
  • You don’t care about baseline standard deviation  (because we have so many baseline samples that it doesn’t matter)
  • You measure sample standard deviation

2 Sample t-test – equal variance

  • You measure baseline average
  • You measure sample average
  • You measure both the sample standard deviation and the baseline standard deviation, but assume that they are the same as each other, so you pool them together and do the calculations as a group

2 Sample t-test – unequal variance

  • You measure baseline average
  • You measure sample average
  • You measure baseline standard deviation
  • You separately measure sample standard deviation

The last hypothesis test is slightly different because the previous ones all assumed that what you were measuring was different groups.  The paired t-test assumes that each data point in one set is tied to a data point in the other set.  I.e. each pair of data points measures the same person before and after

Paired T-test

  • The average value is the average of the difference between the before and after data
  • The standard deviation value is the standard deviation of the difference in before and after values

 

Truthfully, in many cases, you aren’t going to get very much difference no matter which of these equations you use.   Some of the equations pretty much reduce into the other equations as you get more and more information, i.e. if you measure at least 30-50 data points the difference between the 1 sample T-test and the Z test is pretty small.
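To make the list above concrete, here is a sketch of the test statistic for each of the 5 tests, written with the standard textbook formulas and only the Python standard library.  The function names and the data are hypothetical, and turning a statistic into a probability still requires looking it up against the Z or T distribution.

```python
from math import sqrt
from statistics import mean, stdev

def z_statistic(sample, baseline_average, baseline_sd):
    """Z test: baseline average and standard deviation are known."""
    return (mean(sample) - baseline_average) / (baseline_sd / sqrt(len(sample)))

def one_sample_t(sample, baseline_average):
    """1 sample t-test: baseline average is known, sample standard deviation is measured."""
    return (mean(sample) - baseline_average) / (stdev(sample) / sqrt(len(sample)))

def pooled_t(baseline, sample):
    """2 sample t-test, equal variance: both standard deviations are pooled into one."""
    n1, n2 = len(baseline), len(sample)
    pooled_var = ((n1 - 1) * stdev(baseline) ** 2 + (n2 - 1) * stdev(sample) ** 2) / (n1 + n2 - 2)
    return (mean(sample) - mean(baseline)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

def welch_t(baseline, sample):
    """2 sample t-test, unequal variance: each standard deviation is kept separate."""
    se = sqrt(stdev(baseline) ** 2 / len(baseline) + stdev(sample) ** 2 / len(sample))
    return (mean(sample) - mean(baseline)) / se

def paired_t(before, after):
    """Paired t-test: a 1 sample t-test on the before/after differences against zero."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0.0)

# Hypothetical before/after measurements, made up for this illustration.
before = [100, 110, 95, 120, 105, 98]
after = [108, 115, 101, 131, 109, 104]
print("pooled t:", pooled_t(before, after))
print("welch t: ", welch_t(before, after))
print("paired t:", paired_t(before, after))
print("1 sample t vs 100:", one_sample_t(after, 100))
print("z vs 100, sd 10:  ", z_statistic(after, 100, 10))
```

With equal sample sizes the two 2 sample statistics come out the same.  The paired statistic is noticeably larger for this data, because the person-to-person variation cancels out when you look only at the differences.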

 

 

Level 3 – Morphing Equations Into Each Other

It turns out that many of the equations for the 5 different hypothesis tests are just simplifications of each other.  They can be morphed into each other as you make assumptions, such as that the number of measurements goes to infinity for one of the datasets.

  • Blog post on this to be done.
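Until that post exists, here is one small numeric sketch of the idea, under my own choice of example.  It shows the denominator of the unequal variance 2 sample t statistic approaching the denominator of the 1 sample t statistic as the number of baseline measurements grows.  The standard deviations and sample size are hypothetical.

```python
from math import sqrt

def welch_standard_error(s_baseline, n_baseline, s_sample, n_sample):
    """Denominator of the unequal variance 2 sample t statistic."""
    return sqrt(s_baseline ** 2 / n_baseline + s_sample ** 2 / n_sample)

def one_sample_standard_error(s_sample, n_sample):
    """Denominator of the 1 sample t statistic."""
    return s_sample / sqrt(n_sample)

# Hypothetical standard deviations and sample size, chosen only for illustration.
s_baseline, s_sample, n_sample = 4.0, 5.0, 25
target = one_sample_standard_error(s_sample, n_sample)
for n_baseline in (10, 100, 10_000, 1_000_000):
    two_sample = welch_standard_error(s_baseline, n_baseline, s_sample, n_sample)
    print(f"n_baseline={n_baseline:>9}: two-sample SE = {two_sample:.5f}, one-sample SE = {target:.5f}")
```

The baseline term is divided by an ever larger number, so it fades away and only the sample term is left.  This matches the idea that the 1 sample test treats the baseline average as known exactly.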
