Changes

1,985 bytes removed ,  08:40, 28 August 2012
no edit summary
Line 1: Line 1: −
           
  −
'''Statistics'''
  −
  −
  −
<br>
  −
<br>
  −
  −
  −
<br>
     −
   
= Introduction =
 
= Introduction =
 
   
 
   
Line 70: Line 60:  
=== Descriptive and Inferential Statistics ===
 
=== Descriptive and Inferential Statistics ===
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 175: Line 165:  
= Mind Map =
 
= Mind Map =
 
   
 
   
<br>
+
 
    
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m14464871.jpg]]<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m14464871.jpg]]
    
   
 
   
Line 231: Line 221:     
    
 
    
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 270: Line 260:     
   
 
   
<br>
+
 
    
   
 
   
Line 278: Line 268:     
   
 
   
<br>
+
 
    
   
 
   
Line 297: Line 287:  
* A teacher may ask “How many hours do students spend watching TV?” to get an idea of what children are learning from TV at home and how it supplements (or affects) the learning in the school
 
* A teacher may ask “How many hours do students spend watching TV?” to get an idea of what children are learning from TV at home and how it supplements (or affects) the learning in the school
 
   
 
   
<br>
+
 
    
   
 
   
Line 335: Line 325:  
# Specialised equipment (rainwater gauges to measure rainfall in a place, various medical equipment that collect information about different biological processes)
 
# Specialised equipment (rainwater gauges to measure rainfall in a place, various medical equipment that collect information about different biological processes)
 
   
 
   
<br>
+
 
    
   
 
   
Line 344: Line 334:     
   
 
   
<br>
+
 
    
   
 
   
Line 354: Line 344:     
   
 
   
<br>
+
 
    
   
 
   
 
NatWest One Day
 
NatWest One Day
International Series: England v India<br>
+
International Series: England v India
 
Friday, 16 September 2011 at
 
Friday, 16 September 2011 at
 
The Swalec Stadium
 
The Swalec Stadium
Line 385: Line 375:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
   
 
   
Line 395: Line 385:  
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 482: Line 472:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 573: Line 563:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 631: Line 621:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 656: Line 646:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 668: Line 658:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 677: Line 667:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
Line 693: Line 683:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
        
 
        
Line 848: Line 838:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
   
 
   
 
|}  
 
|}  
<br>
+
 
    
   
 
   
Line 865: Line 855:     
   
 
   
<br>
+
 
<br>
+
 
    
            
 
            
Line 934: Line 924:  
   
 
   
 
|}   
 
|}   
<br>
+
 
<br>
+
 
    
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 982: Line 972:  
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
Line 1,037: Line 1,027:  
   
 
   
 
|}   
 
|}   
<br>
+
 
    
   
 
   
Line 1,095: Line 1,085:  
   
 
   
 
|}   
 
|}   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 1,109: Line 1,099:     
   
 
   
<br>
+
 
    
                                
 
                                
Line 1,179: Line 1,169:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
   
 
   
Line 1,201: Line 1,191:     
   
 
   
<br>
+
 
    
   
 
   
Line 1,209: Line 1,199:     
   
 
   
<br>
+
 
    
                                
 
                                
Line 1,279: Line 1,269:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
    
 
    
Line 1,328: Line 1,318:  
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|-
 
|-
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|  
 
|  
<br>
+
 
    
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
Line 1,727: Line 1,717:  
=== What is a histogram? ===
 
=== What is a histogram? ===
 
   
 
   
<br>
+
 
    
   
 
   
Line 1,738: Line 1,728:     
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6201ec25.png]]<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6201ec25.png]]
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 1,812: Line 1,802:     
   
 
   
<br>
+
 
    
   
 
   
 
=== How do you construct a histogram from a continuous variable? ===
 
=== How do you construct a histogram from a continuous variable? ===
 
   
 
   
<br>
+
 
    
   
 
   
Line 1,830: Line 1,820:     
   
 
   
<br>
+
 
    
   
 
   
Line 1,861: Line 1,851:     
   
 
   
<br>
+
 
    
   
 
   
Line 1,873: Line 1,863:     
   
 
   
<br>
+
 
    
   
 
   
 
=== Choosing the correct bin width ===
 
=== Choosing the correct bin width ===
 
   
 
   
<br>
+
 
    
   
 
   
Line 1,889: Line 1,879:     
   
 
   
<br>
+
 
    
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_75ab55c3.png]]<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_75ab55c3.png]]
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 1,914: Line 1,904:     
   
 
   
<br>
+
 
    
   
 
   
Line 1,930: Line 1,920:     
   
 
   
<br>
+
 
    
   
 
   
 
=== What is the difference between a bar chart and a histogram? ===
 
=== What is the difference between a bar chart and a histogram? ===
 
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6dfca87b.png]]<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6dfca87b.png]]
    
   
 
   
Line 1,946: Line 1,936:     
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 1,964: Line 1,954:     
   
 
   
<br>
+
 
    
   
 
   
Line 1,974: Line 1,964:     
   
 
   
<br>
+
 
    
    
 
    
Line 1,981: Line 1,971:  
==== Learning Objectives ====
==== Materials and Resources Required ====
==== Pre-requisites/ Instructions ====
 
==== Learning Objectives ====
==== Materials and Resources Required ====
==== Pre-requisites/ Instructions ====
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,007: Line 1,997:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,137: Line 2,127:  
   
 
   
 
|}  
 
|}  
<br>
+
 
    
   
 
   
<br>
+
 
    
   
 
   
Line 2,153: Line 2,143:     
   
 
   
<br>
+
 
    
                                                  
 
                                                  
Line 2,283: Line 2,273:  
=== Dependent and Independent Variables ===
 
=== Dependent and Independent Variables ===
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,293: Line 2,283:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,308: Line 2,298:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,316: Line 2,306:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,324: Line 2,314:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,341: Line 2,331:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,354: Line 2,344:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,365: Line 2,355:  
=== Experimental and Non-Experimental Research ===
 
=== Experimental and Non-Experimental Research ===
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,407: Line 2,397:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
 
=== Categorical and Continuous Variables ===
 
=== Categorical and Continuous Variables ===
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,422: Line 2,412:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,466: Line 2,456:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,475: Line 2,465:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,500: Line 2,490:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
 
=== Ambiguities in classifying a type of variable ===
 
=== Ambiguities in classifying a type of variable ===
 
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,521: Line 2,511:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,579: Line 2,569:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,622: Line 2,612:     
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,760: Line 2,750:     
   
 
   
<br>
+
 
<br>
+
 
    
                            
 
                            
Line 2,811: Line 2,801:  
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,819: Line 2,809:     
   
 
   
<br>
+
 
<br>
+
 
    
                            
 
                            
Line 2,870: Line 2,860:  
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
Line 2,883: Line 2,873:     
   
 
   
<br>
+
 
<br>
+
 
    
                          
 
                          
Line 2,930: Line 2,920:  
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
 
We again rearrange that data into order of
 
We again rearrange that data into order of
magnitude (smallest first):<br>
+
magnitude (smallest first):
<br>
+
 
<br>
+
 
    
                            
 
                            
Line 2,987: Line 2,977:  
   
 
   
 
|}  
 
|}  
<br>
+
 
<br>
+
 
    
   
 
   
Line 3,003: Line 2,993:     
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_58d59706.png]]<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_58d59706.png]]
<br>
+
 
    
   
 
   
<br>
  −
<br>
     −
+
 
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
Normally, the mode is used for categorical data
  −
where we wish to know which is the most common category as
  −
illustrated below:
     −
  −
We can see above that the most common form of
  −
transport, in this particular data set, is the bus. However, one of
  −
the problems with the mode is that it is not unique, so it leaves us
  −
with problems when we have two or more values that share the highest
  −
frequency, such as below:
      
   
 
   
<br>
  −
<br>
     −
  −
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m64bbad46.png]]<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
We are now stuck as to which mode best describes
  −
the central tendency of the data. This is particularly problematic
  −
when we have continuous data, as we are more likely not to have any
  −
one value that is more frequent than the other. For example, consider
  −
measuring 30 people's weights (to the nearest 0.1 kg). How likely is
  −
it that we will find two or more people with '''exactly'''
  −
the same weight, e.g. 67.4 kg? The answer is that it is very unlikely
  −
- many people might be close but with such a small sample (30 people)
  −
and a large range of possible weights you are unlikely to find two
  −
people with exactly the same weight, that is, to the nearest 0.1 kg.
  −
This is why the mode is very rarely used with continuous data.
     −
  −
<br>
  −
<br>
     −
  −
Another problem with the mode is that it will not
  −
provide us with a very good measure of central tendency when the most
  −
common mark is far away from the rest of the data in the data set, as
  −
depicted in the diagram below:
      
   
 
   
[[Image:KOER-%20Mathematics%20-%20Statistics_html_152dd141.png]]<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
<br>
+
Normally, the mode is used for categorical data
<br>
+
where we wish to know which is the most common category as
 +
illustrated below:
    
   
 
   
<br>
+
We can see above that the most common form of
<br>
+
transport, in this particular data set, is the bus. However, one of
 +
the problems with the mode is that it is not unique, so it leaves us
 +
with problems when we have two or more values that share the highest
 +
frequency, such as below:
    
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
<br>
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m64bbad46.png]]
<br>
+
 
    
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
<br>
+
 
<br>
+
 
    
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
In the above diagram the mode has a value of 2. We
  −
can clearly see, however, that the mode is not representative of the
  −
data, which is mostly concentrated around the 20 to 30 value range.
  −
To use the mode to describe the central tendency of this data set
  −
would be misleading.
     −
  −
== Skewed Distributions and the Mean and Median ==
  −
  −
[[Image:KOER-%20Mathematics%20-%20Statistics_html_26c6186d.png]]We
  −
often test whether our data is normally distributed as this is a
  −
common assumption underlying many statistical tests. An example of a
  −
normally distributed set of data is presented below:
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
+
 
When you have a normally distributed sample you
  −
can legitimately use either the mean or the median as your measure of
  −
central tendency. In fact, in any symmetrical, unimodal distribution the mean,
  −
median and mode are equal. However, in this situation, the mean is
  −
widely preferred as the best measure of central tendency as it is the
  −
measure that includes all the values in the data set for its
  −
calculation, and any change in any of the scores will affect the
  −
value of the mean. This is not the case with the median or mode.
      
   
 
   
However, when our data is skewed, for example, as
  −
with the right-skewed data set below:
     −
  −
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m2609c500.png]]<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
  −
<br>
  −
<br>
     −
  −
<br>
  −
<br>
      
   
 
   
we find that the mean is being dragged in the
  −
direction of the skew. In these situations, the median is generally
  −
considered to be the best representative of the central location of
  −
the data. The more skewed the distribution the greater the difference
  −
between the median and mean, and the greater emphasis should be
  −
placed on using the median as opposed to the mean. A classic example
  −
of the above right-skewed distribution is income (salary), where
  −
higher-earners provide a false representation of the typical income
  −
if expressed as a mean and not a median.
     −
  −
If the data are expected to follow a normal distribution, but tests
  −
of normality show that the data is non-normal, then it is customary
  −
to use the median instead of the mean. This is more a rule of thumb
  −
than a strict guideline however. Sometimes, researchers wish to
  −
report the mean of a skewed distribution if the median and mean are
  −
not appreciably different (a subjective assessment) and if it allows
  −
easier comparisons to previous research to be made.
     −
  −
<br>
  −
<br>
      
   
 
   
== Summary of when to use the mean, median and mode ==
+
We are now stuck as to which mode best describes
 +
the central tendency of the data. This is particularly problematic
 +
when we have continuous data, as we are more likely not to have any
 +
one value that is more frequent than the other. For example, consider
 +
measuring 30 people's weights (to the nearest 0.1 kg). How likely is
 +
it that we will find two or more people with '''exactly'''
 +
the same weight, e.g. 67.4 kg? The answer is that it is very unlikely
 +
- many people might be close but with such a small sample (30 people)
 +
and a large range of possible weights you are unlikely to find two
 +
people with exactly the same weight, that is, to the nearest 0.1 kg.
 +
This is why the mode is very rarely used with continuous data.
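
The two difficulties described above (tied modes, and continuous data where no value repeats) can be illustrated with a minimal Python sketch. The transport answers and weights below are hypothetical values invented for illustration; they are not taken from the charts on this page. The sketch assumes Python 3.8 or later, where the standard library provides <code>statistics.multimode</code>.

```python
from statistics import multimode

# Hypothetical survey answers: "bus" and "car" tie for the highest frequency.
transport = ["bus", "car", "bus", "walk", "car", "train"]
modes = multimode(transport)  # every value sharing the highest frequency
print(modes)

# Hypothetical weights to the nearest 0.1 kg: no value repeats,
# so every value is returned and the "mode" tells us nothing.
weights = [67.4, 70.1, 58.9, 81.3, 64.0]
weight_modes = multimode(weights)
print(weight_modes)
```

When more than one value is returned, the mode is not unique and a different measure of central tendency is usually more informative.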
 +
 
 
   
 
   
Please use the following summary table to know
  −
what the best measure of central tendency is with respect to the
  −
different types of variables.
     −
  −
<br>
  −
<br>
     −
                       
  −
{| border="1"
  −
|-
  −
|
  −
'''Type of Variable'''
      
   
 
   
|
+
Another problem with the mode is that it will not
'''Best measure of central tendency'''
+
provide us with a very good measure of central tendency when the most
 +
common mark is far away from the rest of the data in the data set, as
 +
depicted in the diagram below:
    
   
 
   
|-
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_152dd141.png]]
|
+
 
Nominal
      
   
 
   
|
  −
Mode
     −
+
 
|-
  −
|
  −
Ordinal
      
   
 
   
|
+
 
Median
+
 
    
   
 
   
|-
+
 
|
+
 
Interval/Ratio (not skewed)
      
   
 
   
|
+
 
Mean
+
 
    
   
 
   
|-
+
 
|
+
 
Interval/Ratio (skewed)
      
   
 
   
|
  −
Median
     −
+
 
|}
  −
<br>
  −
<br>
      
   
 
   
== Relative advantages and disadvantages of mean, median and  mode ==
  −
  −
Mean.<br>
  −
Advantages:
  −
Finds the most accurate average of the set of numbers.<br>
  −
Disadvantages:
  −
Outliers (few values are very different from most) can change the
  −
mean a lot... making it much lower/higher than it should
  −
be.<br>
  −
<br>
  −
Median:<br>
  −
Advantages: Finds the middle number of a set of
  −
data, so outliers have little or no effect.<br>
  −
Disadvantages: If the
  −
gap between some numbers is large, while it is small between other
  −
numbers in the data, this can cause the median to be a very
  −
inaccurate way to find the middle of a set of
  −
values.<br>
  −
<br>
  −
Mode:<br>
  −
Advantages: Allows you to see what value
  −
happened the most in a set of data. This can help you to figure out
  −
things in a different way. It is also quick and easy.<br>
  −
Disadvantages:
  −
Could be very far from the actual middle of the data. The least
  −
reliable way to find the middle or average of the data.
     −
  −
<br>
     −
  −
This means that each of
  −
these measures can be useful in different kinds of distributions.
      
   
 
   
<br>
+
 
 +
 
    
   
 
   
== Activities ==
  −
  −
== Activity 1 : Central Tendency ==
  −
  −
==== Learning Objectives ====
  −
  −
Learn to calculate each average measure - Mean,
  −
Median, Mode. And understand the difference between them. Know in
  −
which situation which measure must be used.
     −
  −
==== Pre-requisites/ Instructions ====
  −
  −
<br>
  −
<br>
     −
  −
==== Materials and Resources Required ====
  −
  −
Paper and Pencil
      
   
 
   
==== Method ====
  −
  −
Solve the problems A and B
     −
  −
<br>
  −
<br>
     −
  −
A. 27 members of a
  −
class were given a puzzle to solve and the times (in minutes) each
  −
pupil took to solve it were noted.
      
   
 
   
<br>
  −
<br>
     −
       
  −
{| border="1"
  −
|-
  −
|
  −
'''the times (in minutes) each pupil took'''
     −
  −
|-
  −
|
  −
19 14 15 9 18 16 10 11 16
      
   
 
   
4 20 10 14 11 9 13 15 13
     −
  −
12 2 17 15 14 10 11 10 12
     −
  −
|}
  −
<br>
  −
<br>
      
   
 
   
<br>
  −
<br>
     −
+
 
# The MEAN value of a set of data is Sum of Values / Number of Values . What is the mean (to 2 decimal places) of the times given in the table?
  −
# The MEDIAN is the middle value of an ordered set of data.
  −
## Write down the times in the table above in ascending order.
  −
## How many values are there?
  −
## What is the median ?
  −
#
  −
# The MODE is the value which occurs most often, i.e. the most popular.
  −
## What is the mode of the times in the table above?
  −
#
  −
# Which of the three measures do you think is most representative of the average time? In this case it is probably the mean, but this will not always be so.
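
For teachers who want to check the answers to problem A, the calculation can be sketched with Python's standard <code>statistics</code> module. The list below is copied from the table of times above; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Times (in minutes) from the table in problem A, row by row.
times = [
    19, 14, 15, 9, 18, 16, 10, 11, 16,
    4, 20, 10, 14, 11, 9, 13, 15, 13,
    12, 2, 17, 15, 14, 10, 11, 10, 12,
]

count = len(times)                 # number of values
mean_time = round(mean(times), 2)  # sum of values / number of values
median_time = median(times)        # middle value of the ordered data
mode_time = mode(times)            # most frequent value

print(count, mean_time, median_time, mode_time)
```

With 27 values the median is the 14th value in ascending order, which is why no averaging of two middle values is needed here.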
  −
  −
<br>
  −
<br>
      
   
 
   
'''B Choosing which measure to use '''
     −
  −
The sales in one week of a particular dress are
  −
given in terms of the dress sizes.
     −
  −
# Determine the mean, median and mode for this data .
  −
# What is the size that is sold the most ?
  −
# Which of these measures is of most use?
  −
  −
<br>
  −
<br>
      
   
 
   
Dress sizes sold in one week
     −
             
  −
{| border="1"
  −
|-
  −
|
  −
10
     −
  −
16
      
   
 
   
16
     −
  −
12
     −
  −
16
      
   
 
   
|
+
In the above diagram the mode has a value of 2. We
14
+
can clearly see, however, that the mode is not representative of the
 +
data, which is mostly concentrated around the 20 to 30 value range.
 +
To use the mode to describe the central tendency of this data set
 +
would be misleading.
    
   
 
   
12
+
== Skewed Distributions and the Mean and Median ==
 +
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_26c6186d.png]]We
 +
often test whether our data is normally distributed as this is a
 +
common assumption underlying many statistical tests. An example of a
 +
normally distributed set of data is presented below:
    
   
 
   
14
     −
  −
16
     −
  −
18
      
   
 
   
|
  −
12
     −
  −
10
     −
  −
18
      
   
 
   
10
     −
  −
14
     −
  −
|
  −
16
      
   
 
   
14
     −
  −
8
     −
  −
10
      
   
 
   
16
     −
  −
|
  −
18
     −
  −
16
      
   
 
   
14
     −
  −
16
     −
  −
8
      
   
 
   
|}
+
 
<br>
+
 
<br>
      
   
 
   
==== Evaluation ====
+
 
 +
 
 +
 
 
   
 
   
# Does the student understand the difference between Mean, Median and Mode
+
 
# Can the student calculate each of the measures ?
+
 
# Does the student know which measure is useful and represents the actual data given a data set ?
+
 
 
   
 
   
== Self-Evaluation ==
+
 
 +
 
 +
 
 
   
 
   
== Further Explorations ==
+
When you have a normally distributed sample you
 +
can legitimately use either the mean or the median as your measure of
 +
central tendency. In fact, in any symmetrical, unimodal distribution the mean,
 +
median and mode are equal. However, in this situation, the mean is
 +
widely preferred as the best measure of central tendency as it is the
 +
measure that includes all the values in the data set for its
 +
calculation, and any change in any of the scores will affect the
 +
value of the mean. This is not the case with the median or mode.
 +
 
 
   
 
   
== Enrichment Activities ==
+
However, when our data is skewed, for example, as
 +
with the right-skewed data set below:
 +
 
 
   
 
   
= Dispersion =
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m2609c500.png]]
 +
 
 +
 
 
   
 
   
== Introduction ==
+
 
 +
 
 +
 
 
   
 
   
A measure of spread, sometimes also called a
+
 
measure of dispersion, is used to describe the variability in a
+
 
sample or population. It is usually used in conjunction with a
  −
measure of central tendency, such as, the mean or median, to provide
  −
an overall description of a set of data.
      
   
 
   
There are many reasons why the measure of the
+
 
spread of data values is important but one of the main reasons
+
 
regards its relationship with measures of central tendency. A measure
  −
of spread gives us an idea of how well the mean, for example,
  −
represents the data. If the spread of values in the data set is large
  −
then the mean is not as representative of the data as if the spread
  −
of data is small. This is because a large spread indicates that there
  −
are probably large differences between individual scores.
  −
Additionally, in research, it is often seen as positive if there is
  −
little variation in each data group as it indicates that the values are similar.
      
   
 
   
We will be looking at the range, quartiles,
+
 
variance, absolute deviation and standard deviation.
+
 
    
   
 
   
== Objectives ==
  −
  −
* Understand that a measure of dispersion is a measure of spread, used to describe the variability in a sample or population.
  −
* It is usually used in conjunction with a measure of central tendency, such as, the mean or median, to provide an overall description of a set of data.
  −
* It is important to measure the spread of data because understanding its relationship with measures of central tendency allows a more accurate interpretation of the data.
  −
* Understand and know the terms: Range, Quartile, Standard Deviation, Cumulative Frequency
  −
* Calculation of Co-efficient of Variation. Meaning and interpretation of C.V. Analyse data and make conclusions
  −
  −
== Range ==
  −
  −
The range is the difference between the highest
  −
and lowest scores in a data set and is the simplest measure of
  −
spread. So we calculate range as:
     −
  −
<br>
  −
<br>
     −
  −
Range = maximum value - minimum value
      
   
 
   
<br>
  −
<br>
     −
  −
For example, let us consider the following data
  −
set:
     −
  −
23 56 45 65 59 55 62 54 85 25
      
   
 
   
<br>
  −
<br>
     −
+
 
The maximum value is 85 and the minimum value is
  −
23. This results in a range of 62, which is 85 minus 23. Whilst using
  −
the range as a measure of spread is limited, it does set the
  −
boundaries of the scores. This can be useful if you are measuring a
  −
variable that has either a critical low or high threshold (or both)
  −
that should not be crossed. The range will instantly inform you
  −
whether at least one value broke these critical thresholds. In
  −
addition, the range can be used to detect any errors when entering
  −
data. For example, if you have recorded the age of school children in
  −
your study and your range is 7 to 123 years old you know you have
  −
made a mistake!<br>
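
The range calculation and the data-entry check described above can be sketched in a few lines of Python. The scores are the ten values from the example; the list of ages and the plausibility limit of 18 years are assumptions made up for this illustration.

```python
# The ten scores from the worked example above.
scores = [23, 56, 45, 65, 59, 55, 62, 54, 85, 25]
data_range = max(scores) - min(scores)  # maximum value - minimum value
print(data_range)

# Hypothetical ages of school children, with one data-entry error.
ages = [7, 9, 8, 123, 10]
MAX_PLAUSIBLE_AGE = 18  # assumed upper limit for a school child
suspicious = [age for age in ages if age > MAX_PLAUSIBLE_AGE]
print(suspicious)
```

Looking at the minimum and maximum first is a cheap way to spot impossible values before calculating any other statistic.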
  −
<br>
  −
<br>
      
   
 
   
=== Quartiles and Interquartile Range ===
  −
  −
<br>
  −
<br>
     −
+
 
Quartiles tell us about the spread of a data set
  −
by breaking the data set into quarters, just like the median breaks
  −
it in half. For example, consider the marks of the 100 students
  −
below, which have been ordered from the lowest to the highest scores,
  −
and the quartiles highlighted in red.
      
   
 
   
<br>
  −
<br>
     −
  −
Order Score Order Score Order Score Order
  −
Score Order Score
     −
  −
1st 35 21st 42 41st 53 61st 64 81st 74
      
   
 
   
2nd 37 22nd 42 42nd 53 62nd 64 82nd 74
     −
  −
3rd 37 23rd 44 43rd 54 63rd 65 83rd 74
     −
  −
4th 38 24th 44 44th 55 64th 66 84th 75
      
   
 
   
5th 39 25th 45 45th 55 65th 67 85th 75
     −
  −
6th 39 26th 45 46th 56 66th 67 86th 76
     −
  −
7th 39 27th 45 47th 57 67th 67 87th 77
      
   
 
   
8th 39 28th 45 48th 57 68th 67 88th 77
     −
  −
9th 39 29th 47 49th 58 69th 68 89th 79
     −
  −
10th 40 30th 48 50th 58 70th 69 90th 80
      
   
 
   
11th 40 31st 49 51st 59 71st 69 91st 81
     −
   
+
 
12th 40 32nd 49 52nd 60 72nd 69 92nd 81
+
 
 +
 +
 
 +
 
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
we find that the mean is being dragged in the
 +
direction of the skew. In these situations, the median is generally
 +
considered to be the best representative of the central location of
 +
the data. The more skewed the distribution the greater the difference
 +
between the median and mean, and the greater emphasis should be
 +
placed on using the median as opposed to the mean. A classic example
 +
of the above right-skewed distribution is income (salary), where
 +
higher-earners provide a false representation of the typical income
 +
if expressed as a mean and not a median.
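
The effect of skew on the mean can be seen in a small Python sketch. The incomes below are hypothetical figures (in arbitrary currency units) chosen only to produce a right-skewed data set; they do not come from the diagram above.

```python
from statistics import mean, median

# Hypothetical incomes: seven similar earners and one very high earner.
incomes = [20, 22, 25, 27, 30, 32, 35, 200]

avg = mean(incomes)    # pulled upwards by the single high value
mid = median(incomes)  # midpoint of the two middle values, 27 and 30

print(avg, mid)
```

In this sketch the mean is larger than seven of the eight incomes, while the median stays close to what a typical person in the list earns.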
 +
 
 +
 +
If the data are expected to follow a normal distribution, but tests
 +
of normality show that the data is non-normal, then it is customary
 +
to use the median instead of the mean. This is more a rule of thumb
 +
than a strict guideline however. Sometimes, researchers wish to
 +
report the mean of a skewed distribution if the median and mean are
 +
not appreciably different (a subjective assessment) and if it allows
 +
easier comparisons to previous research to be made.
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
== Summary of when to use the mean, median and mode ==
 +
 +
The following summary table shows
 +
what the best measure of central tendency is with respect to the
 +
different types of variables.
 +
 
 +
 +
 
 +
 
 +
 
 +
                       
 +
{| border="1"
 +
|-
 +
|
 +
'''Type of Variable'''
 +
 
 +
 +
|
 +
'''Best measure of central tendency'''
 +
 
 +
 +
|-
 +
|
 +
Nominal
 +
 
 +
 +
|
 +
Mode
 +
 
 +
 +
|-
 +
|
 +
Ordinal
 +
 
 +
 +
|
 +
Median
 +
 
 +
 +
|-
 +
|
 +
Interval/Ratio (not skewed)
 +
 
 +
 +
|
 +
Mean
 +
 
 +
 +
|-
 +
|
 +
Interval/Ratio (skewed)
 +
 
 +
 +
|
 +
Median
 +
 
 +
 +
|}
 +
 
 +
 
 +
 
 +
 +
== Relative advantages and disadvantages of mean, median and  mode ==
 +
 +
Mean:
Advantages: Uses every value in the data set, so it takes all of the data into account.
Disadvantages: Outliers (a few values that are very different from most) can change the mean a lot, making it much lower or higher than a typical value.
 +
 
 +
Median:
 +
Advantages: Finds the middle number of a set of
 +
data, so outliers have little or no effect.
 +
Disadvantages: If the gaps between some values are large while the gaps between others are small, the median can be a poor indicator of the middle of the data.
 +
 
 +
Mode:
 +
Advantages: Allows you to see what value
 +
happened the most in a set of data. This can help you to figure out
 +
things in a different way. It is also quick and easy.
 +
Disadvantages:
 +
Could be very far from the actual middle of the data. The least
 +
reliable way to find the middle or average of the data.
 +
 
 +
 +
 
 +
 
 +
 +
This means that each of
 +
these measures can be useful in different kinds of distributions.
 +
 
 +
 +
 
 +
 
 +
 +
== Activities ==
 +
 +
=== Activity 1: Central Tendency ===
 +
 +
==== Learning Objectives ====
 +
 +
Learn to calculate each measure of average (mean, median and mode), understand the difference between them, and know which measure to use in which situation.
 +
 
 +
 +
==== Pre-requisites/ Instructions ====
 +
 +
 
 +
 
 +
 
 +
 +
==== Materials and Resources Required ====
 +
 +
Paper and Pencil
 +
 
 +
 +
==== Method ====
 +
 +
Solve the problems A and B
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
A. 27 members of a
 +
class were given a puzzle to solve and the times (in minutes) each
 +
pupil took to solve it were noted.
 +
 
 +
 +
 
 +
 
 +
 
 +
       
 +
{| border="1"
 +
|-
 +
|
 +
'''the times (in minutes) each pupil took'''
 +
 
 +
 +
|-
 +
|
 +
19 14 15 9 18 16 10 11 16
 +
 
 +
 +
4 20 10 14 11 9 13 15 13
 +
 
 +
 +
12 2 17 15 14 10 11 10 12
 +
 
 +
 +
|}
 +
 
 +
 
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
# The MEAN value of a set of data is Sum of Values / Number of Values. What is the mean (to 2 decimal places) of the times given in the table?
 +
# The MEDIAN is the middle value of an ordered set of data.
 +
## Write down the times in the table above in ascending order.
 +
## How many values are there?
 +
## What is the median?
 +
 +
# The MODE is the value which occurs most often, i.e. the most popular.
 +
## What is the mode of the times in the table above?
 +
 +
# Which of the three measures do you think is most representative of the average time? In this case it is probably the mean, but this will not always be so.
 +
 +
 
 +
 
 +
 
 +
 +
'''B. Choosing which measure to use'''
 +
 
 +
 +
The sales in one week of a particular dress are
 +
given in terms of the dress sizes.
 +
 
 +
 +
# Determine the mean, median and mode for this data.
# What is the size that is sold the most?
# Which of these measures is of most use?
 +
 +
 
 +
 
 +
 
 +
 +
Dress sizes sold in one week
 +
 
 +
             
 +
{| border="1"
 +
|-
 +
|
 +
10
 +
 
 +
 +
16
 +
 
 +
 +
16
 +
 
 +
 +
12
 +
 
 +
 +
16
 +
 
 +
 +
|
 +
14
 +
 
 +
 +
12
 +
 
 +
 +
14
 +
 
 +
 +
16
 +
 
 +
 +
18
 +
 
 +
 +
|
 +
12
 +
 
 +
 +
10
 +
 
 +
 +
18
 +
 
 +
 +
10
 +
 
 +
 +
14
 +
 
 +
 +
|
 +
16
 +
 
 +
 +
14
 +
 
 +
 +
8
 +
 
 +
 +
10
 +
 
 +
 +
16
 +
 
 +
 +
|
 +
18
 +
 
 +
 +
16
 +
 
 +
 +
14
 +
 
 +
 +
16
 +
 
 +
 +
8
 +
 
 +
 +
|}
 +
 
 +
 
 +
 
 +
 +
==== Evaluation ====
 +
 +
# Does the student understand the difference between Mean, Median and Mode?
# Can the student calculate each of the measures?
# Given a data set, does the student know which measure is useful and best represents the data?
 +
 +
== Self-Evaluation ==
 +
 +
== Further Explorations ==
 +
 +
== Enrichment Activities ==
 +
 +
= Dispersion =
 +
 +
== Introduction ==
 +
 +
A measure of spread, sometimes also called a
 +
measure of dispersion, is used to describe the variability in a
 +
sample or population. It is usually used in conjunction with a
 +
measure of central tendency, such as the mean or median, to provide
 +
an overall description of a set of data.
 +
 
 +
 +
There are many reasons why the measure of the
 +
spread of data values is important but one of the main reasons
 +
regards its relationship with measures of central tendency. A measure
 +
of spread gives us an idea of how well the mean, for example,
 +
represents the data. If the spread of values in the data set is large
 +
then the mean is not as representative of the data as if the spread
 +
of data is small. This is because a large spread indicates that there
 +
are probably large differences between individual scores.
 +
Additionally, in research, it is often seen as positive if there is
 +
little variation in each data group as it indicates that the scores within the group are similar.
 +
 
 +
 +
We will be looking at the range, quartiles,
 +
variance, absolute deviation and standard deviation.
 +
 
 +
 +
== Objectives ==
 +
 +
* Understand that a measure of dispersion (or spread) is used to describe the variability in a sample or population.
* Know that it is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.
* Understand that it is important to measure the spread of data because its relationship with measures of central tendency allows a more accurate interpretation of the data.
* Understand and know the terms: Range, Quartile, Standard Deviation, Cumulative Frequency.
* Calculate the coefficient of variation (C.V.), understand its meaning and interpretation, and analyse data to draw conclusions.
 +
 +
== Range ==
 +
 +
The range is the difference between the highest
 +
and lowest scores in a data set and is the simplest measure of
 +
spread. So we calculate range as:
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
Range = maximum value - minimum value
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
For example, let us consider the following data
 +
set:
 +
 
 +
 +
23 56 45 65 59 55 62 54 85 25
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
The maximum value is 85 and the minimum value is
 +
23. This results in a range of 62, which is 85 minus 23. Whilst using
 +
the range as a measure of spread is limited, it does set the
 +
boundaries of the scores. This can be useful if you are measuring a
 +
variable that has either a critical low or high threshold (or both)
 +
that should not be crossed. The range will instantly inform you
 +
whether at least one value broke these critical thresholds. In
 +
addition, the range can be used to detect any errors when entering
 +
data. For example, if you have recorded the age of school children in
 +
your study and your range is 7 to 123 years old you know you have
 +
made a mistake!
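The range calculation and the data-entry check described above can be sketched in a few lines. The helper name <code>out_of_range</code>, the sample ages and the age limits of 3 and 18 are assumptions made for this illustration only.

```python
# Data set from the worked example above.
scores = [23, 56, 45, 65, 59, 55, 62, 54, 85, 25]

# Range = maximum value - minimum value
data_range = max(scores) - min(scores)
print("range:", data_range)


def out_of_range(values, low, high):
    """Return the values that fall outside the plausible limits [low, high]."""
    return [v for v in values if v < low or v > high]


# Hypothetical recorded ages of school children; 123 is a typing error.
ages = [7, 9, 8, 123, 10]
print("suspicious ages:", out_of_range(ages, 3, 18))
```

Checking the minimum and maximum in this way is a quick first test for impossible values before any further analysis.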
 +
 
 +
 
 +
 
 +
 +
=== Quartiles and Interquartile Range ===
 +
 +
 
 +
 
 +
 
 +
 +
Quartiles tell us about the spread of a data set
 +
by breaking the data set into quarters, just like the median breaks
 +
it in half. For example, consider the marks of the 100 students
 +
below, which have been ordered from the lowest to the highest scores,
 +
and the quartiles highlighted in red.
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
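{| border="1"
|-
| '''Order''' || '''Score''' || '''Order''' || '''Score''' || '''Order''' || '''Score''' || '''Order''' || '''Score''' || '''Order''' || '''Score'''
|-
| 1st || 35 || 21st || 42 || 41st || 53 || 61st || 64 || 81st || 74
|-
| 2nd || 37 || 22nd || 42 || 42nd || 53 || 62nd || 64 || 82nd || 74
|-
| 3rd || 37 || 23rd || 44 || 43rd || 54 || 63rd || 65 || 83rd || 74
|-
| 4th || 38 || 24th || 44 || 44th || 55 || 64th || 66 || 84th || 75
|-
| 5th || 39 || 25th || 45 || 45th || 55 || 65th || 67 || 85th || 75
|-
| 6th || 39 || 26th || 45 || 46th || 56 || 66th || 67 || 86th || 76
|-
| 7th || 39 || 27th || 45 || 47th || 57 || 67th || 67 || 87th || 77
|-
| 8th || 39 || 28th || 45 || 48th || 57 || 68th || 67 || 88th || 77
|-
| 9th || 39 || 29th || 47 || 49th || 58 || 69th || 68 || 89th || 79
|-
| 10th || 40 || 30th || 48 || 50th || 58 || 70th || 69 || 90th || 80
|-
| 11th || 40 || 31st || 49 || 51st || 59 || 71st || 69 || 91st || 81
|-
| 12th || 40 || 32nd || 49 || 52nd || 60 || 72nd || 69 || 92nd || 81
|-
| 13th || 40 || 33rd || 49 || 53rd || 61 || 73rd || 70 || 93rd || 81
|-
| 14th || 40 || 34th || 49 || 54th || 62 || 74th || 70 || 94th || 81
|-
| 15th || 40 || 35th || 51 || 55th || 62 || 75th || 71 || 95th || 81
|-
| 16th || 41 || 36th || 51 || 56th || 62 || 76th || 71 || 96th || 81
|-
| 17th || 41 || 37th || 51 || 57th || 63 || 77th || 71 || 97th || 83
|-
| 18th || 42 || 38th || 51 || 58th || 63 || 78th || 72 || 98th || 84
|-
| 19th || 42 || 39th || 52 || 59th || 64 || 79th || 74 || 99th || 84
|-
| 20th || 42 || 40th || 52 || 60th || 64 || 80th || 74 || 100th || 85
|}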
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
The first quartile (Q1) lies between the 25th and
 +
26th student's marks, the second quartile (Q2) between the 50th and
 +
51st student's marks, and the third quartile (Q3) between the 75th
 +
and 76th student's marks. Hence:
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
First quartile (Q1) = (45 + 45) ÷ 2 = 45

Second quartile (Q2) = (58 + 59) ÷ 2 = 58.5

Third quartile (Q3) = (71 + 71) ÷ 2 = 71
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
In the above example, we have an even number of
 +
scores (100 students rather than an odd number such as 99 students).
 +
This means that when we calculate the quartiles, we take the sum of
 +
the two scores around each quartile and then halve them (hence Q1 = (45 + 45) ÷ 2 = 45). However, if we had an odd number of scores (say, 99
 +
students), then we would only need to take one score for each
 +
quartile (that is, the 25th, 50th and 75th scores). You should
 +
recognize that the second quartile is also the median.
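The quartile calculation for the 100 marks can be checked with a short sketch. It follows the method described in this section (averaging the two scores on either side of each cut) and assumes the number of scores is divisible by four; the function name <code>quartile_between</code> is invented for this illustration. Library routines for quantiles may use other conventions and can give slightly different values.

```python
from statistics import median

# The 100 ordered marks from the table above, 20 per line.
scores = [
    35, 37, 37, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 41, 41, 42, 42, 42,
    42, 42, 44, 44, 45, 45, 45, 45, 47, 48, 49, 49, 49, 49, 51, 51, 51, 51, 52, 52,
    53, 53, 54, 55, 55, 56, 57, 57, 58, 58, 59, 60, 61, 62, 62, 62, 63, 63, 64, 64,
    64, 64, 65, 66, 67, 67, 67, 67, 68, 69, 69, 69, 70, 70, 71, 71, 71, 72, 74, 74,
    74, 74, 74, 75, 75, 76, 77, 77, 79, 80, 81, 81, 81, 81, 81, 81, 83, 84, 84, 85,
]


def quartile_between(ordered, k):
    """k-th quartile (k = 1, 2, 3) of ordered data whose length is divisible by 4.

    Averages the two scores on either side of the cut, e.g. the 25th and
    26th scores for Q1 when there are 100 scores.
    """
    cut = k * len(ordered) // 4
    return (ordered[cut - 1] + ordered[cut]) / 2


q1 = quartile_between(scores, 1)
q2 = quartile_between(scores, 2)
q3 = quartile_between(scores, 3)
print(q1, q2, q3)

# The second quartile should agree with the median.
print(q2 == median(scores))
```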
 +
 
 +
 +
 
 +
 
 +
 
 +
 +
Quartiles are a useful measure of spread because
 +
they are much less affected by outliers or a skewed data set than the
 +
equivalent measures of mean and standard deviation. For this reason,
 +
quartiles are often reported along with the median as the best choice
 +
of measure of spread and central tendency, respectively, when dealing
 +
with skewed data and/or data with outliers. A common way of expressing
 +
quartiles is as an interquartile range. The interquartile range
 +
describes the difference between the third quartile (Q3) and the
 +
first quartile (Q1), telling us about the range of the middle half of
 +
the scores in the distribution. Hence, for our 100 students:
    
   
 
   
Interquartile range = Q3 - Q1

= 71 - 45

= 26

However, it should be noted that in journals and
other publications you will usually see the interquartile range
reported as 45 to 71, rather than the calculated range.

A slight variation on this is the
semi-interquartile range, which is half the interquartile range = ½
(Q3 - Q1). Hence, for our 100 students, this would be 26 ÷ 2 = 13.
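As a quick arithmetic check, the interquartile range and semi-interquartile range can be computed from the quartile values quoted in this section. The variable names are chosen for this sketch only.

```python
# Quartile values quoted above for the 100 students.
q1 = 45
q3 = 71

iqr = q3 - q1       # interquartile range
semi_iqr = iqr / 2  # semi-interquartile range

print("interquartile range:", iqr)
print("semi-interquartile range:", semi_iqr)
print("reported as:", f"{q1} to {q3}")
```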
    
   
 
   
+
== Standard Deviation ==
 
   
   
 
   
+
The standard deviation is a measure of the spread
   
of scores within a set of data. Usually, we are interested in the
standard deviation of a population. However, as we are often

* s = sample standard deviation
* Σ = sum of...
* X̄ = sample mean
* n = number of scores in sample.

* σ = population standard deviation
* Σ = sum of...
* μ = population mean
* n = number of scores in sample.
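The formulas themselves are not reproduced in this excerpt, so the sketch below assumes the usual textbook definitions: the sample standard deviation divides the sum of squared deviations by n - 1, and the population standard deviation divides by n. The function names are invented for this illustration, and the data set is the one used in the Range section.

```python
import math
from statistics import pstdev, stdev

scores = [23, 56, 45, 65, 59, 55, 62, 54, 85, 25]


def sample_sd(values):
    """Sample standard deviation: squared deviations divided by n - 1 (assumed formula)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


def population_sd(values):
    """Population standard deviation: squared deviations divided by n (assumed formula)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)


s = sample_sd(scores)
sigma = population_sd(scores)
print("sample sd:", round(s, 2))
print("population sd:", round(sigma, 2))

# Cross-check against Python's statistics module.
print(math.isclose(s, stdev(scores)), math.isclose(sigma, pstdev(scores)))
```

For the same data the sample value is always slightly larger than the population value, because it divides by n - 1 rather than n.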
    
   
 
   
 
=== Absolute Deviation and Mean Absolute Deviation ===
 
    
   
 
   
 
=== Variance ===
 
[[Image:KOER-%20Mathematics%20-%20Statistics_html_1afc44b3.png]]
    
   
 
   
* When the coefficient of variation is less, the given data is more consistent.
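The statement about consistency can be illustrated with a short sketch. It assumes the common definition C.V. = (standard deviation ÷ mean) × 100 with the population standard deviation, since the formula itself is not shown in this excerpt. The two data sets are made up and share the same mean.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. as a percentage: (standard deviation / mean) * 100 (assumed definition)."""
    return pstdev(values) / mean(values) * 100


# Hypothetical scores of two players with the same mean of 50.
player_a = [48, 50, 52, 49, 51]  # scores close together
player_b = [10, 90, 30, 70, 50]  # scores widely spread

cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)

print("C.V. of A:", round(cv_a, 2))
print("C.V. of B:", round(cv_b, 2))
print("more consistent:", "A" if cv_a < cv_b else "B")
```

Player A has the smaller coefficient of variation and is therefore the more consistent of the two.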
 