Changes

1,979 bytes added ,  08:50, 4 January 2013
no edit summary
Line 1: Line 1:  +
         
 +
<br>
 +
<br>
   −
= Scope of this document =
+
 +
'''Statistics'''
 +
 
 +
 +
<br>
 +
<br>
 +
 
 +
 +
= Introduction =
 
   
 
   
 
The following is a background literature for
 
The following is a background literature for
Line 10: Line 21:     
   
 
   
= Syllabus =
+
The teacher will get an overall idea of all the
                       
+
sub topics required for school level statistics. The flow of how to
{| border="1"
+
build/develop an understanding of the topic for students from basics
|-
+
to more advanced aspects. Each subtopic will be developed by way of
|
+
introductions, objectives, activities, evaluation and advanced and
'''Class '''
+
additional information and resources.
    
   
 
   
|
+
== Statistics ==
'''Topic'''
  −
 
   
   
 
   
|-
+
In early times, the meaning of statistics was
|
+
restricted to information about states (any political organization
6
+
with a government that has supreme independent authority over a
 +
geographic area). This was later extended to include all collections
 +
of information of all types, and later still it was extended to
 +
include the analysis and interpretation of such data. In modern
 +
terms, &quot;statistics&quot; means both sets of collected
 +
information and analytical work which requires statistical inference.
    
   
 
   
|
+
Through statistical analysis it is possible to test
'''Data handling '''
+
numerical data for relevance, reliability and validity. In order to
 +
do this, statisticians must present data in such a form that others
 +
can utilise the relevant information to enable them to make
 +
judgements. By one account, the study of statistics is said to
 +
have started with the Englishman, John Graunt (1620 – 1674), who
 +
collected and studied the death records in various cities of Britain.
 +
He was fascinated by the patterns he found in the whole population.
 +
Much of current day statistical analysis is of quite recent
 +
development, the availability of cheap computing power acting as a
 +
catalyst for the development of appropriate ways of presenting and
 +
analysing data. In fact, the more advanced statistical analyses and
 +
tests are based on probability theory, developed over the past few
 +
centuries, but put into a more modern context by mathematical
 +
statisticians such as Karl Pearson (1857 – 1936), Sir Ronald
 +
Fisher (1890 – 1962) and Jerzy Neyman (1894 – 1981).
    
   
 
   
(i) What is data - choosing data to examine a hypothesis?
+
The curricular objectives for school level
 +
statistical work can be described as follows:
    
   
 
   
(ii) Collection and organisation of data – examples of organising
+
* To understand the meaning of data, the need for statistics, and how to collect, organise and represent data in different ways.
it in tally bars and a table.
+
* To develop the skills to represent and analyse data in tabular and graphical forms.
 
+
* To understand central tendency and compute its measures, namely the arithmetic mean, median and mode, for both grouped and ungrouped data, and to choose the measure that best represents the data.
 +
* To understand dispersion and determine measures of dispersion such as the range, quartile deviation, mean deviation and standard deviation.
 +
* To understand the limitations and drawbacks of statistics.
 
   
 
   
(iii) Pictograph- Need for scaling in pictographs
+
=== Descriptive and Inferential Statistics ===
interpretation &amp; construction.
  −
 
   
   
 
   
|-
+
<br>
|
+
<br>
7
      
   
 
   
|
+
When analysing data, for example, the marks
'''Data handling'''
+
achieved by 100 students for a piece of coursework, it is possible to
 +
use both descriptive and inferential statistics in your analysis of
 +
their marks. Typically, in most research conducted on groups of
 +
people, you will use both descriptive and inferential statistics to
 +
analyse your results and draw conclusions. So what are descriptive
 +
and inferential statistics? And what are their differences?
    
   
 
   
(i) Collection and organisation of data – choosing the data
+
==== Descriptive Statistics ====
to collect for testing a hypothesis.
  −
 
   
   
 
   
(ii) Mean, median and mode of ungrouped data – understanding
+
Descriptive statistics is the term given to the
what they represent.
+
analysis of data that helps describe, show or summarize data in a
 +
meaningful way such that, for example, patterns might emerge from the
 +
data. Descriptive statistics do not, however, allow us to make
 +
conclusions beyond the data we have analysed or reach conclusions
 +
regarding any hypotheses we might have made. They are simply a way to
 +
describe our data.
    
   
 
   
(iii) Constructing bar graphs
+
Descriptive statistics are very important because, if
 
+
we simply presented our raw data it would be hard to visualize what
 +
the data was showing, especially if there was a lot of it.
 +
Descriptive statistics therefore allow us to present the data in a
 +
more meaningful way which allows simpler interpretation of the data.
 +
For example, if we had the results of 100 pieces of students'
 +
coursework, we may be interested in the overall performance of those
 +
students. We would also be interested in the distribution or spread
 +
of the marks. Descriptive statistics allow us to do this. How to
 +
properly describe data through statistics and graphs is an important
 +
topic and is discussed in other Laerd Statistics Guides. Typically,
 +
there are two general types of statistic that are used to describe
 +
data:
 +
 
 
   
 
   
(iv) Feel of probability using data through experiments.
+
'''Measures of central tendency: '''these are
Notion of chance in events like tossing coins, dice etc.
+
ways of describing the central position of a frequency distribution
Tabulating and counting occurrences of 1 through 6 in a number of
+
for a group of data. In this case, the frequency distribution is
throws. Preparing the bar graph. Comparing the observation with
+
simply the distribution and pattern of marks scored by the 100
that for a coin. Observing strings of throws, notion of
+
students from the lowest to the highest. We can describe this central
Randomness of ungrouped data.
+
position using a number of statistics, including the mode, median,
 +
and mean.
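
As a rough illustration of these three measures, the sketch below computes them for a small, made-up list of marks. The list <code>marks</code> is hypothetical and not taken from any real class; the functions come from Python's standard <code>statistics</code> module.

```python
from statistics import mean, median, mode

# Hypothetical coursework marks (out of 100) for seven students.
marks = [45, 52, 65, 65, 70, 78, 80]

mean_mark = mean(marks)      # arithmetic mean: sum of marks / number of marks
median_mark = median(marks)  # middle value once the marks are sorted
mode_mark = mode(marks)      # most frequently occurring mark

print("mean:", mean_mark)
print("median:", median_mark)
print("mode:", mode_mark)
```

For this particular list the three measures coincide at 65; with skewed data they would generally differ, which is why the choice of measure matters.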
    
   
 
   
|-
+
'''Measures of spread:''' these are ways of
|
+
summarizing a group of data by describing how spread out the scores
9
+
are. For example, the mean score of our 100 students may be 65 out of
 +
100. However, not all students will have scored 65 marks. Rather,
 +
their scores will be spread out. Some will be lower and others
 +
higher. Measures of spread help us to summarize how spread out these
 +
scores are. To describe this spread, a number of statistics are
 +
available to us, including the range, quartiles, absolute deviation,
 +
variance and standard deviation.
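
The measures of spread listed above can be sketched in Python as follows. The marks are invented for illustration, and the whole list is treated as a population (so the variance divides by the number of marks). Quartile conventions differ between textbooks; the values here follow the default method of <code>statistics.quantiles</code>, which is only one of several in use.

```python
from statistics import mean, pstdev, pvariance, quantiles

# Hypothetical coursework marks (out of 100) for seven students.
marks = [45, 52, 65, 65, 70, 78, 80]

# Range: distance between the highest and lowest mark.
mark_range = max(marks) - min(marks)

# Quartiles: three cut points splitting the sorted marks into four parts.
quartiles = quantiles(marks, n=4)

# Mean absolute deviation: average distance of each mark from the mean.
centre = mean(marks)
mean_abs_dev = sum(abs(m - centre) for m in marks) / len(marks)

# Population variance and standard deviation (divide by N, not N - 1).
variance = pvariance(marks)
std_dev = pstdev(marks)

print("range:", mark_range)
print("quartiles:", quartiles)
print("mean absolute deviation:", round(mean_abs_dev, 2))
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
```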
    
   
 
   
|
+
When we use descriptive statistics it is useful to
'''Statistics'''
+
summarize our group of data using a combination of tabulated
 +
description (i.e. tables), graphical description (i.e. graphs and
 +
charts) and statistical commentary (i.e. a discussion of the
 +
results).
    
   
 
   
Mean, median, mode of grouped and
+
==== Inferential Statistics ====
un-grouped data, a review; range, quartile deviation and mean
  −
deviation for a given grouped and un-grouped data; graphical
  −
representation; construction and interpretation of histograms of
  −
varying width, ogives and frequency polygons; review of random
  −
experiments leading to the concept of chance or probability.
  −
 
   
   
 
   
|-
+
We have seen that descriptive statistics provide
|
+
information about our immediate group of data. For example, we could
10
+
calculate the mean and standard deviation of the exam marks for the
 +
100 students and this could provide valuable information about this
 +
group of 100 students. Any group of data like this, that includes all
 +
the data you are interested in, is called a population. A population
 +
can be small or large, as long as it includes all the data you are
 +
interested in. For example, if you were only interested in the exam
 +
marks of 100 students, then the 100 students would represent your
 +
population. Descriptive statistics are applied to populations and the
 +
properties of populations, like the mean or standard deviation, are
 +
called parameters as they represent the whole population (i.e.
 +
everybody you are interested in).
    
   
 
   
|
+
Often, however, you do not have access to the
'''Statistics'''
+
whole population you are interested in investigating but only have a
 +
limited amount of data instead. For example, you might be interested
 +
in the exam marks of all students in the UK. It is not feasible to
 +
measure all exam marks of all students in the whole of the UK so you
 +
have to measure a smaller sample of students, for example, 100
 +
students, that are used to represent the larger population of all UK
 +
students. Properties of samples, such as the mean or standard
 +
deviation, are not called parameters but statistics. Inferential
 +
statistics are techniques that allow us to use these samples to make
 +
generalizations about the populations from which the samples were
 +
drawn. It is, therefore, important that the sample accurately represents
 +
the population. The process of achieving this is called sampling.
 +
Inferential statistics arise out of the fact that sampling naturally
 +
incurs sampling error and thus a sample is not expected to perfectly
 +
represent the population. The methods of inferential statistics are
 +
(1) the estimation of parameter(s) and (2) testing of statistical
 +
hypotheses.
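
The difference between a parameter and a statistic can be sketched with simulated data. Everything in the sketch below is an assumption made for illustration: the population of 10,000 marks is generated at random rather than measured, and the standard error (sample standard deviation divided by the square root of the sample size) is used as one common way to express sampling error.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam marks of 10,000 students (invented values).
population = [rng.gauss(65, 12) for _ in range(10_000)]
population_mean = mean(population)  # a parameter of the population

# In practice only a sample is available, here 100 students chosen at random.
sample = rng.sample(population, 100)
sample_mean = mean(sample)  # a statistic, used to estimate the parameter

# Standard error of the sample mean: sample standard deviation / sqrt(n).
standard_error = stdev(sample) / len(sample) ** 0.5

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"standard error of the mean:  {standard_error:.2f}")
```

The sample mean will usually be close to, but not equal to, the population mean; that gap is the sampling error the paragraph above refers to.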
    
   
 
   
Standard deviation of grouped and
+
= Mind Map =
un-grouped data; calculation of standard deviation by direct
  −
method; coefficient of variation; construction and interpretation
  −
of pie charts
  −
 
   
   
 
   
|}
+
<br>
 
  −
 
      
   
 
   
 
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m14464871.jpg]]<br>
 
      
   
 
   
= Curricular Objectives =
+
= Data Handling =
 
   
 
   
# To understand the meaning of data. The need for statistics and how to collect, organise and represent data in different ways.
+
== Introduction ==
# Skills to represent and analyse data in tabular and graphical forms.
  −
# Understanding central tendency and computation of the measures of central tendency, namely arithmetic mean, median and mode, for both grouped and ungrouped data. Have the ability to use the appropriate central tendency depending on the data.
  −
# Understanding dispersion and determining the measures of dispersion such as range, quartile deviation, mean deviation and standard deviation.
  −
# Understand the limitations and drawbacks of statistics
   
   
 
   
= Concept Map =
+
Data is a
+
collection of facts, such as values or measurements. It can be
[[Image:Statistics_html_m14464871.jpg]]
+
numbers, words, measurements, observations or even just descriptions
 
+
of things. Statistical work is done for problem solving. For problem
 +
solving, we first have to understand the problem (postulating
 +
hypotheses), then we have to collect relevant data, after which we
 +
must be able to present the data, and finally analyse the data and draw
 +
conclusions related to the original hypotheses. Statistics provides
 +
us with tools to analyse data and draw conclusions from a large set
 +
of data by organising the data in the set in different ways and
 +
analysing the data by observing patterns. Data handling would
 +
include identifying data, collecting data, organising/representing
 +
data and summarising data.
    
   
 
   
 
+
== Objective ==
 
   
   
 
   
= Theme Plan =
+
* To understand what statistical work is, and why and where we would need to use it.
                                                     
+
* To understand different types of data: qualitative and quantitative
{| border="1"
+
* To understand the sources of data: primary and secondary
|-
+
* To learn how to collect, classify and display data; data is information that is used in any process connected with statistics.
|
+
 
+
== Data ==
 
   
   
 
   
|
+
The term data refers to qualitative or
THEME PLAN FOR THE TOPIC
+
quantitative attributes of a variable or set of variables. Data refers
STATISTICS
+
to the pieces of information that have been observed and recorded,
 +
from an experiment or a survey. There are two types of data: primary
 +
and secondary. The word “data” is the plural of the word “datum”,
 +
and therefore one should say, “the data are” and not “the data
 +
is”. Data can be classified as primary or secondary, and primary or
 +
secondary data can be classified as qualitative or quantitative.
    
   
 
   
|
+
The figure below summarises the classifications of
 +
data. Primary data describes the original data that have been
 +
collected. This type of data is also known as raw data. Often the
 +
primary data set is very large and is therefore summarised or
 +
processed to extract meaningful information. Qualitative data is
 +
information that cannot be written as numbers, for example, if you
 +
were collecting data from people on how they feel or what their
 +
favourite colour is. Quantitative data is information that can be
 +
written as numbers, for example, if you were collecting data from
 +
people on their height or weight.
    +
 
 +
<br>
    
   
 
   
|
+
<br>
 
      
   
 
   
|-
+
<br>
|
  −
'''CLASS'''
      
   
 
   
|
+
<br>
'''SUBTOPIC'''
      
   
 
   
|
+
<br>
'''CONCEPT
  −
DEVELOPMENT'''
      
   
 
   
|
+
<br>
'''KNOWLEDGE'''
      
   
 
   
|
+
<br>
'''SKILL'''
      
   
 
   
|
+
<br>
'''ACTIVITY'''
      
   
 
   
|-
+
Secondary data is
|
+
primary data that has been summarised or processed, for example, the
6
+
set
    
   
 
   
|
+
of colours that people
Data reading, comprehension and data
+
gave as favourite colours would be secondary data because it is a
collection
      
   
 
   
|
+
summary of responses.
Data is a
+
Data already collected prior to our use is secondary data. Primary data
collection of facts, such as values or measurements. It can be
+
is what we collect as a part of our study. All processed data
numbers, words, measurements, observations or even just
+
therefore is also secondary.
descriptions of things.
      
   
 
   
 +
<br>
    +
 +
Transforming primary
 +
data into secondary data through analysis, grouping or organisation
 +
is the process of generating information.
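
A minimal sketch of this transformation, using the favourite-colour example: the individual responses are the primary data, and the frequency table built from them is the secondary data. The list of responses is hypothetical.

```python
from collections import Counter

# Hypothetical primary (raw) data: one favourite colour per person surveyed.
responses = ["blue", "red", "blue", "green", "red", "blue"]

# Secondary data: the same responses summarised as a frequency table.
summary = Counter(responses)

# Print the colours from most to least frequently chosen.
for colour, count in summary.most_common():
    print(colour, count)
```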
    
   
 
   
The need for statistics – to be
+
<br>
able to analyse and draw conclusions for a large set of data by
  −
organising data in different ways and observing patterns.
      
   
 
   
|
+
=== Purpose of Collecting Primary Data ===
Data and patterns of data. Raw
+
Scores Class Intervals, Tally Marks and usage. Frequency in
+
Data is collected to
statistics.
+
provide answers that help with understanding a particular situation.
 +
Here are examples to illustrate some real-world data collection
 +
scenarios in the categories of qualitative and quantitative data.
    
   
 
   
|
+
=== Qualitative Data ===
Identification of patterns and
  −
different methods of collection of data and collation of data
  −
 
   
   
 
   
|
+
* The local government might want to know how many residents have electricity and might ask the question: “Does your home have a safe supply of electricity?”
ACTIVITY1
+
* A supermarket manager might ask the question: “What flavours of soft drink should be stocked in my supermarket?” The question asked of customers might be “What is your favourite soft drink?” Based on the customers’ responses, the manager can make an informed decision as to what soft drinks to stock.
 
+
* A company manufacturing medicines might ask “How effective is our pill at relieving a headache?” The question asked of people using the pill for a headache might be: “Does taking the pill relieve your headache?” Based on responses, the company learns how effective their product is.
 +
* A motor car company might want to improve their customer service, and might ask their customers: “How can we improve our customer service?”
 +
* A teacher may ask “How many hours do students spend watching TV?” to get an idea of what children are learning from TV at home and how it supplements (or affects) the learning in the school.
 
   
 
   
|-
+
<br>
|
  −
7
      
   
 
   
|
+
=== Quantitative Data ===
Graphical representation of Data
  −
 
   
   
 
   
|
+
* A cell phone manufacturing company might collect data about how often people buy new cell phones and what factors affect their choice, so that the cell phone company can focus on those features that would make their product more attractive to buyers.
Tabular
+
* A town councillor might want to know how many accidents have occurred at a particular intersection, to decide whether a robot (traffic light) should be installed. The councillor would visit the local police station to research their records to collect the appropriate data.
data can be also represented in the form of a picture ( charts)
+
* A supermarket manager might ask the question: “What flavours of soft drink should be stocked in my supermarket?” The question asked of customers might be “What is your favourite soft drink?” Based on the customers’ responses, the manager can make an informed decision as to what soft drinks to stock.
as visual representations can sometimes be easier to interpret.
+
* What kind of TV programs are watched by students, how many are educational in nature.
 
   
   
 
   
 
+
However, it is important to note that
 +
different questions reveal different features of a situation, and
 +
that this affects the ability to understand the situation. For
 +
example, if the question in the list “What kind of TV programs are
watched by students, how many are educational in nature” was
re-phrased as “Do your children watch educational programs on TV?”
and you answered yes, but most programs being watched were of
entertainment value, then this could give the wrong impression that
TV was being used as an educational tool in your home.
    
   
 
   
There are
+
== Data Collection ==
different types of pictorial representations that can be used to
  −
represent different type of data.
  −
 
   
   
 
   
 
+
The method of
 +
collecting the data must be appropriate to the question being asked.
 +
Some
    
   
 
   
Looking at the data be able to
+
examples of data
select the chart that would clearly represent the data as well as
+
collecting methods are:
convey intended information about the data
      +
 
 +
# Experiments
 +
# Questionnaires, surveys, focus group discussions and interviews
 +
# Other sources (friends, family, newspapers, books, magazines and now increasingly the Internet)
 +
# Observation
 +
# Specialised equipment (rainwater gauges to measure rainfall in a place, various kinds of medical equipment that collect information about different biological processes)
 
   
 
   
|
+
<br>
Frequency Distribution, Class
  −
intervals, Bar Chart, Pie Chart , Histogram
      
   
 
   
|
+
The most important
Given statistical data in a table
+
aspect of each method of data collecting is to clearly formulate the
format, develop the skills to select the appropriate chart.
+
question that is to be answered. The details of the data collection
Represent the data as a chart and be able to interpret data
+
should therefore be structured to take your question into account.
given a chart.
+
 
 +
 +
<br>
    
   
 
   
|
+
You must have observed
ACTIVITY 2
+
your teacher recording the attendance of students in your class
 +
every day, or recording marks obtained by you after every test or
 +
examination. Similarly, you must have also seen a cricket score
 +
board. One such scoreboard is illustrated here:
    
   
 
   
|-
+
<br>
|
  −
9
      
   
 
   
|
+
NatWest One Day
Central tendency
+
International Series: England v India<br>
 +
Friday, 16 September 2011 at
 +
The Swalec Stadium
    
   
 
   
 +
'''England beat India
 +
by 6 wickets (D/L). '''England won the toss and decided to field
 +
 +
       
 +
{| border="1"
 +
|-
 
|  
 
|  
A measure
+
[[India Innings]]
of central tendency is a single value that attempts to describe a
  −
set of data by identifying the central position within that set of
  −
data.
      
   
 
   
The mean, median and mode are all
+
304 for 6 (50.0 overs)
valid measures of central tendency but, under different
  −
conditions, some measures of central tendency become more
  −
appropriate to use than others.
      
   
 
   
 +
|-
 
|  
 
|  
Mean, Median and Mode as methods of
+
[[England Innings]]
calculating central tendency
      
   
 
   
|
+
241 for 4 (32.2 overs)
Calculation of mean and median
  −
Analyse data and make conclusions
      
   
 
   
|  
+
|}
ACTIVITY 3
+
<br>
    
   
 
   
 +
'''India
 +
1st Innings - Close'''
 +
 +
                                                                                                     
 +
{| border="1"
 
|-
 
|-
 
|  
 
|  
9 &amp; 10
+
<br>
    
   
 
   
 
|  
 
|  
Dispersion
+
<br>
    
   
 
   
 
|  
 
|  
A measure
+
Runs
of dispersion is a measure of spread, used to describe the
  −
variability in a sample or population.
      
   
 
   
 +
|
 +
Balls
    +
 +
|
 +
4s
    
   
 
   
It is
+
|
usually used in conjunction with a measure of central tendency,
+
6s
such as, the mean or median, to provide an overall description of
  −
a set of data.
      
   
 
   
 
+
|-
 +
|
 +
P Patel
    
   
 
   
It is important to measure the spread
+
|
of data because we can understand its relationship with measures
+
c Bresnan
of central tendency to make more accurate interpretation of data.
      
   
 
   
 
|  
 
|  
Range, Quartile, Standard Deviation
+
b Swann
, Cumulative Frequency
      
   
 
   
 
|  
 
|  
Calculation of Co-efficient of
+
'''19'''
Variation. Meaning and interpretation of C.V. Analyse data and
  −
make conclusions
      
   
 
   
 
|  
 
|  
 
+
39
    
   
 
   
|}
+
|  
 
+
0
    
   
 
   
 
+
|
 +
0
    
   
 
   
= Statistics =
+
|-
+
|
'''Statistics'''
+
Rahane
is the study of the collection, organization, and interpretation of
  −
data. It deals with all aspects of this, including the planning of
  −
data collection in terms of the design of surveys and experiments.
  −
(source [[http://en.wikipedia.org/wiki/Statistics]])
      
   
 
   
 
+
|
 +
c Finn
    
   
 
   
''Statistics
+
|
is a set of tools used to organize and analyze data. Data must either
+
b Dernbach
be numeric in origin or transformed by researchers into numbers. For
  −
instance, statistics could be used to analyze percentage scores
  −
English students receive on a grammar test: the percentage scores
  −
ranging from 0 to 100 are already in numeric form. Statistics could
  −
also be used to analyze grades on an essay by assigning numeric
  −
values to the letter grades, e.g., A=4, B=3, C=2, D=1, and F=0.
  −
Though this is not strictly necessary, since statistical computations can
  −
be done on a set of textual data as in this case, translating them
  −
into numeric data has been a convention in the past.''
      
   
 
   
 
+
|
 
+
'''26'''
    
   
 
   
A statistician is someone who is
+
|
particularly well versed in the ways of thinking necessary for the
+
47
successful application of statistical analysis. Such people have
  −
often gained this experience through working in any of a wide number
  −
of fields. There is also a discipline called ''mathematical
  −
statistics'', which is concerned with the theoretical basis of the
  −
subject.
      
   
 
   
The word ''statistics'', when
+
|
referring to the scientific discipline, is singular, as in
+
3
&quot;Statistics is an art.&quot; This should not be confused with the
  −
word ''statistic'', referring to a quantity (such as mean or
  −
median) calculated from a set of data, whose plural is ''statistics''
  −
(&quot;this statistic seems wrong&quot; or &quot;these statistics are
  −
misleading&quot;). Source - http://en.wikipedia.org/wiki/Statistics
     −
 
+
Statistics is concerned with the planning of
+
|
studies, especially with the design of randomized experiments and
+
0
with the planning of surveys using random sampling.
      
   
 
   
Of course, the data from a randomized study can be
+
|-
analyzed to consider secondary hypotheses or to suggest new ideas. A
+
|
secondary analysis of the data from a planned study uses tools from
+
Dravid
data analysis.
      
   
 
   
Data analysis is divided into:
+
|
 +
<br>
    
   
 
   
* descriptive statistics - the part of statistics that describes data, i.e. summarises the data and their typical properties.
+
|
 +
b Swann
 +
 
 
   
 
   
* inferential statistics - the part of statistics that draws conclusions from data (using some model for the data). For example, inferential statistics involves selecting a model for the data, checking whether the data fulfil the conditions of a particular model, and quantifying the uncertainty involved (e.g. using confidence intervals).
+
|
 +
'''69'''
 +
 
 
   
 
   
 +
|
 +
79
    +
 +
|
 +
4
    +
 +
|
 +
0
    
   
 
   
While the tools of data analysis work best on data
+
|-
from randomized studies, they are also applied to other kinds of data
+
|
--- for example, from natural experiments and observational studies,
+
Kohli
in which case the inference is dependent on the model chosen by the
  −
statistician, and so subjective.
      
   
 
   
Mathematical statistics has been inspired by and
+
|
has extended many procedures in applied statistics.
+
hit wicket
    
   
 
   
== Descriptive and Inferential Statistics ==
+
|
 +
b Swann
 +
 
 
   
 
   
 +
|
 +
'''107'''
    +
 +
|
 +
93
    +
 +
|
 +
9
    
   
 
   
When analysing data, for example, the marks
+
|
achieved by 100 students for a piece of coursework, it is possible to
+
1
use both descriptive and inferential statistics in your analysis of
  −
their marks. Typically, in most research conducted on groups of
  −
people, you will use both descriptive and inferential statistics to
  −
analyse your results and draw conclusions. So what are descriptive
  −
and inferential statistics? And what are their differences?
      
   
 
   
 +
|-
 +
|
 +
Raina
    +
 +
|
 +
c Bresnan
    +
 +
|
 +
b Finn
    
   
 
   
=== Descriptive Statistics ===
+
|
 +
'''15'''
 +
 
 
   
 
   
 +
|
 +
15
   −
 
+
 +
|
 +
0
    
   
 
   
Descriptive statistics is the term given to the
+
|
analysis of data that helps describe, show or summarize data in a
+
1
meaningful way such that, for example, patterns might emerge from the
  −
data. Descriptive statistics do not, however, allow us to make
  −
conclusions beyond the data we have analysed or reach conclusions
  −
regarding any hypotheses we might have made. They are simply a way to
  −
describe our data.
      
   
 
   
 +
|-
 +
|
 +
Dhoni
   −
 
+
 +
|
 +
not out
    
   
 
   
Descriptive statistics are very important, as if
+
|
we simply presented our raw data it would be hard to visualize what
+
<br>
the data was showing, especially if there was a lot of it.
  −
Descriptive statistics therefore allow us to present the data in a
  −
more meaningful way which allows simpler interpretation of the data.
  −
For example, if we had the results of 100 pieces of students'
  −
coursework, we may be interested in the overall performance of those
  −
students. We would also be interested in the distribution or spread
  −
of the marks. Descriptive statistics allow us to do this. How to
  −
properly describe data through statistics and graphs is an important
  −
topic and is discussed in other Laerd Statistics Guides. Typically,
  −
there are two general types of statistic that are used to describe
  −
data:
      
   
 
   
 +
|
 +
'''50'''
    +
 +
|
 +
26
    +
 +
|
 +
5
    
   
 
   
'''Measures of central tendency: '''these are
+
|
ways of describing the central position of a frequency distribution
+
2
for a group of data. In this case, the frequency distribution is
  −
simply the distribution and pattern of marks scored by the 100
  −
students from the lowest to the highest. We can describe this central
  −
position using a number of statistics, including the mode, median,
  −
and mean.
      
   
 
   
 +
|-
 +
|
 +
Jadeja
   −
 
+
 +
|
 +
c Bopara
    
   
 
   
'''Measures of spread:''' these are ways of
+
|
summarizing a group of data by describing how spread out the scores
+
b Dernbach
are. For example, the mean score of our 100 students may be 65 out of
  −
100. However, not all students will have scored 65 marks. Rather,
  −
their scores will be spread out. Some will be lower and others
  −
higher. Measures of spread help us to summarize how spread out these
  −
scores are. To describe this spread, a number of statistics are
  −
available to us, including the range, quartiles, absolute deviation,
  −
variance and standard deviation.
      
   
 
   
 +
|
 +
'''0'''
    +
 +
|
 +
1
    +
 +
|
 +
0
    
   
 
   
When we use descriptive statistics it is useful to
+
|
summarize our group of data using a combination of tabulated
+
0
description (i.e. tables), graphical description (i.e. graphs and
  −
charts) and statistical commentary (i.e. a discussion of the
  −
results).
      
   
 
   
 +
|-
 +
|
 +
Ashwin
    +
 +
|
 +
not out
    +
 +
|
 +
<br>
    
   
 
   
== Inferential Statistics ==
+
|
 +
'''0'''
 +
 
 
   
 
   
 +
|
 +
0
   −
 
+
 +
|
 +
0
    
   
 
   
We have seen that descriptive statistics provide
+
|
information about our immediate group of data. For example, we could
+
0
calculate the mean and standard deviation of the exam marks for the
  −
100 students and this could provide valuable information about this
  −
group of 100 students. Any group of data like this, that includes all
  −
the data you are interested in, is called a population. A population
  −
can be small or large, as long as it includes all the data you are
  −
interested in. For example, if you were only interested in the exam
  −
marks of 100 students, then the 100 students would represent your
  −
population. Descriptive statistics are applied to populations and the
  −
properties of populations, like the mean or standard deviation, are
  −
called parameters as they represent the whole population (i.e.
  −
everybody you are interested in).
      
   
 
   
 +
|-
 +
|
 +
'''Extras'''
    +
 +
|
 +
<br>
    +
 +
|
 +
6w 1b 11lb
    
   
 
   
Often, however, you do not have access to the
+
|
whole population you are interested in investigating but only have a
+
'''18'''
limited amount of data instead. For example, you might be interested
  −
in the exam marks of all students in the UK. It is not feasible to
  −
measure all exam marks of all students in the whole of the UK so you
  −
have to measure a smaller sample of students, for example, 100
  −
students, that are used to represent the larger population of all UK
  −
students. Properties of samples, such as the mean or standard
  −
deviation, are not called parameters but statistics. Inferential
  −
statistics are techniques that allow us to use these samples to make
  −
generalizations about the populations from which the samples were
  −
drawn. It is, therefore, important that the sample accurately represents
  −
the population. The process of achieving this is called sampling.
Inferential statistics arise out of the fact that sampling
  −
naturally incurs sampling error and thus a sample is not expected to
  −
perfectly represent the population. The methods of inferential
  −
statistics are (1) the estimation of parameter(s) and (2) testing of
  −
statistical hypotheses.
      
   
 
   
= Data handling =
+
|
 +
<br>
 +
 
 
   
 
   
The term data refers to qualitative or
+
|-
quantitative attributes of a variable or set of variables. Data refers
+
|
to the pieces of information that have been observed and recorded,
+
'''Total'''
from an experiment or a survey. There are two types of data: primary
  −
and secondary. The word ”data” is the plural of the word ”datum”,
  −
and therefore one should say, ”the data are” and not ”the data
  −
is”. Data can be classified as primary or secondary, and primary or
  −
secondary data can be classified as qualitative or quantitative.
      
   
 
   
Figure 16.1 summarises the classifications of
+
|
data. Primary data describes the original data that have been
+
<br>
collected. This type of data is also known as raw data. Often the
+
 
primary data set is very large and is therefore summarised or
  −
processed to extract meaningful information. Qualitative data is
  −
information that cannot be written as numbers, for example, if you
  −
were collecting data from people on how they feel or what their
  −
favourite colour is. Quantitative data is information that can be
  −
written as numbers, for example, if you were collecting data from
  −
people on their height or weight.
   
   
 
   
Secondary data is primary data that has been summarised or processed, for example, the
+
|
set of colours that people gave as favourite colours would be secondary data because it is a
+
for 6
summary of responses.
     −
Data already collected prior to our use is secondary data. Primary data
+
is what we collect as a part of our study. All processed data
+
|
therefore is also secondary.
+
'''304'''
    
   
 
   
 
+
|
Transforming primary
+
'''(50.0 ovs)'''
data into secondary data through analysis, grouping or organisation
  −
is the process of generating information.
      
   
 
   
== Purpose of Collecting Primary Data ==
+
|}
+
<br>
Data is collected to
  −
provide answers that help with understanding a particular situation.
  −
Here are examples to illustrate some real-world data collection
  −
scenarios in the categories of qualitative and quantitative data.
     −
+
     
== Qualitative Data ==
+
{| border="1"
+
|-
• The local
+
|                                                       
government might want to know how many residents have electricity and
+
{| border="1"
might
+
|-
 +
|
 +
Bowler
    
   
 
   
ask the question: ”Does
+
|
your home have a safe supply of electricity?”
+
Overs
    
   
 
   
• A supermarket
+
|
manager might ask the question: “What flavours of soft drink should
+
Maidens
be
      
   
 
   
stocked in my
+
|
supermarket?” The question asked of customers might be “What is
+
Runs
your
      
   
 
   
favourite soft drink?”
+
|
Based on the customers’ responses, the manager can make an
+
Wickets
    
   
 
   
informed decision as to
+
|-
what soft drinks to stock.
+
|
 +
Bresnan
    
   
 
   
• A company
+
|
manufacturing medicines might ask “How effective is our pill at
+
9.0
relieving a
      
   
 
   
headache?” The
+
|
question asked of people using the pill for a headache might be:
+
0
“Does
      
   
 
   
taking the pill relieve
+
|
your headache?” Based on responses, the company learns how
+
62
    
   
 
   
effective their product
+
|
is.
+
0
    
   
 
   
• A motor car company
+
|-
might want to improve their customer service, and might ask their
+
|
 +
Finn
    
   
 
   
customers: “How can
+
|
we improve our customer service?”
+
10.0
    
   
 
   
A teacher may ask “How
+
|
many hours of TV do students watch?” to get an idea of what children
+
1
are learning from TV at home and how it supplements (or affects) the
  −
learning in the school
      
   
 
   
 
+
|
 +
44
    
   
 
   
== Quantitative Data ==
+
|
+
1
• A cell phone manufacturing company might
  −
collect data about how often people buy new
      
   
 
   
cell phones and what factors affect their choice,
+
|-
so that the cell phone company can focus
+
|
 +
Dernbach
    
   
 
   
on those features that would make their product
+
|
more attractive to buyers.
+
10.0
    
   
 
   
• A town councillor might want to know how many
+
|
accidents have occurred at a particular
+
0
    
   
 
   
intersection, to decide whether a robot should be
+
|
installed. The councillor would visit the
+
73
    
   
 
   
local police station to research their records to
+
|
collect the appropriate data.
+
2
    
   
 
   
• A supermarket manager might ask the question:
+
|-
“What flavours of soft drink should be
+
|
 +
Swann
    
   
 
   
stocked in my supermarket?” The question asked
+
|
of customers might be “What is your
+
9.0
    
   
 
   
favourite soft drink?” Based on the customers’
+
|
responses, the manager can make an
+
0
    
   
 
   
informed decision as to what soft drinks to stock.
+
|
 +
34
    
   
 
   
What kind of TV programs are watched by students,
+
|
how many are educational in nature.
+
3
    
   
 
   
 
+
|-
 
+
|
 +
S Patel
    
   
 
   
 
+
|
 
+
8.0
    
   
 
   
However, it is important to note that different
+
|
questions reveal different features of a situation, and that this
+
0
affects the ability to understand the situation. For example, if the
  −
question in the list, “What kind of TV programs are watched by
  −
students, and how many are educational in nature?”, was re-phrased as
  −
“Do your children watch educational programs on TV?” and you answered
  −
yes, but most programs being watched were of entertainment value,
  −
then this could give the wrong impression that TV was being used as
  −
an educational tool in your home.
     −
= Methods of Data Collection =
   
   
 
   
The method of
+
|
collecting the data must be appropriate to the question being asked.
+
55
Some
      
   
 
   
examples of data
+
|
collecting methods are:
+
0
   −
 
  −
# Experiments
  −
# Questionnaires, surveys, focus group discussions and interviews
  −
# Other sources (friends, family, newspapers, books, magazines and now increasingly the Internet)
  −
# Observation
  −
# Specialised equipment (rain gauges to measure rainfall in a place, various medical instruments that collect information about different biological processes)
   
   
 
   
 +
|-
 +
|
 +
Bopara
    +
 +
|
 +
4.0
    
   
 
   
The most important
+
|
aspect of each method of data collecting is to clearly formulate the
+
0
question that is to be answered. The details of the data collection
  −
should therefore be structured to take your question into account.
      
   
 
   
 +
|
 +
24
    +
 +
|
 +
0
    
   
 
   
You must have observed
+
|}
your teacher recording the attendance of students in your class
+
<br>
everyday, or recording marks obtained by you after every test or
  −
examination. Similarly, you must have also seen a cricket score
  −
board. One such scoreboard is illustrated here:
      
   
 
   
 
+
|}
 +
<br>
    
   
 
   
 +
== Recording Data ==
 +
 +
Let us take an example of a class which is
 +
preparing to go for a picnic. The teacher asked the students to give
 +
their choice of fruits out of banana, apple, orange or guava. Uma is
 +
asked to prepare the list. She prepared a list of all the children
 +
and wrote the choice of fruit against each name. This list would help
 +
the teacher to distribute fruits according to the choice.
 +
 +
 +
<br>
 +
<br>
   −
 
+
         
     
   
{| border="1"
 
{| border="1"
 
|-
 
|-
 
|  
 
|  
|}
+
Raghav — Banana
    +
 +
Preeti — Apple
    
   
 
   
NatWest One Day
+
Amar — Guava
International Series: England v India
  −
Friday, 16 September 2011 at
  −
The Swalec Stadium
      
   
 
   
'''England beat India
+
Fatima — Orange
by 6 wickets (D/L). '''England won the toss and decided to field
  −
 
  −
       
  −
{| border="1"
  −
|-
  −
|
  −
[[India Innings]]
      
   
 
   
304 for 6 (50.0 overs)
+
Amita — Apple
    
   
 
   
|-
+
Raman — Banana
|
  −
[[England Innings]]
      
   
 
   
241 for 4 (32.2 overs)
+
Radha — Orange
    
   
 
   
|}
+
Farida — Guava
 
      
   
 
   
'''India
+
Anuradha — Banana
1st Innings - Close'''
     −
                                                                                                     
+
   
{| border="1"
+
Rati — Banana
|-
  −
  −
|  
  −
|
  −
Runs
      
   
 
   
 
|  
 
|  
Balls
+
Bhawana — Apple
    
   
 
   
|
+
Manoj — Banana
4s
      
   
 
   
|
+
Donald — Apple
6s
      
   
 
   
|-
+
Maria — Banana
|
  −
P Patel
      
   
 
   
|
+
Uma — Orange
c Bresnan
      
   
 
   
|
+
Akhtar — Guava
b Swann
      
   
 
   
|
+
Ritu — Apple
'''19'''
      
   
 
   
|
+
Salma — Banana
39
      
   
 
   
|
+
Kavita — Guava
0
      
   
 
   
|
+
Javed — Banana
0
      
   
 
   
|-
+
|
|
+
<br>
Rahane
+
<br>
    
   
 
   
|
+
<br>
c Finn
+
<br>
    
   
 
   
|  
+
Example 1 : A teacher
b Dernbach
+
wants to know the choice of food of each student as part of the
 +
mid-day meal programme. The teacher assigns the task of collecting
 +
this information to Maria. Maria does so using a paper and a pencil.
 +
After arranging the choices in a column, she puts against a choice of
 +
food one ( | ) mark for every student making that choice.
   −
+
           
 +
{| border="1"
 +
|-
 
|  
 
|  
'''26'''
+
Choice
    
   
 
   
 
|  
 
|  
47
+
Number of students
    
   
 
   
 +
|-
 
|  
 
|  
3
+
Rice only
    
   
 
   
|
+
Chapati only
0
      
   
 
   
|-
+
Both rice and chapati
|
  −
Dravid
      
   
 
   
   
|  
 
|  
b Swann
+
|||||||||||||||||
    
   
 
   
|  
+
|||||||||||||
'''69'''
      
   
 
   
|  
+
||||||||||||||||||||
79
      
   
 
   
|  
+
|}
4
+
<br>
 +
<br>
    
   
 
   
|  
+
Umesh, after seeing the
0
+
table suggested a better method to count the students. He asked
 +
Maria to organise the marks ( | ) in a group of ten as shown below :
   −
+
               
 +
{| border="1"
 
|-
 
|-
 
|  
 
|  
Kohli
+
Choice
    
   
 
   
 
|  
 
|  
hit wicket
+
Tally marks
    
   
 
   
 
|  
 
|  
b Swann
+
Number of students
    
   
 
   
 +
|-
 
|  
 
|  
'''107'''
+
Rice only
    
   
 
   
|
+
Chapati only
93
      
   
 
   
|
+
Both rice and chapati
9
      
   
 
   
 
|  
 
|  
1
+
|||||||||| |||||||
    
   
 
   
|-
+
|||||||||| |||
|  
  −
Raina
      
   
 
   
|  
+
|||||||||| ||||||||||
c Bresnan
      
   
 
   
 
|  
 
|  
b Finn
+
17
    
   
 
   
|
+
13
'''15'''
      
   
 
   
|
+
20
15
      
   
 
   
|  
+
|
0
+
<br>
    
   
 
   
|
+
Rajan made it simpler
1
+
by asking her to make groups of five instead of ten, as
    
   
 
   
 +
shown below :
 +
 +
                 
 +
{| border="1"
 
|-
 
|-
 
|  
 
|  
Dhoni
+
Choice
    
   
 
   
 
|  
 
|  
not out
+
Tally marks
    
   
 
   
   
|  
 
|  
'''50'''
+
Number of students
    
   
 
   
 +
|-
 
|  
 
|  
26
+
Rice only
    
   
 
   
|
+
Chapati only
5
      
   
 
   
|
+
Both rice and chapati
2
      
   
 
   
|-
   
|  
 
|  
Jadeja
+
||||| |||||
 +
||||| ||
    
   
 
   
|  
+
||||| |||||  
c Bopara
+
|||
    
   
 
   
|  
+
||||| ||||| ||||| |||||
b Dernbach
      
   
 
   
 
|  
 
|  
'''0'''
+
17
    
   
 
   
|
+
13
1
      
   
 
   
|
+
20
0
      
   
 
   
|  
+
|
0
+
<br>
    
   
 
   
|-
+
<br>
|
  −
Ashwin
      
   
 
   
|
+
=== Meaning of Frequency ===
not out
+
 +
Frequency means the number of times a particular value
 +
occurs in a data set. It is not easy to answer the
 +
question looking at the choices written haphazardly. We arrange the
 +
data in Table below using tally marks.
    
   
 
   
+
<br>
|
  −
'''0'''
     −
+
                             
 +
{| border="1"
 +
|-
 
|  
 
|  
0
+
Subject
    
   
 
   
 
|  
 
|  
0
+
Tally Marks
    
   
 
   
 
|  
 
|  
0
+
Number of Students
    
   
 
   
 
|-
 
|-
 
|  
 
|  
'''Extras'''
+
Art
    
   
 
   
   
|  
 
|  
6w 1b 11lb
+
|||| ||
    
   
 
   
 
|  
 
|  
'''18'''
+
7
    
   
 
   
   
|-
 
|-
 
|  
 
|  
'''Total'''
+
Mathematics
    
   
 
   
   
|  
 
|  
for 6
+
||||
    
   
 
   
 
|  
 
|  
'''304'''
+
5
    
   
 
   
 +
|-
 
|  
 
|  
'''(50.0 ovs)'''
+
Science
    
   
 
   
|}
  −
  −
  −
       
  −
{| border="1"
  −
|-
  −
|                                                       
  −
{| border="1"
  −
|-
   
|  
 
|  
Bowler
+
|||||
    
   
 
   
 
|  
 
|  
Overs
+
6
    
   
 
   
 +
|-
 
|  
 
|  
Maidens
+
English
    
   
 
   
 
|  
 
|  
Runs
+
||||
    
   
 
   
 
|  
 
|  
Wickets
+
4
    
   
 
   
|-
+
|}
|
+
<br>
Bresnan
      
   
 
   
|
+
The number of tallies
9.0
+
before each subject gives the number of students who like that
 +
particular subject. This is known as the frequency of that subject.
 +
Frequency gives the number of times that a particular entry occurs.
 +
From the table above, the frequency of students who like English is 4.
 +
The frequency of students who like Mathematics is 5. The table made is
 +
known as a frequency distribution table, as it gives the number of times
 +
an entry occurs.
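
The counting described above can also be sketched in code. The following is a minimal illustration, not part of the original resource: it takes the fruit choices from the picnic list in the Recording Data section, counts how often each fruit occurs, and prints tally marks in groups of five. The helper name `tally` and the variable names are assumptions made for this example.

```python
from collections import Counter

# Fruit choices copied, in order, from the picnic list prepared by Uma.
choices = [
    "Banana", "Apple", "Guava", "Orange", "Apple",
    "Banana", "Orange", "Guava", "Banana", "Banana",
    "Apple", "Banana", "Apple", "Banana", "Orange",
    "Guava", "Apple", "Banana", "Guava", "Banana",
]


def tally(count):
    """Return tally marks for `count`, written in groups of five."""
    groups = ["|||||"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


# The frequency of a fruit is the number of times it occurs in the list.
frequency = Counter(choices)

# Print a small frequency distribution table, most popular fruit first.
for fruit, count in frequency.most_common():
    print(f"{fruit:<8}{tally(count):<12}{count}")
```

Grouping the marks in fives, as Rajan suggested, keeps each row easy to count at a glance.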
    
   
 
   
|
+
=== Categorical Frequency Distributions ===
0
+
 +
Categorical frequency
 +
distributions can be used for data that can be placed in specific
 +
categories, such as nominal- or ordinal-level data. (nominal or
 +
ordinal also called discrete data is where we can distinctly count
 +
the occurrences of a variable).
    
   
 
   
|
+
<br>
62
      
   
 
   
|
+
Examples - political
0
+
affiliation, religious affiliation, blood type etc. Below is Blood
 +
Type frequency distribution example.
    
   
 
   
 +
<br>
 +
 +
                             
 +
{| border="1"
 
|-
 
|-
 
|  
 
|  
Finn
+
Class
    
   
 
   
 
|  
 
|  
10.0
+
Frequency
    
   
 
   
 
|  
 
|  
1
+
Percent
    
   
 
   
 +
|-
 
|  
 
|  
44
+
A
    
   
 
   
 
|  
 
|  
1
+
5
    
   
 
   
|-
   
|  
 
|  
Dernbach
+
20
    
   
 
   
 +
|-
 
|  
 
|  
10.0
+
B
    
   
 
   
 
|  
 
|  
0
+
7
    
   
 
   
 
|  
 
|  
73
+
28
    
   
 
   
 +
|-
 
|  
 
|  
2
+
C
    
   
 
   
|-
   
|  
 
|  
Swann
+
9
    
   
 
   
 
|  
 
|  
9.0
+
36
    
   
 
   
 +
|-
 
|  
 
|  
0
+
D
    
   
 
   
 
|  
 
|  
34
+
4
    
   
 
   
 
|  
 
|  
3
+
16
 +
 
 +
 +
|}
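
The Percent column in the table above is each class frequency divided by the total frequency, multiplied by 100. A minimal sketch of that calculation, using the frequencies from the table (the variable names are assumptions made for this example):

```python
# Frequencies copied from the table above (classes A to D).
frequency = {"A": 5, "B": 7, "C": 9, "D": 4}

# The total number of observations is the sum of all class frequencies.
total = sum(frequency.values())

# Percent for a class = 100 * (class frequency) / (total frequency).
percent = {cls: 100 * count / total for cls, count in frequency.items()}

for cls, count in frequency.items():
    print(f"{cls}  {count:>2}  {percent[cls]:.0f}%")
```

The percentages of all classes should add up to 100, which is a quick check that no observation has been missed or counted twice.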
 +
<br>
 +
 
 +
 
 +
== Activities ==
 +
 +
=== Activity 1 Data Collection ===
 +
 +
==== Learning Objectives ====
 +
 +
Understand the collection of data.
 +
 
 +
 +
==== Materials and resources required ====
 +
 +
Paper &amp; Pen
 +
 
 +
 +
==== Pre-requisites/ Instructions ====
 +
 +
The meaning of data and how data is organised
 +
in a tabular form
    
   
 
   
 +
==== Method ====
 +
 +
The table below has spaces for up to 10 entries.
 +
The first four columns have headings. Choose headings for the other
 +
columns and collect data from 10 of your classmates.
 +
 +
                                                                                                                       
 +
{| border="1"
 
|-
 
|-
 
|  
 
|  
S Patel
+
'''Name'''
    
   
 
   
 
|  
 
|  
8.0
+
'''Age'''
    
   
 
   
 
|  
 
|  
0
+
'''Height'''
    
   
 
   
 
|  
 
|  
55
+
'''Favourite Colour '''
    
   
 
   
 
|  
 
|  
0
+
<br>
    
   
 
   
|-
   
|  
 
|  
Bopara
+
<br>
    
   
 
   
 
|  
 
|  
4.0
+
<br>
    
   
 
   
 
|  
 
|  
0
+
<br>
    
   
 
   
 +
|-
 
|  
 
|  
24
+
<br>
    
   
 
   
 
|  
 
|  
0
+
<br>
    
   
 
   
|}
+
|  
 +
<br>
    +
 +
|
 +
<br>
    
   
 
   
 
|  
 
|  
|}
+
<br>
    +
 +
|
 +
<br>
    
   
 
   
== Recording Data ==
+
|
 +
<br>
 +
 
 
   
 
   
Let us take an example of a class which is
+
|
preparing to go for a picnic. The teacher asked the students to give
+
<br>
their choice of fruits out of banana, apple, orange or guava. Uma is
  −
asked to prepare the list. She prepared a list of all the children
  −
and wrote the choice of fruit against each name. This list would help
  −
the teacher to distribute fruits according to the choice.
      
   
 
   
  −
  −
  −
         
  −
{| border="1"
   
|-
 
|-
 
|  
 
|  
Raghav — Banana
+
<br>
    
   
 
   
Preeti — Apple
+
|
 +
<br>
    
   
 
   
Amar — Guava
+
|
 +
<br>
    
   
 
   
Fatima — Orange
+
|
 +
<br>
    
   
 
   
Amita — Apple
+
|
 +
<br>
    
   
 
   
Raman — Banana
+
|
 +
<br>
    
   
 
   
Radha — Orange
+
|
 +
<br>
    
   
 
   
Farida — Guava
+
|
 +
<br>
    
   
 
   
Anuradha — Banana
+
|-
 +
|
 +
<br>
    
   
 
   
Rati — Banana
+
|
 +
<br>
    
   
 
   
 
|  
 
|  
Bhawana — Apple
+
<br>
    
   
 
   
Manoj — Banana
+
|
 +
<br>
    
   
 
   
Donald — Apple
+
|
 +
<br>
    
   
 
   
Maria — Banana
+
|
 +
<br>
    
   
 
   
Uma — Orange
+
|
 +
<br>
    
   
 
   
Akhtar — Guava
+
|
 +
<br>
    
   
 
   
Ritu — Apple
+
|-
 +
|
 +
<br>
    
   
 
   
Salma — Banana
+
|
 +
<br>
    
   
 
   
Kavita — Guava
+
|
 +
<br>
    
   
 
   
Javed — Banana
+
|
 +
<br>
    
   
 
   
|
+
|  
 +
<br>
    +
 +
|
 +
<br>
    +
 +
|
 +
<br>
    
   
 
   
 
+
|
 
+
<br>
    
   
 
   
Example 1 : A teacher
  −
wants to know the choice of food of each student as part of the
  −
mid-day meal programme. The teacher assigns the task of collecting
  −
this information to Maria. Maria does so using a paper and a pencil.
  −
After arranging the choices in a column, she puts against a choice of
  −
food one ( | ) mark for every student making that choice.
  −
  −
           
  −
{| border="1"
   
|-
 
|-
 
|  
 
|  
Choice
+
<br>
    
   
 
   
 
|  
 
|  
Number of students
+
<br>
    
   
 
   
|-
   
|  
 
|  
Rice only
+
<br>
    
   
 
   
Chapati only
+
|
 +
<br>
    
   
 
   
Both rice and chapati
+
|
 +
<br>
    
   
 
   
 
|  
 
|  
|||||||||||||||||
+
<br>
    
   
 
   
|||||||||||||
+
|  
 +
<br>
    
   
 
   
||||||||||||||||||||
+
|  
 +
<br>
    
   
 
   
|}
+
|-
 +
|
 +
<br>
    +
 +
|
 +
<br>
    +
 +
|
 +
<br>
    
   
 
   
Umesh, after seeing the
+
|
table suggested a better method to count the students. He asked
+
<br>
Maria to organise the marks ( | ) in a group of ten as shown below :
     −
               
+
{| border="1"
  −
|-
   
|  
 
|  
Choice
+
<br>
    
   
 
   
 
|  
 
|  
Tally marks
+
<br>
    
   
 
   
 
|  
 
|  
Number of students
+
<br>
    
   
 
   
|-
   
|  
 
|  
Rice only
+
<br>
    
   
 
   
Chapati only
+
|-
 +
|
 +
<br>
    
   
 
   
Both rice and chapati
+
|
 +
<br>
    
   
 
   
 
|  
 
|  
|||||||||| |||||||
+
<br>
    
   
 
   
|||||||||| |||
+
|  
 +
<br>
    
   
 
   
|||||||||| ||||||||||
+
|  
 +
<br>
    
   
 
   
 
|  
 
|  
17
+
<br>
    
   
 
   
13
+
|
 +
<br>
    
   
 
   
20
+
|
 +
<br>
    
   
 
   
|
+
|-
 +
|
 +
<br>
    +
 +
|
 +
<br>
    
   
 
   
Rajan made it simpler
+
|
by asking her to make groups of five instead of ten, as
+
<br>
    
   
 
   
shown below :
+
|
 +
<br>
   −
                 
+
{| border="1"
  −
|-
   
|  
 
|  
Choice
+
<br>
    
   
 
   
 
|  
 
|  
Tally marks
+
<br>
    
   
 
   
 
|  
 
|  
Number of students
+
<br>
    
   
 
   
|-
   
|  
 
|  
Rice only
+
<br>
    
   
 
   
Chapati only
+
|-
 +
|
 +
<br>
    
   
 
   
Both rice and chapati
+
|
 +
<br>
    
   
 
   
 
|  
 
|  
||||| |||||
+
<br>
||||| ||
      
   
 
   
||||| |||||  
+
|  
|||
+
<br>
    
   
 
   
||||| ||||| ||||| |||||
+
|  
 +
<br>
    
   
 
   
 
|  
 
|  
17
+
<br>
    
   
 
   
13
+
|
 +
<br>
    
   
 
   
20
+
|
 +
<br>
    
   
 
   
|
+
|-
 
+
|
 +
<br>
    
   
 
   
 
+
|
 +
<br>
    
   
 
   
== Meaning of Frequency ==
+
|
+
<br>
Frequency means the number of occurrences within a
  −
given time period.
  −
 
  −
  −
It is not easy to answer the question looking at
  −
the choices written haphazardly. We
      
   
 
   
arrange the data in
  −
Table 1 using tally marks.
  −
  −
  −
Table 1
  −
  −
                             
  −
{| border="1"
  −
|-
   
|  
 
|  
Subject
+
<br>
    
   
 
   
 
|  
 
|  
Tally Marks
+
<br>
    
   
 
   
 
|  
 
|  
Number of Students
+
<br>
    
   
 
   
|-
   
|  
 
|  
Art
+
<br>
    
   
 
   
 
|  
 
|  
|||| ||
+
<br>
    
   
 
   
|  
+
|}
7
+
<br>
 +
<br>
    
   
 
   
|-
+
==== Evaluation ====
|
  −
Mathematics
  −
 
   
   
 
   
|
+
Looking at the table
||||
+
and data, can the student answer the following questions?
    
   
 
   
|
+
# Does any student like green the most?
5
+
# Do you think red is the most popular colour? Why?
 
+
# What other information did you come to know about each student?
 
   
 
   
|-
+
== Evaluation ==
|
  −
Science
  −
 
   
   
 
   
|
+
At the end of this sub-topic the student should be
|||||
+
able to
    
   
 
   
|
+
# Identify the different types of data
6
+
# Collect, classify and organise data in a tabular form
 
+
# Calculate the frequency of data
 +
# Interpret data that is given in a tabular form
 
   
 
   
|-
+
== Self-Evaluation ==
|
+
English
+
== Further Explorations ==
 +
 +
== Enrichment Activities ==
 +
 +
= Graphical representation of Data =
 +
 +
== Introduction ==
 +
 +
Tabular data
 +
can also be represented in the form of pictures (charts), as visual
 +
representations can sometimes be easier to interpret. There are
 +
different types of pictorial representations that can be used to
 +
represent different types of data.
 +
 
 +
 +
== Objectives ==
 +
 +
* Understand and know the different pictorial representations: Histogram, Bar Chart, Pie Chart
 +
* To be able to look at the data and select the chart that would clearly represent the data as well as convey intended information about the data.
 +
* Understand and know the terms: Frequency Distribution, Class Intervals
 +
* To be able to look at a graphical representation and interpret the data
 +
 +
== Histogram & Bar Chart ==
 +
 
 +
=== What is a histogram? ===
 +
 +
<br>
    
   
 
   
|
+
A histogram is a plot
||||
+
that lets you discover, and show, the underlying frequency
 +
distribution (shape) of a set of continuous data. This allows the
 +
inspection of the data for its underlying distribution (e.g. normal
 +
distribution), outliers, skewness, etc. An example of a histogram,
 +
and the raw data it was constructed from, is shown below:
    
   
 
   
|
+
<br>
4
      
   
 
   
|}
+
<br>
    +
 +
<br>
    
   
 
   
The number of tallies
+
<br>
before each subject gives the number of students who like that
      
   
 
   
particular subject.
+
<br>
This is known as the frequency of that subject. Frequency gives the
  −
number of times that a particular entry occurs. From Table 1,
  −
Frequency of students who like English is 4 Frequency of students
  −
who like Mathematics is 5 The table made is known as frequency
  −
distribution table as it gives the number of times an entry occurs.
      
   
 
   
 +
<br>
    +
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6201ec25.png]]<br>
    
   
 
   
== Categorical Frequency Distributions ==
+
<br>
 +
 
 
   
 
   
 +
<br>
    +
 +
<br>
    
   
 
   
Categorical frequency
+
<br>
distributions - can be used for data that can be placed in specific
  −
categories, such as nominal- or ordinal-level data. (nominal or
  −
ordinal also called discrete data is where we can distinctly count
  −
the occurrences of a variable).
      
   
 
   
 +
<br>
    +
 +
<br>
    
   
 
   
Examples - political
+
<br>
affiliation, religious affiliation, blood type etc.
     −
                             
+
{| border="1"
+
<br>
|-
  −
|
  −
Class
      
   
 
   
|
+
<br>
Frequency
      
   
 
   
|
+
<br>
Percent
      
   
 
   
|-
+
<br>
|
  −
A
      
   
 
   
|
+
<br>
5
      
   
 
   
|
+
<br>
20
      
   
 
   
|-
+
<br>
|
  −
B
      
   
 
   
|
+
<br>
7
      
   
 
   
|
+
36 25 38 46 55 68
28
+
72 55 36 38
    
   
 
   
|-
+
67 45 22 48 91 46
|
+
52 61 58 55
C
      
   
 
   
|
+
<br>
9
      
   
 
   
|
+
=== How do you construct a histogram from a continuous variable? ===
36
  −
 
   
   
 
   
|-
+
<br>
|
  −
D
      
   
 
   
|
+
To construct a
4
+
histogram from a continuous variable you first need to split the data
 +
into intervals, called bins. In the example above, age has been split
 +
into bins, with each bin representing a 10-year period starting at 20
 +
years. Each bin contains the number of occurrences of scores in the
 +
data set that are contained within that bin. For the above data set,
 +
the frequencies in each bin have been tabulated along with the scores
 +
that contributed to the frequency in each bin (see below):
    
   
 
   
|
+
<br>
16
      
   
 
   
|}
+
Bin Frequency Scores
 
+
Included in Bin
    
   
 
   
Blood Type frequency
+
20-30 2 25,22
distribution example
      
   
 
   
= Graphical Representations =
+
30-40 4 36,38,36,38
  −
== Histogram & Bar Chart ==
  −
 
  −
=== What is a histogram? ===
  −
  −
 
      
   
 
   
A histogram is a plot
+
40-50 4 46,45,48,46
that lets you discover, and show, the underlying frequency
  −
distribution (shape) of a set of continuous data. This allows the
  −
inspection of the data for its underlying distribution (e.g. normal
  −
distribution), outliers, skewness, etc. An example of a histogram,
  −
and the raw data it was constructed from, is shown below:
      
   
 
   
 
+
50-60 5 55,55,52,58,55
    
   
 
   
[[Image:Statistics_html_6201ec25.png]]
+
60-70 3 68,67,61
    
   
 
   
 
+
70-80 1 72
    
   
 
   
 
+
80-90 0 -
    
   
 
   
 
+
90-100 1 91
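
The binning step can be sketched in a few lines of code. This is only an illustration, not part of the original resource: it uses the raw ages listed earlier and 10-year bins starting at 20, and it assumes that a score lying exactly on a boundary is placed in the higher bin. The function name `bin_frequencies` and its parameters are hypothetical.

```python
# Raw ages copied from the data set shown with the histogram.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]


def bin_frequencies(values, start=20, stop=100, width=10):
    """Count how many values fall in each bin [low, low + width).

    Assumption: a value equal to a bin boundary goes into the higher bin.
    """
    counts = {low: 0 for low in range(start, stop, width)}
    for value in values:
        if not start <= value < stop:
            raise ValueError(f"{value} lies outside the bins")
        low = start + ((value - start) // width) * width
        counts[low] += 1
    return counts


frequencies = bin_frequencies(ages)

for low, count in frequencies.items():
    print(f"{low}-{low + 10}: {count}")
```

Changing `width` to a much smaller or much larger number is an easy way to experiment with how the bin width changes the shape of the histogram.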
    
   
 
   
 
+
<br>
    
   
 
   
 
+
Notice that, unlike a
 +
bar chart, there are no &quot;gaps&quot; between the bars (although
 +
some bars might be &quot;absent&quot; reflecting no frequencies).
 +
This is because a histogram represents a continuous data set, and as
 +
such, there are no gaps in the data. (Although you will have to
 +
decide whether you round up or round down scores on the boundaries of
 +
bins.)
    
   
 
   
 
+
<br>
    
   
 
   
 
+
=== Choosing the correct bin width ===
 
   
   
 
   
 
+
<br>
    
   
 
   
 
+
There is no right or
 +
wrong answer as to how wide a bin should be, but there are rules of
 +
thumb. You need to make sure that the bins are not too small or too
 +
large. Consider the histogram we produced earlier (see above): the
 +
following histograms use the same data but have either much smaller
 +
or larger bins, as shown below:
    
   
 
   
 
+
<br>
    
   
 
   
 
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_75ab55c3.png]]<br>
    
   
 
   
 
+
<br>
    
   
 
   
 
+
<br>
    
   
 
   
 
+
We can see from the
 +
histogram on the left, that the bin width is too small as it shows
 +
too much individual data and does not allow the underlying pattern
 +
(frequency distribution) of the data to be easily seen. At the other
 +
end of the scale, is the diagram on the right, where the bins are too
 +
large and, again, we are unable to find the underlying trend in the
 +
data.
    
   
 
   
 
+
Histograms are based on
 +
area not height of bars
    
   
 
   
 
+
<br>
    
   
 
   
36 25 38 46 55 68
+
In a histogram, it is
72 55 36 38
+
the area of the bar that indicates the frequency of occurrences for
 +
each bin. This means that the height of the bar does not necessarily
 +
indicate how many occurrences of scores there were within each
 +
individual bin. It is the product of height multiplied by the width
 +
of the bin that indicates the frequency of occurrences within that
 +
bin. One of the reasons that the height of the bars is often
 +
incorrectly assessed as indicating frequency and not the area of the
 +
bar is due to the fact that a lot of histograms often have equally
 +
spaced bars (bins) and, under these circumstances, the height of the
 +
bin does reflect the frequency.
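
To see why area matters, consider bins of unequal width. The sketch below is an illustration under stated assumptions, not part of the original resource: it regroups the same ages into bins with the hypothetical edges 20, 40, 50, 60 and 100, and computes the bar height as frequency divided by bin width, so that height multiplied by width gives back the frequency.

```python
# Raw ages copied from the data set shown with the histogram.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

# Hypothetical bin edges of unequal width, chosen only for illustration.
edges = [20, 40, 50, 60, 100]

rows = []
for low, high in zip(edges, edges[1:]):
    # Count scores in [low, high); boundary scores go to the higher bin.
    frequency = sum(low <= age < high for age in ages)
    width = high - low
    height = frequency / width  # bar height, so that area = frequency
    rows.append((low, high, frequency, height))

for low, high, frequency, height in rows:
    print(f"{low}-{high}: frequency={frequency}, height={height:.3f}")
```

The bins 40-50 and 60-100 hold similar numbers of scores, yet the wider bin gets a much lower bar, because its scores are spread over four times the width.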
    
   
 
   
67 45 22 48 91 46
+
<br>
52 61 58 55
      
   
 
   
=== How do you construct a histogram from a continuous variable? ===
+
=== What is the difference between a bar chart and a histogram? ===
 
   
 
   
 
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_6dfca87b.png]]<br>
    
   
 
   
To construct a
+
The major difference is
histogram from a continuous variable you first need to split the data
+
that a histogram is only used to plot the frequency of score
into intervals, called bins. In the example above, age has been split
+
occurrences in a continuous data set that has been divided into
into bins, with each bin representing a 10-year period starting at 20
+
classes, called bins. Bar charts, on the other hand, can be used for
years. Each bin contains the number of occurrences of scores in the
+
a great deal of other types of variables including ordinal and
data set that are contained within that bin. For the above data set,
+
nominal data sets.
the frequencies in each bin have been tabulated along with the scores
  −
that contributed to the frequency in each bin (see below):
      
   
 
   
 
+
<br>
    
   
 
   
Bin Frequency Scores
+
<br>
Included in Bin
      
   
 
   
20-30 2 25,22
+
<br>
    
   
 
   
30-40 4 36,38,36,38
+
== Circle or Pie Chart ==
 
   
   
 
   
40-50 4 46,45,48,46
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_461389d1.png]]These
 +
are called circle graphs. A circle graph shows the relationship
 +
between a whole and its parts. Here, the whole circle is divided into
 +
sectors. The size of each sector is proportional to the activity or
 +
information it represents.
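
Since a full circle is 360 degrees, the angle of each sector is the share of the whole multiplied by 360. A minimal sketch of that calculation follows; as an assumption for illustration, it reuses the class frequencies A to D from the categorical frequency distribution earlier in this document rather than the data behind the picture above.

```python
# Class frequencies reused from the categorical frequency table (A to D).
frequency = {"A": 5, "B": 7, "C": 9, "D": 4}

total = sum(frequency.values())

# Sector angle in degrees = 360 * (part / whole).
angles = {cls: 360 * count / total for cls, count in frequency.items()}

for cls, angle in angles.items():
    print(f"{cls}: {angle:.1f} degrees")
```

The sector angles should add up to 360 degrees, which is a useful check before drawing the chart with a protractor.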
    
   
 
   
50-60 5 55,55,52,58,55
+
<br>
    
   
 
   
60-70 3 68,67,61
+
A variety of graphical
 +
representations of data are now possible using spreadsheet software.
 +
OpenOffice CALC can convert a table of data into bar charts, pie
 +
charts, area charts, etc., and make data much easier to
 +
read/interpret.
    
   
 
   
70-80 1 72
+
<br>
    +
 
 +
== Activities ==
 +
 +
=== Activity 2: Histogram and Bar Chart ===
 
   
 
   
80-90 0 -
+
==== Learning Objectives ====
 
   
   
 
   
90-100 1 91
+
Learn to draw a histogram and bar chart.
 +
Understand the difference between a bar chart and a histogram and be
 +
able to select the appropriate chart by looking at the problem and
 +
data.
    
   
 
   
 
+
==== Materials and Resources Required ====
 
   
   
 
   
Notice that, unlike a
+
Paper and Pencil
bar chart, there are no &quot;gaps&quot; between the bars (although
  −
some bars might be &quot;absent&quot; reflecting no frequencies).
  −
This is because a histogram represents a continuous data set, and as
  −
such, there are no gaps in the data. (Although you will have to
  −
decide whether you round up or round down scores on the boundaries of
  −
bins)
      
   
 
   
 
+
==== Pre-requisites/ Instructions ====
 
   
   
 
   
=== Choosing the correct bin width ===
+
==== Method ====
 
   
 
   
 
+
Solve the problems A and B
    
   
 
   
There is no right or
+
<br>
wrong answer as to how wide a bin should be, but there are rules of
+
<br>
thumb. You need to make sure that the bins are not too small or too
  −
large. Consider the histogram we produced earlier (see above): the
  −
following histograms use the same data but have either much smaller
  −
or larger bins, as shown below:
      
   
 
   
 +
A&gt; In the past year, you have recorded the
 +
number of tickets that a movie theater has sold during each month.
 +
To represent this data set graphically, would you construct a bar
 +
graph or a histogram? Why is this choice better than the other?
 +
Using the following data, construct the graph that you choose.
    +
                                                       
 +
{| border="1"
 +
|-
 +
|
 +
Month
    
   
 
   
[[Image:Statistics_html_75ab55c3.png]]
+
|
 +
Number of Tickets Sold
    
   
 
   
 +
|-
 +
|
 +
January
    +
 +
|
 +
25
    
   
 
   
 +
|-
 +
|
 +
February
    +
 +
|
 +
20
    
   
 
   
We can see from the
+
|-
histogram on the left, that the bin width is too small as it shows
+
|
too much individual data and does not allow the underlying pattern
+
March
(frequency distribution) of the data to be easily seen. At the other
  −
end of the scale, is the diagram on the right, where the bins are too
  −
large and, again, we are unable to find the underlying trend in the
  −
data.
      
   
 
   
Histograms are based on
+
|
area not height of bars
+
15
    
   
 
   
 +
|-
 +
|
 +
April
    +
 +
|
 +
20
    
   
 
   
In a histogram, it is
+
|-
the area of the bar that indicates the frequency of occurrences for
+
|
each bin. This means that the height of the bar does not necessarily
+
May
indicate how many occurrences of scores there were within each
+
 
individual bin. It is the product of height multiplied by the width
+
of the bin that indicates the frequency of occurrences within that
+
|
bin. One of the reasons that the height of the bars is often
+
30
incorrectly assessed as indicating frequency and not the area of the
  −
bar is due to the fact that a lot of histograms often have equally
  −
spaced bars (bins) and, under these circumstances, the height of the
  −
bin does reflect the frequency.
      
   
 
   
 +
|-
 +
|
 +
June
    +
 +
|
 +
35
    
   
 
   
=== What is the difference between a bar chart and a histogram? ===
+
|-
+
|
[[Image:Statistics_html_6dfca87b.png]]
+
July
    
   
 
   
The major difference is
+
|
that a histogram is only used to plot the frequency of score
+
40
occurrences in a continuous data set that has been divided into
  −
classes, called bins. Bar charts, on the other hand, can be used for
  −
a great deal of other types of variables including ordinal and
  −
nominal data sets.
      
   
 
   
 
+
|-
 +
|
 +
August
    
   
 
   
 
+
|
 +
20
    
   
 
   
 
+
|-
 +
|
 +
September
    
   
 
   
== Circle or Pie Chart ==
+
|
 +
25
 +
 
 
   
 
   
 +
|-
 +
|
 +
October
    +
 +
|
 +
15
    
   
 
   
These are called circle
+
|-
graphs. A circle graph shows the relationship between a whole and its
+
|
parts. Here, the whole circle is divided into sectors. The size of
+
November
each sector is proportional to the activity or information it
  −
represents.
      
   
 
   
 +
|
 +
20
    +
 +
|-
 +
|
 +
December
    
   
 
   
A variety of graphical
+
|
representations of data are now possible using spreadsheet software.
+
30
OpenOffice CALC can convert a table of data into bar charts, pie
  −
charts, area charts etc and make data much more easy to
  −
read/interpret.
      
   
 
   
 +
|}
 +
<br>
    +
 +
<br>
   −
   
  −
= Types of Variables =
   
   
 
   
All experiments examine some kind of variable(s).
+
B&gt; For a recent
A variable is not only something that we measure, but also something
+
science project, you collected data regarding the distribution of
that we can manipulate and something we can control for. To
+
fish and aquatic life in a nearby pond. Your data consists of the
understand the characteristics of variables and how we use them in
+
number of living creatures found in each 1 meter depth increment in
research, this guide is divided into three main sections. First, we
+
the pond. Construct a bar graph and several histograms (vary the
illustrate the role of dependent and independent variables. Second,
+
depth increment size) for the following data. In which case(s) is the
we discuss the difference between experimental and non-experimental
+
histogram the same as the bar graph? How do the other histograms vary
research. Finally, we explain how variables can be characterised as
+
from the bar graph?
either categorical or continuous.
      
   
 
   
== Dependent and Independent Variables ==
+
<br>
      +
                                               
 +
{| border="1"
 +
|-
 +
|
 +
'''Depth Range'''
    +
 +
|
 +
'''Number of Living Creatures '''
    
   
 
   
An independent variable, sometimes called an
+
|-
experimental or predictor variable, is a variable that is being
+
|
manipulated in an experiment in order to observe the effect on a
+
0 – 1 meters
dependent variable, sometimes called an outcome variable.
      
   
 
   
 
+
|
 
+
10
    
   
 
   
Imagine that a tutor asks 100 students to complete
+
|-
a maths test. The tutor wants to know why some students perform
+
|
better than others. Whilst the tutor does not know the answer to
+
1 – 2 meters
this, she thinks that it might be because of two reasons: (1) some
  −
students spend more time revising for their test; and (2) some
  −
students are naturally more intelligent than others. As such, the
  −
tutor decides to investigate the effect of revision time and
  −
intelligence on the test performance of the 100 students. The
  −
dependent and independent variables for the study are:
      
   
 
   
 
+
|
 
+
93
    
   
 
   
Dependent Variable: Test Mark (measured from 0 to
+
|-
100)
+
|
 +
2 – 3 meters
    
   
 
   
 +
|
 +
23
    +
 +
|-
 +
|
 +
3 – 4 meters
    +
 +
|
 +
47
    
   
 
   
Independent Variables: Revision time (measured in
+
|-
hours) Intelligence (measured using IQ score)
+
|
 +
4 – 5 meters
    
   
 
   
 +
|
 +
68
    +
 +
|-
 +
|
 +
5 – 6 meters
    +
 +
|
 +
51
    
   
 
   
The dependent variable is simply that, a variable
+
|-
that is dependent on an independent variable(s). For example, in our
+
|
case the test mark that a student achieves is dependent on revision
+
6 – 7 meters
time and intelligence. Whilst revision time and intelligence (the
  −
independent variables) may (or may not) cause a change in the test
  −
mark (the dependent variable), the reverse is implausible; in other
  −
words, whilst the number of hours a student spends revising and the
  −
higher a student's IQ score may (or may not) change the test mark
  −
that a student achieves, a change in a student's test mark has no
  −
bearing on whether a student revises more or is more intelligent
  −
(this simply doesn't make sense).
      
   
 
   
 
+
|
 
+
43
    
   
 
   
Therefore, the aim of the tutor's investigation is
+
|-
to examine whether these independent variables - revision time and IQ
+
|
- result in a change in the dependent variable, the students' test
+
7 – 8 meters
scores. However, it is also worth noting that whilst this is the main
  −
aim of the experiment, the tutor may also be interested to know if
  −
the independent variables - revision time and IQ - are also connected
  −
in some way.
      
   
 
   
 +
|
 +
21
    +
 +
|-
 +
|
 +
8 – 9 meters
    +
 +
|
 +
15
    
   
 
   
In the section on experimental and
+
|-
non-experimental research that follows, we find out a little more
+
|
about the nature of independent and dependent variables.
+
9 – 10 meters
    
   
 
   
 +
|
 +
8
   −
  −
   
   
 
   
== Experimental and Non-Experimental Research ==
+
|}
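One way to explore problem B is to merge the 1 meter increments into wider bins and compare the results. The following is a minimal sketch, not a prescribed solution: the function names <code>rebin</code> and <code>bar_heights</code> are illustrative assumptions, and the counts are copied from the table above. Dividing each merged count by the bin width gives a bar height whose area (height multiplied by width) equals the frequency.

```python
# Counts of living creatures per 1 m depth increment, copied from the table above.
counts_per_metre = [10, 93, 23, 47, 68, 51, 43, 21, 15, 8]


def rebin(counts, width):
    """Merge consecutive 1 m bins into bins that are `width` metres wide."""
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


def bar_heights(counts, width):
    """Bar height per merged bin, chosen so that height x width equals the count."""
    return [total / width for total in rebin(counts, width)]


for width in (1, 2, 5):
    print("bin width:", width)
    print("  frequencies:", rebin(counts_per_metre, width))
    print("  bar heights:", bar_heights(counts_per_metre, width))
```

With a bin width of 1 meter the bar heights equal the frequencies, so the histogram has the same bar heights as the bar graph. With wider bins the heights and the frequencies differ, which is why area rather than height represents frequency.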
 +
==== Evaluation ====
 +
 +
# Does the student understand the difference between a bar chart and a histogram?
 +
# Does the student know when to use each of these charts, depending on whether the data is continuous or discrete?
 +
 +
== Evaluation ==
 
   
 
   
 
+
== Self-Evaluation ==
 
  −
 
   
   
 
   
Experimental research: In experimental research,
+
== Further Explorations ==
the aim is to manipulate an independent variable(s) and then examine
+
the effect that this change has on a dependent variable(s). Since it
+
=== Types of Variables ===
is possible to manipulate the independent variable(s), experimental
+
research has the advantage of enabling a researcher to identify a
+
All experiments examine some kind of variable(s).
cause and effect between variables. For example, take our example of
+
A variable is not only something that we measure, but also something
100 students completing a maths exam where the dependent variable was
+
that we can manipulate and something we can control for. To
the exam mark (measured from 0 to 100) and the independent variables
+
understand the characteristics of variables and how we use them in
were revision time (measured in hours) and intelligence (measured
+
research, this guide is divided into three main sections. First, we
using IQ score). Here, it would be possible to use an experimental
+
illustrate the role of dependent and independent variables. Second,
design and manipulate the revision time of the students. The tutor
+
we discuss the difference between experimental and non-experimental
could divide the students into two groups, each made up of 50
+
research. Finally, we explain how variables can be characterised as
students. In &quot;group one&quot;, the tutor could ask the students
+
either categorical or continuous.
not to do any revision. Alternately, &quot;group two&quot; could be
+
 
asked to do 20 hours of revision in the two weeks prior to the test.
+
The tutor could then compare the marks that the students achieved.
+
=== Dependent and Independent Variables ===
 +
 +
<br>
 +
<br>
    
   
 
   
Non-experimental research: In non-experimental
+
An independent variable, sometimes called an
research, the researcher does not manipulate the independent
+
experimental or predictor variable, is a variable that is being
variable(s). This is not to say that it is impossible to do so, but
+
manipulated in an experiment in order to observe the effect on a
it will either be impractical or unethical to do so. For example, a
+
dependent variable, sometimes called an outcome variable.
researcher may be interested in the effect of illegal, recreational
  −
drug use (the dependent variable(s)) on certain types of behaviour
  −
(the independent variable(s)). However, whilst possible, it would be
  −
unethical to ask individuals to take illegal drugs in order to study
  −
what effect this had on certain behaviours. As such, a researcher
  −
could ask both drug and non-drug users to complete a questionnaire
  −
that had been constructed to indicate the extent to which they
  −
exhibited certain behaviours. Whilst it is not possible to identify
  −
the cause and effect between the variables, we can still examine the
  −
association or relationship between them.In addition to understanding
  −
the difference between dependent and independent variables, and
  −
experimental and non-experimental research, it is also important to
  −
understand the different characteristics amongst variables. This is
  −
discussed next.
      
   
 
   
 +
<br>
 +
<br>
    +
 +
Imagine that a tutor asks 100 students to complete
 +
a maths test. The tutor wants to know why some students perform
 +
better than others. Whilst the tutor does not know the answer to
 +
this, she thinks that it might be because of two reasons: (1) some
 +
students spend more time revising for their test; and (2) some
 +
students are naturally more intelligent than others. As such, the
 +
tutor decides to investigate the effect of revision time and
 +
intelligence on the test performance of the 100 students. The
 +
dependent and independent variables for the study are:
    +
 +
<br>
 +
<br>
    
   
 
   
=== Categorical and Continuous Variables ===
+
Dependent Variable: Test Mark (measured from 0 to
 +
100)
 +
 
 
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Categorical variables are also known as discrete
+
Independent Variables: Revision time (measured in
or qualitative variables. Categorical variables can be further
+
hours) Intelligence (measured using IQ score)
categorized as either''' nominal, ordinal or dichotomous.'''
      
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
'''Nominal variables''' are variables that have
+
The dependent variable is simply that, a variable
two or more categories but which do not have an intrinsic order. For
+
that is dependent on an independent variable(s). For example, in our
example, a real estate agent could classify their types of property
+
case the test mark that a student achieves is dependent on revision
into distinct categories such as houses, condos, co-ops or bungalows.
+
time and intelligence. Whilst revision time and intelligence (the
So &quot;type of property&quot; is a nominal variable with 4
+
independent variables) may (or may not) cause a change in the test
categories called houses, condos, co-ops and bungalows. Of note, the
+
mark (the dependent variable), the reverse is implausible; in other
different categories of a nominal variable can also be referred to as
+
words, whilst the number of hours a student spends revising and the
groups or levels of the nominal variable. Another example of a
+
student's IQ score may (or may not) change the test mark
nominal variable would be classifying where people live in USA by
+
that a student achieves, a change in a student's test mark has no
state. In this case there will be many more levels of the nominal
+
bearing on whether a student revises more or is more intelligent
variable (50 in fact).
+
(this simply doesn't make sense).
 +
 
 +
 +
<br>
 +
<br>
    
   
 
   
'''Dichotomous variables''' are nominal
+
Therefore, the aim of the tutor's investigation is
variables which have only two categories or levels. For example, if
+
to examine whether these independent variables - revision time and IQ
we were looking at gender, we would most probably categorize somebody
+
- result in a change in the dependent variable, the students' test
as either &quot;male&quot; or &quot;female&quot;. This is an example
+
scores. However, it is also worth noting that whilst this is the main
of a dichotomous variable (and also a nominal variable). Another
+
aim of the experiment, the tutor may also be interested to know if
example might be if we asked a person if they owned a mobile phone.
+
the independent variables - revision time and IQ - are also connected
Here, we may categorise mobile phone ownership as either &quot;Yes&quot;
+
in some way.
or &quot;No&quot;. In the real estate agent example, if type of
  −
property had been classified as either residential or commercial then
  −
&quot;type of property&quot; would be a dichotomous variable.
      
   
 
   
'''Ordinal variables''' are variables that have
+
<br>
two or more categories just like nominal variables only the
+
<br>
categories can also be ordered or ranked. So if you asked someone if
  −
they liked the policies of the Democratic Party and they could answer
  −
either &quot;Not very much&quot;, &quot;They are OK&quot; or &quot;Yes,
  −
a lot&quot; then you have an ordinal variable. Why? Because you have
  −
3 categories, namely &quot;Not very much&quot;, &quot;They are OK&quot;
  −
and &quot;Yes, a lot&quot; and you can rank them from the most
  −
positive (Yes, a lot), to the middle response (They are OK), to the
  −
least positive (Not very much). However, whilst we can rank the
  −
levels, we cannot place a &quot;value&quot; to them; we cannot say
  −
that &quot;They are OK&quot; is twice as positive as &quot;Not very
  −
much&quot; for example.
      
   
 
   
 
+
In the section on experimental and
 
+
non-experimental research that follows, we find out a little more
 +
about the nature of independent and dependent variables.
    
   
 
   
Continuous variables are also known as
+
=== Experimental and Non-Experimental Research ===
quantitative variables. Continuous variables can be further
  −
categorized as either interval or ratio variables.
  −
 
   
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
'''Interval variables''' are variables for which
+
Experimental research: In experimental research,
their central characteristic is that they can be measured along a
+
the aim is to manipulate an independent variable(s) and then examine
continuum and they have a numerical value (for example, temperature
+
the effect that this change has on a dependent variable(s). Since it
measured in degrees Celsius or Fahrenheit). So the difference between
+
is possible to manipulate the independent variable(s), experimental
20C and 30C is the same as 30C to 40C. However, temperature measured
+
research has the advantage of enabling a researcher to identify a
in degrees Celsius or Fahrenheit is NOT a ratio variable.
+
cause and effect between variables. For example, take our example of
 +
100 students completing a maths exam where the dependent variable was
 +
the exam mark (measured from 0 to 100) and the independent variables
 +
were revision time (measured in hours) and intelligence (measured
 +
using IQ score). Here, it would be possible to use an experimental
 +
design and manipulate the revision time of the students. The tutor
 +
could divide the students into two groups, each made up of 50
 +
students. In &quot;group one&quot;, the tutor could ask the students
 +
not to do any revision. Alternatively, &quot;group two&quot; could be
 +
asked to do 20 hours of revision in the two weeks prior to the test.
 +
The tutor could then compare the marks that the students achieved.
    
   
 
   
'''Ratio variables''' are interval variables but
+
Non-experimental research: In non-experimental
with the added condition that 0 (zero) of the measurement indicates
+
research, the researcher does not manipulate the independent
that there is none of that variable. So, temperature measured in
+
variable(s). This is not to say that it is impossible to do so, but
degrees Celsius or Fahrenheit is not a ratio variable because 0C does
+
it will either be impractical or unethical to do so. For example, a
not mean there is no temperature. However, temperature measured in
+
researcher may be interested in the effect of illegal, recreational
Kelvin is a ratio variable as 0 Kelvin (often called absolute zero)
+
drug use (the independent variable(s)) on certain types of behaviour
indicates that there is no temperature whatsoever. Other examples of
+
(the dependent variable(s)). However, whilst possible, it would be
ratio variables include height, mass, distance and many more. The
+
unethical to ask individuals to take illegal drugs in order to study
name &quot;ratio&quot; reflects the fact that you can use the ratio
+
what effect this had on certain behaviours. As such, a researcher
of measurements. So, for example, a distance of ten metres is twice
+
could ask both drug and non-drug users to complete a questionnaire
the distance of 5 metres.
+
that had been constructed to indicate the extent to which they
 +
exhibited certain behaviours. Whilst it is not possible to identify
 +
the cause and effect between the variables, we can still examine the
 +
association or relationship between them. In addition to understanding
 +
the difference between dependent and independent variables, and
 +
experimental and non-experimental research, it is also important to
 +
understand the different characteristics amongst variables. This is
 +
discussed next.
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
=== Ambiguities in classifying a type of variable ===
+
=== Categorical and Continuous Variables ===
 
   
 
   
 +
<br>
 +
<br>
    +
 +
Categorical variables are also known as discrete
 +
or qualitative variables. Categorical variables can be further
 +
categorized as either '''nominal, ordinal or dichotomous'''.
    +
 +
<br>
 +
<br>
    
   
 
   
In some cases, the measurement scale for data is
+
'''Nominal variables''' are variables that have
ordinal but the variable is treated as continuous. For example, a
+
two or more categories but which do not have an intrinsic order. For
Likert scale that contains five values - strongly agree, agree,
+
example, a real estate agent could classify their types of property
neither agree nor disagree, disagree, and strongly disagree - is
+
into distinct categories such as houses, condos, co-ops or bungalows.
ordinal. However, where a Likert scale contains seven or more value -
+
So &quot;type of property&quot; is a nominal variable with 4
strongly agree, moderately agree, agree, neither agree nor disagree,
+
categories called houses, condos, co-ops and bungalows. Of note, the
disagree, moderately disagree, and strongly disagree - the underlying
+
different categories of a nominal variable can also be referred to as
scale is sometimes treated as continuous although where you should do
+
groups or levels of the nominal variable. Another example of a
this is a cause of great dispute.
+
nominal variable would be classifying where people live in Karnataka
 +
by district. In this case there will be many more levels of the
 +
nominal variable (30 in fact).
    
   
 
   
 +
'''Dichotomous variables''' are nominal
 +
variables which have only two categories or levels. For example, if
 +
we were looking at gender, we would most probably categorize somebody
 +
as either &quot;male&quot; or &quot;female&quot;. This is an example
 +
of a dichotomous variable (and also a nominal variable). Another
 +
example might be if we asked a person if they owned a mobile phone.
 +
Here, we may categorise mobile phone ownership as either &quot;Yes&quot;
 +
or &quot;No&quot;. In the real estate agent example, if type of
 +
property had been classified as either residential or commercial then
 +
&quot;type of property&quot; would be a dichotomous variable.
    +
 +
'''Ordinal variables''' are variables that have
 +
two or more categories just like nominal variables only the
 +
categories can also be ordered or ranked. So if you asked someone if
 +
they liked the policies of the Democratic Party and they could answer
 +
either &quot;Not very much&quot;, &quot;They are OK&quot; or &quot;Yes,
 +
a lot&quot; then you have an ordinal variable. Why? Because you have
 +
3 categories, namely &quot;Not very much&quot;, &quot;They are OK&quot;
 +
and &quot;Yes, a lot&quot; and you can rank them from the most
 +
positive (Yes, a lot), to the middle response (They are OK), to the
 +
least positive (Not very much). However, whilst we can rank the
 +
levels, we cannot place a &quot;value&quot; to them; we cannot say
 +
that &quot;They are OK&quot; is twice as positive as &quot;Not very
 +
much&quot; for example.
    +
 +
<br>
 +
<br>
    
   
 
   
It is worth noting that how we categorise
+
Continuous variables are also known as
variables is somewhat of a choice. Whilst we categorised gender as a
+
quantitative variables. Continuous variables can be further
dichotomous variable (you are either male or female), social
+
categorized as either interval or ratio variables.
scientists may disagree with this, arguing that gender is a more
  −
complex variable involving more than two distinctions, but also
  −
including measurement levels like genderqueer, intersex, and
  −
transgender. At the same time, some researchers would argue that a
  −
Likert scale, even with seven values, should never be treated as a
  −
continuous variable.
      
   
 
   
= Central Tendency =
+
<br>
 +
<br>
 +
 
 
   
 
   
 +
'''Interval variables''' are variables for which
 +
their central characteristic is that they can be measured along a
 +
continuum and they have a numerical value (for example, temperature
 +
measured in degrees Celsius or Fahrenheit). So the difference between
 +
20°C and 30°C is the same as that between 30°C and 40°C. However, temperature measured
 +
in degrees Celsius or Fahrenheit is NOT a ratio variable.
    +
 +
'''Ratio variables''' are interval variables but
 +
with the added condition that 0 (zero) of the measurement indicates
 +
that there is none of that variable. So, temperature measured in
 +
degrees Celsius or Fahrenheit is not a ratio variable because 0°C does
 +
not mean there is no temperature. However, temperature measured in
 +
Kelvin is a ratio variable as 0 Kelvin (often called absolute zero)
 +
indicates that there is no temperature whatsoever. Other examples of
 +
ratio variables include height, mass, distance and many more. The
 +
name &quot;ratio&quot; reflects the fact that you can use the ratio
 +
of measurements. So, for example, a distance of ten metres is twice
 +
the distance of 5 metres.
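The difference between interval and ratio variables can be checked with a little arithmetic. The sketch below is illustrative only: the helper name <code>celsius_to_kelvin</code> and the sample temperatures are assumptions chosen for the example, not values from the text.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return temp_c + 273.15


# Dividing Celsius readings gives 2.0, but 20 degrees Celsius is not
# "twice as hot" as 10 degrees Celsius, because 0 on this scale is arbitrary.
ratio_celsius = 20 / 10

# On the Kelvin scale, zero means no thermal energy, so the ratio is meaningful.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Distance is a ratio variable: ten metres really is twice five metres.
distance_ratio = 10 / 5

print(ratio_celsius, round(ratio_kelvin, 3), distance_ratio)
```

The Kelvin ratio is only slightly above 1, which shows how misleading the Celsius ratio of 2 would be if it were read as "twice as hot".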
    +
 +
<br>
 +
<br>
    
   
 
   
== Introduction ==
+
=== Ambiguities in classifying a type of variable ===
 
   
 
   
A measure of central tendency is a single value
+
<br>
that attempts to describe a set of data by identifying the central
+
<br>
position within that set of data. As such, measures of central
+
 
tendency are sometimes called measures of central location. They are
+
also classed as summary statistics. The mean (often called the
+
In some cases, the measurement scale for data is
average) is most likely the measure of central tendency that you are
+
ordinal but the variable is treated as continuous. For example, a
most familiar with, but there are others, such as, the median and the
+
Likert scale that contains five values - strongly agree, agree,
mode.
+
neither agree nor disagree, disagree, and strongly disagree - is
 +
ordinal. However, where a Likert scale contains seven or more values -
 +
strongly agree, moderately agree, agree, neither agree nor disagree,
 +
disagree, moderately disagree, and strongly disagree - the underlying
 +
scale is sometimes treated as continuous, although whether you should do
 +
this is a cause of great dispute.
 +
 
 +
 +
<br>
 +
<br>
 +
 
 +
 +
It is worth noting that how we categorise
 +
variables is somewhat of a choice. Whilst we categorised gender as a
 +
dichotomous variable (you are either male or female), social
 +
scientists may disagree with this, arguing that gender is a more
 +
complex variable involving more than two distinctions,
 +
including measurement levels like genderqueer, intersex, and
 +
transgender. At the same time, some researchers would argue that a
 +
Likert scale, even with seven values, should never be treated as a
 +
continuous variable.
 +
 
 +
 +
== Enrichment Activities ==
 +
 +
= Central tendency =
 +
 +
== Introduction ==
 +
 +
A measure of central tendency is a single value
 +
that attempts to describe a set of data by identifying the central
 +
position within that set of data. As such, measures of central
 +
tendency are sometimes called measures of central location. They are
 +
also classed as summary statistics. The mean (often called the
 +
average) is most likely the measure of central tendency that you are
 +
most familiar with, but there are others, such as the median and the
 +
mode.
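As a quick illustration of the three measures, the sketch below uses Python's standard <code>statistics</code> module on a small set of test marks. The marks are made up for this example and do not come from the text; <code>multimode</code> assumes Python 3.8 or later.

```python
import statistics

# Hypothetical test marks for seven students (illustrative data only).
marks = [55, 60, 60, 65, 70, 80, 100]

mean_mark = statistics.mean(marks)        # sum of the values divided by how many there are
median_mark = statistics.median(marks)    # middle value once the data is sorted
modal_marks = statistics.multimode(marks) # every value that shares the highest frequency

print("mean:", mean_mark)
print("median:", median_mark)
print("mode(s):", modal_marks)
```

Here the three measures give different answers (70, 65 and 60), which is why it matters to choose the measure that suits the data.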
    
   
 
   
Line 2,253: Line 2,554:  
appropriate to be used.
 
appropriate to be used.
    +
 +
== Objectives ==
 +
 +
* Understand and know that a measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
 +
* Understand that the mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others.
 +
* Learn to calculate the mean and median, analyse data and draw conclusions.
 
   
 
   
 
== Mean (Arithmetic) ==
 
== Mean (Arithmetic) ==
Line 2,263: Line 2,570:  
values in a data set and they have values x<sub>1</sub>, x<sub>2</sub>,
 
values in a data set and they have values x<sub>1</sub>, x<sub>2</sub>,
 
..., x<sub>n</sub>, then the sample mean, usually denoted by  
 
..., x<sub>n</sub>, then the sample mean, usually denoted by  
[[Image:Statistics_html_174cec39.gif]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_174cec39.gif]]
 
(pronounced x bar), is:
 
(pronounced x bar), is:
    
   
 
   
 +
<br>
 +
<br>
   −
 
+
   
 
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_69b2cf9e.gif]]
   
  −
[[Image:Statistics_html_69b2cf9e.gif]]
      
   
 
   
Line 2,279: Line 2,586:     
   
 
   
[[Image:Statistics_html_m50e9a786.gif]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m50e9a786.gif]]
    
   
 
   
Line 2,292: Line 2,599:     
   
 
   
[[Image:Statistics_html_7b1e9596.gif]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_7b1e9596.gif]]
    
   
 
   
Line 2,310: Line 2,617:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 2,448: Line 2,755:     
   
 
   
 
+
<br>
 
+
<br>
    
                            
 
                            
Line 2,499: Line 2,806:  
   
 
   
 
|}  
 
|}  
 
+
<br>
 
+
<br>
    
   
 
   
Line 2,507: Line 2,814:     
   
 
   
 
+
<br>
 
+
<br>
    
                            
 
                            
Line 2,558: Line 2,865:  
   
 
   
 
|}  
 
|}  
 
+
<br>
 
+
<br>
    
   
 
   
Line 2,571: Line 2,878:     
   
 
   
 
+
<br>
 
+
<br>
    
                          
 
                          
Line 2,618: Line 2,925:  
   
 
   
 
|}  
 
|}  
 
+
<br>
 
+
<br>
    
   
 
   
 
We again rearrange that data into order of
 
We again rearrange that data into order of
magnitude (smallest first):
+
magnitude (smallest first):<br>
 
+
<br>
 
+
<br>
    
                            
 
                            
Line 2,675: Line 2,982:  
   
 
   
 
|}  
 
|}  
 
+
<br>
 
+
<br>
    
   
 
   
Line 2,691: Line 2,998:     
   
 
   
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_58d59706.png]]<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
Normally, the mode is used for categorical data
+
<br>
where we wish to know which is the most common category as
+
<br>
illustrated below:
      
   
 
   
We can see above that the most common form of
+
<br>
transport, in this particular data set, is the bus. However, one of
+
<br>
the problems with the mode is that it is not unique, so it leaves us
  −
with problems when we have two or more values that share the highest
  −
frequency, such as below:
      
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
   −
 
+
 +
<br>
 +
<br>
    
   
 
   
 +
Normally, the mode is used for categorical data
 +
where we wish to know which is the most common category as
 +
illustrated below:
 +
 +
 +
We can see above that the most common form of
 +
transport, in this particular data set, is the bus. However, one of
 +
the problems with the mode is that it is not unique, so it leaves us
 +
with problems when we have two or more values that share the highest
 +
frequency, such as below:
    +
 +
<br>
 +
<br>
    +
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m64bbad46.png]]<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
 +
 +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
Line 2,760: Line 3,159:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 2,770: Line 3,169:     
   
 
   
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_152dd141.png]]<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 +
In the above diagram the mode has a value of 2. We
 +
can clearly see, however, that the mode is not representative of the
 +
data, which is mostly concentrated around the 20 to 30 value range.
 +
To use the mode to describe the central tendency of this data set
 +
would be misleading.
    +
 +
== Skewed Distributions and the Mean and Median ==
 +
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_26c6186d.png]] We
 +
often test whether our data is normally distributed as this is a
 +
common assumption underlying many statistical tests. An example of a
 +
normally distributed set of data is presented below:
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
When you have a normally distributed sample you
 +
can legitimately use either the mean or the median as your measure of
 +
central tendency. In fact, in any symmetrical, unimodal distribution the mean,
 +
median and mode are equal. However, in this situation, the mean is
 +
widely preferred as the best measure of central tendency as it is the
 +
measure that includes all the values in the data set for its
 +
calculation, and any change in any of the scores will affect the
 +
value of the mean. This is not the case with the median or mode.
    +
 +
However, when our data is skewed, for example, as
 +
with the right-skewed data set below:
    +
 +
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m2609c500.png]]<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    +
 +
<br>
 +
<br>
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
+
we find that the mean is being dragged in the
 
+
direction of the skew. In these situations, the median is generally
 +
considered to be the best representative of the central location of
 +
the data. The more skewed the distribution the greater the difference
 +
between the median and mean, and the greater emphasis should be
 +
placed on using the median as opposed to the mean. A classic example
 +
of the above right-skewed distribution is income (salary), where
 +
higher-earners provide a false representation of the typical income
 +
if expressed as a mean and not a median.
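The income example can be checked with a few lines of Python. This is only a minimal sketch: the income figures below are made up for illustration and are not taken from any real survey.

```python
from statistics import mean, median

# Hypothetical monthly incomes (illustrative values only):
# six modest earners and one very high earner.
incomes = [12000, 15000, 15000, 18000, 20000, 22000, 250000]

income_mean = mean(incomes)
income_median = median(incomes)

print("mean:  ", round(income_mean, 2))   # 50285.71
print("median:", income_median)           # 18000

# The single high income drags the mean far above what most people earn,
# while the median stays with the typical earner.
```

Here six of the seven people earn less than half of the mean, which is why the median is the safer description of a typical income in a right-skewed data set.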
    
   
 
   
 
+
If we expect a normal distribution, but tests
 
+
of normality show that the data is non-normal, then it is customary
 +
to use the median instead of the mean. This is more a rule of thumb
 +
than a strict guideline, however. Sometimes, researchers wish to
 +
report the mean of a skewed distribution if the median and mean are
 +
not appreciably different (a subjective assessment) and if it allows
 +
easier comparisons to previous research to be made.
    
   
 
   
In the above diagram the mode has a value of 2. We
+
<br>
can clearly see, however, that the mode is not representative of the
+
<br>
data, which is mostly concentrated around the 20 to 30 value range.
  −
To use the mode to describe the central tendency of this data set
  −
would be misleading.
      
   
 
   
== Skewed Distributions and the Mean and Median ==
+
== Summary of when to use the mean, median and mode ==
 
   
 
   
We often test whether our data is normally
+
Please use the following summary table to know
distributed as this is a common assumption underlying many
+
what the best measure of central tendency is with respect to the
statistical tests. An example of a normally distributed set of data
+
different types of variables.
is presented below:
      
   
 
   
 +
<br>
 +
<br>
    +
                       
 +
{| border="1"
 +
|-
 +
|
 +
'''Type of Variable'''
    +
 +
|
 +
'''Best measure of central tendency'''
    
   
 
   
When you have a normally distributed sample you
+
|-
can legitimately use either the mean or the median as your measure of
+
|
central tendency. In fact, in any symmetrical distribution the mean,
+
Nominal
median and mode are equal. However, in this situation, the mean is
  −
widely preferred as the best measure of central tendency as it is the
  −
measure that includes all the values in the data set for its
  −
calculation, and any change in any of the scores will affect the
  −
value of the mean. This is not the case with the median or mode.
      
   
 
   
However, when our data is skewed, for example, as
+
|
with the right-skewed data set below:
+
Mode
    
   
 
   
 +
|-
 +
|
 +
Ordinal
   −
 
+
 +
|
 +
Median
    
   
 
   
 
+
|-
 
+
|
 +
Interval/Ratio (not skewed)
    
   
 
   
 
+
|
 
+
Mean
    
   
 
   
 
+
|-
 
+
|
 +
Interval/Ratio (skewed)
    
   
 
   
 
+
|
 
+
Median
    
   
 
   
 
+
|}
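The summary table can also be written as a small lookup function. This is only a sketch of the rule of thumb in the table; the function name and the string labels are assumptions chosen for illustration.

```python
def best_measure(variable_type, skewed=False):
    """Return the measure of central tendency suggested by the summary table.

    variable_type: "nominal", "ordinal", "interval" or "ratio" (labels assumed here).
    skewed: only matters for interval/ratio data.
    """
    if variable_type == "nominal":
        return "mode"
    if variable_type == "ordinal":
        return "median"
    if variable_type in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown variable type: {variable_type}")


print(best_measure("nominal"))              # mode
print(best_measure("ratio", skewed=True))   # median
```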
 
+
<br>
 +
<br>
    
   
 
   
 +
== Relative advantages and disadvantages of mean, median and  mode ==
 +
 +
Mean:<br>
 +
Advantages:
 +
Uses every value in the data set, so it reflects all of the data.<br>
 +
Disadvantages:
 +
Outliers (a few values that are very different from most) can change the
 +
mean a lot, making it much lower or higher than it should
 +
be.<br>
 +
<br>
 +
Median:<br>
 +
Advantages: Finds the middle number of a set of
 +
data, so outliers have little or no effect.<br>
 +
Disadvantages: If the
 +
gap between some numbers is large, while it is small between other
 +
numbers in the data, this can cause the median to be a very
 +
inaccurate way to find the middle of a set of
 +
values.<br>
 +
<br>
 +
Mode:<br>
 +
Advantages: Allows you to see what value
 +
occurs most often in a set of data. This can help you to figure out
 +
things in a different way. It is also quick and easy.<br>
 +
Disadvantages:
 +
Could be very far from the actual middle of the data. The least
 +
reliable way to find the middle or average of the data.
    +
 +
<br>
    +
 +
This means that each of
 +
these measures can be useful in different kinds of distributions.
    
   
 
   
 +
<br>
    +
 +
== Activities ==
 +
 +
== Activity 1 : Central Tendency ==
 +
 +
==== Learning Objectives ====
 +
 +
Learn to calculate each average measure - Mean,
 +
Median and Mode. Understand the difference between them. Know in
 +
which situation which measure must be used.
    +
 +
==== Pre-requisites/ Instructions ====
 +
 +
<br>
 +
<br>
    
   
 
   
 +
==== Materials and Resources Required ====
 +
 +
Paper and Pencil
    +
 +
==== Method ====
 +
 +
Solve the problems A and B
    +
 +
<br>
 +
<br>
    
   
 
   
 
+
A. 27 members of a
 
+
class were given a puzzle to solve and the times (in minutes) each
 +
pupil took to solve it were noted.
    
   
 
   
 +
<br>
 +
<br>
    +
       
 +
{| border="1"
 +
|-
 +
|
 +
'''the times (in minutes) each pupil took'''
    +
 +
|-
 +
|
 +
19 14 15 9 18 16 10 11 16
    
   
 
   
 
+
4 20 10 14 11 9 13 15 13
 
      
   
 
   
 
+
12 2 17 15 14 10 11 10 12
 
      
   
 
   
 
+
|}
 
+
<br>
 +
<br>
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
+
# The MEAN value of a set of data is (Sum of Values) ÷ (Number of Values). What is the mean (to 2 decimal places) of the times given in the table?
 
+
# The MEDIAN is the middle value of an ordered set of data.
 
+
## Write down the times in the table above in ascending order.
 +
## How many values are there?
 +
## What is the median?
 +
#
 +
# The MODE is the value which occurs most often, i.e. the most popular.
 +
## What is the mode of the times in the table above?
 +
#
 +
# Which of the three measures do you think is most representative of the average time? In this case it is probably the mean, but this will not always be so.
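For teachers who want to check the answers to problem A, the three measures can be worked out with Python's standard <code>statistics</code> module. This is only one way of checking the pencil-and-paper work; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# The 27 puzzle times (in minutes) from the table above.
times = [19, 14, 15, 9, 18, 16, 10, 11, 16,
         4, 20, 10, 14, 11, 9, 13, 15, 13,
         12, 2, 17, 15, 14, 10, 11, 10, 12]

ordered = sorted(times)                    # ascending order, as asked in step 2
print(ordered)
print("count: ", len(times))               # 27
print("mean:  ", round(mean(times), 2))    # 12.59 (340 / 27)
print("median:", median(times))            # 13 (the 14th of 27 ordered values)
print("mode:  ", mode(times))              # 10 (occurs four times)
```

Because there are 27 values, the median is the single middle value, the 14th in the ordered list.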
 
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
+
'''B. Choosing which measure to use'''
 
      
   
 
   
 +
The sales in one week of a particular dress are
 +
given in terms of the dress sizes.
   −
 
+
 +
# Determine the mean, median and mode for this data.
 +
# Which size is sold the most?
 +
# Which of these measures is of most use?
 +
 +
<br>
 +
<br>
    
   
 
   
 +
Dress sizes sold in one week
    +
             
 +
{| border="1"
 +
|-
 +
|
 +
10
    +
 +
16
    
   
 
   
 +
16
    +
 +
12
    +
 +
16
    
   
 
   
we find that the mean is being dragged in the
+
|
direction of the skew. In these situations, the median is generally
+
14
considered to be the best representative of the central location of
  −
the data. The more skewed the distribution the greater the difference
  −
between the median and mean, and the greater emphasis should be
  −
placed on using the median as opposed to the mean. A classic example
  −
of the above right-skewed distribution is income (salary), where
  −
higher-earners provide a false representation of the typical income
  −
if expressed as a mean and not a median.
      
   
 
   
If we expect a normal distribution, but tests
+
12
of normality show that the data is non-normal, then it is customary
  −
to use the median instead of the mean. This is more a rule of thumb
  −
than a strict guideline however. Sometimes, researchers wish to
  −
report the mean of a skewed distribution if the median and mean are
  −
not appreciably different (a subjective assessment) and if it allows
  −
easier comparisons to previous research to be made.
      
   
 
   
 +
14
   −
 
+
 +
16
    
   
 
   
== Summary of when to use the mean, median and mode ==
+
18
 +
 
 
   
 
   
Please use the following summary table to know
+
|
what the best measure of central tendency is with respect to the
+
12
different types of variables.
      
   
 
   
 +
10
    +
 +
18
    +
 +
10
   −
                       
+
{| border="1"
+
14
|-
  −
|
  −
'''Type of Variable'''
      
   
 
   
 
|  
 
|  
'''Best measure of central tendency'''
+
16
    
   
 
   
|-
+
14
|
  −
Nominal
      
   
 
   
|
+
8
Mode
      
   
 
   
|-
+
10
|
  −
Ordinal
      
   
 
   
|
+
16
Median
      
   
 
   
|-
   
|  
 
|  
Interval/Ratio (not skewed)
+
18
    
   
 
   
|
+
16
Mean
+
 
 +
 +
14
    
   
 
   
|-
+
16
|
  −
Interval/Ratio (skewed)
      
   
 
   
|
+
8
Median
      
   
 
   
 
|}  
 
|}  
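The answers to problem B can be checked in the same way. The sketch below assumes the 25 values in the table are all the dresses sold that week; the frequency count uses <code>collections.Counter</code> from the standard library.

```python
from collections import Counter
from statistics import mean, median, mode

# Dress sizes sold in one week, read from the table above.
sizes = [10, 16, 16, 12, 16,
         14, 12, 14, 16, 18,
         12, 10, 18, 10, 14,
         16, 14, 8, 10, 16,
         18, 16, 14, 16, 8]

print("mean:  ", mean(sizes))      # 13.76
print("median:", median(sizes))    # 14
print("mode:  ", mode(sizes))      # 16

# How many dresses of each size were sold, most popular first.
print(Counter(sizes).most_common())
```

The mean, 13.76, is not a dress size that exists, so it is of little practical use here. The mode, size 16, tells the shop which size sells most often, which is arguably the most useful measure for this data.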
 +
<br>
 +
<br>
   −
  −
   
   
 
   
== Relative advantages and disadvantages of mean, median and mode ==
+
==== Evaluation ====
 +
 +
# Does the student understand the difference between Mean, Median and Mode?
 +
# Can the student calculate each of the measures?
 +
# Does the student know which measure is most useful and best represents the data in a given data set?
 
   
 
   
Mean.
+
== Self-Evaluation ==
Advantages:
  −
Finds the most accurate average of the set of numbers.
  −
Disadvantages:
  −
Outliers (few values are very different from most) can change the
  −
mean a lot... making it much lower/higher than it should
  −
be.
  −
 
  −
Median:
  −
Advantages: Finds the middle number of a set of
  −
data, so outliers have little or no effect.
  −
Disadvantages: If the
  −
gap between some numbers is large, while it is small between other
  −
numbers in the data, this can cause the median to be a very
  −
inaccurate way to find the middle of a set of
  −
values.
  −
 
  −
Mode:
  −
Advantages: Allows you to see what value
  −
happened the most in a set of data. This can help you to figure out
  −
things in a different way. It is also quick and easy.
  −
Disadvantages:
  −
Could be very far from the actual middle of the data. The least
  −
reliable way to find the middle or average of the data.
  −
 
   
   
 
   
 
+
== Further Explorations ==
 
   
   
 
   
This means that each of
+
== Enrichment Activities ==
these measures can be useful in different kinds of distributions.
  −
 
   
   
 
   
 
+
= Dispersion =
 
   
   
 
   
= Dispersion =
+
== Introduction ==
 
   
 
   
 
A measure of spread, sometimes also called a
 
A measure of spread, sometimes also called a
Line 3,099: Line 3,706:  
an overall description of a set of data.
 
an overall description of a set of data.
   −
  −
  −
  −
  −
  −
=== Why is it important to measure the spread of data? ===
   
   
 
   
 
There are many reasons why the measure of the
 
There are many reasons why the measure of the
Line 3,116: Line 3,717:  
Additionally, in research, it is often seen as positive if there is
 
Additionally, in research, it is often seen as positive if there is
 
little variation in each data group, as it indicates that the data are similar.
 
little variation in each data group, as it indicates that the data are similar.
  −
  −
  −
      
   
 
   
Line 3,126: Line 3,723:     
   
 
   
 
+
== Objectives ==
 
  −
 
   
   
 
   
=== Range ===
+
* Understand that a measure of dispersion is a measure of spread, used to describe the variability in a sample or population.
 +
* It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.
 +
* It is important to measure the spread of data because understanding its relationship with the measures of central tendency lets us interpret data more accurately.
 +
* Understand and know the terms: Range, Quartile, Standard Deviation, Cumulative Frequency
 +
* Calculation of Co-efficient of Variation. Meaning and interpretation of C.V. Analyse data and make conclusions
 
   
 
   
 
+
== Range ==
 
  −
 
   
   
 
   
 
The range is the difference between the highest
 
The range is the difference between the highest
Line 3,141: Line 3,738:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,148: Line 3,745:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,159: Line 3,756:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,173: Line 3,770:  
data. For example, if you have recorded the age of school children in
 
data. For example, if you have recorded the age of school children in
 
your study and your range is 7 to 123 years old you know you have
 
your study and your range is 7 to 123 years old you know you have
made a mistake!
+
made a mistake!<br>
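The range check described above can be sketched in a few lines of Python. The ages and the plausible limits (4 to 18 years) are made-up assumptions for illustration only.

```python
# Hypothetical ages of school children; 123 stands in for a data-entry error.
ages = [7, 8, 8, 9, 10, 11, 123]

lowest, highest = min(ages), max(ages)
value_range = highest - lowest      # range = highest value - lowest value

print("lowest:", lowest, "highest:", highest, "range:", value_range)

# Flag any value outside the limits we assume are plausible for school children.
suspicious = [age for age in ages if not 4 <= age <= 18]
print("check these entries:", suspicious)
```

A range of 116 years for school children is an immediate signal that at least one entry needs to be checked.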
 
+
<br>
 
+
<br>
    
   
 
   
 
=== Quartiles and Interquartile Range ===
 
=== Quartiles and Interquartile Range ===
 
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,191: Line 3,788:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,259: Line 3,856:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,273: Line 3,870:     
   
 
   
 +
<br>
 +
<br>
   −
 
+
   
 
  −
   
   
First quartile (Q1) = (45 + 45) ÷ 2 = 45
 
First quartile (Q1) = (45 + 45) ÷ 2 = 45
   Line 3,286: Line 3,883:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,300: Line 3,897:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,316: Line 3,913:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,329: Line 3,926:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,338: Line 3,935:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,348: Line 3,945:  
   
 
   
 
== Standard Deviation ==
 
== Standard Deviation ==
  −
=== Introduction ===
   
   
 
   
 
The standard deviation is a measure of the spread
 
The standard deviation is a measure of the spread
Line 3,438: Line 4,033:     
   
 
   
[[Image:Statistics_html_m5610ded5.gif]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m5610ded5.gif]]
    
   
 
   
Line 3,445: Line 4,040:  
   
 
   
 
s = sample standard
 
s = sample standard
deviation
+
deviation<br>
 
Σ = sum
 
Σ = sum
of...
+
of...<br>
X = sample mean
+
X = sample mean<br>
 
n = number of scores in sample.
 
n = number of scores in sample.
   Line 3,456: Line 4,051:     
   
 
   
[[Image:Statistics_html_m48922b88.gif]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_m48922b88.gif]]
    
   
 
   
Line 3,463: Line 4,058:  
   
 
   
 
σ
 
σ
= population standard deviation
+
= population standard deviation<br>
 
Σ
 
Σ
= sum of...
+
= sum of...<br>
 
μ =
 
μ =
population mean
+
population mean<br>
 
n = number of scores in the population.
 
n = number of scores in the population.
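The difference between the two formulas can be seen by computing both for the same scores. The sketch below assumes the usual definitions: the sample standard deviation divides the sum of squared deviations by n − 1, and the population standard deviation divides it by n. The scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5


def sample_sd(values):
    """s: divide the sum of squared deviations by n - 1."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))


def population_sd(values):
    """sigma: divide the sum of squared deviations by n."""
    m = mean(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values))


print("sample s:        ", round(sample_sd(scores), 3))
print("population sigma:", population_sd(scores))

# The standard library gives the same results.
print(round(stdev(scores), 3), pstdev(scores))
```

For these scores the sum of squared deviations is 32, so the population value is √(32 ÷ 8) = 2, while the sample value is √(32 ÷ 7), which is slightly larger.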
    
    
 
    
 
== Variation ==
 
== Variation ==
  −
  −
  −
   
   
 
   
 
Quartiles are useful but they are also somewhat
 
Quartiles are useful but they are also somewhat
Line 3,485: Line 4,076:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,499: Line 4,090:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
=== Absolute Deviation and Mean Absolute Deviation ===
 
=== Absolute Deviation and Mean Absolute Deviation ===
 
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,519: Line 4,110:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,537: Line 4,128:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
 
=== Variance ===
 
=== Variance ===
 
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,560: Line 4,151:     
   
 
   
 
+
<br>
 
+
<br>
    
   
 
   
Line 3,584: Line 4,175:     
   
 
   
 
+
<br>
    
   
 
   
Line 3,593: Line 4,184:     
   
 
   
[[Image:Statistics_html_1afc44b3.png]]
+
[[Image:KOER-%20Mathematics%20-%20Statistics_html_1afc44b3.png]]<br>
    
   
 
   
 
+
<br>
    
   
 
   
 
+
<br>
    
   
 
   
Line 3,610: Line 4,201:     
   
 
   
Remarks
+
'''Remarks '''
    
   
 
   
(i) The coefficient of
+
* The coefficient of variation helps us to compare the consistency of two or more collections of data.
variation helps us to compare the consistency of two or more
 
+
* When the coefficient of variation is more, the given data is less consistent.
 +
* When the coefficient of variation is less, the given data is more consistent.
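The remarks above can be illustrated with a short calculation. This sketch assumes the usual definition, C.V. = (standard deviation ÷ mean) × 100, using the population standard deviation. The two sets of scores are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. as a percentage: (standard deviation / mean) * 100."""
    return pstdev(values) / mean(values) * 100


# Hypothetical scores of two batsmen over five matches; both average 50.
batsman_a = [48, 50, 52, 49, 51]
batsman_b = [10, 90, 30, 70, 50]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)

print("C.V. of A:", round(cv_a, 2))   # 2.83
print("C.V. of B:", round(cv_b, 2))   # 56.57
```

Both batsmen have the same mean, but A has the much smaller coefficient of variation, so A is the more consistent of the two.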
 
   
 
   
collections of data.
+
<br>
 +
<br>
    
   
 
   
(ii) When the
+
== Self-Evaluation ==
coefficient of variation is more, the given data is less consistent.
  −
 
   
   
 
   
(iii) When the
+
== Further Explorations ==
coefficient of variation is less, the given data is more consistent.
  −
 
   
   
 
   
 
+
== Enrichment Activities ==
 
   
   
 
   
== Key Vocabulary ==
+
= See Also =
 
   
 
   
 
+
Statistics
 
+
on Wikipedia [[http://en.wikipedia.org/wiki/Statistics]]
    
   
 
   
== Additional Resources: ==
+
<br>
+
<br>
[[http://en.wikipedia.org/wiki/Statistics]]
      
   
 
   
 +
A
 +
free and open source statistical software package for the social sciences
 
[[http://www.gnu.org/software/pspp/]]
 
[[http://www.gnu.org/software/pspp/]]
    
   
 
   
 
+
= Teachers Corner =
 
  −
 
  −
  −
= Activities : =
  −
  −
== Activity 1 : Data Collection ==
  −
  −
=== Objective ===
  −
  −
Understand collection of data and preparing
  −
frequency distribution table for a given set of sources
  −
 
  −
  −
=== Procedure ===
  −
  −
Collect information
  −
regarding the number of family members of your classmates and
  −
represent it in the form of a table. Find to which category most
  −
students belong.
  −
 
  −
  −
 
  −
 
  −
  −
 
  −
 
  −
  −
 
  −
 
  −
             
  −
{| border="1"
  −
|-
  −
|
  −
Number of family members
  −
 
  −
  −
|
  −
Tally marks members
  −
 
  −
  −
|
  −
Number of students
  −
 
  −
  −
with that many
  −
 
  −
  −
family members
  −
 
  −
  −
|-
  −
|
  −
 
  −
 
  −
  −
|
  −
 
  −
 
  −
  −
|
  −
 
  −
 
  −
  −
|}
  −
 
  −
 
  −
  −
Make a table and enter
  −
the data using tally marks. Find the number that appeared
  −
 
  −
  −
 
  −
 
  −
  −
(a) the minimum number
  −
of times?
  −
 
  −
  −
 
  −
 
  −
  −
(b) the maximum number
  −
of times?
  −
 
  −
  −
 
  −
 
  −
  −
(c) same number of
  −
times?
  −
 
  −
  −
 
  −
 
  −
  −
 
  −
 
  −
  −
== Activity 2: Histogram and Bar Chart ==
  −
  −
=== Objective ===
  −
  −
Learn to draw a histogram and bar chart.
  −
Understand the difference between a bar chart and a histogram and be
  −
able to select the appropriate chart by looking at the problem and
  −
data.
  −
 
  −
  −
 
  −
 
  −
 
  −
  −
=== Materials ===
  −
  −
Paper and Pencil
  −
 
  −
  −
=== Procedure ===
  −
  −
Solve the problems A and B
  −
 
  −
  −
 
  −
 
  −
 
  −
  −
In the past year, you have recorded the number of
  −
tickets that a movie theater has sold during each month. To
  −
represent this data set graphically, would you construct a bar graph
  −
or a histogram? Why is this choice better than the other? Using the
  −
following data, construct the graph that you choose.
  −
 
  −
                                                       
  −
{| border="1"
  −
|-
  −
|
  −
Month
  −
 
  −
  −
|
  −
Number of Tickets Sold
  −
 
  −
  −
|-
  −
|
  −
January
  −
 
  −
  −
|
  −
25
  −
 
  −
  −
|-
  −
|
  −
February
  −
 
  −
  −
|
  −
20
  −
 
  −
  −
|-
  −
|
  −
March
  −
 
  −
  −
|
  −
15
  −
 
  −
  −
|-
  −
|
  −
April
  −
 
  −
  −
|
  −
20
  −
 
  −
  −
|-
  −
|
  −
May
  −
 
  −
  −
|
  −
30
  −
 
  −
  −
|-
  −
|
  −
June
  −
 
  −
  −
|
  −
35
  −
 
  −
  −
|-
  −
|
  −
July
  −
 
  −
  −
|
  −
40
  −
 
  −
  −
|-
  −
|
  −
August
  −
 
  −
  −
|
  −
20
  −
 
  −
  −
|-
  −
|
  −
September
  −
 
  −
  −
|
  −
25
  −
 
  −
  −
|-
  −
|
  −
October
  −
 
  −
  −
|
  −
15
  −
 
  −
  −
|-
  −
|
  −
November
  −
 
  −
  −
|
  −
20
  −
 
  −
  −
|-
  −
|
  −
December
  −
 
  −
  −
|
  −
30
  −
 
  −
  −
|}
  −
 
  −
 
  −
  −
 
  −
 
  −
  −
B For a recent science
  −
project, you collected data regarding the distribution of fish and
  −
aquatic life in a nearby pond. Your data consists of the number of
  −
living creatures found in each 1 meter depth increment in the pond.
  −
Construct a bar graph and several histograms (vary the depth
  −
increment size) for the following data. In which case(s) is the
  −
histogram the same as the bar graph? How do the other histograms vary
  −
from the bar graph?
  −
 
  −
  −
 
  −
 
  −
                                               
  −
{| border="1"
  −
|-
  −
|
  −
'''Depth Range'''
  −
 
  −
  −
|
  −
'''Number of Living Creatures '''
  −
 
  −
  −
|-
  −
|
  −
0 – 1 meters
  −
 
  −
  −
|
  −
10
  −
 
  −
  −
|-
  −
|
  −
1 – 2 meters
  −
 
  −
  −
|
  −
93
  −
 
  −
  −
|-
  −
|
  −
2 – 3 meters
  −
 
  −
  −
|
  −
23
  −
 
  −
  −
|-
  −
|
  −
3 – 4 meters
  −
 
  −
  −
|
  −
47
  −
 
  −
  −
|-
  −
|
  −
4 – 5 meters
  −
 
  −
  −
|
  −
68
  −
 
  −
  −
|-
  −
|
  −
5 – 6 meters
  −
 
  −
  −
|
  −
51
  −
 
  −
  −
|-
  −
|
  −
6 – 7 meters
  −
 
  −
  −
|
  −
43
  −
 
  −
  −
|-
  −
|
  −
7 – 8 meters
  −
 
  −
  −
|
  −
21
  −
 
  −
  −
|-
  −
|
  −
8 – 9 meters
  −
 
  −
  −
|
  −
15
  −
 
  −
  −
|-
  −
|
  −
9 – 10 meters
  −
 
  −
  −
|
  −
8
  −
 
  −
  −
|}
  −
== Evaluation ==
  −
  −
# Does the student understand the difference between a bar chart and a histogram ?
  −
# Does the student know when to use each of these charts, depending on whether the data is continuous or discrete?
  −
  −
== Activity 3 : Central Tendency ==
  −
  −
=== Objective ===
  −
  −
Learn to calculate each average measure - Mean,
  −
Median, Mode. And understand the difference between them. Know in
  −
which situation which measure must be used.
  −
 
  −
  −
 
  −
 
  −
 
  −
  −
=== Materials ===
  −
  −
Paper and Pencil
  −
 
  −
  −
=== Process ===
  −
  −
Solve the problems A and B
  −
 
  −
  −
 
  −
 
  −
 
  −
  −
A. 27 members of a
  −
class were given a puzzle to solve and the times (in minutes) each
  −
pupil took to solve it were noted.
  −
 
  −
  −
 
  −
 
  −
 
  −
       
  −
{| border="1"
  −
|-
  −
|
  −
'''the times (in minutes) each pupil took'''
  −
 
  −
  −
|-
  −
|
  −
19 14 15 9 18 16 10 11 16
  −
 
  −
  −
4 20 10 14 11 9 13 15 13
  −
 
  −
  −
12 2 17 15 14 10 11 10 12
  −
 
  −
  −
|}
  −
 
  −
 
  −
 
  −
  −
 
  −
 
  −
 
  −
  −
# The MEAN value of a set of data is Sum of Values / Number of Values . What is the mean (to 2 decimal places) of the times given in the table?
  −
# The MEDIAN is the middle value of an ordered set of data.
  −
## Write down the times in the table above in ascending order.
  −
## How many values are there?
  −
## What is the median ?
  −
#
  −
# The MODE is the value which occurs most often, i.e. the most popular.
  −
## What is the mode of the times in the table above?
  −
#
  −
# Which of the three measures do you think is most representative of the average time? In this case it is probably the mean, but this will not always be so.
  −
  −
 
  −
 
  −
 
  −
  −
'''B Choosing which measure to use '''
  −
 
  −
  −
The sales in one week of a particular dress are
  −
given in terms of the dress sizes.
  −
 
  −
  −
# Determine the mean, median and mode for this data .
  −
# What is the size that is sold the most ?
  −
# Which of these measures is of most use?
  −
  −
 
  −
 
  −
 
  −
  −
Dress sizes sold in one week
  −
 
  −
             
  −
{| border="1"
  −
|-
  −
|
  −
10
  −
 
  −
  −
16
  −
 
  −
  −
16
  −
 
  −
  −
12
  −
 
  −
  −
16
  −
 
  −
  −
|
  −
14
  −
 
  −
  −
12
  −
 
  −
  −
14
  −
 
  −
  −
16
  −
 
  −
  −
18
  −
 
  −
  −
|
  −
12
  −
 
  −
  −
10
  −
 
  −
  −
18
  −
 
  −
  −
10
  −
 
  −
  −
14
  −
 
  −
  −
|
  −
16
  −
 
  −
  −
14
  −
 
  −
  −
8
  −
 
  −
  −
10
  −
 
  −
  −
16
  −
 
   
   
 
   
|
+
= Books =
18
  −
 
   
   
 
   
16
+
&quot;How to lie with statistics&quot; by Darrell
 +
Huff, Pelican, ISBN 0 14 021300 7
    
   
 
   
14
+
&quot;Use and abuse of statistics&quot; by W.
 +
Reichmann, Pelican , ISBN 0 14 020707 4
    
   
 
   
16
+
&quot;Figuring and society&quot; by Ronald Meek,
 +
Fontana ISBN 0 00 632560
    
   
 
   
8
+
<br>
 +
<br>
    
   
 
   
|}
+
<br>
 
+
<br>
 
      
   
 
   
=== Evaluation ===
+
<br>
  −
# Does the student understand the difference between Mean, Median and Mode
  −
# Can the student calculate each of the measures ?
  −
# Does the student know which measure is useful and represents the actual data given a data set ?
 