Mean, Median & Mode- Statistics

Mean, median, and mode are fundamental concepts in statistics used to describe the central tendency of a data set.

1. Mean (Average)

The mean is the sum of all the values in a data set divided by the total number of values.

Formula:

Mean = (Sum of all values) / (Number of values)
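As a minimal sketch of the formula above, the mean can be computed in a few lines of Python. The function name and the sample scores are assumptions made only for illustration.

```python
def mean(values):
    """Return the arithmetic mean: sum of all values / number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical quiz scores, used only for illustration.
scores = [4, 8, 6, 10, 7]
print(mean(scores))  # 35 / 5 = 7.0
```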

2. Median

The median is the middle value in a data set when the numbers are arranged in ascending (or descending) order. If there’s an even number of values, the median is the average of the two middle numbers.

Steps to Find Median:

Arrange the data in ascending order.
Identify the middle value. If the number of values is even, average the two middle values.
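The steps for finding the median could be sketched in Python as follows. The function name and both sample data sets are hypothetical and serve only to show the odd and even cases.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle
    values when the number of values is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # arrange the data in ascending order
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:         # odd count: a single middle value
        return ordered[middle]
    # even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data sets, used only for illustration.
print(median([7, 3, 9, 5, 1]))           # 5
print(median([30, 3, 19, 44, 12, 7]))    # (12 + 19) / 2 = 15.5
```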

3. Mode

The mode is the value(s) that appear most frequently in a data set. A data set can have one mode (unimodal), more than one mode (multimodal), or no mode at all if all values occur with the same frequency.
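One possible way to express this definition in Python is sketched below. The function name `modes` and the sample lists are assumptions. Following the definition above, the sketch reports no mode when every value occurs with the same frequency.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s).

    An empty list means "no mode": every distinct value occurs
    with the same frequency.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    all_equal = all(count == highest for count in counts.values())
    if len(counts) > 1 and all_equal:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical data sets, used only for illustration.
print(modes([2, 3, 3, 5]))      # [3]      one mode (unimodal)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]   more than one mode
print(modes([1, 2, 3]))         # []       no mode
```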

Mode:

Elementary Class Frequency Distribution:

The mode is defined as the most frequently occurring score. If the data are arranged in a frequency distribution similar to illustration 4, then the mode is easy to identify. In illustration 4 the mode is 89. Why is the mode 89? Because there were four students who scored 89, and that was the largest number of students who scored at the same level on this assessment.

The mode is easy to locate on any type of distribution curve graph, regardless of skewing. Let’s examine several examples to further understand the concept of mode by locating it on three representative types of graphs.

Mode of a Normal Curve:

Note that the mode is located at the highest point of the graphed data. This represents the greatest frequency of that score.

Mode of Skewed Graphs:

As expected, the mode is located at the highest point on both the positively and negatively skewed graphs. Again, the highest point indicates the score with the greatest frequency. Note that the mode moves to the left on a positively skewed distribution and to the right on a negatively skewed distribution. Both cases are examples of non-normal data distributions.

Also note that non-normal does not imply that it is incorrect. It simply means the data does not indicate a normal distribution of data that would create a normal curve when graphed.

Let’s complicate the process by looking at the data collected from an elementary class where 14 students were given the same 10-point quiz. The frequency distribution for the class is listed in Illustration 9.

Bimodal simply means that there are two modes within the same distribution of data. In this case, because the modes are considerably far apart, the elementary teacher likely has a class where a substantial number of the students understand the content and a substantial number do not. However, if in the same bimodal scenario, one mode was a score of 10 and a second mode was a score of 9, then the teacher would be entitled to a victory lap around the school parking lot. A bimodal graph is easy to identify. In every case, there will be two peaks in the data. The two peaks represent the frequency with which students attained those scores. Illustration 10 is a graph of the data displayed in Illustration 9. Note the two humps in the graph representing a bimodal distribution of the data.
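To make the idea of two peaks concrete, here is a small sketch that builds a frequency distribution and prints it as a text histogram. The 14 scores are invented for illustration and are not the data from Illustration 9.

```python
from collections import Counter

# Hypothetical scores for 14 students on a 10-point quiz (assumed data).
scores = [3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 10]

# Frequency distribution: how many students earned each score.
frequency = Counter(scores)
for score in sorted(frequency):
    print(f"{score:>2} | {'*' * frequency[score]}")

# The modes are the scores with the greatest frequency.
top = max(frequency.values())
peak_scores = sorted(s for s, count in frequency.items() if count == top)
print(peak_scores)  # [3, 9]
```

With these assumed scores, the two rows of four asterisks at 3 and 9 are the two humps of a bimodal distribution.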

Is it possible to have more than two modes? Multi-modal distributions become more common as the amount of data gets considerably larger. The same rules for selecting the mode apply, although the educational implications may vary. A distribution with four modes at equally spaced intervals of 90, 80, 70, and 60 on a diagnostic exam indicates a wide variety of levels of understanding. This type of information would be useful to guide the teacher in the selection of appropriate types of activities that should include lesson preparations that will reach all students. The teacher would prepare differently when the four modes were clustered in the following manner: 99, 97, 45, and 38. In this scenario, the teacher would prepare two different lesson plans for this class: one for the high achievers and one for the lower achievers.

The determination of the mode is a useful statistic for teachers. It not only measures the central tendency or grouping of data, but it also provides a reference point to assist teachers in understanding the nature of the students and their needs, and then guides teachers in planning instruction that will meet their needs.

Median:

The median divides a distribution exactly in half so that 50% of the scores are at or below the median and 50% of the scores are at or above it. It is the “middle value” in a frequency distribution. When the number of data points is an odd number, the middle score is the median. For example, given 13 scores, the 7th score would be the median. When the number of data points is even, like 14, then the median is equal to the sum of the two middle scores in a frequency distribution divided by 2.

Illustration 11: Ordered Array of Unit Exam Scores (Odd number of scores), with columns Student and Score

Whenever dealing with an odd number of scores, the median is the middle number. So in Illustration 11, the total number of student scores is 15, an odd number. The midpoint of 15 is the 8th score because there are 7 scores above it and 7 scores below it. The teacher then counts down or up to the 8th score to determine the midpoint or median. In the case of Illustration 11, the median is 29. Note that, for this data set, 29 is also the mode.

What if this teacher had a class with an even number of students?

How would the median be calculated?

Illustration 12 provides an example of how to determine the median in an even-numbered class. Let’s assume that the class size is 6 and they have just completed an exam worth 50 points. The following illustration displays their scores.

To determine the median of an even number of scores, we begin by adding the 2 middle numbers and dividing by 2. In this case, the numbers 12 and 19 are the middle numbers. Together they total 31. The quotient of dividing 31 by 2 delivers a median of 15.5. Note that the median does not have to represent one of the listed scores. For a teacher using an ordered array of test scores, the median locates the middle or center grade.

On a display of the normal curve, the median is exactly the midpoint of the data distribution and is located in the exact center of the graph. This is also the highest point on the curve.

Would the median be affected by a skewed data distribution? Since the median represents the midpoint, skewed data would move the midpoint in the direction of the bulk of the scores. Illustration 13 displays how the median is influenced by a positively or negatively skewed data distribution.

It is interesting to note that skewed data moves the median off of the mode, or the highest peak on the normal curve. In skewed data, the median moves toward the direction of the skew or tail. For a positively skewed data distribution, the median moves to the right of the mode; for negatively skewed data, to the left. The movement of the median to the right or left of the mode indicates that a larger-than-normal number of scores are located in that area.

Mean:

The mean is the arithmetic average of all of the data points. It is also the most common measure of central tendency and is the most widely understood. In fact, when most people think of average, they are imagining the mean. The mean is easy to calculate and most people have been doing it since elementary school. To calculate the mean, add up all of the data points and divide that result by the total number of data points. Consider the following ordered array of test scores on a 25-point quiz from a typical middle school class of 20 students.

The total number of scores is 10 and the sum of the numbers is 92. Therefore, the mean is 9.2. How might this affect the child? One score out of ten was enough to keep the child from regaining a mean score of 10. In fact, the child could never get an average of 10 because there is no way to recoup the mathematical effects of the low score. The mean has limitations as a statistic and this is a classic example of the most common one. This is a teacher’s dilemma: what score does the student deserve? It is important for teachers to remember that the mean is strongly influenced by extreme scores. At this point, it may be useful for the teacher to reference the median and mode for additional support.
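The effect of one extreme score can be checked with Python's standard `statistics` module. The individual scores are not listed here, so the sketch assumes nine scores of 10 and a single score of 2, which matches the stated count of 10 and sum of 92.

```python
import statistics

# Assumed scores: nine perfect 10s and a single 2 (ten scores, sum 92).
scores = [10] * 9 + [2]

print(statistics.mean(scores))    # 9.2
print(statistics.median(scores))  # 10.0
print(statistics.mode(scores))    # 10

# Even after 100 more perfect scores, the mean stays below 10.
later = scores + [10] * 100
print(statistics.mean(later) < 10)  # True
```

Under this assumption, the median and mode both report 10, while the single low score holds the mean at 9.2.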

Some school districts may have a policy stating that a teacher cannot fail a student by recording a score lower than a certain grade, like 40% for example. This is to help avoid situations where a student can never bring up their scores. When grades are deflated to a hopelessly low number, this can have very negative effects on classroom behavior and participation.

The way that extreme scores affect the mean is apparent in illustration 18. The mean is identified in a positively and negatively skewed data distribution as it generally relates to both the mode and the median.

Skewed data moves the mean away from the center point of a normal curve. The more skewed the data, the further the mean migrates to the area of the skew. The more extreme the scores, the more the mean is affected.

Like the median, in a positively skewed frequency distribution, the mean moves to the right and the majority of the scores fall below the mean. For a frequency distribution that is negatively skewed, the mean moves to the left and is shaped so that the majority of its scores fall above its mean.
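The following sketch illustrates how the three measures typically line up in skewed data. Both data sets are invented for illustration, and the ordering shown is the usual pattern for data like these rather than a guarantee for every skewed distribution.

```python
import statistics


def summarize(scores):
    """Return (mode, median, mean) for a list of scores."""
    return (
        statistics.mode(scores),
        statistics.median(scores),
        statistics.mean(scores),
    )


# Hypothetical positively skewed scores: a few high scores form a tail on the right.
positive_skew = [2, 3, 3, 3, 4, 4, 5, 6, 9, 15]
# Mirror image of the same data: the tail is now on the left.
negative_skew = [17 - s for s in positive_skew]

print(summarize(positive_skew))  # (3, 4.0, 5.4)   mode < median < mean
print(summarize(negative_skew))  # (14, 13.0, 11.6) mean < median < mode
```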

For a teacher, the use of the mean may be inappropriate. In the case where the bulk of scores are located in one mode, and a minimum number of scores are a significant distance from the mode, the mean average may create an arithmetic model that does not approximate the nature of the students. Likewise, the mean of a bimodal distribution may not describe anything useful to the teacher.

Mode, Median, and Mean:

The mode, median, and mean are measures of central tendency and they provide meaningful information to the teacher when used correctly. Each of the statistics is a good measure of central tendency in certain situations and a bad measure in others. So what are their limitations, and when should a teacher use a particular statistic? Here are some helpful tips:

Most data approximate, but do not constitute, a normal distribution because of small sample sizes and intervening educational factors such as tracking.
In a perfectly normal distribution of data, as described by the normal curve, the mode, median, and mean are located at the same point. A perfectly normal curve almost never occurs.
The mode, median, and mean are usually different numbers, especially in a non-normal distribution of data.
The mode is not affected by extreme scores and, therefore, will vary greatly from the median and mean in an extremely skewed distribution of data.
The mean is generally considered the average score and is considered the best measure of central tendency unless exaggerated by extreme scores.
The median establishes the midpoint of the data regardless of skewed data.

So which statistic should the wise teacher use? The best answer is to use the one(s) that are appropriate for that purpose.

Often it depends upon what the teacher wants to know. When in doubt, use all three before making a major decision.

Measures of central tendency provide the teacher with a mathematical description of how well the students are performing.

However, it should be noted that two completely different sets of data, such as the results of two different tests in elementary social studies, can have the same mode, median, and mean, but have vastly different scores. For a better understanding of this phenomenon, it is necessary to understand the basics of variability, which we will look at next.

The distributions of data displayed in illustration 19 have the same measures of central tendency. The mode, median, and mean of Graph A are identical to the mode, median, and mean of Graph B. So as far as central tendency is concerned, they are equal. However, in educational terms, they are anything but equal. For a teacher, graphs of this nature represent two very different circumstances.

Let’s consider that both graphs represent the test scores of two different sets of students in the same subject area on the same day. Graph A shows a tight band of scores near the midpoint. Graph B shows a more diverse range of scores. Translated, the students in Group A have performed at about the same level of average understanding. However, the students represented by Graph B displayed a much more diverse level of understanding. In this case, some of the students performed quite well, while others scored considerably less well. If the same teacher had both sets of students, this would likely indicate the need for two different lesson plans for each class.

By looking at variability we can access a more complete story than what the measures of central tendency have told us about students’ scores.
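A small sketch can show two groups that agree on every measure of central tendency yet differ in spread. The scores below are invented stand-ins for Graph A and Graph B, and the range (highest score minus lowest score) is used here as a simple assumed measure of spread.

```python
import statistics

# Hypothetical test scores (assumed data, not taken from illustration 19).
group_a = [68, 69, 70, 70, 70, 71, 72]    # tight band near the midpoint
group_b = [40, 55, 70, 70, 70, 85, 100]   # much more diverse scores

for name, scores in (("A", group_a), ("B", group_b)):
    print(
        name,
        statistics.mode(scores),
        statistics.median(scores),
        statistics.mean(scores),
        max(scores) - min(scores),   # range: a simple measure of spread
    )
```

Both groups report 70 for the mode, median, and mean, but the range is 4 for group A and 60 for group B.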

Standard Deviation:

Standard deviation is a measure of the spread of scores around the mean in a normal curve. It is sometimes referred to as the mean of the mean. For a given situation, the standard deviation measures how close the data points are to the mean. If most of the data points are clustered around the mean, then the standard deviation is small. Conversely, if most of the data points are widely spread and are not grouped around the mean, then the standard deviation is large. In other words, the more the data points differ from the mean, the greater the standard deviation, and vice-versa. Remember, data points for a teacher are likely to be test scores.

To clarify the concept of standard deviation, let’s consider a class of 30 students. Each of the 30 students received a score of 87 on a test. Since every student received the same grade, the mean is 87. Since all of the scores are the mean, there is no arithmetic difference between the scores and the mean. Therefore, the standard deviation in this scenario would be zero. If a few students scored an 85, the standard deviation would not be zero, but it would be quite small and much less than one.

The focus here is on standard deviation rather than variance, because although the two are related (the standard deviation is the square root of the variance), the standard deviation is easier to interpret because it is expressed in the same units as the data, e.g. points on a test. The standard deviation is usually denoted with the letter σ, whereas the variance is σ².

The calculation of standard deviation is quite simple, but there are two slightly different ways to do it depending on the context. First, consider the steps below:

1. Determine the mean (arithmetic average)

2. Subtract the mean from each score

3. Square the result for each score

4. Add the results together

5. Divide this result by either the number of scores (biased) or the number of scores minus 1 (unbiased), as explained below.

6. Determine the square root of this number. The result is the standard deviation: biased if you divided by the number of scores in Step 5, unbiased if you divided by the number of scores minus 1.

Dividing by the number of scores (the biased calculation) is appropriate when the data represents the entire population of interest. What is much more common, however, is that the data being analyzed is a sample taken from a larger population. In this case, the biased standard deviation will be too small compared to the expected but unknown standard deviation of the population. Therefore, we need a way to calculate an unbiased standard deviation. Fortunately, this is simple, as shown in Step 5. Instead of dividing by the total number of scores, divide by the total number of scores minus 1. If you are unsure whether to use the biased or unbiased standard deviation, use the unbiased (number of scores minus 1) calculation.
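The six steps can be sketched as a short Python function. The function name, the `unbiased` flag, and the sample scores are assumptions made for illustration.

```python
import math


def standard_deviation(scores, unbiased=True):
    """Follow the six steps; `unbiased` selects the divisor used in Step 5."""
    n = len(scores)
    divisor = n - 1 if unbiased else n
    if divisor <= 0:
        raise ValueError("not enough scores for this calculation")
    mean = sum(scores) / n                      # Step 1: determine the mean
    deviations = [s - mean for s in scores]     # Step 2: subtract the mean
    squared = [d ** 2 for d in deviations]      # Step 3: square each result
    total = sum(squared)                        # Step 4: add the results
    variance = total / divisor                  # Step 5: divide by n or n - 1
    return math.sqrt(variance)                  # Step 6: take the square root


# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(scores, unbiased=False))  # 2.0 (biased)
print(standard_deviation(scores, unbiased=True))   # slightly larger (unbiased)
```

For the same scores, the unbiased result is always a little larger than the biased one, because the same total is divided by a smaller number.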

Let’s work an actual problem. In a class of 4 students, the following scores were recorded: 9, 8, 4, and 3.

1. Determine the mean: The mean is 6.

2. Subtract the mean from each score:

(9−6) = 3

(8−6) = 2

(4−6) = −2

(3−6) = −3

3. Square the result for each score:

3² = 9

2² = 4

(−2)² = 4

(−3)² = 9

4. Add the results together: 9 + 4 + 4 + 9 = 26

5. Divide this result by the number of scores minus 1 (unbiased), because we are interested in considering these students as a sample from the entire school: 26/3= 8.67

6. Determine the square root of this number which is the standard deviation: The square root of 8.67 = 2.94.
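The worked result can be cross-checked with Python's standard `statistics` module, where `stdev` divides by the number of scores minus 1 and `pstdev` divides by the number of scores. The four scores are the ones used in the subtraction step above.

```python
import statistics

scores = [9, 8, 4, 3]

unbiased = statistics.stdev(scores)   # divides by n - 1 (sample)
biased = statistics.pstdev(scores)    # divides by n (entire population)

print(round(unbiased, 2))  # 2.94
print(round(biased, 2))    # 2.55
```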

It is a general rule of thumb for statisticians that a large standard deviation means an excessive spread of data well dispersed away from the mean. A small standard deviation indicates a tight cluster of data points near the mean.

Probably the most valuable information regarding standard deviation is gained by analyzing the application of standard deviation to the normal curve. When the normal curve is divided according to standard deviations, the result is displayed in illustration 20.

Standard Deviation:

Dividing the normal curve according to standard deviations reveals a tremendous amount of information to the teacher such as the following:

68% of the data points, such as test scores, will fall within one standard deviation of the mean. Note that the standard deviation includes the area on both sides of the mean.

95% of the data points will fall within two standard deviations of the mean.

99.7% of the data points will fall within three standard deviations of the mean.

99.993665% of the data points will fall within four standard deviations of the mean.

99.9999426% of the data points will fall within five standard deviations of the mean.

99.999999802% of the data points will fall within six standard deviations of the mean.

>99.99999999974% of the data points will fall within seven standard deviations of the mean.

So why is it important to know about standard deviations and the normal curve? Consider a situation where a teacher gives a 100-point test. When the data were analyzed, the mean score was 70 and the standard deviation was 5. If we assume that the distribution of scores is normal, resulting in a normal curve, then we can conclude:

68% of the students scored between a 65 and 75, (70−5 and 70+5).

95% of the students received scores between 60 and 80, (70−5−5 and 70+5+5).

99.7% of the students received scores between 55 and 85, (70−5−5−5 and 70+5+5+5).

This data can be transferred to a data table for easier analysis:

From this table, a teacher can get a much clearer picture of how well the students performed on a particular assessment. In the scenario presented, the standard deviation was quite small. Let’s look at the same situation, except this time the standard deviation will be 10.

68% of the students received a score between 60 and 80, (70−10 and 70+10).

95% of the students received a score between 50 and 90, (70−10−10 and 70+10+10).

99.7% of the students received a score between 40 and 100, (70−10−10−10 and 70+10+10+10).
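The two scenarios above can be tabulated with a short sketch. It assumes a perfectly normal distribution and uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2). The function names are hypothetical. The computed percentages (about 68.3%, 95.4%, and 99.7%) are the unrounded versions of the 68%, 95%, and 99.7% figures used in the text.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def score_band(mean, sd, k):
    """Lowest and highest score within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


for sd in (5, 10):
    print(f"mean = 70, standard deviation = {sd}")
    for k in (1, 2, 3):
        low, high = score_band(70, sd, k)
        print(f"  within {k} SD: {low} to {high} ({fraction_within(k):.1%})")
```

Calling `fraction_within` with 4, 5, 6, or 7 gives the corresponding coverage for wider bands.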

It is easy to see that the standard deviation on this set of scores indicates that the students have a wider range of understanding as measured by this assessment. Imagine if the standard deviation was 20 instead of 10!

Correlations:

A correlation is the measure of a relationship between two or more variables. A correlation is simply a co-relation which denotes how well two separate variables “go together.” For a teacher, the two variables might be items such as correlating a successful homework assignment for students to their grades on a related assessment. Correlations also hold the distinction of being the statistic that is most likely to be misunderstood and misused in education.

It is possible to correlate any item to any other item. Some correlations are silly, such as the correlation made between the relative abundance of clouds in the sky to a winning lottery ticket. Other correlations have real value to a teacher, such as the correlation between the amount of time that students study and student achievement. Sometimes correlations are useful, sometimes they are not; sometimes they are positive, sometimes they are negative.

A positive correlation between two events means that when the value of one item increases, then the value of the other item is likely to increase. An example of a positive correlation is the relationship between height and weight: taller people generally weigh more than shorter people. However, since there are plenty of “short” people who weigh as much or more than “tall” people, the correlation cannot be described as strong. A strong positive correlation means that when the value of one item increases, the value of the other item also increases. An example of a strong positive correlation is the relationship between an increase in wealth and an increase in spending. In America, the more people make, the more they spend.

A negative correlation describes an inverse relationship. In this case, when one event increases, the other decreases. For example, before the advent of safety devices in cars, a negative correlation existed between increasing car speed and the number of days without car accidents. In this case, the faster the cars were able to travel, the less the chance of an accident-free day.

For educational purposes, a correlation may be quite useful. For instance, it may be helpful for the teacher to know that a score greater than 75% on a student’s review packet has a strong positive correlation to student performance on the subsequent exam. The teacher would then know to suggest that the students take the review packet seriously and complete it thoughtfully. It may also work in reverse. Some things do not correlate well together. For instance, a teacher may find that completing a particular activity in class had no effect on student performance. Or, a teacher may identify a negative correlation, such as the more time that recess cuts into the spelling instruction period, the lower students’ spelling scores begin to fall.

Correlations are often used to predict events. One of the most common predictions from a strong positive correlation is high school class rank and success in college. That is one of the reasons why post-high school institutions are so interested in high school class rank before admitting incoming freshmen. It is important to know that correlations are often helpful in predicting events. Yet, it is also important to know that a correlation does not imply cause and effect. Thus, a strong positive correlation does not mean that one event causes the next event. This is probably the biggest error that educators make.

For example, there is a strong correlation between people wearing warm coats and cold weather. This does not imply or suggest that people wearing winter coats cause cold weather. The correlation simply means that when there is cold weather, people are more likely to wear winter coats, or, people are more likely to wear winter coats when the weather is cold.

There also tends to be a strong correlation between students who play instruments and their high academic achievement in other subject areas. Does this mean that playing in the band causes high academic achievement? No, it means that some of the students who do well in school also play in the band, or students who play in the band are also doing well in school. This correlation does not preclude other students from doing well academically and not playing in the band. One event does not cause the other any more than wearing winter coats causes cold weather.

Wouldn’t it be nice to have a method of correlating important educational events so that decisions could be made with a degree of certainty? Fortunately, there is a mathematical procedure for just this type of situation. It is called the Pearson product-moment correlation coefficient.

Pearson Product-Moment Correlation Coefficient:

The Pearson product-moment correlation coefficient is a simple statistic that indicates the degree of linear relationship between two variables. The Pearson correlation coefficient can range in value from +1.0 to -1.0. A value of +1.0 indicates a perfect direct linear relationship; a value of -1.0 indicates a perfect inverse relationship. For instance, a coefficient of +1.0 means that when the value of something increases, the value of the other item increases. A coefficient of -1.0 means that when something trends upward, the other event trends downward. For most educational decisions, it isn’t quite that easy. Coefficients are very seldom exactly +1.0 or -1.0.

A strong coefficient is still helpful, regardless of whether it is positive or negative. For instance, grade point average (GPA) has a strong positive coefficient with scores on the Scholastic Aptitude Test (SAT). For that reason, students are encouraged to do well in school as a preparation for scoring well on the SAT. Likewise, off-task behaviors in students and student achievement have a strong negative coefficient. That is why teachers construct lessons to keep the students meaningfully and thoughtfully engaged in their learning. The level of student achievement and the size of students’ feet have neither a positive nor a negative coefficient. In fact, it would be near zero. This means that there is no predictive relationship that can be established between foot size and student achievement.

The Pearson coefficient provides statistical insight which allows educators to make better decisions regarding what is best for students.
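For readers who want to see the calculation, here is a minimal sketch of the Pearson coefficient using its standard definition: the sum of the products of paired deviations, divided by the square root of the product of the two sums of squared deviations. The function name and the study-time data are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two paired lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("coefficient is undefined when a variable never changes")
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
exam = [52, 60, 71, 74, 88]
print(round(pearson_r(hours, exam), 2))  # 0.99
```

A value this close to +1.0 describes a strong positive linear relationship in the assumed data. It says nothing about whether one variable causes the other.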

Scatter Plots:

A scatter plot is a graph that shows to what degree two variables correlate. In other words, a scatter plot is a graph of a correlation. In a scatter plot, one of the variables is plotted on the X-axis and the other variable is plotted on the Y-axis. Each individual score or measurement of the two variables is plotted as a single point. The single point is the junction where the two variables meet. It is directly above the score represented by the horizontal axis and to the right of the score on the vertical axis.

The location of the plotted scores indicates the nature of the correlation:

When plotted, if the scores trend in a pattern from lower left to upper right, then the correlation is positive. A positive correlation where all of the scores fall on a single diagonal line would have a +1.0 Pearson coefficient.

If the scores trend from upper left to lower right, then the correlation is negative. A negative correlation where all of the scores were on a single diagonal line would have a -1.0 Pearson coefficient.

If the scores appear in a random or circular fashion, the correlation is neither positive nor negative. A perfectly random or circular array of scores would have a zero Pearson coefficient, which means that the two items do not correlate well. Translated this means that when one value increases, the other value cannot be predicted; it either increases, stays the same, or decreases. With a zero coefficient, there is no ability to predict the results.
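The three patterns described above can be reproduced numerically. This sketch repeats a compact version of the standard Pearson formula so that it stands alone. All of the points are invented for illustration, with four points on a circle standing in for a circular array.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient from paired deviations (standard definition)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # lower left to upper right, on one line
falling = [10, 8, 6, 4, 2]    # upper left to lower right, on one line

# Four hypothetical points arranged on a circle around the origin.
circle_x = [1, 0, -1, 0]
circle_y = [0, 1, 0, -1]

print(pearson_r(xs, rising))          # 1.0
print(pearson_r(xs, falling))         # -1.0
print(pearson_r(circle_x, circle_y))  # 0.0
```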

The value of a scatter plot is that the correlation is easily identified by the shape created by the plotted points. Illustration 22 represents a positive correlation graphed as a scatter plot of IQ and achievement in school for a particular 4th-grade class.

Scatter Plot Example:

In this particular case, the scores tend to fall in a line extending from the lower left to the upper right as noted by the gray line. Translated, this means that as the students’ IQ increased in that particular 4th-grade class, their achievement also increased.

Scatter plots also indicate the degree of strength of a correlation by how close the numbers fall in line. As the plotted points come closer to falling in a perfectly straight line, they also come closer to a perfect +1.0 or -1.0 Pearson coefficient and have greater predictive value. Examine the sequence of scatterplots listed in illustration 23 and note how the points begin to fall into a straight line as the Pearson coefficient increases or decreases toward a +1.0 or -1.0.

[Illustration 23 panel: Pearson correlation coefficient = -1]

10 Things To Remember About Statistics:

1. It is important to understand both the benefits and limitations of statistical data. Avoid the temptation to extend a statistical analysis beyond the limitation of the statistic.
2. A frequency distribution is easily converted into a histogram.
3. The normal curve is a statistical derivation that serves as a basis for understanding data, but it is seldom found completely intact in the real world.
4. The mean, median, and mode are different measures of central tendency.
5. Standard deviation is a measure of the variability (spread) of measurements around the mean.
6. A correlation does not prove cause and effect.
7. A correlation can be used as a predictive statistic.
8. The Pearson product-moment correlation coefficient is the most common type of predictive correlation with measures that range from -1.0 to +1.0.
9. Scatter plots are a graphic presentation of correlations.
10. Statistical analysis is important to teaching and provides information for lesson planning, reading instruction, remediation, and communicating about students.

Comparison

Statistic	Definition	Best Use Case
Mean	Average of the data set	Roughly symmetric data without extreme values; sensitive to outliers.
Median	Middle value of the data set	Not affected by outliers; good for skewed data.
Mode	Most frequently occurring value(s)	Useful for categorical data or detecting patterns.

