A frequency distribution is a list of data values (either individual values or values grouped into intervals), each accompanied by its frequency.
The measurement results we get are called raw data. The magnitude of the measurement results we obtain usually varies. If we look at the raw data, it is very difficult for us to draw meaningful conclusions. The raw data needs to be processed first so that we can get a good picture of the data.
In this discussion, Smartstat describes the meaning of a frequency distribution, with examples, and the technique for making frequency distribution tables. It also discusses the relative frequency distribution and cumulative frequency distribution, the histogram, the frequency polygon, and the ogive.
Frequency Distribution
When we are faced with a large set of data, it is often helpful to organize and summarize the data in a table that lists the distinct data values (either individually or in groups) together with their frequencies, that is, the number of times each value occurs. Such a list is called a frequency list or frequency distribution.
Thus, a frequency distribution is a list of data values (either individual values or values grouped into intervals), each accompanied by its frequency.
How to create Histograms, Frequency Polygons and Ogives? You can learn in the following Excel Tutorial: Histogram (Toolpak)
Data are grouped into classes so that their important characteristics can be seen at a glance. The frequency distribution gives a picture of how variable the data are. This variability is important to know, because subsequent statistical tests must always take it into account. Conclusions drawn without regard to the variability of the data are generally invalid.
For example, consider the sample data in Table 1. The table is a list of Statistics Course test scores from 80 students (Sudjana, 19xx).
Table 1. List of Statistics Course Exam Scores
79 | 49 | 48 | 74 | 81 | 98 | 87 | 80 |
80 | 84 | 90 | 70 | 91 | 93 | 82 | 78 |
70 | 71 | 92 | 38 | 56 | 81 | 74 | 73 |
68 | 72 | 85 | 51 | 65 | 93 | 83 | 86 |
90 | 35 | 83 | 73 | 74 | 43 | 86 | 88 |
92 | 93 | 76 | 71 | 90 | 72 | 67 | 75 |
80 | 91 | 61 | 72 | 97 | 91 | 88 | 81 |
70 | 74 | 99 | 95 | 80 | 59 | 71 | 77 |
63 | 60 | 83 | 82 | 60 | 67 | 89 | 63 |
76 | 63 | 88 | 70 | 66 | 88 | 79 | 75 |
It is very difficult to draw any conclusions from this list of data. At a glance, we cannot tell what the smallest or largest test score is. Likewise, we cannot tell which score occurs most often or how many students obtained a certain score. We must therefore process the data first in order to get a better description.
Compare this with the same data compiled into a frequency distribution (Table 2a and Table 2b). Table 2a is the frequency distribution of the individual (ungrouped) values, and Table 2b is the frequency distribution of the data grouped into class intervals. From these tables we can read off several characteristics of the student test scores.
Table 2a.
No | Test score (x_i) | Frequency (f_i) |
1 | 35 | 1 |
2 | 36 | 0 |
3 | 37 | 0 |
4 | 38 | 1 |
: | : | : |
16 | 70 | 4 |
17 | 71 | 3 |
: | : | : |
42 | 98 | 1 |
43 | 99 | 1 |
Total | 80 |
In Table 2a, we can see that 80 students took the exam, the lowest score is 35, and the highest is 99. No score occurs more than 4 times; 70 is one of the scores obtained by 4 students (74, 80, and 88 also occur 4 times in Table 1). None of the students scored 36, and only one student scored 35.
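The counts in Table 2a can be reproduced with a short script. The following is a minimal sketch in Python, assuming the scores are typed in exactly as they appear in Table 1; the variable names `scores` and `freq` are illustrative only.

```python
from collections import Counter

# Test scores from Table 1 (80 students), entered row by row.
scores = [
    79, 49, 48, 74, 81, 98, 87, 80,
    80, 84, 90, 70, 91, 93, 82, 78,
    70, 71, 92, 38, 56, 81, 74, 73,
    68, 72, 85, 51, 65, 93, 83, 86,
    90, 35, 83, 73, 74, 43, 86, 88,
    92, 93, 76, 71, 90, 72, 67, 75,
    80, 91, 61, 72, 97, 91, 88, 81,
    70, 74, 99, 95, 80, 59, 71, 77,
    63, 60, 83, 82, 60, 67, 89, 63,
    76, 63, 88, 70, 66, 88, 79, 75,
]

# Counter maps each distinct score to the number of times it occurs.
freq = Counter(scores)

# Print the ungrouped frequency distribution in ascending order of score.
for value in sorted(freq):
    print(value, freq[value])

print("Total", sum(freq.values()))
```

A score that never occurs, such as 36, is simply absent from the printed list; `freq[36]` returns 0 if it is looked up explicitly.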
Table 2b.
Class | Test scores | Frequency (f_i) |
1 | 31 - 40 | 2 |
2 | 41 - 50 | 3 |
3 | 51 - 60 | 5 |
4 | 61 - 70 | 13 |
5 | 71 - 80 | 24 |
6 | 81 - 90 | 21 |
7 | 91 - 100 | 12 |
Total | 80 |
Table 2b is the frequency distribution of the grouped data. This is the most commonly used form of frequency list. We often group sample data into intervals in order to get a better picture of the characteristics of the data. From the list, we can see that 80 students took the exam and that the interval containing the most scores is 71 to 80, with 24 students, and so on. Keep in mind that grouping loses the identity of the original data. For example, we know that 2 students scored between 31 and 40, but we no longer know their exact scores, whether 31, 32, 36, and so on.
There are several terms that must be understood first in compiling a frequency distribution table.
Table 3.
Class | Exam Score Interval | Class Boundary | Class Midpoint (x_i) | Frequency (f_i) |
1 | 31 - 40 | 30.5 – 40.5 | 35.5 | 2 |
2 | 41 - 50 | 40.5 – 50.5 | 45.5 | 3 |
3 | 51 - 60 | 50.5 – 60.5 | 55.5 | 5 |
4 | 61 - 70 | 60.5 – 70.5 | 65.5 | 13 |
5 | 71 - 80 | 70.5 – 80.5 | 75.5 | 24 |
6 | 81 - 90 | 80.5 – 90.5 | 85.5 | 21 |
7 | 91 - 100 | 90.5 – 100.5 | 95.5 | 12 |
Total | 80 |
Range : The difference between the highest and lowest values. In the test score example above, Range = 99 – 35 = 64.
Lower class limit : The smallest value in each class. (Example: In Table 3 above, the lower class limits are 31, 41, 51, 61, …, 91.)
Upper class limit : The largest value in each class. (Example: In Table 3 above, the upper class limits are 40, 50, 60, …, 100.)
Class boundary : The value that separates adjacent classes, leaving no gap between the upper limit of one class and the lower limit of the next. Example: In class 1, the lower class boundary is 30.5 and the upper is 40.5. In class 2, the class boundaries are 40.5 and 50.5. The upper boundary of the 1st class (40.5) is also the lower boundary of the 2nd class. Class boundaries are always expressed with one more decimal place than the original observations. This ensures that no observation falls exactly on a class boundary, so there is never any doubt about which class a value belongs to. Example: suppose the classes were defined like this:
1st class : 30 – 40
2nd class : 40 – 50
:
etc.
If there is a test score with a number of 40, should it be placed in class-1 or class-2?
Class length/width (class interval) : The difference between two consecutive lower class limits, or between two consecutive upper class limits, or between the upper and lower class boundaries of the class in question. Usually all classes have the same width. Example:
class width = 41 – 31 = 10 (difference between 2 consecutive lower class limits) or
class width = 50 – 40 = 10 (difference between 2 consecutive upper class limits) or
class width = 40.5 – 30.5 = 10 (difference between the upper and lower class boundaries of the 1st class)
Class midpoint (class value) : The middle value of the class, obtained by the formula ½ (lower class limit + upper class limit). This value represents the class interval in further statistical calculations. Example: The midpoint of the 1st class is ½ (31 + 40) = 35.5
Number of classes : The number of class intervals in the table. In Table 3 there are 7 classes.
Class frequency : The number of observations that fall in a given class interval. For example, in class 1 the frequency is 2, because only 2 scores lie between 30.5 and 40.5, namely 35 and 38.
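The terms above can be checked numerically. The sketch below, in Python, derives the class boundaries, width, and midpoint from the class limits. The helper name `describe_class` and its `unit` parameter (the measurement precision, 1 for whole-number scores) are assumptions introduced for illustration.

```python
def describe_class(lower_limit, upper_limit, unit=1):
    """Return boundaries, width, and midpoint for one class interval.

    `unit` is the precision of the raw data; the boundaries sit half a
    unit outside the class limits so that no observation can fall on them.
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }


# Class limits of the seven classes in Table 3.
limits = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

for lower, upper in limits:
    print(lower, upper, describe_class(lower, upper))
```

For the 1st class this gives boundaries 30.5 and 40.5, a width of 10, and a midpoint of 35.5, matching Table 3.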
How to create Frequency Distribution Table (FDT)
A frequency distribution is created for the following reasons:
- large data sets can be summarized,
- we can get some idea of the characteristics of the data, and
- it is the basis for important graphs (such as histograms).
Many software packages can create frequency distribution tables automatically. Here, however, we describe the basic procedure for creating a frequency distribution table.
The steps in compiling a frequency distribution table:
- Sort the data, usually from the smallest value.
  - The goal is to find the range of the data and to make it easier to count the frequency of each class.
- Determine the range.
  - Range = maximum value – minimum value
- Choose the number of classes. Not too many and not too few: usually between 5 and 20, depending on the amount and spread of the data.
  - Sturges' rule: number of classes = 1 + 3.3 log n, where n = number of data values and log is the base-10 logarithm
- Determine the length/width of the class interval (p).
  - Class length (p) = [range]/[number of classes]
- Determine the lower limit of the first class.
- Arrange the table and count the frequency of each class.
When compiling the FDT, make sure the classes do not overlap, so that each observed value fits into exactly one class. Also make sure no observation is left out (cannot be assigned to any class). Try to use the same width for all classes, although open intervals such as "≥ 91" (91 or more) are sometimes unavoidable. A class may also have zero frequency.
Example:
We use the above procedure to construct a frequency distribution table for the student test scores (Table 1).
1. Sort the data. Here are the test scores in order:
35 38 43 48 49 51 56 59 60 60
61 63 63 63 65 66 67 67 68 70
70 70 70 71 71 71 72 72 72 73
73 74 74 74 74 75 75 76 76 77
78 79 79 80 80 80 80 81 81 81
82 82 83 83 83 84 85 86 86 87
88 88 88 88 89 90 90 90 91 91
91 92 92 93 93 93 95 97 98 99
2. Range :
[highest value – lowest value] = 99 – 35 = 64
3. Number of classes :
Choose the desired number of classes. With a range of 64, about 6 or 7 classes is reasonable.
As an exercise, we will use Sturges' rule.
number of classes = 1 + 3.3 x log(n)
= 1 + 3.3 x log(80)
= 7.28 ≈ 7
4. Class length :
Class length = [range]/[number of classes]
= 64/7
= 9.14 ≈ 10 (rounded up to make the FDT easier to prepare)
5. Determine the lower class limit of the first class.
The smallest test score = 35
The lower limit of the first class may be chosen freely, as long as the smallest value still falls within the class. For example, if we choose a lower limit of 26, the first class interval is 26 – 35, and the value 35 falls exactly on the upper limit of the 1st class. However, if we choose a lower limit of 20 or 25, the smallest value, 35, will not be included in the first class.
To make the FDT easier to prepare and read (and tidier), it is better to choose a lower limit of 30 or 31. Here we choose 31, so the lower limit is 31.
From the above procedure, we get the following information:
Number of classes : 7
Length of class : 10
Lower limit of class : 31
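The arithmetic in steps 2 to 4 can be verified with a few lines of Python. This is only a sketch under the same choices as the text: the coefficient 3.3 with a base-10 logarithm, the number of classes rounded to the nearest integer, and the class length rounded up to a whole number. The function name `plan_classes` is hypothetical.

```python
import math


def plan_classes(n, lowest, highest):
    """Return the range, Sturges' class count, and class length for n values."""
    data_range = highest - lowest
    k_exact = 1 + 3.3 * math.log10(n)   # Sturges' rule as written in the text
    k = round(k_exact)                  # nearest whole number of classes
    width_exact = data_range / k
    width = math.ceil(width_exact)      # round up so the classes cover the range
    return data_range, k_exact, k, width_exact, width


data_range, k_exact, k, width_exact, width = plan_classes(80, 35, 99)
print(f"range={data_range}, classes={k_exact:.2f} -> {k}, width={width_exact:.2f} -> {width}")
# prints: range=64, classes=7.28 -> 7, width=9.14 -> 10

# With a lower limit of 31, the last class ends at 31 + 7*10 - 1 = 100,
# so the highest score (99) is still covered.
lower_limit = 31
last_upper_limit = lower_limit + k * width - 1
print(last_upper_limit >= 99)
```

Rounding the class length up rather than to the nearest integer is a design choice: it guarantees that the chosen number of classes spans the whole range.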
Next we arrange the FDT.
FDT form:
Class | Test scores | Class Boundary | Tally | Frequency |
1 | 31 - | | | |
2 | 41 - | | | |
3 | 51 - | | | |
: | : | | | |
6 | 81 - | | | |
7 | 91 - | | | |
Total | |
The following is the completed table:
Class | Test scores | Class Boundary | Frequency (f_i) |
1 | 31 - 40 | 30.5 – 40.5 | 2 |
2 | 41 - 50 | 40.5 – 50.5 | 3 |
3 | 51 - 60 | 50.5 – 60.5 | 5 |
4 | 61 - 70 | 60.5 – 70.5 | 13 |
5 | 71 - 80 | 70.5 – 80.5 | 24 |
6 | 81 - 90 | 80.5 – 90.5 | 21 |
7 | 91 - 100 | 90.5 – 100.5 | 12 |
Total | 80 |
or in a more concise form:
Class | Test scores | Frequency (f_i) |
1 | 31 - 40 | 2 |
2 | 41 - 50 | 3 |
3 | 51 - 60 | 5 |
4 | 61 - 70 | 13 |
5 | 71 - 80 | 24 |
6 | 81 - 90 | 21 |
7 | 91 - 100 | 12 |
Total | 80 |
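The tallying step can also be done programmatically. The following Python sketch assigns each sorted score to its class using integer division, assuming equal-width classes that start at a lower limit of 31 with a class length of 10, as chosen above. The names `counts`, `lower_limit`, `width`, and `k` are illustrative.

```python
# Sorted test scores, as listed in the worked example.
scores = [
    35, 38, 43, 48, 49, 51, 56, 59, 60, 60,
    61, 63, 63, 63, 65, 66, 67, 67, 68, 70,
    70, 70, 70, 71, 71, 71, 72, 72, 72, 73,
    73, 74, 74, 74, 74, 75, 75, 76, 76, 77,
    78, 79, 79, 80, 80, 80, 80, 81, 81, 81,
    82, 82, 83, 83, 83, 84, 85, 86, 86, 87,
    88, 88, 88, 88, 89, 90, 90, 90, 91, 91,
    91, 92, 92, 93, 93, 93, 95, 97, 98, 99,
]

lower_limit, width, k = 31, 10, 7
counts = [0] * k

for s in scores:
    index = (s - lower_limit) // width   # 31-40 -> 0, 41-50 -> 1, ...
    if not 0 <= index < k:
        raise ValueError(f"score {s} does not fit into any class")
    counts[index] += 1

# Print the grouped frequency distribution table.
for i, f in enumerate(counts):
    lo = lower_limit + i * width
    hi = lo + width - 1
    print(f"{i + 1} | {lo} - {hi} | {f}")
print("Total |", sum(counts))
```

The explicit range check enforces the rule that no observation may be left without a class.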
Relative and Cumulative Frequency Distribution
An important variation of the basic frequency distribution uses relative frequencies, obtained by dividing the frequency of each class by the total of all frequencies (the number of data values). A relative frequency distribution has the same classes as the FDT, but lists relative frequencies instead of actual frequencies. The relative frequency is often expressed as a percentage.
Relative frequency = $\dfrac{f_i}{\sum f_i} \times 100\% = \dfrac{f_i}{n} \times 100\%$
Example: 1st class relative frequency:
$f_i$ = 2; n = 80
Relative frequency = 2/80 x 100% = 2.5%
Class | Test scores | Relative frequency (%) |
1 | 31 - 40 | 2.50 |
2 | 41 - 50 | 3.75 |
3 | 51 - 60 | 6.25 |
4 | 61 - 70 | 16.25 |
5 | 71 - 80 | 30.00 |
6 | 81 - 90 | 26.25 |
7 | 91 - 100 | 15.00 |
Total | 100.00 |
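The relative frequencies in this table follow directly from the class frequencies. A minimal Python sketch, assuming the frequencies from the FDT above (the list names are illustrative):

```python
labels = ["31 - 40", "41 - 50", "51 - 60", "61 - 70", "71 - 80", "81 - 90", "91 - 100"]
freqs = [2, 3, 5, 13, 24, 21, 12]

n = sum(freqs)  # total number of observations (80)

# Relative frequency in percent: f_i / n * 100
relative = [f * 100 / n for f in freqs]

for label, r in zip(labels, relative):
    print(f"{label} | {r:.2f}")
print(f"Total | {sum(relative):.2f}")
```

The percentages always sum to 100, which is a quick check that no class was missed.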
Cumulative Frequency Distribution
Another variation of the standard frequency distribution is the cumulative frequency distribution. The cumulative frequency for a class is the frequency of that class plus the sum of the frequencies of all previous classes.
Note that, besides the change in the header label, the frequency column is replaced with the "less than" cumulative frequency, and the class intervals are replaced with "less than" expressions that describe the new range of values.
Test scores | Cumulative frequency less than |
less than 30.5 | 0 |
less than 40.5 | 2 |
less than 50.5 | 5 |
less than 60.5 | 10 |
less than 70.5 | 23 |
less than 80.5 | 47 |
less than 90.5 | 68 |
less than 100.5 | 80 |
or sometimes arranged in a form like this:
Test scores | Cumulative frequency less than |
less than 41 | 2 |
less than 51 | 5 |
less than 61 | 10 |
less than 71 | 23 |
less than 81 | 47 |
less than 91 | 68 |
less than 101 | 80 |
Another variation is the "more than" cumulative frequency, which counts the values above each boundary. The principle is the same as in the procedure above.
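Both cumulative variants can be computed as running totals. The Python sketch below uses `itertools.accumulate` for the "less than" column. The "more than" column is built here as the number of scores above each lower class boundary, which is one common convention and an assumption on our part, since the text does not tabulate it.

```python
from itertools import accumulate

freqs = [2, 3, 5, 13, 24, 21, 12]
lower_boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
upper_boundaries = [40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]

n = sum(freqs)

# "Less than" cumulative frequency: running total up to each upper boundary.
less_than = list(accumulate(freqs))

# "More than" cumulative frequency: what remains above each lower boundary.
more_than = [n - below for below in [0] + less_than[:-1]]

for b, c in zip(upper_boundaries, less_than):
    print(f"less than {b} | {c}")
for b, c in zip(lower_boundaries, more_than):
    print(f"more than {b} | {c}")
```

The last "less than" value and the first "more than" value both equal the total number of observations.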
Histogram
A histogram is a type of bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies. The height of each bar corresponds to the class frequency, and the bars are drawn adjacent to each other, with no gaps between them. We can create a histogram once the frequency distribution table of the observations has been made.
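As a quick check before drawing a proper chart, the frequency table can be rendered as a text histogram. The Python sketch below is only a rough illustration: the bars run horizontally rather than vertically, and the function name `text_histogram` is hypothetical.

```python
def text_histogram(labels, freqs, symbol="#"):
    """Return a text histogram with one row per class and one symbol per observation."""
    label_width = max(len(label) for label in labels)
    lines = []
    for label, f in zip(labels, freqs):
        # Right-align the class label, then draw a bar of length f.
        lines.append(f"{label:>{label_width}} | {symbol * f} {f}")
    return "\n".join(lines)


labels = ["31-40", "41-50", "51-60", "61-70", "71-80", "81-90", "91-100"]
freqs = [2, 3, 5, 13, 24, 21, 12]

print(text_histogram(labels, freqs))
```

Because the rows are printed one directly after another, the bars are adjacent, which mirrors the absence of gaps in a true histogram.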
Frequency Polygon
A frequency polygon uses line segments connecting points located directly above the class midpoints. The heights of the points correspond to the class frequencies, and the line segments are extended to the left and right so that the graph begins and ends on the horizontal axis.
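The points of a frequency polygon can be listed before plotting. The Python sketch below pairs each class midpoint with its frequency and closes the polygon with zero-frequency points one class width beyond the first and last midpoints. Placing the end points at that distance is a common convention assumed here, not something the text specifies.

```python
midpoints = [35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
freqs = [2, 3, 5, 13, 24, 21, 12]
width = 10  # class length

# Extend the polygon to the horizontal axis on both sides.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freqs))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Connecting these points in order with straight line segments gives the frequency polygon for the test scores.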
Ogive
An ogive is a line graph that depicts cumulative frequencies, such as those in a cumulative frequency distribution list. The points plotted at the class boundaries are connected by line segments that start at the lower boundary of the first class and end at the upper boundary of the last class. An ogive is useful for determining the number of values below a certain value. For example, the ogive for the test scores shows that 68 students scored less than 90.5.
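Reading a value off an ogive amounts to linear interpolation between the plotted points. The Python sketch below reproduces that reading for the test scores. The function name `count_below` is hypothetical, and values between class boundaries are estimates only, since interpolation assumes the scores are spread evenly within each class.

```python
# Ogive points: class boundaries and "less than" cumulative frequencies.
boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
cumulative = [0, 2, 5, 10, 23, 47, 68, 80]


def count_below(x):
    """Estimate how many scores are below x by reading along the ogive."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            # Straight-line interpolation on the segment containing x.
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(count_below(90.5))  # exact at a class boundary: 68.0
print(count_below(75.5))  # interpolated estimate within the 71-80 class
```

At a class boundary the result is exact, because those are the plotted points; anywhere else it is an approximation.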