Reliability Analysis

Reliability analysis is the process of evaluating the extent to which an instrument, such as a survey or test, produces consistent and stable results across repeated measurements, alternative versions, items, and raters. It is an important concept in quantitative research and psychometrics.

There are several types of reliability, including:

  1. Test-Retest Reliability: This measures the consistency of an instrument when it is administered to the same subjects at different times. For example, if we ask a group of people a series of questions and then ask the same questions a few weeks later, test-retest reliability measures the extent to which their responses are consistent over time. Typical statistic: the Pearson or Spearman correlation between the two administrations.
  2. Parallel (Equivalent Forms) Reliability: This measures the consistency between two different versions of the same instrument given to the same subjects. For example, if you have two versions of a test intended to measure the same ability, parallel reliability measures the extent to which both versions yield the same results. Typical statistic: the Pearson or Spearman correlation between the two versions.
  3. Internal Reliability: This measures the extent to which the items or questions within one instrument relate to each other. For example, if a questionnaire contains several questions all intended to measure stress levels, internal reliability measures how strongly the answers to these questions are related. Typical statistics: Cronbach's Alpha or the Split-Half method (the correlation between two randomly selected halves of the questions).
  4. Inter-Rater Reliability: This measures the consistency between the evaluations of two or more raters. For example, if two teachers grade the same essay, inter-rater reliability measures how closely their evaluations agree. Typical statistics: Cohen's Kappa, Fleiss' Kappa, and the Intraclass Correlation. (A minimal code sketch of these coefficients follows this list.)
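The following minimal sketch illustrates how these coefficients are commonly computed in Python. The arrays are invented purely for illustration, and NumPy, SciPy, and scikit-learn are assumed to be available; it is a sketch of the general techniques, not of any particular software's implementation.

```python
# Minimal illustrations of the reliability coefficients listed above,
# using small made-up arrays (not real data).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# 1-2. Test-retest / parallel forms: correlate the two administrations.
time1 = np.array([4, 3, 5, 4, 2, 5, 3, 4])
time2 = np.array([4, 4, 5, 3, 2, 5, 3, 4])
r, _ = pearsonr(time1, time2)
print(f"Test-retest r = {r:.3f}")

# 3. Internal consistency: Cronbach's alpha from an item matrix
#    (rows = respondents, columns = items).
items = np.array([
    [4, 3, 5, 4],
    [3, 4, 4, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
])
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                       / items.sum(axis=1).var(ddof=1))
print(f"Cronbach's alpha = {alpha:.3f}")

# 4. Inter-rater agreement between two raters on categorical ratings.
rater1 = [1, 2, 2, 3, 1, 2]
rater2 = [1, 2, 3, 3, 1, 2]
print(f"Cohen's kappa = {cohen_kappa_score(rater1, rater2):.3f}")
```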

Note that while reliability is necessary, it is not sufficient for a research instrument: the instrument must also be valid, that is, it must measure what it is supposed to measure. An instrument can be reliable (producing consistent results) yet not valid (not measuring the intended variable).

Case Example

The following case example, "Reliability Testing of a Customer Satisfaction Scale in the E-Commerce Industry", uses an internal reliability test on a dataset containing responses from 50 respondents. The scale consists of 8 items covering various aspects of e-commerce service, each rated on a 5-point Likert scale. The primary purpose is to evaluate the consistency and reliability of this scale. For this purpose, Cronbach's Alpha is used, an internal reliability method that assesses how strongly the items are interrelated, giving a clearer picture of the consistency and reliability of the survey instrument.

| ID | E-commerce Platform | Product Quality | Product Price | Product Variations | Shipping Speed | Customer Service Responsiveness | Product Return Process | Transaction Convenience | Description Accuracy |
|----|---------------------|-----------------|---------------|--------------------|----------------|---------------------------------|------------------------|-------------------------|----------------------|
| 1  | Tokopedia | 4 | 3 | 5 | 4 | 4 | 3 | 4 | 4 |
| 2  | Shopee    | 3 | 4 | 4 | 3 | 4 | 4 | 4 | 3 |
| 3  | Lazada    | 4 | 5 | 4 | 3 | 4 | 4 | 4 | 5 |
| 4  | Shopee    | 4 | 3 | 5 | 4 | 4 | 4 | 5 | 4 |
| 5  | Bukalapak | 5 | 4 | 3 | 4 | 5 | 3 | 4 | 5 |
| 6  | Tokopedia | 3 | 3 | 4 | 3 | 4 | 3 | 4 | 4 |
| 7  | Shopee    | 4 | 4 | 5 | 4 | 4 | 5 | 5 | 4 |
| 8  | Bukalapak | 4 | 3 | 4 | 3 | 3 | 3 | 4 | 3 |
| 9  | Lazada    | 3 | 4 | 4 | 3 | 4 | 4 | 4 | 4 |
| 10 | Shopee    | 4 | 4 | 5 | 5 | 4 | 5 | 5 | 5 |
| 11 | Tokopedia | 5 | 4 | 4 | 4 | 5 | 4 | 4 | 5 |
| 12 | Shopee    | 3 | 4 | 4 | 3 | 4 | 4 | 4 | 3 |
| 13 | Lazada    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 14 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 15 | Bukalapak | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 16 | Tokopedia | 2 | 3 | 2 | 3 | 2 | 2 | 2 | 2 |
| 17 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 18 | Bukalapak | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 19 | Lazada    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 20 | Shopee    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 21 | Tokopedia | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 22 | Shopee    | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 2 |
| 23 | Lazada    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 24 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 25 | Bukalapak | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 26 | Tokopedia | 2 | 3 | 2 | 3 | 2 | 2 | 2 | 2 |
| 27 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 28 | Bukalapak | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 29 | Lazada    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 30 | Shopee    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 31 | Tokopedia | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 32 | Shopee    | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 2 |
| 33 | Lazada    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 34 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 35 | Bukalapak | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 36 | Tokopedia | 2 | 3 | 2 | 3 | 2 | 2 | 2 | 2 |
| 37 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 38 | Bukalapak | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 39 | Lazada    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 40 | Shopee    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 41 | Tokopedia | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 42 | Shopee    | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 2 |
| 43 | Lazada    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 44 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 45 | Bukalapak | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| 46 | Tokopedia | 2 | 3 | 2 | 3 | 2 | 2 | 2 | 2 |
| 47 | Shopee    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 48 | Bukalapak | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| 49 | Lazada    | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 50 | Shopee    | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |

Note: This is fictitious data

Steps for Reliability Testing:

  1. Activate the worksheet (Sheet) to be analyzed.
  2. Place the cursor on the Dataset (to create a Dataset, see Data Preparation method).
  3. If the active cell (Active Cell) is not on the Dataset, SmartstatXL will automatically try to determine the Dataset.
  4. Activate the SmartstatXL Tab.
  5. Click Menu Multivariate > Reliability Analysis.
  6. SmartstatXL will display a dialog box asking you to confirm whether the Dataset is correct (the Dataset cell address is usually selected correctly by default).
  7. If it is correct, click the Next button.
  8. Next, the Reliability Analysis dialog box will appear:
    (Screenshot: Reliability Analysis dialog box)
  9. Select the variables from the first variable list. In this case, we set:
    • Item: Product Quality through Description Accuracy
    • Model: Cronbach's Alpha
    • Outputs: check all

    More details can be seen in the following dialog box view:
    (Screenshot: Reliability Analysis dialog box settings)

    Other models that can be chosen: Guttman's Lambda 4 (L4) and Split-Half.

    Guttman's Lambda 4 (L4), often reported as the maximum lambda coefficient (Max L4), is a split-half-based method for measuring the internal reliability of a scale or instrument, similar in purpose to Cronbach's Alpha. For a given division of the items into two halves, L4 is the split-half coefficient; Max L4 is the largest value obtained over all possible ways of splitting the items into two halves.
    Split-Half Reliability: involves splitting the test into two parts (for example, even- and odd-numbered questions) and then correlating the scores from the two parts.

  10. Press the OK button to generate the output on the Output Sheet.
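As an optional cross-check outside SmartstatXL, the same analysis can be reproduced in Python. The sketch below assumes the worksheet above has been saved to a workbook; the file name and sheet name are placeholders and should be adjusted to match your own file.

```python
# Load the case-study worksheet into pandas for an independent cross-check.
# "customer_satisfaction.xlsx" and "Sheet1" are placeholder names.
import pandas as pd

df = pd.read_excel("customer_satisfaction.xlsx", sheet_name="Sheet1")

# The eight Likert items used in the reliability analysis.
item_cols = [
    "Product Quality", "Product Price", "Product Variations",
    "Shipping Speed", "Customer Service Responsiveness",
    "Product Return Process", "Transaction Convenience",
    "Description Accuracy",
]
items = df[item_cols]
print(items.describe().round(3))
```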

Analysis Results

Reliability Statistic Value

The reliability analysis results show that the customer satisfaction rating scale has very good reliability. The Cronbach's Alpha value of 0.981 indicates that the items on the scale are highly consistent in measuring the same concept, namely customer satisfaction. This value is far above the common threshold for good reliability of 0.70. In other words, respondents who rate one item highly tend to rate the other items highly as well, and vice versa.

The N value of 8 indicates that there are 8 items on the scale: "Product Quality", "Product Price", "Product Variations", "Shipping Speed", "Customer Service Responsiveness", "Product Return Process", "Transaction Convenience", and "Description Accuracy".

The Lower Bound and Upper Bound (0.972 and 0.988) give the range within which we can be 95% confident that the true Cronbach's Alpha lies, if the measurement were repeated on samples from the same population. The fact that even the lower bound is very high means we can be very confident about the reliability of the scale.
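The sketch below, which reuses the `items` DataFrame loaded earlier, recomputes Cronbach's Alpha and adds a bootstrap 95% confidence interval. SmartstatXL may use an analytic interval, so the bootstrap bounds will only approximate the 0.972-0.988 range reported above.

```python
# Cronbach's alpha plus a bootstrap 95% CI (reuses `items` from the
# loading sketch above). The bootstrap is an approximation and need not
# match SmartstatXL's interval exactly.
import numpy as np
import pandas as pd

def cronbach_alpha(data: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = data.shape[1]
    return k / (k - 1) * (1 - data.var(ddof=1).sum()
                          / data.sum(axis=1).var(ddof=1))

print(f"Cronbach's alpha = {cronbach_alpha(items):.3f}")

# Resample respondents with replacement and recompute alpha each time.
rng = np.random.default_rng(42)
n = len(items)
boot = [cronbach_alpha(items.iloc[rng.integers(0, n, n)]) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Bootstrap 95% CI: [{lo:.3f}, {hi:.3f}]")
```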

However, it's important to remember that just because a scale is reliable, it doesn't automatically mean that it's valid. Validity, or the extent to which a scale measures what it's supposed to measure, needs to be determined through other methods such as factor analysis or other empirical research.

Cronbach's Alpha Value if Item Removed

(Screenshot: Cronbach's Alpha if item deleted table)

This part of the reliability analysis output shows what would happen to the Cronbach's Alpha value if each item were removed from the scale in turn. It helps us determine whether any item, if removed, would increase the scale's reliability.

In this case, no item would significantly increase Cronbach's Alpha if it were removed. For example, if "Product Quality" were removed, Cronbach's Alpha would drop slightly to 0.977, which is still a very high value. The same applies to the other items.

Therefore, based on these results, it seems that all items contribute well to overall reliability and there are no items that need to be removed.
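A short sketch of the same "alpha if item deleted" check, reusing `items` and the `cronbach_alpha` function defined in the previous snippet:

```python
# Recompute Cronbach's alpha with each item dropped in turn
# (reuses `items` and cronbach_alpha() from the previous sketch).
for col in items.columns:
    alpha_without = cronbach_alpha(items.drop(columns=col))
    print(f"Alpha if '{col}' is deleted: {alpha_without:.3f}")
```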

Correlation and Covariance Matrix

(Screenshot: item correlation matrix)

(Screenshot: item covariance matrix)

The correlation matrix shows the Pearson correlation between each pair of items. Pearson correlation ranges between -1 and 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no correlation.

In this case, all items seem to have a strong and positive correlation with each other, which is an indication that all items measure the same concept (customer satisfaction) and contribute to the overall scale reliability.

For instance, "Product Quality" has a very strong correlation with "Customer Service Responsiveness" (r = 0.950) and "Description Accuracy" (r = 0.952), indicating that respondents who rate product quality highly also tend to rate customer service responsiveness and the accuracy of product descriptions positively.

However, keep in mind that a very high correlation between items may indicate redundancy, where several items are measuring the same aspect of a broader concept. In that case, you might consider combining or dropping some items to make the scale more efficient, as long as doing so does not sacrifice the scale's validity and reliability. It is also worth checking for violations of other assumptions, such as multicollinearity, which can be a problem in some more advanced statistical analyses.
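The item correlation and covariance matrices can be reproduced directly from the `items` DataFrame, as sketched below; the 0.9 threshold used to flag possibly redundant pairs is an arbitrary illustrative choice, not a rule from the source.

```python
# Pearson correlation and covariance matrices for the eight items
# (reuses `items` from the loading sketch above).
import numpy as np

corr = items.corr(method="pearson")
cov = items.cov()
print(corr.round(3))
print(cov.round(3))

# Flag very highly correlated pairs (threshold 0.9 is an illustrative
# cut-off) as candidates for a redundancy review.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
print(pairs[pairs > 0.9].round(3))
```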

Item and Scale Statistical Values

(Screenshot: item and scale statistics table)

This output presents the descriptive statistics for each item in the scale and for the scale as a whole.

The descriptive statistics for each item include the total, mean, standard deviation, and number of respondents (N). For example, the total score for "Product Quality" is 192, with a mean of 3.840 (the total divided by N), a standard deviation of 0.997 (which measures how far individual scores spread around the mean), and 50 respondents. The same applies to the other items. For instance, "Transaction Convenience" has the highest mean score (4.000), indicating that this is the area where respondents feel most satisfied.

The scale statistics provide similar information for the scale as a whole. The total score across all items and respondents is 1568, with a scale mean of 31.360, a standard deviation of 6.895, and 8 items in the scale. The scale mean (31.360) is the average total score per respondent (1568 divided by 50 respondents), which also equals the sum of the eight item means. The scale standard deviation (6.895) measures how far individual respondents' total scores spread around the scale mean. Note that a large standard deviation could indicate considerable variation in how respondents rate the items, which may suggest that there are areas where respondents feel more satisfied than others.
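The item-level and scale-level statistics can be checked with a few pandas aggregations, again reusing the `items` DataFrame from the earlier sketch:

```python
# Item statistics (total, mean, standard deviation, N per item), followed
# by scale statistics based on each respondent's total score.
import pandas as pd

item_stats = pd.DataFrame({
    "Total": items.sum(),
    "Mean": items.mean(),
    "Std Dev": items.std(ddof=1),
    "N": items.count(),
})
print(item_stats.round(3))

scale_scores = items.sum(axis=1)   # one total score per respondent
print("Scale total:", int(scale_scores.sum()))
print("Scale mean:", round(scale_scores.mean(), 3))
print("Scale std dev:", round(scale_scores.std(ddof=1), 3))
print("Number of items:", items.shape[1])
```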

Analysis Results Using the Split-Half and Guttman's Lambda 4 Models

(Screenshot: Split-Half reliability statistics)

Based on the reliability test results using the Split-Half method, the interpretation is as follows:

  • In the Split-Half reliability analysis, the scale or questionnaire is divided into two parts, the First Half and the Last Half. The First Half includes the items "Product Quality", "Product Price", "Product Variations", and "Shipping Speed", while the Last Half includes "Customer Service Responsiveness", "Product Return Process", "Transaction Convenience", and "Description Accuracy".
  • Cronbach's Alpha for the First Half is 0.945, indicating that the first part of the scale has excellent reliability.
  • Cronbach's Alpha for the Last Half is 0.973, indicating that the second part of the scale also has excellent reliability.
  • The correlation between scores on the First Half and Last Half is 0.976, indicating a very high correlation. This means that respondents who give high scores on the first part tend to give high scores on the second part, and vice versa.
  • The Spearman-Brown coefficient is 0.988. In a split-half analysis, this coefficient corrects the correlation between the two half-scales upward to estimate the reliability of the full-length scale (the Spearman-Brown prophecy formula applied to doubling the length of a half-test).
  • The Guttman Split-Half coefficient is 0.985; this is another estimate of scale reliability based on the split-half method, and such a high value again indicates excellent reliability.

Overall, these Split-Half reliability test results show that this scale is highly reliable.
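A sketch of the split-half computation itself, assuming the same first-four/last-four split of the `items` DataFrame as in the output above:

```python
# Split-half reliability: first four vs. last four items, the correlation
# between the half scores, the Spearman-Brown correction, and the Guttman
# split-half coefficient (reuses `items` from the earlier sketches).
from scipy.stats import pearsonr

first_half = items.iloc[:, :4].sum(axis=1)
last_half = items.iloc[:, 4:].sum(axis=1)
total = items.sum(axis=1)

r_halves, _ = pearsonr(first_half, last_half)
spearman_brown = 2 * r_halves / (1 + r_halves)
guttman = 2 * (1 - (first_half.var(ddof=1) + last_half.var(ddof=1))
               / total.var(ddof=1))

print(f"Correlation between halves: {r_halves:.3f}")
print(f"Spearman-Brown coefficient: {spearman_brown:.3f}")
print(f"Guttman split-half coefficient: {guttman:.3f}")
```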

Analysis Results using Guttman's Lambda 4

(Screenshot: Guttman's Lambda 4 statistics)

Based on the reliability test results with Guttman's Lambda 4 (Max L4), the interpretation is as follows:

  • The maximum value of Guttman's Lambda 4 is 0.993, obtained from the split that places "Product Quality", "Product Price", "Product Variations", and "Transaction Convenience" in one half and "Shipping Speed", "Customer Service Responsiveness", "Product Return Process", and "Description Accuracy" in the other. Scores on these two halves are almost perfectly consistent, indicating that the scale is highly reliable in measuring the intended construct.
  • Conversely, the minimum value of Guttman's Lambda 4 is 0.957, from the split that places "Product Quality", "Product Variations", "Customer Service Responsiveness", and "Description Accuracy" in one half and "Product Price", "Shipping Speed", "Product Return Process", and "Transaction Convenience" in the other; the consistency here is slightly lower but still very high.
  • The mean and median of Guttman's Lambda 4 across all possible splits are 0.981 and 0.984, respectively, meaning that however the items are divided into two halves, the consistency between the halves remains very high.

Overall, these results indicate that this customer satisfaction scale has very high reliability, meaning that it measures consistently; as noted earlier, its validity must still be established separately.
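The max, min, mean, and median Lambda 4 values can be reproduced by evaluating the split-half coefficient for every possible 4-vs-4 split of the eight items; in the loop below each split appears twice, which does not change these summary values. This sketch again reuses the `items` DataFrame from the earlier snippets.

```python
# Guttman's Lambda 4 over all 4-vs-4 splits of the eight items:
# L4(split) = 2 * (1 - (var(half1) + var(half2)) / var(total)).
from itertools import combinations
import numpy as np

cols = list(items.columns)
total_var = items.sum(axis=1).var(ddof=1)

l4_values = []
for half1 in combinations(cols, len(cols) // 2):
    half2 = [c for c in cols if c not in half1]
    v1 = items[list(half1)].sum(axis=1).var(ddof=1)
    v2 = items[half2].sum(axis=1).var(ddof=1)
    l4_values.append(2 * (1 - (v1 + v2) / total_var))

l4 = np.array(l4_values)
print(f"Max L4 = {l4.max():.3f}, min L4 = {l4.min():.3f}")
print(f"Mean L4 = {l4.mean():.3f}, median L4 = {np.median(l4):.3f}")
```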