Here is how to conduct Descriptive Statistical Analysis and Normality Tests using SmartstatXL. In addition to calculating statistical values, SmartstatXL can also be used for Normality Tests, both through formal tests and graphically. Descriptive Statistics and Normality Tests that can be calculated using SmartstatXL include:

  1. Measures of Central Tendency: Mean, Median, Mode, Harmonic Mean, Geometric Mean
  2. Measures of Dispersion (variability): Range, Variance, Standard Deviation, Quartiles, Confidence Interval (CI)
  3. Distribution: Skewness, Kurtosis
  4. Normality Tests: Shapiro-Wilk, Kolmogorov-Smirnov & Lilliefors, D'Agostino
  5. Homogeneity Tests: Levene's and Bartlett's
  6. Charts: Histogram & Normal P-P Plot
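
For readers who want to cross-check the first two groups of measures outside the add-in, here is a minimal sketch using only Python's standard `statistics` module. The sample values are hypothetical, and the sample (n - 1) formulas and the default quartile method used here are assumptions; SmartstatXL may use different conventions.

```python
import statistics as st

# Hypothetical sample values, for illustration only (not the article's dataset).
values = [12.0, 15.0, 15.0, 18.0, 20.0, 24.0, 30.0]

# Measures of central tendency.
central = {
    "mean": st.mean(values),
    "median": st.median(values),
    "mode": st.mode(values),
    "harmonic_mean": st.harmonic_mean(values),
    "geometric_mean": st.geometric_mean(values),
}

# Measures of dispersion.
dispersion = {
    "range": max(values) - min(values),
    "variance": st.variance(values),         # sample variance (n - 1 denominator)
    "std_dev": st.stdev(values),             # sample standard deviation
    "quartiles": st.quantiles(values, n=4),  # Q1, Q2, Q3 with the default method
}

for name, value in {**central, **dispersion}.items():
    print(name, value)
```

Quartile conventions differ between tools, so small differences from spreadsheet output are expected for that item.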

Analysis Steps

Here are the steps for performing Descriptive Statistical Analysis and Normality Tests:

  1. Activate the worksheet (Sheet) to be analyzed.
  2. Place the cursor on the dataset (for creating a dataset, see the Data Preparation guide).
  3. If the active cell is not on the dataset, SmartstatXL will automatically attempt to determine the dataset.
  4. Activate the SmartstatXL Tab.
  5. Click on the Descriptive Statistics menu.
  6. SmartstatXL will display a dialog box asking you to confirm that the dataset is correct (usually, the dataset cell address is automatically selected correctly).
  7. If it is correct, click the Next Button.
  8. A Descriptive Statistics Dialog Box will then appear:

    Select the variables to be analyzed.
  9. To group the statistical values by category, check the box "Group by variable" and select the variable to be used as the Category/Group.
  10. Select the statistical values you wish to calculate. If you also want to display the Normality and/or Homogeneity Tests, either formal or graphical, check the relevant items in the Normality Tests and Homogeneity Tests sections.
  11. Press the "OK" button to proceed.

Analysis Results

Each section of this Descriptive Statistics analysis provides different yet essential information for the interpretation of your data. Statistical values offer measures of central tendency and data dispersion, Normality Tests give insights into data distribution, and Charts or Graphs provide a clear visualization of the data.

The following are the results of Descriptive Statistical Analysis:

Statistical Values

The following is the interpretation of the descriptive statistical analysis for two rock types, Andesitic and Basaltic, based on four variables: Sand, Silt, Clay, and BD (g/cm3).

  1. Sand:
    • Andesitic rocks have an average sand content of 25.441, with a relatively high variation (coefficient of variation 44.091%).
    • Basaltic rocks have a lower average sand content, which is 17.959, but its variation is much higher (coefficient of variation 94.171%).
  2. Silt:
    • Andesitic rocks have an average silt content of 42.621, with a lower variation (coefficient of variation 31.088%).
    • Basaltic rocks have a higher average silt content, which is 49.766, with slightly lower variation (coefficient of variation 28.417%).
  3. Clay:
    • Andesitic rocks have an average clay content of 30.503, with a relatively high variation (coefficient of variation 45.603%).
    • Basaltic rocks have a slightly higher average clay content, which is 32.278, with a similarly high variation (coefficient of variation 46.827%).
  4. BD (g/cm3):
    • Andesitic rocks have an average BD of 0.344 g/cm3, with relatively low variation (coefficient of variation 27.219%).
    • Basaltic rocks have a higher average BD, which is 0.556 g/cm3, with slightly higher variation (coefficient of variation 31.720%).

In general, Basaltic rocks tend to have higher Silt and BD content compared to Andesitic rocks, while Andesitic rocks have higher Sand content. The variability of Sand, Clay, and BD content is higher in Basaltic rocks than in Andesitic rocks.
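
The coefficient of variation quoted above is the standard deviation expressed as a percentage of the mean. The following is a minimal sketch of a grouped calculation, assuming the sample standard deviation (n - 1 denominator) and using hypothetical records, since the article's raw data is not shown. The last line reverses the formula to recover the standard deviation implied by the reported Andesitic Sand mean and CV.

```python
import statistics as st
from collections import defaultdict


def coefficient_of_variation(values):
    """Return the sample CV in percent: 100 * standard deviation / mean."""
    return 100.0 * st.stdev(values) / st.mean(values)


# Hypothetical (group, value) records, for illustration only.
records = [
    ("Andesitic", 20.0), ("Andesitic", 25.0), ("Andesitic", 30.0),
    ("Basaltic", 5.0), ("Basaltic", 15.0), ("Basaltic", 40.0),
]

# Collect the values of each group, as "Group by variable" does in the dialog.
groups = defaultdict(list)
for group, value in records:
    groups[group].append(value)

summary = {
    group: {"mean": st.mean(vals), "cv_percent": coefficient_of_variation(vals)}
    for group, vals in groups.items()
}
for group, stats in summary.items():
    print(f"{group}: mean={stats['mean']:.3f}, CV={stats['cv_percent']:.3f}%")

# Standard deviation implied by a reported mean and CV:
# the Andesitic Sand figures above are 25.441 and 44.091%.
implied_sd = 25.441 * 44.091 / 100.0
print(f"implied SD = {implied_sd:.3f}")
```

Because the CV is scaled by the mean, it allows variability to be compared across variables measured in different units, such as percentages for Sand and g/cm3 for BD.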

Normality Tests

Statistical values like skewness and kurtosis can serve as preliminary indicators of departures from a normal distribution.

Skewness: Skewness measures the asymmetry of the data distribution around its mean. Positive skewness indicates a longer tail on the right side, with most values concentrated at the lower end, while negative skewness indicates a longer tail on the left side, with most values concentrated at the higher end. A skewness close to 0 indicates a roughly symmetric distribution, which is consistent with a normal distribution but does not by itself establish it.
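
As a rough sketch of how such a value can be computed, the function below implements the moment-based skewness and, optionally, the adjusted Fisher-Pearson form used by common spreadsheet functions such as Excel's SKEW. Which formula SmartstatXL applies is not stated in this guide, so the choice here is an assumption, and the sample lists are hypothetical.

```python
import math


def sample_skewness(values, adjusted=True):
    """Skewness from central moments; optionally the adjusted Fisher-Pearson form."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    if not adjusted:
        return g1
    # Small-sample adjustment; requires n >= 3.
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# Hypothetical samples, for illustration only.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]

print(sample_skewness(symmetric))     # zero for a perfectly symmetric sample
print(sample_skewness(right_tailed))  # positive: long tail on the right
```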

Kurtosis: Kurtosis measures the "thickness" of the tails of the distribution. It is reported here as excess kurtosis, so a normal distribution has a value of 0. Positive kurtosis indicates that the data have thicker tails and a sharper peak than a normal distribution (leptokurtic), which means more extreme values. Negative kurtosis indicates that the data have thinner tails and a flatter peak than a normal distribution (platykurtic), which means fewer extreme values. A kurtosis close to 0 suggests tails similar to those of a normal distribution (mesokurtic).
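
A comparable sketch for excess kurtosis follows. It computes the moment-based value (fourth central moment divided by the squared variance, minus 3) and, optionally, the small-sample adjusted form used by spreadsheet functions such as Excel's KURT. As with skewness, the exact formula used by SmartstatXL is an assumption here, and the sample lists are hypothetical.

```python
def excess_kurtosis(values, adjusted=True):
    """Excess kurtosis (normal distribution = 0); optionally small-sample adjusted."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3.0
    if not adjusted:
        return g2
    # Small-sample adjustment; requires n >= 4.
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


# Hypothetical samples, for illustration only.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
heavy_tailed = [-10.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 10.0]

print(excess_kurtosis(flat))          # negative: platykurtic
print(excess_kurtosis(heavy_tailed))  # positive: leptokurtic
```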

However, it should be noted that skewness and kurtosis only provide a preliminary picture of data distribution. Formal normality tests like Shapiro-Wilk, Kolmogorov-Smirnov, or Anderson-Darling are still required to determine whether the data is normally distributed or not.
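
To give a sense of what a formal test measures, the sketch below computes the distance statistic behind the Kolmogorov-Smirnov and Lilliefors tests: the largest gap between the empirical cumulative distribution and a normal distribution fitted to the sample. It is a standard-library illustration only. It does not produce a p-value, which requires the appropriate reference distribution or tables, and it is not a description of SmartstatXL's internals. The two samples are constructed from theoretical quantiles purely for demonstration.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Illustrative samples built from theoretical quantiles (not real measurements).
probs = [(i - 0.5) / 20 for i in range(1, 21)]
normal_like = [NormalDist().inv_cdf(p) for p in probs]
right_skewed = [-math.log(1.0 - p) for p in probs]  # exponential quantiles

d_normal = ks_distance_to_normal(normal_like)
d_skewed = ks_distance_to_normal(right_skewed)
print(d_normal, d_skewed)  # the skewed sample sits farther from the fitted normal
```

A larger distance means stronger evidence against normality; the test then converts that distance into a p-value for the given sample size.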

The following are tables for some normality test results:

Here is the interpretation of normality test results for two rock types, Andesitic and Basaltic, based on four variables: Sand, Silt, Clay, and BD (g/cm3).

  1. Sand:
    • For Andesitic rocks, all normality tests (Shapiro-Wilk, D'Agostino-Pearson, Anderson-Darling, Kolmogorov-Smirnov, and Lilliefors) indicate that the Sand data are normally distributed (p-value > 0.05).
    • For Basaltic rocks, the Shapiro-Wilk and Anderson-Darling tests indicate that the Sand data are not normally distributed (p-value < 0.05).
  2. Silt:
    • For both Andesitic and Basaltic rocks, all normality tests indicate that Silt data are normally distributed (p-value > 0.05).
  3. Clay:
    • For both Andesitic and Basaltic rocks, all normality tests indicate that Clay data are normally distributed (p-value > 0.05).
  4. BD (g/cm3):
    • For Andesitic rocks, the Shapiro-Wilk, Anderson-Darling, and Lilliefors tests indicate that the BD data are not normally distributed (p-value < 0.05).
    • For Basaltic rocks, the Shapiro-Wilk and Anderson-Darling tests indicate that the BD data are not normally distributed (p-value < 0.05).

Generally, Silt and Clay data for both rock types are normally distributed. Conversely, Sand data in Basaltic rocks and BD data in both rock types are not normally distributed.
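
The decision rule applied in the interpretation above can be written down compactly: at a significance level of 0.05, a p-value above the threshold means normality is not rejected, and a p-value at or below it means normality is rejected. The sketch below uses hypothetical p-values, not the values from the article's tables, to show a case where the tests disagree.

```python
ALPHA = 0.05  # significance level used throughout this guide


def interpret_normality(p_values, alpha=ALPHA):
    """Map each test's p-value to a verdict at the given significance level."""
    return {
        test: "normality not rejected" if p > alpha else "normality rejected"
        for test, p in p_values.items()
    }


# Hypothetical p-values for one variable in one group (illustration only).
example = {
    "Shapiro-Wilk": 0.012,
    "Anderson-Darling": 0.031,
    "Kolmogorov-Smirnov": 0.200,
}

verdicts = interpret_normality(example)
rejecting = [test for test, verdict in verdicts.items() if verdict == "normality rejected"]
print(verdicts)
print("Tests rejecting normality:", rejecting)
```

When tests disagree, as in this hypothetical case, it is reasonable to check the charts as well before choosing a method.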

Charts (Graphs)

Here are general ways to interpret the Normal P-P Plot and Histogram:

  1. Normal P-P Plot: This plot is used to check if our data is normally distributed. In this plot, our data points are compared with a theoretical normal distribution. If our data is normally distributed, the points will closely follow the diagonal line.
  2. Histogram: This plot provides a general idea about the distribution of our data. We can identify skewness and kurtosis visually by observing the shape of the plot. A symmetric, bell-shaped histogram is consistent with normally distributed data, although symmetry alone does not guarantee normality.
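
The points of a Normal P-P Plot can be reproduced with a few lines of code. In this minimal sketch, each sorted observation is paired with its empirical cumulative probability and the cumulative probability predicted by a normal distribution fitted to the sample. The plotting position (i - 0.5)/n is one common convention and is an assumption here, as is the hypothetical sample; SmartstatXL may use a different convention.

```python
from statistics import NormalDist, mean, stdev


def normal_pp_points(values):
    """Return (empirical, theoretical) cumulative probability pairs."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [((i - 0.5) / n, fitted.cdf(x)) for i, x in enumerate(data, start=1)]


# Hypothetical sample, for illustration only.
sample = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.9, 5.6]
points = normal_pp_points(sample)

# Points on the diagonal have equal coordinates, so this gap measures the deviation.
max_gap = max(abs(emp - theo) for emp, theo in points)
for emp, theo in points:
    print(f"{emp:.4f}  {theo:.4f}")
print("largest deviation from the diagonal:", round(max_gap, 4))
```

Plotting the pairs against the line y = x gives the chart described in item 1; small gaps correspond to points that hug the diagonal.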

When reviewing the charts, Sand and BD data for both rock types show a considerable deviation from the diagonal line in the Normal P-P Plot, suggesting they are not normally distributed. On the other hand, the Silt and Clay data points closely follow the diagonal line, indicating they are normally distributed.

Conclusion

In summary, the Descriptive Statistical Analysis indicates that Basaltic rocks tend to have higher Silt and BD content compared to Andesitic rocks, while Andesitic rocks have higher Sand content. Normality tests suggest that Silt and Clay data for both rock types are normally distributed, while Sand data in Basaltic rocks and BD data in both rock types are not.

This analysis is crucial for further hypothesis testing or inferential statistical analysis. It helps to identify which variables are normally distributed and which are not, thereby aiding in the selection of appropriate statistical methods.