SmartstatXL is an Excel add-in that simplifies the analysis of experimental data, including factorial Analysis of Variance (ANOVA) based on a Completely Randomized Design (CRD). Although it mainly supports balanced designs, SmartstatXL is not limited to standard designs: it also supports data analysis with various mixed models.

Specific features for factorial experiments in SmartstatXL include:

  1. Factorial CRD: Refers to factorial experiments where each observation is measured only once.
  2. Factorial CRD: Sub-Sampling: This option is for observations that are repeated, with the ability to draw sub-samples from a single observational unit. For example, in a single observational unit (treatment 3Dok1, repetition 1), 10 plants are measured.
  3. Factorial CRD: Repeated Measure: Designed for observations that are measured periodically from a single observational unit, such as every 14 days.
  4. Factorial CRD: Multi-Location/Season/Year: This option is intended for experiments conducted in different locations, seasons, or years.

If treatments show a significant effect, SmartstatXL provides a variety of post hoc tests for comparing treatment means, including Tukey, Duncan, LSD, Bonferroni, Sidak, Scheffe, REGWQ, Scott-Knott, and Dunnett.

Case Example

An Electrical Engineer claims that the maximum output voltage of a car battery is influenced by the type of material and the temperature of the location where the battery is assembled. Four repetitions of the factorial experiment were conducted in the laboratory for three materials and three temperatures. The Battery Durability Experiment with a basic CRD design yielded results as shown in the following table:

Material   Replication   Temperature
                          15     70    125
A          1             130     34     20
           2              74     80     82
           3             155     40     70
           4             180     75     58
B          1             150    136     25
           2             159    106     70
           3             188    122     58
           4             126    115     45
C          1             138    174     96
           2             168    150     82
           3             110    120    104
           4             160    139     60

Cited from:
Gaspersz, Vincent. 1991. Experimental Design Methods: For Agricultural Sciences, Engineering, and Biology. Bandung: Armico. p. 211.

Steps for Analysis of Variance (ANOVA) and Post Hoc Tests:

  1. Ensure that the worksheet (Sheet) you wish to analyze is active.
  2. Place the cursor on the Dataset. (For information on creating a Dataset, please refer to the 'Data Preparation' guide).
  3. If the active cell is not on the dataset, SmartstatXL will automatically detect and select the appropriate dataset.
  4. Activate the SmartstatXL Tab
  5. Click on the Menu Factorial > Factorial CRD.
    Menu Factorial > Factorial CRD
  6. SmartstatXL will display a dialog box to confirm whether the Dataset is correct or not (usually the cell address for the Dataset is automatically selected correctly).
  7. After confirming the Dataset is correct, press the Next Button
  8. The following Anova – Factorial CRD Dialog Box will appear:
    Anova – Factorial CRD Dialog Box
  9. There are three stages in this dialog. In the first stage, select the Factors and at least one Response you wish to analyze.
  10. When you select Factors, SmartstatXL will provide additional information about the number of levels and their names. In a CRD experiment, Repetitions are not included as a factor.
  11. Details of the Anova STAGE 1 dialog box can be seen in the following image:
    Anova STAGE 1 Dialog Box
  12. After selecting the Factors and Response, press the Next Button to move to the Anova Stage-2 Dialog Box
  13. The dialog box for the second stage will appear.
    Anova STAGE 2 Dialog Box
  14. Adjust the settings based on your research method. In this example, the Post Hoc Test used is the REGWQ Test.
  15. To set up additional outputs and default values for subsequent outputs, press the "Advanced Options..." button.
  16. The following shows the Advanced Options Dialog Box:
  17. After completing the setup, close the "Advanced Options" dialog box.
  18. Next, in the Anova Stage 2 Dialog Box, click the Next button.
  19. In the Anova Stage 3 Dialog Box, you will be asked to specify the average table, ID for each Factor, and rounding of the average values. The details can be seen in the following image:
    Anova Stage 3 Dialog Box
  20. As a final step, click "OK"

Analysis Results

Analysis Information

From the information provided, we can understand the structure of the experiment conducted. This experiment utilizes a Factorial Completely Randomized Design (CRD). Factorial CRD is one of the experimental design methods that allow us to examine the influence of more than one factor simultaneously. In this context, there are two factors being examined: Material and Temperature.

  • Material: In this experiment, three different types of materials are being tested. The objective is to determine whether there is a significant difference in battery durability based on the type of material used.
  • Temperature: Additionally, three different temperature levels are being tested to find out their influence on battery durability. This means that the experiment aims to understand how the environmental temperature during battery assembly affects its durability.

The post hoc test employed is REGWQ (Ryan-Einot-Gabriel-Welsch Q), a step-down multiple comparison procedure based on the studentized range. It is used after the ANOVA has found a significant effect, to determine which group means differ significantly from one another.

The response measured in this experiment is "Battery Durability." This means that battery durability is the dependent variable we want to predict or explain based on independent variables (in this case, Material and Temperature).

In the context of Analysis of Variance, we will assess whether there is a significant difference in battery durability based on the type of material and assembly temperature. If significant differences are present, the post hoc test will indicate between which groups these differences lie.

Experiments like this are crucial, particularly in the manufacturing industry, where the correct selection of material and optimal assembly conditions can improve product quality and reduce production costs.

Analysis of Variance

Interpretation and Discussion:

In the Analysis of Variance table for the response "Battery Durability," we can see the following information:

  1. Variance Source (Material, Temperature, Material x Temperature Interaction, and Error): These are the factors or combinations of factors being analyzed to determine their influence on the variability in battery durability.
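  2. Degrees of Freedom (DF): The number of independent pieces of information available to estimate each source of variation. With three materials, three temperatures, and four replications, DF is 2 for Material, 2 for Temperature, 4 for the interaction, and 27 for Error.
  3. Sum of Squares (SS): A measure of the variability attributable to a specific variance source.
  4. Mean Square (MS): The Sum of Squares of a variance source divided by its Degrees of Freedom.
  5. F-Value: The test statistic, calculated as the Mean Square of a variance source divided by the Mean Square of the Error. It compares variability between groups with variability within groups.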
  6. P-Value: This is the probability of obtaining a result at least as extreme as the one observed if there is no actual effect.
  7. F-0.05 and F-0.01: These are the critical values from the F-distribution at the 5% and 1% significance levels, respectively.

From the analysis results, we can conclude:

  • Material (M): There is a significant difference in battery durability based on the material type with an F-value of 7.911, which is higher than the critical value at 1% (5.488). Therefore, we can state that material has a significant effect on battery durability at a 99% confidence level.
  • Temperature (T): Temperature also shows a significant influence on battery durability with an F-value of 28.968, which is far higher than the critical value at 1% (5.488). This indicates that temperature has a very significant effect on battery durability at a 99% confidence level.
  • Material x Temperature Interaction (M x T): There is a significant interaction between material and temperature at a 95% confidence level with an F-value of 3.560, which is higher than the critical value at 5% (2.728). This means that different combinations of material types and temperatures have varying effects on battery durability.
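  • Coefficient of Variation (CV): The CV is the square root of the Error Mean Square expressed as a percentage of the overall mean. A CV value of 24.62% means that the residual standard deviation is about a quarter of the average battery durability. It measures the relative size of the experimental error, not the share of variability explained by the model.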
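
The figures quoted above can be checked by hand. The following is a minimal sketch in plain Python, not SmartstatXL's own implementation: it applies the standard sum-of-squares formulas for a balanced two-factor CRD to the data from the case example. The names `DATA` and `factorial_crd_anova` are made up for this illustration.

```python
import math

# Battery data from the case example: DATA[material][temperature] -> 4 replications.
DATA = {
    "A": {15: [130, 74, 155, 180], 70: [34, 80, 40, 75], 125: [20, 82, 70, 58]},
    "B": {15: [150, 159, 188, 126], 70: [136, 106, 122, 115], 125: [25, 70, 58, 45]},
    "C": {15: [138, 168, 110, 160], 70: [174, 150, 120, 139], 125: [96, 82, 104, 60]},
}


def factorial_crd_anova(data):
    """Return (df, f, cv) for a balanced two-factor CRD (hypothetical helper)."""
    materials = list(data)
    temps = list(data[materials[0]])
    a, b = len(materials), len(temps)          # number of levels per factor
    r = len(data[materials[0]][temps[0]])      # replications per cell
    values = [y for m in materials for t in temps for y in data[m][t]]
    n = len(values)

    cf = sum(values) ** 2 / n                  # correction factor
    ss_total = sum(y * y for y in values) - cf
    ss_m = sum(sum(sum(data[m][t]) for t in temps) ** 2 for m in materials) / (b * r) - cf
    ss_t = sum(sum(sum(data[m][t]) for m in materials) ** 2 for t in temps) / (a * r) - cf
    ss_cells = sum(sum(data[m][t]) ** 2 for m in materials for t in temps) / r - cf
    ss_mt = ss_cells - ss_m - ss_t             # interaction
    ss_error = ss_total - ss_cells

    df = {
        "Material": a - 1,
        "Temperature": b - 1,
        "Material x Temperature": (a - 1) * (b - 1),
        "Error": a * b * (r - 1),
    }
    ms_error = ss_error / df["Error"]
    ss = {"Material": ss_m, "Temperature": ss_t, "Material x Temperature": ss_mt}
    f = {name: (ss[name] / df[name]) / ms_error for name in ss}

    # CV: residual standard deviation as a percentage of the grand mean.
    cv = 100 * math.sqrt(ms_error) / (sum(values) / n)
    return df, f, cv


df, f, cv = factorial_crd_anova(DATA)
for name, value in f.items():
    print(f"{name}: DF = {df[name]}, F = {value:.3f}")
print(f"Error DF = {df['Error']}, CV = {cv:.2f}%")
```

Run on the case data, this sketch reproduces the F-values 7.911, 28.968, and 3.560 and the CV of 24.62%. The critical values and P-values require the F-distribution and are not computed here.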

In practical terms, these results indicate that both material and temperature have significant influences on battery durability. However, it is also important to consider how these two factors work together (their interaction) to affect battery durability. This can assist electrical engineers in choosing the right material and optimal temperature conditions during battery assembly to achieve maximum durability.

Post hoc Test

Based on the Analysis of Variance, there is a significant interaction effect between Material and Temperature on battery lifespan. An interaction means that the influence of one factor on the response depends on the level of the other factor: here, the effect of the type of material on battery lifespan varies depending on the temperature at which the battery is assembled. Therefore, even though the Single Effects (main effects) are also significant, the discussion should focus on the interaction effect.

Single Effect

Post hoc Test (table and graph)

However, it's important to remember that even though there is a significant interaction, this does not mean we should disregard the main effects of each factor. For example, if a material consistently shows better battery lifespan at all temperatures, then that material may still be the best choice despite the interaction with temperature.

Interaction Effect: Material x Temperature

The interaction between Material and Temperature suggests that specific combinations of material type and temperature can result in different battery lifespans. For example, one material may perform best at low temperatures but not at high temperatures, while another material may perform the opposite.

To fully understand and interpret this interaction, we need to look at the visualization of this interaction, such as interaction graphs. These graphs will display the average battery lifespan for each combination of material and temperature. From these graphs, we can see:

  1. Whether there is a specific pattern in the interaction between material and temperature.
  2. Which combination of material and temperature yields the maximum battery lifespan.
  3. How changes in temperature affect the battery lifespan for each type of material.

Additionally, knowing this interaction allows manufacturers to make more informed decisions about which material to use and at what temperature the battery should be assembled for optimal performance. This is crucial for improving product quality and maximizing battery lifespan.

There are two formats for presenting the average effect tables for the interaction. You can choose either one or both. 

First Format (Material x Temperature Interaction Effect):

  • The table is presented in a one-way table format, where treatment levels are combined and laid out like the Single Effect table. 
  • This table presents the combined effects of Material and Temperature, illustrating how specific combinations of Material and Temperature affect Battery Lifespan. This is a direct representation of the interaction effect.

Post hoc Test (table and graph)

Second Format (Simple Effects of Material x Temperature Interaction):

  • Simple effects are tested and presented in a two-way table format.
  • This table delineates the simple effects, showing how the effect of one factor (e.g., Material) varies at certain levels of the other factor (e.g., Temperature). This helps in understanding how the interaction operates more deeply.

Post hoc Test (table and graph)

The choice of average effect table and graph presentation can be adjusted through Advanced Options (refer back to step 15 of the Analysis of Variance Steps).

To test and interpret the interaction effects, the First Format is a good starting point as it provides a direct overview of how specific combinations of Material and Temperature affect Battery Lifespan. Once you understand these combined effects, you can delve into the Second Format for more detailed insights into how the effect of one factor varies at certain levels of the other factor.

When there is a significant interaction in the analysis of variance, it is often more informative to test for simple effects as it provides deeper insights into how one factor affects the response at specific levels of another factor.

Let's delve deeper based on the following questions:

  1. Which material and temperature combination yields the maximum battery lifespan?
    • From the First Table, we can see that combination B at 15°C (B_15) provides the highest average battery lifespan with a value of 155.75. Therefore, material B at 15°C is the optimal combination for maximum battery lifespan.
  2. How does temperature variation affect the battery lifespan for each type of material?
    • From the Second Table, we can observe how battery lifespan changes with temperature variations for each material type:
      • Material A: Highest lifespan at 15°C, decreasing at temperatures of 70°C and 125°C.
      • Material B: Highest lifespan at 15°C, slightly declining at 70°C, and significantly dropping at 125°C.
      • Material C: Relatively stable lifespan at 15°C and 70°C, but decreases at 125°C.
    • This indicates that temperature affects battery lifespan, but the effect varies depending on the material type.
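
The averages behind these two answers can be recomputed directly from the raw data. The sketch below is only an illustration in plain Python (the names `DATA`, `cell_means`, and `best` are invented here). It builds the Material x Temperature table of means and picks the combination with the highest average. It does not perform the REGWQ comparisons, so it says nothing about which differences are significant.

```python
# Battery data from the case example: DATA[material][temperature] -> 4 replications.
DATA = {
    "A": {15: [130, 74, 155, 180], 70: [34, 80, 40, 75], 125: [20, 82, 70, 58]},
    "B": {15: [150, 159, 188, 126], 70: [136, 106, 122, 115], 125: [25, 70, 58, 45]},
    "C": {15: [138, 168, 110, 160], 70: [174, 150, 120, 139], 125: [96, 82, 104, 60]},
}

# Average response for every Material x Temperature combination.
cell_means = {
    (material, temp): sum(ys) / len(ys)
    for material, by_temp in DATA.items()
    for temp, ys in by_temp.items()
}

# Combination with the highest average lifespan.
best = max(cell_means, key=cell_means.get)

# Two-way layout: one row per material, one column per temperature.
temps = (15, 70, 125)
print("Material " + "".join(f"{t:>9}" for t in temps))
for material in DATA:
    print(f"{material:<9}" + "".join(f"{cell_means[(material, t)]:>9.2f}" for t in temps))
print("Highest average:", best, cell_means[best])
```

Reading the printed table row by row shows how each material responds to temperature, and reading it column by column compares the materials at a fixed temperature.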

Decision for Manufacturers:

From the Second Table, manufacturers can make more informed decisions regarding which material to use and at what temperature the batteries should be assembled for optimal performance. For example, if manufacturers opt to use material A, they should consider assembling batteries at 15°C for optimal battery lifespan. Conversely, if they choose material C, they could possibly assemble batteries at 15°C or 70°C for comparable performance.

Thus, the Second Table provides deeper insights into how the interaction between material and temperature impacts battery lifespan, which is invaluable for manufacturers in making the right production decisions.

Recommendations

In the context of interaction analysis and the goal of providing clear recommendations to manufacturers, the Second Table (Simple Effects of Material x Temperature Interaction) is recommended. The reasons are:

  1. Depth of Analysis: The Second Table provides more nuanced insights into how one factor (e.g., Material) affects the response (Battery Lifespan) at each level of another factor (Temperature). This allows manufacturers to better understand how each material performs at specific temperatures.
  2. Ease of Interpretation: The Second Table presents simple effects of the interaction, making it easier to interpret. By comparing the battery lifespan across different materials at the same temperature, as well as comparing battery lifespan at different temperatures for the same material, manufacturers can quickly identify optimal conditions for maximum battery lifespan.
  3. Production Decisions: From a decision-making perspective, the Second Table offers more direct and actionable information. For instance, if manufacturers want to know the optimal temperature for assembling batteries with a specific material, they can easily find it in the Second Table.

However, this does not mean that the First Table is irrelevant. The First Table provides a general overview of the combined effects of Material and Temperature. But, for the purpose of providing clear and specific recommendations, the Second Table is more appropriate.

ANOVA Assumption Checks

Formal Approach (Statistical Tests)

ANOVA assumption checks (normality test, homoscedasticity test, residual plots)

Interpretation and Discussion:

Analysis of Variance (ANOVA) relies on several assumptions, including homogeneity of variances and normality of the residuals. These should be checked before the results are interpreted, because violations can affect the validity of the ANOVA results.

Levene's Test for Homogeneity of Variances:

  • Levene's Test is used to evaluate whether the response variances are equal across all the groups being compared.
  • Result: The F-statistic from Levene's Test is 0.902 with a p-value of 0.529. Because the p-value is greater than 0.05, we fail to reject the null hypothesis stating that the variances are equal across all groups. Therefore, the data meet the homogeneity of variances assumption.
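
As a rough illustration of what Levene's Test computes, the sketch below runs a one-way ANOVA on the absolute deviations of each observation from its group mean, treating the nine Material x Temperature combinations as groups. The source does not state which centering SmartstatXL uses. Mean-centering is assumed here, and the function name `levene_mean_centered` is invented for this example.

```python
# Battery data from the case example: DATA[material][temperature] -> 4 replications.
DATA = {
    "A": {15: [130, 74, 155, 180], 70: [34, 80, 40, 75], 125: [20, 82, 70, 58]},
    "B": {15: [150, 159, 188, 126], 70: [136, 106, 122, 115], 125: [25, 70, 58, 45]},
    "C": {15: [138, 168, 110, 160], 70: [174, 150, 120, 139], 125: [96, 82, 104, 60]},
}


def levene_mean_centered(groups):
    """Return (F, df1, df2) for Levene's test with mean-centering (assumed variant)."""
    # Step 1: absolute deviation of each observation from its own group mean.
    devs = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    k = len(devs)
    n = sum(len(d) for d in devs)
    grand_mean = sum(sum(d) for d in devs) / n

    # Step 2: one-way ANOVA on the absolute deviations.
    ss_between = sum(len(d) * (sum(d) / len(d) - grand_mean) ** 2 for d in devs)
    ss_within = sum((z - sum(d) / len(d)) ** 2 for d in devs for z in d)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


# One group per Material x Temperature combination (9 groups of 4 observations).
groups = [ys for by_temp in DATA.values() for ys in by_temp.values()]
levene_stat, df1, df2 = levene_mean_centered(groups)
print(f"Levene F = {levene_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

With mean-centering, this sketch gives the same F-statistic of 0.902 as reported above, on 8 and 27 degrees of freedom. The p-value requires the F-distribution and is not computed here.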

Normality Test:

  • The purpose of the normality test is to assess whether the residuals (the differences between observed values and the values estimated by the model) follow a normal distribution.
  • All of the tests above indicate a p-value greater than 0.05, which means that we fail to reject the null hypothesis that the residuals are normally distributed. Therefore, the data meet the assumption of normality.

Conclusion:

The data meet both crucial assumptions of ANOVA: homogeneity of variance and normality. This means that the results of the prior analysis of variance are valid and reliable. If either or both of these assumptions are violated, different analytical methods or data transformations may be required to meet these assumptions. However, in this case, no additional actions are needed.

Visual Approach (Graphical Plot)

Assumption checks for ANOVA (normality test, test for homoscedasticity, residual plots)

In addition to formal tests, the assumptions can be inspected visually using the residual plots provided. Each graph is discussed below:

Normal P-Plot of Residual Data:

Interpretation: In a normal probability plot, points should follow a diagonal line if the data are normally distributed. From the graph, the points tend to follow the diagonal line with minor deviations at both ends, but overall, the points are closely aligned with the line. This suggests that the residuals approximate a normal distribution.

Residual Data Histogram:

Interpretation: Histograms are used to visualize the frequency distribution of data. From the residual histogram, we can see that the distribution shape tends to be symmetric and resembles a bell curve, indicating a normal distribution.

Plot of Residual vs. Predicted:

Interpretation: In a residual vs. predicted plot, we aim to see a random scatter of points without any specific shape or pattern. If a pattern exists, it may indicate a problem with the model. From the graph, the points appear to be randomly scattered without any specific pattern, indicating that the model fits the data well and that there are no issues of heteroscedasticity.

Standard Deviation vs. Mean:

Interpretation: This graph is used to evaluate the homogeneity of variances. If points are randomly scattered around a horizontal line without any specific pattern, it indicates that the variance is constant across the levels of the independent variable. From the graph, points appear to be randomly scattered without any specific pattern, indicating that the data meet the assumption of homogeneity of variances.

Conclusion:

Based on the visual inspection of the four provided graphs, it seems that the data already meet the required assumptions for analysis of variance. This confirms the findings from the prior statistical assumption checks and adds further confidence to the validity of the analysis results.

Box-Cox and Residual Analysis

Information on Box-Cox transformation, residual table, and outlier data information

Notes:

  • If the analysis of variance assumptions are not met, re-examine the outlier data or replace it using missing data formulas.
  • You may also try transforming the data using the lambda value from the Box-Cox Transformation calculations.
  • The transformed data can be found in the last column (Box Cox Trans).

Box-Cox Transformation

The Box-Cox method is used to identify the most suitable power transformation to make the data more closely meet the assumptions of normality and homoscedasticity. In this case, the estimated lambda value is 0.894. A lambda of 1 corresponds to leaving the data untransformed, and 0.894 is close to that value, so "No Transformation" is chosen and the original (untransformed) data are used in the analysis. This is consistent with the assumption checks above, which the original data already satisfy.
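
For reference, the textbook one-parameter Box-Cox transformation can be sketched as follows. This is an illustration only: the source does not say whether SmartstatXL uses this exact form or a rescaled variant, and the function name `box_cox` is invented here.

```python
import math


def box_cox(y, lam):
    """Textbook Box-Cox transform of a single positive value (assumed form)."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        # The limit of (y**lam - 1) / lam as lam approaches 0 is log(y).
        return math.log(y)
    return (y ** lam - 1) / lam


# Material A at temperature 15, transformed with the lambda reported above.
original = [130, 74, 155, 180]
transformed = [box_cox(y, 0.894) for y in original]
for y, z in zip(original, transformed):
    print(f"{y:>5} -> {z:.2f}")
```

With lambda equal to 1 the transform reduces to y - 1, a simple shift that leaves the shape of the distribution unchanged. That is why a lambda near 1 suggests no transformation is needed.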

Residual Values and Outlier Data Examination:

These columns provide information on how well the model fits the data at each observation point and whether there are any observations that may behave unusually or as outliers.

  • Predicted: These are the values estimated by the model for each observation.
  • Residual: This is the difference between the actual observed value and the value estimated by the model.
  • Leverage: Measures how far the factor settings of a particular observation deviate from the average. Observations with high leverage can have a significant impact on model fit.
  • Studentized Residual: The residual divided by its estimated standard error; it helps identify outliers.
  • Studentized Deleted Residual: Similar to the studentized residual, but the error variance is estimated with the observation in question left out, which makes it more sensitive to outliers.
  • Cook's Distance: Measures the influence of each observation on all the fitted values. Large values indicate influential observations.
  • DFITS: Measures how much the fitted value for an observation changes when that observation is left out. Like Cook's Distance, it is used to identify influential observations.
  • Diagnostic: Provides additional information about each observation, such as whether it is considered an outlier.
  • Box Cox Data: This is the data that has been transformed using the Box-Cox transformation.

From the residual data, the second observation (with Material A and Temperature 15) is identified as an "Outlier". This means that this observation behaves differently than expected based on the model and may need to be handled carefully or further analyzed to understand the cause of its difference.
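
The diagnostic columns can be approximated with the usual linear-model formulas. The sketch below is not SmartstatXL's code: it assumes the full factorial model (nine cell means, so nine parameters), for which the predicted value is the cell mean and the leverage of every observation is 1 divided by the number of replications. The thresholds SmartstatXL uses to label an "Outlier" are not given in the source, so none are applied here.

```python
import math

# Battery data from the case example: DATA[material][temperature] -> 4 replications.
DATA = {
    "A": {15: [130, 74, 155, 180], 70: [34, 80, 40, 75], 125: [20, 82, 70, 58]},
    "B": {15: [150, 159, 188, 126], 70: [136, 106, 122, 115], 125: [25, 70, 58, 45]},
    "C": {15: [138, 168, 110, 160], 70: [174, 150, 120, 139], 125: [96, 82, 104, 60]},
}
P = 9  # parameters in the full factorial model (one mean per cell)

rows = []
for material, by_temp in DATA.items():
    for temp, ys in by_temp.items():
        mean = sum(ys) / len(ys)
        for y in ys:
            rows.append({
                "material": material, "temp": temp, "y": y,
                "pred": mean,            # predicted value = cell mean
                "resid": y - mean,       # observed minus predicted
                "lev": 1 / len(ys),      # leverage in a balanced design
            })

n = len(rows)
sse = sum(r["resid"] ** 2 for r in rows)
mse = sse / (n - P)

for r in rows:
    e, h = r["resid"], r["lev"]
    # Internally studentized residual.
    r["student"] = e / math.sqrt(mse * (1 - h))
    # Studentized deleted residual (error variance estimated without this point).
    r["deleted"] = e * math.sqrt((n - P - 1) / (sse * (1 - h) - e * e))
    # Cook's distance.
    r["cook"] = r["student"] ** 2 / P * h / (1 - h)

worst = max(rows, key=lambda r: abs(r["deleted"]))
print(worst)
```

Under these assumptions, the observation with the largest studentized deleted residual is the value 74 for Material A at temperature 15, the same observation flagged above. Its residual is -60.75 against a predicted value of 134.75.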

Conclusion:

The analysis results indicate that the overall model fits the data well, but at least one observation is flagged as an outlier. It is important to consider whether such observations should be excluded from the analysis or whether there are substantive reasons for their differences.