In the realm of research, experiments are often conducted across various locations or repeated in different seasons and years to ensure the validity and reliability of the results. In this context, SmartstatXL offers advanced solutions for researchers. As an Excel Add-In, SmartstatXL simplifies data analysis for single-factor experiments, including Completely Randomized Design (CRD), Randomized Block Design (RBD), and Latin Square Design (LSD), conducted in diverse geographic and temporal settings. Although currently prioritizing balanced design, this tool still offers flexibility and accuracy in analysis.
If the results indicate a significant treatment effect, the next step is to compare the treatment means. For this purpose, SmartstatXL provides various Post hoc Tests, among them: Tukey, Duncan, LSD, Bonferroni, Sidak, Scheffe, REGWQ, Scott-Knott, and Dunnett. Through these features, researchers can gain deep insights into their data and make decisions based on robust empirical evidence.
Case Example
A rice trial with five nitrogen treatments and three replications was conducted over two growing seasons using a Randomized Block Design. The rice production data (ton/ha) are presented in the following table:
Nitrogen (kg/ha) | Replication | Rainy season | Dry season |
0 | 1 | 4.999 | 4.891 |
0 | 2 | 3.503 | 2.577 |
0 | 3 | 5.356 | 4.541 |
60 | 1 | 6.351 | 6.009 |
60 | 2 | 6.316 | 6.625 |
60 | 3 | 6.582 | 5.672 |
90 | 1 | 6.071 | 6.712 |
90 | 2 | 5.969 | 6.693 |
90 | 3 | 5.893 | 6.799 |
120 | 1 | 4.818 | 6.458 |
120 | 2 | 4.024 | 6.675 |
120 | 3 | 5.813 | 6.636 |
150 | 1 | 3.436 | 5.683 |
150 | 2 | 4.047 | 6.868 |
150 | 3 | 3.740 | 5.692 |
Cited from:
Gomez, Kwanchai A. and Gomez, Arturo A. 1995. Statistical Procedures for Agricultural Research. Translated by Endang Sjamsuddin and Justika S. Baharsjah. Second Edition. Jakarta: UI-Press, 1995. ISBN: 979-456-139-8. p. 326.
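For readers who want to cross-check the numbers outside Excel, the table can be held as one record per plot. The sketch below is only an illustration: the names `YIELD` and `to_long` are hypothetical, and the layout is not claimed to match the Dataset format SmartstatXL expects.

```python
# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}


def to_long(table):
    """Flatten the table into (season, nitrogen, replication, yield) records."""
    return [
        (season, nitrogen, rep, value)
        for (season, nitrogen), values in table.items()
        for rep, value in enumerate(values, start=1)
    ]


rows = to_long(YIELD)
print(len(rows), "observations")
for record in rows[:3]:
    print(record)
```

Each record carries the three identifiers the analysis needs (season, treatment, replication) next to the response.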
Steps for Analysis of Variance (Anova) and Post Hoc Test:
- Ensure the worksheet (Sheet) you wish to analyze is active.
- Place the cursor on the Dataset. (For information on creating a Dataset, please refer to the 'Data Preparation' guide).
- If the active cell is not on the dataset, SmartstatXL will automatically detect and select the appropriate dataset.
- Activate the SmartstatXL Tab.
- Click on the Menu One Factor > CRD/RBD > Multi Location/Year/Season (In this case study, SmartstatXL uses RBD).
- SmartstatXL will display a dialog box to confirm whether the Dataset is correct or not (usually, the cell address for the Dataset is automatically selected correctly).
- After confirming that the Dataset is correct, click on the Next Button.
- A dialog box titled Anova – Single-Factor CRD (repeated across multiple seasons) will appear next:
The One-Factor Experimental Model is analyzed by adding additional Season/Location/Year factors and is analyzed jointly (sometimes referred to as Split Plot in Time).
In the Split Plot in Time model, for both CRD and RBD, the Replication Factor must be included in the model!
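One common way to write the jointly analyzed model for the RBD case is shown below. The symbols are chosen here for illustration and are not taken from the SmartstatXL dialog; the degrees of freedom assume the balanced case study (2 seasons, 5 treatments, 3 replications).

```latex
Y_{ijk} = \mu + S_i + R_{k(i)} + N_j + (SN)_{ij} + \varepsilon_{ijk},
\qquad i = 1,2;\; j = 1,\dots,5;\; k = 1,2,3

% S_i        : season effect                      (1 df)
% R_{k(i)}   : replication within season, error (a) (4 df)
% N_j        : nitrogen effect                    (4 df)
% (SN)_{ij}  : season x nitrogen interaction      (4 df)
% \varepsilon_{ijk} : pooled error, error (b)     (16 df)
```

In this form the season effect is tested against the replication-within-season term, which is why the replication factor has to be part of the model.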
Compare this with a Single-Factor CRD Experimental Model analyzed partially for each season, as shown in the following picture:
In the Single-Factor CRD Experimental Model above, the only variable that enters the Model (Factor) is Treatment alone.
- There are 3 Phases. In the first phase, select the Factor and at least one Response to be analyzed (As shown in the picture above)!
- When you select a Factor, SmartstatXL will provide additional information about the number of levels and their names.
- The details of the Anova Step 1 dialog box can be seen in the following picture:
- After selecting the Factor and the Response, click on the Next Button to proceed to Anova Dialog Box Step 2.
- The dialog box for the second phase will appear.
- Adjust the settings according to your research method. In this example, the Post Hoc Test used is Duncan Test.
- To adjust additional outputs and default values for subsequent outputs, click on the "Advanced Options…" button.
- Here is the appearance of the Advanced Options dialog box:
- Once the settings are completed, close the "Advanced Options" dialog box.
- Next, in the Anova Step 2 dialog box, click on the Next button.
- In the Anova Step 3 dialog box, you will be asked to specify the average table, ID for each Factor, and rounding of the average values. The details can be seen in the following picture:
- As the final step, click "OK".
Analysis Results
Analysis Information
Based on the provided information, the analysis of variance (ANOVA) was conducted using a Randomized Block Design (RBD) with a single repeated factor across multiple seasons. Three factors were considered in this analysis:
- Replications: There are 3 replications in this experiment.
- Seasons: Two different seasons (Wet and Dry) are used as experimental conditions.
- Nitrogen: There are 5 levels of nitrogen used as treatments.
The Post hoc Test employed is the Duncan Test, commonly used for comparing group means.
A note is made on "Assumption Violations". Assumptions commonly applied in ANOVA include data normality, variance homogeneity, and observation independence. If these assumptions are violated, the results from the ANOVA may become less valid.
ANOVA Assumption Checks
Formal Approach (Statistical Tests)
Levene's Test for Homogeneity of Variance
The Levene's Test result indicates that there is a violation of the assumption of homogeneity of variance between groups. In other words, the variances among the groups are not equal. The significant p-value (0.003 < 0.05) suggests caution in interpreting ANOVA results, as this assumption is not met. There are several approaches to address this, such as data transformation or employing more robust analysis techniques against this assumption violation.
Normality Tests
Unlike the Levene's Test, all normality tests show that the residual data is normally distributed (all p-values > 0.05). This means the normality assumption is met, and we can proceed with ANOVA without worrying about this assumption violation.
Preliminary Conclusion
- The homogeneity of variance is not met, requiring more careful interpretation of ANOVA results.
- Normality assumption is met, allowing us to proceed with the analysis.
Visual Approach (Plot Graphs)
Here is the interpretation and discussion of the assumption checks based on the graphs:
- Normal P-Plot of Residual Data
- This graph is used to assess whether the residual data is normally distributed. In this plot, data points should follow the diagonal line if the data is normally distributed. The plot indicates that data points generally follow the diagonal line, although with some deviations. This confirms that the normality assumption is largely met, consistent with the statistical normality tests conducted.
- Residual Data Histogram
- This histogram is also used to assess data normality. The graph shows that the residual data distribution appears symmetrical and follows a pattern similar to a normal distribution. This corroborates the results of the statistical tests and the Normal P-Plot, indicating that the residual data is generally normal.
- Residual vs. Predicted Plot
- This plot is used to check the assumptions of independence and homoscedasticity (constant variance across predictor levels). Ideally, points should be randomly dispersed without any specific pattern. The presented graph shows some patterns, albeit not very clear. This may indicate a violation of assumptions, consistent with the Levene's Test result.
- Standard Deviation vs. Mean
- This graph is used to assess the stability of variance across groups. In an ideal plot, points should be scattered around a constant mean value. The graph shows some variation between groups, confirming the Levene's Test results about the instability of variance between groups.
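The idea behind the Standard Deviation vs. Mean plot can be sketched by computing the mean and standard deviation of each season-by-nitrogen cell. This is a minimal illustration that assumes the groups are the ten treatment cells; the grouping SmartstatXL actually plots may differ.

```python
from statistics import mean, stdev

# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}

# (mean, sample standard deviation) for every cell
summary = {cell: (mean(values), stdev(values)) for cell, values in YIELD.items()}

# List cells from lowest to highest mean, as on the x-axis of the plot
for cell, (m, sd) in sorted(summary.items(), key=lambda item: item[1][0]):
    print(f"{cell[0]:5s} N={cell[1]:3d}  mean={m:.3f}  sd={sd:.3f}")

most_variable = max(summary, key=lambda cell: summary[cell][1])
least_variable = min(summary, key=lambda cell: summary[cell][1])
print("largest sd:", most_variable, "smallest sd:", least_variable)
```

A wide spread between the largest and smallest cell standard deviations is the numerical counterpart of the unstable variance seen in the plot.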
Conclusion
Overall, visual assumption checks indicate that normality assumptions are met but there are signs of violation of homogeneity of variance and possibly also independence. This will affect how we interpret ANOVA results and may require further actions, such as data transformation or using more robust analysis methods.
Analysis of Variance (ANOVA)
Analysis of variance for Rice Production Output Variable (ton/ha)
Season (M)
From these results, we can see that the season effect on rice production output is significant at the 5% level (P-Value = 0.020 < 0.05; the F statistic exceeds the 5% critical F value). This implies that mean rice production differs between the two seasons.
Nitrogen (N)
The results indicate that the effect of nitrogen on rice production is highly significant at the 1% level (P-Value = 0.000 < 0.01). This means that different levels of nitrogen have a highly significant effect on rice production.
Season x Nitrogen Interaction (M x N)
The interaction between season and nitrogen is also highly significant at the 1% level (P-Value = 0.006 < 0.01). This shows that the effect of nitrogen on rice production varies between the wet and dry seasons.
Conclusion
- Season has a significant effect on rice production output.
- Nitrogen levels have a highly significant effect on rice production output.
- There is a significant interaction between season and nitrogen levels in influencing rice production output.
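The F ratios behind these conclusions can be recomputed by hand. The sketch below assumes the usual combined analysis for a balanced RBD repeated over seasons: season is tested against replications within season (error a), while nitrogen and the interaction are tested against the pooled error (error b). The function name `combined_anova` is hypothetical, and P-values are not computed.

```python
# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}
SEASONS = ("Rainy", "Dry")
LEVELS = (0, 60, 90, 120, 150)
REPS = 3


def combined_anova(data):
    """F ratios for season, nitrogen and interaction (assumed combined RBD model)."""
    a, b, r = len(SEASONS), len(LEVELS), REPS
    obs = [y for values in data.values() for y in values]
    cf = sum(obs) ** 2 / len(obs)  # correction factor

    season_tot = [sum(sum(data[s, n]) for n in LEVELS) for s in SEASONS]
    level_tot = [sum(sum(data[s, n]) for s in SEASONS) for n in LEVELS]
    rep_tot = [sum(data[s, n][k] for n in LEVELS) for s in SEASONS for k in range(r)]
    cell_tot = [sum(values) for values in data.values()]

    ss_total = sum(y * y for y in obs) - cf
    ss_season = sum(t * t for t in season_tot) / (b * r) - cf
    ss_rep = sum(t * t for t in rep_tot) / b - cf - ss_season  # replications within season
    ss_nitrogen = sum(t * t for t in level_tot) / (a * r) - cf
    ss_cells = sum(t * t for t in cell_tot) / r - cf
    ss_inter = ss_cells - ss_season - ss_nitrogen
    ss_error_b = ss_total - ss_cells - ss_rep

    ms_error_a = ss_rep / (a * (r - 1))                # 4 df
    ms_error_b = ss_error_b / (a * (b - 1) * (r - 1))  # 16 df
    return {
        "season": (ss_season / (a - 1)) / ms_error_a,
        "nitrogen": (ss_nitrogen / (b - 1)) / ms_error_b,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error_b,
    }


f_ratios = combined_anova(YIELD)
for source, f_value in f_ratios.items():
    print(f"{source:12s} F = {f_value:.3f}")
```

Under these assumptions the three ratios agree with the F-Values SmartstatXL reports for the original data (14.253, 10.617 and 5.467).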
Coefficient of Variation (CV)
- CV(a) = 10.18%
- CV(b) = 12.05%
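The coefficient of variation expresses the residual standard deviation as a percentage of the overall mean, so it describes the relative variation of the data. CV(a) relates to the error term used to test the season effect, and CV(b) to the error term used to test nitrogen and the interaction. Both values indicate moderate variation, which supports the reliability of the analysis results.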
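As a worked check, the two values can be reproduced from the error mean squares. The mean squares and grand mean below were recomputed from the data table under the usual combined-analysis model (error a = replications within season, error b = pooled error); they are not copied from the SmartstatXL output.

```latex
\mathrm{CV} = \frac{\sqrt{MS_{\text{error}}}}{\bar{Y}} \times 100\%

\mathrm{CV(a)} \approx \frac{\sqrt{0.3154}}{5.515} \times 100\% \approx 10.18\%
\qquad
\mathrm{CV(b)} \approx \frac{\sqrt{0.4415}}{5.515} \times 100\% \approx 12.05\%
```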
Thus, further post hoc tests may be required to identify which nitrogen treatment provides the best results for each season.
Post hoc Test
Based on the analysis of variance that shows significance in all three components (Season, Nitrogen, and Season x Nitrogen interaction), ideally, all three aspects would be discussed to provide a comprehensive overview. However, the sequence can be adjusted depending on the research objective or questions to be answered.
- Independent Effect of Season: If the research focus is to understand how seasons affect rice production, then this can be discussed first.
- Independent Effect of Nitrogen: If the goal is to identify the most effective nitrogen dose, then a post hoc test for nitrogen may be prioritized.
- Interaction Effect of Season and Nitrogen: If the research aim is to understand how seasons and nitrogen interact to influence production, then this can be discussed last or even become the main focus.
It should be noted that the interpretation of independent effects can differ or even contradict the interpretation of interaction effects. This is one reason why it is important to examine interaction effects in experimental designs involving more than one factor.
Interpretation and Discussion Method
- Explain Both: Initially, it is crucial to explain both findings in your interpretation and discussion. Explaining both independent and interaction effects concurrently provides a more comprehensive understanding of the phenomenon being studied.
- Clarification: If there are discrepancies between independent and interaction effects, this needs clarification. For example, you could state that although rice production is generally higher in the dry season, this finding only holds if 120 kg/ha or more of nitrogen fertilizer is applied.
- Contextualize: Context is key. Explaining under what conditions the independent effects hold and under what conditions interaction effects become more dominant can significantly help the reader in understanding the research outcomes.
- Specify Conditions: In this case, the effect of the season on rice production is not universal but depends on the amount of nitrogen fertilizer applied. This is critical information that should be emphasized.
- Recommendations: Based on the interpretation, you can also offer recommendations. For instance, if the aim is to increase rice production in the dry season, applying more than 120 kg/ha of nitrogen fertilizer may be recommended based on the research findings.
- Importance of Further Study: Lastly, any ambiguity or contradiction in the data presents an opportunity for further research. Therefore, highlighting this in the 'Recommendations for Future Research' could be a valuable addition.
By integrating all this data, a comprehensive and in-depth perspective will be gained on the effects of season and nitrogen, as well as the interaction between the two, on rice production.
Single Effects
1. Single Effect of Season (M)
Based on the Duncan post hoc test, rice production in the dry season (5.90 ton/ha) is significantly higher than in the wet season (5.13 ton/ha) at the 5% significance level. This indicates that, irrespective of other factors, the dry season is generally more favorable for rice production compared to the wet season.
2. Single Effect of Nitrogen (N)
Based on the post hoc test results:
- Nitrogen application at 0 and 150 kg/ha resulted in the lowest rice yields (4.31 and 4.91 ton/ha).
- Nitrogen application at 60, 90, and 120 kg/ha yielded higher results, with 90 kg/ha providing the highest yield (6.36 ton/ha).
This indicates that nitrogen application does have an effect on rice production yield, but increasing the nitrogen dose doesn't always lead to higher yields. Instead, there is an optimal level of nitrogen that yields the best results.
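The season and nitrogen means quoted above are plain marginal averages of the data table, which can be verified with a short sketch. Variable names are illustrative; the Duncan groupings themselves are not reproduced here.

```python
from statistics import mean

# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}

# Average over nitrogen levels and replications within each season
season_mean = {
    season: mean([y for (s, _), values in YIELD.items() if s == season for y in values])
    for season in ("Rainy", "Dry")
}

# Average over seasons and replications within each nitrogen level
nitrogen_mean = {
    level: mean([y for (_, n), values in YIELD.items() if n == level for y in values])
    for level in (0, 60, 90, 120, 150)
}

for season, value in season_mean.items():
    print(f"{season:5s} {value:.2f} ton/ha")
for level, value in nitrogen_mean.items():
    print(f"N={level:3d} {value:.2f} ton/ha")
```

The nitrogen means rise to a peak at 90 kg/ha and then fall, which is the pattern described in the text.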
Interaction Effects
First Format
Overall Interaction
The results indicate that the combination of season and nitrogen dose influences rice production. For instance, during the dry season with no nitrogen application (0 kg/ha), the yield was among the lowest (4.00 ton/ha). However, in the dry season with a nitrogen application of 90 kg/ha, the yield was the highest (6.73 ton/ha).
Second Format
Simple Interaction Effects
From the table, we can compare the two seasons at the same nitrogen levels as well as compare two nitrogen levels within the same season.
- At nitrogen doses below 120 kg/ha, there is no significant difference between the wet and dry seasons.
- At nitrogen doses of 120 and 150 kg/ha, dry season yields are higher than those in the wet season.
When comparing between two nitrogen doses in the same season:
- In the dry season, applying nitrogen at 60, 90, 120, and 150 kg/ha resulted in significantly higher yields compared to a 0 kg/ha dose.
- In the wet season, a nitrogen dose of 60 kg/ha yielded significantly higher results compared to doses of 0, 120, and 150 kg/ha.
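The comparisons above are built on the ten season-by-nitrogen cell means. The sketch below computes them together with the Dry minus Rainy difference at each nitrogen level. It shows only the size of each difference; deciding which differences are significant requires the Duncan critical ranges, which are not computed here.

```python
from statistics import mean

# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}
LEVELS = (0, 60, 90, 120, 150)

cell_mean = {cell: mean(values) for cell, values in YIELD.items()}

# Season contrast at each nitrogen level (positive means the dry season yields more)
dry_minus_rainy = {n: cell_mean["Dry", n] - cell_mean["Rainy", n] for n in LEVELS}

print("N (kg/ha)  Rainy   Dry    Dry-Rainy")
for n in LEVELS:
    print(f"{n:9d}  {cell_mean['Rainy', n]:.2f}   {cell_mean['Dry', n]:.2f}   {dry_minus_rainy[n]:+.2f}")
```

The contrast changes sign and grows with the dose, which is what a season x nitrogen interaction looks like in the cell means.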
Conclusion
- The season has a significant impact on rice production yields.
- Nitrogen dosage affects rice production but there's an optimal dose that yields the best results.
- There is a significant interaction between season and nitrogen dosage. The effect of nitrogen dosage varies depending on the season.
Recommendation:
- To maximize rice production yield, consider the appropriate nitrogen dose in relation to the planting season. For instance, during the dry season, a nitrogen application of 90 kg/ha appears to be the most effective.
Box-Cox and Residual Analysis
Box-Cox Transformation
- Lambda: 2.000
- Transformation: Square Transformation: Y²
In an effort to meet the assumptions of the statistical model (such as homogeneity of variance), a Box-Cox transformation is employed. In this case, a square transformation (with λ=2) is used, which means the response variable (Rice Production Yield) is squared (Y²) in the analysis.
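For reference, the textbook Box-Cox family can be sketched as follows. Whether SmartstatXL applies the scaled form below or the plain power Y² is not stated, so this is an assumption; for λ=2 the two differ only by a linear rescaling, which leaves ANOVA F ratios unchanged.

```python
import math


def box_cox(y, lam):
    """Textbook Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)  # limiting case of (y**lam - 1) / lam as lam -> 0
    return (y ** lam - 1) / lam


yields = [4.891, 2.577, 4.541]  # Dry season, 0 kg/ha N, replications 1-3
squared = [y ** 2 for y in yields]
scaled = [box_cox(y, 2) for y in yields]
for y, sq, bc in zip(yields, squared, scaled):
    print(f"y={y:.3f}  y^2={sq:.3f}  box_cox(y, 2)={bc:.3f}")
```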
Outlier and Residual Examination
Residuals and Leverage
- Residuals are the differences between actual observed values and the values predicted by the model. For instance, for the first observation (Dry Season, Nitrogen 0, Replication 1), the residual is 4.891 - 4.0515 = 0.8395.
- Leverage indicates the extent to which a data point influences the model estimation. A high leverage value (for example, greater than 2(k+1)/n, where k is the number of predictors and n is the number of observations) suggests that the data point has a significant influence on the model.
Studentized Residual
- Studentized residuals are residuals divided by an estimate of their standard deviation, which puts them on a common scale. High absolute values of studentized residuals (for example, greater than 2 or 3) could indicate the presence of outliers.
Outliers
- From the residual data, we see that one data point (Dry Season, Nitrogen 0, Replication 2) is marked as an "Outlier". This means that this data point significantly deviates from others and influences the model.
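The residual quoted above and the flagged observation can be reproduced from the data table. The sketch assumes the fitted value of a plot is its cell mean plus the deviation of its replication (block) mean from the season mean, which is the least-squares fit for a balanced layout with replications nested in seasons; the function name `residuals` is hypothetical.

```python
# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}
SEASONS = ("Rainy", "Dry")
LEVELS = (0, 60, 90, 120, 150)
REPS = 3


def residuals(data):
    """Observed minus fitted for every plot, keyed by (season, nitrogen, replication)."""
    result = {}
    for season in SEASONS:
        season_mean = sum(sum(data[season, n]) for n in LEVELS) / (len(LEVELS) * REPS)
        for k in range(REPS):
            block_mean = sum(data[season, n][k] for n in LEVELS) / len(LEVELS)
            for n in LEVELS:
                cell_mean = sum(data[season, n]) / REPS
                fitted = cell_mean + block_mean - season_mean
                result[season, n, k + 1] = data[season, n][k] - fitted
    return result


res = residuals(YIELD)
print(f"Dry, N=0, rep 1: residual = {res['Dry', 0, 1]:.4f}")

# Observation with the largest absolute residual
extreme = max(res, key=lambda key: abs(res[key]))
print("largest |residual|:", extreme, f"{res[extreme]:.4f}")
```

With this fit, the observation with the largest absolute residual is the same one SmartstatXL marks as an outlier.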
Conclusion
- Box-Cox transformation is used to help meet the statistical model assumptions.
- Examination of residuals and outliers is an essential part of the data analysis process. In this case, there is one data point marked as an outlier, which might require further investigation to understand why it deviates.
- Leverage and studentized residuals are useful metrics for evaluating model fit and identifying data points that might be problematic or have a significant impact on the model.
Thus, this provides additional valuable information for understanding model fit and the reliability of the analysis of variance results. If there are data points considered as outliers, further analysis or adjustments in the model may be needed to get more accurate estimates.
Data Transformation: Data Imputation Process
In the process of data analysis, meeting statistical assumptions is a crucial step to ensure the accuracy and validity of interpretations. In this case, there are violations of assumptions that cannot be resolved despite various transformation efforts, including those suggested by the Box-Cox method. To address this issue, SmartstatXL adopts an alternative approach by replacing data considered as outliers using imputation methods for missing data. Through this approach, a solution that meets the assumptions of homogeneity of variances and data distribution normality is finally found, thereby enhancing the reliability of the analysis outcomes.
Results of Data Imputation Process
In an effort to meet crucial statistical assumptions, SmartstatXL has performed data imputation, replacing data considered as outliers with values estimated as if those observations were missing. This process is carried out with a depth of three levels ("Depth: 3") to find the closest-fitting solution.
Data Change Details
- Data for Dry Season, Nitrogen 0 kg/ha, Replication 2 was originally 2.577 and has been replaced with 4.9908.
- Data for Dry Season, Nitrogen 150 kg/ha, Replication 2 was originally 6.868 and has been replaced with 5.9644.
- Data for Wet Season, Nitrogen 0 kg/ha, Replication 2 was originally 3.503 and has been replaced with 5.2781.
- Data for Wet Season, Nitrogen 120 kg/ha, Replication 2 was originally 4.024 and has been replaced with 5.4139.
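The four replacements can be applied to a copy of the data to see their effect on the season means. The replacement values are taken from the list above; how SmartstatXL derives them is not reproduced here, and the names `REPLACEMENTS` and `apply_replacements` are hypothetical.

```python
from statistics import mean

# Yields (ton/ha) keyed by (season, nitrogen kg/ha); list positions are replications 1-3.
# "Rainy" corresponds to the Wet Season in the text.
YIELD = {
    ("Rainy", 0):   [4.999, 3.503, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 4.024, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 2.577, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 6.868, 5.692],
}

# (season, nitrogen, replication) -> replacement value reported by SmartstatXL
REPLACEMENTS = {
    ("Dry", 0, 2): 4.9908,
    ("Dry", 150, 2): 5.9644,
    ("Rainy", 0, 2): 5.2781,
    ("Rainy", 120, 2): 5.4139,
}


def apply_replacements(data, replacements):
    """Return a copy of the data with the listed observations replaced."""
    updated = {cell: list(values) for cell, values in data.items()}
    for (season, nitrogen, rep), value in replacements.items():
        updated[season, nitrogen][rep - 1] = value
    return updated


imputed = apply_replacements(YIELD, REPLACEMENTS)
for season in ("Dry", "Rainy"):
    before = mean([y for (s, _), v in YIELD.items() if s == season for y in v])
    after = mean([y for (s, _), v in imputed.items() if s == season for y in v])
    print(f"{season:5s} mean: {before:.2f} -> {after:.2f} ton/ha")
```

All four replaced observations come from Replication 2.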
Notes
- Missing: Missing data is replaced with a value estimated by the missing-data procedure.
- Replace: Data considered as outliers is replaced with a value estimated by the same missing-data procedure.
By performing this imputation, the final statistical model now satisfies the assumptions of homogeneity of variances and normality, which will enhance the accuracy and validity of analysis interpretation. This is an important step as it ensures that the conclusions drawn from the data are valid and reliable.
ANOVA Assumption Checks
ANOVA Assumption Checks After Data Imputation
Homogeneity of Variances
The Levene's Test shows an F-Value of 2.125 with a P-Value of 0.077. Because the P-Value is greater than 0.05, this indicates that the assumption of homogeneity of variances is met. This means that the variation among different experimental groups is sufficiently uniform, validating the continuation of analysis using ANOVA.
Data Normality
Several tests are used to check the assumption of normality of the residuals. All normality tests show P-Values greater than 0.05, indicating that the normality assumption is met.
Conclusion
After the data imputation process, both the homogeneity of variances and normality assumptions are met. This increases the level of confidence in the ANOVA results and subsequent interpretation of the data. The imputation process has been successful not only in addressing assumption violations but also in strengthening the validity of the statistical analysis conducted.
Analysis of Variance: Imputation
Comparative Analysis of Variance: Original Data vs Imputed Data
Discussion
- Seasonal Effect (M): The F-Value increased from 14.253 to 21.902, and its significance level also increased (p-value decreased from 0.020 to 0.009). This suggests that the season has a stronger effect on rice production outcomes after data imputation.
- Nitrogen Effect (N): The effect of nitrogen on rice production also became stronger (F-Value increased from 10.617 to 59.511). The significance level remains the same (p = 0.000), confirming that nitrogen indeed has a very significant effect.
- Season x Nitrogen Interaction (M x N): The interaction effect also became more significant after imputation (F-Value increased from 5.467 to 29.661). This suggests that data imputation strengthens the evidence that there is a significant interaction between the season and nitrogen dosage affecting rice production.
- Coefficient of Variation (CV): CV(a) and CV(b) decreased after data imputation, indicating that variability in the data has reduced, possibly due to the replacement of outliers.
- Error: The error also decreased in the imputation model, indicating an improvement in model fit.
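The figures in this comparison can be recomputed from the imputed data. The sketch below assumes the same combined RBD model as for the original data, with error degrees of freedom left at 4 and 16 (no deduction for the four replaced values); that choice is inferred from the reported F-Values rather than from SmartstatXL documentation. The function name `anova_summary` is hypothetical.

```python
import math

# Imputed yields (ton/ha): the four Replication 2 values listed earlier are already replaced.
IMPUTED = {
    ("Rainy", 0):   [4.999, 5.2781, 5.356],
    ("Rainy", 60):  [6.351, 6.316, 6.582],
    ("Rainy", 90):  [6.071, 5.969, 5.893],
    ("Rainy", 120): [4.818, 5.4139, 5.813],
    ("Rainy", 150): [3.436, 4.047, 3.740],
    ("Dry", 0):     [4.891, 4.9908, 4.541],
    ("Dry", 60):    [6.009, 6.625, 5.672],
    ("Dry", 90):    [6.712, 6.693, 6.799],
    ("Dry", 120):   [6.458, 6.675, 6.636],
    ("Dry", 150):   [5.683, 5.9644, 5.692],
}
SEASONS = ("Rainy", "Dry")
LEVELS = (0, 60, 90, 120, 150)
REPS = 3


def anova_summary(data):
    """F ratios and CVs for the assumed combined RBD model."""
    a, b, r = len(SEASONS), len(LEVELS), REPS
    obs = [y for values in data.values() for y in values]
    grand_mean = sum(obs) / len(obs)
    cf = sum(obs) ** 2 / len(obs)  # correction factor

    season_tot = [sum(sum(data[s, n]) for n in LEVELS) for s in SEASONS]
    level_tot = [sum(sum(data[s, n]) for s in SEASONS) for n in LEVELS]
    rep_tot = [sum(data[s, n][k] for n in LEVELS) for s in SEASONS for k in range(r)]

    ss_season = sum(t * t for t in season_tot) / (b * r) - cf
    ss_rep = sum(t * t for t in rep_tot) / b - cf - ss_season
    ss_nitrogen = sum(t * t for t in level_tot) / (a * r) - cf
    ss_cells = sum(sum(v) ** 2 for v in data.values()) / r - cf
    ss_inter = ss_cells - ss_season - ss_nitrogen
    ss_error_b = sum(y * y for y in obs) - cf - ss_cells - ss_rep

    ms_a = ss_rep / (a * (r - 1))                # error (a), 4 df
    ms_b = ss_error_b / (a * (b - 1) * (r - 1))  # error (b), 16 df (assumed unchanged)
    return {
        "F_season": (ss_season / (a - 1)) / ms_a,
        "F_nitrogen": (ss_nitrogen / (b - 1)) / ms_b,
        "F_interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_b,
        "CV_a": 100 * math.sqrt(ms_a) / grand_mean,
        "CV_b": 100 * math.sqrt(ms_b) / grand_mean,
    }


summary = anova_summary(IMPUTED)
for name, value in summary.items():
    print(f"{name:14s} {value:.3f}")
```

The ratios match the imputed-data F-Values quoted above only when the error degrees of freedom are kept at 4 and 16, which is worth bearing in mind when comparing the two analyses.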
Conclusion
The data imputation process was successful in enhancing the accuracy and validity of the statistical model. This is evidenced by the increased significance of the seasonal effects, nitrogen, and their interactions on rice production yield. Moreover, the reduction in the coefficient of variation and error indicates that the model is more appropriate and the data more consistent following imputation.
Post hoc Test: Imputation
Discussion of Post hoc Test Results: Original Data vs Imputed Data
Independent Effect of Season (M)
- Original Data: The average rice yield in the Dry season is 5.90, whereas in the Rainy season it is 5.13.
- Imputed Data: The average rice yield in the Dry season increased to 6.00, while in the Rainy season it increased to 5.34.
Independent Effect of Nitrogen (N)
- Original Data: Nitrogen doses of 60 and 90 kg/ha yield the highest rice production, whereas doses of 0 and 150 kg/ha yield the lowest.
- Imputed Data: The same trend persists; however, there are some shifts in the confidence intervals.
Interaction Effect of Season x Nitrogen
- Original Data: The interaction between season and nitrogen dose affects rice production yield.
- Imputed Data: The same pattern still applies; however, confidence intervals indicate a shift.
Simple Interaction Effects of Season x Nitrogen
- Original Data: The simple effects show that the Dry season yields higher rice production only at higher nitrogen doses.
- Imputed Data: The same pattern is observed, but with narrower confidence intervals, indicating a higher level of confidence in these results.
Discussion
- Enhanced Seasonal Effect: In the imputed data, the average rice yield in the Dry season is higher than the original data, affirming the positive effect of the Dry season on rice production.
- Consistency of Nitrogen Effect: Although there are shifts in the confidence intervals, the effect of nitrogen dose on rice production yield remains consistent between the original and imputed data.
- Interaction Effect: In both datasets, the interaction between season and nitrogen dose is proven to be significant. However, the imputed data shows narrower confidence intervals, indicating a higher level of confidence in this finding.
- Simple Interaction Effects: The imputed data continues to support the findings from the original data, that the positive effect of the Dry season on rice production is more pronounced at higher nitrogen doses.
- Confidence Intervals: The shifts in confidence intervals in the imputed data indicate that data imputation has influenced the level of confidence in the findings, mostly in a positive manner.
Conclusion
The data imputation process was successful not only in meeting statistical assumptions but also in increasing confidence in the findings. The effects of season and nitrogen dose, as well as the interaction between them, remain significant and consistent with the original data, but with a higher level of confidence.