Researchers often need to confirm that their experimental results hold across different locations, seasons, or years. SmartstatXL supports the analysis of factorial experiments based on CRD, RBD, and LSD that are repeated across several locations, seasons, or years. Although the tool focuses on balanced designs, it still offers precise and reliable data analysis.
One of the main advantages of SmartstatXL is its ability to quickly identify whether there is a significant treatment effect on the observed variable. If a significant effect is found, researchers can compare the treatment means using any of the available Post Hoc tests: Tukey, Duncan, LSD, Bonferroni, Sidak, Scheffe, REGWQ, Scott-Knott, and Dunnett. Thus, researchers can ensure that the conclusions drawn from their data are supported by robust statistical analysis.
Case Example
The table below shows the Rice Yield data (kg/ha) from two rice varieties, tested with six nitrogen levels at three different locations. Although the original experiment used an RBD Split-Plot design, the data are analyzed here as a case example using the RBD Factorial model in the SmartstatXL application. This is done solely for illustrative purposes, given the lack of real-world data examples for RBD Factorial replicated across multiple locations. There are three replications for each condition in this analysis.
Columns N1 to N6 are the six nitrogen levels.
Location | Variety | Replication | N1 | N2 | N3 | N4 | N5 | N6 |
L1 | V1 | 1 | 1979 | 4572 | 5630 | 7153 | 7223 | 7239 |
L1 | V1 | 2 | 1511 | 4340 | 6780 | 6504 | 7107 | 6829 |
L1 | V1 | 3 | 3664 | 4132 | 4933 | 6326 | 6051 | 5874 |
L1 | V2 | 1 | 5301 | 5655 | 6339 | 8108 | 7530 | 7853 |
L1 | V2 | 2 | 1883 | 5100 | 6622 | 8583 | 7097 | 7105 |
L1 | V2 | 3 | 3571 | 5385 | 6332 | 7637 | 6667 | 7443 |
L2 | V1 | 1 | 3617 | 6065 | 6092 | 5916 | 7191 | 5805 |
L2 | V1 | 2 | 3580 | 5463 | 6571 | 6982 | 6109 | 6890 |
L2 | V1 | 3 | 3939 | 5435 | 6084 | 7145 | 7967 | 7113 |
L2 | V2 | 1 | 3447 | 5905 | 5322 | 6513 | 8153 | 7290 |
L2 | V2 | 2 | 3560 | 5969 | 5883 | 6556 | 7208 | 6564 |
L2 | V2 | 3 | 3516 | 6026 | 6489 | 7853 | 6685 | 7401 |
L3 | V1 | 1 | 4320 | 5862 | 5136 | 6336 | 5571 | 6765 |
L3 | V1 | 2 | 4068 | 4626 | 5836 | 5456 | 5854 | 5263 |
L3 | V1 | 3 | 3856 | 4913 | 4898 | 5663 | 5533 | 3910 |
L3 | V2 | 1 | 4891 | 6009 | 6712 | 6458 | 5683 | 6335 |
L3 | V2 | 2 | 2577 | 6625 | 6693 | 6675 | 6868 | 6064 |
L3 | V2 | 3 | 4541 | 5672 | 6799 | 6636 | 5692 | 5949 |
Cited from:
Gomez, Kwanchai A. and Gomez, Arturo A. 1995. Statistical Procedures for Agricultural Research. [translator] Endang Sjamsuddin and Justika S. Baharsjah. Second Edition. Jakarta: UI-Press, 1995. ISBN: 979-456-139-8. p. 350.
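For readers who prepare the data outside Excel, the following minimal sketch reshapes the wide table (one column per nitrogen level) into one record per observation. It is only an illustration: the field names and the `to_long` helper are assumptions, only Location L1 is included to keep it short, and the layout SmartstatXL actually expects is described in its 'Data Preparation' guide.

```python
# Wide-format rows for Location L1, copied from the table above.
# Each entry: (location, variety, replication, [yields for N1..N6]).
wide_rows = [
    ("L1", "V1", 1, [1979, 4572, 5630, 7153, 7223, 7239]),
    ("L1", "V1", 2, [1511, 4340, 6780, 6504, 7107, 6829]),
    ("L1", "V1", 3, [3664, 4132, 4933, 6326, 6051, 5874]),
    ("L1", "V2", 1, [5301, 5655, 6339, 8108, 7530, 7853]),
    ("L1", "V2", 2, [1883, 5100, 6622, 8583, 7097, 7105]),
    ("L1", "V2", 3, [3571, 5385, 6332, 7637, 6667, 7443]),
]


def to_long(rows):
    """Return one dict per observation (hypothetical field names)."""
    long_rows = []
    for location, variety, replication, yields in rows:
        for level, value in enumerate(yields, start=1):
            long_rows.append({
                "Location": location,
                "Variety": variety,
                "Replication": replication,
                "Nitrogen": f"N{level}",
                "Yield": value,
            })
    return long_rows


long_rows = to_long(wide_rows)
print(len(long_rows))   # 6 rows x 6 nitrogen levels = 36 records
print(long_rows[0])
```

Each record then carries its own factor labels, which is the shape most statistical tools need before factors and a response can be selected.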
Steps for Analysis of Variance (Anova) and Post Hoc Test:
- Ensure the worksheet (Sheet) you want to analyze is active.
- Place the cursor on the Dataset. (For information on creating a Dataset, please refer to the 'Data Preparation' guide).
- If the active cell is not on the dataset, SmartstatXL will automatically detect and determine the appropriate dataset.
- Activate the SmartstatXL Tab
- Click on the Menu Factorial > Multi Location/Year/Season.
- SmartstatXL will display a dialog box to verify whether the Dataset is correct or not (usually, the cell address for the Dataset is automatically selected correctly).
- After confirming that the Dataset is correct, press the Next Button
- Next, the Anova – RBD Factorial dialog box will appear (replicated across multiple locations):
Factorial experimental models are analyzed by incorporating an additional Season/Location/Year factor and analyzed simultaneously (sometimes referred to as Mixed Design or Split Plot in Time).
In Split Plot in Time models, for both CRD and RBD, the Replication Factor must be included in the model!
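One common way to write the combined model for an RBD factorial repeated over locations is sketched below. The symbols are chosen here for illustration and are not taken from SmartstatXL; the term $R_{j(i)}$ is the replication (block) effect nested within location, which is the Replication Factor that must be included.

```latex
Y_{ijkl} = \mu + L_i + R_{j(i)} + N_k + V_l + (NV)_{kl}
         + (LN)_{ik} + (LV)_{il} + (LNV)_{ikl} + \varepsilon_{ijkl}
```

In the case example, $i = 1,\dots,3$ indexes locations, $j = 1,\dots,3$ replications within a location, $k = 1,\dots,6$ nitrogen levels, and $l = 1, 2$ varieties.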
Compare this with factorial experimental models analyzed separately at each location, as shown in the following image:
In the above Factorial Experimental Model, there is no Location Factor.
- There are 3 steps. In the first step, choose the Factors and at least one Response to be analyzed (as shown in the above image).
- When you select Factors, SmartstatXL will provide additional information about the number of levels and their names.
- Details from the Anova STEP 1 dialog box can be seen in the following image:
- After selecting the Factors and the Response, press the Next Button to proceed to the Anova Step 2 Dialog Box.
- The dialog box for the second step will appear.
- Adjust settings according to your research method. In this example, the Post Hoc test used is Duncan's Test.
- To configure additional output and default values for subsequent output, press the "Advanced Options…" button.
- Here is the Advanced Options Dialog Box view:
- After completing the settings, close the "Advanced Options" dialog box.
- Next, in the Anova Step 2 Dialog Box, click the Next Button.
- In the Anova Step 3 Dialog Box, you will be asked to specify the mean table, ID for each Factor, and rounding of the mean values. Details can be seen in the following image:
- As the final step, click "OK"
Analysis Results
Analysis Information
Experimental Design
This analysis employs a Randomized Block Design (RBD) Factorial that is replicated across multiple locations. The design lets researchers test the effects of several factors on rice yield, in this case location, nitrogen level, and rice variety. Each treatment combination has three replications at each location.
Factors in the Analysis
- Replication: There are three replications at each location, which make it possible to estimate the experimental error.
- Location: The experiment is run at three locations to show how location affects rice yield. This matters because locations can differ in soil conditions, climate, or other factors that affect yield.
- Nitrogen (Main): Nitrogen levels are tested at six different levels. This provides data on the effect of nitrogen levels on rice yield, which could be critical in determining fertilizer usage policies.
- Variety (Sub): Two rice varieties are tested to ascertain if there are significant differences in rice yield between different varieties.
Post Hoc Test
The Post Hoc test used is Duncan's multiple range test, which is commonly used to determine which treatment means differ once the analysis of variance indicates a significant effect.
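Duncan's test does not apply one fixed significance level to every comparison: for a range spanning p ordered means it uses the level 1 - (1 - α)^(p - 1). The short sketch below tabulates that level for the six nitrogen means at α = 0.05. The function name is hypothetical, and the critical values themselves (which come from the studentized range distribution) are not computed here.

```python
def duncan_protection_level(alpha, p):
    """Significance level Duncan's test applies to a range of p ordered means."""
    if p < 2:
        raise ValueError("a range must span at least two means")
    return 1 - (1 - alpha) ** (p - 1)


# Six nitrogen levels give ranges spanning 2 to 6 ordered means.
for p in range(2, 7):
    print(p, round(duncan_protection_level(0.05, p), 4))
```

The level grows with p, which is why Duncan's test is more liberal than Tukey's for means that are far apart in rank.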
Violation of Assumptions
The assumption checks reported in the ANOVA Assumption Checks section indicate violations of both homogeneity of variances and normality. These violations could affect the validity of the conclusions drawn from the analysis.
Preliminary Conclusion
Overall, the experimental design seems to be well-structured to ascertain the effects of location, nitrogen levels, and rice varieties on rice yield. However, the existing assumption violations need to be carefully addressed to ensure that the analysis results are representative and valid.
ANOVA Assumption Checks
Formal Approach (Statistical Tests)
Levene's Test for Homogeneity of Variances
Levene's test checks the assumption of homogeneity of variances. In this analysis, the F-value is 2.32 with a p-value of 0.001, which is smaller than 0.05, so the assumption is violated. A violation of this assumption may affect the validity of the Analysis of Variance (ANOVA) results, and corrective measures may be needed, such as transforming the data or using an analysis method that is robust to unequal variances.
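As a rough illustration of what the statistic measures, the sketch below computes the mean-centered Levene statistic, which is a one-way ANOVA F-ratio on the absolute deviations from each group's center. The grouping used here (the six nitrogen levels of variety V1 at Location L1) is an assumption made for brevity; the source does not state which grouping or centering SmartstatXL uses, so the result is not expected to reproduce the F-value of 2.32.

```python
from statistics import mean


def levene_statistic(groups, center=mean):
    """Return (W, df1, df2): an F-ratio on absolute deviations from group centers."""
    # Absolute deviation of every value from the center of its own group.
    z = [[abs(y - center(g)) for y in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    z_bar_i = [mean(g) for g in z]
    z_bar = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (zi - z_bar) ** 2 for g, zi in zip(z, z_bar_i))
    within = sum((v - zi) ** 2 for g, zi in zip(z, z_bar_i) for v in g)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


# Three replications per nitrogen level, variety V1 at Location L1.
l1_v1_by_nitrogen = [
    [1979, 1511, 3664], [4572, 4340, 4132], [5630, 6780, 4933],
    [7153, 6504, 6326], [7223, 7107, 6051], [7239, 6829, 5874],
]
w, df1, df2 = levene_statistic(l1_v1_by_nitrogen)
print(f"W = {w:.3f} on ({df1}, {df2}) degrees of freedom")
```

The p-value is the upper tail of the F distribution with (df1, df2) degrees of freedom; passing `center=median` from the `statistics` module gives the Brown-Forsythe variant.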
Normality Test
The normality test checks whether the residuals are normally distributed. Nearly all tests (Shapiro-Wilk, Anderson-Darling, D'Agostino-Pearson) give a p-value less than 0.05, indicating a violation of the normality assumption. This violation can affect the accuracy and reliability of the ANOVA results.
Summary of Assumption Violations
- Homogeneity of variances: Assumption violated.
- Normality: Assumption violated.
Implications and Next Steps
These assumption violations require corrective action. Possible options include transforming the data or using methods that are robust to such violations, such as non-parametric alternatives to ANOVA.
Until the violations are addressed, the ANOVA results should be interpreted with caution, and these limitations must be stated when presenting the findings.
Visual Approach (Plotting Graphs)
Interpretation and Discussion
- Normal P-Plot of Residual Data
- The Normal P-Plot graph is commonly used to check if data follows a normal distribution. If the data points closely follow the diagonal line from the bottom-left corner to the top-right corner, it is a strong indication that the data follows a normal distribution. From the presented graph, it appears there are some deviations from the diagonal line, indicating a violation of the normality assumption. This is consistent with the findings from the formal normality test.
- Residual Data Histogram
- The histogram is also used to check data normality. From the given graph, it appears that the residual data shows slight deviation from a normal distribution shape, confirming the violation of the normality assumption.
- Residual vs. Predicted Plot
- This plot is used to check the assumption of homoskedasticity, that is, that the variance of the errors is constant across the range of predicted values. If points are randomly and evenly distributed around the horizontal zero line, this assumption is considered met. From the presented graph, there seem to be some patterns in the residual distribution, indicating potential violation of the homoskedasticity assumption.
- Standard Deviation vs. Mean
- This graph is used to check the homogeneity of variances between groups. If the group standard deviations are of similar size and show no trend as the group means increase, the assumption of homogeneity of variances is considered met. From the graph, the standard deviation varies among groups, indicating a violation of the homogeneity of variances assumption.
Conclusion
The graphs are consistent with the formal tests and indicate violations of the assumptions of normality and homoskedasticity. This reaffirms the need for corrective action, such as data transformation or the use of analysis methods that are robust to these violations.
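The coordinates behind a normal probability plot can be sketched as follows. This assumes the common Blom plotting positions, which may differ from what SmartstatXL uses, and the "residuals" are simply deviations of the six L1 x N4 plot yields from their cell mean, not the residuals of the full model.

```python
from statistics import NormalDist, mean


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard-normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, value in enumerate(sorted(residuals), start=1):
        prob = (rank - 0.375) / (n + 0.25)  # Blom plotting position
        points.append((std_normal.inv_cdf(prob), value))
    return points


# Illustrative deviations: L1 x N4 plot yields minus their cell mean.
cell = [7153, 6504, 6326, 8108, 8583, 7637]
deviations = [y - mean(cell) for y in cell]
for quantile, value in normal_plot_points(deviations):
    print(f"{quantile:7.3f}  {value:9.2f}")
```

Plotting the second column against the first gives the graph described above: points close to a straight line suggest normality, while curvature or isolated points at the ends suggest skewness or outliers.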
Variance Analysis
Analysis of Variance for Grain Yield Variable
- Location (L): The location factor does not show a significant effect on grain yield (F-value = 2.905, p-value = 0.131). Therefore, the main effect of location on grain yield is not statistically significant in this experiment.
- Nitrogen (Main) (N): The nitrogen factor shows a highly significant effect (F-value = 66.218, p-value < 0.001). This means that nitrogen levels have a significant effect on grain yield and further tests are needed to understand the differences between levels.
- Variety (Sub) (V): The variety factor also shows a highly significant effect (F-value = 22.587, p-value < 0.001). This indicates that there are significant differences in grain yield between different varieties.
- Interaction L x N: The interaction between location and nitrogen level shows a highly significant effect (F-value = 4.498, p-value < 0.001). This means that the effect of nitrogen level on grain yield varies depending on the location.
- Interaction L x V: The interaction between location and variety shows a significant effect at the 5% level (F-value = 3.585, p-value = 0.033). This means that the effect of variety on grain yield varies depending on the location.
- Interactions N x V and L x N x V: Both of these interactions do not show a significant effect (p-value > 0.05), indicating that the interactions between nitrogen level and variety, as well as the three-way interaction between location, nitrogen level, and variety, do not significantly affect grain yield.
- Coefficient of Variation (CV): CV(a) and CV(b) are 14.89% and 11.00%, respectively, which are relatively low, indicating that the data is fairly consistent.
Conclusion
- Nitrogen level and rice variety have a significant effect on grain yield.
- These effects vary depending on the location (significant L x N and L x V interactions), but the main effect of location is not significant.
- There are no significant interactions between nitrogen level and variety or three-way interactions between location, nitrogen level, and variety in affecting grain yield.
Overall, these results highlight the importance of considering nitrogen level and rice variety in cultivation practices to improve grain yield. Additionally, although location does not show a significant effect independently, it influences how nitrogen level and variety affect grain yield.
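To make the sources of variation concrete, the sketch below partitions the total sum of squares of the case data by hand. It assumes an RBD factorial with a single pooled error term; the source does not state which error terms SmartstatXL uses for each F-test (it reports both CV(a) and CV(b)), so only degrees of freedom, sums of squares, and mean squares are printed and no F-values are claimed.

```python
from collections import defaultdict

# DATA[location][variety] holds three replications; each row lists N1..N6.
DATA = {
    "L1": {"V1": [[1979, 4572, 5630, 7153, 7223, 7239],
                  [1511, 4340, 6780, 6504, 7107, 6829],
                  [3664, 4132, 4933, 6326, 6051, 5874]],
           "V2": [[5301, 5655, 6339, 8108, 7530, 7853],
                  [1883, 5100, 6622, 8583, 7097, 7105],
                  [3571, 5385, 6332, 7637, 6667, 7443]]},
    "L2": {"V1": [[3617, 6065, 6092, 5916, 7191, 5805],
                  [3580, 5463, 6571, 6982, 6109, 6890],
                  [3939, 5435, 6084, 7145, 7967, 7113]],
           "V2": [[3447, 5905, 5322, 6513, 8153, 7290],
                  [3560, 5969, 5883, 6556, 7208, 6564],
                  [3516, 6026, 6489, 7853, 6685, 7401]]},
    "L3": {"V1": [[4320, 5862, 5136, 6336, 5571, 6765],
                  [4068, 4626, 5836, 5456, 5854, 5263],
                  [3856, 4913, 4898, 5663, 5533, 3910]],
           "V2": [[4891, 6009, 6712, 6458, 5683, 6335],
                  [2577, 6625, 6693, 6675, 6868, 6064],
                  [4541, 5672, 6799, 6636, 5692, 5949]]},
}

LOC, REP, NIT, VAR, Y = range(5)  # column positions in each observation
obs = [(loc, rep, nit, var, y)
       for loc, by_variety in DATA.items()
       for var, reps in by_variety.items()
       for rep, row in enumerate(reps, start=1)
       for nit, y in enumerate(row, start=1)]

correction = sum(o[Y] for o in obs) ** 2 / len(obs)


def ss(*cols):
    """Corrected sum of squares among the cells defined by the given columns."""
    totals, counts = defaultdict(float), defaultdict(int)
    for o in obs:
        key = tuple(o[c] for c in cols)
        totals[key] += o[Y]
        counts[key] += 1
    return sum(t * t / counts[k] for k, t in totals.items()) - correction


ss_total = sum(o[Y] ** 2 for o in obs) - correction
anova = {
    "L": (ss(LOC), 2),
    "R(L)": (ss(LOC, REP) - ss(LOC), 6),
    "N": (ss(NIT), 5),
    "V": (ss(VAR), 1),
    "LxN": (ss(LOC, NIT) - ss(LOC) - ss(NIT), 10),
    "LxV": (ss(LOC, VAR) - ss(LOC) - ss(VAR), 2),
    "NxV": (ss(NIT, VAR) - ss(NIT) - ss(VAR), 5),
    "LxNxV": (ss(LOC, NIT, VAR) - ss(LOC, NIT) - ss(LOC, VAR) - ss(NIT, VAR)
              + ss(LOC) + ss(NIT) + ss(VAR), 10),
}
# Pooled error (an assumption): whatever the terms above leave unexplained.
explained_ss = sum(s for s, _ in anova.values())
explained_df = sum(df for _, df in anova.values())
anova["Error"] = (ss_total - explained_ss, len(obs) - 1 - explained_df)

for source, (s, df) in anova.items():
    print(f"{source:6s} df={df:3d}  SS={s:16.1f}  MS={s / df:14.1f}")
```

Each F-value in an ANOVA table is the ratio of a mean square to the mean square of its error term, so the printed mean squares are the building blocks of the results discussed above.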
Post hoc Test
Single Effect of Nitrogen and Variety
Single Effect of Nitrogen (N)
- The significant difference in grain yield among nitrogen levels indicates that the appropriate selection of nitrogen levels is crucial in enhancing grain yield.
- Grain yield tends to increase with nitrogen level up to around N4. Nitrogen levels N4, N5, and N6 do not differ significantly in grain yield, indicating that increases beyond N4 do not result in significant yield improvements.
Overall, these findings emphasize the importance of selecting the appropriate nitrogen level for optimizing grain yield, and also indicate that there is a limit beyond which further increases in nitrogen level are no longer beneficial.
Single Effect of Variety (V)
- There is a significant difference between varieties V1 and V2 in terms of grain yield. Variety V2 statistically produces a higher grain yield compared to V1.
- This indicates that in addition to considering factors like nitrogen level, variety selection also plays a crucial role in enhancing grain yield.
- Therefore, farmers and researchers can consider both nitrogen level and rice variety when planning strategies to improve grain yield.
Overall, these results emphasize the importance of selecting the appropriate variety for optimizing grain yield, and this should be a key consideration in rice cultivation practices.
Effect of Location-Nitrogen Interaction
First Format:
Interpretation of Location x Nitrogen Interaction Effect
- The significant difference in grain yield among the Location x Nitrogen combinations indicates that the effect of nitrogen on grain yield varies depending on the location. This means that an effective nitrogen management strategy in one location may not be equally effective in another.
- For instance, at Location 1 (L1), nitrogen level N4 gives the highest grain yield (7,385.17 kg/ha), while at Location 2 (L2), nitrogen level N5 gives the highest yield (7,218.83 kg/ha). At Location 3 (L3), grain yield is lower than at the other two locations for the four highest nitrogen levels (N3 to N6).
- It is important to note that in some cases, such as L1_N6 and L1_N5 or L2_N4 and L2_N6, grain yields are not significantly different despite differing nitrogen levels. This indicates that further increases in nitrogen level do not necessarily lead to significant increases in grain yield, depending on the location.
Overall, these findings underscore the importance of considering the interaction between location and nitrogen level when planning strategies to improve grain yield. It also emphasizes the need for further research to understand how these factors interact under different geographical and climatic conditions.
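The Location x Nitrogen means quoted above can be recomputed directly from the data table. The sketch below averages the six plots (two varieties x three replications) in each location-by-nitrogen cell and reports the nitrogen level with the highest mean at each location; the variable names are illustrative.

```python
# PLOTS[location] lists the six plots (V1 reps 1-3, then V2 reps 1-3);
# each row holds the yields for N1..N6, copied from the data table.
PLOTS = {
    "L1": [[1979, 4572, 5630, 7153, 7223, 7239],
           [1511, 4340, 6780, 6504, 7107, 6829],
           [3664, 4132, 4933, 6326, 6051, 5874],
           [5301, 5655, 6339, 8108, 7530, 7853],
           [1883, 5100, 6622, 8583, 7097, 7105],
           [3571, 5385, 6332, 7637, 6667, 7443]],
    "L2": [[3617, 6065, 6092, 5916, 7191, 5805],
           [3580, 5463, 6571, 6982, 6109, 6890],
           [3939, 5435, 6084, 7145, 7967, 7113],
           [3447, 5905, 5322, 6513, 8153, 7290],
           [3560, 5969, 5883, 6556, 7208, 6564],
           [3516, 6026, 6489, 7853, 6685, 7401]],
    "L3": [[4320, 5862, 5136, 6336, 5571, 6765],
           [4068, 4626, 5836, 5456, 5854, 5263],
           [3856, 4913, 4898, 5663, 5533, 3910],
           [4891, 6009, 6712, 6458, 5683, 6335],
           [2577, 6625, 6693, 6675, 6868, 6064],
           [4541, 5672, 6799, 6636, 5692, 5949]],
}
LEVELS = ["N1", "N2", "N3", "N4", "N5", "N6"]

# Mean yield for every location x nitrogen combination.
means = {}
for loc, rows in PLOTS.items():
    for idx, level in enumerate(LEVELS):
        column = [row[idx] for row in rows]
        means[(loc, level)] = sum(column) / len(column)

# Nitrogen level with the highest mean at each location.
best = {loc: max(LEVELS, key=lambda lv: means[(loc, lv)]) for loc in PLOTS}

print(round(means[("L1", "N4")], 2))  # 7385.17
print(round(means[("L2", "N5")], 2))  # 7218.83
print(best)
```

Whether two of these means differ significantly still depends on the Post Hoc test; the code only reproduces the means that the test compares.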
Second Format: Simple Effects
Simple Effects of Location x Nitrogen Interaction
- Within the Same Location (Capital Letters):
- At Location 1 (L1), nitrogen levels N4, N5, and N6 are not significantly different from each other but are different from N1, N2, and N3. This indicates that increasing nitrogen levels above N3 improves grain yield but not significantly beyond N4.
- A similar phenomenon is also observed at Location 2 (L2).
- At the Same Nitrogen Level (Lowercase Letters):
- For each nitrogen level, there are some differences between locations. For instance, at nitrogen level N1, Location 3 (L3) yields higher grain yield than Location 1 (L1) and is not significantly different from Location 2 (L2).
- Comparison with the First Format Table:
- This analysis provides more detailed information about how the interaction between location and nitrogen level affects grain yield. It allows us to further understand how nitrogen effects vary across locations and to what extent these effects differ at different nitrogen levels.
Overall, these results reinforce the findings from the previous interaction analysis and add further nuances. It shows that there are multiple factors to consider when planning strategies to improve grain yield, including the specific location and nitrogen level to be used. Therefore, a more measured and tailored approach needs to be taken to maximize grain yield.
Simple Effects of Location x Variety Interaction
Conclusion
- Within the Same Location (Capital Letters): At Location 1 (L1) and Location 3 (L3), variety V2 produces a higher grain yield than V1, and the difference is significant. Meanwhile, at Location 2 (L2), grain yields from varieties V1 and V2 are not significantly different.
- At the Same Variety (Lowercase Letters): For variety V1, the highest grain yield is found at Location 2 (L2), and the lowest at Location 3 (L3). For variety V2, the highest grain yield is found at Location 1 (L1), but it is not significantly different from the yields at the other locations.
- Complexity of Interaction: These results indicate that the effect of variety on grain yield is not consistent across all locations. This confirms that, in addition to variety, location also influences grain yield, and both factors interact in a complex manner.
Overall, these findings emphasize the importance of considering the specific location when selecting a variety for cultivation. Additionally, it also shows that the positive effects from choosing a superior variety can differ depending on the location conditions. Therefore, to maximize grain yield, the choice of variety should be adjusted to the cultivation location conditions.
Other tables and three-way tables between Location x Nitrogen x Variety are not discussed in this context.
Box-Cox and Residual Analysis
Box-Cox Transformation
To address the violations of the assumptions of normality and homoscedasticity, a Box-Cox transformation was performed with λ = 2, which in this case is a square transformation (Y²). Box-Cox transformations are generally effective in stabilizing variance and bringing data closer to a normal distribution.
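For reference, the standard Box-Cox formula is (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0. The sketch below applies it with λ = 2 to a few yields from the data table; the function name is illustrative, and SmartstatXL may store the plain square Y² instead, which differs from this form only by a constant shift and scale.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# First three N1 yields at Location L1, variety V1.
for y in [1979, 1511, 3664]:
    print(y, box_cox(y, 2))
```

Because (y² - 1)/2 is a linear function of y², an ANOVA on either version gives the same F-values and p-values.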
Residual Values and Outlier Examination
- Predicted: These are the grain yield values predicted by the model.
- Residual: This is the difference between the actual observed values and the values predicted by the model.
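- Leverage: Measures how unusual an observation's combination of factor levels is, and therefore how much potential it has to influence the fitted values.
- Studentized Residual: This is the residual divided by its estimated standard error; absolute values greater than 2 are commonly flagged as potential outliers.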
- Studentized Deleted Residual: Similar to Studentized Residual, but calculated by removing that particular observation from the model.
- Cook's Distance: Measures how far individual observations influence the overall model.
- DFITS (also written DFFITS): Measures how much the fitted value for an observation changes when that observation is removed from the model.
- Diagnostic: This column indicates whether the observation is an outlier or not.
- Box Cox Data: This is the data that has been transformed using the Box-Cox method.
From the table, there are several observations noted as "Outliers" based on high Studentized Residual and Studentized Deleted Residual values, as well as significant Cook's Distance. These observations have the potential to influence the model and may need to be handled cautiously.
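The diagnostic columns are related to one another through standard regression formulas, sketched below for a single observation. The input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the SmartstatXL output, and the function name is illustrative.

```python
import math


def influence_measures(residual, leverage, mse, n_obs, n_params):
    """Standard influence diagnostics for one observation of a linear model."""
    # Internally studentized residual: residual over its standard error.
    r = residual / math.sqrt(mse * (1.0 - leverage))
    # Studentized deleted residual: same idea with the observation left out.
    t = r * math.sqrt((n_obs - n_params - 1) / (n_obs - n_params - r ** 2))
    # Cook's distance: change in all fitted values when the point is removed.
    cooks_d = (r ** 2 / n_params) * leverage / (1.0 - leverage)
    # DFITS: scaled change in this observation's own fitted value.
    dfits = t * math.sqrt(leverage / (1.0 - leverage))
    return {"studentized": r, "deleted": t, "cooks_d": cooks_d, "dfits": dfits}


# Hypothetical inputs: residual 1, leverage 0.5, MSE 1, 10 observations, 2 parameters.
print(influence_measures(1.0, 0.5, 1.0, 10, 2))
```

In a balanced design every observation has the same leverage, so differences in these diagnostics between observations are driven by the size of the residuals.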
Implications and Next Steps
- The Box-Cox transformation has been performed, but the assumption checks need to be rerun on the transformed data to confirm that the transformation actually satisfies the model's assumptions.
- The presence of outliers indicates that the model may need to be adjusted, or that those observations need to be further reviewed to determine whether they should indeed be considered as outliers or represent natural variability in the data.
Overall, this analysis shows that steps have been taken to address assumption violations, but more work needs to be done, especially in handling outliers and validating the effectiveness of the Box-Cox transformation.