Response Surface Methodology (RSM) is a collection of mathematical and statistical techniques for modeling the relationship between a response and several independent variables. Its primary objective is optimization: finding the combination of independent-variable levels that produces the optimal response.
In the analysis process, response surface methodology involves various testing procedures, such as:
- Goodness-of-fit statistics (R², AIC, AICc, BIC, RMSE, MAE, MPE, MAPE, sMAPE).
- Lack-of-fit test.
- Regression analysis of variance.
- Simultaneous testing of regression parameters.
- Residual assumption testing.
SmartstatXL simplifies the user's task in analyzing and selecting the best response surface model by providing various regression methods, such as Stepwise, Forward Selection, Backward Selection, and Forward Information Criteria.
In the context of two independent variables, SmartstatXL typically uses a second-order regression model, with the equation form:
Y = β₀ + β₁X₁ + β₂X₁² + β₃X₂ + β₄X₂² + β₅X₁X₂ + ε
However, for first-order models, users can choose to use multiple regression or custom regression through the application.
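As a rough illustration of how such a model is estimated, the Python sketch below builds the six model columns (1, X₁, X₁², X₂, X₂², X₁X₂) and fits them by ordinary least squares with NumPy. It is not SmartstatXL's implementation; the function names, the 3 × 3 factorial levels, and the noise-free data generated from known coefficients are hypothetical and serve only to check that the fit recovers those coefficients.

```python
import numpy as np

def second_order_design(x1, x2):
    """Model columns in the order 1, X1, X1^2, X2, X2^2, X1*X2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x1**2, x2, x2**2, x1 * x2])

def fit_second_order(x1, x2, y):
    """Ordinary least-squares estimates of beta0 ... beta5."""
    X = second_order_design(x1, x2)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# Hypothetical 3 x 3 factorial design (levels are illustrative only)
g1, g2 = np.meshgrid([0.0, 50.0, 100.0], [0.0, 10.0, 20.0])
x1, x2 = g1.ravel(), g2.ravel()

# Noise-free response generated from known coefficients, so the fit should recover them
true_beta = np.array([26.5342, 0.9789, -0.0107, 4.35, -0.2165, 0.0035])
y = second_order_design(x1, x2) @ true_beta

beta_hat = fit_second_order(x1, x2, y)
print(np.round(beta_hat, 4))
```

A 3 × 3 factorial is the smallest full grid that supports all six terms, which is why the exact coefficients can be recovered here.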
Here are some key features of response surface regression analysis with the SmartstatXL application:
- Missing data calculation
- Regression Methods: Enter, Stepwise, Forward Selection, Backward Selection, Forward Information Criteria
- Regression Diagnostics: Normality Test, Heteroskedasticity Test, Residual Plot, and Box-Cox Transformation
- Outlier data information and Automatic outlier data replacement
- Automatic Transformation
- Output:
- Regression Equation
- Regression/Goodness-of-Fit Statistics: R², Adjusted R², Correlation Coefficient; AIC, AICc, BIC, RMSE, MAE, MPE, MAPE, sMAPE
- Regression Coefficient Estimates: Coefficient Value; Standard Error; t statistic, p-value, Lower/Upper bounds, VIF
- ANOVA: Sequential and Partial
- Graphics: 2D and 3D Response Surface Plots
- Optimization (Maximum and Minimum)
Case Example
There is a study on the effect of fertilizer and compost on several soil chemical properties, nutrient uptake, and yield. Here is a snippet of data from the study:
In this case example, we model the relationship between fertilizer and compost dosages and Total Dry Weight (g/plant). The response surface method is used to find the fertilizer and compost dosages that maximize Total Dry Weight (g/plant). The model is a second-order model with the following equation:
Regression equation model: Y = β₀ + β₁X₁ + β₂X₁² + β₃X₂ + β₄X₂² + β₅X₁X₂
Where: Y = Total Dry Weight (g/plant), X1 = Fertilizer, and X2 = Compost
Steps for Response Surface Regression Analysis
- Activate the worksheet (Sheet) to be analyzed.
- Place the cursor on the dataset (for creating a dataset, see Data Preparation method).
- If the active cell is not on the dataset, SmartstatXL will automatically attempt to determine the dataset.
- Activate the SmartstatXL Tab
- Click on the Regression > Response Surface Regression menu.
- SmartstatXL will display a dialog box to confirm whether the dataset is correct or not (usually the dataset is automatically selected correctly).
- If it is correct, click the Next button.
- Next, the Regression Analysis dialog box will appear. Choose the Factor Variables (Independent) and one or more Response Variables (Dependent). The factor variables selected depend on the type of regression analysis.
- Regression equation model: Y = β₀ + β₁X₁ + β₂X₁² + β₃X₂ + β₄X₂² + β₅X₁X₂
- Type of Regression: Response Surface Regression
- Predictor Variables: Fertilizer Dosage and Compost Dosage
- Response Variable: Total Dry Weight (g/plant)
For more details, refer to the following dialog box display:
- Press the "Next" button
- Select the regression output as shown in the following display:
- Press the OK button to generate the output in the Output Sheet
Analysis Results
Analysis Information: type of regression used, regression method, response, and predictors
The output reports violations of the regression assumptions for the variable pH H2O (see the Regression Assumptions Check section): its residuals are not normally distributed.
Regression Equation
The response surface regression results for this case are interpreted as follows:
- Regression Model: The regression equation obtained from the analysis is:
- Y = 26.5342 + 0.9789 × Fertilizer Dosage - 0.0107 × Fertilizer Dosage² + 4.35 × Compost Dosage - 0.2165 × Compost Dosage² + 0.0035 × Fertilizer Dosage × Compost Dosage
- The equation above shows the relationship between fertilizer dosage, compost dosage, and total dry weight.
- Coefficient Interpretation:
- 26.5342: This is the constant. If the dosages of fertilizer and compost are both zero, the expected total dry weight is 26.5342 g/plant.
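- 0.9789: The linear coefficient of fertilizer dosage. Because the model also contains squared and interaction terms, this is the slope of total dry weight with respect to fertilizer dosage (compost dosage held constant) only when both dosages are zero. At other dosages the slope is 0.9789 - 2(0.0107) × Fertilizer Dosage + 0.0035 × Compost Dosage g/plant per unit of fertilizer.
- -0.0107: The negative quadratic coefficient of fertilizer dosage means that the gain in total dry weight shrinks as fertilizer dosage increases and eventually turns into a decline.
- 4.35: The linear coefficient of compost dosage. As with fertilizer, it is the slope with respect to compost dosage (fertilizer dosage held constant) only when both dosages are zero; in general the slope is 4.35 - 2(0.2165) × Compost Dosage + 0.0035 × Fertilizer Dosage g/plant per unit of compost.
- -0.2165: The negative quadratic coefficient of compost dosage means that the gain in total dry weight shrinks as compost dosage increases and eventually turns into a decline.
- 0.0035: The interaction coefficient. Each one-unit increase in compost dosage raises the slope of total dry weight with respect to fertilizer dosage by 0.0035 g/plant (and vice versa).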
- Coefficient of Determination (R²) and Adjusted R²:
- R² = 0.657: This means that 65.7% of the variation in total dry weight can be explained by the regression model involving fertilizer dosage and compost dosage.
- Adjusted R² = 0.626: After adjusting for the number of predictors in the model, 62.6% of the variation in total dry weight can be explained by the model.
- F-Test and Significance:
- F = 20.726: The F statistic for the overall regression, which tests whether the model as a whole explains a significant part of the variation in total dry weight.
- Sig = 0.00: The p-value of the F-test, displayed as 0.00 after rounding, is below 0.05 (and 0.01), so the regression model is statistically significant in explaining the variation in total dry weight.
Therefore, the response surface regression analysis indicates that fertilizer dosage and compost dosage have a significant effect on total dry weight, and the fitted model can be used to predict total dry weight for given fertilizer and compost dosages within the range of dosages tested.
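Because the model contains squared and interaction terms, the effect of one dosage depends on the level of both. The minimal Python sketch below uses the rounded coefficients of the fitted equation to evaluate the fitted surface and its partial derivatives; the function names and the example dosages (40 and 8) are hypothetical.

```python
# Rounded coefficients of the fitted equation
B0, B1, B11, B2, B22, B12 = 26.5342, 0.9789, -0.0107, 4.35, -0.2165, 0.0035

def predict(fert, comp):
    """Fitted total dry weight (g/plant) at the given dosages."""
    return (B0 + B1 * fert + B11 * fert**2 + B2 * comp + B22 * comp**2
            + B12 * fert * comp)

def marginal_effects(fert, comp):
    """Partial derivatives of the fitted surface: (dY/dFertilizer, dY/dCompost)."""
    d_fert = B1 + 2 * B11 * fert + B12 * comp
    d_comp = B2 + 2 * B22 * comp + B12 * fert
    return d_fert, d_comp

print(predict(40, 8))           # hypothetical dosages
print(marginal_effects(0, 0))   # equals the linear coefficients only at zero dosages
print(marginal_effects(40, 8))  # slopes shrink as the dosages increase
```

Both partial derivatives approach zero near the optimum reported in the optimization section, which is how the stationary point is defined.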
Model Goodness of Fit
Several statistical goodness-of-fit values and regression coefficient estimates
The following is the interpretation of the model's goodness of fit:
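- Correlation Coefficient (r) - 0.8108: This is the multiple correlation coefficient, the square root of R² (0.8108² ≈ 0.6574), i.e., the correlation between the observed and predicted total dry weight. It indicates a strong linear association between the response and the predictions based on fertilizer and compost dosages.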
- Coefficient of Determination (R²) - 0.6574: As much as 65.74% of the variation in total dry weight can be explained by the regression model involving fertilizer dosage and compost dosage.
- Adjusted R² - 0.6257: After adjusting for the number of predictors in the model, 62.57% of the variation in total dry weight can be explained by the model.
- AIC - 278.9127 and AICc - 280.4976: AIC (Akaike Information Criterion) and AICc (Corrected AIC) are metrics for comparing the relative quality of statistical models. Lower AIC and AICc values indicate a better model. When comparing multiple models, the model with the lowest AIC or AICc is preferred.
- BIC - 291.4787: BIC (Bayesian Information Criterion) is similar to AIC but imposes a greater penalty for models with more parameters. Like AIC, models with lower BIC are preferred.
- RMSE - 9.7469: RMSE (Root Mean Square Error) measures the magnitude of the error between the model's predictions and the actual values. A lower RMSE value indicates that the model has smaller prediction errors.
- MAE - 7.6866: MAE (Mean Absolute Error) measures the average absolute error between predictions and actual values. A lower MAE value indicates a better model.
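- MPE - -0.0595: MPE (Mean Percentage Error) measures the average signed percentage error between predictions and actual values, here about -5.95%. A value close to zero indicates little systematic bias. The meaning of the sign depends on how the error is defined: if it is computed as (actual - predicted)/actual, a negative MPE means the model tends to overestimate the response on average; if it is computed as (predicted - actual)/actual, a negative MPE means the model tends to underestimate it.
- MAPE - 0.1869: MAPE (Mean Absolute Percentage Error) measures the average absolute percentage error between predictions and actual values, here about 18.7%. A lower MAPE value indicates a better model.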
- sMAPE - 0.1669: sMAPE (Symmetric Mean Absolute Percentage Error) is a symmetric error metric that equally weighs overestimation and underestimation errors. A lower sMAPE value indicates a better model.
Therefore, based on the goodness-of-fit metrics, the model obtained from the analysis shows fairly good quality in predicting total dry weight based on fertilizer and compost dosages. However, it is always important to compare these metrics with other models (if available) to determine the best model.
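The goodness-of-fit values above can be reproduced from the ANOVA quantities reported in the Analysis of Variance section (error SS 5130.1436, total SS 14975.4898), assuming standard formulas: AIC = n ln(SSE/n) + 2p, AICc = AIC + 2p(p + 1)/(n - p - 1), BIC = n ln(SSE/n) + p ln(n), and RMSE = √(SSE/(n - p)). The sample size n = 60 and the number of coefficients p = 6 are inferred rather than printed: the error DF is 5130.1436 / 95.0027 ≈ 54, and the model has six coefficients. With these inputs the sketch matches the printed values to the reported precision. The percentage-error definitions in the hypothetical `error_metrics` helper (for example, MPE as the mean of (actual - predicted)/actual) are common conventions assumed here; SmartstatXL's exact definitions may differ.

```python
import math

def fit_statistics(sse, sst, n, p):
    """Goodness-of-fit statistics from the error (sse) and total (sst) sums of squares.
    n = number of observations, p = number of coefficients including the intercept."""
    r2 = 1 - sse / sst
    base = n * math.log(sse / n)          # -2 log-likelihood up to a constant
    aic = base + 2 * p
    return {
        "r": math.sqrt(r2),
        "R2": r2,
        "adj_R2": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "AIC": aic,
        "AICc": aic + 2 * p * (p + 1) / (n - p - 1),
        "BIC": base + p * math.log(n),
        "RMSE": math.sqrt(sse / (n - p)),   # square root of the error mean square
    }

def error_metrics(y, yhat):
    """MAE plus MPE, MAPE and sMAPE as proportions (multiply by 100 for percent).
    Assumed convention: percentage errors use (actual - predicted) / actual."""
    n = len(y)
    pairs = list(zip(y, yhat))
    return {
        "MAE": sum(abs(a - f) for a, f in pairs) / n,
        "MPE": sum((a - f) / a for a, f in pairs) / n,
        "MAPE": sum(abs((a - f) / a) for a, f in pairs) / n,
        "sMAPE": sum(abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in pairs) / n,
    }

# n = 60 and p = 6 are inferred from the ANOVA table, not printed in the output
stats = fit_statistics(sse=5130.1436, sst=14975.4898, n=60, p=6)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Note that RMSE here is computed with n - p in the denominator, i.e., the square root of the error mean square (√95.0027 ≈ 9.7469), which is what reproduces the reported value.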
Regression Coefficient Estimation
Below is the interpretation of the regression coefficient estimation results:
- Intercept (26.534):
- This coefficient indicates that if both the fertilizer and compost dosages are zero, the expected total dry weight is 26.534 g/plant.
- With a t-value of 6.408 and a p-value of 0.000 (less than 0.01), this intercept is significant at the 1% significance level.
- Fertilizer Dosage (0.979):
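- This is the linear coefficient. Because the model also includes Fertilizer Dosage^2 and the interaction term, it equals the slope of total dry weight with respect to fertilizer dosage (compost dosage held constant) only when both dosages are zero; at other dosages the slope also depends on the fertilizer and compost levels.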
- With a t-value of 5.533 and a p-value of 0.000 (less than 0.01), this coefficient is significant at the 1% significance level.
- Fertilizer Dosage^2 (-0.011):
- The quadratic effect of the fertilizer dosage indicates that the increase in total dry weight will slow down as the fertilizer dosage increases.
- With a t-value of -4.910 and a p-value of 0.000 (less than 0.01), this coefficient is significant at the 1% significance level.
- ... etc
- VIF (Variance Inflation Factor):
- VIF measures how much the variance of an estimated regression coefficient increases when predictors are correlated. As a general rule, a VIF greater than 10 indicates the presence of multicollinearity issues.
- In this case, several predictors have a VIF greater than 10, indicating potential multicollinearity. In a second-order model this is largely expected: a variable, its square, and the interaction term are strongly correlated by construction. This structural multicollinearity inflates the standard errors of the individual coefficients; it can usually be reduced by centering the predictors before forming the squared and interaction terms, which does not change the fitted surface.
Based on the regression coefficient estimates, fertilizer dosage, fertilizer dosage^2, compost dosage, and compost dosage^2 have a significant effect on total dry weight at the 1% significance level, whereas the interaction between fertilizer dosage and compost dosage does not. The indication of multicollinearity may warrant further investigation.
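The structural multicollinearity of a second-order model appears even on an ideal design. The Python sketch below computes VIFs as the diagonal of the inverse correlation matrix of the five model terms, first on raw and then on centered predictors. The 5 × 5 factorial levels are hypothetical, not the study's design; on this symmetric design, centering brings every VIF down to 1.

```python
import numpy as np

def model_terms(x1, x2):
    """The five non-intercept terms of the second-order model."""
    return np.column_stack([x1, x1**2, x2, x2**2, x1 * x2])

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical 5 x 5 factorial design (not the study's actual levels)
g1, g2 = np.meshgrid([0.0, 25.0, 50.0, 75.0, 100.0], [0.0, 5.0, 10.0, 15.0, 20.0])
fert, comp = g1.ravel(), g2.ravel()

vif_raw = vif(model_terms(fert, comp))
vif_centered = vif(model_terms(fert - fert.mean(), comp - comp.mean()))
print("raw:     ", np.round(vif_raw, 1))
print("centered:", np.round(vif_centered, 1))
```

On real, unbalanced data centering will not always reach 1, but it typically removes most of the correlation between a term and its square.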
Response Surface Plots and Optimization
Optimizing Fertilizer and Compost Dosage to Maximize Total Dry Weight:
The aim of Response Surface Methodology is to determine the dosages of fertilizer and compost that maximize the response, in this case the total dry weight of plants. Mathematically, the maximum or minimum of a function is found by differentiation: setting the partial derivatives of the fitted surface with respect to each predictor to zero gives a stationary point, which may be a maximum, a minimum, or a saddle point. The second derivatives (for a second-order model, the signs of the eigenvalues of the matrix of quadratic coefficients) determine which of these it is.
In practice, statistical software carries out these calculations; SmartstatXL performs them automatically.
From the SmartstatXL analysis, the maximum predicted total dry weight is 72.429 g/plant, reached at a fertilizer dosage of 47.4 and a compost dosage of 10.43, each expressed in the units used for that variable in the dataset.
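The stationary point can be computed directly from the fitted equation. Writing the model as Y = b₀ + x'b + x'Bx, where b holds the linear coefficients and B the quadratic ones (with half the interaction coefficient off the diagonal), setting the gradient b + 2Bx to zero gives x_s = -½B⁻¹b, and the eigenvalues of B tell whether it is a maximum. The NumPy sketch below is an illustration, not SmartstatXL's algorithm. It uses the rounded printed coefficients, so it gives about 47.45 and 10.43 for the dosages and about 72.44 g/plant for the maximum, close to but not exactly the reported 72.429.

```python
import numpy as np

# Rounded coefficients of the fitted equation: b0, b1, b11, b2, b22, b12
b0, b1, b11, b2, b22, b12 = 26.5342, 0.9789, -0.0107, 4.35, -0.2165, 0.0035

b = np.array([b1, b2])                  # linear coefficients (fertilizer, compost)
B = np.array([[b11, b12 / 2],           # quadratic part, so that Y = b0 + x'b + x'Bx
              [b12 / 2, b22]])

# Gradient b + 2Bx = 0  ->  stationary point x_s = -0.5 * B^{-1} b
x_s = -0.5 * np.linalg.solve(B, b)
y_s = b0 + 0.5 * x_s @ b                # fitted response at the stationary point

# Eigenvalues of B: all negative -> maximum, all positive -> minimum, mixed -> saddle
eigvals = np.linalg.eigvalsh(B)
if np.all(eigvals < 0):
    kind = "maximum"
elif np.all(eigvals > 0):
    kind = "minimum"
else:
    kind = "saddle point"

print(f"Fertilizer = {x_s[0]:.2f}, Compost = {x_s[1]:.2f}, "
      f"Y = {y_s:.2f} g/plant ({kind})")
```

If the stationary point falls outside the range of dosages tested, it should be treated with caution, since the fitted surface is only supported by data inside that range.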
It's important to note that these results are based on the Response Surface Regression model created, and there's always the possibility of variation when implemented in actual practice. Therefore, it is advisable to conduct field trials to verify these results before applying them on a large scale.
Analysis of Variance for Regression
Here is the interpretation of the Analysis of Variance results for the Total Dry Weight variable (g/plant):
- Regression: With 5 degrees of freedom (DF), the regression sum of squares (SS) is 9845.3462 and its mean square (MS) is 1969.0692. The F-value for the overall regression, MS Regression / MS Error = 1969.0692 / 95.0027 = 20.726, is significant at the 1% level (p-value 0.000 < 0.01).
- Fertilizer Dosage: The contribution of fertilizer dosage to the model is indicated by an F-value of 30.609, which is significant at the 1% level (since the p-value is 0.000 < 0.01). This shows that fertilizer dosage has a significant effect on total dry weight.
- ...etc
- Error: The sum of squares for error in the model is 5130.1436 with a mean square of 95.0027.
- Total: The total (corrected) sum of squares is 14975.4898, which equals the regression SS plus the error SS (9845.3462 + 5130.1436).
Based on the analysis of variance results, it can be concluded that fertilizer dosage, quadratic effects of fertilizer dosage, compost dosage, and quadratic effects of compost dosage have a significant effect on the total dry weight at the 1% significance level. Meanwhile, the interaction between fertilizer dosage and compost dosage is not significant in affecting the total dry weight.
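The relationships among the ANOVA entries can be checked with a few lines of arithmetic. The Python sketch below uses the values from the table; the error degrees of freedom (54) and the number of observations (60) are not printed in this excerpt but are implied by SS / MS for the error term. Computing the p-value requires the F distribution, for example `scipy.stats.f.sf(f_value, df_reg, df_err)` if SciPy is available.

```python
# Entries from the ANOVA table for Total Dry Weight
ss_reg, df_reg = 9845.3462, 5
ss_err, ms_err = 5130.1436, 95.0027

ms_reg = ss_reg / df_reg            # mean square for regression
df_err = round(ss_err / ms_err)     # error DF implied by SS / MS
f_value = ms_reg / ms_err           # overall F statistic
ss_total = ss_reg + ss_err          # total (corrected) sum of squares
n_obs = df_reg + df_err + 1         # implied number of observations

# The p-value needs the F distribution, e.g. scipy.stats.f.sf(f_value, df_reg, df_err)
print(f"MS reg = {ms_reg:.4f}, DF error = {df_err}, F = {f_value:.3f}, "
      f"SS total = {ss_total:.4f}, n = {n_obs}")
```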
Regression Assumption Testing
Breusch–Pagan–Godfrey (BPG) Test for Heteroskedasticity Detection
- The assumption of homoscedasticity states that the variability of the errors (residuals) should be constant across all levels of the predictors.
- With a χ² statistic of 3.416 and a p-value of 0.636 (greater than 0.05), the BPG test provides insufficient evidence to reject the null hypothesis of constant error variance (homoscedasticity). The homoscedasticity assumption is therefore not violated.
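A minimal NumPy sketch of the test follows, assuming the common n·R² (Obs*R-squared) form: the squared residuals are regressed on the model terms, and n·R² of that auxiliary regression is compared with a χ² distribution whose degrees of freedom equal the number of terms. SmartstatXL's exact variant is not documented here, so treat this as an illustration. The `chi2_sf` helper uses the closed-form χ² tail for integer degrees of freedom; with 5 degrees of freedom (the five model terms), `chi2_sf(3.416, 5)` gives about 0.636, matching the reported p-value.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """P(chi-square with integer df > x), using the closed forms for even and odd df."""
    x = max(float(x), 0.0)
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)

def breusch_pagan(X, resid):
    """n * R^2 from regressing squared residuals on the model terms X (no intercept
    column); returns (statistic, p-value) with df = number of columns of X."""
    X = np.asarray(X, dtype=float)
    g = np.asarray(resid, dtype=float) ** 2
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Z, g, rcond=None)
    ssr = np.sum((g - Z @ coef) ** 2)
    sst = np.sum((g - g.mean()) ** 2)
    lm = n * (1 - ssr / sst)
    return lm, chi2_sf(lm, k)

print(round(chi2_sf(3.416, 5), 3))  # p-value for the reported statistic with 5 DF
```

In practice `X` would hold the five model columns (X₁, X₁², X₂, X₂², X₁X₂) and `resid` the residuals of the fitted response surface.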
Normality Tests
The normality assumption states that the errors (residuals) from the model should be normally distributed.
- Shapiro-Wilk: With a statistic of 0.977 and a p-value of 0.311 (greater than 0.05), the test does not reject the hypothesis that the residuals are normally distributed.
- Anderson-Darling: With a statistic of 0.484 and a p-value of 0.228 (greater than 0.05), normality of the residuals is not rejected.
- D'Agostino-Pearson: With a statistic of 1.563 and a p-value of 0.458 (greater than 0.05), normality of the residuals is not rejected.
- Lilliefors and Kolmogorov-Smirnov: Both tests give p-values greater than 0.20, consistent with the other normality tests.
Based on the Regression Assumption Testing results, it can be concluded that the regression model meets the assumptions of homoscedasticity and normality. Therefore, the regression model can be considered valid and can be used for further analysis.
Residual Plots
In addition to the formal tests, the assumptions can be assessed visually with the included residual plots: the Normal Probability Plot (Normal P-Plot) and the histogram of residuals for normality, and the Residual vs. Predicted plot for homoscedasticity.
- Normal P-Plot for Residuals:
- The Normal Probability Plot plots the ordered residuals against the values expected if the residuals came from a normal distribution. Ideally, the points follow a straight diagonal line; systematic departures from the line indicate departures from normality.
- The points lie close to the diagonal line over most of the range, so the residuals are approximately normally distributed and the normality assumption is largely met. However, some points deviate from the line at both ends, indicating departures from normality in the tails of the distribution.
- Although there are some deviations from normality, depending on the context and purpose of the analysis, these deviations may not be significant. However, if the analysis is very sensitive to the normality assumption, you may need to consider transformation techniques or other methods to correct these deviations.
- Histogram for Residuals:
- The histogram should display a distribution that is approximately bell-shaped (normally distributed). Deviations from this shape (e.g., skewed or heavy-tailed distributions) could indicate a violation of the normality assumption.
- Residual vs. Predicted:
- To check for homoscedasticity, the points on this plot should scatter randomly around the horizontal line at 0 without any specific pattern. If you see a specific pattern, like a funnel shape or a curve, this could indicate heteroscedasticity or other violations of the regression assumptions.
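A normal probability plot can be constructed by pairing each sorted residual with the standard normal quantile expected at its rank. The Python sketch below uses Blom's plotting positions, (i - 0.375)/(n + 0.25), one common choice; SmartstatXL may use a different convention or plot cumulative probabilities instead, and the residual values are hypothetical. Residuals from a normal distribution give points close to a straight line.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """(expected standard normal quantile, ordered residual) pairs using Blom's
    plotting positions (i - 0.375) / (n + 0.25)."""
    ordered = sorted(residuals)
    n = len(ordered)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), r)
            for i, r in enumerate(ordered, start=1)]

# Hypothetical residuals
for q, r in normal_plot_points([1.2, -0.4, 0.3, -2.1, 0.8, -0.1, 2.5]):
    print(f"{q:7.3f}  {r:6.2f}")
```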
Considering that all formal tests show the residuals to be normally distributed (as all p-values are greater than 0.05), slight deviations in the Normal P-Plot are likely not a major issue.
In practice, regression analysis is often quite tolerant of minor violations of the normality assumption, especially if the sample size is large. Therefore, even though there are some points deviating from the diagonal line in the Normal P-Plot, if formal tests indicate normality and you do not observe any other significant violations of assumptions, the regression model can be considered sufficiently valid for analytical purposes.
Box-Cox Transformation and Residual Analysis
Interpretation of Box-Cox Transformation:
The Box-Cox transformation is used to bring a response whose model residuals are not normally distributed (or whose variance is not constant) closer to normality. The method estimates a transformation parameter \( \lambda \) (Lambda), which in this case is 1.493. SmartstatXL nevertheless reports "No Transformation: \( Y^1 \)", i.e., \( \lambda = 1 \), which leaves the response unchanged: the response data (Total Dry Weight) do not require transformation, and the residuals are already approximately normally distributed.
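The Box-Cox parameter is typically chosen by maximizing the profile log-likelihood -n/2 · ln(SSE(λ)/n) + (λ - 1)·Σ ln y over a grid of λ values, where SSE(λ) comes from regressing the transformed response on the model terms. The NumPy sketch below illustrates this on hypothetical data whose logarithm is nearly linear in x, so the estimate falls near λ = 0 (a log transform). It is not SmartstatXL's code, and the grid and data are assumptions. λ = 1 amounts to no transformation, since (y - 1)/1 only shifts the response.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of positive y: (y**lam - 1) / lam, or log(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1) / lam

def box_cox_profile(y, X, lambdas):
    """Profile log-likelihood (up to a constant) of each lambda for the regression
    of the transformed response on X; X must already include the intercept column."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    sum_log_y = np.sum(np.log(y))               # Jacobian term of the transformation
    loglik = []
    for lam in lambdas:
        z = box_cox(y, lam)
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        sse = np.sum((z - X @ coef) ** 2)
        loglik.append(-n / 2 * np.log(sse / n) + (lam - 1) * sum_log_y)
    return np.array(loglik)

# Hypothetical data whose logarithm is nearly linear in x (so lambda should be near 0)
x = np.arange(1.0, 21.0)
y = np.exp(1 + 0.3 * x + 0.05 * np.sin(3 * x))
X = np.column_stack([np.ones_like(x), x])

grid = np.round(np.arange(-2.0, 2.01, 0.05), 2)
lam_hat = grid[np.argmax(box_cox_profile(y, X, grid))]
print(f"estimated lambda: {lam_hat}")
```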
Interpretation of Residual Values and Outlier Examination:
Examination of residuals and outlier data is essential to ensure that the regression model meets its assumptions.
- Fertilizer Dose and Compost Dose: These are the doses of fertilizer and compost administered in the experiment.
- TOTAL DRY WEIGHT (g/plant): This is the actual outcome obtained from the experiment.
- Predicted: This is the value predicted by the regression model based on the doses of fertilizer and compost.
- Residual: This is the difference between the actual outcome and the value predicted by the model.
- Leverage: Measures how far an observation's predictor values lie from the center of all predictor values. High-leverage observations are unusual in the predictor space and can strongly influence the fitted model.
- Studentized Residual: The residual divided by its estimated standard error. Values far from 0 may indicate potential outliers.
- Studentized Deleted Residual: Similar to the studentized residual, but the error variance is estimated with that observation removed, so a large outlier does not inflate its own scale estimate. Values far from 0 can also indicate potential outliers.
- Cook's Distance: Measures the influence of an observation on the fitted model as a whole, i.e., how much the fitted values change when it is removed. High values indicate influential observations.
- DFITS: Measures how much the fitted value for an observation changes when that observation is removed, scaled by its standard error.
- Diagnostic: Combines various measures to provide an overall view of potential outliers.
From the residual analysis, it appears that there are some data points that have high residual, studentized residual, and studentized deleted residual values, which may indicate the presence of potential outliers. However, before making any decisions, it is essential to further examine these values and consider the experimental context.
Thus, based on the Box-Cox transformation results, the response data is already approximating a normal distribution. However, based on the residual and outlier examination, there are some data points that may need further review.
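The diagnostic columns listed above follow standard OLS formulas, sketched below in NumPy: leverage h from the diagonal of the hat matrix, studentized residual r = e/√(s²(1 - h)), studentized deleted residual t = r√((n - p - 1)/(n - p - r²)), Cook's distance r²h/(p(1 - h)), and DFITS t√(h/(1 - h)). The straight-line data with small deterministic noise and one planted outlier are hypothetical, and the "Diagnostic" column is not reproduced because its combination rules are not described in this document.

```python
import numpy as np

def influence_measures(X, y):
    """OLS influence diagnostics; X must include the intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)     # leverage: diagonal of the hat matrix
    e = y - X @ (xtx_inv @ X.T @ y)                 # ordinary residuals
    s2 = e @ e / (n - p)                            # error mean square
    r = e / np.sqrt(s2 * (1 - h))                   # studentized residual
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))   # studentized deleted residual
    cooks = r**2 * h / (p * (1 - h))                # Cook's distance
    dfits = t * np.sqrt(h / (1 - h))                # DFITS
    return {"leverage": h, "studentized": r, "deleted": t,
            "cooks": cooks, "dfits": dfits}

# Hypothetical straight-line data with small noise and one planted outlier at index 5
x = np.arange(10.0)
y = 2 + 3 * x + 0.5 * np.sin(x)
y[5] += 20
X = np.column_stack([np.ones_like(x), x])

diag = influence_measures(X, y)
for name, values in diag.items():
    print(name, np.round(values, 3))
```

The leverages always sum to p, the number of coefficients, which is a quick sanity check on the hat-matrix computation.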
Conclusion
- Regression Model:
- The generated response surface regression model indicates that the fertilizer dose, the quadratic effect of the fertilizer dose, the compost dose, and the quadratic effect of the compost dose have a significant impact on the total dry weight at the 1% significance level. Meanwhile, the interaction between the fertilizer and compost doses does not have a significant effect on the total dry weight.
- Optimization:
- The maximum total dry weight is 72.429 g/plant, which can be achieved with a fertilizer dose of 47.4 and a compost dose of 10.43.
- Assumption Check:
- The regression model meets the assumptions of homoscedasticity and normality, meaning the regression model can be considered valid.
- Box-Cox Transformation:
- The response data (Total Dry Weight) does not require transformation and is already approximately normally distributed.
- Outlier Examination:
- Some data points show potential as outliers, but before making any further decisions, it is essential to further examine these values and consider the experimental context.
Scientific Paper Writing of Results and Discussion
In this research, a response surface regression analysis was conducted to determine the effect of fertilizer and compost doses on the total dry weight of plants. The analysis yielded a regression model that shows the fertilizer dose, the quadratic effect of the fertilizer dose, the compost dose, and the quadratic effect of the compost dose significantly affect the total dry weight at the 1% significance level. However, the interaction between the fertilizer and compost doses showed no significant effect.
Through optimization techniques, it was found that the maximum total dry weight is 72.429 g/plant, achievable with a fertilizer dose of 47.4 and a compost dose of 10.43.
The Regression Assumption Check indicates that the model satisfies the homoscedasticity and normality assumptions. Furthermore, through the Box-Cox transformation, it was found that the response data is already approximately normally distributed without requiring transformation. However, in the outlier examination, some data points show potential as outliers.
Therefore, the results of this analysis provide valuable information for agricultural practitioners in determining the optimal doses of fertilizer and compost to maximize the total dry weight of plants.