Forecasting of Maize Production in Bangladesh using ARIMA and Mixed Model Approaches

Maize (Zea mays L.) is one of the most vital cereal crops in Bangladesh and, after rice and wheat, the leading cereal crop of the world. Maize is the highest-yielding grain crop and has multiple uses. It serves as a primary raw material for the production of feed, food, fodder, and fuel. Thus, maize can play an important role as an alternative and multipurpose food crop in Bangladesh. This study used the yearly secondary data on maize area and production in Bangladesh published by the Bangladesh Bureau of Statistics (BBS) for the period 1970-71 to 2019-20. The best-selected Box-Jenkins ARIMA model for forecasting maize production across Bangladesh is ARIMA (0, 2, 1). When the area under maize is considered, the mixed model, i.e., ARIMA (1, 0, 0), is a better model than the univariate ARIMA (0, 2, 1) model. The 95% confidence interval of the forecast values is narrower for the mixed model than for the ARIMA model. Therefore, the forecasting performance of the mixed model is better than that of a univariate econometric model such as ARIMA; the reason could be the inclusion of an explanatory variable, area. Thus, the mixed model approach can be applied for policy decisions based on the forecasting of maize production in Bangladesh.


Introduction
Maize (Zea mays L.) is one of the most prominent cereal crops in Bangladesh. Because of its assorted uses (e.g., food, feed, and industrially processed food items), maize has been highlighted as an intriguing and emerging crop for the region. The demand for maize has been rising steadily year after year. The area, production, and yield of wheat decreased each year from 2017-18 to 2019-20, while those of maize rose during the same period. Maize has now become a significant cereal crop in Bangladesh, taking the place of wheat after rice (BBS, 2020).
Rice and wheat are essential foods in Bangladesh, but their production is not adequate to meet the rising requirements of the country's growing population. It is well recognised that, to meet the food requirements of this rising population, food habits have to be diversified somewhat to diminish dependence on rice and wheat. To achieve this, producing the highest amount of nutrients per unit of area and per unit of time has to be emphasised (Hossain et al., 2016). Because of its nutritional value, maize could be a good source of nutrients for the undernourished and malnourished population of Bangladesh. The demand for maize is rising day by day in the world as well as in Bangladesh due to its diversified uses. If the rigid food habits of Bangladeshis can be diversified from rice to maize, it would probably be feasible to reduce food shortages to a great extent. As maize is a high-yielding and low-cost crop compared to rice and wheat, a clear scheme is needed to make the crop gradually popular and sustainable (Moniruzzaman et al., 2009). Uddin et al. (2015) built a union-level digital database and maps of maize-growing areas for the 2011-12 season at Pirgonj in the Thakurgaon district of Bangladesh. They also showed that maize cultivation is expanding rapidly across Bangladesh, especially in the northern part of the country. Hasan et al. (2008) examined the change and instability in the area, production, and yield of two major cereal crops, maize and wheat, in Bangladesh.
Ensuring lucrative prices for producers requires not only improved crop technology to increase production but also long-term and short-term policies for yearly exports and imports (Kumar et al., 2020). This generally demands a credible and valid pre-harvest prediction of crop production. Among the several methods available for predicting crop productivity, ARIMA and mixed-model approaches have been used extensively. Forecasting supports decision-making and helps in planning for the future effectively and efficiently. To provide a comparison for the predictive performance of the regression model, the Auto-Regressive Integrated Moving Average (ARIMA) model is also used in this study as a benchmark model. Sarika et al. (2011) applied the ARIMA model for modelling and predicting India's pigeon pea production. Suresh et al. (2011) applied the ARIMA model for estimating sugarcane area, production, and productivity in the Tamil Nadu state of India. Mishra and Singh (2013) predicted prices of groundnut oil in Delhi using the ARIMA methodology and Artificial Neural Networks. Kumari et al. (2014) applied the ARIMA model for the estimation of rice yield in India. Naveena et al. (2014) predicted coconut production in India using the ARIMA methodology. The ARIMA method has been applied extensively by a number of researchers to predict demand in terms of internal consumption, imports, and exports so that appropriate measures can be adopted (Muhammed et al., 1992; Shabur and Haque, 1993; Sohail et al., 1994). Najeeb et al. (2005) used the Box-Jenkins model to estimate wheat area and production in Pakistan. They showed that ARIMA (1, 1, 1) and ARIMA (2, 1, 2) were the appropriate models for wheat area and production, respectively. Rachana et al. (2010) applied ARIMA models to predict pigeon pea production in India. Rathod et al. (2017) used the ARIMA model to forecast the area and production of mango in Karnataka, India. Rathod et al. (2018) employed the ARIMA model to forecast the yield of mango and banana in Karnataka, India. Biswas et al. (2013) found that ARIMA (2, 1, 3) was the best-fitted model for the gross cultivated area, whereas ARIMA (2, 1, 1) was the best-fitted model for the production series; the models achieved good accuracy in projecting the area and production of rice in West Bengal, India. Kumar et al. (2020) used the ARIMA model for forecasting groundnut productivity in the Junagadh district of Gujarat in India.
Several previous studies have used the ARIMA model to analyze and predict the production of different agricultural crops in Bangladesh. Hossain et al. (2016) carried out a study to predict potato production in Bangladesh and found that the ARIMA (0, 2, 1) model was appropriate for predicting future potato production. Hossain et al. (2015) suggested that the ARIMA approach could be applied to estimate the production of tea in Bangladesh. That study used yearly secondary data on tea production in Bangladesh over the period 1972 to 2013, and its best-fitted model for forecasting tea production was ARIMA (0, 2, 1). Rahman (2010) sought the best-fitted ARIMA models for making efficient predictions of Boro rice production in Bangladesh from 2008-09 to 2012-13 and found that ARIMA (0, 1, 3), ARIMA (0, 1, 0), and ARIMA (0, 1, 2) were the best for modern, local, and overall Boro rice production, respectively. Another study revealed that, in Bangladesh, the forecasting performance of the mixed model (considering area as an independent variable) for four crops, namely Aus, Aman, Boro, and potato, is better than that of the univariate econometric model ARIMA. In that study, the forecast value of total rice production for the financial year 2015-2016 (the sum of Aus, Aman, and Boro rice) was almost equivalent to national production (AMIS, 2017).
In order to increase agricultural production further, the only option is to grow high-productivity crops like maize. Recently, the government has been trying to diversify food habits and encourage the consumption of cereal crops like maize to reduce pressure on rice. In this regard, maize can play an important role as an alternative and multipurpose food crop in Bangladesh. It is therefore essential to estimate the production of food grains, which motivates this study. The main aim of this research is to identify the Auto-Regressive Integrated Moving Average (ARIMA) model that could be used to predict the production of maize in Bangladesh and to compare it with the mixed model (dynamic regression model). All the statistical analyses were done in the open-source programming language R.

Methodology
This study used the yearly maize area (hectares) and production (metric tons) data for Bangladesh (total of the Rabi and Kharif growing seasons) for the period 1970-71 to 2019-20, as published by the Bangladesh Bureau of Statistics (BBS).
The autoregressive integrated moving average (ARIMA) and mixed model approaches were applied for short-term and long-term forecasting of crop production at the national level. Ultimately, an integrated approach to crop prediction is essential for the final forecast. Among time series models, ARIMA generally performs better than deterministic growth models for short-term prediction (Rahman et al., 2013). The Box-Jenkins method was used to fit the best ARIMA model for time series prediction.
The prediction methodology consists of three steps, namely identification, estimation of parameters, and diagnostic checking. The data were plotted to detect any unusual observations. A transformation of the data may be necessary to stabilise the variance of the time series. If the time series is not stationary, stationarity is to be ensured by means of seasonal and non-seasonal differencing. Once stationarity has been attained, the autocorrelation function and correlogram are examined to see if any pattern remains. First, the order of differencing is determined; the differenced univariate time series can then be analysed by both time-domain and frequency-domain approaches (Cressie, 1988). The second step is to estimate the parameters of the model; the method of maximum likelihood was applied for this purpose. The third step is to check whether the preferred model fits the data reasonably well. In diagnostic checking, the goodness of fit of the ARIMA model (Sarda and Prajneshu, 2002) was checked by Akaike's Information Criterion (AIC) and the Schwarz Bayesian Criterion (SBC). Considerable skill is required to choose the appropriate ARIMA (p, d, q) model so that the residuals estimated from it are white noise. Thus, the autocorrelations of the residuals are to be examined for the diagnostic checking of the model. These may also be evaluated by the chi-square test (Ljung and Box, 1978) under the null hypothesis that the autocorrelation coefficients are equal to zero.
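The differencing and residual-checking steps described above can be illustrated with a minimal sketch. The study itself was carried out in R; the Python below is only an illustration under stated assumptions. The function names (`difference`, `autocorrelation`, `ljung_box_q`) and the toy series are hypothetical and are not taken from the paper or from the BBS data.

```python
def difference(series, d=1):
    """Apply non-seasonal differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def autocorrelation(series, lag):
    """Sample autocorrelation r_k of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical toy series with a quadratic trend (not the maize data):
toy = [t * t for t in range(10)]
second_diff = difference(toy, d=2)  # the trend is removed; every value equals 2
print(second_diff)

# Hypothetical residuals with a strong alternating pattern (clearly not white noise):
toy_residuals = [1, -1, 1, -1, 1, -1, 1, -1]
print(ljung_box_q(toy_residuals, max_lag=1))  # Q = 8.75
```

Under the null hypothesis of no residual autocorrelation, Q is compared with a chi-square distribution; for residuals from a fitted ARIMA (p, d, q) model the degrees of freedom are usually taken as the number of lags minus p + q. The toy value 8.75 exceeds the 5% critical value of 3.84 for one degree of freedom, so those residuals would not be regarded as white noise.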
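The ARIMA (p, d, q) model for the study, written in the standard Box-Jenkins form, is as follows:

$$\Delta^d y_t = c + \sum_{i=1}^{p} \phi_i \, \Delta^d y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}$$

where $\Delta$ refers to the differencing of the time series $y_t$; $p$, $d$, and $q$ are the orders of the autoregressive, differencing, and moving average parts, respectively; $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients; $c$ is a constant; and $\varepsilon_t$ is a white-noise error term.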
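As a worked illustration of this notation, the ARIMA (0, 2, 1) specification selected later in the study can be expanded as follows. This is a sketch under assumptions: no constant term, $\theta_1$ entering with a positive sign, and $y_t$ denoting maize production in year $t$ with white-noise error $\varepsilon_t$; the paper does not state which sign convention it uses.

```latex
\begin{align*}
\Delta^2 y_t &= y_t - 2y_{t-1} + y_{t-2} = \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
y_t &= 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
\hat{y}_{T+1} &= 2y_T - y_{T-1} + \theta_1 \hat{\varepsilon}_T \\
\hat{y}_{T+h} &= 2\hat{y}_{T+h-1} - \hat{y}_{T+h-2}, \qquad h \ge 2, \quad \hat{y}_T = y_T
\end{align*}
```

Under these assumptions the moving average term affects only the one-step-ahead forecast; from the second step onward the point forecasts lie on a straight line, while the uncertainty around them widens with the horizon.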
The mixed model approach is a dynamic regression model (Pankratz, 1991), which can be used to discern the role of independent variables in determining the variable of interest (response variable). A dynamic regression model (mixed model) is a regression model that allows lagged values of the independent variable(s) to be included. This model is applied to predict what will happen to the forecast variable if the independent variable changes (AMIS, 2017). The first step in determining the appropriate dynamic regression model is to fit a multiple regression model of the form:

$$Y_t = \alpha_0 + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + N_t$$

where $Y_t$ refers to the response variable; $X_{t-i}$ is the independent variable at time lag $i = 0, 1, 2, \ldots, k$; $\alpha_0$ is the constant; $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients; and $N_t$ is an ARIMA process. The useful information lies in the roles of the past lags, i.e., $X_{t-1}, X_{t-2}, X_{t-3}, \ldots, X_{t-k}$, in explaining the movement in $Y_t$. The parameters of mixed models are estimated through the maximum likelihood method (Akpan et al., 2016).
In the second step, it is essential to apply differencing to the variables if the errors from the regression appear non-stationary. The model is then fitted again, applying a low-order ARIMA model for the errors. If the errors now exhibit stationarity, the third step is to determine the number of lagged independent variables that influence the forecast variable. The fourth step is to compute the errors from the regression model and identify an appropriate ARMA model for the error series. The fifth step is to refit the entire model, applying the new ARMA model for the errors and the transfer function model for the independent variables. Finally, the sixth step is to test the adequacy of the fitted model by analysing the residuals (Makridakis et al., 1998). As the study, after systematic investigation, considered area in the mixed model approach, the functional form of the final mixed model can be written as follows:

$$Y_t = \alpha_0 + \sum_{i} \beta_i X_{t-i} + \sum_{p} \phi_p Y_{t-p} + \sum_{q} \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where $Y_t$ is the response variable (production) and $X_{t-i}$ is the independent variable (area) at lag $i$; $\beta_i$ is the i-th coefficient of the lagged independent variables, whereas $\phi_p$ is the p-th coefficient of the lagged response variable; $\theta_q$ is the q-th coefficient of the lagged error $\varepsilon_{t-q}$; and $\alpha_0$ is the constant.
The best model is selected on the basis of the largest value of $R^2$ and the lowest values of root mean squared error (RMSE), mean absolute percent error (MAPE), and Bayesian information criterion (BIC), provided that the Ljung-Box test indicates that the residuals are white noise.
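The selection criteria named above can be computed from the actual and fitted values as in the following sketch. The function names and numbers are hypothetical. The BIC shown is one common Gaussian-likelihood form, n ln(SSE / n) + k ln(n); statistical packages differ in additive constants and scaling, so it should not be read as the exact definition used in the study.

```python
import math


def rmse(actual, fitted):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def mape(actual, fitted):
    """Mean absolute percent error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)


def r_squared(actual, fitted):
    """Coefficient of determination, 1 - SSE / SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def gaussian_bic(actual, fitted, n_params):
    """Assumed BIC form: n * ln(SSE / n) + k * ln(n)."""
    n = len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return n * math.log(sse / n) + n_params * math.log(n)


# Hypothetical actual and fitted values (not taken from Table 2):
actual = [100.0, 200.0, 300.0, 400.0]
fitted = [110.0, 190.0, 310.0, 390.0]

print(rmse(actual, fitted))       # 10.0
print(r_squared(actual, fitted))  # 0.992
print(round(mape(actual, fitted), 4))
print(round(gaussian_bic(actual, fitted, n_params=1), 4))
```

A model is preferred when it has the larger R-squared and the smaller RMSE, MAPE, and BIC, which is the comparison the text applies to the ARIMA and mixed models.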

Results and Discussion
The time series plot of maize production shows that the production of maize (including Rabi and Kharif) fluctuates over time (Figure 1). This fluctuation indicates that the data series may not be stationary. In particular, maize production increased repeatedly in the last 11 financial years, from 2008-09 to 2019-20. The rising and falling trends of maize production suggest that neither the mean nor the variance remains constant over time, so the maize production series is non-stationary. The stationarity of maize production (in M. tons) from the financial year 1970-71 to 2019-20 was checked with the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests. The ADF test (test statistic −6.3608, p < 0.01) and the PP test (test statistic −63.2099, p < 0.01) indicate, at the 5% level of significance, that the series is stationary, with no unit root, after the second difference (Table 1). Because the production series shows an increasing trend and unstable variance (Figure 1), we examined whether its second difference has stable variance, i.e., whether the differenced series is stationary (Figure 2.a). A second difference is enough to stabilize the variance and make the data stationary. The alternating positive and negative ACF (Figure 2.b) and the exponentially decaying PACF (Figure 2.c) indicate an autoregressive moving average process. The PACF, with significant spikes at lags 1 and 2, together with the ACF, with a significant spike at lag 1, suggests a model for maize production in Bangladesh with no autoregressive term and a first-order moving average term. The alternative models are compared in Table 2 for predicting maize production on the basis of separate model selection criteria. From Table 2, we observe that ARIMA (0, 2, 1) is the best-selected model for predicting maize production among the alternative models tested under the Box-Jenkins methodology.
The ARIMA model depends on the crop production time series (response variable) but does not include crop area data. The mixed-model approach, on the other hand, is a dynamic regression model that brings out the role of independent variables in determining the response variable. The mixed model therefore depends on the crop production time series (response variable) together with the crop area time series (independent variable). On every model selection criterion in Table 2 except MAPE, the mixed model is a better-fitted model for predicting maize production than ARIMA (0, 2, 1). The mixed model for maize production combines the area under production as the explanatory variable with an ARIMA (1, 0, 0) component.
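The identification argument used above, in which a single significant ACF spike at lag 1 points to a first-order moving average term, rests on a standard property of the MA(1) process. As a hedged reminder, for a stationary series $w_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$ (here $w_t$ stands for the twice-differenced production series, with a positive-sign convention for $\theta_1$ assumed and $\varepsilon_t$ white noise with variance $\sigma^2$):

```latex
\begin{align*}
\gamma_0 &= \operatorname{Var}(w_t) = (1 + \theta_1^2)\,\sigma^2, \qquad
\gamma_1 = \operatorname{Cov}(w_t, w_{t-1}) = \theta_1 \sigma^2, \qquad
\gamma_k = 0 \quad (k \ge 2) \\
\rho_1 &= \frac{\gamma_1}{\gamma_0} = \frac{\theta_1}{1 + \theta_1^2}, \qquad
\rho_k = 0 \quad (k \ge 2), \qquad |\rho_1| \le \tfrac{1}{2}
\end{align*}
```

The theoretical ACF therefore cuts off after lag 1, whereas the PACF of an invertible MA(1) process tails off gradually, which is the pattern the correlograms are read against.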
Various graphical tests of the residuals of the fitted ARIMA (0, 2, 1) model are presented in Figure 3.a; they show no significant pattern and hence no autocorrelation among the residuals. A histogram with a normal curve is used to check the normality assumption for the residuals of the fitted model. The histograms with normal curves for the residuals of the fitted ARIMA (0, 2, 1) and mixed models are given in Figure 3 (a, b). They suggest that the residuals of both fitted models are approximately normally distributed. The mixed model can also be used to identify the importance of the time lags of the independent variable in determining the movement of the response variable. On the basis of normality, our fitted mixed model, i.e., ARIMA (1, 0, 0), fits better than the ARIMA (0, 2, 1) model and is adequate for predicting maize production in Bangladesh.
Different varieties of maize are available in Bangladesh, and their planting times also differ. Therefore, forecasts are necessary for different phases. It is important to predict crop production at successive stages such as the planting, growing, and harvesting periods. ARIMA may be used for forecasting crop production at the planting stage (or earlier). Similarly, the mixed-model approach may be used for forecasting crop production at the growing stage, when information on the area under maize cultivation is available. On the basis of the most useful forecasting criteria, the forecast values of the ARIMA and mixed-model approaches, together with their 95% confidence intervals, for five financial years of maize production are shown in Table 3. According to the ARIMA model, maize production will be 4,327,360 M. tons in 2020-21 and, if the present situation continues, 5,575,575 M. tons in 2024-25. The mixed-model approach, which considers land area and the lag of land area as the independent variables, suggests that maize production will be 4,202,586 M. tons in 2020-21 and will reach 5,103,710 M. tons in 2024-25. The confidence interval of the predicted values is narrower for the mixed-model approach than for the ARIMA model.
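The growth implied by the point forecasts quoted above can be worked out with simple arithmetic. The sketch below uses only the four forecast values stated in the text; the helper names are hypothetical, and the calculation says nothing about the width of the confidence intervals, which are reported in Table 3.

```python
# Point forecasts quoted in the text, in metric tons: (2020-21, 2024-25).
forecasts = {
    "ARIMA (0, 2, 1)": (4_327_360, 5_575_575),
    "mixed model": (4_202_586, 5_103_710),
}
YEARS_BETWEEN = 4  # from 2020-21 to 2024-25


def average_annual_increase(start, end, years):
    """Average absolute increase per year between two forecasts."""
    return (end - start) / years


def compound_growth_rate(start, end, years):
    """Average compound growth rate per year, as a fraction."""
    return (end / start) ** (1 / years) - 1


for name, (start, end) in forecasts.items():
    increase = average_annual_increase(start, end, YEARS_BETWEEN)
    rate = compound_growth_rate(start, end, YEARS_BETWEEN)
    print(f"{name}: {increase:,.2f} M. tons per year, {100 * rate:.2f}% per year")
```

The ARIMA forecasts imply an average increase of 312,053.75 M. tons per year and the mixed-model forecasts 225,281 M. tons per year, so the mixed model projects somewhat slower growth over the same horizon.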
The graphical comparison of the actual and forecast production data is shown in Figure 4. The original maize production series is flat at first, then shows a slightly upward tendency, and after 2000 shows a rapidly upward tendency. The ARIMA forecast series deviates only slightly from the original series and follows the same pattern (Figure 4). Similarly, the original and forecast series for the mixed-model approach coincide at first, and after some time the forecast series shows an upward tendency relative to the original data. The forecast interval for the mixed-model approach is also narrower than that for the ARIMA model. Finally, it may be concluded that the mixed-model approach provides a more appropriate interval estimate of maize production than the ARIMA model (Figure 4).

Conclusion
Maize, being an important cereal crop, has gained much interest, and its production and area under cultivation are increasing day by day. Furthermore, maize is thought to contribute significantly to combating food insecurity in Bangladesh. Given the price volatility in local and international markets, systematic planning for maize cultivation is necessary in the country. To make maize cultivation sustainable and efficient, correct and timely information on production is very important. Such information is needed for storage, import, and export decisions and for any incentive plans for farmers. This study used time series data to predict maize production in Bangladesh with the help of the Box-Jenkins methodology. The best-selected Box-Jenkins ARIMA model for forecasting maize production across Bangladesh is ARIMA (0, 2, 1). When the area under maize is considered in the analysis, the mixed model is a better model than the univariate ARIMA model. For both models, forecasts were made for the period 2020-21 to 2024-25, and the mixed model shows narrower confidence intervals. Depending on the availability of information on the area under cultivation, both models will be useful for the prediction of maize production: ARIMA before (or during) planting, and the mixed model before harvesting. As maize is being considered for dual use, e.g., food and feed, proper planning and accurate prediction will guide the country to achieve the most in terms of food security.

Figure 3. Several plots of residuals and histogram with normal curve: (a) ARIMA approach; (b) mixed-model approach

Figure 4. Comparison of original and forecast values of maize production: (a, b) ARIMA model approach; (c, d) mixed-model approach

Table 1. Checking the stationarity of maize production (M. tons)

Table 2. Model selection criteria for checking the best-fitted model

Table 3. Forecast value of the production of maize (in M. tons) from the year 2020-21 to 2024-25