Forecasting winter wheat yields using MODIS NDVI data for the Central Free State region

AFFILIATIONS: 1Agricultural Research Council – Institute for Soil, Climate and Water, Pretoria, South Africa 2Department of Geography, Geoinformatics and Meteorology, University of Pretoria, Pretoria, South Africa 3School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg, South Africa 4South African Weather Service, Pretoria, South Africa 5Agricultural Research Council – Small Grain Institute Production Systems, Bethlehem, South Africa


Introduction
Wheat (Triticum aestivum L.) is an important crop in many parts of the world including South Africa, where it is the second largest component of the staple diet after maize. 1,2Consequently, it is crucial to predict wheat yields as global wheat production is expected to decrease under conventional management as a result of climate variability. 3,4dditionally, a challenge exists to feed a growing human population while avoiding environmental problems such as deforestation and land degradation. 5The central Free State Province of South Africa is a land-locked, dryland wheat production region, which exhibits variable agricultural production as a consequence of droughts and a reduced capacity to operate in world markets owing to high transport costs and foreign exchange constraints. 6,7n order to ensure food security, there is a need for generating timely and accurate information on crop yields. 8e report here on the development of a reliable estimate of wheat yields using the Moderate Resolution Imaging Spectroradiometer -Normalized Difference Vegetation Index (MODIS-NDVI).Accurate forecasting of the yield potential of dryland wheat in the Central Free State region will aid agricultural decision-makers in balancing the trade of agricultural commodities and reducing short-term price instabilities. 9mmonly, yield forecasting models are more reliable if applied during specific critical plant growth stages.1][12] During this time, water deficiencies lead to yield losses by reducing the spike and spikelet numbers, as well as the fertility of the remaining spikelets. 13ater shortages during this stage also accelerate leaf senescence and reduce the rate of grain filling, thereby reducing the mean kernel weight. 14High temperatures reduce the number of grains per ear, kernel weight and harvest index, leading to reduced grain yields. 15getation indices derived from remote sensing technology are often used for crop monitoring and crop yield estimates. 2 The technique is based on the assumption that spectral data are related to canopy reflectance parameters which, in turn, are related to the final yield. 16,17The NDVI used in this study can be used as an indicator of the photosynthetic potential of a vegetation canopy. 18The NDVI makes use of the near infrared band and visible red band in the electromagnetic spectrum. 19The limitation of this index is that it gets saturated in areas with dense biomass, but hyperspectral data can overcome this limitation. 20,21Hyperspectral imaging is, however, costly as it requires a dedicated campaign, has a limited extent and a complex data structure compared to MODIS data, which is freely available.
][24] For example, Yang et al. 22 evaluated the accuracy of QuickBird satellite imagery and airborne imagery for mapping grain sorghum yield patterns.The results illustrated that QuickBird and airborne imagery had similar correlations with grain yield at 2.8-m and 8.4-m resolutions.Nuarsa et al. 23 estimated rice yields using an exponential model derived from Landsat Enhanced Thematic Mapper plus (ETM+) NDVI and field-observed rice yield in the Tabanan Regency.The study observed a coefficient of determination (R 2 ) and standard error of 0.852 and 0.077 tons/ha, respectively.Mutanga et al. 24 determined the optimal time for predicting sugarcane yield to be 2 months before harvest using NDVI derived from SPOT images.However, few studies have been done on using remote sensing data for yield estimation in South Africa. 25,26For example, Unganai and Kogan 25 demonstrated that corn yields can be estimated using Advanced Very High Resolution Radiometer (AVHRR) data with a spectral resolution of 0.58-12.5 µm.In the study it was found that Vegetation Condition Index and Temperature Condition Index derived from AVHRR data are highly correlated with corn yields.Frost et al. 26 demonstrated that Terra MODIS (0.6-1.1 µm) satellite sensor data products can be applied for maize yield estimation in South Africa.In that study, the window method was used and the resulting window periods showed that average NDVI and average Enhanced Vegetation Index data can be used for maize yield estimations.
Further studies are needed in South Africa on using remote sensing for yield predictions.This topic is important because wheat production is decreasing as a result of weather variability within the summer rainfall region, the deregulation of the wheat industry, and farmers converting to sustainable crops (e.g.soybean and canola). 1 Furthermore, timeous generation of yield projections will support timeous decisions concerning either importation or exportation of wheat.Therefore, the overall objective of this study was to develop a yield model for dryland winter wheat for the Central Free State region using MODIS data.This objective was undertaken by investigating whether wheat yields in that region are correlated with the MODIS-NDVI during the anthesis stage [10][11][12] , and validating the performance and adequacy of the calibrated model.

Study area
The Free State Province hosts four distinct dryland wheat production regions: Central Free State, North Western Free State, South Western Free State and Eastern Free State. 27,28These regions receive summer rainfall and experience frequent droughts.Therefore, farmers adopt farm management practices which make efficient use of rain for crop production.The underlying geology of the Free State is rocks from the Beaufort and Ecca Groups of the Karoo Supergroup, which make up the parent material for the soils. 27The study sites of Arlington, Tweespruit and Excelsior are in the Central Free State wheat production region (Figure 1).These sites are part of the National Wheat Cultivar Evaluation Programme conducted by the Small Grain Institute of the Agricultural Research Council (ARC-SGI).The programme delivers information about the performance of wheat varieties from the major breeding companies of South Africa.The sites were selected systematically in a manner such that they are representative of all the production conditions of this geographical region. 29Dryland wheat planting normally takes place from the first week of July (South African winter). 28

MODIS-NDVI
The 16-day composite MODIS-NDVI (MOD13Q1) product images with a 250-m resolution used in this study are freely available from the US National Aeronautical Space Agency (NASA) Earth Observing System (EOS) website.The data obtained were for a 10-year period from 2000 to 2013 (excluding years 2001, 2002, 2008 and 2011) based on the available wheat yield data collected at the study sites.The MOD13Q1 product is computed from the surface reflectance of each band (red and near-infrared) as it would have been measured at ground level if there was no atmospheric scattering or absorption. 30During data processing, corrections are made for the effects of atmospheric gases, aerosols and thin cirrus clouds. 31e downloaded MODIS-NDVI images were reprojected from Sinusoidal Projection to Geographic Projection using the MODIS Reprojection Tool. 32Additionally, rescaling of the raster images had to be done to correct the range of NDVI values to range from -1 to 1.The raster images were cropped with the wheat boundaries obtained from the ARC in collaboration with Geo Terra Image and Spatial Business Intelligence.The values for NDVI were extracted for each of the wheat boundary pixels at the three localities.In order to adequately test the cultivars on the planting date spectrum, which is as long as 3 months, two independent randomised trials (early and late planting) were planted at all sites within the geographical regions.

Savitzky-Golay filter applied to NDVI time series
The Savitzky-Golay 33 (S-G) filter algorithm was used to smooth out the MODIS-NDVI data.The S-G filter is based on local least-squares polynomial approximation.The advantages of the S-G filter are that it preserves features of the data such as relative maxima, minima and widths. 34The S-G smoothing algorithm is given by Equation 1: where Y * j is the filtered value, C i is the coefficient for the i-th NDVI value of the filter, Y j+i represents the original NDVI value, and N is the number of convoluting integers equal to smoothing window size (2m+1). 33he larger the value of m, the smoother the results at the expense of flattening sharp peaks. 34

Model development
A linear regression model was developed between the average yield of different late planted wheat cultivars and the average NDVI for the three study sites.The average yield was considered to be an independent variable and the average NDVI was considered to be a dependent variable according to: 2where P(Y|x) is the predicted yield as function of NDVI, x is the NDVI, β 0 is the coefficient and β 1 is the constant for winter wheat yield.Different models (logarithmic, exponential and power) were compared for the purpose of evaluating the models so that the best fitting model was selected. 35Other studies 2,12,36 have found the linear model to be ideal for winter wheat yield and NDVI in various regions.

Model validation
Statistical tests were performed to validate the performance of the model.The goodness of fit of the model and the percentage of variance explained by the model were assessed using the coefficient of determination.The significance of the model was tested by means of p-values.The root mean squared error (RMSE) was also included in the analysis.Diagnostic plots were constructed to compare the observed yield and the predicted yield.The residuals were plotted against the wheat yield, which was necessary to check for linearity of the data and the presence or absence of inhomogeneity of variance. 37Additionally, a quantile-quantile (Q-Q) residual plot was used to assess how close the theoretical distribution was to the model distribution.A normal distribution is indicated by a strong linear pattern for the sample points, and outliers can also be detected by visual inspection of the plot. 38hese validation methods differ from the widely used methods, which mostly focus on the RMSE and comparing the correlation between observed yield and predicted yield.The validation techniques also aided in understanding the underlying trends in the data.

Model testing
The NDVI data for the years 2001, 2002, 2008, 2011 and 2014 were used for model testing.These years were not used in model calibration and model validation.The NDVI of these years was used to predict the expected wheat yield.The predicted yield was then subtracted from the observed yield and the percentage error was calculated for the observed yield.The standard error was used as a measure of the accuracy of the predicted yield for which values close to zero indicate high accuracy.The year 2015 could not be used in the analysis because of severe drought in the non-irrigated wheat regions.

Relationships between wheat yield and NDVI data
The best fitting model between wheat yield and NDVI was developed by using the Zadoks scale 39 to identify the critical growth stages of wheat.
1][12] This stage was also highly correlated with the final wheat yield in the Central Free State, as with other regions.The linear relationship between the average yield and average NDVI is represented by Equation 3: In this study, the seasonal maximum NDVI was used to correlate with average wheat yields.This time occurs approximately 30 days prior to harvest during the first week of November (day of year 305 in regular years).The range of NDVI values for the model is 0.32-0.49(Figure 2).These values fall within the threshold indicated by Ren et al. 2 for winter wheat of 0.2-0.8.The model was calibrated using NDVI but other parameters such as growing degree days or heat units, soil conditions and weather conditions can also be considered in order to improve the accuracy of the model.

Model validation
The regression models' predicted yield was compared with the observed yield from the 10-year winter wheat yield data (Figure 2).The p-value was 0.00161 (p<0.05) and the R 2 value was 0.73 between wheat yield and NDVI, indicating a good relationship between the variables.These results are similar to those reported by Lopresti 12 of an R 2 of 0.75 for winter wheat yield in Northern Buenos Aires Province, Argentina.The similarity of these results could be because both Argentina and South Africa are in the southern hemisphere.Periods for winter wheat production are similar for both countries because the seasonal cycles (winter from June to September) coincide.However, Ren et al. 2 observed an R 2 of 0.88 for Shandong (China) when relating the production of winter wheat with the accumulated MODIS-derived NDVI.
The RMSE of the calibrated model was validated against the observed yield.An RMSE of 0.41 tons/ha results from using the least squares regression line to predict the wheat yield.Becker-Reshef et al. 8 reported a similar RMSE of 0.44 tons/ha for Kansas using MODIS-NDVI for wheat forecasting.Moriondo et al. 11 observed an RMSE of 0.44 tons/ha and 0.47 tons/ha, respectively, when using AVHRR-NDVI data to develop a wheat yield model for two Italian provinces.
Diagnostic plots were constructed to assess the fit of the model and whether the residuals are normally distributed.Residual plots (Figure 3a) were used to determine if the model was linear.Linear models have a random scatter of data points whereas non-linear models have a distinctive pattern. 37A random dispersal of the residuals was observed, which means that the linear model is an ideal fit between the wheat yield and NDVI.The Q-Q plot depicted in Figure 3b indicated that the residuals are homogeneous, although small variations were present at the lower and upper tails of the plot.Periods of drought could induce such variations (outliers) and reduce the wheat yield as dryland wheat relies on residual soil moisture for growth.all observed a linear relationship between wheat yield and NDVI.However, Mkhabela et al. 43 observed that a power function was representative of the relationship between NDVI and spring wheat in different environments.The differences in models for spring wheat and winter wheat could be induced by different weather conditions and irrigation as winter wheat in South Africa is not irrigated.

Model testing
The percentage errors in Throughout this study MODIS data from an optical sensor was used but synthetic aperture radar (SAR) data can also be used.The advantages of SAR are that it can operate during the day or night, and in rainy or cloudy conditions.However, SAR data require complex processing, by often specialised software such as the INAHOR 44 and thus were not a focus of

Conclusion
The prospect of using MODIS-NDVI for winter wheat yield forecasts in the Central Free State production region was investigated using regression models.Findings suggest the best time to relate MODIS-NDVI to final wheat yields for this area is the period leading to 30 days before harvest (first week of November).This period coincides with the anthesis stage, and at this time, wheat yield is highly correlated with NDVI.The relationship between NDVI and wheat yield was significant with an R 2 -value of 0.73, a p-value of 0.00161 and an RMSE of 0.41 tons/ha.Furthermore, diagnostic plots, model testing and validation provided evidence of the reasonable levels of model accuracy, model reliability and a good fit.These techniques complement the widely used techniques of comparing the correlation between the observed yields and predicted yields, and using the RMSE.

Figure 2 :Figure 3 :
Figure 2: Central Free State predicted wheat yield as a function of observed wheat yield.
Annual wheat yield data for 10 seasons(2000-2013 excluding 2001,  2002, 2008 and 2011)were used for this study.These data were collected by the ARC-SGI under the National Wheat Cultivar Evaluation Programme.A randomised complete block design was used for the entire trials layout.The trials were planted inside non-irrigated wheat fields, in line with crop management practices with regard to tillage practices, seeding rates, weed control, fertiliser application, pest and disease control as well as planting date.The dryland wheat trials consist of five rows that are each 5 m long with an inter-row spacing of 0.45 m.The harvest area is 5 m × 1.35 m, represented by three central rows.
km Figure 1: Map of the Central Free State wheat production region depicting the three study sites.http://www.sajs.co.za 21 et al.41reported a power function to be representative of the relationship between soybean yield and canopy reflectance measurements at different soil types.The R 2value of the model was 0.80.Benedetti and Rossini 42 , Ren et al.2and Lopresti et al. http://www.sajs.co.za

Table 1 :
Model testing results for the observed and predicted yield Data provided by satellites at a high resolution, such as SPOT or RapidEye, are better for small-scale estimates of crop yield because many tiles are needed to cover a large area, which also is costly.