Extreme 1-day rainfall distributions : Analysing change in the Western Cape

FUNDING: None Severe floods in the Western Cape Province of South Africa have caused significant damage to property and infrastructure over the past decade (2003–2014). The hydrological design criteria for exposed structures and design flood calculations are based mostly on the implicit assumption of stationarity, which holds that natural systems vary within an envelope of variability that does not change with time. This assumption was tested by examining the changes in extreme 1-day rainfall high percentiles (95th and 98th) and both the 20and 50-year return period rainfall, comparing the period 1950–1979 against that of 1980–2009 across the province. A generalised Pareto distribution and a peaks-over-threshold sampling approach was applied to 76 rainfall stations across the province. Of these stations, 48 (63%) showed an increase in the 50-year return period 1-day rainfall and 28 (37%) showed a decrease in the 1980–2009 period at the 95th percentile peaks-over-threshold. At the 98th percentile peaks-overthreshold, 49 stations (64%) observed an increase and 27 (36%) a decrease for the later period. The change in the number of 3-day storms from the first to the second period is negligible, evaluated at 0.9% and 0.5% at the 95th and 98th percentile peaks-over-threshold levels, using cluster analysis. While there is no clear spatial coherency to the results, the general trend indicates an increase in frequency of intense rainfalls in the latter half of the 20th and early 21st centuries. These results bring into question assumptions of stationarity commonly used in design rainfall.


Introduction
Katz et al. 1 state that 'it is the unusual disturbances that have disproportionate effects on ecosystems'.This view can be extended to economic infrastructure.From 2003 to 2014, the Western Cape Province of South Africa (Figure 1) has been affected by severe storms occurring almost annually, which cause substantial damage to economic infrastructure and farmlands. 2,3These severe floods, caused mostly by intense rainfall resulting from cut-off low weather systems, resulted in damage equating to at least ZAR4.9 billion (Table 1). 2,3Whether this damage arises as a result of changes in the frequency and intensity of extreme rainfall, or of human intrusion into hazardous areas and land-use change is unknown.Unfortunately, financial loss records are sparse prior to the initial analysis done by Holloway et al. 2 , making analysis of prior losses difficult.However, extreme rainfalls are the predominant drivers of flood risk and concerns of increases in the frequency and/or intensity of extremes have been raised across the world. 4though national and provincial roads as well as hydraulic structures are designed for appropriate design lives, 5 the probabilistic estimation methods (through the use of the annual maximum series) of these design floods are based on the assumption of stationarity of the rainfall regime.Stationarity is a theory that natural systems vary within an envelope of variability that does not change with time.This is a foundational concept that is prevalent throughout hydrological engineering practice. 6However, flood levels for given return periods are known to be increasing in some places in the world 7 , and are possibly decreasing in other parts, which may be a result of changing rainfalls in these regions 8 .A limitation to the estimation of reasonably accurate return period rainfalls is that it requires long records (multi-decadal) of high-quality data. 7Unfortunately, in many areas, the length of the data record is limited and the possibility of a non-stationary climate is largely ignored. 6The design of hydraulic structures and determination of design flood-levels therefore assumes that weather events are independent outcomes of a stationary climate. 6t is, however, evident that the climate is changing in some places 9 and that the concept of stationarity is no longer an appropriate assumption, which potentially renders the design capacity of hydraulic structures inappropriate for the evolving conditions.We examined the stationarity of rainfall extremes for the historical rainfall record of 76 rainfall stations in the Western Cape Province (Figure 1) and, using appropriate statistical analytical processes, tested whether change is indeed occurring.

Previous studies examining changes in South African rainfall
Various previous studies have been undertaken to investigate the changing frequency and intensity of extreme rainfalls in South Africa.Mason et al. 10 found significant changes in rainfall over 70% of South Africa, except in parts of the winter rainfall region (the Western Cape) for two periods : 1931-1960 and 1961-1990.According to their analysis, the intensity of the 10-year rainfall events had increased by 10% over much of the country except in the northeast (Limpopo and Mpumalanga Lowveld and the eastern Highveld), northwest and the winter rainfall region (Western Cape).However, the density of the rain gauge network used in the study was low for those areas, indicating declines in maximum rainfall intensity.
Easterling et al. 11 found significant increases in heavy rainfall frequency over the thresholds of 25.4 mm and 50.8 mm for southwestern South Africa (the Western Cape) and parts of KwaZulu-Natal, respectively.The frequencies of these events were estimated to increase at 5.5% and 4.1% per decade, respectively.Richard et al. 12 discovered an increase in rainfall in the southeastern part of the country in a review of time series rainfall data, while Groisman et al. 13 showed an increase in frequency of very heavy rainfalls between the periods 1901-1910 and 1931-1997  for most of South Africa for those rainfalls equal to or above the 99.7 percentile (the upper 0.3%).Their data set was spatially differentiated by blocks of about 50 km on a side with six or more rain gauges per block.Kruger 14 found that changes in annual maximum 1-day rainfalls for the Western Cape above the 99th percentile are inconclusive for records ending in 2002.New et al. 15 , using similar methods, showed that there was a 'statistically significant increase in regionally averaged daily rainfall intensity and dry spell duration' from 1961 to 2000 across the southern African region, but that few trends at individual stations were statistically significant.

Methods and data
One of the most important tasks in analysing extremes is to choose the statistical basis for the calculation. 16There are two fundamental approaches used in extreme value theory: the 'block maxima' approach, used in the annual maximum series and the 'peaks-over-threshold' (POT) approach. 16The traditional approach to extreme value analysis in South African rainfall analyses has been to use the annual maximum series. 5,17,18he block maxima approach relies on identifying the highest value for each year and fitting a distribution to the data, while the POT approach fits a distribution to all data points that exceed a defined threshold. 16e advantage of the block maxima approach is its relative simple application; however, this method results in a substantial loss of information resulting from discarding values that are less than but close to the maximum in each year.As Katz 16 argues, this modelling approach may result in poorer return level estimates and the POT may be considered a more appropriate alternative.

Peaks-over-threshold
The POT approach is based on the same principle as that of the partial duration series, which was developed as a way of avoiding problems associated with the block maxima approach. 19The POT approach is predicated on fitting any suitable probability distribution, such as a generalised Pareto distribution (GPD), to all data points exceeding a defined extreme threshold. 20The benefit of this approach is the inclusion of more data, but it does have the drawback that the events being observed may not be fully independent, which is a requirement for modelling. 16e POT approach obtains extreme values of a data set X 1 , X 2 , ... ,X n by considering the exceedances Y = X -t over a sufficiently high threshold t.According to extreme value theory, the distribution of the exceedances Y 1 , Y 2 , ... ,Y Nt , where N t is the number of exceedances above the threshold t, can be reasonably approximated by a GPD. 21The distribution function of the GPD is given by Equation 1: where σ and γ are the respective scale and shape parameters of the distribution.The shape parameter, which is known as the extreme value index in the generalised extreme value distribution, is considered the most important parameter in determining the tail behaviour of a distribution.The procedure of fitting the GPD to the exceedances above the threshold requires the estimation of the two unknown parameters σ and γ.Their maximum likelihood estimates can be obtained through maximisation of the log-likelihood function corresponding to the GPD, Equation 2 with respect to σ and γ.
[25] Use of the GPD requires a high threshold and the choice of threshold must be such that the excess over the threshold should have a nearly exponential distribution -to fit with the requirements of the GPD theorem. 26 threshold value that is too large, however, will result in few data points (exceedances) from which the parameters of the GPD are estimated, with the consequence of possibly high variance of parameter and return level estimates.However, because the exceedances correspond to data points that lie far in the tail of the distribution of the original data, the limiting results on which the GPD is based are valid and the estimates are likely to exhibit low bias. 21In contrast, estimates that result from thresholds set too low will exhibit low variance as more data points (exceedances) are available for use in the estimation.However, data points that lie more towards the centre of the distribution of the original data, rather than in the tail, translate to estimates with higher degrees of bias. 21The choice of threshold is often made subjectively, keeping the above-mentioned bias-variance trade-off in mind. 21A statistical compromise needs to be achieved between setting the POT threshold high enough so that the excess distribution (above the threshold) converges to that of the GPD but low enough to have a sample of sufficient size so that the location, size and scale parameters can be estimated efficiently. 24tting the threshold high does not imply an exact value -the outcome is somewhat subjective.'ismev' -an add-on package to the R statistical language -provides a technique of fitting a range of thresholds, in which the scale and shape parameters gradually change over the fitted range. 27When setting the threshold, the scale and shape parameters should not end up diverging to such an extent as to imply increasing uncertainty in those parameters.While this is a useful process when applied as an intensive examination of an individual record, it is an overly time-consuming method when applied to many stations (as in this study).A more direct approach which could give sufficiently robust results is required.In similar extreme rainfall studies which evaluate many stations, two approaches are commonly adopted: (1) the use of relative thresholds or percentiles as thresholds (commonly the 95th and 98th percentiles) or (2) absolute thresholds. 4,28We thus took two useful thresholds to be located at the 95th and 98th percentiles for each rainfall station.Separate thresholds at the 95th and 98th percentiles were estimated for each time period (1950-1979 and 1980-2009) as the two periods need to be viewed as separate data sets.

Cluster analysis
Rainfall exceedances over thresholds often occur in groups, with one extreme closely following another (because they may be caused by the same event) and this clustering produces dependence in the observations. 21 utilising the GPD function for deriving the probability distribution of rainfall events, it is assumed that rainfall extremes fit the Poisson process -a stochastic process in which events occur continuously (as opposed to discrete occurrences) and, importantly, independently of one another. 16The Poisson process has a long history of use in rainfall modelling. 29It is therefore crucial to determine whether the data points that exceed the threshold value (threshold exceedances) are independent of each other when fitting a GPD. 24This problem is addressed by making use of a process known as (de)clustering.Once a definition of cluster length of exceedances has been specified, only the maximum value in each cluster is retained and the remainder of the exceedances are discarded.This process then results in a 'new' set of exceedances that are approximately independent and the GPD can then be applied.
In the case of large storm systems such as cut-off low pressure systems, mid-latitude cyclones and cold fronts prevalent in the Western Cape, heavy rainfall can last for 3 days and more per storm.As a result, the observed rainfall may exceed the threshold value every day over a sequence of days for the same event and thus the individual daily observations may not be independent.These data points need to be 'declustered' in order to remove such related threshold exceedances. 21he process of declustering groups is done as another step with the 'evd' package, which has a specific function serving that purpose.In this study, a value of r=3 in the declustering process was considered reasonable.The value of the parameter implies that three consecutive threshold deficits following exceedances above the threshold indicate the end of a cluster.Further, in order to investigate the influence of cluster length on the results, r=2 and r=4 were also considered but showed very little variation in the outputs.
Following this definition of clusters of exceedances, a cluster can be interpreted as a single extreme 1-day rainfall event (the highest value). Where: n t = the number of exceedances over the threshold t, and n c = the effective number of clusters.
The number of storms resulting in heavy rainfall lasting longer than a single day can be obtained by computing n t -n c .Here, n c is referred to as the number of effective clusters rather than just the number of clusters, to emphasise that any cluster may consist of only a single exceedance above the threshold t.A GPD was then fitted to the new 'declustered' data set using the 'ismev' package in R. http://www.sajs.co.za

Rainfall data
Rainfall stations used in this analysis were initially identified using the database compiled by Lynch 30 .Suitably long (60 years -divided into two 30-year periods) records of daily rainfall data of SAWS stations were selected.The Lynch 30 data set terminates in 2001.SAWS supplied the remaining data to extend the records to 2009.
Long data records are required in order to divide the rainfall record into equal time periods and to have sufficient data points for the fitting of a distribution.Therefore, use was made of patched (infilled) data provided by the Lynch 30 data set.
The use of patched data for extreme value analysis has precedents in the literature 31,32 and we believe that it provides, in this research, an acceptable means for overcoming gaps in data and extending records where necessary, especially in the early parts of the records.The possible errors that could be introduced into the analysis using this method are low as the rainfall events have already been observed at nearby rain gauges.Although rainfall records extend further back than 1950 for all of the stations used, the quantity of infilled data in these records increases substantially for earlier patched data sets.Thus, rainfall stations were chosen based not only on data length, but also on the low totals of infilled data.The quantum of all exceedances greater than the 95th percentile of the data used in this study that are infilled is 1.4%.

Rainfall catch anomalies and inhomogeneities
Mason et al. 10 and Zang et al. 33 mention the problem of inhomogeneities in the data.These inhomogeneities arise when a rain gauge is moved or instrumentation is changed, or as a result of other environmental factors that may influence rainfall catch.These authors caution against use of data containing inhomogeneities, noting that testing of long-term change requires quality-controlled data.Indeed, Mason et al. 10 left out many gauges for potential analysis because of these perceptions.
We take a different view.Our approach is that the quantum of individual extreme 1-day rainfalls overwhelms subtle changes within a station time series, and that removing such data from the study implies an a priori understanding of the impacts of the inhomogeneities in the data on extreme values.Inhomogeneities in the extremes of temperature data, by contrast and comparison, are of real concern, 34 but are likely because temperature extremes -both maxima and minima -are far more strongly bounded in nature than precipitation extremes.Thus, measuring an extreme temperature of 51 °C is very unlikely under natural conditions (i.e.above a vegetated surface) and the probability of an extreme temperature of 52 °C above the same surface would decrease by an order of magnitude or more.Measuring a temperature of 60 °C is almost impossible under current circumstances (an increase from 51 °C of 17.6%).However, a comparison of two extremes of rainfall -e.g. 150 mm and 200 mm in 24 h, an increase of 33% -is quite plausible and the upper limit of rainfall is very poorly defined, for example 597 mm in one day was recorded at St Lucia during Cyclone Domoina on 31 January 1984. 35Further, much of the data utilised in this study have already been checked and corrected for errors by Lynch 30 .

Results and discussion
Two key results based on the 95th and 98th percentile POT sampling are presented here, in which there is a strong change to more frequent extreme 1-day rainfalls in the later part of the rainfall record.Both results are considered in more detail below in separate sections.However, a spatial representation of these results reveals that these changes do not provide a clear spatial pattern of change.A pattern may be evident but is difficult to determine because of the low station density across the province.

Changes in design rainfalls
Table 2 presents a summary of the changes in the 50-year return period rainfalls.The GPD output is displayed in the return period rainfall plots in Figure 2 as examples.For the study, similar graphs were produced for each of the 76 rainfall stations.These graphs display the return period rainfalls (on the linear y-axis) against the return period (on the logarithmic x-axis) for both time periods.The blue response functions (continuous curves) in Figure 2 indicate 95% confidence intervals, while the black response function is the curve indicating estimated return period rainfalls for the same periods.Figure 2 displays results for a rainfall station at Villiersdorp that had an increase in the 50-year return period rainfall magnitude from 118.2 mm for the period 1950-1979 to 202.6 mm for the period 1980-2009 -an increase of 71%.
It should be noted that the confidence intervals expand rapidly as a consequence of the logarithmic scale used on the x-axis, and that, if both return period plots were displayed on the same set of axes, their confidence levels would overlap.The differences between the return period for individual stations are therefore not statistically significant.However, it can be expected that the differences for all 76 stations should be randomly and normally distributed.A following section addressed this assertion.However, the overall pattern of all gauges suggests that substantial changes in rainfall extremes have occurred in the Western Cape over the 60-year period studied.
Figure 3 represents the changes in both 20-and 50-year return period rainfall from the initial return level calculated in period 1 (1950-1979)  for each individual rainfall station based on the 95th percentile POT data series.A majority of rainfall stations show an increase (>0) in return period rainfalls in the second period (63% for 50-year return period rainfalls), in comparison with the number of rainfall stations that display a decrease (<0) in the magnitude of 50-year storms (37%).We also looked for spatial coherency in the signal by spatially interpolating between stations using ArcGIS (Figures 4 and 5).The results produced using the 98th percentile as the GPD threshold show an expectedly similar pattern to that of the 95th percentile.
Increases in 50-year return period rainfall intensity range up to 219% (0022803W Bellevue -between Villiersdorp and Worcester in the Cape Winelands).Using a higher threshold (98th percentile) also increases the proportion of rainfall stations that indicate a rise in the number of 50-year return period rainfalls to 64% -up from 63% using a 95th percentile threshold (Table 2).

Percentile analysis
The use of the GPD method requires a setting of the threshold value to extract the data series.For this study, the 95th and 98th percentiles were accepted as useful estimates of rainfall extremes, as discussed earlier.As the periods 1950-1979 and 1980-2009 need to be viewed as separate data sets for each station, the method entailed calculating separate percentile values for each time period across all rainfall stations.The changes in these percentiles represent changes in the tails of these data distributions, illustrated by the histogram in Figure 6, which represents the percentage of change from the earlier period at the 95th percentile for the given number of stations.The substantial weight of change lies in an increase for the later 1980-2009 period, represented statistically as a positive skewness of 1.32 and the station records are highly skewed to increases in 1-day extreme rainfall.A skewness close to zero should be expected for a comparison in which there is no change between periods.
These changes are represented spatially in Figure 7. Here, light areas represent a decrease in 95th percentile between the two periods, while increasingly darker areas represent regions showing 0-10%, 10-20% and >20% increases in the 95th percentile.The Swartland region up the West Coast, as well as a few isolated stations, display a decrease in 95th percentile between the two time periods, while the Kannaland, Oudtshoorn, Laingsburg and Prince Albert Local Municipalities of the Central Karoo and Eden Districts show a marked increase in 95th percentile, which indicates a substantial change in the tails of these distributions.Figure 7 indicates that, in some areas, the threshold value (95th percentile) for the latter period has changed when compared with the original 1950-1979 period.In general, these results show that rainfalls have become more extreme.

Conclusions
Our results indicate that the intensity of extreme 1-day rainfalls have shown both increases and decreases across the Western Cape Province in the previous 30 years in comparison to an earlier period.The results are positively skewed towards increases in intensity in the later period across the distribution of the rainfall stations included in this study.The results indicate that the general assumption of stationarity in design rainfall assessments should be strongly questioned.This study provides robust results because the two different methods used provide broadly similar results.These methods can be replicated by future studies in other regions across South Africa and elsewhere and should use the latest rainfall values so as to incorporate more data.The incorporation of additional (more recent) data for the later period to the present in an analysis may change the evaluation further still, as cut-off low weather systems have produced extreme rainfalls in the Western Cape each year from 2010 to 2014 which were not incorporated in this analysis. 3 the magnitude and probabilities of extreme 1-day rainfalls appears to be changing, we also suggest a need for a review of estimation methods and what these mean for design calculations.Our results have implications for the future design and maintenance of infrastructure, as well as water resources management.The statistical methods of evaluating the extreme rainfalls are evolving and it is important that new studies consider these methodological changes.They can add value to the design and decision-making process, and account for changing flood risks, which will benefit society by ultimately reducing deaths, injury and socio-economic losses.Further study is also needed on the changing seasonality of extreme events, which we did not address herein.
We also recommend further investigations into the methods of evaluation, for example including the full rainfall data sequences and using parametric non-stationary approaches by considering changes in the σ and γ parameters of the extreme value distributions with time. 21Du Plessis and Burger 36 have applied this approach in a limited way to shortduration (<24 h) rainfall intensities for a small number of rainfall stations across South Africa.Such potential future studies would represent analyses of continuous change over time, rather than the comparison of two periods presented here.

Figure 1 :
Figure 1: The Western Cape Province of South Africa showing the location of the 76 rainfall stations included in the study.

Figure 3 :Figure 4 : 6 South
Figure 3: 20-and 50-year return period rainfall changes by rainfall station as a percentage of change in the 1980-2009 period from the 1950-1979 period (peaks-over-threshold data series: 95th percentile threshold).

Figure 6 :
Figure 6: A histogram of the percentage of change return periods at the 95th percentile from the 1950-1979 period for all rainfall stations.

Figure 7 :Figure 7 :
Figure 7: Spatial changes in 95th percentile between 1950-1979 and 1980-2009 periods in the Western Cape Province interpolated between rainfall stations using inverse-distance weighting

Table 1 :
Financial and human losses as a result of extreme flood events in the Western Cape(2003-2014)

Table 2 :
A summary of the change in rainfall intensity at the 50-year return period for rain stations analysed at the 95th and 98th percentile peaks-over-threshold for the 1980-2009 period compared with the 1950-1979 period (n=76)