In June 2012, more than 3,000 daily maximum temperature records were broken or tied in the United States, according to the National Climatic Data Center (NCDC) of the U.S. National Oceanic and Atmospheric Administration (NOAA). Meteorologists commented at that time that this number was very unusual. By comparison, in June 2013, only about 1,200 such records were broken or tied. Was that number “normal”? Was it perhaps lower than expected? Was June 2012 (especially the last week of that month) perhaps just an especially warm time period, something that should be expected to happen every now and then? Also in June 2013, about 200 daily minimum temperature records were broken or tied in the United States. Shouldn’t that number be comparable to the number of record daily highs, if everything was “normal”?
Surprisingly, it is possible to make fairly precise mathematical statements about such temperature extremes (or for that matter, about many other record-setting events) simply by reasoning, almost without any models. Well, not quite. The mathematical framework is that individual numerical observations are random variables. One then has to make a few assumptions. The two main assumptions are that (1) the circumstances under which observations are made do not change, and (2) observations are stochastically independent, that is, knowledge of some observations does not convey any information about any of the other observations. Let’s work with these assumptions for the moment and see what can be said about records.
Suppose N numerical observations of a certain phenomenon have already been made and a new observation is added. What is the probability that this new observation exceeds all the previous ones? Think about it this way: Each of these N+1 observations has a rank, 1 for the largest value and N+1 for the smallest value. (For the time being, let’s assume that there are no ties.) Thus any particular sequence of N+1 observations defines a sequence of ranks, that is, a permutation of the numbers from 1 to N+1. Since the observations are independent and have the same probability distribution (that’s what the two assumptions from above imply), all (N+1)! possible permutations are equally likely. The last observation sets a new record exactly if its rank equals 1, and there are N! permutations with this property, one for each arrangement of the remaining N observations. Therefore, the probability that the last observation is a new record is N!/(N+1)! = 1/(N+1).
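This 1/(N+1) law is easy to check numerically. Here is a minimal Python sketch (not part of the original argument; the choice N = 50 and the uniform distribution are arbitrary, since the law does not depend on the distribution):

```python
import random

def record_probability(N, trials=200_000):
    """Estimate the chance that observation N+1 exceeds all N previous ones,
    for independent draws from a common continuous distribution."""
    hits = 0
    for _ in range(trials):
        observations = [random.random() for _ in range(N + 1)]
        # The last observation sets a new record if it is the overall maximum.
        if observations[-1] == max(observations):
            hits += 1
    return hits / trials

print(record_probability(50))  # should be close to 1/51 ≈ 0.0196
```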
This reasoning makes it possible to compute the expected number of record daily high temperatures for a given set of weather stations. For example, there are currently about 5,700 weather stations in the United States at which daily high temperatures are observed. In 1963, there were about 3,000 such stations, and in 1923 only about 220. Assuming for simplicity that each of the current stations has been recording daily temperatures for 50 years, one would expect a new daily high record at about 2% of all stations on a typical day (1/51, by the argument above), resulting in about 3,000 new daily high records per month on average – if the circumstances of temperature measurements remain the same and if the observations at any particular station are independent of each other. The independence assumption is quite plausible for observations made on the same date at the same station: Knowing the maximum temperature at a particular location on August 27, 2013 does not give one any information about the maximum temperature on the same day a year later. However, the circumstances of observation can change for many different reasons. What if new equipment is used to record temperatures? What if the location of the station is changed? For example, until 1945, daily temperatures in Washington, DC, were recorded at a downtown location (24th and M St.). Since then, measurements have been made at National Airport. National Airport is adjacent to a river, which lowered daily temperature measurements compared to downtown. However, the area around the airport has become more urban over recent decades, possibly leading to higher temperature readings (the well-known urban heat island effect). And what about climate change?
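For the record, here is the arithmetic behind the “about 3,000 per month” estimate, using the station count and 50-year history assumed above:

```python
stations = 5700          # current U.S. stations recording daily highs
history_years = 50       # simplifying assumption from the text
days_per_month = 30

# A new observation beats 50 previous ones for that date with probability 1/51.
p_record = 1 / (history_years + 1)

per_day = stations * p_record
print(round(per_day), round(per_day * days_per_month))  # ≈ 112 per day, ≈ 3353 per month
```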
Perhaps it is better to use a single climate record and not thousands. Consider, for example, the global mean temperature record shown in the blog post of August 20. It shows that the largest global mean temperature for the 50 years from 1950 to 1999 (recorded in 1998) was exceeded twice in the 11 years from 2000 to 2010. The second-highest global mean temperature for these 50 years (that of 1997) was exceeded in 10 out of the 11 years between 2000 and 2010. Can this be a coincidence?
There is a mathematical theory to study such questions. Given a reference value equal to the $m$th largest out of $N$ observations, any observation among $n$ additional ones that exceeds this reference value is called an “exceedance”. For example, we might be interested in the probability of observing two exceedances of the largest value out of 50 during 11 additional observations. A combinatorial argument implies that the probability of seeing exactly $k$ exceedances of the $m$th largest observation out of $N$ when $n$ additional observations are made equals
\[ \frac{\binom{m+k-1}{m-1}\,\binom{N+n-m-k}{N-m}}{\binom{N+n}{N}} , \]
where $\binom{r}{s}$ denotes the usual binomial coefficient. The crucial assumption is again that the observations are independent and come from the same probability distribution.
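In code, the formula is a one-liner. The sketch below (the function name is mine) also recovers the earlier record probability 1/(N+1) as the special case m = 1, n = 1, k = 1:

```python
from math import comb

def exceedance_pmf(k, m, N, n):
    """Probability that exactly k of n new observations exceed the m-th
    largest of N previous ones (independent, identically distributed)."""
    return comb(m + k - 1, m - 1) * comb(N + n - m - k, N - m) / comb(N + n, N)

# Special case: one new observation exceeding the maximum of N = 50 previous ones.
print(exceedance_pmf(1, m=1, N=50, n=1))  # 1/51 ≈ 0.0196
```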
Applied to the global mean temperature record, the formula implies that the probability of two or more exceedances of a 50-year record during an 11-year period is no more than 3%. The probability of 10 exceedances of the second-highest observation from 50 years during an 11-year period is tiny, of the order of 0.0000001%. Yet these exceedances were actually observed during the last decade.
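Continuing with the exceedance_pmf sketch above, both numbers can be reproduced directly:

```python
# Two or more exceedances of the 50-year maximum (m = 1) in 11 further years:
p = 1 - sum(exceedance_pmf(k, m=1, N=50, n=11) for k in (0, 1))
print(p)  # ≈ 0.030, i.e. about 3%

# Exactly 10 exceedances of the second-highest value (m = 2) of 50 years
# during 11 further years:
print(exceedance_pmf(10, m=2, N=50, n=11))  # ≈ 1.3e-9, about 0.00000013%
```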
Clearly, at least one of the two assumptions, stochastic independence and identical distribution, must be violated. The plot of August 20 already shows that distributions may vary from year to year, due to El Niño/La Niña effects; La Niña years in particular tend to be cooler when averaged over the entire planet. The assumption of stochastic independence is also questionable, since global weather patterns can persist over months and therefore influence more than one year. Could it be that more exceedances than plausible were observed because global mean temperatures became generally more variable during the past decades? In that case, exceedances in the other direction, values falling below the minimum of the reference period, would also have been observed more often than the formula predicts. That is clearly not the case, so increased variability alone is unlikely to be responsible for what has been observed.
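To get a feeling for how persistence alone affects such counts, one can run a small Monte Carlo experiment. The sketch below is not from the original post, and the AR(1) model with parameter phi = 0.6 is just an illustrative assumption. It simulates 61 correlated “years” and estimates the probability of two or more exceedances of the first 50 years’ maximum in the last 11, for comparison with the 3% obtained under independence:

```python
import random

def ar1_series(length, phi=0.6, sigma=1.0, burn_in=100):
    """Generate an AR(1) sequence x[t] = phi * x[t-1] + Gaussian noise."""
    x = 0.0
    for _ in range(burn_in):           # discard the transient from the x = 0 start
        x = phi * x + random.gauss(0.0, sigma)
    series = []
    for _ in range(length):
        x = phi * x + random.gauss(0.0, sigma)
        series.append(x)
    return series

def p_two_or_more(trials=100_000, phi=0.6):
    hits = 0
    for _ in range(trials):
        s = ar1_series(61, phi)
        reference = max(s[:50])                        # the "50-year" maximum
        if sum(v > reference for v in s[50:]) >= 2:    # the last 11 "years"
            hits += 1
    return hits / trials

print(p_two_or_more())  # compare with ≈ 3% for independent observations
```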
We see that even this fairly simple climate record leads to serious questions and even partial answers about possible climate change, without any particular climate model.
Part of this contribution is adapted from the forthcoming book Mathematics and Climate by Hans Kaper and Hans Engler; SIAM, Philadelphia, Pennsylvania, USA (OT131, October 2013).