Some of the fundamental questions about the Earth’s climate are only partially addressed: What is the relationship between temperature measurements and greenhouse gas emissions, and what do these relationships tell us about the sensitivity of climate to increased greenhouse gas concentrations? How can historical temperature measurements inform this understanding? To what extent are temperatures during the last few decades anomalous in a millennial context? What is the link between tropical cyclone intensity and ocean warming?

Answering these questions accurately requires data that is reliable, continuous, and of broad spatial coverage. It is well known that direct physical measurements of climate fields (such as temperature) are limited both temporally and spatially, with measurement quality and availability decreasing sharply further back in time. Measurements of land and sea surface temperature fields cover only the post-1850 period (often referred to as the *instrumental period*), and large regions are afflicted by missing data, measurement errors, and changes in observational practices. Hence, hemispheric temperatures during the past millennium can only be inferred indirectly, using temperature-sensitive geological proxy data such as tree rings, ice cores, corals, speleothems (cave formations), and lake sediments. These proxies act as nature’s thermometers and thus contain valuable information about past climates; see Guillot, Rajaratnam and Emile-Geay (2013), Janson and Rajaratnam (2013), and other literature for more details on this topic.

The reconstruction of past climates using proxy data is fundamentally a statistical problem, one that requires tools from various branches of the mathematical sciences. More concretely, the statistical paleoclimate reconstruction problem involves (1) extracting the relationship between temperature and temperature-sensitive geological proxy data, (2) using this relationship to backcast (or hindcast) past temperature, and (3) quantifying the uncertainty that is implicit in such paleoclimate reconstructions, i.e., making probabilistic assessments of past climate.
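As a rough illustration of these three steps, the following Python sketch runs them on synthetic data. Everything in it is an assumption made for illustration: the simulated temperature series, the five hypothetical proxies, the use of ordinary least squares, and the residual-based interval, which ignores temporal correlation and the uncertainty in the fitted coefficients. It is not the methodology of the papers cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: 1000 years, the last 150 of which are "instrumental".
n_years, n_instr, n_proxies = 1000, 150, 5
temperature = np.cumsum(rng.normal(0.0, 0.05, n_years))   # slowly varying "true" temperature
loadings = rng.uniform(0.5, 1.5, n_proxies)                # assumed proxy sensitivities
proxies = np.outer(temperature, loadings) + rng.normal(0.0, 0.3, (n_years, n_proxies))

calib = slice(n_years - n_instr, n_years)   # instrumental (calibration) period
past = slice(0, n_years - n_instr)          # pre-instrumental period to reconstruct

# (1) Extract the temperature-proxy relationship over the instrumental period.
X_calib = np.column_stack([np.ones(n_instr), proxies[calib]])
coef, *_ = np.linalg.lstsq(X_calib, temperature[calib], rcond=None)

# (2) Backcast temperature from the proxies alone.
X_past = np.column_stack([np.ones(n_years - n_instr), proxies[past]])
reconstruction = X_past @ coef

# (3) Crude uncertainty band from the calibration residuals
#     (assumes independent Gaussian errors, which is a simplification).
residuals = temperature[calib] - X_calib @ coef
sigma = residuals.std(ddof=X_calib.shape[1])
lower = reconstruction - 1.96 * sigma
upper = reconstruction + 1.96 * sigma
```

With only five proxies and 150 calibration years, ordinary least squares is well defined here; the difficulties discussed next arise when the number of predictors approaches or exceeds the number of calibration years.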

The problem is exacerbated by several methodological and practical issues:

- Data: Proxy data is not available everywhere on the globe, and its availability decreases sharply further back in time.
- Data: Instrumental temperature data is limited, both spatially and temporally.
- Methodology: The reconstruction problem is high-dimensional because the number of time points available for regressing temperature on proxies is very limited. Hence, standard statistical methods such as ordinary least squares regression do not readily apply.
- Methodology: Both proxy and temperature data are correlated in time and in space.
- Methodology: The traditional assumption of normally distributed errors is often unrealistic because of outliers in the data.
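The high-dimensional issue in the list above can be made concrete with a small synthetic example. The sketch below assumes hypothetical sizes (150 calibration years, 400 predictors) and an arbitrary penalty `lam`. It shows that the design matrix cannot have full column rank, so the least squares normal equations have no unique solution, while a Tikhonov (ridge) penalty, used here only as one familiar example of regularization, restores a unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: fewer calibration years than predictors.
n_years, n_predictors = 150, 400
X = rng.normal(size=(n_years, n_predictors))
beta_true = np.zeros(n_predictors)
beta_true[:10] = 1.0                        # assumed sparse "true" signal
y = X @ beta_true + rng.normal(0.0, 0.5, n_years)

# The rank of X (and hence of X^T X) is at most n_years < n_predictors,
# so X^T X is singular and ordinary least squares has no unique solution.
rank = np.linalg.matrix_rank(X)

# Tikhonov (ridge) regularization: solve (X^T X + lam * I) beta = X^T y.
lam = 10.0                                   # illustrative value, not tuned
A = X.T @ X + lam * np.eye(n_predictors)
beta_ridge = np.linalg.solve(A, X.T @ y)
```

In practice the penalty would be chosen by cross-validation or a similar criterion, and other forms of regularization may suit the problem better.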

The nonstandard settings under which paleoclimate reconstructions have to be undertaken lead to a variety of statistical problems, which in turn raise important and deep questions in applied and computational mathematics and also in pure mathematics.

First, given the ill-posed nature of the regression problem, it is not clear which high-dimensional regression methodology or type of regularization (such as Tikhonov regularization) is applicable. Second, the need to model a spatial random field requires specifying probabilistic models for the correlation structure of temperature points and proxies in both space and time. Even a coarse 5°-by-5° latitude/longitude grid on the Earth leads to more than 2000 spatial points. Specifying a covariance matrix of this order requires estimating more than 2 million parameters, which is a non-starter given that only 150 years of data is available. Hence, sparse covariance modeling is naturally embedded in the statistical paleoclimate reconstruction problem. Estimating covariance matrices in an accurate but sparse way leads to important questions in convex optimization. Regularization methods for inducing sparsity in covariance matrices lead to the problem of characterizing maps that leave the cone of positive semidefinite matrices invariant. Such questions have been considered in a more classical setting in the work of Rudin and Schoenberg. These classical results, however, are not directly applicable to paleoclimate reconstruction problems and require further generalizations and extensions.
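Two points in this paragraph can be checked with a few lines of Python. The sketch below first counts the free parameters of an unstructured covariance matrix on a full 5°-by-5° grid, then uses a small hand-picked 3-by-3 example (an assumption chosen for illustration, not taken from the cited papers) to show that naive hard thresholding of a positive definite matrix can produce a matrix that is no longer positive semidefinite. This is why maps that preserve the cone need to be characterized.

```python
import numpy as np

# Parameter count for an unstructured covariance matrix on a 5-degree grid.
n_lat, n_lon = 180 // 5, 360 // 5
p = n_lat * n_lon                     # number of grid cells
n_cov_params = p * (p + 1) // 2       # distinct entries of a symmetric p x p matrix

# A small positive definite correlation matrix with entries rho^|i-j|.
rho = 0.8
idx = np.arange(3)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Hard thresholding at an illustrative level: entries below 0.7 are set to zero,
# which removes the 0.64 corner entries and leaves a tridiagonal matrix.
thresholded = np.where(np.abs(sigma) >= 0.7, sigma, 0.0)

min_eig_before = np.linalg.eigvalsh(sigma).min()        # positive
min_eig_after = np.linalg.eigvalsh(thresholded).min()   # 1 - 0.8 * sqrt(2), negative
```

The full grid has 2592 cells and about 3.4 million distinct covariance entries, consistent with the lower bounds quoted in the text. The thresholded matrix is sparser but has a negative eigenvalue, so it is not a valid covariance matrix.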

These are just a few examples in which pure mathematics, statistics, and applied and computational mathematics are essential to the techniques currently used for understanding Earth’s past climate. The need to develop rigorous mathematical and statistical tools is thus critical for such contemporary earth science endeavors.

References:

Guillot, D., B. Rajaratnam, and J. Emile-Geay (2013), *Statistical Paleoclimate Reconstructions via Markov Random Fields*, Technical Report, Department of Statistics, Stanford University; arXiv:1309.6702 [stat.AP]

Janson, L., and B. Rajaratnam (2013), *A Methodology for Robust Multiproxy Paleoclimate Reconstructions and Modeling of Temperature Conditional Quantiles*, Journal of the American Statistical Association (in print); arXiv:1308.5736 [stat.ME]

Bala Rajaratnam

Department of Statistics and Environmental Earth System Science

Institute for Computational and Mathematical Engineering

The Woods Institute for the Environment

Stanford University