Predicting from model ensembles

My current research requires up-to-date predictions of climate change, which are being combined with species distribution data in order to look at impacts on diversity. Because we need credible climate predictions in our models, over the last week I have been looking in some detail at the work being carried out by the ENSEMBLES project.

This is an ambitious, highly collaborative project that aims to develop an ensemble prediction system for climate change based on the principal state-of-the-art, high resolution, global and regional Earth System models. By bringing model results together within a common framework the researchers aim to quantify and even reduce the uncertainty in the representation of physical, chemical, biological and human-related feedbacks in the Earth System (including water resource, land use, and air quality issues, and carbon cycle feedbacks). In other words, really big science addressing a really big question. Fascinating stuff, even if most of the technical details are way beyond my understanding.

The most interesting, and rather novel, aspect of this project is its stated aim of quantifying uncertainty by looking at multiple predictions from multiple models. It is rarely appreciated by decision makers (even some scientific reviewers overlook the fact) that a fully quantified estimate of uncertainty requires far more work to produce than a simple direct statement based on a single viewpoint or model. Take, for example, the varying estimates of carbon dioxide emissions over the next century presented in the SRES report.

[Figure emissions.png: SRES estimates of carbon dioxide emissions over the next century]

This is a good example of the way different models, with different structures and assumptions, lead to large differences in final predictions, differences that can be represented as a probability density function. Alterations in the parameterization of a single model can have the same effect. Decision makers often ask for a worst case, a best case and a most probable scenario, but until multiple scenarios have been evaluated the probabilities attached to each of these remain unknown. Even when large numbers of scenarios can be used, ignorance of the true model structure still does not allow a complete formalization of all the uncertain elements.
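
The basic idea is easy to sketch in R: take the point predictions of a handful of models for the same quantity and summarise the spread as quantiles and as a smoothed density. The numbers below are invented purely for illustration and are not SRES values.

# Invented point predictions from six hypothetical models (illustrative only, not SRES values)
predictions <- c(980, 1100, 1250, 1400, 1600, 2100)
# "Best", central and "worst" cases expressed as quantiles of the ensemble spread
quantile(predictions, c(0.05, 0.5, 0.95))
# The same spread represented as a probability density function
plot(density(predictions), main = "Spread of model predictions")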

Mat Collins of the Hadley Centre published an interesting and very accessible review of the issues involved in making predictions about future climates: collins-climate-models-ensembles-probabilities.pdf. Collins even gets to use a favourite quotation from Donald Rumsfeld. I will undoubtedly comment in more detail on the profundity of Rumsfeld’s famous lines in some later post.

Here I will look briefly at the point Collins makes regarding the relationship between model complexity and estimates of uncertainty in model predictions. Simple models have the advantage that assumptions can be specified in terms of probabilities for a given parameter, while more complex models are less transparent. For example, in aggregated energy balance models (EBMs) it is usual to specify the climate sensitivity (the global mean temperature change for a doubling of atmospheric CO2) directly, whereas for a Global or General Circulation Model (GCM) the climate sensitivity is a function of the interaction between resolved and parameterized physical processes and cannot be specified a priori.

This means that large complex models are not necessarily “better” than small simple models; they are just doing very different jobs. Progress in climate research requires ever more complex models, and the scientific process places a premium on such models and appreciates the work involved in their construction and analysis. However, communication of uncertainties often requires the use of simple, intuitive models.

This is good, because we can write down and analyse a small simple model in a few minutes in R, while writing and running a GCM is clearly somewhat more ambitious!

Let’s take Collins’ example and use R to reproduce the result. Collins writes:

“Assume that, after all our efforts, we have produced an estimate of the uncertainty in the climate feedback parameter

λ = Q / ΔT

defined for an equilibrium climate as the radiative forcing Q divided by the temperature change ΔT, which is normally distributed with a mean of 1.0 W m⁻² K⁻¹ and a 5–95% range of 0.6–1.4 W m⁻² K⁻¹. This is translated simply into a right-skewed distribution for climate sensitivity (owing to the inverse relationship in the equation above) that has a 5–95% range of 2.7–6.3 K. Now, say, that we wish to stabilize atmospheric CO2 levels so as to avoid a global mean temperature rise, ΔT, of 3 K above pre-industrial values as that is deemed to be some ‘dangerous’ level of climate change. The radiative forcing, Q, for doubled CO2 is approximately 3.8 W m⁻² and is known to vary logarithmically

ΔT = Q ln(Cstab / Cpre) / (λ ln 2)

where Cstab is the future stabilized level of CO2 and Cpre is the pre-industrial level. Thus, in order to be 95% sure of not passing 3 K of global warming (or alternatively, accepting a 5% risk that we do), CO2 must be stabilized at 1.4 times pre-industrial levels, or around 386 ppm by volume. This is approximately the CO2 concentration today.
However, assume that we might improve our estimate of the feedback parameter, producing a distribution with the same mean value but with half the 5–95% range (0.8–1.2 W m⁻² K⁻¹). Then, to be 95% sure that we do not exceed 3 K warming, we only have to stabilize at 431 ppm. We have bought society some time to develop technologies to limit emissions and allowed economic development for poorer societies.”
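
As a quick aside, the quoted 2.7–6.3 K sensitivity range can be checked straight away: climate sensitivity is just the doubled-CO2 forcing divided by the feedback parameter. The standard deviation of about 0.24 below is my own assumption, chosen only so that the 5–95% range of lambda matches the quoted 0.6–1.4 W m⁻² K⁻¹.

# Climate sensitivity implied by the feedback parameter: S = Q / lambda
lambda <- rnorm(10000, 1, 0.243) # sd chosen to give a 5-95% range of roughly 0.6-1.4 (assumption)
quantile(3.8 / lambda, c(0.05, 0.95))
# roughly 2.7 and 6.3 K, the right-skewed range quoted by Collins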

Is this right? Can we confirm the results in R? Yes, it’s very easy. We can write the second equation as a function using a single line of R.

# CO2 stabilization level implied by warming target deltaT, feedback parameter lambda and doubled-CO2 forcing Q
Cstab <- function(Cpre = 273, deltaT = 3, lambda, Q = 3.8) Cpre * exp(deltaT * lambda * log(2) / Q)

Notice that this provides default values for Cpre, deltaT and Q, leaving lambda as the input.
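
For instance, plugging in the central estimate lambda = 1 with all the defaults gives the stabilization concentration corresponding to exactly 3 K of warming:

Cstab(lambda = 1)
# around 472 ppm with the default Cpre = 273, deltaT = 3 and Q = 3.8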

Now, because R has so many built-in ways of simulating values from different distributions, it is very easy to use a Monte Carlo simulation to reproduce the example.

Let’s assume that in the first case lambda is normally distributed with mean 1 and standard deviation 0.2, giving approximately the 5–95% range quoted in the article. We can simulate 10,000 possible values with:

lambda<-rnorm(10000,1,0.2)
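
A quick check that this standard deviation roughly reproduces the quoted range for lambda:

quantile(lambda, c(0.05, 0.95))
# roughly 0.67 to 1.33, slightly narrower than Collins' 0.6-1.4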

Now, to find the 5% quantile of the stabilized concentration, use

quantile(Cstab(lambda=lambda),c(0.05))

5%
392.5294
And in the second example

lambda<-rnorm(10000,1,0.1)

quantile(Cstab(lambda=lambda),c(0.05))

5%
431.0692

So, we used standard deviations of 0.1 and 0.2, which is not exactly the same as using Collins’ quoted 5–95% ranges, and we used stochastic simulation, which is not exact, but the results are close enough to confirm the calculation. It is interesting to reflect on whether extreme precision in statistical calculations is ever really worth pursuing when there are so many additional sources of uncertainty.

In this case there is a precise analytical result available, but when several parameters are uncertain a simple Monte Carlo simulation of this kind is an incredibly useful application of R, as it extends to any function with no further effort. It is a simple task to give distributions for all the parameters of any function in exactly the same way, as sketched below.
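
For example, suppose we also treat the doubled-CO2 forcing Q as uncertain. The normal distribution and the standard deviation of 0.3 W m⁻² used here are purely illustrative assumptions, not values taken from Collins.

# Propagate uncertainty in both lambda and Q through the same function
lambda <- rnorm(10000, 1, 0.2)
Q <- rnorm(10000, 3.8, 0.3) # illustrative spread for the forcing, not from Collins
quantile(Cstab(lambda = lambda, Q = Q), c(0.05, 0.5, 0.95))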

The results can be plotted using

plot(density(Cstab(lambda = rnorm(10000, 1, 0.1)), bw = 20), col = 1, lwd = 2)
lines(density(Cstab(lambda = rnorm(10000, 1, 0.2)), bw = 20), col = 2, lwd = 2)
legend("topright", c("sd lambda=0.1", "sd lambda=0.2"), col = c(1, 2), lwd = 2)

[Figure fig3.png: density of the stabilized CO2 concentration for sd(lambda) = 0.1 and sd(lambda) = 0.2]
