Duncan Golicher’s weblog

Priority areas for dry forest restoration in La Frailesca

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on July 8th, 2008

This is a marker for work in progress. Do not cite.

This morning I was contacted in order to provide geographical information for one of our partners in the Reforlan project. The aim of the research is to find priority areas for forest restoration.

The question of how to go about the analysis is thought provoking. I have already mentioned the problem of aggregation error in this weblog. Pooling data in order to find mean values does not work at the scale in which most of the important processes act. The classic example is mean rainfall for Mexico. At the same time, some elements on the landscape cannot be mapped at a fine scale, and others, such as management units (farms or ejidos) form natural aggregation according to social factors.

So perhaps the first step would be determine priority pixels according to a limited number of physical processes that can be resonably accurately mapped at fine resolution (30 m x 30 m) then aggregate to find the number of pixels in each agronomic unit that could be restored. An analysis of social and economic factors at this scale could then be undertaken in order to discover where restoration might be feasible and bring the most social benefit.

The mappable factors at the 30m pixel scale would be

1. Forest cover and recent deforestation: Pixels that have forest cover presumably do not need restoring. Pixels that have been recently deforested may be prioritised, although natural regneration may be a suitable mechanism in some areas close to the original forest.

2. Slope: Areas with steep slopes must be prioritised for restoration in order to avoid soil erosion. Factors such as slope length may be taken into account.

3. Insolation (aspect): South facing slopes that receive intense insolation during the dry winter months may be in need of restoration, but successful restoration may be more challenging.

4. Distance from rivers: Riparian areas may be in greater need of restoration and trees may establish more successfully in moist soil. At the same time these may be the most valuable areas for irrigated  agriculture.

5. Conservation status: Reserves should perhaps take priority when restoration is being considered.

The interesting element of the research involves finding a suitable way of combing the layers in order to allow ranks to be calculated. A typical approach is weighted averaging, but this may not allow interactive effects to be included.

The map below shows the estimated current extent of the forest in the region and recent deforestation hotspots.

Shaded areas are federal reserves.

Forest cover in La Frailesca: Red and yellow areas are recently deforested: Shaded areas are federal reserves.

Areas with the steepest slopes do tend to coincide with remaining forest cover, as would be expected, but some recent deforestation both intentionally and as a result of forest fires has taken place in the steeply sloping “Sierra” region in the West and South West  and South of the study region.

Slope in degrees (high to low is red through to green passing through blue)

Slope in degrees (high to low is red through to green passing through blue)

North facing slopes receive less direct sunlight during clear winter days and are more likely to be covered in evergreen forest. Fuel dries out on south facing slopes leading to more intense fires. Regeneration may be difficult due to hydric stress.

Yellow pixels are level areas that receive the \

Winter insolation: Yellow pixels are level areas that receive the \


Most deforestation has occured close to the major rivers.

Buffers around major rivers

Buffers around major rivers

It may also be useful to evaluate soil type (not shown) although the maps tend to be at a coarse scale.

Once pixels have been ranked according to aptitude and priority for restoration (the two factors may be inversely related in many cases) they may be aggregated to agricultural management units for further analysis of socio-economic factors.

A range of social economic parameters may be associated with agricultural management units (agebs)

A range of social economic parameters may be associated with agricultural management units (agebs)

Visualising deforestation:Marques de Comillas

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on July 2nd, 2008

Our ongoing work with Conservation International has led to the first regional forest cover change map that has been derived using a single consistent methodology. The details of the study have not yet been published. However as we work on validation and interpretation of the results a pattern has become very clear.

Previous deforestation studies in Chiapas have usually concentrated attention on well defined study areas. This could produce the impression that deforestation is a homogeneous process over the whole state. In fact study areas selected for deforestation analysis are usually those with the highest rates when compared to the rest of the region. This is quite natural. Many of the areas where deforestation is no longer occurring have already lost a large proportion of their forest cover. However the overall regional deforestation we have quantified is considerably lower than previous studies imply.

Where deforestation has followed the classic pattern of forest conversion to permanent agriculture or pasture the CI methodology, which is based on Landsat imagery, has provided a remarkably good match with high resolution imagery. However where chronic, low level forest disturbance takes place the overall impact of human activities are much less easily quantified at the resolution of Landsat imagery. The difficulty in accurately evaluating forest cover change increases in areas of dry forest. Nevertheless regional patterns are robust.

The clearest deforestation hotspot in the state of Chiapas remains the Marques de Comillas area in the Southern Lacandon. Deforestation and carbon sequestration in this area has previously been studied in some detail by De Jong et al 2000

The two images below are animated gif files which change when clicked on to enlarge them to full size. The show clearly how Landsat based deforestation analysis coincides in this area with the conclusions drawn from visual analysis of recent high resolution imagery. The visual analysis is produced by overlaying our change analysis in Google Earth using Geoserver, and through the use of QGis.

Hurricanes in Chiapas

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on April 19th, 2008

I have just added the historical hurricane paths to my PostGIS data base from NOAA http://maps.csc.noaa.gov/hurricanes/

The data base has storm tracks from 1851 to 2006. The above maps shows only those since 1990. This is a very valuable resource that is well worth including in the regional data base and looking at in more detail. The interesting element as far as the state of Chiapas is concerned is that very few hurricanes actually track over the state. This is clearer when the whole (150 year) historical data set is included as shown in the figure below. However the moist air masses associated with hurricanes can cause substantial rainfall in the state even when the centre of the hurricane is some distance to the north or east.

When the map is zoomed out further a very clear cut off point emerges, forming an almost horizontal line at around 8 degrees North. This is a feature associated with  the Intertropical Convergence Zone (ITCZ),

This rather sharp transition also explains some of the strange patterns in the maps of global circulation model simulations which also show a pronounced band around the equatorial ITCZ. The cut off may alter in extent over the next century as the effects of global warming are felt. If it expands slightly northwards, then rainfall associated with hurricanes and tropical storms may also move, leading to a general decline in rainfall in southern Mexico. This could occur even if hurricane intensity and frequency in the Atlantic basin follows an overall increasing trend.

PostGIS and R continued

Posted in Climate change, Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on April 2nd, 2008

Last week a colleague sent me an article published in Global Change Biology for comment. I have just got round to looking at it in detail. The abstract is available here (with full text if you have the institutional access.)

http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-2486.2007.01446.x

The authors claim have identified a correlation at the global scale between forest loss and flooding.

I have a lot of issues with the way the data has been analysed that I intend to discuss here. However one of the positive features of this article is that all the data that used is included as an appendix. This allows anyone to make their own mind up regarding the merits of the analysis.

I will make specific comments in a later post. But I first deal with different technical issue. Having setup a postgis database for Mexico and Central America I realised that I lacked a global country map to help visualise data of this sort. I tried importing the shapefiles that come with Arcview, but with no success. They apparently have errors in the topography. I also tried converting the worldhires data in the R mapdata package, but that also had issues with bad topography (apparently American Samoa was to blame). Finally I found this reply to a R help post from the prolifically helpful Roger Bivand

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/78339.html

I tried

load(url(”http://spatial.nhh.no/R/etc/world_countries.rda”))
library(rgdal)
writeOGR(world_countries, “PG:dbname=mydb”, “world_countries”, “PostgreSQL”)

Bingo. In a couple of minutes I had a rather low resolution world countries map with ISO3166 two letter country codes ready for linkage to any data with a column that uses this international standard. It is now nicely tucked away in my postgis data base ready for use whenever needed. Very satisfying. Notice that I include this code as an example of how R connects to a postgis database once one has been set up. It obviously won’t work if you haven’t gone through the steps to establish your local data base. Here is how on Ubuntu. It is not something I would try under Windows,even though posgresql 8.2 apparently has now been ported to Windows.

I then had to reformat the data from the pdf copy of the article. This was slightly frustrating as it needed a bit of manual work in an open office spreadsheet. I then added the ISO3166 codes. This line (also suggested by Roger Bivand) helped a lot. It would be even better if editors and reviewers insisted on the use of standards when they are available.

d<-data.frame(d,ISO3166=world_countries
$ISO3166[match(toupper(as.character(d$Country)),as.character(world_countries$names))])

The nice thing about a postgis data base is that you can use it for storing all your data, spatial or otherwise, in one convenient place. So using RODBC connectivity the floods data can be incorporated by

library(RODBC)
con<-odbcConnect(”mydb”)
sqlSave(con,d,”floods”)

The only slightly fiddly aspect is that the saved table lacks a primary key. That can be resolved from R with few lines that run sql in postgresql.

odbcQuery(con,”ALTER TABLE floods ADD COLUMN index int4;”)
odbcQuery(con,”UPDATE floods SET index=cast(rownames AS integer);”)
odbcQuery(con,”ALTER TABLE floods ADD CONSTRAINT floods_pkey PRIMARY KEY(\”index\”);”)
odbcQuery(con,”ALTER TABLE floods DROP COLUMN rownames;”)

Then the tables can linked together as a temporary view from within R

sql<-”create view floods2 as select f.*, w.wkb_geometry from floods f,world_countries w where f.ISO3166=w.ISO3166″
odbcQuery(con,sql)

The view can be removed from the database whe it is finished with by

odbcQuery(con,”drop view floods2″)

With the view in place Qgis can be used for the visualisation instead of R, which is really helpful for casual analysis. This convenient interoperability, drawing on the particular strengths of a range of different tools within a common framework (statistical environment, database, GIS visualisation), is one of the great plus points of Open Source software from the point of view of a user. However you do have to discover how to do it first. You don’t get a “wizard” to help.

R does produce very good publication quality maps, but there is rather more overhead involved than when using the simple point and click graphical user interface of QGis I am now finding it much easier to send vector layers strait to postgis and work from there.

floods.png

The view can also be exported as a shapefile strait from QGis. As the wordpress site only allows me to upload files it thinks are documents I have placed it here with a false doc extension. Don’t try to open it, save the file to disk and then change the extension to .zip. floods2thisisreallyazipfile.doc

Permanent forest plots 2.

Posted in Climate change, Evidence and Ecology, Probability and statistics, R scripts, Uncategorized by Duncan Golicher on March 11th, 2008

After thinking a little more about the problem of the allometric equation in the paper that I discussed yesterday, I realised I had a partial solution already to hand. (Again please read this as a marker for an idea that could be developed further. This is one of the motivations for using an informal web log instead of leaving ideas as notes on my own hard disk.)

In order to work with a fairly simple individual based forest model I have implemented many different allometric equations in R over time. My favourites for use in data poor areas are still those derived originally by Botkin and Shugart from way back in the late seventies (see for example Shugart 1984). I have used them in both my thesis and a later implementation of the same model. Results from this are included in a recent book chapter. I aim to include the entire R code for this model on this site some point soon (after tidying up some elements and documenting).

The nice thing about the parameterisations used in these early models is that they were explicitly designed to be run with a minimal amount of knowledge. Superficially that doesn’t sound too encouraging. Shouldn’t we try to improve a model by adding more explicit detail? Not necessarily. After all purely empirical allometric equations assume no knowledge at all, apart from that provided through the data. That is their strength and weakness. In order to be used correctly you have to build them afresh for each area being studied. However if equations can be constrained by some very easily obtained field observations a sensible adjustment can be made for the conditions of each plot without going through the costly process of measuring everything afresh. The results certainly won’t be precise, but that is the point. Uncertainty can be included in the calculations through Monte Carlo simulation. The true value should then fall somewhere in the predicted range.

Botkin’s equations can turn measured diameter at breast height into estimated height and therefore estimated tree volume for any species based simply on an estimate of the maximum diameter the species can obtain and its maximum diameter. It is fundamentally a three parameter quadratic model constrained to take realistic values. As maximum height is quite site specific and can be estimated through rapidly made observations the equations are potentially very useful in the tropics where actual measurements of tree height are hard to obtain, or in cases of models of newly established forests when final height may have to be estimated from some knowledge of site quality. Notice that the equation discussed yesterday must be based implicitly on some sort of rule regarding tree height and form factor. You simply can’t get around this. The equation is a statistical black box that just takes in diameter and churns out above ground biomass. The fact that the equation doesn’t explicitly state that height has been estimated doesn’t make it robust. You simply can’t turn diameter into volume and thus biomass without making some sort of assumption regarding the height and shape of a tree, even if it is hidden within an impenetrable calculation. This is a very common failing of empirical equations and a good reason to generally mistrust their uncritical application (This is often the result of a misplaced appeal to authority, an empirical equation has been published by an important research group so it is eminently citable. Yes fine and good, but the equation still only applies to their particular data set.)

So here are the set of simple equations Botkin suggested. I have prefixed each function with f. to avoid confusing a function placed in the global environment in R with a variable.

(All the code shown below can be downloaded in this script … permanentplots2.doc)

f.B2<-function(Hmax,Dmax)2 * ((Hmax - 137) / Dmax)
f.B3<-function(Hmax,Dmax) (Hmax - 137) / Dmax ^ 2
f.BA<-function(Diameter)pi * (Diameter / 2) ^ 2
f.Height<-function(Diameter,Hmax,Dmax)137 + f.B2(Hmax,Dmax) * Diameter - f.B3(Hmax,Dmax) * Diameter ^ 2
f.AGB<-function(D,Hmax=2000,Dmax=100,FormFactor=0.8,r=0.5)r*FormFactor * f.BA(D) * f.Height(D,Hmax,Dmax)/1000

Let’s see how the equation for height works in practice. Assume maximum diameter is 120 cm, but forests (or species) vary in the maximum height they can reach. Note that as this is a quadratic (i.e. parabola) maximum diameter must coincide with the maximum in the data or the curve turns downwards.

D<-10:120

plot(D,f.Height(D,Dmax=120,Hmax=3000)/100,type=”l”,lwd=2,col=”green”,ylab=”Height (m)”,xlab=”Diameter”,ylim=c(0,30))
lines(D,f.Height(D,Dmax=120,Hmax=2000)/100,col=”blue”,lwd=2)
lines(D,f.Height(D,Dmax=120,Hmax=1000)/100,col=”red”,lwd=2)
legend(”topleft”,lwd=2,legend=c(”High (30m)”,”Medium (20m)”,”Low (10m)”),col=c(”green”,”blue”,”red”)
)

permanent-plots4.png

Now let’s compare with the formulas used in the paper and included in the previous post and multiply appropriately by basal area and form to produce volume.

plot(D,wet(0.5,D=D),type=”l”,lwd=2,col=”green”,ylab=”Above ground biomass (kg)”,xlab=”Diameter”)
lines(D,moist(0.5,D=D),col=”blue”,lwd=2)
lines(D,dry(0.5,D=D),col=”red”,lwd=2)
legend(”topleft”,lwd=2,legend=c(”wet”,”moist”,”dry”),col=c(”green”,”blue”,”red”))

points(D,f.AGB(D=D,Hmax=3500,Dmax=120),col=”blue”)
points(D,f.AGB(D=D,Hmax=2000,Dmax=120),col=”green”)
points(D,f.AGB(D=D,Hmax=1500,Dmax=120),col=”red”)
legend(”bottomright”,pch=1,legend=c(”Hmax=20m”,”Hmax=35m”,”Hmax=15m”),

col=c(”green”,”blue”,”red”))

permanent-plots6.png

So the curves are very comparable. The second set can be produced by making a single reasonable assumption regarding maximum height. In fact the equation for above ground biomass has four parameters. Maximum height, maximum diameter, “form factor” and wood density. The form factor is a term used in forestry and here it might best be expressed as “branchiness”. Due to the way tree growth occurs (the “pipestem”model) total above ground volume can never be expected to exceed the volume of a cylinder. Foresters are only interested in the trunk, but a factor ranging between 0.6 and 1 can be applied quite reasonably to biomass as a whole. Here I have set the value at 0.8. This reproduces quite well the original allometric equations. We can add in uncertainty regarding this parameter in the analysis.

This is now the point at which the equations become useful. If a lot of knowledge is available, then separate parameters can be assigned for each species. Maximum height is quite easy to get.Simply by making this justifiable assumption accuracy should improve markedly over the “one size fits all” assumption that was implicit in the paper under discussion. A nice element is that the quadratic is constrained so the meaningless estimates that can sometimes arise when using polynomials outside their range of applicability will never occur. With less information functional groupings can be used. In the worst case scenario the same equation can be used for all trees in the forest, as was apparently done with the empirical allometric equation in the paper (with the exception of including a species specific term for wood density).

Using R it is very easy to include uncertainty regarding these parameters. As has been pointed out in the previous post, this is likely to affect the conclusions drawn much more than within sample variability. The point is that bootstrapping on parameters of a polynomial allometric equation with no real life interpretation would be very dangerous. The curve could go in many unexpected directions. In contrast bootstrapping on an equation with direct interpretation is reasonable. Lets use a boxplot as a quick way of looking at the results. Lets assume a great deal of uncertainty in Hmax (20m with a sd of 3m) and some uncertainty in the form factor (0.8, sd=0.1). It is easy to extend the uncertainty to all four parameters.

boot.strap<-replicate(10000,f.AGB(D,Hmax=rnorm(1,2000,300),FormFactor=rnorm(1,0.8,0.1)) )
boot.strap<-data.frame(rep(10:120,times=10000),as.vector(boot.strap))
names(boot.strap)<-c(”D”,”V”)
boxplot(V~D,data=boot.strap,range=0.1,outline=F,col=2)

permanent-plots7.png

So we can see how uncertainty in total above ground biomass increases with the diameter of the tree. Finally lets see how this could be applied to the simulated data. If we just want an uncertain estimate of total above ground biomass by applying the bootstrapped conversion formula to a single large plot we can do it like this.

D<-rlnorm(100000,2.2,0.8 )
D<-D[D>10]
D<-D[D<120]

boot.strap<-replicate(1000,sum(f.AGB(D,Hmax=rnorm(1,2000,300),FormFactor=rnorm(1,0.8,0.1))) )
quantile(boot.strap/1000000,c(0.05,0.5,0.95))

5% 50% 95%
6.97 9.75 13.04

plot(density(boot.strap/1000000,bw=0.7),xlab=”Total above ground biomass kg x 10^6″)
abline(v=quantile(boot.strap/1000000,c(0.05,0.5,0.95)),col=c(”blue”,”black”,”red”))

permanent-plots8.png

The extension of the technique to be used on the difference between two measurement periods is trivial if the data is available. More accuracy is obtained by narrowing the range of parameters by using a separate parameter for each species, so there is clearly a cost of ignorance at work here.. This can even be quantified (cf with this post) The method can clearly also be applied to estimating future biomass yield in the case of plots being planted for fuelwood or biofuels, based on making some very simple but justifiable assumptions regarding growth form, wood density and planting density. Because a measure of uncertainty is produced this is very useful in decision making.

biomasa.doc

Permanent forest plots

Posted in Climate change, Current affairs, Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on March 9th, 2008

On friday my colleague Mario Gonzalez brought an interesting article to my attention that has been published on line in the Open Access biology journal PLoS Biology. I am a fan of this journal myself and I have tried to place a feed to the table of contents on this site (WordPress apparently doesn’t pick it up). I was particularly moved by the editors’ personal reasons for supporting Open Access.

The article in question is available by clicking here.

I was intrigued by the use of the words “Assessing evidence” in the title. The monitoring of large permanent plots has become a staple source of evidence for tropical forest ecologists. In recent years a great deal of progress has been made in understanding the processes at work in tropical forests, largely as a result of heroic efforts in a few well studied plots. Permanent plot monitoring involves a huge investment in time and labour. It produces a great deal of replication at the level of individual trees. However the amount of replication at the landscape scale is clearly limited. This leads to inevitable challenges for both statistical analysis and for the interpretation of results. I was very encouraged by the honesty and transparency in the article’s method section regarding problems associated with data quality. For example the authors explain that “For the plots with more than two censuses, we were able to correct anomalous dbh values by comparing the stem dbh growth rates across census intervals. If a tree showed a dramatic change in dbh growth rate, we changed the one outside of the range (−5 mm/y, +45 mm/y) with the likely value, and updated the dbh value accordingly. This filter was applied using a computer routine, and then checked manually.”

This sort of analysis involves a lot of work after the data has been collected. It also leads to some necessary but rather arbitrary decisions being taken. Trees generally really shouldn’t shrink over time, but anyone who has worked with this sort of data in the tropics has confronted the problem. One element that might help with this in the future is the wider use of PDAs in the field to enable real time sanity checking of data as it is recorded. Errors are often made in the data capture process and not detected until field work is completed. The level of training of technicians also tends to improve over time. So the next decade’s results should be more consistent than the last.

However this is not th element that most concerned me. Despite the interesting title I was left rather unsure whether the authors really had managed to evaluate all the evidence the data set provided. The problem in my mind arose from a common assumption that uncertainty within a statistical analysis is all attributable to variability in the data. In fact any assumption used in any calculation will change the results.. It is now becoming common practice to attempt to quantify the sensitivity of conclusions to assumptions, but this doesn’t seem to have been done here. Instead,even though a complete census was undertaken, statistical significance was based on within sample variability using an arbitrary division into sub-plots and the assumption of independence between them. It is not clear how useful this is.

In fact the main conclusion from the work would seem to be heavily reliant on the use of an allometric equation to convert diameter (usually measured at breast height) into an estimate of total oven-dry aboveground biomass, the fundamental response variable of interest.

Let’s look at how sensitive conclusions might be to this. There is a mistake in the parentheses of the functions as printed in the paper, but as I understand the equations they can be implemented in R using the following code. I have repeated the function with different name and parametrisation for each of the forest types. The important point is that, at least on my initial reading of the analysis, the authors’ applied the same function to all trees in each plot assuming the equation for a forest type applied to the whole plot.

(The code shown below is available in this text file. Use it if there are problems with quotation marks .. permanentplots1.doc )

dry<-function(r=0.5,D,a1=0.667,a2=1.784,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}

moist<-function(r=0.5,D,a1=1.499,a2=2.148,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}

wet<-function(r=0.5,D,a1=1.239,a2=1.980,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}

x<-1:120
plot(x,wet(0.5,D=x),type=”l”,lwd=2,col=”green”,ylab=”Above ground biomass”,xlab=”Diameter”)
lines(x,moist(0.5,D=x),col=”blue”,lwd=2)
lines(x,dry(0.5,D=x),col=”red”,lwd=2)
legend(”topleft”,lwd=2,legend=c(”wet”,”moist”,”dry”),col=c(”green”,”blue”,”red”))

permanent-plots.png

A point to notice is that the equations suggest that trees in moist forest have higher biomass than those in dry or wet forest. Seven of the ten plots in the paper are considered “moist” but these must apparently lie somewhere on a continuum between dry and wet. There seems to be no clear justification for using the same optimum equation for all of these plots. I am not quite sure of the units as the paper says this produces above ground biomass in Mg (tons). The units look as if they should be Kg to me.

Also these rather complex looking formula basically reproduce the same curve that could be provided by simply fitting a quadratic function with three parameters over the likely range for the data (10:120 cm lets say). Two parameters do not change.

The curves diverge only diverge sharply for larger trees and it may be argued that really large trees are rare, so this shouldn’t matter too much. In all undisturbed tropical forests that I know of there tend to be a large number of small trees and a very few large trees. Thus a simple simulated data set can be produced by slightly truncating a log normal distribution.

set.seed(1)
D1<-rlnorm(100000,2.2,0.8 )
D1<-D1[D1>10]
D1<-D1[D1<130]
hist(D1,main=”Simulated diameter distribution”,xlab=”DBH (cm)”,col=”lightgrey”)

permanent-plots1.png

With time, patience, knowledge of the plots and perhaps a suitable individual based simulation model to hand it would be possible to simulate some reasonable growth and mortality data for each tree over time. For the moment I will simply assume (quite wrongly of course) that all trees increase their diameters equally by 1cm. I will also assume no mortality at all, so an increment in biomass is assured. Not at all realistic, but the question is, how might the increment in biomass in a plot affected by the assumption made regarding the fixed paramaterisation of the allometric equation.

Let’s compare the assumption that the trees follows the dry forest formula against the assumption of the moist forest formula. The code divides the total number of trees into 200 random subplots and carries out the sort of bootstrap resampling that was used to provide confidence intervals. The test of significance used in the paper relied on whether 95% of the bootstrapped values where above zero. In this simulated case, without mortality, this is not interesting. The question is how important is the assumption made for the allometric equation.

nsubplots<-200
subplot<-as.factor(rep(1:nsubplots,each=length(D1)/nsubplots) )
D1<-D1[1:length(subplot)]
D2<-D1+1
d<-data.frame(subplot,D1,D2,moist=moist(D=D2)-moist(D=D1),dry=dry(D=D2)-dry(D=D1))
a<-with(d,tapply(moist,subplot,sum))
boot.moist<-replicate(1000,mean(a[sample(1:nsubplots,nsubplots,replace=T)]))
b<-with(d,tapply(dry,subplot,sum))
boot.dry<-replicate(1000,mean(b[sample(1:nsubplots,nsubplots,replace=T)]))

d2<-data.frame(category=rep(c(”moist”,”dry”),each=1000),bootstrap=c(boot.moist,boot.dry))
bwplot(bootstrap~category,data=d2,breaks=1000)

permanent-plots2.png

So in this simulated data, the confidence intervals attributable to the bootstrapping procedure are overwhelmed completely by the effect of changing the assumptions in the allometric equation. This was simulated data and the effect may be much less in real data. But it should be taken into account in some sense. One way of dealing with it is to bootstrap on the parameters used in the entire calculation as was done here.

Another interesting point is that although PLOs Biology is laudably open access, the raw data itself is not made available. Thus I had to simulate data to try to make a point. The future of open access research should (IMHO) be directed towards open, transparent and documented data analysis. R can play an important role in this as any analysis can be freely shared as a script. We always have to make assumptions when analysing data and some assumptions are a matter of personal preference or instinct. A future model for meta-analysis might be based on an “analysis of analyses”. The more ways a data set is looked at the more likely it is that its fundamental content will be revealed.

Deforestation in La Sepultura 2.

Posted in Climate change, R scripts, Uncategorized by Duncan Golicher on March 6th, 2008

Understanding and predicting patterns of deforestation in El Tablon.

protected-areas3.pngprotected-areas1.pngprotected-areas2.png

(Note: This is an informal marker for work in progress that will be formalised at a later date as part of the reforlan project. Any suggestions and contributions are welcome. The work has not been reviewed and the conclusions may change. Please contact me if you wish to use any part of this analysis )

tablon5.pngtablon4.pngtablon3.png

Deliberate destruction of existing forest cover is clearly a necessary component of anthropogenic deforestation. However, it is not a sufficient condition. For permanent deforestation to result there must be an impediment to forest regeneration. If vigorous regeneration occurs after timber harvest, or other forest use, then the activity meets at least one of the most basic criteria for sustainability. Unfortunately disturbance of the dry forests of La Sepultura biosphere is not being followed by regeneration. In the last decade the number of cattle grazing extensively within the buffer region of the biosphere reserve has increased rapidly. This has led not only to direct, deliberate removal of forest in order to provide pasture but also an increase in fire severity as the vegetation becomes more open and fuel dries out. Loss of vegetation leads to both chronic and acute soil erosion. Trampling and browsing by cattle prevents tree seedlings establishing. The result is generally agreed by both residents and researchers in the region to be causing degradation in the long term productivity.

tablon8.pngtablon6.pngtablon7.png

This afternoon a colleague sent me a students’ analysis of deforestation in a very different region for comments. The student had produced tables of correlation coefficients and then then fitted a multiple linear regression in order to “explain” the factors responsible for deforestation. This is common way of looking at deforestation, but not a very informative one. I was completely confused and couldn’t see the point of the analysis.

Here I will show what I would consider a more informative way to proceed. In most cases we start an analysis of deforestation with a reasonable amount of prior knowledge concerning causal processes and linkages. The aim of the analysis should be to try to gain some new insights or reinforce an important, interesting and informative linkage. If for example poorer farmers are considered more likely to deforest their land than wealthier farmers then an analysis based on a correlation between income and forest use may be useful. However all correlation analyses must be approached with extreme care. For example, if poorer farmers live in areas with poorer soils where regeneration doesn’t occur then the linkages are clearly more complex.

GIS allows us to analyse whole landscapes with complete raster coverages. This puts into very sharp relief the traditional use of statistical tests of significance. By taking all the pixels huge amounts of data points could be analysed. Significance is then virtually guaranteed for even the most minor correlation, simply by assuming (obviously wrongly) that all points are independent. Now that would clearly be bad practice. But, how many independent data points are there on any landscape? It is really impossible to tell. Fitting variograms can give a hint of typical range of autocorrelative effects but it is always a matter of degree. If one landscape represents a single altitudinal gradient, then in one sense there are no degrees of freedom to play with.

A reasonable starting position is to put on hold the idea of automated statistical inference based on known degrees of freedom (independent replication) and work on producing informative documentation of patterns. Many scientifically interesting hypotheses can be tested during this process.

R and GRASS can work together very easily under Linux. R can be run from GRASS and then on loading the package spgrass6 raster coverages can be read into memory directly form the current GRASS region.

projection: 99 (Transverse Mercator)
zone: 0
datum: wgs84
ellipsoid: a=6378137 es=0.006694379990141317
nsres: 29.99281781
ewres: 30.00276767
rows: 844
cols: 1007
cells: 849908

d<-readRAST6(c(”chisdem”,”slope”,”beam1″,”for75″,

“for87″,”for99″,”for2005″,”basin.tablon”,”streams.buffer”,”cost2″,”topindex”))

These layers are as follows:
chisdem is the digital elevation model
slope in degrees
beam1 direct beam insolation under clear skies on the first clear day of the year.
for75 forest cover in 1975
for87 forest cover in 1987
for99 forest cover in 1999
for2005 forest cover in 2005
basin.tablon watersheds with a threshold of 1000 ha
streams.buffer distances of 100,1000,2000,5000,10000,15000,100000 from water courses.
topindex

Without needing accces to GRASS the RspatialPixelsDataFrame can be loaded into memory from the Ecosur ftp site using the code below. The rest of the analysis can then be duplicated in R under Windows. Source here … tablon.doc

urlstring<-”ftp://anonymous@200.23.34.16/simulacion/Datos”
a<-paste(urlstring,”tablon.rob”,sep=”/”)
load(url(a))

Now although R is capable of fitting models using the entire coverages, this would be “overkill”. We can take either a regular, stratified or random set of points using spsample. If points are going to be fairly close together, i.e. the whole landscape pattern is of interest, then a regular sampling scheme is probably preferable. The code below samples 50,000 points from the whole bounding box, in other words the number of points falling on the masked area is rather less. The points are then overlaid on the coverages to extract the values.

a<-spsample(d,50000,”regular”)
image(d,1,col=terrain.colors(100))
points(a,pch=21,cex=0.15,bg=2)

a<-as.data.frame(overlay(d,a))

Now, how can we look at systematic patterns over the landscape? There are three potentially orthogonal coverages that might collectively form a useful starting point. They are cost of access as measured in cumulative slope percentage from the entrance to the watershed, slope itself, and direct beam radiation in the winter months (an informative variable derived from aspect). Our hypothesis, that has already been partially confirmed by visual pattern analysis, is that deforestation has affected the most accessible areas with gentler slopes first. This is admittedly something of a “no brainer” but controlling for it allows more subtle elements to be analysed. The effect of winter insolation is the most interesting as it has many applied implications for management of the buffer zone. It appears that slopes that receive more sunlight in the winter are vulnerable to deforestation on two counts. 1) Fuel dries out more quickly leading to more severe fires 2) Regeneration is prevented by hydric stress experienced by juvenile trees.

So if a clearly monotonic relationship between direct beam radiation and the probability of a pixel being deforested exists we would have clear evidence that a process is at work. This can support further research in the area and suggest more sustainable management practices. A useful modern statistical tool for looking at this is provided by generalised additive models. There are several reasons for using GAMs as a default first analysis of this complex pattern. 1) The response is binomial (forest-non forest), so scatterplots are not useful. 2) An additive model allows partialling out of effects, in other words statistically holding for the effects of variables 3) Generalised additive models use splines to depict the shape of the relationship. These are flexible curves, thus if the response is multi-modal or inconsistently “wavy” we can spot this early in the analysis and thus not be fooled by the assumption of a simple monotonic relationship if it doesn’t exist in the data. Overly complex responses tend to raise a red flag. In this context they are most probably attributable to other highly autocorrelated spatial processes overlaying the area of interst. For example if three forest plantations had been established at differing elevations a spline would be trimodal, but knowledge of the history of the area would reveal why an attempt to relate tree cover to elevation would not be an informative analysis.

So lets fit a gam and look at the results. We can use the package mgcv

library(mgcv)
mod<-gam(for2005~s(cost2)+s(beam1)+s(slope),data=a,family=”binomial”)
par(mfcol=c(2,2))
plot(mod)

tablon.png

There are several things to notice in these graphs. First the y axes show the response on the scale of the linear predictor (logit). This lies between – infinity and infinity. It is not natural to think on a logit scale. Back transformation is more usual to improve interpretability. This doesn’t matter here as we are not interested in the absolute calibration for the moment. The questions involve the shape of the relationships. The results are quite encouraging. All three are effectively monotonic and form close approximations to strait lines on the scale of the linear predictor. In other words the probability of finding a forested pixel increases with slope and inaccessibility but decreases sharply with winter direct beam insolation. The slight waviness at the top of the first curve can be attributed to the scarcity of data points at the extreme. The confidence interval widens. So we are quite justified in using a simple linear model as a descriptor of the pattern. Notice that as data points are not independent we do not even think of talking about statistical significance. However we have now picked up a clear trend over the landscape.

However now things can become more complex. General additive models are just that. Additive. In other words we can’t analyse interactions. There must be many interactive effects occurring here. In fact a complete predictive model of the landscape would be best provided by a black box such as a neural network. We can build models of arbitrary complexity form this sort of data. Even very complex uninterpretable terms are often supported. Thus model selection is a very complex issue. Lets simplify it using some pragmatism and common sense rather than statistical reasoning. We have established that winter radiation, accessibility and slope all have monotonic relationships with deforestation on the scale of the logit linear predictor. Thus we can now justify the use of classical logistic regression that can include an interaction term. Let’s ask if accessibility interacts with insolation after accounting for slope.

mod1<-glm(for2005~cost2+beam1+slope,data=a,family=”binomial”)
mod2<-glm(for2005~cost2*beam1+slope,data=a,family=”binomial”)
mod3<-glm(for2005~cost2*beam1,data=a,family=”binomial”)

anova(mod1,mod2,mod3)
Analysis of Deviance Table

Model 1: for2005 ~ cost2 + beam1 + slope
Model 2: for2005 ~ cost2 * beam1 + slope
Model 3: for2005 ~ cost2 * beam1
Resid. Df Resid. Dev Df Deviance
1 18077 17205.8
2 18076 17137.8 1 68.0
3 18077 17675.7 -1 -537.9

So adding in the interaction reduces the deviance, i.e. it produces a better fit to the data. It is particularly important to include slope per se in the model as an additional term. Notice we have still not carried out a significance test (the reduction in deviance would be significant and would be strongly supported by information criteria .

Lets look at model 2 in more detail. We can do this using the effects package.

library(effects)
plot(all.effects(mod2),x.var=”cost2″)

plot(all.effects(mod2),x.var=”beam1″)

tablon1.pngtablon2.pngtablon81.png

Note that effects plots also use the linear predictor scale, but are annotated using the response (probability of deforestation). The panel plots can be arranged using either variable. In the second example they progressively move from accessible to inaccessible (bottom left to top right). So we can conclude that the effects of winter insolation as a factor in deforestation seem to be more marked in more accessible areas even after accounting for slope. This helps to discount the possibility that we are analysing an artefact of the shadowing effect during the classification process and detecting a genuine trend. It also suggests an interesting process at work. Less remote areas have undergone a form of chronic deforestation through the cumulative effects of forest fires, uncontrolled browsing and poor regeneration that has been accentuated on south facing slopes receiving more sun in the winter months. It does indeed seem to be that deforestation is arising from poor regeneration.

We can go a step further and use the model to simulate how the historical deforestation process would have proceeded if these general relationships are assumed to have held in the past. The way to do this is to first predict the probability that a pixel is forested using the model based on the current pattern. Then arrange pixels in rank order. A simple simulation can be made by taking them in this order and deforesting in steps up to the current level.

mod<-glm(for2005~cost2*beam1+slope,data=a,family=”binomial”)
d[["pred"]]<-predict(mod,newdata=d@data)
d[["rank"]]<-rank(jitter(d[["pred"]]),na.last=”keep”)

for (i in seq(1000,90000,length=10)){
d[["defor"]]<-ifelse(d[["rank"]]<i,0,1)
image(d,”defor”,col=c(”red”,”green”))
view.3d(d,1,exag=1.5,14,pal=c(”red”,”green”))
filename <- paste(”pic”, sprintf(”%07.0f”, i),”.png”,sep=”")
rgl.snapshot(filename)
}

tablon1.gif

Now there are still some quite big issues with this. Although a reasonable match to the current pattern is provided, it is not perfect. This is probably due to nucleation effects of settlements. I will show how this can be added into the analysis and used to look at future deforestation risks in a later post.

Deforestation in La Sepultura

Posted in Climate change, Evidence and Ecology, GRASS scripts, R scripts, Uncategorized by Duncan Golicher on March 3rd, 2008

tablon.gif

The animation above first shows an accessibility index calculated as a cost surface with the entrance to the watershed as a starting point. There is an excellent “How To” on calculating cost surfaces in GRASS available here. The cost surface (warm colours accessible, cold colours inaccessible) is followed by an animation of deforestation estimated by supervised classification of a series of landsat images taken in1975, 1987, 1999 and 2003. The area is a focus case study, the Tablon watershed in the Sepaltura biosphere reserve. The set of images in Google Earth format are available here.

defor-tablon.png

Landsat imagery is the most frequently used for tracing patterns of deforestation over time. Landsat has been acquiring coverage of the Earth’s surface since 1972 when Landsat 1 was launched. Since then, four other satellites have been in operation. Landsat 1, 2, and 3 flew in a circular orbit 913 kilometers (570 miles) above the Earth’s surface and circled the Earth every 103 minutes (14 times a day). Landsat 4 and 5 fly about 705 kilometers (440 miles) above the Earth and circle every 98 minutes.

Landsat 4 and 5 are still operating. Landsat 6 was launched in 1993. Thus the sensors used to provide images have changed over time and are not easily comparable. The current Thematic Mapper (TM) class of Landsat sensors began with Landsat-4 (1982). This series continued with the nearly identical sensor on Landsat-5 (1984). Landsat-7 Enhanced Thematic Mapper Plus (ETM+) was carried into orbit in 1999. Currently, both the Landsat-5 TM and the Landsat-7 ETM+ are operational and providing data. Landsat-7 ETM+ experienced a failure of its Scan Line Corrector mechanism in May 2003. Data products have been developed to fill these gaps using other ETM+ scenes.

The images used for this analysis were downloaded from the global land cover facility and processed in GRASS. We are currently engaged in a classifying deforestation over a much larger area using similar imagery. Quantifying patterns of deforestation from these historical satellite images is never straightforward, for many reasons. Differences in the conditions when the images where taken can lead to misclassification of pixels leading to errors when documenting the overall spatial pattern. However even though it can be difficult to tell with certainty if a specific small area has changed, the general regional trends detected are quite reliable.

Recently high resolution images of this particular study region have become available and we are acquiring more. It is encouraging to find that at least the classification into “forest-non-forest” that has been achieved using Landsat imagery provides a very acceptable match with the visual impression provided by high resolution images. This can be seen below. You can check your own visual impression of the match by downloading the KML files and changing the transparency in Google Earth. The following lines load the layers into R.

urlstring<-”ftp://anonymous@200.23.34.16/simulacion/Datos”
a<-paste(urlstring,”tablondefor.rob”,sep=”/”)
load(url(a))

Visualising the consequences of global warming

Posted in Climate change, GRASS scripts, Uncategorized by Duncan Golicher on February 27th, 2008

I have just been working again on the HADCM3A2 scenario using GRASS. On taking a closer look at the model results I am surprised by dramatic consequences predicted for Mexico under this scenario in 100 years time. I subtracted the last fifty years mean monthly temperatures and mean monthly rainfall (measured as mm per day) from the means for 2000-2049 and 2050-2100 as simulated by this model. The red to green shaded figures show changes in rainfall with red as drier and green moister (note that the colour scheme is not necessarily strictly negative to positive). The blue to red maps show changes in temperature. The file name is in the top right hand corner prec50mx.1 means changes in the first fifty years in precipitation during the month of January, prec50mx.2 February etc. Note the scales on each map change depending on the absolute range.

So the fifty year scenario suggests a slight reduction in rainfall and moderate (0.5 C) warming.

test503.gif

Now look at the simulation after 100 years.

The reductions in rainfall are potentially severe and warming may be between 3 and 5 C. Even the cooler areas are positive warming, not cooling. This is the most extreme of the new IPCC SRES scenarios and one that is fundamentally similar to the old “business as usual” type scenarios.

The most surprising element of the HADCM3 model results, (that is also shown by the model under lower carbon emission scenarios) is the tendency towards drier conditions in Southern Mexico and Central America. Not all GCMs show the same regional pattern and the level of resolution of these models is still very low. Thus the results should be taken as representing one of many credible futures (see post on the ensembles project). A full treatment of the uncertainties involved in predicting the impact of climate change on the distribution and abundance of organisms should take into account as many credible futures as possible. However the prospect of a hotter, drier climate is very disturbing for conservation as many of the most vulnerable organisms are found under cooler, moister conditions.

test1001.gif


Encyclopedia of life

Posted in Climate change, Current affairs, Natural history by Duncan Golicher on February 26th, 2008

I have just registered with the Encyclopedia of Life. The idea behind the project is outlined in this BBC article.

The project aims to be the equivalent of Wikipedia in the field of systematics and ecology. If successful detailed descriptions of all the world’s known organisms could be available to anyone with an internet connection. This is a very exciting prospect, especially if the success of the project coincides with advances in the barcode of life. One of the barriers to greater understanding of the distribution and abundance of organisms remains difficulty in identification and ignorance of their characteristics. This initiative should help to democratise this knowledge and place it in the hands of field researchers. It is also a collaborative project in which we too have an obligation to contribute.

Biofuels and global warming

Posted in Climate change, Current affairs by Duncan Golicher on February 23rd, 2008

The bbc site today carried news that provides hope that the issues surrounding the use of biofuels to combat global warming are being addresed by European legislators.

The pros and cons of biofuels raise extremely complex issues. Without the time, the inclination nor the obligation to wade through the mass of calculations that are needed to reach a definitive opinion on whether biofuels can make any meaningful contribution to the fight against anthropogenic climate change I am reluctant to express an opinion.

If the scientific jury in general is still out on the issue it is only because they haven’t yet been told what the charges are. If anyone were to be asked to cast a verdict on whether biofuels alone can make a serious contribution to reducing global carbon emissions the issue would be much more quickly resolved. The photosynthetic process is simply not efficient enough to provide a meaningful proportion of our modern global economy’s energy needs. There are much better ways of turning sunlight into energy than growing plants and then burning them.

However the global issue is clouded by local concerns. There are potential winners from biofuels and their interests should of course be considered when weighing the issues. Biofuels can play a very positive role in recycling waste and giving value to forest products. If the correct checks and balances are put in place there may be a place for biofuels in our future energy balance. However to automatically label biofuels as green altenatives while so many legitimate questions have been raised would be irresponsible. There are serious concerns not only regarding  the net carbon balance involved in their production, but also the multiple undesirable side effects in terms of global food security, land use change and biodiversity loss.

A much broader issue is involved here that was touched on in a previous post. The IPCC scenarios that provide hope that carbon emissions  will be below the critical threshold all involve global rather than local action. When issues such as biofuels are under discussion it is important to evaluate how seriously the extent of future global warming is being taken. If an argument has not considered the wider system of global feedbacks doubts must be expressed. Turning food into fuel in one part of the world cannot fail to influence the behaviour of farmers in another part of the world. If behaviour change threatens forests it is clearly difficult for anyone involved in conservation and rural development to express support for the policy.

Predicting from model ensembles

Posted in Climate change, R scripts by Duncan Golicher on February 23rd, 2008

My current research requires the use of up to date predictions of climate change. These are being combined with species distribution data in order to look at impacts on diversity. Because we need to use credible predictions of climate change in our models over the last week I have been looking in some detail at the work being carried out by the ensembles project.

This is an ambitious, highly collaborative project that aims to develop an ensemble prediction system for climate change based on the principal state-of-the-art, high resolution, global and regional Earth System models. By bringing model results together within a common framework the researchers aim to quantify and even reduce the uncertainty in the representation of physical, chemical, biological and human-related feedbacks in the Earth System (including water resource, land use, and air quality issues, and carbon cycle feedbacks. In other words really big science, addressing a really big question. Fascinating stuff, even if most of the technical details are way beyond my understanding.

The most interesting, and rather novel aspect of this project is its stated aim of quantifying uncertainty through looking at multiple predictions from multiple models. It is rarely appreciated by decision makers (even some scientific reviewers can overlook the fact) that a fully quantified estimate of uncertainty requires very much more work to produce than a simple direct statement based on a single viewpoint or model. Take for example the varying estimates of carbon dioxide emissions over the next century presented in the SRES report.

emissions.png

This is a good example of the way different models with different structures and assumptions, lead to large differences in final predictions that can represented as a probability density function. Alterations in the parameterization of a single model can have the same effect. Decision makers often ask for a worst case, best case and most probable scenario. However until multiple scenarios have been evaluated the probabilities attached to each of these remain unknown. Even when large numbers of scenarios can be used ignorance of the true model structure still does not allow for a complete formalization of all the uncertain elements.

Mat Collins of the Hadley centre published an interesting and very accessible review of the issues involved in making predictions regarding future climates. collins-climate-models-ensembles-probabilities.pdf . Collins even gets to use a favourite quotation from Donald Rumsfeld. I will undoubtedly comment in more detail on the profundity of Rumsfeld’s famous lines in some later post.

Here I will look briefly at the point Collins makes regarding the relationship between model complexity and estimates of uncertainty in model predictions. Simple models have the advantage that assumptions can be specified in terms of probabilities for a certain parameter, while more complex models are less transparent. For example in aggregated energy balance models (EBMs) it is usual to specify the climate sensitivity (the global mean temperature change for a doubling of atmospheric CO2), whereas for a Global or General Circulation Model (GCM), the climate sensitivity is a function of the interaction between resolved and parameterized physical processes and cannot be specified a priori.

This means that large complex models are not necessarily “better” than small simple models. They are just doing very different jobs. Progress in climate research requires ever more complex models. The scientific process places a premium on such models and appreciates the work involved in model construction and analysis. However communication of uncertainties often requires the use of simple, intuitive models

This is good, because we can write down and analyse a small simple model in a few minutes in R, while writing and running a GCM is clearly somewhat more ambitious!

Lets take Collins example and use R to reproduce the result. Colins writes ..

“Assume that, after all our efforts, we have produced an estimate of the uncertainty in the climate feedback parameter

eq2.png

defined for an equilibrium climate with a radiative forcing of Q divided by the temperature change DT, which is normally distributed with a mean of 1.0 W mK2 KK1 and a 5–95% range of 0.6–1.4 W mK2 KK1. This is translated simply to a right-skewed distribution for climate sensitivity (owing to the inverse relationship in equation that has a 5–95% range of 2.7–6.3 K. Now, say, that we wish to stabilize atmospheric CO2 levels so as to avoid a global mean temperature rise, DT, of 3 K above pre-industrial values as that is deemed to be some ‘dangerous’ level of climate change. The radiative forcing, Q, for doubled CO2 is approximately 3.8 W mK2 and is known to vary logarithmically

eq1.png

where Cstab is future and the stabilized level of CO2 and Cpre is the pre-industrial level. Thus, in order to be 95% sure of not passing 3 K of global warming (or alternatively, accepting a 5% risk that we do), the CO2 must be stabilized at 1.4 pre-industrial levels or around 386 ppm by volume. This is approximately the CO2 concentration today.
However, assume that we might improve our estimate of the feedback parameter, producing a distribution with the same mean value but with half the 5–95% range (0.8–1.2 W mK2 KK1). Then, to be 95% sure that we do not exceed 3 K warming, we only have to stabilize at 431 ppm. We have bought society some time to develop technologies to limit emissions and allowed economic development for poorer societies.

Is this right? Can we confirm the results in R. Yes, its very easy. We can write the second equation as a function using a single line of R.

Cstab<-function(Cpre=273,deltaT=3,lambda,Q=3.8){Cpre*exp(deltaT*lambda*log(2)/Q)}

Notice this provides default values for Cpre,deltaT and Q, leaving lambda as an input.

Now because R has so many built in ways of simulating values from different distributions it is very easy to use a Monte Carlo simulation to reproduce the example.

Lets assume the first case lambda is distributed with mean 1 and sd 0.2 giving approximately the 95% confidence intervals in the article. We can simulate the distribution of 10,000 possible values with..

lambda<-rnorm(10000,1,0.2)

Now to find the quantile use

quantile(Cstab(lambda=lambda),c(0.05))

5%
392.5294
And in the second example

lambda<-rnorm(10000,1,0.1)

quantile(Cstab(lambda=lambda),c(0.05))

5%
431.0692

So, we used standard deviations of 0.1 and 0.2. This is not exactly the same as using Collins’ 95% confidence intervals. We also used stochastic simulation which is not exact, but the results is close enough to confirm the calculation. It is interesting to reflect on whether extreme precision in statistical calculations is ever really worth persuing when there are so many additional sources of uncertainty.

In this case there is a precise analytical result available, but if several parameters are uncertain the use of a simple Monte Carlo simulation in this way is an incredibly useful application for R as it extends to any function with no further effort. It is a simple task to give distributions for all the parameters of any function in exactly the same way.

The results can be plotted using

plot(density(Cstab(lambda=rnorm(10000,1,0.1)),bw=20),col=1,lwd=2)
lines(density(Cstab(lambda=rnorm(10000,1,0.2)),bw=20),col=2,lwd=2)
legend(”topright”,c(”sd lambda=0.1″,”sd lambda=0.2″),col=c(1,2),lwd=2)

fig3.png

Globalization, the internet and climate change

Posted in Climate change, Linux, Poverty in Chiapas and social issues, Uncategorized by Duncan Golicher on February 21st, 2008

Compiling data for species distribution modelling has led me to look in some depth at the IPCC SRES scenarios. Developing plausible storylines for the future development of the Global economy over the next hundred years is an extremely challenging task, However the IPCC have drawn on their considerable expertise to develop a sophisticated, consensual and inclusive set of scenarios. Without going into great detail, a very simple message struck me as important. The more optimistic scenarios always involve greater convergence between regions and a change towards a service and information economy. In other words “Globalization”.

This suggests that the internet itself could be a major factor in combating global warming. However the section of the World’s population that will suffer most of the negative consequences of global warming, the rural poor, remain largely excluded from access to the internet. I am very interested in the way developments in Open Source software can help to redress this. One of the most positive examples is the Edubuntu project that allows high quality thin client - fat server networks to be built from recycled PCs. I will continue this theme in later posts.

These are the four groups of “storylines” used in the scenario building. Note that the worst case scenario in terms of emissions and consequent climate change arises from A2 or B2.
The A1 storyline and scenario family describes a future world of very rapid economic growth, low population growth, and the rapid introduction of new and more efficient technologies. Major underlying themes are convergence among regions, capacity building, and increased cultural and social interactions, with a substantial reduction in regional differences in per capita income. The A1 scenario family develops into four groups that describe alternative directions of technological change in the energy system. Two of the fossil-intensive groups were merged in the SPM.

The A2 storyline and scenario family describes a very heterogeneous world. The underlying theme is self-reliance and preservation of local identities. Fertility patterns across regions converge very slowly, which results in high population growth. Economic development is primarily regionally oriented and per capita economic growth and technological change are more fragmented and slower than in other storylines.

The B1 storyline and scenario family describes a convergent world with the same low population growth as in the A1 storyline, but with rapid changes in economic structures toward a service and information economy, with reductions in material intensity, and the introduction of clean and resource-efficient technologies. The emphasis is on global solutions to economic, social, and environmental sustainability, including improved equity, but without additional climate initiatives

The B2 storyline and scenario family describes a world in which the emphasis is on local solutions to economic, social, and environmental sustainability. It is a world with moderate population growth, intermediate levels of economic development, and less rapid and more diverse technological change than in the B1 and A1 storylines. While the scenario is also oriented toward environmental protection and social equity, it focuses on local and regional levels.