**Understanding and predicting patterns of deforestation in El Tablon.**

*(Note: This is an informal marker for work in progress that will be formalised at a later date as part of the reforlan project. Any suggestions and contributions are welcome. The work has not been reviewed and the conclusions may change. Please contact me if you wish to use any part of this analysis.)*

Deliberate destruction of existing forest cover is clearly a necessary component of anthropogenic deforestation. However, it is not a sufficient condition. For permanent deforestation to result there must be an impediment to forest regeneration. If vigorous regeneration occurs after timber harvest, or other forest use, then the activity meets at least one of the most basic criteria for sustainability. Unfortunately, disturbance of the dry forests of the La Sepultura biosphere reserve is not being followed by regeneration. In the last decade the number of cattle grazing extensively within the buffer region of the reserve has increased rapidly. This has led not only to direct, deliberate removal of forest in order to provide pasture, but also to an increase in fire severity as the vegetation becomes more open and fuel dries out. Loss of vegetation leads to both chronic and acute soil erosion, and trampling and browsing by cattle prevent tree seedlings from establishing. Both residents and researchers in the region generally agree that the result is long-term degradation of productivity.

This afternoon a colleague sent me a student’s analysis of deforestation in a very different region for comments. The student had produced tables of correlation coefficients and then fitted a multiple linear regression in order to “explain” the factors responsible for deforestation. This is a common way of looking at deforestation, but not a very informative one. I was completely confused and couldn’t see the point of the analysis.

Here I will show what I would consider a more informative way to proceed. In most cases we start an analysis of deforestation with a reasonable amount of prior knowledge concerning causal processes and linkages. The aim of the analysis should be to gain some new insights, or to reinforce an important, interesting and informative linkage. If, for example, poorer farmers are considered more likely to deforest their land than wealthier farmers, then an analysis based on a correlation between income and forest use may be useful. However, all correlation analyses must be approached with extreme care. For example, if poorer farmers live in areas with poorer soils where regeneration doesn’t occur, then the linkages are clearly more complex.

GIS allows us to analyse whole landscapes with complete raster coverages. This puts the traditional use of statistical tests of significance into very sharp relief. By taking all the pixels, huge numbers of data points can be analysed. Significance is then virtually guaranteed for even the most minor correlation, simply by assuming (obviously wrongly) that all points are independent. That would clearly be bad practice. But how many independent data points are there on any landscape? It is really impossible to tell. Fitting variograms can give a hint of the typical range of autocorrelative effects, but it is always a matter of degree. If one landscape represents a single altitudinal gradient, then in one sense there are no degrees of freedom to play with.
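The pseudo-replication problem is easy to demonstrate with a toy simulation (this is a standalone sketch; none of it uses the real layers, and the variable names are invented): two variables that merely respond smoothly to the same spatial gradient appear overwhelmingly “significant” once every pixel is treated as independent.

```r
# Toy illustration of spurious significance under spatial autocorrelation.
# Both variables respond smoothly to a shared gradient; neither causes the other.
set.seed(1)
x <- seq(0, 10, length = 5000)               # a single gradient, 5000 "pixels"
u <- sin(x) + rnorm(5000, sd = 0.1)          # e.g. a soil variable
v <- sin(x + 0.5) + rnorm(5000, sd = 0.1)    # e.g. a vegetation index
# Treating each pixel as an independent observation:
cor.test(u, v)$p.value  # vanishingly small, despite no causal link
```

The correlation is strong and the p-value is astronomically small, yet in reality there is only one gradient, so there is arguably only a handful of independent pieces of information here.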

A reasonable starting position is to put on hold the idea of automated statistical inference based on known degrees of freedom (independent replication) and work on producing informative documentation of patterns. Many scientifically interesting hypotheses can be tested during this process.

R and GRASS work together very easily under Linux. R can be run from within a GRASS session and, on loading the package spgrass6, raster coverages can be read into memory directly from the current GRASS region. The region settings here are:

```
projection: 99 (Transverse Mercator)
zone:       0
datum:      wgs84
ellipsoid:  a=6378137 es=0.006694379990141317
nsres:      29.99281781
ewres:      30.00276767
rows:       844
cols:       1007
cells:      849908
```

```r
d <- readRAST6(c("chisdem", "slope", "beam1", "for75",
                 "for87", "for99", "for2005", "basin.tablon",
                 "streams.buffer", "cost2", "topindex"))
```

These layers are as follows:

- chisdem: the digital elevation model
- slope: slope in degrees
- beam1: direct beam insolation under clear skies on the first clear day of the year
- for75: forest cover in 1975
- for87: forest cover in 1987
- for99: forest cover in 1999
- for2005: forest cover in 2005
- basin.tablon: watersheds with a threshold of 1000 ha
- streams.buffer: distance bands of 100, 1000, 2000, 5000, 10000, 15000 and 100000 from water courses
- cost2: cost of access, measured as cumulative slope percentage from the entrance to the watershed
- topindex: the topographic index

Without needing access to GRASS, the SpatialPixelsDataFrame can be loaded into memory from the Ecosur ftp site using the code below. The rest of the analysis can then be duplicated in R under Windows. Source here … tablon.doc

```r
urlstring <- "ftp://anonymous@200.23.34.16/simulacion/Datos"
a <- paste(urlstring, "tablon.rob", sep = "/")
load(url(a))
```

Now, although R is capable of fitting models using the entire coverages, this would be overkill. We can take a regular, stratified or random set of points using spsample. If points are going to be fairly close together, i.e. the whole landscape pattern is of interest, then a regular sampling scheme is probably preferable. The code below samples 50,000 points from the whole bounding box, so the number of points falling on the masked area is rather fewer. The points are then overlaid on the coverages to extract the values.

```r
a <- spsample(d, 50000, "regular")
image(d, 1, col = terrain.colors(100))
points(a, pch = 21, cex = 0.15, bg = 2)
a <- as.data.frame(overlay(d, a))
```

Now, how can we look at systematic patterns over the landscape? There are three potentially orthogonal coverages that might collectively form a useful starting point: cost of access, measured as cumulative slope percentage from the entrance to the watershed; slope itself; and direct beam radiation in the winter months (an informative variable derived from aspect). Our hypothesis, which has already been partially confirmed by visual pattern analysis, is that deforestation has affected the most accessible areas with gentler slopes first. This is admittedly something of a “no brainer”, but controlling for it allows more subtle elements to be analysed. The effect of winter insolation is the most interesting, as it has many applied implications for management of the buffer zone. It appears that slopes receiving more sunlight in the winter are vulnerable to deforestation on two counts: 1) fuel dries out more quickly, leading to more severe fires, and 2) regeneration is prevented by the hydric stress experienced by juvenile trees.

So if a clearly monotonic relationship exists between direct beam radiation and the probability of a pixel being deforested, we would have clear evidence that a process is at work. This can support further research in the area and suggest more sustainable management practices. A useful modern statistical tool for looking at this is provided by generalised additive models (GAMs). There are several reasons for using GAMs as a default first analysis of this complex pattern. 1) The response is binomial (forest vs non-forest), so scatterplots are not useful. 2) An additive model allows partialling out of effects, in other words statistically holding constant the effects of the other variables. 3) GAMs use splines to depict the shape of each relationship. These are flexible curves, so if a response is multi-modal or inconsistently “wavy” we can spot this early in the analysis, and not be fooled by the assumption of a simple monotonic relationship that does not exist in the data. Overly complex responses tend to raise a red flag. In this context they are most probably attributable to other highly autocorrelated spatial processes overlaying the area of interest. For example, if three forest plantations had been established at differing elevations, a spline would be trimodal; but knowledge of the history of the area would reveal why an attempt to relate tree cover to elevation would not be an informative analysis.
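The plantation example is easy to mimic with simulated data (none of these numbers come from the real layers; the elevation bands and probabilities are invented): if forest presence is made likely only within three elevation bands, the fitted spline duly comes out strongly non-linear, flagging that a monotonic interpretation would be wrong.

```r
library(mgcv)
set.seed(42)
elev <- runif(5000, 0, 3000)
# three hypothetical "plantation" bands at differing elevations
inband <- (elev > 400 & elev < 600) | (elev > 1400 & elev < 1600) |
  (elev > 2400 & elev < 2600)
forest <- rbinom(5000, 1, ifelse(inband, 0.9, 0.1))
m <- gam(forest ~ s(elev), family = "binomial")
sum(summary(m)$edf)  # well above 1: the spline is strongly non-linear
plot(m)              # the trimodal shape is the red flag
```

A straight line on the logit scale would have an effective degrees of freedom near 1; the high value here is exactly the kind of overly complex response that should prompt a look at the history of the area rather than a causal interpretation.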

So let’s fit a GAM and look at the results. We can use the package mgcv.

```r
library(mgcv)
mod <- gam(for2005 ~ s(cost2) + s(beam1) + s(slope), data = a, family = "binomial")
par(mfcol = c(2, 2))
plot(mod)
```

There are several things to notice in these graphs. First, the y axes show the response on the scale of the linear predictor (logit), which lies between minus infinity and infinity. It is not natural to think on a logit scale, so back-transformation is usual to improve interpretability. That doesn’t matter here, as we are not interested in the absolute calibration for the moment; the questions involve the shape of the relationships. The results are quite encouraging. All three are effectively monotonic and form close approximations to straight lines on the scale of the linear predictor. In other words, the probability of finding a forested pixel increases with slope and inaccessibility but decreases sharply with winter direct beam insolation. The slight waviness at the top of the first curve can be attributed to the scarcity of data points at the extreme, where the confidence interval widens. So we are quite justified in using a simple linear model as a descriptor of the pattern. Notice that as the data points are not independent we do not even think of talking about statistical significance. However, we have now picked up a clear trend over the landscape.
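For readers not used to the logit scale, base R provides the back-transformation directly: plogis() maps the linear predictor to a probability, and qlogis() is its inverse. The values below are arbitrary illustrations, not fitted values from the model.

```r
# The linear predictor eta = log(p / (1 - p)) runs from -Inf to Inf;
# plogis() maps it back to a probability between 0 and 1
plogis(0)    # 0.5: a linear predictor of zero is an even chance
plogis(2)    # ~0.88
plogis(-2)   # ~0.12
qlogis(0.5)  # 0: the logit itself
```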

However, now things can become more complex. Generalised additive models are just that: additive. In other words we can’t analyse interactions, and there must be many interactive effects occurring here. In fact a complete predictive model of the landscape would be best provided by a black box such as a neural network. We can build models of arbitrary complexity from this sort of data, and even very complex, uninterpretable terms are often supported. Thus model selection is a very complex issue. Let’s simplify it using some pragmatism and common sense rather than statistical reasoning. We have established that winter radiation, accessibility and slope all have monotonic relationships with deforestation on the scale of the logit linear predictor. Thus we can now justify the use of classical logistic regression, which can include an interaction term. Let’s ask if accessibility interacts with insolation after accounting for slope.

```r
mod1 <- glm(for2005 ~ cost2 + beam1 + slope, data = a, family = "binomial")
mod2 <- glm(for2005 ~ cost2 * beam1 + slope, data = a, family = "binomial")
mod3 <- glm(for2005 ~ cost2 * beam1, data = a, family = "binomial")
anova(mod1, mod2, mod3)
```

```
Analysis of Deviance Table

Model 1: for2005 ~ cost2 + beam1 + slope
Model 2: for2005 ~ cost2 * beam1 + slope
Model 3: for2005 ~ cost2 * beam1

  Resid. Df Resid. Dev  Df Deviance
1     18077    17205.8
2     18076    17137.8   1     68.0
3     18077    17675.7  -1   -537.9
```

So adding in the interaction reduces the deviance, i.e. it produces a better fit to the data. It is particularly important to include slope per se in the model as an additional term. Notice we have still not carried out a significance test (although the reduction in deviance would be significant, and would be strongly supported by information criteria).
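The same behaviour can be checked on simulated data where an interaction genuinely exists (nothing below comes from the real coverages; the coefficients are invented): the model with the interaction term shows both a lower residual deviance and a lower AIC than the purely additive one.

```r
# Simulated logistic data with a genuine cost:beam interaction
set.seed(7)
n <- 5000
cost <- runif(n); beam <- runif(n); slope <- runif(n)
eta <- -3 + 2 * cost + 1.5 * beam + slope + 3 * cost * beam
y <- rbinom(n, 1, plogis(eta))
m1 <- glm(y ~ cost + beam + slope, family = "binomial")
m2 <- glm(y ~ cost * beam + slope, family = "binomial")
anova(m1, m2)      # deviance drops when the interaction is added
AIC(m1) > AIC(m2)  # TRUE: information criteria also prefer m2
```

Note that the deviance of the larger model is always at least as low as that of the nested one; it is the size of the drop (and the AIC comparison) that carries the information.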

Let’s look at model 2 in more detail. We can do this using the effects package.

```r
library(effects)
plot(all.effects(mod2), x.var = "cost2")
plot(all.effects(mod2), x.var = "beam1")
```

Note that effects plots also use the linear predictor scale, but are annotated using the response (probability of deforestation). The panel plots can be arranged using either variable. In the second example the panels move progressively from accessible to inaccessible (bottom left to top right). So we can conclude that the effect of winter insolation as a factor in deforestation seems to be more marked in more accessible areas, even after accounting for slope. This helps to discount the possibility that we are merely analysing an artefact of the shadowing effect during the classification process, and suggests that we are detecting a genuine trend. It also suggests an interesting process at work: less remote areas have undergone a form of chronic deforestation through the cumulative effects of forest fires, uncontrolled browsing and poor regeneration, accentuated on south-facing slopes receiving more sun in the winter months. It does indeed seem that deforestation is arising from poor regeneration.

We can go a step further and use the model to simulate how the historical deforestation process would have proceeded if these general relationships are assumed to have held in the past. The way to do this is to first predict the probability that a pixel is forested using the model based on the current pattern. Then arrange pixels in rank order. A simple simulation can be made by taking them in this order and deforesting in steps up to the current level.

```r
mod <- glm(for2005 ~ cost2 * beam1 + slope, data = a, family = "binomial")
d[["pred"]] <- predict(mod, newdata = d@data)
d[["rank"]] <- rank(jitter(d[["pred"]]), na.last = "keep")
for (i in seq(1000, 90000, length = 10)) {
  d[["defor"]] <- ifelse(d[["rank"]] < i, 0, 1)
  image(d, "defor", col = c("red", "green"))
  view.3d(d, 1, exag = 1.5, 14, pal = c("red", "green"))
  filename <- paste("pic", sprintf("%07.0f", i), ".png", sep = "")
  rgl.snapshot(filename)
}
```

Now there are still some quite big issues with this. Although a reasonable match to the current pattern is provided, it is not perfect. This is probably due to nucleation effects of settlements. I will show how this can be added into the analysis and used to look at future deforestation risks in a later post.
