Visualising deforestation:Marques de Comillas
Our ongoing work with Conservation International has led to the first regional forest cover change map that has been derived using a single consistent methodology. The details of the study have not yet been published. However as we work on validation and interpretation of the results a pattern has become very clear.
Previous deforestation studies in Chiapas have usually concentrated attention on well defined study areas. This could produce the impression that deforestation is a homogeneous process over the whole state. In fact study areas selected for deforestation analysis are usually those with the highest rates when compared to the rest of the region. This is quite natural. Many of the areas where deforestation is no longer occurring have already lost a large proportion of their forest cover. However the overall regional deforestation we have quantified is considerably lower than previous studies imply.
Where deforestation has followed the classic pattern of forest conversion to permanent agriculture or pasture the CI methodology, which is based on Landsat imagery, has provided a remarkably good match with high resolution imagery. However where chronic, low level forest disturbance takes place the overall impact of human activities are much less easily quantified at the resolution of Landsat imagery. The difficulty in accurately evaluating forest cover change increases in areas of dry forest. Nevertheless regional patterns are robust.
The clearest deforestation hotspot in the state of Chiapas remains the Marques de Comillas area in the Southern Lacandon. Deforestation and carbon sequestration in this area has previously been studied in some detail by De Jong et al 2000
The two images below are animated gif files which change when clicked on to enlarge them to full size. The show clearly how Landsat based deforestation analysis coincides in this area with the conclusions drawn from visual analysis of recent high resolution imagery. The visual analysis is produced by overlaying our change analysis in Google Earth using Geoserver, and through the use of QGis.
Deforestation:Overlaying vector and raster layers in PostGIS using GRASS
A very common operation in GIS involves overlaying vector polygons on a raster map and calculating the area of each class in the raster map. The results are then added to the attribute table of the vector map. How can this be achieved in PostGIS?
If the raster layer is quite small it could be imported as points and the result achieved through a point in polygon SQL overlay. However that is not the usual way to achieve the operation. When large raster layers derived from the analysis of satelite imagery are of interest the conventional GIS approach is to first rasterise the vector layer. The computations involved in the overlay are then simple and fast.
This example taken from our ongoing work. It overlays a map of agricultural units on a raster layer representing deforestation in the last decade derived from analysis of satelite data. Because deforestation in Chiapas has tended to take place on a rather small scale involving only a few hectares at a time, it is not at all easy to see the large scale regional pattern from an image. Calculating the proportion of the total area of each agricultural unit deforested helps to see the pattern. The map above shows the percentage of the total (forested + non-forested) area of each agricultural unit estimated to have been deforested between 1990 and 2000.
The simplest way to achieve this result with PostGIS is to import the vector layer into GRASS, where the raster is also held. There are a few issues that have to be addressed when doing this. The first is that the gid index is lost when the attributes are imported. As a unique identifier for each polygon is needed during the overlay an index should be added to the table first if the gid is the only unique identifier for the table. This is very simple.
ALTER TABLE chiapas.agros ADD COLUMN id int8;
UPDATE chiapas.agros SET id=gid;
Now enter GRASS in a location with the same projection as the PostGIS data base (typically EPSG:4326)
Now there is another technical issue. Clearly vector processing in any GIS involves a combination of the geometery and attribute tables. We have both already in PostGIS. However in GRASS we use functions that work on GRASS’s own representation of the geometry, so an import is needed. GRASS can use a wide variety of databases as a back end for storing attribute data. Among these is Postgresql. What is not obvious, and potentially quite confusing, is that we already have the data in a Postgresql data base! So, does it all need to be imported again? The answer seems to be yes and no. It is quite possible to link the GRASS topology with the original table. However it is much easier to import it all in the usual way using v.in.ogr. This produces a new table, that could either stored in an external data base or perhaps even within the original PostGIS data base (as a new copy). Of course this attribute only table lacks the geometry column.
For example to set up GRASS to use the original local PostGIS data base as the backend for vector layers
db.connect driver=pg database=”host=localhost,dbname=gisdb2″
db.login user=Duncan pass=******
Note that this step is not necessary. By default Grass will store the table as in dbf format.
A layer can be imported from PostGIS using
v.in.ogr dsn=”PG:host=localhost dbname=gisdb2 user=Duncan” layer=chiapas.agros output=agros type=boundary,centroid
If the raster layer is in another coordinate system, as it is in this case, it will be necessary to exit the location and reproject in the location of the raster layer.
v.proj input=agros location=Global mapset=mesoanalysis output=agros
Then form a raster layer.
v.to.rast input=agros output=agros use=attr column=id layer=1 value=1
Then in GRASS the overlay is achieved quite simply using r.stats and sending the output to a text file.
r.stats -a input=agros,defor >agros.txt
This forms a file in which the first column holds the id of the vector layer, the second the class in the raster layer and the third the area in square meters of each combination. For example…
1 111 1319094.000000
1 112 159201.000000
1 122 1041304.500000
1 211 4026323.250000
1 222 85285437.750000
1 244 89347.500000
1 422 52796.250000
1 444 1749586.500000
2 111 565326.000000
So the task is now done, apart from the job of adding the data back into the original PostGIS table. There are many ways to do this. Often only one, or a few focal classes are of sufficient interest to merit a new column. So an easy way to process the output and send it back to PostGIS involves R.
Connect to the PostGIS data base using RODBC and set the schema
library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)
Read in the data and set the first column as numeric, removing any missing data (by default it will probably have been interpreted as a factor as GRASS will use a * for missing data).
d<-read.table(”agros.txt”)
d$V1<-as.numeric(as.character(d$V1))
d<-na.omit(d)
Now a subset of the data can be taken for each of the columns of interest, named and added back to the data base. In this case code “111″ represents area that was classified as forest in 1990,2000 and 2005. “122″ pixels that were deforested in the period between 1990 and 2000.
odbcQuery(con,”ALTER TABLE agros ADD COLUMN forest int4;”)
d2<-subset(d,d$V2==111)
d2<-d2[,-2]
names(d2)<-c(”gid”,”forest”)
d2$forest<-round(d2$forest/10000,0)
sqlUpdate(con,d2,”agros”)
odbcQuery(con,”ALTER TABLE agros ADD COLUMN defor2000 int4;”)
d2<-subset(d,d$V2==122)
d2<-d2[,-2]
names(d2)<-c(”gid”,”defor2000″)
d2$defor90<-round(d2$defor2000/10000,0)
sqlUpdate(con,d2,”agros”)
We are rather concerned that this cover change analysis, that has been achieved with the standard methodology used by Conservation International, tends to quite seriously underestimate small scale disturbance and canopy opening. Nevertheless in comparative terms it does demonstrate that absolute deforestation in Chiapas is certainly not a homogeneous process. Extensive deforestation is not occurring, although if the map is zoomed out further to show the Marques de Comillas area a very clear hotspot of deforestation is shown. In this part of Chiapas (Central Valley, Sierra and Altos) comparatively few agricultural units have deforested more than 2% of their total area in the last decade.
The other way of looking at this would be as the proportion of remaining forest lost. This woud tend to emphasise deforestation rates in ejidos and agricultural units with a very small ammount of forest in 1990. However at this level the approach would be somewhat unfair as it would be biased by classification errors in small areas.
To put the map in context a map with the proportion of the total area classified as forest can be used. In R the column can be added using
odbcQuery(con,”ALTER TABLE agros ADD COLUMN propforest float;”)
odbcQuery(con,”update agros set propforest=100*forest/hectares;”)
A surprising number of the agricultural units have over 40% tree cover. This has to be placed in the context of a classification method that draws no distinction between a pixel within a pasture with sufficient cover of secondary vegetation to be classified as forest and untouched primary forest. The next step clearly involves analysis of fragmentation in order to clarify this.
Landscape of the Highlands of Chiapas
While backing up some old information on my hard drive I rediscovered a document I produced for Pronatura at the end of November 2003. It lies somewhere between the status of informal field notes and a formal technical report. I presented it to Pronatura a week after a field trip in a light plane. The idea of the document was to record the discussion during the flight.
This was very much at the end of the “pre-google earth” era. Spatial analysis was still rather more of a specialised activity than it is today and I had only begun to become interested in the subject myself. My previous research was generally non-spatial.
Now a “fly over” of the region can be achieved rather more comfortably and safely with a laptop than a four seater plane. The experience was stomach churning at times with a few moments of sheer terror as the turbulence caused by thermals rising from the central depression hit.
Despite the fact that some of the analytical methods used were hurriedly applied and could certainly be refined, I still generally agree with most of the conclusions in the document.So I have placed it here in case it could be useful as a general introduction to the region. It is not directly citable, but similar statements to those included in the document have been made in peer reviewed work I have published together with Luis Cayuela, although there have been very slight differences in emphasis between my interpretation of patterns and causes of deforestation in the region and those adopted by Luis Cayuela and some other colleagues.
Here is the document in PDF form ….
Floods and forest cover 1
Good decision making should be based on good scientific advice. It is clearly the job of the scientific community to support decision makers whenever they feel able to do so. Floods cause tremendous suffering, not only in terms of loss of life but in the immense disruption caused to lives, livelihoods and economies. Most residents of Southern Mexico have been affected in some way by floods in Tapachula (2005) and Villa Hermosa (2007). Fortunately these severe events caused comparatively few lives to be lost directly. However their consequences to the regional economy were extremely grave. In both cases politicians in the lower part of the watersheds cited deforestation as a major contributory factor.
Conservation of the native forests of Chiapas will not be achieved unless the full value of the services they provide is appreciated by society. Yet the linkages between flooding and deforestation are complex. When a study claims to provide new evidence regarding linkages between deforestation and flooding it must be considered with some care.
Comments on the scientific value of the paper I cited in the previous post should be based only on evidence. I therefore checked most of the websites the authors used to derive their information before analyzing the paper. An important source was the world atlas of flood events at Dartmouth College
http://www.dartmouth.edu/%7Efloods/archiveatlas/index.htm
The data found there can be mapped directly using their web site as shown below.
I also downloaded and reformatted the raw data and moved it into postgis. It can be used as a shapefile through the zipped file here floodevents(change-extension-to-zip).doc
The data is clearly of variable quality. This is par for the course. Prior to 2006 the geographical coordinates were not entered explicitly in this particular data base. All the original raw data from the Dartmouth data (that I downloaded in Excel format) including textual descriptions of the floods can be loaded strait into R using read.table by changing the extension of the following file to txt flood_events.doc. Here are some quick (Qgis) maps derived from the data showing loss of life from the floods that count with coordinates.
There are various points to note before I look at the statistical analysis used in the article in greater detail. Southern and Eastern Asia seems to be particularly flood prone. Most floods on this continent occur around coastlines, on the lowlying floodplains and in the Himalayan watersheds. Africa has comparatively few floods. Flooding in North and Central America tends to be associated with the effects of tropical depressions and hurricanes.
The patterns of flooding at this scale seem to be determined largely by global circulation patterns, but the consequences and probability of being reported will be determined by settlement patterns and economic activity. Floods are rarely reported from the Amazon basin, even though much of the low lying areas of forest (várzea) are seasonally flooded and these floods vary in severity tremendously between years. These sort of details interest forest ecologists trying to trace tree distribution patterns but never make news because they do not affect human lives.
Deforestation in La Sepultura 2.
Understanding and predicting patterns of deforestation in El Tablon.
(Note: This is an informal marker for work in progress that will be formalised at a later date as part of the reforlan project. Any suggestions and contributions are welcome. The work has not been reviewed and the conclusions may change. Please contact me if you wish to use any part of this analysis )
Deliberate destruction of existing forest cover is clearly a necessary component of anthropogenic deforestation. However, it is not a sufficient condition. For permanent deforestation to result there must be an impediment to forest regeneration. If vigorous regeneration occurs after timber harvest, or other forest use, then the activity meets at least one of the most basic criteria for sustainability. Unfortunately disturbance of the dry forests of La Sepultura biosphere is not being followed by regeneration. In the last decade the number of cattle grazing extensively within the buffer region of the biosphere reserve has increased rapidly. This has led not only to direct, deliberate removal of forest in order to provide pasture but also an increase in fire severity as the vegetation becomes more open and fuel dries out. Loss of vegetation leads to both chronic and acute soil erosion. Trampling and browsing by cattle prevents tree seedlings establishing. The result is generally agreed by both residents and researchers in the region to be causing degradation in the long term productivity.
This afternoon a colleague sent me a students’ analysis of deforestation in a very different region for comments. The student had produced tables of correlation coefficients and then then fitted a multiple linear regression in order to “explain” the factors responsible for deforestation. This is common way of looking at deforestation, but not a very informative one. I was completely confused and couldn’t see the point of the analysis.
Here I will show what I would consider a more informative way to proceed. In most cases we start an analysis of deforestation with a reasonable amount of prior knowledge concerning causal processes and linkages. The aim of the analysis should be to try to gain some new insights or reinforce an important, interesting and informative linkage. If for example poorer farmers are considered more likely to deforest their land than wealthier farmers then an analysis based on a correlation between income and forest use may be useful. However all correlation analyses must be approached with extreme care. For example, if poorer farmers live in areas with poorer soils where regeneration doesn’t occur then the linkages are clearly more complex.
GIS allows us to analyse whole landscapes with complete raster coverages. This puts into very sharp relief the traditional use of statistical tests of significance. By taking all the pixels huge amounts of data points could be analysed. Significance is then virtually guaranteed for even the most minor correlation, simply by assuming (obviously wrongly) that all points are independent. Now that would clearly be bad practice. But, how many independent data points are there on any landscape? It is really impossible to tell. Fitting variograms can give a hint of typical range of autocorrelative effects but it is always a matter of degree. If one landscape represents a single altitudinal gradient, then in one sense there are no degrees of freedom to play with.
A reasonable starting position is to put on hold the idea of automated statistical inference based on known degrees of freedom (independent replication) and work on producing informative documentation of patterns. Many scientifically interesting hypotheses can be tested during this process.
R and GRASS can work together very easily under Linux. R can be run from GRASS and then on loading the package spgrass6 raster coverages can be read into memory directly form the current GRASS region.
projection: 99 (Transverse Mercator)
zone: 0
datum: wgs84
ellipsoid: a=6378137 es=0.006694379990141317
nsres: 29.99281781
ewres: 30.00276767
rows: 844
cols: 1007
cells: 849908
d<-readRAST6(c(”chisdem”,”slope”,”beam1″,”for75″,
“for87″,”for99″,”for2005″,”basin.tablon”,”streams.buffer”,”cost2″,”topindex”))
These layers are as follows:
chisdem is the digital elevation model
slope in degrees
beam1 direct beam insolation under clear skies on the first clear day of the year.
for75 forest cover in 1975
for87 forest cover in 1987
for99 forest cover in 1999
for2005 forest cover in 2005
basin.tablon watersheds with a threshold of 1000 ha
streams.buffer distances of 100,1000,2000,5000,10000,15000,100000 from water courses.
topindex
Without needing accces to GRASS the RspatialPixelsDataFrame can be loaded into memory from the Ecosur ftp site using the code below. The rest of the analysis can then be duplicated in R under Windows. Source here … tablon.doc
urlstring<-”ftp://anonymous@200.23.34.16/simulacion/Datos”
a<-paste(urlstring,”tablon.rob”,sep=”/”)
load(url(a))
Now although R is capable of fitting models using the entire coverages, this would be “overkill”. We can take either a regular, stratified or random set of points using spsample. If points are going to be fairly close together, i.e. the whole landscape pattern is of interest, then a regular sampling scheme is probably preferable. The code below samples 50,000 points from the whole bounding box, in other words the number of points falling on the masked area is rather less. The points are then overlaid on the coverages to extract the values.
a<-spsample(d,50000,”regular”)
image(d,1,col=terrain.colors(100))
points(a,pch=21,cex=0.15,bg=2)
a<-as.data.frame(overlay(d,a))
Now, how can we look at systematic patterns over the landscape? There are three potentially orthogonal coverages that might collectively form a useful starting point. They are cost of access as measured in cumulative slope percentage from the entrance to the watershed, slope itself, and direct beam radiation in the winter months (an informative variable derived from aspect). Our hypothesis, that has already been partially confirmed by visual pattern analysis, is that deforestation has affected the most accessible areas with gentler slopes first. This is admittedly something of a “no brainer” but controlling for it allows more subtle elements to be analysed. The effect of winter insolation is the most interesting as it has many applied implications for management of the buffer zone. It appears that slopes that receive more sunlight in the winter are vulnerable to deforestation on two counts. 1) Fuel dries out more quickly leading to more severe fires 2) Regeneration is prevented by hydric stress experienced by juvenile trees.
So if a clearly monotonic relationship between direct beam radiation and the probability of a pixel being deforested exists we would have clear evidence that a process is at work. This can support further research in the area and suggest more sustainable management practices. A useful modern statistical tool for looking at this is provided by generalised additive models. There are several reasons for using GAMs as a default first analysis of this complex pattern. 1) The response is binomial (forest-non forest), so scatterplots are not useful. 2) An additive model allows partialling out of effects, in other words statistically holding for the effects of variables 3) Generalised additive models use splines to depict the shape of the relationship. These are flexible curves, thus if the response is multi-modal or inconsistently “wavy” we can spot this early in the analysis and thus not be fooled by the assumption of a simple monotonic relationship if it doesn’t exist in the data. Overly complex responses tend to raise a red flag. In this context they are most probably attributable to other highly autocorrelated spatial processes overlaying the area of interst. For example if three forest plantations had been established at differing elevations a spline would be trimodal, but knowledge of the history of the area would reveal why an attempt to relate tree cover to elevation would not be an informative analysis.
So lets fit a gam and look at the results. We can use the package mgcv
library(mgcv)
mod<-gam(for2005~s(cost2)+s(beam1)+s(slope),data=a,family=”binomial”)
par(mfcol=c(2,2))
plot(mod)
![]()
There are several things to notice in these graphs. First the y axes show the response on the scale of the linear predictor (logit). This lies between – infinity and infinity. It is not natural to think on a logit scale. Back transformation is more usual to improve interpretability. This doesn’t matter here as we are not interested in the absolute calibration for the moment. The questions involve the shape of the relationships. The results are quite encouraging. All three are effectively monotonic and form close approximations to strait lines on the scale of the linear predictor. In other words the probability of finding a forested pixel increases with slope and inaccessibility but decreases sharply with winter direct beam insolation. The slight waviness at the top of the first curve can be attributed to the scarcity of data points at the extreme. The confidence interval widens. So we are quite justified in using a simple linear model as a descriptor of the pattern. Notice that as data points are not independent we do not even think of talking about statistical significance. However we have now picked up a clear trend over the landscape.
However now things can become more complex. General additive models are just that. Additive. In other words we can’t analyse interactions. There must be many interactive effects occurring here. In fact a complete predictive model of the landscape would be best provided by a black box such as a neural network. We can build models of arbitrary complexity form this sort of data. Even very complex uninterpretable terms are often supported. Thus model selection is a very complex issue. Lets simplify it using some pragmatism and common sense rather than statistical reasoning. We have established that winter radiation, accessibility and slope all have monotonic relationships with deforestation on the scale of the logit linear predictor. Thus we can now justify the use of classical logistic regression that can include an interaction term. Let’s ask if accessibility interacts with insolation after accounting for slope.
mod1<-glm(for2005~cost2+beam1+slope,data=a,family=”binomial”)
mod2<-glm(for2005~cost2*beam1+slope,data=a,family=”binomial”)
mod3<-glm(for2005~cost2*beam1,data=a,family=”binomial”)
anova(mod1,mod2,mod3)
Analysis of Deviance Table
Model 1: for2005 ~ cost2 + beam1 + slope
Model 2: for2005 ~ cost2 * beam1 + slope
Model 3: for2005 ~ cost2 * beam1
Resid. Df Resid. Dev Df Deviance
1 18077 17205.8
2 18076 17137.8 1 68.0
3 18077 17675.7 -1 -537.9
So adding in the interaction reduces the deviance, i.e. it produces a better fit to the data. It is particularly important to include slope per se in the model as an additional term. Notice we have still not carried out a significance test (the reduction in deviance would be significant and would be strongly supported by information criteria .
Lets look at model 2 in more detail. We can do this using the effects package.
library(effects)
plot(all.effects(mod2),x.var=”cost2″)
plot(all.effects(mod2),x.var=”beam1″)
Note that effects plots also use the linear predictor scale, but are annotated using the response (probability of deforestation). The panel plots can be arranged using either variable. In the second example they progressively move from accessible to inaccessible (bottom left to top right). So we can conclude that the effects of winter insolation as a factor in deforestation seem to be more marked in more accessible areas even after accounting for slope. This helps to discount the possibility that we are analysing an artefact of the shadowing effect during the classification process and detecting a genuine trend. It also suggests an interesting process at work. Less remote areas have undergone a form of chronic deforestation through the cumulative effects of forest fires, uncontrolled browsing and poor regeneration that has been accentuated on south facing slopes receiving more sun in the winter months. It does indeed seem to be that deforestation is arising from poor regeneration.
We can go a step further and use the model to simulate how the historical deforestation process would have proceeded if these general relationships are assumed to have held in the past. The way to do this is to first predict the probability that a pixel is forested using the model based on the current pattern. Then arrange pixels in rank order. A simple simulation can be made by taking them in this order and deforesting in steps up to the current level.
mod<-glm(for2005~cost2*beam1+slope,data=a,family=”binomial”)
d[["pred"]]<-predict(mod,newdata=d@data)
d[["rank"]]<-rank(jitter(d[["pred"]]),na.last=”keep”)
for (i in seq(1000,90000,length=10)){
d[["defor"]]<-ifelse(d[["rank"]]<i,0,1)
image(d,”defor”,col=c(”red”,”green”))
view.3d(d,1,exag=1.5,14,pal=c(”red”,”green”))
filename <- paste(”pic”, sprintf(”%07.0f”, i),”.png”,sep=”")
rgl.snapshot(filename)
}
Now there are still some quite big issues with this. Although a reasonable match to the current pattern is provided, it is not perfect. This is probably due to nucleation effects of settlements. I will show how this can be added into the analysis and used to look at future deforestation risks in a later post.
Deforestation in La Sepultura
The animation above first shows an accessibility index calculated as a cost surface with the entrance to the watershed as a starting point. There is an excellent “How To” on calculating cost surfaces in GRASS available here. The cost surface (warm colours accessible, cold colours inaccessible) is followed by an animation of deforestation estimated by supervised classification of a series of landsat images taken in1975, 1987, 1999 and 2003. The area is a focus case study, the Tablon watershed in the Sepaltura biosphere reserve. The set of images in Google Earth format are available here.
Landsat imagery is the most frequently used for tracing patterns of deforestation over time. Landsat has been acquiring coverage of the Earth’s surface since 1972 when Landsat 1 was launched. Since then, four other satellites have been in operation. Landsat 1, 2, and 3 flew in a circular orbit 913 kilometers (570 miles) above the Earth’s surface and circled the Earth every 103 minutes (14 times a day). Landsat 4 and 5 fly about 705 kilometers (440 miles) above the Earth and circle every 98 minutes.
Landsat 4 and 5 are still operating. Landsat 6 was launched in 1993. Thus the sensors used to provide images have changed over time and are not easily comparable. The current Thematic Mapper (TM) class of Landsat sensors began with Landsat-4 (1982). This series continued with the nearly identical sensor on Landsat-5 (1984). Landsat-7 Enhanced Thematic Mapper Plus (ETM+) was carried into orbit in 1999. Currently, both the Landsat-5 TM and the Landsat-7 ETM+ are operational and providing data. Landsat-7 ETM+ experienced a failure of its Scan Line Corrector mechanism in May 2003. Data products have been developed to fill these gaps using other ETM+ scenes.
The images used for this analysis were downloaded from the global land cover facility and processed in GRASS. We are currently engaged in a classifying deforestation over a much larger area using similar imagery. Quantifying patterns of deforestation from these historical satellite images is never straightforward, for many reasons. Differences in the conditions when the images where taken can lead to misclassification of pixels leading to errors when documenting the overall spatial pattern. However even though it can be difficult to tell with certainty if a specific small area has changed, the general regional trends detected are quite reliable.
Recently high resolution images of this particular study region have become available and we are acquiring more. It is encouraging to find that at least the classification into “forest-non-forest” that has been achieved using Landsat imagery provides a very acceptable match with the visual impression provided by high resolution images. This can be seen below. You can check your own visual impression of the match by downloading the KML files and changing the transparency in Google Earth. The following lines load the layers into R.
urlstring<-”ftp://anonymous@200.23.34.16/simulacion/Datos”
a<-paste(urlstring,”tablondefor.rob”,sep=”/”)
load(url(a))






