Duncan Golicher’s weblog

“Raster” data in PostGIS?

Posted in Linux, POSTGIS, R scripts, Uncategorized by Duncan Golicher on July 5th, 2008


PostGIS does not support raster data as yet. Hopefully there should be raster support soon. Raster layers can be visualised from a local or remote server using geoserver. However having raster layers integrated into a PostGIS data base would be extremely useful for tasks such as overlaying herbarium records with continuous extrapolated climate data or elevation.

I have previously experimented with importing the points of a raster layer. This works quite well. It can be used for simple spatial overlays using proximity operations. However the visual impression produced by the points is not very satisfactory and a query using the contains operator would be more efficient.

QGIS has a plugin for graticule generation. This forms a grid of individual polygons. So if a grid is produced using the graticule plugin of QGIS and then imported into PostGIS using SPIT it is possible to partially mimic a raster layer, although with some computational overhead.

The document mnp-workshop3.pdf showed how worldclim data (http://www.worldclim.org/) can be imported into GRASS and R.

If R is run from GRASS the following lines can be used to import the appropriate raster layers at a chosen resolution into R.

Most of the lines of the script are concerned with calculations on the basic worldclim data. The dataset consists of 36 grids of monthly mean minimum temperature, maximum temperature and precipitation. Some mean annual values may be derived for simplicity.

a<-c(paste(”tmin”,1:12,sep=”"),paste(”tmax”,1:12,sep=”"),paste(”prec”,1:12,sep=”") ,”alt”)
d<-readRAST6(a)
d@data[,1:24]<-lapply(d@data[,1:24],function(x)x/10)
fullgrid(d)<-FALSE

f<-function(x)mean(x[1:12])
d@data[["meanmim"]]<-apply(d@data,1,f)

f<-function(x)mean(x[13:24])
d@data[["meanmax"]]<-apply(d@data,1,f)

f<-function(x)mean(x[1:24])
d@data[["meantemp"]]<-apply(d@data,1,f)

f<-function(x)sum(x[25:36])
d@data[["totprec"]]<-apply(d@data,1,f)

The trick for importing the layers to PostGIS from R is shown below. If the data is in R as a spatialPixelDataFrame the coordinates can be added to the data to form a table.

dd<-data.frame(coordinates(d),d@data)

This table can then be saved to a PostGIS database using ODBC.

library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)

sqlSave(con,dd,”worldclim”)

###########################################

Now either in pgadmin or running the queries directly through R by placing them inside an odbcQuery function.

ALTER TABLE chiapas.worldclim add column gid serial NOT NULL;
ALTER TABLE chiapas.worldclim ADD CONSTRAINT worldclim_pkey PRIMARY KEY(gid);
ALTER TABLE chiapas.worldclim ADD COLUMN the_geom geometry;

UPDATE chiapas.worldclim
SET the_geom = PointFromText(’POINT(’ || s1 || ‘ ‘ || s2 || ‘)’,4326);

CREATE INDEX
grat30arc_idx_geo on grat30arc USING GIST(the_geom GIST_GEOMETRY_OPS);

CREATE INDEX
worldclim_idx_geo on chiapas.worldclim USING GIST(the_geom GIST_GEOMETRY_OPS);

If a polygon graticule called grat30arc covering the same region with a 30 arch second grid has been produced using the graticule plugin in Qgis and imported into the public schema then the average values of the points within each grid square can then be calculated for the graticule and a new table produced using this as a model query.

CREATE TABLE chiapas.worldclim2 with oids as
SELECT a.*, g.totprec, g.meanmin, g.meanmax, g.elev from
grat30arc a,

(SELECT AVG(c.totprec) as totprec,
AVG(c.meanmim) as meanmin,
AVG(c.meanmax) as meanmax,
AVG(c.alt) as elev,d.gid
FROM chiapas.worldclim c, grat30arc d
WHERE c.the_geom && d.the_geom
AND contains(d.the_geom, c.the_geom) group by d.gid) g

WHERE a.gid=g.gid;

The result is very similar to the visualisation of an arcgrid in arcview when it is viewed in QGis over a shaded relief map. In this case the pixels are approximately 1 km x 1 km (30 arc seconds or 0.008333333 degrees) and the elevation lines up well with the contours derived from a 1:250,000 map.

Spatial indices on the herbarium table and the worldclim2 layer with the following

CREATE INDEX
herb_idx_geo on herb.herbario USING GIST(the_geom GIST_GEOMETRY_OPS);

CREATE INDEX
worldclim2_idx_geo on chiapas.worldclim2 USING GIST(the_geom GIST_GEOMETRY_OPS);

Then the following query runs in less than four seconds to provide 6,000 herbarium records with extrapolated climate data to the nearest km.

SELECT h.*, w.totprec, w.elev,w.meanmin,w.meanmax
FROM herb.herbario h, chiapas.worldclim2 w
WHERE h.the_geom && w.the_geom
AND contains(w.the_geom, h.the_geom);

Or

select h.*, w.totprec, w.elev,w.meanmin,w.meanmax
from herb.herbario h, chiapas.worldclim2 w
where ST_contains(w.the_geom, h.the_geom);


Tagged with: , , , ,

An example of using PostGIS from R

Posted in POSTGIS, R scripts, Uncategorized by Duncan Golicher on April 19th, 2008

One of the interesting elements of PostGIS is that it can be run from R using RODBC. I have found this to be very useful, especially when tables are being altered, as it allows the properties of the results of a query to be quickly analysed and checked for sanity before being added to the data base. So in order to work with POSTGIS I usually open QGIS, PgAdmin and an R console and switch between them.

Here is an example of how you might calculate the population density within polygons and add the results to a PostGIS table.

In R first connect to a data base assuming an ODBC connector has been set up

library(RODBC)
con<-odbcConnect(”mydb”)

Now the example will find the sum of the population within polygons representing watersheds. This layer was originallybuilt using r.watershed in GRASS followed by r.to.vect. The automated watershed limitation doesn’t work very well along the coast, but provides a useful set if polygons for the rest of Chiapas, particularly the central valley. First add a new column to hold the result.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN poptot int4;”)

Then the query that calculates the total population can be run from R. The result can by looked at first in R to check that it is sensible. This is useful, as I found differences between using the 2000 census data and the 2005 population count that are not attributable to population change or migration, but rather are caused by errors in the geopositioning of the data.

sql<-”select sum(m.z1) as poptot, b.id from chiapas_conteo_2005 m, chiapas_basins b where m.the_geom && b.the_geom AND contains(b.the_geom, m.the_geom) group by b.id;”

d<-sqlQuery(con,sql)

This takes a while to run. I quickly discovered that it is extremely important to make sure that a GIST index is added before attempting point in polygon operations. This can be added to each table in PgAdmin, for example.

CREATE INDEX chiapas_basins_idx_geo on chiapas_basins USING GIST(the_geom GIST_GEOMETRY_OPS);

Now histograms and totals of the population in the watersheds can be produced in R to look at the results before adding them back into the data base.

For example

hist(log(d$poptot))

The population density is typically log normal.

sum(d$poptot)
4255957

This gives Chiapas a total population of 4.25 million which looks about right. As the query returned the results with a column name that has already been set up and has the primary key this line will update the database.

sqlUpdate(con,d,”chiapas_basins”)

Now we need to calculate the area of the watersheds in km2. To do this I first had to insert an srid for Albers conic projection for North America. This can be found here

http://www.spatialreference.org/ref/epsg/102008/

Then some more queries that can either be run in pgAdmin or wrapped up by odbcQuery(con, “some sql statements “) and run from R directly.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN areakm2 float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET areakm2 = area(transform(the_geom,9102008))/1000000; “)

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN density float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET density = poptot/areakm2; “)

The results can then be visualised in QGIS. Whether this is easier than using ARCGIS certainly depends on what you are used to. I am still working out how to achieve fairly basic operations like this in PostGIS. However when I have achieved them once, it is then easy to repeat the operation for other examples, which is why I am posting fairly simple examples like this to the weblog for reference in case they can help others to overcome the intially steep learning curve with PostGIS. Although the geoprocessing wizard in ArcView is very simple to use and the point in polygon operation does seem to run much faster than the corresponding query in PostGIS, this particular operation would still require some additional work with the database tables if it were to be done using ArcView alone.

Tagged with: , ,

Code 9

Posted in Family, R scripts by Duncan Golicher on March 19th, 2008

During the holiday Mickey and I have been watching a program called brainbox challenge on the BBC. Neither of us can keep up with any of the competitors, although Mickey sometimes beats them to it on the memory games. One of the games is called code 9. The rules are (fairly) simple. The competitors are given a mathematical operation to perform on two numbers. However each digit of the numbers has first to be coded by subtracting it from 9. Thus 1 becomes 8; 12, becomes 87 etc. After completing the operation the competitors have to apply the code again to the answer. So 1 +1 becomes 8+8=16, which is then recoded to give the answer 83. As I just can’t get my brain to do it unaided I thought of writing a function in R to help.

This line carries out the operation on whole positive numbers.

code9<-function(x)as.numeric(paste(unlist(lapply(strsplit(as.character(x),”"),
function(x)9-as.numeric(x))),collapse=”"))

so

code9(code9(10)+code9(10))
[1] 821

This works as a one liner, but there should be a better way requiring fewer steps within the function using a regular expression. 

Blue marble and Landsat in R

Posted in GRASS scripts, Linux, R scripts by Duncan Golicher on March 13th, 2008

bluemarble.pnglandsat.png

R is of course a statistical environment, so it really shouldn’t be expected to be doing the job of GRASS (Nviz), Google Earth, WorldWind, Qgis or Udig as well. However some nice visualizations can be done with rgl, including rgb composites from satellite images. They can even be more useful for quick regional overviews than the alternatives mentioned above.

The code to produce these images be found here bluemarble3d.doc. The first line will download the imported spatial object from this site. R does need a lot of memory for images of this size, so I guess you need a minimum of 1GB RAM (I am using a Toshiba Tecra laptop with Nvidia running Ubuntu Feisty 7.04 with 2GB). Given that, you should get very own zoomable 3d image of Mexico and Central America to play with in the same time it takes the screenshot below to run. It shows the steps in real time. (For R beginners, just open the file, paste all the code into the R console and wait a little for the download). The resolution is quite coarse (2 minute) so this is only suitable for a regional overview. You will need the rgl and sp packages installed first of course.

bluemarble3d.doc

Exactly he same can be done with Landsat imagery. landsat.doc

If you want to get your own coverages for this for your own region the starting point is to run R from GRASS in a lat-long location. For example

g.region -p

projection: 3 (Latitude-Longitude)
datum: wgs84
ellipsoid: a=6378137 es=0.006694379990141317
north: 32:51:47.86586N
south: 5:28:57.975592N
west: 118:30:20.609692W
east: 76:26:43.294846W
nsres: 0:02:00.060768
ewres: 0:01:59.982024
rows: 821
cols: 1262
cells: 1036102


Then get some of the Blue Marble (Modis) imagery that can be downloaded from the NASA wms strait into GRASS using r.in.onearth. From within R you can use system to run GRASS. This does work with GRASS under Cygwin in Windows with the shell command.

system(”r.in.onearth -b file=/tmp/test month=Apr time=2005-3-24 ‘wgetopt=-c -t 5 –user-agent=MSIE5.5′ “)

The layers can then be imported into R with spgrass6.

library(spgrass6)

d<-readRAST6(c(”BMNG_Apr.red”,”BMNG_Apr.blue”,”BMNG_Apr.green”,”alt”))

Notice that this line imported a digital elevation model of my own as well, as one is of course needed for the 3d terrain effect.

Disturbance and the Lotka Volterra competition model

Posted in Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on March 12th, 2008

lotka-volterra.png

One of the works that provided constant inspiration both as an undergraduate and graduate student was Huston’s huge 1994 monograph. Despite some inevitable repetition in a book of this length I have continued to find new ideas in it over the years. The book contained a very simple demonstration of the intermediate disturbance hypothesis using the Lotka Volterra competition equations that I continue to use in classes. It can be implemented very concisely in R. The idea is simple. Reset the classic competition equations at periodic intervals and see how it affects the balance between species with r or K strategies. A very simple model with important implications.

Source here …lotka-volterra.doc


R<-c(0.1,0.12,0.15,0.2,0.3)

K<-c(1000,800,600,200,100)

Run<-function(runtime=100,freq=101,intensity=0.1){
N<-numeric(runtime*5)
N<-matrix(N,nrow=runtime,ncol=5)
N[1,]<-10

for (i in 2:runtime){
Totpop<-sum(N[i-1,])
for (j in 1:5){
N[i,j]<-N[i-1,j]+N[i-1,j]*R[j]*(1-Totpop/K[j])
}

if(!i%%freq){
N[i,]<-N[i,]*intensity
}
}
res<-data.frame(N)
res
}
#####################################
par(mfcol=c(2,2))
res<-Run()
matplot(1:100,res,type=”l”,xlab=”Tiempo”,ylab=”Poblaciones”,lwd=2)
title(”Sin perturbacion”)
res<-Run(freq=60)
matplot(1:100,res,type=”l”,xlab=”Tiempo”,ylab=”Poblaciones”,lwd=2)
title(”Perturbacion infrequente”)
res<-Run(freq=25)
matplot(1:100,res,type=”l”,xlab=”Tiempo”,ylab=”Poblaciones”,lwd=2)
title(”Perturbacion intermedio”)
res<-Run(freq=12)
matplot(1:100,res,type=”l”,xlab=”Tiempo”,ylab=”Poblaciones”,lwd=2)
title(”Perturbacion frequente”)

Reference


Huston, M.A. 1994. Biological Diversity: The Coexistence of Species on Changing Landscapes. Cambridge University Press, 708 pp.

Aggregating time series in R: The Iraq body count

Posted in Current affairs, Personal and family, R scripts, Uncategorized by Duncan Golicher on March 7th, 2008

Yesterday I was asked whether R could be used to analyse time series. The answer is of course it can. R is used extensively in the financial sector for analysing complex time series such as stock prices. I have already included an example using R in the context of climate variability (El Niño). One challenge is that there are a lot of different ways of working with time series and representing dates. Aggregation can be rather tricky. The zoo package is one of the most powerful tools for working with time series, but it is not always simple to use. I still haven’t got my head around all the different ways to achieve results.

So here is an example. The code first reads in the data from an online database, then uses tapply to sum the number of casualties per day. Zoo is then used to aggregate by month and the total is plotted. If anyone reading this can suggest better ways to do this or add a more sophisticated analysis I would like to know.

(Open this document if the code doesn’t work due to problems with quotation marks bodycount.doc)

Or try this  …

source(url(”http://duncanjg.files.wordpress.com/2008/03/bodycount.doc”)) ,

again you might need to retype the quotation marks)

library(zoo)

library(date)

a<-”http://www.iraqbodycount.org/database/download/ibc-incidents”
d<-read.csv(url(a),skip=11,header=T)
a<-tapply(d$Reported.Minimum,d$Start.Date,sum)
x <- zoo(as.vector(a), as.POSIXct(as.date(row.names(a))))
f<-function(x)as.POSIXct(as.yearmon(x))
Deaths<-aggregate(x,f, sum)
par(bg=”lightgrey”)
plot(Deaths,lwd=3,col=2)

deaths.png

However you get there, the result is shocking. These are documented civilian casualties and I chose the lower estimate.

Even if the trend was initially downwards after the start of the surge, it still only bottomed out at around the level it began at when the “mopping up” operations were taking place in the first few months after the invasion. The time series taken from the compiled online data base stops in January 2008, so it doesn’t take into account the recent renewal of violence. Today, Thursday 6 March there were 86 civilian dead. Two bomb attacks killed 68 in Baghdad alone. Apparently this is not an isolated incident. The trend is sadly upwards again.

Joseph Stiglitz has estimated the price of the war at 3 trillion dollars. I wasn’t even sure what a trillion was until he used the figure. It has twelve zeros, in other words a thousand billion. That works out at 30 million dollars for each dead Iraqi civilian or enough to make ever one of these people as rich as Bill Gates. No further comment, apart from a heartfelt request to visit the site of those who have worked so hard to compile this important data set and re-run the code periodically to check the updated figures.

As an addition, I was saddened to hear that Harvard professor Samantha Power has resigned from Obama’s campaign team this week, apparently for speaking too openly on this, among other, issues. I was extremely impressed by her thoughtful, yet emotional, contribution to BBC radio’s start the week in which she talked of the biography she has written on Sergio Vieira de Mello, a heroic figure who constantly impressed me in every interview I heard with him up to his tragic death in Iraq. The recent news suggests that honest, open minded expression by academics is still considered to be a liability, even for apparently honest, open minded, politicians. This is a link to the Hard Talk interview.

A simple illustration of an ecological lottery model in R

Posted in Evidence and Ecology, Natural history, Probability and statistics, R scripts by Duncan Golicher on March 7th, 2008

A colleague contacted me yesterday with some interesting modelling work based on the neutral community model of Stephen Hubbell.

Hubbell’s work fascinated me when it was first published. I have always assumed that it is easily misunderstood. If Hubbell is taken to imply that all species are equal, then the work appears to be eroding the basic subject matter of ecology. Ecologists tend to look for informative differences between species. However this is not exactly what the theory is based on. It is rather more subtle. The theory concerns the equality of individuals (in this sense it almost has political implications). Hubbell writes.

“The theory treats organisms in a community as essentially identical in their per capita probabilities of giving birth, dying, migrating, and speciating. This neutrality is defined at the individual level, not the species level. All that is required is that all individuals of every species obey exactly the same rules of ecological engagement. So, for example, if all individuals and species enjoy a frequency-dependent advantage in per capita birth rate when rare, and if this per capita advantage is exactly the same for each and every individual of a species of equivalent abundance then such a theory would qualify as a bona fide neutral theory by the present definition.”

I have illustrated this idea for students with a very small simulation in R . R code, like matlab or octave, is very terse and efficient so a couple of lines can do a fair amount of work. Lets assume a very simple reproductive rule applies to all individuals. The rule is that all have the same discrete generation time, in other words they all survive one time step of a simulation. At the end of the time step each individual produces two offspring. The lottery then applies. The offspring are randomly mixed and placed back on the finite space occupied by the previous generation. So half find a home and half “die”. The process is then repeated. This was exactly how I first implemented the model. I then re-implemented the idea in fewer lines by making the probability of selection depend on the proportion of individuals in each species, which amounts to the same thing.

This is all the code needed to run the model. (Try this if the quotation marks are not right lottery.doc)


nspecies<-8
gridsize<-20
time<-100

RUN<-function(){
X<<-matrix(0,nspecies,time)
palette(rainbow(nspecies))

mat<-sample(1:nspecies,gridsize*gridsize,replace=T)
X[,1]<-table(mat)
image(matrix(mat,gridsize,gridsize),col=sort(unique(mat)))

for (i in 2:time){
a<-X[,i-1]
mat<-sample(1:nspecies,gridsize*gridsize,prob=a,replace=T)
X[,i]<-table(factor(mat,levels=1:nspecies))
image(matrix(mat,gridsize,gridsize),col=sort(unique(mat)))
gc()
}

matplot(t(X),type=”l”,lwd=2,col=rainbow(nspecies))
}

#However to get a little tcltk interface add this

############################################

library(tcltk)

Run<-function(…){
nspecies<<-as.numeric(tclvalue(”nspecies”))
gridsize<<-as.numeric(tclvalue(”gridsize”))
time<<-as.numeric(tclvalue(”time”))
RUN()}

tt <-tktoplevel()
d1<-tklabel(tt,text=”Numero de species”)
d2<-tklabel(tt,text=”Tamaño del grid”)
d3<-tklabel(tt,text=”Tiempo”)

s1 <- tkscale(tt,from=0, to=20, variable=”nspecies”,
showvalue=TRUE, resolution=1, orient=”horiz”)
s2 <- tkscale(tt, from=2, to=100, variable=”gridsize”,
showvalue=TRUE, resolution=1, orient=”horiz”)
s3 <- tkscale(tt, from=10, to=500, variable=”time”,
showvalue=TRUE, resolution=10, orient=”horiz”)

but <- tkbutton(tt, text=”Run”,command=Run)

tkgrid(d1,s1)
tkgrid(d2,s2)
tkgrid(d3,s3)
tkgrid(but)

The model is to be played with (just download R from CRAN and paste all the code into the console), rather than watched but here are some example runs on You Tube,

The moral of the story is that under a neutral model some species start a random walk to extinction if co-existing in a small space with other species. However we cannot tell which will dominate and which become extinct from their characteristics, because all individuals, regardless of species are governed by exactly the same rules. The fate of species is determined by the cumulative luck or misfortune of the individuals that have been given the label. The larger the grid the more species that can be supported for a longer time, but extinction risk is always present if a long losing streak hits. The element of Hubbell’s theory that this model does not reproduce well is the log-normal (-ish) relative abundance distribution of the surviving species, although come to think of it I have not tried running the model with a really large number of species and a really large grid. R runs out of colours for the illustration, but this is not important.

Many interesting questions remain. Could a simulation model based on this sort of idea be used to predict extinction over a real landscape? Models of this sort seem far too abstract to provide real insight into the real world. But in some sense the process that Hubbell describes in a more formal mathematical way in his book probably does apply. But, how can these sort of ideas be communicated accurately to prevent them from provoking the sort of unproductive SLOSS debate that has often resulted from theory being pushed too far?

Processing output from Global Circulation Models 2

Posted in Linux, R scripts, Uncategorized by Duncan Golicher on February 20th, 2008

After setting up cdo and netcdf you need to download some scenarios from cera.

The distribution modelling project we are working on is looking at a wide range of simulations in order to look at the effects of credible patterns of change in the global climate on tree species diversity in MesoAmerica. There are still relatively few simulations that have specifically represented the new emisions scenarios in the IPCC fourth assessment report. Many of the available model data sets were derived using the Special Report on Emission Scenarios (IPCC 2000, SRES).

The SRES data sets were classified into four different scenario families (A1, A2, B1, B2).
SRES. The A2 storyline describes a very heterogeneous world with the underlying theme of self-reliance and preservation of local identities. It results in this scenario a continous increasing population together with a slower economic growth and technological change. The Hadley Centre Circulation Model is a 3-dim AOGCM described by (Gordon et al., 2000 and Pope et al., 2000).
The atmospheric component has a 19 levels horizontal resolution, comparable with spectral resolution of T42, while the ocean component has a 20 levels resolution. HADCM3(http://www.meto.gov.uk/research/hadleycentre/models/HadCM3.html )

Here I will show a simple use way to look at the HADCM3_A2 scenario in R.

For simple interpretation and use of climate model output three variables are of most interest. Maximum temperature, minimum temperature and precipitation. A great deal of processing can be done in cdo,but here we move the precipitation data set to R using cdo to rewrite the file in netcdf format.

cdo -f nc copy HADCM3_A2_prec_1-1800.grb HADCM3_A2_prec_1-1800.nc

Now in R we can open the file using ncdf.

library(ncdf)
library(maptools)
library(maps)
library(RColorBrewer)

prec.nc <- open.ncdf(”HADCM3_A2_prec_1-1800.nc”)

x <-get.var.ncdf( prec.nc, “lon”)
y <-get.var.ncdf( prec.nc, “lat”)
z<- get.var.ncdf( prec.nc, “var61″)
time <-get.var.ncdf( prec.nc, “time”)

This reads all the data in. Notice that this global grid is quite a low resolution so the whole simulation can be read into memory in one go. Subsets of netcdf files can be taken if they are larger.

The order of the x and y vector has to be changed to be strictly ascending and the matrix rearranged accordingly.

z<-z[,73:1,]
y<-sort(y)
x[x>180]<- (-(360-x[x>180]))
xorder<-order(x)
z<-z[xorder,,]
x<-sort(x)

Now we can average the first fifty years monthly precipitation and subtract this from each month’s simulated precipitation in order to show the simulated pattern of rainfall anomalies over the next fifty years.

for (i in 600:1200){
j<-i%%12;j[j<1]<-12
a<-seq(j,600,by=12)
aa<-apply(z[,,a],c(1,2),mean)
a<-z[,,i] -aa
image(x,y,a,col=brewer.pal(8,”RdBu”),zlim=c(-10,10))
map(”world”,add=T)
title(main=paste(substr(time[i],1,4),substr(time[i],5,6)))
Sys.sleep(0.2)
}

It is not easy to draw conclusions from this visualization alone. However what seems clear is that the largest anomalies and fluctuations in rainfall patterns (areas of dark red or dark blue) are concentrated in the tropics. This is in part due to the fact that the shading represents absolute variation. The tropics receive more rainfall, thus variability is naturally higher. What would be useful would be look in more detail of the pattern for different areas of the global simulation and contrast them after taking into account natural interannual variability.

In order to look at the trend for any part of the world we can make a small function in R that will select a pixel from the map using the locator function and then use stl to produce a seasonally adjusted analysis similar to the one used here.


plot.trend<-function(){
image(x,y,z[,,1],col=brewer.pal(8,”RdBu”)[8:1],zlim=c(-10,10))
map(”world”,add=T)
l<-locator(1)
a<-z[which(abs(l$x-x)==min(abs(l$x-x))),which(abs(l$y-y)==min(abs(l$y-y))),]
a<-ts(a,start=c(1950,1),frequency=12)
plot(stl(a,s.window=12))
}

Select Chiapas.

simulated-trend-in-rainfall-hadcm3-a2-chiapas.png

The units are in mm per day. Thus the third graph represents the annual trend after seasonal effects (second panel) are extracted. At present averaged over the Chiapas grid square annual rainfall averages just over 3 mm per day ( 1000 - 1200 mm per year) . There is around 40% variability in this between years. The simulation suggests that in general terms the pattern of variability will be maintained for the next fifty years. However the end of the simulation shows a rather sudden, sharp drop in rainfall by the end of the century.

There is a great deal of uncertainty in climate model simulations, not only regarding the amount of greenhouse gas forcing but also the complex patterns of feedbacks in the global systems. An alternative scenario from HADCM3 (the B2 scenario) shows a slightly different pattern for Southern Mexico.

simulated-trend-in-rainfall-hadcm4-b2-chiapas.png

Tagged with: , , , ,

A forest inventory problem

Posted in R scripts by Duncan Golicher on February 13th, 2008

Sometimes actions that are easy using “group by” in SQL aren’t obvious in R or appear to need too many lines of code using “tapply”, “aggregate” or “by” in R.

Splitting and substituting back into the original data frame can work neatly in some cases.

Take this example that is also mentioned in the document available here. A study measures the diameter at breast height and the heights of trees. The trees may have multiple stems. The researcher wants to work with the total sum of the basal area per individual, the maximum height of the tallest stem and the maximum diameter of the stems.

A trial data set can be simulated with.

cuad_Id<-sort(sample(1:180,replace=T,1000))
esp_Id<-sample(1:10,replace=T,1000)
ind_Id<-1:1000
dap<-round(rnorm(1000,30,5),1)
alt<-round(dap/2+rnorm(1000,0,2),2)
tallos<-data.frame(cuad_Id,ind_Id,esp_Id,dap,alt)

Add multiple stems per individual.

tallos2<-tallos[sample(1:1000,500,rep=T),]
tallos2$dap<-round(rnorm(500,30,5),1)
tallos2$alt<-round(tallos2$dap/2+rnorm(500,0,2),2)

tallos<-rbind(tallos,tallos2)
tallos<-tallos[order(tallos$ind_Id),]
row.names(tallos)<-1:1500

So the simulated data looks like this …

fiig141.png

The basal area (ab) in m² per stem can be calculated using ..

tallos<-data.frame(tallos,ab=pi*(tallos$dap/200)^2)

To achieve the result substitute the sum of the basal area for each individual using split and tapply.

split(tallos$ab,tallos$ind_Id)<-tapply(tallos$ab,tallos$ind_Id,sum)

Use the same method to substitute the maximum heights and diameters

split(tallos$alt,tallos$ind_Id)<-tapply(tallos$alt,tallos$ind_Id,max)
split(tallos$dap,tallos$ind_Id)<-tapply(tallos$dap,tallos$ind_Id,max)

Now just remove duplicated rows.

tallos<-tallos[!duplicated(tallos),]

Done.

Tagged with: , , ,

Diversidad de especies

Posted in R scripts by Duncan Golicher on February 12th, 2008

Recientemente me tocaba dar un par de clases a los estudiantes de maestria en el curso de Ecologia general sobre como medir la diversidad de especies. La materia es disponible aqui …

clasediversidad1.pdf

Nota: Exactamente 30 minutos antes de escribir este mensaje a las 6:52 am sentimos un temblor media fuerte en San Cristóbal que se prolongó durante mas de dos minutos.