Duncan Golicher’s weblog

Landsat paths and rows for Southern Mexico

Posted in Uncategorized by Duncan Golicher on July 8th, 2008
I have just imported a Landsat path and row shapefile into PostGIS from http://landsat.usgs.gov/tools_csf.php. I thought this might have some general utility when looking for images (for example from http://www.landsat.org/ortho/ortho_max.htm) , although there are of course very many sites (eg GLCF) that have the same information.
Landsat paths and rows

Landsat paths and rows

Mexico Landsat paths and rows

Mexico Landsat paths and rows

C

C

Tagged with: , , ,

Priority areas for dry forest restoration in La Frailesca

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on July 8th, 2008

This is a marker for work in progress. Do not cite.

This morning I was contacted in order to provide geographical information for one of our partners in the Reforlan project. The aim of the research is to find priority areas for forest restoration.

The question of how to go about the analysis is thought provoking. I have already mentioned the problem of aggregation error in this weblog. Pooling data in order to find mean values does not work at the scale in which most of the important processes act. The classic example is mean rainfall for Mexico. At the same time, some elements on the landscape cannot be mapped at a fine scale, and others, such as management units (farms or ejidos) form natural aggregation according to social factors.

So perhaps the first step would be determine priority pixels according to a limited number of physical processes that can be resonably accurately mapped at fine resolution (30 m x 30 m) then aggregate to find the number of pixels in each agronomic unit that could be restored. An analysis of social and economic factors at this scale could then be undertaken in order to discover where restoration might be feasible and bring the most social benefit.

The mappable factors at the 30m pixel scale would be

1. Forest cover and recent deforestation: Pixels that have forest cover presumably do not need restoring. Pixels that have been recently deforested may be prioritised, although natural regneration may be a suitable mechanism in some areas close to the original forest.

2. Slope: Areas with steep slopes must be prioritised for restoration in order to avoid soil erosion. Factors such as slope length may be taken into account.

3. Insolation (aspect): South facing slopes that receive intense insolation during the dry winter months may be in need of restoration, but successful restoration may be more challenging.

4. Distance from rivers: Riparian areas may be in greater need of restoration and trees may establish more successfully in moist soil. At the same time these may be the most valuable areas for irrigated  agriculture.

5. Conservation status: Reserves should perhaps take priority when restoration is being considered.

The interesting element of the research involves finding a suitable way of combing the layers in order to allow ranks to be calculated. A typical approach is weighted averaging, but this may not allow interactive effects to be included.

The map below shows the estimated current extent of the forest in the region and recent deforestation hotspots.

Shaded areas are federal reserves.

Forest cover in La Frailesca: Red and yellow areas are recently deforested: Shaded areas are federal reserves.

Areas with the steepest slopes do tend to coincide with remaining forest cover, as would be expected, but some recent deforestation both intentionally and as a result of forest fires has taken place in the steeply sloping “Sierra” region in the West and South West  and South of the study region.

Slope in degrees (high to low is red through to green passing through blue)

Slope in degrees (high to low is red through to green passing through blue)

North facing slopes receive less direct sunlight during clear winter days and are more likely to be covered in evergreen forest. Fuel dries out on south facing slopes leading to more intense fires. Regeneration may be difficult due to hydric stress.

Yellow pixels are level areas that receive the \

Winter insolation: Yellow pixels are level areas that receive the \


Most deforestation has occured close to the major rivers.

Buffers around major rivers

Buffers around major rivers

It may also be useful to evaluate soil type (not shown) although the maps tend to be at a coarse scale.

Once pixels have been ranked according to aptitude and priority for restoration (the two factors may be inversely related in many cases) they may be aggregated to agricultural management units for further analysis of socio-economic factors.

A range of social economic parameters may be associated with agricultural management units (agebs)

A range of social economic parameters may be associated with agricultural management units (agebs)

Animación localidades de Bosque Mesofilo de Montaña en Chiapas

Posted in Uncategorized by Duncan Golicher on July 8th, 2008

El archivo arriba es un gif animado. Haz clic en el mapa para ver la animación, o guardarlo localmente. Nota que los tonos desde amarillo a azul muestra la pluviosidad sobrepuesto.

Tagged with: , ,

Do we see trees or forests?: Knowledge gaps

Posted in Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on July 6th, 2008

It would seem reasonable to assume that research aimed to fill a basic knowledge gap should receive more attention than projects aimed at evaluating complex untestable theory. However in the field of tropical forest ecology it not always clear that fundamental basic knowledge gaps are either being identified or filled quickly enough. An issue that has concerned me for many years is the persistent difficulty in accurately tracing and understanding the spatial distribution of tropical organisms, particularly trees and other plants.

Trees are immobile, long lived, highly visible elements of the tropical landscape. It would therefore seem reasonable to assume that we already know a great deal regarding their distribution. However trees form forests. Forests cause numerous logistical difficulties for field research.  Tropical forests can be hostile, disorienting, pathless environments. They also contain a complex set of natural resources. Illegal exploitation of some of these resources such as valuable timber or plants with interesting pharmaceutical properties (of various types) may obstruct scientific research. Many tropical forests have also provided refuge at times for members of radical political movements.

Tropical trees can be extremely difficult to identify to species. Flowers, fruits and leaves may be out of reach. Their taxonomy is complex and dynamic. Tropical families are unfamiliar to researchers trained in temperate zones. Many species have converged in terms of leaf shape and characteristics.

This has led to chronic ignorance of the distribution of tropical tree species. Only a small fraction of the total area covered by tropical forest of any description has been studied in any detail. Regional species lists rely on data from a few well known areas, but within any given region distribution patterns are poorly known. Many assumptions are made concerning the nature of anthropogenic influence on areas of forest. Some are justifiable, some tend to exaggerate human impact on naturally resilient forest ecosystems.

Given the poor state of our knowledge it is vitally important that the few reliable data sources that are available are used to their full capacity. However even the best global data sources can only provide a small part of the picture alone. Missouri Botanical Gardens (MOBOT) has one of the largest collections of tropical plant specimens of any herbarium. MOBOT has provided online access to most of this data. Yet even when the whole set of tropical tree collections with known coordinates from this source are mapped on to space large gaps are revealed in the empirically based knowledge they provide regarding tree diversity.

A PostGIS query can be used together with the graticule trick mentioned in the last post to map the density of collections within 0.1 degree (approximately 100 km2) grid squares in the MOBOT Vast data base. The query is provided as a model for counting incidences within grid squares. The data itself is stored locally and forms part of a larger set that we have collated over a period of several years.
CREATE TABLE meso.count with oids as
SELECT g.ncollects as Ncollects, a.*  from

grat01 a,

(SELECT d.gid, count(c.id) as ncollects
FROM meso.trees c, grat01 d
WHERE c.the_geom && d.the_geom
AND contains(d.the_geom, c.the_geom)
AND c.source like ‘MOBOT%’
group by d.gid) g

WHERE a.gid=g.gid;

If it is assumed that 100 collections per 100 km2 is a minimum needed to provide a reasonable picture of the distribution of tree species (there may be more than 400 species in each grid square in some regions) the map below shows that only a fraction of the area  of mesoamerica can be considered to be well enough typified by this data set to allow distribution patterns to be mapped through the data alone. This contrasts sharply with the floristic atlases available for most of Western Europe based on known occurrences.

Even when the grid squares with less than 50 collections (far fewer than needed) are added to the map there are many completely unknown areas when this data source is taken alone. Where collections have been taken the density is generally below 5 per 100 km2. Direct empirical evidence regarding tree diversity hot spots is simply not available. Comparisons  between Chiapas, Yucatan, Belize and Costa Rica cannot be made from the data alone.

Light green 1-2, dark green 2-5, blue 5- 50, red >50

Light green 1-2, dark green 2-5, blue 5- 50, red >50

In contrast to our poor knowledge regarding forest composition, our knowledge regarding the extent of both forests and deforestation is becoming ever more accurate and detailed as high resolution satelite imagery becomes available. It appears that we are looking at forests, but not seeing the trees.

“Raster” data in PostGIS?

Posted in Linux, POSTGIS, R scripts, Uncategorized by Duncan Golicher on July 5th, 2008


PostGIS does not support raster data as yet. Hopefully there should be raster support soon. Raster layers can be visualised from a local or remote server using geoserver. However having raster layers integrated into a PostGIS data base would be extremely useful for tasks such as overlaying herbarium records with continuous extrapolated climate data or elevation.

I have previously experimented with importing the points of a raster layer. This works quite well. It can be used for simple spatial overlays using proximity operations. However the visual impression produced by the points is not very satisfactory and a query using the contains operator would be more efficient.

QGIS has a plugin for graticule generation. This forms a grid of individual polygons. So if a grid is produced using the graticule plugin of QGIS and then imported into PostGIS using SPIT it is possible to partially mimic a raster layer, although with some computational overhead.

The document mnp-workshop3.pdf showed how worldclim data (http://www.worldclim.org/) can be imported into GRASS and R.

If R is run from GRASS the following lines can be used to import the appropriate raster layers at a chosen resolution into R.

Most of the lines of the script are concerned with calculations on the basic worldclim data. The dataset consists of 36 grids of monthly mean minimum temperature, maximum temperature and precipitation. Some mean annual values may be derived for simplicity.

a<-c(paste(”tmin”,1:12,sep=”"),paste(”tmax”,1:12,sep=”"),paste(”prec”,1:12,sep=”") ,”alt”)
d<-readRAST6(a)
d@data[,1:24]<-lapply(d@data[,1:24],function(x)x/10)
fullgrid(d)<-FALSE

f<-function(x)mean(x[1:12])
d@data[["meanmim"]]<-apply(d@data,1,f)

f<-function(x)mean(x[13:24])
d@data[["meanmax"]]<-apply(d@data,1,f)

f<-function(x)mean(x[1:24])
d@data[["meantemp"]]<-apply(d@data,1,f)

f<-function(x)sum(x[25:36])
d@data[["totprec"]]<-apply(d@data,1,f)

The trick for importing the layers to PostGIS from R is shown below. If the data is in R as a spatialPixelDataFrame the coordinates can be added to the data to form a table.

dd<-data.frame(coordinates(d),d@data)

This table can then be saved to a PostGIS database using ODBC.

library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)

sqlSave(con,dd,”worldclim”)

###########################################

Now either in pgadmin or running the queries directly through R by placing them inside an odbcQuery function.

ALTER TABLE chiapas.worldclim add column gid serial NOT NULL;
ALTER TABLE chiapas.worldclim ADD CONSTRAINT worldclim_pkey PRIMARY KEY(gid);
ALTER TABLE chiapas.worldclim ADD COLUMN the_geom geometry;

UPDATE chiapas.worldclim
SET the_geom = PointFromText(’POINT(’ || s1 || ‘ ‘ || s2 || ‘)’,4326);

CREATE INDEX
grat30arc_idx_geo on grat30arc USING GIST(the_geom GIST_GEOMETRY_OPS);

CREATE INDEX
worldclim_idx_geo on chiapas.worldclim USING GIST(the_geom GIST_GEOMETRY_OPS);

If a polygon graticule called grat30arc covering the same region with a 30 arch second grid has been produced using the graticule plugin in Qgis and imported into the public schema then the average values of the points within each grid square can then be calculated for the graticule and a new table produced using this as a model query.

CREATE TABLE chiapas.worldclim2 with oids as
SELECT a.*, g.totprec, g.meanmin, g.meanmax, g.elev from
grat30arc a,

(SELECT AVG(c.totprec) as totprec,
AVG(c.meanmim) as meanmin,
AVG(c.meanmax) as meanmax,
AVG(c.alt) as elev,d.gid
FROM chiapas.worldclim c, grat30arc d
WHERE c.the_geom && d.the_geom
AND contains(d.the_geom, c.the_geom) group by d.gid) g

WHERE a.gid=g.gid;

The result is very similar to the visualisation of an arcgrid in arcview when it is viewed in QGis over a shaded relief map. In this case the pixels are approximately 1 km x 1 km (30 arc seconds or 0.008333333 degrees) and the elevation lines up well with the contours derived from a 1:250,000 map.

Spatial indices on the herbarium table and the worldclim2 layer with the following

CREATE INDEX
herb_idx_geo on herb.herbario USING GIST(the_geom GIST_GEOMETRY_OPS);

CREATE INDEX
worldclim2_idx_geo on chiapas.worldclim2 USING GIST(the_geom GIST_GEOMETRY_OPS);

Then the following query runs in less than four seconds to provide 6,000 herbarium records with extrapolated climate data to the nearest km.

SELECT h.*, w.totprec, w.elev,w.meanmin,w.meanmax
FROM herb.herbario h, chiapas.worldclim2 w
WHERE h.the_geom && w.the_geom
AND contains(w.the_geom, h.the_geom);

Or

select h.*, w.totprec, w.elev,w.meanmin,w.meanmax
from herb.herbario h, chiapas.worldclim2 w
where ST_contains(w.the_geom, h.the_geom);


Tagged with: , , , ,

Visualising deforestation:Marques de Comillas

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on July 2nd, 2008

Our ongoing work with Conservation International has led to the first regional forest cover change map that has been derived using a single consistent methodology. The details of the study have not yet been published. However as we work on validation and interpretation of the results a pattern has become very clear.

Previous deforestation studies in Chiapas have usually concentrated attention on well defined study areas. This could produce the impression that deforestation is a homogeneous process over the whole state. In fact study areas selected for deforestation analysis are usually those with the highest rates when compared to the rest of the region. This is quite natural. Many of the areas where deforestation is no longer occurring have already lost a large proportion of their forest cover. However the overall regional deforestation we have quantified is considerably lower than previous studies imply.

Where deforestation has followed the classic pattern of forest conversion to permanent agriculture or pasture the CI methodology, which is based on Landsat imagery, has provided a remarkably good match with high resolution imagery. However where chronic, low level forest disturbance takes place the overall impact of human activities are much less easily quantified at the resolution of Landsat imagery. The difficulty in accurately evaluating forest cover change increases in areas of dry forest. Nevertheless regional patterns are robust.

The clearest deforestation hotspot in the state of Chiapas remains the Marques de Comillas area in the Southern Lacandon. Deforestation and carbon sequestration in this area has previously been studied in some detail by De Jong et al 2000

The two images below are animated gif files which change when clicked on to enlarge them to full size. The show clearly how Landsat based deforestation analysis coincides in this area with the conclusions drawn from visual analysis of recent high resolution imagery. The visual analysis is produced by overlaying our change analysis in Google Earth using Geoserver, and through the use of QGis.

Deforestation:Overlaying vector and raster layers in PostGIS using GRASS

Posted in Evidence and Ecology, POSTGIS, R scripts, Uncategorized by Duncan Golicher on July 1st, 2008

A very common operation in GIS involves overlaying vector polygons on a raster map and calculating the area of each class in the raster map. The results are then added to the attribute table of the vector map. How can this be achieved in PostGIS?

If the raster layer is quite small it could be imported as points and the result achieved through a point in polygon SQL overlay. However that is not the usual way to achieve the operation. When large raster layers derived from the analysis of satelite imagery are of interest the conventional GIS approach  is to first rasterise the vector layer. The computations involved in the overlay are then simple and fast.

This example taken from our ongoing work. It overlays a map of agricultural units on a raster layer representing deforestation in the last decade derived from analysis of satelite data. Because deforestation in Chiapas has tended to take place on a rather small scale involving only a few hectares at a time, it is not at all easy to see the large scale regional pattern from an image. Calculating the proportion of the total area of each agricultural unit deforested helps to see the pattern. The map above shows the percentage of the total (forested + non-forested) area of each agricultural unit estimated to have been deforested between 1990 and 2000.

The simplest way to achieve this result with PostGIS is to import the vector layer into GRASS, where the raster is also held. There are a few issues that have to be addressed when doing this. The first is that the gid index is lost when the attributes are imported. As a unique identifier for each polygon is needed during the overlay an index should be added to the table first if the gid is the only unique identifier for the table. This is very simple.

ALTER TABLE chiapas.agros ADD COLUMN id int8;
UPDATE chiapas.agros SET id=gid;

Now enter GRASS in a location with the same projection as the PostGIS data base (typically EPSG:4326)

Now there is another technical issue. Clearly vector processing in any GIS involves a combination of the geometery and attribute tables. We have both already in PostGIS. However in GRASS we use  functions that work on GRASS’s own representation of the geometry, so an import is needed. GRASS can use a wide variety of databases as a back end for storing attribute data. Among these is Postgresql. What is not obvious, and  potentially quite confusing, is that we already have the data in a Postgresql data base! So, does it all need to be imported again? The answer seems to be yes and no. It is quite possible to link the GRASS topology with the original table. However it is much easier to import it all in the usual way using v.in.ogr. This produces a new table, that could either stored in an external data base or perhaps even within the original PostGIS data base (as a new copy). Of course this attribute only table lacks the geometry column.

For example to set up GRASS to use the original local PostGIS data base as the backend for vector layers

db.connect driver=pg database=”host=localhost,dbname=gisdb2″
db.login user=Duncan pass=******

Note that this step is not necessary. By default Grass will store the table as in dbf format.

A layer can be imported from PostGIS using

v.in.ogr dsn=”PG:host=localhost dbname=gisdb2 user=Duncan” layer=chiapas.agros output=agros type=boundary,centroid

If the raster layer is in another coordinate system, as it is in this case, it will be necessary to exit the location and reproject in the location of the raster layer.

v.proj input=agros location=Global mapset=mesoanalysis output=agros

Then form a raster layer.

v.to.rast input=agros output=agros use=attr column=id layer=1 value=1

Then in GRASS the overlay is achieved quite simply using r.stats and sending the output to a text file.

r.stats -a input=agros,defor >agros.txt

This forms a file in which the first column holds the id of the vector layer, the second the class in the raster layer and the third the area in square meters of each combination. For example…

1 111 1319094.000000
1 112 159201.000000
1 122 1041304.500000
1 211 4026323.250000
1 222 85285437.750000
1 244 89347.500000
1 422 52796.250000
1 444 1749586.500000
2 111 565326.000000

So the task is now done, apart from the job of adding the data back into the original PostGIS table. There are many ways to do this. Often only one, or a few focal classes are of sufficient interest to merit a new column. So an easy way to process the output and send it back to PostGIS involves R.

Connect to the PostGIS data base using RODBC and set the schema

library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)

Read in the data and set the first column as numeric, removing any missing data (by default it will probably have been interpreted as a factor as GRASS will use a * for missing data).

d<-read.table(”agros.txt”)

d$V1<-as.numeric(as.character(d$V1))
d<-na.omit(d)

Now a subset of the data can be taken for each of the columns of interest, named and added back to the data base. In this case code “111″ represents area that was classified as forest in 1990,2000 and 2005. “122″ pixels that were deforested in the period between 1990 and 2000.

odbcQuery(con,”ALTER TABLE agros ADD COLUMN forest int4;”)
d2<-subset(d,d$V2==111)
d2<-d2[,-2]
names(d2)<-c(”gid”,”forest”)
d2$forest<-round(d2$forest/10000,0)
sqlUpdate(con,d2,”agros”)

odbcQuery(con,”ALTER TABLE agros ADD COLUMN defor2000 int4;”)
d2<-subset(d,d$V2==122)
d2<-d2[,-2]
names(d2)<-c(”gid”,”defor2000″)
d2$defor90<-round(d2$defor2000/10000,0)
sqlUpdate(con,d2,”agros”)

We are rather concerned that this cover change analysis, that has been achieved with the standard methodology used by Conservation International, tends to quite seriously underestimate small scale disturbance and canopy opening. Nevertheless in comparative terms it does demonstrate that absolute deforestation in Chiapas is certainly not a homogeneous process. Extensive deforestation is not occurring, although if the map is zoomed out further to show the Marques de Comillas area a very clear hotspot of deforestation is shown. In this part of Chiapas (Central Valley, Sierra and Altos) comparatively few agricultural units have deforested more than 2% of their total area in the last decade.

The other way of looking at this would be as the proportion of remaining forest lost. This woud tend to emphasise deforestation rates in ejidos and agricultural units with a very small ammount of forest in 1990. However at this level the approach would be somewhat unfair as it would be biased by classification errors in small areas.

To put the map in context a map with the proportion of the total area classified as forest can be used. In R the column can be added using

odbcQuery(con,”ALTER TABLE agros ADD COLUMN propforest float;”)

odbcQuery(con,”update agros set propforest=100*forest/hectares;”)

A surprising number of the agricultural units have over 40% tree cover. This has to be placed in the context of a classification method that draws no distinction between a pixel within a pasture with sufficient cover of secondary vegetation to be classified as forest and untouched primary forest. The next step clearly involves analysis of fragmentation in order to clarify this.

Configuration of Geoserver

Posted in Linux, POSTGIS, Uncategorized by Duncan Golicher on June 24th, 2008

Mapservers have never taken centre stage in my thinking on how to make Ecosur’s geographical data more widely available.  When using a geographical data base the underlying information is almost by definition available for analysis. In contrast data served up by a mapserver is usually only suitable for visualisation. However mapservers have an important role in conjunction with PostGIS. The NASA JPL wms server is a an essential provider of satelite images that provide the backdrop that allows a visual intepretation of spatial information.

At a recent workshop Conabio provided me with a very nice shaded relief map for Mexico derived from a digital elevation model. I have been using it locally with Qgis. However I became concerned that it must also be potentially made available for use with POSTGIS at an institutional level. This clearly needs a mapserver.

I have previously wasted a fair ammount of time failing to configure mapservers correctly. So this morning I looked with some trepidation at geoserver. To my relief within an hour I had geoserver configured and the shaded image available to QGIS through a local host WMS.

The steps are extremely simple:

1. Download geoserver as a zip file

2. Extract the contents.

3. Set an environmental variable to point to your implementation of Java. I have sun java 6 installed using the medibuntu repository.

sudo gedit /etc/environment

Add this line

JAVA_HOME=”/usr/lib/jvm/java-6-sun”

4. Reboot and run mapserver using a command that goes something like

/home/duncan/geoserver/bin/startup.sh

5. Point the browser at

http://localhost:8080/geoserver/

Then if everything is working simply follow all the online instructions in order to add layers.

With geoserver started there is a new wms server that QGis can connect to.

It is also easy to include data from geoserver in Google Earth. Choose <add> <image overlay> from the menu. Then the “refresh tab”. Click on WMS parameters and add the geoserver details.

You then have the layer in Google Earth

Postgresql 8.3 and Qgis 0.10.0 Io

Posted in Linux, POSTGIS, Uncategorized by Duncan Golicher on May 3rd, 2008

Ecosur will soon be providing students and researchers with a networked data base for storing and visualising public domain spatial layers. This database should also store results from ongoing research. I am convinced that the advantages of providing a fully functional geographical data base at zero cost by using PostGIS/Postgresql are compelling. The release of a stable Postgresql 8.3 and Qgis 0.10 have made the task even simpler.

Postgresql 8.3 is now notably faster when running spatial queries than the version I had been using (8.1). It also has very advanced features.  These move PostgreSQL up to the power of the premium  proprietary enterprise data base, Oracle. They clearly go beyond those found in MySQL.

The only practical challenge I faced with PostgreSQL 8.1 involved a very simple issue of scalability when the database was viewed with Qgis 0.9. Despite some issues with the stability of Qgis (that did worry me, but appear to have been largely resolved) Qgis is a suitable visualisation tool for PostGIS. I have used it extensively since I started experimenting with PostGIS a few months ago and am generally satisfied. However a simple practical issue concerned me.

How could users easily be given access to the parts of a potentially very large database in such a way that they see only the part they need or are allowed to see?

The answer had clearly to be through the use of Postgresql schemas, but the Qgis interface didn’t show them in a user friendly way that matched the need.

This problem is now largely resolved by the new version of Qgis. Schemas now show up in an unfoldable tree structure. As groups of users and individual users can be given access to a subset of schemas it should now be feasible to design a database that hides all the complexity from users and allows them to quickly find the layers  they need.

Upgrade to Ubuntu Heron

Posted in Linux, Uncategorized by Duncan Golicher on May 2nd, 2008

I am impressed by Ubuntu Heron. The most important difference is  ease of installation. Both Feisty and Gutsy were simple enough to get running, given a reasonable degree of patience. However there were always a few problems to fix. These involved  screen resolution sound and video. Getting everything right needed  a search which produced multiple answers on the users forums.

A single issue that needs to be resolved by editing a config file is more than enough to get most Windows users, acustomed to  preinstalled operating systems, to give up and go strait back to what they know. Thus despite the clear merit of the underlying operating system, uptake remained much slower than it could have been among academic colleagues here.

In contrast my  experience installing Heron on a Toshiba Tecra was effortless. It all worked out of the box. What was more impressive was the short amount of time it took to get everything, including a large number of R packages, GRASS, QGis, Google Earth, MySQL, Apache, Postgresql etc all back together. I already knew what I wanted and where to get it, but even so the short time line is worth stressing.

3:15 Open Laptop and remove old hard disk (It was time to go from 80GB to 250GB and this is a great way to make the change without any fear of losing time)

3:20 New hard disk in place.

3: 25 Heron install CD is asking me for details of location, keyboard and partitioning.

3:41 Heron installed and running.

3:45 Reboot in order to use NVidia, this was recognised as necessary after activating advanced visual effects. No manual configuration needed at all.

5:03 R, GRASS, QGis, Google Earth, Flash plugin + codecs, Apache MySQL, Postgresql all installed.

So I will never worry about a hard disk crash again. Providing I have my personally generated data safe somewhere, all the rest is easy. Perhaps it is not so simple on all laptops, but it is just so encouraging to see that Ubuntu is getting within a hair of the grandma friendly Linux.

I occasionally do have need of Windows XP. So this morning I moved the .virtualbox directory over from my old home directory to the new one, after a quick install of virtual box and adding my usename to the virtualbox group. It worked exactly as it had before. The advantage of virtualising windows is so glaringly obvious I am constantly surprised by my colleagues reluctance to protect themselves against virus attack that way. It is understandable to want to continue using what you are used to (Windows XP). However it is very strange not to want to make sure that you can get back to work quickly after a disaster. Virtualising XP gives the confidence that you can be back up and running in less than an hour. As desktop PCs are more or less generic virtual machines can be backed up and moved freely between them even if an IT department hasn’t done what it should and provided a sensible networked option of virtual machines for Windows XP users.

It is worth being aware that Heron comes with Firefox 3 beta preinstalled, which is fine, as it will soon be the official version. However in the meantime you can install Firefox 2 in order to use  plugins that don’t yet work with the new version. Be aware that  you must  close down Firefox 3 in order to run firefox-2 (type firefox-2 on a command line or set up a launcher).

I also noticed that the Heron repositories has up to date versions of other programs that I didn’t realise had progressed. The most important development for me at the moment was that Postgresql8.3 is now the stable version and that QGis is 0.10.0 instead of 0.9.1. Despite still not getting to version 1.0, QGis  does look as if it has improved greatly in terms of both stability and appearance. This is encouraging after my recent presentations. QGis 0.9 did have a few uncomfortable issues that I didn’t mention. Also Open Office is better. The most notable improvement is when you try to change the language setting. Anyone who knows the frustrating illogicality of being presented with “thesaurus” when you want to check the spelling of a Spanish document will know what I mean. The options are now in much more sensible places.

Installing Heron

Posted in Linux, Uncategorized by Duncan Golicher on May 1st, 2008

Despite recent posts, I admit to being a long term Windows user myself. I therefore share many of the characteristics of Windows users. The most important of these is that I am naturally extremely conservative and very reluctant to change a laptop when I have things working the way I am used to. I have been using Ubuntu Feisty since late July 2007 without ever  bothering to upgrade to Gutsy (although I installed it for my son).

However I decided that the release of Heron really deserved my attention. I was becoming increasingly  frustrated with the limited disk space on my duel boot Toshiba Tecra with its original 80 GB HD (40 GB still in an XP partition that I now never use). So today I treated myself to a 250GB external Simpletech ATA hardrive. On getting it home I  invalidated the warranty by opening the case, taking out the drive and swopping it for my 80GB ATA drive. I mention the fact that it is ATA as older (my Toshiba was bought in October 2006) or cheaper laptops are likely to have IDE drives. Always check first and be aware that this simple trick will not work if your machine uses IDE. But if you need a new hard drive, and can afford one, simply switching the drives saves all worries about upgrading. If it goes wrong just put the old drive back in.

In went the Heron CD (I decided not to bother with any native windows install this time) and 15 minutes later all the basics were installed without any tweaking at all. I turned on the advanced visual effects. Ubuntu realised I needed an Nvidia driver and installed it. After rebooting I then had all the  wobbly windows and spinning cubes if might ever want, if I ever feel like using them. A completely painless, risk free upgrade in no time at all. Now all I need to do is copy the parts of my home director I want to keep back over from the external drive and it is all back where it was, with extras. Two hours maximum for the whole process, including putting back R, GRASS, Qgis MYSQL etc. PostGis is just a little bit more difficult in my case due to the fact that I previously used Postgreql 8.1 and it is time for 8.2. But that detail is fairly tecnical.  Contrast the basic experience with a complete re- install of Vista. It would be a nightmare that could last the best part of a  week.

Last night I read this very useful and sympathetic account of the install experience from a novices’ perspective http://www.bspcn.com/2008/04/28/the-great-ubuntu-girlfriend-experiment/

As I had handed out several disks of the Heron after the Ecosur seminar I thought I should be aware of these sort of problems.  I am quite likely to have a few students coming into my office in the next few days asking for help. The girlfriend gotchas really are all very minor. My  experience with the new Heron install was simple to the point of being disappointingly trivial. I missed all the thrills of fixing screen resolution and sound. The only important detail worth mentioning is to make sure that the flash plugin is installed from symantics before going to watch a you tube video. There are then no problems. You simply get a polite message asking for permission to install codecs and there is no need to try to install a gzipped file from adobe. Done.

Preguntas sobre PostGIS y la presentación de ayer.

Posted in Linux, POSTGIS, Uncategorized by Duncan Golicher on April 30th, 2008

1. ¿Necesito Linux para usarlo?

No. El modelo es lo que se llama “cliente-servidor”. Es decir que PostGIS es el servidor. Proporciona los datos, pero la visualización de los datos o el procesamiento de las consultas es al nivel del cliente. Tu puedes escoger el cliente que mas te gusta. En este momento hay varios programas clientes de acceso abierto (gratis) que pueden leer, procesar y visualizar información de un servidor de PostGIS. Lo mas fácil de usar es probablemente QGIS, aunque Udig también tiene buenas características. Open Jump es otra alternativa. Todos estos programas se puede instalar en Windows. Los clientes están en un proceso de desarrollo todavía y van a incrementar su utilidad con el tiempo. PostGIS ya tiene toda la funcionalidad de servir datos a clientes con mas capacidades.
Al mismo tiempo programas como Excel y Access pueden vincularse con el servidor para procesar datos no espaciales.

2. ¿Es difícil instalar Post GIS? Necesitamos apoyo técnico externo para hacerlo.

No. Se instala PostGIS con un servidor de Linux en alrededor de una media hora. Se requiere un poco de planificación y esfuerzo adicional para la configuración óptima de las opciones de seguridad. Pero es importante recordar que los usuarios de la información no van a instalar PostGIS. Ellos pueden bajar un archivo qgis.exe del internet y instalarlo bajo Windows en minutos. Si usan Linux también van a instalar Qgis, o Udig, no PostGIS. El uso de Qgis es sencillo e intuitivo. Se puede dar capacitación en su uso practico en el curso de una mañana.

3. Hemos intentado bajar capas de un mapserver, pero parece complicado.

La relación entre PostGIS y un mapserver es potencialmente un elemento potente, pero no se necesita correr un mapserver para usar PostGIS. Un mapserver recibe capas de PostGIS, Asi que es otra capa entre el usuario y servidor. La configuración de un mapserver para correr con PostGIS no es muy difícil. Hay dos mapservers de acceso abierto (mapserver y geoserver) que pueden usarse. Sin embargo un mapserver es realmente una forma de dar información al publico. El primer paso hacia el uso de un mapserver sería asegurar la velocidad de la conectividad al Internet de Ecosur.

4.Estoy acostumbrado de usar ArcView, tengo licencia para usarlo y muchos shapefiles disponibles. ¿Por qué me interesa esta iniciativa?

La ventaja de una base de datos espacial es que toda la información esta guardada en el mismo lugar, bajo el mismo sistema de coordenadas y actualizado. Las capas pueden comunicarse entre ellas. Si ves por ejemplo que las coordenadas del censo nacional no coinciden con los del conteo de población sería posible tratar de resolver el problema. Una vez resuelto todos los usuarios de la base no van a repetir los errores. Usuarios de ArcView pueden bajar la información como shapefiles con un clic y trabajarla al nivel local de sus propios Pcs. Es muy importante enfatizar que la iniciativa no fuerza usuarios de cambiar su forma de trabajar. Cada persona desarrolla sus propias preferencias por software al nivel de sus propia “desktop” y hay que respetar sus decisiones. Pero al nivel del servidor hay que asegurar que todos los datos están mantenidos bajo estándares rigurosos, uniformes y convencionales.

5.¿Cuando podemos contar con un servicio así?

Un prototipo ya esta funcional y ya puede ser muy útil. Se puede proporcionar una herramienta de uso interno en el espacio de una semana. Pero la verdadera potencia de un sistema así viene con mas esfuerzo. Se puede programar portales Web atractivos para el publico. Se puede organizar sistemas amigables para que estudiantes entren sus datos en el sistema sin tener que saber detalles sobre el uso de SQL. El control de calidad y mejoramiento de la información se consigue a través de esfuerzos de revisión meticulosos y programación con datos. Para contar con un sistema del cual podemos estar orgullosos como institución debemos emplear por lo menos una persona con capacidad en programación de bases de datos tiempo completo.
Sería vital la coordinación entre investigadores que producen nuevos datos de campo, las colecciones, LAIGE, informática y la biblioteca.

6. ¿Cuales son la ventajas de Linux para mi como individuo?

Mis propias razones estan expresados aqui.

http://duncanjg.wordpress.com/2008/02/08/por-que-linux/

Presentación de PostGIS en Ecosur

Posted in Linux, POSTGIS, Uncategorized by Duncan Golicher on April 29th, 2008

Abajo he incluido las diapositivas de una presentación que voy a dar hoy en Ecosur sobre la potencia de PostGIS como un base de datos para la investigación institucional en Ecosur. Incluyo una introducción sobre la filosofía del movimiento de aceso libre y la potencia del nuevo modelo “profesionalizado” del software de aceso libre.

bases-de-datos-de-aceso-libre

Los ejemplos del uso de POSTGIS se ven en los videos de You Tube abajo.

Desafortunadamente la calidad del imagen no es muy alto. De todos modos dado la velocidad de la conexión institucional no creo que se pueden ver dentro del institución.

Hurricanes in Chiapas

Posted in Climate change, Evidence and Ecology, POSTGIS, Uncategorized by Duncan Golicher on April 19th, 2008

I have just added the historical hurricane paths to my PostGIS data base from NOAA http://maps.csc.noaa.gov/hurricanes/

The data base has storm tracks from 1851 to 2006. The above maps shows only those since 1990. This is a very valuable resource that is well worth including in the regional data base and looking at in more detail. The interesting element as far as the state of Chiapas is concerned is that very few hurricanes actually track over the state. This is clearer when the whole (150 year) historical data set is included as shown in the figure below. However the moist air masses associated with hurricanes can cause substantial rainfall in the state even when the centre of the hurricane is some distance to the north or east.

When the map is zoomed out further a very clear cut off point emerges, forming an almost horizontal line at around 8 degrees North. This is a feature associated with  the Intertropical Convergence Zone (ITCZ),

This rather sharp transition also explains some of the strange patterns in the maps of global circulation model simulations which also show a pronounced band around the equatorial ITCZ. The cut off may alter in extent over the next century as the effects of global warming are felt. If it expands slightly northwards, then rainfall associated with hurricanes and tropical storms may also move, leading to a general decline in rainfall in southern Mexico. This could occur even if hurricane intensity and frequency in the Atlantic basin follows an overall increasing trend.

An example of using PostGIS from R

Posted in POSTGIS, R scripts, Uncategorized by Duncan Golicher on April 19th, 2008

One of the interesting elements of PostGIS is that it can be run from R using RODBC. I have found this to be very useful, especially when tables are being altered, as it allows the properties of the results of a query to be quickly analysed and checked for sanity before being added to the data base. So in order to work with POSTGIS I usually open QGIS, PgAdmin and an R console and switch between them.

Here is an example of how you might calculate the population density within polygons and add the results to a PostGIS table.

In R first connect to a data base assuming an ODBC connector has been set up

library(RODBC)
con<-odbcConnect(”mydb”)

Now the example will find the sum of the population within polygons representing watersheds. This layer was originallybuilt using r.watershed in GRASS followed by r.to.vect. The automated watershed limitation doesn’t work very well along the coast, but provides a useful set if polygons for the rest of Chiapas, particularly the central valley. First add a new column to hold the result.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN poptot int4;”)

Then the query that calculates the total population can be run from R. The result can by looked at first in R to check that it is sensible. This is useful, as I found differences between using the 2000 census data and the 2005 population count that are not attributable to population change or migration, but rather are caused by errors in the geopositioning of the data.

sql<-”select sum(m.z1) as poptot, b.id from chiapas_conteo_2005 m, chiapas_basins b where m.the_geom && b.the_geom AND contains(b.the_geom, m.the_geom) group by b.id;”

d<-sqlQuery(con,sql)

This takes a while to run. I quickly discovered that it is extremely important to make sure that a GIST index is added before attempting point in polygon operations. This can be added to each table in PgAdmin, for example.

CREATE INDEX chiapas_basins_idx_geo on chiapas_basins USING GIST(the_geom GIST_GEOMETRY_OPS);

Now histograms and totals of the population in the watersheds can be produced in R to look at the results before adding them back into the data base.

For example

hist(log(d$poptot))

The population density is typically log normal.

sum(d$poptot)
4255957

This gives Chiapas a total population of 4.25 million which looks about right. As the query returned the results with a column name that has already been set up and has the primary key this line will update the database.

sqlUpdate(con,d,”chiapas_basins”)

Now we need to calculate the area of the watersheds in km2. To do this I first had to insert an srid for Albers conic projection for North America. This can be found here

http://www.spatialreference.org/ref/epsg/102008/

Then some more queries that can either be run in pgAdmin or wrapped up by odbcQuery(con, “some sql statements “) and run from R directly.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN areakm2 float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET areakm2 = area(transform(the_geom,9102008))/1000000; “)

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN density float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET density = poptot/areakm2; “)

The results can then be visualised in QGIS. Whether this is easier than using ARCGIS certainly depends on what you are used to. I am still working out how to achieve fairly basic operations like this in PostGIS. However when I have achieved them once, it is then easy to repeat the operation for other examples, which is why I am posting fairly simple examples like this to the weblog for reference in case they can help others to overcome the intially steep learning curve with PostGIS. Although the geoprocessing wizard in ArcView is very simple to use and the point in polygon operation does seem to run much faster than the corresponding query in PostGIS, this particular operation would still require some additional work with the database tables if it were to be done using ArcView alone.

Tagged with: , ,

Importing maps from BERDS: Reprojection using ogr2ogr and shp2pgsql

Posted in POSTGIS, Uncategorized by Duncan Golicher on April 15th, 2008

I am constantly impressed by the BERDS database for Belize. It is powered by POSTGIS, and also offers downloads of some coverages as shapefiles in UTM. I have just downloaded and imported Jan Meerman’s fire risk map into my POSTGIS system. I did it in two steps that I have found to be generally more robust than trying to use ogr2ogr. directly. First reproject to a new shapefile then import the result using shp2pgsql. I am not sure why this process results in fewer errors, but it does seem to work with some difficult imports I have been attempting (although this particular example is very clean).

ogr2ogr -f “ESRI Shapefile” belizefire.shp -t_srs EPSG:4326 -s_srs EPSG:2028 firerisk.shp

shp2pgsql -s 4326 -c belizefire belizefire mydb | psql -d mydb

La niña and early rainfall

Posted in Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on April 14th, 2008

One of my first posts to this weblog mentioned the statistical correlations between low mid pacific sea surface temperatures and early rainfall in Chiapas.

http://duncanjg.wordpress.com/2008/02/06/checking-on-the-el-nino-effect-using-r/

Heavy rains often do not begin in the state until mid May. However this weekend we have experienced a major rain storm (12 April). Persistent rain fell throughout the night and early morning (14 April). Rerunning the code shows that la Niña effect is still strong.

Of course as any linkages are statistical in nature I do not expect my temporary success at long term weather forecasting to continue.

Tagged with: , ,

Las Chimalapas: A reserve waiting to be declared?

Posted in POSTGIS, Uncategorized by Duncan Golicher on April 13th, 2008

Watch the animation below carefully (Click on the map to open the animated GIF). First a blue marble image is presented. Then a map of settlements is overlain (red points) Then the current protected areas of southern Mexico are added (blue shading). Finally the registered plant collections are added.

So, what is that large hole in the centre of the map doing? Does no-one live there? Hasn’t it been declared a reserve? Have any botanical collections been made there?

This is the Chimalapas region, in the isthmus of eastern Oaxaca bordering on Chiapas. The vegetation ranges from lowland rainforest in the Atlantic north to tropical dry forests in the Pacific south Cloud forest and montane pine-oak forest lie in between. Rainfall ranges from 1100 mm to over 3500 mm. The whole area totals more than 600,000 ha. It is still in a good state of preservation. The avifauna is apparently the most diverse for any region of comparable size in the country, totalling at least 464 species in the region as a whole (with more than 300 species in the lowland rainforest) representing 44% of the bird species known from Mexico. Important species present in the region include Harpy Eagle Harpia harpyja and several other large eagles. Yet it has not received protected status and is seldom visited by plant collectors.

Giving protected status to the region would seem on the face of it to to be an automatic “no brainer”. There would be few losers. The resources of this uninhabited region are presently not being exploited. Pressures on the region are ever present, but not intense. The publicity associated with protected status might encourage incipient ecoturism in the area. Various conservation organizations such as the TNC and Conservation International are interested in the region and my group in Ecosur are currently looking at deforestation in the region in detail in order to strengthen the arguments.

However, is the protected area model really the best option for the area? I leave the question open in the hope of receiving some comments.

There are a few points that might be discussed.

1. Are governmental resources available in order to ensure that protected status would indeed prevent activities that negatively impact the biodiversity of las Chimalapas?
2. Could protected status have unexpected consequences, such as encouraging unsustainable activities in the the area?
3. Would protected status for las Chimalapas divert resources from initiatives that could be effectively combating more immediate threats?
4. Would protected status interfere with wider long term economic development of the region, specifically in terms of trans isthmus transportation?

It is quite astonishing to notice that the “hole” in question even stands out at a national level when desert regions are included. The only other large areas without inhabitants in Southern Mexico are around Calakmul.

Floods and forest loss 3

Posted in Uncategorized by Duncan Golicher on April 13th, 2008

Unfortunately time is short. I have found it more difficult than I expected to make a handful of specific criticisms of the statistics in Bradshaw et al.

I originally aimed to concentrate attention on a few narrow methodological issues when the paper was brought to my attention. I now feel that statistical weaknesses are indicative of a much deeper problem with papers of this sort.

Environmental science should guide environmental policy. Environmental politics should not exercise direct influence on the way scientific evidence is evaluated. If this is allowed to occur there is a constant risk of the underlying science becoming devalued. This has to be resisted. Scientific publishing concerns itself with rigor and objectivity. Scientists do have an obligation to make decision makers more aware of issues such as the wider positive value of forests. There are now many forums available that can allow us to achieve these goals. As individuals we can find always find, develop and strengthen, outlets that permit us to express subjective opinions freely and explicitly. A weblog such as this may even be an example.

However, the traditional scientific peer review process should not be seconded to an environmental cause. It must have a different role. The strict objectivity of peer review gives us the credentials and the confirmed evidence that we can use in order to make a difference when we express our opinions as scientists.

In this particular case the authors’ openly state that they deliberately set out to support decisions that strengthen forest conservation. They write that “for conservation to receive wide political and popular attention and priority, especially in the developing world, there needs to be empirical evidence of nature’s role in supporting human well-being. … For centuries it has been vehemently claimed, and hotly disputed, that forests provide natural protection from floods … Yet, because the claim lacks broad-scale empirical support, the development and implementation of clear flood-mitigation policies regularly stall (FAO & CIFOR, 2005; Calder & Aylward, 2006).” The authors stated intention was thus to correct this deficit. I find this quite uncomfortable reading. Scientific evidence doesn’t fall out of a data set as a result of political necessity to provide it.

Why does the claim of a link between deforestation and flooding “lack empirical support”? I don’t believe that it does. There is ample evidence on which to base a consensus. Flooding, and more specifically, the loss of life and property associated with flooding, is the result of a broad set of factors that combine in complex, case specific ways. These combinations (that could be called the “cause” of flooding) are often quite well understood at the local level when any specific event is investigated. However the “causes” of floods in general can rarely be analyzed successfully by looking for global scale correlations. The number of interactions involved prevents purely statistical analyses from incorporating enough of our current understanding to cast new light on the issue. Contemporary, process based, hydrological models now capture the dynamics of flooding remarkably well. Water does move in predictable ways, even if higher powers, including politicians, do not. There are relatively few deep scientific mysteries to be resolved.

The intrinsic difficulty is still in effectively communicating what is already known to decision makers. Flooding is largely predictable (at least in a probabilistic sense). Nevertheless, poor political decisions are constantly made that ignore both science and common sense. They will continue to be made if decision makers can claim that a scientific consensus has not been reached and then use the resulting confusion as a justification to act on non-scientific agendas.

In some specific circumstances forest cover can indeed mitigate the effects of intense rainfall events. This is especially true in the case of small scale, local events. In other cases flood plains are inundated regardless of whether the upper parts of watersheds are forested. Politicians sometimes use deforestation in order to distract attention from their own poor planning in heavily populated flood plains. Sound environmental management involves both protecting forests and ensuring high quality urban planning with sophisticated risk management in the flood plains. Distracting attention from the complexities of the trade offs involved by setting up false dichotomies based on questionable analysis of evidence is unhelpful. Floods cost lives. We must try to get this right.

Here are a few quick maps based on the authors, data.

The authors investigated two statements. (i) flood frequency is correlated with the total forest cover (natural and plantation) and/or (ii) flood frequency is correlated with the total forest cover loss over the period of interest. Only floods caused by heavy or brief torrential rain were included; those caused by typhoons, cyclones, dam breakage and tsunamis were excluded because they represent events that originate independently of landscape characteristics.

It should be clear that total forest cover must be correlated generally with climate. Absolute forest loss is positively correlated with the size of the country with large countries clearly losing more. Proportional forest loss is negatively correlated with the size of the country, with small countries losing more forest. So the authors’ attempted to hold for these effects statistically by including them in an initial model.

As I read the paper, this analysis looks increasingly odd. Instead of concentrating on “offsetting” effects such as total area, population or rainfall they threw them directly into the statistical model. The use of “random effect” to refer to country level soil moisture is also extremely odd. On initial reading I thought that country id must have been used as a random factor in a GLMM that looked at within country effects, sensu hierarchical mixed effects models as used in social science. I could find no clue that the authors had investigated any random effects as I understand the term.

Using the data set that I had previously made available for download in a previous post on this weblog I fit one of the models cited in the paper using R.

mod summary(mod)

Call:
glm(formula = nfloods ~ area + slope + rain + degrad + for2000 +
forloss, family = gaussian(link = “sqrt”), data = d)

Deviance Residuals:
Min 1Q Median 3Q Max
-12.570 -4.091 -1.526 2.090 38.183

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.377e-01 6.697e-01 1.102 0.27666
area 6.295e-07 3.268e-07 1.926 0.06054 .
slope 1.182e-01 2.022e-01 0.585 0.56171
rain 9.215e-04 3.056e-04 3.015 0.00425 **
degrad 1.925e-06 7.054e-07 2.729 0.00909 **
for2000 -1.732e-06 5.392e-07 -3.211 0.00247 **
forloss 8.676e-05 5.967e-05 1.454 0.15307

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 74.80764)

Null deviance: 15817.0 on 50 degrees of freedom
Residual deviance: 3291.4 on 44 degrees of freedom
(5 observations deleted due to missingness)
AIC: 373.26

In other words, at first sight no significant effect of deforestation at this scale. I am still not sure what the authors’ analysis actually involved but although the (often justifiable) use of information criteria to mediate between models appears sophisticated, and the lack of mention of statistical significance testing is sensible, in this particular setting the contemporary feel to the analysis mainly serves to make it more difficult to penetrate for the uninitiated.

As my major issue remains that of aggregation errors (for example what does median annual rainfall of 616 mm mean for a countrty like Mexico?) I am reluctant to go into further detail.

Connecting Qgis to an Ecosur based POSTGIS server

Posted in Uncategorized by Duncan Golicher on April 4th, 2008

The internet connection speed from Ecosur is unfortunately extremely slow. We have a limited number of external IPS. In order to set up a temporary trial POSTGIS server I had to commandeer a PC that is also being used for image processing. So there is only limited and intermittent connectivity. It would fail completely if many users connected at the same time.

However I aim to put some of the most useful basic coverages for Chiapas and Mexico on this server as an alternative to passing around shapefiles. One advantage is that everything is held in the same projection using the same datum (EPSG code 4326, PROJ4 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
ESRI GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],
PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]

Import to POSTGIS requires clean topology. I have unfortunately found that many of the available layers have problems with this and do not import cleanly. I am trying to fix by cleaning them in GRASS but it is tedious and does not always work. However the layers I do get into the system are then guaranteed to be usable by any software.

The easiest way to connect to the server is to download Qgis (available for windows here http://download.qgis.org/downloads.rhtml - choose a stable version)

Once Qgis is running choose “add a postgis layer”.

Connect to the server with the details shown below (The password is available by contacting me directly to avoid over use and details such as the name of the database may change).

suelos2.png

Then layers can be loaded strait into Qgis. For example the 1:25000 soils map for Mexico can be downloaded by choosing the layer shown below.

suelos.png

This can then be saved locally as a shapefile by right clicking on the name of the layer in the panel on the right. I have also had problems with encoding (bad characters) as most of the dbf files are not in UTF8. I am working on this. Again the advantage is that once everything is done there are no future problems.

Floods and forest cover 1

Posted in Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on April 2nd, 2008

Good decision making should be based on good scientific advice. It is clearly the job of the scientific community to support decision makers whenever they feel able to do so. Floods cause tremendous suffering, not only in terms of loss of life but in the immense disruption caused to lives, livelihoods and economies. Most residents of Southern Mexico have been affected in some way by floods in Tapachula (2005) and Villa Hermosa (2007). Fortunately these severe events caused comparatively few lives to be lost directly. However their consequences to the regional economy were extremely grave. In both cases politicians in the lower part of the watersheds cited deforestation as a major contributory factor.

Conservation of the native forests of Chiapas will not be achieved unless the full value of the services they provide is appreciated by society. Yet the linkages between flooding and deforestation are complex. When a study claims to provide new evidence regarding linkages between deforestation and flooding it must be considered with some care.

Comments on the scientific value of the paper I cited in the previous post should be based only on evidence. I therefore checked most of the websites the authors used to derive their information before analyzing the paper. An important source was the world atlas of flood events at Dartmouth College

http://www.dartmouth.edu/%7Efloods/archiveatlas/index.htm

The data found there can be mapped directly using their web site as shown below.

floods1.png

I also downloaded and reformatted the raw data and moved it into postgis. It can be used as a shapefile through the zipped file here floodevents(change-extension-to-zip).doc

The data is clearly of variable quality. This is par for the course. Prior to 2006 the geographical coordinates were not entered explicitly in this particular data base. All the original raw data from the Dartmouth data (that I downloaded in Excel format) including textual descriptions of the floods can be loaded strait into R using read.table by changing the extension of the following file to txt flood_events.doc. Here are some quick (Qgis) maps derived from the data showing loss of life from the floods that count with coordinates.

floods2.pngfloods4.pngfloods5.pngfloods3.png

There are various points to note before I look at the statistical analysis used in the article in greater detail. Southern and Eastern Asia seems to be particularly flood prone. Most floods on this continent occur around coastlines, on the lowlying floodplains and in the Himalayan watersheds. Africa has comparatively few floods. Flooding in North and Central America tends to be associated with the effects of tropical depressions and hurricanes.

The patterns of flooding at this scale seem to be determined largely by global circulation patterns, but the consequences and probability of being reported will be determined by settlement patterns and economic activity. Floods are rarely reported from the Amazon basin, even though much of the low lying areas of forest (várzea) are seasonally flooded and these floods vary in severity tremendously between years. These sort of details interest forest ecologists trying to trace tree distribution patterns but never make news because they do not affect human lives.

Tagged with: , ,

PostGIS and R continued

Posted in Climate change, Evidence and Ecology, R scripts, Uncategorized by Duncan Golicher on April 2nd, 2008

Last week a colleague sent me an article published in Global Change Biology for comment. I have just got round to looking at it in detail. The abstract is available here (with full text if you have the institutional access.)

http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-2486.2007.01446.x

The authors claim have identified a correlation at the global scale between forest loss and flooding.

I have a lot of issues with the way the data has been analysed that I intend to discuss here. However one of the positive features of this article is that all the data that used is included as an appendix. This allows anyone to make their own mind up regarding the merits of the analysis.

I will make specific comments in a later post. But I first deal with different technical issue. Having setup a postgis database for Mexico and Central America I realised that I lacked a global country map to help visualise data of this sort. I tried importing the shapefiles that come with Arcview, but with no success. They apparently have errors in the topography. I also tried converting the worldhires data in the R mapdata package, but that also had issues with bad topography (apparently American Samoa was to blame). Finally I found this reply to a R help post from the prolifically helpful Roger Bivand

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/78339.html

I tried

load(url(”http://spatial.nhh.no/R/etc/world_countries.rda”))
library(rgdal)
writeOGR(world_countries, “PG:dbname=mydb”, “world_countries”, “PostgreSQL”)

Bingo. In a couple of minutes I had a rather low resolution world countries map with ISO3166 two letter country codes ready for linkage to any data with a column that uses this international standard. It is now nicely tucked away in my postgis data base ready for use whenever needed. Very satisfying. Notice that I include this code as an example of how R connects to a postgis database once one has been set up. It obviously won’t work if you haven’t gone through the steps to establish your local data base. Here is how on Ubuntu. It is not something I would try under Windows,even though posgresql 8.2 apparently has now been ported to Windows.

I then had to reformat the data from the pdf copy of the article. This was slightly frustrating as it needed a bit of manual work in an open office spreadsheet. I then added the ISO3166 codes. This line (also suggested by Roger Bivand) helped a lot. It would be even better if editors and reviewers insisted on the use of standards when they are available.

d<-data.frame(d,ISO3166=world_countries
$ISO3166[match(toupper(as.character(d$Country)),as.character(world_countries$names))])

The nice thing about a postgis data base is that you can use it for storing all your data, spatial or otherwise, in one convenient place. So using RODBC connectivity the floods data can be incorporated by

library(RODBC)
con<-odbcConnect(”mydb”)
sqlSave(con,d,”floods”)

The only slightly fiddly aspect is that the saved table lacks a primary key. That can be resolved from R with few lines that run sql in postgresql.

odbcQuery(con,”ALTER TABLE floods ADD COLUMN index int4;”)
odbcQuery(con,”UPDATE floods SET index=cast(rownames AS integer);”)
odbcQuery(con,”ALTER TABLE floods ADD CONSTRAINT floods_pkey PRIMARY KEY(\”index\”);”)
odbcQuery(con,”ALTER TABLE floods DROP COLUMN rownames;”)

Then the tables can linked together as a temporary view from within R

sql<-”create view floods2 as select f.*, w.wkb_geometry from floods f,world_countries w where f.ISO3166=w.ISO3166″
odbcQuery(con,sql)

The view can be removed from the database whe it is finished with by

odbcQuery(con,”drop view floods2″)

With the view in place Qgis can be used for the visualisation instead of R, which is really helpful for casual analysis. This convenient interoperability, drawing on the particular strengths of a range of different tools within a common framework (statistical environment, database, GIS visualisation), is one of the great plus points of Open Source software from the point of view of a user. However you do have to discover how to do it first. You don’t get a “wizard” to help.

R does produce very good publication quality maps, but there is rather more overhead involved than when using the simple point and click graphical user interface of QGis I am now finding it much easier to send vector layers strait to postgis and work from there.

floods.png

The view can also be exported as a shapefile strait from QGis. As the wordpress site only allows me to upload files it thinks are documents I have placed it here with a false doc extension. Don’t try to open it, save the file to disk and then change the extension to .zip. floods2thisisreallyazipfile.doc

Qgis Udig and Postgis

Posted in Linux, Uncategorized by Duncan Golicher on March 30th, 2008

Until comparatively recently visualisation of geographical information tended to mean one thing. ArcView.

When Google Earth (and the perhaps under rated and certainly under used NASA Worldwind) appeared on the scene visualisation of free geographical information became popular and ubiquitous. Three dimensional spinning globes revolutionized the way none GIS professionals could work with geographical data. At around the same time some comparatively modest open source projects turned into mature easily used products that could be applied to two dimensional desktop map composition.

At the time of writing the best known of these are QGis (Quantum GIS) and Udig. Both are cross platform. Udig uses Java and runs well under Windows. Qgis still performs rather better under Linux, at least in my experience.

Neither of these programs are true geographical information systems, in the sense that they do not of themselves provide very much in the way of analytical tools. They simply allow geographical information to be visualised in a convenient way. Visualisation is a fundamental element of geographical analysis so it should be done well. These open source programs are acceptable alternatives to the traditional ESRI benchmark.

Users of Arcview commonly activate a range of plugins to give the program more features. Qgis provides a connection with GRASS to provide analytical capability, although I would assume that few GRASS users routinely run GRASS through Qgis. Both Qgis and Udig have wms (world map server) connectivity, which means that they can be used to customize the sort of maps that are found on web sites showing wms data.

An interesting recent development in GIS has been the rapid growth in the use of spatially aware data bases. Most major databases (MySQL, Oracle, Postgresql) can now be given a spatial component. However in the open source context “spatial database” has become almost synonymous with the postgis component of postgresql. It is the built in connectivity with posttgis feature that gives qgis and udig their rapidly expanding potential value. Spatial databases allow sophisticated extraction and combination of geographical data. The results usually usually have to be shown as maps. Command line sql statements are not visual of themselves. QGis and Udig can thus potentially provide the full functionality of an Arc* type environment through levering the capabilities of postgis.

At present this very powerful framework is still in a rather incipient phase, in the sense that there are no GUIs that carry out analyses comparable with Arcview/ArcGis. Most of the functionality of the expensive commercial software is now available through the open source alternatives. However while the documentation is comprehensive it does usually assume prior knowledge of data base programming. Most of the current postgis users are knowledgeable data base programmers with considerable experience using spatial sql queries in the context of online mapserver based web sites. However this is changing fast. The WKB (Well Known Binary) representation of geographical information used in postgis databases looks like a good standard and will undoubtedly be more widely used. This should result in the very rapid development of higher level forms of manipulating information in this format.

One advantage of using an online data base as opposed to separate shapefiles is that the coordinate system of all data has to be consistent, or made so through reprojection when the data base is set up.This is especially important in Mexico where data is often provided without details of the projection and datum used (NAD27 and WGS84 are both used and projection details are rarely provided). Also details of how to connect to the data base using Qgis or Udig can be provided in an email, while it can be difficult to send large shapefiles. Both Qgis and Udig allow POSTGIS data to be saved locally as shapefiles.

Easter holidays

Posted in Family, Personal and family, Uncategorized by Duncan Golicher on March 23rd, 2008

nicole7.pngnicole1.pngnicole2.png

nicole3.pngnicole5.pngnicole6.png

Potential species distributions 1

Posted in Evidence and Ecology, Natural history, R scripts, Uncategorized by Duncan Golicher on March 22nd, 2008

A constant frustration when working in tropical regions is the shortage of really high quality data. The quantitative knowledge that applied conservation needs in order to prevent the extinction of the world’s endangered species of plants and animals is still surprisingly hard to come by. At best the available hard data is fragmented, of variable quality, difficult to organize and thus hard to analyze in a formal manner. At worst it is either missing or positively misleading

Information used for conservation planning also forms the central subject matter of the scientific discipline of ecology. In other words, it concerns the abundance and distribution of organisms. Our ignorance of this most basic characteristic of the planet we live on is remarkable. Astronomers (arguably) provide estimates of the number of stars in our galaxy with narrower proportional margins of error than we have for the number of species in a Mexican forest. For example, Wikipedia cites three sources in stating that of 2006, the Milky Way is thought to be comprised of between 200 to 400 billion stars. A defensible estimate of the number of tree species in Chiapas could still range from 500 to 2500 depending on how data of dubious quality is interpreted. This is not good enough for such highly visible and important elements of ecological communities. The hard data is just not there to improve on this.

Although in recent years considerable advances have been made in systematics (Linnean shortfalls), these so called “Wallacean” shortfall remains. The large scale, systematic, planned, coordinated field work needed to redress our ignorance is simply not being funded.

In recent years I have been working with techniques for producing automated distribution maps of tropical tree species using R. Even using comparatively good data for areas that we know at first hand, it is a remarkably difficult task. Modelling species distributions can involve more pragmatism than theoretical insight. At a regional scale the task becomes yet more challenging.

This is not because R is too limited in the number of useful tools it provides. A very broad range of models are available. They range from rule based classification and regression trees through AI “black boxes” such as neural networks, to generalised additive models and simple logistic regression. All can be fit within R in a common framework. The models can be used to produce predicted distributions based on either presence-absence data or known occurrences alone (with the addition of “pseudo absences”). It is also possible to automate linkages between R and other popular software such as maxent or GARP. Increasingly accurate layers for many key predictor variables are now available.

However at least fifty well distributed occurrence points are needed to provide credibly well validated distribution maps for a single species using a fully automated procedure. If fewer data points are available some sort of input from “expert judgement” is inevitably required in order to evaluate which of the potentially infinite outputs from species distribution models are most credible. Well documented, repeatable automation of the fitting process using R is very useful in this process, as many maps can be produced in a short space of time based on different assumptions. However resorting to visual pattern matching is frustratingly informal and breaks a standard rule of data analysis, i.e. ensuring perfectly reproducible results.

The code example below is one possible very simple implementation of GAMs with pseudo absences. It is is applied here to modelling the potential distribution of tree species using data points provided mainly from MOBOT. The code (deliberately in this case) does not take into account spatial trends or autocorrelation, thus potential matching climates will often be suggested outside a “known” species range. This can be corrected by adding in coordinates as predictors to the model, but at the cost of potential loss of insight into the distribution of species that may not yet have been collected from their entire range. Any modelling technique has to find a balance between too many false positives and too many false negatives. A common technique is to look at the ROC curve, but this is not particularly useful if the number of known occurrences is very low and restricted to a small part of the suspected range.

I have come to the conclusion that we should not be overly defensive regarding t