An example of using PostGIS from R

One of the interesting elements of PostGIS is that it can be run from R using RODBC. I have found this to be very useful, especially when tables are being altered, as it allows the properties of the results of a query to be quickly analysed and checked for sanity before being added to the data base. So in order to work with POSTGIS I usually open QGIS, PgAdmin and an R console and switch between them.

Here is an example of how you might calculate the population density within polygons and add the results to a PostGIS table.

In R first connect to a data base assuming an ODBC connector has been set up

library(RODBC)
con<-odbcConnect(“mydb”)

Now the example will find the sum of the population within polygons representing watersheds. This layer was originallybuilt using r.watershed in GRASS followed by r.to.vect. The automated watershed limitation doesn’t work very well along the coast, but provides a useful set if polygons for the rest of Chiapas, particularly the central valley. First add a new column to hold the result.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN poptot int4;”)

Then the query that calculates the total population can be run from R. The result can by looked at first in R to check that it is sensible. This is useful, as I found differences between using the 2000 census data and the 2005 population count that are not attributable to population change or migration, but rather are caused by errors in the geopositioning of the data.

sql<-“select sum(m.z1) as poptot, b.id from chiapas_conteo_2005 m, chiapas_basins b where m.the_geom && b.the_geom AND contains(b.the_geom, m.the_geom) group by b.id;”

d<-sqlQuery(con,sql)

This takes a while to run. I quickly discovered that it is extremely important to make sure that a GIST index is added before attempting point in polygon operations. This can be added to each table in PgAdmin, for example.

CREATE INDEX chiapas_basins_idx_geo on chiapas_basins USING GIST(the_geom GIST_GEOMETRY_OPS);

Now histograms and totals of the population in the watersheds can be produced in R to look at the results before adding them back into the data base.

For example

hist(log(d$poptot))

The population density is typically log normal.

sum(d$poptot)
4255957

This gives Chiapas a total population of 4.25 million which looks about right. As the query returned the results with a column name that has already been set up and has the primary key this line will update the database.

sqlUpdate(con,d,”chiapas_basins”)

Now we need to calculate the area of the watersheds in km2. To do this I first had to insert an srid for Albers conic projection for North America. This can be found here

http://www.spatialreference.org/ref/epsg/102008/

Then some more queries that can either be run in pgAdmin or wrapped up by odbcQuery(con, “some sql statements “) and run from R directly.

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN areakm2 float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET areakm2 = area(transform(the_geom,9102008))/1000000; “)

odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN density float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET density = poptot/areakm2; “)

The results can then be visualised in QGIS. Whether this is easier than using ARCGIS certainly depends on what you are used to. I am still working out how to achieve fairly basic operations like this in PostGIS. However when I have achieved them once, it is then easy to repeat the operation for other examples, which is why I am posting fairly simple examples like this to the weblog for reference in case they can help others to overcome the intially steep learning curve with PostGIS. Although the geoprocessing wizard in ArcView is very simple to use and the point in polygon operation does seem to run much faster than the corresponding query in PostGIS, this particular operation would still require some additional work with the database tables if it were to be done using ArcView alone.

2 thoughts on “An example of using PostGIS from R

Leave a comment