Working with raster data in PostGIS

I am leaving yesterday’s post online, as the purpose of a weblog is to track changes to knowledge as they occur.

However I learned a great deal more about PostGIS raster today, in large part as a result of an email from Pierre Racine.

As a result of my new insight I am aware that some elements in yesterday’s post may be slightly misleading. I have added a warning and will edit it further in order to avoid leading anyone astray. I hope that comparing the two posts will help others to avoid falling into the some of the same pitfalls that I did initially.

The first important mistake, that I found for myself this morning after looking at the results more critically, was that I had completely misunderstood tile size. It was all explicit in the manual, but I just didn’t read it properly the first time.

Tile size is tile size. Not the number of columns and rows! I had badly misunderstood this and so thought that using -t 100×100 would result in smaller tiles than -t 10×10 (i.e I was assuming that it meant 100 by 100 and 10 by 10 etc)

In fact, it turns out that the tile size is expressed in numbers of pixels. This is very useful to know. I should have realised this straight off by watching the insert rows commands scroll by. It never made sense to find that using a tile size of 1×1 was much faster than no tiling!

The vector representation of -t 400×400 for Mexico for the WorldCLim data at 0.01 degree resolution looks like this.

And the representation of -t 100×100

One of the reasons that I did not spot the tile size issue sooner was because I had tried, and failed, to use QGIS to visualise the imported rasters. Qgis was not showing the imported layers properly, so I was working blind. I soon realised that the distorted visual representations were not due to errors when importing the data, as the results from overlays coincided with those from other programs. Nevertheless the fact that imported layers do not display well in QGIS at the moment is an understandable cause of confusion.

Pierre Racine told me that

“The plugin is dependent on GDAL and the GDAL driver have been unstable for a while. I use OpenJump.”

So, the be warned. The QGIS plugin is still “experimental” until the gdal problems are fully resolved, Don’t panic if uploaded raster do not look right on the QGIS canvas as yet.

Another “gotcha” involves implementing overlays to obtain zonal stats from polygon on raster overlays. I made major advances in understanding this today, thanks again to some useful advice from Pierre.

It is important to add the ST_SummaryStatsAgg function before trying to build simple queries to extract zonal stats from a polygon on raster overlay.

This is easy enough to do, as the function is written in SQL, so all that is involved, if it is not already included in your build, is copying and pasting the functions from here into the SQL editor and running them.

http://trac.osgeo.org/postgis/browser/trunk/raster/scripts/plpgsql/st_summarystatsagg.sql

The “gotcha” for me was that I intially assumed that the ST_SummaryStats worked without a grouping clause to aggregate the results, more or less in the way that the ST_SummaryStatsAgg function actually does.

I will clarify this issue tomorrow with some practical examples that I now have working to my satisfaction. Meanwhile I will post the official advice I got from from Piere, that will be incorporated in tutorials available from Geospatial Elucubration


"You can extract raster value summary from polygons in two modes:

1) the raster mode in which the polygon is rasterized:

SELECT bufid, (ST_SummaryStatsAgg(ST_Clip(rast, geom, true))).*
FROM rasttable, buftable
WHERE ST_Intersects(rast, geom)
GROUP BY bufid

2) the vector mode in which the raster is vectorized:

SELECT (ST_AreaWeightedSummaryStats(gv)).*
FROM (SELECT ST_Intersection(rt.rast, gt.geom) gv
FROM rasttable rt, linetable gt
WHERE ST_Intersects(rt.rast, gt.geom)
) foo
GROUP BY gt.id

The raster is much faster but everything is rounded to the shapes of the pixels.
The vector mode is slower but is precise since once pixels are vectorized, geometry/geometry intersections are performed. It is also the desired mode if you intersect lines and wonder about the total length of a line type in say a categorical raster (or the mean elevation of a road). It requires small tiles (10×10) or a least about the size of the mean polygon size.”
Pierre Racine

My own experiments confirmed all this, and I will show the encouraging results of some experiments tomorrow. Many thanks to Pierre, for not only writing so much of the code but for being so generous with advice and patient with users.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s