Speeding up buffered point on raster overlays in PostGIS

The query shown in the last post vectorised the raster in order to produce a very precise overlay and measure the are exactly. Raster points falling on the edge of the buffer would be split and only the part of the pixel falling inside is measured.
If pixels are small in relation to the polygons this is overkill. The rasterisation method that was used in this post would be accurate enough. This should run much faster, after finding a good tile size. There is as yet no equivalent of ST_SummaryStatsAgg when the raster layer consists of classes. However, it is easy to extract counts of the number of pixels in each cover class, which is all that is needed.


select
"name",id, (pvc).value coverclass, sum((pvc).count) pcount
FROM (SELECT
"name",
id,
ST_ValueCount(ST_Clip(rast, geom, true)) pvc

from

(select id,name,st_buffer(geom,0.1) as geom from
testdata where id < 11) bf,
globcov600x600
where st_intersects(rast,geom)

) foo
group by coverclass,id,"name"
order by name,id,pcount DESC

Contrast this query with the vectorised version


select
name,id, (gv).val,
100*sum(st_area((gv).geom)/area) as percent
FROM (SELECT
name,
id,
ST_Intersection(rast, geom) AS gv,
st_area(geom) as area
from
(select id,name,st_buffer(geom,0.1) as geom from
testdata where id = 12) bf,
globcov100x100
where st_intersects(rast,geom)
) foo
group by val,id,name
order by name,id,percent DESC

The vectorised version already calculates the percent cover while the rasterised version just returns a count of pixels. A more complex query could be built around this by further grouping, but the simplest solution is to send the results to R and sort it all out there.

Another issue was apparent with this particular data set, that consisted of typical herbarium collections. There were multiple records of the same species at some sites. This can be fixed by running a select distinct query on the combination of site coordinates and species name.


create table sp_sites as
select DISTINCT ON (site) id,name,lon,lat,geom from
(select *, (name,cast(lon as "varchar")||cast(lat as "varchar")) as site from testdata ) d

An even better solution would be to select only distinct sites. However this would then require indexing species to unique sites, which would make sense for a more formal database but is not needed for this exercise.

Now, the query for the whole data set can be run in a matter of minutes, rather than hours.


create table pixel_count as
select
"name",id, (pvc).value coverclass, sum((pvc).count) pcount
FROM (SELECT
"name",
id,
ST_ValueCount(ST_Clip(rast, geom, true)) pvc

from

(select id,name,st_buffer(geom,0.1) as geom from
sp_sites) bf,
globcov600x600
where st_intersects(rast,geom)

) foo
group by coverclass,id,"name"
order by name,id,pcount DESC

After importing the results into R with ..

library(RODBC)
library(reshape)
con<-odbcConnect("latam")
d<-sqlQuery(con,"select * from pixel_count")

A quick check showed that the total pixel count for each unique Id varied between 4041 and 4051. So at this level of accuracy it is fine to divide by a single number for all (the median was 4043).

So this processes the data into simplified classes for the region. Some of the less common land use classes are not included in order to keep things simple, but this can be changed if needed.


d$percent<-100*d$pcount/4043
d<-melt(d,id=1:3,m=5)
d<-cast(d,id~coverclass)
d<-data.frame(id=d$id,crop=d$"14",mixcrop=d$"20",mix=d$"30",openfor=d$"40",closedfor=d$"50")
d[is.na(d)]<-0
sqlSave(con,d,"Percent_cover",safer=F)

Then just glue the table back in place in QuantumGIS and display it.
select p.*,geom from
percent_cover p,
sp_sites s
where
s.id=p.id;

I hope this is useful as a template for similar analyses. Once a general form has been found for a spatial query in PostGIS raster it is quite easy to see how to adapt the query to other situations. In this specific case it may be necessary to review the choice of key landclasses that are selected in R in order to simplify the output. Linking R to PostGIS through RODBC is very useful, as a lot of data processing is simpler in R than in SQL.

At the time of writing ODBC install in Ubuntu 12.04 on a 32 bit PC involves installing unixodbc and editing

sudo gedit /etc/odbc.ini

The key is finding where the driver has been placed. It is now at
/usr/lib/i386-linux-gnu/odbc/psqlodbcw.so
Whereas on previous version it was
/usr/lib/odbc/psqlodbcw.so


[ODBC Data Sources]
mydb = Database description

[latam]
Driver = /usr/lib/i386-linux-gnu/odbc/psqlodbcw.so
Database = latam
Servername = localhost
Username = postgres
Password = postgres
Protocol = 8.2.5
ReadOnly = 0

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s