PostGIS raster statistics with PLR

Although it is simple to move data from PostGIS in and out of R using ODBC there are some moments when it would be nice to use an R function within the database. This is easy enough in Ubuntu. First install the postgresql-plr package and then run the query through your data base to add the language (plr.sql goes into /usr/share/postgresql/9.1/extension in Ubuntu 12.04)

sudo apt-get install postgresql-9.1-plr
psql -d YourDataBase -f /usr/share/postgresql/9.1/extension/plr.sql

Now you can use R code (with a few additional complications) directly within PostGIS. This is useful when a query produces multiple values. In PostGIS raster this can occur in two ways. A query can produce an array of raster values, or a query may use a group by clause. The latter requires an aggregate function, but the first is very simple as it produces an array that R will treat as a vector.

So, for example, if we want to extract all the NDVI values for a Mexican Ecoregion using the Modis data imported in the previous post we could write.

select desecon3,mon,yr,(st_dumpvalues(ST_Clip(rast,geom, true))).valarray
from
(select extract(month from fdate) mon,extract(year from fdate) yr,* from modis.ndvi) m,
(select * from ecoregions where desecon3 like '%Depresion Central de Chiapas%') S
where st_intersects(rast,geom)

data_array

This produces an array of all the NDVI values for each combination of polygon, month and year. The data type is 16 bit signed integer, which has a theoretical range of values from -32,768 to +32,768. The documented data range is from -2000 to +10000. What that means in practice is that all negative values can be discounted as being produced by water or no data. Very low positive values (< 1000) are derived from urban areas or totally barren desert. So, to look at NDVI dynamics for vegetated areas we may want to impose a cutoff at around 1000 to ignore water and built up areas (taking the raw 16 bit values).

In R we would just write

x<-x[x>1000]
mean(x)

Using PLR we can do exactly the same thing.


CREATE OR REPLACE FUNCTION ndvi_mean (float[]) RETURNS float AS '
   x<-arg1
   x<-x[x>1000]
   mean(x)' 
   LANGUAGE 'plr' STRICT;


select desecon3,mon,yr,ndvi_mean((st_dumpvalues(ST_Clip(rast,geom, true))).valarray)
from
(select extract(month from fdate) mon,extract(year from fdate) yr,* from modis.ndvi) m,
(select * from ecoregions where desecon3 like '%Depresion Central de Chiapas%') S
where st_intersects(rast,geom)

Of course you could use any other R functions in the same way, providing they return a useful value.

True aggregate functions (that work on many rows rather than an array) are just a bit trickier to set up. The classic example is a function that returns the median (http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html)

This requires two steps. First the r function that does the work is declared in exactly the same way as above.

create or replace function r_median(_float8) returns float as '
  median(arg1)
' language 'plr';

Then an aggregate function is added.

CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_median
);

So if (for demonstration sake) we want to take the median value of the mean_ndvis for each month and year in the data set this will do the job.

create or replace function r_median(_float8) returns float as '
  median(arg1)
' language 'plr';

CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_median
);


CREATE OR REPLACE FUNCTION ndvi_mean (float[]) RETURNS float AS '
   x<-arg1
   x<-x[x>1000]
   mean(x)' 
   LANGUAGE 'plr' STRICT;

select yr,mon,median(ndvi_mean) from
(select desecon3,mon,yr,ndvi_mean((st_dumpvalues(ST_Clip(rast,geom, true))).valarray)
from
(select extract(month from fdate) mon,extract(year from fdate) yr,* from modis.ndvi) m,
(select * from ecoregions where desecon3 like '%Depresion Central de Chiapas%') S
where st_intersects(rast,geom)) foo
group by yr,mon
order by yr,mon;

If this is turned into view (let’s call it test) then we can import the results into R proper using RODBC

library(RODBC)
con<-odbcConnect("modelling")
d<-sqlQuery(con,"select * from test")
ndvi<-ts(d$median,start=c(2002,7),frequency=12)
plot(stl(ndvi,s.window=12,t.window=30))

This suggests that there is no overall trend in NDVI values in the Central Depression once seasonal fluctuations are accounted for. The yearly changes are minor and the final upturn is an artefact of incomplete data for 2012.

data_array1

4 thoughts on “PostGIS raster statistics with PLR

  1. Great to see that you’re using ST_DumpValues() for PL/R as that was my original reason for writing ST_DumpValues() (in addition to my need to debug).

  2. Hi,

    I was at Geostats 2014 and really enjoyed your videoconference on R, PostGIS and plr. Now I just started using
    pl/r and I ran into a problem you may help me with: I wrote a plr function that plots data to a pdf using R plotting functions. The problem is that the created pdf can only be saved to /tmp if I want to access it (otherwise I need to log in as postgres to recover it from the postgresql working dir) and also, I need to sudo and chmod the permissions on the pdf if I want to open it. Default permissions are set to postgres only. This is very annoying because I have to do it everytime I want to look at a pdf I created. Have you ever had this sort of permission problem with plr and postgresql in general? I tried changing the database owner to my user and it didn’t help. Also tried running from psql logged in as my user but no luck there either… Thanks.

    • Hello Gillaume,

      Yes, I know the problem. I solve it by adding a system command in the PLR code itself to change the permissions. Something along the lines of
      system(“chmod 777 /var/www/apps/figs/temp/climdiag.png”)
      You don’t need (and can’t run anyway) sudo from within PLR. The owner of the files is not the owner of the database nor the user that you log in. It is always postgres, but while you are running PLR you do have the rights to change the owner or permissions. Hope that helps.
      I tend to use PLR for small functions that do something useful with data within the data base, but use RODBC for more complete data analysis that produces graphical output. You can run this server side using the RStudio Rserver.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s