Another very common operation combining raster and vector data is extracting statistics from the raster cells that fall within a buffer around points. The method shown yesterday works for obtaining mean values from a numerical raster, but extracting the percentage of a buffer covered by each class of a categorical raster requires a slightly different approach. Examples are shown here.
A straightforward question arose yesterday that seemed to require exactly this sort of approach. What proportion of the area around collection points is forested?
At a fairly coarse scale the GlobCover2009 map may be a starting point. It can be downloaded and used with permission from here: http://ionia1.esrin.esa.int/
The file comes with a colour table, so in order to view it in QGIS I put together a quick style sheet using the same approach shown in a previous post. The style sheet is available here.
I will try to tidy up the code for doing this and make it generic some time.
The GeoTIFF can be imported into PostGIS using various tile sizes. For this particular use I found that 100×100 was about right. I first clipped the file to the study region, in this case South America.
raster2pgsql -s 4326 -d -I -C -M globcovsam.tiff -F -t 100x100 globcov100x100 | psql -d latam
I then added some collection points, so everything was ready to go.
Buffers can be drawn around points easily in PostGIS with a simple query run from the database manager, for example using a 0.1 degree buffer (if distances are required in km it may be better to transform to a projected SRID).
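A minimal sketch of such a buffer query (the testdata table and its id, name and geom columns follow those used later in this post):

select id, name, st_buffer(geom, 0.1) as geom
from testdata;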
So, to calculate the proportion of each cover type around a point, I adapted the queries shown in the tutorial linked above.

select id, name, (gv).val,
100*sum(st_area((gv).geom)/area) as percent
from (
select id, name, st_area(geom) as area,
ST_Intersection(rast, geom) AS gv
from globcov100x100,
(select id,name,st_buffer(geom,0.1) as geom from
testdata where id = 12) bf
where st_intersects(rast, geom)
) as stats
group by val,id,name
order by name,id,percent DESC
It is slightly more complex than the tutorial version because the outputs are grouped by the id of the collection point and the name of the species (here only one point is selected, but the intention is to run the query on the whole data set).
This ran in just under two seconds, which is fine for one record. However, the data set of interest here has around 6000 records, so the query is slower than would be ideal and I am looking at ways of optimising it.
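One possible optimisation, sketched here as an untested idea rather than a finished solution, is to clip the raster to each buffer with ST_Clip and tally pixels with ST_ValueCount, instead of polygonising every pixel with ST_Intersection:

select id, name, (pvc).value,
100.0*sum((pvc).count)/sum(sum((pvc).count)) over (partition by id) as percent
from (
select id, name, st_valuecount(st_clip(rast, geom)) as pvc
from globcov100x100,
(select id, name, st_buffer(geom, 0.1) as geom from testdata) bf
where st_intersects(rast, geom)
) as counts
group by id, name, value
order by id, percent desc;

The percentages are then based on pixel counts rather than exact intersection areas, which may be an acceptable approximation at this resolution.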
The query can be changed to extract results for a magnolia species found at several sites.
Notice that the output has a column for the site id, the species name, the land cover class and the percent cover within the buffer. It can be convenient to wrap such queries in a view, to avoid retyping them when the results are going to be exported through ODBC, as is the case here.
create view magnolia as
select ... etc, etc
Now analysis of the results can be carried out in R through ODBC. This can be particularly useful for reshaping the data, which is needed here in order to get one row for each site. You need an ODBC connection set up to do this.
library(RODBC)
con <- odbcConnect("latam")  # DSN name is an assumption; use your own ODBC data source
d <- sqlQuery(con, "select * from magnolia")
d
id crop mixcrop mix openfor closedfor
1 13 2.1207805 7.873869 28.57101 59.19737 0.000000
2 14 2.1207805 7.873869 28.57101 59.19737 0.000000
3 15 2.1207805 7.873869 28.57101 59.19737 0.000000
4 16 2.1207805 7.873869 28.57101 59.19737 0.000000
5 294 0.5037997 12.216632 18.62423 58.15301 2.284837
6 295 0.5037997 12.216632 18.62423 58.15301 2.284837
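The reshaping could also be done on the database side with conditional aggregation, before the table ever reaches R. A hedged sketch, in which the val codes 14 and 20 are placeholders rather than the real GlobCover legend values:

select id,
sum(case when val = 14 then percent else 0 end) as crop,
sum(case when val = 20 then percent else 0 end) as mixcrop
from magnolia
group by id;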
Data summaries of various kinds could easily be produced in R, but the purpose of sending the table back into PostGIS is to visualise the results again. Now that they are in the form of one row per record, with the same id as the original records, they can be joined to the points.
In the QuantumGIS database manager run
select testdata.geom,test.* from test,testdata where test.id=testdata.id
Now it is easy to make pie charts to show the results.