“Raster” data in PostGIS?
PostGIS does not support raster data as yet. Hopefully there should be raster support soon. Raster layers can be visualised from a local or remote server using geoserver. However having raster layers integrated into a PostGIS data base would be extremely useful for tasks such as overlaying herbarium records with continuous extrapolated climate data or elevation.
I have previously experimented with importing the points of a raster layer. This works quite well. It can be used for simple spatial overlays using proximity operations. However the visual impression produced by the points is not very satisfactory and a query using the contains operator would be more efficient.
QGIS has a plugin for graticule generation. This forms a grid of individual polygons. So if a grid is produced using the graticule plugin of QGIS and then imported into PostGIS using SPIT it is possible to partially mimic a raster layer, although with some computational overhead.
The document mnp-workshop3.pdf showed how worldclim data (http://www.worldclim.org/) can be imported into GRASS and R.
If R is run from GRASS the following lines can be used to import the appropriate raster layers at a chosen resolution into R.
Most of the lines of the script are concerned with calculations on the basic worldclim data. The dataset consists of 36 grids of monthly mean minimum temperature, maximum temperature and precipitation. Some mean annual values may be derived for simplicity.
a<-c(paste(”tmin”,1:12,sep=”"),paste(”tmax”,1:12,sep=”"),paste(”prec”,1:12,sep=”") ,”alt”)
d<-readRAST6(a)
d@data[,1:24]<-lapply(d@data[,1:24],function(x)x/10)
fullgrid(d)<-FALSE
f<-function(x)mean(x[1:12])
d@data[["meanmim"]]<-apply(d@data,1,f)
f<-function(x)mean(x[13:24])
d@data[["meanmax"]]<-apply(d@data,1,f)
f<-function(x)mean(x[1:24])
d@data[["meantemp"]]<-apply(d@data,1,f)
f<-function(x)sum(x[25:36])
d@data[["totprec"]]<-apply(d@data,1,f)
The trick for importing the layers to PostGIS from R is shown below. If the data is in R as a spatialPixelDataFrame the coordinates can be added to the data to form a table.
dd<-data.frame(coordinates(d),d@data)
This table can then be saved to a PostGIS database using ODBC.
library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)
sqlSave(con,dd,”worldclim”)
###########################################
Now either in pgadmin or running the queries directly through R by placing them inside an odbcQuery function.
ALTER TABLE chiapas.worldclim add column gid serial NOT NULL;
ALTER TABLE chiapas.worldclim ADD CONSTRAINT worldclim_pkey PRIMARY KEY(gid);
ALTER TABLE chiapas.worldclim ADD COLUMN the_geom geometry;
UPDATE chiapas.worldclim
SET the_geom = PointFromText(’POINT(’ || s1 || ‘ ‘ || s2 || ‘)’,4326);
CREATE INDEX
grat30arc_idx_geo on grat30arc USING GIST(the_geom GIST_GEOMETRY_OPS);
CREATE INDEX
worldclim_idx_geo on chiapas.worldclim USING GIST(the_geom GIST_GEOMETRY_OPS);
If a polygon graticule called grat30arc covering the same region with a 30 arch second grid has been produced using the graticule plugin in Qgis and imported into the public schema then the average values of the points within each grid square can then be calculated for the graticule and a new table produced using this as a model query.
CREATE TABLE chiapas.worldclim2 with oids as
SELECT a.*, g.totprec, g.meanmin, g.meanmax, g.elev from
grat30arc a,
(SELECT AVG(c.totprec) as totprec,
AVG(c.meanmim) as meanmin,
AVG(c.meanmax) as meanmax,
AVG(c.alt) as elev,d.gid
FROM chiapas.worldclim c, grat30arc d
WHERE c.the_geom && d.the_geom
AND contains(d.the_geom, c.the_geom) group by d.gid) g
WHERE a.gid=g.gid;
The result is very similar to the visualisation of an arcgrid in arcview when it is viewed in QGis over a shaded relief map. In this case the pixels are approximately 1 km x 1 km (30 arc seconds or 0.008333333 degrees) and the elevation lines up well with the contours derived from a 1:250,000 map.
Spatial indices on the herbarium table and the worldclim2 layer with the following
CREATE INDEX
herb_idx_geo on herb.herbario USING GIST(the_geom GIST_GEOMETRY_OPS);
CREATE INDEX
worldclim2_idx_geo on chiapas.worldclim2 USING GIST(the_geom GIST_GEOMETRY_OPS);
Then the following query runs in less than four seconds to provide 6,000 herbarium records with extrapolated climate data to the nearest km.
SELECT h.*, w.totprec, w.elev,w.meanmin,w.meanmax
FROM herb.herbario h, chiapas.worldclim2 w
WHERE h.the_geom && w.the_geom
AND contains(w.the_geom, h.the_geom);
Or
select h.*, w.totprec, w.elev,w.meanmin,w.meanmax
from herb.herbario h, chiapas.worldclim2 w
where ST_contains(w.the_geom, h.the_geom);
Deforestation:Overlaying vector and raster layers in PostGIS using GRASS
A very common operation in GIS involves overlaying vector polygons on a raster map and calculating the area of each class in the raster map. The results are then added to the attribute table of the vector map. How can this be achieved in PostGIS?
If the raster layer is quite small it could be imported as points and the result achieved through a point in polygon SQL overlay. However that is not the usual way to achieve the operation. When large raster layers derived from the analysis of satelite imagery are of interest the conventional GIS approach is to first rasterise the vector layer. The computations involved in the overlay are then simple and fast.
This example taken from our ongoing work. It overlays a map of agricultural units on a raster layer representing deforestation in the last decade derived from analysis of satelite data. Because deforestation in Chiapas has tended to take place on a rather small scale involving only a few hectares at a time, it is not at all easy to see the large scale regional pattern from an image. Calculating the proportion of the total area of each agricultural unit deforested helps to see the pattern. The map above shows the percentage of the total (forested + non-forested) area of each agricultural unit estimated to have been deforested between 1990 and 2000.
The simplest way to achieve this result with PostGIS is to import the vector layer into GRASS, where the raster is also held. There are a few issues that have to be addressed when doing this. The first is that the gid index is lost when the attributes are imported. As a unique identifier for each polygon is needed during the overlay an index should be added to the table first if the gid is the only unique identifier for the table. This is very simple.
ALTER TABLE chiapas.agros ADD COLUMN id int8;
UPDATE chiapas.agros SET id=gid;
Now enter GRASS in a location with the same projection as the PostGIS data base (typically EPSG:4326)
Now there is another technical issue. Clearly vector processing in any GIS involves a combination of the geometery and attribute tables. We have both already in PostGIS. However in GRASS we use functions that work on GRASS’s own representation of the geometry, so an import is needed. GRASS can use a wide variety of databases as a back end for storing attribute data. Among these is Postgresql. What is not obvious, and potentially quite confusing, is that we already have the data in a Postgresql data base! So, does it all need to be imported again? The answer seems to be yes and no. It is quite possible to link the GRASS topology with the original table. However it is much easier to import it all in the usual way using v.in.ogr. This produces a new table, that could either stored in an external data base or perhaps even within the original PostGIS data base (as a new copy). Of course this attribute only table lacks the geometry column.
For example to set up GRASS to use the original local PostGIS data base as the backend for vector layers
db.connect driver=pg database=”host=localhost,dbname=gisdb2″
db.login user=Duncan pass=******
Note that this step is not necessary. By default Grass will store the table as in dbf format.
A layer can be imported from PostGIS using
v.in.ogr dsn=”PG:host=localhost dbname=gisdb2 user=Duncan” layer=chiapas.agros output=agros type=boundary,centroid
If the raster layer is in another coordinate system, as it is in this case, it will be necessary to exit the location and reproject in the location of the raster layer.
v.proj input=agros location=Global mapset=mesoanalysis output=agros
Then form a raster layer.
v.to.rast input=agros output=agros use=attr column=id layer=1 value=1
Then in GRASS the overlay is achieved quite simply using r.stats and sending the output to a text file.
r.stats -a input=agros,defor >agros.txt
This forms a file in which the first column holds the id of the vector layer, the second the class in the raster layer and the third the area in square meters of each combination. For example…
1 111 1319094.000000
1 112 159201.000000
1 122 1041304.500000
1 211 4026323.250000
1 222 85285437.750000
1 244 89347.500000
1 422 52796.250000
1 444 1749586.500000
2 111 565326.000000
So the task is now done, apart from the job of adding the data back into the original PostGIS table. There are many ways to do this. Often only one, or a few focal classes are of sufficient interest to merit a new column. So an easy way to process the output and send it back to PostGIS involves R.
Connect to the PostGIS data base using RODBC and set the schema
library(RODBC)
con<-odbcConnect(”mydb”)
odbcQuery(con,”SET search_path =chiapas, pg_catalog;”)
Read in the data and set the first column as numeric, removing any missing data (by default it will probably have been interpreted as a factor as GRASS will use a * for missing data).
d<-read.table(”agros.txt”)
d$V1<-as.numeric(as.character(d$V1))
d<-na.omit(d)
Now a subset of the data can be taken for each of the columns of interest, named and added back to the data base. In this case code “111″ represents area that was classified as forest in 1990,2000 and 2005. “122″ pixels that were deforested in the period between 1990 and 2000.
odbcQuery(con,”ALTER TABLE agros ADD COLUMN forest int4;”)
d2<-subset(d,d$V2==111)
d2<-d2[,-2]
names(d2)<-c(”gid”,”forest”)
d2$forest<-round(d2$forest/10000,0)
sqlUpdate(con,d2,”agros”)
odbcQuery(con,”ALTER TABLE agros ADD COLUMN defor2000 int4;”)
d2<-subset(d,d$V2==122)
d2<-d2[,-2]
names(d2)<-c(”gid”,”defor2000″)
d2$defor90<-round(d2$defor2000/10000,0)
sqlUpdate(con,d2,”agros”)
We are rather concerned that this cover change analysis, that has been achieved with the standard methodology used by Conservation International, tends to quite seriously underestimate small scale disturbance and canopy opening. Nevertheless in comparative terms it does demonstrate that absolute deforestation in Chiapas is certainly not a homogeneous process. Extensive deforestation is not occurring, although if the map is zoomed out further to show the Marques de Comillas area a very clear hotspot of deforestation is shown. In this part of Chiapas (Central Valley, Sierra and Altos) comparatively few agricultural units have deforested more than 2% of their total area in the last decade.
The other way of looking at this would be as the proportion of remaining forest lost. This woud tend to emphasise deforestation rates in ejidos and agricultural units with a very small ammount of forest in 1990. However at this level the approach would be somewhat unfair as it would be biased by classification errors in small areas.
To put the map in context a map with the proportion of the total area classified as forest can be used. In R the column can be added using
odbcQuery(con,”ALTER TABLE agros ADD COLUMN propforest float;”)
odbcQuery(con,”update agros set propforest=100*forest/hectares;”)
A surprising number of the agricultural units have over 40% tree cover. This has to be placed in the context of a classification method that draws no distinction between a pixel within a pasture with sufficient cover of secondary vegetation to be classified as forest and untouched primary forest. The next step clearly involves analysis of fragmentation in order to clarify this.
Postgresql 8.3 and Qgis 0.10.0 Io
Ecosur will soon be providing students and researchers with a networked data base for storing and visualising public domain spatial layers. This database should also store results from ongoing research. I am convinced that the advantages of providing a fully functional geographical data base at zero cost by using PostGIS/Postgresql are compelling. The release of a stable Postgresql 8.3 and Qgis 0.10 have made the task even simpler.
Postgresql 8.3 is now notably faster when running spatial queries than the version I had been using (8.1). It also has very advanced features. These move PostgreSQL up to the power of the premium proprietary enterprise data base, Oracle. They clearly go beyond those found in MySQL.
The only practical challenge I faced with PostgreSQL 8.1 involved a very simple issue of scalability when the database was viewed with Qgis 0.9. Despite some issues with the stability of Qgis (that did worry me, but appear to have been largely resolved) Qgis is a suitable visualisation tool for PostGIS. I have used it extensively since I started experimenting with PostGIS a few months ago and am generally satisfied. However a simple practical issue concerned me.
How could users easily be given access to the parts of a potentially very large database in such a way that they see only the part they need or are allowed to see?
The answer had clearly to be through the use of Postgresql schemas, but the Qgis interface didn’t show them in a user friendly way that matched the need.
This problem is now largely resolved by the new version of Qgis. Schemas now show up in an unfoldable tree structure. As groups of users and individual users can be given access to a subset of schemas it should now be feasible to design a database that hides all the complexity from users and allows them to quickly find the layers they need.
An example of using PostGIS from R
One of the interesting elements of PostGIS is that it can be run from R using RODBC. I have found this to be very useful, especially when tables are being altered, as it allows the properties of the results of a query to be quickly analysed and checked for sanity before being added to the data base. So in order to work with POSTGIS I usually open QGIS, PgAdmin and an R console and switch between them.
Here is an example of how you might calculate the population density within polygons and add the results to a PostGIS table.
In R first connect to a data base assuming an ODBC connector has been set up
library(RODBC)
con<-odbcConnect(”mydb”)
Now the example will find the sum of the population within polygons representing watersheds. This layer was originallybuilt using r.watershed in GRASS followed by r.to.vect. The automated watershed limitation doesn’t work very well along the coast, but provides a useful set if polygons for the rest of Chiapas, particularly the central valley. First add a new column to hold the result.
odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN poptot int4;”)
Then the query that calculates the total population can be run from R. The result can by looked at first in R to check that it is sensible. This is useful, as I found differences between using the 2000 census data and the 2005 population count that are not attributable to population change or migration, but rather are caused by errors in the geopositioning of the data.
sql<-”select sum(m.z1) as poptot, b.id from chiapas_conteo_2005 m, chiapas_basins b where m.the_geom && b.the_geom AND contains(b.the_geom, m.the_geom) group by b.id;”
d<-sqlQuery(con,sql)
This takes a while to run. I quickly discovered that it is extremely important to make sure that a GIST index is added before attempting point in polygon operations. This can be added to each table in PgAdmin, for example.
CREATE INDEX chiapas_basins_idx_geo on chiapas_basins USING GIST(the_geom GIST_GEOMETRY_OPS);
Now histograms and totals of the population in the watersheds can be produced in R to look at the results before adding them back into the data base.
For example
hist(log(d$poptot))
The population density is typically log normal.
sum(d$poptot)
4255957
This gives Chiapas a total population of 4.25 million which looks about right. As the query returned the results with a column name that has already been set up and has the primary key this line will update the database.
sqlUpdate(con,d,”chiapas_basins”)
Now we need to calculate the area of the watersheds in km2. To do this I first had to insert an srid for Albers conic projection for North America. This can be found here
http://www.spatialreference.org/ref/epsg/102008/
Then some more queries that can either be run in pgAdmin or wrapped up by odbcQuery(con, “some sql statements “) and run from R directly.
odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN areakm2 float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET areakm2 = area(transform(the_geom,9102008))/1000000; “)
odbcQuery(con,”ALTER TABLE chiapas_basins ADD COLUMN density float4;”)
odbcQuery(con,”UPDATE chiapas_basins SET density = poptot/areakm2; “)
The results can then be visualised in QGIS. Whether this is easier than using ARCGIS certainly depends on what you are used to. I am still working out how to achieve fairly basic operations like this in PostGIS. However when I have achieved them once, it is then easy to repeat the operation for other examples, which is why I am posting fairly simple examples like this to the weblog for reference in case they can help others to overcome the intially steep learning curve with PostGIS. Although the geoprocessing wizard in ArcView is very simple to use and the point in polygon operation does seem to run much faster than the corresponding query in PostGIS, this particular operation would still require some additional work with the database tables if it were to be done using ArcView alone.







