A common step when we are modelling ecological processes on a landscape is to set up a network of interconnected patches. The data that are used to initialise the model vector layers of polygons. The polygons vary in size and shape. They represent habitat patches such as woods, fields, heathlands etc. The habitat patches are assumed to be internally homogeneous for modelling purposes and they are assumed to be surrounded by a homogeneous matrix. Relaxing these assumptions typically requires the additional use of raster data in the form of cost surfaces and other features.
For example, Natural England provide a vector coverage that represents patches of “Ancient Woodland” found in the Newforest. The data are made public, but you must of course register with Natural England in order to use them.
The simplified data that are used in this exercise only contain habitat areas and perimeters. They can be loaded into Quantum GIS and the steps followed. Simplified data here.
There are over 400 discrete patches, some separated only by roads and drives. A key attribute of the polygons that we typically will incorporate in most models will be area. Perimeter length may also be used when considering edge effects.
A simple and intuitive measure of connectivity would be based on the minimum distance between the edges of each polygon. However this is surprisingly difficult to implement within a GIS. While measuring the distance between the centroids of patches is trivial and quick to carry out, such a distance measure provides a very poor measure of connectivity between large patches, as most of the distance between the centroids will probably consist of suitable habitat. The better measure of edge to edge distance involves tracking along the edges of neighbouring features and identifying the shortest result of a large number of distance calculations. When this has to be iterated for many combinations of polygons it can become extremely computationally intensive.
A search on the web shows that edge to edge distance can be calculated routinely using Arc GIS. However getting results in a useable form in Arc requires several steps. So it is good to know that PostGIS 2 now measures edge to edge distance between polygons (rather than distances between nearest vertices) by default. There are also other nice new features of PostGIS 2 that make it very powerful.
Providing that you have got PostGIS 2 setup locally it is extremely simple to import a shapefile from QGIS using the PostGIS manager plugin without any need to use commands.
Running a query
Once the data are in PostGIS running a single query will produce a layer containing a set of lines representing all the minimum distances between polygons a set distance apart. For example if the layer is called nfanwood (in the test schema) and we want to measure the distances to all patches within 2 km of each of the polygons this will do the job.
select a.gid from_gid, b.gid to_gid, st_shortestline(a.geom,b.geom), st_distance(a.geom,b.geom) as distance from test.nfanwood as a, (select * from test.nfanwood) as b where st_dwithin(a.geom,b.geom,2000)
The query run time is noticeable, but it completes quickly enough on a laptop to be very useful for moderately large vector layers such as this (around 19 minutes). It can be executed using the QGIS PostGIS query editor plugin and loaded into the canvas when complete.
Alternatively the table can be saved in PostGIS and then added as a layer. The DBManager plugin is better for doing this (Note in the shot below that I forgot to specify a schema, so the table went into the public schema).
Obviously the time for the run increases quickly as more polygons are added or the search distance increased. However PostGIS is unlikely to ever crash, so it is just a case of being patient with larger data sets and thinking about the number of combinations before starting. I have not tried the comparable operations in Arc GIS. They may run more quickly on small data sets, but they are likely to be slower overall to implement in practice as more discrete steps are needed to obtain a comparable output. They are also quite likely to fold completely when the data sets become larger due to differences between the underlying architecture of a desktop GIS and a spatial database.
There are a few things to note about the output from the query as it stands.
- The distances between each polygon and itself will be included. As this is produces all the zero distances the unwanted records can easily be removed using a simple query. In most cases it is not a problem, as at any point in an analysis distances of zero can be excluded. This could have been built into the query itself.
- If a patch is within the focal distance it will be joined, even if the line crosses another polygon. Again, if this is a real problem such lines could be removed using an intersection operation. In many cases removing them is not desirable as they represent a genuine potential for movement between patches (eg. pollen flow). It all depends on the nature of the process being modelled.
- The set of lines will include a distance between polygon a and b and a second (identical) line between polygon b and a. There is duplication. This is trickier to deal with and I have not worked out an elegant solution. They could be removed by an sql query that looked for unique combinations. However if the intention is to setup a network containing undirected links (edges) it will not usually matter. If using directed links it may even be desirable.
Running the query with a filter of st_dwithin of 2000 m will connect all the patches in this layer-
Finding many links using a rather large st_dwithin distance makes it easy to quickly look at the consequences of lowering the threshold with a simple query. The only overhead is initial computational time.
In the next post I will show how these layers can be imported into Netlogo and a network constructed from them.