Nearest neighbour in POSTGIS

The issue to solve is how to find the name of the nearest blue point (settlements with a population over 100) from the red points (plant collections).

Some interesting ways of solving the problem are provided here.

http://www.bostongis.com/?content_name=postgis_nearest_neighbor_generic

There are some slightly easier way using the current version of PostGIS (1.8).

Breaking down the queries by parts. First find all the towns with a population over 100.

We will call this query “w” and we wrap it up as subquery.

(select * from mexico.conteo2005 where pobtot>100) w

Now we choose all the elements we want to see from the herbarium table together with a calculation of the distance between the two points. The original tables are in epsg:4326 so distances are in degrees. This doesn’t matter for our purposes, but it would be easy to include a CRS transform in PostGIS if we do want distances in km.

select h.gid, nom_loc,genus,species, h.the_geom, ST_distance(w.the_geom,h.the_geom) as dist from mexico.oldherbario h,

The search can be restricted to a distance of 0.1 degrees (approximately 10km) using

ST_DWithin(w.the_geom,h.the_geom,0.1)

And we want to order the results first by the herbarium collection ID and then by the distance.

order by h.gid,dist

Roll this all together and we get

select h.gid, nom_loc,genus,species, h.the_geom, ST_distance(w.the_geom,h.the_geom) as dist from mexico.oldherbario h,
(select * from mexico.conteo2005 where pobtot>100) w
where ST_DWithin(w.the_geom,h.the_geom,0.1)

order by h.gid,dist

So now we just need to get the first entry from each group and we have the nearest neighbour (with a population over 100 and within 10 km … the later criteria could be changed easily if needed, but the query would be slower). We can do that using select distinct. taking advantage of the fact that the function will return the first distinct id for each group, and the data is ordered by distance, so this is the nearest neighbour.

Less than 7 seconds is not bad. The query now looks like this

select distinct on(d.gid) d.* from
(select h.gid, nom_loc,genus,species, h.the_geom, ST_distance(w.the_geom,h.the_geom) as dist from mexico.oldherbario h,
(select * from mexico.conteo2005 where pobtot>100) w
where ST_DWithin(w.the_geom,h.the_geom,0.1)
order by h.gid,dist) as d

Finally if we wish to visualise the results we can form a new table or a new view.

create table chiapas.herblocalities as

select distinct on(d.gid) d.* from
(select h.gid, nom_loc,genus,species, h.the_geom, ST_distance(w.the_geom,h.the_geom) as dist from mexico.oldherbario h,
(select * from mexico.conteo2005 where pobtot>100) w
where ST_DWithin(w.the_geom,h.the_geom,0.1)
order by h.gid,dist) as d

If you are going to keep the table in the data base don’t forget to add entries in the geometry_columns table.

INSERT INTO geometry_columns(f_table_catalog, f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, “type”)
SELECT ”, ‘chiapas’, ‘herblocalities’, ‘the_geom’, ST_CoordDim(the_geom), ST_SRID(the_geom), GeometryType(the_geom)
FROM chiapas.herblocalities LIMIT 1;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s