After thinking a little more about the problem of the allometric equation in the paper that I discussed yesterday, I realised I had a partial solution already to hand. (*Again please read this as a marker for an idea that could be developed further. This is one of the motivations for using an informal web log instead of leaving ideas as notes on my own hard disk.)*

In order to work with a fairly simple individual based forest model I have implemented many different allometric equations in R over time. My favourites for use in data poor areas are still those derived originally by Botkin and Shugart from way back in the late seventies (see for example Shugart 1984). I have used them in both my thesis and a later implementation of the same model. Results from this are included in a recent book chapter. I aim to include the entire R code for this model on this site some point soon (after tidying up some elements and documenting).

The nice thing about the parameterisations used in these early models is that they were explicitly designed to be run with a minimal amount of knowledge. Superficially that doesn’t sound too encouraging. Shouldn’t we try to improve a model by adding more explicit detail? Not necessarily. After all purely empirical allometric equations assume no knowledge at all, apart from that provided through the data. That is their strength and weakness. In order to be used correctly you have to build them afresh for each area being studied. However if equations can be constrained by some very easily obtained field observations a sensible adjustment can be made for the conditions of each plot without going through the costly process of measuring everything afresh. The results certainly won’t be precise, but that is the point. Uncertainty can be included in the calculations through Monte Carlo simulation. The true value should then fall somewhere in the predicted range.

Botkin’s equations can turn measured diameter at breast height into estimated height and therefore estimated tree volume for any species based simply on an estimate of the maximum diameter the species can obtain and its maximum diameter. It is fundamentally a three parameter quadratic model constrained to take realistic values. As maximum height is quite site specific and can be estimated through rapidly made observations the equations are potentially very useful in the tropics where actual measurements of tree height are hard to obtain, or in cases of models of newly established forests when final height may have to be estimated from some knowledge of site quality. Notice that the equation discussed yesterday must be based implicitly on some sort of rule regarding tree height and form factor. You simply can’t get around this. The equation is a statistical black box that just takes in diameter and churns out above ground biomass. The fact that the equation doesn’t explicitly state that height has been estimated doesn’t make it robust. You simply can’t turn diameter into volume and thus biomass without making some sort of assumption regarding the height and shape of a tree, even if it is hidden within an impenetrable calculation. This is a very common failing of empirical equations and a good reason to generally mistrust their uncritical application (This is often the result of a misplaced appeal to authority, an empirical equation has been published by an important research group so it is eminently citable. Yes fine and good, but the equation still only applies to their particular data set.)

So here are the set of simple equations Botkin suggested. I have prefixed each function with f. to avoid confusing a function placed in the global environment in R with a variable.

(All the code shown below can be downloaded in this script … permanentplots2.doc)

f.B2<-function(Hmax,Dmax)2 * ((Hmax – 137) / Dmax)

f.B3<-function(Hmax,Dmax) (Hmax – 137) / Dmax ^ 2

f.BA<-function(Diameter)pi * (Diameter / 2) ^ 2

f.Height<-function(Diameter,Hmax,Dmax)137 + f.B2(Hmax,Dmax) * Diameter – f.B3(Hmax,Dmax) * Diameter ^ 2

f.AGB<-function(D,Hmax=2000,Dmax=100,FormFactor=0.8,r=0.5)r*FormFactor * f.BA(D) * f.Height(D,Hmax,Dmax)/1000

Let’s see how the equation for height works in practice. Assume maximum diameter is 120 cm, but forests (or species) vary in the maximum height they can reach. Note that as this is a quadratic (i.e. parabola) maximum diameter must coincide with the maximum in the data or the curve turns downwards.

D<-10:120

plot(D,f.Height(D,Dmax=120,Hmax=3000)/100,type=”l”,lwd=2,col=”green”,ylab=”Height (m)”,xlab=”Diameter”,ylim=c(0,30))

lines(D,f.Height(D,Dmax=120,Hmax=2000)/100,col=”blue”,lwd=2)

lines(D,f.Height(D,Dmax=120,Hmax=1000)/100,col=”red”,lwd=2)

legend(“topleft”,lwd=2,legend=c(“High (30m)”,”Medium (20m)”,”Low (10m)”),col=c(“green”,”blue”,”red”))

Now let’s compare with the formulas used in the paper and included in the previous post and multiply appropriately by basal area and form to produce volume.

plot(D,wet(0.5,D=D),type=”l”,lwd=2,col=”green”,ylab=”Above ground biomass (kg)”,xlab=”Diameter”)

lines(D,moist(0.5,D=D),col=”blue”,lwd=2)

lines(D,dry(0.5,D=D),col=”red”,lwd=2)

legend(“topleft”,lwd=2,legend=c(“wet”,”moist”,”dry”),col=c(“green”,”blue”,”red”))

points(D,f.AGB(D=D,Hmax=3500,Dmax=120),col=”blue”)

points(D,f.AGB(D=D,Hmax=2000,Dmax=120),col=”green”)

points(D,f.AGB(D=D,Hmax=1500,Dmax=120),col=”red”)

legend(“bottomright”,pch=1,legend=c(“Hmax=20m”,”Hmax=35m”,”Hmax=15m”),

col=c(“green”,”blue”,”red”))

So the curves are very comparable. The second set can be produced by making a single reasonable assumption regarding maximum height. In fact the equation for above ground biomass has four parameters. Maximum height, maximum diameter, “form factor” and wood density. The form factor is a term used in forestry and here it might best be expressed as “branchiness”. Due to the way tree growth occurs (the “pipestem”model) total above ground volume can never be expected to exceed the volume of a cylinder. Foresters are only interested in the trunk, but a factor ranging between 0.6 and 1 can be applied quite reasonably to biomass as a whole. Here I have set the value at 0.8. This reproduces quite well the original allometric equations. We can add in uncertainty regarding this parameter in the analysis.

This is now the point at which the equations become useful. If a lot of knowledge is available, then separate parameters can be assigned for each species. Maximum height is quite easy to get.Simply by making this justifiable assumption accuracy should improve markedly over the “one size fits all” assumption that was implicit in the paper under discussion. A nice element is that the quadratic is constrained so the meaningless estimates that can sometimes arise when using polynomials outside their range of applicability will never occur. With less information functional groupings can be used. In the worst case scenario the same equation can be used for all trees in the forest, as was apparently done with the empirical allometric equation in the paper (with the exception of including a species specific term for wood density).

Using R it is very easy to include uncertainty regarding these parameters. As has been pointed out in the previous post, this is likely to affect the conclusions drawn much more than within sample variability. The point is that bootstrapping on parameters of a polynomial allometric equation with no real life interpretation would be very dangerous. The curve could go in many unexpected directions. In contrast bootstrapping on an equation with direct interpretation is reasonable. Lets use a boxplot as a quick way of looking at the results. Lets assume a great deal of uncertainty in Hmax (20m with a sd of 3m) and some uncertainty in the form factor (0.8, sd=0.1). It is easy to extend the uncertainty to all four parameters.

boot.strap<-replicate(10000,f.AGB(D,Hmax=rnorm(1,2000,300),FormFactor=rnorm(1,0.8,0.1)) )

boot.strap<-data.frame(rep(10:120,times=10000),as.vector(boot.strap))

names(boot.strap)<-c(“D”,”V”)

boxplot(V~D,data=boot.strap,range=0.1,outline=F,col=2)

So we can see how uncertainty in total above ground biomass increases with the diameter of the tree. Finally lets see how this could be applied to the simulated data. If we just want an uncertain estimate of total above ground biomass by applying the bootstrapped conversion formula to a single large plot we can do it like this.

D<-rlnorm(100000,2.2,0.8 )

D<-D[D>10]

D<-D[D<120]

boot.strap<-replicate(1000,sum(f.AGB(D,Hmax=rnorm(1,2000,300),FormFactor=rnorm(1,0.8,0.1))) )

quantile(boot.strap/1000000,c(0.05,0.5,0.95))

5% 50% 95%

6.97 9.75 13.04

plot(density(boot.strap/1000000,bw=0.7),xlab=”Total above ground biomass kg x 10^6″)

abline(v=quantile(boot.strap/1000000,c(0.05,0.5,0.95)),col=c(“blue”,”black”,”red”))

The extension of the technique to be used on the difference between two measurement periods is trivial if the data is available. More accuracy is obtained by narrowing the range of parameters by using a separate parameter for each species, so there is clearly a cost of ignorance at work here.. This can even be quantified (cf with this post) The method can clearly also be applied to estimating future biomass yield in the case of plots being planted for fuelwood or biofuels, based on making some very simple but justifiable assumptions regarding growth form, wood density and planting density. Because a measure of uncertainty is produced this is very useful in decision making.