Permanent forest plots
On Friday my colleague Mario Gonzalez brought to my attention an interesting article that has been published online in the Open Access biology journal PLoS Biology. I am a fan of this journal myself and I have tried to place a feed to the table of contents on this site (WordPress apparently doesn’t pick it up). I was particularly moved by the editors’ personal reasons for supporting Open Access.
The article in question is available by clicking here.
I was intrigued by the use of the words “Assessing evidence” in the title. The monitoring of large permanent plots has become a staple source of evidence for tropical forest ecologists. In recent years a great deal of progress has been made in understanding the processes at work in tropical forests, largely as a result of heroic efforts in a few well studied plots. Permanent plot monitoring involves a huge investment in time and labour. It produces a great deal of replication at the level of individual trees. However the amount of replication at the landscape scale is clearly limited. This leads to inevitable challenges for both statistical analysis and for the interpretation of results. I was very encouraged by the honesty and transparency in the article’s method section regarding problems associated with data quality. For example the authors explain that “For the plots with more than two censuses, we were able to correct anomalous dbh values by comparing the stem dbh growth rates across census intervals. If a tree showed a dramatic change in dbh growth rate, we changed the one outside of the range (−5 mm/y, +45 mm/y) with the likely value, and updated the dbh value accordingly. This filter was applied using a computer routine, and then checked manually.”
This sort of analysis involves a lot of work after the data has been collected. It also leads to some necessary but rather arbitrary decisions being taken. Trees generally really shouldn’t shrink over time, but anyone who has worked with this sort of data in the tropics has confronted the problem. One element that might help with this in the future is the wider use of PDAs in the field to enable real time sanity checking of data as it is recorded. Errors are often made in the data capture process and not detected until field work is completed. The level of training of technicians also tends to improve over time. So the next decade’s results should be more consistent than the last.
However this is not the element that most concerned me. Despite the interesting title I was left rather unsure whether the authors really had managed to evaluate all the evidence the data set provided. The problem in my mind arose from a common assumption that uncertainty within a statistical analysis is all attributable to variability in the data. In fact any assumption used in any calculation can change the results. It is now becoming common practice to attempt to quantify the sensitivity of conclusions to assumptions, but this doesn’t seem to have been done here. Instead, even though a complete census was undertaken, statistical significance was based on within sample variability using an arbitrary division into sub-plots and the assumption of independence between them. It is not clear how useful this is.
In fact the main conclusion from the work would seem to be heavily reliant on the use of an allometric equation to convert diameter (usually measured at breast height) into an estimate of total oven-dry aboveground biomass, the fundamental response variable of interest.
Let’s look at how sensitive conclusions might be to this. There is a mistake in the parentheses of the functions as printed in the paper, but as I understand the equations they can be implemented in R using the following code. I have repeated the function with a different name and parametrisation for each of the forest types. The important point is that, at least on my initial reading of the analysis, the authors applied the same function to all trees in each plot, assuming the equation for a forest type applied to the whole plot.
(The code shown below is available in this text file. Use it if there are problems with quotation marks .. permanentplots1.doc )
# Allometric equations as I read them from the paper, one function per forest type.
# D: diameter in cm; r: multiplier applied to the exponential term.
dry<-function(r=0.5,D,a1=0.667,a2=1.784,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}
moist<-function(r=0.5,D,a1=1.499,a2=2.148,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}
wet<-function(r=0.5,D,a1=1.239,a2=1.980,a3=0.207,a4=0.0281){
D<-log(D)
r*exp(-a1+a2*D+a3*D^2 -a4*D^3)}
# Plot the three curves for diameters from 1 to 120 cm
x<-1:120
plot(x,wet(0.5,D=x),type="l",lwd=2,col="green",ylab="Above ground biomass",xlab="Diameter")
lines(x,moist(0.5,D=x),col="blue",lwd=2)
lines(x,dry(0.5,D=x),col="red",lwd=2)
legend("topleft",lwd=2,legend=c("wet","moist","dry"),col=c("green","blue","red"))
A point to notice is that the equations suggest that trees in moist forest have higher biomass than those in dry or wet forest. Seven of the ten plots in the paper are considered “moist” but these must apparently lie somewhere on a continuum between dry and wet. There seems to be no clear justification for using the same optimum equation for all of these plots. I am not quite sure of the units as the paper says this produces above ground biomass in Mg (tons). The units look as if they should be kg to me.
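A quick sanity check on the units can be sketched as follows. The function simply repeats the moist forest equation as written above, with the default multiplier of 0.5; the 30 cm diameter is an arbitrary illustrative choice and not a value taken from the paper.

```r
# Moist forest equation, repeated so that this snippet runs on its own
moist <- function(r = 0.5, D, a1 = 1.499, a2 = 2.148, a3 = 0.207, a4 = 0.0281) {
  D <- log(D)
  r * exp(-a1 + a2 * D + a3 * D^2 - a4 * D^3)
}

# Predicted above ground biomass for a single tree of 30 cm diameter
# (30 cm is an arbitrary example, not a figure from the paper)
agb30 <- moist(D = 30)
print(agb30)
```

The result is a few hundred units. A few hundred kg is a believable weight for a tree of that size; a few hundred Mg is not.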
Also these rather complex looking formulae basically reproduce the same curve that could be provided by simply fitting a quadratic function with three parameters over the likely range for the data (10:120 cm let’s say). Two of the four parameters (a3 and a4) do not change between forest types.
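One reading of that claim can be checked with the sketch below. It assumes that "quadratic" means a quadratic in log diameter fitted on the log scale, which is an interpretation made here for illustration rather than anything stated in the paper, and it uses the moist equation over 10 to 120 cm.

```r
# Moist forest equation, repeated so that this snippet runs on its own
moist <- function(r = 0.5, D, a1 = 1.499, a2 = 2.148, a3 = 0.207, a4 = 0.0281) {
  D <- log(D)
  r * exp(-a1 + a2 * D + a3 * D^2 - a4 * D^3)
}

D <- 10:120                 # assumed range of diameters in cm
x <- log(D)
y <- log(moist(D = D))      # log biomass from the four-parameter formula

# Three-parameter alternative: intercept, log(D) and log(D)^2
fit <- lm(y ~ x + I(x^2))
r2 <- summary(fit)$r.squared

# Largest relative difference between the two curves on the original scale
max.err <- max(abs(exp(fitted(fit)) / exp(y) - 1))

print(coef(fit))
print(r2)
print(max.err)
```

If the R squared is very close to one and the largest relative difference is small, the cubic term adds little over this range of diameters.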
The curves only diverge sharply for larger trees and it may be argued that really large trees are rare, so this shouldn’t matter too much. In all undisturbed tropical forests that I know of there tend to be a large number of small trees and very few large trees. Thus a simple simulated data set can be produced by slightly truncating a log normal distribution.
set.seed(1)
D1<-rlnorm(100000,2.2,0.8 )
D1<-D1[D1>10]
D1<-D1[D1<130]
hist(D1,main="Simulated diameter distribution",xlab="DBH (cm)",col="lightgrey")
With time, patience, knowledge of the plots and perhaps a suitable individual based simulation model to hand it would be possible to simulate some reasonable growth and mortality data for each tree over time. For the moment I will simply assume (quite wrongly of course) that all trees increase their diameters equally by 1 cm. I will also assume no mortality at all, so an increment in biomass is assured. This is not at all realistic, but the question is how the increment in biomass in a plot might be affected by the assumption made regarding the fixed parameterisation of the allometric equation.
Let’s compare the assumption that the trees follow the dry forest formula against the assumption of the moist forest formula. The code divides the total number of trees into 200 random subplots and carries out the sort of bootstrap resampling that was used to provide confidence intervals. The test of significance used in the paper relied on whether 95% of the bootstrapped values were above zero. In this simulated case, without mortality, this is not interesting. The question is how important the assumption made for the allometric equation is.
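library(lattice) # provides bwplot
nsubplots<-200
# Give each subplot the same whole number of trees, dropping any remainder
ntrees<-floor(length(D1)/nsubplots)
subplot<-as.factor(rep(1:nsubplots,each=ntrees))
D1<-D1[1:length(subplot)]
D2<-D1+1 # every tree grows by 1 cm
# Biomass increment per tree under the moist and the dry equations
d<-data.frame(subplot,D1,D2,moist=moist(D=D2)-moist(D=D1),dry=dry(D=D2)-dry(D=D1))
# Bootstrap the mean subplot increment, resampling subplots with replacement
a<-with(d,tapply(moist,subplot,sum))
boot.moist<-replicate(1000,mean(a[sample(1:nsubplots,nsubplots,replace=TRUE)]))
b<-with(d,tapply(dry,subplot,sum))
boot.dry<-replicate(1000,mean(b[sample(1:nsubplots,nsubplots,replace=TRUE)]))
d2<-data.frame(category=rep(c("moist","dry"),each=1000),bootstrap=c(boot.moist,boot.dry))
# Box and whisker plot of the two bootstrap distributions
bwplot(bootstrap~category,data=d2)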
So in this simulated data, the confidence intervals attributable to the bootstrapping procedure are overwhelmed completely by the effect of changing the assumptions in the allometric equation. This was simulated data and the effect may be much less in real data. But it should be taken into account in some sense. One way of dealing with it is to bootstrap on the parameters used in the entire calculation as was done here.
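A sketch of what bootstrapping on the parameters might look like is given below. It is only one possible scheme and not the procedure used in the paper: in each replicate the plot is placed at a random position between the dry and the moist parameter sets (the weight `w` and the helper names `agb` and `boot.one` are assumptions introduced here for illustration), and the subplots are resampled as before.

```r
# Generic form of the allometric equation; a1 and a2 vary by forest type
agb <- function(D, a1, a2, r = 0.5, a3 = 0.207, a4 = 0.0281) {
  x <- log(D)
  r * exp(-a1 + a2 * x + a3 * x^2 - a4 * x^3)
}

# Same simulated diameters as in the text, with 1 cm growth and no mortality
set.seed(1)
D1 <- rlnorm(100000, 2.2, 0.8)
D1 <- D1[D1 > 10 & D1 < 130]
nsubplots <- 200
ntrees <- floor(length(D1) / nsubplots)
D1 <- D1[1:(ntrees * nsubplots)]
subplot <- rep(1:nsubplots, each = ntrees)
D2 <- D1 + 1

dry.par   <- c(a1 = 0.667, a2 = 1.784)
moist.par <- c(a1 = 1.499, a2 = 2.148)

boot.one <- function() {
  # Assumption: the plot lies at an unknown point between "dry" and "moist"
  w <- runif(1)
  p <- w * dry.par + (1 - w) * moist.par
  inc <- agb(D2, p[["a1"]], p[["a2"]]) - agb(D1, p[["a1"]], p[["a2"]])
  tot <- tapply(inc, subplot, sum)
  # Resample subplots with replacement, as in the original bootstrap
  mean(tot[sample(nsubplots, nsubplots, replace = TRUE)])
}

boot.all <- replicate(200, boot.one())
ci <- quantile(boot.all, c(0.025, 0.975))
print(ci)
```

The interval printed at the end reflects both sources of uncertainty at once, so it can be compared directly with the intervals obtained when one equation is fixed in advance.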
Another interesting point is that although PLoS Biology is laudably open access, the raw data itself is not made available. Thus I had to simulate data to try to make a point. The future of open access research should (IMHO) be directed towards open, transparent and documented data analysis. R can play an important role in this as any analysis can be freely shared as a script. We always have to make assumptions when analysing data and some assumptions are a matter of personal preference or instinct. A future model for meta-analysis might be based on an “analysis of analyses”. The more ways a data set is looked at the more likely it is that its fundamental content will be revealed.
Aggregating time series in R: The Iraq body count
Yesterday I was asked whether R could be used to analyse time series. The answer is of course it can. R is used extensively in the financial sector for analysing complex time series such as stock prices. I have already included an example using R in the context of climate variability (El Niño). One challenge is that there are a lot of different ways of working with time series and representing dates. Aggregation can be rather tricky. The zoo package is one of the most powerful tools for working with time series, but it is not always simple to use. I still haven’t got my head around all the different ways to achieve results.
So here is an example. The code first reads in the data from an online database, then uses tapply to sum the number of casualties per day. Zoo is then used to aggregate by month and the total is plotted. If anyone reading this can suggest better ways to do this or add a more sophisticated analysis I would like to know.
(Open this document if the code doesn’t work due to problems with quotation marks: bodycount.doc)
Or try this … (again you might need to retype the quotation marks)
source(url("http://duncanjg.files.wordpress.com/2008/03/bodycount.doc"))
library(zoo)
library(date)
# Read the incident table directly from the online database
a<-"http://www.iraqbodycount.org/database/download/ibc-incidents"
d<-read.csv(url(a),skip=11,header=TRUE)
# Sum the lower casualty estimate for each day
a<-tapply(d$Reported.Minimum,d$Start.Date,sum)
x <- zoo(as.vector(a), as.POSIXct(as.date(row.names(a))))
# Aggregate the daily series to monthly totals
f<-function(x)as.POSIXct(as.yearmon(x))
Deaths<-aggregate(x,f, sum)
par(bg="lightgrey")
plot(Deaths,lwd=3,col=2)
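For anyone who cannot install zoo, the same daily-to-monthly aggregation can be sketched in base R. The incident records below are simulated (random dates and invented Poisson counts) purely so that the snippet runs without the download; they are not the Iraq Body Count figures. The column names are copied from the code above.

```r
set.seed(42)

# Invented incident records for one year: a date and a count per incident
days <- seq(as.Date("2007-01-01"), as.Date("2007-12-31"), by = "day")
d <- data.frame(
  Start.Date = sample(days, 1000, replace = TRUE),
  Reported.Minimum = rpois(1000, 3)
)

# Step 1: total per day, indexed by the date written as text (YYYY-MM-DD)
daily <- tapply(d$Reported.Minimum, format(d$Start.Date), sum)

# Step 2: total per month, using the year-month part of each daily date
month <- format(as.Date(names(daily)), "%Y-%m")
monthly <- tapply(as.vector(daily), month, sum)

print(monthly)
```

The monthly totals can then be drawn with an ordinary call to plot, for example against the first day of each month.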
However you get there, the result is shocking. These are documented civilian casualties and I chose the lower estimate.
Even if the trend was initially downwards after the start of the surge, it still only bottomed out at around the level it began at when the “mopping up” operations were taking place in the first few months after the invasion. The time series taken from the compiled online data base stops in January 2008, so it doesn’t take into account the recent renewal of violence. Today, Thursday 6 March there were 86 civilian dead. Two bomb attacks killed 68 in Baghdad alone. Apparently this is not an isolated incident. The trend is sadly upwards again.
Joseph Stiglitz has estimated the price of the war at 3 trillion dollars. I wasn’t even sure what a trillion was until he used the figure. It has twelve zeros, in other words a thousand billion. That works out at 30 million dollars for each dead Iraqi civilian or enough to make every one of these people as rich as Bill Gates. No further comment, apart from a heartfelt request to visit the site of those who have worked so hard to compile this important data set and re-run the code periodically to check the updated figures.
As an addition, I was saddened to hear that Harvard professor Samantha Power has resigned from Obama’s campaign team this week, apparently for speaking too openly on this, among other issues. I was extremely impressed by her thoughtful, yet emotional, contribution to BBC Radio’s Start the Week, in which she talked of the biography she has written on Sergio Vieira de Mello, a heroic figure who constantly impressed me in every interview I heard with him up to his tragic death in Iraq. The recent news suggests that honest, open minded expression by academics is still considered to be a liability, even for apparently honest, open minded politicians. This is a link to the Hard Talk interview.
Open source, open ideas
The Nobel prize winning economist Joseph Stiglitz is very fond of quoting Thomas Jefferson: “Knowledge is like a candle; as it lights another candle, its original light is not diminished”.
Economists like Stiglitz refer to the opposite of this property as exclusivity. In other words if I eat a box of chocolates or turn a tree into timber, you can’t have the chocolates or enjoy the shade and scenic beauty provided by the tree. Only one of us can consume a consumable. On the other hand if I have an idea I can share it with you, if you are interested. If my idea happens to be a good one then denying access to it, or the right to use it, gives rise to an inefficiency. Traditional intellectual property is based on such exclusion. Knowledge is a global public good that is of potential benefit to anyone in the world. There is a global social cost in depriving anyone in the world of the right to use available knowledge.
Most of our individual ideas are “memes”. In other words we have all been lit from someone else’s candle. Originality in science is extremely difficult and is only partially achieved even by the small handful of creative individuals that manage to provide a new spark. We all worry that we are wrong or being too derivative when we think or write about scientific issues but I have (quite recently) come to the conclusion that we really shouldn’t be worried that our ideas are not original providing we are casting some light, somewhere. The more candles that are out there, the more light is cast and the collective effect is to everyone’s benefit.
In the open source model of software development the light cast has an even more positive effect, because software is itself useful as a tool for achieving other goals. Software developers in fact make some of the candles. I have neither the time nor the ability to contribute code to open source projects, but I appreciate the tremendous generosity of all those who do and try to support and use open source alternatives to commercial software as a matter of principle.
Publication bias
One of the stories in the news in the UK this week was the publication of a study by Irving Kirsch from the University of Hull and various colleagues on the efficacy of antidepressant drugs, such as the well known selective serotonin reuptake inhibitor, Prozac (101371_journalpmed0050045-l.pdf). The researchers gained access to previously unpublished studies. This led them to some interesting conclusions. Although meta-analyses of antidepressant medications report modest benefits over placebo treatment, when the unpublished trial data are included, the benefit falls below accepted criteria for clinical significance. There is a link to a file of a BBC radio podcast on the subject here. There are also some earlier studies by the same group that tell a similar story (1171.pdf).
It is easy to see this as a simple example of drug companies being cynically economical with the truth. However even drug companies stand to lose out in the long run if they peddle merchandise that doesn’t do what it says on the can. A more important aspect of the study is its broader implications for the scientific publication process in general. The message is that referees should not reject a study for publication merely on the grounds that “significant” results were not produced. Observational studies in ecology are particularly difficult and statistical significance testing is often quite inappropriate and misleading. I will return to this theme in more detail. Meanwhile here is a link to the web site of the Journal of Negative Results Ecology and Evolutionary Biology.
Another, slightly more subtle lesson to be drawn from this study concerns the general importance of data sharing and data pooling. Real insight into how a process works in an applied context often needs a lot of data. In this case, once enough data had been assembled the researchers were able not only to ask the question “is there a response” but also to go further and ask about the shape of the response. The question was meaningful as its answer suggested where the benefit of the drugs’ effects could lie. This is a critical element of statistical analysis that introductory courses on statistics often overlook. Often the most meaningful questions concern issues such as “is the inclusion of a quadratic term supported by the data” rather than a test of the null hypothesis of no effect. Such questions are often unanswerable by a single study. This is another reason why researchers and organizations that do not make raw data available hold back science.
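The kind of question meant here can be sketched by comparing two nested linear models. The data below are simulated with an invented curved response, so the numbers say nothing about antidepressants; `x` and `y` are placeholder names.

```r
set.seed(1)

# Invented data: a curved response plus noise (not from any real trial)
x <- runif(200, 0, 10)
y <- 1 + 0.1 * x^2 + rnorm(200, sd = 1)

m1 <- lm(y ~ x)              # straight line only
m2 <- lm(y ~ x + I(x^2))     # straight line plus a quadratic term

# F test for the extra term, and AIC for both models
comparison <- anova(m1, m2)
print(comparison)
print(AIC(m1, m2))
```

With few observations the comparison would often be inconclusive even when the curvature is real, which is the argument for pooling data across studies.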
Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, et al. (2008) Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 5(2): e45. doi:10.1371/journal.pmed.0050045
Earthquake in Lincolnshire!
I grew up in Lincolnshire which is rarely in the news, so I was quite proud to hear that it was at the epicentre of an earthquake that measured 4.8. To put this into perspective the earthquake I reported at 6:52 am on the 12 February in San Cristobal measured 5.2. However to be fair an earthquake in England is a rare event, while in Mexico we have several every year. I imagine that tomorrow’s news will be full of eye witness reports of the event. Much of the reporting will no doubt be quite light hearted. British residents don’t appreciate the sheer terror that people experience when an earthquake is felt in regions where they are destructive. Even though San Cristobal itself has had few major earthquakes, every time we feel the earth move we are very unsure how bad it could turn out to be. But this time we have been stirred rather than shaken.
As an aside I made the small illustration above using a NASA satellite image and the excellent open source software QGIS. QGIS stands for Quantum GIS. It took less than a minute to connect to the NASA site, download the image and find the rough epicentre. I am a great fan of Google Earth but at the moment the fact that GE has a mess of overlays at different scales is a barrier to it being used for illustration at a regional scale. Also the 3d terrain in Google Earth is great for Chiapas, but rather irrelevant in Lincolnshire.
Encyclopedia of life
I have just registered with the Encyclopedia of Life. The idea behind the project is outlined in this BBC article.
The project aims to be the equivalent of Wikipedia in the field of systematics and ecology. If successful, detailed descriptions of all the world’s known organisms could be available to anyone with an internet connection. This is a very exciting prospect, especially if the success of the project coincides with advances in the barcode of life. One of the barriers to greater understanding of the distribution and abundance of organisms remains difficulty in identification and ignorance of their characteristics. This initiative should help to democratise this knowledge and place it in the hands of field researchers. It is also a collaborative project in which we too have an obligation to contribute.
It takes a reason to reason
Walking home from work I listened to a podcast of BBC radio’s Analysis program. It dealt with political story telling. Some of the most interesting and insightful comments were made by the American clinical psychologist and political strategist Drew Westen. Here is a link to an interview with Westen.
In the short sound bites that were included in the Analysis program Westen made the point that cold objectivity alone cannot capture people’s imaginations nor motivate them to think deeply. It is emotion that leads people to reflect on their own prejudices and perhaps even change their world views. He summed this up in the phrase “it takes a reason to reason”. I also liked his neat line “I don’t think anyone remembers Martin Luther King’s ‘I have a plan’ speech.”
This is relevant in the year of Barack Obama. Whatever the outcome of the primaries it is now inevitable that, one way or another, the most powerful nation in the world will soon be led by its first rational president in almost a decade. It is always easy to be cynical about politicians’ appeal to people’s emotions. Westen makes the point that an appeal to the right sort of emotions is much more likely to get rational things done than an appeal to logic alone. That is why Al Gore’s passionate, if not wholly accurate, treatment of the climate change issue has been such an important contribution to changing the direction of the climate debate towards one based on rational evaluation of evidence.
Biofuels and global warming
The pros and cons of biofuels raise extremely complex issues. Without the time, the inclination or the obligation to wade through the mass of calculations that are needed to reach a definitive opinion on whether biofuels can make any meaningful contribution to the fight against anthropogenic climate change, I am reluctant to express an opinion.
If the scientific jury in general is still out on the issue it is only because they haven’t yet been told what the charges are. If anyone were to be asked to cast a verdict on whether biofuels alone can make a serious contribution to reducing global carbon emissions the issue would be much more quickly resolved. The photosynthetic process is simply not efficient enough to provide a meaningful proportion of our modern global economy’s energy needs. There are much better ways of turning sunlight into energy than growing plants and then burning them.
However the global issue is clouded by local concerns. There are potential winners from biofuels and their interests should of course be considered when weighing the issues. Biofuels can play a very positive role in recycling waste and giving value to forest products. If the correct checks and balances are put in place there may be a place for biofuels in our future energy balance. However to automatically label biofuels as green alternatives while so many legitimate questions have been raised would be irresponsible. There are serious concerns not only regarding the net carbon balance involved in their production, but also the multiple undesirable side effects in terms of global food security, land use change and biodiversity loss.
A much broader issue is involved here that was touched on in a previous post. The IPCC scenarios that provide hope that carbon emissions will be below the critical threshold all involve global rather than local action. When issues such as biofuels are under discussion it is important to evaluate how seriously the extent of future global warming is being taken. If an argument has not considered the wider system of global feedbacks doubts must be expressed. Turning food into fuel in one part of the world cannot fail to influence the behaviour of farmers in another part of the world. If behaviour change threatens forests it is clearly difficult for anyone involved in conservation and rural development to express support for the policy.
Relativism
The main news story in the UK last week was the reaction to the comments of the Archbishop of Canterbury regarding Sharia law. There were extremely strong counter reactions to his views.
In common with the majority of the British population I felt Rowan Williams’s position to be fundamentally dangerous. Cultural relativism is an extremely useful research tool for anthropologists, sociologists, historians and even ecologists with an interest in resource management. We should all work hard on developing the admirable skill of listening and understanding the concerns and viewpoints that are the product of alternative world views. Individuals who hold culturally imposed beliefs that differ from our own should always be respected as individuals.
However I find myself in full agreement with Richard Dawkins in rejecting the extension of relativism to include tolerance for cultural practices that violate fundamental individual rights. The most shocking elements of the Archbishop’s speech were not the ones that got the publicity.
Rowan Williams’s position was deliberately misunderstood and shamelessly exploited by right wing critics in the UK. He clearly did not argue that public stoning and beheading should become a part of an alternative Sharia law that would be tolerated under the British judicial system. In one sense he made a quite reasonable case that some elements of civil life, such as divorce and family disputes, could be settled by Muslims under Muslim customs. However he understated the most admirable aspect of the British judicial system. It is respect for the principle that disputes are settled through the careful inspection and evaluation of empirical evidence. I was deeply disturbed by the following phrase in Williams’s speech … “Perhaps it helps to see the universalist vision of law as guaranteeing equal accountability and access primarily in a negative rather than a positive sense”.
Of course the system in the UK is far from perfect, but when the principle (that is expressed as the right to a fair trial) is not upheld, we are generally appalled and demand “justice”. This is not the case in all cultures. I am frequently shocked and scared by the lack of respect for the fundamental principles of (my culturally constructed notion of) justice in Mexico. Hearsay evidence is considered of greater value than forensic evidence and “criminals” are considered guilty until proven innocent. Forty percent of the Mexican prison population has not yet faced trial for the crimes of which they are accused. This is not likely to change in the near future. The Mexican population appears to have a culturally constructed tolerance for “their” form of justice, that also extends to the denigration of captured criminals on public TV stations, after they have clearly been mistreated in custody.
This is the stark problem with relativism. Ironically it leads to a defense of a dangerous division based on a classification into “them” and “us”. Go beyond the Archbishop’s polished, academic, postmodernist speech (that I admit to finding impenetrable) and you encounter a much cruder, paternalistic form of relativism in which it can be justified to state that “they” should be allowed to settle matters according to “their” customs. I have no understanding of the way Sharia law evaluates evidence. However I doubt if the Archbishop does either (“This lecture will not attempt a detailed discussion of the nature of sharia, which would be far beyond my competence”). He was making a liberal point based on tolerance of a whole culture, not on respect for individual rights. I doubt if he knows whether, in matters of divorce, a report that a Muslim wife was seen entering a car with a stranger would count as sufficient “proof” of infidelity and grounds for a divorce settlement that British law would consider unfair (recourse to obtuse statements regarding the “neuralgic questions of the status of women and converts” just doesn’t cut it for me. Postmodernism gives me a bigger headache). It may well be that evidence is weighed as carefully in a Muslim court as in British civil proceedings. However the precedent set by the implementation of judicial systems outside the UK does not bode well.
Bar codes for plants
An extremely exciting development in the use of DNA “bar coding” was brought to my attention this week. (BBC) (Science daily). Two large-scale field studies have apparently shown that different plant species can be distinguished quickly using a gene found in the chloroplast. At the time of the successful early work on bar coding (e.g. Hebert et al. 2004) it was thought that it would be rather challenging to extend the technique to plants.
Apparently this is not the case. Chloroplast DNA is just as useful as mitochondrial DNA. The new study used the “matK” gene from the chloroplast to identify 1,600 species of orchid from Costa Rica, discovering at the same time that what was previously assumed to be one species of orchid was actually two distinct species that live on different slopes of the mountains. In South Africa the same team was able to use the matK gene to identify the trees and shrubs of the Kruger National Park. In the long run the aim is to create a genetic database of the matK DNA of as many plant species as possible, so that samples can be compared to this database and different species accurately identified.
This could potentially be the most important development in the last fifty years for ecologists and biogeographers. My own institution (Ecosur) is keen to use the technique. If indeed it can be quickly placed in the hands of researchers throughout the tropics, much of the confusion surrounding the distribution and abundance of plants could finally be resolved.
However, addressing some of the most interesting “Wallacean” biogeographic questions (as opposed to “Linnean” taxonomic issues) requires a great deal of investment in new field work (e.g. Bini et al. 2006; Cayuela, Golicher et al. in prep). Let us hope that the hard work that will be needed in order to fully research biodiversity “in situ” receives the funding required.
Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of birds through DNA barcodes. PLoS Biology 2(10): 1657–1663
Bini LM, Diniz-Filho JAF, Rangel TFLVB, Bastos RP, Plaza Pinto M (2006) Challenging Wallacean and Linnean shortfalls: knowledge gradients and conservation planning in a biodiversity hotspot. Divers Distrib 12:475–482
