What is a Fossil Species..?

What do we currently understand by a ‘species’?

Naming species, also known as alpha taxonomy, forms the fundamental basis of systematic analysis (e.g., for biodiversity, macroevolutionary and ecological studies). Since the origin of the species concept, there has been heated and continuous debate as to what exactly constitutes a species. The use of DNA as an evolutionary tool sparked a vigorous new line of discussion into what precisely defines a species. Even to this day, despite a wealth of theoretical, empirical and philosophical studies, there is still no consensus on how to rigorously define a species unit. This is not to say that there isn’t a general idea of what a species is (ask any biologist or palaeontologist); in fact most people reading this will probably have a pretty good idea of what they define a species as. But there is not total agreement, not by a long shot. Furthermore, most if not all current species concepts are explicitly based on extant organisms, which can be directly observed in their everyday life, and also just happen to provide a near-endless supply of DNA. But what about fossils? I’ve outlined the critical importance of using fossils in conjunction with pretty much any systematic analysis before (here), but how do palaeontologists actually recognise and delimit fossil species? This is a pretty serious issue, considering the DNA of fossil organisms has almost always decayed long before exhumation (except in exceptional circumstances), and fossil remains typically only represent a biased sample of the organisms they once were.

What are the current species concepts?

For biologists, the species problem can be framed as: “What level of divergence (morphological, genetic, etc.) between populations constitutes species diagnosis?” This can be modified slightly according to whichever species concept is being applied (see below). Using DNA as a sole basis for species delimitation is fraught with issues, including but not limited to paralogy, lateral (horizontal) gene transfer, arbitrary delimitation protocols, lack of data (e.g., in tropical species), and often a lack of training or instrumentation (in third world countries mainly). The relative issues and benefits of morphological and/or DNA-based analysis are a tale for another time though. Currently, there is no single ‘silver bullet’ technique for species delimitation (although many DNA taxonomists will try and pretend there is..). What we actually have is a series of non-independent concepts that apply to different stages of the speciation process (de Queiroz 2007 discusses this in a most brilliant manner). Here are a couple of examples:

Biological Species Concept: This is the one most people will have heard of. Species are defined by reproductive isolation, or the ability to produce fertile offspring. Obvious issues with this are if you’re asexual, and how do you know if two organisms (within reason) can or cannot mate if they are not sympatric. Also, reproductive isolation is not always congruent with morphological divergence, so is inadequate with purely morphological data sets.

Phylogenetic Species Concept: This refers to diagnosability based on the monophyly of a population. This invariably invokes the use of DNA. Genetic population divergence goes through three stages: polyphyly, paraphyly and finally reciprocal monophyly, giving two or more irreducible clusters of diagnosable organisms with a traceable pattern of ancestry and descent.
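To make the end point of that sequence a bit more concrete, here is a rough sketch of a reciprocal monophyly check. Everything in it is an assumption for illustration: the rooted tree is written as nested tuples with tip labels as strings (a toy representation, not any package's format), and the function names are made up. A group counts as monophyletic here if some clade in the tree contains exactly its members and nothing else.

```python
def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def is_monophyletic(tree, group):
    """True if some clade in the rooted tree contains exactly the members of group."""
    group = set(group)

    def visit(node):
        if tips(node) == group:
            return True
        if isinstance(node, str):
            return False
        return any(visit(child) for child in node)

    return visit(tree)


def reciprocally_monophyletic(tree, group_a, group_b):
    """True only if both groups form their own exclusive clades."""
    return is_monophyletic(tree, group_a) and is_monophyletic(tree, group_b)


# Toy gene trees (hypothetical labels): fully sorted vs. still intermingled lineages.
sorted_tree = ((("a1", "a2"), "a3"), (("b1", "b2"), "b3"))
mixed_tree = ((("a1", "b1"), "a2"), ("b2", "b3"))

print(reciprocally_monophyletic(sorted_tree, {"a1", "a2", "a3"}, {"b1", "b2", "b3"}))
print(reciprocally_monophyletic(mixed_tree, {"a1", "a2"}, {"b1", "b2", "b3"}))
```

In the second toy tree neither population forms an exclusive clade, which is the sort of pattern you would expect before lineage sorting is complete.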

Genealogical Species Concept: This is the use of multiple gene marker distributions to delimit putative species by identifying periods of complete lineage sorting. Essentially this means that the incongruence from coalescence (the point in time where gene variants unite in a gene genealogy) no longer affects delimitation.

A currently widely used method is DNA barcoding. Some molecular systematists deem this a powerful enough tool to entirely replace standard Linnaean taxonomy, although (obviously) there are numerous vocal objections. DNA barcoding operates on the assumption that there is a threshold for species delimitation based on a single gene: the entirely arbitrary rule that interspecific genetic divergence should be at least 10 times greater than intraspecific divergence, which ties in to the concept of reciprocal monophyly. It works sometimes, but is fraught again with theoretical and empirical problems. (I love the idea that molecular systematists will go to the tropics with the aim of identifying unique or diverse haplotypes in insects etc., by killing as many organisms as possible; “We’ve found a unique haplotype! We must therefore preserve this beetle at all costs!”, as the decapitated beetle floats around the dissection palette..)
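For anyone who wants to see how blunt the 10 times rule is in practice, here is a minimal sketch. It assumes pre-aligned sequences of equal length, uses a plain uncorrected p-distance (real barcoding studies usually apply a substitution model), and compares mean distances; the function names and the toy sequences are invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean(values):
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def ten_times_rule(groups, factor=10.0):
    """groups maps a putative species label to a list of aligned sequences.

    Returns (mean intraspecific distance, mean interspecific distance,
    whether the interspecific mean is at least factor times the intraspecific mean).
    """
    intra = [p_distance(a, b)
             for seqs in groups.values()
             for a, b in combinations(seqs, 2)]
    inter = [p_distance(a, b)
             for (_, seqs_1), (_, seqs_2) in combinations(groups.items(), 2)
             for a in seqs_1 for b in seqs_2]
    mean_intra, mean_inter = mean(intra), mean(inter)
    return mean_intra, mean_inter, mean_inter >= factor * mean_intra


# Toy data (hypothetical 20-site barcodes for two putative species).
toy = {
    "sp_A": ["AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAT"],
    "sp_B": ["CCCCCCCCCCAAAAAAAAAA", "CCCCCCCCCCAAAAAAAAAA"],
}
mean_intra, mean_inter, passes = ten_times_rule(toy)
print(mean_intra, mean_inter, passes)
```

Note that if every putative species is internally invariant the intraspecific mean is zero and the rule passes trivially, which is one small illustration of why a fixed threshold is arbitrary.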

How do these concepts relate to fossils?

Every single one of these concepts relies on either direct observational data (e.g., sympatry for the BSC), or the use of DNA. Few modern studies rely solely on morphology to delimit species (annoyingly, seeing as it is directly coupled with behaviour, ecology etc.; DNA is just, well, DNA..). So really, with regards to fossils, in which phenotype is the only aspect preserved (and ecology etc. accordingly inferred), as well as the spatio-temporal context in which it exists, how can these concepts be applied? Well, they can’t really. So what can palaeontologists do..?

How are fossil species delimited?

In principle, there are two different methods of species delimitation: a discovery-based approach, and a hypothesis-based approach. The former makes no a priori assumptions regarding the putative species in a sample, only delimiting subsequent to analysis (e.g., DNA barcoding/taxonomy, cladistics). The latter requires an a priori assumption of what species already exist within a sample, with the analysis being a validation test. Papers vary as to whether a full or partial cladistic analysis is carried out (if at all) when the focus of the paper is the erection and description of a new species. By partial analysis, I simply mean that the authors observe the synapomorphies of a specific clade and see if their specimen(s) match or not. This is a pretty horrendous breach of taxonomy and cladistic methodology, as it ignores the fact that every single character placement and its polarity are influenced by the addition of new species (in fact, this is the principal method by which cladograms are initially constructed). Full analysis is the dominantly used method, thankfully, given the accessibility of free software and relative simplicity in executing cladistic analysis (although there may be issues in obtaining and extracting previous data sets, but that’s another tale too. For someone else.) This leads us on to the next part.

Bring on Cladistics

Cladistics is the method that systematists use to forge a hierarchical grouping of taxa into discrete subsets, or clades, for the inference of common ancestry between species and groups. A clade is defined by a node (or sometimes a branch), the point of intersection of two or more branches, that represents the common ancestry and speciation of all subsequent taxa. Each node is represented by one or more shared derived characters (synapomorphies) between all branches, and hence taxa, emanating from the node. If the taxa in question are species (i.e., terminal branches), then the minimum required number of synapomorphies to give a sister-taxon relationship is one, and the minimum number of required autapomorphies (unique derived characters) to ‘split’ the branch into two separately recognised entities is one. That is, cladistics can recognise discrete units, including species, on the basis of a single unique character, regardless of the size of the initial character set. There are statistical methods of assessing the strength or support of this (e.g., pseudo-replication analyses, branch decay tests), but the point remains that a species can be delimited through cladistic analysis based on the possession of a single unique character. [this is a really simple overview, there are numerous web-pages and texts out there that describe cladistic methodology in more detail; just search.]
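As a toy illustration of that last point, here is a sketch that picks autapomorphies out of a character matrix. It is not a parsimony analysis: it assumes binary characters whose polarity has already been decided (0 = ancestral, 1 = derived), and derived states shared by several taxa are only candidate synapomorphies, since homoplasy can't be ruled out without a tree. The matrix and names are hypothetical.

```python
def classify_characters(matrix):
    """matrix maps a taxon name to a list of character states (0 ancestral, 1 derived).

    Returns (autapomorphies, shared), where autapomorphies maps each taxon to the
    indices of derived characters unique to it, and shared maps a character index
    to the list of taxa that all carry the derived state.
    """
    taxa = list(matrix)
    n_chars = len(next(iter(matrix.values())))
    autapomorphies = {taxon: [] for taxon in taxa}
    shared = {}
    for i in range(n_chars):
        holders = [taxon for taxon in taxa if matrix[taxon][i] == 1]
        if len(holders) == 1:
            autapomorphies[holders[0]].append(i)
        elif len(holders) > 1:
            shared[i] = holders
    return autapomorphies, shared


# Hypothetical three-taxon matrix with four characters.
toy_matrix = {
    "taxon_A": [1, 0, 1, 0],
    "taxon_B": [1, 1, 0, 0],
    "taxon_C": [0, 0, 0, 0],
}
autapomorphies, shared = classify_characters(toy_matrix)
print(autapomorphies)
print(shared)
```

Here taxon_A and taxon_B each get split off on the strength of one unique character, no matter how many other characters sit in the matrix.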

It seems that there are two main methods of delimiting fossil species: qualitatively, whereby the fossil simply looks different but the differences are not broken down into discrete characters; and quantitatively, where the species name is supported by x number of autapomorphies, and the strength or support of the diagnosis is a function of x, and is testable through cladistic methods. This is pretty much the only method available to palaeontologists given the relative paucity of fossil data. But then how many autapomorphies are required to be interpreted as a ‘strong’, or valid, diagnosis? And to what extent are species therefore comparable? It’s a problematic issue that I haven’t actually come across much at all in the published literature. If I’m mistaken, please do point me in the right direction! What is perhaps required though, is a rigorous species concept that is directly compatible with the full range of fossil diversity, and that extant taxa can be integrated into.

One thing to consider though is that species are treated as discrete entities when these concepts are applied; is this the correct approach when really the lineage on which an organism sits is, by definition, continuous? What do we gain by stamping an arbitrary and highly subjective boundary on this continuum? A method of classification. It has heuristic value in systematics, but it seems that the fundamental treatment of species as discrete units may need some consideration. Furthermore, speciation is a process with both stochastic and deterministic components, and the application of delimitation criteria must be flexible to account for the variation between lineages. Unless someone comes up with something really neat. Like..

Future prospects? Geometric Morphometrics. 3D automated species recognition software, based on robust statistical delimitation procedures. It’s awesome. Watch this space!

Disclaimer: I’ve probably missed out huge amounts here; this is such a massively studied field, that it’s been difficult to even shrink down to these couple of paltry pages! Comments as always are more than welcome! There are simply too many references to list here too. If people would like to read more about the subject, drop me an email (jon.tenannt.2[at]gmail.com), and I’ll happily whizz a few papers [legally..] your way, depending on taste!

Final thought: with respect to all of the work that has gone into validating ‘species’, what has been done to test the validity of higher taxonomic units, such as Family and Order, or even the Genus..?

For reading all that, here’s a snap of the Iguanodon specimen on display at the museum in Oxford, England.

Surprised American for scale


The Young Systematists Forum, 2011

Having just attended a rather neat conference down in London at the Natural History Museum (NHM), I figured a little summary would make a decent first blog entry. This is my first ever attempt at writing to the general public as well as a scientific audience (or whoever wants to read it really), so if it’s crap, please do let me know.

This year the Young Systematists Forum (YSF), hosted by the Systematics Association, had the greatest turn-out since the conference began aeons ago in the year 2000. The attendees looked eager and ready to devour the broad range of information, despite the starting time of 9.30am (a terrifying prospect for any student). I’d estimate there were at least 100 people in attendance, which is perhaps a little scary for any MSc/PhD student giving a first oral presentation. The multi-disciplinary nature of the field of systematics was apparent straight away – the abstract booklet contained such an immense range of topics, from fossil coelacanths and angiosperm phylogeny, to cladistic methodology and mammoth ecology.

http://www.systass.org/ysf/ <– Webby. Abstract booklet not uploaded yet, but I can send upon request.

Now, as a devout vertebrate palaeontologist, I must say I was overall a little disappointed. Eighteen talks, and only two involved fossils, and just the one poster out of twenty-seven, which was mine! Now, given the recently solidified view that fossils and the nature of the fossil record provide critical information in, for example, molecular clock calibration (e.g., Warnock et al. 2011, and refs. within) or macroevolution (Quental and Marshall, 2009), this seemed like a slightly biased series of presentations. That’s all I’m going to say about that, for now.

On to the talks. The first session was hosted by Ellinor Michel, who works with the ICZN in London, and the Zoology Dept. at the NHM. She also lectures on taxonomic principles for the MSc course run at the museum, so was a familiar face to many.

The first talk was on pollen morphology, a stimulating topic to arouse the audience for the next 9 hours of talks. I’m going to digress here. When doing an MSc project, I think that the overall aim is not to do something mind-blowing, but simply to teach yourself new techniques with a dataset that could prove useful for the associated research group. I also think it’s critical that you develop a project personally, instead of simply analysing a small sub-set of your supervisor’s data and falling into the trap of becoming a junior lab assistant. When doing a PhD, you want to be researching a problem or area that is going to have an impact on the scientific community or domain of study. It has to have purpose, a meaning. During very few of these talks, did I see any semblance of the authors knowing why they were doing a particular project. This is my personal feeling, but it may simply be that many chose not to convey this. Setting out with clear-cut aims, objectives and implications is presentation-101 imo though. This leads back on to the first talk.

Descriptive anatomy of pollen with the aim of lineage discrimination. Although well-presented, the speaker failed to reveal what was significant about his group, the Violaceae. Concluding that his studies were, well, inconclusive, he mentioned that pollen needs further systematic evaluation in this group. Despite the multi-genomic analysis that precedes his work on the same group. OK. Unfortunately, he then lost my respect completely. He had constructed a phylogeny to map his few putative synapomorphies on. Now, forgive me for being blunt, but when you are doing a project based on systematics and using cladistic methods, presenting at a systematics conference, you better know what you’re talking about. Especially when you have an entire generation of graduates/masters/doctorate candidates watching you. So when someone asks what optimisation criteria you used to construct your phylogeny, you think back to your basics of cladistics, and answer accordingly. You don’t ask what it means. For those who are not familiar with this terminology, it is nicely summarised in Agnarsson and Miller (2008). To have undertaken research involving cladistics and not know the fundamentals is unforgivable. Rant over.

A quick second digression, but somewhat relevant. About half the speakers (including the above chap) got mixed up between the terms ‘cladogram’ and ‘phylogeny’. This is one of my pet peeves, along with confusion between crown and stem groups (http://bit.ly/sptDTk, see this recent article by Mike Keesey). A cladogram is explicitly something created using cladistics, and represents nothing more than a hierarchical branching pattern between taxa. A phylogeny is different; it shows evolutionary trends representing ancestor-descendant relationships between species. The differences are subtle, and I may be completely wrong (hopefully not), but distinguishing between them and recognising the different implications is important.

Next up was on giant fossil coelacanths! Not much to say here. It was almost a non-stop stream of comparative fish anatomy, in a French accent. Conclusions were that it was possible that gigantism in coelacanths had independently occurred in at least two lineages. Cool.

Following on, bacteria. I’ve got to admit, I don’t know a whole lot about these guys. Some cool images, and horrendous taxonomic nomenclature (Chroococcidiopsis, say it 100 times with a German accent), and then it was time for something completely different. For me, anyway.

Bio-ontologies, an unambiguous way to represent subjective knowledge. It sounds appealing, right? As far as I am aware, the speaker was presenting a new method of describing concepts of words in systematics, to remove subjectivity and refine their semantic power, combined with novel software developed at the Paris NHM. She used the concept of homology as an argument, effectively adding an alternative approach to an already hotly-debated topic. This, of course, was raucously discussed during and after the talk. Break time! And the first poster session. 27 posters to discuss is a bit much, and to be perfectly honest, the idea of reading about the plethora of invertebrate and botanical molecular-based studies was not as appealing as the idea of having a nicotine fix. I’ll just skip to the next session..

Morphometrics! That’s what we like! Kyle, a course-mate from the MSc at the NHM, was up, talking about its application in assessing ontogenetic trajectories in mammoths. I believe the software he used was MorphoJ (available here: http://bit.ly/uoPuzc), which I’m actually not yet acquainted with (we hope to have lunch next week..), having only formally used the tps series (see Links page) and custom software at the NHM designed by Norman MacLeod. It was all good, apart from using positions of particular specimens in Procrustes-transformed data within principal components ordinations to assess changes in shape, which really should have been observed using either a thin-plate spline model (somewhat unreliable) or a strobe-plot. Although, as he used 3-D landmarks, this might have been a little difficult. Regardless, there are methods of visualising shape deformation other than speculation (e.g., mapping the highest PCA scores back on to the landmarks, and qualitatively observing deformation trajectories). He got really buggered by a couple of the questions, unfortunately, which was understandable as both questions were nonsense. Firstly, it was asked why ‘discrete character coding’ wasn’t used instead of landmarks. The whole point of the project was to quantitatively assess shape variations, which can’t be done using cladistics. In the future, the results here could be refined to assess the validity of using discrete character states on cranial landmarks, as has been done quite effectively in an MSc project conducted last year on the course, but using Felidae. The second question should have been a piece of piss to answer for someone who’d just done a morphometrics project and had been lectured on it a few months beforehand. Sorry Kyle, but it’s true. “What was the proportion of type 1, type 2, and type 3 landmarks used?” My response would have been “Who gives a toss? (Or why?)”.
Type 1 landmarks are those which are topographically homologous, not necessarily biologically homologous. They represent structures which can be explicitly defined on all specimens, such as suture intersections. Type 2 landmarks are those that represent points of geometric significance, such as minima and maxima of curvatures, distal terminations etc. Type 3 are those which are defined or interpolated in terms of other landmarks. The point is that they are all geometrically comparable. So, asking about the proportion of each is beside the point. For example, semi-landmarks are a form of secondary landmark, and can contain substantially more information than primary ones when used in abundance to profile outlines. I failed to understand, and glowered at an empty coffee mug for a while.
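To show why the landmark type doesn't matter to the maths, here is a bare-bones sketch of an ordinary Procrustes fit between two landmark configurations. The assumptions are all mine: 2-D landmarks only (the talk used 3-D), complex numbers to keep the rotation step short, no handling of reflections or degenerate configurations, and made-up function names. This is not what MorphoJ or the tps programs do internally. Whatever type a landmark is, it goes in as a pair of coordinates and is treated identically.

```python
import math


def to_shape(landmarks):
    """Centre a 2-D landmark configuration and scale it to unit centroid size."""
    points = [complex(x, y) for x, y in landmarks]
    centroid = sum(points) / len(points)
    centred = [p - centroid for p in points]
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred]


def align(shape, reference):
    """Rotate shape onto reference by the least-squares optimal angle."""
    s = sum(p * q.conjugate() for p, q in zip(shape, reference))
    rotation = s.conjugate() / abs(s)  # unit complex number encoding the rotation
    return [p * rotation for p in shape]


def procrustes_distance(landmarks_a, landmarks_b):
    """Distance between two configurations after removing position, size and rotation."""
    reference = to_shape(landmarks_a)
    fitted = align(to_shape(landmarks_b), reference)
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(fitted, reference)))


# Hypothetical landmark sets: the second is the first rotated, enlarged and shifted;
# the third is a genuinely different shape.
triangle = [(0, 0), (1, 0), (0, 1)]
same_shape = [(5, -2), (5, 1), (2, -2)]
stretched = [(0, 0), (2, 0), (0, 1)]

print(procrustes_distance(triangle, same_shape))
print(procrustes_distance(triangle, stretched))
```

The first distance comes out at zero (to within floating-point error) because only position, size and orientation differ; the second doesn't, because the shape itself has changed.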

As a side-note, I love geometric morphometrics. Its statistical power, relative ease in grasping, and broad range of applications make it an invaluable tool in many areas of palaeontology. Imagine, every single character and character state in that matrix it took you so long to create being quantifiable, and the character states formally delimited in a statistically rigorous manner, in such a way that true homologies can not only be irrefutably defined, but also viewed in shape space. I might write an article on this later. Digression x over.

Next up were two vaguely similar talks, both on gastropod morphometrics. The first was excellent! Not surprisingly, it took the award for best talk. A combined biogeographical analysis of thecosomate gastropods using GIS, molecular sequencing and geometric morphometrics allowed the presenter to successfully discriminate geographical gastropod domains. And it came with a meaning! Apparently, these particular gastropods are particularly susceptible to acid dissolution (aragonite shells), so can be used as a proxy for ocean acidification variations. The only thing I’d like to have seen is the valve shape changes apparent over these phenotypic and molecular-defined boundaries, and whether they could be related to function or environment. The second talk, using a method known as co-ordinate point extended-eigenshape analysis (with Norm’s fingerprints all over it), assessed disparity in Lake Tanganyika’s endemic gastropod fauna. Results seemed to imply that gastropods from analogous but spatio-temporally divergent adaptive radiations occupied near-identical regions of morphospace, which is pretty cool! More was required on causal factors though; for example, would shells in the unoccupied regions of morphospace have been less hydrologically stable, or looked more edible..?

The fourth talk was entertaining. The idea that continental lakes could potentially be used as analogues for island biogeographical patterns seemed appealing, and the sort of crazy idea one would expect from a student from Amsterdam (something in the water, perhaps?). The results, however, seemed to be controversial, and possibly based on incomplete sampling and/or knowledge of the species’ boundaries in the analysed specimens (using CO1, a slowly evolving gene, to assess rapid divergence may have been problematic).

Finally for this session was an ecological analysis of fiddler crabs in Indonesia, looking at the modes of sympatry between species. Methods were largely observational, using the infamous quadrat methodology, and only preliminary observations were conveyed. Ready for another digression?

The ‘niche’. So far, this word has been used to describe a lake, intra-lacustrine habitats, a quadrat, a mudflat and coeval beach deposits. I posed the question to several conference attendees, and also to the interweb, asking what a ‘niche’ was, and got largely vague responses. It appears that ‘niche’ is understood (or not) by many as a concept to represent any aspect of any environment. Then, clarity. Maybe? John Nudds, palaeontologist, ex-supervisor, Lagerstätten and beer expert, replied “It’s a type of music related to Garage, a British thoroughbred racehorse, or possibly the relational position of an organism’s species”. OK, nice. I trust him. Still, if that’s the definition, does it mean spatially, temporally, sympatrically, allopatrically, ecologically, parasitically? You see where I’m going: it could mean anything, or any combination of things. Therefore, what is its use? Unless someone can tell me explicitly what it means, I think it means whatever people want it to. Much like the word ‘trait’. God I hate that word.. Perhaps these vague definitions require some kind of ontological analysis..

Break two. Lunch. Back to the staff canteen after a few months away. Reminiscing, and being miffed at ‘niches’ took up most of that time. Especially as someone had pronounced the word with the ‘ch’ as in ‘leaches’. Practically vomited.

Round three. By now, the lack of sleep and series of misfortunes in getting to London on time were kicking in. Nonetheless, what followed was a series of diverse and interesting talks, despite all being about molecular systematics. Firstly, preliminary results were given regarding the phylogenetic status of economically significant beetles (Elateridae) in Canada. As crop pests, this seemed like quite an important study. But then, what is the point of spending all this time and effort assessing something that you actually just want to kill? And how is a Canadian farmer going to assess the haplotypic affinity of the beetles in his wheat crop from his tractor?

Next up, the most critical talk of the day, as it was directly applicable to almost every other talk given. Using nymphalid butterflies, it was demonstrated that with identical alignments, you get conflicting trees depending on whether you use coalescent or concatenated methods. Excellent! Considering this used a huge data set (12.5 kbp, 87 species) with multiple genomic loci, it demonstrated not only the effects of taxonomic and genomic sampling, but also the impact of tree construction methodology, something that most other analyses had largely overlooked. This was followed by a largely similar study using Araneae, with largely preliminary work mentioned, so I’ll give it a miss here.

The next one was pretty cool, and took second prize. Phylogeography of a model weed (yep) using haplotype configurations. I think that the main thing to take away is that there is a geographically-defined system of haplotype differentiation in this particular plant, and secondarily, that if you spend too much time studying at Oxford, accidental quotes such as “explosive seed dispersal that once got in my eye” will be over-looked. I envy the innocent.

That was it. Time for jasmine tea and to tell people about geometric morphometrics and dinosaurs. Got a bit of interest, but most people had resorted to trying to make their own fermentation tanks by this point, with at least two hours to wait until the pub.

On to the final session. It might just have been a trick of the light, but there seemed to be significantly fewer people in the theatre.. The first talk was possibly the most odd of the entire day. I don’t think the chap had actually done anything to present, it was more this concept and series of protocols he wanted to establish to assess population genetics within a couple of Antarctic brittle stars. He had a background in marketing, and it seemed like he was propositioning the audience (comprising one hundred skint students) almost to fund his work. I was confused. It was at this time that my phone battery died, so I had to stop live-tweeting the event (#YSF13), which was also going to be the primary basis for this entry. Ah well.

The next one was reconstructing the ‘phylogeny’ of a particularly speciose clade of angiosperms, the magnoliids. Again, presenting just preliminary work, there wasn’t much to conclude, except that, with the current taxon and genetic coverage, resolution was looking pretty promising, at least with very deep relationships.

Nearly there now. Going on to cladistic methods, the next talk was one of the best-presented of the conference, and with pretty important impacts. They must be feeding those guys at Cambridge something.. The talk revolved around the assessment of ‘hidden’ support in simultaneous and combined analyses. I won’t go into too much depth, but with the creation of intricate new metrics to assess the behaviour of hidden support, I’m looking forward to the publication(s).

Penultimately, an assessment of various competing topologies that reflect the tempo and trajectories of diversification. Now, over the last year, there has been a significant amount of work published within this field (e.g., Stadler (2011), Purvis et al. (2011) and Venditti et al. (2011)). I don’t think they added anything particularly challenging or novel to these three citations, although this may change as additional genomes are acquired and sequenced, and methods of fossil calibration become more robust.

The last talk was on early metazoan evolution using nuclear genomes in extant sponges. In the first bit of the talk, not a single mention of fossils was given, so I had a nap and missed the rest.

And that was it! Rewards were distributed ceremoniously, for some reason in the cramped Bird Gallery (the most easily congested hall in the entire NHM), and free wine was unceremoniously consumed. All in all, there was an eclectic mix of topics, people, and quality throughout the day. Given the pressure, I have to congratulate every single presenter for their courage, and for the most part, unfaltering delivery. Martin Hughes deserves a special mention at the end for asking a question to every single speaker that day. The organisers are again duly noted for, well, organising the event. Although I don’t thank them too much, as they made a typo in my abstract title.

End transmission.



Agnarsson, I. and Miller, J. A. (2008) Is ACCTRAN better than DELTRAN? Cladistics, 24, 1032-1038

Purvis, A., Fritz, S. A., Rodriguez, J., Harvey, P. H. and Grenyer, R. (2011) The shape of mammalian phylogeny: patterns, process and scales, Philosophical Transactions of the Royal Society B, 366, 2462-2477

Quental, T. Q. and Marshall, C. R. (2009) Extinction during evolutionary radiations: reconciling the fossil record with molecular phylogenies, Evolution, 63(12), 3158-3167

Stadler, T. (2011) Mammalian phylogeny reveals recent diversification rate shifts, PNAS, 108(15), 6187-6192

Venditti, C., Meade, A. and Pagel, M. (2011) Multiple routes to mammalian diversity, Nature, 479, 393-396

Warnock, R. C. M., Yang, Z. and Donoghue, P. C. J. (2011) Exploring uncertainty in the calibration of the molecular clock, Biology Letters, 8 (doi:10.1098/rsbl.2011.