13 February 2009

Using Conservation Status to Automatically Apply Phylogenetic Definitions

To briefly summarize some relevant points in the last post (Extinct or Extant?):
Since 2001, the IUCN Red List has used the following categories:
  • EX: Extinct
  • EW: Extinct in the Wild
  • CR: Critically Endangered
  • EN: Endangered
  • VU: Vulnerable
  • NT: Near Threatened
  • LC: Least Concern
  • DD: Data Deficient
  • NE: Not Evaluated
As mentioned in earlier posts, Names on Nodes uses URIs (URLs, ISBNs, DOIs, etc.) for authorities and qualified names (URI + unique local name) for taxonomic signifiers. Thus, these categories can be stored as signifiers in the Names on Nodes database. Examples for the 2008 assessment appear in the MathML further down, e.g., urn:isbn:2831706335::categories:LC:2008 for Least Concern.

One wonderful thing about the IUCN database is that you can export query results as XML (or CSV). Here's an example of an entry:
<species id="148296">
  <scientific_name>
    Zosterops xanthochroa
  </scientific_name>
  <kingdom_name>
    ANIMALIA
  </kingdom_name>
  <phylum_name>
    CHORDATA
  </phylum_name>
  <class_name>
    AVES
  </class_name>
  <order_name>
    Passeriformes
  </order_name>
  <family_name>
    Zosteropidae
  </family_name>
  <genus_name>
    Zosterops
  </genus_name>
  <species_name>
    xanthochroa
  </species_name>
  <authority>
    Gray, 1859
  </authority>
  <synonyms>
    <synonym>
      <scientific_name>
        Zosterops xanthochrous
      </scientific_name>
      <genus_name>
        Zosterops
      </genus_name>
      <species_name>
        xanthochrous
      </species_name>
    </synonym>
  </synonyms>
  <common_names>
    <name lang="Eng">
      Green-backed White-eye
    </name>
  </common_names>
  <assessment
      version="3.1"
      year="2008">
    <category>
      LC
    </category>
  </assessment>
</species>
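As a rough illustration of how such an entry could be read by a program, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names come from the export shown above; the helper names (text_of, inclusions) and the shortened ENTRY string are made up for this example and are not part of Names on Nodes or of any IUCN tool.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the entry above (assumption: real exports carry more fields).
ENTRY = """<species id="148296">
  <scientific_name>
    Zosterops xanthochroa
  </scientific_name>
  <family_name>
    Zosteropidae
  </family_name>
  <genus_name>
    Zosterops
  </genus_name>
  <assessment version="3.1" year="2008">
    <category>
      LC
    </category>
  </assessment>
</species>"""

# Higher-taxon elements that may appear in a <species> entry.
RANK_TAGS = ("kingdom_name", "phylum_name", "class_name",
             "order_name", "family_name", "genus_name")


def text_of(element, path):
    """Return the whitespace-trimmed text at `path`, or None if absent."""
    found = element.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def inclusions(species):
    """Yield (superset, subset) pairs that one <species> entry supports."""
    name = text_of(species, "scientific_name")
    for tag in RANK_TAGS:
        taxon = text_of(species, tag)
        if taxon is not None:
            yield (taxon, name)
    assessment = species.find("assessment")
    if assessment is not None:
        category = text_of(assessment, "category")
        if category is not None:
            # The year is an attribute of <assessment>, e.g. "LC (2008)".
            yield (f"{category} ({assessment.get('year')})", name)


for superset, subset in inclusions(ET.fromstring(ENTRY)):
    print(f"{superset} includes {subset}")
```

The export puts each value on its own indented line, so the text has to be trimmed before use. The sketch keeps the category as its two-letter code rather than spelling it out.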
The export provides a source not only for the conservation status of species, but also for the species themselves and some of their higher taxa as well. This one XML snippet can provide signifiers for the species, its synonym, its genus, family, order, class, phylum, and kingdom, and its 2008 conservation category.

It also authorizes a number of superset-subset relations, e.g., "Zosterops includes Zosterops xanthochroa" and "Least Concern (2008) includes Zosterops xanthochroa". The latter identifies Z. xanthochroa as an extant species during 2008.

Because of relations like this, we can build a MathML expression for the set of all organisms (or populations, whatever) which were extant in 2008 according to the IUCN Red List:
<apply xmlns="http://www.w3.org/1998/Math/MathML">
  <union/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:EW:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:CR:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:EN:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:VU:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:NT:2008"/>
  <csymbol
    definitionURL="urn:isbn:2831706335::categories:LC:2008"/>
</apply>
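The same union can be generated rather than typed by hand. The following Python sketch is only an illustration: the authority URI and the "::categories:CODE:YEAR" pattern are copied from the MathML above, while the function names (category_name, extant_union, is_extant) are invented here and do not reflect how Names on Nodes actually builds or evaluates its expressions.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
AUTHORITY = "urn:isbn:2831706335"  # authority URI as used in the MathML above
# Categories treated as extant in the post: everything assessed except EX.
EXTANT_CODES = ("EW", "CR", "EN", "VU", "NT", "LC")


def category_name(code, year):
    """Build a qualified name following the pattern seen in the post."""
    return f"{AUTHORITY}::categories:{code}:{year}"


def extant_union(year):
    """Return an <apply><union/>...</apply> element for the given year."""
    ET.register_namespace("", MATHML_NS)  # serialize with a default namespace
    apply = ET.Element(f"{{{MATHML_NS}}}apply")
    ET.SubElement(apply, f"{{{MATHML_NS}}}union")
    for code in EXTANT_CODES:
        ET.SubElement(apply, f"{{{MATHML_NS}}}csymbol",
                      definitionURL=category_name(code, year))
    return apply


def is_extant(code, year, union):
    """True if the category for `code` and `year` is an operand of `union`."""
    target = category_name(code, year)
    return any(child.get("definitionURL") == target for child in union)


union_2008 = extant_union(2008)
print(ET.tostring(union_2008, encoding="unicode"))
print(is_extant("LC", 2008, union_2008))  # Least Concern is in the union
print(is_extant("EX", 2008, union_2008))  # Extinct is not
```

Note that DD and NE are left out, as in the MathML above, so a species in either category is simply not asserted to be extant by this set.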
Presto, now I can apply modified node-based definitions and total group definitions! Thanks, IUCN, for helping to enable the automated application of phylogenetic definitions! (And, you know, also for all the "saving threatened species from extinction" stuff.)
