Exploring Taxon Concepts of Sponges (Porifera) through Natural Language Processing of Systematic Monographs


Meeting Abstract

P1-5  Sunday, Jan. 4 15:30
JONES, C.L.*; HAMIDI, H.M.; CUI, H.; RODENHAUSEN, T.; WU, H.H.; THACKER, R.W.
Univ. of Alabama at Birmingham; Univ. of Alabama at Birmingham; Univ. of Arizona; Univ. of Arizona; Univ. of Arizona; Univ. of Alabama at Birmingham
dr.bob.thacker@gmail.com

Phylum Porifera contains over 8,000 described sponge species, which are represented in systematic monographs ranging from Linnaeus (1759) to Systema Porifera (2002) to the present day. The concepts underlying the traditional morphological classification of sponges have changed dramatically over the past 100 years, and these concepts often conflict with modern molecular phylogenies. To explore and quantify how taxon concepts have changed with advances in both morphological and molecular systematics, we are testing novel natural language processing software, the Explorer of Taxon Concepts (ETC – Beta version), with regional and global systematic monographs of Porifera. ETC enables users to create XML files from the text of semi-structured taxon descriptions, and then parses these files using terms from morphological ontologies together with additional terms that it discovers in the descriptions. Users then review the terms discovered by ETC, placing them into categories (such as anatomical structures, life-history stages, or colorations) and/or combining terms as synonyms. Based on this user feedback, ETC builds a morphological character matrix that incorporates these terms. Users can extensively edit the character matrix, for example, by color-coding data cells and controlling the states that characters can take. Our tests indicate that ETC quickly parses characters associated with numerical measurements; to assess characters based on the presence or absence of a particular trait, however, the user needs to carefully categorize the discovered terms.
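
The workflow described above (reviewed terms, synonym merging, and a resulting character matrix) can be illustrated with a minimal Python sketch. This is not ETC's implementation or interface; the term table, the example descriptions, the `build_matrix` function, the negation phrases, and the ASCII unit "um" are all hypothetical, chosen only to show why numerical measurements are easy to parse while presence/absence characters depend on how terms are categorized.

```python
import re
from typing import Dict

# Hypothetical user-reviewed term table (not from ETC):
# discovered term -> (category, canonical synonym).
TERM_TABLE = {
    "oxea": ("structure", "oxea"),
    "oxeas": ("structure", "oxea"),
    "style": ("structure", "style"),
    "styles": ("structure", "style"),
    "spongin": ("structure", "spongin"),
}

# A term followed by a numeric range and a unit, e.g. "oxeas 120-180 um".
# The ASCII unit "um" is an assumption made for this sketch.
MEASURE = re.compile(
    r"\b([a-z]+)\s+(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*(um|mm)\b"
)


def build_matrix(descriptions: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Build a taxon-by-character mapping from free-text descriptions."""
    matrix: Dict[str, Dict[str, object]] = {}
    for taxon, text in descriptions.items():
        lowered = text.lower()
        tokens = set(re.findall(r"[a-z]+", lowered))
        row: Dict[str, object] = {}

        # Presence/absence characters: only reviewed terms are scored, and
        # synonyms collapse onto one canonical character name.
        for term, (_category, canonical) in TERM_TABLE.items():
            if term not in tokens:
                continue
            t = re.escape(term)
            # Very simple negation check (an assumption of this sketch).
            negated = re.search(
                rf"\b(?:no|without)\s+{t}\b|\b{t}\s+absent\b", lowered
            )
            row[canonical] = "absent" if negated else "present"

        # Numerical characters: a regular expression recovers the range.
        for term, low, high, unit in MEASURE.findall(lowered):
            if term in TERM_TABLE:
                canonical = TERM_TABLE[term][1]
                row[f"{canonical} length ({unit})"] = (float(low), float(high))

        matrix[taxon] = row
    return matrix


# Invented example descriptions, not taken from any monograph.
DESCRIPTIONS = {
    "Taxon A": "Skeleton of oxeas 120-180 um; spongin abundant.",
    "Taxon B": "Styles 200-350 um; oxeas absent.",
}

matrix = build_matrix(DESCRIPTIONS)
for taxon, row in matrix.items():
    print(taxon, row)
```

In this sketch the measurement characters fall out of a single pattern, whereas the presence/absence states are only as reliable as the term table and the negation rule, which mirrors the need for careful user categorization noted above.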

The Society for Integrative & Comparative Biology