77-7 Sat Jan 2 AniProtDB: A collection of metazoan proteomes for comparative studies Barreira, SN*; Nguyen, AD; Moreland, RT; Baxevanis, AD; NHGRI/NIH; NHGRI/NIH; NHGRI/NIH; NHGRI/NIH sofia.barreira@nih.gov
Comparative genomic and proteomic analyses have provided keen insights into both the commonalities and differences between metazoan species, advancing our understanding of phylogenetic relationships, the evolution of gene families, and the mechanisms underlying biological diversity. Ultimately, the ability to perform these kinds of analyses rests on having reliable proteomic data from which one can confidently draw biological conclusions. However, the quality of publicly available data sets remains highly variable, with most consisting of raw sequencing reads that need to be processed, assembled, and annotated before meaningful information can be extracted from them. To address the lack of high-quality proteomic data spanning the animal tree, we have implemented a pipeline for generating de novo assemblies based on publicly available data from the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 108 species spanning 21 animal phyla. These proteomes were generated using consistent methodologies, quality control thresholds, and measures of completeness. We have also created the Animal Proteome Database (AniProtDB), a resource providing open access to this collection of high-quality proteomes, along with information on predicted proteins and protein domains for each taxonomic classification. A BLAST-based interface also allows users to perform sequence similarity searches against all proteomes generated with this pipeline. This resource greatly increases the utility of these data by removing the barrier to access for research groups that lack the expertise or resources to generate them. It also enables the use of data from non-traditional research organisms that have the potential to address key questions in biomedicine.