Machine learning-based segmentation and landmarking of 2D fish images


SOCIETY FOR INTEGRATIVE AND COMPARATIVE BIOLOGY
2021 VIRTUAL ANNUAL MEETING (VAM)
January 3 – February 28, 2021

Meeting Abstract


P10-8  Sat Jan 2  Machine learning-based segmentation and landmarking of 2D fish images Diamond, KM*; Avants, BB; Maga, AM; Seattle Children’s Research Institute; University of Pennsylvania; Seattle Children’s Research Institute kelly.diamond@seattlechildrens.org https://diamondkmg.github.io/

As museum collections are digitized, specimen images provide potential datasets for new research questions. However, preprocessing and data collection from these images are often time-limiting steps. With advances in machine learning (ML) techniques, we can make better use of publicly available data by extracting useful measurements in a fraction of the time it takes to measure images individually. In this project, our goal is to develop ML pipelines to isolate, landmark, and segment a large number (over 14,000) of 2D fish images from digital museum collections. As a preprocessing step, we isolated fish from museum images that contained other non-fish objects. Using fewer than 100 images as a training set, we had over 95% success in isolating fish from over 14,000 images representing 118 species. Next, we took a sample of 500 of these images, manually placed 24 landmarks, and created segmentations of anatomical regions of the fish’s body. We are working on using these segmented and landmarked images to train a new ML model to automatically place landmarks and segment unseen images. We will review and revise the outputs from this pipeline and then rebuild the ML model. This iterative process, known as active learning, is a more time-efficient way to generate the large amount of training data necessary for successful ML models, and it enables faster image processing from open-source datasets. All of these tasks are accomplished using open-source software: Advanced Normalization Tools in R (ANTsR) for model building and computational tasks, and SlicerMorph for segmentation and interactive landmark data acquisition. Our experiments show that ML holds the key to unlocking the large biodiversity data available in specimen collections for organismal biology research.
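The review-and-rebuild cycle described above can be sketched in a few lines. This is only an illustrative outline, not the authors' ANTsR/SlicerMorph pipeline: the function name `active_learning` and the `train`, `predict`, and `review` callables are hypothetical placeholders for model building, automatic landmarking or segmentation, and manual correction, and `batch_size` and `rounds` are assumed parameters.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Label = TypeVar("Label")   # e.g. a set of landmarks or a segmentation
Model = TypeVar("Model")


def active_learning(
    labeled: List[Tuple[Image, Label]],
    unlabeled: Sequence[Image],
    train: Callable[[List[Tuple[Image, Label]]], Model],
    predict: Callable[[Model, Image], Label],
    review: Callable[[Image, Label], Label],
    batch_size: int,
    rounds: int,
) -> Tuple[Model, List[Tuple[Image, Label]]]:
    """Grow a training set by reviewing model output in batches.

    All callables are hypothetical stand-ins supplied by the caller.
    """
    labeled = list(labeled)   # copy so the caller's list is not mutated
    pool = list(unlabeled)
    model = train(labeled)    # initial model from the manual annotations
    for _ in range(rounds):
        if not pool:
            break
        batch, pool = pool[:batch_size], pool[batch_size:]
        for image in batch:
            proposal = predict(model, image)      # model's draft annotation
            corrected = review(image, proposal)   # reviewer revises the draft
            labeled.append((image, corrected))
        model = train(labeled)  # rebuild the model with the enlarged set
    return model, labeled
```

In this sketch the reviewer only corrects a draft annotation instead of annotating from scratch, which is where the time savings of the iterative approach would come from.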
