P28-8 Sat Jan 2 Sashimi: Automatic high-throughput pipeline for organismal image segmentation using deep learning Schwartz, ST*; Alfaro, ME; University of California, Los Angeles; University of California, Los Angeles shawnschwartz@ucla.edu http://shawntylerschwartz.com
Deep learning, a branch of machine learning, can serve as a powerful toolkit for studies in ecology and evolutionary biology. Image segmentation is one application of deep learning with enormous potential for high-throughput phenoscaping in biological studies. As large-scale color pattern analyses across the tree of life become increasingly popular, standardized, high-fidelity image sets of organisms become essential. One rate-limiting step when preprocessing specimen images for color pattern analysis is masking out the background pixels so that they do not influence color pattern geometry statistics. Previous studies have relied on manual labor and image editing software to meticulously mask out background pixels; however, this method is slow, and its consistency degrades at larger scales, where worker quality varies more. Furthermore, the most commonly used region-based convolutional neural network (R-CNN) models cannot simply be applied off the shelf to segment images of most taxa, because these models have been trained on a very small fraction of the biodiversity across the tree of life. We therefore custom-trained a model to automatically perform high-throughput image segmentation on a group heavily underrepresented in the model space: coral reef fishes. We also present sashimi, a web-based toolkit for manually segmenting images and creating new training datasets for taxa, together with a Python-based automated pipeline that carries out background pixel segmentation on any group for which a new model has been trained. Finally, we show that our custom-trained fish segmentation model generates cropped images well suited for popular downstream color pattern analysis workflows.
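The background-masking and cropping step described above can be illustrated with a minimal sketch. It assumes that some segmentation model has already produced a binary foreground mask for an RGB image. The function name `mask_and_crop` and its parameters are hypothetical and are not part of sashimi's actual interface; the sketch only shows the general idea of removing background pixels and cropping to the organism.

```python
import numpy as np


def mask_and_crop(image, mask, fill=0):
    """Blank out background pixels and crop to the foreground bounding box.

    Hypothetical helper for illustration only (not sashimi's API).
    image: (H, W, C) array, e.g. an RGB photograph of a specimen.
    mask:  (H, W) boolean array, True where the organism is.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    if image.ndim != 3 or image.shape[:2] != mask.shape:
        raise ValueError("expected an (H, W, C) image and an (H, W) mask")
    if not mask.any():
        raise ValueError("mask contains no foreground pixels")

    # Replace every background pixel with the fill value so it cannot
    # contribute to downstream color pattern statistics.
    masked = np.where(mask[..., None], image, fill).astype(image.dtype)

    # Find the tightest bounding box around the foreground pixels.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1

    return masked[r0:r1, c0:c1], mask[r0:r1, c0:c1]
```

Returning the cropped mask alongside the cropped image lets a downstream analysis distinguish true background from organism pixels that happen to match the fill color.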