ABSTRACT
Monitoring fish welfare has become increasingly important for salmon farmers. Current approaches require manual labor in the form of physical inspection or interpretation of video. Echo sounders make it possible to monitor the entire fish population in real time over extended periods. However, current approaches for automatic interpretation of echograms mainly focus on species classification and therefore fail to appropriately encode the spatiotemporal properties contained within the data. Other approaches are primarily aimed at the feeding process and require a human in the loop. In recent Natural Language Processing research, transformer-based approaches have been shown to handle long sequences better than Long Short-Term Memory networks. We therefore introduce EchoBERT (Echo Bidirectional Encoder Representation Transformer), a transformer-based approach for behavior detection in farmed Atlantic salmon (Salmo salar, Salmonidae) that uses the spatiotemporal properties contained in echograms. The model interprets the spatiotemporal dynamics of echograms through attention mechanisms to classify fish behavior. We compare EchoBERT to a traditional sequence modeling approach on the task of detecting behavior indicative of pancreas disease in a six-fold cross-validation study using data from six distinct farming cages. EchoBERT achieves a strong correlation between model predictions and true labels, with a Matthews Correlation Coefficient of 0.694 ± 0.178 using an ensemble approach, compared to 0.626 ± 0.084 for traditional sequence models. We also find that EchoBERT is capable of detecting disease indicators more than a month before detection by standard procedures. Our results show that EchoBERT has high potential for automatic behavior detection through non-intrusive methods suitable for applications in aquaculture. The source code is available at: https://gitlab.com/hakonma/echobert.
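
For readers unfamiliar with the reported metric, the Matthews Correlation Coefficient for a binary task can be computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch, assuming binary 0/1 labels; the function name `mcc_binary` and the toy labels are illustrative assumptions and are not taken from the EchoBERT code base or its data.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for two equal-length 0/1 label sequences.

    Hypothetical helper for illustration only; not part of the EchoBERT repository.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (made up): 1 = behavior indicative of disease, 0 = normal behavior.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_score = mcc_binary(toy_true, toy_pred)
```

The score ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect agreement), and it stays informative when the classes are imbalanced.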