Assignment
Combine is the lead technical partner in the Vinnova-funded consortium Ocean Data Factory Sweden (ODF), Sweden’s national marine data laboratory. One of our main tasks over the last two years has been to construct a high-performance end-to-end system that connects citizen science to data science and machine learning. There is a vast trove of underwater video footage going back as far as 20 years, which in the past has been examined and annotated manually by experts in marine biology. Given recent advances in AI, this is simply not an efficient use of their time and we believe that machine learning techniques could help to automate most parts of the inspection process and become a key asset to the national biodiversity infrastructure.
An automated underwater observation system
We set out to create a system able to automatically identify species underwater and thereby reduce the need for laborious human annotation. The system is called the Koster Seafloor Observatory, or KSO for short. It is now in its third iteration, KSO 3.0, and has been through many stages of advancement.
KSO 1.0: Setting the stage
The Kosterhavet National Park is one of Sweden’s most important and unique marine environments: it is not only the first but also the most species-rich marine national park in all of Sweden. The area obtained official protected status in 2009, and monitoring changes in its marine environment has become a top national priority. We started by building a proof of concept for the Kosterhavet National Park around an important habitat-building marine species (Lophelia pertusa), using a selection of low-quality footage from the Tjärnö Marine Laboratory. This first stage focused on setting up a flexible data management solution to organize and integrate data collected from the marine laboratory and the citizen science workflows. We set up infrastructure for storage and retrieval of footage, along with a database to keep track of the metadata related to the movies and the classifications provided by both citizen scientists and machine learning algorithms.
The machine learning model chosen for this task is generally referred to as a single-shot detector: the model looks at the entire image and predicts multiple objects inside it at once, as opposed to region-based models, which first identify regions of interest and then detect features in these regions as a second step. This leads to faster detections in most cases and allows these models to predict in near real-time. For our object detection model, we used the YOLOv3 architecture by Redmon and Farhadi (2018) [2]. As the name suggests, the third iteration of the YOLO model essentially uses the same building blocks as its predecessors, such as Darknet-53 as the backbone feature extractor, while adding useful enhancements such as detection at three different scales to better find smaller objects. Our model was based on an open-source implementation to ensure that our results could be easily replicated and kept up to date with future YOLO releases.
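To make the single-shot idea concrete, here is a minimal, self-contained sketch (not the KSO code) of the post-processing every YOLO-style detector applies to its one-pass output: all candidate boxes come back together, and confidence thresholding plus non-maximum suppression (NMS) reduce them to final detections. The box values and class IDs are invented for illustration.

```python
# Each raw detection is (x1, y1, x2, y2, confidence, class_id).

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.45):
    """Keep high-confidence boxes, dropping ones that overlap a better-scoring box."""
    dets = sorted((d for d in detections if d[4] >= conf_thresh),
                  key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(d)
    return kept

# Two overlapping predictions of the same object plus one low-confidence box:
raw = [(10, 10, 50, 50, 0.9, 0), (12, 11, 52, 49, 0.8, 0), (80, 80, 90, 90, 0.3, 1)]
print(nms(raw))  # only the 0.9 box survives
```

Because the detector emits every box in a single forward pass, this cheap filtering step is all that stands between the network output and near real-time results.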
KSO 2.0: Technical upgrades and research interest
In this iteration, we upgraded the machine learning model to its latest version, YOLOv5, which included many improvements in processing speed and augmentation techniques that led to overall performance gains [3]. We also increased the number of species to four and released a public version of the model for use by researchers and students. This enabled the use of AI-enhanced image pre-processing techniques to compensate for low data availability for some key species, as well as the creation of synthetic data using generative adversarial networks [4].
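As one concrete example of the augmentation idea this kind of training relies on, here is a small sketch (illustrative only, not the project's pipeline) of a horizontal flip that also remaps the normalized YOLO-format labels. When annotated frames of a species are scarce, each flip effectively doubles the usable training examples.

```python
# YOLO-format labels are (class_id, x_center, y_center, width, height),
# all coordinates normalized to [0, 1] relative to the image.

def hflip_labels(labels):
    """Mirror normalized YOLO boxes for a horizontally flipped image."""
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in labels]

labels = [(0, 0.25, 0.60, 0.10, 0.20)]   # one box, left of centre
print(hflip_labels(labels))              # box moves to x_center = 0.75
```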
KSO 3.0: Modularisation, decentralized ML and national expansion
As the project matured, modularisation became an essential enabler: it lets external parties with little or no knowledge of AI or our infrastructural setup use our system with minimal effort. After partnering with the Swedish National Infrastructure for Computing (SNIC) and the European Open Science Cloud (EOSC), we were able to expand the reach of our project to a wider range of users, which now include the Swedish Geological Survey (SGU) and Medins Vattenkonsulter. For SGU, we are looking to automate their ecological assessments of the abundance of blue mussels in the Baltic Sea. For Medins, we have used our model in a pilot project to count sea pens on Sweden’s west coast.
We enable this modularisation at all levels by creating a system that can be used by a wide spectrum of users, from users without any coding background to users who want to build custom modules. We provide them with easy-to-use, task-based Jupyter Notebooks that can be run on just about any underlying infrastructure (e.g., locally, in the cloud, or in a Google Colab notebook) and make use of open-source software and solutions wherever possible. In partnership with the European Open Science Cloud and Scaleout Systems’ FEDn [5], we were also able to build a decentralized version of our ML models called FedKSO, which allows data providers to use our models without sensitive data ever having to leave their systems.
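The core idea behind this decentralized setup can be sketched in a few lines. The following is a conceptual illustration of federated averaging (FedAvg), not Scaleout's FEDn API: each data provider trains on its own footage locally, and only model weight updates, never the footage itself, are sent to a server that combines them. The sites, weights, and dataset sizes below are invented.

```python
def fedavg(client_weights, client_sizes):
    """Average per-client model parameters (flat lists of floats),
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical data providers with different amounts of local footage:
site_a = [0.2, 0.4, 0.6]   # weights after local training at site A (100 clips)
site_b = [0.4, 0.8, 1.0]   # weights after local training at site B (300 clips)
print(fedavg([site_a, site_b], client_sizes=[100, 300]))
```

The sensitive inputs stay on the providers' machines; only the averaged parameters travel, which is what makes the FedKSO arrangement workable for restricted seafloor data.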
With big data comes big responsibility
This project has continued to deliver on innovative outcomes throughout each iteration, and we look forward to sharing even more about its developments in the future. As this project continues to evolve and grow, many exciting technical challenges remain to be solved. These include:
- Storing and accessing observation data in a centralized and standardized way
- Identifying potentially unknown species, i.e., objects that do not have a class but could be interesting to study
- Comparing past footage with poor visibility conditions and/or low camera resolution to modern footage
- Making use of data that contains sensitive information about the sea floor (restricted information from the Swedish Armed Forces)
Marine experts are now saving valuable time
By using ML to automate large parts of the workflow and reduce the time spent looking at empty footage, users of our system now report spending at least five times less time on analysis work. This frees them up to focus their mental efforts on generating insights and answering important ecological and biological questions.
Code repositories
We open source all our code in line with ODF objectives. Here are the repositories associated with the Koster Seafloor Observatory:
- GitHub repository for the machine learning model
- GitHub repository for the system data flow
- GitHub repository for the decentralized implementation
Want to find out how our consultants can take your operations to the next level?
Contact Simon Yngve
simon.yngve@combine.se
+46 731 23 45 67
References
[1] https://www.sverigesnationalparker.se/park/kosterhavets-nationalpark/besoksinformation/hitta-hit/
[2] https://pjreddie.com/media/files/papers/YOLOv3.pdf
[3] https://docs.ultralytics.com/
[4] https://gupea.ub.gu.se/handle/2077/69094
[5] https://github.com/scaleoutsystems/fedn
[6] https://www.zooniverse.org/projects/victorav/the-koster-seafloor-observatory/about/results