Fish Recognition Ground-Truth
License: Custom


This fish data was acquired from a live video dataset, resulting in 27370 verified fish images. The whole dataset is divided into 23 clusters, and each cluster is represented by a representative species, chosen on the basis of synapomorphic characteristics to the extent that the taxon is monophyletic. The representative images indicate the distinctions between clusters shown in the figure below, e.g. the presence or absence of components (anal fin, nasal, infraorbitals), a specific number (six dorsal-fin spines, two spiny dorsal fins), a particular shape (second dorsal-fin spine long), etc. The figure shows the representative species name and the number of detections for each cluster. The data is very imbalanced: the most frequent species has roughly 750 times as many detections as the least frequent one (12112 versus 16).


Data Annotation

The data is organized into 23 groups, with the fish images and their masks stored separately. Each cluster has a single package. The image files are named "tracking id_fish id". Fish images with the same "tracking id" belong to the same trajectory. The "fish id" is a globally unique id ranging from 1 to 27370. A reverse table mapping file name to cluster id is provided here. The whole package of all groups is available here (510,912,000 bytes, checkSum).
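The naming scheme above can be used to recover trajectories from file names. The following is a minimal sketch, not part of the dataset's own tooling: it assumes that the file stem ends in "<tracking id>_<fish id>", that any leading prefix can be ignored, and that the fish id is numeric. The function names and the example file names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def parse_name(file_name):
    """Return (tracking_id, fish_id) parsed from an image or mask file name."""
    # Assumption: the stem ends in "<tracking id>_<fish id>"; any leading
    # prefix before those two fields is ignored.
    parts = Path(file_name).stem.split("_")
    if len(parts) < 2:
        raise ValueError(f"unexpected file name: {file_name!r}")
    tracking_id, fish_id = parts[-2], parts[-1]
    return tracking_id, int(fish_id)


def group_by_trajectory(file_names):
    """Map each tracking id to the sorted fish ids that share it."""
    trajectories = defaultdict(list)
    for name in file_names:
        tracking_id, fish_id = parse_name(name)
        trajectories[tracking_id].append(fish_id)
    return {tid: sorted(ids) for tid, ids in trajectories.items()}


# Hypothetical file names, for illustration only.
example = ["000123_00001.png", "000123_00002.png", "000456_00003.png"]
print(group_by_trajectory(example))
```

Grouping by tracking id in this way is useful when splitting the data, since images from one trajectory show the same fish and are likely to be highly correlated.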

ID  Species                      Detections  Trajectories
01  Dascyllus reticulatus             12112          4240
02  Plectroglyphidodon dickii          2683          1225
03  Chromis chrysura                   3593          1175
04  Amphiprion clarkii                 4049          1021
05  Chaetodon lunulatus                2534           536
06  Chaetodon trifascialis              190            79
07  Myripristis kuntee                  450            71
08  Acanthurus nigrofuscus              218            71
09  Hemigymnus fasciatus                241            58
10  Neoniphon sammara                   299            53
11  Abudefduf vaigiensis                 98            42
12  Canthigaster valentini              147            28
13  Pomacentrus moluccensis             181            27
14  Zebrasoma scopas                     90            23
15  Hemigymnus melapterus                42            16
16  Lutjanus fulvus                     206            15
17  Scolopsis bilineata                  49             8
18  Scaridae                             56             5
19  Pempheris vanicolensis               29             6
20  Zanclus cornutus                     21             6
21  Neoglyphidodon nigroris              16             8
22  Balistapus undulatus                 41             6
23  Siganus fuscescens                   25             6
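The imbalance in the table above can be quantified directly from the detection counts. The sketch below copies the counts from the table and computes the ratio between the most and least frequent species, along with inverse-frequency class weights. The weighting formula (total / (number of classes × class count)) is one common heuristic for imbalanced classification, used here only as an illustration; it is not described in the source as the dataset authors' method.

```python
# (species, detections, trajectories), copied from the table above.
COUNTS = [
    ("Dascyllus reticulatus", 12112, 4240),
    ("Plectroglyphidodon dickii", 2683, 1225),
    ("Chromis chrysura", 3593, 1175),
    ("Amphiprion clarkii", 4049, 1021),
    ("Chaetodon lunulatus", 2534, 536),
    ("Chaetodon trifascialis", 190, 79),
    ("Myripristis kuntee", 450, 71),
    ("Acanthurus nigrofuscus", 218, 71),
    ("Hemigymnus fasciatus", 241, 58),
    ("Neoniphon sammara", 299, 53),
    ("Abudefduf vaigiensis", 98, 42),
    ("Canthigaster valentini", 147, 28),
    ("Pomacentrus moluccensis", 181, 27),
    ("Zebrasoma scopas", 90, 23),
    ("Hemigymnus melapterus", 42, 16),
    ("Lutjanus fulvus", 206, 15),
    ("Scolopsis bilineata", 49, 8),
    ("Scaridae", 56, 5),
    ("Pempheris vanicolensis", 29, 6),
    ("Zanclus cornutus", 21, 6),
    ("Neoglyphidodon nigroris", 16, 8),
    ("Balistapus undulatus", 41, 6),
    ("Siganus fuscescens", 25, 6),
]

total = sum(detections for _, detections, _ in COUNTS)
most = max(COUNTS, key=lambda row: row[1])
least = min(COUNTS, key=lambda row: row[1])
imbalance_ratio = most[1] / least[1]

# Inverse-frequency weights (an illustrative heuristic, see the note above):
# rare species get weights above 1, frequent species get weights below 1.
weights = {
    name: total / (len(COUNTS) * detections)
    for name, detections, _ in COUNTS
}

print(f"total detections: {total}")
print(f"{most[0]} vs {least[0]}: ratio {imbalance_ratio:.0f}")
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {weight:8.3f}")
```

The detection counts sum to the 27370 images stated in the description, which gives a quick consistency check on the table.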


Boom, Bastiaan J., Phoenix X. Huang, Cigdem Beyan, Concetto Spampinato, Simone Palazzo, Jiyin He, Emmanuelle Beauxis-Aussalet, Sun-In Lin, Hsiu-Mei Chou, Gayathri Nadarajan, and others. "Long-term underwater camera surveillance for monitoring and analysis of fish populations."

Boom, Bastiaan J., Phoenix X. Huang, Jiyin He, and Robert B. Fisher. "Supporting ground-truth annotation of image datasets using clustering." In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).



Fish4Knowledge is funded by the European Union Seventh Framework Programme [FP7/2007-2013] under grant agreement 257024, addressing Objective ICT-2009.4.3: Intelligent Information Management, Challenge 4: Digital Libraries and Content.