Evaluation

During evaluation, we will assess the model's performance separately on isolated structures and contiguous structures.

Isolated Structures

For assessing segmentation results of both dense isolated and sparse isolated structures, two metrics will be employed:

1) Volumetric Dice Similarity Coefficient (DSC)
2) Panoptic Quality (PQ) [1]
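To make the two metrics concrete, a minimal NumPy sketch is given below. It assumes binary masks for DSC, instance label maps with 0 as background for PQ, and the common IoU matching threshold of 0.5; the challenge may well compute PQ with the panoptica toolkit cited in [1], so this is only an illustration of what is measured, not the official implementation.

```python
import numpy as np

def dice(pred, gt):
    """Volumetric Dice Similarity Coefficient for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom

def panoptic_quality(pred_inst, gt_inst, iou_thresh=0.5):
    """Panoptic Quality over instance label maps (0 = background).

    Instances are matched one-to-one when IoU > iou_thresh, then
    PQ = (sum of matched IoUs) / (TP + FP/2 + FN/2).
    """
    pred_ids = [i for i in np.unique(pred_inst) if i != 0]
    gt_ids = [i for i in np.unique(gt_inst) if i != 0]
    matched_ious, matched_pred = [], set()
    for g in gt_ids:
        g_mask = gt_inst == g
        for p in pred_ids:
            if p in matched_pred:
                continue
            p_mask = pred_inst == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou = inter / union if union else 0.0
            if iou > iou_thresh:  # IoU > 0.5 makes the match unique
                matched_ious.append(iou)
                matched_pred.add(p)
                break
    tp = len(matched_ious)
    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 1.0
```

Note how PQ jointly penalizes detection errors (false-positive and false-negative instances) and segmentation quality (the IoU of matched instances), which DSC alone does not capture for isolated structures.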

Contiguous Structures

For assessing segmentation results of both dense contiguous and sparse contiguous structures, again two metrics will be employed:

1) Volumetric Dice Similarity Coefficient (DSC)
2) Centerline-Dice Similarity Coefficient (clDice) [2]
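clDice combines topology precision (the fraction of the predicted skeleton lying inside the ground-truth mask) and topology sensitivity (the fraction of the ground-truth skeleton lying inside the prediction) into a harmonic mean [2]. A minimal sketch is shown below; it assumes the skeletons have already been extracted (for example with skimage.morphology.skeletonize; the skeletonization method is an assumption, not something specified here).

```python
import numpy as np

def cl_dice(pred, gt, pred_skel, gt_skel):
    """Centerline Dice (clDice) from binary masks and their skeletons."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_skel, gt_skel = pred_skel.astype(bool), gt_skel.astype(bool)
    # Topology precision: predicted skeleton voxels inside the GT mask.
    tprec = np.logical_and(pred_skel, gt).sum() / max(pred_skel.sum(), 1)
    # Topology sensitivity: GT skeleton voxels inside the predicted mask.
    tsens = np.logical_and(gt_skel, pred).sum() / max(gt_skel.sum(), 1)
    if tprec + tsens == 0:
        return 0.0
    return 2.0 * tprec * tsens / (tprec + tsens)
```

Because it is computed on centerlines rather than full volumes, clDice rewards preserving the connectivity of tubular structures, which volumetric DSC can miss when a thin branch is broken.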

References
[1] Kofler, F., Möller, H., Buchner, J., et al. (2023). Panoptica: Instance-wise Evaluation of 3D Semantic and Instance Segmentation Maps. arXiv preprint arXiv:2312.02608.
[2] Shit, S., Paetzold, J.C., Sekuboyina, A., et al. (2021). clDice - A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16555–16564.

Ranking

The ranking of a submitted algorithm is determined through the following process:

• Compute the metric scores for each test case.
• Calculate the average of the metric scores across all test cases for each individual metric.
• Rank the averaged scores for each metric independently, according to whether higher or lower values are better for that metric.
• Determine the ranking of the submitted algorithm within each structure category (dense isolated, sparse isolated, dense contiguous, and sparse contiguous) by calculating the mean rank across all metrics in that category.
• Determine the overall ranking of the submitted algorithm by calculating the mean rank across the structure categories.
• If two or more algorithms have equal final ranks, the prize will be shared equally among them.
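The steps above can be sketched for a single structure category as follows. All algorithm names and scores are hypothetical, and both metrics in this example are treated as higher-is-better:

```python
import numpy as np

# Hypothetical per-case scores: scores[metric][algorithm] = list over test cases.
scores = {
    "DSC":    {"algoA": [0.90, 0.88], "algoB": [0.80, 0.95], "algoC": [0.70, 0.75]},
    "clDice": {"algoA": [0.60, 0.70], "algoB": [0.85, 0.80], "algoC": [0.65, 0.60]},
}

def rank_descending(values):
    """Rank averaged scores (1 = best) for a higher-is-better metric."""
    order = sorted(values, key=values.get, reverse=True)
    return {alg: i + 1 for i, alg in enumerate(order)}

# Steps 1-2: average each metric over all test cases.
averages = {m: {a: float(np.mean(v)) for a, v in per_alg.items()}
            for m, per_alg in scores.items()}
# Step 3: rank each metric independently.
ranks = {m: rank_descending(avg) for m, avg in averages.items()}
# Step 4: mean rank across metrics for this structure category.
algorithms = scores["DSC"].keys()
mean_rank = {a: np.mean([ranks[m][a] for m in scores]) for a in algorithms}
```

In this toy example algoA and algoB end up with the same mean rank (1.5), which is exactly the tie situation covered by the final bullet: the prize would be shared between them.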