
Our findings show that video-based analysis combined
with state-of-the-art object tracking techniques can
generate detailed performance metrics such as path
length, velocity, acceleration, jerk, and working area.
These metrics may offer an objective, data-driven
approach to evaluating surgical performance.
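As an illustration of how such metrics can be derived from a tracked tool trajectory, the following is a minimal sketch rather than the pipeline's actual code. It assumes one tool-centre position per frame in pixel coordinates, a constant frame rate, and a working area approximated by the axis-aligned bounding box of the visited positions. The function name `compute_apms` and the dictionary keys are hypothetical.

```python
import numpy as np


def compute_apms(xy, fps):
    """Sketch of 2D APMs from a tool-centre trajectory.

    xy  : sequence of (x, y) pixel positions, one per frame (at least 4 points)
    fps : frame rate of the video in frames per second
    All results are in pixel units, not metric units.
    """
    xy = np.asarray(xy, dtype=float)
    dt = 1.0 / fps

    steps = np.diff(xy, axis=0)            # displacement between consecutive frames
    velocity = steps / dt                  # finite-difference velocity (px/s)
    acceleration = np.diff(velocity, axis=0) / dt
    jerk = np.diff(acceleration, axis=0) / dt

    # Assumption: working area is the bounding box of all visited positions.
    extent = xy.max(axis=0) - xy.min(axis=0)

    return {
        "path_length": float(np.linalg.norm(steps, axis=1).sum()),
        "mean_speed": float(np.linalg.norm(velocity, axis=1).mean()),
        "mean_acceleration": float(np.linalg.norm(acceleration, axis=1).mean()),
        "mean_jerk": float(np.linalg.norm(jerk, axis=1).mean()),
        "working_area": float(extent[0] * extent[1]),
    }
```

Finite differences amplify tracking noise with every derivative, so jerk in particular is likely to need smoothing of the trajectory before it is interpreted.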
While we can extract the aforementioned APMs from
input videos, we are currently unable to evaluate
how these APMs translate to surgical performance.
Such an evaluation would require a thorough analysis
of the videos by one or more expert surgeons. Their
assessments could then be correlated with the
extracted APMs to determine which metric values
indicate good surgical performance. Until then, the
APMs only allow relative comparisons between
different videos. Future work is required to determine
how well the extracted APMs predict surgical
performance.
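One way such a correlation could be carried out, once expert ratings exist, is a rank correlation between an APM and the rating across videos. The sketch below is only an assumption about how this future analysis might look. The function names are hypothetical, tied values are not handled, and the numbers in the usage example are placeholders, not measured results.

```python
import numpy as np


def ranks(values):
    """Rank values from 1..N (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r


def spearman_rho(apm_values, expert_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra = ranks(apm_values)
    rb = ranks(expert_scores)
    return float(np.corrcoef(ra, rb)[0, 1])


# Placeholder data for four hypothetical videos (not from the paper).
path_lengths = [10.0, 20.0, 30.0, 40.0]
expert_scores = [5.0, 4.0, 3.0, 1.0]
rho = spearman_rho(path_lengths, expert_scores)
```

A coefficient close to -1 or 1 would suggest that the APM orders the videos much as the experts do, while a value near 0 would suggest that the metric carries little information about the rated skill.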
Moreover, we cannot currently evaluate the accuracy
of the APMs we have calculated. The datasets we used
do not include ground-truth data on the position of
the tools relative to their environment. Because of
this, we cannot draw conclusions about the accuracy
of our APM estimates; we can only compare the APMs
of one video against those of another. A further
potential source of error is the spatial component of
some APMs, such as velocity, acceleration, jerk, and
path length, whose calculation requires depth
information. The tools move in all directions,
including toward and away from the camera, which
makes depth hard to estimate from 2D videos. Depth
might be recoverable if the camera parameters were
known, but these parameters are not available for the
datasets that we have used.
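The dependence on depth and camera parameters can be made explicit with the standard pinhole camera model. This is a simplified sketch that ignores lens distortion. The symbols are assumptions introduced here: $(X, Y, Z)$ is the tool position in camera coordinates, $(u, v)$ its pixel position, $f_x, f_y$ the focal lengths in pixels, and $(c_x, c_y)$ the principal point.

```latex
u = f_x \frac{X}{Z} + c_x, \qquad v = f_y \frac{Y}{Z} + c_y
\quad\Longrightarrow\quad
\Delta X \approx \frac{Z}{f_x}\,\Delta u, \qquad
\Delta Y \approx \frac{Z}{f_y}\,\Delta v
\quad (\text{for constant } Z)
```

The same pixel displacement therefore corresponds to a larger physical displacement when the tool is farther from the camera, and motion along $Z$ is not captured by $\Delta u$ and $\Delta v$ at all.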
Additionally, the nature of laparoscopic surgeries
introduces further complexity. These surgeries of-
ten involve a person manually holding the camera,
leading to slight movements that can add noise to
the APM calculations. This camera movement is not
taken into account when calculating the APMs.
Although APMs are still in their infancy, this
emerging field holds significant promise for trans-
forming surgical training and enhancing the perfor-
mance of experienced surgeons. APMs can provide
actionable feedback to surgeons, potentially reducing
the reliance on high surgical volumes for skill acqui-
sition. With the current constraints on surgical care
limiting procedure volumes, the development and re-
finement of APMs offer a viable solution to optimize
training and performance evaluation. Future research
should focus on expanding the repertoire of APMs
and advancing their accuracy and applicability in as-
sessing surgical performance.
6 CONCLUSION
In this project, we developed an automated system
to evaluate surgical performance, providing a foun-
dation for improved assessment methods. The pri-
mary objective was to create a pipeline capable of
processing laparoscopic surgery videos, detecting and
tracking surgical instruments, and calculating APMs
to support objective evaluation.
The fine-tuned YOLO11 model demonstrated
strong performance on the m2cai16-tool-locations
dataset, achieving a mean average precision
(mAP@0.5) of 0.957. However, on the Cholec80-
Boxes dataset, it scored a lower mAP@0.5 of 0.65,
highlighting dataset-dependent variability. Despite
this, the model provided a robust foundation for
tracking.
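For reference, the 0.5 in mAP@0.5 is the intersection-over-union (IoU) threshold above which a predicted box counts as matching a ground-truth box. The helper below is a minimal sketch of that overlap test, assuming boxes given as (x1, y1, x2, y2) corner coordinates. It is not the evaluation code used for the reported scores.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A detection counts as a true positive at mAP@0.5 if IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold
```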
For tracking, ByteTrack, BoT-SORT, and Uni-
Track were evaluated using established MOT metrics
such as HOTA, MOTA, and IDF1. These MOT met-
rics helped identify BoT-SORT as the most effective
tracker for this application, balancing detection ac-
curacy and identity association across the evaluation
metrics.
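For readers unfamiliar with these metrics, their commonly used definitions are summarized below. This is a reference sketch of the standard formulas, not a description of the exact evaluation settings used here. $\mathrm{FN}_t$, $\mathrm{FP}_t$, $\mathrm{IDSW}_t$, and $\mathrm{GT}_t$ denote the false negatives, false positives, identity switches, and ground-truth objects in frame $t$, and $\alpha$ is the localization threshold.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}},
\qquad
\mathrm{HOTA}_\alpha = \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}
```

MOTA is dominated by detection errors, IDF1 emphasizes identity consistency, and HOTA (averaged over thresholds $\alpha$) balances detection and association, which is why the three are usually reported together.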
The system successfully extracted APMs such as path
length, velocity, acceleration, jerk, working area,
and usage time distribution, enabling comparative
analyses between surgical videos. Future work should
address the limitations discussed above, namely the
lack of ground truth, depth information, and
camera-motion compensation, to improve APM accuracy.
In conclusion, this work demonstrates that
state-of-the-art detection and tracking algorithms can
be used to compute APMs from 2D laparoscopic surgery
videos. This project lays a foundation for improving
the quality and fidelity of surgical training, with
the potential to enhance patient safety and reduce
training costs.
ACKNOWLEDGMENT
We would like to thank Aalborg University for pro-
viding the computational resources needed for this
project.
REFERENCES
Abdulbaki Alshirbaji, T. et al. (Aug. 2024). Cholec80-
Boxes: Bounding-Box Labels for Surgical Tools in
Five Cholecystectomy Videos. Zenodo. DOI: 10.5281/
Automated Performance Metrics for Objective Surgical Skill Assessment in Laparoscopic Training