2 METHOD
2.1 Dataset Preparation
The LUNA16 dataset (LUNA16, 2016) comprises
1186 annotated lung nodules in grayscale images of
330x330 pixels. The images cover a variety of nodule
types, making the dataset an essential resource for
medical imaging research and development.
Despite its comprehensiveness, the LUNA16
annotations are not directly compatible with the
YOLOv8 model, which is renowned for its object
detection capabilities. Overcoming this format
disparity requires a critical preprocessing step: the
dataset must be converted into a format that YOLOv8
can consume, such as the Pascal VOC format.
Preprocessing iterates over every CT image in the
dataset and over each annotated nodule within it,
each potentially indicative of a distinct medical
anomaly. Because nodule positions are given in
world (physical) coordinates, they must be converted
to image coordinates so that each nodule is localized
precisely within the image. An XML tag is then
generated for each nodule, recording its bounding
box coordinates and corresponding class label.
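The conversion and annotation steps above can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper names, the assumption that world-to-image conversion uses the scan's origin and spacing (as stored in LUNA16's .mhd headers), and the minimal VOC fields shown are all ours.

```python
import xml.etree.ElementTree as ET

def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert world (mm) coordinates to voxel indices using the
    scan's origin and voxel spacing."""
    return [(w - o) / s for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)]

def voc_annotation(filename, width, height, boxes):
    """Build a minimal Pascal VOC XML tree; `boxes` is a list of
    (xmin, ymin, xmax, ymax, label) tuples in pixel coordinates."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "1"  # grayscale
    for xmin, ymin, xmax, ymax, label in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bb = ET.SubElement(obj, "bndbox")
        ET.SubElement(bb, "xmin").text = str(int(xmin))
        ET.SubElement(bb, "ymin").text = str(int(ymin))
        ET.SubElement(bb, "xmax").text = str(int(xmax))
        ET.SubElement(bb, "ymax").text = str(int(ymax))
    return root
```

One such XML file is written per image, pairing each slice with the bounding boxes of the nodules it contains.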
Through this preprocessing regimen, the LUNA16
dataset is tailored to the requirements of the
YOLOv8 model. With the data in a standardized
format aligned with the model's architecture,
researchers and medical practitioners can harness
YOLOv8 to detect and analyze lung nodules in CT
images with improved accuracy and efficiency,
advancing medical imaging and diagnosis.
2.2 YOLOv8-Based Lung Nodule
Detection
You Only Look Once (YOLO) is a widely used
model for object detection and image segmentation
[5]. Since its inception in 2015, YOLO has garnered
widespread acclaim for its exceptional speed and
accuracy.
YOLOv8, the latest iteration released by Ultralytics,
further expands these capabilities. It is not limited to
object detection [6]: it also supports a range of
computer vision tasks, including segmentation, pose
estimation, tracking, and classification. This
versatility lets users across many applications and
fields harness the full potential of YOLOv8 in their
respective domains.
At its core, YOLOv8 uses a grid-based approach to
analyze input images. Each image is divided into a
structured grid, and within each grid cell the model
predicts bounding boxes for potential objects along
with probabilities for the different object classes. A
single neural network consolidates these predictions
in one pass, yielding rapid and precise object
detection.
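To make the grid-based idea concrete, the following is a simplified sketch of how one cell's prediction can be decoded into image coordinates. This is a generic illustration of the YOLO family's grid decoding, not YOLOv8's exact detection head; the function name and the normalized (cx, cy, w, h) parameterization are our assumptions.

```python
def decode_cell(pred, row, col, grid_size, img_w, img_h):
    """Map one grid cell's prediction (cx, cy, w, h), all in [0, 1],
    to an (xmin, ymin, xmax, ymax) box in pixel coordinates.
    cx, cy are center offsets within the cell; w, h are fractions
    of the full image size."""
    cx, cy, w, h = pred
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    center_x = (col + cx) * cell_w   # cell origin plus offset
    center_y = (row + cy) * cell_h
    bw, bh = w * img_w, h * img_h
    return (center_x - bw / 2, center_y - bh / 2,
            center_x + bw / 2, center_y + bh / 2)
```

Each cell emits such a box (together with class probabilities), and the network's single forward pass produces every cell's prediction at once.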
The speed and accuracy of YOLOv8 make it
exceptionally well suited for real-time applications
across diverse domains. Whether in surveillance,
autonomous vehicles, medical imaging, or any other
field reliant on computer vision, YOLOv8 serves as a
cornerstone, driving innovation and progress.
The YOLOv8 model employs deep learning
networks to train on lung CT images, learning
features of nodules and performing object detection.
Prior to training, a substantial annotated dataset is
required, encompassing various types of nodules and
cases. The network architecture includes
convolutional, pooling, and fully connected layers.
During training, the model parameters are adjusted
under the guidance of a loss function that optimizes
the consistency between predicted and true labels. In
the prediction phase, new CT images are fed into the
model to obtain bounding boxes with nodule
positions and class information. Post-processing
steps such as non-maximum suppression (NMS) are
then applied to remove duplicate detections and
improve accuracy. In summary, YOLOv8 uses deep
learning techniques to achieve automated detection
of nodules in lung CT images.
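The non-maximum suppression step mentioned above can be sketched in plain Python. This is a standard greedy NMS, shown here for illustration; YOLOv8's own implementation is vectorized and framework-internal.

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard remaining boxes that overlap it above iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Applied to the raw detections, this collapses overlapping candidate boxes around the same nodule into a single, highest-confidence detection.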
2.3 Implementation Details
The GPU used for model training is an NVIDIA
GeForce RTX 4090 Laptop GPU with 16 GB of
VRAM. The loss function used in the YOLOv8
model combines several components, including a
loss for object detection and a loss for image
segmentation. The object detection loss typically
comprises a position loss, a confidence loss, and a
class loss: the position loss measures the discrepancy
between the predicted and ground-truth bounding
boxes, the confidence loss measures the model's
confidence that a target is present, and the class loss
measures the accuracy of the predicted target
category. The loss function for