2 GOALS
This paper develops and implements a multimodal
classification model for integration into Bosch Car
Multimedia's production lines. The model is intended
to reduce production costs by preventing faulty
Printed Circuit Boards (PCBs) from being used,
thereby avoiding wasted resources. Using multimodal
image analysis, we seek to make fault detection more
precise and effective, limiting the financial losses
caused by assembling components onto defective
boards. We also compare our multimodal models, which
combine structured data and images, with traditional
classification models that use only tabular data.
This comparison serves to validate the multimodal
model's effectiveness and to improve fault
prediction accuracy.
Additionally, our approach addresses the challenge
of data imbalance and aims to perform well with a
reduced data volume. To this end, we apply
specialized preprocessing techniques and statistical
modeling to correct class imbalance and thereby
improve the overall predictive capabilities of our
models. Both goals, correcting imbalance and
achieving strong results with less data, reflect the
emphasis on efficiency in this work.
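One common way to counter class imbalance is to
weight each class inversely to its frequency. The
sketch below is illustrative only: the text above
does not state which technique is used, and the
function name, the label names, and the 95/5 split
are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the common "balanced" heuristic:
        weight(c) = n_samples / (n_classes * count(c))
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 95 good boards and 5 faulty boards.
labels = ["ok"] * 95 + ["faulty"] * 5
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Weights of this kind can be passed to a loss function
or a classifier so that errors on the rare faulty
class are penalized more heavily than errors on the
majority class.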
A further goal of this research is to establish a
robust and efficient data pipeline that integrates
both PCB images and structured data. Our objective
is a real-time data processing framework capable of
supporting the multimodal classification model
during deployment. This pipeline is essential to
keeping the model adaptable and relevant in a
changing industrial environment.
3 RELATED WORK
In the context of Industry 4.0, ensuring PCB quality
remains crucial. The literature reports significant
advances in PCB fault detection, spanning
traditional image processing and modern deep
learning models, particularly convolutional neural
networks (CNNs). A recurring theme is the need for
extensive datasets, with future directions focusing
on dataset augmentation and improved detection of
smaller components.
(Zakaria et al., 2020) examine defects that arise
during the solder paste printing process and
present Solder Paste Inspection (SPI) and Automatic
Optical Inspection (AOI) as essential tools. They
examine machine learning approaches to enhance
detection efficiency, aiming to improve production
yields and reduce rework costs.
(Cho et al., 2023) present a predictive framework
for semiconductor memory module tests, addressing
imbalanced outcomes through multimodal fusion of
tabular and image data. This framework optimizes
testing strategies, demonstrating its real-world
efficacy and reflecting the broader trend of leveraging
advanced technologies to boost productivity in
semiconductor manufacturing.
In multimodal machine learning, diverse data
sources are used to improve model performance and
diagnostic accuracy. (Huang et al., 2020) advance
pulmonary embolism (PE) diagnosis by integrating
CT imaging with electronic health record (EHR) data,
demonstrating the superiority of a late fusion model
over imaging-only or EHR-only models.
Similarly, (Tang et al., 2022) enhance pulmonary
nodule classification by combining structured and
unstructured data. Their models outperform those
using only unstructured data, highlighting the
importance of integrating patient demographics and
clinical characteristics with medical images for more
accurate diagnoses.
(Yang et al., 2022) provide an overview of
multimodal learning, discussing methods like early
fusion, late fusion, and hybrid fusion. They address
challenges in fusing multimodal features efficiently
and explore model-based fusion methods such as
multiple kernel learning (MKL) and neural networks
(NN) to enhance feature representation.
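To make the distinction between fusion strategies
concrete, the following minimal sketch contrasts
early (feature-level) fusion with late
(decision-level) fusion. It is illustrative only and
is not taken from (Yang et al., 2022): the function
names, the feature values, and the fixed weighting
are assumptions, and the weighted average is only one
of several possible late fusion rules.

```python
def early_fusion(image_features, tabular_features):
    """Feature-level fusion: concatenate per-modality features.

    A single downstream classifier would then be trained on the joint vector.
    """
    return list(image_features) + list(tabular_features)


def late_fusion(image_prob, tabular_prob, image_weight=0.5):
    """Decision-level fusion: weighted average of per-modality probabilities.

    Each modality has its own model; only their predictions are combined.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    return image_weight * image_prob + (1.0 - image_weight) * tabular_prob


# Hypothetical inputs: a 3-dimensional image embedding and 2 tabular features.
fused_vector = early_fusion([0.12, 0.80, 0.33], [1.0, 0.0])

# Hypothetical fault probabilities from an image model and a tabular model.
fused_prob = late_fusion(0.9, 0.6, image_weight=0.7)

print(len(fused_vector), round(fused_prob, 2))
```

Hybrid fusion combines both ideas, for example by
feeding a jointly trained representation and the
per-modality predictions into a final decision stage.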
(Yan et al., 2021) focus on breast cancer
classification using multimodal data. They propose
integrating pathological images with Electronic
Medical Records (EMR), emphasizing the benefits of
denoising autoencoders over dimensionality
reduction. Their feature-level fusion method achieves
higher accuracy by combining images and structured
data, surpassing models using only structured data or
images.
3.1 A Comprehensive Analysis of the
Production Line
Understanding the dynamics of the production line
requires a detailed overview of its constituent
processes, with a specific focus on the first three
stages (Zakaria et al., 2020).
This targeted approach facilitates early detection of