Yadav et al. (2019) aimed to predict
the catch per unit effort (CPUE) of fish by designing
and comparing three types of fuzzy inference systems:
Mamdani FIS, Sugeno FIS, and Sugeno-ANFIS, using
Chl-a (chlorophyll-a concentration) and Kd_490 (the
diffuse attenuation coefficient at 490 nm) as input
variables. These factors are elements of the marine
environment that influence CPUE. Each model was
implemented using MATLAB's Fuzzy Logic Toolbox, and
prediction accuracy
was evaluated using Mean Squared Error (MSE) and
Mean Error Rate. The comparison results showed that
the Sugeno-ANFIS model outperformed the other two
FIS models and maintained high prediction accuracy
even on 28 independent test datasets. This confirmed
that Sugeno-ANFIS is effective in handling complex
and uncertain marine environmental data, making it the
most reliable model for predicting CPUE. However,
Yadav et al. targeted CPUE rather than the catch
itself, and their feature engineering was limited. In contrast,
the present study introduces methods such as lag
features and moving average features to capture
temporal dependencies in time-series data.
3 PROPOSED METHOD
This study proposes a method that
combines fishing catch data, weather data, and tidal
data to predict fishing outcomes. This approach aims
to forecast whether fish can be caught on a given day
based on prior forecasts, thereby making it easier for
beginners to choose suitable fishing days. This
section first describes data collection and
preprocessing, followed by the method for
constructing the prediction model.
Additionally, the “number of catches per person
per day” is defined as the “recommendation score.”
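The definition above can be sketched as a simple ratio. This is a minimal illustration, assuming the score is the day's total number of catches divided by the day's number of visitors; the function name and the handling of days without visitors are assumptions, not part of the original method description.

```python
def recommendation_score(total_catches: int, visitors: int) -> float:
    """Catches per person for one day (hypothetical helper)."""
    if visitors <= 0:
        # Assumption: a day with no recorded visitors gets a score of 0.
        return 0.0
    return total_catches / visitors


# Illustrative values only, not taken from the collected data.
example_score = recommendation_score(total_catches=120, visitors=40)
print(example_score)
```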
3.1 Data Collection
The data used in this study consist of three types:
fishing catch data, weather data, and tidal data. Firstly,
fishing catch data were collected from the official
website of "Yokohama Fishing Piers." The collected
data include "fishing dates," "number of visitors,"
"water temperature," "weather," and "catch data"
from the "Honmoku Fishing Facility" spanning from
January 1, 2023, to October 2, 2024. The catch data
encompass "fish species" and "number of catches."
Next, weather data were downloaded from the
official website of the Japan Meteorological Agency.
The selected region was Yokohama, and the collected
information includes "average temperature (°C),"
"average wind speed (m/s)," "maximum temperature
(°C)," "minimum temperature (°C)," "maximum wind
speed (m/s)," and "average humidity (%)".
Finally, tidal data were obtained from the Japan
Meteorological Agency's official website. The
retrieved information relates to low tide times.
Although there are two low tides per day, this study
utilizes only the first occurrence.
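Selecting the first low tide of each day could look like the sketch below. The record layout (a list of date and time pairs) and the function name are assumptions made for illustration, and the sample times are invented.

```python
from datetime import date, time


def first_low_tide(records):
    """Map each date to its earliest low-tide time (hypothetical helper)."""
    first = {}
    for day, low_tide in records:
        # Keep the earliest time seen so far for this date.
        if day not in first or low_tide < first[day]:
            first[day] = low_tide
    return first


# Invented sample records: two low tides on the first day, one on the second.
records = [
    (date(2023, 1, 1), time(16, 40)),
    (date(2023, 1, 1), time(4, 15)),
    (date(2023, 1, 2), time(5, 5)),
]
tides = first_low_tide(records)
```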
3.2 Data Preprocessing and Feature Engineering
To enhance the quality of the data used for
constructing the fishing catch prediction model,
preprocessing was performed. The datasets involved
include fishing catch data, tidal data, and weather data,
each possessing unique characteristics and formats.
Below are the preprocessing steps for each dataset.
3.2.1 Data Preprocessing
Since handling missing values and ensuring data
integrity are essential to model performance, we
addressed any missing values in each dataset first. For
consecutive missing data points, Forward-Fill and
Backward-Fill methods were applied to maintain data
continuity. This process formatted the data into a
structure suitable for numerical analysis.
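As an illustration of the filling step, the sketch below applies forward-fill followed by backward-fill with pandas. The use of pandas, the column name "water_temp", and the sample values are assumptions; the source does not state which library or columns were involved.

```python
import numpy as np
import pandas as pd

# Invented daily series with a leading gap, a two-day gap, and a trailing gap.
df = pd.DataFrame(
    {"water_temp": [np.nan, 12.5, np.nan, np.nan, 13.1, np.nan]},
    index=pd.date_range("2023-01-01", periods=6, freq="D"),
)

# Forward-fill carries the last known value into later gaps;
# backward-fill then covers any gap at the very start of the series.
filled = df.ffill().bfill()
print(filled)
```

Forward-fill alone would leave the first row empty because no earlier value exists, which is why the backward-fill pass follows it.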
Additionally, fishing catch data may contain
invalid entries or unnecessary information, which
were excluded through data cleaning procedures.
Formatting date information is also an essential
part of preprocessing. The "date" columns in each
dataset were represented in multiple formats, so they
were uniformly converted to date types.
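A minimal sketch of unifying mixed date formats is given below, using only the Python standard library. The candidate formats are assumptions chosen for illustration; the formats actually present in the source datasets are not specified.

```python
from datetime import date, datetime

# Assumed candidate formats; adjust to the formats found in the real data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")


def parse_date(text: str) -> date:
    """Try each candidate format in turn and return a date object."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # This format did not match; try the next one.
    raise ValueError(f"unrecognized date format: {text!r}")


raw = ["2023-01-05", "2023/1/5", "2023年1月5日"]
parsed = [parse_date(value) for value in raw]
```

Raising an error for unrecognized strings, rather than silently dropping them, makes invalid entries visible during cleaning.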
Finally, the fishing catch data, tidal data, and
weather data were merged on the date to create
a single integrated dataframe. After merging, missing
values were addressed again using Forward-Fill and
Backward-Fill to ensure data continuity. This
integration maintained consistency across the
datasets while formatting the data appropriately for
the prediction model.
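The merge and the second filling pass can be sketched with pandas as follows. The column names, the sample values, and the choice of a left join keyed on the catch data are assumptions for illustration; the source only states that the three datasets were merged by date and filled again.

```python
import pandas as pd

# Invented sample frames; each shares a "date" column of datetime type.
catch = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "catches": [120, 80, 95],
    "visitors": [40, 32, 38],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-03"]),
    "avg_temp": [6.1, 7.4],
})
tide = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-03"]),
    "low_tide_min": [305, 350],  # assumed encoding: minutes after midnight
})

# Left joins keep every day that has catch data (an assumed design choice);
# gaps introduced by the joins are then forward- and backward-filled.
merged = (
    catch.merge(weather, on="date", how="left")
    .merge(tide, on="date", how="left")
    .sort_values("date")
    .ffill()
    .bfill()
    .reset_index(drop=True)
)
print(merged)
```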
3.2.2 Feature Engineering
To maximize the performance of the prediction model,
feature engineering was conducted. In this study, the
following methods were employed to generate and
transform useful features:
Firstly, lag features were added. This method
captures the influence of past data on current fishing
outcomes. Specifically, lagged copies of features such
as the number of catches, number of visitors, and
temperature were created for lags of one to seven days. This