several sections, including data exploration and
visualization, machine learning model development,
results and discussion, and conclusions. This
structured approach enables a thorough examination
of the historical context and machine learning's
potential in improving tsunami prediction methods.
The methodology used in this study is based on a
comprehensive historical dataset derived from
tsunami records. The dataset was loaded in Python
using data analysis and visualization libraries
(Pandas, Folium, and Plotly Express), relevant
features were extracted, and visualizations were
created to identify patterns and trends. A random
forest regression model was then built, trained, and
tested to predict the maximum water height from
seismic parameters.
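The train-and-test workflow described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the data are synthetic, the two features (magnitude and focal depth), the target formula, and the hyperparameters are all assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real tsunami records).
rng = np.random.default_rng(0)
n_events = 400
magnitude = rng.uniform(6.0, 9.5, n_events)   # assumed feature
depth_km = rng.uniform(5.0, 100.0, n_events)  # assumed feature

# Arbitrary nonlinear target used only to give the model something to learn.
max_height_m = (
    np.exp(0.8 * (magnitude - 6.0)) / (1.0 + depth_km / 50.0)
    + rng.normal(0.0, 0.2, n_events)
)

X = np.column_stack([magnitude, depth_km])
X_train, X_test, y_train, y_test = train_test_split(
    X, max_height_m, test_size=0.2, random_state=0
)

# Fit the random forest on the training split only.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
r2 = model.score(X_test, y_test)
print(f"MAE: {mae:.2f} m, R^2: {r2:.2f}")
```

Holding out a test split, as here, gives an error estimate on events the model has not seen, which is the quantity of interest for prediction.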
The primary goal of this research is to contribute
to the advancement of tsunami prediction methods by
incorporating machine learning techniques. This
study aims to provide a more accurate and adaptable
tool for predicting maximum water level heights
associated with tsunamis by examining historical data
critically and developing predictive models. The
findings of this study have the potential to
significantly improve the existing understanding of
tsunami dynamics, improve the effectiveness of early
warning systems, and, ultimately, help mitigate
tsunamis' devastating effects on coastal communities.
The limitations of current tsunami prediction
methods are numerous. Analytical models may
oversimplify the complexities of tsunami dynamics,
producing inaccurate predictions. While empirical
methods can be used to analyze historical data, they
may struggle to adapt to the complex relationships
between different factors. The machine learning
model proposed in this study aims to address these
limitations by leveraging the power of data-driven
prediction to capture complex patterns that traditional
methods may miss.
Machine learning, as a data-driven approach, has
proven successful in a wide range of fields, including
image recognition and natural language processing.
Machine learning's ability to recognize complex
patterns and relationships in data makes it a useful
tool for improving tsunami prediction accuracy.
Machine learning models can use historical data to
learn subtle relationships between seismic parameters
and maximum water heights, allowing them to make
more accurate predictions about future events.
In summary, this paper introduces a novel
approach to tsunami prediction that incorporates
machine learning techniques. The use of data
exploration, visualization, and the creation of a
random forest regression model provides a
comprehensive approach to understanding historical
tsunami patterns and predicting maximum water
heights. This study is significant because it has the
potential to improve current tsunami forecasting
methodologies by providing more accurate and
adaptable tools. As the research community responds
to complex natural hazards, such innovations are
critical for protecting coastal communities and
mitigating the effects of these catastrophic events.
2 METHODOLOGIES
The foundation of this research lies in the careful
collation of a rich dataset obtained from the National
Centers for Environmental Information (NCEI), a
division of the National Oceanic and Atmospheric
Administration (NOAA) (2023). This dataset covers
a wide historical period, from 1800 to 2024, and
includes a plethora of parameters critical to
understanding the complexity of tsunamis. These
parameters include earthquake magnitude, tsunami
causes, geographic coordinates, maximum water
levels, and impact statistics. To ensure authenticity
and reliability, the data were obtained through
NCEI's Tsunami Event Data Portal.
The first step was to load the dataset into a Pandas
DataFrame using Python code. The initial
investigation involved selecting relevant columns and
filtering the records to retain events where the
maximum water height exceeded the significant
threshold of 10 meters (Parwanto 2014). As a result,
a new DataFrame was created for visualization,
focusing on key columns such as year, country,
magnitude, and maximum water height.
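A minimal sketch of this selection and filtering step is given below. The column names and the placeholder rows are hypothetical (the actual NCEI export uses its own headers), and in practice the DataFrame would be read from the downloaded file rather than built in memory.

```python
import pandas as pd

HEIGHT_THRESHOLD_M = 10.0  # threshold stated in the text

def select_high_events(df, threshold=HEIGHT_THRESHOLD_M):
    """Keep the key columns and the rows above the height threshold."""
    # Hypothetical column names; adjust to the real dataset headers.
    key_columns = ["year", "country", "magnitude", "max_water_height_m"]
    subset = df[key_columns]
    mask = subset["max_water_height_m"] > threshold
    return subset[mask].reset_index(drop=True)

# Placeholder rows for illustration only (not real records).
raw = pd.DataFrame(
    {
        "year": [1900, 1950, 2000, 2010],
        "country": ["AAA", "BBB", "CCC", "DDD"],
        "magnitude": [7.0, 8.1, None, 6.5],
        "max_water_height_m": [12.0, 3.5, 40.0, 10.0],
        "extra_column": ["x", "y", "z", "w"],
    }
)

high_events = select_high_events(raw)
print(high_events)
```

The comparison is strict, so an event at exactly 10 meters is excluded, matching the wording "exceeded" in the text. Rows with a missing magnitude are kept because the filter looks only at water height.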
A notable aspect of the data exploration phase was
the creation of a bar chart. The visualization was
created with Plotly Express and shows a ranking of
the top 30 historical maximum water levels. The chart
labels each bar with the relevant country and year to
show the water height reached by each tsunami.
To improve geographic understanding of a
significant event, specifically the "1958 Lituya Bay
Earthquake and Mega-tsunami" (NOAA 2024),
shown as “USA_1958” in the bar chart, an interactive
map was created with Folium. The map shows the
location of the epicenter (marked as the center) and
Lituya Bay. The interactive map provides the spatial
context needed to visualize the affected area.
During the exploration phase, heat maps were also
created for areas with high water levels
(>10 meters). A distinct map was created
using latitude and longitude data to show the areas