
by the PCAP format because of its ability to metic-
ulously record network communications. This ap-
proach facilitates comprehensive analysis; however,
the large volume of data generated necessitates effi-
cient processing methods.
Consequently, in recent years, flow data, rep-
resented by IP Flow Information Export (IPFIX)
(Trammell and Boschi, 2008; Claise et al., 2013),
have been used. Flow data consist of information that
aggregates packet data into sessions and have the fol-
lowing characteristics:
• The data size is small because individual packets
are aggregated into sessions and recorded.
• They preserve statistical information about sessions,
including source and destination IP addresses,
port numbers, protocols, session start and end
times, and communication volume.
• Information can be aggregated into uni-flows or
bi-flows during a given session.
In their seminal work, Tang et al. (Tang et al.,
2016; Tang et al., 2018) proposed deep learning-
based anomalous communication detection models
for flow data and achieved high detection accuracy. In
addition, Lo et al. (Lo et al., 2021) proposed EGraph-
SAGE, an extension of GraphSAGE, a type of graph
neural network, and demonstrated excellent perfor-
mance in anomaly detection using flow data. As net-
work traffic is projected to rise in the future due to
the proliferation of 5G and the surge in IoT devices,
there is a growing need for research on anomalous
communication detection methods using flow data.
The architecture of systems incorporating IoT devices
is predominantly cloud-centered, as exemplified by
Amazon Web Services (AWS), Microsoft Azure, and
Google Cloud Platform (GCP).
In cloud-based architectures, procuring communi-
cation data such as PCAP and IPFIX from operational
services necessitates the installation of dedicated li-
braries on each server. This approach raises concerns
regarding its impact on operational costs and perfor-
mance. As an alternative, cloud logs provided by
cloud services, such as VPC Flow Logs in AWS, can
be utilized. The present study focuses on AWS VPC
Flow Logs. However, these cloud logs are not designed
to store communication data in the way that IPFIX is,
which hinders their use in anomalous communication
detection and leads to the following issues:
• Records by time window: Records are output
within a time window, so a single session may be
split into multiple records.
• Unidirectional: Transmitted data and received
data are output as separate records.
• Retained information: Because only the time win-
dow is available, the temporal order between
records is lost when multiple records occur within
the same time window. Compared with IPFIX, the
number of recorded features is limited.
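To make these issues concrete, the following minimal sketch parses records in the default (version 2) VPC Flow Logs layout, whose space-separated fields are version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, and log-status. The two sample lines, the `FlowRecord` type, and the `parse_record` helper are illustrative assumptions, not data or code from this study.

```python
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Subset of the default VPC Flow Logs fields (hypothetical helper type)."""
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    byte_count: int
    start: int  # window start, Unix seconds
    end: int    # window end, Unix seconds


def parse_record(line: str) -> FlowRecord:
    """Parse one default-format record.

    Assumes log-status is OK; NODATA/SKIPDATA records carry '-' in the
    numeric fields and would need separate handling.
    """
    f = line.split()
    return FlowRecord(
        srcaddr=f[3], dstaddr=f[4],
        srcport=int(f[5]), dstport=int(f[6]), protocol=int(f[7]),
        packets=int(f[8]), byte_count=int(f[9]),
        start=int(f[10]), end=int(f[11]),
    )


# Made-up sample: one connection appears as two unidirectional records
# that share the same time window, so their relative order is unknown.
SAMPLE_LINES = [
    "2 123456789012 eni-0abc 192.0.2.10 198.51.100.5 49152 443 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 198.51.100.5 192.0.2.10 443 49152 6 8 5200 1700000000 1700000060 ACCEPT OK",
]

records = [parse_record(line) for line in SAMPLE_LINES]
```

Each parsed record exposes only window-level start and end times plus packet and byte counts, which is the limited feature set the list above refers to.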
In this study, we propose a methodology for de-
tecting anomalous communications using only cloud
logs, without making any specific changes to the
cloud-based system architecture, by addressing the
aforementioned issues.
3 PROPOSED METHOD
3.1 Conversion from VPC Flow Logs to Sessions
In this study, VPC Flow Logs records are treated as
belonging to the same session if they have the same
values for fields including the source IP address,
destination IP address, source port number,
destination port number, and protocol, and if the
time interval between consecutive records is within
the predefined time window. Specifically, two
consecutive records are considered to be in the same
session if the interval between their start times is
at most 600 seconds. This threshold was chosen
because the maximum duration of a single window in
AWS VPC Flow Logs is 600 seconds.
Furthermore, the aggregation interval of VPC
Flow Logs records is defined as the time window
mentioned above. Records identified as part of the
same session are then aggregated by session accord-
ing to the following procedure:
1. Group records by matching signature (the
combination of source and destination IP addresses,
port numbers, and protocol) and sort each group in
chronological order.
2. Determine session boundaries based on the time
interval conditions mentioned above.
3. Aggregate statistics (packet count, byte count,
etc.) for each session.
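The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: records are assumed to be dictionaries with the keys `srcaddr`, `dstaddr`, `srcport`, `dstport`, `protocol`, `packets`, `bytes`, `start`, and `end`, and the `Session` type and `sessionize` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 600  # threshold on the gap between consecutive start times


@dataclass
class Session:
    key: tuple       # (srcaddr, dstaddr, srcport, dstport, protocol)
    start: int
    end: int
    packets: int = 0
    byte_count: int = 0
    records: int = 0


def sessionize(records, window=WINDOW_SECONDS):
    """Group flow-log records into sessions by signature and time gap."""
    # Step 1: group by signature, then sort each group chronologically.
    groups = defaultdict(list)
    for r in records:
        key = (r["srcaddr"], r["dstaddr"], r["srcport"],
               r["dstport"], r["protocol"])
        groups[key].append(r)

    sessions = []
    for key, recs in groups.items():
        recs.sort(key=lambda r: r["start"])
        current = None
        prev_start = None
        for r in recs:
            # Step 2: a gap larger than the window opens a new session.
            if current is None or r["start"] - prev_start > window:
                current = Session(key=key, start=r["start"], end=r["end"])
                sessions.append(current)
            # Step 3: accumulate per-session statistics.
            current.end = max(current.end, r["end"])
            current.packets += r["packets"]
            current.byte_count += r["bytes"]
            current.records += 1
            prev_start = r["start"]
    return sessions
```

In this sketch the gap is measured between the start times of consecutive records, so a long-lived connection that produces one record per window stays in a single session.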
3.2 Conversion to Bi-Flows
VPC Flow Logs are not captured for each individ-
ual communication session; therefore, it is not possi-
ble to determine the outbound and inbound directions
for a particular session. Consequently, we
implemented a processing method that merges the
outbound and inbound directions of communication at
the same time as sessionization.
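One common way to merge the two directions, shown here only as a hedged sketch and not necessarily the procedure used in this study, is to build a direction-independent key by ordering the two endpoints, so that a record and its reverse fall into the same group. The names `biflow_key` and `merge_biflows` are hypothetical, and treating the source of the earliest record as the initiator is an assumption that becomes ambiguous when both directions first appear in the same time window. Splitting by time gap is omitted for brevity.

```python
def biflow_key(srcaddr, srcport, dstaddr, dstport, protocol):
    """Return the same key for a flow and for its reverse direction."""
    first, second = sorted([(srcaddr, srcport), (dstaddr, dstport)])
    return (first, second, protocol)


def merge_biflows(records):
    """Merge unidirectional records into bi-flows with per-direction totals.

    Assumption: the source endpoint of the earliest record is the initiator
    ("fwd"); the opposite direction is counted as "bwd".
    """
    flows = {}
    for r in sorted(records, key=lambda r: r["start"]):
        key = biflow_key(r["srcaddr"], r["srcport"],
                         r["dstaddr"], r["dstport"], r["protocol"])
        source = (r["srcaddr"], r["srcport"])
        flow = flows.setdefault(key, {
            "initiator": source,
            "fwd_packets": 0, "fwd_bytes": 0,
            "bwd_packets": 0, "bwd_bytes": 0,
        })
        side = "fwd" if source == flow["initiator"] else "bwd"
        flow[side + "_packets"] += r["packets"]
        flow[side + "_bytes"] += r["bytes"]
    return flows
```

Keeping separate forward and backward counters preserves the per-direction statistics that a bi-flow representation is expected to carry.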
A Study of Anomalous Communication Detection for IoT Devices Using Flow Logs in a Cloud Environment