produce a large volume of network security data
every day, such as the system logs and alarm
information of multiple network security devices.
Efficiently visualizing dynamic network attacks not
only helps reflect the network security situation
vividly, but also helps administrators forecast the
trend of the network security situation, find
implicit patterns and rules, send warnings, and take
corresponding measures.
Thus, in this paper, we investigate an integration
and storage management mechanism for massive
multi-source heterogeneous data, and propose a
general framework for real-time data computation
and visualization. Based on the framework, we realize
visual monitoring of dynamic network security
attacks at both nationwide and worldwide scales.
The framework supports the display of the top N
attackers, targets, attack types, and so on. The
contributions can be summarized as follows.
We propose an integration and storage
management mechanism for massive
heterogeneous multi-source data to support security
data fusion. We use the distributed real-time data
collection tool Flume to aggregate the massive
multi-source heterogeneous data and to realize
access to heterogeneous multi-source network
security data. We then pre-process all kinds of
integrated data into a unified format, and form a
data stream based on the extraction time.
We provide a general real-time data computation
and visualization framework to realize real-time
computing. For different sources of security
data, such as IPS or firewall data, we design
different data access methods that extract the
pre-processed, uniformly formatted data in real
time and classify it for storage. The data stream
is computed and buffered by a cache mechanism,
which is convenient for the subsequent
visualization applications.
With the proposed real-time data computation and
visualization framework, we use real security
data from the network security cloud service
platform of the Chinese Academy of Sciences (CAS),
and realize a real-time display of massive
multi-dimensional network attacks with
Data-Driven Documents (D3).
Experimental results are given to analyze the
performance of the proposed framework in terms of
the efficiency of the data integration stage and the
computation stage. We also analyze the overall
efficiency of the real-time computation and
visualization framework.
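
To make the pipeline summarized in the contributions above more concrete, the following is a minimal Python sketch, not the implementation used in this paper. It assumes two hypothetical raw record layouts (one for IPS data, one for firewall data), converts both into a hypothetical unified record, orders the records by extraction time to form a single stream, and keeps running counts in an in-memory structure that stands in for the cache mechanism. All field names, record layouts, and the names SecurityEvent and TopNCache are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SecurityEvent:
    """Hypothetical unified record; the field names are assumptions."""
    extracted_at: datetime
    source: str
    attacker_ip: str
    target_ip: str
    attack_type: str


def from_ips(raw: dict) -> SecurityEvent:
    # Assumed IPS layout: epoch seconds plus src/dst/signature keys.
    return SecurityEvent(
        extracted_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        source="ips",
        attacker_ip=raw["src"],
        target_ip=raw["dst"],
        attack_type=raw["signature"],
    )


def from_firewall(raw: dict) -> SecurityEvent:
    # Assumed firewall layout: ISO 8601 timestamp, differently named keys.
    return SecurityEvent(
        extracted_at=datetime.fromisoformat(raw["time"]),
        source="firewall",
        attacker_ip=raw["source_ip"],
        target_ip=raw["dest_ip"],
        attack_type=raw["rule"],
    )


def to_stream(events: Iterable[SecurityEvent]) -> List[SecurityEvent]:
    # Order the unified records by extraction time to form one stream.
    return sorted(events, key=lambda e: e.extracted_at)


class TopNCache:
    """Running per-dimension counts, standing in for the cache mechanism."""

    def __init__(self) -> None:
        self.counts = {
            "attacker_ip": Counter(),
            "target_ip": Counter(),
            "attack_type": Counter(),
        }

    def update(self, event: SecurityEvent) -> None:
        # Increment the count of this event's value in every dimension.
        for field, counter in self.counts.items():
            counter[getattr(event, field)] += 1

    def top(self, field: str, n: int) -> List[Tuple[str, int]]:
        # Return the n most frequent values of the given dimension.
        return self.counts[field].most_common(n)


# Made-up sample records, for illustration only.
ips_raw = [
    {"ts": 10, "src": "10.0.0.1", "dst": "10.0.0.9",
     "signature": "sql-injection"},
]
firewall_raw = [
    {"time": "1970-01-01T00:00:05+00:00", "source_ip": "10.0.0.2",
     "dest_ip": "10.0.0.9", "rule": "port-scan"},
    {"time": "1970-01-01T00:00:20+00:00", "source_ip": "10.0.0.1",
     "dest_ip": "10.0.0.9", "rule": "port-scan"},
]

events = [from_ips(r) for r in ips_raw]
events += [from_firewall(r) for r in firewall_raw]
stream = to_stream(events)

cache = TopNCache()
for event in stream:
    cache.update(event)

print(cache.top("attacker_ip", 1))
```

A visualization layer could then poll such a structure for the top N attackers, targets, or attack types instead of rescanning the raw stream. In a deployed system the counts would more plausibly live in an external cache and be bounded by a time window; neither is modeled in this sketch.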
2 RELATED WORK
This paper mainly involves real-time data
processing (based on stream processing) and
network security information processing and
visualization. In the following, we divide the
related work into these two parts.
Real-time stream data processing can be
applied in various applications, such as social
networks (Kulkarni et al., 2015), the Web Observatory
(Tinati et al., 2015), and business decision making
(Pareek et al., 2017). The performance of data
capture, data storage, and data computation in real
time can each affect the overall performance of
stream data processing.
In recent years, streaming data processing has
attracted much attention. In (Toshniwal et al.,
2014), the architecture of Storm was described and
its application at Twitter was introduced; Apache
Storm is an open-source, fault-tolerant, distributed
real-time stream data processing system. In
(Kulkarni et al., 2015), a real-time stream data
processing system for large-scale data named Heron
was proposed, which also became the stream data
processing engine used inside Twitter owing to its
better debuggability and scalability. In (Yang et
al., 2018), a robust, scalable, real-time event time
series aggregation framework called TimeSeries
AggregatoR (TSAR) was presented for engagement
monitoring. In (Pareek et al., 2017), a data
management platform named Striim was described for
end-to-end stream processing, which enables business
users to easily develop and deploy analytical
applications over real-time streaming data by using
a SQL-like declarative language.
Several works address network security
information processing and visualization. In
(Fischer et al., 2014), a visual analytics system
called NStreamAware was proposed to gain
situational awareness and enhance network
security; distributed processing technologies
were used to analyze streams with stream slices, and
the system was presented to analysts through a
web-based visual analytics application named
NVisAware. In (Mckenna et al., 2015), three design
methods (qualitative coding, personas, and data
sketches) were described to inform real-world
cyber security visualization projects from a
user-centered perspective. In (Mckenna et al.,
2016), a dashboard named BubbleNet was presented to
visualize patterns in cyber security data,
incorporating user feedback throughout the design.