Ganglia source
DB source
New input adapters can be added easily, which
allows future extensions. Note that, at runtime,
only a single type of input adapter can be
configured for each DCF instance (if several input
adapters are configured, only one is executed).
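The single-adapter rule can be sketched in Java as follows. This is only an illustration: the `InputAdapter` interface, the `AdapterSelector` class and the choice of the first configured adapter are assumptions, since the text does not show the DCF adapter API or say which adapter is executed when several are configured.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical adapter contract; the real DCF interface is not shown in the text.
interface InputAdapter {
    String name();

    // Returns the raw records read from the source.
    List<String> read();
}

final class AdapterSelector {
    // Picks the one adapter that will run.
    // Assumption: the first configured adapter wins and the others are ignored.
    static Optional<InputAdapter> select(List<InputAdapter> configured) {
        return configured.isEmpty()
                ? Optional.empty()
                : Optional.of(configured.get(0));
    }

    // Reads data from the selected adapter only, or returns an empty list
    // when no adapter is configured.
    static List<String> collect(List<InputAdapter> configured) {
        return select(configured)
                .map(InputAdapter::read)
                .orElse(Collections.emptyList());
    }
}
```

With two adapters configured, `AdapterSelector.collect` would return only the records of the first one under the assumption stated above.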
2.2.1 Grammar based Input Adapters
This class of adapters covers a wide range of
possible sources. These adapters are responsible for
transforming raw data (that can be described using an
EBNF notation) into structured data, leveraging a
“compiler-compiler” approach, which consists of
generating a data parser.
Since 1960, tools that automatically generate
parsers have grown in number and sophistication.
Today about one hundred different parser generator
tools, including commercial products and open
source software, are available. We analysed these
tools and compared them against the following
requirements:
Formal description of the target language in EBNF or EBNF-like notation;
Generated parser in the Java programming language;
Optimized, high-performance parsing;
Few or no runtime dependencies;
BSD license;
High-quality error reporting;
Wide availability of online and published documentation.
The comparison showed that JavaCC
(JAVACC, 2015) and ANTLR (ANTLR, 2015) are
both valuable solutions for generating a parser. We
selected them as candidates for the DCF
implementation for the following reasons:
Input Grammar notation
Both JavaCC and ANTLR accept a formal
description of the language in EBNF notation.
EBNF is standardized by ISO (ISO/IEC 14977), is
well known to developers and allows high flexibility.
Type of parsers generated
JavaCC produces top-down parsers, and so does
ANTLR. A top-down parser is highly customizable
and simple to read and understand. These
advantages allow high productivity and ease the
debugging process.
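To illustrate why top-down parsers are easy to read, the following minimal Java sketch hand-writes a recursive-descent parser for a small arithmetic grammar, with one method per grammar rule. The grammar, the class name `ExprParser` and its methods are assumptions made for illustration; this is not code generated by JavaCC or ANTLR, and it is not part of the DCF.

```java
// Illustrative grammar in EBNF (an assumption, not taken from the DCF):
//   expr   = term   { ("+" | "-") term } ;
//   term   = factor { ("*" | "/") factor } ;
//   factor = NUMBER | "(" expr ")" ;
// Simplification: the input must not contain whitespace.
final class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        this.src = src;
    }

    // Parses and evaluates the whole input; trailing characters are an error.
    static long evaluate(String input) {
        ExprParser p = new ExprParser(input);
        long value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // expr = term { ("+" | "-") term } ;
    private long expr() {
        long value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term = factor { ("*" | "/") factor } ;  (integer division)
    private long term() {
        long value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            long rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor = NUMBER | "(" expr ")" ;
    private long factor() {
        if (peek() == '(') {
            pos++; // consume "("
            long value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++; // consume ")"
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected number at position " + pos);
        }
        return Long.parseLong(src.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }
}
```

Each method mirrors exactly one rule, so a stack trace or a breakpoint maps directly onto the grammar. For example, `ExprParser.evaluate("11+12*(24/8)+(1204*3)")` evaluates to 3659.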
Output language
ANTLR, and in particular version 4, can generate
parsers in Java, C#, Python 2 and Python 3. JavaCC
primarily targets Java, but also supports C++ and
JavaScript. The main advantage of a parser in Java is
its high portability.
Run-time dependencies
JavaCC generates a parser with no runtime
dependencies, while ANTLR needs external libraries.
Performance
To compare the performance of JavaCC and
ANTLR we conducted many tests. One of the
performance indicators evaluated is the parsing
time.
One example of these tests (all run on the same
machine: Windows 7, 64 bit, i7, 6 GB) is the
measurement of the time needed to parse the
following mathematical expression:
11+12*(24/8)+(1204*3)+12*(24/8)+(1204*3)
+12*(24/8)+(1204*3)+12*(24/8)+(1204*3)+1
1+12*(24/8)+(1204*3)+12*(24/8)+(1204*3)+
12*(24/8)+(1204*3)+12*(24/8)+(1204*3)+11
+12*(24/8)+(1204*3)+12*(24/8)+(1204*3)+1
2*(24/8)+(1204*3)+12*(24/8)+(1204*3)
(1)
The grammar files written for the two parser
generators are equivalent, so that both tools start
from the same point. JavaCC is faster than ANTLR:
over repeated measurements it parses the
expression in less than 3 ms on average, while
ANTLR (version 4) takes over 60 ms.
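A measurement of this kind can be sketched in Java as shown below. The text does not describe the measurement procedure that was actually used, so the class name `ParseTimer`, the warm-up phase and the use of `System.nanoTime()` are all assumptions. The parser under test is passed in as a `Consumer<String>`, which could wrap a JavaCC-generated or an ANTLR-generated parser.

```java
import java.util.function.Consumer;

// Hypothetical timing helper; not the benchmark code used for the reported figures.
final class ParseTimer {
    // Returns the average wall-clock time per parse, in milliseconds.
    // warmupRuns calls are executed first and excluded from the measurement,
    // so that class loading and JIT compilation have less influence on the result.
    static double averageMillis(Consumer<String> parser, String input,
                                int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            parser.accept(input);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            parser.accept(input);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / measuredRuns;
    }
}
```

Whether warm-up runs are included changes what is measured: with `warmupRuns` set to 0 the average also contains start-up costs, while a larger value approximates steady-state parsing time.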
Generated code footprint
Starting from the same code used to evaluate the
performance, JavaCC and ANTLR produce generated
code with a comparable footprint (less than
20 KB). ANTLR, however, due to its runtime
dependencies, requires adding to the project an
external library of about 1 MB.
License
Both parser generators are released under the BSD
license, which gives developers high flexibility and
imposes minimal restrictions on the redistribution
of the software.
From this analysis, even though the features of
JavaCC and ANTLR are comparable, JavaCC's better
performance in parsing and in generating the output
source code, its smaller code footprint and its
absence of runtime dependencies led us to select
JavaCC as the parser generator to be used in the
Data Collection Framework.
Adopting the JavaCC parser generator requires a
declarative description of the data structure. This
description is provided via a grammar file (.jj) that
describes the tokens and the associated semantics
of the data to parse
using an Extended-Backus-Naur Form (EBNF,