Authors:
Morihiro Hayashida (1); Hitoshi Koyano (2) and Tatsuya Akutsu (3)
Affiliations:
(1) National Institute of Technology, Matsue College, Japan; (2) Quantitative Biology Center, Riken, Japan; (3) Institute for Chemical Research, Kyoto University, Japan
Keyword(s):
Generalized Series-parallel Graph, Grammar-based Compression, Integer Linear Programming.
Abstract:
We address the problem of finding generation rules from biological data, in particular data represented as
directed and undirected generalized series-parallel graphs (GSPGs), which include trees, outerplanar graphs,
and series-parallel graphs. In a previous study, grammars for edge-labeled rooted ordered and unordered trees,
called SEOTG and SEUTG, respectively, were defined and used to extract generation rules from glycans and RNAs,
which can be represented by rooted tree structures; integer linear programming-based methods were developed
for finding the minimum SEOTG and SEUTG that produce only the given trees. In nature and in organisms,
however, there are various kinds of structures, such as gene regulatory networks, metabolic pathways,
and chemical structures, that cannot be represented as rooted trees. In this study, we relax this restriction
on the structures to be compressed and propose grammars, based on context-free grammars, that represent
edge-labeled directed and undirected GSPGs by extending SEOTG and SEUTG. In addition, we propose an integer
linear programming-based method for finding the minimum GSPG grammar in order to analyze more complicated
biological networks and structures.