However, for information extraction, whether as a first step or as the main solution before a statistical program is applied, gawk occupies an optimal position because it combines big data processing with short implementation times (Bharathi et al., 2012).
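A minimal sketch of such an extraction step follows. It assumes a hypothetical semicolon-separated record layout `patient_id;age;drug_code;date`, which is not taken from our data. With that layout, the gawk one-liner `gawk -F';' '$2 >= 65 { print $1 "," $3 }'` selects elderly patients and prints two columns for a statistical program. The Python code below mirrors this filter-and-project pass so that it can be tested in isolation. The function names and sample records are illustrative only.

```python
import csv
import io

# Hypothetical record layout (assumption, not from the paper):
# patient_id;age;drug_code;date
SAMPLE = """p1;70;drugA;2015-01-12
p2;54;drugB;2015-01-14
p3;81;drugC;2015-02-03
"""


def extract(lines, min_age=65):
    """Yield (patient_id, drug_code) for patients aged >= min_age.

    Mirrors the gawk filter  $2 >= 65 { print $1 "," $3 }  with -F';'.
    """
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 4:
            continue  # skip empty or malformed records
        patient_id, age, drug_code, _date = fields[:4]
        if int(age) >= min_age:
            yield patient_id, drug_code


def write_csv(rows):
    """Return the extracted rows as CSV text for a statistics program."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["patient_id", "drug_code"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(write_csv(extract(io.StringIO(SAMPLE))), end="")
```

Like the gawk version, the sketch streams one record at a time and keeps no state, so memory use does not grow with the input size.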
In this paper we considered in particular the interaction of gawk with Mathematica by Wolfram Research. Both programs offer powerful methods for structural transformations and calculations. When the mathematical structures are well identified, Mathematica is the preferable tool; an illustrative example is the calculation of the eigenvalues and eigenvectors of high-dimensional matrices. For identifying structures in real-life biomedical data, however, gawk is much more powerful than conventional approaches. Programming in gawk is clearly advantageous when it allows the use of scripting elements and libraries that are not available in AWK and solves problems that are not implemented in Mathematica. Examples are partition problems on the graphs considered here, which are closely related to NP-hard and NP-complete problems. In such cases, scripting programs occupy a position between AWK and Mathematica.
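The sketch below shows how such graph structures can be derived from prescription records. It assumes hypothetical `(patient_id, drug)` pairs, and its function names are illustrative rather than the code used in this paper. Drugs become nodes, and an edge counts the patients who received both drugs. This corresponds to a gawk associative array indexed by a `SUBSEP`-joined pair key. The sketch then groups drugs into connected components with union-find. Connected components are only a polynomial-time special case. General balanced graph partitioning, such as minimum bisection, is NP-hard, so this is a tractable preprocessing step and not a solution of the partition problems mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def co_prescription_edges(records):
    """Count, for each unordered drug pair, the patients receiving both drugs."""
    drugs_by_patient = defaultdict(set)
    for patient_id, drug in records:
        drugs_by_patient[patient_id].add(drug)
    edges = defaultdict(int)
    for drugs in drugs_by_patient.values():
        for a, b in combinations(sorted(drugs), 2):
            edges[(a, b)] += 1
    return dict(edges)


def connected_components(edges, min_weight=1):
    """Group drugs into connected components with union-find.

    Only edges with weight >= min_weight join components; drugs that occur
    only in lighter edges stay as singletons. Drugs never co-prescribed
    do not appear at all, because they have no edge.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        root_a, root_b = find(a), find(b)  # registers both nodes
        if weight >= min_weight and root_a != root_b:
            parent[root_a] = root_b
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    records = [("p1", "A"), ("p1", "B"), ("p2", "B"), ("p2", "C"),
               ("p3", "D"), ("p3", "E"), ("p4", "A"), ("p4", "B")]
    edges = co_prescription_edges(records)
    print(edges)
    print(connected_components(edges))
    print(connected_components(edges, min_weight=2))
```

Raising `min_weight` discards rare co-prescriptions before grouping. This is one simple way to let frequent combinations dominate the structure that is handed on to a tool such as Mathematica.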
As mentioned by Bharathi et al. (2012), "big data" has quite different aspects. For example, big data in image processing differ considerably from big data in health care. The latter involve quite different scales, different degrees of accuracy, variability over time, and different frequencies (day, week, month, quarter, year, decade), all of which should be taken into account when analyzing the data (Schuster, 2009).
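To make the role of the frequency concrete, the following sketch assigns each prescription date to a period label at a chosen granularity and counts records per period. The labels and function names are assumptions chosen for illustration, and weeks follow the ISO 8601 calendar used by Python's `datetime.date.isocalendar`. The same records yield quite different series depending on the frequency selected.

```python
from collections import Counter
from datetime import date


def period_key(d, frequency):
    """Map a date to a period label for the given frequency (hypothetical labels)."""
    if frequency == "day":
        return d.isoformat()
    if frequency == "week":
        year, week, _weekday = d.isocalendar()  # ISO 8601 week numbering
        return f"{year}-W{week:02d}"
    if frequency == "month":
        return f"{d.year}-{d.month:02d}"
    if frequency == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if frequency == "year":
        return str(d.year)
    if frequency == "decade":
        return f"{d.year // 10 * 10}s"
    raise ValueError(f"unknown frequency: {frequency}")


def counts_by_period(dates, frequency):
    """Count records per period, sorted by period label."""
    return dict(sorted(Counter(period_key(d, frequency) for d in dates).items()))


if __name__ == "__main__":
    dates = [date(2015, 1, 12), date(2015, 1, 14), date(2015, 2, 3), date(2015, 4, 1)]
    for freq in ("week", "month", "quarter", "year"):
        print(freq, counts_by_period(dates, freq))
```

In gawk, a comparable key can be built with `substr` on an ISO date string. For example, `substr($4, 1, 7)` gives the month under the layout assumed earlier.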
Further research should compare our solution with approaches based on the MapReduce paradigm. In particular, Hadoop, as already mentioned, can rapidly process large data sets in a distributed file system and executes tasks where the data are stored. However, little is known about how its performance compares with that of gawk. Although the Hadoop guide gives an illustrative example in Chapter 2 (White, 2012), it may also be worth exploring ways to integrate gawk into such environments.
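The MapReduce paradigm splits an aggregation into three phases. A map phase emits key-value pairs, a shuffle groups the pairs by key, and a reduce phase combines each group. In Hadoop these phases run distributed across the nodes that hold the data. The single-process sketch below contrasts this structure with the one-pass associative array a gawk script would use, such as `gawk -F';' '{ count[$2]++ } END { for (d in count) print d, count[d] }'`. It assumes hypothetical `patient;drug` records, and its function names are illustrative. It models neither distribution, fault tolerance, nor HDFS, so it says nothing about relative performance.

```python
from collections import defaultdict


def parse_drug(line):
    """Return the drug field of a 'patient;drug' record, or None if malformed."""
    fields = line.strip().split(";")
    if len(fields) == 2 and fields[1]:
        return fields[1]
    return None


def map_phase(line):
    """Emit (drug, 1) for each well-formed record."""
    drug = parse_drug(line)
    if drug is not None:
        yield drug, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def mapreduce_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


def awk_style_count(lines):
    """Single pass over one associative array, like gawk's count[$2]++."""
    counts = defaultdict(int)
    for line in lines:
        drug = parse_drug(line)
        if drug is not None:
            counts[drug] += 1
    return dict(counts)


if __name__ == "__main__":
    lines = ["p1;A", "p2;B", "p3;A", "malformed"]
    print(mapreduce_count(lines))
    print(awk_style_count(lines))
```

Both functions return the same counts. The difference lies in how the work can be partitioned: the map and reduce steps are independent per record and per key, which is what lets a framework like Hadoop distribute them, while the gawk-style pass keeps all state in one process.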
7 CONCLUSION
Numerous tools for big data analysis are available today. Gawk, one of the oldest tools for analyzing big data, still has high potential in complex big data analysis situations. However, combining programs with quite different strengths so that they interact optimally on real-life problems needs much more attention and further analysis.
REFERENCES
Amann, U., Schmedt, N., Garbe, E. 2012. Prescribing of
potentially inappropriate medications for the elderly.
Age 65(69): 70-74.
Begoli, E. 2012. A short survey on the state of the art in
architectures and platforms for large scale data
analysis and knowledge discovery from data.
Proceedings of the WICSA/ECSA 2012: 177-183.
Bharathi, R., Keswani, N. N., Shinde, S. D. 2012. An
Approach to mining massive Data. Proceedings of the
MPGI National Multi Conference. International
Journal of Computer Applications: 32-36.
Cao, L. 2016. Data science: nature and pitfalls. IEEE
Intelligent Systems 31(5): 66-75.
Cao, L., Fayyad, U. 2016. Data science: Challenges and
directions. Commun. ACM: 1-9.
Hassani, H., Silva, E. S. 2015. Forecasting with big
data: A review. Annals of Data Science 2(1): 5-19.
Hu, H., Wen, Y., Chua, T. S., Li, X. 2014. Toward
scalable systems for big data analytics: A technology
tutorial. IEEE Access 2: 652-687.
Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z.,
Mahmoud Ali, W. K., Alam, M. 2014. Big data:
survey, technologies, opportunities, and challenges.
The Scientific World Journal, Article ID 712826.
Oussous, A., Benjelloun, F. Z., Lahcen, A. A., Belfkih, S.
2017. Big Data Technologies: A Survey. Journal of
King Saud University-Computer and Information
Sciences.
Pike, R., Dorward, S., Griesemer, R., Quinlan, S. 2005.
Interpreting the data: Parallel analysis with Sawzall.
Scientific Programming 13(4): 277-298.
Pohl-Dernick, K., Meier, F., Maas, R., Schöffski, O.,
Emmert, M. 2016. Potentially inappropriate
medication in the elderly in Germany: an economic
appraisal of the PRISCUS list. BMC Health Services
Research 16(1): 109.
Press, G. 2013. A very short history of big data. Forbes
Tech Magazine, May, 9.
Robbins, A. 2011. GNU awk 4.0: teaching an old bird
some new tricks. Linux Journal 209: 5.
Schuster, R. 2009. Biomathematik, Stuttgart, Teubner-
Verlag.
Schuster, R. 2015. Graphentheoretische Analyse von
Vernetzungsstrukturen zwischen Wirkstoffen und
Wirkstoffgruppen in Bezug auf gleichzeitige
Verordnung beim Patienten. GAA. German Medical
Science.
Schuster, R., Schuster, M. 2015. Graphentheoretische
Analyse von Vernetzungsstrukturen im
vertragsärztlichen Sektor einer Region der
kassenärztlichen Vereinigung. German Medical
Science. DocAbstr. 202.
Extracting Information and Identifying Data Structures in Pharmacological Big Data using Gawk