Information Extraction from Legacy Spreadsheet-based Information System - An Experience in the Automotive Context

Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Vincenzo De Simone, Giancarlo Di Mare, Stefano Scala

Abstract

Although spreadsheets were originally designed for computing purposes and commercial applications, they are often used in industry to implement Information Systems, thanks to the functionality offered by integrated scripting languages and ad-hoc frameworks (e.g., Visual Basic for Applications). This technological solution allows Rapid Application Development processes to be adopted for the quick development of spreadsheet-based Information Systems, but the resulting systems are quite difficult to maintain and very difficult to migrate to other architectures, such as database-oriented Information Systems or Web applications. In this paper we present an approach for reverse engineering the data model from an Excel spreadsheet-based information system. The approach exploits a set of heuristic rules that are automatically applied in a seven-step process. The applicability of the process was shown in an industrial context, where it was used to obtain the UML class diagrams representing the conceptual data models of three spreadsheet-based information systems.



Paper Citation


in Harvard Style

Amalfitano D., Fasolino A., Tramontana P., De Simone V., Di Mare G. and Scala S. (2014). Information Extraction from Legacy Spreadsheet-based Information System - An Experience in the Automotive Context. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2014) ISBN 978-989-758-035-2, pages 389-398. DOI: 10.5220/0005139603890398


in Bibtex Style

@conference{komis14,
author={Domenico Amalfitano and Anna Rita Fasolino and Porfirio Tramontana and Vincenzo De Simone and Giancarlo Di Mare and Stefano Scala},
title={Information Extraction from Legacy Spreadsheet-based Information System - An Experience in the Automotive Context},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2014)},
year={2014},
pages={389-398},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005139603890398},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: KomIS, (DATA 2014)
TI - Information Extraction from Legacy Spreadsheet-based Information System - An Experience in the Automotive Context
SN - 978-989-758-035-2
AU - Amalfitano D.
AU - Fasolino A.
AU - Tramontana P.
AU - De Simone V.
AU - Di Mare G.
AU - Scala S.
PY - 2014
SP - 389
EP - 398
DO - 10.5220/0005139603890398