Automating Data Extraction from Semi-Structured Industrial Documents: The Alstom Experience
2024 (English) In: IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Institute of Electrical and Electronics Engineers Inc., 2024. Conference paper, Published paper (Refereed)
Abstract [en]
In the system development of modern railroad vehicles, engineers frequently use a plethora of diverse notations to specify various systems, subsystems, and their associated concerns. This diversity of notations makes the resulting specifications complex to manage and integrate. Conventional practices, which rely on manual revisions and translations, are both time-intensive and cost-prohibitive. In addition, they carry substantial risks of human error, potentially introducing faults into the system. Such practices are inadequate for the railway industry, which is inherently safety-critical and places paramount importance on reliability and data integrity. To address these challenges, we developed a regular-expression-based system that automatically translates semi-structured texts into structured data, with a particular focus on ensuring data integrity and reliability. We defined the system by capitalizing on the insights and practical experience of our industrial partner, Alstom Rail Sweden AB, and validated it within their development process. The validation demonstrated the practicality of the system in a real-world context and highlighted valuable lessons learned throughout the process. Building on these insights, we applied model-driven engineering principles to generalize the system, providing an automated solution to the challenge of extracting data from tender documents in the railway domain.
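The record does not describe the system's actual patterns, document formats, or data model, so the following is only a minimal sketch of the general technique the abstract names: using regular expressions to turn semi-structured lines into structured records. It assumes a hypothetical line format `<ID>: <statement> [Priority: <level>]`; the pattern, the `Requirement` fields, and `extract_requirements` are illustrative inventions, not Alstom's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical line format, assumed for illustration only:
#   "<ID>: <statement> [Priority: <level>]"
REQUIREMENT_PATTERN = re.compile(
    r"^(?P<req_id>[A-Z]+-\d+):\s*(?P<text>.+?)\s*\[Priority:\s*(?P<priority>\w+)\]\s*$"
)


@dataclass
class Requirement:
    # Field names mirror the named groups in REQUIREMENT_PATTERN
    req_id: str
    text: str
    priority: str


def extract_requirements(lines):
    """Return (parsed records, (line_no, text) pairs that did not match)."""
    records, unmatched = [], []
    for line_no, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no data
        match = REQUIREMENT_PATTERN.match(stripped)
        if match:
            records.append(Requirement(**match.groupdict()))
        else:
            # Report rather than silently drop lines the pattern cannot parse
            unmatched.append((line_no, stripped))
    return records, unmatched
```

Returning the unmatched lines together with their line numbers is one simple way to reflect the abstract's emphasis on data integrity: a reviewer can check that every non-blank line was either converted or explicitly flagged, instead of trusting that the pattern covered the whole document.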
Place, publisher, year, edition, pages Institute of Electrical and Electronics Engineers Inc., 2024.
Keywords [en]
automation, data extraction, MDE, railroad, tender documents, Vehicular embedded software systems, Industrial locomotives, Interlocking signals, Network security, Steganography, Cost prohibitive, Data integrity, Embedded software systems, Railroad vehicles, Semi-structured, System development, Vehicular embedded software system
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:mdh:diva-69006
DOI: 10.1109/ETFA61755.2024.10711023
Scopus ID: 2-s2.0-85207849123
ISBN: 9798350361230 (print)
OAI: oai:DiVA.org:mdh-69006
DiVA, id: diva2:1912941
Conference
29th IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2024, Padova, 10 September 2024 through 13 September 2024
Record dated 2024-11-13 (bibliographically approved)