Comprehensive analysis of aviation maintenance text reports using natural language processing methods


Authors:


A. Savostin, orcid.org/0000-0002-5057-2942, Manash Kozybayev North Kazakhstan University, Petropavlovsk, Republic of Kazakhstan

G. Kaipbek*, orcid.org/0000-0003-2595-7434, Civil Aviation Academy, Almaty, Republic of Kazakhstan

K. Koshekov, orcid.org/0000-0002-9586-2310, Civil Aviation Academy, Almaty, Republic of Kazakhstan

G. Savostina, orcid.org/0000-0001-7042-4480, Manash Kozybayev North Kazakhstan University, Petropavlovsk, Republic of Kazakhstan

K. Wardle, orcid.org/0009-0008-4866-3934, Civil Aviation Academy, Almaty, Republic of Kazakhstan; JSC Air Astana, Almaty, Republic of Kazakhstan

* Corresponding author





Naukovyi Visnyk Natsionalnoho Hirnychoho Universytetu. 2025, (6): 157 - 167

https://doi.org/10.33271/nvngu/2025-6/157



Abstract:



Purpose.
This study aims to develop and validate a comprehensive approach for analyzing unstructured textual descriptions of defects extracted from actual aviation maintenance data. The goal is to improve both the efficiency and depth of fault analysis by addressing two key tasks: automatic classification of defects into standard categories and identification of latent thematic subgroups within these categories.


Methodology.
The research is based on a dataset of maintenance records from nine commercial aircraft collected over a seven-year period. A multi-stage preprocessing pipeline was developed, including an algorithm for identifying domain-specific abbreviations, which were then decoded by experts. To solve the multiclass classification task across 30 Chapter–Section (CS) categories, four approaches were compared: CountVectorizer with LinearSVC, TF-IDF with logistic regression, Word2Vec with logistic regression, and fine-tuning of the transformer-based DistilBERT model. For an in-depth analysis of the largest defect category, topic modeling based on Latent Dirichlet Allocation (LDA) was applied, with a quantitative procedure for selecting the optimal number of topics.
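The abstract does not specify how abbreviations are identified, so the following is only a minimal sketch under stated assumptions: candidates are short alphabetic tokens that are absent from a general English word list, ranked by frequency for expert review, and then expanded using an expert-supplied mapping. The word list, the sample records, the mapping and the function names are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical stand-in for a general English word list (assumption).
ENGLISH_WORDS = {
    "left", "engine", "found", "leaking", "replaced", "the", "is",
    "inoperative", "light", "cabin", "checked", "and", "valve",
}

# Hypothetical expert-provided decoding table (assumption).
EXPERT_DECODING = {"lh": "left hand", "eng": "engine", "inop": "inoperative"}


def abbreviation_candidates(records, vocabulary, max_len=5):
    """Count short alphabetic tokens that do not appear in the vocabulary."""
    counts = Counter()
    for text in records:
        for token in re.findall(r"[a-z]+", text.lower()):
            if len(token) <= max_len and token not in vocabulary:
                counts[token] += 1
    # Most frequent candidates first, so experts review the common ones early.
    return counts.most_common()


def expand_abbreviations(text, decoding):
    """Replace every token that has an expert decoding with its expansion."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(decoding.get(token, token) for token in tokens)


# Invented sample records, for illustration only.
records = [
    "LH ENG valve found leaking, replaced",
    "Cabin light INOP, checked and replaced",
    "LH ENG light is INOP",
]

candidates = abbreviation_candidates(records, ENGLISH_WORDS)
expanded = expand_abbreviations(records[2], EXPERT_DECODING)
```

In this sketch the length threshold `max_len` is an arbitrary choice; a real pipeline would likely need a larger dictionary and additional filters for part numbers and misspellings.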


Findings.
The best classification performance was achieved by TF-IDF with logistic regression, which reached a macro-averaged F1 score of 0.762 and a Cohen’s Kappa of 0.809, statistically comparable to CountVectorizer with LinearSVC. Classical methods significantly outperformed neural network models, underscoring their robustness for analyzing short technical texts. Topic modeling successfully decomposed the largest defect category into five interpretable and semantically coherent subgroups.
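For readers unfamiliar with the two reported metrics, the sketch below computes a macro-averaged F1 score and Cohen's Kappa from their standard definitions in plain Python. It is an illustration only: the label strings and predictions are invented toy data, not values from the study, and the function names are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(true_counts[l] * pred_counts[l] for l in true_counts) / n**2
    # Note: undefined when expected == 1 (both labelings use a single class).
    return (observed - expected) / (1 - expected)


# Toy Chapter-Section style labels (hypothetical, for illustration only).
y_true = ["21-50", "21-50", "25-20", "25-20", "33-40", "33-40"]
y_pred = ["21-50", "25-20", "25-20", "25-20", "33-40", "33-40"]

f1 = macro_f1(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

Macro averaging gives every category the same weight regardless of its size, which matters when some of the 30 categories are rare; Cohen's Kappa additionally discounts the agreement that would be expected by chance.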


Originality.
The novelty of this work lies in developing and testing a formalized method for analyzing unstructured aviation maintenance data, implemented as a single integrated process. The study also provides a detailed comparative evaluation of classical and modern NLP models on domain-specific aviation maintenance data.


Practical value.
The work is applied in nature, and its results are ready for implementation. A prototype automated classifier has been created that can process the main flow of daily defect reports, reducing the time required for manual processing. An in-depth failure analysis tool has also been developed, which enables a transition from general fault codes to the analysis of specific sub-problems. This contributes to optimizing maintenance programs, enhancing diagnostic procedures, and ultimately improving flight safety.



Keywords:
natural language processing, maintenance, aircraft, classification, topic modeling

 

Registration data

ISSN (print) 2071-2227,
ISSN (online) 2223-2362.
Journal was registered by Ministry of Justice of Ukraine.
Registration number КВ No.17742-6592PR dated April 27, 2011.
