A model for identifying project sprint tasks based on their description

Authors

Marina Grinchenko, National Technical University "Kharkiv Polytechnic Institute"
Mykyta Rohovyi, Kharkiv National University of Radio Electronics

DOI:

https://doi.org/10.30837/ITSSI.2023.26.033

Keywords:

project; task description; project task management system; model; classifier; vector representation

Abstract

The subject of research in this article is the identification of project sprint tasks. The purpose of the article is to find approaches to reducing the risk of sprint tasks not being fulfilled. The article solves the following tasks: analyzing research on the classification and visualization of project tasks; developing an algorithm that automatically classifies textual descriptions of sprint tasks; collecting and preparing a training sample of textual task descriptions for training and testing the classification model; applying natural language processing methods to improve classification and ensure the accuracy of the results; validating the model on real data to assess the efficiency and accuracy of classification; and analyzing the results. The following methods have been used: machine learning methods for classification, text vectorization methods, methods for classifying textual descriptions, natural language processing methods, methods for semantic analysis of task description text, and methods for processing expert opinions. The following results were obtained: a comprehensive approach to applying machine learning algorithms was developed, covering the collection and processing of textual task descriptions for classification and the involvement of expert opinions to improve how the project team perceives tasks. Textual task descriptions were classified with a Bayesian classifier and with neural classifiers, a visual representation of the data was implemented, and semantic analysis of the text of task titles and descriptions was performed. Labels for classifying the quality of task wording were produced by a team of experts; to measure the reliability of these expert assessments, Cohen's kappa coefficient was calculated for each pair of annotators. According to the experimental results, the accuracy of the Bayesian classifier is 70%. For the deep learning classifier, a neural network for binary classification based on the transformer architecture was selected and trained using the Python programming language and deep learning frameworks. The resulting classifier achieves an accuracy of 83% on the test dataset, which is a good result for a small dataset with conflicting labels. Conclusions: the analysis of textual data confirms that the existing data in the tracking system is incomplete and contains abbreviations, conventions, and slang. The results show that the assessment of wording quality is determined by the experts' knowledge of the specifics and context of the project, while increasing the number of experts has almost no effect on the result. Further research should test the hypothesis that classifier effectiveness depends on the specific project, and explore unsupervised learning methods for identifying the quality of task formulations.
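The abstract reports that inter-annotator agreement was checked with Cohen's kappa for every pair of experts. A minimal sketch of that reliability check, assuming the labels are stored one column per expert in a CSV file and using scikit-learn's cohen_kappa_score (the file name and column layout are illustrative assumptions, not the authors' setup):

```python
# Pairwise Cohen's kappa between expert annotators.
# Assumes a hypothetical CSV where each column ("expert_1", "expert_2", ...)
# holds one expert's quality label for the same set of task descriptions.
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score

labels = pd.read_csv("expert_labels.csv")  # hypothetical file name
for a, b in combinations(labels.columns, 2):
    kappa = cohen_kappa_score(labels[a], labels[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```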
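The Bayesian classifier mentioned above (about 70% accuracy) is not specified further in the abstract; the sketch below shows one plausible baseline of this kind, assuming scikit-learn, a TF-IDF vector representation, and a multinomial naive Bayes model, with an illustrative dataset file and column names rather than the authors' actual data:

```python
# Baseline: TF-IDF vectorization + multinomial naive Bayes for classifying
# sprint-task descriptions as well- or poorly formulated.
# "sprint_tasks.csv" with columns "text" and "label" is a hypothetical dataset.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("sprint_tasks.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2)
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(X_train), y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(vectorizer.transform(X_test))))
```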
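For the deep learning classifier the abstract names only the transformer architecture and Python deep learning frameworks; the following sketch assumes the Hugging Face transformers and datasets libraries and a pretrained multilingual BERT checkpoint, with placeholder examples standing in for the expert-labeled task descriptions:

```python
# Sketch of fine-tuning a pretrained transformer for binary classification of
# task-wording quality. Model choice, hyperparameters, and the toy examples
# below are assumptions for illustration only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["Fix timeout when saving the sprint report", "do smth with api"]  # placeholders
labels = [1, 0]  # assumed encoding: 1 = well formulated, 0 = poorly formulated

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)
split = dataset.train_test_split(test_size=0.5, seed=42)

args = TrainingArguments(output_dir="task-quality-clf",
                         num_train_epochs=3, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=split["train"], eval_dataset=split["test"])
trainer.train()
print(trainer.evaluate())  # reports eval loss; accuracy needs a compute_metrics hook
```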

Author Biographies

Marina Grinchenko, National Technical University "Kharkiv Polytechnic Institute"

PhD in Technical Sciences, Associate Professor, Head of the Department of Strategic Management, Kharkiv

Mykyta Rohovyi, Kharkiv National University of Radio Electronics

PhD candidate

References

Rohovyi, M., Grinchenko, M. (2023), "Project team management model under risk conditions", Vestn. Khar'k. politekhn. in-ta. Ser.: Strategichne upravlinnya, upravlinnya portfelyamy, programamy ta proektamy [Bulletin of the Kharkov Polytechnic Institute. Series: Strategic Management, Portfolio Management, Programs and Projects], Kharkov: NTU "KhPI", No. 1 (7), P. 3–11. DOI: https://doi.org/10.20998/2413-3000.2023.7.1

Sonbol, R., Rebdawi, G., Ghneim, N. (2022), "Learning software requirements syntax: An unsupervised approach to recognize templates", Knowledge-Based Systems, Vol. 248, 108933. https://doi.org/10.1016/j.knosys.2022.108933

Leelaprute, P., Amasaki, S. (2022), "A comparative study on vectorization methods for non-functional requirements classification", Information and Software Technology, Vol. 150, 106991. https://doi.org/10.1016/j.infsof.2022.106991

Femmer, H., Fernández, D., Wagner, S., Eder, S. (2017), "Rapid quality assurance with Requirements Smells", Journal of Systems and Software, Vol. 123, P. 190–213. https://doi.org/10.1016/j.jss.2016.02.047

Ramesh, M.R.R., Reddy, C.S. (2021), "Metrics for software requirements specification quality quantification", Computers & Electrical Engineering, Vol. 96, Part A, 107445. https://doi.org/10.1016/j.compeleceng.2021.107445

Casamayor, A., Godoy, D., Campo, M. (2010), "Identification of non-functional requirements in textual specifications: A semi-supervised learning approach", Information and Software Technology, Vol. 52, Issue 4, P. 436–445. https://doi.org/10.1016/j.infsof.2009.10.010

Casillo, F., Deufemia, V., Gravino, C. (2022), "Detecting privacy requirements from User Stories with NLP transfer learning models", Information and Software Technology, Vol. 146, 106853. https://doi.org/10.1016/j.infsof.2022.106853

Dalpiaz, F., et al. (2019), "Detecting terminological ambiguity in user stories: Tool and experimentation", Information and Software Technology, Vol. 110, P. 3–16. https://doi.org/10.1016/j.infsof.2018.12.007

Dalpiaz, F., Gieske, P., Sturm, A. (2021), "On deriving conceptual models from user requirements: An empirical study", Information and Software Technology, Vol. 131, 106484, P. 1–13. https://doi.org/10.1016/j.infsof.2020.106484

Amna, A.R., Poels, G. (2022), "Ambiguity in user stories: A systematic literature review", Information and Software Technology, Vol. 145, P. 1–12. https://doi.org/10.1016/j.infsof.2022.106824

Urbieta, M., et al. (2020), "The impact of using a domain language for an agile requirements management", Information and Software Technology, Vol. 145, P. 1–16. https://doi.org/10.1016/j.infsof.2020.106375

Jia, J., et al. (2019), "Understanding software developers' cognition in agile requirements engineering", Science of Computer Programming, Vol. 178, P. 1–19. https://doi.org/10.1016/j.scico.2019.03.005

Murtazina, M., Avdeenko, T. (2019), "An Ontology-based Approach to Support for Requirements Traceability in Agile Development", Procedia Computer Science, Vol. 150, P. 628–635. https://doi.org/10.1016/j.procs.2019.02.044

Wahba, Y., Madhavji, N., Steinbacher, J. (2020), "A Hybrid Machine Learning Model for Efficient Classification of IT Support Tickets in The Presence of Class Overlap", Proceedings of the 32nd Annual International Conference on Computer Science and Software Engineering, P. 151–156. DOI: 10.1109/ICIT58465.2023.10143149

Ramírez-Mora, S., Oktaba, H., Gómez-Adorno, H. (2020), "Descriptions of issues and comments for predicting issue success in software projects", Journal of Systems and Software, Vol. 168, P. 1–19. https://doi.org/10.1016/j.jss.2020.110663

Li, Z., "A Unified Understanding of Deep NLP Models for Text Classification", available at: https://arxiv.org/abs/2206.09355 (last accessed 08.11.2023).

Ishizuka, R., et al. (2022), "Categorization and Visualization of Issue Tickets to Support Understanding of Implemented Features in Software Development Projects", Applied Sciences, Vol. 12, Issue 7, 3222. https://doi.org/10.3390/app12073222

Devlin, J., et al. (2019), "BERT: Pre-training of deep bidirectional transformers for language understanding", Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1, P. 4171–4186. DOI: 10.18653/v1/N19-1423

Chawla, P., Hazarika, S., Shen, H.-W. (2020), "Token-wise sentiment decomposition for convnet: Visualizing a sentiment classifier", Visual Informatics, Vol. 4, Issue 2, P. 132–141. https://doi.org/10.1016/j.visinf.2020.04.006

Bird, S., Klein, E., Loper, E. (2009), "Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit", O'Reilly Media, Beijing, 504 p., available at: https://tjzhifei.github.io/resources/NLTK.pdf (last accessed 08.11.2023).

"Word2vec", available at: https://www.tensorflow.org/text/tutorials/word2vec (last accessed 08.11.2023).

"TF-IDF (Term Frequency-Inverse Document Frequency)", available at: https://www.learndatasci.com/glossary/tf-idf-term-frequency-inverse-document-frequency/i (last accessed 08.11.2023).

Pennington, J., Socher, R., Manning, C. (2014), "GloVe: Global Vectors for Word Representation", Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, P. 1532–1543. http://dx.doi.org/10.3115/v1/D14-1162

"Оpen-source FastText", available at: https://fasttext.cc/ (last accessed 08.11.2023)

McHugh, M.L. (2012), "Interrater reliability: the kappa statistic", Biochemia Medica, Vol. 22, Issue 3, P. 276–282. https://doi.org/10.11613/BM.2012.031

Vaswani, A., Shazeer, N., Parmar, N., et al. (2017), "Attention Is All You Need", 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, P. 1–15. DOI: https://doi.org/10.48550/arXiv.1706.03762

Published

2023-12-27

How to Cite

Grinchenko, M., & Rohovyi , M. (2023). A model for identifying project sprint tasks based on their description. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (4(26), 33–44. https://doi.org/10.30837/ITSSI.2023.26.033