A model for identifying project sprint tasks based on their description
DOI:
https://doi.org/10.30837/ITSSI.2023.26.033
Keywords:
project; task description; project task management system; model; classifier; vector representation
Abstract
The subject of this research is the identification of project sprint tasks. The purpose of the article is to find approaches that reduce the risk of sprint tasks not being completed. The article addresses the following tasks: analyzing research on the classification and visualization of project tasks; developing an algorithm that automatically classifies text descriptions of sprint tasks; collecting and preparing a sample of text descriptions of sprint tasks for training and testing the classification model; applying natural language processing methods to improve classification accuracy; validating the model on real data to assess the efficiency and accuracy of classification; and analyzing the results. The following methods were used: machine learning methods for classification, text vectorization methods, methods for classifying text descriptions, natural language processing methods, methods for semantic analysis of task description text, and methods for processing expert opinions. The following results were obtained: a comprehensive approach that applies machine learning algorithms, including the collection and processing of textual task descriptions, to classification, and that draws on expert opinions to improve how well the project team understands its tasks. Task texts were classified with a Bayesian classifier and with neural classifiers. A visual representation of the data was implemented. Semantic analysis of the task descriptions and titles was performed. A team of experts labeled the data to classify the quality of the task wording. To measure the reliability of the expert assessments, Cohen's kappa coefficient was calculated for each pair of annotators. According to the experimental results, the accuracy of the Bayesian classifier is 70%.
For the deep learning classifier, a neural network for binary classification based on the transformer architecture was selected. The network was trained using the Python programming language and deep learning frameworks. The resulting classifier achieves an accuracy of 83% on a test dataset, which is a good result for a small dataset with conflicting labels. Conclusions: the analysis of the textual data confirms that the existing data in the tracking system is incomplete and contains abbreviations, conventions, and slang. The results show that the assessed quality of the wording depends on how well the experts know the specifics and context of the project, while increasing the number of experts has almost no effect on the result. Further research should test the hypothesis that the effectiveness of the classifier depends on the specific project, and should examine unsupervised learning methods for identifying the quality of task formulations.
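The pairwise agreement check mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cohen_kappa` and `pairwise_kappa`, the expert identifiers, and the labels are hypothetical, and the labels are assumed to be binary (1 for a well-worded task, 0 for a poorly worded one). The sketch computes Cohen's kappa, (p_o - p_e) / (1 - p_e), for every pair of annotators, where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items where both labels coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(annotations):
    """Kappa for each pair of annotators; `annotations` maps name -> label list."""
    return {
        (first, second): cohen_kappa(annotations[first], annotations[second])
        for first, second in combinations(sorted(annotations), 2)
    }


if __name__ == "__main__":
    # Hypothetical labels from three experts for six task descriptions.
    example_annotations = {
        "expert_1": [1, 1, 0, 0, 1, 0],
        "expert_2": [1, 0, 0, 0, 1, 0],
        "expert_3": [1, 1, 0, 1, 1, 0],
    }
    for pair, kappa in pairwise_kappa(example_annotations).items():
        print(pair, round(kappa, 3))
```

Values near 1 indicate strong agreement between two annotators, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement. Low pairwise values are one way in which conflicting labels of the kind mentioned in the abstract would show up.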
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.