Development for performance of Porter stemmer algorithm

DOI:

https://doi.org/10.15587/1729-4061.2021.225362

Keywords:

stemming algorithm, natural language processing, information retrieval, APSA, Porter algorithm

Abstract

The Porter stemmer algorithm is a widely used and essential tool for natural language processing in the area of information access. Stemming strips the final morphological and diacritical endings from English words to reduce them to their root form, called the stem or root, during the primary text processing stage. In other words, it is a linguistic process that simply extracts the main part of a word, which may be close to its related root. Text classification is a major task in extracting relevant information from a large volume of data. In this paper, we suggest ways to improve a version of the Porter algorithm, with the aim of overcoming its limitations and saving time and memory by reducing the size of the words. The system uses the improved Porter derivation technique for word pruning. It also performs cognitive-inspired computing to discover morphologically related words from the corpus without any human intervention or language-specific knowledge. Compared with the original stemmer, the improved Porter algorithm performs better and enables more accurate information retrieval (IR).
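As a rough illustration of the suffix stripping described above, the following minimal sketch applies only the plural-removal rules of step 1a of the original Porter algorithm. It is not the improved algorithm proposed in this paper, and it omits all later steps of the original stemmer; the names `STEP_1A_RULES`, `porter_step_1a`, and `stem_tokens` are hypothetical and chosen here for the example.

```python
# Illustrative sketch only: step 1a (plural removal) of the original Porter
# stemmer. This is NOT the improved algorithm from the paper, and the
# remaining Porter steps are intentionally left out.

# Rules are tried in order; the first (longest) matching suffix wins.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (unchanged)
    ("s", ""),       # cats     -> cat
]


def porter_step_1a(word: str) -> str:
    """Apply the first matching step-1a rule to a single lowercased word."""
    word = word.lower()
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def stem_tokens(text: str) -> list[str]:
    """Split text on whitespace and apply step 1a to every token."""
    return [porter_step_1a(token) for token in text.split()]


if __name__ == "__main__":
    print(stem_tokens("caresses ponies caress cats"))
```

Ordering the rules from longest to shortest suffix keeps a word such as "caress" from losing its final "s", which is the kind of over-stemming that improvements to the Porter algorithm typically aim to reduce.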

Author Biographies

Manhal Elias Polus, Al-Mustansiriyah University

Postgraduate Student

Department of Computer Science

College of Science

Thekra Abbas, Al-Mustansiriyah University

PhD, Assistant Professor, Head of Department

Department of Computer Science

College of Science

Published

2021-02-26

How to Cite

Elias Polus, M., & Abbas, T. (2021). Development for performance of Porter stemmer algorithm. Eastern-European Journal of Enterprise Technologies, 1(2 (109)), 6–13. https://doi.org/10.15587/1729-4061.2021.225362

Section

Information technology. Industry control systems