Method of outliers removal based on the weighted training samples of w-objects

DOI:

https://doi.org/10.15587/1729-4061.2014.24331

Keywords:

training sample, data filtering, outlier, w-object, decision rule, generating set

Abstract

This paper considers the problem of preprocessing training samples to improve the efficiency of trainable recognition systems. It proposes a new outlier removal method based on constructing weighted reduced samples of w-objects. The method builds on wGridDC, which constructs a weighted sample of w-objects by superimposing a grid on the feature space and forming the weighted objects of the new sample from the contents of each cell.
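The grid idea can be illustrated with a minimal sketch. The abstract does not specify how wGridDC forms a w-object from a cell, so everything below is an assumption made for illustration: the function name `build_w_objects`, the uniform `cell_size`, and the rule that each (cell, class) pair yields one w-object located at the centroid of its points and weighted by their count.

```python
from collections import defaultdict
from math import floor


def build_w_objects(samples, cell_size):
    """Collapse labelled points into weighted objects, one per (cell, class).

    samples   -- iterable of (feature_vector, class_label) pairs
    cell_size -- edge length of a grid cell (assumed equal for every feature)
    Returns a list of (centroid, class_label, weight) tuples.
    """
    cells = defaultdict(list)
    for features, label in samples:
        # Index of the grid cell that contains this point.
        cell = tuple(floor(x / cell_size) for x in features)
        cells[(cell, label)].append(features)

    w_objects = []
    for (cell, label), points in sorted(cells.items()):
        n = len(points)
        # Assumed rule: centroid of the cell's points, weight = point count.
        centroid = tuple(sum(coord) / n for coord in zip(*points))
        w_objects.append((centroid, label, n))
    return w_objects


samples = [
    ((0.1, 0.2), "a"), ((0.3, 0.4), "a"), ((0.2, 0.3), "a"),
    ((1.6, 1.7), "b"), ((1.8, 1.9), "b"),
    ((0.4, 0.1), "b"),  # lone "b" point inside a region dominated by "a"
]
w_objects = build_w_objects(samples, cell_size=1.0)
for obj in w_objects:
    print(obj)
```

Six original objects are reduced to three weighted ones here. The isolated "b" point survives as a w-object of weight 1, which is the kind of low-weight object a filtering step could later treat as a candidate outlier.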

Two outlier removal algorithms are developed within the proposed method. The first constructs the weighted training sample of w-objects while simultaneously removing outliers at a given, user-defined filtering threshold; it is intended for tasks that require not only filtering the original data but also controlling the sample size. The second constructs the weighted training sample of w-objects while simultaneously removing outliers with automatic detection of the filtering threshold; it is intended for tasks that require samples providing the highest efficiency of the system.
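The difference between the two algorithms can be sketched as follows. This is not the author's procedure, whose details the abstract does not give. It assumes that a w-object is a `(centroid, class_label, weight)` tuple, that filtering drops w-objects whose weight is below the threshold, that "efficiency" is measured as accuracy on a held-out validation set, and that classification uses a hypothetical weighted nearest-neighbour rule (distance divided by weight). All function names are illustrative.

```python
from math import dist


def filter_by_threshold(w_objects, threshold):
    """User-defined threshold: keep w-objects whose weight reaches it."""
    return [obj for obj in w_objects if obj[2] >= threshold]


def classify(w_objects, point):
    """Hypothetical decision rule: smallest distance-to-weight ratio wins."""
    best = min(w_objects, key=lambda obj: dist(obj[0], point) / obj[2])
    return best[1]


def accuracy(w_objects, validation):
    """Share of validation points assigned their true class."""
    hits = sum(classify(w_objects, x) == y for x, y in validation)
    return hits / len(validation)


def select_threshold(w_objects, validation):
    """Automatic threshold: try each distinct weight, keep the most accurate.

    Ties are resolved in favour of the smaller threshold (larger sample).
    """
    best_t, best_acc = None, -1.0
    for t in sorted({obj[2] for obj in w_objects}):
        acc = accuracy(filter_by_threshold(w_objects, t), validation)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Illustrative data: the weight-1 "b" object sits inside the "a" region.
w_objects = [((0.2, 0.3), "a", 3), ((0.4, 0.1), "b", 1), ((1.7, 1.8), "b", 2)]
validation = [((0.4, 0.15), "a"), ((0.25, 0.25), "a"), ((1.5, 1.6), "b")]

reduced = filter_by_threshold(w_objects, threshold=2)
best_t, best_acc = select_threshold(w_objects, validation)
print(len(reduced), best_t, best_acc)
```

With a fixed threshold the caller decides directly how many w-objects remain, which reflects the sample-size control mentioned above. With automatic selection the threshold follows from the validation accuracy instead, so the resulting sample size is not chosen in advance.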

An analysis of the proposed method's effectiveness shows that the main advantage of the threshold filtering algorithm is the ability to control the sample size, while the main advantage of the non-threshold filtering algorithm is the ability to automatically select the filtering threshold value that provides the greatest efficiency of the recognition system as a whole. Thus, the proposed method and both of its constituent algorithms make it possible to obtain samples that provide high efficiency of trainable recognition systems.

Author Biography

Elena Vladimirovna Volchenko, Donetsk National Technical University, 84 B. Khmelnitsky Avenue, Donetsk, 83050, Ukraine

Ph.D., Associate Professor

Department of Software Intelligent Systems

Published

2014-06-20

How to Cite

Volchenko, E. V. (2014). Method of outliers removal based on the weighted training samples of w-objects. Eastern-European Journal of Enterprise Technologies, 3(4(69)), 31–36. https://doi.org/10.15587/1729-4061.2014.24331

Section

Mathematics and Cybernetics - applied aspects