Automation of the marked-up data sets formation for machine learning based on simulation modeling

Andrei A. Smirnov
Doctor of Military Sciences, Military Telecommunications Academy named after S. M. Budyonniy (VAS), Assistant Professor, 3, Tikhoretsky pr., Saint Petersburg, 194064, Russia, SPIN code: 8559-4689, AuthorID: 850972

Alexander M. Kudriavtsev
Doctor of Military Sciences, Professor, VAS, Professor, 3, Tikhoretsky pr., Saint Petersburg, 194064, Russia, SPIN code: 4031-3294, AuthorID: 847484


Received September 19, 2023

Abstract
The paper considers the problem of generating labeled (marked-up) data sets for training artificial intelligence systems that are being prepared for autonomous operation under new, anticipated conditions, in which forming a feature description of objects and situations with a known target variable is impossible or very difficult. As a general approach, the authors propose building and using a software test bed that generates training and control samples, checks the effectiveness of various machine learning methods on them, and forms sets of informative features of objects and phenomena together with pairings of the form "feature vector realization + classification (clustering, regression) method". The paper presents a systematization of scientific approaches to defining the types of machine learning, as well as the main machine learning methods used in data mining. Using the problem of assessing the dynamics of changes in the radio-electronic environment as an example, it shows the properties of real feature descriptions of objects and phenomena that significantly affect the quality of the solution. To form training samples that adequately reflect these properties, the authors propose using simulation modeling systems that implement an agent-based approach to model construction. An example of a problem statement and of the construction of such a model is given; the model produces labeled data sets that correspond to the simulated environmental conditions.
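The idea of obtaining labels "for free" from an agent-based simulation can be illustrated with a minimal sketch. It is not the authors' model (the key words indicate that AnyLogic was used); it is plain Python written under stated assumptions. The emitter classes, their nominal frequencies and powers, the noise levels, and the names `EmitterAgent`, `generate_labeled_dataset` and `split_train_control` are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical emitter classes with illustrative nominal parameters.
# These values are assumptions, not data from the paper.
EMITTER_TYPES = {
    "radio_station": {"freq_mhz": 45.0, "power_dbm": 30.0},
    "relay_link": {"freq_mhz": 400.0, "power_dbm": 20.0},
}


@dataclass
class EmitterAgent:
    kind: str  # ground-truth class, known because the agent is simulated
    x: float
    y: float

    def step(self, rng: random.Random) -> None:
        # Simple random walk standing in for agent behaviour.
        self.x += rng.uniform(-1.0, 1.0)
        self.y += rng.uniform(-1.0, 1.0)

    def emit(self, rng: random.Random) -> dict:
        # Noisy observation of the agent, as a monitoring sensor might see it.
        nominal = EMITTER_TYPES[self.kind]
        return {
            "freq_mhz": rng.gauss(nominal["freq_mhz"], 0.5),
            "power_dbm": rng.gauss(nominal["power_dbm"], 2.0),
            "x": self.x,
            "y": self.y,
        }


def generate_labeled_dataset(n_agents: int, n_steps: int, seed: int = 0) -> list:
    """Run the simulation and return one labeled row per agent per step."""
    rng = random.Random(seed)
    kinds = sorted(EMITTER_TYPES)
    agents = [
        EmitterAgent(rng.choice(kinds), rng.uniform(0, 100), rng.uniform(0, 100))
        for _ in range(n_agents)
    ]
    rows = []
    for t in range(n_steps):
        for agent in agents:
            agent.step(rng)
            features = agent.emit(rng)
            # The label comes from the simulation state, not from manual markup.
            rows.append({"t": t, **features, "label": agent.kind})
    return rows


def split_train_control(rows: list, control_fraction: float = 0.25, seed: int = 0):
    """Shuffle the rows and split them into training and control samples."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_control = int(len(shuffled) * control_fraction)
    return shuffled[n_control:], shuffled[:n_control]


if __name__ == "__main__":
    data = generate_labeled_dataset(n_agents=4, n_steps=10, seed=1)
    train, control = split_train_control(data)
    print(len(train), len(control), data[0]["label"])
```

Because every observation is emitted by an agent whose class is part of the simulation state, the target variable is known for each row, which is the property the abstract relies on when real labeled data cannot be collected.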

Key words
Information processing, data analysis, agent-based approach, AnyLogic.

DOI
10.31776/RTCJ.12204

Bibliographic description
Smirnov, A.A. and Kudriavtsev, A.M. (2024), "Automation of the marked-up data sets formation for machine learning based on simulation modeling", Robotics and Technical Cybernetics, vol. 12, no. 2, pp. 109-117, DOI: 10.31776/RTCJ.12204. (in Russian).

UDC identifier
004.852:004.94:519.876.5
