METHOD OF INTELLECTUAL ANALYSIS OF SHORT HIGH-DIMENSIONAL SAMPLES BASED ON BAGGING ENSEMBLE WITH DATA AUGMENTATION
DOI:
https://doi.org/10.31891/2219-9365-2025-83-19
Keywords:
small data, high-dimensional data, generalized regression neural network, data augmentation, ensemble learning, bagging, regression
Abstract
One of the persistent challenges in applying machine learning and statistical analysis in medicine is the effective processing of small data: datasets that contain a limited number of observations for practical, ethical, or biological reasons. In contrast to large-scale population studies or broad epidemiological databases, many real-world clinical scenarios involve small samples, such as individual patient data, rare diseases, early-stage studies, or specialized diagnostic procedures. As a result, researchers and clinicians often have to build accurate and robust models from incomplete, sparse, or highly imbalanced data, and these models may inform important clinical decisions. The development of efficient, reliable, and interpretable methods for processing small data is therefore not only a methodological necessity but also a practical requirement of modern medicine. One of the most common ways to partially address the problem of small-sample analysis is data augmentation: increasing the number of instances in the training set often improves model accuracy. However, with augmented data, a single modeling strategy is sometimes not enough, and combining augmentation with ensemble learning can lead to significant improvements in model robustness and performance.
This article develops a new method for intellectual analysis of short high-dimensional data samples in regression modeling problems, based on a bagging ensemble of artificial neural networks with an additional data augmentation procedure. The training algorithm and the results are described in detail. The method was applied to two medical problems: predicting the level of bone fragility in patients with osteoarthritis and predicting the percentage of body fat. In a comparison of the main performance metrics against baseline models, the proposed method demonstrated the best results for both problems. The developed bagging ensemble can be used when the amount of available data is limited and classical models do not provide the required accuracy.
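The general idea of a bagging ensemble with augmentation can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not specify the augmentation procedure, so Gaussian jitter is assumed here, and the ensemble members are plain generalized regression neural networks (chosen because the keywords mention them) written as kernel-weighted averages. All function names and parameters (`grnn_predict`, `augment`, `bagging_grnn_predict`, `sigma`, `copies`, `noise_scale`, `n_members`) are hypothetical.

```python
import numpy as np


def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum leaves the weight ratios unchanged but
    # guarantees at least one weight equals 1, which avoids division by zero.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)


def augment(X, y, rng, copies=2, noise_scale=0.05):
    """ASSUMED augmentation: append jittered copies of the inputs (targets kept)."""
    feature_std = X.std(axis=0)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        noise = rng.normal(0.0, noise_scale, size=X.shape) * feature_std
        X_parts.append(X + noise)
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)


def bagging_grnn_predict(X_train, y_train, X_query,
                         n_members=10, sigma=0.5, seed=0):
    """Average the predictions of GRNNs built on augmented bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    member_preds = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample rows with replacement
        X_b, y_b = augment(X_train[idx], y_train[idx], rng)
        member_preds.append(grnn_predict(X_b, y_b, X_query, sigma))
    return np.mean(member_preds, axis=0)
```

Calling `bagging_grnn_predict(X_train, y_train, X_query)` with a feature matrix of shape `(n, d)`, a target vector of length `n`, and a query matrix of shape `(m, d)` returns `m` predictions. In practice the features would typically be standardized first, and `sigma` would be tuned on held-out data, since a GRNN is sensitive to the kernel width.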
License
Copyright (c) 2025 Мирослав ГАВРИЛЮК

This work is licensed under a Creative Commons Attribution 4.0 International License.