STUDY OF METHODS FOR CONSTRUCTING DECISION TREES FOR THE IMPLEMENTATION OF THE RANDOM FOREST ALGORITHM IN THE MEDICAL FIELD
DOI: https://doi.org/10.31891/2219-9365-2025-81-5

Keywords: random forest, hypothyroidism, hyperthyroidism, decision trees, psychological disorders, classification, linear additive convolution, prediction

Abstract
Every year, machine learning reaches deeper into modern life, from entertainment services to difficult tasks aimed at improving how people live and work. Applying these analytical methods in medicine is especially important: the earlier a disease is diagnosed, the easier and more timely its treatment, and the more lives can be saved. This work addresses that goal by developing methods to prevent the development of psychological disorders among patients with hypothyroidism and hyperthyroidism. Earlier observations within this research showed that the random forest algorithm, which builds an ensemble of decision trees, is the most suitable way to predict such complications. A key question is therefore how to construct each individual tree, choosing among the alternatives ID3, CHAID, C4.5, CART, and XGBoost. All of them were compared using a linear additive convolution over such criteria as the type of tree-building algorithm, the data-splitting criterion, supported data types, handling of numerical data, the tree-pruning method, the tendency to overfit, the speed of the algorithm, the interpretability of the model, and typical areas of application. First, a table of decision-tree methods was filled in according to these features; then all qualitative indicators were converted into quantitative ones so that the convolution value could be computed for each alternative. According to the results of the experiment, the greedy CART algorithm is the best choice for building a decision-tree model: it is easy to interpret, fast, and least prone to overfitting, works with both numerical and categorical data, uses the Gini index to split the data into subsets when selecting the next attribute from the feature list, and supports pruning of its structure. The advantages and disadvantages of the chosen model of this multicriteria problem are also discussed after the experiment.
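The abstract describes ranking the tree-building alternatives by a linear additive convolution, i.e. a weighted sum of criterion scores after qualitative indicators are converted to numbers. Below is a minimal Python sketch of that scheme; the criteria names, weights, and scores are illustrative placeholders and are not the values used in the study.

```python
# Rank decision-tree algorithms with a linear additive convolution
# (weighted sum of normalized criterion scores).

criteria = [
    "split_criterion_quality",
    "handles_numeric_and_categorical",
    "pruning_support",
    "overfitting_resistance",
    "speed",
    "interpretability",
]

# Hypothetical weights, normalized to sum to 1.
weights = [0.20, 0.15, 0.15, 0.20, 0.15, 0.15]

# Hypothetical scores on a 0..1 scale, obtained after converting
# qualitative indicators (e.g. "low"/"medium"/"high") to numbers.
alternatives = {
    "ID3":     [0.6, 0.3, 0.2, 0.3, 0.8, 0.9],
    "CHAID":   [0.6, 0.7, 0.5, 0.5, 0.6, 0.7],
    "C4.5":    [0.7, 0.8, 0.7, 0.5, 0.6, 0.8],
    "CART":    [0.8, 0.9, 0.9, 0.7, 0.8, 0.9],
    "XGBoost": [0.9, 0.8, 0.6, 0.6, 0.5, 0.4],
}

def convolution(scores, weights):
    """Linear additive convolution: weighted sum of criterion scores."""
    return sum(w * s for w, s in zip(weights, scores))

# Sort alternatives by their convolution value, best first.
ranking = sorted(
    ((name, convolution(scores, weights)) for name, scores in alternatives.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

With weights and scores derived from the study's comparison table, the alternative with the highest convolution value (CART in the paper's experiment) is selected as the tree-building method for the random forest.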
License
Copyright (c) 2025 Нурал ГУЛІЄВ

This work is licensed under a Creative Commons Attribution 4.0 International License.