METHOD OF EXPANDING IMAGE CLASSIFICATION MODEL WITH TEXT SUPERVISION
DOI:
https://doi.org/10.31891/2219-9365-2025-81-51
Keywords:
machine learning, artificial intelligence, natural language processing, image classification, neural networks, computer vision, information technology
Abstract
This paper describes a method for zero-shot image classification in an unbounded domain. The goal of the research is to expand the set of classes supported by an image classifier pre-trained on a large amount of data, without any additional training of that classifier. A further requirement is that the method be applicable to any conventional image classifier, regardless of its architecture. The method draws on additional information about the objects in images, obtained from image captions and textual class descriptions; this text data is collected from open sources. An experiment demonstrates that part of the weights of the image classifier can be generated by retraining a separate natural language processing model, which adds support for new image classes. To this end, the classifier is treated as a combination of an image encoder and a classifier layer that converts the vector representation of an image into class probabilities. Given the mathematical model of the classifier layer, the task of extending the model without further training reduces to the task of generating a vector of a predefined size. This reduction makes it possible to train a language model to generate weights for the image classifier and thereby add new classes. The resulting model reaches an average F-score of 0.731 on the new classes, compared with an F-score of 0.844 on classes trained by the conventional method. Additionally, generating multiple weight vectors and using their average for classification is found to improve classification quality compared with using individual generated vectors.
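The reduction described in the abstract can be illustrated with a minimal sketch. It assumes the classifier layer is a bias-free linear layer followed by softmax, so that each class corresponds to one row of a weight matrix; the abstract does not specify this, and the function names `add_class` and `classify` are hypothetical. The "generated" vectors are random stand-ins for the output of the language model, which is not reproduced here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def add_class(W, generated_vectors):
    """Append one new class to a linear classifier layer.

    W: array of shape (num_classes, feature_dim), one row per existing class.
    generated_vectors: array of shape (k, feature_dim), k candidate weight
        vectors for the new class (assumed to come from a language model).
    The candidates are averaged into a single row, as the abstract suggests.
    """
    w_new = np.mean(generated_vectors, axis=0)
    return np.vstack([W, w_new[None, :]])

def classify(features, W):
    # features: (batch, feature_dim) image-encoder outputs (assumed given).
    return softmax(features @ W.T)

# Illustrative data only: random numbers, not results from the paper.
rng = np.random.default_rng(0)
feature_dim = 8
W = rng.normal(size=(3, feature_dim))          # 3 conventionally trained classes
generated = rng.normal(size=(5, feature_dim))  # stand-in for 5 generated vectors
W_ext = add_class(W, generated)                # now 4 classes, encoder untouched
probs = classify(rng.normal(size=(2, feature_dim)), W_ext)
```

In this sketch the image encoder and the existing rows of `W` are never modified; supporting a new class only appends one vector of size `feature_dim`, which is why the problem becomes one of vector generation.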
License
Copyright (c) 2025 Дмитро ДАШЕНКОВ, Кирило СМЕЛЯКОВ

This work is licensed under a Creative Commons Attribution 4.0 International License.