MULTIMODAL NEURAL NETWORK METHOD FOR ANALYZING LINGUISTIC VARIABILITY IN CONTACT CENTER DIALOGUES BASED ON ACOUSTIC AND LEXICAL FEATURES
DOI:
https://doi.org/10.31891/2219-9365-2026-86-21
Keywords:
multimodal learning, deep neural networks, speech analysis, contact center dialogue, linguistic variability, acoustic features
Abstract
The paper proposes a multimodal neural network method for analyzing linguistic variability in contact center dialogues by integrating acoustic and lexical-morphological features. The relevance of the study stems from the need to automate the monitoring of Ukrainian-language communication quality in real conversational environments, which are characterized by mixed speech patterns and emotionally colored expressions that degrade the performance of conventional automatic speech recognition systems. The proposed approach is based on a hybrid multimodal architecture that combines a Transformer encoder for context-aware textual representation learning with convolutional neural encoders for extracting informative acoustic characteristics, including Mel-frequency cepstral coefficients and spectral features. A unified feature fusion mechanism forms joint embeddings that reflect phonetic variability, lexical deviations, and discourse dynamics. To quantify deviations from normative speech, two formalized evaluation indicators are introduced, the Linguistic Anomaly Coefficient and the Linguistic Cleanliness Index, which enable real-time assessment of linguistic variability and communication quality. Experimental validation was conducted on a multimodal corpus comprising 50 hours of dialogue recordings collected from contact centers, interviews, and media interactions. The results demonstrate a statistically significant correlation between the degree of mixed speech usage and automatic speech recognition performance measured by Word Error Rate. A practical visualization tool, the “linguistic cleanliness map,” was developed to represent temporal changes in dialogue quality through confidence-based color zoning.
The scientific contribution of the study lies in the development of a multimodal neural network framework for integrated acoustic-textual analysis of linguistic variability and in the introduction of quantitative indicators enabling automated evaluation of speech quality in service communication environments.
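The relationship between mixed speech usage and Word Error Rate mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of Word Error Rate (word-level edit distance divided by the number of reference words) and a plain Pearson correlation coefficient; the abstract does not state which correlation measure or text normalization the authors used, so both are assumptions here. The function names `word_error_rate` and `pearson` are hypothetical and not taken from the paper.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: whitespace tokenization, no case folding or punctuation
    stripping. A real evaluation would normalize the text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In use, one would compute `word_error_rate` per dialogue segment against a manual transcript, pair each value with that segment's share of mixed-speech tokens (however that share is annotated), and pass the two lists to `pearson`. Testing the significance of the resulting coefficient requires an additional step that this sketch does not cover.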
License
Copyright (c) 2026 Маким ОВЧАРЕНКО, Віта КАШТАН

This work is licensed under a Creative Commons Attribution 4.0 International License.


