MULTIMODAL NEURAL NETWORK METHOD FOR ANALYZING LINGUISTIC VARIABILITY IN CONTACT CENTER DIALOGUES BASED ON ACOUSTIC AND LEXICAL FEATURES

Authors

DOI:

https://doi.org/10.31891/2219-9365-2026-86-21

Keywords:

multimodal learning, deep neural networks, speech analysis, contact center dialogue, linguistic variability, acoustic features

Abstract

The paper proposes a multimodal neural network method for analyzing linguistic variability in contact center dialogues that integrates acoustic and lexical-morphological features. The study is motivated by the need to automate quality monitoring of Ukrainian-language communication in real conversational settings, where mixed speech patterns and emotionally colored expressions degrade the performance of conventional automatic speech recognition (ASR) systems. The proposed approach uses a hybrid multimodal architecture that combines a Transformer encoder, which learns context-aware text representations, with convolutional neural encoders that extract informative acoustic characteristics, including Mel-frequency cepstral coefficients (MFCCs) and spectral features. A unified feature fusion mechanism produces joint embeddings that reflect phonetic variability, lexical deviations, and discourse dynamics. To quantify deviations from normative speech, two formalized evaluation indicators are introduced: the Linguistic Anomaly Coefficient and the Linguistic Cleanliness Index. Together they enable real-time assessment of linguistic variability and communication quality. Experimental validation was conducted on a multimodal corpus of 50 hours of dialogue recordings collected from contact centers, interviews, and media interactions. The results show a statistically significant correlation between the degree of mixed speech usage and ASR performance as measured by Word Error Rate (WER). A practical visualization tool, the “linguistic cleanliness map,” was developed to display temporal changes in dialogue quality through confidence-based color zoning.
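The abstract relates the degree of mixed speech to ASR performance measured by Word Error Rate. As a hedged illustration only (not the authors' implementation), the sketch below computes WER in its standard form, (S + D + I) / N, using a word-level Levenshtein distance, and then computes a Pearson correlation between a per-dialogue mixed-speech ratio and WER. The function names, example transcripts, and mixed-speech ratios are hypothetical; the abstract does not state how the degree of mixed speech is measured or which correlation statistic was used.

```python
"""Hypothetical sketch: per-dialogue WER and its correlation with a
mixed-speech ratio. All example data below are toy placeholders."""
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0 = empty reference).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("добрий день чим можу допомогти",
                          "добрий день чим можу помогти"))  # prints 0.2
    # Toy per-dialogue values (hypothetical, not from the study's corpus).
    mixed_ratio = [0.05, 0.10, 0.20, 0.35]
    wer_values = [0.08, 0.11, 0.19, 0.30]
    print(pearson_r(mixed_ratio, wer_values))
```

In the usage example the hypothesis replaces "допомогти" with "помогти", which counts as one substitution out of five reference words. Testing whether such a correlation is statistically significant would additionally require a test statistic and p-value, which this sketch leaves out.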

The scientific contribution of the study lies in the development of a multimodal neural network framework for integrated acoustic-textual analysis of linguistic variability and in the introduction of quantitative indicators enabling automated evaluation of speech quality in service communication environments.

Published

2026-05-31

How to Cite

Ovcharenko, M., & Kashtan, V. (2026). MULTIMODAL NEURAL NETWORK METHOD FOR ANALYZING LINGUISTIC VARIABILITY IN CONTACT CENTER DIALOGUES BASED ON ACOUSTIC AND LEXICAL FEATURES. MEASURING AND COMPUTING DEVICES IN TECHNOLOGICAL PROCESSES, (2), 166–173. https://doi.org/10.31891/2219-9365-2026-86-21