What are the areas of active research, patents and publications at Agara?

Spoken Language Understanding (SLU) from speech directly:

SLU is needed because accurately transcribing the customer's entire speech with a traditional ASR system is error-prone: the environment may be noisy, the quality of voice transmission over the phone may be poor, and so on. More importantly, conversational systems care more about intents and entities than about full transcripts. Extracting these directly from speech is easier because, for a particular domain and industry, the set of intents and the patterns of entities form a restricted set.

Our research aims to build SLU systems that are tuned specifically to accent, language, and (more importantly) domain/industry, which should make them highly accurate.

One idea is to use ‘speaker embeddings’: internal representations from deep learning models that encode speaker characteristics such as accent and gender. When an SLU model learns speaker embeddings, it also learns how to use them to interpret speech better.
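
As a rough illustration of how a speaker embedding can condition an SLU model, the sketch below concatenates an utterance feature vector with a speaker embedding before a linear intent classifier. Concatenation is only one common way to condition a model and is an assumption here, not a description of our actual architecture. The intent names, vector sizes, and weights are all hypothetical and hand-picked for the example; in a real system they would be learned.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities in a numerically stable way."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_intent(utterance_features, speaker_embedding, weights, intents):
    """Score each intent from utterance features concatenated with a speaker embedding."""
    # Conditioning step: the classifier sees both what was said and who said it.
    x = list(utterance_features) + list(speaker_embedding)
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(intents)), key=probs.__getitem__)
    return intents[best], probs

# Hypothetical toy values: 2 utterance features + 2 speaker dimensions.
INTENTS = ["cancel_order", "track_order"]
WEIGHTS = [
    [1.0, 0.0, 1.0, -1.0],   # row for "cancel_order"
    [0.0, 1.0, -1.0, 1.0],   # row for "track_order"
]
ambiguous_utterance = [0.5, 0.5]   # acoustically ambiguous on its own
speaker_a = [1.0, 0.0]             # e.g. an embedding for one accent
speaker_b = [0.0, 1.0]             # e.g. an embedding for another accent

intent_a, _ = classify_intent(ambiguous_utterance, speaker_a, WEIGHTS, INTENTS)
intent_b, _ = classify_intent(ambiguous_utterance, speaker_b, WEIGHTS, INTENTS)
print(intent_a, intent_b)
```

In this toy setup the same ambiguous utterance features resolve to different intents depending on the speaker embedding, which is the effect the model is meant to learn from data.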

Another angle is transfer learning for speech. We take ASR models that were pre-trained on large public speech datasets in the accents available (primarily American English) and fine-tune them with a small set of accent-specific English examples (South-east Asian, British, etc.) so that they understand these accents well too.

We are also working on reducing latency. Using GPUs and parallelization techniques, we want to invoke multiple ASR / SLU modules simultaneously to extract the various intents and entities.
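
The idea of invoking several modules at once can be sketched with Python's standard thread pool. This is a minimal sketch under stated assumptions: the three module functions, their names, and their return values are hypothetical stand-ins, and the sleep calls only simulate inference time. Real modules would call models, possibly on a GPU.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for SLU modules; each would wrap a model call in practice.
def detect_intent(audio: bytes) -> dict:
    time.sleep(0.05)  # simulated inference latency
    return {"intent": "check_order_status"}

def extract_order_id(audio: bytes) -> dict:
    time.sleep(0.05)
    return {"order_id": "A1234"}

def extract_date(audio: bytes) -> dict:
    time.sleep(0.05)
    return {"date": None}

MODULES = [detect_intent, extract_order_id, extract_date]

def run_parallel(audio: bytes) -> dict:
    """Run every module on the same audio concurrently and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        # pool.map submits all calls up front, so the modules overlap in time.
        for result in pool.map(lambda module: module(audio), MODULES):
            merged.update(result)
    return merged

result = run_parallel(b"fake-audio-bytes")
print(result)
```

With this arrangement the total wait is governed by the slowest module rather than by the sum of all modules, provided the underlying calls release the interpreter lock while they run (as sleeping, network calls, and most GPU inference libraries do).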

Conversation (text):

One area is adapting the bot’s response to the context of the conversation, the customer’s emotion/mood, and the customer’s demographics (age, geography, etc.). Current conversational bots are very poor at this, since most are pre-programmed to reply in a static fashion. The broad areas of this research are ‘Controlled Text Generation’ and ‘Style Transfer’.

Another area is engaging the customer in natural conversation, which includes answering questions in the context of the conversation. In NLP/NLG research this falls under Question Answering Systems, but our focus is on Conversational Question Answering. This involves answering customer questions from external knowledge sources such as FAQs, which may be loosely structured and are not written conversationally. The answers given to the customer, however, should be conversational and adapted to fit the context. An example of work in this area is https://arxiv.org/abs/2006.03533.
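
To make the flow concrete, here is a deliberately simple sketch of conversational question answering over an FAQ: it uses the conversation context to find the most relevant FAQ entry and then wraps the non-conversational answer in a conversational reply. Everything in it is an assumption for illustration only: the FAQ entries, the word-overlap retrieval, and the fixed reply template are hypothetical, and a real system would use learned retrieval and generation models instead.

```python
import re

# Hypothetical FAQ entries, written in a non-conversational style.
FAQS = [
    ("What is the return policy?", "Items may be returned within 30 days of delivery."),
    ("How long does shipping take?", "Standard shipping takes 5 to 7 business days."),
]

def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, context):
    """Pick the FAQ entry sharing the most words with the question plus its context."""
    query = tokens(question) | tokens(context)

    def overlap(entry):
        return len(query & tokens(entry[0] + " " + entry[1]))

    return max(FAQS, key=overlap)

def answer(question, context, customer_name):
    """Wrap the retrieved FAQ answer in a conversational reply (toy template)."""
    _, faq_answer = retrieve(question, context)
    return f"Sure, {customer_name}. {faq_answer} Is there anything else I can help with?"

# "it" is only resolvable through the earlier turn, which mentions shipping.
reply = answer(
    question="How long will it take?",
    context="I am asking about shipping my order",
    customer_name="Priya",
)
print(reply)
```

The question on its own never mentions shipping; the earlier turn supplies that, which is why the context is included in the retrieval query.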

Reference links:

“Transforming” Delete, Retrieve, Generate Approach for Controlled Text Style Transfer

Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

https://www.aclweb.org/anthology/thumb/D19-1322.jpg

The Generative Style Transformer

This post explains our paper on style transfer, “Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer”…
