Research and build chatbot to support users in banking - 3

NLU processes user messages using a pipeline where the following processing steps are configured sequentially:


[Figure content: Domain Classification → Intent Classification → Entity Extraction]

Figure 1.3: Main processing steps in the NLU pipeline [1]

In this pipeline, you can customize components ranging from data preprocessing and language modeling to the algorithms used for word segmentation and entity extraction.

For details on the processing steps, see Figure 1.4, in which the entity extraction step corresponds to the slot filling step of Figure 1.3.

[Figure content: pipelined entity extraction — the input "12 month loan interest rate?" passes through a Tokenizer, Chunker, Part-of-Speech Tagger, and Named Entity Recognition to produce {"loan": "loan", "term": "12 months"}; pipelined intent classification — the same input passes through Vectorization and Intent Classification to produce {"intent": "interest"}.]

Figure 1.4: Processing steps in NLU [2]

To classify user intent, we need language modeling, which is the representation of language in a vector form that can be understood by machines (vectorization). The most popular method today is word embedding. Word embedding is the general name for a set of language models and feature learning methods in natural language processing (NLP), where words or phrases from the vocabulary are mapped to real-number vectors. Conceptually, it involves mathematically embedding from a space with one dimension for each word into a continuous vector space with much lower dimensions. Some popular representation methods such as Word2Vec, GloVe or the newer FastText will be introduced in the following section.
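The idea of word embedding can be sketched with a toy lookup table: each word maps to a dense real-valued vector, and semantically related words end up with similar vectors. The 4-dimensional vectors below are invented for illustration; real models such as Word2Vec, GloVe, or FastText learn vectors of 100-300 dimensions from large corpora.

```python
import math

# Toy embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "loan":   [0.9, 0.1, 0.0, 0.2],
    "credit": [0.8, 0.2, 0.1, 0.3],
    "hello":  [0.0, 0.9, 0.1, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related banking terms end up closer than unrelated ones.
print(cosine(EMBEDDINGS["loan"], EMBEDDINGS["credit"]))  # high (~0.98)
print(cosine(EMBEDDINGS["loan"], EMBEDDINGS["hello"]))   # low  (~0.11)
```

In a trained model, this similarity structure is what lets the classifier treat "loan" and "credit" as related even when one of them never appeared in the training utterances.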

After language modeling, which includes training on the bot's input data, determining user intent from a user question based on the trained set is the intent classification (text classification) step. In this step, we can use techniques such as Naive Bayes, Decision Trees (Random Forest), Support Vector Machines (SVM), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory networks (LSTM, Bi-LSTM). Most current chatbots apply deep learning models such as RNN and LSTM to classify user intent. The biggest challenge for chatbots in this step is identifying multiple intents in a single user statement. For example, if you say "hello, check my account balance", the bot must identify two intents, "hello" and "check balance", in the statement. If the bot can understand and answer this type of question, interacting with it will feel more natural.
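Of the techniques listed above, Naive Bayes is the simplest to sketch end to end. The following is a minimal multinomial Naive Bayes intent classifier with Laplace smoothing; the training utterances and intent labels are invented for illustration, not taken from a real banking dataset.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative training set (hypothetical utterances and intents).
TRAIN = [
    ("hello good morning", "greeting"),
    ("hi there", "greeting"),
    ("check my account balance", "check_balance"),
    ("how much money is in my account", "check_balance"),
    ("what is the 12 month loan interest rate", "ask_interest"),
    ("loan interest rate please", "ask_interest"),
]

def train_nb(data):
    """Count class frequencies and per-class word frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the intent with the highest log-probability score."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            # Laplace smoothing: unseen words get count 0 + 1.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb(TRAIN)
print(classify("12 month loan interest rate", *model))  # ask_interest
```

A production system would replace this with an RNN/LSTM classifier over embedded word vectors, as the text notes, but the train/score structure is the same.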

Next is the extraction of information in the user dialogue. The information to be extracted is usually in the form of numbers, strings or times and they must be declared and trained in advance.
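Because the entities described here are numbers, strings, or times with predictable shapes, a first version can be sketched with regular expressions; the patterns and entity names below are hypothetical examples, not from a real system.

```python
import re

# Hypothetical patterns for the numeric/time entities mentioned above.
PATTERNS = {
    "term":   re.compile(r"\b(\d+)\s*(month|year)s?\b", re.IGNORECASE),
    "amount": re.compile(r"\b(\d[\d,]*)\s*(USD|VND|dollars?)\b", re.IGNORECASE),
}

def extract_entities(text):
    """Return a dict mapping entity name -> matched surface string."""
    entities = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            entities[name] = m.group(0)
    return entities

print(extract_entities("12 month loan interest rate?"))  # {'term': '12 month'}
```

Rule-based extraction like this covers the declared, trained-in-advance entity types; open-ended entities require the NER step shown in Figure 1.4.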

Word segmentation (tokenization): word segmentation is the process of determining the boundaries of words in a sentence — identifying which single words, compound words, etc. the sentence contains. For language processing, in order to determine the grammatical structure of a sentence or the part of speech of a word, it is essential to first determine which words are in the sentence. This problem seems simple to humans, but for computers it is very difficult to solve. Many languages separate words by spaces, but Vietnamese has many compound words and phrases; for example, the compound word "tài khoản" (account) is formed from the two syllables "tài" and "khoản". A number of algorithms address this problem, such as Longest Matching / Maximum Matching, Hidden Markov Models (HMM), and Conditional Random Field (CRF) models.
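The greedy longest-matching idea can be sketched as follows: at each position, take the longest dictionary entry that matches the upcoming syllables. The dictionary here is a toy sketch; real Vietnamese segmenters use large lexicons plus HMM/CRF models to resolve ambiguities.

```python
# Toy lexicon of Vietnamese words (compound words are space-joined syllables).
DICTIONARY = {"tài khoản", "lãi suất", "số dư", "tài", "khoản", "lãi", "suất"}
MAX_WORD_LEN = 2  # longest dictionary entry, counted in syllables

def max_match(sentence):
    """Greedy longest-matching word segmentation over syllables."""
    syllables = sentence.split()
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until a match is found;
        # an unknown single syllable is kept as its own token.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if candidate in DICTIONARY or n == 1:
                tokens.append(candidate)
                i += n
                break
    return tokens

print(max_match("lãi suất tài khoản"))  # ['lãi suất', 'tài khoản']
```

Greedy matching fails on genuinely ambiguous sequences, which is why statistical models such as HMM and CRF are layered on top in practice.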

1.3.1 Determining user intent



Figure 1.5: Model of steps to determine intention

The user intent classification system has several basic steps:

Data preprocessing

Feature extraction

Model training

Classification


The data preprocessing step "cleans" the data: removing redundant information, standardizing the data, correcting misspelled words, standardizing abbreviations, etc. This step plays an important role in the chatbot system: if the input data is well processed here, the accuracy and intelligence of the bot will increase.
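The cleaning operations listed above can be sketched as a small normalization function; the abbreviation map is a hypothetical example — a real system would maintain a much larger, domain-specific list.

```python
import re

# Hypothetical abbreviation map (illustrative only).
ABBREVIATIONS = {"acct": "account", "bal": "balance", "pls": "please"}

def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)

print(preprocess("Pls check my acct BAL!!!"))
# → please check my account balance
```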

Next is the feature extraction step (feature extraction or feature engineering) from the cleaned data. In the traditional machine learning model (before the deep learning model was widely applied), the feature extraction step greatly affects the accuracy of the classification model. To extract good features, we need to analyze the data quite meticulously and also need expert knowledge in each specific application domain.
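The simplest traditional feature extraction is a bag-of-words count vector over a fixed vocabulary, sketched below; real systems typically add TF-IDF weighting or n-gram features on top of this.

```python
from collections import Counter

def build_vocab(corpus):
    """Map each word seen in the corpus to a fixed feature index."""
    vocab = {}
    for text in corpus:
        for w in text.split():
            vocab.setdefault(w, len(vocab))
    return vocab

def bow_vector(text, vocab):
    """Bag-of-words count vector over the fixed vocabulary."""
    counts = Counter(text.split())
    return [counts.get(w, 0) for w in vocab]

corpus = ["check account balance", "loan interest rate"]
vocab = build_vocab(corpus)
print(bow_vector("account balance balance", vocab))  # [0, 1, 2, 0, 0, 0]
```

Each position in the vector is one extracted feature; these vectors are exactly the input that the training step below consumes.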

The training step takes the extracted features as input and applies machine learning algorithms to learn a classification model. The learned model can be a set of classification rules (when using a decision tree) or a weight vector over the extracted features (as in logistic regression, SVM, or neural network models).

Once we have an intent classification model, we can use it to classify a new sentence. This sentence also goes through preprocessing and feature extraction steps, then the classification model will determine a “score” for each intent in the set of intents and give the highest-scoring intent.

To provide accurate support, chatbots need to identify the user's intent. Identifying the user's intent will determine how the next conversation between the person and the chatbot will take place. Therefore, if the user's intent is incorrectly identified, the chatbot will give incorrect, out-of-context responses. At that time, the user may feel annoyed and not return to use the system. The problem of identifying user intent therefore plays a very important role in the chatbot system.

For a closed application domain, we can limit the number of user intentions to a finite set of predefined intentions that are relevant to the tasks that the chatbot can support. With this limit, the problem of determining user intentions can be reduced to the problem of text classification. Given a user sentence as input, the classification system will determine the intent corresponding to that sentence in the set of predefined intents.

To build an intent classification model, we need a training dataset that includes different expressions for each intent. For example, the same intent asking about a user's account balance can use the following expressions:

Account information?

Account lookup?

Account balance?

How much money is in the account?


It can be said that the step of creating training data for the intent classification problem is one of the most important tasks when developing a chatbot system and greatly affects the quality of the chatbot system's products later. This task also requires a lot of time and effort from the chatbot developer.

1.4 Dialogue Management (DM)


In long conversations between humans and chatbots, the chatbot needs to remember contextual information and manage conversation states (dialogue state). Dialogue management is therefore important to ensure that the exchange between human and machine is smooth.

The function of the dialogue management component is to receive input from the NLU component, manage dialogue states, dialogue contexts, and transmit output to the Natural Language Generation (NLG) component.


Figure 1.6: State management model and action decision in conversation [2]

The dialogue state is stored, and the dialogue policy uses it to decide the bot's next action in the dialogue scenario; alternatively, the action may depend only on the immediately preceding dialogue state.

For example, the dialogue management module in a chatbot serving airline ticket booking needs to know when the user has provided enough information for booking to create a ticket to the system or when it needs to reconfirm the information entered by the user. Currently, chatbot products often use the Finite State Automata (FSA) model, the Frame-based (Slot Filling) model, or a combination of these two models. Some new research directions apply the ANN model to dialogue management to help bots become smarter, see section 2.5 for details.

1.4.1 Finite State Automata (FSA) model


Figure 1.7: Conversation management based on FSA finite state machine model

The FSA model is the simplest way to manage a dialogue. Consider, for example, the customer care system of a telecommunications company serving customers who complain about slow Internet. The task of the chatbot is to ask for the customer's name, phone number, the name of the Internet package the customer is using, and the customer's actual Internet speed. Figure 1.7 illustrates a dialogue management model for such a customer care chatbot. The states of the FSA correspond to the questions that the dialogue manager asks the user. The arcs connecting the states correspond to the actions that the chatbot will perform; these actions depend on the user's response to the questions. In the FSA model, the chatbot is the side that directs (takes the initiative in) the conversation.

The advantage of the FSA model is that it is simple and the chatbot will predetermine the desired response format from the user. However, the FSA model is not really suitable for complex chatbot systems or when the user provides different information in the same conversation. In the chatbot example above, when the user provides both name and phone number at the same time, if the chatbot continues to ask for the phone number, the user may feel annoyed.
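The customer-care scenario above can be sketched as a minimal FSA dialogue manager: each state asks one question, and the answer moves the machine to the next state. State names and prompts are illustrative, not from a real system; the answers are supplied as a dict here instead of live user input.

```python
# FSA transition table: state -> (question to ask, next state).
TRANSITIONS = {
    "start":       ("What is your name?",                 "ask_phone"),
    "ask_phone":   ("What is your phone number?",         "ask_package"),
    "ask_package": ("Which Internet package do you use?", "ask_speed"),
    "ask_speed":   ("What speed are you getting?",        "done"),
}

def run_dialogue(answers):
    """Step through the FSA, collecting exactly one answer per state."""
    state, collected = "start", {}
    while state != "done":
        prompt, next_state = TRANSITIONS[state]
        collected[state] = answers[state]  # in a real bot: input(prompt)
        state = next_state
    return collected

answers = {"start": "An", "ask_phone": "0901234567",
           "ask_package": "Fiber30", "ask_speed": "5 Mbps"}
print(run_dialogue(answers))
```

The rigidity criticized in the text is visible in the code: even if the user's first utterance contained both name and phone number, the machine would still walk through `ask_phone` and ask again.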

1.4.2 Frame-based model


The Frame-based model (or Form-based model) can solve the problems that the FSA model encounters. The Frame-based model relies on predefined frames to guide the conversation. Each frame includes the information slots to be filled and the corresponding questions that the dialogue manager asks the user. This model allows the user to fill in many different slots of the frame in a single utterance. Figure 1.8 shows an example of a frame for a chatbot.



Figure 1.8: Frame for chatbot asking for customer information

The Frame-based dialogue management component will ask questions to customers, fill in the slots based on the information provided by the customer until it has enough necessary information. When the user answers multiple questions at the same time, the system will have to fill in the corresponding slots and remember to not ask questions that have already been answered.
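The fill-and-skip behavior just described can be sketched as follows; the slot names, questions, and extracted values are hypothetical, and real entity extraction would come from the NLU component rather than a prepared dict.

```python
# Hypothetical frame: slots to fill and the question for each empty slot.
FRAME = {"name": None, "phone": None, "package": None}
QUESTIONS = {"name": "What is your name?",
             "phone": "What is your phone number?",
             "package": "Which package do you use?"}

def fill_slots(frame, extracted):
    """Copy newly extracted values into empty slots of the frame."""
    for slot, value in extracted.items():
        if slot in frame and frame[slot] is None:
            frame[slot] = value
    return frame

def next_question(frame):
    """Ask only about slots that are still missing."""
    for slot, value in frame.items():
        if value is None:
            return QUESTIONS[slot]
    return None  # frame complete

# The user gives name and phone in one utterance; the bot skips those questions.
frame = fill_slots(dict(FRAME), {"name": "An", "phone": "0901234567"})
print(next_question(frame))  # Which package do you use?
```

Unlike the FSA sketch, nothing here forces a fixed question order: whichever slots arrive filled are simply never asked about again.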

In complex application domains, a conversation may have multiple frames. The problem for chatbot developers is how to know when to switch between frames. A common approach to managing the transfer of control between frames is to define production rules. These rules are based on a number of factors such as the most recent conversational statement or question asked by the user.

1.5 Natural Language Generation (NLG) Component

NLG is the response generation component of a chatbot. It relies on mapping the actions of the conversation manager into natural language to respond to the user.



There are four commonly used mapping methods: Template-based, Plan-based, Class-based, and RNN-based.

1.5.1 Template-based NLG


This answer mapping method uses predefined bot answer templates to generate answers.



Figure 1.9: Language generation method based on response sample set [1]

- Advantages: simple and easy to control; suitable for closed-domain problems.

- Disadvantages: defining the rules is time-consuming, and the answers are not natural. For large systems, the rules are difficult to control, making the system hard to develop and maintain.
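Template-based mapping can be sketched directly: the dialogue manager's action plus the extracted slot values select and fill a canned template. The action names and templates below are invented for illustration.

```python
# Hypothetical response templates keyed by dialogue action.
TEMPLATES = {
    "inform_interest": "The interest rate for a {term} loan is {rate}.",
    "inform_balance":  "Your account balance is {balance}.",
    "greeting":        "Hello! How can I help you today?",
}

def generate(action, **slots):
    """Fill the template for the given dialogue action with slot values."""
    return TEMPLATES[action].format(**slots)

print(generate("inform_interest", term="12 month", rate="6.5%/year"))
# → The interest rate for a 12 month loan is 6.5%/year.
```

The disadvantages listed above show up immediately: every new action needs a hand-written template, and the wording never varies between turns.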

1.5.2 Plan-based NLG


Figure 1.10: Plan-based language generation method [1]

- Advantages: can model complex language structures.

- Disadvantages: design-heavy; requires a clearly defined knowledge domain.
