Research and build chatbot to support users in banking - 3

NLU processes user messages using a pipeline where the following processing steps are configured sequentially:


[Figure content: Domain Classification → Intent Classification → Entity Extraction]

Figure 1.3: Main processing steps in the NLU pipeline [1]

In this pipeline, you can customize components ranging from data preprocessing and language modeling to the algorithms used for word segmentation and entity extraction.

For details on the processing steps, see Figure 1.4, in which the entity extraction step corresponds to the slot filling step of Figure 1.3.

[Figure content: pipelined entity extraction — the input "12 month loan interest rate?" passes through a Tokenizer, Chunker, Part-of-Speech Tagger, and Named Entity Recognition to produce {"loan": "loan", "term": "12 months"}; pipelined intent classification — the same input passes through Vectorization and Intent Classification to produce {"intent": "interest"}.]

Figure 1.4: Processing steps in NLU [2]

To classify user intent, we need language modeling, which is the representation of language in a vector form that can be understood by machines (vectorization). The most popular method today is word embedding. Word embedding is the general name for a set of language models and feature learning methods in natural language processing (NLP), where words or phrases from the vocabulary are mapped to real-number vectors. Conceptually, it involves mathematically embedding from a space with one dimension for each word into a continuous vector space with much lower dimensions. Some popular representation methods such as Word2Vec, GloVe or the newer FastText will be introduced in the following section.
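The idea of word embedding can be sketched with a toy lookup table: each word maps to a dense real-valued vector, and semantically related words end up with similar vectors. The 4-dimensional vectors below are invented for illustration; real models such as Word2Vec, GloVe, or FastText learn vectors of 100-300 dimensions from large corpora.

```python
import math

# Toy embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "loan":   [0.9, 0.1, 0.0, 0.2],
    "credit": [0.8, 0.2, 0.1, 0.3],
    "hello":  [0.0, 0.9, 0.1, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related banking terms end up closer than unrelated ones.
print(cosine(EMBEDDINGS["loan"], EMBEDDINGS["credit"]))  # high (~0.98)
print(cosine(EMBEDDINGS["loan"], EMBEDDINGS["hello"]))   # low  (~0.11)
```

In a trained model, this similarity structure is what lets the classifier treat "loan" and "credit" as related even when one of them never appeared in the training utterances.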

After language modeling, which includes training on the bot's input data, determining user intent from a user question based on the trained set is the intent classification (text classification) step. In this step, we can use techniques such as Naive Bayes, Decision Trees (Random Forest), Support Vector Machines (SVM), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory networks (LSTM, Bi-LSTM). Most current chatbots apply deep learning models such as RNN and LSTM to classify user intent. The biggest challenge for chatbots in this step is identifying multiple intents in a single user statement. For example, if you say "hello, check my account balance", the bot must identify two intents, "hello" and "check balance", in the statement. If the bot can understand and answer this type of question, interacting with it will feel more natural.
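Of the techniques listed above, Naive Bayes is the simplest to sketch end to end. The following is a minimal multinomial Naive Bayes intent classifier with Laplace smoothing; the training utterances and intent labels are invented for illustration, not taken from a real banking dataset.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative training set (hypothetical utterances and intents).
TRAIN = [
    ("hello good morning", "greeting"),
    ("hi there", "greeting"),
    ("check my account balance", "check_balance"),
    ("how much money is in my account", "check_balance"),
    ("what is the 12 month loan interest rate", "ask_interest"),
    ("loan interest rate please", "ask_interest"),
]

def train_nb(data):
    """Count class frequencies and per-class word frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the intent with the highest log-probability score."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            # Laplace smoothing: unseen words get count 0 + 1.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb(TRAIN)
print(classify("12 month loan interest rate", *model))  # ask_interest
```

A production system would replace this with an RNN/LSTM classifier over embedded word vectors, as the text notes, but the train/score structure is the same.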

Next is the extraction of information in the user dialogue. The information to be extracted is usually in the form of numbers, strings or times and they must be declared and trained in advance.
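Because the entities described here are numbers, strings, or times with predictable shapes, a first version can be sketched with regular expressions; the patterns and entity names below are hypothetical examples, not from a real system.

```python
import re

# Hypothetical patterns for the numeric/time entities mentioned above.
PATTERNS = {
    "term":   re.compile(r"\b(\d+)\s*(month|year)s?\b", re.IGNORECASE),
    "amount": re.compile(r"\b(\d[\d,]*)\s*(USD|VND|dollars?)\b", re.IGNORECASE),
}

def extract_entities(text):
    """Return a dict mapping entity name -> matched surface string."""
    entities = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            entities[name] = m.group(0)
    return entities

print(extract_entities("12 month loan interest rate?"))  # {'term': '12 month'}
```

Rule-based extraction like this covers the declared, trained-in-advance entity types; open-ended entities require the NER step shown in Figure 1.4.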

Word segmentation (tokenization): word segmentation is the process of determining the boundaries of words in a sentence — identifying which single words, compound words, etc. the sentence contains. For language processing, in order to determine the grammatical structure of a sentence or the part of speech of a word, it is essential to first determine which words are in the sentence. This problem seems simple to humans, but for computers it is very difficult to solve. Many languages separate words by spaces, but Vietnamese has many compound words and phrases; for example, the compound word "tài khoản" (account) is formed from the two syllables "tài" and "khoản". A number of algorithms address this problem, such as Longest Matching / Maximum Matching, Hidden Markov Models (HMM), and Conditional Random Field (CRF) models.
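The greedy longest-matching idea can be sketched as follows: at each position, take the longest dictionary entry that matches the upcoming syllables. The dictionary here is a toy sketch; real Vietnamese segmenters use large lexicons plus HMM/CRF models to resolve ambiguities.

```python
# Toy lexicon of Vietnamese words (compound words are space-joined syllables).
DICTIONARY = {"tài khoản", "lãi suất", "số dư", "tài", "khoản", "lãi", "suất"}
MAX_WORD_LEN = 2  # longest dictionary entry, counted in syllables

def max_match(sentence):
    """Greedy longest-matching word segmentation over syllables."""
    syllables = sentence.split()
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until a match is found;
        # an unknown single syllable is kept as its own token.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if candidate in DICTIONARY or n == 1:
                tokens.append(candidate)
                i += n
                break
    return tokens

print(max_match("lãi suất tài khoản"))  # ['lãi suất', 'tài khoản']
```

Greedy matching fails on genuinely ambiguous sequences, which is why statistical models such as HMM and CRF are layered on top in practice.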

1.3.1 Determining user intent



Figure 1.5: Model of steps to determine intention

The user intent classification system has several basic steps:

Data preprocessing

Feature extraction

Model training

Classification


The data preprocessing step "cleans" the data: removing redundant information, standardizing the data, correcting misspelled words, standardizing abbreviations, etc. This step plays an important role in the chatbot system: if the input data is well processed here, the accuracy and intelligence of the bot will increase.
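The cleaning operations listed above can be sketched as a small normalization function; the abbreviation map is a hypothetical example — a real system would maintain a much larger, domain-specific list.

```python
import re

# Hypothetical abbreviation map (illustrative only).
ABBREVIATIONS = {"acct": "account", "bal": "balance", "pls": "please"}

def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)

print(preprocess("Pls check my acct BAL!!!"))
# → please check my account balance
```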

Next is the feature extraction step (feature extraction or feature engineering) from the cleaned data. In the traditional machine learning model (before the deep learning model was widely applied), the feature extraction step greatly affects the accuracy of the classification model. To extract good features, we need to analyze the data quite meticulously and also need expert knowledge in each specific application domain.
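The simplest traditional feature extraction is a bag-of-words count vector over a fixed vocabulary, sketched below; real systems typically add TF-IDF weighting or n-gram features on top of this.

```python
from collections import Counter

def build_vocab(corpus):
    """Map each word seen in the corpus to a fixed feature index."""
    vocab = {}
    for text in corpus:
        for w in text.split():
            vocab.setdefault(w, len(vocab))
    return vocab

def bow_vector(text, vocab):
    """Bag-of-words count vector over the fixed vocabulary."""
    counts = Counter(text.split())
    return [counts.get(w, 0) for w in vocab]

corpus = ["check account balance", "loan interest rate"]
vocab = build_vocab(corpus)
print(bow_vector("account balance balance", vocab))  # [0, 1, 2, 0, 0, 0]
```

Each position in the vector is one extracted feature; these vectors are exactly the input that the training step below consumes.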

The training step takes the extracted features as input and applies machine learning algorithms to learn a classification model. The learned model can be a set of classification rules (when using a decision tree) or a weight vector over the extracted features (as in logistic regression, SVM, or neural network models).

Once we have an intent classification model, we can use it to classify a new sentence. This sentence also goes through preprocessing and feature extraction steps, then the classification model will determine a “score” for each intent in the set of intents and give the highest-scoring intent.

To provide accurate support, chatbots need to identify the user's intent. Identifying the user's intent will determine how the next conversation between the person and the chatbot will take place. Therefore, if the user's intent is incorrectly identified, the chatbot will give incorrect, out-of-context responses. At that time, the user may feel annoyed and not return to use the system. The problem of identifying user intent therefore plays a very important role in the chatbot system.

For a closed application domain, we can limit the number of user intentions to a finite set of predefined intentions that are relevant to the tasks that the chatbot can support. With this limit, the problem of determining user intentions can be reduced to the problem of text classification. Given a user sentence as input, the classification system will determine the intent corresponding to that sentence in the set of predefined intents.

To build an intent classification model, we need a training dataset that includes different expressions for each intent. For example, the same intent asking about a user's account balance can use the following expressions:

Account information?

Account lookup?

Account balance?

How much money is in the account?


It can be said that the step of creating training data for the intent classification problem is one of the most important tasks when developing a chatbot system and greatly affects the quality of the chatbot system's products later. This task also requires a lot of time and effort from the chatbot developer.

1.4 Dialogue Management (DM)


In long conversations between humans and chatbots, the chatbot needs to remember contextual information and manage conversation states (dialogue state). Dialogue management is therefore important to ensure that the exchange between human and machine is smooth.

The function of the dialogue management component is to receive input from the NLU component, manage dialogue states, dialogue contexts, and transmit output to the Natural Language Generation (NLG) component.


Figure 1.6: State management model and action decision in conversation [2]

The dialogue state is stored, and the dialogue policy uses it to decide the bot's next action in the dialogue scenario; alternatively, the action may depend only on the immediately preceding dialogue state.

For example, the dialogue management module in a chatbot serving airline ticket booking needs to know when the user has provided enough information for booking to create a ticket to the system or when it needs to reconfirm the information entered by the user. Currently, chatbot products often use the Finite State Automata (FSA) model, the Frame-based (Slot Filling) model, or a combination of these two models. Some new research directions apply the ANN model to dialogue management to help bots become smarter, see section 2.5 for details.

1.4.1 Finite State Automata (FSA) model


Figure 1.7: Conversation management based on FSA finite state machine model

The FSA model is the simplest way to manage a dialogue. Consider, for example, the customer care system of a telecommunications company serving customers who complain about slow Internet. The task of the chatbot is to ask for the customer's name, phone number, the name of the Internet package the customer is using, and the customer's actual Internet speed. Figure 1.7 illustrates a dialogue management model for such a customer care chatbot. The states of the FSA correspond to the questions that the dialogue manager asks the user. The arcs connecting the states correspond to the actions that the chatbot will perform; these actions depend on the user's response to the questions. In the FSA model, the chatbot is the side that directs (takes the initiative in) the conversation.

The advantage of the FSA model is that it is simple and the chatbot will predetermine the desired response format from the user. However, the FSA model is not really suitable for complex chatbot systems or when the user provides different information in the same conversation. In the chatbot example above, when the user provides both name and phone number at the same time, if the chatbot continues to ask for the phone number, the user may feel annoyed.
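The customer-care scenario above can be sketched as a minimal FSA dialogue manager: each state asks one question, and the answer moves the machine to the next state. State names and prompts are illustrative, not from a real system; the answers are supplied as a dict here instead of live user input.

```python
# FSA transition table: state -> (question to ask, next state).
TRANSITIONS = {
    "start":       ("What is your name?",                 "ask_phone"),
    "ask_phone":   ("What is your phone number?",         "ask_package"),
    "ask_package": ("Which Internet package do you use?", "ask_speed"),
    "ask_speed":   ("What speed are you getting?",        "done"),
}

def run_dialogue(answers):
    """Step through the FSA, collecting exactly one answer per state."""
    state, collected = "start", {}
    while state != "done":
        prompt, next_state = TRANSITIONS[state]
        collected[state] = answers[state]  # in a real bot: input(prompt)
        state = next_state
    return collected

answers = {"start": "An", "ask_phone": "0901234567",
           "ask_package": "Fiber30", "ask_speed": "5 Mbps"}
print(run_dialogue(answers))
```

The rigidity criticized in the text is visible in the code: even if the user's first utterance contained both name and phone number, the machine would still walk through `ask_phone` and ask again.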

1.4.2 Frame-based model


The Frame-based model (or Form-based model) can solve the problems that the FSA model encounters. The Frame-based model relies on predefined frames to guide the conversation. Each frame includes the information slots to be filled and the corresponding questions that the dialogue manager asks the user. This model allows the user to fill in many different slots of the frame in a single utterance. Figure 1.8 shows an example of a frame for a chatbot.



Figure 1.8: Frame for chatbot asking for customer information

The Frame-based dialogue management component will ask questions to customers, fill in the slots based on the information provided by the customer until it has enough necessary information. When the user answers multiple questions at the same time, the system will have to fill in the corresponding slots and remember to not ask questions that have already been answered.
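The fill-and-skip behavior just described can be sketched as follows; the slot names, questions, and extracted values are hypothetical, and real entity extraction would come from the NLU component rather than a prepared dict.

```python
# Hypothetical frame: slots to fill and the question for each empty slot.
FRAME = {"name": None, "phone": None, "package": None}
QUESTIONS = {"name": "What is your name?",
             "phone": "What is your phone number?",
             "package": "Which package do you use?"}

def fill_slots(frame, extracted):
    """Copy newly extracted values into empty slots of the frame."""
    for slot, value in extracted.items():
        if slot in frame and frame[slot] is None:
            frame[slot] = value
    return frame

def next_question(frame):
    """Ask only about slots that are still missing."""
    for slot, value in frame.items():
        if value is None:
            return QUESTIONS[slot]
    return None  # frame complete

# The user gives name and phone in one utterance; the bot skips those questions.
frame = fill_slots(dict(FRAME), {"name": "An", "phone": "0901234567"})
print(next_question(frame))  # Which package do you use?
```

Unlike the FSA sketch, nothing here forces a fixed question order: whichever slots arrive filled are simply never asked about again.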

In complex application domains, a conversation may have multiple frames. The problem for chatbot developers is how to know when to switch between frames. A common approach to managing the transfer of control between frames is to define production rules. These rules are based on a number of factors such as the most recent conversational statement or question asked by the user.

1.5 Natural Language Generation (NLG) Component

NLG is the response generation component of a chatbot. It relies on mapping the actions of the conversation manager into natural language to respond to the user.



There are four commonly used mapping methods: Template-based, Plan-based, Class-based, and RNN-based.

1.5.1 Template-based NLG


This answer mapping method uses predefined bot answer templates to generate answers.



Figure 1.9: Language generation method based on response sample set [1]

- Advantages: simple and easy to control; suitable for closed-domain problems.

- Disadvantages: defining the rules is time-consuming, and the answers are not natural. For large systems, the rules are difficult to control, making the system hard to develop and maintain.
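Template-based mapping can be sketched directly: the dialogue manager's action plus the extracted slot values select and fill a canned template. The action names and templates below are invented for illustration.

```python
# Hypothetical response templates keyed by dialogue action.
TEMPLATES = {
    "inform_interest": "The interest rate for a {term} loan is {rate}.",
    "inform_balance":  "Your account balance is {balance}.",
    "greeting":        "Hello! How can I help you today?",
}

def generate(action, **slots):
    """Fill the template for the given dialogue action with slot values."""
    return TEMPLATES[action].format(**slots)

print(generate("inform_interest", term="12 month", rate="6.5%/year"))
# → The interest rate for a 12 month loan is 6.5%/year.
```

The disadvantages listed above show up immediately: every new action needs a hand-written template, and the wording never varies between turns.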

1.5.2 Plan-based NLG


Figure 1.10: Plan-based language generation method [1]

- Advantages: can model complex language structures.

- Disadvantages: design-heavy; requires a clearly defined knowledge domain.
