Article overview
Is word segmentation necessary for Vietnamese sentiment classification?
Authors: Duc-Vu Nguyen; Ngan Luu-Thuy Nguyen
Date: 1 Jan 2023
Abstract: To the best of our knowledge, this paper makes the first attempt to
answer whether word segmentation is necessary for Vietnamese sentiment
classification. To do this, we present five pre-trained monolingual S4-based
language models for Vietnamese: one model without word segmentation and four
models that use the RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the
data pre-processing phase. Based on comprehensive experimental results on two
corpora (the VLSP2016-SA corpus of technical article reviews from news and
social media, and the UIT-VSFC corpus from an educational survey), we make two
suggestions. First, when traditional classifiers such as Naive Bayes or Support
Vector Machines are used, word segmentation may not be necessary for Vietnamese
sentiment classification corpora from the social domain. Second, word
segmentation is necessary for Vietnamese sentiment classification when it is
applied before the BPE (byte-pair encoding) step and the result is fed into a
deep learning model. In this setting, RDRsegmenter is the most stable word
segmentation toolkit compared with uitnlp, pyvi, and underthesea.
Source: arXiv, 2301.00418
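The abstract compares a model whose input is left unsegmented with models whose input was pre-segmented by a toolkit. In Vietnamese, whitespace separates syllables, not words, so the two settings produce different token sequences. The toy sketch below makes that difference concrete. It rests on stated assumptions: the lexicon is hypothetical, and greedy longest matching (maximum matching) is swapped in purely for illustration. It is not how RDRsegmenter, uitnlp, pyvi, or underthesea work internally. Joining the syllables of a word with an underscore follows a convention common in segmented Vietnamese text.

```python
"""Toy contrast between syllable tokens and word tokens for Vietnamese.

Assumption: a hypothetical mini-lexicon plus greedy longest matching stands in
for real segmenters such as RDRsegmenter, uitnlp, pyvi, or underthesea.
"""

# Hypothetical lexicon of two-syllable Vietnamese words (illustration only).
LEXICON = {"giảng viên", "nhiệt tình", "sinh viên", "môn học"}
MAX_WORD_LEN = max(len(w.split()) for w in LEXICON)


def syllable_tokens(sentence: str) -> list[str]:
    """No segmentation: split on whitespace, which yields syllables."""
    return sentence.lower().split()


def segment(sentence: str) -> list[str]:
    """Greedy longest-match segmentation; a word's syllables are joined by '_'."""
    syllables = syllable_tokens(sentence)
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always matches.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in LEXICON:
                tokens.append(candidate.replace(" ", "_"))
                i += n
                break
    return tokens


if __name__ == "__main__":
    text = "Giảng viên rất nhiệt tình với sinh viên"
    print(syllable_tokens(text))
    # → ['giảng', 'viên', 'rất', 'nhiệt', 'tình', 'với', 'sinh', 'viên']
    print(segment(text))
    # → ['giảng_viên', 'rất', 'nhiệt_tình', 'với', 'sinh_viên']
```

Without segmentation, "giảng viên" (lecturer) reaches the model as two independent tokens. After segmentation, it arrives as the single token giảng_viên.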
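For the first suggestion, note that a traditional classifier such as Naive Bayes treats each document as a bag of tokens. The segmentation choice therefore changes only which tokens it counts. Below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing. The class name, its methods, and the four training examples are hypothetical. They are not drawn from VLSP2016-SA or UIT-VSFC, and this abstract does not describe the paper's own classifier settings.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def log_posterior(self, tokens: list[str], label: str) -> float:
        """log P(label) + sum of log P(token | label): the posterior up to a constant."""
        n_docs = sum(self.class_counts.values())
        total = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are skipped
                count = self.token_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


if __name__ == "__main__":
    # Hypothetical word-segmented examples (a word's syllables joined by '_').
    train = [
        (["giảng_viên", "nhiệt_tình"], "positive"),
        (["bài_giảng", "rất", "hay"], "positive"),
        (["phòng_học", "quá", "nóng"], "negative"),
        (["giảng_viên", "đến", "muộn"], "negative"),
    ]
    model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
    print(model.predict(["bài_giảng", "hay"]))  # → positive
```

Passing syllable lists (for example ["bài", "giảng", "hay"]) instead of word lists requires no code change. That property is what makes a comparison with and without segmentation straightforward for this family of classifiers.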