Identifying reduplicative words for Vietnamese word segmentation

Title: Identifying reduplicative words for Vietnamese word segmentation
Authors: Tran, Ngoc Anh
Dao, Thanh Tinh
Nguyen, Phuong-Thai
Nguyen, Hong Quan
Keywords: Vietnamese word segmentation; reduplicative word; reduplicative rules; Word formations; Computational linguistics; Computer programming
Issue Date: 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Scopus
URI: http://ieeexplore.ieee.org/document/7049878/
http://repository.vnu.edu.vn/handle/VNU_123/34125
This paper proposes a method, based on linguistic word-formation rules and dictionaries, for identifying reduplicative words in Vietnamese. The key idea is to determine whether adjacent syllables in a text can form a reduplicative word according to its formation rules. For 2-syllable reduplicative words, the paper uses rules that describe repetition and opposition between pairs of initial consonants, rhymes, and tones. The method is then extended from 2-syllable words to reduplicative words of 3 or 4 syllables, for use in the Vietnamese word segmentation task. Experimental results showed that the F1-score improved to 98.61% and that word segmentation errors were reduced significantly (1.26%).
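
As a rough illustration of the 2-syllable idea described in the abstract, the following Python sketch splits each syllable into an initial consonant, a rhyme, and a tone, then checks whether two adjacent syllables repeat the initial, the rhyme, or both. It is a simplified sketch under stated assumptions, not the authors' rule set: the names `split_syllable` and `classify_pair`, the list of initials, and the three pattern labels are hypothetical. The tone is extracted but not constrained here, and no dictionary lookup is performed, so the check will overgenerate candidates.

```python
import unicodedata

# Combining marks that encode the five marked Vietnamese tones.
# A syllable with none of them carries the level tone ("ngang").
TONE_MARKS = {
    "\u0300": "huyen",  # grave
    "\u0301": "sac",    # acute
    "\u0303": "nga",    # tilde
    "\u0309": "hoi",    # hook above
    "\u0323": "nang",   # dot below
}

# Assumed inventory of written initial consonants, longest first so that
# "ngh" is tried before "ng" and "n".
INITIALS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "th", "tr", "qu",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Return (initial, rhyme, tone) for one written syllable (simplified)."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps vowel-quality marks such as the circumflex
    base = unicodedata.normalize("NFC", "".join(kept))
    initial = next((i for i in INITIALS if base.startswith(i)), "")
    return initial, base[len(initial):], tone


def classify_pair(first, second):
    """Label a syllable pair as a reduplication candidate, or return None."""
    i1, r1, _ = split_syllable(first)
    i2, r2, _ = split_syllable(second)
    if not r1 or not r2:
        return None  # the simplified split could not find a rhyme
    if i1 == i2 and r1 == r2:
        return "full"      # same initial and rhyme; the tone may differ
    if i1 and i1 == i2:
        return "initial"   # repeated initial, opposed rhyme
    if r1 == r2:
        return "rhyme"     # repeated rhyme, opposed initial
    return None


if __name__ == "__main__":
    for pair in [("đo", "đỏ"), ("long", "lanh"), ("bối", "rối"), ("học", "sinh")]:
        print(pair, classify_pair(*pair))
```

In a segmentation pipeline, a pair flagged this way would only be a candidate; the abstract indicates that the actual method also relies on tone rules and dictionaries to accept or reject it.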