Supervised by: Ministry of Culture of PRC

Sponsored by: National Library of China
  Library Society of China

ISSN 1001-8867    CN 11-2746/G2

A Pattern and POS Auto-Learning Method for Terminology Extraction from Scientific Text

Abstract: Large numbers of new scientific documents are published on various platforms every day, which makes it increasingly important to discover new words and their meanings in these documents quickly and efficiently. However, most related work relies on labeled data and cannot handle new, unlabeled documents efficiently. To address this, we introduce an unsupervised method based on sentence patterns and part-of-speech (POS) sequences. The method needs only a few initial, learnable patterns to obtain an initial set of terminology tokens and their POS sequences. During this process, new patterns are constructed; these match more sentences and therefore yield more POS sequences of terminology. Finally, the learned POS sequences and sentence patterns are used to extract terms from new scientific text. Experiments on paper abstracts from Web of Knowledge show that the method is practical and performs well on our test data.

Keywords: auto-learning, terminology extraction, unsupervised method, scientific text
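The bootstrapping loop outlined in the abstract (seed patterns yield terms and their POS sequences, those terms yield new patterns, and the final extractor keeps only pattern-matched spans with a learned POS sequence) can be illustrated with a minimal Python sketch. The abstract does not specify how patterns are represented, so everything here is an assumption for illustration: sentences arrive already POS-tagged as `(token, tag)` pairs with Penn Treebank-style tags, a pattern is simply the pair (word before the term, word after the term), terms are at most `max_len = 4` tokens long, and the function names `match_pattern`, `find_contexts`, `bootstrap` and `extract` are hypothetical. It is not the authors' implementation.

```python
# Hypothetical sketch of pattern/POS bootstrapping for term extraction.
# Assumption: a "pattern" is the (left word, right word) context framing a term.
Tagged = list[tuple[str, str]]  # one POS-tagged sentence: [(token, tag), ...]
Pattern = tuple[str, str]       # (word before term, word after term)

BOS, EOS = "<s>", "</s>"


def pad(sentence: Tagged) -> Tagged:
    """Add boundary markers so terms at sentence edges still have a context."""
    return [(BOS, BOS)] + sentence + [(EOS, EOS)]


def match_pattern(sentence: Tagged, pattern: Pattern, max_len: int = 4):
    """Yield (term, pos_sequence) for each span framed by the pattern."""
    toks = pad(sentence)
    left, right = pattern
    for i, (word, _) in enumerate(toks):
        if word != left:
            continue
        # Try spans of 1..max_len tokens directly after the left context word.
        for j in range(i + 2, min(i + 2 + max_len, len(toks))):
            if toks[j][0] == right:
                span = toks[i + 1:j]
                yield " ".join(w for w, _ in span), tuple(t for _, t in span)
                break  # keep only the shortest span closed by `right`


def find_contexts(sentence: Tagged, term: str) -> set[Pattern]:
    """Return the (left, right) contexts of every occurrence of `term`."""
    words = [w for w, _ in pad(sentence)]
    target = term.split()
    n = len(target)
    return {
        (words[i - 1], words[i + n])
        for i in range(1, len(words) - n)
        if words[i:i + n] == target
    }


def bootstrap(corpus: list[Tagged], seeds: set[Pattern], max_iter: int = 5):
    """Alternate between harvesting terms with patterns and patterns with terms."""
    patterns = set(seeds)
    terms: set[str] = set()
    pos_seqs: set[tuple[str, ...]] = set()
    for _ in range(max_iter):
        for sent in corpus:
            for pat in patterns:
                for term, pos in match_pattern(sent, pat):
                    terms.add(term)
                    pos_seqs.add(pos)
        new_patterns = {
            ctx for sent in corpus for term in terms for ctx in find_contexts(sent, term)
        } - patterns
        if not new_patterns:  # converged: no new contexts discovered
            break
        patterns |= new_patterns
    return patterns, pos_seqs


def extract(sentence: Tagged, patterns: set[Pattern], pos_seqs) -> set[str]:
    """Keep pattern-matched spans whose POS sequence was learned."""
    return {
        term
        for pat in patterns
        for term, pos in match_pattern(sentence, pat)
        if pos in pos_seqs
    }


# Toy corpus for illustration only (not data from the paper).
DEMO_CORPUS: list[Tagged] = [
    [("we", "PRP"), ("use", "VBP"), ("support", "NN"), ("vector", "NN"),
     ("machines", "NNS"), ("for", "IN"), ("classification", "NN")],
    [("support", "NN"), ("vector", "NN"), ("machines", "NNS"), ("are", "VBP"),
     ("popular", "JJ")],
    [("we", "PRP"), ("use", "VBP"), ("random", "JJ"), ("forests", "NNS"),
     ("for", "IN"), ("regression", "NN")],
    [("decision", "NN"), ("trees", "NNS"), ("are", "VBP"), ("simple", "JJ")],
]

if __name__ == "__main__":
    pats, seqs = bootstrap(DEMO_CORPUS, seeds={("use", "for")})
    print(sorted(seqs))  # [('JJ', 'NNS'), ('NN', 'NN', 'NNS'), ('NN', 'NNS')]
```

In this toy run, the seed pattern ("use", "for") finds "support vector machines" and "random forests"; the second sentence then contributes the new context ("<s>", "are"), which in turn finds "decision trees" and its POS sequence (NN, NNS). Using the single-word context is the simplest choice that shows the alternation; a realistic system would need richer patterns and frequency filtering to suppress noise from common context words.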

