Supervised by: Ministry of Culture of PRC

Sponsored by: National Library of China
  Library Society of China

ISSN 1001-8867    CN 11-2746/G2

Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database

Abstract

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and apply machine learning algorithms. At present, each research group tackles each problem from scratch, in isolation from other projects, which leads to redundancy and a great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects and as a public repository for their outputs. We initially focus on a specific goal, namely classifying articles according to publication type, and emphasize how feature sets can be made more powerful and robust by using multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to a wide variety of other machine learning-based goals and projects, and how it can also serve as a public platform for disseminating the results of natural language processing (NLP) tools to end users.

Keywords: text mining; machine learning; semantic similarity; vector representation; community platforms; data sharing; open science
