Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome

doi:10.1016/j.neucom.2021.10.100

Use this identifier to cite this record: https://hdl.handle.net/1822/76493

Title:	Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome
Author(s):	Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália Maria Garcia; Igrejas, Gilberto; Fdez-Riverola, Florentino
Keywords:	Literature mining; Document classification; Semi-automatic curation; Ontology-based representation; Gluten bibliome
Date:	May 2022
Publisher:	Elsevier
Journal:	Neurocomputing
Citation:	Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino, Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome. Neurocomputing, 484, 223-237, 2022
Abstract:	The number of published scientific research documents keeps growing at an unprecedented rate, making it increasingly difficult to access practical information within a target domain. This situation is motivating growing interest in applying text mining techniques to process text resources automatically and structure the information, so that researchers can find information of interest and infer knowledge of practical use. However, the automatic processing of research documents requires the prior existence of large, manually annotated text corpora to develop robust and accurate text mining methods and machine learning models. In this context, semi-automatic extraction techniques based on structured data and state-of-the-art biomedical tools appear to have significant potential to enhance curator productivity and reduce the costs of document curation. Accordingly, this work proposes a semi-automatic machine learning workflow and a NER+Ontology boosting technique for the automatic classification of biomedical literature. The practical relevance of the proposed approach was demonstrated in the curation of 4,115 gluten-related documents extracted from PubMed and contrasted against the word embedding alternative. According to the experimental results, the proposed NER+Ontology technique is an effective alternative to other state-of-the-art document representation techniques for processing the existing biomedical literature.
Type:	Article
Description:	"Available online 11 November 2021"
URI:	https://hdl.handle.net/1822/76493
DOI:	10.1016/j.neucom.2021.10.100
ISSN:	0925-2312
Publisher version:	https://www.journals.elsevier.com/neurocomputing
Peer reviewed:	yes
Access:	Open access
Appears in collections:	CEB - Publicações em Revistas/Séries Internacionais / Publications in International Journals/Series

Files in this record:

File	Description	Size	Format
document_55007_1.pdf		4.34 MB	Adobe PDF
