Journal of Cheminformatics

Table 1 Iterative feature elimination results.

From: A document processing pipeline for annotating chemical entities in scientific documents

		Precision (%)	Recall (%)	F-measure (%)
Windows		78.65	78.75	78.70
Conjunctions		81.37	85.83	83.54
Base	Token	81.32	85.86	83.53
Linguistic	Lemma	78.51	83.80	81.07
	POS	81.75	73.21	77.25
	Chunk	82.93	80.83	81.86
	Dependency parsing	85.88	82.78	84.30
Orthographic	Capitalization	85.97	83.04	*84.48*
	Counting	85.86	83.09	84.45
	Symbols	85.99	82.96	84.45
Morphological	Char n-grams	85.88	82.53	84.17
	Suffix	85.74	83.02	84.36
	Prefix	85.93	83.03	84.45
	Word shape	85.83	82.42	84.09
Lexicons	Chemicals	85.33	81.48	83.36

The first line shows the results obtained with the full set of features, together with windows or conjunctions of features. The following lines show results after iterative and cumulative removal of features. Values in bold indicate improvements over the previous best result; the italic value indicates the best result, obtained by removing dependency parsing and capitalization features.
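The selection logic described in the caption can be replayed from the F-measure column alone. The sketch below is illustrative only and is not the authors' code. It makes two assumptions: the "Conjunctions" row (F-measure 83.54) is the starting point, and a removal is kept only when it strictly improves on the best F-measure so far. The names `ROWS`, `BASELINE_F`, `f_measure` and `replay_elimination` are hypothetical. It also checks that the reported F-measure is the harmonic mean of precision and recall.

```python
# F-measure values copied from Table 1, in the order the features were removed.
ROWS = [
    ("Token", 83.53),
    ("Lemma", 81.07),
    ("POS", 77.25),
    ("Chunk", 81.86),
    ("Dependency parsing", 84.30),
    ("Capitalization", 84.48),
    ("Counting", 84.45),
    ("Symbols", 84.45),
    ("Char n-grams", 84.17),
    ("Suffix", 84.36),
    ("Prefix", 84.45),
    ("Word shape", 84.09),
    ("Chemicals", 83.36),
]

# Assumption: the "Conjunctions" row is the baseline for the elimination.
BASELINE_F = 83.54


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def replay_elimination(baseline, rows):
    """Return the removals that improved on the best score so far, and that score."""
    best = baseline
    kept = []
    for feature, score in rows:
        if score > best:  # assumed acceptance rule: strict improvement only
            best = score
            kept.append(feature)
    return kept, best


kept, best = replay_elimination(BASELINE_F, ROWS)
print(kept, best)  # ['Dependency parsing', 'Capitalization'] 84.48

# Capitalization row: precision 85.97, recall 83.04.
print(round(f_measure(85.97, 83.04), 2))  # 84.48
```

Under these assumptions the replay selects the same two removals named in the caption, dependency parsing and capitalization, and ends at the best F-measure of 84.48.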


ISSN: 1758-2946
