Table 1 Iterative feature elimination results.

From: A document processing pipeline for annotating chemical entities in scientific documents

  

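|               |                    | Precision (%) | Recall (%) | F-measure (%) |
|---------------|--------------------|---------------|------------|---------------|
| Windows       |                    | 78.65         | 78.75      | 78.70         |
| Conjunctions  |                    | 81.37         | 85.83      | 83.54         |
| Base          | Token              | 81.32         | 85.86      | 83.53         |
|               | Lemma              | 78.51         | 83.80      | 81.07         |
| Linguistic    | POS                | 81.75         | 73.21      | 77.25         |
|               | Chunk              | 82.93         | 80.83      | 81.86         |
|               | Dependency parsing | 85.88         | 82.78      | 84.30         |
|               | Capitalization     | 85.97         | 83.04      | 84.48         |
| Orthographic  | Counting           | 85.86         | 83.09      | 84.45         |
|               | Symbols            | 85.99         | 82.96      | 84.45         |
|               | Char n-grams       | 85.88         | 82.53      | 84.17         |
| Morphological | Suffix             | 85.74         | 83.02      | 84.36         |
|               | Prefix             | 85.93         | 83.03      | 84.45         |
|               | Word shape         | 85.83         | 82.42      | 84.09         |
| Lexicons      | Chemicals          | 85.33         | 81.48      | 83.36         |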

The first two lines show the results obtained with the full set of features, combined with either windows or conjunctions of features. The following lines show the results after iterative and cumulative removal of individual features. Values in bold indicate improvements over the previous best result; the italic value indicates the best result (an F-measure of 84.48%), obtained by removing the dependency parsing and capitalization features.
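
The elimination procedure described in the table note can be sketched roughly as follows. This is only an illustrative reading of the note, not the authors' implementation: it assumes that each feature is removed in turn, that the removal is kept only when the F-measure improves on the best score so far, and that the feature is restored otherwise. The names `f_measure`, `eliminate_features`, and `evaluate` are hypothetical; `evaluate` stands in for training and scoring a model on a given feature set.

```python
from typing import Callable, Iterable, List, Set, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def eliminate_features(
    features: Iterable[str],
    evaluate: Callable[[Set[str]], float],
) -> Tuple[Set[str], float, List[str]]:
    """Try removing each feature once, in the given order.

    Assumption: a removal is kept (cumulatively) only if it improves on
    the best score seen so far; otherwise the feature is put back.
    `evaluate` is a hypothetical callback returning the F-measure of a
    model trained with the given feature set.
    """
    ordered = list(features)
    active = set(ordered)
    best = evaluate(active)          # score with the full feature set
    removed: List[str] = []

    for feature in ordered:
        candidate = active - {feature}
        score = evaluate(candidate)
        if score > best:             # improvement: keep the feature out
            active, best = candidate, score
            removed.append(feature)
        # no improvement: `active` is unchanged, so the feature stays in

    return active, best, removed
```

As a quick consistency check, `round(f_measure(78.65, 78.75), 2)` gives 78.7, which matches the F-measure reported in the Windows row.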