Table 1 Iterative feature elimination results.

From: A document processing pipeline for annotating chemical entities in scientific documents

Group          Feature              Precision (%)  Recall (%)  F-measure (%)
Windows                             78.65          78.75       78.70
Conjunctions                        81.37          85.83       83.54
Base           Token                81.32          85.86       83.53
               Lemma                78.51          83.80       81.07
Linguistic     POS                  81.75          73.21       77.25
               Chunk                82.93          80.83       81.86
               Dependency parsing   85.88          82.78       84.30
               Capitalization       85.97          83.04       84.48
Orthographic   Counting             85.86          83.09       84.45
               Symbols              85.99          82.96       84.45
               Char n-grams         85.88          82.53       84.17
Morphological  Suffix               85.74          83.02       84.36
               Prefix               85.93          83.03       84.45
               Word shape           85.83          82.42       84.09
Lexicons       Chemicals            85.33          81.48       83.36
The first two rows show the results obtained with the full set of features, combined with either windows or conjunctions of features. The following rows show the results after iterative and cumulative removal of features. Values in bold indicate improvements over the previous best result; the italic value indicates the best result (an F-measure of 84.48%), obtained by removing the dependency parsing and capitalization features.
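
The elimination procedure described in the note can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each feature is removed in turn and that a removal is kept only when the score improves on the previous best. The names `f_measure`, `eliminate_features`, and `toy_evaluate` are hypothetical, and the toy scores are invented for the example; they are not taken from the table.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def eliminate_features(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], List[str], float]:
    """Greedy, cumulative feature elimination (assumed reading of the note).

    Each feature is tentatively removed from the current feature set.
    The removal is kept only if the score beats the best score so far;
    otherwise the feature is restored before the next one is tried.
    """
    active = list(features)
    best_score = evaluate(frozenset(active))
    removed: List[str] = []
    for feature in features:
        candidate = [f for f in active if f != feature]
        score = evaluate(frozenset(candidate))
        if score > best_score:
            active = candidate
            best_score = score
            removed.append(feature)
    return active, removed, best_score


def toy_evaluate(active: FrozenSet[str]) -> float:
    """Hypothetical scorer standing in for training and evaluating a model."""
    useful = {"token": 2.0, "lemma": 1.0, "pos": 3.0}
    harmful = {"dependency_parsing": 0.5, "capitalization": 0.25}
    score = 80.0
    score += sum(w for f, w in useful.items() if f in active)
    score -= sum(w for f, w in harmful.items() if f in active)
    return score


if __name__ == "__main__":
    feature_names = ["token", "lemma", "pos", "dependency_parsing", "capitalization"]
    kept, dropped, best = eliminate_features(feature_names, toy_evaluate)
    print("kept:", kept)
    print("removed:", dropped)
    print("best score:", best)
```

In practice `evaluate` would train and score a model on the given feature set, which makes every step costly; the greedy order-dependent loop trades optimality for a number of evaluations that is linear in the number of features. The `f_measure` helper reproduces the relationship between the table's columns: a precision of 78.65 and a recall of 78.75 give an F-measure of 78.70.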