Table 2 Impact of features on recall for each different class.

From: A document processing pipeline for annotating chemical entities in scientific documents

Class recall (%)

| Feature group | Feature | Multiple | Family | Abbrev | System | Formula | Identifier | Trivial |
|---|---|---|---|---|---|---|---|---|
| Conjunctions (all features) | | 42.02 | 82.22 | 81.09 | 90.21 | 79.60 | 73.08 | 91.46 |
| Base | Token | +1.06 | -0.19 | +0.55 | -0.09 | -0.15 | 0.00 | +0.07 |
| | Lemma | +5.32 | -2.25 | -1.46 | -2.95 | -1.16 | -4.85 | -1.87 |
| Linguistic | POS | -18.62 | -12.65 | -10.93 | -13.88 | -16.65 | -24.88 | -9.61 |
| | Chunk | -22.87 | -6.68 | -4.34 | -4.91 | -6.14 | -12.83 | -3.13 |
| | Dependency | 0.00 | -3.72 | -3.76 | -1.44 | -4.45 | -9.39 | -2.52 |
| | Capitalization | -0.53 | +0.43 | +0.15 | +0.26 | +0.31 | -0.94 | +0.30 |
| Orthographic | Counting | 0.00 | +0.24 | -0.27 | +0.16 | -0.12 | -1.72 | +0.26 |
| | Symbols | -1.06 | -0.12 | -0.11 | 0.00 | -0.24 | -0.16 | +0.01 |
| | Char n-grams | +1.60 | -2.51 | -0.64 | -0.43 | -1.35 | +4.07 | +0.47 |
| Morphological | Suffix | +1.06 | -0.09 | -0.33 | +0.16 | +0.51 | +0.63 | -0.27 |
| | Prefix | 0.00 | -0.64 | -0.09 | +0.10 | -0.05 | +0.31 | +0.22 |
| | Word shape | +1.60 | -0.07 | -0.97 | -0.18 | -1.47 | -2.50 | -0.57 |
| Lexicons | Chemicals | +2.66 | -0.19 | -2.10 | -0.78 | -0.75 | -10.95 | -2.34 |
  1. Values shown are differences, in percentage points, relative to the baseline (first row).
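
Because every row after the baseline is a difference rather than a recall value, it can help to convert a row back to absolute recall. The following is a minimal sketch, assuming that each absolute value is simply the baseline plus the listed difference. The names `CLASSES`, `BASELINE`, `DELTAS` and `absolute_recall` are hypothetical and do not come from the article, and only two rows of the table are copied in for illustration.

```python
# Baseline recall (%) per class, copied from the first row of Table 2.
CLASSES = ["Multiple", "Family", "Abbrev", "System", "Formula", "Identifier", "Trivial"]
BASELINE = [42.02, 82.22, 81.09, 90.21, 79.60, 73.08, 91.46]

# Differences in percentage points for two example rows of Table 2.
DELTAS = {
    "POS": [-18.62, -12.65, -10.93, -13.88, -16.65, -24.88, -9.61],
    "Chemicals": [2.66, -0.19, -2.10, -0.78, -0.75, -10.95, -2.34],
}


def absolute_recall(feature):
    """Return {class: baseline + difference}, rounded to two decimals.

    Assumption: the table's differences are additive offsets to the baseline.
    """
    return {
        cls: round(base + delta, 2)
        for cls, base, delta in zip(CLASSES, BASELINE, DELTAS[feature])
    }


if __name__ == "__main__":
    for feature in DELTAS:
        print(feature, absolute_recall(feature))
```

Under that assumption, the POS row for the Multiple class corresponds to 42.02 - 18.62 = 23.40% recall, and the Chemicals row for the Identifier class to 73.08 - 10.95 = 62.13%.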