Table 16 Summary of ChER's performance under the CHEMDNER track setting (set 1), under similar experimental settings as state-of-the-art methods (sets 2-4), and when applied to various corpora (sets 5-9).

From: Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics

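| Set | Data (training / test) | Pre-processing: Splitter | Pre-processing: Tokeniser | Cust. Feats. | Post-processing: Abbr. | Post-processing: Comp. | Micro-avg. P | Micro-avg. R | Micro-avg. F1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CHEMDNER training & dev. / CHEMDNER test | LingPipe | GENIA | | | | 88.87 | 70.95 | 78.91 |
| | | Cafetiere | OSCAR4 | | | | 92.76 | 81.30 | 86.65 |
| 2 | SciBorg (CM): 3-fold CV | LingPipe | GENIA | | | | 80.44 | 55.16 | 65.45 |
| | | Cafetiere | OSCAR4 | | | | 85.96 | 74.22 | 79.66 |
| 3 | SCAI-IUPAC training / SCAI-100 (IUPAC) | LingPipe | GENIA | | | | 84.78 | 66.87 | 74.77 |
| | | Cafetiere | GENIA | | | | 86.70 | 67.50 | 75.90 |
| 4 | NaCTeM Metabolites: 10-fold CV | LingPipe | GENIA | | | | 81.72 | 64.49 | 72.09 |
| | | Cafetiere | OSCAR4 | | | | 81.42 | 79.66 | 80.53 |
| 5 | CHEMDNER training & dev. / SCAI-100 (All) | LingPipe | GENIA | | | | 72.56 | 66.00 | 69.13 |
| | | Cafetiere | OSCAR4 | | | | 77.85 | 78.69 | 78.27 |
| 6 | CHEMDNER training & dev. / Patents | LingPipe | GENIA | | | | 72.66 | 52.97 | 61.27 |
| | | Cafetiere | OSCAR4 | | | | 73.43 | 57.91 | 64.75 |
| 7 | CHEMDNER training & dev. / DDI test | LingPipe | GENIA | | | | 76.52 | 75.00 | 75.75 |
| | | Cafetiere | OSCAR4 | | | | 75.88 | 92.05 | 83.18 |
| 8 | CHEMDNER training & dev. / PK | LingPipe | GENIA | | | | 79.29 | 84.66 | 81.89 |
| | | Cafetiere | GENIA | | | | 79.83 | 88.34 | 83.87 |
| 9 | CHEMDNER training & dev. / NaCTeM Metabolites | LingPipe | GENIA | | | | 63.57 | 71.63 | 67.36 |
| | | Cafetiere | OSCAR4 | | | | 65.08 | 83.29 | 73.07 |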
  1. The first row in each set corresponds to the baseline. Key: Cust. Feats. = Custom Features; Abbr. = Abbreviation recognition; Comp. = Chemical composition-based token relabelling; ✓ = enabled; ✗ = disabled; • = enabling or disabling makes no difference in performance.
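
As a quick sanity check on the micro-averaged columns, the F1 values can be recomputed from the reported P and R. The sketch below assumes F1 is the usual harmonic mean of precision and recall (the table does not state the formula); the helper name `f1_score` is hypothetical, and the two rows are copied from set 1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1).

    Inputs and output are percentages, as in the table.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for set 1: baseline row, then the Cafetiere/OSCAR4 row.
set_1_rows = [
    (88.87, 70.95, 78.91),
    (92.76, 81.30, 86.65),
]

for p, r, reported in set_1_rows:
    recomputed = f1_score(p, r)
    print(f"P={p:.2f} R={r:.2f} recomputed F1={recomputed:.2f} reported F1={reported:.2f}")
```

For both rows the recomputed value agrees with the reported F1 to two decimal places, which is consistent with the assumed definition.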