Table 16 Summary of ChER's performance under the CHEMDNER track setting (set 1), under similar experimental settings as state-of-the-art methods (sets 2-4), and when applied to various corpora (sets 5-9).

From: Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics

 

| Set | Data (training / test) | Pre-processing: Splitter | Pre-processing: Tokeniser | Cust. Feats. | Post-processing: Abbr. | Post-processing: Comp. | Micro-avg. P | Micro-avg. R | Micro-avg. F1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CHEMDNER training & dev. / CHEMDNER test | LingPipe | GENIA | ✗ | ✗ | ✗ | 88.87 | 70.95 | 78.91 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 92.76 | 81.30 | 86.65 |
| 2 | SciBorg (CM): 3-fold CV | LingPipe | GENIA | ✗ | ✗ | ✗ | 80.44 | 55.16 | 65.45 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 85.96 | 74.22 | 79.66 |
| 3 | SCAI-IUPAC training / SCAI-100 (IUPAC) | LingPipe | GENIA | ✗ | ✗ | ✗ | 84.78 | 66.87 | 74.77 |
|   |   | Cafetiere | GENIA | ✓ | ✓ | ✓ | 86.70 | 67.50 | 75.90 |
| 4 | NaCTeM Metabolites: 10-fold CV | LingPipe | GENIA | ✗ | ✗ | ✗ | 81.72 | 64.49 | 72.09 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 81.42 | 79.66 | 80.53 |
| 5 | CHEMDNER training & dev. / SCAI-100 (All) | LingPipe | GENIA | ✗ | ✗ | ✗ | 72.56 | 66.00 | 69.13 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 77.85 | 78.69 | 78.27 |
| 6 | CHEMDNER training & dev. / Patents | LingPipe | GENIA | ✗ | ✗ | ✗ | 72.66 | 52.97 | 61.27 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 73.43 | 57.91 | 64.75 |
| 7 | CHEMDNER training & dev. / DDI test | LingPipe | GENIA | ✗ | ✗ | ✗ | 76.52 | 75.00 | 75.75 |
|   |   | Cafetiere | OSCAR4 | ✓ | • | ✓ | 75.88 | 92.05 | 83.18 |
| 8 | CHEMDNER training & dev. / PK | LingPipe | GENIA | ✗ | ✗ | ✗ | 79.29 | 84.66 | 81.89 |
|   |   | Cafetiere | GENIA | ✓ | ✓ | ✓ | 79.83 | 88.34 | 83.87 |
| 9 | CHEMDNER training & dev. / NaCTeM Metabolites | LingPipe | GENIA | ✗ | ✗ | ✗ | 63.57 | 71.63 | 67.36 |
|   |   | Cafetiere | OSCAR4 | ✓ | ✓ | ✓ | 65.08 | 83.29 | 73.07 |

  1. The first row in each set corresponds to the baseline. Key: Cust. Feats. = Custom Features, Abbr. = Abbreviation recognition, Comp. = Chemical composition-based token relabelling; ✓ = enabled, ✗ = disabled, • = enabling or disabling makes no difference in performance.
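
The F1 column can be cross-checked against the P and R columns. The sketch below assumes that F1 is the usual harmonic mean of precision and recall, F1 = 2PR / (P + R); the table itself does not spell out the formula. The helper name `f1_score` and the row labels are hypothetical, chosen only for illustration. Because the table reports P and R rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit, so the comparison uses a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was predicted or found
    return 2 * precision * recall / (precision + recall)


# (label, P, R, reported F1) copied from set 1 of the table
rows = [
    ("set 1, first row (baseline)", 88.87, 70.95, 78.91),
    ("set 1, second row", 92.76, 81.30, 86.65),
]

for label, p, r, reported in rows:
    recomputed = f1_score(p, r)
    # Tolerance of 0.02 is an assumption to absorb rounding of P and R
    consistent = abs(recomputed - reported) < 0.02
    print(f"{label}: recomputed F1 = {recomputed:.2f}, reported = {reported:.2f}, consistent = {consistent}")
```

The same check can be run on any other row by appending its P, R and F1 values to `rows`.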