Table 2 Impact of each feature on recall for each chemical entity class.

From: A document processing pipeline for annotating chemical entities in scientific documents

Class recall (%)

| Feature group | Feature | Multiple | Family | Abbrev | System | Formula | Identifier | Trivial |
|---|---|---|---|---|---|---|---|---|
| | Conjunctions (all features) | 42.02 | 82.22 | 81.09 | 90.21 | 79.60 | 73.08 | 91.46 |
| Base | Token | +1.06 | -0.19 | +0.55 | -0.09 | -0.15 | 0.00 | +0.07 |
| | Lemma | +5.32 | -2.25 | -1.46 | -2.95 | -1.16 | -4.85 | -1.87 |
| Linguistic | POS | -18.62 | -12.65 | -10.93 | -13.88 | -16.65 | -24.88 | -9.61 |
| | Chunk | -22.87 | -6.68 | -4.34 | -4.91 | -6.14 | -12.83 | -3.13 |
| | Dependency | 0.00 | -3.72 | -3.76 | -1.44 | -4.45 | -9.39 | -2.52 |
| | Capitalization | -0.53 | +0.43 | +0.15 | +0.26 | +0.31 | -0.94 | +0.30 |
| Orthographic | Counting | 0.00 | +0.24 | -0.27 | +0.16 | -0.12 | -1.72 | +0.26 |
| | Symbols | -1.06 | -0.12 | -0.11 | 0.00 | -0.24 | -0.16 | +0.01 |
| | Char n-grams | +1.60 | -2.51 | -0.64 | -0.43 | -1.35 | +4.07 | +0.47 |
| Morphological | Suffix | +1.06 | -0.09 | -0.33 | +0.16 | +0.51 | +0.63 | -0.27 |
| | Prefix | 0.00 | -0.64 | -0.09 | +0.10 | -0.05 | +0.31 | +0.22 |
| | Word shape | +1.60 | -0.07 | -0.97 | -0.18 | -1.47 | -2.50 | -0.57 |
| Lexicons | Chemicals | +2.66 | -0.19 | -2.10 | -0.78 | -0.75 | -10.95 | -2.34 |

Values in all rows except the first are differences, in percentage points, from the baseline recall (first row).
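Because every row after the first is an additive difference in percentage points, the absolute recall for a given feature row is simply the baseline value plus that row's difference. The Python sketch below does this arithmetic for two rows copied from the table (POS and Char n-grams). The names `BASELINE`, `DELTAS` and `absolute_recall` are hypothetical, chosen only for this illustration.

```python
# Baseline class recall (%), taken from the first row of Table 2.
BASELINE = {
    "Multiple": 42.02, "Family": 82.22, "Abbrev": 81.09, "System": 90.21,
    "Formula": 79.60, "Identifier": 73.08, "Trivial": 91.46,
}

# Percentage-point differences for two example rows of Table 2.
DELTAS = {
    "POS": {
        "Multiple": -18.62, "Family": -12.65, "Abbrev": -10.93, "System": -13.88,
        "Formula": -16.65, "Identifier": -24.88, "Trivial": -9.61,
    },
    "Char n-grams": {
        "Multiple": 1.60, "Family": -2.51, "Abbrev": -0.64, "System": -0.43,
        "Formula": -1.35, "Identifier": 4.07, "Trivial": 0.47,
    },
}


def absolute_recall(feature: str) -> dict:
    """Return recall (%) per class: baseline plus the row's difference."""
    # Round to two decimals to match the table's precision and hide float noise.
    return {cls: round(BASELINE[cls] + d, 2) for cls, d in DELTAS[feature].items()}


if __name__ == "__main__":
    for name in DELTAS:
        print(name, absolute_recall(name))
```

For example, the POS row gives 42.02 - 18.62 = 23.40% recall for the Multiple class, and the Char n-grams row gives 73.08 + 4.07 = 77.15% for the Identifier class.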