
Table 3 Overview of the corrected CHEMDNER corpus in terms of the number of PubMed abstracts (#Articles), the number of CEMs (#CEMs), and the number of CEMs in each of the CEM classes in C = {SYSTEMATIC, IDENTIFIER, FORMULA, TRIVIAL, ABBREVIATION, FAMILY, MULTIPLE, NO CLASS}. × means that the figure is unknown.

From: A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature

 

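|  | Training | Development | Test | Background |
| --- | --- | --- | --- | --- |
| #Articles | 3,500 | 3,500 | 3,000 | 17,000 |
| #CEMs | 29,478 | 29,485 | 25,351 | × |
| ABBREVIATION | 4,538 | 4,517 | 4,059 | × |
| FAMILY | 4,090 | 4,212 | 3,622 | × |
| FORMULA | 4,448 | 4,117 | 3,443 | × |
| IDENTIFIER | 672 | 639 | 513 | × |
| MULTIPLE | 202 | 187 | 199 | × |
| SYSTEMATIC | 6,656 | 6,814 | 5,666 | × |
| TRIVIAL | 8,832 | 8,967 | 7,808 | × |
| NO CLASS | 40 | 32 | 41 | × |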
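
The per-class counts in Table 3 can be cross-checked against the #CEMs row. The short Python sketch below is not part of the original article: the names `TOTAL_CEMS`, `CLASS_COUNTS`, and `column_sums` are hypothetical and chosen only for illustration. The numbers are copied from the table, and the Background column is left out because its counts are unknown.

```python
# Counts copied from Table 3, in the order (Training, Development, Test).
# The Background column is omitted because its CEM counts are unknown.
TOTAL_CEMS = {"Training": 29478, "Development": 29485, "Test": 25351}

CLASS_COUNTS = {
    "ABBREVIATION": (4538, 4517, 4059),
    "FAMILY": (4090, 4212, 3622),
    "FORMULA": (4448, 4117, 3443),
    "IDENTIFIER": (672, 639, 513),
    "MULTIPLE": (202, 187, 199),
    "SYSTEMATIC": (6656, 6814, 5666),
    "TRIVIAL": (8832, 8967, 7808),
    "NO CLASS": (40, 32, 41),
}


def column_sums(class_counts, columns):
    """Sum the per-class counts for each column (hypothetical helper)."""
    sums = dict.fromkeys(columns, 0)
    for counts in class_counts.values():
        for column, count in zip(columns, counts):
            sums[column] += count
    return sums


SUMS = column_sums(CLASS_COUNTS, tuple(TOTAL_CEMS))

for column, total in TOTAL_CEMS.items():
    status = "matches" if SUMS[column] == total else "differs from"
    print(f"{column}: class sum {SUMS[column]} {status} #CEMs {total}")
```

For the three annotated sets, the eight class counts add up to the reported #CEMs figure, which suggests that each CEM is counted in exactly one class.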