Table 3 Overview of the corrected CHEMDNER corpus: the number of PubMed abstracts (#Articles), the total number of CEMs (#CEMs), and the number of CEMs in each of the CEM classes in C = {SYSTEMATIC, IDENTIFIER, FORMULA, TRIVIAL, ABBREVIATION, FAMILY, MULTIPLE, NO CLASS}. × indicates that the figure is unknown.

From: A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature

               Training   Development   Test     Background
#Articles      3,500      3,500         3,000    17,000
#CEMs          29,478     29,485        25,351   ×
ABBREVIATION   4,538      4,517         4,059    ×
FAMILY         4,090      4,212         3,622    ×
FORMULA        4,448      4,117         3,443    ×
IDENTIFIER     672        639           513      ×
MULTIPLE       202        187           199      ×
SYSTEMATIC     6,656      6,814         5,666    ×
TRIVIAL        8,832      8,967         7,808    ×
NO CLASS       40         32            41       ×
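
As a quick consistency check on the table, the per-class counts in each split should add up to the #CEMs row. The sketch below copies the figures from Table 3 and sums them per column. The names `CLASS_COUNTS`, `REPORTED_TOTALS`, and `column_totals` are illustrative and do not come from the paper. The Background column is left out because its counts are unknown.

```python
# Per-class CEM counts copied from Table 3, in the order
# (training, development, test). Background is omitted (unknown).
CLASS_COUNTS = {
    "ABBREVIATION": (4538, 4517, 4059),
    "FAMILY": (4090, 4212, 3622),
    "FORMULA": (4448, 4117, 3443),
    "IDENTIFIER": (672, 639, 513),
    "MULTIPLE": (202, 187, 199),
    "SYSTEMATIC": (6656, 6814, 5666),
    "TRIVIAL": (8832, 8967, 7808),
    "NO CLASS": (40, 32, 41),
}

# The #CEMs row of Table 3, same column order.
REPORTED_TOTALS = (29478, 29485, 25351)


def column_totals(counts):
    """Sum the per-class counts column by column (one total per split)."""
    return tuple(sum(column) for column in zip(*counts.values()))


if __name__ == "__main__":
    totals = column_totals(CLASS_COUNTS)
    for name, total, reported in zip(
        ("training", "development", "test"), totals, REPORTED_TOTALS
    ):
        status = "matches" if total == reported else "differs from"
        print(f"{name}: class sum {total} {status} reported {reported}")
```

With the figures as listed, the class sums agree with the reported #CEMs totals for all three annotated splits.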