Table 1 Statistics of the dataset.

From: A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature

Entity type     Training set  Development set  Test set  Entire corpus
ABBREVIATION           4,538            4,521     4,059         13,118
FAMILY                 4,090            4,223     3,622         11,935
FORMULA                4,448            4,137     3,443         12,028
IDENTIFIER               672              639       513          1,824
MULTIPLE                 202              188       199            589
SYSTEMATIC             6,656            6,816     5,666         19,138
TRIVIAL                8,832            8,970     7,808         25,610
NO CLASS                  40               32        41            113
ALL                   29,478           29,526    25,351         84,355
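As a quick consistency check on the table, the sketch below transcribes the counts and runs two tests. First, it confirms that each type's training, development and test counts add up to its entire-corpus figure. Second, it confirms that the type rows add up to the ALL row. It also computes each type's share of the corpus. The names COUNTS, ALL, check_totals and proportions are hypothetical, chosen only for illustration. The numbers themselves come straight from Table 1.

```python
# Counts per entity type, transcribed from Table 1:
# (training set, development set, test set, entire corpus)
COUNTS = {
    "ABBREVIATION": (4538, 4521, 4059, 13118),
    "FAMILY": (4090, 4223, 3622, 11935),
    "FORMULA": (4448, 4137, 3443, 12028),
    "IDENTIFIER": (672, 639, 513, 1824),
    "MULTIPLE": (202, 188, 199, 589),
    "SYSTEMATIC": (6656, 6816, 5666, 19138),
    "TRIVIAL": (8832, 8970, 7808, 25610),
    "NO CLASS": (40, 32, 41, 113),
}
# The ALL row of the table, in the same column order.
ALL = (29478, 29526, 25351, 84355)


def check_totals(counts, all_row):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    # Row check: the three splits should sum to the entire-corpus column.
    for name, (train, dev, test, total) in counts.items():
        if train + dev + test != total:
            problems.append(f"{name}: splits sum to {train + dev + test}, table says {total}")
    # Column check: the type rows should sum to the ALL row in every column.
    for col in range(len(all_row)):
        col_sum = sum(row[col] for row in counts.values())
        if col_sum != all_row[col]:
            problems.append(f"column {col}: types sum to {col_sum}, ALL row says {all_row[col]}")
    return problems


def proportions(counts, all_row, col=3):
    """Share of each type in one column (default: the entire corpus)."""
    return {name: row[col] / all_row[col] for name, row in counts.items()}


if __name__ == "__main__":
    print(check_totals(COUNTS, ALL))
    shares = proportions(COUNTS, ALL)
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:<13} {share:6.1%}")
```

On the transcribed values, check_totals returns an empty list, so the table is internally consistent. The proportions show that TRIVIAL and SYSTEMATIC together account for just over half of all entities in the corpus, while NO CLASS and MULTIPLE are very rare.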