Journal of Cheminformatics

Table 1 Statistics of the dataset.

From: A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature

Types	Training set	Development set	Test set	Entire corpus
ABBREVIATION	4,538	4,521	4,059	13,118
FAMILY	4,090	4,223	3,622	11,935
FORMULA	4,448	4,137	3,443	12,028
IDENTIFIER	672	639	513	1,824
MULTIPLE	202	188	199	589
SYSTEMATIC	6,656	6,816	5,666	19,138
TRIVIAL	8,832	8,970	7,808	25,610
NO CLASS	40	32	41	113
ALL	29,478	29,526	25,351	84,355


ISSN: 1758-2946
