
Table 1 Overview of the datasets

From: DECIMER 1.0: deep learning for chemical image recognition using transformers

| Dataset | Total dataset size |
| --- | --- |
| Dataset 1 | 39 million |
| Dataset 2 | 37 million |
| Dataset 3 | 37 million |

| Subset | Train dataset size | Test dataset size |
| --- | --- | --- |
| Subset 1 | 921,600 | 102,400 |
| Subset 2 | 10,240,000 | 1,024,000 |
| Subset 3 | 15,360,000 | 1,536,000 |
| Subset 4 | 35,002,240 | 3,929,093 |
| Subset 5 | 15,360,000 | 1,536,000 |
| Subset 6 | 33,304,320 | 3,700,480 |
| Non-augmented test set | 33,304,320 | 2,000,000 |
| Augmented test set | 33,304,320 | 2,000,000 |
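As a quick arithmetic check on Table 1, the following sketch computes each row's combined size, its train fraction, and its exact train:test ratio from the train and test columns. The names `SPLITS` and `split_summary` are hypothetical helpers introduced only for illustration. The values are copied from the table, and the script makes no assumption about which of the three datasets each subset was drawn from.

```python
from fractions import Fraction

# (row name, train dataset size, test dataset size), copied from Table 1.
# SPLITS is a hypothetical name used only for this illustration.
SPLITS = [
    ("Subset 1", 921_600, 102_400),
    ("Subset 2", 10_240_000, 1_024_000),
    ("Subset 3", 15_360_000, 1_536_000),
    ("Subset 4", 35_002_240, 3_929_093),
    ("Subset 5", 15_360_000, 1_536_000),
    ("Subset 6", 33_304_320, 3_700_480),
    ("Non-augmented test set", 33_304_320, 2_000_000),
    ("Augmented test set", 33_304_320, 2_000_000),
]


def split_summary(train: int, test: int) -> tuple[int, float, Fraction]:
    """Return (train + test, train fraction, exact train:test ratio) for one row."""
    total = train + test
    return total, train / total, Fraction(train, test)


if __name__ == "__main__":
    for name, train, test in SPLITS:
        total, frac, ratio = split_summary(train, test)
        print(f"{name:<24} total={total:>12,} train={frac:.3f} train:test={ratio}")
```

Running it shows that Subsets 1 and 6 are split 9:1 and Subsets 2, 3 and 5 are split 10:1, while Subset 4's ratio is not a round number. The two test-set rows share the same training size of 33,304,320.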