OCSR system | Year | Dataset | Classification | Details | Evaluation | Results |
---|---|---|---|---|---|---|
CNN-based approach from [40] | 2019 | 3600 images created by 200 people (90/10 for training and test) | Ring structures | Transfer learning with VGG-19 for 36 classes | Recognition rate | 80% |
MSE-DUDL | 2019 | Training: more than 50 million samples from PubChem, Indigo and USPTO datasets; Test: 454 (Valko dataset) and other proprietary datasets | SMILES | U-Net based segmentation with GridLSTM | Not given | Validation: 77%-82%; Test: 41% (Valko), 83% (others) |
DECIMER | 2020 | PubChem | DeepSMILES | Encoder/Decoder model with CNN and GRU | Tanimoto | 0.53 |
DECIMER Segmentation | 2021 | Training: 994 articles from the Journal of Natural Products; Test: 777 pages from 75 journals (Journal of Natural Products, Phytochemistry and Molecules) | Segmented structures | Mask R-CNN for object detection and VGG for classification | Recognition rate | 91.3% |
DECIMER 1.0 | 2021 | 39 million (PubChem) (90/10 for training and test) | SELFIES | Transfer learning (EfficientNet) for classification and transformer for sequence generation | Tanimoto | 0.99; 96.47% of the results had Tanimoto = 1.0 |
ICMDT | 2021 | 4 million images (BMS Dataset): 2.4 million training, 1.6 million test | InChI | Deep TNT block | Levenshtein distance | 2.5 |
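
Two of the evaluation measures in the table, Tanimoto similarity and Levenshtein distance, can be illustrated with a minimal sketch. It assumes that molecular fingerprints are represented as plain Python sets of on-bit indices, which is a simplification for illustration; the systems listed above typically compute fingerprints with a cheminformatics toolkit. The function names `tanimoto` and `levenshtein` are hypothetical and are not taken from any of the cited systems.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical fingerprints: two of the four distinct bits are shared.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})

# Predicted vs. reference string, e.g. two line notations of a molecule.
distance = levenshtein("kitten", "sitting")
```

A Tanimoto value of 1.0 means the two fingerprints are identical, while for Levenshtein distance a lower value is better, with 0 indicating an exact string match.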