From: Reconstruction of lossless molecular representations from fingerprints
Abbreviations | Description | Dim | Sequence length (Ave.) | Sequence length (Max) | Token size |
---|---|---|---|---|---|
Predefined substructures | | | | | |
 MACCS |  | 166 | 50 | 107 | 160 |
Paths and feature classes | | | | | |
 Avalon | Hashed | 512 | 182 | 470 | 516 |
Path-based | | | | | |
 HashAP | Atom pair - hashed | 2048 | 92 | 273 | 1998 |
 RDK4 | RDKit fingerprint - hashed | 2048 | 83 | 288 | 2052 |
 RDK4-L | RDK4 - without branched paths | 2048 | 58 | 209 | 2052 |
4-atom-paths | | | | | |
 TT | Topological torsion | sparse | 32 | 124 | 54973 |
 HashTT | TT - hashed | 2048 | 31 | 118 | 2052 |
Circular | | | | | |
 AEs | Morgan radius 1 | sparse | 29 | 65 | 54076 |
 ECFP0 | Morgan radius 0 - hashed | 2048 | 10 | 25 | 100 |
 ECFP2 | Morgan radius 1 - hashed | 2048 | 28 | 64 | 2052 |
 ECFP4 | Morgan radius 2 - hashed | 2048 | 47 | 103 | 2052 |
 FCFP2 | Feature-class of ECFP2 | 2048 | 20 | 51 | 1576 |
 FCFP4 | Feature-class of ECFP4 | 2048 | 36 | 86 | 2052 |
Unique representation | | | | | |
 SMILES | Tokenized atom-wise |  | 51 | 125 | 109 |
 SELFIES | Generic tokenization |  | 44 | 127 | 205 |
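
The "Tokenized atom-wise" entry for SMILES can be illustrated with a minimal sketch. The regular expression below is a pattern commonly used for atom-wise SMILES tokenization; it is an assumption for illustration, not necessarily the tokenizer used to produce the numbers in the table, and the name `tokenize_smiles` is hypothetical.

```python
import re

# Assumed atom-wise pattern: bracket atoms, two-letter halogens (Br, Cl),
# organic-subset atoms, aromatic atoms, bonds, branches and ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-wise tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Fail loudly if any character was silently skipped by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES completely: {smiles!r}")
    return tokens


aspirin_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
halide_tokens = tokenize_smiles("ClCCBr")
ammonium_tokens = tokenize_smiles("[NH4+]")
```

Under this sketch, the sequence length of a molecule is `len(tokens)`, and the token size of a dataset is the number of distinct tokens observed plus any special tokens the model adds.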