From: Investigation of the structure-odor relationship using a Transformer model
OD | p | F1 | TP | Substructure constraint | TN | Summary features |
---|---|---|---|---|---|---|
Fruity | 242 | 0.636 | Mainly annotates C(=O)O | C(=O)O and no atoms of C(=O)O in a ring | Mainly C(=O)OH | C(=O)O |
Sweet | 208 | 0.509 | Multiple structures, including C(=O)O | C(=O)O | Mainly C(=O)OH | Multiple structures, including C(=O)O |
Green | 189 | 0.520 | C=C, C=O | C=O, C=C, CC(C)C and none of atoms in a ring | CC(=C)C | C=O and C=C without CC(=C)C |
Floral | 147 | 0.541 | Multiple substructures, including C(=O)O, C(=O), C with 3 carbon neighbors, c1ccccc1 | c1ccccc1C(=O)O or ‘A~A(~A)~A’ | No obvious features | Multiple substructures, including C(=O)O, C(=O), C with 3 carbon neighbors, c1ccccc1 |
Woody | 107 | 0.517 | CC(C)(C)C and three atoms of CC(C)(C)C in a ring | CC(C)(C)C and three atoms of CC(C)(C)C in a ring | Only 3 molecules, and these 3 samples are labeled by ODs such as ‘camphor’ and ‘earthy’ | CC(C)(C)C and three atoms of CC(C)(C)C in a ring |
Fatty | 78 | 0.475 | Carbon chain, C=O, -OH | ‘C~C~C~C~C~C~C~C’, with each C having only two heavy neighbors | Tends to mark C(=O)O vaguely | Long carbon chain |
Rose | 53 | 0.503 | CC=C(C)C | CC=C(C)C, c1ccccc1CCCC | C=O at the end | CC=C(C)C without C=O at the end |
Sulfurous | 43 | 0.709 | S | S | S=O | S but not S=O |
Minty | 32 | 0.466 | CC(=C)C1CCCCC1 | CC(=C)C1CCCCC1 | Only 2 molecules, and they are labeled by ODs such as ‘fresh’ and ‘herb’ | CC(=C)C1CCCCC1 |
Roasted | 36 | 0.470 | ‘[n,s,o]’ | ‘[n,s,o]’ | Sometimes marks other atoms instead of ‘[n]’ | Substructures related to ‘[n,s,o]’ |
Meaty | 36 | 0.591 | ‘[SH]’, S | ‘[SH]’, SS | Tends to mark atoms on both sides of SS instead of SS | ‘[SH]’, SS and some neighboring substructures |
Pineapple | 28 | 0.467 | C(=O)O | C(=O)O and no atoms of C(=O)O in a ring | Does not mark C(=O)O | May be a substructure containing C(=O)O |
Aldehydic | 22 | 0.462 | C=O, C=C | C=O | Mainly C(=O)O | C(=O) but not C(=O)O |
Phenolic | 24 | 0.484 | c1ccccc1O | c1ccccc1O | Tends to mark atoms whose neighbor is an aromatic carbon | c1ccccc1O |
Honey | 26 | 0.453 | c1ccccc1 and C(=O)O | c1ccccc1 and C(=O)O | Marks c1ccccc1O vaguely | Both c1ccccc1 and C(=O)O exist in molecule |
Orange | 29 | 0.513 | Carbon chain with C=O at the end | ‘C~C~C~C~C~C=O’, and none of the atoms are in a ring or have more than 3 heavy neighbors | Most samples are labeled as ‘fruity’ or ‘citrus’ | Carbon chain with C=O at the end |
Musk | 11 | 0.493 | Ring with more than 10 atoms | Ring with more than 10 atoms | Too few samples to summarize | Ring with more than 10 atoms |
Coconut | 18 | 0.481 | C(=O), with C in a ring | C(=O), with C in a ring | Tends to mark the end atoms that are connected to a carbon in a ring | C(=O), with C in a ring and O at the end |
Terpene | 9 | 0.533 | Too few samples | None | Too few samples | Too few samples |
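
The substructure constraints in the table are written as SMILES/SMARTS patterns plus ring conditions. As a rough illustration of how two of them could be checked, the sketch below uses RDKit; this is an assumption, since the table does not say which toolkit was used. The helper names `has_acyclic_ester` and `has_large_ring` are hypothetical, and reading "no atoms of C(=O)O in a ring" as "at least one match lies entirely outside any ring" is one possible interpretation, not necessarily the authors'.

```python
from rdkit import Chem

# SMARTS for the C(=O)O pattern listed in the Fruity and Pineapple rows.
CARBOXYL = Chem.MolFromSmarts("C(=O)O")


def _parse(smiles: str) -> Chem.Mol:
    """Parse a SMILES string, raising on invalid input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol


def has_acyclic_ester(smiles: str) -> bool:
    """Assumed reading of 'C(=O)O and no atoms of C(=O)O in a ring':
    some C(=O)O match exists whose three atoms are all outside rings."""
    mol = _parse(smiles)
    for match in mol.GetSubstructMatches(CARBOXYL):
        if not any(mol.GetAtomWithIdx(i).IsInRing() for i in match):
            return True
    return False


def has_large_ring(smiles: str, min_size: int = 11) -> bool:
    """Assumed reading of 'ring with more than 10 atoms' (Musk row):
    at least one ring in RDKit's ring set has min_size or more atoms."""
    mol = _parse(smiles)
    return any(len(ring) >= min_size for ring in mol.GetRingInfo().AtomRings())


if __name__ == "__main__":
    print(has_acyclic_ester("CCCC(=O)OCC"))        # ethyl butanoate, open-chain ester
    print(has_acyclic_ester("O=C1CCCO1"))          # gamma-butyrolactone, ester in a ring
    print(has_large_ring("O=C1CCCCCCCCCCCCCC1"))   # cyclopentadecanone, 15-membered ring
```

The same pattern (SMARTS match followed by a per-atom ring or neighbor-count filter) would extend to the other rows, for example the chain patterns that require each carbon to have only two heavy neighbors.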