Fig. 1 | Journal of Cheminformatics


From: DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning


The concept of sequence representation and pre-training is illustrated. Panel A depicts the tokenization of a drug sequence (a SMILES string). In panel B, the tokens are converted into integer values according to a predefined dictionary, and the encoder model (in this example, ChemBERTa) reconstructs the masked tokens (shown in gray) from context. After pre-training, the class token (CLS) is used to represent the given sequence.
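The tokenize-then-encode step in panels A and B can be sketched as follows. This is an illustrative toy example only: the regex tokenizer and the on-the-fly vocabulary below are assumptions for demonstration, not the actual ChemBERTa tokenizer (which uses a learned subword vocabulary).

```python
import re

# Toy SMILES tokenizer: bracket atoms, two-letter elements, aromatic/aliphatic
# atoms, chirality marks, bonds, ring-closure digits, and branching parentheses.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops]|@@?|[=#\\/\-+()%0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, as in panel A."""
    return SMILES_TOKEN_PATTERN.findall(smiles)

def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to integer IDs, prepending a [CLS] token as in panel B."""
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens]

# Example: aspirin
tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")

# Build a small dictionary on the fly (a real model uses a fixed vocabulary).
vocab = {"[CLS]": 0}
for t in tokens:
    vocab.setdefault(t, len(vocab))

ids = encode(tokens, vocab)
```

During masked-language pre-training, some of these integer IDs are replaced by a special mask ID and the encoder is trained to recover the originals; afterwards the hidden state at the [CLS] position serves as the sequence representation.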
