Fig. 2 | Journal of Cheminformatics

Fig. 2

From: SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering

Spearman correlation of predicted fitness. A: Comparison of our model with other models on fitness prediction for single-site mutants across 20 datasets. Each dataset was randomly shuffled and split into five folds. All supervised models were trained and evaluated five times, once per fold split: in the i-th iteration, fold i was used as the test set, and the remaining four folds were randomly split into training and validation sets at a ratio of 7:1. The error bars of each model are the standard deviations of the five test results. B: Comparison of our model with unsupervised models (ESM-1v, ESM-IF1 and MSA transformer) and supervised models (ECNet and ESM-1b) on fitness prediction for double-site mutants. Here, we performed five-fold cross-validation on the single-site mutant data and used the double-site mutants as an external test set. Briefly, we randomly split the single-site mutant data into five folds, then used one fold as the validation set and the remaining four folds as the training set. This process was repeated five times, so that each fold served once as the validation set. The model that performed best on the validation set was tested on the double-site mutants. C: Comparison of our model with other models on fitness prediction for quadruple-site mutants of GFP. Here, our model and the other supervised models were trained on single-, double- or triple-site mutants, or on all three together, with the quadruple-site mutants as the external test set. We performed five-fold cross-validation on the training set and tested the models on the quadruple-site mutants.
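The splitting scheme described for panel A can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`five_fold_splits`, `spearman`, `summarize`), the fixed seed, and the use of the sample standard deviation for the error bars are all assumptions, since the caption does not specify them.

```python
import random
import statistics


def five_fold_splits(n_samples, n_folds=5, val_ratio=(7, 1), seed=0):
    """Yield (train, val, test) index lists, one triple per fold.

    Fold i is the test set; the other four folds are reshuffled and
    split into training and validation at val_ratio (7:1 in the caption).
    """
    rng = random.Random(seed)  # seed is an assumption, for reproducibility
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rng.shuffle(rest)
        n_val = len(rest) * val_ratio[1] // sum(val_ratio)
        yield rest[n_val:], rest[:n_val], test


def _ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def summarize(fold_scores):
    """Mean and standard deviation of the per-fold test correlations.

    Sample standard deviation is an assumption; the caption only says
    'standard deviations'.
    """
    return statistics.fmean(fold_scores), statistics.stdev(fold_scores)
```

In use, a model would be fitted on each `train` split, tuned or early-stopped on `val`, and scored with `spearman` on `test`; passing the five resulting scores to `summarize` gives the bar height and error bar for one model on one dataset.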
