Design of Enhanced Proteins through Protein Evolution Anticipation

Description

Analysis of protein evolution provides crucial insights into protein development and offers valuable guidance for designing protein properties and exploring the vast sequence space. Such analyses are usually performed retrospectively, using ancestral sequence reconstruction or back-to-consensus engineering. The Successor Sequence Predictor (SSP) instead uses protein evolution information prospectively: following the logic of traditional protein evolution analysis, it predicts future amino acid mutations from the protein's evolutionary history.
For a given target protein sequence, the Successor Sequence Predictor proceeds in several steps. First, it generates a data set of homologous sequences using BLAST and preprocesses it, discarding sequences that are too similar (≥90% identity). Second, the remaining sequences are clustered, and phylogenetic trees are constructed from these clusters. Each phylogenetic tree contains the target sequence and one randomly selected sequence from each cluster. Third, ancestral sequences are reconstructed for each phylogenetic tree, and the sequences along the path between the target and the root are aligned. Fourth, SSP predicts the successor sequence by applying linear regression to selected amino acid indices. Finally, evolving positions are identified from changes in the individual indices, and a specific mutation is suggested if the indices agree on a particular amino acid change.
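The regression and agreement steps can be illustrated with a minimal Python sketch. It is not the SSP implementation. It assumes that the residues at one alignment position are ordered from root to target, that the steps along the path are equally spaced, and that the predicted index value is mapped to the amino acid with the nearest value. The function names are hypothetical, and the Kyte-Doolittle hydropathy scale stands in for the "selected amino acid indices", which the description does not name.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example index.
# (Assumption: the indices actually selected by SSP are not listed here.)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_successor_residue(column, index):
    """Extrapolate one index value one step past the target.

    column: residues at a single alignment position, ordered
            root -> ... -> target (at least two entries).
    index:  mapping from amino acid letter to a numeric index value.
    Returns (predicted index value, amino acid with the nearest value).
    """
    xs = list(range(len(column)))          # assumed equally spaced steps
    ys = [index[aa] for aa in column]
    slope, intercept = fit_line(xs, ys)
    predicted = slope * len(column) + intercept
    nearest = min(index, key=lambda aa: abs(index[aa] - predicted))
    return predicted, nearest


def suggest_mutation(column, indices):
    """Suggest a residue only if all indices agree on the same change."""
    candidates = {predict_successor_residue(column, idx)[1] for idx in indices}
    if len(candidates) != 1:
        return None                        # indices disagree
    candidate = candidates.pop()
    return candidate if candidate != column[-1] else None
```

For a position that has been leucine along the whole path, `predict_successor_residue("LLLL", KYTE_DOOLITTLE)` fits a flat line and returns leucine again, so `suggest_mutation` reports no change. A mutation is proposed only when every supplied index extrapolates to the same new residue.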
SSP was evaluated in the design of different protein properties for two proteins: cold shock protein CspB and aminoglycoside 3'-phosphotransferase. For the former, SSP suggested 14 mutations expected to enhance protein stability. Six mutations were stabilising and eight were neutral. Notably, none of the suggested mutations compromised the folding free energy. In the case of aminoglycoside 3'-phosphotransferase, we observed an enhancement in activity through increased resistance to aminoglycosides. Random mutations resulted in an average resistance value of 0.82, a decrease in resistance compared to the wild-type sequence (normalised to ~1), whereas SSP mutants exhibited an average value of 1.36, indicating an increase in antibiotic resistance and therefore improved protein activity.
The SSP source code is available on GitHub: https://github.com/loschmidt/successor-sequence-predictor. The method is also integrated into FireProtASR2,

Authors

DOI: 10.5281/zenodo.20761788

Publication Date: 2025-03-28
