Project description
Interest in peptides as therapeutic agents is increasing within the pharmaceutical industry because of their high selectivity and efficacy. However, most peptide-derived drugs cannot be administered orally because of low bioavailability and instability in the gastrointestinal tract caused by protease activity. Structural modifications of peptides are therefore required to improve their stability. For this purpose, several in-silico software tools, such as PeptideCutter and PoPS, have been developed to predict peptide cleavage sites for different proteases, so that this knowledge can guide where modifications are introduced. In addition, databases such as MEROPS collect and store this information from public sources. These tools can help design a peptide drug with increased stability against proteolysis, but they have limitations: for example, they are restricted to natural amino acids or cannot process cyclic peptides. In drug design, potential cleavage sites are identified experimentally through metabolic stability studies of the compounds, and identification of the degradation products and metabolites is a critical component of the drug development process. Because characterizing peptide metabolites from MS data is very time-consuming, several semi-automated tools (SEQUEST, etc.) have been developed for interpreting full-scan/data-dependent MS/MS peptide data. These MS-based proteomics approaches have difficulty sequencing cyclic peptides without prior linearization, and they are limited to the 20 standard amino acids.
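To make the idea of rule-based cleavage site prediction concrete, the following minimal sketch applies the commonly cited simplified trypsin rule (cleavage C-terminal to K or R unless the next residue is P). It is an illustration only and is not the implementation used by PeptideCutter or PoPS; the function name is hypothetical. Because the rule is written over one-letter codes of a linear string, it also shows why such approaches do not extend directly to unnatural amino acids or cyclic peptides.

```python
def trypsin_cleavage_sites(sequence):
    """Return 1-based positions of residues after which cleavage is predicted.

    Assumption for illustration: simplified trypsin rule, i.e. cleave after
    K or R unless the following residue is P. Real tools use richer rules.
    """
    sites = []
    # The last residue has no following bond, so stop one position early.
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)  # convert 0-based index to 1-based position
    return sites


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real drug or substrate.
    print(trypsin_cleavage_sites("AKRPGKA"))
```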
Metabolic stability studies of compounds and the identification of degradation products and metabolites are now a critical component of the drug development process. It is also crucial to store this information in a database so that it remains accessible throughout the subsequent drug design process, both to identify potential structure improvement strategies and to guide the first chemical modifications in early drug discovery projects. Many steps of the drug design process have recently been automated, and assays have been developed to speed up the process and enable high-throughput analysis of potential drug candidates. The time and effort needed to analyze the incubation samples depend on the experimental design. Data analysis in this workflow has been partially automated using software solutions such as MassMetaSite and WebMetabase. These applications have significantly reduced the time spent on data analysis and results review, as they perform the following steps automatically:
select the chromatographic peaks related to the compound of interest; find the mass spectral information for each extracted peak; assign potential structures by comparing the predicted theoretical fragmentation with the mass-to-charge ratio (m/z) values observed in the experimental spectra; and score the potential solutions according to the fragments assigned to the spectra, either alone or by comparison with the fragmentation analysis of the parent compound. The results are then clustered across the different experimental conditions and a summary is presented to the expert as a single experiment entity; the results are stored in the database and, after review, a report can be prepared.
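The structure assignment and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring used by MassMetaSite or WebMetabase: the score is simply the fraction of a candidate's theoretical fragment m/z values matched by an observed peak within a ppm tolerance, and all function names, the 10 ppm default, and the numeric values are hypothetical.

```python
def ppm_error(theoretical_mz, observed_mz):
    """Absolute mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def score_candidate(theoretical_mzs, observed_mzs, tol_ppm=10.0):
    """Fraction of theoretical fragments matched by at least one observed peak."""
    if not theoretical_mzs:
        return 0.0
    matched = sum(
        1
        for t in theoretical_mzs
        if any(ppm_error(t, o) <= tol_ppm for o in observed_mzs)
    )
    return matched / len(theoretical_mzs)


def rank_candidates(candidates, observed_mzs, tol_ppm=10.0):
    """Sort candidate structures by score, best first.

    `candidates` maps a candidate name to its list of theoretical fragment m/z.
    """
    scored = [
        (name, score_candidate(frags, observed_mzs, tol_ppm))
        for name, frags in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up m/z values chosen only to exercise the code.
    observed = [100.0, 200.001, 300.0]
    candidates = {
        "candidate_A": [100.0, 200.0, 300.0],
        "candidate_B": [100.0, 250.0],
    }
    for name, score in rank_candidates(candidates, observed):
        print(name, round(score, 2))
```

A real scoring scheme would also weigh peak intensities and compare against the fragmentation of the parent compound, which this sketch omits.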
In this project we develop a new search algorithm for the aforementioned database and for analysis of the extracted data. The extracted data are used in a newly developed workflow that takes information extracted from databases, or stored information about identified cleavage sites, and uses it to train machine learning models for cleavage site prediction. In developing this workflow we will apply different approaches to describe the cleavage site: by one-letter sequence code, by representing the whole peptide sequence as a graph, and by adding a 3D structural description of the cleavage site. We will train several cleavage site prediction models with data extracted from the MEROPS database and evaluate their predictive performance using experimental data collected for a set of peptide drugs and substrate peptides (linear/cyclic, natural/unnatural amino acids) incubated with different proteolytic media. Moreover, we will develop a new algorithm to identify the most frequently observed cleavage sites in order to describe the whole dataset. The main advantages of the developed approach are the ability to predict cleavage sites in cyclic peptides and in peptides containing unnatural amino acids, to store the processed information in a database, and to update the predictive models as new data are introduced into the database.
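As a rough illustration of the sequence-based description of a cleavage site and of counting the most frequently observed sites, the sketch below extracts the residues flanking a cleaved bond and tallies them across observations. It is not the project's algorithm: the function names, the window size, and the toy peptides are assumptions. Peptides are given as lists of monomer symbols rather than strings, so that unnatural residues with multi-character names can be represented, and cyclic peptides are handled by wrapping indices around the ring.

```python
from collections import Counter


def cleavage_window(sequence, site, flank=2, cyclic=False, pad="-"):
    """Return the residues around the bond between positions site and site+1.

    `sequence` is a list of monomer symbols, `site` is the 1-based position of
    the residue on the N-terminal side of the cleaved bond. With flank=2 the
    window is (P2, P1, P1', P2'). Linear peptides are padded at the termini;
    cyclic peptides wrap around.
    """
    n = len(sequence)
    window = []
    for offset in range(-flank + 1, flank + 1):
        idx = site - 1 + offset  # 0-based index of the residue in the window
        if cyclic:
            window.append(sequence[idx % n])
        elif 0 <= idx < n:
            window.append(sequence[idx])
        else:
            window.append(pad)
    return tuple(window)


def most_frequent_sites(observations, flank=2, top=3):
    """Count cleavage windows over (sequence, site, cyclic) observations."""
    counts = Counter(
        cleavage_window(seq, site, flank=flank, cyclic=cyclic)
        for seq, site, cyclic in observations
    )
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical observations, not experimental data.
    observations = [
        (list("AKRP"), 2, False),
        (list("GAKRPL"), 3, False),
        (["A", "Aib", "K", "D-Ala"], 4, True),  # cyclic, unnatural residues
    ]
    for window, count in most_frequent_sites(observations):
        print(window, count)
```

The tuples returned by `cleavage_window` could then be encoded (for example one-hot per position) as input features for a prediction model; graph-based and 3D descriptions would need a different representation.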