DEGRONOPEDIA is an online resource allowing for inspection and visualization of known degron motifs in proteomes of selected model organisms as well as in the user-submitted sequence or structure.


Table of Contents


⚡ Quick start
Introduction
Degron tripartite model
DEGRONOPEDIA's overview
Inputs
💡 What is pLDDT/LDDT score?
Customizable parameters
A. Primary degron-related
⚙️ A1. Degron flanking region in sequence
⚙️ A2. Degron flanking region in structure
⚙️ A3. Region length to calculate degron disorder
B. Secondary degron-related
⚙️ B1. Region length to calculate K/C/T/S disorder
C. Tertiary degron-related
⚙️ C1. Minimum IDR distance from K/C/T/S
D. Structure-related
⚙️ D1. Minimum continuous IDR length
⚙️ D2. pLDDT/LDDT disorder threshold
⚙️ D3. Buried residue threshold
Outputs
1. Query by UniProt ID
1.1. Overview panel
💡 What is Gravy hydrophobicity index?
1.2. Visualization panel
1.3. N-/C-termini stability data
1.4. General degron information panel
💡 How are degron motifs found?
💡 What is a regular expression?
1.5. Structural degron information panel
💡 How is secondary structure calculated?
💡 How are IDRs defined?
💡 How are buried residues defined?
1.6. Post-translational modifications (PTMs) panel
1.6.1. PTMs within degron
1.6.2. PTMs within degron's flanking regions
💡 Degron flanking regions in sequence and structure
1.7. Mutations panel
1.7.1. Mutations within degron
1.7.2. Mutations within degron's flanking regions
1.8. Secondary and tertiary degrons panel
💡 Secondary degrons
💡 Tertiary degrons
1.8.1. Secondary and tertiary degrons in sequence
1.8.2. Secondary and tertiary degrons in structure
1.9. Degrons emerging upon proteolysis panel
1.9.1. Degrons upon user-defined cleavage sites
1.9.2. Degrons upon experimentally validated cleavage sites
1.9.3. Degrons upon predicted cleavage sites
1.10. Interacting E3s panel
2. Query by sequence
2.1. Overview panel
2.2. Visualization panel
2.3. N-/C-terminus stability predictions
💡 Predicting N-/C-termini stability
2.4. General degron information panel
2.5. Secondary degrons panel
2.6. Degrons emerging upon proteolysis panel
3. Query by structure
3.1. Overview panel
3.2. Visualization panel
3.3. N-/C-terminus stability predictions
3.3. General degron information panel
3.4. Structural degron information panel
3.5. Secondary and tertiary degrons panel
3.5.1. Secondary and tertiary degrons in sequence
3.5.2. Secondary and tertiary degrons in structure
3.6. Degrons emerging upon proteolysis panel
Download

⚡ Quick start


DEGRONOPEDIA provides the following information:




Introduction


Maintaining proteostasis requires degrading damaged or unwanted proteins and plays a crucial role in cellular function, organismal growth, and, ultimately, viability. The ubiquitin-proteasome system (UPS) orchestrates protein degradation in a process known as ubiquitination, where a small protein ubiquitin is covalently attached to its target. Ubiquitination is mediated by an enzymatic cascade involving ubiquitin-activating (E1), ubiquitin-conjugating (E2), and ubiquitin ligase (E3) enzymes. The proteasome complex recognizes ubiquitinated proteins and, through proteolysis, degrades them into short peptides that can be further processed.

A degron is a short linear motif on a protein of interest (POI) that is recognized by E3 ubiquitin ligases. The N- and C-termini of a protein may act as degron sites, but internal degrons, often located within intrinsically disordered regions (IDRs), also occur.

Scheme of a degron site. E2 - ubiquitin-conjugating enzyme, E3 - ubiquitin ligase, Ub - ubiquitin, POI - protein of interest.

Degron tripartite model


Guharoy and colleagues (Guharoy et al., 2016) suggested a tripartite degron architecture in which the short linear degron motif acts as the primary degron. The secondary degron refers to lysines to which ubiquitin may be attached, and the tertiary degron is the IDR in close proximity to the secondary degron, acting as a site to initiate protein unfolding prior to entry into the proteasome. The secondary and tertiary degrons are suggested to play subsidiary roles that affect ubiquitin signaling: the lack of a component of the tripartite degron model, e.g., an IDR near a ubiquitinated lysine, can result in non-proteolytic ubiquitination functions.

The tripartite degron model. The primary degron is a short linear motif recognized by the E3 ligase, preferentially located within an IDR of the protein. The secondary degron is a residue near the primary degron onto which ubiquitin can be transferred (in our implementation, not only lysine (K) is considered, since ubiquitination can also occur on cysteines (C), serines (S), or threonines (T)). The tertiary degron is an IDR close to the secondary degron, which acts as an unfolding seed initiating proteasome-dependent protein degradation. Modified from Guharoy et al., 2016.

DEGRONOPEDIA's overview


In DEGRONOPEDIA, we not only screen proteins for the presence of over 400 known degron motifs but also provide a comprehensive view of their occurrence with regard to the tripartite model, structural data, post-translational modifications (PTMs), and mutations. Since degrons may emerge following protease cleavage, we also simulate sequence nicking based on user-defined sites, experimental data, or theoretical predictions for over thirty proteases and screen for new N- and C-degron pathway motifs. In addition, it is possible to predict protein N-/C-terminal stability using our machine learning classifiers.

Inputs


There are three allowed input types:

  1. UniProt ID - UniProt ID of a protein from the canonical UniProt proteome of seven model organisms: A. thaliana, S. cerevisiae, C. elegans, D. melanogaster, D. rerio, M. musculus, or H. sapiens. Provides the most comprehensive result information (see also Query by UniProt ID)
  2. Sequence - one sequence at a time in the valid FASTA format containing 20 canonical amino acids only (see also Query by sequence)
  3. Structure - a protein monomer structure in the valid PDB format (see also Query by structure):
    • with only one model and one chain
    • with continuous numbering starting from 1 (to avoid inconsistency with overlaying data on the sequence). You can easily renumber the structure here
    • between 50 and 8000 amino acids long
    • containing 20 canonical amino acids only
    • not exceeding 5MB size
    • with B-factor columns containing pLDDT or LDDT scores (see What is pLDDT/LDDT score?)
📝 Note: regardless of input type, the query protein must contain between 50 and 8000 amino acids (max. 5 MB file in case of a query by structure).

💡 What is pLDDT/LDDT score?


pLDDT (predicted Local Distance Difference Test; ranges 0-100) and LDDT (Local Distance Difference Test; ranges 0-1) scores estimate the accuracy of the modeled residues. Residues with pLDDT above 70 are generally expected to be modeled well, while pLDDT below this value correlates with disordered regions (Tunyasuvunakool et al., 2021).

When submitting a structure to DEGRONOPEDIA in PDB format, its B-factor column must contain valid pLDDT or LDDT scores, as this information is used to locate IDRs. We recommend uploading a protein model from AlphaFold (provides pLDDT scores) or from a local run of RoseTTAFold (the B-factor column then holds LDDT scores). Please do not directly use a RoseTTAFold model obtained from the ROBETTA server, since its B-factor column holds an estimated RMSD error (these values can be converted to LDDT scores using, e.g., the PHENIX software (Liebschner et al., 2019)).

We do not support experimental PDB files, since they lack long IDRs which are crucial from the tripartite degron perspective.

Customizable parameters


Users may customize eight parameters:

📝 Note: Each of the parameters accepts integers only.

A. Primary degron-related


⚙️ A1. Degron flanking region in sequence

Applies: to all query types
Unit: aa
Definition: maximum sequence distance to regions upstream and downstream of a degron site to be considered flanking
Example: when defined as 20 aa, up to 20 residues upstream and up to 20 residues downstream of the degron site are considered flanking
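The definition above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not DEGRONOPEDIA's actual code: the function name `sequence_flank` is hypothetical, positions are assumed to be 1-based and inclusive, and the degron itself is excluded from its flanking region.

```python
def sequence_flank(degron_start, degron_end, flank, seq_len):
    """Return the upstream and downstream flanking ranges of a degron.

    Positions are 1-based and inclusive (an assumption of this sketch).
    Each range is a (start, end) tuple, or None when the degron sits at
    the corresponding terminus and no flanking residues exist.
    """
    upstream = (max(1, degron_start - flank), degron_start - 1)
    downstream = (degron_end + 1, min(seq_len, degron_end + flank))
    return (
        upstream if upstream[0] <= upstream[1] else None,
        downstream if downstream[0] <= downstream[1] else None,
    )


# A degron at positions 30-35 of a 100-residue protein, 20 aa flank.
print(sequence_flank(30, 35, 20, 100))
```

Near the termini the ranges are simply clipped to the sequence boundaries.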

⚙️ A2. Degron flanking region in structure

Applies: to query by UniProt ID or query by structure
Unit: Å
Definition: maximum structural distance to residues near a degron site to be considered as flanking (note that such residues are not necessarily close in sequence to the degron site)
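Example: when defined as 20 Å, residues located within 20 Å of the degron site in the structure are considered flanking, even if they are distant from it in sequence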
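A structural flanking region could be collected as in the following Python sketch. Everything here is an assumption made for illustration: the function name `structural_flank` is hypothetical, and each residue is represented by a single coordinate (for example its C-alpha atom), which may differ from how distances are actually measured in DEGRONOPEDIA.

```python
import math


def structural_flank(coords, degron_positions, cutoff):
    """Return residue positions within `cutoff` angstroms of the degron.

    coords maps a residue position to the (x, y, z) coordinates of one
    representative atom of that residue (assumption of this sketch).
    Degron residues themselves are excluded from the result.
    """
    degron = set(degron_positions)
    flank = []
    for pos, xyz in coords.items():
        if pos in degron:
            continue
        if any(math.dist(xyz, coords[d]) <= cutoff for d in degron):
            flank.append(pos)
    return sorted(flank)


# Toy coordinates: residue 40 is far in sequence but close in space.
toy = {1: (0, 0, 0), 2: (3.8, 0, 0), 3: (7.6, 0, 0),
       40: (5, 5, 0), 41: (50, 50, 50)}
print(structural_flank(toy, [2], 20))
```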

⚙️ A3. Region length to calculate degron disorder

Applies: to query by UniProt ID or query by structure
Unit: aa
Definition: maximum sequence distance to regions upstream and downstream of a degron site to be included in the degron mean disorder score based on pLDDT/LDDT values
Example: when defined as 10 aa, up to 10 residues on each side of the degron site are included in the degron mean disorder score
📝 Note: Whether the disorder mean is based on pLDDT or LDDT scores depends on the query type. When querying by UniProt ID, it is always based on pLDDT scores, since we use AlphaFold models of proteins from the selected proteomes. When querying by structure, it depends on the submitted PDB file (see Inputs).
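A windowed mean of this kind might be computed as in the Python sketch below. The function name `mean_disorder` is hypothetical, and the sketch assumes that the window covers the degron residues together with the flanking residues on both sides, clipped to the sequence ends. The same function would apply to a single K/C/T/S residue by passing identical start and end positions.

```python
def mean_disorder(scores, start, end, flank):
    """Mean per-residue pLDDT/LDDT over a site and its neighborhood.

    scores: per-residue scores, where scores[0] belongs to residue 1.
    start, end: 1-based inclusive positions of the site.
    flank: number of residues added on each side of the site.
    """
    lo = max(1, start - flank)
    hi = min(len(scores), end + flank)
    window = scores[lo - 1:hi]
    return sum(window) / len(window)


# Toy profile: a low-confidence stretch (residues 11-20) between two
# confidently modeled stretches.
profile = [90] * 10 + [30] * 10 + [90] * 10
print(mean_disorder(profile, 11, 20, 10))
```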

B. Secondary degron-related


⚙️ B1. Region length to calculate K/C/T/S disorder

Applies: to query by UniProt ID or query by structure
Unit: aa
Definition: maximum sequence distance to regions upstream and downstream of the secondary degron (K/C/T/S) to be included in the secondary degron mean disorder score based on pLDDT/LDDT values.
Example: when defined as 3 aa, up to 3 residues on each side of the K/C/T/S residue are included in the secondary degron mean disorder score
📝 Note: Whether the disorder mean is based on pLDDT or LDDT scores depends on the query type. When querying by UniProt ID, it is always based on pLDDT scores, since we use AlphaFold models of proteins from the selected proteomes. When querying by structure, it depends on the submitted PDB file (see Inputs).

C. Tertiary degron-related


⚙️ C1. Minimum IDR distance from K/C/T/S

Applies: to query by UniProt ID or query by structure
Unit: aa
Definition: minimum sequence distance of the secondary degron (K/C/T/S) to the continuous IDR region of defined length (see ⚙️ D1. Minimum continuous IDR length) to consider it as a tertiary degron.
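Example: when defined as 10 aa for both this parameter and ⚙️ D1. Minimum continuous IDR length, only IDRs of at least 10 consecutive disordered residues are considered, and their sequence distance from the K/C/T/S residue is compared against 10 aa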

D. Structure-related


⚙️ D1. Minimum continuous IDR length

Applies: to query by UniProt ID or query by structure
Unit: aa
Definition: minimum number of consecutive (in sequence) disordered residues to be considered an IDR
Example: when defined as 10 aa, at least 10 disordered residues must occur one after another in the sequence to be recognized as an IDR

⚙️ D2. pLDDT/LDDT disorder threshold

Applies: to query by UniProt ID or query by structure
Unit: %
Definition: pLDDT/LDDT value below which a residue is recognized as disordered. Since this parameter is defined in %, it is possible to submit a structure with either pLDDT scores (range 0-100) or LDDT scores (range 0-1). See What is pLDDT/LDDT score?
Example: when defined as 70%, all residues with a mean pLDDT/LDDT score below 70/0.7 will be considered disordered (the score should be the same for each atom of a residue; nevertheless, we always calculate the residue mean from the atoms’ values)
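The thresholding described above can be sketched in Python as follows. This is an illustration under assumptions rather than the actual implementation: the function names are hypothetical, the per-atom scores are assumed to have been read from the B-factor column beforehand, and the score scale (100 for pLDDT, 1 for LDDT) is passed explicitly instead of being detected automatically.

```python
from collections import defaultdict


def residue_means(atom_records):
    """Average per-atom scores into one score per residue.

    atom_records: iterable of (residue_position, b_factor_value) pairs.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for pos, value in atom_records:
        sums[pos][0] += value
        sums[pos][1] += 1
    return {pos: total / n for pos, (total, n) in sums.items()}


def disordered_positions(atom_records, threshold_percent, scale_max):
    """Residues whose mean score lies below the percentage threshold.

    scale_max is 100 for pLDDT and 1 for LDDT (assumption: the caller
    knows which score the file holds).
    """
    cutoff = threshold_percent * scale_max / 100
    means = residue_means(atom_records)
    return sorted(pos for pos, mean in means.items() if mean < cutoff)


plddt_atoms = [(1, 92.0), (1, 92.0), (2, 65.5), (2, 65.5), (3, 70.0)]
print(disordered_positions(plddt_atoms, 70, 100))
```

A residue whose mean equals the threshold exactly is not treated as disordered here, following the "below" wording of the example.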

⚙️ D3. Buried residue threshold

Applies: to query by UniProt ID or query by structure
Unit: %
Definition: Relative Solvent Accessibility (RSA) value below which a residue is recognized as buried. RSA is calculated with the mkdssp software and normalized using the Sander method.
Example: when defined as 20%, all residues with an RSA value below 0.2 will be considered buried

Outputs


Depending on the input type, different granularity of degron-related output information is provided.


Comparison of the result information obtained upon different query types in DEGRONOPEDIA.

1. Query by UniProt ID


Users may screen our database using UniProt ID of a protein from the canonical UniProt proteome of seven model organisms: A. thaliana, S. cerevisiae, C. elegans, D. melanogaster, D. rerio, M. musculus, or H. sapiens. Almost every protein in our database has its corresponding AlphaFold model used in the subsequent degron calculations (except for proteins with non-canonical amino acids).

Submitting UniProt ID provides the most comprehensive information about the putative and experimentally-validated degron sites in a protein, as the output is overlaid with additional information about their local context - structural data, PTMs, pathogenic mutations.

📝 Note: the reported Protein Stability Index (PSI) is always derived from experimental data. If you would like to run N-/C-termini stability predictions, please query by sequence or structure.

1.1. Overview panel



  1. Basic information about the protein from the UniProt database
  2. Gravy hydrophobicity index calculated for N-terminus (first 15 aa) and C-terminus (last 15 aa). See What is Gravy hydrophobicity index?
  3. Any degron-relevant information about the query protein with literature sources
  4. Summary of the number of found degron motifs in the query protein, and in proteolysis simulations (based on user-defined sites (if submitted), experimental data and predictions)
  5. Link to download results as xlsx file

💡 What is Gravy hydrophobicity index?
The Gravy (grand average of hydropathy) hydrophobicity index is calculated by summing the hydropathy values of all residues and dividing by the length of the sequence (Kyte and Doolittle, 1982). Higher values indicate that a sequence is more hydrophobic.

In DEGRONOPEDIA, it is calculated for N-terminus (first 15 aa) and C-terminus (last 15 aa) of the input sequence.
📝 Note: Gravy hydrophobicity index is not calculated if non-canonical amino acids are present in the given sequence chunk (N-terminus or C-terminus defined as the first/last 15 residues, respectively).
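The calculation can be reproduced with a few lines of Python. The sketch below uses the Kyte and Doolittle (1982) hydropathy scale; the function names `gravy` and `terminal_gravy` are hypothetical and chosen for this illustration only.

```python
# Kyte-Doolittle hydropathy values for the 20 canonical amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(segment):
    """Mean hydropathy of a segment, or None if it cannot be computed.

    None is returned for an empty segment or when a non-canonical
    residue is present, mirroring the note above.
    """
    if not segment or any(aa not in KYTE_DOOLITTLE for aa in segment):
        return None
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def terminal_gravy(sequence, length=15):
    """Gravy index of the first and last `length` residues."""
    return gravy(sequence[:length]), gravy(sequence[-length:])


toy = "I" * 15 + "G" * 30 + "K" * 15
print(terminal_gravy(toy))
```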

1.2. Visualization panel



Visualization of the query protein overlaid with found degron motifs (if degron motifs overlap, as in the above example, more than one lane with visualized degrons is present), positions of lysines, IDRs (see How are IDRs defined?), coils (see How is secondary structure calculated?), buried residues (see How are buried residues defined?), and PTMs, with particular focus on ubiquitination and phosphorylation. We also show pathogenic missense mutations as circles, where their size corresponds to the number of mutations occurring at a given position.

It is possible to zoom the sequence (please see the top right button for help) and download it (top left button). The bottom axis represents sequence indices.

The FeatureViewer tool was used to build the visualization.

1.3. N-/C-termini stability data



Visualization of the experimental PSI values for N-/C-termini in the context of the experimental data distribution. For N-termini, experimental PSI was measured in the range 1-6, while for C-termini in the range 1-4.

PSI values for N-terminus provide information on the experimental stability of the first 23 residues/24 residues of the protein depending on whether PSI was measured for the cleaved initiator methionine or not, respectively. More about co-translational methionine cleavage, when it occurs and the associated Ac/N-degron pathway can be found e.g. in this review.
PSI value for C-terminus provides information on experimental stability of the last 23 residues.

The higher the PSI value, the more stable the terminus is.

📝 Note: currently available only for human proteins
📝 Note 2: PSI is reported based on N-/C-terminus identity with the experimental data, not on protein name due to possible nomenclature inconsistencies (e.g. human Protein A holds experimental C-terminal PSI value, but we query human Protein B whose name is absent in the experimental dataset but which has identical C-terminal peptide as Protein A - thus we report identical C-terminal PSI value for Protein B as for Protein A).
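The identity-based reporting described in Note 2 amounts to a lookup keyed by the terminal peptide rather than by the protein name. The Python sketch below illustrates this for the C-terminus; the function name `c_terminal_psi`, the peptide, and the PSI value are all hypothetical and serve only as an example.

```python
def c_terminal_psi(sequence, psi_by_peptide, length=23):
    """Return the PSI stored for the C-terminal peptide, if any.

    psi_by_peptide maps a terminal peptide to its measured PSI value.
    The lookup uses the last `length` residues of the query sequence.
    """
    return psi_by_peptide.get(sequence[-length:])


# Hypothetical measurement for one 23-residue C-terminal peptide.
measured = {"ACDEFGHIKLMNPQRSTVWYACD": 2.7}

# Two different proteins sharing the same C-terminal 23 residues
# receive the same reported PSI value.
protein_a = "M" * 40 + "ACDEFGHIKLMNPQRSTVWYACD"
protein_b = "G" * 55 + "ACDEFGHIKLMNPQRSTVWYACD"
print(c_terminal_psi(protein_a, measured), c_terminal_psi(protein_b, measured))
```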

If you would like to run our Machine Learning N-/C-termini stability predictions, please query by sequence or structure.


1.4. General degron information panel



  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Taxa in which the degron motif was found and validated (please note that these may differ from the taxon of the query protein)
  3. Location of the degron motif (N-terminus/C-terminus/Internal)
  4. Sequence position of the degron motif
  5. Type of the degron motif as reported in the literature
  6. Information on the ubiquitin-proteasome system (UPS) components recognizing this degron motif
  7. Literature references regarding the UPS components recognizing this degron motif
  8. Any additional information on the degron motif
  9. Literature references regarding the additional information on the degron motif

💡 How are degron motifs found?
We screen the protein for the presence of known degron motifs collected from the literature (Yan et al., 2021; Chen et al., 2021; Varshavsky, 2019; Timms et al., 2019; Koren et al., 2018; Guharoy et al., 2016; Maurer et al., 2016) using regular expressions.

💡 What is a regular expression?
A regular expression is a search pattern used to screen a text for the presence of matching fragments.
📝 Note: we report human-readable versions of the used regexes

e.g.

[AVP]x[ST][ST][ST]
means that there are 5 characters in the pattern
  1. first character: A or V or P
  2. second character: any (x indicates any character)
  3. third character: S or T
  4. fourth character: S or T
  5. fifth character: S or T
F[^P]{3}W[^P]{2,3}[VIL]
means that there are eight or nine characters in the pattern
📝 Note: {} brackets indicate number of occurrence
  1. first character: F
  2. second character: any except P
  3. third character: any except P
  4. fourth character: any except P
  5. fifth character: W
  6. sixth character: any except P
  7. seventh character: any except P
  8. eighth character: either a third residue other than P or, in the eight-character case, the final V or I or L
  9. ninth character (only in the nine-character case): V or I or L
FSDLWKLL
the motif has to exactly match the pattern

^M{0,1}([ED])x
means that there are two or three characters in the pattern
📝 Note: {} brackets indicate number of occurrence
📝 Note 2: ^ indicates that the pattern has to match the very beginning of the sequence
  1. first character: M occurs or does not occur
  2. second character: E or D
  3. third character: any
KxxR$
means that there are four characters in the pattern
📝 Note: $ indicates that the pattern has to match the very end of the sequence
  1. first character: K
  2. second character: any
  3. third character: any
  4. fourth character: R
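The human-readable patterns above can be tried out with Python's standard `re` module. The sketch below is an illustration only: the helper names are hypothetical, and it assumes that the lowercase `x` is the only notation that differs from standard regex syntax (it is translated to `.`, which matches any single character).

```python
import re


def to_python_regex(readable):
    """Translate the human-readable notation to Python regex syntax.

    Assumption of this sketch: lowercase 'x' (any residue) is the only
    symbol that needs translating; sequences use uppercase letters.
    """
    return readable.replace("x", ".")


def find_motifs(sequence, readable):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    Note that re.finditer reports non-overlapping matches only.
    """
    pattern = re.compile(to_python_regex(readable))
    return [(m.start() + 1, m.end(), m.group(0))
            for m in pattern.finditer(sequence)]


print(find_motifs("MKAGSTSLL", "[AVP]x[ST][ST][ST]"))
print(find_motifs("AAFSDLWKLLAA", "F[^P]{3}W[^P]{2,3}[VIL]"))
print(find_motifs("MEAKR", "^M{0,1}([ED])x"))
print(find_motifs("AAAKGGR", "KxxR$"))
```

The anchors `^` and `$` restrict the last two patterns to the very beginning and the very end of the sequence, respectively, so `KxxR$` finds nothing in a sequence such as `AAAKGGRA`.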

1.5. Structural degron information panel

📝 Note: Structural information is based on the corresponding AlphaFold model of the query protein.


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Secondary structure (see How is secondary structure calculated?) of the degron motif
  5. Mean Relative Solvent Accessibility (RSA) (see How are buried residues defined?) of the degron’s residues
  6. Indicates whether each of the degron’s residues is disordered (as per the defined ⚙️ pLDDT/LDDT disorder threshold).
    Please note that the degron may not be located in an IDR (see How are IDRs defined?) even though all its residues are disordered
  7. Degron mean disorder score - mean pLDDT calculated for the region as defined by ⚙️ Region length to calculate degron disorder

💡 How is secondary structure calculated?
The secondary structure is calculated using the mkdssp software.

The symbols are as follows:

Symbol  Secondary structure
H       Alpha helix (4-12)
B       Isolated beta-bridge residue
E       Strand
G       3-10 helix
I       Pi helix
T       Turn
S       Bend
-       Coil

💡 How are IDRs defined?
IDRs are defined as continuous regions of at least the defined number of consecutive residues (see ⚙️ Minimum continuous IDR length) with pLDDT/LDDT scores below the defined threshold (see ⚙️ pLDDT/LDDT disorder threshold).
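This definition corresponds to a simple run-length scan over the per-residue scores, sketched below in Python. The function name `find_idrs` and its default values are assumptions made for this illustration and do not necessarily match DEGRONOPEDIA's defaults or code.

```python
def find_idrs(scores, threshold_percent=70, scale_max=100, min_length=10):
    """Return IDRs as (start, end) tuples, 1-based and inclusive.

    scores: per-residue pLDDT (scale_max=100) or LDDT (scale_max=1).
    A residue counts as disordered when its score is below the cutoff;
    an IDR is a run of at least min_length such residues.
    """
    cutoff = threshold_percent * scale_max / 100
    idrs = []
    run_start = None
    for pos, score in enumerate(scores, start=1):
        if score < cutoff:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_length:
                idrs.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the C-terminus.
    if run_start is not None and len(scores) - run_start + 1 >= min_length:
        idrs.append((run_start, len(scores)))
    return idrs


# Toy profile: disordered runs of 10, 9 and 12 residues.
profile = [90] * 5 + [40] * 10 + [90] * 5 + [40] * 9 + [95] + [30] * 12
print(find_idrs(profile))
```

With a minimum length of 10, the nine-residue run in the toy profile is not reported as an IDR even though each of its residues is disordered.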

💡 How are buried residues defined?
Relative solvent accessibility (RSA) of a protein residue is a measure of its solvent exposure.

We calculate RSA with the mkdssp software and normalize its outputs using the Sander normalization method. RSA values are in the range 0-1, where lower values indicate more buried residues. We define buried residues based on the calculated RSA values below the defined threshold (see ⚙️ Buried residue threshold).
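The normalization and thresholding can be sketched in Python as shown below. Several points are assumptions of this illustration: the function names are hypothetical, the absolute accessibility is assumed to have been taken from the mkdssp output beforehand, only a few residue types are listed, and the maximum accessibility values are quoted from the Sander scale as commonly distributed (for example with Biopython) and should be verified against the original source before use. Capping RSA at 1 is also an assumption.

```python
# Maximum accessible surface area per residue type (square angstroms).
# Subset of the Sander scale; values to be verified before real use.
SANDER_MAX_ASA = {"ALA": 106.0, "GLY": 84.0, "LYS": 205.0}


def relative_accessibility(resname, absolute_asa):
    """Normalize an absolute accessibility to the 0-1 RSA range."""
    return min(1.0, absolute_asa / SANDER_MAX_ASA[resname])


def is_buried(resname, absolute_asa, threshold_percent=20):
    """True when the RSA lies below the percentage threshold."""
    return relative_accessibility(resname, absolute_asa) < threshold_percent / 100


# A lysine exposing about 10% of its maximum surface counts as buried
# at a 20% threshold; one exposing 50% does not.
print(is_buried("LYS", 20.5), is_buried("LYS", 102.5))
```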

1.6. Post-translational modifications (PTMs) panel


The PTMs datasets were obtained from:

In total, DEGRONOPEDIA provides up to 25 different PTMs.

1.6.1. PTMs within degron


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Indicates whether phosphorylation occurs within the degron
  5. List of PTMs occurring within the degron
  6. Position of the PTMs
  7. References regarding the PTMs

1.6.2. PTMs within degron's flanking regions
See Degron flanking regions in sequence and structure for the detailed explanation of the degron's flanking region definitions.


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. List of PTMs occurring within the degron’s flanking region in sequence
  5. Position of the PTMs at the degron’s flanking region in sequence
  6. References regarding the PTMs at the degron’s flanking region in sequence
  7. List of PTMs occurring within the degron’s flanking region in structure
  8. Position of the PTMs at the degron’s flanking region in structure
  9. References regarding the PTMs at the degron’s flanking region in structure

💡 Degron flanking regions in sequence and structure
We distinguish two degron flanking regions:
Degron flanking region in sequence: amino acids downstream and upstream of the degron motif (excluding the degron motif itself) within the defined threshold (see ⚙️ Degron flanking region in sequence)

Degron flanking region in structure: residues within the defined 3D proximity (see ⚙️ Degron flanking region in structure) to the degron motif (excluding the degron motif itself)

1.7. Mutations panel


All pathogenic missense mutations (we report only those known to be pathogenic to reduce data noise) were obtained from the COSMIC database.
📝 Note: mutation data is only available when querying human proteins

1.7.1. Mutations within degron


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Names of the pathogenic missense mutations occurring within the degron

1.7.2. Mutations within degron's flanking regions
See Degron flanking regions in sequence and structure for the detailed explanation of the degron's flanking region definitions.

📝 Note: we report pathogenic missense mutations only within PTMs at the degron flanking regions in sequence/structure, not at the entire flanking regions, to reduce data noise


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Names of the pathogenic missense mutations occurring within the PTMs at the degron flanking region in sequence
  5. Names of the pathogenic missense mutations occurring within the PTMs at the degron flanking region in structure

1.8. Secondary and tertiary degrons panel

We report not only all known degron motifs occurring in the query protein, but also secondary and tertiary degrons according to the Degron tripartite model.

💡 Secondary degrons
In DEGRONOPEDIA, we consider not only lysines (K) as potential secondary degrons but also cysteines (C), threonines (T) and serines (S), since ubiquitination may occur on these amino acids (see review about non-canonical ubiquitination).

Secondary degrons (K/C/T/S) are searched for in the Degron flanking regions in sequence and structure.
📝 Note: they are searched in the structure upon its availability

💡 Tertiary degrons
Tertiary degrons are searched for only if a structure is available, according to the ⚙️ Minimum IDR distance from K/C/T/S parameter.
📝 Note: we report only the closest IDR to each secondary degron (both in sequence and structure)
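Selecting the closest IDR in sequence for a given secondary degron might look like the Python sketch below. The function name `closest_idr` is hypothetical, and the distance definition (zero when the residue lies inside the IDR, otherwise the number of positions to the nearest IDR boundary) is an assumption of this illustration.

```python
def closest_idr(position, idrs):
    """Return the IDR nearest in sequence to a residue and its distance.

    position: 1-based position of the secondary degron.
    idrs: list of (start, end) tuples, 1-based and inclusive.
    Returns ((start, end), distance), or None when no IDR exists.
    """
    def distance(idr):
        start, end = idr
        if start <= position <= end:
            return 0
        return start - position if position < start else position - end

    if not idrs:
        return None
    best = min(idrs, key=distance)
    return best, distance(best)


print(closest_idr(50, [(1, 20), (60, 80)]))
```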


1.8.1. Secondary and tertiary degrons in sequence


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Name of the secondary degrons in the degron flanking region in sequence
  5. Secondary degron’s position
  6. Secondary degron’s symbols: Ub - ubiquitinated, DEG - degradation-leading ubiquitination, B - buried
  7. References of secondary degron’s symbols (database/PMID/software)
  8. Secondary degron’s secondary structure
  9. Secondary degron’s mean disorder score - mean pLDDT score of the secondary degron’s defined neighborhood (see ⚙️ Region length to calculate K/C/T/S disorder)
  10. List of PTMs occurring on the secondary degron
  11. References regarding the PTMs
  12. Pathogenic missense mutations in the secondary degron
  13. Indices of the IDR closest in sequence to the secondary degron
  14. Distance from the secondary degron to its closest in sequence IDR

1.8.2. Secondary and tertiary degrons in structure


  1. Found degron motif (see How are degron motifs found?) presented as a regular expression (regex) (see What is a regular expression?)
  2. Location of the degron motif (N-terminus/C-terminus/Internal)
  3. Sequence position of the degron motif
  4. Name of the secondary degrons in the degron flanking region in structure
  5. Secondary degron’s position
  6. Secondary degron’s symbols: Ub - ubiquitinated, DEG - degradation-leading ubiquitination, B - buried
  7. References of secondary degron’s symbols (database/PMID/software)
  8. Secondary degron’s secondary structure
  9. Secondary degron’s mean disorder score - mean pLDDT score of the secondary degron’s defined neighborhood (see ⚙️ Region length to calculate K/C/T/S disorder)
  10. List of PTMs occurring on the secondary degron
  11. References regarding the PTMs
  12. Pathogenic missense mutations in the secondary degron
  13. Indices of the IDR closest in structure to the secondary degron
  14. Distance from the secondary degron to its closest in structure IDR

1.9. Degrons emerging upon proteolysis panel


Protein turnover may be regulated by proteolytic enzymes that cleave the protein, leading to the emergence of new N- and C-termini, which may act as degrons.

In DEGRONOPEDIA, we simulate the cleavage of a query protein based on user-defined cleavage sites, experimentally validated cleavage sites derived from the MEROPS database, and cleavage sites predicted for 35 different proteolytic enzymes using the Pyteomics module, which implements the cleavage prediction rules of the PeptideCutter Expasy web server. Each newly emerged N-/C-terminus is then screened for the presence of known degron motifs.

📝 Note 1: when defining your own cleavage site, e.g. as 80, the cleavage occurs after the given position, i.e. between residues 80 and 81.
📝 Note 2: degrons are searched for in an emerged peptide only if it is at least 50 amino acids long.
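The two notes can be combined into a small Python sketch of a user-defined cleavage. It is an illustration under assumptions: the function names are hypothetical, and each cleavage site is treated independently of the others, which may differ from how DEGRONOPEDIA handles multiple sites.

```python
def cleave_after(sequence, site):
    """Cut the sequence after the 1-based position `site`."""
    return sequence[:site], sequence[site:]


def fragments_to_screen(sequence, sites, min_length=50):
    """Return fragments long enough to be screened for degrons.

    Each entry is (site, exposed_terminus, fragment). The N-terminal
    fragment exposes a new C-terminus, and the C-terminal fragment
    exposes a new N-terminus.
    """
    selected = []
    for site in sorted(set(sites)):
        if not 1 <= site < len(sequence):
            continue  # a cut must leave residues on both sides
        n_fragment, c_fragment = cleave_after(sequence, site)
        for terminus, fragment in (("new C-terminus", n_fragment),
                                   ("new N-terminus", c_fragment)):
            if len(fragment) >= min_length:
                selected.append((site, terminus, fragment))
    return selected


# Cutting a 120-residue toy protein after position 80 yields an
# 80-residue and a 40-residue fragment; only the former is screened.
toy = "A" * 80 + "G" * 40
for site, terminus, fragment in fragments_to_screen(toy, [80]):
    print(site, terminus, len(fragment))
```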


1.9.1. Degrons upon user-defined cleavage sites
The first column contains information about the user-defined cleavage site. All other columns are analogous to those described in General degron information panel.

1.9.2. Degrons upon experimentally validated cleavage sites
The first two columns contain information about the proteolytic enzyme and its cleavage site, respectively, obtained from the MEROPS database. All other columns are analogous to those described in General degron information panel.

1.9.3. Degrons upon predicted cleavage sites
The first two columns contain information about the proteolytic enzyme and its cleavage site, respectively, predicted using the Pyteomics module, which implements the cleavage prediction rules of the PeptideCutter Expasy web server. All other columns are analogous to those described in General degron information panel.

1.10. Interacting E3s panel


Degrons act as the binding site for E3 ubiquitin ligases. Therefore, we report E3s known to interact with the query protein based on interactome data from the BioGRID, IntAct, and UbiNet 2.0 databases as well as from the literature (Oh et al., 2017; Weaver et al., 2017; Koren et al., 2018).

  1. UniProt ID of the interacting E3
  2. Gene symbol of the interacting E3
  3. Taxon from which the E3 originates
  4. References

2. Query by sequence


This query type yields the least output information compared to a query by UniProt ID or structure, since neither a structure nor data on PTMs, mutations, or experimental proteolytic sites are available.

However, users may also mark the checkbox to predict the stability of the N-/C-termini of the query protein using our machine learning classifiers (see Predicting N-/C-termini stability).

2.1. Overview panel



  1. Basic information about the protein from the UniProt database
  2. Gravy hydrophobicity index calculated for N-terminus (first 15 aa) and C-terminus (last 15 aa). See What is Gravy hydrophobicity index?
  3. Summary of the number of found degron motifs in the query protein, and in proteolysis simulations (based on user-defined sites (if submitted) and cleavage sites’ predictions)
  4. Link to download results as xlsx file


2.2. Visualization panel



Basic visualization of the degron motifs location in the sequence (if degron motifs overlap, as in the above example, more than one lane with visualized degrons is present) and lysines' positions.

It is possible to zoom the sequence (please see the top right button for help) and download it (top left button). The bottom axis represents sequence indices.

The FeatureViewer tool was used to build the visualization.

2.3. N-/C-terminus stability predictions



Visualization of the predicted PSI values in the context of the experimental data distribution. For N-termini, experimental PSI was measured in the range 1-6, while for C-termini in the range 1-4.

PSI values for N-terminus provide information on the predicted stability of the first 23 residues/24 residues of the protein depending on whether the initiator methionine was cleaved or not, respectively. Both such situations are simulated if the input sequence starts with methionine; if not, only one PSI value is predicted (for the case with initiator methionine absent, as we do not simulate adding methionine, only its cleavage). More about co-translational methionine cleavage, when it occurs and the associated Ac/N-degron pathway can be found e.g. in this review.
PSI value for C-terminus provides information on predicted stability of the last 23 residues.

The higher the PSI value, the more stable the terminus is.
When interpreting the obtained PSI predictions, please refer to the provided distributions of the experimental PSI values for each terminus.

📝 Note: this panel appears if "Predict Protein Stability Index (PSI) for N-/C-termini" box was checked upon query.
📝 Note 2: We recommend running the N-/C-termini stability predictions only on proteins from higher mammals, as our ML models were trained on stability datasets of human proteins.
📝 Note 3: For the N-terminus, regardless of whether the initiator methionine is considered cleaved or not, the descriptors for machine learning are calculated excluding the starting methionine, if it is present in the sequence. If the sequence does not start with methionine, only one PSI value is predicted, for the case where the methionine was cleaved.
💡 Machine Learning N-/C-termini stability predictions
The machine learning models were trained with the CatBoost regressor on the Protein Stability Index (PSI) values measured for 24-mers and 23-mers covering the N- and C-termini, respectively, of the nearly complete human proteome (Koren et al., 2018; Timms et al., 2019).
The performance of the final models was evaluated on the test set using the coefficient of determination (R²). It reached 0.796 and 0.812 for the N-terminus with the initiator methionine cleaved and not cleaved, respectively, and 0.815 for the C-terminus (the highest possible value of R² is 1).
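For readers unfamiliar with the R² coefficient, the sketch below shows the standard definition, R² = 1 - SS_res / SS_tot. It is a generic illustration with made-up numbers, not the evaluation code used for the models; the function name `r_squared` is an assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up PSI-like values, for illustration only
measured = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(measured, measured)           # predictions match the measurements exactly
baseline = r_squared(measured, [2.5, 2.5, 2.5, 2.5])  # always predicting the mean
```

Perfect predictions give R² = 1, while always predicting the mean of the measured values gives R² = 0, so values around 0.8 indicate that most of the variance in the measured PSI is captured.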

See our publication for more details.

2.4. General degron information panel


Contains the same information as the General degron information panel described for Query by UniProt ID.

2.5. Secondary degrons panel


See Secondary degrons

Contains columns 1-7 as in Secondary and tertiary degrons in sequence described for Query by UniProt ID.
📝 Note: since no structural data is available, and hence there is no information on residues' disorder, no tertiary degrons are reported.

2.6. Degrons emerging upon proteolysis panel


The rationale behind this analysis is described in Degrons emerging upon proteolysis.

See Degrons upon predicted cleavage for output description.
📝 Note: since no experimental data are available when querying by sequence, proteolysis is simulated upon user-defined (if submitted) and predicted cleavage sites.
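The idea of simulating proteolysis at given cleavage sites can be sketched as follows. This is a simplified illustration, not the server's implementation. It assumes that cleavage positions are 1-based and that the cut occurs after the given residue. The default pattern `[RK]` is a hypothetical placeholder and does not come from the DEGRONOPEDIA motif set.

```python
import re


def cleave(sequence, cleavage_sites):
    """Split a sequence after each 1-based cleavage position.

    Returns a list of (fragment_start_position, fragment_sequence) tuples.
    """
    cuts = sorted({p for p in cleavage_sites if 0 < p < len(sequence)})
    bounds = [0] + cuts + [len(sequence)]
    return [(start + 1, sequence[start:end]) for start, end in zip(bounds, bounds[1:])]


def scan_new_n_termini(sequence, cleavage_sites, pattern=r"[RK]"):
    """Report fragments whose newly exposed N-terminus matches the pattern."""
    hits = []
    # The first fragment keeps the original N-terminus, so it is skipped
    for start, fragment in cleave(sequence, cleavage_sites)[1:]:
        if re.match(pattern, fragment):  # re.match anchors at the fragment start
            hits.append((start, fragment))
    return hits


# Hypothetical sequence and user-defined cleavage sites
fragments = cleave("MAAARGGGKTTT", [4, 8])
hits = scan_new_n_termini("MAAARGGGKTTT", [4, 8])
```

Here both new fragments start with a residue matching the placeholder pattern, so both are reported together with their start positions in the original sequence.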

3. Query by structure


Provides a moderate amount of information, including the tripartite degron model analysis. The output is not as complete as for a query by UniProt ID, since no experimental data such as PTMs or mutations are available.

As when querying by sequence, users may also mark the checkbox to predict the stability of the N-/C-termini of the query protein using our machine learning models (see Predicting N-/C-termini stability).

3.1. Overview panel


See the Overview panel described for Query by sequence.

3.2. Visualization panel



Visualization of the query protein overlaid with the found degron motifs, positions of lysines, IDRs (see How are IDRs defined?), coils (see How is secondary structure calculated?), and buried residues (see How are buried residues defined?). If degron motifs overlap, as in the example image, they are drawn in more than one lane.

It is possible to zoom the sequence (please see the top right button for help) and download it (top left button). The bottom axis represents sequence indices.

The FeatureViewer tool was used to build the visualization.

3.3. N-/C-terminus stability predictions


Visualization of the predicted PSI values as in N-/C-terminus stability predictions described for Query by sequence.
📝 Note: this panel appears if "Predict Protein Stability Index (PSI) for N-/C-termini" box was checked upon query.

3.4. General degron information panel


Contains the same information as the General degron information panel described for Query by UniProt ID.

3.5. Structural degron information panel


Contains the same information as the Structural degron information panel described for Query by UniProt ID.

3.6. Secondary and tertiary degrons panel


See Secondary degrons and Tertiary degrons.

3.6.1. Secondary and tertiary degrons in sequence

Contains columns 1-9 and 13-14 as in Secondary and tertiary degrons in sequence described for Query by UniProt ID.
📝 Note: since no experimental data are available when querying by structure, neither PTM nor mutation data are reported.

3.6.2. Secondary and tertiary degrons in structure

Contains columns 1-9 and 13-14 as in Secondary and tertiary degrons in structure described for Query by UniProt ID.
📝 Note: since no experimental data are available when querying by structure, neither PTM nor mutation data are reported.

3.7. Degrons emerging upon proteolysis panel


The rationale behind this analysis is described in Degrons emerging upon proteolysis.

See Degrons upon predicted cleavage for output description.
📝 Note: since no experimental data are available when querying by structure, proteolysis is simulated upon user-defined (if submitted) and predicted cleavage sites.

Download


Regardless of the query type, it is possible to download an xlsx file containing all the data, divided into several sheets, as reported by the web server.
📝 Note: an xlsx file can be opened e.g. in Excel.