Welcome to the Tutorial!



DEGRONOPEDIA is an online resource for inspecting and visualizing known degron motifs in the proteomes of selected model organisms, as well as in a user-submitted sequence or structure.


Table of Contents


QuickTour
Introduction
Inputs
Types of query
Other input data
Customizable settings
1. Primary degron-related
2. Secondary degron-related
3. Tertiary degron-related
4. Structure-related
5. Multiple Sequence Alignment (MSA)-related
Outputs
Overview
Download
Implementation details
Screening for degron motifs
What is a regular expression?
Gravy hydrophobicity index
Protein Stability Index (PSI)
N-/C-termini stability data
Machine Learning
Standalone software
Degron conservation scores
Structural features
How is disorder predicted?
How are intrinsically disordered regions defined?
How is secondary structure calculated?
How are buried residues defined?
Post-translational modifications
Mutations
Implementation of the tripartite degron model
Tripartite degron model
What are degron flanking regions?
How are secondary and tertiary degrons found?
Proteolysis simulations
E3 interactomes
References

QuickTour


DEGRONOPEDIA provides the following information:


See DEGRONOPEDIA in action 🎬




Introduction


The maintenance of proteostasis requires the degradation of damaged or unwanted proteins and plays a key role in cell function, the growth of organisms and, ultimately, their viability. The ubiquitin-proteasome system (UPS) manages protein degradation through a process known as ubiquitination, in which the small protein ubiquitin is attached to its target. Ubiquitination is mediated by an enzymatic cascade involving ubiquitin-activating enzymes (E1), ubiquitin-conjugating enzymes (E2) and ubiquitin ligases (E3). The proteasome complex recognizes ubiquitinated proteins and, through proteolysis, degrades them into short peptides that can be further processed.

A degron is a short linear motif on a protein of interest (POI) that is recognized by E3 ubiquitin ligases. The N- and C-termini of a protein may act as degron sites, but internal degrons, often located within intrinsically disordered regions (IDRs), are also possible.
Figure 1. Scheme of a degron site.
E2 - ubiquitin-conjugating enzyme, E3 - ubiquitin ligase, Ub - ubiquitin, POI - protein of interest.

Inputs


Types of query


There are four allowed query types:

  1. UniProt ID - UniProt ID of a protein (min. 50 amino acids long) from the canonical UniProt proteome of one of the 11 selected model organisms: H. sapiens, M. musculus, R. norvegicus, D. rerio, D. melanogaster, C. elegans, S. cerevisiae, S. pombe, A. thaliana, O. sativa, Z. mays
  2. Sequence - protein sequence in FASTA format, between 50 and 40,000 amino acids long, containing only the 20 canonical amino acids
  3. Structure - protein monomer structure in the PDB format
    • with only one model and one chain
    • with continuous numbering starting from 1 (to avoid inconsistencies when overlaying data on the sequence; you can easily renumber your structure here)
    • between 50 and 40,000 amino acids long
    • containing the 20 canonical amino acids only
    • not exceeding 5 MB size
  4. Structure + UniProt ID - protein monomer structure in the PDB format meeting the same criteria as above, submitted together with a UniProt ID. If the UniProt ID matches the structure and is present in our database, post-translational modifications, mutations and all other data related to the query are displayed. This mimics the "Query by UniProt ID", but calculations are performed on the structure submitted by the user
📝 Note: Regardless of the query type, it is possible to submit only one protein at a time.


Other input data


Other data possible to pass:


Customizable settings


Several parameters can be customized that affect the calculated results. See also the concept of a tripartite degron model.

1. Primary degron-related


1.1. Degron flanking region in sequence
Definition: the maximum sequence distance to regions upstream and downstream of the degron motif to be considered as flanking
Applies: to all query types
Unit: aa
Default value: 20
Allowed values: 5-40
Figure 2. Scheme of the degron flanking region in sequence (defined as 20 aa).

1.2. Degron flanking region in structure
Definition: the maximum structural distance to residues around the degron motif to be considered as flanking (note that such residues are not necessarily close in sequence to the degron motif)
Applies: to query by UniProt ID or structure
Unit: Å
Default value: 20
Allowed values: 5-40
Figure 3. Scheme of the degron flanking region in structure (defined as 20 Å).
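
The idea of a structural flanking region can be illustrated with a minimal Python sketch. The tutorial does not state which atoms are used to measure the distance, so the sketch assumes Cα-Cα distances; the function name `structural_flank` and the input dictionary are hypothetical and not part of DEGRONOPEDIA.

```python
import math

def structural_flank(ca_coords, motif_positions, cutoff=20.0):
    """Return residues outside the motif that lie within `cutoff` angstroms of it.

    ca_coords: dict mapping residue number -> (x, y, z) of its C-alpha atom.
    Assumption: distances are measured between C-alpha atoms.
    """
    motif = set(motif_positions)
    flank = []
    for res, xyz in ca_coords.items():
        if res in motif:
            continue
        # A residue is flanking if it is close to at least one motif residue.
        if any(math.dist(xyz, ca_coords[m]) <= cutoff for m in motif):
            flank.append(res)
    return sorted(flank)

# Toy coordinates: residues 1-20 on a straight line, 3.8 angstroms apart,
# plus residue 50, which is far in sequence but close in space.
toy = {i: (3.8 * (i - 1), 0.0, 0.0) for i in range(1, 21)}
toy[50] = (0.0, 5.0, 0.0)
flanking = structural_flank(toy, [1, 2], cutoff=10.0)
```

In this toy case residue 50 is reported as flanking although it is distant in sequence, which is the situation the definition above points out.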

1.3. Region length to calculate degron disorder
Definition: the maximum sequence distance to regions upstream and downstream of the degron motif to be included in the degron mean disorder score
Applies: to all query types
Unit: aa
Default value: 10
Allowed values: 1-20
Figure 4. Scheme of the region to calculate degron disorder (defined as 10 aa).
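
As a rough illustration, the degron mean disorder score could be computed as in the sketch below. It assumes that the mean covers the motif itself plus up to the given number of residues on each side, clipped at the sequence ends; the tutorial does not spell out these details, and `degron_mean_disorder` is a hypothetical name.

```python
def degron_mean_disorder(scores, start, end, region=10):
    """Mean per-residue disorder score around a degron motif.

    scores: per-residue disorder scores (index 0 = residue 1).
    start, end: 1-based inclusive motif positions.
    Assumption: the window is the motif plus `region` residues on each
    side, truncated where it would run past either terminus.
    """
    lo = max(1, start - region)
    hi = min(len(scores), end + region)
    window = scores[lo - 1:hi]
    return sum(window) / len(window)

# Toy profile: a fully disordered 5-residue motif in an ordered context.
toy_scores = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
wide = degron_mean_disorder(toy_scores, 11, 15, region=10)
narrow = degron_mean_disorder(toy_scores, 11, 15, region=1)
```

The example shows why the parameter matters: the same motif gets a much lower mean score when a wide, ordered neighborhood is included.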

2. Secondary degron-related


2.1. Region length to calculate secondary degron (K/C/T/S) disorder
Definition: the maximum sequence distance to regions upstream and downstream of the secondary degron (K/C/T/S) to be included in the secondary degron mean disorder score
Applies: to all query types
Unit: aa
Default value: 3
Allowed values: 1-15
Figure 5. Scheme of the region to calculate secondary degron disorder (defined as 3 aa).

3. Tertiary degron-related


3.1. Minimum IDR distance from the secondary degron (K/C/T/S)
Definition: the minimum sequence distance from the secondary degron (K/C/T/S) to the continuous intrinsically disordered region (IDR) of a defined length (see Minimum continuous IDR length) to consider it as a tertiary degron
Applies: to all query types
Unit: aa
Default value: 10
Allowed values: 5-40
Figure 6. Scheme of different IDRs and their relation to the secondary degron
(with both Minimum IDR distance from the secondary degron (K/C/T/S) and Minimum continuous IDR length defined as 10 aa).

4. Structure-related


4.1. Minimum continuous IDR length
Definition: the minimum number of subsequent (in sequence) disordered residues to be considered as an intrinsically disordered region (IDR)
Applies: to all query types
Unit: aa
Default value: 10
Allowed values: 5-40
Example: when defined as 10 aa, at least 10 disordered residues must appear consecutively in the sequence to be recognized as an IDR
See also: How is disorder predicted?

4.2. pLDDT/LDDT disorder threshold
Definition: the threshold below which a residue is considered disordered based on its pLDDT/LDDT (predicted Local Distance Difference Test/Local Distance Difference Test) confidence score. The structure whose disorder is to be predicted from the pLDDT/LDDT scores must be either an AlphaFold2 or a RoseTTAFold model.
Applies: to query by UniProt ID or structure
Unit: %; pLDDT scores (present in AlphaFold2 models) range from 0 to 100 and LDDT scores (present in RoseTTAFold models) range from 0 to 1, so this parameter is expressed in % to cover both cases
Default value: 70
Allowed values: 40-90
Example: when defined as 70%, all residues with a mean pLDDT/LDDT score below 70/0.7 (AlphaFold2 model/RoseTTAFold model) are considered disordered (the score should be the same for each atom of a residue; nevertheless, the mean over all atoms of the residue is always calculated)
See also: How is disorder predicted?

4.3. IUPred3 disorder threshold
Definition: the threshold above which the residue is considered as disordered based on predictions obtained from the IUPred3 software (sequence-based predictions)
Applies: to all query types
Unit: %
Default value: 50
Allowed values: 40-90
Example: when defined as 50%, all residues with the score predicted by IUPred3 above this value are considered as disordered
See also: How is disorder predicted?

4.4. Buried residue threshold
Definition: the threshold below which the residue is considered as buried based on its Relative Solvent Accessibility (RSA) calculated with the DSSP software and normalized using the Sander method
Applies: to query by UniProt ID or structure
Unit: %
Default value: 20
Allowed values: 5-60
Example: when defined as 20%, all residues with the RSA value below 0.2 are considered as buried
See also: How are buried residues defined?

5. Multiple Sequence Alignment (MSA)-related


5.1. Maximum distance from the degron motif in query to consider the same degron motif from the ortholog as conserved
Definition: the maximum distance to regions upstream and downstream of the degron motif (its ends) in the query to its orthologs in the Multiple Sequence Alignment (MSA) to consider it as evolutionarily conserved
Applies: to all query types
Unit: aa
Default value: 5
Allowed values: 5-40
See also: Degron conservation scores
Figure 7. Scheme of the occurrence of the degron motif in the MSA and its recognition as conserved with respect to the distance from the degron motif in the query (defined as 5 aa).
Gaps in the alignment within the degron motifs are marked with horizontal dashes.

Outputs


Overview


Depending on the input type, different granularity of degron-related output information is provided.

  1. Query by UniProt ID - provides the most comprehensive information about degron motifs in a query, including full tripartite degron model, as additional information about evolutionary conservation, structural context, post-translational modifications or mutations, is superimposed on the degron data
  2. Sequence - gives the least amount of output information compared to query by UniProt ID or structure, as no structure or post-translational modifications/mutations/experimental proteolytic site data are available (although information on disordered regions is present as predicted based on the query sequence using the IUPred3 software)
  3. Structure - provides a moderate amount of information, including a full tripartite degron model, but not as complete as the query by UniProt ID, because experimental data such as post-translational modifications or mutations are unavailable
  4. Structure + UniProt ID - provides the same information as the "Query by UniProt ID"

Figure 8. The amount of output information depends on the type of input.

Figure 9. Comparison of the result information obtained for different query types in DEGRONOPEDIA.

Download


Regardless of the query type, it is possible to download an xlsx file with all the data, divided into separate sheets. Look for the icon at the top of each result page.

Implementation details


Screening for degron motifs


Each query sequence is screened for the presence of known degron motifs, collected from the literature, using regular expressions.

What is a regular expression?

A regular expression is a search pattern used to check whether matching text is present in a sequence; see the examples below.

[AVP]x[ST][ST][ST]
means that there are 5 characters in the pattern
  1. first character: A or V or P
  2. second character: any (x indicates any character)
  3. third character: S or T
  4. fourth character: S or T
  5. fifth character: S or T
F[^P]{3}W[^P]{2,3}[VIL]
means that there are eight or nine characters in the pattern
📝 Note: {} brackets indicate the number of occurrences.
  1. first character: F
  2. second character: any except P
  3. third character: any except P
  4. fourth character: any except P
  5. fifth character: W
  6. sixth character: any except P
  7. seventh character: any except P
  8. eighth character: either a third "any except P" character, or V or I or L, in which case it is the final character
  9. ninth character (only if the eighth character was a third "any except P"): V or I or L
FSDLWKLL
means that the motif has to match this sequence exactly

^M{0,1}([ED])x
means that there are two or three characters in the pattern
📝 Note: {} brackets indicate the number of occurrences.
📝 Note 2: ^ indicates that the pattern has to match the very beginning of the sequence.
  1. first character: M, which occurs or does not occur
  2. second character: E or D
  3. third character: any
KxxR$
means that there are four characters in the pattern
📝 Note: $ indicates that the pattern has to match the very end of the sequence.
  1. first character: K
  2. second character: any
  3. third character: any
  4. fourth character: R
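
The example patterns above can be tried out with Python's standard `re` module. The sketch below is only an illustration: it assumes that the sole difference between the notation used here and Python's regex syntax is the lowercase `x` (any residue), which Python writes as `.`. The helper names and the test sequence are made up for this example.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate the tutorial's notation into Python regex syntax.

    Assumption: a lowercase 'x' is the only non-standard symbol and
    means 'any residue', which Python spells as '.'.
    """
    return motif.replace("x", ".")

def find_motifs(sequence: str, motif: str):
    """Return (start, end, matched_text) tuples, 1-based and inclusive.

    Note: finditer reports non-overlapping matches only.
    """
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]

seq = "MEAKFSDLWKLLPASTTGKAAR"  # made-up sequence, for illustration only

internal_hit = find_motifs(seq, "[AVP]x[ST][ST][ST]")
variable_hit = find_motifs(seq, "F[^P]{3}W[^P]{2,3}[VIL]")
n_terminal_hit = find_motifs(seq, "^M{0,1}([ED])x")
c_terminal_hit = find_motifs(seq, "KxxR$")
```

For the second pattern, the `{2,3}` part matches only two residues here, so the hit is eight characters long and coincides with the literal motif FSDLWKLL.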

Gravy hydrophobicity index


The Gravy (grand average of hydropathy) hydrophobicity index is calculated by summing the hydropathy values of all residues and dividing by the length of the sequence [3].
Higher values indicate a more hydrophobic sequence. In DEGRONOPEDIA, it is calculated for the N-terminus (first 15 aa) and the C-terminus (last 15 aa) of the input sequence.

Interpretation: hydrophobic regions often determine the specificity for recognition by chaperones and protein quality control E3s, but they are less likely to be recognized by cullin-RING E3 ligases [4-7].
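
The calculation described above can be sketched in a few lines of Python using the Kyte-Doolittle hydropathy scale [3]. The function names are illustrative, and the handling of sequences shorter than 15 residues is not covered here.

```python
# Kyte-Doolittle hydropathy values for the 20 canonical amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Sum of hydropathy values divided by the sequence length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def terminal_gravy(sequence: str, n: int = 15):
    """Gravy of the first and last n residues (n = 15 in this tutorial)."""
    return gravy(sequence[:n]), gravy(sequence[-n:])

# Toy sequence with a hydrophobic N-terminus and a charged C-terminus.
toy = "I" * 15 + "G" * 20 + "R" * 15
n_term, c_term = terminal_gravy(toy)
```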

Protein Stability Index (PSI)


N-/C-termini stability data


The reported Protein Stability Index (PSI) values come from two large-scale studies that used the Global Protein Stability technology to measure the stability of 23-residue-long peptides spanning the N- and C-termini of the human proteome [8, 9].

The PSI values for the N-terminus provide information about the experimental stability of the first 23 residues/24 residues of the protein, depending on whether PSI was measured for the cleaved initiator methionine or not, respectively (for more on the co-translational cleavage of methionine, when it occurs and the associated Ac/N-degron pathway, see, for example, this review). The PSI value for the C-terminus provides information about the experimental stability of the last 23 residues of the protein. Information on experimental PSI values is currently only available for human proteins.

Interpretation: the higher the PSI value, the more stable the terminus is. Please refer to the provided distributions of the experimental data for each terminus.

Figure 10. Visualization of experimental PSI values for N-/C-termini in the context of experimental data distribution.
For the N-termini, the experimental PSI was measured in ranges of 1-6, while for the C-termini in ranges of 1-4.

📝 Note: PSI is reported by the identity of the N-/C-terminus with the experimental data, not by the name of the protein, due to possible inconsistencies in nomenclature (e.g., human protein A has an experimental C-terminal PSI value, but we are querying human protein B, whose name is absent from the experimental dataset, but which has an identical C-terminal peptide to protein A, so we report an identical C-terminal PSI value for protein B to that of protein A).

Machine Learning


Machine Learning models for predicting Protein Stability Index (PSI) values for the N-/C-terminus of a query were developed based on experimental stability datasets for a 24/23-mer covering the N-/C-terminus of the human proteome (see N-/C-termini stability data) using the CatBoost regressor method. The performance of the final models was evaluated on the testing set using the R² coefficient, which reached 0.796/0.812 for the N-terminus with the initiator methionine cleaved/not cleaved, respectively, and 0.815 for the C-terminus (the highest possible value of the R² coefficient is 1). See our publication for more details.

❗ We recommend running N-/C-termini stability predictions only on proteins from higher mammals, as our models were trained on human protein stability datasets.

Interpretation: the higher the PSI value, the more stable the terminus is. Please refer to the provided distributions of the experimental data for each terminus.

Figure 11. Visualization of experimental PSI values for N-/C-termini in the context of experimental data distribution.
For the N-termini, the experimental PSI was measured in ranges of 1-6, while for the C-termini in ranges of 1-4.

📝 Note: If the initiator methionine is absent, only one PSI value is predicted for the case when it is cleaved.

Standalone software

Users interested in using our Machine Learning models to perform high-throughput predictions of protein N-/C-termini stability can use our standalone software available at github.com/filipsPL/degronopedia-ml-psi.

Degron conservation scores


When a Multiple Sequence Alignment (MSA) is available, four different degron conservation scores can be calculated (see scheme below), providing insight into the conservation of the degron motif among orthologs.

Figure 12. Scheme for calculating the different degron conservation scores.
Maximum distance from the degron motif in the query to its orthologs to consider them as conserved was defined as 5 aa.
Gaps in the alignment within the degron motifs are marked with horizontal dashes.

DEGRONOPEDIA integrates pre-calculated MSAs of orthologs, obtained from the eggNOG5 database, at various evolutionary distances, which are available when querying by UniProt ID or by structure + UniProt ID. Regardless of the query type, the user can also submit their custom MSA in FASTA format containing no more than 200 sequences to check the conservation of each degron motif found in the query.

The maximum distance to regions upstream and downstream of the ends of the degron motif in the query to its orthologs in the MSA to consider it as evolutionarily conserved is defined in the Maximum distance from the degron motif in query to consider the same degron motif from the ortholog as conserved parameter.
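
One possible reading of this conservation check is sketched below in Python. It is only an assumption about how the comparison could work: the distance is measured here in alignment columns, and a motif in an ortholog counts as conserved when both of its ends lie within the maximum distance of the corresponding ends of the motif in the query. The function names and the toy alignment are hypothetical.

```python
import re

def aligned_columns(aligned_seq: str):
    """Alignment column (0-based) of every residue in a gapped sequence."""
    return [col for col, ch in enumerate(aligned_seq) if ch != "-"]

def motif_conserved(query_aln, ortholog_aln, motif_regex, q_start, q_end, max_dist=5):
    """Check whether an ortholog carries the motif near the query's motif.

    q_start, q_end: 1-based inclusive motif positions in the ungapped query.
    Assumption: distance is counted in alignment columns, end to end.
    """
    q_cols = aligned_columns(query_aln)
    start_col, end_col = q_cols[q_start - 1], q_cols[q_end - 1]
    o_cols = aligned_columns(ortholog_aln)
    o_seq = ortholog_aln.replace("-", "")
    for m in re.finditer(motif_regex, o_seq):
        hit_start, hit_end = o_cols[m.start()], o_cols[m.end() - 1]
        if abs(hit_start - start_col) <= max_dist and abs(hit_end - end_col) <= max_dist:
            return True
    return False

# Toy alignment: the query motif PASTT occupies query positions 3-7.
query = "MKPASTTGLL--KAAR"
shifted = "MK--PASTTGLLKAAR"   # same motif, shifted by two columns
mutated = "MKPAGTTGLL--KAAR"   # motif destroyed by an S-to-G change
motif = "[AVP].[ST][ST][ST]"
```

With the default of 5, the shifted ortholog still counts as conserved, while a stricter setting of 1 would reject it.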

Table 1. Pre-calculated MSAs of orthologs at various evolutionary distances from the eggNOG5 database available for selected model organisms. NCBI taxonomy identifiers are given in parentheses.
Species | Orthologous Group 1 | Orthologous Group 2 | Orthologous Group 3 | Orthologous Group 4 | Orthologous Group 5 | Orthologous Group 6
H. sapiens | Hominidae (9604) | Euarchontoglires (314146) | Mammalia (40674) | Vertebrata (7742) | Opisthokonta (33154) | Eukaryota (2759)
R. norvegicus | Rodentia (9989) | Euarchontoglires (314146) | Mammalia (40674) | Vertebrata (7742) | Opisthokonta (33154) | Eukaryota (2759)
M. musculus | Rodentia (9989) | Euarchontoglires (314146) | Mammalia (40674) | Vertebrata (7742) | Opisthokonta (33154) | Eukaryota (2759)
D. rerio | Actinopterygii (7898) | - | - | Vertebrata (7742) | Opisthokonta (33154) | Eukaryota (2759)
D. melanogaster | Drosophilidae (7214) | Diptera (7147) | Insecta (50557) | - | Opisthokonta (33154) | Eukaryota (2759)
C. elegans | Rhabditida (6236) | - | - | - | Opisthokonta (33154) | Eukaryota (2759)
S. cerevisiae | Saccharomycetaceae (4893) | Saccharomycetes (4891) | Ascomycota (4890) | Fungi (4751) | Opisthokonta (33154) | Eukaryota (2759)
S. pombe | Taphrinomycotina (451866) | Saccharomycetes (4891) | Ascomycota (4890) | Fungi (4751) | Opisthokonta (33154) | Eukaryota (2759)
A. thaliana | Brassicales (3699) | - | Viridiplantae (33090) | - | - | Eukaryota (2759)
O. sativa | Poales (38820) | - | Viridiplantae (33090) | - | - | Eukaryota (2759)
Z. mays | Poales (38820) | - | Viridiplantae (33090) | - | - | Eukaryota (2759)


Structural features


How is disorder predicted?


There are two options to predict disordered regions in the query:
  1. based on the pLDDT/LDDT values of the model (applies to query by UniProt ID or structure)
  2. based on the sequence-based prediction from the IUPred3 software (applies to all query types)

Disorder predictions based on the pLDDT/LDDT values

❗ Requires calculations to be performed on an AlphaFold2/RoseTTAFold model: the B-factor column in the PDB file must contain valid pLDDT/LDDT scores.

The pLDDT (Predicted Local Distance Difference Test; ranges 0-100) or LDDT (Local Distance Difference Test; ranges 0-1) score estimates the accuracy of the modeled residues, and those with pLDDT values above 70 are generally expected to be well modeled, while pLDDT below this value correlates with disordered regions [10].

📝 Note: When uploading a RoseTTAFold model, we recommend using a model obtained from a local RoseTTAFold run, as its B-factor column contains the LDDT scores. Please do not directly upload a RoseTTAFold model obtained from the ROBETTA server, as its B-factor column holds the estimated RMSD error (although it is possible to convert these values to LDDT scores using, e.g., the PHENIX software).
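
To make the role of the B-factor column concrete, here is a minimal Python sketch that reads per-residue confidence from the fixed-width ATOM records of a PDB file and applies the threshold. The rule used to tell the two scales apart (all values at or below 1 are treated as LDDT) is an assumption of this sketch and not necessarily how DEGRONOPEDIA does it; all function names, including the `make_atom_line` demo helper, are hypothetical.

```python
from collections import defaultdict

def residue_confidence(pdb_text: str):
    """Mean B-factor per residue number, read from ATOM records.

    PDB columns are fixed-width: residue number in columns 23-26,
    B-factor (pLDDT/LDDT in predicted models) in columns 61-66.
    """
    values = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            values[int(line[22:26])].append(float(line[60:66]))
    return {res: sum(v) / len(v) for res, v in values.items()}

def disordered_residues(confidence, threshold_percent=70):
    """Residues whose confidence falls below the threshold (in %).

    Assumption: if no score exceeds 1, the scores are LDDT on a 0-1
    scale and are rescaled to percent before comparison.
    """
    scale = 100.0 if max(confidence.values()) <= 1.0 else 1.0
    return sorted(r for r, s in confidence.items() if s * scale < threshold_percent)

def make_atom_line(serial, atom, res_name, res_num, b_factor):
    """Hypothetical helper: write one fixed-width ATOM record for the demo."""
    return (f"ATOM  {serial:5d} {atom:<4s} {res_name:3s} A{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")

demo_pdb = "\n".join([
    make_atom_line(1, "N", "MET", 1, 45.0),
    make_atom_line(2, "CA", "MET", 1, 45.0),
    make_atom_line(3, "N", "LYS", 2, 92.5),
    make_atom_line(4, "CA", "LYS", 2, 92.5),
])
demo_confidence = residue_confidence(demo_pdb)
```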

Disorder predictions based on the IUPred3 software

❗ Applies to any query type, as IUPred3 predicts disorder based on the query sequence.

IUPred3 predicts a disorder score, ranging from 0 to 1, for each amino acid in the sequence. The default threshold above which the residue can be considered as disordered is 0.5, according to the authors of this tool, but it can be adjusted in the IUPred3 disorder threshold parameter. You can read more about IUPred3 here.

📝 Note: IUPred3 is run as a standalone tool using default settings (long disorder is predicted).

How are intrinsically disordered regions defined?


Intrinsically disordered regions (IDRs) are defined as continuous stretches of at least a minimum number of consecutive disordered residues (set in the Minimum continuous IDR length parameter). Whether a residue counts as disordered depends on the disorder prediction method chosen by the user and on its threshold, set in either the pLDDT/LDDT disorder threshold or the IUPred3 disorder threshold parameter.
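
The definition above amounts to finding runs of consecutive disordered residues that are long enough. A minimal Python sketch, with a hypothetical function name and toy scores, could look like this:

```python
def find_idrs(disordered_flags, min_length=10):
    """Return (start, end) pairs, 1-based inclusive, for every run of
    consecutive disordered residues that is at least min_length long."""
    idrs = []
    run_start = None
    for i, flag in enumerate(disordered_flags, start=1):
        if flag and run_start is None:
            run_start = i                      # a new run begins
        elif not flag and run_start is not None:
            if i - run_start >= min_length:    # the run ended at i - 1
                idrs.append((run_start, i - 1))
            run_start = None
    # A run may extend to the very last residue.
    if run_start is not None and len(disordered_flags) - run_start + 1 >= min_length:
        idrs.append((run_start, len(disordered_flags)))
    return idrs

# Toy IUPred3-like scores; a residue is disordered when its score is above 0.5.
toy_scores = [0.8] * 12 + [0.2] * 5 + [0.9] * 4 + [0.1] * 3 + [0.7] * 10
flags = [score > 0.5 for score in toy_scores]
idrs_default = find_idrs(flags, min_length=10)
idrs_short = find_idrs(flags, min_length=4)
```

With the default length of 10, the four-residue disordered stretch at positions 18-21 is not reported as an IDR; lowering the minimum to 4 includes it.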

How is secondary structure calculated?


The secondary structure is calculated (not predicted) based on the protein structure using the DSSP software.

Table 2. The meanings of secondary structure symbols.
Symbol | Secondary structure
H | Alpha helix (4-12)
B | Isolated beta-bridge residue
E | Strand
G | 3-10 helix
I | Pi helix
T | Turn
S | Bend
- | Coil

How are buried residues defined?


Relative solvent accessibility (RSA) of a protein residue is a measure of its solvent exposure. It is calculated using the DSSP software and normalized using the Sander method. A residue is considered buried if its RSA value is below the threshold defined in the Buried residue threshold parameter.

Interpretation: RSA values range from 0 to 1, where lower values indicate more buried residues.
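
The normalization can be illustrated with a short Python sketch. It assumes that the Sander method means dividing the absolute accessibility reported by DSSP by the per-residue maximum values of Sander and Rost (the table below is the one commonly distributed under that name, e.g., in Biopython) and capping the result at 1. The function names and the input format are hypothetical.

```python
# Maximum accessible surface area per residue (in square angstroms),
# assumed here to be the Sander and Rost reference values.
SANDER_MAX_ASA = {
    "A": 106.0, "R": 248.0, "N": 157.0, "D": 163.0, "C": 135.0,
    "Q": 198.0, "E": 194.0, "G": 84.0, "H": 184.0, "I": 169.0,
    "L": 164.0, "K": 205.0, "M": 188.0, "F": 197.0, "P": 136.0,
    "S": 130.0, "T": 142.0, "W": 227.0, "Y": 222.0, "V": 142.0,
}

def relative_accessibility(aa: str, asa: float) -> float:
    """Absolute accessibility divided by the residue maximum, capped at 1."""
    return min(asa / SANDER_MAX_ASA[aa], 1.0)

def buried_positions(residues, threshold_percent=20):
    """1-based positions whose RSA is below the threshold.

    residues: list of (one_letter_code, absolute_asa) tuples, e.g. as
    parsed from DSSP output.
    """
    cutoff = threshold_percent / 100.0
    return [i for i, (aa, asa) in enumerate(residues, start=1)
            if relative_accessibility(aa, asa) < cutoff]

# Toy values: RSA of roughly 0.1, 0.8, 1.0 and 0.1.
toy = [("A", 10.6), ("K", 164.0), ("G", 84.0), ("W", 22.7)]
buried = buried_positions(toy)
```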

Post-translational modifications


Post-translational modification (PTM) datasets were obtained from the following sources, which combine high- and low-throughput experimental data:

In total, DEGRONOPEDIA provides up to 32 different PTMs.

💭 What is the importance of a degron being a phosphodegron?
A phosphodegron contains one or more phosphorylated residues, which may modulate the degron's accessibility. See review on this topic.

Mutations


Missense mutations were obtained from the COSMIC database - the world's largest and most comprehensive resource of somatic mutations in human cancers.

📝 Note: Mutation data is only available when querying human proteins by UniProt ID (or when passing it along with the structure).

Implementation of the tripartite degron model


Tripartite degron model


Guharoy and colleagues [1] suggested a tripartite degron model where the primary degron is a short linear motif recognized by an E3 ligase, localized preferentially within an intrinsically disordered region (IDR) of the protein. The secondary degron refers to lysines to which ubiquitin may be attached, and the tertiary degron is an IDR close to the secondary degron, which acts as an unfolding seed initiating proteasome-dependent protein degradation.

The secondary and tertiary degrons are suggested to play subsidiary roles that affect ubiquitin signaling: the lack of a component of the tripartite degron model, e.g., an IDR near a ubiquitinated lysine, can result in non-proteolytic ubiquitination functions.
Figure 13. The tripartite degron model.
Note that in our implementation, the secondary degron may not only be lysine (K), as ubiquitination can also occur on cysteines (C), serines (S) or threonines (T) [2].

What are degron flanking regions?


We distinguish two degron flanking regions:

How are secondary and tertiary degrons found?


DEGRONOPEDIA reports not only all degron motifs present in the query protein, but also secondary and tertiary degrons according to the tripartite degron model.

Secondary degrons
We consider not only lysines (K) as potential secondary degrons but also cysteines (C), threonines (T) and serines (S), since ubiquitination may occur on these amino acids [2]. Secondary degrons (also referred to as K/C/T/S) are searched within the degron flanking regions in sequence and structure.

📝 Note: Secondary degrons within the degron flanking regions in structure are NOT searched when querying by sequence.

Figure 14. Scheme of the search locations for secondary degrons.
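
The sequence-based part of this search can be sketched in Python as follows. The sketch assumes that K/C/T/S residues inside the motif itself are not reported, which the tutorial does not state explicitly; the function name and the toy sequence are made up.

```python
SECONDARY_RESIDUES = set("KCTS")

def secondary_degrons_in_flank(sequence, start, end, flank=20):
    """List (position, residue) for K/C/T/S within the sequence flanks.

    start, end: 1-based inclusive positions of the primary degron motif.
    flank: value of the 'Degron flanking region in sequence' parameter.
    Assumption: residues inside the motif itself are skipped.
    """
    lo = max(1, start - flank)
    hi = min(len(sequence), end + flank)
    hits = []
    for pos in range(lo, hi + 1):
        if start <= pos <= end:
            continue
        if sequence[pos - 1] in SECONDARY_RESIDUES:
            hits.append((pos, sequence[pos - 1]))
    return hits

toy = "AKAAAPASTTGAAACAAAAK"  # made-up sequence; motif PASTT at 6-10
near = secondary_degrons_in_flank(toy, 6, 10, flank=5)
nearer = secondary_degrons_in_flank(toy, 6, 10, flank=3)
```

The lysine at position 20 lies outside the 5-residue flank and is therefore not reported in either call.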


Tertiary degrons
Tertiary degrons are searched within the distance from secondary degrons defined in the Minimum IDR distance from the secondary degron (K/C/T/S) parameter. Tertiary degrons close in sequence are reported for all query types (as intrinsically disordered regions (IDRs) can be predicted from the query sequence using IUPred3), but those close in structure are reported only for query by UniProt ID or structure.

📝 Note: Only the closest tertiary degron to each secondary degron is reported (both in terms of sequence and structure).

Figure 15. Scheme of reporting the tertiary degrons.
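
For the sequence-based case, picking the closest tertiary degron could be sketched in Python as below. The distance definition (zero when the secondary degron lies inside the IDR, otherwise the gap to the nearest IDR end) and the tie-breaking by IDR start are assumptions of this sketch; the function names are hypothetical.

```python
def sequence_distance(pos, idr):
    """Distance in residues from a position to an IDR given as (start, end)."""
    start, end = idr
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end

def closest_tertiary_degron(pos, idrs, max_distance=10):
    """Return the IDR closest to a secondary degron, or None.

    pos: 1-based position of the secondary degron (K/C/T/S).
    idrs: list of (start, end) pairs, 1-based inclusive.
    max_distance: value of the 'Minimum IDR distance from the secondary
    degron (K/C/T/S)' parameter.
    """
    candidates = [(sequence_distance(pos, idr), idr) for idr in idrs]
    candidates = [c for c in candidates if c[0] <= max_distance]
    # min() compares distance first, then IDR start as a tie-breaker.
    return min(candidates)[1] if candidates else None

toy_idrs = [(1, 12), (40, 60)]
```

For example, a lysine at position 20 is 8 residues away from the first IDR and is assigned to it, whereas a lysine at position 26 is 14 residues away from both IDRs and gets no tertiary degron.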


Proteolysis simulations


Protein turnover can be regulated by various proteolytic enzymes that cleave the protein, leading to the emergence of new N- and C-termini that can act as degrons [11].

DEGRONOPEDIA simulates the cleavage of a query based on a user-defined cleavage motif/site, experimentally validated cleavage sites derived from the MEROPS database (the largest resource of experimental proteolysis data) as well as from the literature, and predicted cleavage sites for 35 different proteolytic enzymes using the Pyteomics module, which implements the PeptideCutter Expasy web server cleavage prediction rules. Each newly emerged N-/C-terminus is then screened for the presence of degron motifs.

📝 Note 1: When defining your own cleavage site, e.g., as 80, the cleavage occurs after the given position (see picture below).
📝 Note 2: Degrons are searched in the emerged peptides provided that they are at least 50 amino acids long.

Figure 16. Scheme of the proteolysis simulation.
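
The two notes above can be illustrated with a minimal Python sketch for a single user-defined cleavage site. It does not reproduce the MEROPS or Pyteomics-based steps; the function names and the toy sequence are made up, and the two regular expressions are the terminal example patterns from this tutorial, written in Python syntax.

```python
import re

def cleave_after(sequence, site):
    """Split a sequence after the given 1-based position."""
    if not 1 <= site < len(sequence):
        raise ValueError("site must lie inside the sequence")
    return sequence[:site], sequence[site:]

def fragments_to_screen(sequence, site, min_length=50):
    """Keep only the emerged peptides that are long enough to be screened."""
    return [frag for frag in cleave_after(sequence, site) if len(frag) >= min_length]

# Toy 130-residue sequence, cleaved after position 80.
demo = "A" * 76 + "KAAR" + "EG" + "A" * 48
n_frag, c_frag = cleave_after(demo, 80)

# The N-terminal fragment gains a new C-terminus, the C-terminal
# fragment gains a new N-terminus; each is screened for terminal motifs.
new_c_degron = re.search(r"K..R$", n_frag)
new_n_degron = re.search(r"^M{0,1}([ED]).", c_frag)
```

Cleaving the same toy sequence after position 100 would leave a 30-residue C-terminal peptide, which is below the 50-residue minimum and would not be screened.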


E3 interactomes


Since degrons act as a binding site for various E3 ubiquitin ligases, we report the E3s known to interact with the query based on interactome data from the:

References


  1. Guharoy, M., Bhowmick, P., Sallam, M. & Tompa, P. Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system. Nat Commun 7, 10239 (2016).
  2. Squair, D. R. & Virdee, S. A new dawn beyond lysine ubiquitination. Nat Chem Biol 18, 802–811 (2022).
  3. Kyte, J. & Doolittle, R. F. A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105–132 (1982).
  4. Hickey, C. M., Breckel, C., Zhang, M., Theune, W. C. & Hochstrasser, M. Protein quality control degron-containing substrates are differentially targeted in the cytoplasm and nucleus by ubiquitin ligases. Genetics 217, iyaa031 (2021).
  5. Kats, I. et al. Mapping Degradation Signals and Pathways in a Eukaryotic N-terminome. Molecular Cell 70, 488-501.e5 (2018).
  6. Stefanovic-Barrett, S. et al. MARCH6 and TRC8 facilitate the quality control of cytosolic and tail-anchored proteins. EMBO Rep 19, (2018).
  7. Culver, J. A., Li, X., Jordan, M. & Mariappan, M. A second chance for protein targeting/folding: Ubiquitination and deubiquitination of nascent proteins. BioEssays 44, 2200014 (2022).
  8. Koren, I. et al. The Eukaryotic Proteome Is Shaped by E3 Ubiquitin Ligases Targeting C-Terminal Degrons. Cell 173, 1622-1635.e14 (2018).
  9. Timms, R. T. et al. A glycine-specific N-degron pathway mediates the quality control of protein N-myristoylation. Science 365, eaaw4912 (2019).
  10. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
  11. Varshavsky, A. N-degron and C-degron pathways of protein degradation. Proc. Natl. Acad. Sci. U.S.A. 116, 358–366 (2019).