Building ubiquitination machineries

Building ubiquitination machineries: E3 ligase multi-subunit assembly and substrate targeting by PROTACs and molecular glues

Sarath Ramachandran and Alessio Ciulli

E3 ubiquitin ligase machineries are emerging as attractive therapeutic targets because they confer specificity to substrate ubiquitination and can be hijacked for targeted protein degradation. In this review, we bring into focus our current structural understanding of E3 ligase complexes, in particular the multi-subunit cullin RING ligases, and their modulation by small-molecule glues and PROTAC degraders. We highlight recent advances in elucidating the modular assembly of E3 ligase machineries, their diverse substrate and degron recognition mechanisms, and how these structural features affect ligase function. We then outline the emergence of structures of E3 ligases bound to neo-substrates and degrader molecules, and highlight the importance of studying such ternary complexes for structure-based degrader design.

Introduction
The ubiquitin-proteasome system (UPS) regulates protein homeostasis and is garnering increasing attention as a therapeutic target owing to its role in several diseases, including cancer and neurodegeneration [1]. The post-translational addition of ubiquitin to substrate proteins is carried out sequentially by a cascade of three enzymes: an E1-activating enzyme, which activates ubiquitin (Ub) in an ATP-dependent manner; an E2-conjugating enzyme, to which the activated Ub is transferred via a trans-esterification reaction; and an E3 ubiquitin ligase, which catalyses the transfer of Ub from the E2 to a lysine residue on the substrate through an isopeptide bond [2,3], although esterification reactions, for example on threonine residues, have also been reported [4].
E3 ubiquitin ligases can be divided into three classes: homologous to E6-AP C-terminus (HECT), really interesting new gene (RING) and RING-between-RING (RBR) ligases [5]. HECT E3s accept ubiquitin from E2~Ub to form a covalent thioester intermediate before transferring it on to the substrate [6]. In contrast, RING E3s bring E2~Ub and substrate into close proximity to mediate a direct transfer of ubiquitin to the substrate. The RBRs combine features of both HECT and RING families, as the N-terminal RING domain first recruits E2~Ub conjugates and then transfers ubiquitin on to a HECT-type C-terminal catalytic cysteine residue before the final transfer on to the substrate [7]. The anaphase-promoting complex (APC/C) is a large (1.2 MDa) assembly of 11-13 proteins, including a cullin (Apc2) and a RING (Apc11) subunit, and regulates different stages of the cell cycle [8,9]. E3 ligases play a central role in imparting specificity to substrate recruitment. E3 ligase ubiquitination activity on native substrates is exquisitely controlled and regulated by the protein-protein interactions (PPIs) that dictate their structural assembly. Furthermore, small-molecule degraders such as molecular glues and proteolysis-targeting chimeras (PROTACs) mediate recruitment of non-native interacting proteins to E3 ligases, thus hijacking the intrinsic catalytic activity of the E3 towards neo-substrates for proteasomal degradation. Here we review recent advances in elucidating the structural basis of building and hijacking ubiquitination machineries, with a focus on cullin RING E3 ligase assembly, substrate recognition, and degrader-mediated substrate recruitment, which holds attractive therapeutic potential.

Structural assembly and activity of modular multi-subunit E3 ligases
Cullin RING E3 ubiquitin ligases (CRLs) represent the largest family of E3 ligases. They are modular in that they are composed of an interchangeable substrate receptor, adaptor subunit(s), and a RING-box domain subunit, assembled around a central cullin scaffold subunit. CRLs are classified based on the type of cullin subunit (Cul1, Cul2, Cul3, Cul4A, Cul4B, Cul5 and Cul7) [10]. Structures of fully assembled CRL complexes have highlighted a range of conformations and orientations attained by the different cullin subunits [11].
The crystal structure of the CRL2^VHL complex, composed of Cul2, RING-box protein 1 (RBX1), Elongin B, Elongin C, and von Hippel-Lindau protein (VHL), highlights an inherent interdomain bending in cullin scaffold proteins, which allows the cullin C-terminal globular domain and the N-terminal helical bundle domain to come closer in space than in previously reported CRL structures (Figure 1a) [12]. The structure also captures the RBX1 RING domain at an intermediate step along the trajectory between the inactive state and the state activated by post-translational modification with the ubiquitin-like protein NEDD8 (neural precursor cell expressed developmentally downregulated protein 8) [12].
A recent structure provides insights into the mechanism of NEDD8-mediated CRL activation. The structure comprises neddylated CRL1^β-TrCP, the ubiquitin-loaded E2 ubiquitin-conjugating enzyme UBE2D, and a phosphorylated peptide from IκBα (nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha) (Figure 1b) [13]. The full structure captures three distinct modules. First, NEDD8 is covalently linked to the winged helix-B (WHB) domain of the cullin to form a globular activation module. Second, the catalytic module consists of ubiquitin-bound UBE2D and the RING domain of RBX1. Third, a so-called 'substrate-scaffolding module', comprising the substrate receptor β-TrCP, the adaptor subunit S-phase kinase-associated protein 1 (SKP1), and Cul1, presents the β-TrCP-bound IκBα substrate towards the catalytic module. The mobile WHB domain of Cul1 permits the activation module to form multiple contacts with the 'backside' of UBE2D in the catalytic module and with Cul1 in the substrate-scaffolding module. These extensive interactions help to place the catalytic centre of UBE2D in proximity to the β-TrCP-bound substrate for subsequent ubiquitination.
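The modular architecture described above (a cullin scaffold, a RING-box subunit, adaptor subunit(s) and an interchangeable substrate receptor) can be pictured as a small data structure. The following Python sketch is purely illustrative: the class name, field names and the `with_receptor` helper are hypothetical, the shorthand CRL2^VHL stands for the superscripted receptor notation, and the receptor swap assumes that F-box receptors such as FBXL5 use the same SKP1 adaptor as β-TrCP.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CullinRingLigase:
    """Illustrative (hypothetical) model of the modular CRL composition."""
    cullin: str                # central scaffold subunit, e.g. "Cul2"
    ring_box: str              # RING-box subunit
    adaptors: Tuple[str, ...]  # adaptor subunit(s)
    receptor: str              # interchangeable substrate receptor

    @property
    def name(self) -> str:
        # Shorthand used in the text: CRL<cullin number>^<receptor>
        return f"CRL{self.cullin.replace('Cul', '', 1)}^{self.receptor}"

    def with_receptor(self, receptor: str) -> "CullinRingLigase":
        # Same scaffold, RING-box and adaptors; only the receptor changes.
        return replace(self, receptor=receptor)


# Compositions taken from the text.
crl2_vhl = CullinRingLigase("Cul2", "RBX1", ("Elongin B", "Elongin C"), "VHL")
crl1_btrcp = CullinRingLigase("Cul1", "RBX1", ("SKP1",), "β-TrCP")

# Assumption: FBXL5 is swapped in on the same Cul1-SKP1-RBX1 core.
crl1_fbxl5 = crl1_btrcp.with_receptor("FBXL5")

print(crl2_vhl.name)
print(crl1_fbxl5.name, crl1_fbxl5.adaptors)
```

The frozen dataclass mirrors the idea that swapping a receptor yields a different ligase rather than mutating an existing one; it says nothing about conformation, neddylation state or stoichiometry.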
Like ubiquitination, neddylation is a reversible process. Deneddylation is catalysed by the COP9 signalosome (CSN), an eight-subunit protein complex. A recent cryo-EM structure of the CSN tightly bound to neddylated CRL2^VHL adds to the structural details from previous structures of CSN in complex with CRL1, CRL3 and CRL4A (Figure 1c) [14,15,16]. The structure reaffirms the conserved activation mechanism of the deneddylation machinery, including conformational clamping of CRL2 by CSN2/CSN4, release of the catalytic CSN5/CSN6 heterodimer, and subsequent activation of the metalloprotease CSN5 [14]. More recently, inositol hexakisphosphate (IP6) has been characterized as a CSN cofactor that enhances the interaction between CSN2 and RBX1, mediating CSN sequestration of CRL4 from UBE2R to prevent CRL4 activation [17].
The eukaryotic N-end rule pathway was traditionally divided into the Arg/N-end rule pathway (recognising N-terminal basic, bulky, or arginylated Asp and Glu residues) and the Ac/N-end rule pathway (recognising N-terminal acetylated residues) [18]. Glucose-induced degradation subunit 4 (GID4) is the subunit of the GID ubiquitin ligase assembly that recognises substrates harbouring Pro/N-degrons, a recently identified third branch of N-end degrons [19]. GID is a multi-subunit E3 ligase from yeast that recognises the N-terminal proline of gluconeogenic enzymes and catalyses their ubiquitination (Figure 1d) [20]. Under carbon stress, the GID assembly adopts an anticipatory state, GID^Ant. On carbon recovery, glucose-induced expression of Gid4 converts GID^Ant into the active GID^SR4, which recognises the substrates isocitrate lyase (Icl1), fructose-1,6-bisphosphatase (Fbp1), and malate dehydrogenase (Mdh2) [21]. The subunits of the GID assembly can be grouped into a catalytic module (GID2 and GID9), a scaffold module (GID1, GID8 and GID5) and a substrate receptor module (GID4 or GID10). The eight β strands and four loops of the GID4 β-barrel form a narrow opening, with the N-terminal proline fitting snugly in the central cavity (Figure 2a). The selectivity for proline recognition is imparted by a tight network of hydrogen bonds and hydrophobic interactions [19]. GID10 is proposed to be a substrate receptor that is expressed only under osmotic stress, although its substrate remains to be identified.

Substrate recognition by E3 ligase substrate receptors
CRLs employ a substrate receptor module to provide specificity for substrate recognition [10,11,22]. This section focuses on recent advances in understanding the structural basis of the highly diverse substrate recognition by CRLs. The structure of the CRL1^FBXL5-iron regulatory protein 2 (IRP2) complex provides insight into the oxygen-sensing role of the [2Fe2S] cluster in binding to IRP2 (Figure 2b) [23]. The [2Fe2S] cluster acts as a cofactor by forming coordination bonds with the conserved cysteines of F-box and leucine-rich repeat protein 5 (FBXL5), presenting the 'interface loop' of the leucine-rich repeat (LRR) domain to domain IV of IRP2 [23].
The CRL2 subunit Kelch domain-containing protein 2 (KLHDC2) was recently found to recognise substrates via a novel C-end degron protein-degradation mechanism, named DesCEND (destruction via C-end degron) [24,25]. Crystal structures of KLHDC2 in complex with the C-terminal diglycine degrons of the early terminated selenoproteins SelK and SelS, and of an N-terminal proteolytic fragment of USP1, reveal a deep and basic pocket at the centre of the Kelch domain of KLHDC2 that recognises the substrate via a network of hydrogen bonding interactions with its terminal carboxyl group, achieving nanomolar binding affinities (Figure 2c) [24]. [...] (Figure 2d) [26,27]. The structures reveal a U-shaped turn conformation of bound substrates in the KLHL12 hydrophobic pocket. In contrast, the 'LPDLV' epitope of death-associated protein kinase 1 (DAPK1) binds Kelch-like protein 20 (KLHL20) as a loose helical turn (Figure 2e) [28]. The differential selectivity across the CRL3 Kelch domains could be attributed to the variable length of the loops at the top of the propeller and to differences in the patterns of hydrophobic and charged residues. Speckle-type POZ protein (SPOP) is an example of a CRL3 substrate receptor that uses a MATH domain to bind its substrates. The co-crystal structure of pancreas/duodenum homeobox protein 1 (Pdx1) bound to SPOP-MATH relaxes the consensus binding motif established for the previously characterized SPOP ligands Puc phosphatase and MacroH2A (Φ-π-S-S/T-S/T; Φ: nonpolar, π: polar) to Φ-π-S-π-π (Figure 2f) [29,30].
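Short linear degrons such as the C-terminal diglycine recognised by KLHDC2, or the SPOP consensus motif (nonpolar-polar-S-S/T-S/T) and its relaxed form (nonpolar-polar-S-polar-polar), lend themselves to simple pattern matching. The Python sketch below is an illustration only: the residue classes chosen for "nonpolar" and "polar" are assumptions (the text does not define them), the function names are hypothetical, and the peptides are invented rather than taken from any real protein.

```python
import re
from typing import List, Tuple

# Assumed residue classes (one-letter codes); not defined in the text.
NONPOLAR = "AVLIMFWPG"
POLAR = "STNQYCHDEKR"   # assumed to include charged residues

# Consensus for earlier SPOP ligands: nonpolar-polar-S-S/T-S/T
STRICT_SPOP = re.compile(f"[{NONPOLAR}][{POLAR}]S[ST][ST]")
# Relaxed consensus: nonpolar-polar-S-polar-polar
RELAXED_SPOP = re.compile(f"[{NONPOLAR}][{POLAR}]S[{POLAR}][{POLAR}]")


def find_spop_motifs(seq: str, relaxed: bool = True) -> List[Tuple[int, str]]:
    """Return (0-based start, matched peptide) for non-overlapping matches."""
    pattern = RELAXED_SPOP if relaxed else STRICT_SPOP
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def has_c_terminal_diglycine(seq: str) -> bool:
    """True if the sequence ends in Gly-Gly, as in a diglycine C-end degron."""
    return seq.upper().endswith("GG")


# Invented peptides, for illustration only.
print(find_spop_motifs("AVDSEQ", relaxed=False))
print(find_spop_motifs("AVDSEQ", relaxed=True))
print(has_c_terminal_diglycine("MKLLGG"))
```

Because S and T belong to the assumed polar class, every hit of the strict pattern is also a hit of the relaxed one, which is the sense in which the motif is "relaxed". A sequence match is at best a candidate; recognition in the structures depends on the pocket geometry and interactions described above.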
Substrate-bound structures of the CRL5 substrate recognition modules suppressor of cytokine signaling 1 (SOCS1), suppressor of cytokine signaling 2 (SOCS2) and ankyrin repeat and SOCS box protein 9 (ASB9) have recently been solved. SOCS1 and SOCS2 share a similar domain architecture comprising an N-terminal extended SH2 subdomain (ESS), a central Src-homology 2 (SH2) domain that recognises a phosphotyrosine (pY)-containing sequence, and a SOCS box that interacts with the adaptor ElonginB-ElonginC complex (EloBC). The ability of SOCS1 to recruit Cul5 and function as an E3 ligase is compromised because of alterations in the cullin-binding region of its SOCS box. An additional kinase inhibitory region (KIR) helps SOCS1 inhibit the catalytic activity of Janus kinases (JAK1 and JAK2) by blocking their substrate-binding groove (Figure 2g) [31]. Interactions between the JAK 'GQM' motif and the BC loop of the SOCS1 SH2 domain further augment binding affinity. SOCS2 uses its SH2 domain to recognise phosphodegrons from the erythropoietin receptor (EpoR) and growth hormone receptor (GHR) (Figure 2h) [32]. Unlike in SOCS3 and SOCS6, where the BG loop closes in over the substrate to form a hydrophobic channel, the loop in SOCS2 adopts an open conformation that accommodates a wider range of lower-affinity substrates. ASB9 belongs to the largest family of SOCS box-containing receptors, with ankyrin repeats serving as the recognition module [33]. A cryo-EM structure of ASB9 bound to the homodimer of its substrate, brain-type creatine kinase (CKB), reveals that ASB9 residues 25-34 form a helix-turn that inserts into a pocket formed by acidic D32 and basic residues (R132, N286, R292, R341) at the interface of the CKB homodimer (Figure 2i) [34].

Small-molecule glues of E3 ligase:neosubstrate interactions
Molecular glues mediate de novo PPIs between an E3 ligase and a neo-substrate protein, leading to polyubiquitination and subsequent degradation of that protein. A first notable example of an E3 ligase-directed molecular glue is the plant hormone auxin, which mediates CRL1^TIR1-dependent degradation of transcription repressors [35]. Prominent examples of non-natural molecular glues are thalidomide and its immunomodulatory drug (IMiD) analogues lenalidomide and pomalidomide. IMiDs bind to CRL4^CRBN and subsequently 'glue' the lymphoid transcription factors Ikaros and Aiolos as neo-substrates, leading to their proteasomal degradation [36-38]. Crystal structures of the DNA damage-binding protein 1 (DDB1)-cereblon (CRBN) complex bound to IMiDs and either casein kinase 1α (CK1α) or G1 to S phase transition 1 (GSPT1) as neo-substrates provided structural insights into the mechanism of IMiD-mediated modulation of CRBN substrate specificity [36,39]. Recent complex structures of CRBN and pomalidomide with the second zinc finger (ZF2) of Ikaros (IKZF1), ZF4 of zinc finger protein 692 (ZNF692), and ZF2 of spalt-like transcription factor 4 (SALL4) show that, in spite of minimal sequence conservation in the zinc finger degrons, pomalidomide can mediate a conserved binding mode [40,41]. An overlay of the complex structures highlights how a strictly conserved glycine of the ZF β-hairpin loop degron docks into a binding hotspot at the CRBN-pomalidomide interface (Figure 3a). These structural insights can now guide the rational design of higher-affinity CRBN binders, including molecular glues with enhanced potency and specificity for improved degradation of neo-substrate proteins [42].
Molecular glues have also been purposefully developed to enhance native E3 ligase-substrate PPIs that are weakened in disease states, for example as a result of mutations, thus rescuing impaired degradation of the substrate protein [43]. The phosphodegron (DpSGwXpS) of the oncogenic transcription factor β-catenin is recognised by CRL1^β-TrCP via phosphorylated Ser33 and Ser37, leading to efficient CRL1^β-TrCP-dependent ubiquitination and degradation of β-catenin. In many cancers, this PPI is significantly weakened as a result of mutations (for example, Ser-to-Ala) or decreased phosphorylation levels, suggesting a strategy for rescuing the PPI via a small-molecule glue approach. Focused screening for enhanced PPI, followed by structure-guided design, yielded the molecular glue NRX-2663, which enhanced the binding affinity of unphosphorylated Ser33/S37A β-catenin for β-TrCP by >10 000-fold. The ternary complex structure of NRX-2663 with a monophosphorylated pSer33 β-catenin peptide and β-TrCP/Skp1 reveals that a portion of NRX-2663 fills the space left by unphosphorylated Ser37, thus substituting for the missing phosphate group (Figure 3b) [43].
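To give a feel for the size of a >10 000-fold affinity enhancement, the fold change can be converted into a binding free-energy difference via ΔΔG = -RT ln(fold). The short Python sketch below does this arithmetic; it assumes that the fold enhancement can be read as a ratio of dissociation constants and that T = 298.15 K, neither of which is stated in the text, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Free-energy change (kcal/mol) for a given fold gain in affinity.

    Assumes fold = Kd(without glue) / Kd(with glue); a negative value
    means the complex is stabilised.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# 10 000-fold is the lower bound quoted in the text.
print(round(ddg_from_fold_change(1e4), 2))  # about -5.46 kcal/mol
```

Under these assumptions, a 10 000-fold gain corresponds to roughly 5.5 kcal/mol of additional stabilisation of the complex.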
Aryl-sulfonamide (e.g. indisulam) anticancer drugs were found to function as molecular glues to the CRL4 substrate receptor DDB1-associated and CUL4-associated factor 15 (DCAF15), leading to ubiquitination and proteasomal degradation of splicing factor RNA Binding Motif Protein 39 (RBM39), via a mechanism akin to that of IMiDs [44,45]. The structural basis of sulfonamide mode of action was recently elucidated in three independent structural-biophysical studies of sulfonamide-mediated complexes between DDB1-DCAF15 and RBM39 (Figure 3c) [46,47,48]. Indisulam and sulfonamide analogues occupy a shallow groove at the interface between the C-terminal and N-terminal domains of DCAF15, with the two sulfonyl oxygens forming hydrogen bonds with the backbone amide nitrogens of DCAF15 A234 and F235. In addition, the indole nitrogen and sulfonamide nitrogen form extensive water-mediated hydrogen bonds with the side-chain oxygens of RBM39 T262 and D264 (Figure 3c).
Unlike traditional molecular glues that bind to the substrate receptor subunit or domain of E3 ligases, a new class of glue-like compounds recruits E3 ligase machineries once bound to their target protein. The protein kinase inhibitor CR8 was shown to mediate binding of its target cyclin-dependent kinase 12 (CDK12) and the associated partner protein cyclin K to the CRL4 adaptor subunit DDB1. As a result, CDK12 acts as a neo substrate-recognition subunit (Figure 3d) [49]. CDK12 forms extensive interactions with the BPA, BPC and C-terminal domains of DDB1, occupying the same position in the assembly as that of a substrate-recognition subunit. Similar to the N-terminus of DCAF15, the C-terminal tail of CDK12 binds to the cleft between the BPA and BPC domains of DDB1. Cyclin K, which binds on the opposite side of CDK12, does not contact DDB1 and is presented as a neo-substrate, suitably positioned for ubiquitination and subsequent degradation (Figure 3d). More recently, Mayor-Ruiz et al. performed focused compound screening in wild-type versus isogenic hyponeddylated cells as an approach to enrich for hits that require functional ubiquitination machineries for their cellular activity [50]. Their screens identified small molecules that act as glues between CDKs and DDB1, despite being chemically distinct from CR8.

PROTACs: bifunctional small molecules bridging target proteins to E3 ligases
PROteolysis TArgeting Chimeras (PROTACs) are bifunctional degrader molecules made of an E3 ligase ligand and a target protein ligand, joined by a chemical linker [51,52]. PROTACs can bind to the E3 ligase or the target protein independently (1:1 complex), before inducing proximity between the two proteins in the form of a ternary complex (1:1:1 complex). Because of their chemical nature, PROTACs differ from molecular glues, which lack a linker and can bind to one but not the other of the two proteins. For these reasons, PROTACs were thought to work independently of PPIs between the ligase and the targeted protein. This notion has changed dramatically thanks to emerging structural and biophysical insights into PROTAC ternary complexes, which reveal that PROTACs can also 'glue' the E3 ligase and target protein into stable and cooperative ternary complexes.
Our group solved a first PROTAC ternary structure, composed of our previously discovered PROTAC MZ1, a degrader of the bromodomain and extraterminal domain (BET) protein Brd4, bound to VHL-ElonginC-ElonginB (VCB) and the second bromodomain of Brd4 (Brd4^BD2) (Figure 4a) [53]. The crystal structure revealed non-native neo-PPIs between VHL and Brd4^BD2, of both hydrophobic and hydrophilic nature, wrapping the PROTAC into a collapsed yet favourable conformation and resulting in the burial of extensive surface area in the system. The induced PPIs are isoform-specific and contribute to the formation of a highly cooperative (α ≈ 20), stable and long-lived (t1/2 > 2 min) ternary complex with Brd4^BD2, which drives more pronounced ubiquitination and faster degradation of Brd4 in cells [53,54]. With the ternary structure in hand, rational structure-based approaches can be undertaken to design improved degraders [55].

Figure 3. Ternary complexes of E3 ligases with molecular glue degraders. (a-d) The E3 ligase is displayed as a 40% transparent surface and cartoon representation; the substrate is displayed as cartoon and the molecular glue is shown as spheres. The zoomed section of the interaction interface shows the molecular glue in sticks, with key hydrogen bond and cation-π interactions shown as yellow and orange dashed lines, respectively. (a) Superposed crystal structures of CRBN (wheat)-DDB1 (cyan)-pomalidomide (purple-blue carbons) bound to IKZF1 ZF2 (yellow), ZNF692 ZF4 (magenta) and SALL4 ZF2 (green) (PDB: 6H0F [40], 6H0G [40], 6UML [41]). Structures are aligned on the CRBN-CTD. The zoomed section of the interaction interface shows the hairpin loop of SALL4 ZF2, pomalidomide and interacting residues from CRBN in sticks. (b) Ternary complex of β-TrCP (grey), monophosphorylated β-catenin degron peptide (magenta) and NRX-2663 (green carbons) (PDB: 6M92 [43]). A doubly phosphorylated β-catenin peptide (yellow) is superposed to highlight the void, created by the absence of a phosphate group on Ser37, that is occupied by NRX-2663 (PDB: 1P22 [62]). The zoomed section displays NRX-2663 and β-catenin in sticks. (c) Crystal structure of DCAF15 (green)-DDB1 (cyan)-DDA1 (yellow) in complex with indisulam (orange carbons) and RBM39 (magenta) (PDB: 6UD7 [46]). The zoomed section shows indisulam and interacting residues from RBM39 and DCAF15 in sticks. Water molecules mediating protein-ligand interactions are displayed as small red spheres. (d) Crystal structure of CDK12 (yellow)-cyclin K (magenta) with bound CR8 (orange carbons) and DDB1 (cyan) (PDB: 6TD3 [49]).
Another example of the successful application of rational structure-based design to PROTACs is the development of degraders of SWI/SNF Related, Matrix Associated, Actin Dependent Regulator Of Chromatin, Subfamily A (SMARCA2 and SMARCA4) [56]. An early, poor degrader of SMARCA2 (PROTAC 1) formed a cooperative ternary complex with VHL (α ≈ 10) despite its weak (μM) binding affinity for SMARCA2. This observation suggested that a high-resolution structure could allow rapid optimization. The ternary complex co-crystal structure revealed extensive de novo PPIs contributing favourable binding energy, as in the case of MZ1, albeit accommodated through an unfavourably collapsed linker. Armed with this information, the linker was rigidified by replacing one of its PEG units with a phenyl group, allowing formation of an additional π-stacking interaction with VHL Y98 (Figure 4b). Further optimization led to the potent SMARCA2/4 degrader ACBI1, which formed ternary complexes of improved cooperativity and stability.
Ligase-PROTAC-target complexes have also been solved for systems that do not appear to exhibit positive cooperativity in the ternary equilibria, suggesting avenues for potential optimization. The recent structure of a VCB:PROTAC6:B-cell lymphoma-extra-large (Bcl-xL) complex shows that the long PEG linker of PROTAC6 is forced to adopt an extended conformation before folding back onto itself via a compact turn (Figure 4c) [57]. The unfavourable linker conformational energy likely outweighs any favourable induced PPIs, resulting in the negative cooperativity observed with this system. Relaxing this conformation while maintaining the relative geometry of the ternary complex might lead to improved Bcl-xL PROTAC degraders. Nowak et al. structurally characterized non-cooperative ternary complexes formed by CRBN-recruiting JQ1-based PROTACs (dBETs) of varying linker lengths (10-34 atoms) and conjugation points (Figure 4d) [58]. Distinct arrangements of the CRBN-Brd4 interface with different PROTACs highlighted the plasticity of the interaction. These structural studies suggest that ternary systems of suboptimal energy and stability may still be productive for targeted protein degradation, provided the PROTAC is built from high-affinity protein-binding ligands.
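The terms positive, negative and non-cooperative used above refer to the cooperativity factor (written α here). A minimal Python sketch follows, assuming the common definition α = Kd(binary) / Kd(ternary) and first-order dissociation for the half-life (t1/2 = ln 2 / k_off). The function names, the classification tolerance and all numerical inputs are hypothetical, chosen only so that the example yields an α of 20 and a 2 min half-life; they are not measured values from the cited studies.

```python
import math


def cooperativity(kd_binary: float, kd_ternary: float) -> float:
    """alpha = Kd(binary) / Kd(ternary); both Kd values in the same units."""
    if kd_binary <= 0 or kd_ternary <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_binary / kd_ternary


def classify_cooperativity(alpha: float, tolerance: float = 0.1) -> str:
    """Label alpha relative to 1; the tolerance band is an arbitrary choice."""
    if alpha > 1 + tolerance:
        return "positive"
    if alpha < 1 - tolerance:
        return "negative"
    return "non-cooperative"


def dissociation_half_life(k_off: float) -> float:
    """Half-life (s) of a complex dissociating with rate constant k_off (1/s)."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


# Hypothetical inputs: PROTAC binds the ligase alone with Kd = 60 nM, and
# binds the ligase with Kd = 3 nM when pre-bound to the target protein.
alpha = cooperativity(60e-9, 3e-9)
print(round(alpha, 1), classify_cooperativity(alpha))

# A half-life of 2 min corresponds to this off-rate (1/s):
print(math.log(2) / 120)
```

With this definition, α > 1 means that the PROTAC binds one protein more tightly when the other is already bound, which is the signature of favourable induced PPIs; α < 1 indicates a penalty, such as the strained linker described for the Bcl-xL system.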

Conclusions
We have reviewed recent developments in the structural understanding of the assembly, function and (neo-)substrate recognition of E3 ligases. The existence of over 600 E3 ligases in mammalian cells underscores their importance in fine-tuning substrate specificity as a regulatory mechanism of protein homeostasis. E3 ligases are emerging as attractive drug targets in their own right because of their implication and dysregulation in several diseases. Therapeutic exploitation of E3 ligases with small molecules requires a structural and mechanistic understanding of the interplay of protein-protein interactions between their component subunits and how they impart biological function.
For drug development, knowledge of substrate-bound structures of E3 ligases can guide the development of small-molecule inhibitors. The advent of protein degraders that glue to E3 ligases and hijack E3 catalytic activity to effect targeted degradation of intracellular disease-driving proteins is motivating increased efforts focused on this protein family. Recent years have seen the emergence of structures of E3 ligases with molecular glue or PROTAC degraders and neo-substrates bound. These structures highlight the growing impact of structural and biophysical understanding of E3 ligase ternary complexes on degrader drug design. These founding advances are motivating current efforts to discover small molecules for more E3 ubiquitin ligases. This has the potential to usher in the development of inhibitors or degraders that leverage a wider range of cell-specific, tissue-specific and disease-specific expression, as well as the functional essentiality and redundancy of E3 ligases, aiding improved therapeutics in the future.

Conflict of interest statement
The Ciulli laboratory receives or has received sponsored research support from Amphista Therapeutics, Boehringer Ingelheim, Eisai, Nurix Therapeutics, and Ono Pharmaceutical. A.C. is a scientific founder, shareholder, director and consultant of Amphista Therapeutics, a company that is developing targeted protein degradation therapeutic platforms.