Yasuda, G. et al. In vivo destabilization and functional defects of the xeroderma pigmentosum C protein caused by a pathogenic missense mutation. Mol Cell Biol 27, 6606-6614, doi:10.1128/MCB.02166-06 (2007).
57. Henricksen, L. A., Umbricht, C. B. & Wold, M. S. Recombinant replication protein A: expression, complex formation, and functional characterization. The Journal of biological chemistry 269, 11121-11132 (1994).
58. Kim, M.-S., Lapkouski, M., Yang, W. & Gellert, M. Crystal structure of the V(D)J recombinase RAG1-RAG2. Nature 518, 507-511, doi:10.1038/nature14174 (2015).
59. Schorb, M., Haberbosch, I., Hagen, W. J. H., Schwab, Y. & Mastronarde, D. N. Software tools for automated transmission electron microscopy. Nat Methods 16, 471-477, doi:10.1038/s41592-019-0396-9 (2019).
60. Zheng, S. Q., Palovcak, E., Armache, J. P., Verba, K. A., Cheng, Y. & Agard, D. A. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat Methods 14, 331-332 (2017).
61. Zhang, K. Gctf: Real-time CTF determination and correction. J Struct Biol 193, 1-12, doi:10.1016/j.jsb.2015.11.003 (2016).
62. Fernandez-Leiro, R. & Scheres, S. H. W. A pipeline approach to single-particle processing in RELION. Acta Crystallogr D Struct Biol 73, 496-502 (2017).
63. Pettersen, E. F. et al. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612, doi:10.1002/jcc.20084 (2004).
64. Sanchez-Garcia, R. et al. DeepEMhancer: a deep learning solution for cryo-EM volume post-processing. Commun Biol 4, 874, doi:10.1038/s42003-021-02399-1 (2021).
65. Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research 50, D439-D444, doi:10.1093/nar/gkab1061 (2022).
66. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501 (2010).
67. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66, 213-221, doi:10.1107/S0907444909052925 (2010).
Kim et al., 2022
68. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66, 12-21, doi:10.1107/S0907444909042073 (2010).
69. Swint-Kruse, L. & Brown, C. S. Resmap: automated representation of macromolecular interfaces as two-dimensional networks. Bioinformatics 21, 3327-3328 (2005).
70. Kucukelbir, A., Sigworth, F. J. & Tagare, H. D. Quantifying the local resolution of cryo-EM density maps. Nat Methods 11, 63-65, doi:10.1038/nmeth.2727 (2014).
Acknowledgements
The authors thank Drs. R. Craigie, M. Gellert, H. Lans, D. Leahy and W. Vermeulen
for critical reading of the manuscript and J. Li for preparing carbon-filmed cryoEM grids.
This work utilized the Cryo-Electron Microscopy Core facility, NIDDK, and the NIH
Multi-Institute Cryo-EM Facility (MICEF). This research was supported by the National
Institute of Diabetes and Digestive and Kidney Diseases (DK075037) to W.Y., and
Grants-in-Aid (KAKENHI) (Grant Numbers JP16H06307 and JP21H03598) to K.S.
Author contribution
J.K. carried out biochemical and structural studies; C.L.L. carried out dual incision
assays; F.M.G. developed the GraFix protocol; H.W. and Y.C. helped with cryoEM grid
preparation and data acquisition; Y.C. and X.C. helped with cryoEM data processing and
map improvement; K.S. and W.Y. conceived the research project; W.Y. supervised
experimental design and data interpretation; K.S. and F.H. helped with data interpretation;
all authors were involved in writing the paper and adhere to the “Inclusion & Ethics”
regulation.
Data and Code Availability
The structures and cryoEM maps have been deposited with PDB and EMDB under
accession codes 8EBS, 8EBT and 8EBU and EMD-27996, 27997 and 27998 for C7CD,
C7CAD and C7AD of Cy5; and 8EBV, 8EBW, 8EBX and 8EBY and EMD-27999, 28000,
28001 and 28002 for C7CD1, C7CD2, C7CAD and C7AD of AP. The focused refinement
maps of XPC-lesion DNA in Cy5_C7CD and the C-terminal domain of XPC in C7CAD
and C7AD have been deposited with EMDB under accession codes EMD-29674 and 29673,
respectively. These data will be released immediately upon publication. Other research
materials reported here are available upon request.
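For convenience, the accession codes listed above can be collected into a small lookup table. The sketch below is illustrative only. It assumes that the PDB and EMDB codes pair up in the order in which they are listed, and the entry-page URL patterns for RCSB PDB and EMDB are assumptions rather than something stated here. The names DEPOSITIONS, FOCUSED_MAPS and entry_urls are hypothetical.

```python
# Accession codes as listed in the Data and Code Availability statement.
# Assumption: PDB and EMDB codes pair up in the order in which they are listed.
DEPOSITIONS = {
    ("Cy5", "C7CD"): ("8EBS", "EMD-27996"),
    ("Cy5", "C7CAD"): ("8EBT", "EMD-27997"),
    ("Cy5", "C7AD"): ("8EBU", "EMD-27998"),
    ("AP", "C7CD1"): ("8EBV", "EMD-27999"),
    ("AP", "C7CD2"): ("8EBW", "EMD-28000"),
    ("AP", "C7CAD"): ("8EBX", "EMD-28001"),
    ("AP", "C7AD"): ("8EBY", "EMD-28002"),
}

# Focused-refinement maps (deposited with EMDB only).
FOCUSED_MAPS = {
    "XPC-lesion DNA in Cy5_C7CD": "EMD-29674",
    "C-terminal domain of XPC in C7CAD and C7AD": "EMD-29673",
}


def entry_urls(lesion, complex_name):
    """Return (PDB URL, EMDB URL) for one complex; URL patterns are assumed."""
    pdb_id, emdb_id = DEPOSITIONS[(lesion, complex_name)]
    return (
        f"https://www.rcsb.org/structure/{pdb_id}",
        f"https://www.ebi.ac.uk/emdb/{emdb_id}",
    )


if __name__ == "__main__":
    # Print one line per deposited complex.
    for (lesion, name), (pdb_id, emdb_id) in DEPOSITIONS.items():
        print(f"{lesion:>3} {name:<6} PDB {pdb_id}  {emdb_id}")
```

Keying the table on (lesion, complex) keeps the two C7CAD and two C7AD entries distinct between the Cy5 and AP data sets.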
Competing interests
The authors declare no competing interests.
Extended Data
Extended Data Fig. 1 Structures of NER complexes. (a) Yeast Rad4 (XPC) complexed
with Core7 and damaged DNA (orange and yellow) (PDB: 7K04 at 9.25 Å). (b) Human
XPA and Core7 complexed with undamaged but branched DNA (PDB: 6RO4). These
structures are superimposed at XPB. The DNA damage site is far away from and upstream
of the lesion sensor Fe4S4 (marked by the grey arrowheads). (c) For comparison, the
structure reported here of human XPC and Core7 complexed with Cy5-DNA (C7CD) is
shown after superposition with 7K04. (d) In human XPC, XPA and Core7 complexed with
Cy5-DNA (C7CAD), the DNA lesion (Cy5) is downstream of the XPD motor (5′ to 3′) and
close to the lesion sensor Fe4S4 of XPD. When XPD translocates along the lesion strand
(orange), Cy5 would be “seen” and stall the XPD motor.
Extended Data Fig. 2 Structure determination of three Cy5 structures. (a) Diagram of
Cy5-DNA and Cy5. (b) The workflow of cryoEM data processing and model generation.
(c) FSC analysis of the map quality, map resolution and model fit of each complex structure.
(d) For each complex, the angular distribution of particles used for the final
three-dimensional reconstruction and a surface presentation of its map, colored according
to the local resolution estimated by ResMap with the scale bar on the side, are shown. (e)
Representative regions of the three cryoEM maps are superimposed with the final
structural models.
Extended Data Fig. 3 Structure determination of four AP structures. (a) Diagram of
the AP-DNA, and EMSA results of 5 nM 32P-labeled AP-DNA binding by 5 nM each of
XPA, Core7, Core7 and XPA (C7A), XPC, XPC and XPA (CA), Core7 and XPC (C7C)
and Core7 with XPC and XPA (C7CA). The EMSA results were replicated at least six
times. (b) The workflow of cryoEM data processing and model generation. (c) FSC
analysis of the map quality, map resolution and model fit of each complex structure. (d)
For each complex, the angular distribution of particles used for the final three-dimensional
reconstruction and a surface presentation of its map, colored according to the local
resolution estimated by ResMap with the scale bar on the side, are shown. (e)
Representative regions of the three cryoEM maps (DNA) are superimposed with the final
structural models. For gel source data of panel a, see Supplementary Figure 3.
Extended Data Fig. 4 Structure-based sequence alignment of human XPC and yeast
Rad4. Conserved residues are highlighted in yellow (hydrophobic core), grey (structural
stability), green (subunit interface), cyan (DNA binding, with underscores indicating base
interactions), and red (disease mutation). Protein secondary structures are indicated by
boxes (helices) and arrows (strands). Helices are labeled alphabetically and strands
numerically. In BHD domains 1-3, secondary structure labels are preceded by the
domain number “1”, “2” or “3”. Disordered regions are indicated by dashed lines.
Extended Data Fig. 5 CryoEM maps of DNA bound by XPC and XPA. (a) The flipped-out
T26 in Cy5_C7CD. (b) The LHN has close contacts with Cy5 and the non-lesion strand
across the minor groove. The cryoEM maps in the above two panels are shown as
semi-transparent grey surfaces. (c) CryoEM map corresponding to XPA and DNA in
Cy5_C7CAD. Map volume is color coded and labeled. (d) A close-up of the C-terminal
52 residues of XPC (aa 889-940). XPB, p52, and p8 of Core7 and XPC are represented
by the cryoEM maps of C7CAD and C7AD of Cy5-DNA and color coded. Helices L, M and
N of XPC are shown as ribbon cartoons and labeled. The penultimate K939 of XPC, which
is shown as a stick model, caps the carboxyl end of helix N. Potential interactions between
the sidechain amine of K939 (shown as a sphere) and carbonyl oxygens are indicated by
dashed yellow lines. Residues F935, P936 and F937 of XPC are anchored in a
hydrophobic pocket in XPB (green).
Extended Data Fig. 6 Domain comparison of XPC, Rad4, RAD23 and CETN2. (a)
Superposition of TGD of XPC (slate blue) and Rad4 (semi-transparent grey). (b)
Superposition of BHD1 of XPC and Rad4. (c) Superposition of BHD2 of XPC and Rad4.
(d) Superposition of BHD3 of XPC and Rad4. (e) Superposition of Rad23 and RAD23
(pale green cartoon with molecular surface) reveals that TGD domains of XPC (blue) and
Rad4 (grey) differ by a 16° rotation. (f) Superposition of TGD domains of XPC and Rad4
reveals that BHD1, BHD2 and BHD3 diverge increasingly. (g) Crystal structures of
CETN2 (2GGM in pink and 2OBH in light blue) complexed with XPC peptide (LHC, blue)
are superimposed. Symmetry mate of XPC is shown in pale green. (h) CETN2 (light
green) and XPB (dark green) in C7CD are included in superposition. The LHC (XPC, dark
blue) is shifted and interacts with the C-terminal helix of XPB when complexed with Core7.
Extended Data Fig. 7 Structure comparison of C7CD with PIC and XPB with SF2
helicases. (a) Superposition of XPB (green) in C7CD and in human PIC (PDB: 7NVW,
light grey) shows the bent DNA associated with XPB and the different position of XPD
(cyan in C7CD and light grey in PIC) in the U-shaped Core7. (b) Superposition of HD2 of
four SF2 helicases, XPB, Rad26 (CSB homolog), Snf2 and NS3, reveals that the tracking
strands superimpose well in all cases.
Extended Data Fig. 8 Repetitive and flexible structure of TFIIH (Core7). (a) The
U-shaped Core7 in C7CD. The N-terminal helices of p44 that contact XPB are outlined in a
rounded rectangle. The XPD (left) and XPB (right) arms are well separated. (b) The
σ-shaped Core7 in C7CAD with p34 superimposed on C7CD and viewed in the same
orientation as in panel a. The interfaces at p34-p44 and p34-p52 (inside the dashed oval)
remain unchanged. (c) The stable interfaces of p34 with p44-RING finger (RF) and p52.
The C-terminal p34-DZF (double zinc finger) and p44-ZR (zinc ribbon) domains are
labeled. (d) A β hairpin of p34-DZF in C7CD is changed to a short α helix in C7CAD. A
part of p62 becomes disordered in C7CAD. (e) The third domain of p52 (DRD fold)
contacts the N-terminal DRD domain (blueish) of XPB, which is followed by the second
DRD domain (greenish) of XPB. The N-terminal helices of p44 (pink) contact the back
side of XPB. (f) The fourth domain of p52 (grey) and p8 (light purple) form a heterodimer.
Extended Data Fig. 9 Comparison of DRD (Damage Recognition Domain) domains.
Two MutS DRDs (domains I and VI from 1EWQ) are shown on the left side for comparison.
Five DRD domains in TFIIH are shown after superposition with MutS DRDs. Each DRD
is colored in rainbow fashion from the blue N- to the red C-terminus. Four β strands are
labeled 1 to 4, and strands 2 and 4 are each followed by an α helix (A and B). In the
p52-p8 heterodimer, the two subunits complement each other by supplying the partner
DRD with the first β strand (shown in semi-transparent blue and labeled 1’).
Extended Data Fig. 10 Length of Cy5-DNA substrate required for efficient dual
incision. (a) Sequences of three Cy5-DNA substrates, each of which contains a total of 94
bp but differs in the lengths upstream (left) and downstream (right) of Cy5. (b) Diagrams of
the three DNA substrates. (c) Dual incision results of each DNA substrate (sub) after
incubation with Core7, XPC, XPA, RPA, XPF and XPG at 37°C for 60 min. DNA cleavage
intermediate (int) and final product (prod) are marked. (d) Means and standard deviations
(error bars) of triplicate dual incision reactions, as well as individual data points, are shown
in the bar graph. For gel source data, see Supplementary Figure 5.
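The mean and standard deviation plotted in panel d can be reproduced from replicate measurements with a few lines of code. The following is a minimal sketch under stated assumptions: the product percentage is taken as product signal over total lane signal, the sample (n-1) standard deviation is used, and the helper names percent_product and summarize are hypothetical. The numbers are placeholders for illustration and are not data from this study.

```python
from statistics import mean, stdev


def percent_product(product_signal, total_signal):
    """Dual incision product as a percentage of the total lane signal (assumed definition)."""
    return 100.0 * product_signal / total_signal


def summarize(replicates):
    """Return (mean, sample standard deviation) of replicate percentages."""
    return mean(replicates), stdev(replicates)


if __name__ == "__main__":
    # Placeholder band intensities for three replicates; NOT data from this study.
    example = [percent_product(p, 100.0) for p in (40.0, 50.0, 60.0)]
    m, sd = summarize(example)
    print(f"{m:.1f} ± {sd:.1f} %")
```

With only three replicates the sample standard deviation is a coarse estimate, which is one reason to show the individual data points alongside the bars.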
Extended Data Table 1. CryoEM data collection and processing, and structural
model refinement
Fig. 1 [figure; labels recovered from extraction include: domain names and residue boundaries of XPB, XPD (FeS, HD1, Arch, HD2), p62, p52, p44 (vWA, ZR, RF), p34 (vWA, DZF), p8 (TTDA), XPC (TGD, BH1, BH2, BH3), RAD23, CETN2 and XPA; gel with MW markers (kDa); complexes C7CD, C7CAD and C7AD with Cy5, lesion strand and non-lesion strand]
Fig. 2 [figure; labels recovered from extraction include: XPC (TGD, BHD1, BHD2, BHD3, LHN, LHC), RAD23, CETN2 (EF-N, EF-C, Ca2+), Cy5, lesion and non-lesion strands (upstream, downstream), Cy5-DNA (C7CD), AP-DNA and DNA (Rad4, 2QSH)]
Fig. 3 [figure; labels recovered from extraction include: C7CD, C7CAD, Cy5_C7CD, Cy5_C7CAD, AP_C7CAD, XPA (ZnF), XPC (LHA, LHC, LHN), XPB, XPD, p62, p52, p44, p34, p8 (TTDA), 35 Å, 50° bend]
Fig. 4 [figure; labels recovered from extraction include: XPA, XPB, XPD (HD1, HD2, Fe4S4), lesion and non-lesion strands; gel bands sub, int and prod for AP NER and Cy5 NER; plot of dual incision prod (%) against time (0-60 min) for Cy5 and AP]
Fig. 5 [figure; labels recovered from extraction include: GGR, XPC+TFIIH (C7CD), C7CAD, stalled XPD, dual incision, TCR, stalled RNAP+TFIIH (PIC-like), RNA pol, CAK, bypass, ATP, C7AD + 6RO4]
...