Main

The unique cellular state of embryonic stem cells is maintained by pluripotency-associated transcription factors such as OCT4. The transcriptional-regulatory circuitries between human and mouse pluripotent stem cells exhibit notable differences. Transcriptional networks evolve via the gain and loss of cis- and trans-regulatory elements that modify network circuitry and via the addition of new regulators1. There is growing evidence showing that transposable elements (TEs) have facilitated the former type of innovation by contributing many new cis-regulatory elements to mammalian genomes2. Indeed, various studies show that TEs are the source of binding sites for mammalian transcription factors (TFs)3,4,5 and that in this way they have 'rewired' a number of developmental regulatory networks6,7. However, an important question that remains is to what extent have TEs also contributed new regulators to mammalian regulatory networks8. Identifying these new functional nodes will be key to deciphering the mechanisms behind human cellular pluripotency9. Therefore, we carried out this study to explore whether species-specific transposable elements have provided new regulators required for pluripotency in human cells.

Previous studies have indicated that a number of HERVH insertions are expressed in a hESC-specific manner, on the basis of high-throughput RNA sequencing (RNA-seq) data sets and chromatin modifications10,11,12. In total, we found 231 HERVH insertions to be highly expressed in H1 hESCs (Online Methods and Supplementary Fig. 1a,b), and we predicted this expression to be primate specific (Supplementary Fig. 1c). To investigate the role of HERVH in hESCs, we designed four short hairpin RNAs (shRNAs) against the expressed HERVH loci (Supplementary Fig. 2a). Strikingly, depletion of HERVH with individual shRNAs led to a dramatic change in cellular morphology, with resulting cells adopting a fibroblast-like appearance (Fig. 1a). To confirm that the hESCs had undergone differentiation, we examined the expression of pluripotency and differentiation markers after knockdown. We found that pluripotency markers (OCT4 (official symbol POU5F1), SOX2 and NANOG) were downregulated, whereas differentiation markers (GATA6 and RUNX1) were upregulated upon knockdown of HERVH but were not affected upon treatment with scrambled shRNAs (Fig. 1b and Supplementary Fig. 2b–f). These data suggest that HERVH is essential for the maintenance of hESC identity. Moreover, when we examined the expression of HERVH during reprogramming of fibroblasts into induced pluripotent stem cells (iPSCs), we found that HERVH was substantially upregulated after 6-d overexpression of reprogramming factors (Fig. 1c). Depletion of HERVH during reprogramming led to a reduction in iPSC colony number (Fig. 1d). Together, these data indicate that HERVH is essential for both maintenance of pluripotency in hESCs and acquisition of pluripotency in somatic cells.

Figure 1: HERVH transcripts are essential for maintenance of hESC identity.
figure 1

(a) hESC morphology after HERVH depletion. Scale bar, 20 μm. (b) Expression of pluripotency markers as determined by qPCR after HERVH depletion. Biological-triplicate data (n = 3 extracts) are presented as mean ± s.e.m. (c) Expression of HERVH during reprogramming of MRC5 fibroblasts into induced pluripotent stem cells (iPSCs). Biological-triplicate data (n = 3 extracts) are presented as mean ± s.e.m. (d) Number of iPSC colonies after reprogramming together with HERVH shRNA. Biological-triplicate data (n = 3 dishes) are presented as mean ± s.e.m. (e) Fraction of HERVHs associated with the respective type of transcript (orange). For each expressed HERVH locus, the nearest transcript with a maximum distance of 50 kb was calculated. Expected values (blue) were estimated by sampling 10,000 times. P values from one-sided permutation test were obtained by comparison with the expected distribution from random samples (Online Methods). (f) Classes of downregulated genes preferentially affected by HERVH depletion. Other downregulated genes are included as controls. P values were calculated by one-sided binomial test.

To address a potential regulatory role of HERVH, we performed gene expression analysis after HERVH depletion. The microarray results confirmed the differentiation phenotype of hESCs after HERVH knockdown (Supplementary Fig. 3a,b). We also analyzed published RNA-seq data to address the association between HERVH and the nearest annotated genomic feature for every expressed locus. Within the 231 HERVH insertions, we defined 128 that were flanked by long terminal repeat 7 (LTR7) as being expressed (Online Methods and Supplementary Table 1). In total, 95 of these expressed HERVH loci were within 50 kb of an annotated transcript (Online Methods). In agreement with previous findings of a potential association of HERVH elements with long noncoding RNAs (lncRNAs)11,12,13, we found that 31 out of 95 (33%) of HERVH insertions were linked to genomic regions expressing lncRNAs instead of protein-coding genes (Fig. 1e). To study the relationship between HERVH and lncRNAs further, we performed RNA-seq to examine the expression of lncRNAs after HERVH depletion. Through RNA-seq analysis, we found that most of the expressed HERVH transcripts were depleted efficiently by our shRNAs (Supplementary Fig. 3c). Our gene ontology analysis revealed an upregulation of development-related genes and a downregulation of cell cycle–associated genes (Supplementary Fig. 3d,e). In an analysis of differentially expressed genes, we found that lncRNAs and protein-coding genes near HERVH loci were significantly downregulated after HERVH depletion (P = 5.85 × 10−3 for lncRNAs near HERVH; P = 1.16 × 10−11 for other lncRNAs; and P = 2.42 × 10−2 for protein coding genes; Fig. 1f and Supplementary Table 2). The only significant enrichment that we saw in upregulated genes was in lncRNAs not in proximity to HERVH (Supplementary Fig. 3f; P = 2.2 × 10−16). Furthermore, lncRNAs were downregulated more than protein coding genes (Supplementary Fig. 3g). These observations suggest a role of HERVH in enhancing the expression of lncRNAs and neighboring genes.

Next, we asked which fragment of the HERVH locus functions as a transcriptional enhancer. It was found previously that the LTR7 subfamily, which corresponds to the LTR portion of the HERVH (Supplementary Fig. 1a,b), is over-represented in OCT4-binding regions6,11, thus suggesting that this transcription factor could be mediating pluripotency-specific expression of these retroviral sequences and acting as an enhancer. To test this hypothesis, we cloned the LTRs of two HERVH insertions and showed that they possess hESC-specific enhancer activity (Supplementary Fig. 4a). Upon depletion of OCT4 by shRNA, the activity of these LTR enhancers was reduced (Supplementary Fig. 4b), thus indicating that OCT4 has a major role in activating the expression of these HERVH LTRs.

To study the mechanism of HERVH function in hESCs, we analyzed the location of HERVH transcripts. We performed RNA fluorescence in situ hybridization (FISH) experiments to determine the subcellular localization of HERVH transcripts. Interestingly, we observed an almost-exclusive localization of HERVH to the nucleus (Fig. 2a). In contrast, and as expected, we did not detect HERVH in mouse fibroblasts, human fibroblasts and HEK293T cells by RNA FISH (Supplementary Fig. 5a). Furthermore, we showed that the FISH signal was RNA dependent and was abolished by RNA digestion (Supplementary Fig. 5a). Because most lncRNAs are known to be highly enriched in the nuclear fraction14, these results suggest that HERVH may have a role as a noncoding RNA in hESCs.

Figure 2: HERVH transcripts function as lncRNAs that interact with coactivators and OCT4 to regulate expression of neighboring genes.
figure 2

(a) RNA FISH of HERVH transcripts. HERVH RNA is shown in red, and DNA is counterstained by 4′,6-diamidino-2-phenylindole (DAPI) is shown in blue. Right, localization of HERVH in open chromatin regions (arrow). Scale bars, 2.5 μM. (b) Analysis of interaction between HERVH transcripts and chromatin modifiers through coimmunoprecipitation of HERVH transcripts after UV cross-linking. U1 RNA, RPL18 and RPL19 are shown as controls. Biological-triplicate data (n = 3 extracts) are presented as mean ± s.e.m. (c) RNA immunoprecipitation analysis on the interaction of Flag-tagged factors with overexpressed HERVH in HEK293T cells. U1 RNA, RPL18 and RPL19 are shown as controls. Biological-triplicate data (n = 3 extracts) are presented as mean ± s.e.m. (d) Model of HERVH function in hESCs. HERVH interacts with coactivators and pluripotency factors such as OCT4 to promote enhancer activity of LTR7 and nearby regions cobound by p300 and OCT4 to drive the expression of neighboring lncRNAs and protein-coding genes essential to hESC identity.

Previous studies indicate that lncRNAs could work as a scaffolding unit to recruit chromatin modifiers to specific chromatin loci15,16. The nuclear localization of HERVH transcripts prompted us to test whether HERVH was associated with chromatin modifiers. To address this, we performed RNA cross-linking and immunoprecipitation (RNA-CLIP) assays. We first cross-linked chromatin modifiers to RNA by ultraviolet (UV) irradiation and then immunoprecipitated RNA–protein complexes with specific antibodies. Real-time PCR analysis showed that HERVH transcripts were specifically associated with the coactivators CBP, p300, MED6 and MED12 but not with co-repressors ESET, HDAC1 and PRC2 (Fig. 2b and Supplementary Fig. 5b,c). In contrast, the highly expressed mRNAs RPL18 and RPL19, as well as nuclear transcript U1 (official symbol RNU1-1), were not associated with these chromatin modifiers. We also asked whether the HERVH RNA was associated with transcription factors. Interestingly, RNA-CLIP assays showed an association between HERVH and OCT4 but not c-Myc (Fig. 2b and Supplementary Fig. 5c). We further confirmed these RNA-CLIP results by using formaldehyde as the cross-linker and obtained similar results (Supplementary Fig. 5b,c). Because HERVH was associated with the p300 and Mediator coactivator complexes, we sought to identify the proteins that associate with HERVH. To this end, we coexpressed Flag-tagged Mediator components with HERVH in HEK293T cells, which did not express HERVH. Using anti-Flag antibodies to pull down Flag-tagged factors, we determined which of the Mediator components were bound to HERVH. We found that CDK8 had the strongest interaction with HERVH, a result implicating a role of CDK8 in recruiting HERVH to the enhancer regions (Fig. 2c). In addition, we detected a weaker interaction with MED21, MED26, MED27, CKD8L and OCT4. The association of OCT4 and coactivators with HERVH, as revealed by RNA-CLIP, raised the possibility that HERVH expression may be necessary for LTR enhancer activity. Upon knockdown of HERVH, we observed that the LTR enhancer activities were reduced (Supplementary Fig. 5d). In addition, genes that are downregulated in the HERVH knockdown are more significantly enriched near LTR7 or HERVH elements than near OCT4-binding sites, other endogenous retroviral elements or Alu (Supplementary Fig. 5e,f; P = 2.2 × 10−6 for LTR7; P = 2.2 × 10−16 for HERVH; P = 0.12 for OCT4-binding site; P = 0.024 for other endogenous retroviral elements; and P = 0.319 for Alu), thus suggesting that the presence of HERVH and LTR7 could be essential for HERVH-mediated cis gene expression regulation. Furthermore, among 111 HERVH insertions that promoted expression of neighboring genes, the majority were bound by OCT4 (Supplementary Fig. 5g). This suggests that HERVH acts as a scaffold to recruit p300 and OCT4 to HERVH LTR7 regions to drive the expression of neighboring genes and in turn regulates pluripotency-associated transcripts. Altogether, our findings support a model in which HERVH interacts with coactivators and pluripotency-associated transcription factors to enhance the activity of LTR elements and to set up the specific expression of neighboring genes, including lncRNAs, in pluripotent stem cells (Fig. 2d).

lncRNAs are a new class of RNAs identified to be involved in various biological processes such as splicing, translation and establishment of epigenetic modifications17,18,19. A previous study on genome-wide function of lncRNAs has suggested that several lncRNAs are associated with epigenetic regulators17. In this study, we show that transposable elements can also produce lncRNAs that are highly expressed in human pluripotent cells. The transcription of HERVH in hESCs, the phenotypic consequence of its depletion, its nuclear localization and its association with OCT4 and coactivators all suggest that this TE-derived lncRNA has evolved an important role in the regulatory network of pluripotency.

Methods

Cell culture and transfection.

HEK293T and HeLa cell lines were cultured in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 15% FBS (Gibco), 1 mM L-glutamine (Gibco) and 2 μg/ml gentamicin (Gibco). HEK293T cells and HeLa cells were transfected with 2 μg pCAG plasmid DNA and 5 μl Lipofectamine 2000 (Invitrogen). MRC-5 human lung primary fibroblast cells (ATCC) were maintained in DMEM containing 15% FBS. H1 hESC cells (WA-01, passage p32) were cultured under feeder-free conditions on Matrigel (BD) in conditioned medium containing 1 mM L-glutamine (Gibco), 1% nonessential amino acids (Gibco) 0.1 mM 2-mercaptoethanol (Gibco), 4 ng/ml human basic fibroblast growth factor (bFGF) (Invitrogen), 20% Knockout Serum Replacement (Gibco) and 78% DMEM F12 media (Gibco), and medium was supplemented with additional bFGF (8 ng/ml) before use. For expansion, the H1 hESCs were passaged with 1 mg/ml collagenase IV (Gibco) every 5–7 d. For transfection, the hESCs were passaged with 0.25% trypsin (Invitrogen) for dissociation and transfected with 2 μg pSUPER-HERVH shRNA and 6 μl TransIT-LT1 Mirus transfection reagents the following day. Cells were harvested 4–5 d after transfection for RNA extraction. The cell lines used have not been authenticated recently but have been tested for mycoplasma contamination.

Cloning of plasmids.

shRNAs targeting HERVH were designed against the ESRG gene. shRNAs were cloned into BglII and HindIII sites of pSUPER-puro vector. OCT4 shRNA was cloned in the same way. Sequences of shRNAs are as follows: OCT4 shRNA, GGATGTGGTCCGAGTGTGG; control shRNA, GATGAAATGGGTAAGTACA; HERVH shRNAs, GAGAGACAAAGGAGACACATT (shRNA1), AAGTTGGCACCAGA GTTGGGG (shRNA1 scramble), GGTGGCTGGAGCTAAAGGCAT (shRNA2), ACGAGTAGCGGTCTGGATAGG (shRNA2 scramble), CTAAAGGCA TAGTCAAGGTTA (shRNA3), GCGAAGTACGAATAGTTATCA (shRNA3 scramble), CCAGCAAAGGCAGGCTATGCTAT (shRNA4) and GGCA GAGATCACACCGGTCATTA (shRNA4 scramble). Plasmids expressing Flag-tagged proteins were obtained from Addgene (cat. no. 17433, Flag-MED1; cat. no. 15425, Flag-MED4; cat. no. 15416, Flag-MED7; cat. no. 15410, Flag-MED17; cat. no. 15419, Flag-MED18; cat. no. 15422, Flag-MED21; cat. no. 15423, Flag-MED22; cat. no. 15367, Flag-MED26; cat. no. 15424, Flag-MED27; cat. no. 15368, Flag-MED28; cat. no. 15409, Flag-MED29; cat. no. 15366, Flag-CDK8; cat. no. 24762, Flag-CDK8L; and cat. no. 13820, Flag-HDAC1).

Reverse transcription and qPCR.

Total RNA was extracted from cells with TRIzol reagent (Invitrogen) and was then treated with DNase I (Ambion) for 1 h. The concentration of RNA was determined by NanoDrop 2000 (Thermo Scientific). 500 ng DNase-treated total RNA was reverse transcribed with random hexamers (Invitrogen) and SuperScript II reverse transcriptase (Invitrogen). qPCR was performed with 2 μl 5×-diluted cDNA per 10 μl reaction and KAPA SYBR Green (Kapa Biosystems). Quantitative real-time PCR was performed on the PRISM 7900HT system (Applied Biosystems). Gene expression levels were then normalized to those of housekeeping genes (ACTB and RPL19).

Western blot.

For western blot, proteins were extracted by cell lysis with 2× Laemmli buffer (1% SDS, 20% glycerol, and 125 mM Tris-HCl, pH 6.8), and the concentration was determined via Bradford assay. Cell lysates were resolved on a 10% gel by SDS-PAGE and transferred to a nitrocellulose membrane by semidry transfer (Bio-Rad). Signal was detected with ECL substrate with the Immobilon Western ECL kit (Millipore) according to the manufacturer's recommended procedure. Dilutions of primary antibodies are: rabbit anti-NANOG Ab (1:2,000, D73G4, Cell Signaling), rabbit anti-OCT4 Ab (1:5,000, ab19857, Santa Cruz), anti-actin Ab (1:1,000, C-4, Santa Cruz). Dilutions of secondary antibodies are: HRP–anti-rabbit IgG (1:5,000, NA9314V, Amersham) and HRP–anti-mouse IgG (1:5,000, NA931V, Amersham). Validations of antibodies are provided on the manufacturers' websites.

Immunofluorescence.

Cells were washed with PBS and fixed with 2% formaldehyde for 10 min at room temperature. After permeabilization with 0.5% Triton X-100, cells were incubated in blocking buffer (2.5% BSA in PBS-T) for 15 min. Cells were incubated with primary antibodies for 1 h and secondary antibodies for another h at room temperature. Images were taken with a Zeiss Axiovert epifluorescence microscope. Images were processed with ImageJ and Adobe Photoshop. Dilutions of primary antibodies are: rabbit anti-NANOG Ab (1:400, Cell Signaling, D73G4), goat anti-SOX2 Ab (1:100, Y-17, Santa Cruz) and mouse anti–TRA-1-60 Ab (1:50, Cell Signaling, 4746). Dilutions of secondary antibodies are: Alexa Fluor 546 anti-rabbit IgG (1:1,000, A11010, Invitrogen), Alexa Fluor 488 anti-goat IgG (1:1,000, A21467, Invitrogen) and Alexa Fluor 488 anti-mouse IgG (1:1,000, A21200, Invitrogen). Validations of antibodies are provided on the manufacturers' websites.

Flow cytometry analysis.

Cells were dissociated by trypsinization, fixed in ice-cold 70% ethanol and then permeabilized in 0.5% Triton X-100 for 15 min. Cells were incubated in blocking buffer (2.5% BSA in PBS-T) for 30 min at 4 °C. Cells were incubated with primary antibodies for 2 h at 4 °C and with secondary antibodies for another 2 h at 4 °C. Cells were washed and analyzed on an LSR II Flow Cytometry Analyzer (BD). The data were processed with FlowJo V10. Dilutions of primary antibodies are: goat anti-SOX2 (1:50, Y-17, Santa Cruz), mouse anti–TRA-1-81 (1:50, Cell Signaling, 4745), rabbit anti-NANOG (1:200, D73G4, Cell Signaling) and mouse anti-ACTA2 (1:200, ab18460, Abcam). Dilutions of secondary antibodies are: Alexa Fluor 647 anti-mouse IgG (1:1,000, A21463, Invitrogen), Alexa Fluor 488 anti-goat IgG (1:1,000, A21467, Invitrogen) and Alexa Fluor 488 anti-rabbit IgG (1:1,000, A21441, Invitrogen). Validations of antibodies are provided on the manufacturers' websites.

Virus production and reprogramming.

HERVH shRNA was cloned into the pLVTH vector for lentivirus-mediated knockdown. Viruses were packaged in HEK293T cells and concentrated with 100-kDa centrifugal filter devices (Millipore). MRC-5 cells from a six-well plate were split and infected with retroviruses containing four Yamanaka factors (OCT4, SOX2, KLF4 and c-MYC) with or without custom virus in the presence of 4 μg/ml polybrene (Sigma). MRC-5 cells were split onto CF-1 feeder cells and maintained in conditioned hESC medium for 3–4 weeks before harvest for immunostaining of TRA-1-60.

Microarray and gene ontology analysis.

For microarray analysis, RNA extracted from control samples and HERVH-depleted samples with TRIzol was treated with DNase I for 1 h. RNA was then purified by PureLink RNA mini kit (Invitrogen). 500 ng RNA was used for cRNA synthesis with the TotalPrep RNA Amplification Kit (Illumina). 750 ng cRNA was used for hybridization to HumanHT-12 v4 Expression BeadChip arrays (Illumina). After washing and drying steps, microarray chips were scanned with the BeadArray Reader. Data were extracted with GenomeStudio and normalized with the rank-invariant method. Normalized data were filtered for genes with detection P value <0.05. Significance Analysis of Microarrays v4.0 (SAM)20 was used for extracting genes with significant expression change. Cluster 3 (ref. 21) and Java TreeView22 were used for plotting the heat maps. Gene ontology analysis of unranked genes with more than 1.5-fold change was done with GOrilla23. The microarray data are available under Gene Expression Omnibus, GSE42430.

RNA preparation and sequencing.

TRIzol (Invitrogen) was used for extraction of total RNA from hESCs treated with control shRNA or HERVH shRNA for 5 d. RNA (4 μg) extracted was poly(A) enriched by incubation with oligo(dT)-coated Dynabeads (Invitrogen) for two rounds (Illumina). The resultant enriched mRNA was fragmented, size-selected and reverse-transcribed according to the IlluminaTruSeq RNA Sample Preparation Guide. Multiplexed samples were sequenced with the IlluminaHiSeq 2000 system. Each library was sequenced for 2 × 50 million 75-nucleotide reads. The RNA-seq data are available on Array Express under accession number E-MTAB-2072.

Luciferase assay.

For the luciferase assays, an ~500-bp LTR region of HERVH was cloned downstream of the luciferase reporter gene. 100 ng pGL4-luciferase reporter-LTR was transfected together with 4 ng of pCMV–Renilla luciferase into each well of hESCs in a 96-well plate. Media was changed 1 d after transfection, and cells were harvested for luciferase assay 3 d after transfection. Luciferase assays were performed with the Dual-Luciferase Reporter Assay System (Promega).

Association with lncRNAs and other genomic annotations.

Transcript annotations were downloaded from Ensembl (hg19, version 69). Significance of the association of HERVH expressed loci with genomic features was estimated by 10,000-times sampling of genomic coordinates from the human genome with the same length and number as the HERVH loci. Chromosomes were first sampled with a probability matched to the chromosome length, and the genomic location was sampled in a second step based on the length of the sampled chromosome. Sampled loci were required to be nonoverlapping. We associated these coordinates with the nearest genomic feature and calculated how many times the number of each genomic feature was equal to or higher than observed for the HERVH loci. Because we sampled 10,000 times, the minimal P value is P < 0.0001. Expression data of lncRNAs were obtained from v7 ENCODE24. RPKM values represent averages of processed expression data over poly(A)+ and poly(A) samples.

Defining the expressed HERVH loci.

We used four H1 hESC replicates from the Caltech paired-end 75-bp ENCODE RNA-seq data sets. The program coverageBed from BEDTools was used to calculate the coverage over HERVH insertions from the RepeatMasker track. The output from coverageBed was adjusted to take into account the total number of reads in each RNA-seq data set; coverage is measured in reads per million at each repeat instance. Expressed repeats were defined as those with more than ten reads per million in all four replicates. The analysis was done on the 5,909 HERVH insertions taken from the RepeatMasker table. Only expressed HERVH insertions between two LTR7 repeats that were on the same strand and within 9 kb were retained, and 128 expressed loci were defined.

Analysis of RNA-seq data after HERVH depletion.

Differential expression in the HERVH knockdown was determined with the Cuffdiff tool from Cufflinks. Genes were considered to be differentially regulated if the fold change was greater than two for downregulation or less than 0.5 for upregulation. Genes were labeled as lncRNAs or protein-coding genes on the basis of Gencode v13 annotation. The transcripts were classified as HERVH associated if they occurred within 1 kb of an HERVH insertion. P values were calculated with a binomial test, and the total percentage of up- or downregulated genes was used as the expected value. For direct comparison of expression levels between the control and knockdown samples, the raw values were determined with coverageBed from BEDTools. We then corrected for the number of reads in the library; data are in units of reads per million.

Analysis of features near downregulated genes was done with the windowBed command-line tool from BEDTools with the options -l 10000 -r 0. This identifies genes that have the feature of interest within 10 kb of their start sites. We performed this analysis with LTR7, ERV, Alu and HERVH from RepeatMasker as well as OCT4-binding sites from Kunarso et al.6. We calculated the percentage of up- and nonregulated genes associated with each of these features and plotted the fraction of these two percentages (percentage of upregulated genes divided by percentage of nonregulated genes). Genes from the Cuffdiff output were grouped by whether they were lncRNAs or protein-coding genes according to the Gencode v13 annotation. These two gene groups were then divided into two groups on the basis of whether or not they are near HERVH insertions.

RNA fluorescence in situ hybridization.

DNA probes were generated by nick translation from a pCAG plasmid containing cDNA encoding HERVH. The DNA was labeled with Cy3-dUTP with the Nick Translation Kit (Roche) according to the manufacturer's recommended protocol. After being labeled, the DNA was precipitated and resuspended in hybridization buffer (2× SSC, pH 7.4, 10% dextran, 2 mg/mL BSA, and 50% formamide). hESCs were trypsinized, resuspended at a density of 700,000 cells/mL in PBS and attached on glass slides with Cytospin (Thermo Scientific) at 1,500 r.p.m. for 10 min. The cells were fixed in 4% formaldehyde for 10 min, washed and stored in 70% ethanol at 4 °C. Before hybridization, the cells were dehydrated through 80%, 90% and 100% ethanol and air-dried. The RNA probe in hybridization buffer was added and incubated in a humid chamber overnight at 42 °C. The next day, the slides were washed in 2× SSC/50% formamide, 2× SSC and PBS, and were then embedded under coverslips with Vectashield containing DAPI (Vector Laboratories). Images were captured on a Nikon Widefield Microscope, processed by deconvolution in AutoQuant and edited in ImageJ and Photoshop.

RNA cross-linking and immunoprecipitation.

RNA immunoprecipitation experiments were performed according to a published protocol, with modifications25. Briefly, hESCs were cross-linked with 1% formaldehyde for 10 min or UV-irradiated at 254 nm, 600 mJ/cm2 (Stratagene Stratalinker), and cells were lysed in Triton buffer with short disruptive sonication. DNA was removed from nuclear lysate by 15 min DNase I treatment, and RNA was fragmented by 5 min RNase A treatment at 37 °C. Nuclear lysates were precleared with protein G Dynabeads (Invitrogen) for 2 h at 4 °C and then incubated along with 3 μg antibodies overnight. RNA was immunoprecipitated with Protein G Dynabeads. Proteins were removed by Proteinase K (Fermentas) treatment after RNA elution. RNA was extracted with TRIzol (Invitrogen), and RT-qPCR was performed as described above. Antibodies used for immunoprecipitation: rabbit anti-p300 Ab (C-20, Santa Cruz), rabbit anti-ESET Ab26, rabbit anti-SUZ12 Ab (ab12073, Abcam), rabbit anti-HDAC1 Ab (H51, Santa Cruz), rabbit anti-OCT4 Ab (ab19857, Abcam), rabbit anti–c-Myc Ab (N-262, Santa Cruz), rabbit anti-CBP Ab (A22, Santa Cruz), goat anti-snRNP70 Ab (C-18, Santa Cruz) and mouse anti-Flag (F3165, Sigma). Relative fold enrichment was calculated as the signal over control antibody, normalized to the average of both control transcripts RPL18 and RPL19 (fold set as 1). Validations of antibodies are provided on the manufacturers' websites.

Accession codes.

Microarray data have been deposited in the Gene Expression Omnibus database under accession number GSE42430. RNA-seq data have been deposited with ArrayExpress under accession number E-MTAB-2072.