Abstract
Human endogenous retrovirus subfamily H (HERVH) is a class of transposable elements expressed preferentially in human embryonic stem cells (hESCs). Here, we report that the long terminal repeats of HERVH function as enhancers and that HERVH is a nuclear long noncoding RNA required to maintain hESC identity. Furthermore, HERVH is associated with OCT4, coactivators and Mediator subunits. Together, these results uncover a new role of species-specific transposable elements in hESCs.
Similar content being viewed by others
Main
The unique cellular state of embryonic stem cells is maintained by pluripotency-associated transcription factors such as OCT4. The transcriptional-regulatory circuitries between human and mouse pluripotent stem cells exhibit notable differences. Transcriptional networks evolve via the gain and loss of cis- and trans-regulatory elements that modify network circuitry and via the addition of new regulators1. There is growing evidence showing that transposable elements (TEs) have facilitated the former type of innovation by contributing many new cis-regulatory elements to mammalian genomes2. Indeed, various studies show that TEs are the source of binding sites for mammalian transcription factors (TFs)3,4,5 and that in this way they have 'rewired' a number of developmental regulatory networks6,7. However, an important question that remains is to what extent have TEs also contributed new regulators to mammalian regulatory networks8. Identifying these new functional nodes will be key to deciphering the mechanisms behind human cellular pluripotency9. Therefore, we carried out this study to explore whether species-specific transposable elements have provided new regulators required for pluripotency in human cells.
Previous studies have indicated that a number of HERVH insertions are expressed in a hESC-specific manner, on the basis of high-throughput RNA sequencing (RNA-seq) data sets and chromatin modifications10,11,12. In total, we found 231 HERVH insertions to be highly expressed in H1 hESCs (Online Methods and Supplementary Fig. 1a,b), and we predicted this expression to be primate specific (Supplementary Fig. 1c). To investigate the role of HERVH in hESCs, we designed four short hairpin RNAs (shRNAs) against the expressed HERVH loci (Supplementary Fig. 2a). Strikingly, depletion of HERVH with individual shRNAs led to a dramatic change in cellular morphology, with resulting cells adopting a fibroblast-like appearance (Fig. 1a). To confirm that the hESCs had undergone differentiation, we examined the expression of pluripotency and differentiation markers after knockdown. We found that pluripotency markers (OCT4 (official symbol POU5F1), SOX2 and NANOG) were downregulated, whereas differentiation markers (GATA6 and RUNX1) were upregulated upon knockdown of HERVH but were not affected upon treatment with scrambled shRNAs (Fig. 1b and Supplementary Fig. 2b–f). These data suggest that HERVH is essential for the maintenance of hESC identity. Moreover, when we examined the expression of HERVH during reprogramming of fibroblasts into induced pluripotent stem cells (iPSCs), we found that HERVH was substantially upregulated after 6-d overexpression of reprogramming factors (Fig. 1c). Depletion of HERVH during reprogramming led to a reduction in iPSC colony number (Fig. 1d). Together, these data indicate that HERVH is essential for both maintenance of pluripotency in hESCs and acquisition of pluripotency in somatic cells.
To address a potential regulatory role of HERVH, we performed gene expression analysis after HERVH depletion. The microarray results confirmed the differentiation phenotype of hESCs after HERVH knockdown (Supplementary Fig. 3a,b). We also analyzed published RNA-seq data to address the association between HERVH and the nearest annotated genomic feature for every expressed locus. Within the 231 HERVH insertions, we defined 128 that were flanked by long terminal repeat 7 (LTR7) as being expressed (Online Methods and Supplementary Table 1). In total, 95 of these expressed HERVH loci were within 50 kb of an annotated transcript (Online Methods). In agreement with previous findings of a potential association of HERVH elements with long noncoding RNAs (lncRNAs)11,12,13, we found that 31 out of 95 (33%) of HERVH insertions were linked to genomic regions expressing lncRNAs instead of protein-coding genes (Fig. 1e). To study the relationship between HERVH and lncRNAs further, we performed RNA-seq to examine the expression of lncRNAs after HERVH depletion. Through RNA-seq analysis, we found that most of the expressed HERVH transcripts were depleted efficiently by our shRNAs (Supplementary Fig. 3c). Our gene ontology analysis revealed an upregulation of development-related genes and a downregulation of cell cycle–associated genes (Supplementary Fig. 3d,e). In an analysis of differentially expressed genes, we found that lncRNAs and protein-coding genes near HERVH loci were significantly downregulated after HERVH depletion (P = 5.85 × 10−3 for lncRNAs near HERVH; P = 1.16 × 10−11 for other lncRNAs; and P = 2.42 × 10−2 for protein coding genes; Fig. 1f and Supplementary Table 2). The only significant enrichment that we saw in upregulated genes was in lncRNAs not in proximity to HERVH (Supplementary Fig. 3f; P = 2.2 × 10−16). Furthermore, lncRNAs were downregulated more than protein coding genes (Supplementary Fig. 3g). These observations suggest a role of HERVH in enhancing the expression of lncRNAs and neighboring genes.
Next, we asked which fragment of the HERVH locus functions as a transcriptional enhancer. It was found previously that the LTR7 subfamily, which corresponds to the LTR portion of the HERVH (Supplementary Fig. 1a,b), is over-represented in OCT4-binding regions6,11, thus suggesting that this transcription factor could be mediating pluripotency-specific expression of these retroviral sequences and acting as an enhancer. To test this hypothesis, we cloned the LTRs of two HERVH insertions and showed that they possess hESC-specific enhancer activity (Supplementary Fig. 4a). Upon depletion of OCT4 by shRNA, the activity of these LTR enhancers was reduced (Supplementary Fig. 4b), thus indicating that OCT4 has a major role in activating the expression of these HERVH LTRs.
To study the mechanism of HERVH function in hESCs, we analyzed the location of HERVH transcripts. We performed RNA fluorescence in situ hybridization (FISH) experiments to determine the subcellular localization of HERVH transcripts. Interestingly, we observed an almost-exclusive localization of HERVH to the nucleus (Fig. 2a). In contrast, and as expected, we did not detect HERVH in mouse fibroblasts, human fibroblasts and HEK293T cells by RNA FISH (Supplementary Fig. 5a). Furthermore, we showed that the FISH signal was RNA dependent and was abolished by RNA digestion (Supplementary Fig. 5a). Because most lncRNAs are known to be highly enriched in the nuclear fraction14, these results suggest that HERVH may have a role as a noncoding RNA in hESCs.
Previous studies indicate that lncRNAs could work as a scaffolding unit to recruit chromatin modifiers to specific chromatin loci15,16. The nuclear localization of HERVH transcripts prompted us to test whether HERVH was associated with chromatin modifiers. To address this, we performed RNA cross-linking and immunoprecipitation (RNA-CLIP) assays. We first cross-linked chromatin modifiers to RNA by ultraviolet (UV) irradiation and then immunoprecipitated RNA–protein complexes with specific antibodies. Real-time PCR analysis showed that HERVH transcripts were specifically associated with the coactivators CBP, p300, MED6 and MED12 but not with co-repressors ESET, HDAC1 and PRC2 (Fig. 2b and Supplementary Fig. 5b,c). In contrast, the highly expressed mRNAs RPL18 and RPL19, as well as nuclear transcript U1 (official symbol RNU1-1), were not associated with these chromatin modifiers. We also asked whether the HERVH RNA was associated with transcription factors. Interestingly, RNA-CLIP assays showed an association between HERVH and OCT4 but not c-Myc (Fig. 2b and Supplementary Fig. 5c). We further confirmed these RNA-CLIP results by using formaldehyde as the cross-linker and obtained similar results (Supplementary Fig. 5b,c). Because HERVH was associated with the p300 and Mediator coactivator complexes, we sought to identify the proteins that associate with HERVH. To this end, we coexpressed Flag-tagged Mediator components with HERVH in HEK293T cells, which did not express HERVH. Using anti-Flag antibodies to pull down Flag-tagged factors, we determined which of the Mediator components were bound to HERVH. We found that CDK8 had the strongest interaction with HERVH, a result implicating a role of CDK8 in recruiting HERVH to the enhancer regions (Fig. 2c). In addition, we detected a weaker interaction with MED21, MED26, MED27, CKD8L and OCT4. The association of OCT4 and coactivators with HERVH, as revealed by RNA-CLIP, raised the possibility that HERVH expression may be necessary for LTR enhancer activity. Upon knockdown of HERVH, we observed that the LTR enhancer activities were reduced (Supplementary Fig. 5d). In addition, genes that are downregulated in the HERVH knockdown are more significantly enriched near LTR7 or HERVH elements than near OCT4-binding sites, other endogenous retroviral elements or Alu (Supplementary Fig. 5e,f; P = 2.2 × 10−6 for LTR7; P = 2.2 × 10−16 for HERVH; P = 0.12 for OCT4-binding site; P = 0.024 for other endogenous retroviral elements; and P = 0.319 for Alu), thus suggesting that the presence of HERVH and LTR7 could be essential for HERVH-mediated cis gene expression regulation. Furthermore, among 111 HERVH insertions that promoted expression of neighboring genes, the majority were bound by OCT4 (Supplementary Fig. 5g). This suggests that HERVH acts as a scaffold to recruit p300 and OCT4 to HERVH LTR7 regions to drive the expression of neighboring genes and in turn regulates pluripotency-associated transcripts. Altogether, our findings support a model in which HERVH interacts with coactivators and pluripotency-associated transcription factors to enhance the activity of LTR elements and to set up the specific expression of neighboring genes, including lncRNAs, in pluripotent stem cells (Fig. 2d).
lncRNAs are a new class of RNAs identified to be involved in various biological processes such as splicing, translation and establishment of epigenetic modifications17,18,19. A previous study on genome-wide function of lncRNAs has suggested that several lncRNAs are associated with epigenetic regulators17. In this study, we show that transposable elements can also produce lncRNAs that are highly expressed in human pluripotent cells. The transcription of HERVH in hESCs, the phenotypic consequence of its depletion, its nuclear localization and its association with OCT4 and coactivators all suggest that this TE-derived lncRNA has evolved an important role in the regulatory network of pluripotency.
Methods
Cell culture and transfection.
HEK293T and HeLa cell lines were cultured in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 15% FBS (Gibco), 1 mM L-glutamine (Gibco) and 2 μg/ml gentamicin (Gibco). HEK293T cells and HeLa cells were transfected with 2 μg pCAG plasmid DNA and 5 μl Lipofectamine 2000 (Invitrogen). MRC-5 human lung primary fibroblast cells (ATCC) were maintained in DMEM containing 15% FBS. H1 hESC cells (WA-01, passage p32) were cultured under feeder-free conditions on Matrigel (BD) in conditioned medium containing 1 mM L-glutamine (Gibco), 1% nonessential amino acids (Gibco) 0.1 mM 2-mercaptoethanol (Gibco), 4 ng/ml human basic fibroblast growth factor (bFGF) (Invitrogen), 20% Knockout Serum Replacement (Gibco) and 78% DMEM F12 media (Gibco), and medium was supplemented with additional bFGF (8 ng/ml) before use. For expansion, the H1 hESCs were passaged with 1 mg/ml collagenase IV (Gibco) every 5–7 d. For transfection, the hESCs were passaged with 0.25% trypsin (Invitrogen) for dissociation and transfected with 2 μg pSUPER-HERVH shRNA and 6 μl TransIT-LT1 Mirus transfection reagents the following day. Cells were harvested 4–5 d after transfection for RNA extraction. The cell lines used have not been authenticated recently but have been tested for mycoplasma contamination.
Cloning of plasmids.
shRNAs targeting HERVH were designed against the ESRG gene. shRNAs were cloned into BglII and HindIII sites of pSUPER-puro vector. OCT4 shRNA was cloned in the same way. Sequences of shRNAs are as follows: OCT4 shRNA, GGATGTGGTCCGAGTGTGG; control shRNA, GATGAAATGGGTAAGTACA; HERVH shRNAs, GAGAGACAAAGGAGACACATT (shRNA1), AAGTTGGCACCAGA GTTGGGG (shRNA1 scramble), GGTGGCTGGAGCTAAAGGCAT (shRNA2), ACGAGTAGCGGTCTGGATAGG (shRNA2 scramble), CTAAAGGCA TAGTCAAGGTTA (shRNA3), GCGAAGTACGAATAGTTATCA (shRNA3 scramble), CCAGCAAAGGCAGGCTATGCTAT (shRNA4) and GGCA GAGATCACACCGGTCATTA (shRNA4 scramble). Plasmids expressing Flag-tagged proteins were obtained from Addgene (cat. no. 17433, Flag-MED1; cat. no. 15425, Flag-MED4; cat. no. 15416, Flag-MED7; cat. no. 15410, Flag-MED17; cat. no. 15419, Flag-MED18; cat. no. 15422, Flag-MED21; cat. no. 15423, Flag-MED22; cat. no. 15367, Flag-MED26; cat. no. 15424, Flag-MED27; cat. no. 15368, Flag-MED28; cat. no. 15409, Flag-MED29; cat. no. 15366, Flag-CDK8; cat. no. 24762, Flag-CDK8L; and cat. no. 13820, Flag-HDAC1).
Reverse transcription and qPCR.
Total RNA was extracted from cells with TRIzol reagent (Invitrogen) and was then treated with DNase I (Ambion) for 1 h. The concentration of RNA was determined by NanoDrop 2000 (Thermo Scientific). 500 ng DNase-treated total RNA was reverse transcribed with random hexamers (Invitrogen) and SuperScript II reverse transcriptase (Invitrogen). qPCR was performed with 2 μl 5×-diluted cDNA per 10 μl reaction and KAPA SYBR Green (Kapa Biosystems). Quantitative real-time PCR was performed on the PRISM 7900HT system (Applied Biosystems). Gene expression levels were then normalized to those of housekeeping genes (ACTB and RPL19).
Western blot.
For western blot, proteins were extracted by cell lysis with 2× Laemmli buffer (1% SDS, 20% glycerol, and 125 mM Tris-HCl, pH 6.8), and the concentration was determined via Bradford assay. Cell lysates were resolved on a 10% gel by SDS-PAGE and transferred to a nitrocellulose membrane by semidry transfer (Bio-Rad). Signal was detected with ECL substrate with the Immobilon Western ECL kit (Millipore) according to the manufacturer's recommended procedure. Dilutions of primary antibodies are: rabbit anti-NANOG Ab (1:2,000, D73G4, Cell Signaling), rabbit anti-OCT4 Ab (1:5,000, ab19857, Santa Cruz), anti-actin Ab (1:1,000, C-4, Santa Cruz). Dilutions of secondary antibodies are: HRP–anti-rabbit IgG (1:5,000, NA9314V, Amersham) and HRP–anti-mouse IgG (1:5,000, NA931V, Amersham). Validations of antibodies are provided on the manufacturers' websites.
Immunofluorescence.
Cells were washed with PBS and fixed with 2% formaldehyde for 10 min at room temperature. After permeabilization with 0.5% Triton X-100, cells were incubated in blocking buffer (2.5% BSA in PBS-T) for 15 min. Cells were incubated with primary antibodies for 1 h and secondary antibodies for another h at room temperature. Images were taken with a Zeiss Axiovert epifluorescence microscope. Images were processed with ImageJ and Adobe Photoshop. Dilutions of primary antibodies are: rabbit anti-NANOG Ab (1:400, Cell Signaling, D73G4), goat anti-SOX2 Ab (1:100, Y-17, Santa Cruz) and mouse anti–TRA-1-60 Ab (1:50, Cell Signaling, 4746). Dilutions of secondary antibodies are: Alexa Fluor 546 anti-rabbit IgG (1:1,000, A11010, Invitrogen), Alexa Fluor 488 anti-goat IgG (1:1,000, A21467, Invitrogen) and Alexa Fluor 488 anti-mouse IgG (1:1,000, A21200, Invitrogen). Validations of antibodies are provided on the manufacturers' websites.
Flow cytometry analysis.
Cells were dissociated by trypsinization, fixed in ice-cold 70% ethanol and then permeabilized in 0.5% Triton X-100 for 15 min. Cells were incubated in blocking buffer (2.5% BSA in PBS-T) for 30 min at 4 °C. Cells were incubated with primary antibodies for 2 h at 4 °C and with secondary antibodies for another 2 h at 4 °C. Cells were washed and analyzed on an LSR II Flow Cytometry Analyzer (BD). The data were processed with FlowJo V10. Dilutions of primary antibodies are: goat anti-SOX2 (1:50, Y-17, Santa Cruz), mouse anti–TRA-1-81 (1:50, Cell Signaling, 4745), rabbit anti-NANOG (1:200, D73G4, Cell Signaling) and mouse anti-ACTA2 (1:200, ab18460, Abcam). Dilutions of secondary antibodies are: Alexa Fluor 647 anti-mouse IgG (1:1,000, A21463, Invitrogen), Alexa Fluor 488 anti-goat IgG (1:1,000, A21467, Invitrogen) and Alexa Fluor 488 anti-rabbit IgG (1:1,000, A21441, Invitrogen). Validations of antibodies are provided on the manufacturers' websites.
Virus production and reprogramming.
HERVH shRNA was cloned into the pLVTH vector for lentivirus-mediated knockdown. Viruses were packaged in HEK293T cells and concentrated with 100-kDa centrifugal filter devices (Millipore). MRC-5 cells from a six-well plate were split and infected with retroviruses containing four Yamanaka factors (OCT4, SOX2, KLF4 and c-MYC) with or without custom virus in the presence of 4 μg/ml polybrene (Sigma). MRC-5 cells were split onto CF-1 feeder cells and maintained in conditioned hESC medium for 3–4 weeks before harvest for immunostaining of TRA-1-60.
Microarray and gene ontology analysis.
For microarray analysis, RNA extracted from control samples and HERVH-depleted samples with TRIzol was treated with DNase I for 1 h. RNA was then purified by PureLink RNA mini kit (Invitrogen). 500 ng RNA was used for cRNA synthesis with the TotalPrep RNA Amplification Kit (Illumina). 750 ng cRNA was used for hybridization to HumanHT-12 v4 Expression BeadChip arrays (Illumina). After washing and drying steps, microarray chips were scanned with the BeadArray Reader. Data were extracted with GenomeStudio and normalized with the rank-invariant method. Normalized data were filtered for genes with detection P value <0.05. Significance Analysis of Microarrays v4.0 (SAM)20 was used for extracting genes with significant expression change. Cluster 3 (ref. 21) and Java TreeView22 were used for plotting the heat maps. Gene ontology analysis of unranked genes with more than 1.5-fold change was done with GOrilla23. The microarray data are available under Gene Expression Omnibus, GSE42430.
RNA preparation and sequencing.
TRIzol (Invitrogen) was used for extraction of total RNA from hESCs treated with control shRNA or HERVH shRNA for 5 d. RNA (4 μg) extracted was poly(A) enriched by incubation with oligo(dT)-coated Dynabeads (Invitrogen) for two rounds (Illumina). The resultant enriched mRNA was fragmented, size-selected and reverse-transcribed according to the IlluminaTruSeq RNA Sample Preparation Guide. Multiplexed samples were sequenced with the IlluminaHiSeq 2000 system. Each library was sequenced for 2 × 50 million 75-nucleotide reads. The RNA-seq data are available on Array Express under accession number E-MTAB-2072.
Luciferase assay.
For the luciferase assays, an ~500-bp LTR region of HERVH was cloned downstream of the luciferase reporter gene. 100 ng pGL4-luciferase reporter-LTR was transfected together with 4 ng of pCMV–Renilla luciferase into each well of hESCs in a 96-well plate. Media was changed 1 d after transfection, and cells were harvested for luciferase assay 3 d after transfection. Luciferase assays were performed with the Dual-Luciferase Reporter Assay System (Promega).
Association with lncRNAs and other genomic annotations.
Transcript annotations were downloaded from Ensembl (hg19, version 69). Significance of the association of HERVH expressed loci with genomic features was estimated by 10,000-times sampling of genomic coordinates from the human genome with the same length and number as the HERVH loci. Chromosomes were first sampled with a probability matched to the chromosome length, and the genomic location was sampled in a second step based on the length of the sampled chromosome. Sampled loci were required to be nonoverlapping. We associated these coordinates with the nearest genomic feature and calculated how many times the number of each genomic feature was equal to or higher than observed for the HERVH loci. Because we sampled 10,000 times, the minimal P value is P < 0.0001. Expression data of lncRNAs were obtained from v7 ENCODE24. RPKM values represent averages of processed expression data over poly(A)+ and poly(A)− samples.
Defining the expressed HERVH loci.
We used four H1 hESC replicates from the Caltech paired-end 75-bp ENCODE RNA-seq data sets. The program coverageBed from BEDTools was used to calculate the coverage over HERVH insertions from the RepeatMasker track. The output from coverageBed was adjusted to take into account the total number of reads in each RNA-seq data set; coverage is measured in reads per million at each repeat instance. Expressed repeats were defined as those with more than ten reads per million in all four replicates. The analysis was done on the 5,909 HERVH insertions taken from the RepeatMasker table. Only expressed HERVH insertions between two LTR7 repeats that were on the same strand and within 9 kb were retained, and 128 expressed loci were defined.
Analysis of RNA-seq data after HERVH depletion.
Differential expression in the HERVH knockdown was determined with the Cuffdiff tool from Cufflinks. Genes were considered to be differentially regulated if the fold change was greater than two for downregulation or less than 0.5 for upregulation. Genes were labeled as lncRNAs or protein-coding genes on the basis of Gencode v13 annotation. The transcripts were classified as HERVH associated if they occurred within 1 kb of an HERVH insertion. P values were calculated with a binomial test, and the total percentage of up- or downregulated genes was used as the expected value. For direct comparison of expression levels between the control and knockdown samples, the raw values were determined with coverageBed from BEDTools. We then corrected for the number of reads in the library; data are in units of reads per million.
Analysis of features near downregulated genes was done with the windowBed command-line tool from BEDTools with the options -l 10000 -r 0. This identifies genes that have the feature of interest within 10 kb of their start sites. We performed this analysis with LTR7, ERV, Alu and HERVH from RepeatMasker as well as OCT4-binding sites from Kunarso et al.6. We calculated the percentage of up- and nonregulated genes associated with each of these features and plotted the fraction of these two percentages (percentage of upregulated genes divided by percentage of nonregulated genes). Genes from the Cuffdiff output were grouped by whether they were lncRNAs or protein-coding genes according to the Gencode v13 annotation. These two gene groups were then divided into two groups on the basis of whether or not they are near HERVH insertions.
RNA fluorescence in situ hybridization.
DNA probes were generated by nick translation from a pCAG plasmid containing cDNA encoding HERVH. The DNA was labeled with Cy3-dUTP with the Nick Translation Kit (Roche) according to the manufacturer's recommended protocol. After being labeled, the DNA was precipitated and resuspended in hybridization buffer (2× SSC, pH 7.4, 10% dextran, 2 mg/mL BSA, and 50% formamide). hESCs were trypsinized, resuspended at a density of 700,000 cells/mL in PBS and attached on glass slides with Cytospin (Thermo Scientific) at 1,500 r.p.m. for 10 min. The cells were fixed in 4% formaldehyde for 10 min, washed and stored in 70% ethanol at 4 °C. Before hybridization, the cells were dehydrated through 80%, 90% and 100% ethanol and air-dried. The RNA probe in hybridization buffer was added and incubated in a humid chamber overnight at 42 °C. The next day, the slides were washed in 2× SSC/50% formamide, 2× SSC and PBS, and were then embedded under coverslips with Vectashield containing DAPI (Vector Laboratories). Images were captured on a Nikon Widefield Microscope, processed by deconvolution in AutoQuant and edited in ImageJ and Photoshop.
RNA cross-linking and immunoprecipitation.
RNA immunoprecipitation experiments were performed according to a published protocol, with modifications25. Briefly, hESCs were cross-linked with 1% formaldehyde for 10 min or UV-irradiated at 254 nm, 600 mJ/cm2 (Stratagene Stratalinker), and cells were lysed in Triton buffer with short disruptive sonication. DNA was removed from nuclear lysate by 15 min DNase I treatment, and RNA was fragmented by 5 min RNase A treatment at 37 °C. Nuclear lysates were precleared with protein G Dynabeads (Invitrogen) for 2 h at 4 °C and then incubated along with 3 μg antibodies overnight. RNA was immunoprecipitated with Protein G Dynabeads. Proteins were removed by Proteinase K (Fermentas) treatment after RNA elution. RNA was extracted with TRIzol (Invitrogen), and RT-qPCR was performed as described above. Antibodies used for immunoprecipitation: rabbit anti-p300 Ab (C-20, Santa Cruz), rabbit anti-ESET Ab26, rabbit anti-SUZ12 Ab (ab12073, Abcam), rabbit anti-HDAC1 Ab (H51, Santa Cruz), rabbit anti-OCT4 Ab (ab19857, Abcam), rabbit anti–c-Myc Ab (N-262, Santa Cruz), rabbit anti-CBP Ab (A22, Santa Cruz), goat anti-snRNP70 Ab (C-18, Santa Cruz) and mouse anti-Flag (F3165, Sigma). Relative fold enrichment was calculated as the signal over control antibody, normalized to the average of both control transcripts RPL18 and RPL19 (fold set as 1). Validations of antibodies are provided on the manufacturers' websites.
Accession codes.
Microarray data have been deposited in the Gene Expression Omnibus database under accession number GSE42430. RNA-seq data have been deposited with ArrayExpress under accession number E-MTAB-2072.
Accession codes
References
Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M. & Teichmann, S.A. Curr. Opin. Struct. Biol. 14, 283–291 (2004).
Bourque, G. Curr. Opin. Genet. Dev. 19, 607–612 (2009).
Bourque, G. et al. Genome Res. 18, 1752–1762 (2008).
Wang, T. et al. Proc. Natl. Acad. Sci. USA 104, 18613–18618 (2007).
Schmidt, D. et al. Cell 148, 335–348 (2012).
Kunarso, G. et al. Nat. Genet. 42, 631–634 (2010).
Lynch, V.J., Leclerc, R.D., May, G. & Wagner, G.P. Nat. Genet. 43, 1154–1159 (2011).
Feschotte, C. Nat. Rev. Genet. 9, 397–405 (2008).
Ng, H.H. & Surani, M.A. Nat. Cell Biol. 13, 490–496 (2011).
Jacques, P.É., Jeyakani, J. & Bourque, G. PLoS Genet. 9, e1003504 (2013).
Santoni, F.A., Guerra, J. & Luban, J. Retrovirology 9, 111 (2012).
Kelley, D. & Rinn, J. Genome Biol. 13, R107 (2012).
Kapusta, A. et al. PLoS Genet. 9, e1003470 (2013).
Harrow, J. et al. Genome Res. 22, 1760–1774 (2012).
Pera, M.F. & Tam, P.P. Nature 465, 713–720 (2010).
Burgess, D.J. Nat. Rev. Genet. 12, 300 (2011).
Guttman, M. et al. Nature 477, 295–300 (2011).
Pauli, A., Rinn, J.L. & Schier, A.F. Nat. Rev. Genet. 12, 136–149 (2011).
Esteller, M. Nat. Rev. Genet. 12, 861–874 (2011).
Tusher, V.G., Tibshirani, R. & Chu, G. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).
Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. Proc. Natl. Acad. Sci. USA 95, 14863–14868 (1998).
Saldanha, A.J. Bioinformatics 20, 3246–3248 (2004).
Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. BMC Bioinformatics 10, 48 (2009).
Derrien, T. et al. Genome Res. 22, 1775–1789 (2012).
Zhao, J. et al. Mol. Cell 40, 939–953 (2010).
Yuan, P. et al. Genes Dev. 23, 2507–2520 (2009).
Acknowledgements
We thank the ENCODE consortium for the free distribution of their data sets. We also thank J. Jeyakani and K. Zawack for a number of preliminary analyses and S. Pott and Y. Wan for helpful discussions. This work was supported by the Agency for Science, Technology and Research (A*STAR) of Singapore and by Génome Québec, Génome Canada. L.R. is also supported by a grant from the Canadian Institute of Health Research (CIHR MOP-115090). F.S. is supported by the Singapore International Graduate Award from A*STAR. J.G. is supported by a fellowship within the Postdoc-Program of the German Academic Exchange Service, DAAD.
Author information
Authors and Affiliations
Contributions
X.L. contributed conception, design, data collection and analysis, and manuscript writing; F.S. contributed data collection and analysis; L.R., P.-É.J. and J.G. contributed data analysis; G.B. and H.-H.N. contributed conception, design, supervision, data interpretation and manuscript writing.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 HERVH is a primate-specific endogenous retrovirus (ERV)
(a) Multiple sequence alignment of the expressed HERVH loci and approximate position of the gag, pol and env genes in the ancestral HERVH sequence. (b) Aggregate profiles of the RNA-seq and DNase I hypersensitive sites sequencing (DHS-seq) tags around the HERVH loci in H1 hESCs, GM12878 (GM) and K562 cells split between expressed loci and not expressed loci as defined in H1. Each locus was normalized to a length of 6kb and tags within the loci are shown relative to the end they are closest to. (c) Estimated age of the repeat instances from 3 LTR repeat subfamilies (HERV16, ERVL-E and HERVH). Myrs: million years.
Supplementary Figure 2 Validation of hESC differentiation phenotype after HERVH knockdown
(a) Location of shRNAs against HERVH. (b) Expression of differentiation markers as determined by qPCR after HERVH depletion. Biological triplicate data (n=3 extracts) is presented as mean ±s.e.m. (c) Western blot of OCT4 and NANOG expression after HERVH knockdown. (d) Expression of pluripotency markers (OCT4, NANOG and SOX2) and differentiation markers (GATA6 and RUNX1) after scrambled shRNA knockdown. Biological triplicate data (n=3 extracts) is presented as mean ±s.e.m. (e) Expression of NANOG and SOX2 as determined by immunostaining after HERVH depletion. (f) Expression of pluripotency (SOX2, TRA-1-81 and NANOG) and differentiation marker (ACTA2) after HERVH depletion.
Supplementary Figure 3 Microarray and RNA-seq analysis of gene expression after HERVH depletion
(a) Heatmaps showing change in pluripotency gene expression with expression fold change >1.5 after HERVH depletion as revealed by microarray analysis. (b) Heatmaps showing change in differentiation gene expression with expression fold change >1.5 after HERVH depletion as revealed by microarray analysis. (c) Comparison of control and knockdown expression levels. Expression measured in reads per million from RNA-seq experiment data. Data points are labeled by average percent identity with shRNA. (d) Gene ontology analysis of upregulated genes after HERVH depletion. (e) Gene ontology analysis of downregulated genes after HERVH depletion. (f) Classes of upregulated genes preferentially affected by HERVH depletion. Other upregulated genes were included as control. (f) Classes of upregulated genes preferentially affected by HERVH depletion. Other upregulated genes are included as control. P values are calculated by binomial test. (g) Fold change of downregulated lncRNA and protein coding genes.
Supplementary Figure 4 HERVH LTRs function as OCT4-regulated enhancers
(a) Luciferase enhancer assay using two LTR regions of HERVH in H1 hESCs, 293T and HeLa cells. Biological triplicate data (n=3 dishes) is presented as mean ± s.e.m. (b) Same assay as in hESCs with and without OCT4 depletion. Biological triplicate data (n=3 dishes) is presented as mean ± s.e.m.
Supplementary Figure 5 HERVH specifically localizes to the nucleus and functions as a lncRNA regulating neighboring genes' expression.
(a) RNA FISH of HERVH transcripts in various cell types. HERVH RNA is shown in red, DNA counterstained by 4',6-diamidino-2-phenylindole (DAPI) is shown in blue. MEF (CF1 mouse embryonic fibroblasts), 293T cells, and MRC-5 (human fibroblasts), were stained similar as hESCs (Fig. 2a). H1 cells were treated with 10 μg/mL RNAse A for 15 minutes prior to hybridization. Scale bars, 10 μM (MEF) and 5 μM (293T, MRC-5, H1). (b) Analysis of the interaction between HERVH and chromatin modifiers through coimmunoprecipitation of HERVH transcripts after formaldehyde crosslinking. Biological triplicate data (n=3 extracts) is presented as mean ± s.e.m. (c) Gel analysis of HERVH RT-PCR products after immunoprecipitation. M denotes DNA size marker lane. RT stands for reverse transcriptase. (d) Luciferase assay of HERVH LTRs after HERVH depletion. Each LTR activity was normalized to pGL4 activity (set as 1) in the same condition. Biological triplicate data (n=3 dishes) is presented as mean ± s.e.m. (e) Enrichment of HERVH and LTR7 within 10kb of downregulated genes. P values were calculated by binominal test. (f) Other genomic features within 10kb of downregulated genes. P values were calculated by binominal test. (g) Majority of HERVH instances which regulate expression of neighboring genes are bound by OCT4 within 1kb.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5. (PDF 877 kb)
Supplementary Table 1
The genomic coordinates of 128 expressed HERVH loci. Expression is from H1 Caltech paired 75-bp ENCODE RNA-seq Replicate 1 dataset and is presented as reads per million. Coordinates are according to human genome assembly hg19. (XLSX 18 kb)
Supplementary Table 2
Downregulated lncRNAs and protein-coding genes after HERVH depletion. (XLSX 163 kb)
Rights and permissions
About this article
Cite this article
Lu, X., Sachs, F., Ramsay, L. et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat Struct Mol Biol 21, 423–425 (2014). https://doi.org/10.1038/nsmb.2799
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/nsmb.2799
This article is cited by
-
Ribosomal profiling of human endogenous retroviruses in healthy tissues
BMC Genomics (2024)
-
MicroRNAs and long non-coding RNAs during transcriptional regulation and latency of HIV and HTLV
Retrovirology (2024)
-
Context-aware transcript quantification from long-read RNA-seq data with Bambu
Nature Methods (2023)
-
Activation of human endogenous retroviruses and its physiological consequences
Nature Reviews Molecular Cell Biology (2023)
-
Roles of transposable elements in the regulation of mammalian transcription
Nature Reviews Molecular Cell Biology (2022)