Next-generation characterization of the Cancer Cell Line Encyclopedia

Abstract

Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR–Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.

Fig. 1: Overview of the datasets.
Fig. 2: DNA methylation and cancer dependence.
Fig. 3: Global chromatin profiling reveals activating mutations in p300 and CBP.
Fig. 4: MDM4 exon 6 inclusion is associated with MDM4 dependency and RPL22 or RPL22L1 status.
Fig. 5: High pSHP2 is a marker of SHP2 dependence and sensitivity to RTK inhibitors.

Data availability

All the CCLE processed datasets are available at the CCLE portal (www.broadinstitute.org/ccle) and DepMap portal (http://www.depmap.org). Raw sequencing data are available at the Sequence Read Archive (SRA) under accession number PRJNA523380. Achilles RNAi data (DEMETER scores) were downloaded from https://portals.broadinstitute.org/achilles. The Project Achilles CRISPR Avana 18Q3 public dataset (gene effects, CERES scores) was downloaded from https://figshare.com/articles/DepMap_Achilles_18Q3_public/6931364/1. The Novartis Project DRIVE RNAi dataset (ATARiS scores) was obtained from the Project DRIVE authors. CTRP AUC scores were downloaded from the NCI website (ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset). Sanger GDSC drug sensitivity data (AUC and IC50 scores) were downloaded from the Sanger website (https://www.cancerrxgene.org/downloads).

Code availability

Most of the statistical analyses were performed in R (version 3.5.2). Source code is available upon request.

References

  1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

  2. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. https://doi.org/10.1038/s41591-019-0404-8 (2019).

  3. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).

  4. Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).

  5. Tsherniak, A. et al. Defining a cancer dependency Map. Cell 170, 564–576 (2017).

  6. McDonald, E. R. III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).

  7. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).

  8. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).

  9. Diouf, B. et al. Somatic deletions of genes regulating MSH2 protein stability cause DNA mismatch repair deficiency and drug resistance in human leukemia cells. Nat. Med. 17, 1298–1303 (2011).

  10. Marra, G. et al. Mismatch repair deficiency associated with overexpression of the MSH3 gene. Proc. Natl Acad. Sci. USA 95, 8568–8573 (1998).

  11. Esakova, O. & Krasilnikov, A. S. Of proteins and RNA: the RNase P/MRP family. RNA 16, 1725–1747 (2010).

  12. Hands-Taylor, K. L. et al. Heterodimerization of the human RNase P/MRP subunits Rpp20 and Rpp25 is a prerequisite for interaction with the P3 arm of RNase MRP RNA. Nucleic Acids Res. 38, 4052–4066 (2010).

  13. Doherty, J. R. & Cleveland, J. L. Targeting lactate metabolism for cancer therapeutics. J. Clin. Invest. 123, 3685–3692 (2013).

  14. Herman, J. G. et al. Silencing of the VHL tumor-suppressor gene by DNA methylation in renal carcinoma. Proc. Natl Acad. Sci. USA 91, 9700–9704 (1994).

  15. Jaffe, J. D. et al. Global chromatin profiling reveals NSD2 mutations in pediatric acute lymphoblastic leukemia. Nat. Genet. 45, 1386–1391 (2013).

  16. Creech, A. L. et al. Building the Connectivity Map of epigenetics: chromatin profiling by quantitative targeted mass spectrometry. Methods 72, 57–64 (2015).

  17. Sveen, A., Kilpinen, S., Ruusulehto, A., Lothe, R. A. & Skotheim, R. I. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene 35, 2413–2427 (2016).

  18. Dewaele, M. et al. Antisense oligonucleotide-mediated MDM4 exon 6 skipping impairs tumor growth. J. Clin. Invest. 126, 68–84 (2016).

  19. Rallapalli, R., Strachan, G., Cho, B., Mercer, W. E. & Hall, D. J. A novel MDMX transcript expressed in a variety of transformed cell lines encodes a truncated protein with potent p53 repressive activity. J. Biol. Chem. 274, 8299–8308 (1999).

  20. Gembarska, A. et al. MDM4 is a key therapeutic target in cutaneous melanoma. Nat. Med. 18, 1239–1247 (2012).

  21. Boutz, P. L., Bhutkar, A. & Sharp, P. A. Detained introns are a novel, widespread class of post-transcriptionally spliced introns. Genes Dev. 29, 63–80 (2015).

  22. Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat. Biotechnol. 35, 951–959 (2017).

  23. Zhang, Y. et al. Ribosomal proteins Rpl22 and Rpl22l1 control morphogenesis by regulating pre-mRNA splicing. Cell Reports 18, 545–556 (2017).

  24. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834–838 (2005).

  25. Li, J. et al. Characterization of human cancer cell lines by reverse-phase protein arrays. Cancer Cell 31, 225–239 (2017).

  26. Chen, Y. N. et al. Allosteric inhibition of SHP2 phosphatase inhibits cancers driven by receptor tyrosine kinases. Nature 535, 148–152 (2016).

  27. Wylie, A. A. et al. The allosteric inhibitor ABL001 enables dual targeting of BCR–ABL1. Nature 543, 733–737 (2017).

  28. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013).

  29. The Cancer Cell Line Encyclopedia Consortium & The Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line datasets. Nature 528, 84–87 (2015).

  30. Haverty, P. M. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333–337 (2016).

  31. Geeleher, P., Gamazon, E. R., Seoighe, C., Cox, N. J. & Huang, R. S. Consistency in large pharmacogenomic studies. Nature 540, E1–E2 (2016).

  32. Bouhaddou, M. et al. Drug response consistency in CCLE and CGP. Nature 540, E9–E10 (2016).

  33. Mpindi, J. P. et al. Consistency in drug response profiling. Nature 540, E5–E6 (2016).

  34. Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).

  35. King, A. J. et al. Abstract 2116: Combining the power of different profiling approaches to better understand the activity of kinase inhibitor drugs. Cancer Res. 77, 2116–2116 (2017).

  36. Fisher, S. et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 12, R1 (2011).

  37. Johannessen, C. M. et al. A melanocyte lineage program confers resistance to MAP kinase pathway inhibition. Nature 504, 138–142 (2013).

  38. Boyle, P. et al. Gel-free multiplexed reduced representation bisulfite sequencing for large-scale DNA methylation profiling. Genome Biol. 13, R92 (2012).

  39. Brat, D. J. et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).

  40. Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).

  41. Huang, F. W. et al. TERT promoter mutations and monoallelic activation of TERT in cancer. Oncogenesis 4, e176 (2015).

  42. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  43. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  44. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.11–11.10.33 (2013).

  45. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).

  46. Haas, B. et al. STAR-Fusion: fast and accurate fusion transcript detection from RNA-seq. Preprint at https://www.bioRxiv.org/content/10.1101/120295v1 (2017).

  47. Kasar, S. et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat. Commun. 6, 8866 (2015).

  48. Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).

  49. The Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 (2017).

  50. Haradhvala, N. J. et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018).

  51. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).

  52. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

  53. Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).

  54. Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477–481 (2013).

  55. van der Maaten, L. & Hinton, G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  56. Szklarczyk, D. et al. STRINGv10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).

  57. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

  58. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).

  59. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  60. DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).

  61. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  62. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  63. Smyth, G. K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 1–25 (2004).

  64. Akbani, R. et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).

  65. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

Acknowledgements

We thank the Broad Genomics Platform, C. Clish, H. Bitter, A. Najafi and E. Orlando for their contribution. This work was supported by grants from Novartis and partially by NIH/NCI grants 1U01CA217842-01, 1P50CA217685-01, 5P50CA098258, 1U24CA180922-01, 1R50CA211461-01, CA16672, 1R01CA219943-01, 1U54CA224068-01, NIH U01 CA176058 and R21 DA025720. F.W.H. was supported by the Prostate Cancer Foundation. M.I. was supported by a Burroughs Wellcome Fund Career Award. G.G. was partially supported by the Paul C. Zamecnik, MD, Chair in Oncology at MGH. G.B.M. was supported by the Adelson medical research fund. Drug sensitivity results are in part based on data generated by Cancer Target Discovery and Development (CTD2) Network (https://ocg.cancer.gov/programs/ctd2/data-portal) established by the National Cancer Institute’s Office of Cancer Genomics.

Reviewer information

Nature thanks Nevan Krogan, Christoph Plass and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

For the work described herein, M.G., F.W.H., G.V.K., E.R.M., J.B., G.C., N.S., J.D.J., A.A.L., C.M.J., M.P.M., F.S., R.S., W.C.H., T.R.G., L.A.G. and W.R.S. conceived the studies; M.G., G.V.K., C.C.L., C.M.B., H.L., K. Hu, J.K., J.M.H., B.J.H., F.A., B.A.W., M.S.L., R.A., A.D., A.A.M., C.O., J.S., K. Hadi, K.V., A.R., D.S., M.L., J.L., J.M.K., M.D.J., J.G., A.C.W., J.M.M., M.Z., A.K., N.S., M.I., Y.E.M., A.D.C., A.T. and G.G. performed computational biology analysis; F.W.H., J.J.-V., E.R.M., J.B., A.Y.A.-D., M.V.R., B.R.P., Y.L., H.L.T., P.C.G., D.A.P., G.C., J.E.T., C.M.D., A.L.C., F.V., J.D.J., A.A.L., C.M.J. and F.S. performed biological analysis and interpretation; M.G., G.V.K., Y.R. and J.B. contributed to software development; M.G., F.W.H., J.J.-V., G.V.K., C.C.L., H.L., K. Hu, A.Y.A.-D., M.V.R., R.A., H.L.T. and K. Hadi prepared figures and tables for the main text and Supplementary Information; M.G., F.W.H., J.J.-V. and W.R.S. wrote the paper; E.R.M., J.B., M.V.R., B.R.P., R.A., P.C.G., A.K., A.T., A.A.L., D.M.W., R.S., W.C.H., G.B.M., J.S.B. and L.A.G. commented on and edited the manuscript; E.T.G. performed project management; D.M.W., M.P.M., R.S., W.C.H., G.G., G.B.M., J.S.B. and T.R.G. contributed project oversight and advisory roles; M.G. and G.V.K. were the lead computational biologists; F.S., L.A.G. and W.R.S. provided overall project leadership and developed the concepts and strategy for the project; L.A.G. and W.R.S. were the senior authors.

Corresponding author

Correspondence to William R. Sellers.

Ethics declarations

Competing interests

J.M.K., M.D.J., D.A.P., F.S., E.R.M., J.L., R.S., J.B., A.D., K.V., A.R., J.G., G.C., M.L., A.K., M.P.M. and W.R.S. are current or former Novartis employees and/or stock holders. W.R.S. is a Board or SAB member and holds equity in Peloton Therapeutics and Ideaya Biosciences and has consulted for Array, Astex, Ipsen, Sanofi and Servier. B.A.W. is a J&J employee. F.S. and G.V.K. are KSQ Therapeutics employees. N.S. is an employee of Celsius Therapeutics. A.D.C. receives research support from Bayer AG. G.G. receives research support from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect and ABSOLUTE. G.B.M. consults with AstraZeneca, ImmunoMET, Ionis, Nuevolution, PDX Bio, Signalchem Lifesciences, Symphogen and Tarveda, has stock options with Catena Pharmaceuticals, ImmunoMet, SignalChem, Spindle Top Ventures and Tarveda, has sponsored research funding from Adelson Medical Research Foundation, AstraZeneca, Breast Cancer Research Foundation, Immunomet, Komen Research Foundation, Pfizer, Nanostring, Tesaro, travel support from Chrysallis Bio and has licensed technology to Nanostring and Myriad Genetics. T.R.G. is an advisor to GlaxoSmithKline, is a co-founder of Sherlock Biosciences and was a co-founder and advisor to Foundation Medicine. J.K. is a Tesaro employee. W.C.H. is a consultant for Thermo Fisher, AjuIB, Paraxel and MPM Capital, and is a founder and consultant for KSQ Therapeutics. L.A.G. is an employee of Eli Lilly.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Overview of CCLE cell lines and datasets.

a, The existing and new CCLE datasets as indicated are depicted. b, Distribution of cell lines by lineage and ancestry across CCLE. c, Visual representation of the number of cell lines in each dataset. New CCLE datasets are shown in red. Functional genomics datasets are shown in blue.

Extended Data Fig. 2 CCLE variant calling pipeline and CCLE and GDSC comparison.

a, A unified pipeline integrating mutation and indel calls from different platforms was used to generate a set of high-confidence genomic alterations across 1,063 cancer cell lines. Identified variants were cross-referenced with the ExAC and TCGA databases and a panel of normals (PoN) to exclude germline variants/artefacts and generate the finalized high-confidence variant call set. b–d, Comparison of variant calls between CCLE and Sanger GDSC cell lines for germline (b; n = 1,250,562), TCGA hotspot somatic (c; n = 281) and non-hotspot somatic (d; n = 82,572) variants using WES data. Pearson’s correlation coefficients are shown. e, Comparison of TCGA hotspot variant calls between CCLE Hybrid Capture (HC) data and Sanger GDSC WES data. Variants with allelic fraction >0.4 in one dataset and greater than fourfold difference in allelic fractions between the two datasets are shown as open circles (n = 980). f, g, Comparison of Pearson’s correlation coefficients between CCLE WES and Sanger GDSC WES data versus Pearson’s correlation coefficients between CCLE HC and Sanger GDSC WES data for germline (f; n = 107) and somatic (g; n = 93) variants. Cell lines with fewer than 30 variants were excluded. h, Comparison of allelic fraction Pearson’s correlations between CCLE cell lines and Sanger cell lines using CCLE HC and Sanger GDSC WES data (n = 558 common cell lines between the two datasets; Supplementary Table 3). Cell lines with low germline correlation (sample mismatch) and low somatic correlation (genetic drift) are highlighted.
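The discordance rule described for panel e can be read as a simple filter on paired allelic fractions. The sketch below is one possible reading of that rule, not the authors' code: the function name `is_discordant`, the treatment of a variant that is absent from one dataset (allelic fraction of 0), and the example values are all assumptions made for illustration.

```python
def is_discordant(af_a, af_b, min_af=0.4, fold=4.0):
    """Flag a variant whose allelic fraction exceeds `min_af` in one dataset
    and differs by more than `fold`-fold between the two datasets."""
    high, low = max(af_a, af_b), min(af_a, af_b)
    if high <= min_af:
        return False
    # Assumption: a variant with AF = 0 in the other dataset counts as discordant.
    if low == 0:
        return True
    return high / low > fold


# Toy allelic fractions (hypothetical, not taken from the paper).
print(is_discordant(0.5, 0.1))   # high AF in one dataset, fivefold difference
print(is_discordant(0.5, 0.2))   # high AF, but only a 2.5-fold difference
```

Taking the maximum and minimum first makes the rule symmetric, so it does not matter which dataset is passed as the first argument.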

Extended Data Fig. 3 Annotation of structural variants and fusions in CCLE cell lines.

a, Structural variant burden in CCLE whole genomes. Structural variants detected by SvABA in cell lines grouped by tissue type are plotted in the order of mean structural variant burden (red bar in each facet). b, Bar plot of recurrent COSMIC fusions detected in CCLE RNA-seq data coloured by cell line lineage. c, Volcano plot of Achilles RNAi gene dependencies versus CCLE fusions for cell lines (n = 478) common between CCLE and Achilles datasets. P values determined by two-sided t-test. Genes with significant adjusted P values (false discovery rate (FDR) < 0.1) are highlighted. d, e, Examples of fusions associated with gene dependency: cell lines with ESR1-CCDC170 fusion (n = 4) are sensitive to ESR1 shRNA knockdown (d), and cell lines with AFF1-KMT2A fusion (n = 3) are sensitive to AFF1 shRNA knockdown (e). The x axis shows mRNA expression, and the y axis shows Achilles RNAi gene dependency DEMETER score (ref. 5).

Extended Data Fig. 4 Comparison of COSMIC mutational signatures in CCLE and TCGA datasets.

a, Mutational signature activity for CCLE cell lines and TCGA tumours averaged for each cancer type. For each sample, we computed the fraction of mutations attributed to each of the 30 COSMIC signatures and took the average across samples in each cancer type. Tumour types selected for representation have at least 20 samples in CCLE. b, Scatterplots for the mutational signature activities for CCLE and TCGA (n = 168). P value determined by linear regression analysis and corrected for COSMIC signature number. c, Volcano plot for comparison of COSMIC mutational signatures and CCLE or GDSC genetic drift estimates using two-sided Pearson’s correlation test (n = 3–459; Supplementary Table 6). d, Scatter plot for COSMIC6 mutational activity signatures versus CCLE or GDSC genetic drift estimates (n = 354). Colour coding as in b. P value determined by Pearson’s correlation test.
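The per-type averaging described for panel a can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: the input layout (per-sample counts of mutations attributed to each signature), the function name `mean_signature_fractions`, and the choice to skip samples with no attributed mutations are all hypothetical.

```python
from collections import defaultdict


def mean_signature_fractions(counts, sample_type, min_samples=20):
    """counts: {sample: [mutations attributed to each signature]}.
    sample_type: {sample: cancer type}.
    Returns {cancer type: [mean fraction per signature]} for types with at
    least `min_samples` samples."""
    per_type = defaultdict(list)
    for sample, c in counts.items():
        total = sum(c)
        if total == 0:
            continue  # assumption: samples without attributed mutations are skipped
        # Convert counts to per-sample fractions before averaging.
        per_type[sample_type[sample]].append([x / total for x in c])
    return {
        t: [sum(col) / len(rows) for col in zip(*rows)]
        for t, rows in per_type.items()
        if len(rows) >= min_samples
    }
```

Averaging fractions rather than pooled counts keeps a hypermutated sample from dominating its cancer type, which matches the per-sample wording of the legend.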

Extended Data Fig. 5 Determination of MSI status in the CCLE and interrogation of mismatch repair genes.

a, Identification of MSI cell lines. Number of deletions in microsatellite regions is plotted versus percentage of deletions in microsatellite regions for all cell lines in CCLE HC sequencing, CCLE WGS, CCLE WES, and Sanger GDSC WES datasets (see Methods). The x axis denotes the number of short deletions in microsatellite regions, and the y axis denotes percentage microsatellite as measured by the percentage of short deletions that lie within microsatellite regions. Inferred MSI cell lines are outlined by the green rectangle. b, Heat plot of inferred MSI status and selected CCLE annotations for the DNA mismatch repair genes MLH1, MSH2 and MSH6 for all cell lines (top) and the MSI subset (bottom). Highlighted red boxes show differences in mRNA and protein expression levels in MSH2 and MSH6. MLH1 hypermethylation is defined as average promoter methylation greater than 0.5. c, d, Scatterplot of CCLE cell lines comparing MSH6 mRNA expression levels (x axis) from RNA-seq versus MSH6 protein abundance (y axis) as quantified by RPPA in inferred-MSI (c) and inferred-MSS (d) cell lines. Red and blue denote cell lines containing truncating mutations or copy number loss in MSH6 and MSH2, respectively. Purple denotes cell lines containing truncating mutation or copy number loss in both MSH2 and MSH6. The black box highlights the MSH6 high mRNA low protein (HL) category. e–g, Bar plots of percentages of cell lines containing truncating mutations in MSH6 (e) or MSH2 (f), and MLH1 expression loss (g) in different MSH6 mRNA and protein categories among inferred-MSI cell lines (LL: n = 11; HL: n = 17; HH: n = 44). P = 4 × 10⁻⁴ (e), P = 1 × 10⁻³ (f) and P = 1 × 10⁻⁴ (g), two-sided Fisher test. h, MSH2 protein levels in different MSH6 mRNA and protein categories. ***P < 1 × 10⁻⁶, two-sided Wilcoxon rank-sum test. P = 8 × 10⁻¹⁴, difference between the HH and HL set; P = 1 × 10⁻⁸, difference between the HH and LL set. Box plots as defined in Fig. 4d.

Extended Data Fig. 6 Examples of DNA methylation associated with gene expression and dependencies in cell lines.

a, t-SNE plot for DNA methylation data across all CCLE cell lines. Each dot represents a cell line coloured by cell lineage. b, Distribution of mean CpG methylation in CCLE cell lines (n = 843) grouped by cancer type. Box plots as defined in Fig. 4d. c, Correlation of promoter methylation and gene expression for all genes corrected for cancer type (n = 836 cell lines, 18,296 genes). The y axis represents the number of genes, and the x axis is the linear regression coefficient corresponding to normalized promoter DNA methylation. Cancer types were used as covariates in the linear regression analysis. A subset of genes show significant correlation between higher promoter methylation and lower gene expression (n = 7,388; permutation test P < 0.05; Methods). Dotted line shows the empirical null distribution. d, Cell lines with higher levels of RPP25 methylation show decreased RPP25 mRNA expression (Pearson’s r = −0.79, n = 834 cell lines; P < 2.2 × 10⁻¹⁶). e, Comparison of Achilles RNAi RPP25 gene dependency scores for cell lines with and without truncating mutation or copy number loss in POP7 or RPP25L genes (n = 458 cell lines; P = 0.74, two-sided Wilcoxon rank-sum test). Box plots as defined in Fig. 4d. f, Cell lines with higher levels of LDHB methylation show decreased LDHB mRNA expression (Pearson’s r = −0.80, n = 815 cell lines; P < 2.2 × 10⁻¹⁶). g, Cell lines with higher levels of LDHA methylation show decreased LDHA expression. Two cell lines, SK-N-BE2 and U-251-MG, show markedly higher LDHA methylation and decreased LDHA expression (Pearson’s r = −0.27, n = 836; P = 5.34 × 10⁻¹⁶). h, Cell lines with high levels of LDHA methylation display sensitivity to LDHB knockout by CRISPR–Cas9 screening (Pearson’s r = −0.53, n = 371, P < 2.2 × 10⁻¹⁶). i, Promoter methylation versus mRNA expression correlations in TCGA tumour types (sample sizes shown in parentheses). *P < 0.001, Pearson’s correlation test. j, Scatterplot of CCLE lines comparing expression of tumour suppressor VHL (von Hippel–Lindau) mRNA versus VHL methylation (left, all cell lines) and copy number (right, kidney subset). VHL hypermethylation in three kidney cell lines is associated with marked loss of VHL expression. VHL is inactivated by DNA copy number loss, somatic mutation, and promoter hypermethylation.

Extended Data Fig. 7 Global chromatin profiling dataset.

a, Unsupervised clustering of global chromatin profiling data for 897 CCLE cell lines. Each column corresponds to an individual cell line and each row corresponds to a specific combination of chromatin post-translational modifications (‘marks’). For each mark, the fold change relative to the median of cell lines is depicted on the heat map. EZH2, NSD2, CREBBP and EP300 status are annotated. Previously described clusters (associated with EZH2 gain of function, EZH2 loss of function, and NSD2 alterations), as well as the newly identified cluster associated with p300 and CBP gain-of-function alterations, are annotated. b, Volcano plot for truncating mutation enrichment analysis in the newly identified cluster, which is characterized by marked increases in H3K18 and H3K27 acetylation (n = 893 cell lines; adjusted P values determined by two-sided Fisher’s exact test). EP300 and CREBBP are the top two genes with truncating mutations enriched in this cluster. Only genes with at least 20 affected cell lines (n = 684 genes) were included. c, Distribution of truncating mutations affecting EP300 and CREBBP in the 10 cell lines in the newly identified p300/CBP cluster. Truncating mutations predicted to affect the TAZ2 (CH3) domain specifically are highlighted. Two other truncating mutations not specific to TAZ2 (CH3) are OVCAR-8 (S893*) and COLO-704 (K1469fs).

Extended Data Fig. 8 Comparison of CCLE gene expression data with primary tumour (TCGA) and normal tissue (GTEx) gene expression datasets.

a, Comparison of gene expression profiles between the CCLE cell lines (n = 1,019) and TCGA primary tumours (n = 10,535). For every gene in each dataset, expression values were averaged per cancer type and then mean centred across types. Pearson correlation values were calculated between the CCLE and TCGA cancer types using the 5,000 most highly variable genes. b, Comparison of average gene expression profiles between the CCLE cell lines (n = 1,019) and the GTEx normal tissues (n = 11,688). As in a, expression profiles for each tissue type in GTEx were correlated with the CCLE expression profiles (n = 5,000 genes). c, Gene expression comparison of eight prostate cell lines and TCGA primary tumour samples (n = 5,000 genes). d, Gene expression comparison of eight prostate cell lines and GTEx normal tissue samples (n = 5,000 genes).
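The comparison described for panel a (average per cancer type, mean centre across types, then correlate over the most variable genes) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the nested-dictionary input layout and all function names are hypothetical, and ranking gene variability on the type-level profiles of a single dataset is an assumption, since the legend does not say how the 5,000 genes were chosen.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def type_profiles(expr, sample_type):
    """expr: {sample: {gene: value}}; sample_type: {sample: cancer type}.
    Returns {type: {gene: value}}: per-type means, mean-centred across types."""
    by_type = {}
    for sample, t in sample_type.items():
        by_type.setdefault(t, []).append(expr[sample])
    genes = sorted(next(iter(expr.values())))
    avg = {t: {g: mean(p[g] for p in profs) for g in genes}
           for t, profs in by_type.items()}
    for g in genes:
        centre = mean(avg[t][g] for t in avg)
        for t in avg:
            avg[t][g] -= centre
    return avg


def top_variable_genes(profiles, n):
    """Assumption: variability is ranked on the type-level profiles."""
    genes = list(next(iter(profiles.values())))
    var = {g: pvariance([profiles[t][g] for t in profiles]) for g in genes}
    return sorted(var, key=var.get, reverse=True)[:n]


def correlate_types(prof_a, prof_b, genes):
    """Pearson r for every pair of types across the two datasets."""
    return {(ta, tb): pearson([prof_a[ta][g] for g in genes],
                              [prof_b[tb][g] for g in genes])
            for ta in prof_a for tb in prof_b}
```

Mean centring across types removes genes' baseline expression levels, so the correlation reflects type-specific deviations rather than housekeeping-level similarity shared by all samples.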

Extended Data Fig. 9 MDM4 alternative splicing and association with RPL22 and RPL22L1.

a, Distribution of MDM4 exon 6 inclusion (left) and MDM4 mRNA expression (right) correlation with all gene dependencies in the Achilles RNAi dataset (n = 189–478; Supplementary Table 10). b, Correlation of MDM4 exon 6 inclusion with sensitivity to all small molecules in the CTRP AUC dataset using all cell lines. Nutlin-3a is the top drug sensitivity correlated with MDM4 exon 6 inclusion (n = 79–810; Supplementary Table 10). c, Example of nutlin-3a sensitivity versus MDM4 exon 6 inclusion in the AML cell lines (Spearman correlation ρ = −0.64, P = 3 × 10⁻⁴, n = 28). The y axis shows the AUC for nutlin-3a in the CTRP dataset. d, Scatterplot of MDM4 exon 6 inclusion versus RPL22L1 expression for all p53-mutant (left, n = 711) and p53 wild-type (right, n = 288) CCLE cell lines. P values determined by Pearson’s correlation test. e, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in TCGA. f, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in CCLE. g, Correlation of RPL22L1 mRNA expression with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots as defined in Fig. 4d. Values in parentheses denote sample size in each category. h, Correlation of MDM4 exon 6 inclusion with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots are as defined in Fig. 4d. Values in parentheses denote sample size in each category. i, Selected genomic features that correlate with sensitivity to MDM4 shRNA knockdown. mRNA expression of MDM4 and TP53 are shown for comparison.

Extended Data Fig. 10 Examples of microRNA expression associated with gene dependencies in cell lines.

a, t-SNE plot for miRNA data across all CCLE cell lines. Each dot represents a cell line. Each colour represents a different cell lineage. Colour coding is as in Fig. 1. b, Scatter plot of pairwise Pearson’s correlation of gene dependency and miRNA expression (n = 420 cell lines), normalized for each microRNA (z1, x axis) and each gene dependency (z2, y axis). Strong outlier pairs with |z1| > 6 or |z2| > 6 are highlighted. c, Distribution of Pearson’s correlations of mir-215 expression with Achilles RNAi gene dependencies for 16,871 genes (n = 162–420 cell lines; Supplementary Table 13). CTNNB1 knockdown is the top negative correlate with mir-215 expression. d, Distribution of Pearson’s correlations of CTNNB1 gene dependency with all 734 measured miRNAs (n = 420 cell lines). mir-215 is the miRNA whose expression is most negatively correlated with CTNNB1 dependency. mir-215 and mir-194-1 cluster together at 1q41, whereas mir-192 and mir-194-2 cluster at 11q13.1. mir-215 and mir-192 are close homologues. e, Scatterplot of mir-215 expression versus CTNNB1 dependency of all CCLE cell lines. Colon and stomach lineages are shown in blue and red, respectively. f, Scaled mir-215 expression in TCGA and CCLE datasets (n = 14; mean ± s.e.m.). Stomach and colorectal lineages in both datasets have high mir-215 expression. g, Single-sample gene set enrichment analysis identifies TGFB1 and WNT3A pathway gene sets correlated with mir-215 expression using CCLE RNA-seq data. The gene set ‘Labbe targets of TGFB1 and WNT3A’ of downstream targets of TGF-β and WNT ligands is negatively correlated with mir-215 expression. h, The gene set ‘Labbe targets of TGFB1 and WNT3A’ is negatively correlated with mir-215 expression in the TCGA stomach mRNA expression dataset. i, The gene set ‘Vecchi gastric advanced vs early dn’ of downregulated genes distinguishing between advanced and early gastric cancer subtypes is positively correlated with mir-215 expression in the CCLE.
j, mir-215 expression in the stomach TCGA mRNA expression dataset is positively correlated with the ‘Vecchi gastric advanced vs early dn’ gene set.
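
The two-way normalization in panel b can be illustrated with a short sketch. The legend states only that correlations were normalized for each microRNA (z1) and for each gene dependency (z2); the sketch below assumes this means a plain z-score along each row and each column of the miRNA-by-gene correlation matrix, which may differ from the authors' exact procedure. The function names, gene labels other than CTNNB1, and correlation values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to zero mean and unit (population) s.d."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def outlier_pairs(corr, mirnas, genes, threshold=6.0):
    """Return (miRNA, gene, z1, z2) for pairs with |z1| or |z2| above threshold.

    corr[i][j] is the Pearson correlation between the expression of miRNA i
    and the dependency score of gene j.
    """
    # z1: each miRNA's correlations standardized across all gene dependencies.
    z1 = [zscores(row) for row in corr]
    # z2: each gene dependency's correlations standardized across all miRNAs.
    z2_cols = [
        zscores([corr[i][j] for i in range(len(mirnas))])
        for j in range(len(genes))
    ]
    hits = []
    for i, m in enumerate(mirnas):
        for j, g in enumerate(genes):
            a, b = z1[i][j], z2_cols[j][i]
            if abs(a) > threshold or abs(b) > threshold:
                hits.append((m, g, a, b))
    return hits


# Hypothetical toy matrix: 3 miRNAs x 4 gene dependencies.
mirnas = ["mir-215", "mir-A", "mir-B"]
genes = ["GENE_A", "GENE_B", "GENE_C", "CTNNB1"]
corr = [
    [0.01, -0.02, 0.00, -0.60],
    [0.02, 0.01, -0.01, 0.00],
    [-0.01, 0.02, 0.01, -0.02],
]
# With so few entries |z| cannot reach 6, so the toy call lowers the threshold.
hits = outlier_pairs(corr, mirnas, genes, threshold=1.6)
```

With a population z-score the largest attainable |z| among n values is the square root of n − 1, so a cut-off of 6 is only meaningful for matrices of the size used in the figure (hundreds of miRNAs, thousands of genes); the toy call lowers the threshold for that reason.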

Extended Data Fig. 11 RPPA analysis.

a, Distribution of Pearson’s correlation coefficient between total protein levels as measured by RPPA and mRNA expression levels measured by RNA-seq (n = 890 cell lines, 154 genes). The empirical null distribution for correlation of mRNA and protein for two random genes is shown for comparison (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon rank-sum test). b, Effect of RPPA dynamic range on mRNA and protein correlation (n = 96). mRNA and protein correlation is plotted against dynamic range for each validated total protein antibody. Most antibodies with low mRNA and protein correlation have low dynamic range, with the exception of VEGFR2, which, despite high dynamic range, exhibits very low mRNA and protein correlation. P values determined by two-sided Pearson’s correlation test. c, Effect of RPPA antibody quality and target type on mRNA and protein correlation. On the left, mRNA and protein Pearson’s correlation is plotted for ‘validated’ (n = 96) and ‘with caution’ (n = 58) antibodies against total proteins. On the right, mRNA and protein Pearson’s correlation is plotted for antibodies against total protein (n = 154) and antibodies against phospho-protein (n = 50). Median correlations are 0.62 (validated), 0.48 (caution), 0.54 (total protein), 0.21 (phospho-protein). P values determined by two-sided Wilcoxon rank-sum test. Box plots are as defined in Fig. 4d. d, Comparison of mRNA and protein correlations in CCLE and TCGA (n = 152). The Pearson’s correlation between mRNA and protein levels is calculated for each RPPA antibody in CCLE and TCGA separately. Each dot represents an antibody. Generally, the antibodies with low mRNA and protein correlation in CCLE also have low mRNA and protein correlation in TCGA data. P values determined by two-sided Pearson’s correlation test.
e, Distribution of gene dependency (Achilles RNAi) correlations with RPPA pSHP2 level (left, n = 161–411, Supplementary Table 14) and PTPN11 mRNA expression (right, n = 192–478, Supplementary Table 14). PTPN11 dependency is strongly correlated with pSHP2 level, whereas there is no significant correlation with PTPN11 mRNA level. f, Comparison of pSHP2 levels in SHP099-sensitive and -resistant cell lines (n = 60). P value determined by two-sided Wilcoxon rank-sum test. SHP099 sensitivity data were obtained from a previous study26. Box plots are as defined in Fig. 4d. g, Pearson’s correlation of pSHP2 and Sanger GDSC drug sensitivity AUC dataset (n = 265 drugs and 198–588 overlapping cell lines). h, Model error for elastic net model of sensitivity to ponatinib with and without using RPPA data as predictive features. The y axis shows the cross-validation error (fivefold cross-validation) against parameter λ of elastic net (parameter α is fixed at 0.2). Data are mean ± s.d. for the five cross-validation sets. The minimum cross-validation error for models with and without using RPPA data are shown by arrows. i, Elastic net results for sensitivity to ponatinib. pSHP2 is the top feature selected by elastic net. On the left, elastic net weights (averaged over 200 bootstrapping trials) and colour-coded by the frequency each feature was selected by elastic net. The numbers in parentheses are the frequency each feature was selected. Each column is a cell line and each row is a feature. The cell lines are sorted by their sensitivity to ponatinib (shown at the bottom). j, Western blot analysis of pSHP2 and total SHP2 levels across AML and select CML cell lines. Western blots were performed twice independently with similar results. k, Validation of RPPA data for pSHP2. pSHP2 levels measured by western blot are plotted against pSHP2 levels measured by RPPA for the tested AML and control CML cell lines (n = 19). The cell lines are colour-coded by their sensitivity to ponatinib. 
P values determined by two-sided Pearson’s correlation test. l, In vivo mouse xenograft experiment survival curves of ponatinib-treated and control mice for the low pSHP2 primagraft DFAM-68555 (n = 7 mice in each treatment group). P values determined by log-rank (Mantel–Cox) test. m, Immunohistochemistry of spleen specimens from mice treated with control or ponatinib for 5 days using anti-CD45. Similar results were found using the other two independent sets of mice.

Supplementary information

Supplementary Information

Supplementary Methods, Computational Analysis and Supplementary References.

Reporting Summary

Supplementary Figure 1

Uncropped scans with size marker for (Extended Data Fig. 11j).

Supplementary Table 1

Cell line annotations and available datasets. Cell Line Annotations: List of the cell lines with CCLE IDs, DepMap IDs, and the available annotations. Datasets: Data used to generate Extended Data Fig. 1c. Overlapping cell lines: Number of overlapping cell lines between different datasets. Cell line name changes: List of cell lines with changed CCLE ID.

Supplementary Table 2

RainDance targets. List of genomic loci and primer sequences used in RainDance sequencing.

Supplementary Table 3

CCLE GDSC comparison. Data used to generate Extended Data Fig. 2h. r_somatic_CCLE_HC_vs_GDSC_WES: Pearson correlation of somatic variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.somatic: Number of data points used to calculate the correlation (somatic). r_germline_CCLE_HC_vs_GDSC_WES: Pearson correlation of germline variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.germline: Number of data points used to calculate the correlation (germline). comments: Cell lines classification based on CCLE/GDSC concordance.

Supplementary Table 4

Fusion vs dependency analysis. Data corresponding to Extended Data Fig. 3c. This table includes fusions associated with dependencies based on Achilles RNAi, Achilles CRISPR, and Novartis RNAi datasets. A two-sided t-test was used. Sample size (n) is provided for each row.

Supplementary Table 5

TERT promoter mutations. List of cell lines and corresponding TERT promoter genotype profiled by whole genome sequencing or targeted sequencing.

Supplementary Table 6

COSMIC mutational signatures analysis. Data used in mutational signature analysis (Extended Data Fig. 4). CancerTypes_analyzed: List of COSMIC signatures analyzed in each cancer type. CCLE_perCellLine: Mutational signature activity in each CCLE cell line. CCLE_perTumorType: Average mutational activity in each cancer type in CCLE. TCGA_perTumorType: Average mutational activity in each cancer type in TCGA. Signature_vs_drift_volcanoPlot: Correlation of mutational signatures with genetic drift (data used in Extended Data Fig. 4c); a two-sided Pearson correlation test was used. Sample size (n) is given for each row.

Supplementary Table 7

Microsatellite instability (MSI) annotation. CCLE.hc.msi_del: Number of short deletions in microsatellite regions in CCLE hybrid capture dataset. CCLE.hc.total_del: Total number of short deletions in CCLE hybrid capture dataset. CCLE.wes.msi_del: Number of short deletions in microsatellite regions in CCLE whole exome sequencing dataset. CCLE.wes.total_del: Total number of short deletions in CCLE whole exome sequencing dataset. CCLE.wgs.msi_del: Number of short deletions in microsatellite regions in CCLE whole genome sequencing dataset. CCLE.wgs.total_del: Total number of short deletions in CCLE whole genome sequencing dataset. GDSC.wes.msi_del: Number of short deletions in microsatellite regions in Sanger GDSC whole exome sequencing dataset. GDSC.wes.total_del: Total number of short deletions in Sanger GDSC whole exome sequencing dataset. GDSC.msi.call: MSI call in Sanger GDSC dataset. CCLE.MSI.call: MSI call in CCLE dataset.

Supplementary Table 8

DNA methylation analysis. methylation_vs_dependency: List of gene dependencies and associated promoter methylations (Fig. 2a). average_methylation: Average DNA methylation for each cell line (Extended Data Fig. 6b). mRNA_methylation_correlation: Correlation of promoter methylation and mRNA expression (Extended Data Fig. 6c).

Supplementary Table 9

Comparison of CCLE gene expression with TCGA and GTEx gene expression profiles. Data used to generate Extended Data Fig. 8. corr_w_GTEx_tissueTypes: Correlation (n = 5,000) between CCLE samples and average expression of GTEx tissue types; rows are individual cell lines and columns are GTEx tissue types. TCGA_CCLE_avg_cor_plot: Correlation (n = 5,000) between average expression for CCLE cancer types (columns) and average expression of TCGA cancer types (rows). TCGA_CCLE_mean_expressions: Average expression values of TCGA cancer types and CCLE cancer types (rows) across 5,000 genes (columns). GTEx_CCLE_avg_cor_plot: Correlation (n = 5,000) between average expression for CCLE cancer types (columns) and average expression of GTEx tissue types (rows). GTEx_CCLE_mean_expressions: Average expression values of GTEx tissue types and CCLE cancer types (rows) across 5,000 genes (columns).

Supplementary Table 10

Alternative splicing analysis. Splicing_vs_dependency: Correlation of splicing with gene dependencies (Fig. 4a). Expression_vs_MDM4_splicing: Correlation of MDM4 exon 6 splicing with mRNA expression (Fig. 4f). Splicing_vs_RPL22L1_expression: Correlation of RPL22L1 expression with mRNA splicing of different exons (Fig. 4g). MDM4_mRNA_vs_dependencies: Correlation of MDM4 mRNA expression with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_dependencies: Correlation of MDM4 exon 6 splicing with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_drugs_CTRP: Correlation of MDM4 exon 6 splicing with drug sensitivities (Extended Data Fig. 9b).

Supplementary Table 11

MDM4 splicing validation. Primer_sequences: List of qPCR primers used in the MDM4 splicing validation experiment. RNAseq_vs_qPCR: Data used in Fig. 4c.

Supplementary Table 12

TP53 status in CCLE cell lines. p53 mutation, copy number, expression and splicing status in CCLE.

Supplementary Table 13

miRNA expression analysis. miRNA_vs_dependency: Correlation of miRNA expression with gene dependencies (Extended Data Fig. 10b). mir215_vs_dependencies: Correlation of mir-215 expression with gene dependencies (Extended Data Fig. 10c). CTNNB1_vs_miRNA: Correlation of CTNNB1 dependency with miRNA expression (Extended Data Fig. 10d).

Supplementary Table 14

Reverse phase protein array (RPPA) analysis. RPPA_Ab_information: List of antibodies used in RPPA analysis. Batch info: Annotates which cell lines were included in each batch of the RPPA profiling. RPPA_vs_Achilles: Correlation of protein expression with gene dependencies (Fig. 5a). RPPA_pSHP2_vs_Achilles: Correlation of pSHP2 with gene dependencies (Extended Data Fig. 11e). RPPA_PTPN11mRNA_vs_Achilles: Correlation of PTPN11 mRNA expression with gene dependencies (Extended Data Fig. 11e). RPPA_pSHP2_vs_drugs: Correlation of pSHP2 with drug sensitivities (Extended Data Fig. 11g).


About this article


Cite this article

Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3


  • DOI: https://doi.org/10.1038/s41586-019-1186-3
