New NGS approach using paired diagnostic and remission samples to detect somatic mitochondrial DNA mutations in leukemia
Mitochondrial DNA (mtDNA) mutations associated with leukemia have been described. To identify somatic mutations, a control tissue from the same individual is needed for comparison. In this review we describe a new next-generation sequencing approach that identifies leukemia-associated mtDNA mutations by using remission samples as the control.
by Dr Ilaria Stefania Pagani
The identification of acquired somatic mutations in leukemic samples is of considerable importance for diagnosis and prognostication. To identify somatic mutations, a control tissue from the same individual is needed for comparison. Non-hematopoietic tissues, such as mesenchymal stromal cells (MSCs) or hair follicles, are preferred but not always available. When patients with leukemia achieve remission, remission peripheral blood (PB) may be a suitable and easily available control tissue. This article provides recommendations for the identification of tumour-associated mtDNA somatic mutations, highlighting the advantages and disadvantages of the method.
Human mitochondrial (mt) DNA is a 16 569 bp double-stranded, circular DNA molecule that encodes 13 polypeptides of the oxidative phosphorylation system (OXPHOS), 22 transfer RNAs and 2 ribosomal RNAs. Several important differences between the mt genome and the nuclear genome complicate the study of mtDNA mutations. The mt genome is maternally inherited; 93% of its sequence is coding DNA, it contains no introns, and its only non-coding region is the D-loop, which contains the gene promoters. Each cell has a variable number of mitochondria (typically several hundred), and each mitochondrion contains a variable number of genomes (typically 2–10). Consequently, mtDNA mutations do not follow the pattern of a diploid genome: a cell may carry a single mt genotype (homoplasmy) or multiple mt genotypes (heteroplasmy). Heteroplasmy may be present at any frequency and may vary between cells; many heteroplasmic variants fall below the limit of detection of Sanger sequencing and are therefore technically difficult to validate. To date, more than 400 mtDNA mutations have been associated with human diseases, most of them heteroplasmic, so accurate determination of the heteroplasmy level is important for disease association studies.
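To illustrate why bulk heteroplasmy is a continuous quantity rather than 0%, 50% or 100%, the minimal Python sketch below treats a sample as a few cells, each with a hypothetical number of mutant and total mtDNA copies at one position (the `Cell` class and all numbers are invented for illustration). The level that sequencing of pooled DNA would measure is the fraction of all copies carrying the variant, which is a copy-number-weighted average over cells.

```python
"""Illustrative sketch: bulk mtDNA heteroplasmy is a continuous quantity.

All copy numbers are hypothetical; each cell is reduced to the number of
mutant and total mtDNA copies it carries at one position.
"""
from dataclasses import dataclass


@dataclass
class Cell:
    mutant_copies: int
    total_copies: int

    @property
    def state(self) -> str:
        """Per-cell genotype at this position."""
        if self.mutant_copies == 0:
            return "homoplasmic wild-type"
        if self.mutant_copies == self.total_copies:
            return "homoplasmic mutant"
        return "heteroplasmic"


def bulk_heteroplasmy(cells: list[Cell]) -> float:
    """Fraction of all mtDNA copies in the sample that carry the variant.

    Cells with more copies contribute more, so this is a copy-number
    weighted average of the per-cell heteroplasmy levels.
    """
    mutant = sum(c.mutant_copies for c in cells)
    total = sum(c.total_copies for c in cells)
    return mutant / total


# Two wild-type cells and one cell carrying the variant on 3% of its copies.
cells = [Cell(0, 800), Cell(30, 1000), Cell(0, 1200)]
for cell in cells:
    print(cell.state, cell.mutant_copies / cell.total_copies)
print("bulk heteroplasmy:", bulk_heteroplasmy(cells))
```

In this invented example the bulk level is 1% of copies, well below the limit of detection of Sanger sequencing.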
mtDNA mutations and cancer
MtDNA mutations may contribute to the malignant transformation of a cell and to invasion and metastasis. Heteroplasmic somatic mtDNA mutations have been reported in hematological neoplasms, including myelodysplastic syndromes, chronic lymphocytic leukemia, chronic myeloid leukemia (CML), acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). Many cancer types, including leukemia, tend to be highly glycolytic and produce increased amounts of reactive oxygen species (ROS), which can lead to genomic instability. The mtDNA genome is particularly susceptible to ROS-induced mutations because of the high oxidative stress in the mitochondrion and its limited DNA-repair mechanisms. Acquired somatic mtDNA mutations may also carry prognostic information: in a study in AML, for example, patients with mutated NADH dehydrogenase subunit 4 (ND4) had longer overall survival than patients with wild-type ND4.
mtDNA somatic mutations: the problem of control tissue
MtDNA acquires somatic mutations at a rate 10-fold higher than nuclear DNA, so mtDNA single nucleotide variants (SNVs) accumulate with age and may be tissue-specific. This means that there is no absolutely reliable source of ‘germline’ mtDNA, especially in older individuals. Somatic mutations must therefore be distinguished from non-pathogenic germline variants by comparison with a control tissue sample. Non-hematopoietic tissues, such as buccal cells, hair follicles or MSCs, are preferred but not always available. PB cells from a post-treatment remission sample may be used as an alternative. This approach is widely used for nuclear mutations but less commonly for mt mutations. Blood samples are readily accessible from leukemia patients who achieve morphological remission after treatment, so a method for detecting leukemia-associated mtDNA mutations by comparison with a remission sample may be useful.
A new approach to identify mtDNA somatic mutations at diagnosis by using remission samples as control tissue
Pagani IS and colleagues developed a next-generation sequencing (NGS) approach for the identification of leukemia-associated mtDNA mutations using samples from CML patients at diagnosis and in remission following treatment with tyrosine kinase inhibitors (TKIs). The approach could also be applied to other hematopoietic cancers and to non-hematopoietic cancers, such as epithelial tumours, in which a tumour biopsy specimen can be compared with normal mucosa.
Twenty-six chronic phase CML patients enrolled in the Australasian Leukaemia and Lymphoma Group CML9 trial (TIDEL-II; ID: ACTRN12607000325404) took part in the study. PB leukocyte samples collected at diagnosis, before the start of TKI treatment, were compared with remission samples collected after 12 months of therapy. Hair follicles (n=4), bone marrow MSCs (n=18), or both (n=4) were used as non-hematopoietic control samples. Comparing a diagnostic sample with a non-hematopoietic control tissue is the standard method for identifying somatic mutations in leukemia, and the concordance between this classic method and the diagnosis versus remission approach was investigated.
NGS assay for the mt genome
The workflow is shown in Figure 1. Briefly, genomic DNA (a mixture of nuclear DNA and mtDNA) was extracted from PB leukocytes and non-hematopoietic tissues by a phenol/chloroform method. The mtDNA was amplified by long-range PCR, generating two or three overlapping fragments covering the entire mt genome. The PCR amplicons were then pooled at equimolar concentrations, and sequencing libraries were prepared using the Nextera XT kit (Illumina). Indexed libraries were multiplexed and run on an Illumina MiSeq instrument using the 600-cycle MiSeq Reagent kit (v3), generating 300-bp paired-end reads.
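Two quantitative steps in this workflow, checking that overlapping long-range amplicons tile the whole circular genome and pooling amplicons of different lengths at equimolar concentrations, can be sketched in Python as follows. This is an illustration under stated assumptions, not the published protocol: the amplicon coordinates and concentrations are hypothetical, and the mass-to-molarity conversion uses a commonly quoted average of 660 g/mol per double-stranded base pair.

```python
"""Sketch: amplicon coverage of the circular mt genome and equimolar pooling.

Assumptions (illustrative, not the published protocol): the amplicon
coordinates and concentrations are invented; 660 g/mol per base pair is a
commonly used average mass for double-stranded DNA.
"""

GENOME_LEN = 16569  # length of the rCRS (NC_012920) in bp
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def amplicon_positions(start: int, end: int, genome_len: int = GENOME_LEN) -> set[int]:
    """Positions (1-based, inclusive) covered by an amplicon.

    Because the genome is circular, an amplicon whose end is smaller than
    its start is taken to span the origin (position 16569 -> 1).
    """
    if start <= end:
        return set(range(start, end + 1))
    return set(range(start, genome_len + 1)) | set(range(1, end + 1))


def amplicon_length(start: int, end: int, genome_len: int = GENOME_LEN) -> int:
    return len(amplicon_positions(start, end, genome_len))


def uncovered_positions(amplicons, genome_len: int = GENOME_LEN) -> list[int]:
    """Return every genome position not covered by any amplicon."""
    covered: set[int] = set()
    for start, end in amplicons:
        covered |= amplicon_positions(start, end, genome_len)
    return sorted(set(range(1, genome_len + 1)) - covered)


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration to molarity (1 ng/ul = 1e-3 g/L)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def pooling_volume_ul(target_fmol: float, conc_nM: float) -> float:
    """Volume delivering target_fmol of an amplicon (1 nM = 1 fmol/ul)."""
    return target_fmol / conc_nM


# Hypothetical three-amplicon design: (start, end, measured ng/ul).
AMPLICONS = [(1000, 6500, 45.0), (6000, 12000, 38.0), (11500, 1500, 52.0)]

gaps = uncovered_positions([(s, e) for s, e, _ in AMPLICONS])
print("uncovered positions:", len(gaps))
for start, end, conc in AMPLICONS:
    length = amplicon_length(start, end)
    molar = ng_per_ul_to_nM(conc, length)
    print(f"{start}-{end}: {length} bp, {molar:.2f} nM, "
          f"{pooling_volume_ul(100.0, molar):.2f} ul for 100 fmol")
```

Because molarity depends on length, a longer amplicon at the same mass concentration needs a proportionally larger volume to contribute the same number of molecules to the pool.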
Somatic mutation calling from high-throughput sequencing datasets and validation
Most variant-calling methods in current use were designed for low-coverage human re-sequencing data and assume diploid calls with discrete allele frequencies (0%, 50% or 100%) [7, 8]; these assumptions do not apply to mtDNA. The LoFreq software (LoFreq-Star version 2.11, Genome Institute of Singapore; csb5.github.io/lofreq/) was chosen because it was developed for viral and bacterial genomes as well as diploid data, and because it can automate comparison with a matched control tissue for the detection of somatic mutations. The revised Cambridge Reference Sequence (rCRS) for the human mt genome (NC_012920) was used as the reference sequence for SNV calling. Tumour (test) and control samples were then compared to identify mutations specific to the tumour tissue. Variants common to the test and control samples were considered germline polymorphisms or mutations and were filtered out by the software. A binomial test was applied to the remaining variants to determine whether an apparent difference between samples could be due to inadequate read coverage in the control, and variants passing the test were retained in the final list of putative somatic mutations (Fig. 2a). As with most other NGS strategies for the discovery of novel mutations, the identified mutations should be considered putative: any specific mutation of clinical interest would need to be confirmed by an independent method, such as Sanger sequencing (limit of detection 20%), Sequenom MassArray, digital array (Fluidigm) or another NGS platform.
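The filtering logic described above can be sketched as follows. This is a hypothetical reimplementation for illustration only, not LoFreq's code or command-line interface: variants called in the control are discarded as germline, and for each remaining variant a one-sided binomial test asks how likely the control's observed variant read count would be if the control carried the variant at the same VAF as the tumour. A small p-value means the absence from the control cannot be explained by low coverage, so the variant is kept. The `Counts` type, the variant keys, the `alpha` cut-off and the example counts are assumptions.

```python
"""Hypothetical sketch of matched tumour/control somatic filtering.

Not LoFreq's implementation: the data layout, the alpha cut-off and the
example counts are assumptions made for illustration.
"""
import math
from dataclasses import dataclass

Variant = tuple[int, str, str]  # (rCRS position, reference base, alternate base)


@dataclass(frozen=True)
class Counts:
    alt_reads: int
    depth: int

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def putative_somatic(tumour_calls: dict[Variant, Counts],
                     control_calls: set[Variant],
                     control_counts: dict[Variant, Counts],
                     alpha: float = 0.01) -> dict[Variant, float]:
    """Return tumour-only variants whose absence from the control is unlikely
    to be a coverage artefact, mapped to their binomial p-values."""
    kept: dict[Variant, float] = {}
    for variant, tumour in tumour_calls.items():
        if variant in control_calls:
            continue  # shared with the control: treated as germline, filtered out
        control = control_counts.get(variant, Counts(0, 0))
        # H0: the control carries the variant at the tumour VAF. A small
        # p-value means so few control reads would be unlikely under H0.
        p_value = binom_cdf(control.alt_reads, control.depth, tumour.vaf)
        if p_value < alpha:
            kept[variant] = p_value
    return kept


tumour = {
    (2000, "A", "G"): Counts(150, 1000),  # 15% VAF, control well covered
    (3000, "C", "T"): Counts(40, 1000),   # 4% VAF, control depth only 20
    (4000, "G", "A"): Counts(990, 1000),  # also called in the control
}
control_called = {(4000, "G", "A")}
control_raw = {
    (2000, "A", "G"): Counts(0, 800),
    (3000, "C", "T"): Counts(0, 20),
    (4000, "G", "A"): Counts(995, 1000),
}
print(putative_somatic(tumour, control_called, control_raw))
```

In this invented example the 4% variant is not retained: with only 20 control reads, seeing no variant reads would be quite likely even if the control carried it.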
NGS: error rate, false positives and threshold
Before the application of NGS technologies, heteroplasmy was frequently not detected, probably because of the lower sensitivity of earlier techniques. NGS technologies enable the study of mt heteroplasmy genome-wide at much higher resolution, because many independent reads are generated for each position. However, the higher error rate associated with these more sensitive methods must be taken into account to avoid false detection of heteroplasmy. Short-read sequencing technologies (such as Illumina systems) have a relatively high intrinsic error rate (approximately 1 in 10²–10³ bases), which becomes important at the very high depth required to detect and measure low-level heteroplasmy, so appropriate criteria for avoiding false positives due to sequencing errors are required. The most obvious way to distinguish sequencing errors from heteroplasmy is to apply a threshold. Two duplicate sequencing runs, one of which was ultra-deep (validation run), were compared to determine sensitivity (the proportion of true positives correctly identified as such) and specificity (the proportion of true negatives correctly identified as such). On this basis, an empirical threshold of 2% was applied to distinguish true variants from sequencing errors. Variants with a variant allele fraction (VAF; the variant allele's read depth divided by the total read depth at that nucleotide position) between 2% and 98% were then considered heteroplasmic, and variants with a VAF >98% were called homoplasmic. Because some variation in error rate between positions was observed, this threshold could be refined by an iterative process that identifies a separate threshold for each nucleotide position. Incorporating molecular barcodes in the initial long-range PCR would also reduce the risk of false-positive mutations due to PCR artefacts.
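A minimal sketch of the threshold logic, assuming the 2% and 98% VAF cut-offs given in the text (how values lying exactly on a cut-off are handled is an assumption). The sensitivity and specificity helper treats the calls from the ultra-deep validation run as the truth set, which is also an assumption made for illustration, and the example positions are invented.

```python
"""Sketch of VAF-based calling with the empirical 2% / 98% thresholds.

Assumptions: boundary handling (exactly 2% or 98% counted as heteroplasmic)
and treating the ultra-deep validation run as the truth set are illustrative.
"""

HET_MIN_VAF = 0.02  # below this, a variant is not distinguished from error
HOM_MIN_VAF = 0.98  # above this, a variant is called homoplasmic


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele fraction: variant read depth / total read depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def classify(v: float) -> str:
    """Map a VAF onto the call categories used in the text."""
    if v < HET_MIN_VAF:
        return "not called"
    if v <= HOM_MIN_VAF:
        return "heteroplasmic"
    return "homoplasmic"


def sensitivity_specificity(test_calls: set[int], truth_calls: set[int],
                            all_sites: set[int]) -> tuple[float, float]:
    """Compare variant positions called in a routine run with a truth set."""
    tp = len(test_calls & truth_calls)
    fn = len(truth_calls - test_calls)
    fp = len(test_calls - truth_calls)
    tn = len(all_sites - test_calls - truth_calls)
    return tp / (tp + fn), tn / (tn + fp)


print(classify(vaf(3, 200)))     # VAF 1.5%
print(classify(vaf(30, 1000)))   # VAF 3%
print(classify(vaf(995, 1000)))  # VAF 99.5%

# Hypothetical 10-position example: one missed call, one false call.
sens, spec = sensitivity_specificity({2, 5, 9}, {2, 5, 7}, set(range(1, 11)))
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A per-position refinement, as suggested in the text, would replace the single `HET_MIN_VAF` constant with a lookup of position-specific cut-offs.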
Remission samples as control tissue in the identification of the mtDNA somatic mutations at diagnosis
In the four patients who had both MSC and hair follicle DNA available as control tissue, the same mutations were identified at diagnosis with either control, so the results obtained with the non-hematopoietic controls were combined. Remission samples were then used as the control tissue to determine mtDNA somatic mutations at diagnosis, and the concordance between this method and the conventional diagnosis versus MSC/hair follicle approach was examined. Seventy-three somatic mutations (81%) were identified by both methods, 11 (12%) only by comparison with the non-hematopoietic control, and six (6.7%) only by comparison with remission samples (Fig. 2b). Discordant results arose from differences in read quality or depth at specific nucleotides, so that the difference between samples did not reach statistical significance in the algorithm. False-negative results can occur when remission samples are used as the control tissue, because a low-level heteroplasmic mutation in the remission sample leads to the same mutation at diagnosis being removed by filtering.
Remission samples can be used as control tissue to detect candidate mtDNA somatic mutations in leukemic samples when non-hematopoietic tissues are not available. Mutations present at low VAF in the remission sample as well as in the diagnostic sample may be filtered out by the LoFreq software, leading to false-negative results; visual inspection of the unfiltered variants is therefore recommended.
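The reported percentages follow from a simple set comparison of the two call sets, with the 90 distinct mutations found by either method (73 + 11 + 6) as the denominator. In the sketch below the mutation identifiers are placeholders constructed only to reproduce the reported counts.

```python
"""Reproduce the concordance breakdown from two sets of somatic calls.

The mutation identifiers are placeholders built only to match the
reported counts (73 shared, 11 and 6 method-specific).
"""


def concordance(calls_a: set[str], calls_b: set[str]) -> dict[str, tuple[int, float]]:
    """Counts and percentages of mutations found by both or only one method."""
    total = len(calls_a | calls_b)
    groups = {
        "both": calls_a & calls_b,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
    }
    return {name: (len(s), 100 * len(s) / total) for name, s in groups.items()}


shared = {f"shared_{i}" for i in range(73)}
vs_nonhematopoietic = shared | {f"nonhem_only_{i}" for i in range(11)}
vs_remission = shared | {f"remission_only_{i}" for i in range(6)}

for name, (n, pct) in concordance(vs_nonhematopoietic, vs_remission).items():
    print(f"{name}: {n} ({pct:.1f}%)")
```

This prints 81.1%, 12.2% and 6.7%, matching the rounded figures given in the text.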
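One way to support the recommended visual inspection is to list, from the unfiltered output, shared variants whose VAF in the remission sample is low and clearly below the diagnostic VAF. The sketch below does this; the helper `flag_for_review`, its cut-offs (remission VAF of at most 5% and at least a two-fold drop from diagnosis) and the example variants are hypothetical choices for illustration, not values from the study, and flagged variants would still need manual review.

```python
"""Hypothetical helper: flag shared variants for manual review.

The cut-offs and example variants are illustrative assumptions, not
values from the study.
"""

Variant = tuple[int, str, str]  # (position, reference base, alternate base)


def flag_for_review(shared: dict[Variant, tuple[float, float]],
                    max_remission_vaf: float = 0.05,
                    min_fold_drop: float = 2.0) -> list[Variant]:
    """Return shared variants whose remission VAF is low and clearly below
    the diagnostic VAF, i.e. candidates the control filter may have
    removed in error."""
    flagged = []
    for variant, (dx_vaf, rem_vaf) in shared.items():
        low_in_remission = rem_vaf <= max_remission_vaf
        clear_drop = rem_vaf == 0 or dx_vaf / rem_vaf >= min_fold_drop
        if low_in_remission and clear_drop:
            flagged.append(variant)
    return sorted(flagged)


# (diagnosis VAF, remission VAF) for variants removed because present in both.
shared_variants = {
    (2000, "A", "G"): (0.40, 0.03),   # large drop to a low level: review
    (3000, "C", "T"): (1.00, 1.00),   # homoplasmic in both: likely germline
    (4000, "G", "A"): (0.04, 0.035),  # stable low-level variant: not flagged
}
print(flag_for_review(shared_variants))
```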
1. Pagani IS, Kok CH, Saunders VA, van der Hoek MB, Heatley SL, Schwarer AP, Hahn CN, Hughes TP, White DL, Ross DM. A method for next-generation sequencing of paired diagnostic and remission samples to detect mitochondrial DNA mutations associated with leukemia. J Mol Diagn 2017; 19(5): 711–721.
2. Li M, Schonberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet 2010; 87(2): 237–249.
3. van Gisbergen MW, Voets AM, Starmans MH, de Coo IF, Yadak R, Hoffmann RF, Boutros PC, Smeets HJ, Dubois L, Lambin P. How do changes in the mtDNA and mitochondrial dysfunction influence cancer and cancer therapy? Challenges, opportunities and models. Mutat Res Rev Mutat Res 2015; 764: 16–30.
4. Damm F, Bunke T, Thol F, Markus B, Wagner K, Göhring G, Schlegelberger B, Heil G, Reuter CW, et al. Prognostic implications and molecular associations of NADH dehydrogenase subunit 4 (ND4) mutations in acute myeloid leukemia. Leukemia 2012; 26(2): 289–295.
5. Gattermann N. Mitochondrial DNA mutations in the hematopoietic system. Leukemia 2004; 18(1): 18–22.
6. Yeung DT, Osborn MP, White DL, Branford S, Braley J, Herschtal A, Kornhauser M, Issa S, Hiwase DK, et al. TIDEL-II: first-line use of imatinib in CML with early switch to nilotinib for failure to achieve time-dependent molecular targets. Blood 2015; 125(6): 915–923.
7. Meldrum C, Doyle MA, Tothill RW. Next-generation sequencing for cancer diagnostics: a practical perspective. Clin Biochem Rev 2011; 32(4): 177–195.
8. Wilm A, Aw PP, Bertrand D, Yeo GH, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, Nagarajan N. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012; 40(22): 11189–11201.
9. Chatterjee A, Dasgupta S, Sidransky D. Mitochondrial subversion in cancer. Cancer Prev Res 2011; 4(5): 638–654.
10. Kerpedjiev P, Frellsen J, Lindgreen S, Krogh A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 2014; 15: 100.
Ilaria Stefania Pagani1,2 PhD
1Cancer Theme, South Australian Health & Medical Research Institute, Adelaide, Australia
2School of Medicine, Faculty of Health Sciences, University of Adelaide, Adelaide, Australia