
How to improve SNV and indel detection in an exome NGS assay

Practical methods you can use today

Single nucleotide variant (SNV) and insertion/deletion (indel) detection in exome sequencing can be improved by optimizing various steps in library preparation, sequencing, and bioinformatic workflows. This DECODED article provides tips for improving SNV and indel detection in your exome NGS assay.

How to detect SNVs and indels using exome sequencing

Detecting SNVs and indels allows researchers to pinpoint genetic changes associated with a particular phenotype, such as a disease. One way to find these variants is exome sequencing, a targeted NGS approach that focuses only on the protein-coding regions of a genome, the exons. The human exome (the collection of all exons) has been estimated to be the site of 85% of mutations that impact disease-related phenotypes [1].

Relative to untargeted NGS approaches like whole genome sequencing, exome sequencing achieves deeper coverage for the same sequencing effort. This makes it well suited to SNV and indel detection, because greater depth gives scientists more confidence that the variants identified are real and not sequencing artifacts. Despite these benefits, SNV and indel detection remains challenging. This DECODED covers elements of experimental design, library preparation, and bioinformatic analysis that can be optimized to improve SNV and indel detection when using an exome NGS assay.

Tips to improve experimental design, library preparation, and sequencing for SNV and indel identification

With any NGS approach, the questions you want to answer should guide the design of the assay, including the choice of library preparation kit and sequencing strategy. This section provides recommendations and factors to keep in mind when your goal is to detect variants through exome sequencing.

Coverage depth: Coverage must be deep enough to call SNVs and indels confidently, especially if the objective is to identify low-frequency variants. The depth needed depends on a number of factors, such as the sample being used (e.g., whether variants are being detected in somatic or germline cells) and how the exome sequencing library was prepared, including whether unique molecular identifiers (UMIs) were used. Take all of these factors into account when deciding on coverage depth.
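The relationship between depth and low-frequency variant detection can be illustrated with a simple binomial model. The sketch below is an illustration only, not a validated power calculation: it assumes reads are sampled independently, ignores sequencing error and mapping bias, and uses a hypothetical threshold (min_alt_reads) for the number of variant-supporting reads a caller would require.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float,
                          min_alt_reads: int = 3) -> float:
    """Probability of observing at least `min_alt_reads` variant reads.

    Assumes each read independently carries the variant with probability
    `allele_fraction` (a binomial model); `min_alt_reads` is a hypothetical
    caller threshold, not a setting from any specific tool.
    """
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Compare a heterozygous germline variant (~50%) with a low-frequency variant (5%).
for depth in (30, 100, 500):
    for allele_fraction in (0.5, 0.05):
        p = detection_probability(depth, allele_fraction)
        print(f"depth={depth:>3}  allele_fraction={allele_fraction:.2f}  P(detect)={p:.3f}")
```

Under these assumptions, a variant near 50% allele fraction is readily detected at modest depth, whereas a 5% variant needs much deeper coverage to reach similar confidence.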

Uniformity of coverage: Along with coverage depth, uniformity of sequencing coverage is also important. Even coverage can help avoid false negative calls by ensuring that all areas of the exome are sufficiently accounted for in your sequencing reads. Lack of coverage in certain areas could lead to incorrect conclusions about the absence of SNVs or indels. To ensure coverage will be even, you should carefully consider the oligonucleotide probes you will be using to target the sample’s exome and select those that successfully target challenging regions of a genome.
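Coverage uniformity can be summarized from per-base depths over the target regions. The sketch below is one possible way to do this, assuming the depths have already been extracted into a list of integers; the 20x threshold is a placeholder, and the fold-80 value is a simple approximation (mean depth divided by the 20th-percentile depth) rather than the exact output of any particular QC tool.

```python
from statistics import mean


def uniformity_metrics(depths, min_depth=20):
    """Summarize coverage evenness from per-base depths over the targets.

    `min_depth` is an assumed threshold; choose one that suits your assay.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    ordered = sorted(depths)
    # Simple 20th-percentile estimate (no interpolation).
    p20 = ordered[int(0.2 * (len(ordered) - 1))]
    return {
        "mean_depth": avg,
        "fraction_at_min_depth": sum(d >= min_depth for d in depths) / len(depths),
        "fraction_above_0.2x_mean": sum(d >= 0.2 * avg for d in depths) / len(depths),
        # Values near 1 indicate even coverage; larger values indicate unevenness.
        "fold_80": avg / p20 if p20 > 0 else float("inf"),
    }


even = uniformity_metrics([100] * 10)
uneven = uniformity_metrics([5, 10, 50, 100, 100, 150, 200, 250, 300, 335])
print(even)
print(uneven)
```

Two libraries with the same mean depth can differ widely on these metrics, which is why mean depth alone does not show whether every target is adequately covered.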

Library complexity: Library complexity refers to the number of unique fragments present in a sequencing library. A low-complexity library contains a high proportion of duplicate reads. High complexity is preferred because it offers a more comprehensive representation of the sample being sequenced. A large number of duplicate reads can also bias the analysis when identifying SNVs and indels. High-quality library preparation workflows help ensure that high-complexity libraries are generated for exome sequencing. Accurate library quantification, minimal PCR amplification, and quality control of library fragments are all key to optimizing library complexity.
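Library complexity can be gauged from the counts of total and unique read pairs. The sketch below computes the duplication rate and estimates the number of unique molecules in the library with a Lander-Waterman style model, unique = X * (1 - exp(-total / X)), solved for X by bisection. It assumes duplicates have already been identified by a duplicate-marking tool and that all molecules are sampled with equal probability, which real libraries only approximate.

```python
from math import exp


def duplication_rate(total_pairs: int, unique_pairs: int) -> float:
    """Fraction of read pairs that are duplicates."""
    return 1.0 - unique_pairs / total_pairs


def estimate_library_size(total_pairs: int, unique_pairs: int) -> float:
    """Estimate unique molecules X from unique = X * (1 - exp(-total / X))."""
    if not 0 < unique_pairs < total_pairs:
        raise ValueError("need 0 < unique_pairs < total_pairs")

    def expected_unique(size: float) -> float:
        return size * (1.0 - exp(-total_pairs / size))

    low = high = float(unique_pairs)
    while expected_unique(high) < unique_pairs:  # expand until the root is bracketed
        high *= 2.0
    for _ in range(200):  # bisection; expected_unique increases with size
        mid = (low + high) / 2.0
        if expected_unique(mid) < unique_pairs:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical counts for illustration only.
print(f"duplication rate: {duplication_rate(500_000, 393_469):.3f}")
print(f"estimated library size: {estimate_library_size(500_000, 393_469):,.0f}")
```

A small estimated library size relative to the number of reads sequenced suggests that additional sequencing will mostly yield more duplicates, not new information.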

Sequencing: In general, paired-end sequencing is recommended for detecting SNVs and indels using exome sequencing because it provides more information per fragment than single-end sequencing, improving read mapping. The more reliably reads can be aligned and mapped, the more confidently SNVs and indels can be detected. Paired-end reads also increase sensitivity, as both reads can be used to determine whether a variant is a true SNV or a sequencing error. Read length should also be carefully considered when designing an exome experiment, as longer reads can make SNVs and indels easier to identify and can improve coverage. However, read length must be balanced against cost, as longer reads can increase the cost and duration of sequencing runs. The typical read length recommendation for paired-end exome sequencing for SNV and indel detection is 100–150 bp.
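The trade-off between read length, depth, and sequencing effort can be estimated with back-of-the-envelope arithmetic. The sketch below is a rough planning aid under stated assumptions: the target size, on-target fraction, and duplicate fraction are hypothetical placeholders to be replaced with values for your own panel and workflow, and overlap between mates and edge effects at target boundaries are ignored.

```python
from math import ceil


def read_pairs_needed(target_bp: int, mean_depth: float, read_length: int,
                      on_target_fraction: float = 0.8,
                      duplicate_fraction: float = 0.1) -> int:
    """Approximate paired-end read pairs needed for a mean on-target depth.

    The default fractions are assumptions for illustration, not measured values.
    """
    usable_bases_per_pair = (
        2 * read_length * on_target_fraction * (1.0 - duplicate_fraction)
    )
    return ceil(target_bp * mean_depth / usable_bases_per_pair)


# Hypothetical 35 Mb target at 100x mean depth, for 2 x 100 bp and 2 x 150 bp runs.
for read_length in (100, 150):
    pairs = read_pairs_needed(35_000_000, 100, read_length)
    print(f"2 x {read_length} bp: about {pairs / 1e6:.1f} million read pairs")
```

Because the estimate scales inversely with on-target and unique-read fractions, gains in capture efficiency or library complexity reduce the sequencing needed for a given depth.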

Bioinformatic analysis tips for SNV and indel detection

After library prep and sequencing are completed, the next step in an exome sequencing workflow is bioinformatic analysis. This process should be designed carefully because it is where SNVs and indels are detected from the reads generated during the sequencing run. Factors to consider when designing a bioinformatics pipeline for detecting SNVs and indels are outlined below.

Algorithms: There are a variety of bioinformatic tools available for analyzing sequencing data. For detecting SNVs and indels specifically, tools that have a reliable alignment algorithm as well as a reliable variant calling algorithm should be used. Commonly used alignment software tools include BWA-MEM, Bowtie 2, and minimap2 [2-4]. The variant calling tool should be selected based on the sequencing platform you used, read length, and application. Some examples of SNV and indel variant callers include FreeBayes, GATK HaplotypeCaller, Platypus, and Samtools/BCFtools [5-8].
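As one illustration of how an aligner and a variant caller fit together, the sketch below assembles shell commands for a minimal BWA-MEM plus Samtools/BCFtools workflow. It is an assumption-laden outline, not a recommended or validated pipeline: file names, the sample name, and the thread count are hypothetical, the reference is assumed to be already indexed (bwa index, samtools faidx), and steps such as duplicate marking, target restriction, and variant filtering are omitted. The function only builds command strings; it does not run them.

```python
import shlex


def build_pipeline(reference: str, fastq_r1: str, fastq_r2: str,
                   sample: str, threads: int = 4) -> list[str]:
    """Return shell commands for align -> sort -> index -> call (sketch only)."""
    ref, r1, r2 = (shlex.quote(p) for p in (reference, fastq_r1, fastq_r2))
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf.gz")
    # Read group string; bwa converts the literal "\t" sequences to tabs.
    read_group = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA")
    return [
        # Align paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem -t {threads} -R {read_group} {ref} {r1} {r2}"
        f" | samtools sort -o {bam} -",
        # Index the BAM so downstream tools can access regions.
        f"samtools index {bam}",
        # Pile up reads against the reference and call SNVs and indels.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


commands = build_pipeline("reference.fa", "sample_R1.fastq.gz",
                          "sample_R2.fastq.gz", "sample1")
for command in commands:
    print(command)
```

Other aligner and caller combinations follow the same general shape, with tool-specific options that should be taken from each tool's own documentation.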

Integration of multiple tools: One way to check that the SNVs and indels detected in a dataset are not sequencing artifacts is to use multiple variant calling tools and to incorporate tools that allow for visual inspection of alignments. Each tool uses a slightly different approach to identify variants, so it is important to test the parameters of each one and build a workflow that makes the most sense for your specific dataset.
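Combining callers can be as simple as keeping variants reported by more than one tool. The sketch below is a minimal example of that idea: it reads VCF-formatted text lines, keys each variant by chromosome, position, reference allele, and alternate allele, and keeps those seen by at least min_callers tools. The caller names and records are made up, and the sketch assumes the call sets have already been normalized (for example with bcftools norm), since different callers can represent the same indel in different ways.

```python
from collections import Counter


def parse_vcf_lines(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    variants = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            variants.add((chrom, int(pos), ref, allele))
    return variants


def consensus_calls(call_sets, min_callers=2):
    """Return variants reported by at least `min_callers` call sets."""
    counts = Counter(v for calls in call_sets.values() for v in calls)
    return {variant for variant, n in counts.items() if n >= min_callers}


# Made-up records from two hypothetical callers.
caller_a = parse_vcf_lines([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n",
])
caller_b = parse_vcf_lines([
    "chr1\t100\t.\tA\tG,T\t60\tPASS\t.\n",
    "chr2\t300\t.\tC\tT\t40\tPASS\t.\n",
])
shared = consensus_calls({"caller_a": caller_a, "caller_b": caller_b})
print(sorted(shared))
```

Variants found by only one caller are not necessarily false; they are reasonable candidates for visual inspection of the alignments.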

Reference genomes: If a reference genome is available, make sure the version you use matches the sample and the data being analyzed. This helps ensure that the alignment is accurate and that the SNVs and indels identified are true variants and not artifacts.
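A quick consistency check can catch a mismatched reference before variant calling. The sketch below compares contig names and lengths from a FASTA index (.fai) with the @SQ lines of a SAM/BAM header, both supplied as text lines. The contig names and lengths are toy values chosen for illustration, and matching names and lengths is a necessary but not sufficient sign that the same assembly was used.

```python
def contigs_from_fai(lines):
    """Map contig name -> length from FASTA index (.fai) lines."""
    contigs = {}
    for line in lines:
        if line.strip():
            fields = line.rstrip("\n").split("\t")
            contigs[fields[0]] = int(fields[1])
    return contigs


def contigs_from_sam_header(lines):
    """Map contig name -> length from the @SQ lines of a SAM header."""
    contigs = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def compare_contigs(reference, alignment):
    """Return contigs missing from the reference and contigs with differing lengths."""
    missing = sorted(set(alignment) - set(reference))
    mismatched = sorted(
        name for name, length in alignment.items()
        if name in reference and reference[name] != length
    )
    return missing, mismatched


# Toy values: the alignment header disagrees with the reference index.
reference = contigs_from_fai(["chr1\t1000\t6\t60\t61\n", "chr2\t800\t1030\t60\t61\n"])
alignment = contigs_from_sam_header([
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:750\n",
    "@SQ\tSN:3\tLN:500\n",
])
missing, mismatched = compare_contigs(reference, alignment)
print("missing from reference:", missing)
print("length mismatch:", mismatched)
```

Any missing or mismatched contig indicates that the reads were aligned to a different reference version or naming scheme than the one being used for variant calling.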

Exome sequencing workflow solution

An optimal solution for exome sequencing is the xGen™ Exome Hyb Panel v2, which contains probes that are designed using a new “capture-aware” algorithm and assessed with proprietary off-target analysis. All probes in the panel are manufactured under ISO 13485 standards, and each probe undergoes mass spectrometry and dual quantification measurements before being pooled into the xGen Exome Hyb Panel v2. These measures ensure the quality of each probe and its appropriate representation in the final panel. When paired with the xGen Hybridization and Wash v2 Kit and xGen Universal Blockers TS, the xGen Exome Hyb Panel v2 can streamline your exome sequencing workflow by reducing processing time and maximizing on-target rate, coverage depth, and coverage uniformity.

Find out more here.

Want more expert advice on exome sequencing?

Download the Whole Exome Sequencing Handbook, a three-part series that describes the importance of WES and how this targeted NGS approach is empowering confident insights.

Click here to download your free copy.

References

  1. Majewski J, Schwartzentruber J, Lalonde E, et al. What can exome sequencing do for you? J Med Genet. 2011;48(9):580-589.
  2. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589-595.
  3. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-359.
  4. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094-3100.
  5. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv. 2012.
  6. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491-498.
  7. Rimmer A, Phan H, Mathieson I, et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet. 2014;46(8):912-918.
  8. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987-2993.

RUO24-2793_001

Published Mar 25, 2024