Introduction to whole exome sequencing: decoding the essentials

What if we could zoom in on the most informative parts of the genome to speed up disease research and diagnosis? Whole exome sequencing (WES) does exactly that. It allows researchers and clinicians to identify disease-causing mutations efficiently, by focusing on just the protein-coding regions of genes. This introduction to WES explains what it is and how it works, as well as outlining its applications and offering a comparison with whole genome sequencing (WGS).

What is whole exome sequencing?

WES is a form of targeted next-generation sequencing (NGS). Rather than analyzing an organism's entire genome, WES concentrates on the protein-coding exons (expressed regions). In the human genome, which comprises approximately 3 billion base pairs, only about 1-2% of the DNA codes for proteins.1 Since the majority of mutations linked to genetic disorders occur within these protein-coding regions, WES provides an effective means of identifying disease-causing mutations without the need to sequence the entire genome.
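As a rough back-of-the-envelope check, the figures above imply an exome of a few tens of millions of base pairs. The short sketch below uses only the approximate numbers quoted in this article; the function name is purely illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size quoted above


def exome_size_bp(coding_fraction: float, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Return the approximate number of protein-coding base pairs."""
    return round(genome_size_bp * coding_fraction)


low = exome_size_bp(0.01)   # lower bound: 1% of the genome
high = exome_size_bp(0.02)  # upper bound: 2% of the genome
print(f"Approximate exome size: {low / 1e6:.0f}-{high / 1e6:.0f} million base pairs")
```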

Whole exome sequencing applications

The ability to find disease-causing mutations makes WES a useful technique for many clinical research and diagnostics applications. WES can be used in clinical research to compare the protein-coding exomes of people with the same disorder. If all these individuals are found to share the same mutation(s), the genetic origin of the disease can then be identified. However, for many complex genetic diseases, the search for disease-causing mutations is not straightforward. Variations in a number of different genes can present a risk to disease development, and not all mutations need to be present for a condition to occur. This means that comprehensive studies with a large number of participants may be needed to shed light on the genetic origins of certain disorders.
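The comparison described above can be pictured as a set operation: keep the variants that every affected individual shares, then discard those also seen in unaffected people. The following is a minimal sketch under strong simplifying assumptions (a single causal variant, perfect variant calls); the sample names, coordinates, and the `candidate_variants` function are invented for illustration and are not taken from any real study.

```python
# Each variant is written as (chromosome, position, reference base, alternate base).
# All values below are made up for illustration.
affected = {
    "patient_1": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")},
    "patient_2": {("chr2", 2000, "G", "A"), ("chr3", 3000, "A", "C")},
    "patient_3": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")},
}
controls = {
    "control_1": {("chr1", 1000, "C", "T")},
    "control_2": {("chr3", 3000, "A", "C")},
}


def candidate_variants(affected_calls, control_calls):
    """Variants shared by all affected individuals and absent from every control."""
    if not affected_calls:
        return set()
    shared = set.intersection(*affected_calls.values())
    seen_in_controls = set().union(*control_calls.values())
    return shared - seen_in_controls


print(candidate_variants(affected, controls))
```

For complex diseases, where no single variant is shared by everyone, a strict intersection like this would come back empty, which is one reason larger cohorts and statistical association methods are needed.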

Once disease-causing or risk-conferring mutations have been identified for a particular disorder, WES can be used in a clinical setting for diagnostic testing. The protein-coding exomes of people with a family history of a disease can be sequenced to determine whether they carry the associated mutations. Moreover, WES can be used to diagnose people who are looking for an explanation of their symptoms. Prenatal testing of fetuses can even be performed when ultrasound abnormalities are observed, or when other family members have known genetic diseases.

WES may also enable more personalized treatment plans in the future, by helping scientists to understand how genetic variations affect therapeutic responses. Oncology is an important field where WES is already being used to improve treatment efficacy and outcomes. WES can identify mutations in cancer cells that may influence the tumor's behavior and response to specific treatments. This capability sometimes enables tailored therapies based on the mutations present in a patient.

WES also has applications in fields other than human disease research and diagnostics, such as agriculture. By sequencing the protein-coding exomes of livestock or crops, researchers can identify the mutations responsible for undesirable traits – such as infertility – or desirable ones, like resistance to pathogens.

How does whole exome sequencing work?

WES consists of the following steps:

  1. DNA extraction
  2. DNA fragmentation
  3. Adapter ligation and amplification
  4. Target enrichment
  5. Sequencing
  6. Data analysis

1. DNA extraction

First, DNA is extracted from the sample.

Tube and green non-coding DNA double helix with orange protein-coding sequence

2. DNA fragmentation

Next, the DNA needs to be fragmented into shorter pieces using either physical or enzymatic fragmentation methods.

DNA double helix fragmented into non-coding and protein-coding sequences

3. Adapter ligation and amplification

DNA adapters are then ligated to the fragments, and the sequences are PCR amplified.

Adapter ligation and amplification for whole exome sequencing

4. Target enrichment

Next, a target enrichment step is performed to capture the protein-coding exons. This can be done using either solution- or array-based methods, both of which rely on hybridization capture. In the more commonly used in-solution capture approach, biotinylated probes complementary to the exomic regions of interest are added to the sample. The DNA denatures into single strands on heating, and double-stranded molecules reform as the sample cools. Non-coding sequences and some exomic sequences re-anneal to their complementary strand from the sample, while other protein-coding DNA fragments bind to a probe.

Denaturation of DNA sequences and probe binding

Subsequently, streptavidin-coated magnetic beads are added to the sample. The probes bind to these beads via the biotin molecule on one end.

Probes bound to exomic sequences bind to streptavidin-coated bead for WES target enrichment

When the sample is then placed on a magnetic separator or stand, the beads carrying the exomic regions of interest are pulled toward the magnet. Most unwanted, unbound DNA sequences can be eliminated by removing the supernatant, and after a series of washing steps, only the exomic regions of interest are left.
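Conceptually, the capture step acts as a filter that keeps only the fragments matching a probe. The toy model below illustrates this with genomic intervals: a fragment is "captured" if it overlaps a probe target. It is an idealized sketch, since real hybridization is neither perfectly specific nor perfectly efficient, and the `Interval`, `overlaps`, and `capture` names, as well as all coordinates, are assumptions made for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def capture(fragments, probe_targets):
    """Keep fragments that overlap any probe target (idealized, 100% efficient)."""
    return [f for f in fragments if any(overlaps(f, p) for p in probe_targets)]


probe_targets = [Interval("chr1", 100, 200)]  # one hypothetical exon
fragments = [
    Interval("chr1", 50, 120),   # overlaps the exon, so it is retained
    Interval("chr1", 200, 350),  # starts where the exon ends, so it is washed away
    Interval("chr2", 100, 200),  # different chromosome, so it is washed away
]
captured = capture(fragments, probe_targets)
print(captured)
```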

Array-based target enrichment methods work similarly to solution-based approaches. The main difference is that the probes are bound to a microarray rather than added to the sample. When the sample is denatured and added to the microarray, most exomic sequences hybridize to the probes, and unbound fragments can be aspirated and discarded.

Array-based target enrichment: exomic sequences hybridize to immobilized probes

Note that the target enrichment steps to capture exons are usually performed using a kit from a specialized manufacturer. An example of a solution-based kit is the Twist Human Core Exome Kit (Twist Bioscience); you can find a protocol for its automation on our NGS sample prep system in this application note.

5. Sequencing

The protein-coding exon sequences obtained at the end of the target enrichment process need to be sequenced to detect mutations. Several NGS methods are available for this purpose, each with its own pros and cons.

6. Data analysis

The last step in the protocol is data analysis. Although WES only sequences a fraction of the entire genome, a vast amount of data is generated. Its interpretation is performed by computational algorithms and bioinformatics methods.
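To give a flavor of what such an analysis involves, the sketch below makes a naive variant call at a single position from the bases observed in the reads covering it. It is a deliberately simplified illustration, not how production pipelines work (those align reads to a reference and use dedicated statistical variant callers); the `call_site` function and its thresholds are assumptions chosen for this example.

```python
from collections import Counter


def call_site(ref_base, observed_bases, min_depth=10, min_alt_fraction=0.2):
    """Return (alt_base, alt_fraction) if a variant is supported, else None.

    ref_base: the reference base at this position.
    observed_bases: string of bases seen in the reads covering the position.
    """
    depth = len(observed_bases)
    if depth < min_depth:
        return None  # too few reads to make a confident call

    counts = Counter(observed_bases)
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return None  # every read matches the reference

    # Pick the most frequently observed non-reference base.
    alt_base, alt_n = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_n / depth
    if alt_fraction < min_alt_fraction:
        return None  # likely sequencing error rather than a real variant
    return alt_base, alt_fraction


print(call_site("A", "AAAAAAGGGGGG"))  # half the reads support G
```

The minimum-depth check is what ties variant calling to sequencing depth: positions covered by only a handful of reads are left uncalled rather than guessed.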

Whole exome sequencing vs whole genome sequencing

In contrast to WES, WGS reads an organism's entire genome, not just the protein-coding exons. This section discusses the advantages and drawbacks of both methods, to help you decide which one may be better suited for your application.

Since WES only focuses on the sections of DNA most likely to affect the phenotype of an individual, it reduces the cost and time of sequencing. The overall reagent consumption is lower compared to WGS – even though the technique requires additional library preparation steps and reagents – and data can be obtained much faster, as far less DNA has to be sequenced per sample.

WES also produces less data, making it more manageable. Analyzing the 1-2% of a genome that encodes proteins – instead of the entire genome – means that less storage is required. Moreover, WES data is easier to analyze, not only because the amount of data is reduced, but also because protein-coding regions are usually better conserved and better understood, reducing the computational burden.

Budget constraints mean that researchers frequently face a critical decision: whether to use WGS for a limited number of samples, or WES for a larger number of samples. WES is therefore often used in population genomics studies. The cost-effectiveness of WES also allows sequencing of samples at a greater depth (the number of times a certain nucleotide is sequenced). This increases confidence in variant detection – a benefit that is particularly crucial for applications looking for rare mutations.
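The relationship between sequencing output, target size, and depth can be sketched with simple arithmetic: mean depth is roughly the number of sequenced bases that land on the target, divided by the target size. The numbers below (sequencing output, a 60 million base pair exome, a 60% on-target rate) are illustrative assumptions, not figures from a specific platform or kit.

```python
def mean_depth(total_sequenced_bases, target_size_bp, on_target_fraction=1.0):
    """Approximate mean depth over a target region.

    on_target_fraction is the assumed share of sequenced bases that fall
    within the target (1.0 for WGS, lower for capture-based WES).
    """
    return total_sequenced_bases * on_target_fraction / target_size_bp


# Hypothetical WGS sample: 90 billion bases over a 3 billion base pair genome.
wgs_depth = mean_depth(90e9, 3e9)

# Hypothetical WES sample: a tenth of that output over a 60 million base pair
# exome, assuming 60% of the sequenced bases are on target.
wes_depth = mean_depth(9e9, 60e6, on_target_fraction=0.6)

print(f"WGS mean depth: {wgs_depth:.0f}x")
print(f"WES mean depth: {wes_depth:.0f}x")
```

Under these assumptions, a tenth of the sequencing output still yields a higher mean depth for the exome than the full output does for the whole genome, which is the trade-off behind the budget decision described above.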

Let's now look at the advantages of WGS. Although the majority of disease-causing mutations are located in the protein-coding regions, variation in non-coding DNA sequences can also have an impact on gene expression and regulation, leading to genetic disorders. WGS can detect these variations – which would be missed by WES – as it sequences the entire genome. However, there are WES kits that include probes to capture known disease-causing non-coding regions. The sequencing method can be extended to any region of interest, as long as its DNA sequence is known, so that suitable probes can be designed.

Another advantage of WGS is that it enables better detection of structural variations. WES has a low sensitivity for these mutations, whereas WGS can identify them, especially when performed with long read sequencing methods.

Furthermore, WGS leads to more reliable and uniform sequence coverage. Sequences with low or high GC contents are less effectively amplified and captured during PCR and target enrichment in WES, which can result in little to no coverage in certain areas. Moreover, as WES probes are designed to be complementary to a reference genome, they preferentially capture reference sequence alleles at heterozygous sites, which leads to an underrepresentation of alternate alleles.
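GC content, the property behind the coverage bias described above, is simply the fraction of G and C bases in a sequence. The sketch below computes it and flags sequences at either extreme; the 30% and 70% cut-offs and the function names are arbitrary assumptions for illustration, not thresholds from any particular kit or study.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_extreme_gc(sequence: str, low: float = 0.3, high: float = 0.7) -> bool:
    """Flag sequences whose GC content falls outside the assumed [low, high] range."""
    gc = gc_content(sequence)
    return gc < low or gc > high


for seq in ("ATGCATGC", "GGGCCCGCGG", "ATATTTAAAT"):
    print(seq, f"{gc_content(seq):.2f}", has_extreme_gc(seq))
```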

WGS also has the advantage that it can be used to sequence the genome of organisms for which no reference sequence is available. For WES, you need previous knowledge of the location and sequence of protein-coding regions to design probes, whereas WGS can be used for de novo sequencing.

Lastly, WGS offers the potential for reanalysis. As our understanding of the genome increases continuously, it may be useful to reanalyze historical WGS data to find new variants of interest as better algorithms become available. WES, in contrast, would require additional sequencing to obtain this updated information.

Conclusion

In summary, WES offers a focused approach to genetic research and testing. By concentrating on the most informative sections of the genome, it allows scientists and clinicians to uncover the basis of diseases with greater efficiency, and at a lower cost, than WGS. However, the method also has its limitations, and it will be interesting to see whether WES or WGS will become the prevalent method in the long term.
