Whole exome sequencing (WES), also known as exome sequencing, is a genomic technique for sequencing all of the protein-coding regions of the genome, collectively called the exome. It consists of two steps: first, selecting only the subset of DNA that encodes proteins, known as exons; second, sequencing that exonic DNA with any high-throughput DNA sequencing technology.
Because the exome is only a small fraction of the genome (roughly 1-2%), a study with a fixed budget can sequence samples to much greater depth than with whole genome sequencing. This additional depth makes exome sequencing well suited to applications that need reliable variant calls, for example rare variant mapping in complex disorders, discovery of the genes underlying Mendelian disorders, case studies, clinical diagnostics, and direct-to-consumer exome sequencing.
In this article, we list 20 frequently asked questions about whole exome sequencing.
1. Does whole exome sequencing require a reference genome?
Yes. If the species has no reference genome, the sequence of a related species can be used instead, but the reliability of the capture results is then not guaranteed. Because the capture probes are designed from the provided reference sequence, exome capture is not recommended if the target region is known to differ substantially from the reference genome, for example because of a large insertion.
2. Why are exome sequencing reads aligned to the whole genome during analysis rather than directly to the target regions?
First, the target regions are much shorter than the whole genome and are discontinuous. If the target regions were extracted and used on their own as the reference, alignment near the edges of each region would be degraded.
Second, capture quality could not be assessed: metrics such as the off-target rate and the on-target ratio require reads that fall outside the targets to be mapped as well.
3. What sequencing depth is recommended for whole exome sequencing?
Generally, 100× or 150× is recommended. At higher coverage, rare mutations present in only a fraction of cells because of genetic heterogeneity are easier to detect. In addition, coverage across the exome is uneven and partly random, so a higher average coverage helps ensure that most regions reach sufficient depth.
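To see why a higher average depth helps, the sketch below uses an idealized Poisson model of per-base depth to estimate the probability that a base reaches a minimum depth. The Poisson model is an assumption and is optimistic, since real exome coverage is more uneven; the function name `fraction_at_least`, the 20× threshold, and the 0.25 relative capture efficiency for a poorly captured exon are hypothetical choices for illustration.

```python
import math


def fraction_at_least(mean_depth, min_depth):
    """P(depth >= min_depth) for one base under an idealized Poisson coverage model.

    Suitable for mean depths up to several hundred (math.exp(-mean) underflows
    to zero for means above roughly 700)."""
    pmf = math.exp(-mean_depth)  # P(depth == 0)
    below = 0.0
    for k in range(min_depth):
        below += pmf                 # accumulate P(depth == k)
        pmf *= mean_depth / (k + 1)  # advance to P(depth == k + 1)
    return 1.0 - below


# Hypothetical poorly captured exon that receives a quarter of the average depth.
for mean in (30, 100, 150):
    local_mean = mean * 0.25
    print(mean, round(fraction_at_least(local_mean, 20), 3))
```

Even in this optimistic model, raising the average depth is what pulls poorly captured exons above a usable threshold.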
4. What is the significance of sequencing depth in whole exome sequencing? How is sequencing depth calculated?
Sequencing depth is the average number of times each base in the target region is covered by sequencing reads. The higher the depth, the more reliable the base and variant calls, and the more accurate the subsequent statistical analysis. For tumor studies or other studies of low-frequency mutations, a depth of at least 150× is recommended. If you are only looking for common SNPs or variants that are not at low frequency, a depth of at least 30× is recommended. To convert data volume into depth: the capture efficiency (on-target rate) is typically 60-70%, and the target regions of exome capture kits such as those from Agilent and Roche are about 60 Mb, so 10 Gb of data gives a depth of about 10 Gb × 60% / 60 Mb = 100×.
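The conversion above can be written as a small Python sketch. The helper names are hypothetical, and the calculation ignores duplicate and low-quality reads, which in practice reduce the effective depth.

```python
def exome_depth(data_gb, on_target_fraction, target_mb):
    """Mean on-target depth = total sequenced bases x on-target fraction / target size."""
    total_bases = data_gb * 1e9
    target_bases = target_mb * 1e6
    return total_bases * on_target_fraction / target_bases


def data_needed_gb(target_depth, on_target_fraction, target_mb):
    """Inverse calculation: Gb of raw data needed to reach a given mean depth."""
    target_bases = target_mb * 1e6
    return target_depth * target_bases / on_target_fraction / 1e9


print(f"{exome_depth(10, 0.6, 60):.0f}x")        # 100x, the worked example above
print(f"{data_needed_gb(150, 0.6, 60):.1f} Gb")  # 15.0 Gb for 150x at 60% on-target
```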
5. How large a deletion can whole exome sequencing detect?
Deletions of up to about 50 bp can be detected reasonably well. Because exome coverage is very uneven, when a large segment appears to be missing it is impossible to tell whether the region simply was not captured during hybridization or is truly deleted. What can currently be detected is a deletion contained within a single read. Since reads are about 150 bp long, deletions shorter than about 50 bp can be detected from exome sequencing data.
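A deletion that fits inside a read shows up in that read's alignment as a `D` operation in the CIGAR string defined by the SAM format. The sketch below walks a CIGAR string and reports each deletion with its reference position; the function name `deletions_in_read` is hypothetical, and real variant callers additionally combine evidence from many reads.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_in_read(pos, cigar, min_len=1):
    """Return (reference_position, length) for each deletion in an aligned read.

    pos is the 1-based leftmost reference position of the alignment (SAM POS)."""
    ref = pos
    found = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "D" and length >= min_len:
            found.append((ref, length))
        if op in "MDN=X":  # these operations consume reference bases
            ref += length
    return found


# A 30 bp deletion in the middle of a read: 60 matched bases, 30 deleted, 60 matched.
print(deletions_in_read(1000, "60M30D60M"))  # [(1060, 30)]
```

A deletion longer than the read cannot appear this way, which is why within-read detection is limited to short deletions.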
6. Can whole exome sequencing be used for copy number variation (CNV) analysis? What methods are available for detecting CNVs?
Because whole exome sequencing relies on hybrid capture, capture efficiency varies between exons: each exon hybridizes with a different efficiency and faces different competition from homologous sequences, so coverage differs greatly from exon to exon. For this reason, exome sequencing is generally not suitable for CNV detection. In cancer research, however, CNVs can be detected by comparing tumor tissue with matched adjacent normal (paracancerous) tissue as a control. The two conventional methods for detecting CNVs are whole genome resequencing and the Affymetrix SNP 6.0 array; of the two, Affymetrix SNP 6.0 has the lower cost and is the more economical option.
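A matched normal control makes exome CNV detection possible because the exon-to-exon capture bias is shared by both samples, which are processed with the same probes, so it largely cancels in a per-exon tumor/normal depth ratio. A minimal sketch of that idea, assuming per-exon read counts are already available; the function name and pseudocount are hypothetical, and dedicated CNV callers also correct for GC content and segment the ratios into copy-number states.

```python
import math
import statistics


def log2_copy_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-exon log2(tumor / normal) depth ratios, median-centered.

    Each sample is first normalized by its total count so that different
    sequencing amounts do not masquerade as copy-number changes."""
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("tumor and normal must cover the same exons")
    t_total = sum(tumor_counts)
    n_total = sum(normal_counts)
    ratios = [
        math.log2(((t + pseudocount) / t_total) / ((n + pseudocount) / n_total))
        for t, n in zip(tumor_counts, normal_counts)
    ]
    # Assume most exons are copy-neutral and shift their median to 0.
    center = statistics.median(ratios)
    return [r - center for r in ratios]


# Hypothetical counts: exons differ widely in capture, but only exon 3 changes.
normal = [100, 400, 200, 300]
tumor = [100, 400, 400, 300]
print([round(r, 2) for r in log2_copy_ratios(tumor, normal)])
```

A ratio near +1 for the third exon corresponds to a doubling of copy number relative to the normal tissue, while the uneven capture of the other exons cancels out.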
7. Can capture sequencing be used to detect methylation in target regions? Can RNA capture be performed?
Yes. Methylation in target regions can be assayed directly with methylation capture sequencing, such as SureSelect MethylSeq or NimbleGen SeqCap Epi. The principle is similar to ordinary capture sequencing: specialized probes capture and enrich the target regions. Combined with bisulfite treatment, the target regions are capture-enriched and sequenced, and methylation is then detected. RNA capture can be performed with SureSelect RNA Capture or the NimbleGen SeqCap RNA System.
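After bisulfite treatment, unmethylated cytosines are read as T while methylated cytosines remain C, so the methylation level at a cytosine is commonly estimated as C / (C + T) across the reads covering it. A minimal sketch for a forward-strand cytosine (reverse-strand reads show G/A instead); the function name and the `min_depth` parameter are hypothetical.

```python
def methylation_level(read_bases, min_depth=1):
    """Methylation fraction at one forward-strand reference cytosine.

    read_bases holds the base observed at this position in each bisulfite read.
    C = methylated (protected from conversion), T = unmethylated (converted).
    Other bases, such as sequencing errors, are ignored."""
    c = read_bases.count("C")
    t = read_bases.count("T")
    if c + t < min_depth:
        return None  # too few informative reads to estimate a level
    return c / (c + t)


print(methylation_level("CCCTCCTT"))  # 0.625 (5 of 8 reads unconverted)
```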
8. Can capture be performed on samples that contain a mixture of species?
Yes. As long as the target fragments have a corresponding reference sequence, probes can be designed to capture them. Examples include viral sequences integrated into a host genome, parasite sequences mixed into blood, and low-abundance microbial sequences in environmental samples. Because these sequences usually make up a very small proportion of the mixed sample, conventional resequencing is very inefficient: a very large amount of sequencing is needed to reach sufficient depth on the targets, and most of the data generated is redundant. In such cases, capture sequencing has a clear advantage.
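The cost of not enriching can be estimated with simple arithmetic: under idealized random sampling, the number of bases needed equals the target depth times the target size, divided by the target's fraction of the DNA. The sketch below uses hypothetical numbers (a 10 kb viral genome making up 0.01% of the sample, and an assumed 50% on-target rate after capture); the function name is also hypothetical.

```python
def bases_needed(target_depth, target_size_bp, target_fraction):
    """Sequenced bases needed for a mean depth on a target that makes up
    target_fraction of the DNA in the library (idealized random sampling)."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    return target_depth * target_size_bp / target_fraction


# Hypothetical example: 30x over a 10 kb viral genome.
without_capture = bases_needed(30, 10_000, 0.0001)  # virus is 0.01% of the DNA
with_capture = bases_needed(30, 10_000, 0.5)        # assumed 50% on-target after capture
print(f"{without_capture / 1e9:.1f} Gb vs {with_capture / 1e6:.1f} Mb")
```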
9. Are there indicators of capture performance? What factors reduce capture performance?
Because of the limitations of sequencing technology, repetitive sequences, undetermined bases ("N" bases), and the quality of the sample itself can all leave parts of the target uncovered. Every sequencing technology faces this problem; even deep whole genome sequencing cannot guarantee 100% coverage of the genome. Because samples vary widely, it is not reliable to promise a particular capture performance in advance. The best approach is for the client to provide the target regions and for the company to design and evaluate the probes and return a detailed report. In addition, repetitive sequences are widespread in most genomes. Before probe design, difficult-to-cover repeats are usually masked; this improves capture quality but reduces the overall coverage of the target.
10. Can fragments with high GC content be captured?
Yes, but capture efficiency for these fragments will be somewhat reduced. Similarly, low-complexity segments and segments containing ambiguous bases are difficult to capture. During probe design, technicians increase probe tiling and probe density for such regions as appropriate, and send the expected capture performance to the customer for confirmation. In addition, if you are capturing whole exomes, UTRs, and similar standard targets, you can use preset probe sets; these designs have been optimized and give better capture efficiency.
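GC content and ambiguous bases can be checked before probe design. The sketch below flags targets whose GC fraction or fraction of non-ACGT bases exceeds a threshold. The 70% GC cutoff, the function names, and the sequences are illustrative assumptions rather than any vendor's design rule, and low-complexity detection is not covered.

```python
def base_composition(seq):
    """Return (GC fraction among A/C/G/T bases, fraction of ambiguous non-ACGT bases)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    gc_frac = gc / acgt if acgt else 0.0
    ambiguous_frac = (len(seq) - acgt) / len(seq) if seq else 0.0
    return gc_frac, ambiguous_frac


def flag_difficult(targets, max_gc=0.70, max_ambiguous=0.0):
    """Names of targets likely to capture poorly (thresholds are illustrative)."""
    flagged = []
    for name, seq in targets.items():
        gc_frac, ambiguous_frac = base_composition(seq)
        if gc_frac > max_gc or ambiguous_frac > max_ambiguous:
            flagged.append(name)
    return flagged


targets = {  # hypothetical target sequences
    "exon_a": "ATGCATTACGATCGATTA",  # moderate GC
    "exon_b": "GCGGCGCCGCGGGCGCTA",  # GC-rich
    "exon_c": "ATGCNNNNACGT",        # contains N bases
}
print(flag_difficult(targets))  # ['exon_b', 'exon_c']
```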
20 Frequently Asked Questions (FAQs) about Whole Exome Sequencing (Part Two)
11. What is exon capture efficiency in whole exome sequencing?
Exome sequencing relies on hybridization. Many regions of the human genome are homologous to exons, and these homologous regions are likely to be captured during hybridization as well, so some of the sequenced reads do not come from exons. The proportion of exonic sequence among all sequenced data is called the capture efficiency. Capture efficiency does not affect the quality of the data, only the proportion of the data that is usable.
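As defined above, capture efficiency can be estimated as the fraction of reads that overlap a target region. A minimal sketch, assuming alignments and targets are given as 0-based, half-open (BED-style) coordinates and counting a read as on-target if it overlaps a target by at least one base. Pipelines differ (some count bases rather than reads), and the function name is hypothetical.

```python
from bisect import bisect_left


def on_target_fraction(reads, targets):
    """Fraction of reads overlapping at least one target region.

    reads and targets are (chrom, start, end) tuples in 0-based, half-open
    coordinates. Targets on the same chromosome must not overlap (merge them first)."""
    by_chrom = {}
    for chrom, start, end in sorted(targets):
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in by_chrom.items()}

    on_target = 0
    for chrom, start, end in reads:
        ivs = by_chrom.get(chrom)
        if not ivs:
            continue
        # Last target starting before the read ends. Targets are sorted and
        # non-overlapping, so if this one does not overlap the read, no earlier one can.
        i = bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            on_target += 1
    return on_target / len(reads) if reads else 0.0


targets = [("chr1", 100, 200), ("chr1", 300, 400)]
reads = [("chr1", 150, 250), ("chr1", 200, 300), ("chr1", 390, 450), ("chr2", 100, 200)]
print(on_target_fraction(reads, targets))  # 0.5
```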
12. When should you choose whole exome capture, and when custom probe capture?
Preset whole-exome probe products exist for humans and some common species, and the human whole-exome probes have been optimized many times. If there are many target regions and they are all exons (several tens of Mb or more), whole exome capture is recommended, because whole-exome probe sets are more mature, cost less, and capture better. Custom probes are recommended for smaller regions of interest or for regions outside exons.
13. What information is needed to custom-design exon capture probes?
The species, reference genome information, and target region information (coordinates, gene names or gene IDs, etc.) are needed. Coordinates should be provided as a table. Gene names should preferably be official NCBI gene names together with the corresponding Gene IDs. Target regions may overlap; this does not affect subsequent design or analysis. After the design is complete, the final region information is returned to the customer for confirmation. The total target size can be at most 200 Mb. If gene names are provided, the company will help locate the specific regions of each gene, such as exons and UTRs; please indicate whether you want to capture some or all of these regions.
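Coordinate tables are commonly exchanged in BED-like form (chromosome, start, end). The sketch below parses such a table, merges overlapping regions (which is why overlaps do not affect the design), and checks the merged total against the 200 Mb limit. The 0-based BED convention and the function names are assumptions; confirm the expected coordinate convention with the provider.

```python
MAX_TARGET_BP = 200_000_000  # 200 Mb design limit stated above


def parse_regions(text):
    """Parse a tab- or space-separated table of chrom, start, end (BED-style, 0-based)."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def merge_regions(regions):
    """Merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(r) for r in merged]


def total_size(regions):
    """Total target size in bp after merging, so overlaps are not double-counted."""
    return sum(end - start for _, start, end in merge_regions(regions))


table = "chr1\t1000\t2000\nchr1\t1500\t2500\nchr2\t500\t800\n"  # hypothetical targets
regions = parse_regions(table)
print(merge_regions(regions))                # [('chr1', 1000, 2500), ('chr2', 500, 800)]
print(total_size(regions) <= MAX_TARGET_BP)  # True
```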
14. Can genomes of common or newly sequenced species, including incomplete or error-prone genomes, be used for exon capture sequencing?
Yes. The capture sequencing platform has no species restriction, as long as genome sequence information and target region information are available. Probes can be designed even if the species' reference genome is new, inaccurate, or differs substantially from the sample. However, because the probes are designed from the sequence provided by the customer, the accuracy of the results cannot be fully guaranteed.
15. Can degraded samples be used for library construction? How does degradation affect the data and subsequent analysis?
Library construction is not recommended for severely degraded samples, because the success rate is low. If a severely degraded sample is used, the resulting exome library will be of lower quality, yielding less effective data and a lower proportion of properly mapped reads.
16. How do viscous samples, protein or other impurity contamination, and RNA contamination affect library construction and its success rate?
Slight protein contamination has little effect on library construction. Heavy contamination with protein or other impurities, however, interferes with quantification and with enzyme efficiency during library construction, lowering the success rate. RNA contamination mainly affects sample quantification and DNA fragment selection during library construction. Therefore, if RNA contamination is present, RNase digestion is recommended, provided the total amount and quality of the sample allow it.
17. Can DNA and RNA be co-extracted from the same tissue? What extraction methods are used?
Yes. There are generally two approaches: one is to divide the tissue into two portions and extract DNA from one and RNA from the other; the other is to use a DNA/RNA co-extraction kit, although DNA and RNA extracted this way are generally more likely to be degraded.
18. Why is the DNA extraction yield from FFPE samples low?
FFPE samples have usually been stored for many years, and their DNA has been severely degraded by formaldehyde. In addition, FFPE samples are usually precious, so only a small amount of material is sent.
19. What is the difference between cfDNA and ctDNA?
cfDNA (cell-free DNA) is free DNA circulating in the blood. It is released from normal body cells or from ruptured white blood cells, is harmless to the body, and is rapidly cleared. ctDNA (circulating tumor DNA) is the part of the free DNA in the blood that is released by ruptured tumor cells, and it can serve as a highly specific tumor marker.
20. How should plasma be separated for cfDNA/ctDNA extraction?
Collect 5 mL of peripheral blood (usually in the morning), transfer it promptly to an EDTA anticoagulant tube, and invert gently to mix (to prevent hemolysis). Separate the plasma within 1 hour if the tube is kept at room temperature, or within 2 hours if it is kept at 4 °C. First, centrifuge at 1000 rpm for 10 minutes at 4 °C and carefully pipette the plasma into a clean 1.5 mL EP tube, taking care not to aspirate white blood cells. Then centrifuge at 12,000 rpm for 10 minutes at 4 °C to remove residual cells and debris, and carefully transfer the required volume of supernatant into new EP tubes as aliquots. Label the tubes with an oil-based marker and store them in an ultra-low-temperature freezer at -80 °C, avoiding repeated freeze-thaw cycles. Transport samples on dry ice.
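Centrifugation speeds given in rpm produce different g-forces in different centrifuges, because the force depends on the rotor radius. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimeters. A small sketch, assuming a hypothetical 10 cm rotor radius; check your own centrifuge's specification before converting the speeds above.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * r(cm) * rpm^2


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at the given rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed needed to reach a given RCF at the given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical 10 cm rotor radius.
for rpm in (1000, 12_000):
    print(rpm, "rpm ->", round(rpm_to_rcf(rpm, 10)), "x g")
```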