qxcvbnmy

  1. Technical advances in RNA-seq

Sanger sequencing and microarrays. Sanger sequencing technology was the first to be used for transcriptomics and enabled methods such as SAGE (serial analysis of gene expression), one of the first attempts to quantify gene expression on a global basis. Soon afterwards, microarrays based on complementary probe hybridization emerged and came to dominate transcriptome profiling for the next decade.

NGS. The advent of next-generation sequencing technologies has enabled the sequencing approach to surpass the microarray approach. In 2006, the first RNA-seq paper was published using 454/Roche technology. The era of RNA-seq dominance began in 2008 with the maturation of Illumina/Solexa technology. The most popular platforms for RNA-seq have been the Illumina Genome Analyzer and HiSeq. Illumina/Solexa technology can generate gigabases of data per run (initially 1 Gb per run for the Genome Analyzer in 2006 and 600 Gb per run for the HiSeq in 2012). Roche/454 technology generates reads long enough for RNA-seq but is hampered by relatively low throughput and high cost.

Third generation sequencing. Although NGS technologies remain dominant, third generation sequencing is increasingly being applied to RNA-seq. For example, Heliscope sequencing and single-molecule real-time (SMRT) sequencing have already been used in some RNA-seq studies. PacBio SMRT long-read sequencing can cover a complete transcript from the 5′ end to the 3′ poly(A) tail without fragmentation, yielding full-length cDNA sequences. This helps to identify new transcripts and new introns, and therefore to identify isoforms, alternative splicing sites, fusion gene expression, and allelic expression more accurately.

Table 1. The advantages of RNA-seq compared with other transcriptomics approaches (Wang et al. 2009).
Technology | Tiling microarray | cDNA or EST sequencing | RNA-seq
Technology specifications
Principle | Hybridization | Sanger sequencing | High-throughput sequencing
Resolution | From several to 100 bp | Single base | Single base
Throughput | High | Low | High
Reliance on genomic sequence | Yes | No | In some cases
Background noise | High | Low | Low
Application
Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes
Dynamic range to quantify gene expression level | Up to a few-hundredfold | Not practical | >8,000-fold
Ability to distinguish different isoforms | Limited | Yes | Yes
Ability to distinguish allelic expression | Limited | Yes | Yes
Practical issues
Required amount of RNA | High | High | Low
Cost for mapping transcriptomes of large genomes | High | High | Relatively low

Challenges of RNA-seq

Short reads. Illumina sequencing technology has steadily increased read length and throughput since its introduction in 2007. Long paired-end, strand-specific reads are commonly used to improve mappability and de novo assembly of transcriptomes. Furthermore, third generation sequencing technology (such as PacBio) enables sequencing of full-length transcripts.

PCR biases. Another concern is the impact of PCR amplification on the accuracy of gene expression quantitation via RNA-seq. Helicos and some third generation sequencers use amplification-free technology. There are also PCR-free methods for Illumina sequencing.

Workflow of RNA-seq based on NGS

The workflow of RNA-seq using high-throughput sequencing technology is illustrated in Figure 1. Briefly, long RNAs are first converted into a library of cDNA fragments through RNA or DNA fragmentation. Sequencing adaptors are then attached to each cDNA fragment, and sequence data are generated in a high-throughput manner from one end (single-end sequencing) or both ends (paired-end sequencing).
The resulting sequence reads are subsequently aligned to the reference genome or transcriptome and are classified into three types: exonic reads, junction reads, and poly(A) end-reads. A base-resolution expression profile can be generated from these three types of reads.

Figure 1. A typical workflow of RNA-seq (Wang et al. 2009).

Library construction

Figure 2. A typical library construction pipeline of RNA-seq.

Following sample collection, total RNA is usually isolated via organic extraction and/or silica-membrane spin columns. The total RNA sample is then processed either by direct selection of poly(A) RNA or by selective removal of rRNA, because the abundant rRNA is usually not the research focus and greatly reduces the coverage of useful transcripts. Oligo(dT)-based mRNA purification is widely used in eukaryotes, but it misses RNA transcripts that lack poly(A) tails. Compared with poly(A) selection, the ribo-depletion approach is often preferred because it retains all non-ribosomal RNA species, including tRNA, ncRNAs, non-poly(A) mRNA, and preprocessed RNA. The two most popular rRNA depletion methods are: (i) hybridization of rRNA with biotin-labeled anti-rRNA probes, followed by removal with streptavidin-coated magnetic beads; and (ii) selective degradation of rRNA by a 5′-3′ exonuclease that specifically recognizes rRNA with a 5′ phosphate.

Fragmentation is then conducted to reach the length required by the chosen NGS technology. Some small RNAs, such as microRNAs, piwi-interacting RNAs, and short interfering RNAs, can be sequenced directly without fragmentation. Larger RNA molecules need to be fragmented into smaller pieces (200-500 bp) before deep sequencing. Fragmentation can be performed at the cDNA level (DNase I treatment or sonication) or at the RNA level (hydrolysis or nebulization). However, each of these methods can create a different bias in the outcome.
For example, cDNA fragmentation is usually strongly biased towards sequences from the 3′ ends of transcripts, while RNA fragmentation shows little bias along the transcript body but is depleted for transcript ends. Therefore, cDNA fragmentation provides valuable information about the precise identity of these ends, and RNA fragmentation provides more uniform information about the transcript body.

In the classic NGS protocols, adapters are ligated onto sheared double-stranded DNA fragments. A major drawback of this approach is the loss of information on transcriptional direction. Pre-treating the RNA samples with sodium bisulfite converts cytidine into uridine, and the resulting widespread C-to-T transitions mark the coding strand of each transcript. Other methods that maintain strand specificity have also been proposed, such as direct ligation of RNA adaptors to the RNA sample before reverse transcription.

Sequencing

RNA-seq is currently dominated by three platforms: Illumina (Genome Analyzer and HiSeq), Applied Biosystems SOLiD, and Roche 454 Life Science systems. Read lengths range from 30-100 bp for Illumina and SOLiD, and from 200-500 bp for the 454 pyrosequencing system. 454-based RNA-seq is particularly attractive for non-model organisms without reference genomes or transcriptomes. Longer reads or paired-end short reads can reveal connectivity between multiple exons. RNA-seq is a powerful method to study complex transcriptomes and reveal sequence variations in the transcribed regions.

Bioinformatics

Figure 3. A typical analysis pipeline of RNA-seq data.

Quality assessment is the first step in the bioinformatics analysis of RNA-seq data. It supports a coherent final result by removing low-quality sequences, over-represented sequences, and adapter sequences. Once all reads have been filtered and mapped or assembled, gene expression levels can be inferred, leading to a genome-scale transcriptome map that is both qualitative and quantitative.
RNA-seq also allows the detection of differential expression (DE) across treatments or conditions. Normalization has to be conducted to adjust for differences between samples, such as library size, and for gene-specific features. Furthermore, RNA-seq makes it possible to identify SNPs, fusion genes, and post-transcriptional gene regulation, such as RNA editing, degradation, and translation. If you want more information about the applications of RNA-seq or bioinformatics workflow of RNA-seq, you can refer to the article.

References:
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews genetics, 2009, 10(1): 57.
Qian X, Ba Y, Zhuang Q, et al. RNA-Seq technology and its application in fish transcriptomics. Omics: a journal of integrative biology, 2014, 18(2): 98-110.
Marguerat S, Bähler J. RNA-seq: from technology to biology. Cellular and molecular life sciences, 2010, 67(4): 569-579.
Wilhelm B T, Landry J R. RNA-Seq—quantitative measurement of expression through massively parallel RNA-sequencing. Methods, 2009, 48(3): 249-257.
McGettigan P A. Transcriptomics in the RNA-seq era. Current opinion in chemical biology, 2013, 17(1): 4-11.
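The normalization step mentioned in the Bioinformatics section above (adjusting for library size and for gene-specific features such as transcript length) can be sketched as follows. This is a minimal illustration, not the method of any particular tool: counts per million (CPM) and transcripts per million (TPM) are two common conventions chosen here as assumptions, and the function names, gene names, counts, and lengths are all hypothetical.

```python
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: corrects for library size only."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: corrects for gene length, then library size."""
    # Reads per base for each gene (the gene-specific correction).
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("library contains no reads")
    # Rescale so that the values of one sample sum to one million.
    return {gene: rate * 1e6 / total_rate for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical read counts and gene lengths, for illustration only.
    counts = {"geneA": 100, "geneB": 300, "geneC": 600}
    lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
    print(cpm(counts))
    print(tpm(counts, lengths))
```

In this toy input, geneA and geneB have the same read density per base, so they receive the same TPM even though their raw counts differ threefold. Differential expression tools generally apply their own, more elaborate normalization.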
  2. Introduction to ChIP-Seq

ChIP-sequencing (also known as ChIP-seq), which combines chromatin immunoprecipitation (ChIP) assays with DNA sequencing, is a powerful technique for genome-wide profiling of DNA-binding proteins, histone modifications, or nucleosomes. ChIP is an immunoprecipitation (IP) method used to isolate specific DNA sites that are in direct physical interaction with transcription factors and other proteins. In ChIP, specific antibodies are used to enrich DNA fragments bound by particular proteins or nucleosomes.

ChIP-seq was one of the early applications of NGS (next-generation sequencing). The first large-scale study of genome-wide histone methylation using ChIP-seq was published in 2007 (Barski et al., 2007), with sequencing performed on the Solexa 1G Genome Analyzer. In the same year, Johnson et al. (2007) used ChIP-seq to generate a genome-wide map of transcription factor binding sites, and Robertson et al. (2007) developed ChIP-seq to identify mammalian DNA sequences bound by transcription factors in vivo. These two papers also demonstrated the increased sensitivity and specificity of ChIP-seq. Owing to the rapid progress of NGS technology and the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for characterizing epigenomes and studying gene regulation (Park, 2009).

Comparison of ChIP-chip and ChIP-seq

ChIP-chip (ChIP coupled with microarrays) and ChIP-seq are the two standard techniques for identifying genome-wide DNA-protein binding interactions. Because it builds on sequencing technology, ChIP-seq offers many advantages over ChIP-chip, as summarized in Table 1 (Park, 2009; Schones and Zhao, 2008).

Table 1. Comparison of ChIP-chip and ChIP-seq.
Feature | ChIP-chip | ChIP-seq
Maximum resolution | Array-specific, generally 30-100 bp | Single nucleotide
Coverage | Limited by sequences on the array; repetitive regions are usually masked out | Limited only by alignability of reads to the genome; increases with read length; many repetitive regions can be covered
Flexibility | Dependent on available products; multiple arrays may be needed for large genomes | Genome-wide assay of any sequenced organism
Source of platform noise | Cross-hybridization between probes and nonspecific targets | Some GC bias can be present
Experimental design | Single- or double-channel, depending on the platform | Single channel
Cost-effective cases | Profiling of selected regions; when a large fraction of the genome is enriched for the modification or protein of interest (broad binding) | Large genomes; when a small fraction of the genome is enriched for the modification or protein of interest (sharp binding)
Required amount of ChIP DNA | High (a few micrograms) | Low (10-50 ng)
Dynamic range | Lower detection limit; saturation at high signal | Not limited
Amplification | More required | Less required; single-molecule sequencing without amplification is available
Multiplexing | Not possible | Possible

Workflow of ChIP-seq

Figure 1 (Park, 2009) illustrates the ChIP-seq workflow used to profile the specific DNA binding sites of transcription factors, DNA-binding enzymes, or other DNA-associated proteins (non-histone ChIP) and the DNA sites corresponding to modified nucleosomes (histone ChIP). Following ChIP protocols, the chromatin is fragmented, and crosslinked proteins or modified nucleosomes are immunoprecipitated using an antibody specific to the protein or the histone modification. After DNA purification and library construction, the DNA fragments can be sequenced simultaneously on any of the sequencing platforms, such as the Illumina Solexa Genome Analyzer, Roche 454, Applied Biosystems (ABI) SOLiD, and HeliScope by Helicos, as illustrated in Figure 1.
With the tremendous progress of NGS technology, Illumina platforms such as the HiSeq have become the most widely used for sequencing.

Figure 1. Overview of a ChIP-seq experiment (Park, 2009).

Experimental Design of ChIP-seq

The Encyclopedia of DNA Elements (ENCODE) and model organism ENCODE (modENCODE) consortia have developed a set of working standards and guidelines for ChIP-seq experiments based on the experience of hundreds of ChIP-seq experiments (Landt et al., 2012). To obtain high-quality ChIP-seq data, several technical aspects should be considered in the experimental design, including antibodies, cell number, controls, replicates, chromatin fragmentation, library construction, and sequencing (Kidder et al., 2011).

Antibodies

The quality of the antibodies used for ChIP is one of the most important factors contributing to the quality of ChIP-seq data. A sensitive and specific antibody will give a high level of enrichment. Limited antibody efficiency is the main reason for failed ChIP-seq experiments. Antibody validation and characterization should be done before the ChIP begins.

Cell number

Because the signal-to-noise ratio (SNR) is directly correlated with the cell number, using the correct number of cells can help to reduce background noise. The abundance of the protein or histone modification under investigation and the quality of the antibody should be considered when determining the number of cells.

Controls

An important part of ChIP-seq experimental design is determining which controls to use. A ChIP-seq peak should be compared with the same region in a matched control. There are several control types, but no consensus on which is the most appropriate:
  • Input DNA.
  • Mock IP: DNA obtained from IP without antibody.
  • Nonspecific IP: IP using an antibody against a protein that is not known to be involved in DNA binding.

Replicates

High-quality ChIP-seq data sets are valuable resources for the community.
Many factors, including cell-culture conditions, ChIP, and library construction, may contribute to variability between data sets. To ensure reliability of the data, biological replicate experiments are necessary.

Chromatin fragmentation

Before ChIP, chromatin must be fragmented into a manageable size. ChIP-seq for DNA-binding proteins uses endonuclease digestion or sonication to fragment DNA. ChIP-seq for histone modifications uses micrococcal nuclease (MNase) digestion to fragment DNA.

Library construction and sequencing

Libraries may be constructed from ChIP DNA by standard protocols specific to the sequencing platform. The steps in library construction and sequencing, including size selection, gel purification, PCR, the choice of single-end or paired-end sequencing, and sequencing depth, all affect ChIP-seq data quality.

Taking these technical aspects into account, a ChIP-seq experimental design intended to yield high-quality data is illustrated in Figure 2 (Kidder et al., 2011). First, the appropriate controls for antibody specificity should be determined before ChIP. After the appropriate number of cells has been isolated, chromatin is sheared into a suitable size range by sonication or enzymatic means. Next, high-quality antibodies are used for ChIP. After purification of the ChIP-enriched DNA, a library is constructed to allow sequencing on NGS platforms.

Figure 2. ChIP-seq experimental design (Kidder et al., 2011).

At CD Genomics, we provide high-quality sequencing and integrated bioinformatics analysis for your ChIP-Seq project, enabling you to accurately screen for and determine protein binding sites across the whole genome. If you have additional requirements or questions, please feel free to contact us.

Additional reading: Pipeline and Tools for ChIP-seq Analysis

References:
Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837.
Johnson, D.S., Mortazavi, A., Myers, R.M., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497-1502.
Kidder, B.L., Hu, G., and Zhao, K. (2011). ChIP-Seq: technical considerations for obtaining high-quality data. Nature immunology 12, 918-922.
Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome research 22, 1813-1831.
Park, P.J. (2009). ChIP-seq: advantages and challenges of a maturing technology. Nature reviews Genetics 10, 669-680.
Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., et al. (2007). Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature methods 4, 651-657.
Schones, D.E., and Zhao, K. (2008). Genome-wide approaches to studying chromatin modifications. Nature reviews Genetics 9, 179-191.
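The Controls paragraph above says that a ChIP-seq peak should be compared with the same region in a matched control. One simple way to express that comparison is a fold enrichment after scaling the control to the ChIP library size. The sketch below is an assumption-laden illustration rather than the procedure of any published pipeline: the function name, the pseudocount, and the example numbers are all hypothetical.

```python
def fold_enrichment(
    chip_reads: int,
    control_reads: int,
    chip_total: int,
    control_total: int,
    pseudocount: float = 1.0,
) -> float:
    """Fold enrichment of ChIP over a matched control in one genomic region.

    chip_reads / control_reads: reads falling in the region of interest.
    chip_total / control_total: total mapped reads in each library.
    pseudocount: hypothetical stabiliser that avoids division by zero
                 when the control has no reads in the region.
    """
    if chip_total <= 0 or control_total <= 0:
        raise ValueError("library sizes must be positive")
    # Scale the control so both libraries are compared at the same depth.
    scale = chip_total / control_total
    expected = control_reads * scale
    return (chip_reads + pseudocount) / (expected + pseudocount)


if __name__ == "__main__":
    # Hypothetical region: 99 ChIP reads against 18 input reads, where the
    # input library was sequenced twice as deeply as the ChIP library.
    print(fold_enrichment(99, 18, 20_000_000, 40_000_000))
```

The same function can be applied to input DNA, mock IP, or nonspecific IP controls; which control is most appropriate is, as noted above, not settled.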
  3. The challenges of ChIP-seq

ChIP-seq is a powerful method to identify genome-wide DNA binding sites for a protein of interest. Mapping the chromosomal locations of transcription factors (TFs), nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, and ChIP-seq is the standard methodology for it (Bailey et al., 2013). The challenges of ChIP-seq lie not only in sample preparation and sequencing but also in computational analysis. Unlike other types of massively parallel sequencing data, ChIP-seq data have several distinctive characteristics:
  • Histone modifications cover broader regions of DNA than TFs.
  • Reads are trimmed to within a smaller number of bases.
  • Fragments are quite large relative to the binding sites of TFs.
  • Measurements of histone modification often undulate following well-positioned nucleosomes.

To extract meaningful data from the raw sequence reads, ChIP-seq data analysis should:
  • Identify genomic regions ("peaks") where TFs bind or histones are modified.
  • Quantify and compare levels of binding or histone modification between samples.
  • Characterize the relationships between chromatin state and gene expression or splicing.

Bioinformatics analysis workflow for ChIP-seq data

The bioinformatics analysis workflow for ChIP-seq data and the considerations for each step are illustrated in Figure 1 (Nakato and Shirahige, 2017). Sample preparation, sequencing, and mapping (Figure 1A) are common to experiments with one or a few samples (Figure 1B) and experiments with many samples (Figure 1C). First, the sequencing reads are analyzed to assess their quality. After quality assessment, the reads are mapped to the reference genome. Genomic regions that are significantly enriched for ChIP reads compared with input reads are detected as peaks. Other genomic regions are regarded as non-specific background.
Read densities can be visualized along the genome. In sample-scale analysis (Figure 1B), the peak-calling strategy and parameters can be adjusted to the properties of each sample. Such one-by-one adjustment is difficult in large-scale analysis (Figure 1C), where objective quality metrics for multilateral quantitative assessment are needed to filter out poor-quality data automatically. The called peaks represent candidate sites of histone modification or of binding by the targeted protein, and can be used to identify associated functional annotations, such as binding motifs.

Figure 1. ChIP-seq analysis workflow. Adapted from (Nakato and Shirahige, 2017).

A comprehensive comparison of tools for differential ChIP-seq data analysis

A large effort has gone into improving the analytical tools used for ChIP-seq data, and specialized software has been developed for each step. A subset of the software tools available for mapping and peak calling is listed in Table 1 (Furey, 2012).

Table 1. A subset of software tools available for mapping and peak calling in the analysis of ChIP-seq data.
Tool | Notes | Web address
Short-read aligners
BWA (Burrows-Wheeler Aligner) | Fast and efficient; based on the Burrows-Wheeler transform | http://bio-bwa.sourceforge.net
Bowtie | Similar to BWA, part of suite of tools that includes TopHat and CuffLinks for RNA-seq processing | http://bowtie-bio.sourceforge.net
GSNAP (Genomic Short-read Nucleotide Alignment Program) | Considers a set of variant allele inputs to better align to heterozygous sites | http://research-pub.gene.com/gmap
Wikipedia list of aligners | A comprehensive list of available short-read aligners, with descriptions and links to download the software | http://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment
Peak callers
MACS (Model-based Analysis for ChIP-seq) | Fits data to a dynamic Poisson distribution; works with and without control data | http://liulab.dfci.harvard.edu/MACS
PeakSeq | Takes into account differences in mappability of genomic regions; enrichment based on FDR (false-discovery rate) calculation | http://info.gersteinlab.org/PeakSeq
ZINBA (Zero-Inflated Negative Binomial Algorithm) | Can incorporate multiple genomic factors, such as mappability and GC content; can work with point-source and broad-source peak data | http://code.google.com/p/zinba

Besides detecting enriched or bound regions, an important question in ChIP-seq data analysis is how binding differs between conditions. Owing to the noisiness and variability of ChIP-seq data, this question is particularly challenging. Many computational tools for differential ChIP-seq analysis have been developed and published in recent years. These tools differ considerably in their algorithmic setups, in the number and size of the differential regions (DR) they detect, and in their range of applicability. Descriptions of 14 tools for differential ChIP-seq data analysis are listed in Table 2 (Steinhauser et al., 2016).

Table 2. Description of different tools for differential ChIP-seq data analysis.

Tool | Language | Peak calling | Web address
SICER | Bash/Python | Window based approach, merging of eligible clusters in proximity closer than the defined gap size | https://home.gwu.edu/~wpeng/Software.htm
MACS2 | Python | Not required | https://github.com/taoliu/MACS/
ODIN | Python | Not required | http://costalab.org/wp/odin
RSEG | C++ | Not required | http://smithlabresearch.org/software/rseg/
MAnorm | R | Requires peak calling e.g. with MACS | http://bcb.dfci.harvard.edu/~gcyuan/MAnorm/MAnorm.htm
HOMER | Perl & C++ | Window based approach; peak calling done by HOMER | http://homer.salk.edu/homer/index.html
QChIPat | R, Perl & C++ | Peak calling possible with BELT, MACS, SISSRs or FindPeaks | http://motif.bmi.ohio-state.edu/QChIPat/
diffReps | Perl | Sliding window approach | https://github.com/shenlab-sinai/diffreps
DBChIP | R | Requires peak calling e.g. with MACS | http://pages.cs.wisc.edu/~kliang/DBChIP/
ChIPComp | R | Requires peak calling e.g. with MACS | http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html
MultiGPS | Java | Expectation maximization learning | http://mahonylab.org/software/multigps/
MMDiff | R | Requires peak calling e.g. with MACS | https://bioconductor.riken.jp/packages/3.1/bioc/html/MMDiff.html
DiffBind | R | Requires peak calling e.g. with MACS | http://bioconductor.org/packages/release/bioc/html/DiffBind.html
PePr | Python | Window based approach | https://github.com/shawnzhangyx/PePr

A decision tree indicating the proper choice of tool is shown in Figure 2. The choice depends on several factors: the shape of the signal (sharp peaks or broad ChIP enrichments), the presence of replicates, and the presence of an external set of regions of interest. The tools shown in black give good results with default settings, while the tools in gray require more extensive fine-tuning of parameters to achieve optimal results.

Figure 2. Decision tree indicating the proper choice of tool. Adapted from (Steinhauser et al., 2016).
Technical guidelines for the comprehensive analysis of ChIP-seq data

Recent advances in sequencing technologies and analyses make it possible to handle hundreds of ChIP samples simultaneously. However, some issues remain in the analysis of ChIP-seq data, such as false positive peaks, multiply mapped reads, and poor overlap between the results of different peak-finding algorithms. To obtain high-quality results from the computational analysis of ChIP-seq data, the following technical aspects should be considered (Bailey et al., 2013):

1) Sequencing Depth
Effective analysis of ChIP-seq data requires sufficient coverage by sequence reads (sequencing depth). The required depth depends mainly on the size of the genome and on the number and size of the binding sites of the protein. 20 million reads may be adequate for mammalian TFs and for chromatin modifications that are typically localized at specific, narrow sites, such as enhancer-associated histone marks (Landt et al., 2012). Broader factors, including most histone marks, and proteins with more binding sites, such as RNA Pol II, will require up to 60 million reads for mammalian ChIP-seq (Chen et al., 2012). Control samples should be sequenced significantly deeper than the ChIP samples.

2) Read Mapping and Quality Metrics
Before mapping to the reference genome, the reads should be filtered by applying a quality cutoff. It is important to consider the percentage of uniquely mapped reads reported by the mapping tools.

3) Peak Calling
Peak calling predicts the regions of the genome where the ChIPed protein is bound by finding regions with peaks of read enrichment. Achieving a good balance between sensitivity and specificity depends on choosing a peak-calling algorithm and normalization method appropriate to the type of protein ChIPed.
4) Assessment of Reproducibility
To ensure the reproducibility of the experimental results, at least two biological replicates of each ChIP-seq experiment are recommended. The reproducibility of both reads and identified peaks should be examined.

5) Differential Binding Analysis
With the steady rise of NGS (next-generation sequencing) projects, comparative ChIP-seq analysis of an increasing number of protein-bound regions across conditions or tissues is expected. Directly calculating differentially bound regions between treatment samples without controls is not recommended.

6) Peak Annotation
The aim of annotation is to associate the ChIP-seq peaks with functionally relevant genomic regions, such as gene promoters, transcription start sites, and intergenic regions.

7) Motif Analysis
Motif analysis is useful for much more than just identifying the causal DNA-binding motif in TF ChIP-seq peaks. When the motif of the ChIPed protein is already known, motif analysis provides validation of the success of the experiment.

Additional reading: The Advantages and Workflow of ChIP-Seq

References:
Bailey, T., Krajewski, P., Ladunga, I., Lefebvre, C., Li, Q., Liu, T., Madrigal, P., Taslim, C., and Zhang, J. (2013). Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS computational biology 9, e1003326.
Chen, Y., Negre, N., Li, Q., Mieczkowska, J.O., Slattery, M., Liu, T., Zhang, Y., Kim, T.K., He, H.H., Zieba, J., et al. (2012). Systematic evaluation of factors influencing ChIP-seq fidelity. Nature methods 9, 609-614.
Furey, T.S. (2012). ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nature reviews Genetics 13, 840-852.
Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome research 22, 1813-1831.
Machanick, P., and Bailey, T.L. (2011). MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696-1697.
McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501.
Nakato, R., and Shirahige, K. (2017). Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation. Briefings in bioinformatics 18, 279-290.
Steinhauser, S., Kurzawa, N., Eils, R., and Herrmann, C. (2016). A comprehensive comparison of tools for differential ChIP-seq analysis. Briefings in bioinformatics 17, 953-966.
Thomas-Chollier, M., Herrmann, C., Defrance, M., Sand, O., Thieffry, D., and van Helden, J. (2012). RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic acids research 40, e31.
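To make the peak-calling guideline above more concrete, the following sketch tests a single genomic window for enrichment of ChIP reads over a control using a Poisson model. It is a deliberately simplified illustration and not the algorithm of MACS or any other tool in Table 1: the background rate here is just the control count scaled by library size, and the names `poisson_sf`, `window_pvalue`, and `min_lambda` are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam).

    Computed as 1 - P(X <= k - 1); this loses precision for extremely
    small tail probabilities, which is acceptable for a sketch.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_pvalue(
    chip_reads: int,
    control_reads: int,
    chip_total: int,
    control_total: int,
    min_lambda: float = 1.0,
) -> float:
    """P-value for seeing at least chip_reads in a window by chance.

    The expected count is the control count scaled to the ChIP library
    size; min_lambda is a hypothetical floor that keeps windows with an
    empty control from looking infinitely enriched.
    """
    scale = chip_total / control_total
    lam = max(control_reads * scale, min_lambda)
    return poisson_sf(chip_reads, lam)


if __name__ == "__main__":
    # Hypothetical window: 30 ChIP reads against 5 control reads,
    # with equally sized libraries.
    print(window_pvalue(30, 5, 10_000_000, 10_000_000))
```

A real analysis would apply such a test across the genome and correct for multiple testing, which is one reason the guidelines stress matching the peak caller and normalization to the type of protein ChIPed.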
  4. CD Genomics, a world-leading genomic sequencing company, offers genotyping of all 11 HLA loci, including HLA-A, B, C, DRB1, DRB3/4/5, DQA1, DQB1, DPA1, and DPB1, and delivers a thorough and detailed report.

The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult because of the sequence similarity between the polymorphic alleles. CD Genomics has therefore introduced an accurate and reliable HLA genotyping service, which uses NGS technology and amplification methods to address the limitations of traditional Sanger sequencing assays.

HLA genes have an integral role in the human adaptive immune system. For example, the classical class I (HLA-A, -B, and -C) and class II (HLA-DR, -DP, and -DQ) HLA genes function by presenting foreign antigens to T cells to trigger immune responses. These genes display remarkable sequence diversity in the human population: there are >4,000 known alleles for the HLA-B gene alone. Different alleles present different antigens with different efficiencies, and this genetic diversity is believed to be a result of evolution conferring better population-level resistance against the wide range of pathogens to which humans are exposed.

Since the first description of an association between HLA and human disease, HLA molecules have proven to be central to physiology, protective immunity, and deleterious, disease-causing autoimmune reactivity. HLA has been reported to be associated with hundreds of different diseases and conditions, including various autoimmune disorders, and it is also important in transplantation. However, although HLA plays an important role in human health, people do not routinely have their HLA genes typed.
With the current trend toward precision medicine, knowing a person's HLA type will be crucial in the early diagnosis and management of many diseases.

“CD Genomics HLA Typing can be done at different levels of resolution. We offer a worldwide, high-throughput service that can process hundreds of samples at a time. We can only accept genomic DNA. Our aim is to deliver a high-confidence and responsive service to our clients at realistic cost.”

About CD Genomics HLA typing Service

The service is offered for R&D / non-diagnostic purposes only, and results must not be used to inform patient management decisions. It features, first, a comprehensive assay; second, unambiguous results; and third, a sample-to-report solution. The service provides a complete workflow covering sample preparation, sequencing, data analysis, and reporting. For more information, please visit https://www.cd-genomics.com/HLA-Typing.html

About CD Genomics

CD Genomics was established in 2004 and aims to provide the research community with high-quality next-generation sequencing and high-throughput microarray services. As demand for its services has increased, CD Genomics has updated its technology platform to mainstream NGS and microarray instruments.
  5. Introduction to non-coding RNAs

Non-coding RNAs (ncRNAs) used to be considered transcriptional noise or byproducts of RNA processing, but increasing evidence suggests that a majority of them are biologically functional and regulate various activities in the cell. By sequence length, ncRNAs are roughly classified into two categories: small ncRNAs (<200 nt) and long ncRNAs (200 nt or more). The categories of ncRNA are listed in Table 1.

Table 1. Overview of ncRNAs (Fu 2014).
ncRNA | Full name | Function
Housekeeping ncRNAs
rRNA | Ribosomal RNA | Translational machinery
tRNA | Transfer RNA | Amino acid carriers
snRNA | Small nuclear RNA | RNA processing
snoRNA | Small nucleolar RNA | RNA modifications
TR | Telomere RNA | Chromosome end synthesis
Regulatory ncRNAs
miRNA | MicroRNA | RNA stability and translation control
endo-siRNA | Endogenous siRNA | RNA degradation
rasiRNA | Repeat-derived RNA | Transcriptional control
piRNA | Piwi-interacting RNA | Silencing transposon and mRNA decay
eRNA | Enhancer-derived RNA | Regulation of gene expression
PATs | Promoter-associated RNA | Transcription initiation and pause release
lncRNA | Long non-coding RNA | Imprinting, epigenetics, nuclear structure

As shown in Table 1, ncRNAs can also be divided by function into two classes: housekeeping ncRNAs and regulatory ncRNAs. Housekeeping ncRNAs, including rRNA, tRNA, snRNA, snoRNA, and TR, are considered “constitutive” because they are ubiquitously expressed in all cell types and perform functions essential to the organism. Regulatory ncRNAs, including miRNA, endo-siRNA, rasiRNA, piRNA, eRNA, PATs, and lncRNA, have received increasing attention from the research community because of their regulatory roles in gene expression, imprinting, and epigenetics. RNA-seq is a powerful technique for characterizing ncRNA species. Here, we summarize the bioinformatics tools available for ncRNA analysis with NGS data.

Figure 1. ncRNAs as integrated parts of gene network (Fu 2014).
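The length-based split into small and long ncRNAs described earlier in this post can be sketched in a few lines of Python. This is only an illustration: the function name `classify_ncrna_by_length` and the two example sequences are hypothetical, and the default cutoff of 200 is simply the threshold stated in the text.

```python
def classify_ncrna_by_length(sequence: str, cutoff: int = 200) -> str:
    """Label an ncRNA as small or long from its length in nucleotides."""
    length = len(sequence.strip())
    # Sequences shorter than the cutoff are small; the cutoff itself counts as long.
    return "small ncRNA" if length < cutoff else "long ncRNA"


# Hypothetical example sequences, used only to exercise the function.
examples = {
    "short_example": "UGAGGUAGUAGGUUGUAUAGUU",  # 22 nt
    "long_example": "AUGC" * 75,                # 300 nt
}

for name, seq in examples.items():
    print(name, len(seq), classify_ncrna_by_length(seq))
```

Length alone does not say anything about function, so a real annotation workflow would combine it with the database and structure-based tools listed in the tables of this post.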
Small ncRNA analysis
Small RNAs play a crucial role in transcriptional regulation, and studying them is essential to understanding that regulation in full. Their aberrant expression profiles are considered to be associated with cellular dysfunction and disease. Much research therefore focuses on the detection, prediction, or expression quantification of small RNAs, particularly miRNAs, to better understand human health and disease. The available computational tools for small RNA sequencing data are summarized in Table 2.

Table 2. Computational tools for small ncRNA analysis.
Tool | Description
DARIO | Quantify and annotate ncRNAs with access to several public ncRNA databases.
CPSS | Quantify and annotate ncRNAs, with special emphasis on miRNAs.
ncPRO-seq | Detect known small ncRNAs in an unbiased way and discover novel ncRNA species.
CoRAL | Divide small ncRNAs into functional categories based on biologically interpretable features other than sequence; annotate ncRNAs in less well-characterized organisms.
RNA-CODE | Combine secondary structure with de novo assembly; applicable to ncRNA annotation when no reference genome is available.
miRDeep | Detect both known and novel miRNAs in small RNA sequencing data.

Circular RNA detection
CircRNAs are a novel type of RNA that forms a covalently closed continuous loop. Most of them are generated from exonic or intronic sequences, and RNA-binding proteins (RBPs) or reverse complementary sequences are necessary for their biogenesis. CircRNAs are mostly conserved and function as miRNA sponges, regulators of splicing and transcription, or modifiers of parental gene expression. Increasing evidence suggests the potential significance of circRNAs in human diseases such as atherosclerotic vascular disease, neurological disorders, and cancer. Among the tools presented for circRNA detection, CIRI, CIRCexplorer, and KNIFE exhibit a balanced performance between precision and sensitivity.
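Precision and sensitivity, the two measures used above to compare detection tools, are computed from counts of true positives, false positives, and false negatives. The sketch below shows the arithmetic; the function names and the counts are made-up values for illustration only and are not benchmark results for CIRI, CIRCexplorer, or KNIFE.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported circRNAs that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real circRNAs that were reported (also called recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, s: float) -> float:
    """Harmonic mean of precision and sensitivity; high only when both are high."""
    return 2 * p * s / (p + s) if (p + s) else 0.0


# Hypothetical counts for one tool on one dataset.
tp, fp, fn = 80, 20, 20
p = precision(tp, fp)
s = sensitivity(tp, fn)
print(f"precision={p:.2f} sensitivity={s:.2f} F1={f1_score(p, s):.2f}")
```

A tool that reports very few candidates can reach high precision with low sensitivity, and the reverse holds for a permissive tool, which is why a balance between the two is the property worth looking for.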
The available computational tools for circRNA sequencing data are summarized in Table 3.

Table 3. Computational tools for circular RNA detection.
Method | Approach | Dependencies
CIRI | Segmented read-based | BWA, Perl
CIRCexplorer | Segmented read-based | STAR, bedtools, Python (pysam, docopt, interval)
KNIFE | Candidate-based | Bowtie, Bowtie2, TopHat2, SAMtools, Perl

LncRNA investigation
LncRNAs are non-coding RNAs of more than 200 nucleotides, such as lincRNAs and macroRNAs. They serve as platforms for interaction with mRNA, miRNA, or protein, and have emerged as vital regulators in diverse aspects of biology, including transcriptional regulation, post-transcriptional regulation, and chromatin remodeling. A growing body of research suggests that misexpression of lncRNAs contributes to tumor initiation, growth, and metastasis, which makes lncRNAs a promising target for cancer diagnosis and therapy. Combining lncRNA sequencing with matched computational tools is a powerful approach for this purpose.

Table 4. Computational tools for lncRNA investigation.
Tool | Application | Reference
lncRScan | Detect lncRNAs from complex assemblies; distinguish lncRNAs from mRNAs | (Sun et al. 2012)
iSeeRNA | Accurately and quickly detect lincRNAs in large datasets | (Sun et al. 2013)
Annocript | Detect lncRNAs by leveraging public databases and sequence analysis software to verify high non-coding potential | (Musacchia et al. 2015)
LncRNA2Function | Annotate lncRNAs on the premise that lncRNAs and genes with similar expression patterns across diverse conditions may share similar functions and biological pathways | (Jiang et al. 2015)

References:
1. Choudhuri S. Small noncoding RNAs: biogenesis, function, and emerging significance in toxicology. Journal of biochemical and molecular toxicology, 2010, 24(3): 195-216.
2. Fu X D. Non-coding RNA: a new frontier in regulatory biology. National science review, 2014, 1(2): 190-204.
3. Gao Y, Wang J, Zhao F. CIRI: an efficient and unbiased algorithm for de novo circular RNA identification. Genome biology, 2015, 16(1): 4.
4. Jiang Q, Ma R, Wang J, et al. LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data. BMC genomics, 2015, 16(3): S2.
5. Musacchia F, Basu S, Petrosino G, et al. Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs. Bioinformatics, 2015, 31(13): 2199-2201.
6. Qu S, Yang X, Li X, et al. Circular RNA: a new star of noncoding RNAs. Cancer letters, 2015, 365(2): 141-148.
7. Su Y, Wu H, Pavlosky A, et al. Regulatory non-coding RNA: new instruments in the orchestration of cell death. Cell death & disease, 2016, 7(8): e2333.
8. Sun K, Chen X, Jiang P, et al. iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data. BMC genomics, 2013, 14(2): S7.
9. Sun L, Zhang Z, Bailey T L, et al. Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study. BMC bioinformatics, 2012, 13(1): 331.
10. Veneziano D, Nigita G, Ferro A. Computational approaches for the analysis of ncRNA through deep sequencing techniques. Frontiers in bioengineering and biotechnology, 2015, 3: 77.
11. Yang G, Lu X, Yuan L. LncRNA: a link between RNA and cancer. Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms, 2014, 1839(11): 1097-1109.
12. Zeng X, Lin W, Guo M, et al. A comprehensive overview and evaluation of circular RNA detection tools. PLoS computational biology, 2017, 13(6): e1005420.
  6. Microorganisms are widespread in nature and are closely tied to human life and production. They are generally divided into fungi, actinomycetes, bacteria, spirochetes, rickettsia, chlamydia, mycoplasma, and viruses. Microbial whole genome sequencing is an important tool for mapping the genomes of novel organisms, finishing the genomes of known organisms, and comparing genomes across multiple samples. Sequencing the entire microbial genome is important for generating accurate reference genomes, for microbial identification, and for other comparative genomic studies. Comparative genomic analysis based on whole genome sequencing plays an irreplaceable role in studying the pathogenic mechanisms of pathogenic microorganisms, the evolution of pathogenic genes, and the screening of novel, efficient drug targets. Microbial whole genome sequencing can be applied in a wide range of fields.

Diseases
Pathogenic microorganisms include all kinds of microorganisms that cause human diseases, food spoilage, animal infections in the husbandry and breeding industries, and plant diseases. Research focuses on disease-related genes, regulatory and interaction systems, metabolic systems, genetic variation, laboratory diagnosis and specific prevention, drug resistance genes, virulence genes, and so on. Specifically, microbial whole genome sequencing can be applied to:
· Diagnosis and identification of pathogens
· Epidemiological investigation and tracing
· Rapid identification of pathogen characteristics
· Analysis and prediction of disease prevalence
· Vaccine variation monitoring and efficacy evaluation
· Surveillance of foodborne pathogens
· Drug target discovery

Foods
Whole genome sequencing of microorganisms in food, combined with bioinformatics analysis of the data, can help predict genes that play an important role in the fermentation process or in product quality, and provides information about the metabolic pathways of microorganisms and their interactions with the environment.
Microbial whole genome sequencing opens up the possibility of modifying microorganisms to make them more efficient in the production of vinegar, liquor, yogurt, and many other fermented products.

Agriculture
Agricultural microorganisms are involved in planting and breeding, the processing of agricultural products, agricultural biotechnology, agricultural ecology, and other areas of research and application. Microbial whole genome sequencing gives a better understanding of agricultural microbes at the genomic level, and subsequent studies of genome structure and function lay an important foundation for agromicrobiology in the following areas:
· Establishment of agricultural microbial gene banks
· Soil microorganisms (including root microorganisms)
· Plant nutrition
· Biological nitrogen fixation
· Microbial pesticides
· Microbial fertilizers
· Feed additives
· Biogas fermentation

Environment & Industry
Microorganisms are one of the important factors maintaining the circulation of energy and material in the ecosystem. They play an important role in the degradation of various pollutants and harmful substances, and have great application value in energy production and renewable resource utilization. Some environmental microorganisms can adapt to special environments, such as high temperature, low temperature, high pressure, acid, alkali, heavy metals, and high substrate concentrations. Microbial whole genome sequencing reveals how these microorganisms adapt to extreme environments, and supports work in pollution control, environmental protection, oil exploitation, the preservation and transportation of food and medicine, biofuels, the fermentation industry, and many other environmental and industrial fields.

At CD Genomics, our expert team, with its extensive experience, can help you fully understand microbial communities and take advantage of them.
For this purpose, we provide the following services:
· 16S/18S/ITS Amplicon Sequencing
· Metagenomic Shotgun Sequencing
· Viral Metagenomic Sequencing
· Metatranscriptomic Sequencing
· Microbial Whole Genome Sequencing
· Viral Genome Sequencing