Beyond the performance of the immunoprecipitation/enrichment assays themselves, rapid technological progress has produced NGS platforms with widely differing sequencing capacities, ranging from tens of millions (e.g. Illumina Genome Analyzer) to >3 billion (HiSeq 2000) reads per flow cell. As a consequence, the public databases hosting NGS-generated datasets are populated with enrichment profiles that vary greatly in sequencing depth. Importantly, previous studies have demonstrated that the number of discovered binding sites increases with sequencing depth. Intuitively, the number of sequenced reads required to discover all binding events is expected to depend directly on their total number and on their binding pattern (i.e. 'broad' regions covering large parts of a genome will require more reads to be properly identified than 'sharp' patterns with few target sites). When evaluating the quality of NGS-based profiling, it is therefore important to assess whether a given ChIP-seq profile was generated under optimal sequencing conditions, including the minimal sequencing depth required to discover most of the relevant binding events of a given factor.
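The relationship between sequencing depth and the number of discovered sites can be illustrated with a toy subsampling experiment. The sketch below is not the NGS-QC algorithm; it assumes a simulated read set in which each read is already assigned to a candidate site, and it uses a hypothetical detection rule (a site counts as discovered once it collects at least `min_reads` reads). The function names, the threshold, and the simulated site weights are all illustrative assumptions.

```python
import random
from collections import Counter


def sites_detected(read_sites, min_reads=5):
    """Count sites supported by at least `min_reads` reads (toy detection rule)."""
    counts = Counter(read_sites)
    return sum(1 for n in counts.values() if n >= min_reads)


def saturation_curve(read_sites, fractions, min_reads=5, seed=0):
    """Return (depth, detected_sites) pairs for nested random subsamples.

    Reads are shuffled once and prefixes of increasing length are taken,
    so every smaller subsample is contained in the larger ones.
    """
    rng = random.Random(seed)
    shuffled = list(read_sites)
    rng.shuffle(shuffled)
    curve = []
    for fraction in fractions:
        depth = int(len(shuffled) * fraction)
        curve.append((depth, sites_detected(shuffled[:depth], min_reads)))
    return curve


# Simulated data (assumption): 1000 candidate sites, a few strongly bound
# and many weakly bound, sampled by 50,000 reads.
rng = random.Random(42)
site_ids = list(range(1000))
weights = [1.0 / (i + 1) for i in site_ids]
reads = rng.choices(site_ids, weights=weights, k=50_000)

curve = saturation_curve(reads, fractions=[0.1, 0.25, 0.5, 0.75, 1.0])
for depth, detected in curve:
    print(f"{depth:>6} reads -> {detected} sites detected")
```

If the number of detected sites is still rising steeply at full depth, the toy profile is under-sequenced for this detection rule; a curve that flattens suggests that additional reads would add few new sites.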
While a number of analytical methods aiming to address the quality of ChIP-seq datasets have been described, none of them has been shown to be applicable to the large variety of ChIP-seq and enrichment-related NGS profiling assays. In contrast, our proprietary NGS-QC algorithm has been designed to (i) infer a set of global QC indicators (QCis), which reveal the comparability of different enriched NGS datasets, (ii) provide local QCis to judge the robustness of cumulative read counts ('peaks' or 'islands') in a particular region, (iii) provide guidelines for the choice of the optimal sequencing depth for a given target and, finally, (iv) provide a quantitative means of comparing different antibodies and antibody batches for ChIP-seq and related antibody-driven studies.
Importantly, this original concept has been applied to establish the first QC indicator database covering a large collection of ChIP-seq and enrichment-related datasets retrieved from the public domain. Our team is working extensively to cover virtually all publicly available enrichment-related NGS profiling assays, so that users can compare the quality indicators computed by the NGS-QC Generator tool for a given ChIP-seq experiment with those of the published datasets present in the QC indicator database. This information will guide users toward optimizing their ChIP-seq assays, for instance by selecting antibody sources that have been shown to perform well in public datasets, as described by the NGS-QC indicators associated with the related studies in our database.
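One simple way to place a single experiment within a reference collection is a percentile rank. The sketch below is only an illustration of that idea, not the comparison performed by the NGS-QC Generator; the function name is hypothetical, the reference values are made-up numbers, and the actual scale of the database's QC indicators is not assumed here.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference):
    """Percentage of reference values below `value`, counting ties as half."""
    ref = sorted(reference)
    if not ref:
        raise ValueError("reference collection is empty")
    below = bisect_left(ref, value)
    equal = bisect_right(ref, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ref)


# Made-up indicator values for published datasets of one target (assumption).
reference_scores = [0.8, 1.5, 2.1, 3.4, 5.0, 7.2]
my_score = 4.0  # hypothetical indicator value for the user's own dataset

rank = percentile_rank(my_score, reference_scores)
print(f"Dataset ranks at the {rank:.1f}th percentile of the reference set")
```

Whether a high or a low rank is desirable depends on how the indicator is defined, so the rank should be read together with the indicator's documentation.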
Our database can be used to:
- Evaluate the quality of your favourite ChIP-seq or enrichment-related NGS dataset relative to the datasets present in the entire database through our customized Galaxy platform instance
- "Google" the QC descriptors for thousands of publicly available NGS generated datasets
- Search for the QC descriptor ranges of a particular transcription factor, histone modification, or a specific technology, like FAIRE-seq
- Download QC reports and local QC indicator profiles for a given dataset
- Visualize the QC of a range of targets in dynamic plots
- Visit the "database" for a plethora of additional options