QC ChromStater: discover de novo recurring combinatorial and spatial patterns of marks
Documentation

Note: this documentation is a work in progress and subject to change.

This manual presents QC ChromStater, describes how to use the web application, and illustrates features with examples. We recommend that you read it carefully, and use it for future reference.

Last updated: 18 Sept. 2017.

Getting started

QC ChromStater is an online tool to discover the most frequent recurring patterns of enrichment sites across publicly available high-throughput sequencing datasets. The raw data were collected from public repositories, uniformly processed, and annotated using controlled vocabularies. QC ChromStater is part of the QC Genomics tools.

Accessing QC Genomics

QC Genomics is available online at http://ngs-qc.org/qcgenomics. QC ChromStater can only be accessed through NAVi at http://ngs-qc.org/navi.

Overview

QC ChromStater analyzes multiple datasets to discover the most frequent recurring patterns of enrichment sites (named localQCs, see the NGS-QC tutorial). Briefly, QC ChromStater computes the most enriched sites and associates them with gene regulatory elements and gene bodies based on their positions across the whole genome (at a resolution of 500 bp bins).
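As an illustration of the 500 bp resolution, the sketch below maps a genomic interval to the indices of the bins it overlaps. The function name and the half-open coordinate convention are assumptions made for this example; the manual does not specify how QC ChromStater handles coordinates internally.

```python
BIN_SIZE = 500  # bp, the bin resolution stated in this manual


def bins_for_interval(start, end, bin_size=BIN_SIZE):
    """Return the indices of all bins overlapped by the half-open interval [start, end).

    Assumption: 0-based coordinates, with bin i covering [i * bin_size, (i + 1) * bin_size).
    """
    if end <= start:
        return []
    first = start // bin_size
    last = (end - 1) // bin_size
    return list(range(first, last + 1))


# An enrichment site spanning 1200..2100 touches three bins.
print(bins_for_interval(1200, 2100))  # [2, 3, 4]
```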

States

All combinations of localQCs co-localized across the datasets are computed. Each combination is called a state. The QC ChromStater algorithm finds every state and keeps those overlapping with predetermined gene regions, while the less frequent states are discarded. The list of the 100 most frequent states is provided, together with the lists of genes (and gene regions) associated with each of them.

Figure: states computed in a genomic region for 4-5 datasets.

The following rules apply for states:

  • A state is defined by the co-occurrence of enrichment among all or a subset of the input datasets. Each state is therefore assigned to a list of datasets.
  • A state is composed of multiple bins; a bin is a 500 bp window on the genome. Each bin corresponds to the position of a localQC that exists in all the datasets assigned to this state.
  • The maximum possible number of states is 2^(number of datasets) - 1, the number of non-empty subsets of the datasets. Hence, when a single dataset is loaded, only one state is produced: it corresponds to the list of localQCs of this dataset.
  • The total number of bins (genome size / 500 bp) is divided among all the states, so the number of states can never exceed the total number of bins.
  • States having the same ID across states tables are assigned to the same datasets/merged datasets.
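The rules above can be sketched in a few lines. This is a minimal illustration, not the QC ChromStater implementation: it assumes each dataset is given as a set of bin indices carrying a localQC, and that a bin belongs to the state formed by exactly the datasets enriched in that bin. The dataset names "A", "B" and "C" are placeholders.

```python
from collections import defaultdict


def compute_states(localqc_bins):
    """Group bins by the exact set of datasets that have a localQC in them.

    localqc_bins: dict mapping dataset name -> set of bin indices.
    Returns a dict mapping frozenset(dataset names) -> sorted list of bins.
    """
    datasets_per_bin = defaultdict(set)
    for dataset, bins in localqc_bins.items():
        for b in bins:
            datasets_per_bin[b].add(dataset)

    states = defaultdict(list)
    for b, datasets in datasets_per_bin.items():
        states[frozenset(datasets)].append(b)
    return {combo: sorted(bins) for combo, bins in states.items()}


def max_states(n_datasets, n_bins):
    """Upper bound on the number of states: non-empty subsets, capped by the bin count."""
    return min(2 ** n_datasets - 1, n_bins)


toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}
states = compute_states(toy)
print(states[frozenset({"A", "B"})])  # [2, 3]
print(max_states(3, 6_000_000))       # 7
```

Each bin ends up in exactly one state, which is why the states partition the enriched bins and why a single dataset yields a single state.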

Transcripts, genes and gene regions

Transcripts are annotated from the RefSeq table of the UCSC database. Genes/transcripts are decomposed into 5 regions:

  • DISTAL region, 50,000 bp upstream and downstream of the TSS
  • PROXIMAL region, 10,000 bp upstream and downstream of the TSS
  • TSS region, 1,500 bp upstream and downstream of the TSS
  • TES region, 1,500 bp upstream and downstream of the TES
  • BODY region, all bins between the TSS and TES that are not in the TSS or TES regions. This region may not exist for some small genes.

A diagram of the regions can be visualized in the top part of the state panel. The user can change the distances and use the Recompute buttons to apply them.

Bins are assigned to gene regions according to the following priorities:

TSS > TES > BODY > PROXIMAL > DISTAL

For instance, when a bin overlaps the TSS region of gene X and the BODY region of gene Y, only the overlap with the TSS region of gene X is considered. When a bin overlaps the BODY regions of two genes X and Y, the bin is assigned to the BODY regions of both genes, since neither takes priority.
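The priority rule can be sketched as follows, using the default distances listed above. Several details are assumptions made for this example: a bin is represented by its midpoint, transcripts are given as hypothetical (TSS, TES) coordinate pairs, and the transcript names "X" and "Y" mirror the example in the text.

```python
PRIORITY = ["TSS", "TES", "BODY", "PROXIMAL", "DISTAL"]  # highest priority first


def region_of(pos, tss, tes, tss_d=1500, tes_d=1500, prox_d=10_000, dist_d=50_000):
    """Classify a position relative to one transcript, testing regions in priority order."""
    if abs(pos - tss) <= tss_d:
        return "TSS"
    if abs(pos - tes) <= tes_d:
        return "TES"
    if min(tss, tes) < pos < max(tss, tes):
        return "BODY"
    if abs(pos - tss) <= prox_d:
        return "PROXIMAL"
    if abs(pos - tss) <= dist_d:
        return "DISTAL"
    return None


def assign_bin(bin_index, transcripts, bin_size=500):
    """Return the (transcript, region) pairs kept for a bin after applying the priority."""
    pos = bin_index * bin_size + bin_size // 2  # bin midpoint (assumption)
    hits = []
    for name, (tss, tes) in transcripts.items():
        region = region_of(pos, tss, tes)
        if region is not None:
            hits.append((name, region))
    if not hits:
        return []
    best = min(PRIORITY.index(region) for _, region in hits)
    return [(name, region) for name, region in hits if PRIORITY.index(region) == best]


transcripts = {"X": (100_000, 130_000), "Y": (90_000, 140_000)}
print(assign_bin(200, transcripts))  # [('X', 'TSS')]: the TSS of X wins over the BODY of Y
print(assign_bin(230, transcripts))  # [('X', 'BODY'), ('Y', 'BODY')]: a tie keeps both
```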

The list of generated states is filtered using the frequency of overlapping transcript regions. The list of transcript names is converted into gene names; however, all the statistics are based on the number of transcripts.
Note: a gene region, e.g. the BODY region of the gene XYZ, can overlap multiple times with bins associated with the same state. These repeated overlaps are not taken into account in the statistics provided, nor during the state filtering step.

States page

Gene regions distances

The top part of the states page allows the user to visualize the gene regions and to select custom distances for all 5 regions. Distances for TSS and TES regions can be specified upstream and downstream independently.
Note: regions cannot overlap with each other.

To apply custom distances use the Recompute button.

States tables

QC ChromStater computes states among the list of all datasets provided as input (main states table). When cell line/tissue annotations are available, it also computes states for lists of datasets grouped by identical cell type/tissue. In addition to the main states table, a states table is therefore generated for each cell line/tissue.

To switch between the states tables use the select box labeled Cell Type/Tissue. The default value Total cells displays the main states table, computed for the whole list of input datasets.
Note: the select box is disabled when all the datasets belong to the same cell type/tissue.

Each row gives information about a given state:

Column descriptions:

Functional Annotation

See the section Functional annotation and FA table for more information.

State ID

The unique identifier of the state, e.g. s01. This identifier is unique only among the list of states generated for the current list of datasets. Identifiers are assigned to states based on the list of input datasets, or on the target names when datasets are merged. When target names are used, the same state ID can appear in several states tables for different cell lines if the assigned datasets have different cell/tissue annotations.

Click on the column header to sort the states by bin occupancy clusters.

Bin occupancy columns

Each column header contains the name of a dataset or a target molecule if the option merge by target has been selected. The content of the cells indicates the percentage of bins used by the state compared to the total number of bins in the dataset or merged-datasets.

Move the mouse over a cell to display the column name. Click on the column header to sort the table according to the values. To get more information on the LocalQCs localization across the gene regions, click on a cell to open the State panel.

States can be filtered out according to the minimum percentage of LocalQCs used per dataset, set in the Options panel. By default, a state must correspond to at least 1% of the LocalQCs in at least one of the datasets assigned to it.

# of target molecules/datasets

Indicates the number of datasets/target molecules assigned to the state. Move the mouse over the number to get the list of dataset/target molecule names. Click on the cell to open the State panel and display more information about these datasets.

Transcripts coverage

Displays, as a heatmap, the percentage of transcripts covered by the state in each region. For instance, in human, ~40,000 transcripts have been annotated; the column TSS indicates the percentage of them having at least one bin of the state in their TSS region. The bins of a state can overlap with each of the five transcript regions.

Move the mouse over a cell to view the percentage of transcripts. Click on a cell of the heatmap to open the State panel and display the corresponding list of genes.
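The transcript coverage values can be illustrated with a small sketch. It assumes the bins of one state have already been assigned to (transcript, region) pairs; the transcript names and the total of 4 transcripts are placeholders. A transcript counts once per region however many bins of the state fall in it, consistent with the note in the section Transcripts, genes and gene regions.

```python
def transcript_coverage(assignments, n_transcripts):
    """Percentage of transcripts with at least one state bin in each region.

    assignments: iterable of (transcript name, region) pairs for the bins of one state.
    n_transcripts: total number of annotated transcripts.
    """
    covered = {}
    for transcript, region in assignments:
        covered.setdefault(region, set()).add(transcript)
    return {region: 100.0 * len(names) / n_transcripts
            for region, names in covered.items()}


assignments = [("T1", "TSS"), ("T1", "TSS"), ("T2", "TSS"), ("T2", "BODY")]
coverage = transcript_coverage(assignments, n_transcripts=4)
print(coverage)  # {'TSS': 50.0, 'BODY': 25.0}
```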

Click on the Download button above the states table to download it as a TSV file.
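The minimum-percentage filter described under Bin occupancy columns can be sketched as below. This is an interpretation of the rule as stated in this manual (a state is kept when it reaches the threshold in at least one of its datasets), not the tool's actual code; the data structures and dataset names are hypothetical.

```python
def filter_states(states, localqc_bins, min_percent=1.0):
    """Keep states covering at least min_percent of the LocalQCs of one of their datasets.

    states: dict mapping frozenset(dataset names) -> list of bins in the state.
    localqc_bins: dict mapping dataset name -> set of all LocalQC bins of that dataset.
    """
    kept = {}
    for datasets, bins in states.items():
        percents = [100.0 * len(bins) / len(localqc_bins[d]) for d in datasets]
        if max(percents) >= min_percent:
            kept[datasets] = bins
    return kept


localqc_bins = {"A": set(range(1000)), "B": set(range(995, 1200))}
states = {
    frozenset({"A"}): list(range(0, 995)),
    frozenset({"A", "B"}): list(range(995, 1000)),  # 0.5% of A, about 2.4% of B
    frozenset({"B"}): list(range(1000, 1200)),
}
kept_default = filter_states(states, localqc_bins)                   # shared state is kept
kept_strict = filter_states(states, localqc_bins, min_percent=3.0)   # shared state is dropped
print(len(kept_default), len(kept_strict))  # 3 2
```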

State panel

The state panel provides detailed information about a given state. It can only be opened by clicking on a row of a states table. The select boxes initially show the selected state; use them to navigate to other states. If provided, the functional annotation is indicated as well. The panel has 3 tabs:

Genome occupancy

Shows general information about the state bins and the coverage of transcripts regions.

Data sets

Shows the GSM ID, the target molecule, the cell and the stamp of the datasets assigned to the state. The % of LocalQCs covered by the state is also indicated. The user can select or unselect datasets using the check boxes, then click on the Recompute button to run a new request using only the selected datasets.

Genes

Displays the list of genes corresponding to the list of transcripts covered by the states. Use the buttons to filter the list by transcript regions.

When the selected state ID is found in multiple states tables, click on the button Cell line/Tissue Specific to keep only the genes that are specific to the cell line of the currently selected state. This option is only available when QC ChromStater has been run with the option merge per target activated.

The button Functional Annotation Specific merges the lists of genes of all states having the same functional annotation/label, restricted to states from the same cell/tissue (or to states in the global states table).
Note: the button is enabled only when a state with functional annotation is selected.

Functional annotation and FA table

The user can assign a functional annotation or any label to a given state. The same label can be assigned to multiple states; in that case, the lists of genes of the states sharing this label can be merged.
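Merging gene lists by label can be pictured with the short sketch below. The state IDs follow the s01-style naming used in this manual, while the labels, gene names and the "unlabeled" key are placeholders chosen for the example.

```python
def genes_by_label(state_genes, state_labels, unlabeled="unlabeled"):
    """Merge the gene sets of all states that share the same label.

    state_genes: dict mapping state ID -> set of gene names.
    state_labels: dict mapping state ID -> label; states without a label are pooled.
    """
    merged = {}
    for state_id, genes in state_genes.items():
        label = state_labels.get(state_id, unlabeled)
        merged.setdefault(label, set()).update(genes)
    return merged


state_genes = {
    "s01": {"GENE_A", "GENE_B"},
    "s02": {"GENE_B", "GENE_C"},
    "s03": {"GENE_D"},
}
state_labels = {"s01": "active promoter", "s02": "active promoter"}
merged = genes_by_label(state_genes, state_labels)
print(sorted(merged["active promoter"]))  # ['GENE_A', 'GENE_B', 'GENE_C']
```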

To access the FA table, select a cell line/tissue and assign at least one label to a state. Then click on the button Genes/FA table located in the upper right corner above the states table.

In the FA table, genes are grouped by state labels. The table provides the lists of genes for all combinations of functional annotations/labels, together with the list of genes whose regions overlap at least one state of these functional annotations. When some states of the states table have not been labeled, the last row of the table corresponds to the list of genes that are not in annotated states.

Functional annotations can be quickly disabled using the Enable/Disabled button in the column header. Click on the buttons in the first column to show the list of genes. The list of genes can be restricted to cell line/tissue-specific genes.

In addition, the user can filter the genes to those overlapping at least one of the enabled gene regions by using the buttons above the table. The behaviour is similar to that of the buttons found in the Genes tab of the state panel.

To search for a list of genes, use the area above the table. To visualize matching genes, move the mouse over the corresponding cell in the column named User list.
Note: genes are searched only in the full list of genes; the search is not cell line/tissue specific.

Click on the button States table to leave the FA table.

Options page

LocalQC Dispersion 10%

Enriched sites across the genome are detected using localQCs. LocalQC positions have been annotated 5 times using a random sampling algorithm. The user can choose to compare datasets using only the most reproducible localQCs (5/5) or to include less reproducible localQCs (down to 1/5).
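One way to picture the reproducibility setting is as a support threshold over the 5 sampling runs. The sketch below is an assumption about how such a filter could work, not a description of the NGS-QC pipeline: each run is represented as a set of bin indices, and a bin is kept when it is detected in at least min_support runs.

```python
from collections import Counter


def reproducible_bins(replicate_bins, min_support):
    """Return the bins detected in at least min_support of the sampling runs.

    replicate_bins: list of sets of bin indices, one set per run.
    """
    counts = Counter(b for bins in replicate_bins for b in set(bins))
    return {b for b, n in counts.items() if n >= min_support}


runs = [{1, 2, 3}, {1, 2}, {1, 2, 4}, {1, 3}, {1, 2}]
print(sorted(reproducible_bins(runs, 5)))  # [1]: found in all 5 runs
print(sorted(reproducible_bins(runs, 1)))  # [1, 2, 3, 4]: found in at least 1 run
```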

Clustering

A clustering algorithm is applied to the bin coverage matrix prior to display. The user can choose among several clustering distances and linkage methods.

Global Background Threshold

In order to exclude background noise during localQCs determination, we have incorporated a Poisson distribution-based model to estimate the background level threshold.
From the Poisson distribution, a λ value (intensity cutoff) is estimated and subtracted from all the bins when calculating read count intensity. Users can choose to increase this threshold X-fold, or to use an absolute value by unchecking the box.
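The background subtraction can be sketched under explicit assumptions. The manual does not say how λ is estimated; here it is taken as the mean read count per bin, which is the maximum-likelihood estimate of a Poisson rate. The fold and absolute parameters stand in for the two user options, and corrected intensities are floored at zero.

```python
def subtract_background(counts, fold=1.0, absolute=None):
    """Subtract a background threshold from per-bin read counts.

    counts: list of read counts, one per bin.
    fold: multiplier applied to the estimated lambda (assumed meaning of "X-fold").
    absolute: if given, used directly as the threshold instead of fold * lambda.
    """
    lam = sum(counts) / len(counts)  # Poisson rate estimate (assumption: sample mean)
    threshold = absolute if absolute is not None else fold * lam
    return [max(c - threshold, 0) for c in counts]


counts = [0, 1, 2, 1, 0, 40, 2, 2]  # mean is 6.0
print(subtract_background(counts))               # only the enriched bin stays above zero
print(subtract_background(counts, absolute=10))  # same bin, with a fixed threshold of 10
```

Raising fold makes the filter stricter, so fewer bins keep a non-zero intensity.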

Merge per target molecule

Pools the datasets sharing the same target molecule.
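As a rough illustration, pooling can be thought of as taking the union of the localQC bins of all datasets with the same target. Whether QC ChromStater pools by union or by another rule is not stated in this manual, so the union is an assumption; the dataset IDs and target names below are placeholders.

```python
def merge_by_target(localqc_bins, target_of):
    """Pool the localQC bins of datasets that share a target molecule.

    localqc_bins: dict mapping dataset ID -> set of bin indices.
    target_of: dict mapping dataset ID -> target molecule name.
    """
    merged = {}
    for dataset, bins in localqc_bins.items():
        merged.setdefault(target_of[dataset], set()).update(bins)
    return merged


localqc_bins = {"ds1": {1, 2}, "ds2": {2, 3}, "ds3": {7}}
target_of = {"ds1": "H3K4me3", "ds2": "H3K4me3", "ds3": "H3K27me3"}
pooled = merge_by_target(localqc_bins, target_of)
print(sorted(pooled["H3K4me3"]))  # [1, 2, 3]
```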

Interacting

The accompanying video describes how to use the QC ChromStater tool and gives insight into interpreting the results.
