QC comparator: a fast, integrative and exhaustive NGS datasets comparison tool

Documentation

Note: This documentation is a work in progress and subject to change.

This manual presents QC comparator, describes how to use the web application, and illustrates features with examples. We recommend that you read it carefully, and use it for future reference.

Last updated: 07 Sept. 2017.

Getting started

QC comparator is an online tool for comparing publicly available high-throughput sequencing datasets. The raw data were collected from public repositories, uniformly processed, and annotated using controlled vocabularies. QC comparator is part of the QC Genomics tools.

Accessing QC Genomics

QC Genomics is available online at http://ngs-qc.org/qcgenomics. QC comparator can only be accessed through NAVi (http://ngs-qc.org/navi).

Matrix and enrichment annotation (right panel)

The Matrix of dissimilarity

The QC comparator is a comparison tool that aims to visualize the genome-wide similarity, i.e. overlapping positions of enriched sites, between multiple NGS datasets. The tool outputs a dissimilarity matrix showing the pairwise dissimilarities (distances) between datasets that are provided as input by the user.

The distance between two datasets ranges from 0 to 1, where 0 corresponds to identical datasets and 1 to completely different datasets, i.e. datasets without any enrichment site in common. Because the matrix is symmetric, the value in cell (X, Y) is identical to the value in cell (Y, X), and the diagonal corresponds to the distance between a dataset and itself (colored in white).

The matrix, the enrichment panel and the dataset table are accessible through the button located on the upper right corner of the page (N datasets loaded).

Distances between two datasets are computed from the common presence or absence of local QC indicators (localQCs) at all positions on the genome. LocalQCs are the read-count intensities computed per 500 bp bin across the genome with a read-count dispersion <10% (see NGS-QC tutorial). LocalQC positions correlate with enriched sites.
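As an illustration of the paragraph above, the following minimal sketch turns per-bin dispersion values into a set of localQC bins. It is not the NGS-QC generator's implementation: the input dictionary, the 0-based coordinates, the function names, and the representation of dispersion as a fraction are assumptions made for this example. How the dispersion itself is computed is described in the NGS-QC tutorial and is not reproduced here.

```python
BIN_SIZE = 500            # bin width in bp, as stated in the text
DISPERSION_CUTOFF = 0.10  # "<10%" expressed as a fraction (assumption)


def bin_index(position):
    """Map a 0-based genomic position to the index of its 500 bp bin."""
    return position // BIN_SIZE


def local_qc_bins(dispersion_by_bin, cutoff=DISPERSION_CUTOFF):
    """Keep the bins whose read-count dispersion is strictly below the cutoff."""
    return {key for key, value in dispersion_by_bin.items() if value < cutoff}


# Hypothetical dispersion values keyed by (chromosome, bin index).
dispersions = {
    ("chr1", 0): 0.04,
    ("chr1", 1): 0.25,
    ("chr1", 2): 0.09,
    ("chr2", 7): 0.10,  # exactly 10%: not strictly below the cutoff
}

bins = local_qc_bins(dispersions)
print(sorted(bins))
```

Representing each dataset as a set of bins makes the "common presence or absence" comparison a matter of set operations.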

The user can choose among multiple distance metrics, clustering distances and linkage methods in the options.

Note: Distances are precomputed from datasets analyzed by the NGS-QC generator. Some of them may not be available yet; read about missing datasets for more information.

Interacting

Place your mouse on a cell to get detailed information on a pairwise distance:

Field        Description
Profile X/Y  The GSM ID, the target molecule and the stamp associated with the profile on X or Y axis
Distance     The distance computed according to the metric selected in the options panel.
Overlap      (X∩Y)/X is the ratio of overlapping localQCs to the total of localQCs in X. (X∩Y)/Y is the ratio of overlapping localQCs to the total of localQCs in Y.
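The two overlap ratios can be sketched as follows, assuming each dataset is represented as a set of localQC bin identifiers. The function name and the example sets are hypothetical, and returning 0.0 for an empty dataset is a choice made for this sketch only.

```python
def overlap_ratios(x_bins, y_bins):
    """Return ((X∩Y)/X, (X∩Y)/Y) for two sets of localQC bins."""
    shared = len(x_bins & y_bins)
    ratio_x = shared / len(x_bins) if x_bins else 0.0
    ratio_y = shared / len(y_bins) if y_bins else 0.0
    return ratio_x, ratio_y


# Hypothetical bin identifiers for two datasets.
x = {1, 2, 3, 4}
y = {3, 4, 5, 6, 7, 8}

ratio_x, ratio_y = overlap_ratios(x, y)
print(ratio_x, ratio_y)
```

The two ratios differ whenever the datasets contain different numbers of localQCs, which is why both are reported.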

Zooming

The user can enlarge a subpart of the matrix with the mouse. Left-click and move the cursor down and to the right to select the region of interest; a blank square then becomes visible on the matrix. Click on the square to rescale the matrix to the distances within the square. You can reset the view using the Reset Zoom button located under the matrix.
Note: it is only possible to zoom into or display a square matrix.

Zooming updates the annotation statistics in the annotation panel as well as the list of datasets displayed in the dataset table.
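Conceptually, zooming amounts to extracting a square sub-matrix together with the datasets on each axis. The sketch below illustrates this under assumptions: the matrix is a list of rows, and the function name, parameters, and dataset identifiers are hypothetical rather than taken from the application.

```python
def zoom(matrix, x_labels, y_labels, col_start, row_start, size):
    """Extract a square size x size region and the datasets on each axis."""
    if size < 1 or row_start < 0 or col_start < 0:
        raise ValueError("selection must be a non-empty square inside the matrix")
    if row_start + size > len(y_labels) or col_start + size > len(x_labels):
        raise ValueError("selection exceeds the matrix bounds")
    rows = range(row_start, row_start + size)
    cols = range(col_start, col_start + size)
    sub = [[matrix[r][c] for c in cols] for r in rows]
    return sub, [x_labels[c] for c in cols], [y_labels[r] for r in rows]


# Hypothetical symmetric matrix of four datasets.
labels = ["D1", "D2", "D3", "D4"]
matrix = [
    [0.0, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]

sub, x_visible, y_visible = zoom(matrix, labels, labels, col_start=1, row_start=0, size=2)
print(sub, x_visible, y_visible)
```

In this example the zoomed region is off the diagonal, so the datasets on the X and Y axes are no longer the same list.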

Color scaling

The default color scale ranges from 0 to 1. To scale the colors to the minimum and maximum distance values in the displayed matrix, click on the "Scale color" button below the heatmap; click again to return to the default scale. The limits can also be changed manually by dragging the slider handles located below the heatmap. Matrix cells whose distances fall below the lower limit (resp. above the upper limit) are colored in black (resp. red), and the number of affected cells is indicated below the annotation panel. Click on the black and red boxes to select custom colors for these outliers.
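The scaling logic described above can be sketched as follows. This is an illustration under assumptions, not the application's code: the function names are hypothetical, the distances are given as a flat list, and the linear mapping of in-range values onto the color scale is assumed.

```python
def auto_limits(distances):
    """Limits matching the minimum and maximum of the displayed distances."""
    return min(distances), max(distances)


def normalize(distance, lower, upper):
    """Map a distance linearly to [0, 1] within the chosen limits."""
    if upper == lower:
        return 0.0
    return (distance - lower) / (upper - lower)


def count_outliers(distances, lower, upper):
    """Count the cells below the lower limit and above the upper limit."""
    below = sum(1 for d in distances if d < lower)
    above = sum(1 for d in distances if d > upper)
    return below, above


# Hypothetical distances from a displayed matrix.
distances = [0.1, 0.25, 0.4, 0.55, 0.9]

limits = auto_limits(distances)
below, above = count_outliers(distances, lower=0.2, upper=0.6)
print(limits, below, above, normalize(0.4, 0.2, 0.6))
```

With the default limits of 0 and 1, no distance can be an outlier, which is consistent with both counters starting at zero.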

Annotation panel

The target molecule, cell/tissue and quality stamp attributes are provided in the annotation panel and in the dataset table. The annotation panel allows the user to locate, on the matrix, the datasets sharing the same target molecule, cell/tissue or stamp, and therefore to annotate clusters. Switch the type of annotation using the tabs, then move the mouse over each row to highlight the corresponding datasets in the matrix. Highlighted datasets are surrounded by a red square and unshaded.

The tabs Datasets X & Y, Datasets X, Datasets Y allow the user to filter the annotation of datasets for a specific axis.
Note: when the displayed matrix is symmetric, the lists of datasets on X and Y are identical; the tabs Datasets X and Datasets Y are therefore disabled.

The annotation list always reflects the datasets visible in the displayed matrix; consequently, zooming updates the annotation panel.

By clicking on a row the user can display the list of corresponding datasets in the dataset table. Click on multiple rows to combine the effect.

Dataset table

The list of datasets is located below the heatmap. The GSM ID, the target molecule, the cell (NA if not available) and the stamp are provided.

The list of visible datasets is modified according to the user interactions with the matrix:

  • Zooming reduces the list to the datasets used to compute the zoomed part of the matrix.
  • Selecting a row in the annotation panel reduces the list to the corresponding datasets. Selecting multiple rows combines the lists of datasets.
  • To get back the full list of datasets, click on the Reset Zoom button located under the matrix.
Note: the color of the table header indicates which list of datasets is displayed: blue when filtered by zoom, green when filtered by the annotation panel, gray otherwise.
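The annotation-based filtering described in this section can be sketched as below. The records, field names, and function names are hypothetical and only illustrate the behavior: counting datasets per annotation value, and combining the lists of several selected rows.

```python
from collections import Counter

# Hypothetical dataset records; the identifiers are not real GSM IDs.
datasets = [
    {"id": "D1", "target": "H3K4me3", "cell": "HeLa"},
    {"id": "D2", "target": "H3K27ac", "cell": "HeLa"},
    {"id": "D3", "target": "H3K4me3", "cell": "K562"},
    {"id": "D4", "target": "CTCF", "cell": "NA"},
]


def annotation_counts(records, field):
    """Number of visible datasets per annotation value (one panel row each)."""
    return Counter(record[field] for record in records)


def filter_by_annotation(records, field, selected_values):
    """Keep the datasets matching any selected row; selections are combined."""
    selected = set(selected_values)
    return [record for record in records if record[field] in selected]


counts = annotation_counts(datasets, "target")
filtered = filter_by_annotation(datasets, "target", ["H3K4me3", "CTCF"])
print(counts, [record["id"] for record in filtered])
```

Selecting two rows returns the union of the matching datasets, mirroring the statement that selecting multiple rows combines the lists.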

Missing datasets

We are putting effort into computing the distances between all the datasets for a given assembly, but some datasets analyzed by the NGS-QC generator might not have been processed by the QC comparator yet. In that case, those datasets appear neither in the matrix nor in the dataset table, but can be viewed by clicking on the red Missing Datasets button. The button is disabled if all initially requested datasets have been processed.

Datasets selection

The user can select or un-select datasets using the checkbox located on the right side of each row. To select or un-select all datasets in the table, click on the checkbox located in the header of the table.

This allows the user, for instance, to un-select all the datasets having a specific cell or target from the matrix and recompute the distances without using the Recompute button.
Note: to add new datasets to the comparison matrix, the user must use NaVi.

Navigation

To send the list of selected datasets to other tools, use the dropdown button Send to.

Options panel

To access the options panel, click on the Options button in the top navigation bar.

Similarity measure

The similarity measure is the function used to compute the distance between two datasets. Currently only Tanimoto (Rogers and Tanimoto, 1960) and Dice (Dice, 1945) measures are available.
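For orientation, the two measures can be sketched on sets of localQC bins as below. The exact formulas used by QC comparator are not given in this manual, so this is an assumption: the Tanimoto distance is written here in its set form (one minus intersection over union), which agrees with the statement that datasets without any common site have a distance of 1. The measure cited from Rogers and Tanimoto may additionally count shared absences. Function names and example sets are hypothetical.

```python
def tanimoto_distance(x_bins, y_bins):
    """1 - |X∩Y| / |X∪Y| (set form of the Tanimoto coefficient; assumption)."""
    union = len(x_bins | y_bins)
    if union == 0:
        return 0.0  # two empty datasets are treated as identical in this sketch
    return 1.0 - len(x_bins & y_bins) / union


def dice_distance(x_bins, y_bins):
    """1 - 2|X∩Y| / (|X| + |Y|)."""
    total = len(x_bins) + len(y_bins)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(x_bins & y_bins) / total


# Hypothetical localQC bins for two datasets.
x = {1, 2, 3, 4}
y = {3, 4, 5, 6}

print(tanimoto_distance(x, y), dice_distance(x, y))
```

Both functions return 0 for identical datasets and 1 for datasets with no bin in common; for partial overlaps, the Dice distance is the smaller of the two.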

LocalQC Dispersion 10%

Enriched sites across the genome are detected using localQCs. LocalQC positions have been annotated 5 times using a random sampling algorithm. The user can choose to compare datasets using the most reproducible localQCs (5/5) or less reproducible localQCs (1/5).
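One way to read the 5/5 and 1/5 settings is as a minimum number of sampling runs in which a bin must be annotated as a localQC. The sketch below follows that reading, which is an assumption and not confirmed by this manual; the function name and the example runs are hypothetical.

```python
from collections import Counter


def reproducible_bins(sampling_runs, min_runs):
    """Keep the bins annotated as localQC in at least min_runs sampling runs."""
    counts = Counter(b for run in sampling_runs for b in set(run))
    return {b for b, n in counts.items() if n >= min_runs}


# Hypothetical localQC bins found in each of the 5 random samplings.
runs = [{1, 2, 3}, {1, 2}, {1, 3}, {1, 2, 4}, {1}]

strict = reproducible_bins(runs, min_runs=5)   # "5/5" under this reading
lenient = reproducible_bins(runs, min_runs=1)  # "1/5" under this reading
print(sorted(strict), sorted(lenient))
```

Under this reading, the stricter setting keeps fewer bins per dataset, which changes the sets being compared and therefore the distances.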

Clustering

A clustering algorithm is applied to the matrix prior to its display. Choose among multiple clustering distances and clustering linkages. The resulting clusters may change depending on these options.
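To illustrate how clustering reorders the matrix before display, here is a minimal pure-Python sketch of agglomerative clustering with average linkage. It is not the application's implementation. It assumes that the pairwise dissimilarities are used directly as the clustering distance, whereas the tool offers several distances and linkages. The function names and the example matrix are hypothetical.

```python
def average_linkage_order(matrix):
    """Order dataset indices by repeatedly merging the two closest clusters."""
    clusters = [[i] for i in range(len(matrix))]

    def cluster_distance(a, b):
        # Average linkage: mean of all pairwise distances between two clusters.
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters[0] if clusters else []


def reorder(matrix, order):
    """Apply the same ordering to the rows and columns of the matrix."""
    return [[matrix[i][j] for j in order] for i in order]


# Hypothetical matrix where datasets 0 and 2, and 1 and 3, are similar.
matrix = [
    [0.00, 0.90, 0.10, 0.80],
    [0.90, 0.00, 0.85, 0.20],
    [0.10, 0.85, 0.00, 0.95],
    [0.80, 0.20, 0.95, 0.00],
]

order = average_linkage_order(matrix)
clustered = reorder(matrix, order)
print(order)
```

After reordering, similar datasets sit next to each other, so low-distance cells group into blocks along the diagonal. A different linkage or clustering distance can produce a different order.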

Note: changing the options will not recompute the matrix on the fly. To apply the new options, click on the Recompute button located in the top right of the matrix page.