Note: This documentation is a work in progress and subject to change.
This manual introduces QC Comparator, describes how to use the web application, and illustrates its features with examples. We recommend that you read it carefully and keep it for future reference.
Last updated: 07 Sept. 2017.
QC Comparator is an online tool for comparing publicly available high-throughput sequencing datasets. The raw data were collected from public repositories, uniformly processed, and annotated using controlled vocabularies. QC Comparator is part of the QC Genomics tools.
QC Comparator aims to visualize the genome-wide similarity, i.e. the overlapping positions of enriched sites, between multiple NGS datasets. The tool outputs a dissimilarity matrix showing the pairwise dissimilarities (distances) between the datasets provided as input by the user.
The distance between two datasets ranges from 0 to 1, where 0 corresponds to identical datasets and 1 to completely different datasets, i.e. datasets with no enriched site in common. Because the matrix is symmetric, the cell at position X,Y is identical to the cell at position Y,X, and the diagonal (colored in white) corresponds to the distance between a dataset and itself.
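The structure of such a matrix can be illustrated with a minimal Python sketch. It is not the tool's implementation: the dataset labels, the representation of each dataset as a set of enriched bin indices, and the placeholder distance function are all assumptions made for illustration.

```python
from itertools import combinations


def dissimilarity_matrix(datasets, distance):
    """Build a symmetric matrix of pairwise distances.

    datasets: dict mapping a dataset label to a set of enriched bin indices.
    distance: function taking two sets and returning a value in [0, 1].
    """
    labels = list(datasets)
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]  # the diagonal stays at 0
    for i, j in combinations(range(n), 2):
        d = distance(datasets[labels[i]], datasets[labels[j]])
        matrix[i][j] = d
        matrix[j][i] = d  # cell (X, Y) equals cell (Y, X)
    return labels, matrix


def shared_fraction_distance(x, y):
    # Placeholder distance (assumption): 1 minus the shared fraction of bins.
    union = x | y
    return 1.0 - len(x & y) / len(union) if union else 0.0


# Hypothetical datasets, each reduced to a set of enriched bin indices.
example = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4},  # identical to A
    "C": {10, 11},      # nothing in common with A or B
}
labels, matrix = dissimilarity_matrix(example, shared_fraction_distance)
```

In this example the cell for A and B holds 0 (identical datasets) and the cells involving C hold 1 (no enriched site in common).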
Distances between two datasets are computed from the common presence or absence of local QC indicators (localQCs) at all positions on the genome. LocalQCs are the read-count intensities computed per 500 bp bin across the genome with a read-count dispersion <10% (see NGS-QC tutorial). LocalQC positions correlate with enriched sites.
The user can choose among multiple distance metrics, cluster and linkage methods in the options.
Place your mouse on a cell to get detailed information on a pairwise distance:
|Profile X/Y||The GSM ID, the target molecule and the stamp associated with the profile on X or Y axis|
|Distance||The distance computed according to the metric selected in the options panel.|
|Overlap||(X∩Y)/X is the ratio of overlapping localQCs to the total of localQCs in X. (X∩Y)/Y is the ratio of overlapping localQCs to the total of localQCs in Y.|
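The two overlap ratios shown in the tooltip can be reproduced with a short Python sketch. Representing each profile as a set of localQC bin indices is an assumption made for illustration, as are the function name and the example values.

```python
def overlap_ratios(x, y):
    """Return ((X∩Y)/X, (X∩Y)/Y) for two sets of localQC bin indices."""
    shared = len(x & y)
    ratio_x = shared / len(x) if x else 0.0
    ratio_y = shared / len(y) if y else 0.0
    return ratio_x, ratio_y


# Hypothetical profiles: X has 4 localQCs, Y has 8, and 2 are shared.
profile_x = {1, 2, 3, 4}
profile_y = {3, 4, 5, 6, 7, 8, 9, 10}
ratio_x, ratio_y = overlap_ratios(profile_x, profile_y)
# ratio_x is 0.5 (half of X overlaps Y); ratio_y is 0.25
```

The two ratios differ whenever the profiles contain different numbers of localQCs, which is why both are reported.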
The user can enlarge a subregion of the matrix with the mouse. Left-click and drag the cursor down and to the right to select the region of interest; a blank square then appears on the matrix. Click on the square to rescale the matrix to the distances within it.
You can reset the view using the Reset Zoom button located under the matrix.
Note: it is only possible to zoom into or display a square matrix.
The default color scale ranges from 0 to 1. To scale the colors to the minimum and maximum distance values in the current matrix, click on the "Scale color" button below the heatmap; click again to return to the default scale. The limits can also be changed manually by dragging the slider handles located below the heatmap. Matrix cells whose distances fall below the lower limit (resp. above the upper limit) are colored in black (resp. red), and the number of affected cells is indicated below the annotation panel. Click on the black and red boxes to select custom colors for the outliers.
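The effect of the scale limits can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's code: the function names are hypothetical, and excluding the diagonal from both the data limits and the outlier counts is an assumption.

```python
def off_diagonal(matrix):
    """Yield every distance except the diagonal (a dataset vs. itself)."""
    for i, row in enumerate(matrix):
        for j, d in enumerate(row):
            if i != j:
                yield d


def data_limits(matrix):
    """Minimum and maximum distance in the matrix (what "Scale color" would use)."""
    values = list(off_diagonal(matrix))
    return min(values), max(values)


def count_outliers(matrix, lower, upper):
    """Count cells below the lower limit and above the upper limit."""
    below = sum(1 for d in off_diagonal(matrix) if d < lower)
    above = sum(1 for d in off_diagonal(matrix) if d > upper)
    return below, above


# Hypothetical 3 x 3 symmetric distance matrix.
matrix = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.6],
    [0.9, 0.6, 0.0],
]
limits = data_limits(matrix)                             # (0.2, 0.9)
outliers = count_outliers(matrix, lower=0.3, upper=0.8)  # (2, 2)
```

With the sliders set to 0.3 and 0.8 in this example, two cells fall below the range and two fall above it, since each off-diagonal distance appears twice in a symmetric matrix.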
The target molecule, cell/tissue and quality stamp attributes are provided in the annotation panel and in the dataset table. The annotation panel allows the user to locate on the matrix the datasets sharing the same target molecule, tissue/cell or stamp, and therefore to annotate clusters. Switch the type of annotation using the tabs, then move the mouse over each row to highlight the corresponding datasets in the matrix. Highlighted datasets are surrounded by a red square and unshaded.
The tabs Datasets X & Y, Datasets X, Datasets Y allow the user to filter the annotation of datasets
for a specific axis.
Note: when the displayed matrix is symmetric, the lists of datasets on X and Y are identical, so the tabs Datasets X and Datasets Y are disabled.
The annotation list always reflects the datasets visible in the displayed matrix; consequently, zooming updates the annotation panel.
By clicking on a row the user can display the list of corresponding datasets in the dataset table. Click on multiple rows to combine the effect.
The list of datasets is located below the heatmap. The GSM ID, the target molecule, the cell (NA if not available) and the stamp are provided.
The list of visible datasets is updated according to the user's interactions with the matrix and the annotation panel.
To get back the full list of datasets click on the button Reset Zoom located under the matrix.
Note: the color of the table header indicates the list of datasets displayed, blue when filtered by zoom, green when filtered by the annotation panel, gray otherwise.
Although we are working to compute the distances between all the datasets for a given assembly, some datasets analyzed by the NGS-QC generator might not have been processed by the QC Comparator yet. In that case, those datasets appear neither in the matrix nor in the dataset table, but can be viewed by clicking on the red button Missing Datasets. The button is disabled if all initially requested datasets have been processed.
The user can select or un-select datasets by interacting with the checkbox located on the right side of each row. To select or un-select all datasets in the table click on the checkbox located in the header of the table.
This allows the user, for instance, to un-select all the datasets having a specific cell or target from the matrix and recompute the distances without using the Recompute button.
Note: to add new datasets to the comparison matrix, the user must use NaVi.
To send the list of selected datasets to other tools, use the dropdown button Send to.
To access the options panel click on the Options button in the top navigation bar.
The similarity measure is the function used to compute the distance between two datasets. Currently only Tanimoto (Rogers and Tanimoto, 1960) and Dice (Dice, 1945) measures are available.
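As an illustration, both measures can be written as distances between two sets of localQC bin indices. This is a minimal Python sketch under assumptions: the manual does not give the exact formulas used by the tool, and the set-based forms below ignore bins that are absent from both datasets, whereas variants of the Tanimoto coefficient that also count shared absences exist. The function names and example values are hypothetical.

```python
def dice_distance(x, y):
    """Dice dissimilarity: 1 - 2|X∩Y| / (|X| + |Y|)."""
    total = len(x) + len(y)
    if total == 0:
        return 0.0  # two empty datasets are treated as identical
    return 1.0 - 2.0 * len(x & y) / total


def tanimoto_distance(x, y):
    """Set-based Tanimoto dissimilarity: 1 - |X∩Y| / |X∪Y| (assumed form)."""
    union = len(x | y)
    if union == 0:
        return 0.0  # two empty datasets are treated as identical
    return 1.0 - len(x & y) / union


# Hypothetical datasets sharing 2 of their 4 localQC bins each.
x = {1, 2, 3, 4}
y = {3, 4, 5, 6}
d_dice = dice_distance(x, y)          # 1 - 4/8 = 0.5
d_tanimoto = tanimoto_distance(x, y)  # 1 - 2/6, about 0.667
```

Both forms return 0 for identical datasets and 1 for datasets with no localQC in common. For partially overlapping datasets the Dice distance is the smaller of the two, so the choice of measure affects the contrast of the heatmap.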
Enriched sites across the genome are detected using localQCs. LocalQC positions have been annotated 5 times using a random sampling algorithm. The user can choose to compare datasets using only the most reproducible localQCs (5/5) or to also include less reproducible localQCs (down to 1/5).
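The reproducibility filter can be sketched in Python as follows. This is an illustration only: it assumes that a setting of k/5 keeps the bins detected in at least k of the 5 sampling runs, and the function name and example data are hypothetical.

```python
from collections import Counter


def reproducible_bins(replicates, min_support):
    """Keep the bins detected in at least `min_support` sampling runs.

    replicates: list of sets of bin indices, one set per sampling run.
    """
    counts = Counter(b for rep in replicates for b in set(rep))
    return {b for b, c in counts.items() if c >= min_support}


# Hypothetical localQC bins detected in 5 random sampling runs.
runs = [{1, 2, 3}, {1, 2}, {1, 2, 4}, {1, 3}, {1, 2}]
strict = reproducible_bins(runs, min_support=5)      # {1}
permissive = reproducible_bins(runs, min_support=1)  # {1, 2, 3, 4}
```

A stricter threshold keeps fewer, more reliable positions, which changes the sets being compared and therefore the distances.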
A clustering algorithm is applied to the matrix before it is displayed. Choose among multiple clustering distances and clustering linkages. The resulting clusters depend on these options.
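To show how a linkage method affects the ordering of the matrix, here is a minimal Python sketch of agglomerative clustering. It is not the algorithm used by the application, which this manual does not specify: the function name is hypothetical, the matrix values are used directly as the clustering distance, and only three common linkages are covered.

```python
def cluster_order(matrix, linkage="average"):
    """Agglomerative clustering returning a row/column order for the heatmap."""
    if not matrix:
        return []
    # Each linkage reduces the list of inter-cluster distances to one value.
    combine = {
        "single": min,
        "complete": max,
        "average": lambda values: sum(values) / len(values),
    }[linkage]
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([matrix[i][j]
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]  # closest pair is merged first
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical matrix: datasets 0 and 2 are close, as are datasets 1 and 3.
matrix = [
    [0.0, 0.9, 0.1, 0.9],
    [0.9, 0.0, 0.9, 0.2],
    [0.1, 0.9, 0.0, 0.9],
    [0.9, 0.2, 0.9, 0.0],
]
order = cluster_order(matrix, linkage="average")  # [0, 2, 1, 3]
```

Reordering the rows and columns with the returned order places similar datasets next to each other, which is what makes clusters visible as blocks along the diagonal.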
Note: changing the options will not recompute the matrix on the fly. To apply the new options click on the Recompute button located in the top right of the matrix page.