Search thousands of experiments by organism, technique, biological source, author…
Browse over a hundred publicly available 3D experiments, such as Hi-C.
Visualize enrichment sites, read coverage, and chromatin organization in one place with our lightweight genome browser.
Add custom tracks to visualize your data along publicly available experiments.
This manual presents NAVi, describes how to use the web application, and illustrates its features with examples. We recommend reading it carefully and keeping it for future reference.
Last updated: 13 September 2017.
NAVi is a data portal for searching and visualizing publicly available high-throughput sequencing data sets. The raw data were collected from public repositories, uniformly processed, and annotated using controlled vocabularies.
A tour presenting the search and visualization features can be started by clicking the "Start the tour" button on the main portal or by clicking the following link: take the tour.
The video below shows how to search for H3K27ac and FAIRE-seq data, then order, filter, and select experiments of interest.
H3K27ac is an epigenetic mark associated with active promoters and enhancers, and FAIRE-seq is a method for identifying open chromatin regions.
The integration and visualization of Hi-C data with other genomic data remains a challenge because of its complexity. Several tools to visualize Hi-C data have already been described, and some include publicly available data sets. However, NAVi is the only application that comes with thousands of data sets.
Open the Chromatin interactions panel to search chromosome conformation capture data sets. The search works the same way as for 2D experiments.
Please note that data from different organisms cannot be visualized simultaneously. If you open the Genome browser tab after selecting experiments from more than one organism (e.g. human and mouse), an error message will invite you to refine your selection.
To visualize selected experiments, open the Genome browser tab, and define a genomic region to display by entering a gene symbol or genomic coordinates. The following video shows how to integrate several experiments for visual exploration, and how to change some track settings.
The data sets used as examples in this section are listed below.
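| Organism | Sample | Experiment | Reference |
|---|---|---|---|
| Homo sapiens | MCF-7 | POLR2A ChIA-PET | Li GL, et al. Cell. 2012 |
| Homo sapiens | MCF-7 | FOXA1 ChIP-seq | Theodorou V, et al. Genome Res. 2013 |
| Homo sapiens | MCF-7 | RNAPol2 ChIP-seq | wa Maina C, et al. PLoS Comput Biol. 2014 |
| Homo sapiens | MCF-7 | H3K4me3 ChIP-seq | Yamamoto S, et al. Cancer Cell. 2014 |
| Homo sapiens | MCF-7 | H3K9ac ChIP-seq | Grimmer MR, et al. Nucleic Acids Res. 2014 |
| Homo sapiens | MCF-7 | FAIRE-seq | Hardy K, et al. Nucleus. 2016 |
| Homo sapiens | MCF-7 | H3K27ac ChIP-seq | Rhie SK, et al. Epigenetics Chromatin. 2016 |
| Homo sapiens | MCF-7 | HindIII Hi-C | Barutcu AR, et al. Genome Biol. 2015 |
| Homo sapiens | GM12878 | HindIII Capture Hi-C | Cairns JC, et al. Genome Biol. 2016 |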
Gene annotations display genes at their respective positions on the genome. The strand of each gene is indicated by less-than (<, minus strand) or greater-than (>, plus strand) symbols.
When zooming out, genes that are too small relative to the displayed region are hidden. The figure below shows a zoomed-out view of the region in the previous figure: the smallest genes (e.g. HOXA7) are no longer displayed.
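The manual does not state how small a gene must be before it is hidden. As an illustration only, the sketch below assumes one plausible rule: a gene is hidden when it would be narrower than one pixel at the current zoom level. The `Gene` class, the `min_px` threshold, and the toy gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def visible_genes(genes, region_start, region_end, track_width_px, min_px=1.0):
    """Return genes that overlap the region and are at least min_px pixels wide.

    min_px = 1.0 is an assumed threshold, not NAVi's documented rule.
    """
    bp_per_px = (region_end - region_start) / track_width_px
    shown = []
    for g in genes:
        # Skip genes that do not overlap the displayed region.
        if g.end <= region_start or g.start >= region_end:
            continue
        if (g.end - g.start) / bp_per_px >= min_px:
            shown.append(g)
    return shown


def strand_glyph(gene):
    """Glyph repeated along the gene body: '>' for plus strand, '<' for minus strand."""
    return ">" if gene.strand == "+" else "<"
```

With a 1 Mb region drawn on 1,000 pixels, a 500 bp gene covers half a pixel and is dropped; zooming in to 200 kb makes it 2.5 pixels wide, so it reappears.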
NGS-QC Generator infers local quality indicators by evaluating the influence of random sampling on a given profile. Briefly, the genome is divided into fixed 500 bp windows (referred to as bins), and mapped reads are assigned to bins. Three random samplings are performed, retaining 90%, 70%, and 50% of the original reads; the read count dispersion (δRCI) is then calculated for each bin, where the dispersion is the difference between the expected and observed counts, expressed as a percentage. Bins with a dispersion below 10% in every sampling are retained.
This test is performed five times to ensure its reproducibility. Finally, bins that have been retained at least N times (1 ≤ N ≤ 5) are represented on a heatmap.
It is possible to increase or decrease N to change the stringency of the filtering (e.g. setting N = 5 shows only bins with a very strong signal).
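The procedure above can be sketched as follows. This is a simplified illustration, not NGS-QC Generator's code: it assumes reads are given as single-chromosome start positions, that the expected count for a bin is the sampling fraction times its original count, and that only bins containing at least one read are evaluated. The function names are hypothetical.

```python
import random
from collections import Counter

BIN_SIZE = 500                # bp, as described above
FRACTIONS = (0.9, 0.7, 0.5)   # fractions of reads retained by each random sampling


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def stable_bins(read_starts, rng, max_dispersion=10.0):
    """One test: bins whose dispersion stays below max_dispersion (%) in every sampling."""
    original = bin_counts(read_starts)
    stable = set(original)  # only bins with at least one read are considered
    for f in FRACTIONS:
        kept = rng.sample(read_starts, k=round(f * len(read_starts)))
        observed = bin_counts(kept)
        for b, n in original.items():
            expected = f * n
            dispersion = abs(expected - observed.get(b, 0)) / expected * 100
            if dispersion >= max_dispersion:
                stable.discard(b)
    return stable


def robust_bins(read_starts, n_min, repeats=5, seed=0):
    """Bins retained in at least n_min of `repeats` independent tests (1 <= n_min <= repeats)."""
    if not 1 <= n_min <= repeats:
        raise ValueError("n_min must be between 1 and repeats")
    rng = random.Random(seed)
    times = Counter()
    for _ in range(repeats):
        times.update(stable_bins(read_starts, rng))
    return {b for b, t in times.items() if t >= n_min}
```

A bin holding a single read can never pass: after 90% sampling its expected count is 0.9 while the observed count is 0 or 1, a dispersion of at least 11%. Raising `n_min` towards 5 keeps only bins that pass every repetition.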
Genome coverage tracks represent signal data by computing the read coverage, i.e. the number of reads per bin, where bins are consecutive, fixed-size windows.
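A minimal sketch of binned coverage, assuming reads are half-open `(start, end)` intervals on one chromosome and that a read is counted in every bin it overlaps (some tools instead count only the bin containing the read start). The function name is hypothetical.

```python
def coverage_track(reads, region_start, region_end, bin_size):
    """Number of reads overlapping each consecutive fixed-size bin of [region_start, region_end)."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo >= hi:
            continue  # read lies outside the region
        first = (lo - region_start) // bin_size
        last = (hi - 1 - region_start) // bin_size
        for b in range(first, last + 1):
            counts[b] += 1
    return counts
```

For example, `coverage_track([(0, 50), (90, 150), (250, 260)], 0, 300, 100)` gives `[2, 1, 1]`: the second read spans the boundary between the first two bins and is counted in both.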
Coverage tracks are available for all 2D experiments from the NGS-QC collection. When visualizing a profile, only the δRCI heatmap is displayed by default. To load the genome coverage, right-click on the δRCI track, then select Coverage.
The Y-axis scale is set to the greatest value within the displayed genomic region. If this value is particularly high, the rest of the signal might look like background. To make the profile easier to distinguish, you can change the Y-axis maximum value by following these steps:
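Independently of the interface steps, the effect of a single high value can be illustrated with a small sketch. It assumes bar heights are proportional to the signal and that values above a user-defined maximum are clipped to the top of the track; both functions are hypothetical, not NAVi's rendering code.

```python
def y_axis_max(values, user_max=None):
    """Autoscale to the largest value in view unless a fixed maximum is given."""
    if user_max is not None:
        return user_max
    return max(values, default=0)


def scale_to_pixels(values, track_height_px, y_max):
    """Map signal values to bar heights; values above y_max are clipped to the track height."""
    if y_max <= 0:
        return [0] * len(values)
    return [round(min(v, y_max) / y_max * track_height_px) for v in values]
```

With values `[1, 4, 400]` on a 40-pixel track, autoscaling gives heights `[0, 0, 40]`, so the first two values vanish; fixing the maximum at 10 gives `[4, 16, 40]`, and the weaker signal becomes visible.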
When analysing ChIP-seq, FAIRE-seq, and related enrichment assays, it is common to treat duplicate reads as PCR duplicates and remove them, as they can contribute to false positives during peak detection. However, comparing a profile with and without duplicates can help evaluate library complexity. By default, coverage tracks display the signal with duplicates, but the signals with and without duplicates can be superimposed by following these steps:
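To illustrate what "with and without duplicates" means, the sketch below flags reads as duplicates when they share chromosome, 5' position, and strand, a common convention for single-end data that is assumed here; NAVi's exact duplicate definition is not stated. Reads are hypothetical `(chrom, start, end, strand)` tuples.

```python
def remove_duplicates(reads):
    """Keep the first read for each (chrom, 5' position, strand); return (unique, n_duplicates)."""
    seen = set()
    unique = []
    for read in reads:
        chrom, start, end, strand = read
        five_prime = start if strand == "+" else end  # 5' end depends on the strand
        key = (chrom, five_prime, strand)
        if key in seen:
            continue
        seen.add(key)
        unique.append(read)
    return unique, len(reads) - len(unique)


def duplication_rate(reads):
    """Fraction of reads flagged as duplicates; high values suggest low library complexity."""
    if not reads:
        return 0.0
    _, n_dup = remove_duplicates(reads)
    return n_dup / len(reads)
```

Two reads starting at the same plus-strand position count as duplicates even if their lengths differ, whereas a minus-strand read is keyed on its end coordinate.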
Peak calling is a fundamental step in the analysis of ChIP-seq and related assays, and aims to identify protein-DNA binding events.
MACS is a popular peak caller and is particularly appropriate for transcription factor data. ChIP-seq and chromatin accessibility experiments from the NGS-QC collection have been processed with MACS2, and the called peaks are available for visualization. To load an experiment's peaks, right-click on the δRCI track and select Peaks in the menu (if the item is missing, peaks are not yet available for this experiment).
Peaks reported by MACS2 are not filtered by p-value or q-value. However, they can be filtered using a p-value cutoff:
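Outside the browser, the same kind of filter can be applied to a narrowPeak file. In that format the eighth column (pValue) stores -log10(p), with -1 meaning no value was assigned, so a cutoff p keeps peaks whose column value is at least -log10(p). The function name is hypothetical.

```python
import math


def filter_narrowpeak(lines, p_cutoff):
    """Yield narrowPeak lines whose p-value is <= p_cutoff.

    Column 8 (pValue) stores -log10(p); peaks with -1 (unassigned) are dropped.
    """
    if not 0 < p_cutoff <= 1:
        raise ValueError("p_cutoff must be in (0, 1]")
    threshold = -math.log10(p_cutoff)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        minus_log10_p = float(fields[7])
        if minus_log10_p != -1 and minus_log10_p >= threshold:
            yield line
```

A cutoff of 1e-5 corresponds to a column value of at least 5, so a peak with pValue 8.5 is kept and one with 3.5 is dropped.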
Chromosome conformation capture data are represented as a heatmap in which each element of the matrix corresponds to a genomic interaction and is colored according to its contact frequency. To support zooming, four resolutions are available: 5 kb, 25 kb, 100 kb, and 1 Mb. When a contact map is loaded, the resolution is selected based on the length of the current genomic region; the selected resolution can be displayed by hovering the pointer over the track's name.
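The exact rule used to choose the resolution is not described here. As a hypothetical illustration, the sketch below picks the finest of the four resolutions that keeps the region within a maximum number of bins per side; the limit of 500 bins is an arbitrary assumption.

```python
RESOLUTIONS = (5_000, 25_000, 100_000, 1_000_000)  # bp, finest first


def pick_resolution(region_length, max_bins=500):
    """Finest resolution giving at most max_bins bins across the region (hypothetical rule)."""
    for res in RESOLUTIONS:
        if region_length / res <= max_bins:
            return res
    return RESOLUTIONS[-1]  # very large regions fall back to the coarsest map
```

Under this rule a 1 Mb region is shown at 5 kb (200 bins), a 10 Mb region at 25 kb (400 bins), and a whole 250 Mb chromosome at 1 Mb.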
Heatmaps are convenient for exploring the overall structure of chromatin, but they are less suited to local visualization because they display all the data. NAVi can also display genomic interactions as arcs between two loci. Moreover, loops can be filtered by contact frequency, or restricted to those with at least one end within a given interval.
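The two arc filters described above can be sketched as follows, assuming loops on a single chromosome with half-open anchor coordinates and a raw contact count. The `Loop` class and `filter_loops` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    start1: int   # first anchor, half-open [start1, end1)
    end1: int
    start2: int   # second anchor, half-open [start2, end2)
    end2: int
    count: float  # contact frequency


def filter_loops(loops, min_count=0.0, interval=None):
    """Keep loops with count >= min_count and, if interval=(start, end) is given,
    at least one anchor overlapping that interval."""
    def overlaps(s, e):
        return s < interval[1] and e > interval[0]

    kept = []
    for lp in loops:
        if lp.count < min_count:
            continue
        if interval is not None and not (overlaps(lp.start1, lp.end1) or overlaps(lp.start2, lp.end2)):
            continue
        kept.append(lp)
    return kept
```

Restricting to an interval keeps a loop as soon as either anchor touches it, so long-range loops leaving a region of interest are still shown.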
A session can be shared with other investigators by saving the currently visualized experiments, settings, and genomic coordinates. Click the button to generate a shareable link: anyone with this link can load the session. Shareable links expire after seven days.
Because visual inspection of biological data is important, NAVi allows investigators to explore their data along publicly available experiments.
The bigWig format is a binary, compressed, and indexed version of the wiggle format. It is designed to display dense, continuous data (e.g. genome coverage) and supports zooming by storing data at different resolutions.
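To show how storing data at several resolutions enables zooming, the sketch below summarizes `(start, end, value)` intervals into coarser windows. Each window records the covered base count, minimum, maximum, sum, and sum of squares, the quantities kept in bigWig zoom-level summaries. This is a simplified illustration of the idea, not the binary layout or the UCSC implementation; names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoomRecord:
    start: int
    end: int
    valid_count: int  # number of bases covered by data
    min_val: float
    max_val: float
    sum_data: float
    sum_squares: float

    @property
    def mean(self):
        return self.sum_data / self.valid_count


def zoom_level(intervals, zoom_size):
    """Summarize half-open (start, end, value) intervals into windows of zoom_size bases."""
    windows = {}
    for start, end, value in intervals:
        pos = start
        while pos < end:
            w = pos // zoom_size
            w_end = min(end, (w + 1) * zoom_size)
            n = w_end - pos  # bases of this interval falling in window w
            rec = windows.get(w)
            if rec is None:
                rec = windows[w] = ZoomRecord(w * zoom_size, (w + 1) * zoom_size, 0, value, value, 0.0, 0.0)
            rec.valid_count += n
            rec.min_val = min(rec.min_val, value)
            rec.max_val = max(rec.max_val, value)
            rec.sum_data += value * n
            rec.sum_squares += value * value * n
            pos = w_end
    return [windows[w] for w in sorted(windows)]
```

Because sums rather than means are stored, coarser levels can be built from finer ones, and a viewer can read the coarsest level that still resolves the displayed region instead of scanning every base.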
Click here for more information on the bigWig format.
The narrowPeak format is used to store called peaks of signal enrichment. When calling peaks with MACS2, a narrowPeak file is produced unless the --broad flag is used.
Click here for more information on the narrowPeak format.
Please note that it is not yet possible to select a normalization.
The hic format is a binary, compressed, and indexed file format designed to store Hi-C contact maps at multiple resolutions.
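The idea of storing one contact map at several resolutions can be sketched by binning the same read-pair contacts at each resolution. This illustrates the concept only: it does not reproduce the .hic binary layout, assumes intra-chromosomal contacts given as position pairs, and keeps raw counts with no normalization. Function names are hypothetical.

```python
from collections import Counter


def contact_matrix(contacts, resolution):
    """Sparse upper-triangular contact matrix: (bin_i, bin_j) -> count, with bin_i <= bin_j."""
    matrix = Counter()
    for pos1, pos2 in contacts:
        i, j = sorted((pos1 // resolution, pos2 // resolution))
        matrix[(i, j)] += 1
    return matrix


def multi_resolution(contacts, resolutions=(5_000, 25_000, 100_000, 1_000_000)):
    """Bin the same contacts at every resolution, from finest to coarsest."""
    return {res: contact_matrix(contacts, res) for res in resolutions}
```

Only the upper triangle is stored because a contact between bins i and j is the same as one between j and i; coarser resolutions merge neighbouring cells, which is why the maps stay readable when zooming out.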
Click here for more information on the hic format.
The table below contains the reference genomes used to process publicly available data.
| Organism | Common name | Genome assembly |
|---|---|---|
Overview of experiment quality in the context of sequencing depth. The x-axis shows the number of successfully mapped reads, and the y-axis shows the quality score (δRCI < 10%) inferred by NGS-QC Generator.