DNA Sequencing

SATORI

SATORI is an ontology-guided visual exploration system for data repositories. It combines powerful metadata search with a treemap and a node-link diagram that visualize the repository structure, provide context to retrieved data sets, and serve as an interface to drive semantic querying and exploration, thereby supporting the information foraging loop. SATORI is web-based, open-source, and integrated in the Refinery Platform, an application for biomedical data management, analysis, and visualization.

Project development

Institution: Harvard Medical School

Why? Biomedical repositories are growing rapidly and provide scientists with tremendous opportunities to re-use data. To exploit published data sets efficiently, it is crucial to understand the content of repositories and to discover data relevant to a question of interest. These are challenging tasks, as most repositories currently support finding data sets only through text-based search of metadata and, in some cases, through metadata-based browsing. To address this, we conducted a task analysis through semi-structured interviews with 8 PhD-level domain experts and identified 3 distinct user roles.

What? Biological data sets consist of experimental data and metadata describing the studies, the properties of the analyzed biological samples, and the attributes of individual data files. In this context, a data set is a collection of data files together with its metadata. The metadata is partially annotated with ontology terms. An ontology describes a certain domain (e.g., human anatomy), defines controlled vocabularies for its concepts and relationships (e.g., kidney and is-part-of), and relates concepts to each other (e.g., nephron is-part-of kidney). By means of ontology terms, annotated data sets can be classified hierarchically. SATORI extracts both free-text and ontologically annotated metadata. The free-text metadata is indexed in a text-based search system. The ontology classes related to data sets are parsed and visualized to provide semantic context to the data sets. Since SATORI's goal is to support exploration rather than to visualize ontologies themselves, only the relevant subtree of each ontology is shown, which effectively enforces a strict containment hierarchy.
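To make the idea of hierarchical classification through ontology terms concrete, here is a minimal sketch. It is not SATORI's code: the `PART_OF` relations (beyond "nephron is-part-of kidney"), the data set ids, and the function names are all hypothetical, and each term is assumed to have at most one parent.

```python
from collections import defaultdict

# Hypothetical is-part-of relations (child -> parent); only
# "nephron is-part-of kidney" comes from the description above.
PART_OF = {"nephron": "kidney", "kidney": "urinary system"}

# Hypothetical annotations: data set id -> directly annotated ontology term.
ANNOTATIONS = {"ds1": "nephron", "ds2": "kidney", "ds3": "urinary system"}


def ancestors(term, part_of):
    """Yield the term itself and every term it is (transitively) part of."""
    while term is not None:
        yield term
        term = part_of.get(term)


def classify(annotations, part_of):
    """Map each term to the data sets annotated with it or with any of its parts."""
    by_term = defaultdict(set)
    for data_set, term in annotations.items():
        for t in ancestors(term, part_of):
            by_term[t].add(data_set)
    return dict(by_term)


groups = classify(ANNOTATIONS, PART_OF)
print(sorted(groups["kidney"]))  # data sets annotated with kidney or one of its parts
```

A data set annotated with a specific term is counted under every broader term that contains it, which is what allows a containment view such as a treemap to size terms by the data sets they cover.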

How? SATORI is composed of two main interlinked views: the data set view and the exploration view. The exploration view combines a treemap and a node-link diagram. In the treemap, each ontology term is drawn as a rectangle. The area of the rectangle shows the size of the term relative to its sibling terms, and the color indicates the distance to the farthest child term: the farther away this child term is, the darker the color. The node-link diagram represents ontology terms as nodes, and its links show the relationships between parent and child terms. The diagram also visualizes the precision and recall of each term given the currently retrieved data sets. In this context, precision shows how frequently a term is used for annotation within the retrieved data sets, and recall provides a notion of information scent by indicating whether the repository contains more data sets annotated with this term. Finally, the exploration view acts as a semantic query interface and lets users filter down collections of data sets via ontology term-based Boolean queries.
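The per-term precision and recall described above can be illustrated with a short sketch. It assumes the usual set-based definitions (the share of retrieved data sets that carry the term, and the share of all data sets carrying the term that were retrieved); the function name and the data set ids are hypothetical.

```python
def precision_recall(term_data_sets, retrieved):
    """Return (precision, recall) of one ontology term for the retrieved data sets.

    term_data_sets: all data sets in the repository annotated with the term.
    retrieved: the data sets returned by the current query.
    """
    hits = len(term_data_sets & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(term_data_sets) if term_data_sets else 0.0
    return precision, recall


# Hypothetical example: 4 data sets retrieved, 5 annotated with the term overall.
retrieved = {"ds1", "ds2", "ds3", "ds4"}
annotated_with_term = {"ds1", "ds2", "ds7", "ds8", "ds9"}

p, r = precision_recall(annotated_with_term, retrieved)
print(p, r)  # 0.5 0.4
```

Here a recall below 1 signals that three more data sets with this term exist outside the current result, which is the "information scent" cue mentioned above.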

Listeriomics

As for many model organisms, the amount of Listeria omics data produced has recently increased exponentially. There are now >80 published complete Listeria genomes, around 350 different transcriptomic data sets, and 25 proteomic data sets available. The analysis of these data sets through a systems biology approach and the generation of tools for biologists to browse these various data are a challenge for bioinformaticians. We have developed a web-based platform, named Listeriomics, that integrates different tools for omics data analyses, i.e., (i) an interactive genome viewer to display gene expression arrays, tiling arrays, and sequencing data sets along with proteomics and genomics data sets; (ii) an expression and protein atlas that connects every gene, small RNA, antisense RNA, or protein with the most relevant omics data; (iii) a specific tool for exploring protein conservation through the Listeria phylogenomic tree; and (iv) a coexpression network tool for the discovery of potential new regulations. Our platform integrates all the complete Listeria species genomes, transcriptomes, and proteomes published to date. This website allows navigation among all these data sets with enriched metadata in a user-friendly format and can be used as a central database for systems biology analysis.
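As a rough illustration of what a coexpression network is, the sketch below links genes whose expression profiles are strongly correlated across conditions. The use of Pearson correlation, the threshold, and the gene names and values are assumptions made for this example; the description above does not state how Listeriomics builds its network.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values for three genes across four conditions.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}

edges = coexpression_edges(PROFILES)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

Only the geneA/geneB pair passes the threshold in this toy data; such an edge is the kind of link that could hint at a shared regulation worth investigating.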

Project development

Institution: Institut Pasteur

Developed in Java using the Eclipse RCP/RAP API.

SeqMonk

SeqMonk is a program to enable the visualisation and analysis of mapped sequence data. It was written for use with mapped next generation sequence data but can in theory be used for any dataset which can be expressed as a series of genomic positions. Its main features are:

  • Import of mapped data (BAM/SAM/bowtie etc.)
  • Creation of data groups for visualisation and analysis
  • Visualisation of mapped regions against an annotated genome
  • Flexible quantitation of the mapped data to allow comparisons between data sets
  • Statistical analysis of data to find regions of interest
  • Creation of reports containing data and genome annotation
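
To show what quantitation of mapped data can look like, here is a minimal sketch that counts reads per fixed-size genomic window and scales the counts to reads per million so that data sets of different depth can be compared. This is not SeqMonk's implementation: the window scheme, the normalisation, the function name, and the read coordinates are all assumptions for illustration.

```python
from collections import Counter


def quantitate(reads, window=1000):
    """Count reads per (chromosome, window start) and scale to reads per million.

    reads: list of (chromosome, start, end) tuples for mapped reads.
    A read is assigned to the window that contains its midpoint.
    """
    counts = Counter()
    for chrom, start, end in reads:
        midpoint = (start + end) // 2
        counts[(chrom, midpoint // window * window)] += 1
    total = len(reads)
    return {key: count * 1_000_000 / total for key, count in counts.items()}


# Hypothetical mapped reads.
READS = [
    ("chr1", 100, 150),
    ("chr1", 900, 950),
    ("chr1", 1200, 1250),
    ("chr2", 50, 100),
]

values = quantitate(READS)
for key in sorted(values):
    print(key, values[key])
```

Because the input is only a list of genomic positions, the same approach works for any data that can be expressed that way, as noted above.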

Project development

Institution: Babraham Institute

ALVIS

Alvis is an open-source platform for the joint explorative analysis of multiple sequence alignments (MSAs) and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of this visualization method with an interactive toolkit that allows detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable, and can export results as publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis), and its Sequence Bundles visualization module is also available as a web application (http://science-practice.com/projects/sequence-bundles).

Project development

Institution: Goldman Group, EMBL-EBI; Science Practice

VisRseq

VisRseq is a computationally rich and accessible framework for integrative and interactive analysis of sequencing datasets that does not require programming expertise. Features include R apps, which offer a semi-automatically generated and unified graphical user interface for computational packages in R and in repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, the framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows.

Project development

Institution: GrUVi Lab, Simon Fraser University, Canada

CooccurViewer

The project seeks to expose correlations between observations of any pair of events in a data sequence. We present two methods for identifying interesting co-occurrences (see the manuscript for a detailed discussion).

A demo of these two approaches is available through the project website.

Project development

Institution: Department of Computer Sciences at the University of Wisconsin—Madison

BioJS

BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies.

Project development

Institution: BioJS

Integrated Genome Browser

The Integrated Genome Browser (IGB, pronounced Ig-Bee) is a fast, flexible, and free desktop genome browser. First developed at Affymetrix in 2001 to support visual analytics of genome tiling arrays, IGB provides an advanced, highly customizable environment for exploring and analyzing large-scale genomic data sets.

Using IGB, you can:

  • View your RNA-Seq, ChIP-chip or ChIP-seq data alongside genome annotations and sequence.
  • Investigate alternative splicing, regulation of gene expression, epigenetic modifications of DNA, and other genome-scale questions.
  • View results from aligning short-read sequences onto a target genome, identify SNPs, and check alignment quality.
  • Copy and paste genomic sequences for further analysis into other tools, such as primer design and promoter analysis tools.
  • Create high-quality images for publication in a variety of formats.

IGB features

IGB lets you view results from your own experiments or computational analyses alongside public domain gene annotations, sequences, and genomic data sets, thus making it easier for you to determine how your experiments agree or disagree with current thinking and models of genomic structure.

Some features IGB offers include:

  • Animated zooming. Most genome browsers implement "jump zooming" only, in which you click a zoom button (or other type of control) and then wait for the display to re-draw. In IGB, zooming is animated, allowing you to easily and quickly adjust the zoom level as needed without losing track of your location.
  • Simple Data Sharing System - QuickLoad. IGB implements a very simple, easy-to-use system for sharing data called QuickLoad. You can use the QuickLoad system to set up a Web site you can use to share your data with colleagues, reviewers, and the public.
  • Draggable graphs. You can display genome graphs data (e.g., "bar" and "wiggle" files) alongside and even on top of reference genome annotations, thus making it easier to see how your experimental results match up to the published reference genome annotations. You can reset your graphs to "floating" and click-drag them over annotations to compare your results with annotations and others' experiments.
  • Edge-matching across tracks. When you click an item in the display, the edges of other items in the same or different tracks with identical boundaries light up, highlighting interesting similarities or differences across gene models, sequence reads, or other features.
  • Integration with local and remote external data sources. IGB can load data from a variety of sources, including Distributed Annotation Servers, QuickLoad servers, ordinary Web sites, and local files.
  • Intron-trimming sliced view. In many species, introns are huge when compared to the exonic (coding) regions of genes. IGB provides a Sliced View tab that trims uninformative regions from introns.
  • Web-controls. IGB can be controlled from a web browser or any other program capable of sending HTTP requests. Via IGB links, you can create Web pages that direct IGB to scroll to a specific region and load data sets from local files or servers.
  • Scripting. IGB understands a simple command language that allows users to write simple scripts directing IGB to show a genome, zoom and scroll to specific regions, and other functions.
  • Open source. All development on IGB proceeds via a 100% open source model. The license allows developers to incorporate IGB (and its components) into new applications.
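
The edge-matching feature in the list above can be sketched in a few lines: given a selected item, find the other items whose start or end coordinate is identical. This is only an illustration of the idea, not IGB's code; the function name, feature names, and coordinates are hypothetical.

```python
def edge_matches(selected, features):
    """Return {feature name: [matching edges]} for features sharing a boundary
    with the selected feature.

    features: dict mapping a feature name to its (start, end) coordinates.
    """
    sel_start, sel_end = features[selected]
    matches = {}
    for name, (start, end) in features.items():
        if name == selected:
            continue
        shared = []
        if start == sel_start:
            shared.append("start")
        if end == sel_end:
            shared.append("end")
        if shared:
            matches[name] = shared
    return matches


# Hypothetical gene models and reads from different tracks.
FEATURES = {
    "exon_model_1": (100, 250),
    "exon_model_2": (100, 300),
    "read_17": (180, 250),
    "read_42": (400, 450),
}

matched = edge_matches("exon_model_1", FEATURES)
print(matched)  # {'exon_model_2': ['start'], 'read_17': ['end']}
```

In a browser, the matched edges would be the ones highlighted when the selected item is clicked.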

Project development

Institution: UNC @ Charlotte

SequenceJuxtaposer

SequenceJuxtaposer is a flexible sequence visualization tool used to explore a set of sequences. It provides an immediate view of a set of sequences that a user can navigate through with a few clicks of a mouse.

SequenceJuxtaposer is a tool for the exploration and comparison of biomolecular sequences. We use an information visualization technique called "accordion drawing" that guarantees three key properties: context, visibility, and frame rate. We provide context through the navigation metaphor of a rubber sheet that can be smoothly stretched to show more details in the areas of focus, while the surrounding regions of context are correspondingly shrunk. Landmarks, such as user-specified motifs or differences between aligned base pairs across multiple sequences, are guaranteed to be visible even if located in the shrunken areas of context. Our graphics infrastructure for progressive rendering provides immediate responsiveness to user interaction by guaranteeing that we redraw the scene at a target frame rate. Our preprocessing algorithms are subquadratic: O(nk) for k sequences of n base pairs each. All runtime rendering algorithms are sublinear in nk: they are O(v), where v is the number of items visible onscreen at once and v is much smaller than nk. SequenceJuxtaposer supports interaction at 20 frames per second when browsing collections of several hundred sequences that comprise over 1.7 million total base pairs.
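
As an illustration of one kind of landmark mentioned above, differences between aligned base pairs, the following sketch finds the alignment columns in which the sequences disagree. It visits each of the n columns of the k sequences at most once, so it runs in O(nk) time. It is not SequenceJuxtaposer's preprocessing code; the function name and the sequences are hypothetical.

```python
def difference_columns(aligned):
    """Return indices of alignment columns where any sequence differs from the first.

    aligned: list of k equally long, already aligned sequences (length n each).
    """
    reference = aligned[0]
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i
        for i, base in enumerate(reference)
        if any(seq[i] != base for seq in aligned[1:])
    ]


# Hypothetical aligned sequences.
ALIGNED = ["ACGTAC", "ACGTTC", "ACCTAC"]

columns = difference_columns(ALIGNED)
print(columns)  # [2, 4]
```

Columns found this way are candidates for landmarks that must stay visible even when their region is compressed.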

Any FASTA file (DNA or RNA) can be loaded into SequenceJuxtaposer.

Project development

Institution: University of British Columbia

ARB

The ARB program package comprises a variety of directly interacting software tools for sequence database maintenance and analysis which are controlled by a common graphical user interface. Although it was initially designed for ribosomal RNA data, it can be used for any nucleic and amino acid sequence data as well. A central database contains processed (aligned) primary structure data. Any additional descriptive data can be stored in database fields assigned to the individual sequences or linked via local or worldwide networks. A phylogenetic tree visualized in the main window can be used for data access and visualization. The package comprises additional tools for data import and export, sequence alignment, primary and secondary structure editing, profile and filter calculation, phylogenetic analyses, specific hybridization probe design and evaluation and other components for data analysis. Currently, the package is used by numerous working groups worldwide.

Project development

Institution:
