Open source


Open source software license

Scalable Insets

Scalable Insets is a new technique for interactively exploring and navigating large numbers of annotated patterns in multiscale visual spaces, such as genome interaction maps from Hi-C experiments. Annotated features such as loops or TADs are often too small to be identified at certain zoom levels; our technique visualizes them using insets, i.e., magnified thumbnail views of the features. Insets are placed dynamically, either within the viewport or along its boundary, to offer a compromise between locality and context preservation. Annotated features are interactively clustered by location and type, and each cluster is represented as an aggregated inset, which keeps exploration scalable within a single viewport. Find out more on the project page and in our 5-minute introductory video.
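
The clustering idea above can be illustrated with a small sketch. It assumes a deliberately simple scheme: annotations of the same type that fall into the same screen-space grid cell are grouped, and each group gets one aggregated inset anchored at its centroid. The `Annotation` record, `cell_size` parameter and both functions are hypothetical; the actual Scalable Insets clustering may work differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record: screen-space centre in pixels plus feature type
    x: float
    y: float
    kind: str  # e.g. "loop" or "TAD"


def cluster_annotations(annotations, cell_size=64.0):
    """Group annotations that share a feature type and fall into the same
    screen-space grid cell (a simple stand-in for location-based clustering)."""
    clusters = defaultdict(list)
    for a in annotations:
        key = (a.kind, int(a.x // cell_size), int(a.y // cell_size))
        clusters[key].append(a)
    return list(clusters.values())


def aggregate_inset_anchor(cluster):
    """Centroid of a cluster, i.e. where an aggregated inset could point to."""
    n = len(cluster)
    return (sum(a.x for a in cluster) / n, sum(a.y for a in cluster) / n)


anns = [Annotation(10, 10, "loop"), Annotation(20, 30, "loop"),
        Annotation(15, 12, "TAD"), Annotation(300, 300, "loop")]
clusters = cluster_annotations(anns)
```

Because the grid is defined in screen pixels, zooming in spreads the annotations over more cells, so clusters naturally split as the user zooms.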

Project development

Institution: Harvard University

We implemented Scalable Insets as an extension to HiGlass, a flexible web application for viewing large tile-based genomic datasets. Besides genome interaction maps, our implementation currently also supports gigapixel images and geographic maps. The tool can easily be applied to existing BEDPE annotation files. The source code is available on GitHub.
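
Since the tool works with existing BEDPE annotation files, a minimal reader for the six mandatory BEDPE columns (chrom1, start1, end1, chrom2, start2, end2, plus an optional name) may help clarify what such a file contains. This is only a sketch; `BedpeRecord` and `parse_bedpe` are hypothetical names and not part of HiGlass or Scalable Insets.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class BedpeRecord:
    # Two genomic intervals (0-based, half-open) forming one 2D feature
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: Optional[str] = None


def parse_bedpe(lines: Iterable[str]) -> List[BedpeRecord]:
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines, comments and track/browser header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"BEDPE line needs at least 6 columns: {line!r}")
        name = fields[6] if len(fields) > 6 else None
        records.append(BedpeRecord(fields[0], int(fields[1]), int(fields[2]),
                                   fields[3], int(fields[4]), int(fields[5]),
                                   name))
    return records
```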

karyoploteR

karyoploteR is an R/Bioconductor package for plotting genomic data along the genome. It implements genomic-coordinate versions of most R graphical primitives, facilitating the creation of rich and powerful genome visualizations. Since karyoploteR does not try to "understand" the data it is plotting, it can plot almost any data type, as long as it is positioned on the genome. In addition, while the package includes data for some of the most widely used genomes, it can automatically download genome information from external sources and accepts custom genomes directly from the user, making it possible to "plot anything on any genome". karyoploteR covers the whole zoom range, from a single base to the whole genome, by changing a single parameter in a function call. Additional higher-level functions plot specific types of data: for example, one computes and plots the density of features along the genome, another plots the coverage level directly from a BAM file, and a third plots links between genomic regions.
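
The feature-density computation mentioned above boils down to counting features per fixed-size window along each chromosome. Since karyoploteR itself is written in R, the following Python sketch only illustrates that idea under our own assumptions (0-based, half-open intervals on a single chromosome); it is not the package's implementation, and `feature_density` is a hypothetical name.

```python
from typing import List, Tuple


def feature_density(features: List[Tuple[int, int]], chrom_length: int,
                    window_size: int) -> List[int]:
    """Count features overlapping each fixed-size window of one chromosome.
    Features are (start, end) pairs, 0-based and half-open."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for start, end in features:
        # Ignore empty features and those lying beyond the chromosome end
        if end <= start or start >= chrom_length:
            continue
        first = max(start, 0) // window_size
        last = min(end - 1, chrom_length - 1) // window_size
        for w in range(first, last + 1):
            counts[w] += 1
    return counts
```

For example, `feature_density([(0, 50), (90, 110), (150, 160)], 250, 100)` counts a feature spanning a window boundary in both windows it touches.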

To learn more about the functionality of karyoploteR, check the package vignette or head to the karyoploteR tutorial page, where you will find a step-by-step tutorial on how to use the package as well as more involved examples with detailed explanations, including how to use karyoploteR to plot different standard data types: RNA-seq differential expression results, SNP-array data, and somatic mutation distances using rainfall plots.

Bioconductor landing page: http://bioconductor.org/packages/karyoploteR/

Tutorial and examples: https://bernatgel.github.io/karyoploter_tutorial/

Source code on GitHub: https://github.com/bernatgel/karyoploteR

Project development

Institution: Germans Trias i Pujol Research Institute, IGTP

SATORI

SATORI is an ontology-guided visual exploration system for data repositories. It combines powerful metadata search with a treemap and a node-link diagram that visualize the repository structure, provide context for retrieved data sets, and serve as an interface to drive semantic querying and exploration, thereby supporting the information foraging loop. SATORI is web-based, open-source, and integrated into the Refinery-Platform, an application for biomedical data management, analysis, and visualization.

Project development

Institution: Harvard Medical School

Why? Biomedical repositories are growing rapidly and provide scientists with tremendous opportunities to reuse data. To exploit published data sets efficiently, it is crucial to understand the content of repositories and to discover data relevant to a question of interest. These are challenging tasks, as most repositories currently support finding data sets only through text-based search of metadata and, in some cases, through metadata-based browsing. To address this, we conducted a task analysis through semi-structured interviews with 8 PhD-level domain experts and identified 3 distinct user roles.

What? Biological data sets consist of experimental data and metadata describing the studies, the properties of the analyzed biological samples, and the attributes of individual data files. In this context, a data set is a collection of data files along with their metadata. Metadata is partially annotated with ontology terms. An ontology describes a certain domain (e.g., human anatomy), defines controlled vocabularies for its concepts and relationships (e.g., kidney and is-part-of), and relates concepts to each other (e.g., nephron is-part-of kidney). By means of ontology terms, annotated data sets can be classified hierarchically. SATORI extracts both free-text and ontology-annotated metadata. The free-text metadata is indexed in a text-based search system, while the ontology classes related to data sets are parsed and visualized to provide semantic context. Since SATORI's goal is to support exploration rather than to visualize ontologies themselves, only a relevant subtree of the ontologies is shown, effectively enforcing a strict containment hierarchy.
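
One plausible way to obtain such a relevant subtree is to keep only the terms used in annotations together with all of their ancestors, then drop every other branch. The sketch below assumes the ontology is given as a child-to-parents map; it does not handle converting a multi-parent DAG into a strict tree, and `relevant_terms` and `prune_children` are hypothetical helpers, not SATORI code.

```python
from typing import Dict, Iterable, Set


def relevant_terms(annotated: Iterable[str],
                   parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the annotated terms plus all of their ancestors."""
    keep: Set[str] = set()
    stack = list(annotated)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents.get(term, ()))
    return keep


def prune_children(children: Dict[str, Set[str]],
                   keep: Set[str]) -> Dict[str, Set[str]]:
    """Restrict a parent -> children map to the kept terms."""
    return {p: {c for c in cs if c in keep}
            for p, cs in children.items() if p in keep}


# Toy ontology built around the "nephron is-part-of kidney" example
parents = {"organ": {"anatomy"}, "kidney": {"organ"},
           "nephron": {"kidney"}, "liver": {"organ"}}
children = {"anatomy": {"organ"}, "organ": {"kidney", "liver"},
            "kidney": {"nephron"}}
keep = relevant_terms(["nephron"], parents)
```

Here no data set is annotated with `liver`, so that branch disappears from the pruned hierarchy.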

How? SATORI is composed of two main interlinked views: the data set view and the exploration view. In the treemap, an ontology term is represented by a rectangle. The area of the rectangle encodes the size of the term relative to its sibling terms, and the color indicates the distance to the farthest child term: the farther away this child term is, the darker the color. The node-link diagram represents ontology terms as nodes, with links connecting parent and child terms. Additionally, the diagram visualizes precision and recall for each term given the currently retrieved data sets. Here, precision helps users understand how frequently a term is used to annotate the retrieved data sets, and recall provides a notion of information scent by indicating whether more data sets are annotated with this term. Finally, the exploration view acts as a semantic query interface that lets users filter collections of data sets via ontology term-based Boolean queries.
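
The quantities described above can be written down concretely. The sketch below reflects our reading of the text, not SATORI's published formulas: for a term t with annotated data sets A and retrieved set R, precision is |R ∩ A| / |R| and recall is |R ∩ A| / |A|; the treemap color value is the height of a term (distance to its farthest descendant); and a Boolean AND query is a set intersection. All function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def term_precision_recall(retrieved: Set[str],
                          annotated_with: Dict[str, Set[str]]
                          ) -> Dict[str, Tuple[float, float]]:
    """Per term: precision = share of retrieved data sets annotated with it;
    recall = share of the term's data sets that were retrieved."""
    scores = {}
    for term, datasets in annotated_with.items():
        hits = len(retrieved & datasets)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(datasets) if datasets else 0.0
        scores[term] = (precision, recall)
    return scores


def height(term: str, children: Dict[str, Set[str]]) -> int:
    """Distance from a term to its farthest descendant (0 for leaves)."""
    kids = children.get(term, set())
    return 1 + max(height(k, children) for k in kids) if kids else 0


def boolean_and(annotated_with: Dict[str, Set[str]],
                terms: Iterable[str]) -> Set[str]:
    """Data sets annotated with every one of the given terms."""
    sets = [annotated_with.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


annotated_with = {"kidney": {"d1", "d2", "d3"}, "liver": {"d4"}}
scores = term_precision_recall({"d1", "d2", "d4", "d5"}, annotated_with)
```

A recall below 1.0 (here for `kidney`) signals that further data sets carrying the term exist outside the current result set, which is the information scent mentioned above.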

HiPiler

HiPiler is an interactive visualization interface for exploring and visualizing regions of interest in large genome interaction matrices. Genome interaction matrices approximate the physical distance between pairs of genomic regions and can contain up to 3 million rows and columns, with many sparse regions. Traditional matrix aggregation or pan-and-zoom interfaces largely fail to support search, inspection, and comparison of local regions of interest (ROIs). ROIs can be defined, e.g., by sets of adjacent rows and columns or by specific visual patterns in the matrix. ROIs are first-class objects in HiPiler, which represents them as thumbnail-like “snippets”. Snippets can be laid out automatically based on their data and metadata attributes. They are linked back to the matrix and can be explored interactively. The design of HiPiler is based on a series of semi-structured interviews with 10 domain experts involved in the analysis and interpretation of genome interaction matrices. In the paper, we describe six exploration tasks that are crucial for the analysis of interaction matrices and demonstrate how HiPiler supports these tasks. We report on a user study consisting of data exploration sessions with domain experts, which assessed the usability of HiPiler and demonstrated findings in the data.
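
To make the snippet idea concrete, here is a small sketch that cuts a fixed-size window around a locus pair (given in matrix bin coordinates) and orders snippets by a simple data attribute, their mean value. The window shape, zero padding at the matrix edges and the mean-based ordering are our assumptions for illustration; HiPiler's own extraction and layout logic live in its backend and frontend and may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def extract_snippet(matrix: Sequence[Sequence[float]], row_bin: int,
                    col_bin: int, half_width: int,
                    fill: float = 0.0) -> Matrix:
    """Cut a (2*half_width+1)-square window centred on (row_bin, col_bin).
    Cells outside the matrix are padded with `fill` so that all snippets
    share one shape and can be laid out side by side."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    snippet = []
    for r in range(row_bin - half_width, row_bin + half_width + 1):
        row = []
        for c in range(col_bin - half_width, col_bin + half_width + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(matrix[r][c] if inside else fill)
        snippet.append(row)
    return snippet


def layout_order(snippets: List[Matrix]) -> List[int]:
    """Indices of snippets sorted by mean value (one possible data attribute)."""
    means = [sum(map(sum, s)) / sum(len(r) for r in s) for s in snippets]
    return sorted(range(len(snippets)), key=means.__getitem__)


m = [[r * 10 + c for c in range(4)] for r in range(4)]
corner = extract_snippet(m, 0, 0, 1)
centre = extract_snippet(m, 2, 2, 1)
```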

Project development

Institution: Harvard University

HiPiler is implemented as a web application consisting of a frontend interface for the visualizations and a server-side component that provides the data. The frontend is written entirely in JavaScript, using Aurelia as its application framework and Redux for fine-grained, history-aware state management. The matrix snippets are visualized with WebGL using Three.js as middleware. Finally, HiGlass is integrated as a library for displaying the interaction matrix and genomic tracks. The server-side backend serves data to HiGlass and provides the matrix snippets. The backend is implemented in Python and uses Django as its application framework. The contact matrices are accessed through Cooler, a Python-based library for storing and querying Hi-C data. The frontend and backend are two separate applications and can be decoupled to load different data types. HiPiler is open source and available on GitHub.

Listeriomics

As for many model organisms, the amount of Listeria omics data produced has recently increased exponentially. There are now >80 published complete Listeria genomes, around 350 different transcriptomic data sets, and 25 proteomic data sets available. The analysis of these data sets through a systems biology approach and the generation of tools for biologists to browse these various data are a challenge for bioinformaticians. We have developed a web-based platform, named Listeriomics, that integrates different tools for omics data analyses, i.e., (i) an interactive genome viewer to display gene expression arrays, tiling arrays, and sequencing data sets along with proteomics and genomics data sets; (ii) an expression and protein atlas that connects every gene, small RNA, antisense RNA, or protein with the most relevant omics data; (iii) a specific tool for exploring protein conservation through the Listeria phylogenomic tree; and (iv) a coexpression network tool for the discovery of potential new regulations. Our platform integrates all the complete Listeria species genomes, transcriptomes, and proteomes published to date. This website allows navigation among all these data sets with enriched metadata in a user-friendly format and can be used as a central database for systems biology analysis.
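
A common way to build a coexpression network, shown here purely as an assumption since the text does not say which method Listeriomics uses, is to connect two genes when the absolute Pearson correlation of their expression profiles across conditions exceeds a threshold. The gene names, profiles and the 0.9 cutoff below are illustrative only.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # flat profiles: no edge


def coexpression_edges(profiles: Dict[str, List[float]],
                       threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Undirected edges between genes whose |r| reaches the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


profiles = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8],
            "geneC": [4, 3, 2, 1], "geneD": [1, 3, 2, 1]}
edges = coexpression_edges(profiles)
```

Using the absolute value keeps strongly anti-correlated pairs (such as `geneA` and `geneC`), which can also hint at regulatory relationships.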

Project development

Institution: Institut Pasteur

Developed in Java using the Eclipse RCP/RAP APIs.

SeqMonk

SeqMonk is a program for the visualisation and analysis of mapped sequence data. It was written for use with mapped next-generation sequencing data but can, in theory, be used for any dataset that can be expressed as a series of genomic positions. Its main features are:

  • Import of mapped data from common formats (BAM/SAM, Bowtie, etc.)
  • Creation of data groups for visualisation and analysis
  • Visualisation of mapped regions against an annotated genome.
  • Flexible quantitation of the mapped data to allow comparisons between data sets
  • Statistical analysis of data to find regions of interest
  • Creation of reports containing data and genome annotation
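
As an illustration of the quantitation step, the sketch below counts reads overlapping each probe (a genomic window) and optionally scales the counts to reads per million total reads, so that data sets sequenced to different depths can be compared. This is only one simple scheme chosen for illustration; SeqMonk offers its own quantitation options, and this function is not taken from its (Java) code base.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def quantitate(probes: List[Interval], reads: List[Interval],
               per_million: bool = True) -> List[float]:
    """Count reads overlapping each probe; optionally scale to reads per
    million total reads so that data sets of different depth are comparable."""
    counts = [0.0] * len(probes)
    for i, (p_start, p_end) in enumerate(probes):
        for r_start, r_end in reads:
            if r_start < p_end and p_start < r_end:  # half-open overlap test
                counts[i] += 1
    if per_million and reads:
        scale = 1_000_000 / len(reads)
        counts = [c * scale for c in counts]
    return counts


probes = [(0, 100), (100, 200)]
reads = [(10, 60), (90, 120), (150, 190), (500, 550)]
```

The nested loop is quadratic and only meant to show the logic; a real tool would sort reads and probes or use an interval index.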

Project development

Institution: Babraham Institute

InterMine

InterMine is an open source data warehouse built specifically for the integration and analysis of complex biological data. Developed by the Micklem lab at the University of Cambridge, InterMine enables the creation of biological databases accessed by sophisticated web query tools. Parsers are provided for integrating data from many common biological data sources and formats, and there is a framework for adding your own data. InterMine includes an attractive, user-friendly web interface that works 'out of the box' and can be easily customised for your specific needs, as well as a powerful, scriptable web-service API to allow programmatic access to your data.

Project development

Institution: Micklem Lab, University of Cambridge

MARender

MARender is a JavaScript 3D rendering system based on three.js (http://threejs.org/).

The rendering system is centred on a JavaScript class, MARenderer, and is aimed at simple web-based visualisation of 3D biomedical datasets, with particular emphasis on anatomy and mapped spatial data (e.g., gene expression).

Typical uses combine surface, section and point cloud renderings. Surfaces and point clouds are most readily read from VTK format files using the modified VTK loader https://github.com/ma-tech/three.js/blob/master/examples/js/loaders/MAVTKLoader.js and sections either from static images or from an IIP3D server (https://github.com/ma-tech/WlzIIPSrv).
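
For readers preparing their own point clouds, the sketch below writes points as a legacy ASCII VTK POLYDATA file, with one single-point vertex cell per point. Whether the modified MAVTKLoader accepts exactly this subset of the format is an assumption that should be checked against the loader source; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def point_cloud_to_vtk(points: Iterable[Tuple[float, float, float]],
                       title: str = "point cloud") -> str:
    """Serialise 3D points as a legacy ASCII VTK POLYDATA file."""
    pts = list(points)
    lines = ["# vtk DataFile Version 3.0", title, "ASCII",
             "DATASET POLYDATA", f"POINTS {len(pts)} float"]
    lines += [f"{x} {y} {z}" for x, y, z in pts]
    # Each vertex cell is "1 <index>", so the cell-list size is 2 * n
    lines.append(f"VERTICES {len(pts)} {2 * len(pts)}")
    lines += [f"1 {i}" for i in range(len(pts))]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a `.vtk` file and served alongside the page that instantiates the renderer.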

Project development

Institution: MRC Institute of Genetics & Molecular Medicine, The University of Edinburgh

HaptiMOL

The HaptiMOL suite enables interaction with protein structures using force feedback, through the use of a haptic feedback device:

  • HaptiMOL ISAS enables users to interact with the solvent accessible surface of biomolecules, by probing the surface with a sphere. 
  • HaptiMOL ENM enables users to apply forces to atoms in an elastic network model and to observe the resulting deformation. (A mouse version is also available).
  • HaptiMOL RD (coming soon) will be designed for rigid molecular docking. 
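
To illustrate what probing the solvent accessible surface (SAS) can mean for force feedback, the sketch below uses penalty-based haptic rendering, a common general technique that we assume here; HaptiMOL's actual force model is not described above. The SAS is treated as the union of spheres of radius r_vdw + probe_radius around atom centres, and whenever the probe centre penetrates one, a spring force pushes it back out. The 1.4 Å default is the conventional water probe radius; `stiffness` and the function name are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Vec = Tuple[float, float, float]


def sas_force(probe: Vec, atoms: List[Tuple[Vec, float]],
              probe_radius: float = 1.4, stiffness: float = 1.0) -> Vec:
    """Penalty force pushing the probe centre back out of the SAS
    (union of spheres of radius r_vdw + probe_radius around atoms)."""
    fx = fy = fz = 0.0
    for (ax, ay, az), r_vdw in atoms:
        dx, dy, dz = probe[0] - ax, probe[1] - ay, probe[2] - az
        dist = sqrt(dx * dx + dy * dy + dz * dz)
        depth = (r_vdw + probe_radius) - dist
        if depth > 0 and dist > 0:
            # Spring force proportional to penetration depth,
            # directed away from the atom centre
            scale = stiffness * depth / dist
            fx += dx * scale
            fy += dy * scale
            fz += dz * scale
    return (fx, fy, fz)
```

A haptic loop would call such a function at a high rate (typically around 1 kHz) with the device position and send the resulting force back to the device.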

Project development

Institution: University of East Anglia

ALVIS

Alvis is an open-source platform for the joint explorative analysis of multiple sequence alignments (MSAs) and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of this visualization method with an interactive toolkit that allows detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly and highly customizable, and it can export results as publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis), and its Sequence Bundles visualization module is also available as a web application (http://science-practice.com/projects/sequence-bundles).
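
Covariant sites are positions in an alignment whose residues change together. A standard way to score this, assumed here for illustration since the text does not state which measure Alvis uses, is the mutual information (in bits) between two alignment columns. The sketch treats gaps as an ordinary symbol and uses no pseudocounts or corrections.

```python
from collections import Counter
from math import log2
from typing import List


def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j."""
    n = len(msa)
    ci = [seq[i] for seq in msa]
    cj = [seq[j] for seq in msa]
    pi, pj = Counter(ci), Counter(cj)
    pij = Counter(zip(ci, cj))
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi
```

For example, in the toy alignment `["AC", "AC", "GT", "GT"]` the two columns vary in lockstep and score 1 bit, whereas fully conserved or independently varying columns score 0.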

Project development

Institution: Goldman Group, EMBL-EBI; Science Practice
