Jumper Data Integration

the power of semantic mashups.

Biological data is highly fragmented. Different disciplines act autonomously, producing data repositories and analytical tools that operate in isolation. Bioinformatics data sources often have large, complex data structures, reflecting the richness of the scientific concepts they model. While many bioinformatics data sources cover similar domains, such as genes, proteins, sequence annotations or microarray results, they rely on widely differing models. This poses significant challenges for traditional data “warehousing”.

Given the complexity of biological data, it is difficult to design a flexible model that can represent every schema, relationship, and schema substructure found in biology. Most relational and object database systems make it hard to extend a schema when new, non-aligned data sources must be added. Warehouse models are largely static, but biological data models can change at a rapid pace. Advances in scientific knowledge require regular changes to the underlying data models, and in a warehouse these changes often demand entirely new schemas rather than incremental adjustments made as the need arises.

Flexibility when integrating data sources is critical for scientific investigation. Jumper semantic storage technology provides an additional layer called the semantic layer. This layer provides an upper-level framework for metadata abstraction and ensures transactional synchronization of data across heterogeneous services. The single biggest challenge in integrating data is finding out which data items from different sources are the same. Defining an interchange format that captures how data elements in one schema notation map, with their meaning preserved, to another schema notation allows transformations to be automated. The semantic layer is expressed in a Semantic Dictionary, which defines a shared terminology for metadata syntax and how it is used. The dictionary presents this terminology as a process-specific vocabulary, and it defines a meaning-preserving mapping from a source schema notation into a target schema notation.
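As a rough illustration of the idea, the sketch below models a semantic dictionary as a plain Python mapping from shared terms to the field name each source uses for that term. It is not Jumper's actual data model or API; the dictionary contents, the source names (`source_a`, `source_b`) and the `translate` function are all hypothetical and exist only to show how a shared vocabulary lets a record be rewritten from one schema notation into another.

```python
# Hypothetical semantic dictionary: each shared term maps to the field name
# that each source uses for it. All names here are illustrative assumptions.
SEMANTIC_DICTIONARY = {
    "gene_symbol": {"source_a": "GeneName", "source_b": "symbol"},
    "protein_accession": {"source_a": "ProtAcc", "source_b": "uniprot_id"},
    "organism": {"source_a": "Species", "source_b": "taxon"},
}


def translate(record, source, target, dictionary=SEMANTIC_DICTIONARY):
    """Rewrite a record from the source notation into the target notation.

    Fields with no entry in the dictionary are returned separately so that
    nothing is dropped silently.
    """
    # Invert the dictionary for this source: source field name -> shared term.
    to_term = {
        names[source]: term
        for term, names in dictionary.items()
        if source in names
    }
    translated, unmapped = {}, {}
    for field, value in record.items():
        term = to_term.get(field)
        if term is not None and target in dictionary[term]:
            translated[dictionary[term][target]] = value
        else:
            unmapped[field] = value
    return translated, unmapped


record_a = {"GeneName": "BRCA1", "Species": "Homo sapiens", "LabNote": "batch 7"}
translated, unmapped = translate(record_a, "source_a", "source_b")
print(translated)  # {'symbol': 'BRCA1', 'taxon': 'Homo sapiens'}
print(unmapped)    # {'LabNote': 'batch 7'}
```

Returning the unmapped fields alongside the translated record is a design choice made for this sketch: it makes gaps in the shared vocabulary visible instead of hiding them.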

high-speed data conversions

Model-driven integration is an innovative solution. It reduces the risks, high costs and long completion times associated with traditional data management approaches to integration. Model-driven integration focuses on abstracting the information content into a semantic model that fully describes the source and target schemas, so that equivalence between the models can be determined. A Jumper crosswalk correlates the metadata in each schema so that discrete information fields in different schemas that have the same or similar meaning can be aligned. The crosswalk builds the tables, or "maps", that show these relationships.
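To make the notion of a crosswalk table concrete, here is a minimal sketch that assumes each schema's fields have already been annotated with a concept from a shared semantic model. The schema contents, the concept names and the `build_crosswalk` function are hypothetical and are not taken from the Jumper product; the point is only to show how fields with the same meaning can be aligned into rows of a table.

```python
from collections import defaultdict

# Hypothetical schema descriptions: field name -> concept in a shared model.
SOURCE_SCHEMA = {
    "GeneName": "gene_symbol",
    "ProtAcc": "protein_accession",
    "LabNote": "free_text_note",
}
TARGET_SCHEMA = {
    "symbol": "gene_symbol",
    "uniprot_id": "protein_accession",
    "taxon": "organism",
}


def build_crosswalk(source_schema, target_schema):
    """Return crosswalk rows of (source_field, concept, target_field).

    A source field whose concept has no counterpart in the target schema
    gets a row with target_field set to None.
    """
    # Index the target schema by concept so lookups are direct.
    by_concept = defaultdict(list)
    for field, concept in target_schema.items():
        by_concept[concept].append(field)

    rows = []
    for field, concept in source_schema.items():
        targets = by_concept.get(concept, [])
        if targets:
            rows.extend((field, concept, target) for target in targets)
        else:
            rows.append((field, concept, None))
    return rows


for row in build_crosswalk(SOURCE_SCHEMA, TARGET_SCHEMA):
    print(row)
# ('GeneName', 'gene_symbol', 'symbol')
# ('ProtAcc', 'protein_accession', 'uniprot_id')
# ('LabNote', 'free_text_note', None)
```

This sketch aligns fields only on exact concept matches. Aligning fields with "similar" rather than identical meaning would need some notion of relatedness between concepts, which is left out here.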

Are integration efforts proving a bottleneck for your research?