the power of semantic mashups.
Biological data is highly fragmented. Different disciplines act autonomously, producing data
repositories and analytical tools that operate in isolation. Bioinformatics data sources often have large, complex data
structures, reflecting the richness of the scientific concepts they model. While many bioinformatics data sources cover
similar domains, such as genes, proteins, sequence annotations, or microarray results, they rely on widely differing models.
This poses significant challenges for traditional data “warehousing”.
Given the complexity of biological data, it is difficult to design a flexible model that can represent any level of
complexity in any data schema, relationship, or schema substructure found in biology. It is currently difficult in most
relational and object database systems to extend the schema when new, non-aligned data sources must be added.
Warehouse models are largely static, whereas biological data models can change at a rapid pace. Advances in scientific
knowledge require regular changes to the underlying data models, and these changes often demand entirely new schemas
rather than incremental changes to the system as the need arises.
Flexibility when integrating data sources is critical for scientific investigation. Jumper semantic storage technology
provides an additional layer called the semantic layer. This layer provides an upper-level framework for metadata
abstraction. This new layer ensures transactional synchronization of data across heterogeneous services. The single
biggest challenge in integrating data is determining which data items from different sources are the same. Defining an
interchange format that captures the shared meaning-preserving structure of data elements from one schema notation into
another schema notation allows for automated transformations. The semantic layer is expressed in a Semantic Dictionary,
which defines a shared terminology for metadata syntax and how it is used. The dictionary presents this terminology as a
process-specific vocabulary and defines a meaning-preserving structure that maps a source schema notation into a
target schema notation.
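
The idea of a dictionary that maps source schema notations into shared terms can be illustrated with a minimal sketch.
It is not the Jumper implementation or its API. The dictionary contents, the source names (source_a, source_b), the
field names, and the helper functions below are all hypothetical, chosen only to show how a shared vocabulary lets
records from differently modeled sources be translated and then compared.

```python
# Hypothetical semantic dictionary: each shared term maps to the field name
# that each (assumed) source uses for the same concept.
SEMANTIC_DICTIONARY = {
    "gene_symbol": {"source_a": "symbol", "source_b": "GeneName"},
    "organism": {"source_a": "species", "source_b": "Taxon"},
}


def to_shared_terms(record, source):
    """Translate one source record into the shared vocabulary.

    Fields that the dictionary does not describe are dropped.
    """
    translated = {}
    for term, field_by_source in SEMANTIC_DICTIONARY.items():
        field = field_by_source.get(source)
        if field is not None and field in record:
            translated[term] = record[field]
    return translated


def same_item(a, b, key_terms=("gene_symbol", "organism")):
    """Treat two translated records as the same item if all key terms match."""
    return all(
        term in a and term in b and a[term] == b[term] for term in key_terms
    )


# Two records describing the same gene under different source schemas.
rec_a = to_shared_terms(
    {"symbol": "BRCA1", "species": "Homo sapiens"}, "source_a"
)
rec_b = to_shared_terms(
    {"GeneName": "BRCA1", "Taxon": "Homo sapiens", "Chromosome": "17"},
    "source_b",
)

print(same_item(rec_a, rec_b))
```

In this sketch, adding a new source means adding entries to the dictionary rather than changing a fixed schema. The
matching rule shown here is deliberately simple (exact equality on a few key terms); a real system would likely need
richer identity rules.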