Guest blog by Jin Kim, VP Product Development for Objectivity, Inc.
Almost any popular, fast-growing market experiences at least a bit of confusion around terminology. Multiple firms are frantically competing to insert their own “marketectures,” branding, and colloquialisms into the conversation in the hope that their verbiage will come out on top.
Add in the inherent complexity at the intersection of Business Intelligence and Big Data, and it’s easy to understand how difficult it is to discern one competitive claim from another. Everyone and their strategic partner is focused on “leveraging data to glean actionable insights that will improve your business.” Unfortunately, the process involved in achieving this goal is complex, multi-layered, and very different from application to application depending on the type of data involved.
For our purposes, let’s compare and contrast two terms that are starting to be used interchangeably – Information Fusion and Data Integration. These two terms in fact refer to distinctly separate functions with different attributes. By putting them side-by-side, we can showcase their differences and help practitioners understand when to use each.
Before we delve into their differences, let’s take a look at their most striking similarity. Both of these technologies and best practices are designed to integrate and organize data coming in from multiple sources and present a unified view of that data, making it easier for analytics applications to consume it and derive the “actionable insights” everyone is looking to generate.
However, Information Fusion diverges from Data Integration in a few key ways that make it much more appropriate for many of today’s environments.
• Data Reduction – Information Fusion is, first and foremost, designed to enable data abstraction. So, while Data Integration focuses on combining data to create consumable data, Information Fusion frequently involves “fusing” data at different abstraction levels and differing levels of uncertainty to support a narrower set of application workloads.
• Handling Streaming/Real-Time Data – Data Integration is best used with data-at-rest or batch-oriented data. The problem is that the most compelling applications associated with Big Data and the Industrial Internet of Things are often based on streaming sensor data. Information Fusion is capable of integrating, transforming, and organizing all manner of data (structured, semi-structured, and unstructured), and time-series data in particular, for use by today’s most demanding analytics applications to bridge the gap between Fast Data and Big Data. Put another way, Data Integration creates an integrated set of data in which the larger set is retained. By comparison, Information Fusion uses multiple techniques to reduce the amount of stateless data and provide only the stateful, valuable, and relevant data to deliver improved confidence.
• Human Interfaces – Information Fusion also adds in the opportunity for a human analyst to incorporate their own contributions to the data in order to further reduce uncertainty. By adding and saving inferences and detail that can only be derived with human analysis and support into existing and new data, organizations are able to maximize their analytics efforts and deliver a more complete “Big Picture” view of a situation.
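To make the data reduction contrast concrete, here is a minimal Python sketch. It is an illustration only: the Reading type, the sensor names, the sample values, and the choice of inverse-variance weighting as the fusion technique are all assumptions made for this example, not a description of any particular product. The integrate function keeps every record, while the fuse function collapses readings taken at the same moment into one estimate with lower uncertainty.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str      # hypothetical sensor name
    timestamp: int   # illustrative time step
    value: float     # measured quantity
    variance: float  # uncertainty attached to the measurement


def integrate(*feeds):
    """Data Integration sketch: merge all records into one time-ordered set."""
    return sorted((r for feed in feeds for r in feed), key=lambda r: r.timestamp)


def fuse(readings):
    """Information Fusion sketch (assumed technique: inverse-variance weighting).

    Returns one (estimate, variance) pair. The fused variance is smaller
    than the variance of any single input reading.
    """
    weights = [1.0 / r.variance for r in readings]
    total = sum(weights)
    estimate = sum(w * r.value for w, r in zip(weights, readings)) / total
    return estimate, 1.0 / total


# Made-up sample data: a noisy sensor and a more precise one.
feed_a = [Reading("sensor_a", 0, 20.0, 4.0), Reading("sensor_a", 1, 21.0, 4.0)]
feed_b = [Reading("sensor_b", 0, 22.0, 1.0), Reading("sensor_b", 1, 22.5, 1.0)]

unified = integrate(feed_a, feed_b)  # integration: all four records are retained
fused_t0 = fuse([r for r in unified if r.timestamp == 0])  # fusion: one estimate
```

In this sketch the integrated view grows with every incoming record, whereas the fused view hands the application a single value per time step, weighted toward the more trustworthy sensor.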
As you can see, Information Fusion, unlike Data Integration, focuses on deriving insight from real-time streaming data and enriching this stream with semantic context from other Big Data sources. This is a critical distinction, as today’s most advanced, mission-critical analytical applications are starting to look to Information Fusion to add real-time value.
Originally posted on Data Science Central