• Data virtualization offers best-in-class data integration capabilities to accelerate your analytics, data science and BI initiatives.
• Data virtualization empowers businesses through rapid data discovery, unified data access and the efficiencies of collaborative analytics.
• Data virtualization unleashes the power of self-sufficiency for business analysts and power-users to create as-needed custom views that display information precisely as they’d like for each unique business initiative.
• Data virtualization can save many hours by removing common roadblocks such as hard-to-access data, the need to secure funding for lengthy ETL projects, and the headaches of informal, inconsistent analytics calculations based on data siloed within organizations.
• Data virtualization provides these capabilities by abstracting and simplifying the complexity of locating, joining and filtering data from multiple sources simultaneously. Even complicated transformations, cleansing and aggregations can easily be performed through a visual interface without advanced SQL development skills.
Introduction to Data Virtualization
Many organizations face data integration and accessibility challenges as they seek to put ever-increasing amounts of data into the hands of more people for exploration and analysis. Data virtualization is an approach, together with a set of technologies and practices, that addresses these challenges and empowers organizations with their data. Although data virtualization is neither new nor free of complexity, businesses stand to gain value and efficiency by adopting it. Specifically, three primary capabilities are driving businesses toward data virtualization: data unification, business agility and synergies with data governance.
• Enabling discovery for enterprise analytics by providing a single repository to access, manipulate and leverage enterprise information assets through data unification
• Agility in data exploration and discovery accelerates time to insight
• Data virtualization is an effective catalyst for data governance by minimizing redundant and repetitive efforts and driving standardization of KPIs, metrics and reports – improving confidence in the quality and accuracy of the underlying data.
Enabling Discovery through Data Unification – Quick and Efficient Data Access
Data virtualization performs the crucial function of unifying data sources so that access is centralized through a single location. Data unification is the process whereby multiple disparate data sources are made accessible from one location without physically integrating, copying or moving the data. This approach quickly creates a single virtual repository in which analysts can explore, discover and query the full depth and breadth of enterprise information.
By unifying data sources where they exist (rather than copying data to a central location), multiple disparate data stores can be integrated regardless of geographic location and without the delays caused by copying data. As a result, data virtualization accelerates data science, business analytics and business intelligence by broadening the data available to users, which in turn fosters self-sufficiency.
Data virtualization improves time to business insight by placing all enterprise data at users' fingertips, including non-traditional data types such as unstructured, clickstream, web-originated and cloud-based data. Regardless of the existing infrastructure (e.g., a data warehouse, a data lake, or data currently spread across multiple isolated silos), data virtualization creates an environment that brings everything together now and as new data stores and sources are added in the future.
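As a minimal sketch of data unification, assume two small SQLite files stand in for separate source systems (the `crm.db` and `sales.db` names, their tables and the sample rows are all hypothetical). One connection attaches both files where they are and answers a cross-source query without copying any rows into a central store. Commercial data virtualization platforms federate far more varied sources (warehouses, lakes, cloud services) through their own connectors; SQLite's `ATTACH DATABASE` is used here only because it ships with Python's standard library.

```python
import os
import sqlite3
import tempfile


def create_source(path, ddl, insert_sql, rows):
    """Create a stand-in 'source system' as its own SQLite file (hypothetical data)."""
    con = sqlite3.connect(path)
    con.execute(ddl)
    con.executemany(insert_sql, rows)
    con.commit()
    con.close()


def unified_access(sources):
    """Return one connection that attaches every source in place; no rows are copied."""
    con = sqlite3.connect(":memory:")
    for alias, path in sources.items():
        # Aliases come from our own trusted dict; the file path is bound as a parameter.
        con.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
    return con


def revenue_by_customer(con):
    """A cross-source query: customers live in 'crm', orders live in 'sales'."""
    return con.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


with tempfile.TemporaryDirectory() as tmp:
    crm_path = os.path.join(tmp, "crm.db")
    sales_path = os.path.join(tmp, "sales.db")
    create_source(
        crm_path,
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)",
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "EU"), (2, "Globex", "US")],
    )
    create_source(
        sales_path,
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)",
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0)],
    )
    con = unified_access({"crm": crm_path, "sales": sales_path})
    result = revenue_by_customer(con)
    con.close()  # close before the temporary directory is removed

print(result)  # [('Acme', 200.0), ('Globex', 50.0)]
```

The join runs against the attached files directly, so adding a new source in this sketch means attaching one more file rather than building another copy pipeline.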
Business Agility & Collaborative Analytics – Reusability, Consistency, Self-Sufficiency
By reducing analysts' dependency on IT for data acquisition and data preparation, data virtualization enables self-sufficiency and, therefore, agility. Business analysts can manipulate data on the fly, iterating through multiple perspectives in real time without copying or moving the data. This dynamic view creation makes it possible to rapidly prototype, experiment and iterate in order to see, manipulate and use the data exactly as each unique requirement demands. No time is spent physically cleansing, remodeling, preparing, moving or copying the data; these functions are carried out in real time, as needed, and can be quickly modified to meet the needs of each data-driven effort. Because queryable virtual joins can be created in minutes, the time savings can be substantial.
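The on-the-fly view creation described above can be sketched, under stated assumptions, with SQLite `TEMP` views: two attached in-memory databases play the role of separate sources (all table names and rows are hypothetical), and the helper `define_view` is a hypothetical name for this illustration. A view stores only its query text, so switching perspectives means redefining the view, not moving data.

```python
import sqlite3

# Two attached in-memory databases stand in for separate source systems (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS crm")
con.execute("ATTACH DATABASE ':memory:' AS sales")
con.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
con.execute("CREATE TABLE sales.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO crm.customers VALUES (?, ?, ?)",
                [(1, "Acme", "EU"), (2, "Globex", "US"), (3, "Initech", "US")])
con.executemany("INSERT INTO sales.orders VALUES (?, ?, ?)",
                [(10, 1, 120.0), (11, 2, 50.0), (12, 3, 30.0)])
con.commit()


def define_view(con, name, select_sql):
    """(Re)define a virtual view: only the query text is stored, never the joined rows."""
    con.execute(f"DROP VIEW IF EXISTS temp.{name}")
    # TEMP views (unlike views in 'main') may reference tables in attached databases.
    con.execute(f"CREATE TEMP VIEW {name} AS {select_sql}")


# First perspective: revenue per region, joined across the two sources on the fly.
define_view(con, "analysis", """
    SELECT c.region AS grp, SUM(o.amount) AS revenue
    FROM crm.customers AS c JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.region
""")
by_region = con.execute("SELECT * FROM analysis ORDER BY grp").fetchall()

# Iterate: swap to a per-customer perspective without moving or copying any data.
define_view(con, "analysis", """
    SELECT c.name AS grp, SUM(o.amount) AS revenue
    FROM crm.customers AS c JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
""")
by_customer = con.execute("SELECT * FROM analysis ORDER BY grp").fetchall()

print(by_region)    # [('EU', 120.0), ('US', 80.0)]
print(by_customer)  # [('Acme', 120.0), ('Globex', 50.0), ('Initech', 30.0)]
```

Each redefinition takes effect immediately on the next query, which is the quick prototype-and-iterate loop the paragraph describes, here reduced to a single SQL engine.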
Data Virtualization as a Data Governance Catalyst
Through intelligent sharing of information, data governance greatly improves the productivity and efficiency of analytics, BI and data science initiatives. Searchable data catalogs, standardized metrics and KPIs, data quality improvements and master data management (MDM) solutions are just a few examples of the value attainable through a well-crafted data governance plan.
Data virtualization makes data governance more efficient and streamlines administration by centralizing data policies and administrative tasks. Because data virtualization integrates data in real time, leaving it in place and eliminating redundant copies such as staging areas and operational data stores (ODS), there are fewer areas to govern and secure, which means less administration, less complexity and less risk. Data governance measures can be applied on the fly as data flows through the virtual layer. Governing data and access centrally through a unified data layer eliminates redundant steps, interfaces and procedures, and reduces or removes the need to examine and audit each individual data source.
Having a single security and access model to manage across all data sources greatly simplifies data security management: administrators work from one platform rather than juggling the separate administrative applications of each individual data storage server. Data policies can be defined on a shared (common) data model or on logical data objects, making them efficient to manage, sustain and reuse.
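To make the idea of one access model applied on the fly more concrete, here is a minimal toy sketch. The `VirtualLayer` class, the role names and the masking rule are hypothetical illustrations, not the API of any product mentioned in this article. Sources sit behind callables, a single policy table says which columns each role may not see in clear text, and that one policy is applied to rows from every source as they pass through the layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def mask(value: object) -> str:
    """Hypothetical masking rule: keep the first character, hide the rest."""
    text = str(value)
    return text[:1] + "***" if text else text


@dataclass
class VirtualLayer:
    """Toy virtual layer: sources stay behind callables; policies live in one place."""
    sources: Dict[str, Callable[[], Iterable[Row]]] = field(default_factory=dict)
    # A single policy table shared by every source: role -> columns to mask.
    masked_columns: Dict[str, Set[str]] = field(default_factory=dict)

    def register_source(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self.sources[name] = fetch

    def set_policy(self, role: str, columns: Set[str]) -> None:
        self.masked_columns[role] = set(columns)

    def query(self, source: str, role: str) -> List[Row]:
        # Roles without a policy are denied rather than silently seeing raw data.
        if role not in self.masked_columns:
            raise PermissionError(f"no access policy defined for role {role!r}")
        hidden = self.masked_columns[role]
        # The policy is applied as rows flow through; nothing is copied or persisted.
        return [
            {col: (mask(val) if col in hidden else val) for col, val in row.items()}
            for row in self.sources[source]()
        ]


layer = VirtualLayer()
layer.register_source("crm", lambda: [{"name": "Acme", "email": "ops@acme.test"}])
layer.register_source("support", lambda: [{"ticket": 7, "email": "it@globex.test"}])
layer.set_policy("analyst", {"email"})
layer.set_policy("steward", set())

print(layer.query("crm", "analyst"))      # [{'name': 'Acme', 'email': 'o***'}]
print(layer.query("support", "analyst"))  # [{'ticket': 7, 'email': 'i***'}]
print(layer.query("crm", "steward"))      # [{'name': 'Acme', 'email': 'ops@acme.test'}]
```

Because the `email` rule is declared once per role rather than once per source, a newly registered source in this sketch inherits the same protection without any extra administration.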
One or more of these drivers will generally resonate so strongly within an organization that it will pursue data virtualization to meet those specific needs. As the platform, team and community mature, this generally leads the organization to apply data virtualization to other business drivers in pursuit of additional value. Data virtualization products such as those from Red Hat JBoss, Stone Bond Technologies and Data Virtuality stand out among the crowd as some of the more innovative approaches to data virtualization.
Dirk Garner is Principal Consultant at Garner Consulting, providing data strategy and advisory services. He can be contacted via email at firstname.lastname@example.org or through LinkedIn: http://www.linkedin.com/in/dirkgarner