In 2016, democratized data will become increasingly widespread as businesses demand faster access to data for mining, exploration, and discovery of actionable insight, along with easier and better tools for doing so. Technologies such as streaming data, data virtualization tools, data wrangling tools, and data prep and cleansing utilities will become still more mainstream and easier for business users to use, leaving IT staff to focus more on stability and ‘keeping the lights on’.
Streaming data feeds will become ever more necessary for companies to compete. The barriers to entry for streaming are price and expertise, and both could prove challenging for many companies, especially smaller enterprises. Dependency on ETL, data copying, and physical re-modeling will begin to wane as streaming, virtualization, wrangling, and data prep tools make it easier and quicker to clean, model, and present data in place and/or virtually, without the overhead and time burden of physically copying and re-modeling it.
As these strategies deliver data faster, business analysts will be able to begin their analysis sooner, and the work itself will become easier as hybrid data-blending and analytics products continue to evolve. Products such as Alteryx, Looker, and LavaStorm lessen the skill set required to perform advanced analytics and include many tools for completing the data preparation steps as well as visualizing the information.
Data supply chains (the path from production data stores to wherever the data is stored for analyst activity) can be further shortened through data virtualization products such as Denodo, Rocket Software, or DataVirtuality. Products like these allow analysts to join queries across physically separate databases of many different types without the extra steps and time delay of traditional ETL. They can also virtually model and transform data and provide user-friendly names for data elements. They operate as an intermediary access point and house only the metadata needed to allow cross-database joins. Many of these platforms also include advanced query optimization and caching capabilities to provide a more robust toolset.
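To make the idea of a virtual layer concrete, here is a minimal sketch in Python. It is not how Denodo, Rocket Software, or DataVirtuality work internally. It uses two SQLite databases attached to one connection as a toy stand-in for two physically separate sources, and the table names, column names, and sample rows are all hypothetical. The view stores only a definition (metadata), joins across both databases, and exposes friendlier column names without copying any rows.

```python
import sqlite3


def build_virtual_layer():
    """Return a connection that exposes a cross-database view.

    Two in-memory SQLite databases stand in for separate production
    stores. All names and rows below are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")                  # "orders" system
    conn.execute("ATTACH DATABASE ':memory:' AS crm")   # "customer" system

    conn.execute(
        "CREATE TABLE main.ord_hdr (ord_id INTEGER, cust_id INTEGER, amt REAL)"
    )
    conn.execute("CREATE TABLE crm.cust_mst (cust_id INTEGER, cust_nm TEXT)")
    conn.executemany(
        "INSERT INTO main.ord_hdr VALUES (?, ?, ?)",
        [(1, 10, 250.0), (2, 11, 75.5), (3, 10, 40.0)],
    )
    conn.executemany(
        "INSERT INTO crm.cust_mst VALUES (?, ?)",
        [(10, "Acme"), (11, "Globex")],
    )

    # The view holds only the query definition. SQLite requires a TEMP
    # view when the definition spans more than one attached database.
    conn.execute(
        """
        CREATE TEMP VIEW customer_orders AS
        SELECT c.cust_nm AS customer_name,
               o.ord_id  AS order_id,
               o.amt     AS order_amount
        FROM main.ord_hdr AS o
        JOIN crm.cust_mst AS c ON c.cust_id = o.cust_id
        """
    )
    return conn


def total_by_customer(conn):
    """Query the virtual view using the analyst-friendly names."""
    rows = conn.execute(
        "SELECT customer_name, SUM(order_amount) "
        "FROM customer_orders GROUP BY customer_name"
    ).fetchall()
    return dict(rows)
```

An analyst would query only customer_orders and never needs to know that the underlying rows live in two different stores under cryptic names. Commercial platforms add connectors for many database types, query optimization, and caching on top of this basic pattern.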
The increased prevalence of NoSQL data stores (Cassandra, Hadoop, Neo4j, etc.) serving production needs will also shorten the data supply chain. These data stores can hold data in a more analytics-friendly form than traditional relational data stores. With direct or metered access to these data stores, when they are used as production stores, analysts can act on the most recent data in real time with few or no data wrangling steps to slow them down.
Because of all of these new and maturing capabilities, there is (finally) a compelling business need for data governance. At a minimum, there is a need for data governance in the form of a data glossary or dictionary that guides analysts to understand and consume appropriate data using standard definitions of data elements, KPIs, metrics, business terms, etc. Data governance is an enabling component of fully democratized data.
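As a rough illustration of what such a glossary might hold, the sketch below models one in Python. The class names, fields, and the sample entry are assumptions made for illustration and do not come from any particular governance product. Each entry pairs a business term with a standard definition, the place the governed data can be found, and any synonyms analysts are likely to search for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One governed business term (all fields are illustrative)."""
    term: str
    definition: str
    source: str              # where the governed data element lives
    synonyms: tuple = ()     # alternative names analysts may use


class DataGlossary:
    """Case-insensitive lookup of terms and their synonyms."""

    def __init__(self, entries):
        self._index = {}
        for entry in entries:
            for name in (entry.term, *entry.synonyms):
                self._index[name.lower()] = entry

    def lookup(self, name):
        """Return the matching entry, or None if the term is not governed."""
        return self._index.get(name.strip().lower())


# Hypothetical sample entry.
glossary = DataGlossary([
    GlossaryEntry(
        term="Net Revenue",
        definition="Gross sales minus returns, discounts, and allowances.",
        source="finance.sales_summary.net_revenue",
        synonyms=("Revenue", "Net Sales"),
    ),
])
```

With this in place, glossary.lookup("revenue") and glossary.lookup("Net Sales") both resolve to the same definition and source, which is the point of the exercise: analysts using different words end up consuming the same governed data element.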
We are heading towards a world in which ever greater numbers of insights will be discoverable and retrievable in near real time, since the events that feed and shape our KPIs and metrics are unfolding at breakneck speed. As we shorten our data supply chain, we hasten our ability to analyze and act on insights.
Dirk Garner is a Principal Consultant at Garner Consulting providing data strategy consulting and full stack development. Dirk can be contacted via email: firstname.lastname@example.org or through LinkedIn: http://www.linkedin.com/in/dirkgarner