I often see conflicting and overlapping definitions of business glossaries, data dictionaries, and data catalogs, and consensus on a standard definition of each remains elusive. Some of this confusion is understandable considering how data governance typically evolves within an organization. For instance, it can be efficient to start by creating a data dictionary or data catalog and then build a data governance program on top of it; the same goes for a data quality initiative. This approach delivers quick wins in data governance while embracing the spirit of ‘agile’. I put forth the following as suggested definitions and elements of each. My intent is to capture the joint value of these assets, provide specific definitions and examples of each, and explain how they fit into a data governance program.
A business glossary is focused on business language and is easily understood in any business setting, from boardrooms to technology standups. Business terms aren’t meant to define data, metadata, transforms, or locations, but rather to define what each term means in a business sense. What do we mean by a conversion? A sale? A prospect? These types of questions can be answered with a business glossary. Having a business glossary brings a common understanding of the vocabulary used throughout an organization. The scope of a business glossary should be enterprise-wide, or at least division-wide in cases where different divisions have significantly different business terminology. Because of the scope and the expertise needed, the business glossary is owned by the business rather than by technology. Often a data steward or business analyst will have this as their sole responsibility.
A data dictionary should be focused on the descriptions and details involved in storing data. There should be one data dictionary for each database in the enterprise. The data dictionary includes details about the data such as data type, permissible length, lineage, transformations, and so on. This metadata helps data architects, engineers, and data scientists understand how to join, query, and report on the data, and it also explains the data's granularity. Because of the need for technical and metadata expertise, ownership of a data dictionary lies within technology, frequently with roles such as database administrators, data engineers, data architects, and/or data stewards.
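To make the elements above concrete, here is a minimal sketch of what a single data dictionary might look like if modeled in Python. The class names, field names, and the sample `sales_db` rows are all hypothetical and chosen only for illustration; a real data dictionary would normally live in a database's metadata tables or a dedicated tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnEntry:
    """One data dictionary row describing a single stored column (hypothetical shape)."""
    table: str
    column: str
    data_type: str
    max_length: Optional[int] = None   # permissible length, if the type has one
    lineage: str = ""                  # where the data comes from
    transformation: str = ""           # how it was changed on the way in
    granularity: str = ""              # what one row represents


@dataclass
class DataDictionary:
    """One dictionary per database, as suggested in the text."""
    database: str
    entries: List[ColumnEntry] = field(default_factory=list)

    def add(self, entry: ColumnEntry) -> None:
        self.entries.append(entry)

    def describe(self, table: str, column: str) -> ColumnEntry:
        """Return the entry for a column, or raise KeyError if undocumented."""
        for entry in self.entries:
            if (entry.table, entry.column) == (table, column):
                return entry
        raise KeyError(f"{table}.{column} is not documented in {self.database}")


# Illustrative data only; these names do not come from any real system.
sales_db = DataDictionary(database="sales_db")
sales_db.add(ColumnEntry(
    table="orders",
    column="customer_email",
    data_type="VARCHAR",
    max_length=254,
    lineage="web checkout form",
    transformation="lowercased and trimmed",
    granularity="one row per order",
))
email_entry = sales_db.describe("orders", "customer_email")
```

Looking up a column with `describe` returns the technical details an engineer needs before joining or querying, while an undocumented column raises an error, which is one simple way to surface gaps in the dictionary.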
The data catalog serves as a single-point directory for locating information, and it also provides the mapping between the business glossary and the data dictionaries. The data catalog is an enterprise-wide asset providing a single reference source for the location of any data set required for needs such as operations, BI, analytics, and data science. Just as with the business glossary, if one division of an enterprise is significantly different from the others, it would be reasonable for the data catalog to be exclusive to that division rather than to the enterprise. The data catalog would most reasonably be developed after the successful creation of both the business glossary and the data dictionaries, but it can also be assembled incrementally as the other two assets evolve over time. A data catalog may be presented in a variety of ways, such as an enterprise data marketplace. The marketplace would serve as the distribution or access point for all, or most, enterprise-certified data sets for a variety of purposes. Because the mapping work requires both business and technical expertise, assembling the data catalog is a collaborative effort.
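The mapping role of the catalog can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataCatalog` and `DataLocation` names, the sample term, its definition, and the database locations are all hypothetical, and a production catalog would typically be a dedicated tool rather than an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataLocation:
    """A pointer into a data dictionary: which database, table, and column."""
    database: str
    table: str
    column: str


@dataclass
class DataCatalog:
    """Maps business glossary terms to the physical locations that hold them."""
    glossary: Dict[str, str] = field(default_factory=dict)              # term -> business definition
    mappings: Dict[str, List[DataLocation]] = field(default_factory=dict)

    def define(self, term: str, definition: str) -> None:
        """Business-owned step: record what a term means."""
        self.glossary[term.lower()] = definition

    def map_term(self, term: str, location: DataLocation) -> None:
        """Collaborative step: tie a defined term to where its data lives."""
        key = term.lower()
        if key not in self.glossary:
            raise KeyError(f"'{term}' is not defined in the business glossary")
        self.mappings.setdefault(key, []).append(location)

    def locate(self, term: str) -> List[DataLocation]:
        """Return every known location for a term (empty list if unmapped)."""
        return list(self.mappings.get(term.lower(), []))


# Illustrative data only; the definition and locations are placeholders.
catalog = DataCatalog()
catalog.define("Conversion", "A prospect who completes a purchase (placeholder definition)")
catalog.map_term("Conversion", DataLocation("sales_db", "orders", "order_id"))
catalog.map_term("Conversion", DataLocation("analytics_dw", "fact_conversions", "conversion_id"))
conversion_locations = catalog.locate("conversion")
```

Requiring a glossary definition before a term can be mapped mirrors the suggestion that the catalog builds on the other two assets, while still allowing terms and locations to be added incrementally.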
Of course, the success you realize from the assembly and use of these data governance assets is entirely dependent on other pillars of a solid data governance program such as a data quality initiative, master data management, compliance and security concerns, etc. Please share your thoughts in the comments section or by direct message.
Dirk Garner is Principal Consultant at Garner Consulting, providing data strategy consulting and advisory services. He can be contacted via email: firstname.lastname@example.org or through LinkedIn: http://www.linkedin.com/in/dirkgarner
See more on the Garner Consulting blog: http://www.garnerconsulting.com/blog-busglossdatadictdatacat.html