Getting the right people access to the right data is one of the most important parts of any AI project, and it’s one of the first hurdles I see organisations fail at. Too many times I have heard of individuals who successfully complete their training sessions and have a clear picture of their use case, only to come to a grinding halt because they can’t find, or don’t have access to, the data they need. As organisations go through their data transformation projects, this scenario is becoming all too common and could get worse before it gets better.
According to a Matillion and IDG report, the mean number of data sources per organisation was 400, with more than 20% of companies surveyed indicating they were feeding more than 1,000 sources into their BI and analytics tools. These numbers highlight the challenge users face when trying to find the right data in their organisation.
One solution to this is the Data Catalog.
According to data catalog vendor Alation:
A Data Catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness of data for intended uses.
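To make that definition a little more concrete, here is a minimal sketch of the idea in Python: a collection of metadata entries, an inventory of what is available, and a simple search over it. This is only an illustration and not how Alation or any other vendor implements a catalog. The class names, fields, and sample datasets (`DatasetEntry`, `DataCatalog`, `sales_orders`, `web_clicks`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata describing one data source (all fields are illustrative)."""
    name: str
    owner: str
    description: str
    tags: list[str] = field(default_factory=list)


class DataCatalog:
    """A toy in-memory inventory of dataset metadata with keyword search."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Add (or overwrite) the metadata for a data source, keyed by name.
        self._entries[entry.name] = entry

    def inventory(self) -> list[str]:
        # Sorted names of every registered data source.
        return sorted(self._entries)

    def search(self, keyword: str) -> list[DatasetEntry]:
        # Case-insensitive match against name, description, and tags.
        needle = keyword.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


# Hypothetical sample entries, purely for illustration.
catalog = DataCatalog()
catalog.register(DatasetEntry(
    "sales_orders", "finance-team",
    "Daily order totals by region", ["sales", "finance"],
))
catalog.register(DatasetEntry(
    "web_clicks", "marketing-team",
    "Raw clickstream events from the website", ["marketing"],
))

matches = [e.name for e in catalog.search("sales")]
print(matches)
```

A real catalog adds much more on top of this, such as access controls, lineage, and quality information, but the core idea is the same: describe each data source once, then let people search the descriptions rather than hunt through the sources themselves.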
Benefits of a Data Catalog
A Data Catalog brings quite a few benefits to an organisation. Below are some of the main ones:
Trust - Having all your data in one place, with proper controls and management, should allow an organisation to trust the data it uses. It goes without saying that access to reliable and accurate data is vital to a successful AI project.
Discovery - As per the research above, if an organisation has 400 or more different data sources, finding what you need, or even knowing what is available, can be a challenge. As the name suggests, cataloging your data makes it easier to find, reducing the time and effort needed to start a project.