In the age of voluminous unstructured data, curated manually, semi-automatically, or eventually with a fuller degree of automation under approval processes and governance, it is more necessary than ever to secure not only the data itself but the process by which it becomes training material: data collection and the aggregation of sources (lineage); data preparation, including redundancy removal, null-value replacement, range consolidation, and homogeneity of content; the selection of the ML algorithm(s); and the use of the resulting data to train those algorithms.
Thus the age of Traceable Machine Learning Governance is born. Standards bodies should embark on consolidating views across vendors, consumers, services organizations, data providers, curators, and the other stakeholders in the ML-training life-cycle.
The problem of governing ML is one not only of data governance, tracing data through each step until it is ingested into an ML model, but also of governing the process around collection and curation, the selection of the ML algorithm, and the selection and partitioning of training sets.
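To make this concrete, here is a minimal sketch in Python (using pandas) of what one traceable preparation step might record: the step name, row counts, and content hashes tying each record to the exact data it transformed. The function names and record fields are illustrative assumptions, not a proposed standard.

```python
import hashlib
import json
from datetime import datetime, timezone

import pandas as pd

def fingerprint(df: pd.DataFrame) -> str:
    """Content hash of a DataFrame, tying a lineage record to exact data."""
    return hashlib.sha256(df.to_csv(index=False).encode("utf-8")).hexdigest()

def traced_step(df: pd.DataFrame, name: str, fn, lineage: list) -> pd.DataFrame:
    """Apply one preparation step and append an auditable lineage record."""
    before = fingerprint(df)
    out = fn(df)
    lineage.append({
        "step": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(df),
        "rows_out": len(out),
        "input_sha256": before,
        "output_sha256": fingerprint(out),
    })
    return out

# Example: redundancy removal and null-value replacement, each traced.
lineage: list = []
raw = pd.DataFrame({"age": [34, 34, None, 51],
                    "city": ["Oslo", "Oslo", "Bergen", None]})
df = traced_step(raw, "redundancy_removal", lambda d: d.drop_duplicates(), lineage)
df = traced_step(df, "null_value_replacement",
                 lambda d: d.fillna({"age": d["age"].median(), "city": "unknown"}),
                 lineage)
print(json.dumps(lineage, indent=2))
```

A reviewer, human or machine, can later verify that the hashes chain up from the raw sources to the training set without gaps.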
So, here is our call to action:
1. Standards bodies need to consolidate a standard around Traceable ML Governance to reduce the risk of fake training: bogus data used to train a “paper tiger” ML that has no substance.
2. Corporations should give serious consideration, throughout the ML and Cognitive Computing training life-cycle, to securing a traceable ML and cognitive governance process. This process can begin in a lightweight fashion, but it must address the legal implications arising from the use and administration of Cognitive Systems that employ Machine Learning (ML) to make recommendations, provide insights, and generate summaries, reports, news, reviews, and the like.
3. Furthermore, as the global impact of well-trained Cognitive Systems (ML systems, AIs) becomes more and more tangible, we will want demonstrable traceability of how these systems were trained and retrained; where a human in the loop influenced the AI and how that influence was brought to bear on the source data (SME review, curation, etc.); where the data was sourced; and how it was curated (whether by humans or by another ML or Cognitive System). Traceability in the Cognitive System life-cycle, or the ML-training life-cycle, will therefore play a cardinal role in adoption, trust, and the veracity of recommendations.
The claim that Deep Learning is beyond governance or governability is bogus, founded only on the illusion that because the hidden layers of a neural network and their interactions are “dark matter,” we cannot govern their inputs and outputs. In fact we can and should secure the training process, as well as the input and output configurations of ML systems, in logs that are themselves inspected by ML systems, with a human-in-the-loop approval process in place.
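As a sketch of what securing inputs and output configurations in logs could look like in practice, the following Python illustrates an append-only training-log entry that hashes the dataset, the training configuration, and the resulting model, with an explicit human-in-the-loop approval field. All class, field, and reviewer names here are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

def sha256_of(obj) -> str:
    """Stable content hash of any JSON-serializable artifact."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()

@dataclass
class TrainingLogEntry:
    """One auditable record of a training run: inputs, configuration, outputs."""
    dataset_sha256: str
    config_sha256: str
    model_sha256: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    approved_by: Optional[str] = None  # filled in by the human-in-the-loop reviewer

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer

# Example: log a run, then record the human approval before deployment.
entry = TrainingLogEntry(
    dataset_sha256=sha256_of({"source": "curated_corpus_v3"}),
    config_sha256=sha256_of({"model": "classifier", "epochs": 20, "lr": 0.001}),
    # In practice this would be the hash of the serialized trained weights.
    model_sha256=sha256_of("placeholder-for-serialized-weights"),
)
entry.approve("jane.doe@example.org")
print(json.dumps(asdict(entry), indent=2))
```

An ML-based auditor can scan such entries for anomalies (unexpected datasets, unapproved runs), escalating to the human approver exactly as described above.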
4. Organizations and standards bodies should consider the use of blockchain technologies to secure and govern the process chain within the ML-training or Cognitive System life-cycle in a demonstrably traceable manner, a practice that will, with little doubt, gradually find its way into local and global legislation.
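To ground the blockchain suggestion, here is a minimal sketch of the underlying idea, a hash chain in plain Python with no distributed ledger: each life-cycle record commits to the hash of the one before it, so retroactive tampering with any record is detectable. A production system would anchor these hashes on an actual blockchain; everything here is an illustrative assumption.

```python
import hashlib
import json

class ProcessChain:
    """A minimal hash chain: each block commits to the previous one, so any
    retroactive change to a life-cycle record breaks every later hash."""

    def __init__(self):
        self.blocks = [{"index": 0, "record": "genesis", "prev_hash": "0" * 64}]

    @staticmethod
    def block_hash(block: dict) -> str:
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def append(self, record: dict) -> None:
        self.blocks.append({
            "index": len(self.blocks),
            "record": record,
            "prev_hash": self.block_hash(self.blocks[-1]),
        })

    def verify(self) -> bool:
        return all(
            self.blocks[i]["prev_hash"] == self.block_hash(self.blocks[i - 1])
            for i in range(1, len(self.blocks))
        )

# Example: chain the stages of one ML-training life-cycle.
chain = ProcessChain()
chain.append({"stage": "data_collection", "source": "curated_corpus_v3"})
chain.append({"stage": "data_preparation", "ops": ["dedupe", "fillna"]})
chain.append({"stage": "training", "config_sha256": "placeholder-config-hash"})
print(chain.verify())                      # True: the chain is intact
chain.blocks[1]["record"]["source"] = "x"  # tamper with an earlier record
print(chain.verify())                      # False: the tampering is detected
```

This is the demonstrable traceability argued for throughout: auditors, regulators, and consumers can verify the whole process chain without having to trust any single participant's records.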