Poor data quality is enemy number one to the widespread, profitable use of machine learning. While the caustic observation “garbage-in, garbage-out” has plagued analytics and decision-making for generations, it carries a special warning for machine learning. The quality demands of machine learning are steep, and bad data can rear its ugly head twice: first in the historical data used to train the predictive model, and second in the new data used by that model to make future decisions.
To properly train a predictive model, historical data must meet exceptionally broad and high quality standards. First, the data must be right: It must be correct, properly labeled, de-duplicated, and so forth. But you must also have the right data: lots of unbiased data, over the entire range of inputs for which one aims to develop the predictive model. Most data quality work focuses on one criterion or the other, but for machine learning, you must work on both simultaneously.
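As a rough illustration of checking both criteria at once, the sketch below audits a small set of records for duplicates and missing labels (is the data right?) and for gaps in input coverage and label balance (is it the right data?). The function name `audit_training_data`, the field names `age` and `label`, and the sample records are all hypothetical, chosen only for this example; a real audit would need domain-specific checks for correctness and bias.

```python
from collections import Counter


def audit_training_data(rows, label_key, feature_key, expected_range, n_bins=4):
    """Report duplicates, missing labels, coverage gaps, and label balance.

    Hypothetical helper: `rows` is a list of flat dicts with hashable values.
    """
    # "The data must be right": count exact duplicate records.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())

    # "The data must be right": count records with no usable label.
    unlabeled = sum(1 for r in rows if r.get(label_key) in (None, ""))

    # "The right data": split the expected input range into equal-width
    # bins and record which bins contain no examples at all.
    lo, hi = expected_range
    width = (hi - lo) / n_bins
    bin_counts = [0] * n_bins
    out_of_range = 0
    for r in rows:
        x = r.get(feature_key)
        if x is None or not (lo <= x <= hi):
            out_of_range += 1
            continue
        idx = min(int((x - lo) / width), n_bins - 1)
        bin_counts[idx] += 1
    empty_bins = [i for i, c in enumerate(bin_counts) if c == 0]

    # "The right data": a lopsided label distribution can hint at bias.
    label_counts = Counter(
        r[label_key] for r in rows if r.get(label_key) not in (None, "")
    )

    return {
        "duplicates": duplicates,
        "unlabeled": unlabeled,
        "out_of_range": out_of_range,
        "empty_bins": empty_bins,
        "label_counts": dict(label_counts),
    }


# Made-up sample records, for illustration only.
rows = [
    {"age": 25, "label": "approve"},
    {"age": 25, "label": "approve"},  # exact duplicate of the first record
    {"age": 47, "label": "deny"},
    {"age": 62, "label": None},       # missing label
    {"age": 71, "label": "approve"},
]

report = audit_training_data(rows, "label", "age", expected_range=(20, 100))
print(report)
```

On these sample records the report flags one duplicate, one unlabeled record, and no examples in the top quarter of the expected age range, so a model trained on them would have seen nothing for inputs between 80 and 100.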
“In the years ahead, AI will raise three big questions for bosses and governments. One is the effect on jobs.”
“A second important question is how to protect privacy as AI spreads.”
“The third question is about the effect of AI on competition in business. A technology company that achieves a major breakthrough in artificial intelligence could race ahead of rivals, put others out of business and lessen competition.”
“Retailing is an illustration of how AI can help large firms win market share. Amazon, which uses AI extensively, controls around 40% of online commerce in America, helping it build moats that make it harder for rivals to compete.”
“Just as the internet felled some bosses, those who do not invest in AI early to ensure they will keep their firm’s competitive edge will flounder.”