How to Guarantee the Success of AI at Your Business
Three years have passed since Google and Facebook open-sourced generous portions of their artificial intelligence (AI) projects while keeping their data proprietary. Nevertheless, many people still mistake AI for a substitute for a data strategy. A strategy based solely on historical data is fatally flawed. If your organization is planning to use AI, I urge you to implement a data strategy that draws on continuously corroborated data from external sources.
AI systems build their intelligence from consuming vast amounts of information. As data plays an increasingly central role—across applications and industries—its quality, reliability, accuracy, and validity become critical. With greater volumes of high-quality data, performance grows in precision, accuracy, and reach. Put it into an intelligent workflow solution, and that authoritative data has the potential to drive human-like decision-making in real time.
And yet, three years after Google and Facebook’s sharing, the potential of AI is still a function of data quality.
Data Science Depends on Data Quality
When dirty data—duplicate entries, missing pieces, different formats, and so on—gets mixed in with good data, it is very difficult to clean and unify. It’s less useful for building AI. This is why ‘garbage in, garbage out’ has become a recurring theme in the data science profession.
While data-cleaning techniques have emerged to improve data quality, they take up too much time and focus exclusively on your historical data. The time cost has become another recurring theme in the profession:
- In 2014, the New York Times reported that data scientists spend up to 80% of their time “mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
- In a 2016 survey, CrowdFlower, a data enrichment platform for ecommerce websites, found that “3 out of every 5 data scientists we surveyed actually spend the most time cleaning and organizing data.” What’s more, those tasks were the least enjoyable part of respondents’ work days.
- Finally, in a January 2019 poll of over 2,000 data scientists, 67% of respondents said that they spend “most of” (60%+) their time cleaning and moving data.
This is a costly misuse of human resources, but it’s not the worst error.
Clean Historical Data
Is ‘clean historical data’ really what we want? Historical data typically comes from only one source. Cleaning your own data may be enough to get an AI tool working, but the best result it can optimize toward will be no better than your best historical performance. AI trained solely on historical data is inherently limited in its ability to discern and take advantage of new trends.
True gains in performance demand data points that you don’t have historically.
For example, if you want to reach a consumer and your CRM lists five phone numbers in no particular order, then a process working from that data alone would start by dialing the number in the first position without knowing whether that number is even correct.
If your historical data says that most people are reached on “phone 3,” then you’d likely start with “phone 3” in every instance. External data introduces a level of intelligence indicating which number is authoritatively linked to that person and is the best number for reaching them. Otherwise, you’re wasting resources guessing which phone is correct.
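To make the contrast concrete, here is a minimal Python sketch of the two approaches. Everything in it is hypothetical: the function names, the `slot_hits` counts, and the `verified` set standing in for an external source are illustrative assumptions, not part of any particular product.

```python
def dial_order_from_history(slot_hits, num_slots=5):
    """Order CRM phone slots by how often each slot reached someone in the past.

    slot_hits maps a slot position (1..num_slots) to a historical contact count.
    The same order is applied to every consumer, whether or not it fits them.
    """
    return sorted(range(1, num_slots + 1), key=lambda slot: -slot_hits.get(slot, 0))


def dial_order_with_external(phones, verified):
    """Order one consumer's phones so externally verified numbers come first.

    phones is the consumer's CRM list; verified is a (hypothetical) set of
    numbers an external source links to this person. sorted() is stable, so
    the original CRM order is kept within each group.
    """
    return sorted(phones, key=lambda phone: phone not in verified)


# Historical data alone: "phone 3" wins for everyone.
history = {1: 10, 2: 5, 3: 40, 4: 2, 5: 1}
print(dial_order_from_history(history))

# External data: this consumer's fourth number is the one linked to them.
crm_phones = ["555-0101", "555-0102", "555-0103", "555-0104", "555-0105"]
print(dial_order_with_external(crm_phones, verified={"555-0104"}))
```

The first function can only replay past averages; the second uses a per-consumer signal that the historical data never contained.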
I’ve used phone numbers as an example, but just imagine all of the other data points used to reach consumers: addresses, email addresses, and so on. Many of these data points may already be recorded in your CRM, but there’s no guarantee that they’re accurate.
Consider consumers’ phone numbers. Research shows that over 80% of outbound calls go unanswered or ring to the wrong party. Across industries, right-party contact rates continue to decline as consumers have become less trusting of who is on the other end.
Part of the problem is that for decades, contact centers have supplemented their less-than-perfect CRM data with credit bureau/demographic information. Frankly speaking, these sources emphasize quantity over quality, often contain stale data, and have major gaps in coverage (often as high as 40% of phones).
These sources may take weeks to reflect consumers’ movements from a landline phone to a mobile phone, from one cell phone carrier to another, or from one phone number to another. Phone numbers are constantly being reassigned from one subscriber to another; approximately 100,000 numbers in North America are reassigned by wireless carriers every day. With the speed of technology, these changes are happening more frequently than in the past.
That’s why, on average, 5%-15% of an organization’s CRM data becomes out-of-sync within a month, and 60% of it is inaccurate within two years. It’s unrealistic to expect meaningful results from an AI tool fed only this ‘garbage.’
Inaccurate or incomplete data can put organizations at risk of regulatory sanctions, brand damage, and operational inefficiency, so it can just as easily undermine the effectiveness of an AI tool. But if an organization is able to 1) update and append its existing CRM data with the very latest consumer records, 2) have any changes in a consumer record proactively pushed to it in near real time, and 3) layer on predictive consumer insights, like best number to use or best time to call, it is more likely to realize greater benefit from investing in AI.
Authoritative External Data
Authoritative external data supplements historical data and powers AI to arrive at better results. Now the question becomes “What constitutes an authoritative external data source?” In general, the solution should:
- Aggregate hundreds of independent, authoritative data sources that make it their business to know their customers’ identities and locations, for example: government, telecom, billing, utilities, and financial services companies.
- Work across the offline and digital worlds to continuously collect, consolidate, normalize, and corroborate data before pushing updates directly into your CRM dozens of times every day.
- Consume one or more consumer data points (e.g., phone number, address, email), and then append additional data points to that record.
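The normalize, corroborate, and append steps in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any real data provider works: the source names, the two-source agreement threshold, the 10-digit North American phone format, and all function names are hypothetical.

```python
from collections import defaultdict


def normalize_phone(raw):
    """Reduce a phone string to its last 10 digits (assumes North American numbers)."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:]


def corroborate(reports, min_sources=2):
    """Keep only phones that at least min_sources independent sources agree on.

    reports is an iterable of (source_name, consumer_id, raw_phone) tuples.
    Returns {consumer_id: [normalized_phone, ...]}.
    """
    seen = defaultdict(set)
    for source, consumer_id, raw in reports:
        seen[(consumer_id, normalize_phone(raw))].add(source)

    confirmed = defaultdict(list)
    for (consumer_id, phone), sources in seen.items():
        if len(sources) >= min_sources:
            confirmed[consumer_id].append(phone)
    return dict(confirmed)


def append_to_crm(crm_record, confirmed_phones):
    """Return a copy of the CRM record with new corroborated phones appended."""
    current = crm_record.get("phones", [])
    existing = {normalize_phone(p) for p in current}
    added = [p for p in confirmed_phones if p not in existing]
    return {**crm_record, "phones": current + added}


reports = [
    ("telecom", "c1", "(555) 010-0199"),
    ("utility", "c1", "555-010-0199"),
    ("billing", "c1", "555-010-0142"),  # only one source: not corroborated
]
confirmed = corroborate(reports)
record = append_to_crm({"id": "c1", "phones": ["555-010-0142"]}, confirmed["c1"])
print(record)
```

Requiring agreement between independent sources is one simple way to express "corroborated"; a production system would presumably also weigh source freshness and reliability.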
Better Deep Learning
If you are augmenting your business intelligence with AI, or considering such a move, you must use information you don’t have today. Don’t rely on the same data to get to the same end faster. Get to a better, different end—faster.