As business leaders strive to get the most out of their analytics investments, democratized data science often appears to offer the perfect solution. Using analytics software with no-code and low-code tools can put data science techniques into virtually anyone’s hands. In the best scenarios, this leads to better decision making and greater self-reliance and self-service in data analysis — particularly as demand for data scientists far outstrips their supply. Add to that reduced talent costs (with fewer high-cost data scientists) and more scalable customization to tailor analysis to a particular business need and context.
However, amid all the discussion around whether and how to democratize data science and analytics, a crucial point has been overlooked: when to democratize data and analytics. Answering that question may even require redefining what democratization should mean.
Fully democratized data science and analytics presents many risks. As Reid Blackman and Tamara Sipes wrote in a recent article, data science is difficult and an untrained “expert” cannot necessarily solve hard problems, even with good software. The ease of clicking a button that produces results provides no assurance that the answer is good — in fact, it could be very flawed and only a trained data scientist would know.
It’s Only a Matter of Time
Even with these reservations, however, democratization of data science is here to stay, as evidenced by the proliferation of software and analytics tools. Thomas Redman and Thomas Davenport are among those who advocate developing “citizen data scientists,” even to the point of screening for basic data science skills and aptitudes in every hire.
Democratization of data science should not be taken to the extreme, however. Analytics need not be at everyone’s fingertips for an organization to flourish. How many outrageously talented people would go unhired simply because they lack “basic data science skills”? Such a requirement is unrealistic and overly limiting.
As business leaders look to democratize data and analysis within their organizations, the real question they should be asking is “when” it makes the most sense. That starts with acknowledging that not every “citizen” in an organization is equally equipped to be a citizen data scientist. As Nick Elprin, CEO and co-founder of Domino Data Lab, which provides data science and machine learning tools to organizations, told me in a recent conversation, “As soon as you get into modeling, more complicated statistical issues are often lurking under the surface.”
The Challenge of Data Democratization
Consider a grocery chain that recently used advanced predictive methods to right-size its demand planning, aiming to avoid having too much inventory (resulting in spoilage) or too little (resulting in lost sales). The losses due to spoilage and stockouts were not enormous, but curtailing them was very hard, given all the variables of demand, seasonality, and consumer behavior. Because of that complexity, the grocery chain could not leave the problem to citizen data scientists; instead, it relied on a team of bona fide, well-trained data scientists.
Data citizenry requires a “representative democracy,” as Elprin and I discussed. Just as U.S. citizens elect politicians to represent them in Congress (presumably to act in their best interests in legislative matters), so too do organizations need the right representation by data scientists and analysts to weigh in on issues that others simply don’t have the expertise to address.
In short, the task is knowing when, and to what degree, to democratize data. I suggest the following five criteria:
Think about the “citizen’s” skill level: The citizen data scientist, in some shape or form, is here to stay. As stated earlier, there simply aren’t enough data scientists to go around, and using this scarce talent to address every data issue isn’t sustainable. More to the point, democratization of data is key to inculcating analytical thinking across the organization. A well-recognized example is Coca-Cola, which rolled out a digital academy to train managers and team leaders; graduates of the program are credited with about 20 digital, automation, and analytics initiatives at several sites in the company’s manufacturing operations.
However, when it comes to engaging in predictive modeling and advanced data analysis that could fundamentally change a company’s operations, it’s crucial to consider the skill level of the “citizen.” A sophisticated tool in the hands of a data scientist is additive and valuable; the same tool in the hands of someone who is merely “playing around in data” can lead to errors, incorrect assumptions, questionable results, and misinterpretation of outcomes and conclusions.
Measure the importance of the problem: The more important a problem is to the company, the more imperative it is to have an expert handling the data analysis. For example, generating a simple graphic of historical purchasing trends can probably be accomplished by someone with a dashboard that displays data in a visually appealing form. But a strategic decision that has meaningful impact on a company’s operations requires expertise and reliable accuracy. Pricing is a case in point: how much an insurance company should charge for a policy is so foundational to its business model that it would be unwise to relegate the task to a non-expert.
Determine the problem’s complexity: Solving complex problems is beyond the capacity of the typical citizen data scientist. Consider the difference between comparing customer satisfaction scores across customer segments (simple, well-defined metrics and lower risk) and using deep learning to detect cancer in a patient (complex and high risk). Complexity of that kind cannot be left to a non-expert who may make cavalier, and potentially wrong, decisions. When complexity and stakes are low, democratizing data makes sense.
An example is a Fortune 500 company I work with, which runs on data throughout its operations. A few years ago, I ran a training program in which more than 4,500 managers were divided into 1,000 small teams, each asked to articulate an important business problem that could be solved with analytics. Teams were empowered to solve simple problems with available software tools, but most of the problems surfaced precisely because they were difficult to solve. Importantly, the managers were not charged with solving those difficult problems themselves; instead, they collaborated with the data science team. Collectively, the teams identified no fewer than 1,000 business opportunities and 1,000 ways that analytics could help the organization.
Empower those with domain expertise: If a company is seeking “directional” insights (customer X is more likely to buy a product than customer Y), then democratization of data and some lower-level citizen data science will probably suffice. In fact, tackling these lower-level analyses with simplified data tools can be a great way to empower those with domain expertise, such as the employees closest to the customers. Where greater precision is needed, as with high-stakes and complex issues, expertise is required.
The most compelling case for precision arises when high-stakes decisions hinge on a threshold. If an aggressive cancer treatment plan with significant side effects were undertaken whenever the likelihood of cancer exceeded, say, 30%, it would be important to differentiate between 29.9% and 30.1%. Precision matters especially in medicine, clinical operations, and technical operations, and for financial institutions that navigate markets and risk, often to capture very small margins at scale.
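To make the threshold point concrete, here is a minimal Python sketch. It assumes a hypothetical decision rule that recommends treatment whenever a model’s estimated likelihood of cancer is greater than 30%, the cutoff used in the example; the function name `recommend_treatment` and every probability in it are illustrative, not drawn from any real clinical model.

```python
# Hypothetical threshold rule: recommend aggressive treatment only when the
# model's estimated likelihood of cancer is greater than 30%.
# All probabilities below are made-up illustrations, not clinical data.
THRESHOLD = 0.30


def recommend_treatment(estimated_probability: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the estimate is strictly above the threshold."""
    return estimated_probability > threshold


# Two estimates only 0.2 percentage points apart fall on opposite sides.
for p in (0.299, 0.301):
    decision = "treat" if recommend_treatment(p) else "do not treat"
    print(f"estimated likelihood {p:.1%}: {decision}")

# A model that underestimates by a single percentage point flips the
# decision for a patient whose true likelihood is 30.5%.
true_probability = 0.305
underestimate = true_probability - 0.01  # about 0.295
print(recommend_treatment(true_probability))  # True
print(recommend_treatment(underestimate))     # False
```

The rule itself is trivial. The hard part, and the reason expertise matters, is producing estimates that are accurate and well calibrated near the cutoff, where an error of a single percentage point changes the outcome.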
Challenge experts to scout for bias: Advanced analytics and AI can easily lead to decisions that are considered “biased.” This is challenging in part because the point of analytics is to discriminate — that is, to base choices and decisions on certain variables. (Send this offer to this older male, but not to this younger female because we think they will exhibit different purchasing behaviors in response.) The big question, therefore, is when such discrimination is actually acceptable and even good — and when it is inherently problematic, unfair, and dangerous to a company’s reputation.
Consider the example of Goldman Sachs, which was accused of discrimination for offering women less credit than men on the Apple Card. In response, Goldman Sachs said it did not use gender in its model, only factors such as credit history and income. However, one could argue that credit history and income are correlated with gender, and that using those variables penalizes women, who on average earn less and historically have had less opportunity to build credit. When using model output that discriminates, decision-makers and data professionals alike need to understand how the data were generated and how the variables are interrelated, as well as how to measure differential treatment, among much else. A company should never put its reputation on the line by having a citizen data scientist alone determine whether a model is biased.
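One common starting point for measuring differential treatment is to compare outcome rates across groups. The Python sketch below computes approval rates per group and the ratio between them, flagging ratios below 0.8, a rule of thumb (the “four-fifths rule”) that comes from U.S. federal employment-selection guidelines and is borrowed here only for illustration. The decision records, group labels, and the helper names `approval_rates` and `impact_ratio` are hypothetical; they are not data or methods from the Goldman Sachs case. Because the check looks at outcomes rather than model inputs, it can surface a disparity even when gender is not a variable in the model.

```python
from collections import defaultdict

# Hypothetical (group, approved) records for illustration only;
# these are not data from any real lender.
decisions = [
    ("women", True), ("women", False), ("women", False), ("women", True), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", True), ("men", True),
]


def approval_rates(records):
    """Return the share of approved applications for each group."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in records:
        totals[group] += 1
        approvals[group] += int(approved)
    return {group: approvals[group] / totals[group] for group in totals}


def impact_ratio(rates, group, reference_group):
    """Ratio of one group's approval rate to a reference group's rate."""
    return rates[group] / rates[reference_group]


rates = approval_rates(decisions)
ratio = impact_ratio(rates, "women", "men")
print(rates)                          # {'women': 0.4, 'men': 0.8}
print(f"impact ratio: {ratio:.2f}")   # impact ratio: 0.50
# Four-fifths rule of thumb: a ratio below 0.8 warrants a closer look.
print("flag for expert review:", ratio < 0.8)  # True
```

A low ratio is a prompt for expert investigation, not a verdict: deciding whether a disparity stems from legitimate factors or from proxies for gender requires understanding how the data were generated and how the variables relate to one another.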
Democratizing data has its merits, but it comes with challenges. Handing everyone the keys doesn’t make everyone an expert, and acting on the wrong insights can be catastrophic. New software tools can allow everyone to use data, but don’t mistake that widespread access for genuine expertise.