A taxonomy of awkward conversations with stakeholders
This post is about different categories of difficult conversations with stakeholders that you will almost inevitably encounter as a data scientist. I think of it as more of a taxonomy (and perhaps commiseration) than advice.
Data scientists sit in an awkward place inside organisations. We’re expected to deliver authority, certainty, and progress, while working with uncertainty, partial information, and value judgements that rarely belong to data science alone. Many of the most uncomfortable conversations arise from that tension.
The only encouragement I can offer is this: if you aren’t having the occasional uncomfortable conversation with a stakeholder, you’re probably doing something wrong.
You don’t actually need this
In theory this shouldn’t be a particularly thorny conversation. Someone asks you to build something for them, but the thing they want either already exists or wouldn’t be useful in the way they think it will. Either way, it should be a conversation that results in less work.
In practice, it’s often surprisingly difficult. People are reluctant to admit that they already have a perfectly functional process, but want to use AI instead because it sounds modern, innovative, or strategically aligned. This tends to show up as repeated references to innovation or strategic priorities that remain vague no matter how much you try to pin them down.
The difficulty here isn’t technical. It’s that the stakeholder doesn’t actually want to solve a problem; they want to use data science to signal relevance or progress.
That’s not a data science question
A core part of the job is translating business problems into problems that can be solved using the tools of data science. It’s rare for someone to bring you a fully formed data science problem, and if they do, it’s often something they jotted down at a conference.
Sometimes, though, the problem just isn’t one that can reasonably be coerced into a data science question. Examples include:
- What should our next product be?
- Should we enter the European market?
- Is this strategy too risky?
These questions can feel like data science questions because they’re about decision making under uncertainty. But they’re not analytically well-defined. “Is this strategy too risky?” isn’t a data science problem because how much risk is acceptable is a values decision.
In theory, some of these could be tightened up. If a stakeholder can define what risk means in this context and what level is acceptable, perhaps there’s a way to get at parts of the question. In practice, questions this poorly formed can only rarely be shaped into something that satisfies both the stakeholder and the data scientist.
The solution to this problem isn’t a technical one
Data science has become a victim of its own hype (I can’t be the only one still feeling victimised by ‘sexiest job of the 21st century’). As a result, people often reach for data science solutions when the problem they’re facing simply isn’t a technical one.
Imagine an organisation experiencing high employee turnover asks you to build a model predicting whether an employee will leave. Even if you could do this well, would it solve the problem? No. At best, it might identify characteristics associated with attrition and offer some clues. But there are much better tools for understanding why people are leaving and what might improve retention, most of which involve talking to people.
More importantly, it’s unclear what action would follow from the prediction. If you predict that someone is likely to leave, what do you do with that information? Even interventions like pay rises don’t address the underlying organisational issues driving attrition.
Data science can do a lot, but it’s not the right tool for fixing organisational or culture problems.
This is much harder than you think it is
One nice thing about being a data scientist is that occasionally you get to do something so straightforward as to be trivial and be treated like you’re a wizard. The flip side is being asked to solve something exceptionally complex or entirely novel on the assumption that you’ll be able to knock it out in an afternoon.
The awkwardness here comes from opacity. Stakeholders often assume that because a problem is common, it must have a simple solution. In reality, the fact that everyone has the same problem often means it’s notoriously difficult. And just because an idea is easy to describe doesn’t mean it’s easy (or even possible) to implement.
Explaining this without sounding evasive or incompetent can be surprisingly hard.
That number doesn’t mean what you think it does
Sometimes a model output or a value you’re optimising simply doesn’t mean what the stakeholder thinks it does. This becomes dangerous when it leads to decisions that the analysis doesn’t actually support.
Propensity modelling is a great example of this. Organisations often want to use propensity scores to decide where spending money will change behaviour. But a propensity score estimates who is likely to act, not whose behaviour an intervention would change, and the highest scorers are often exactly the people who would have acted anyway.
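To make the gap concrete, here’s a minimal sketch using synthetic data. The `engagement` feature and the whole setup are invented for illustration, not taken from any real system:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Synthetic customers: an underlying "engagement" trait drives purchases
# whether or not anyone intervenes.
engagement = rng.normal(size=n)
purchased = (engagement + rng.normal(scale=0.5, size=n)) > 1.0

model = LogisticRegression().fit(engagement.reshape(-1, 1), purchased)
propensity = model.predict_proba(engagement.reshape(-1, 1))[:, 1]

# The score answers "who is likely to buy?", not "whose behaviour would
# a voucher change?". The top scorers here would mostly buy anyway, so
# targeting spend by this score rewards what was already going to happen.
top = np.argsort(propensity)[-5:]
print(propensity[top])
```

Estimating whose behaviour would actually change is an uplift question, and answering it usually requires a deliberate experiment rather than a model fit to observational data.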
The deeper problem is that meaning usually doesn’t travel with numbers. Unfortunately, there’s often no amount of carefully stepping through examples that can dislodge a deeply held belief about what a number ought to mean.
The data you want doesn’t exist
Organisations typically generate extraordinary amounts of data, so it can come as a surprise when you have to explain that the data needed to answer a specific question simply doesn’t exist.
Sometimes this is the result of earlier design decisions. Sometimes it’s inherent to the problem. For example, by definition you will always have much less data about people who don’t use your product than about those who do.
In other cases, the data exists but not in a usable form. You might have website interaction data and purchase data, but no way to link the two. Or your data warehouse might overwrite fields rather than preserving history, making it impossible to reconstruct past states: any features built from the current values quietly leak information from the future into your training data.
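As a toy illustration of the overwrite problem (the tables and column names here are invented for the example):

```python
import pandas as pd

# Historical events we'd like to train on.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2023-01-05", "2023-06-10", "2023-02-01"]),
})

# The warehouse keeps only the *current* plan per customer, overwriting
# the old value on every change (customer 1 upgraded in May 2023).
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "plan": ["premium", "basic"],
})

# The join stamps the May upgrade onto a January order: the feature
# describes the future relative to the event, which is leakage, and with
# no history table there is nothing left to reconstruct the past from.
training = orders.merge(customers, on="customer_id")
print(training)
```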
When the data you need doesn’t exist, there’s often pressure to see whether some other data could serve as a proxy. This is risky. If you had data showing it was a good proxy, you probably wouldn’t need the proxy in the first place. The result is often a fragile system built on an assumption that can’t really be tested.
The data is garbage
Even worse than not having the data is having it and needing to break the news that it’s unusable. Unlike the previous category, this one is less about absence and more about overconfidence. Stakeholders often believe that because data exists, it must be fit for purpose.
Organisations often have an unwarranted faith in the quality and consistency of their data. Part of the job for data scientists is bringing them back down to earth. Sometimes the data is riddled with errors. Sometimes design choices, like free text fields where controlled inputs were needed, mean a heroic amount of wrangling would be needed to turn it into something useful. I once worked with a dataset where country of birth was free text. There were dozens of different spellings of “Australia” alone. In cases like this, it is reasonable to choose sanity and push back.
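For a flavour of what that wrangling looks like, here’s a hedged sketch using only the standard library; the variant spellings, the canonical list, and the cutoff are all invented for the example:

```python
import difflib

CANONICAL = ["Australia", "Austria", "New Zealand"]

def normalise(value: str) -> str | None:
    """Map a free-text entry to a canonical country, or None if unsure."""
    cleaned = value.strip().title()
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=0.85)
    return matches[0] if matches else None

for raw in ["australia", "AUSTRALIA ", "Austrlia", "Austraila", "Austria"]:
    print(repr(raw), "->", normalise(raw))
```

Even this only works because the canonical list is short: “Austrlia” sits uncomfortably close to “Austria”, and every cutoff you pick trades false merges against leftover mess.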
Your pet theory is wrong
Sometimes the question is clear, the data exists, and the analysis works, but the result the stakeholder hoped for doesn’t appear.
In many organisations, data science is used to support decisions that have effectively already been made. When the analysis contradicts those decisions, the response can be disappointment, disbelief, or pressure to reframe the result.
Stakeholders with a preferred outcome usually make this very clear early on, so some setting of expectations is possible. Even so, in many ways data science is the craft of breaking stakeholders’ hearts.
There may be requests to massage the result or look at it another way. This is usually a waste of time. Worse, it undermines the credibility of data science internally by making it appear as though the work exists only to deliver desired conclusions.
Organisations willing to use shonky analysis to justify decisions aren’t likely to change course just because the evidence is lacking. But that doesn’t mean you should be the one to produce the shonky analysis for them.
That’s wrong and I won’t do it
Even more uncomfortable than telling someone their pet theory is wrong is telling them their question is ethically problematic.
Sometimes people just aren’t thinking in an ethical frame and need a simple nudge to see the issue. In other cases, the problem is more subtle, such as proxy discrimination that’s never stated explicitly (a model that never sees a protected attribute but reconstructs it from postcode, say), or a system that could work exactly as designed and still cause harm. And sometimes the stakeholder just doesn’t care.
I’ve written more about this in a separate post, but the short version is: if you’re building the system, you’re shaping how people are treated. That’s not someone else’s job.
Final thoughts
These conversations are uncomfortable, but they’re also part of the value you bring. Anyone can train a model. Knowing when to push back, ask harder questions, or say something unwelcome is what makes the work trustworthy.
If you’re having these conversations, you’re doing your job.