How we approach problem solving

Start with the question. Everything else follows.

Why most analytics projects start in the wrong place — and what to do instead

Most analytical projects start in the wrong place.

Someone decides they need a dashboard. Or a sentiment model. Or a regression. The tool gets chosen before the question gets fully understood, and the engagement proceeds from there — gathering data that fits the tool, building outputs the tool produces, and delivering results that answer the question the tool was designed to answer rather than the one the client actually had.

We've seen this pattern enough times that we've given it a name: method-first thinking. And it's the single most common reason analytical projects produce technically correct results that nobody acts on.

At ORRO, we do it differently. We start with the question.

What do you actually need to know?

This sounds obvious. It isn't.

When a client comes to us, they often arrive with a solution already in mind. They want a dashboard that tracks sentiment. They want their survey data categorized. They want a model that predicts which customers will churn. These are reasonable starting points — but they're answers to questions that haven't been fully asked yet.

Our first job is to reframe the conversation with curiosity.

What decision is being made? Who is making it? What would change their mind? What does success look like six months after we deliver the analysis? These questions set the stage for the real work to begin. An analyst who skips them is building a bridge to a destination nobody has confirmed.

The most valuable thing we do in the early stages of any engagement is ask the questions that shape the project's goals and outcomes. These conversations catalyze new thinking, and the answers almost always change what we build.

Let the data tell you what's there

Once we understand the question, we resist the urge to immediately impose structure on the data.

This is counterintuitive. Most analytical approaches start by defining categories — you decide in advance what you're looking for, then measure how much of it exists. That works well when you already know what matters. It fails badly when you don't, because you find exactly what you expected and miss everything else.

We learned this early in our work analyzing hundreds of thousands of workplace survey comments for a global design firm. We could have started with a predefined taxonomy of workplace topics — noise, privacy, lighting, meeting rooms — and measured how often each appeared. Instead we used unsupervised topic modeling to let the data surface its own themes first.

The expected topics appeared. So did ants. And rats. And spiders.

No survey designer would ever create "pests" as a multiple-choice option. No predefined taxonomy would have caught it. But for certain buildings in certain cities, it was a genuinely significant signal about employee experience, and one that pointed directly to a facilities management intervention rather than a design one.

The lesson: exploration before classification. Let the data show you what's there before you decide what to measure.
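To make the contrast concrete, here is a toy sketch of how a predefined taxonomy can hide a theme. It is not the topic modeling pipeline described above: it uses simple keyword matching as a stand-in, and the taxonomy, stopword list, function names, and comments are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical predefined taxonomy (topic -> keywords); illustrative only.
TAXONOMY = {
    "noise": {"noise", "noisy", "loud"},
    "lighting": {"lighting", "light", "dark"},
    "meeting rooms": {"meeting", "rooms"},
}

# Small hand-picked stopword list, just enough for this toy example.
STOPWORDS = {"the", "a", "in", "is", "are", "too", "and", "of",
             "we", "have", "to", "near", "my", "again", "not", "enough"}


def tokenize(text):
    """Lowercase a comment and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_by_taxonomy(comments):
    """Separate comments that hit any taxonomy keyword from those that hit none."""
    keywords = set().union(*TAXONOMY.values())
    matched, unmatched = [], []
    for comment in comments:
        if keywords & set(tokenize(comment)):
            matched.append(comment)
        else:
            unmatched.append(comment)
    return matched, unmatched


def frequent_terms(comments, top_n=3):
    """Count how many comments mention each non-stopword term."""
    counts = Counter(
        term
        for comment in comments
        for term in set(tokenize(comment))
        if term not in STOPWORDS
    )
    return counts.most_common(top_n)


# Invented example comments, not real survey data.
comments = [
    "The open office is too loud",
    "Lighting is dark near my desk",
    "We have ants in the kitchen",
    "Ants in the kitchen again",
    "Not enough meeting rooms",
]

matched, unmatched = split_by_taxonomy(comments)
print(len(matched), "comments fit the taxonomy")
print("Recurring terms the taxonomy never asked about:", frequent_terms(unmatched, 2))
```

A taxonomy-only count would report three topics and silently drop the two comments about ants. Looking at what falls outside the categories is the simplest version of letting the data speak first; unsupervised topic modeling does the same job at scale and without a keyword list.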

Combine methods that don't usually travel together

None of our major projects has been solved by a single method.

The workplace survey work combined unsupervised topic modeling, programmatic silver labeling, and generative AI-produced synthetic training data — three approaches that don't typically appear in the same toolkit, assembled because each one solved a problem the others couldn't.

The electrolyte drinks work combined web scraping, machine translation, LLM-based labeling, consumer persona development, and logistic regression — moving from raw unstructured text to statistically rigorous findings about what actually drives consumer satisfaction, by segment.

The hockey player longevity work combined survival modeling, regularized regression, and a multi-horizon career stage framework that treated a player's career as four distinct analytical problems rather than one.

In each case, the method combination wasn't chosen because it was fashionable or because we had a favorite tool. It was chosen because the question demanded it. That's the only legitimate reason to choose a method: because it's the right one for what you're trying to find out.

The null result is a finding too

One of the things that distinguishes rigorous analytical work from confirmatory storytelling is the willingness to report what the data didn't find.

In our hockey longevity research, we built an extensive set of variables designed to capture the psychological and social dimensions of a player's career — team stability, roster continuity, how long a player had been with the same teammates. The hypothesis was that the human environment around a player would add meaningful predictive power beyond the observable physical facts.

It didn't.

After controlling for early career usage, age at NHL debut, and physical characteristics, the psychological and social variables added almost nothing to the model. Four observable variables did the vast majority of the predictive work. Everything else was marginal.

We reported that clearly. Not as a failure — as a finding. The data was telling us something important: that by the time a player reaches the NHL, how heavily they are being used in their early seasons is more informative about their eventual career length than anything we could currently construct to capture the human side of their experience.

An analytical partner who buries null results, or spins them into something more palatable, is not actually serving their client. They're serving their own desire to have found something. Those are different things, and the difference matters.

Translation is half the work

The most technically sophisticated analysis in the world accomplishes nothing if the people who need to act on it can't understand or trust what it's saying.

This is not a soft skill. It's an analytical skill — and one that gets far less attention than it deserves.

Every engagement we undertake ends with a communication challenge. Who is the audience? What do they already know? What will they find surprising, and how do we prepare them for it? What level of technical detail serves the decision they need to make, and what level obscures it?

We think about these questions as carefully as we think about model selection. A finding that gets nodded at in a presentation and then filed away has failed, regardless of how correct it is. The goal is always the same: an answer that is both right and understood.

That sometimes means translating a logistic regression coefficient into plain language. It sometimes means building a visualization that makes a pattern visible without requiring the audience to understand how it was detected. It sometimes means saying "the most important finding here is actually the one that didn't work out as we expected" — and then explaining why that matters.
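As one small illustration of the first kind of translation, the sketch below turns a logistic regression coefficient into a sentence. It relies on the standard fact that a coefficient is a change in log-odds, so exponentiating it gives an odds ratio. The function name and wording are assumptions made for this example, not a template we use on every project.

```python
import math


def describe_coefficient(name, coef):
    """Turn a logistic regression coefficient (a log-odds change) into plain language."""
    odds_ratio = math.exp(coef)               # log-odds -> odds ratio
    pct_change = (odds_ratio - 1.0) * 100.0   # percent change in the odds
    direction = "higher" if pct_change >= 0 else "lower"
    return (
        f"Each one-unit increase in {name} is associated with "
        f"{abs(pct_change):.0f}% {direction} odds of the outcome, "
        f"holding the other variables constant."
    )


# Hypothetical coefficient chosen so the arithmetic is easy to check:
# exp(log(2)) = 2, i.e. the odds double.
print(describe_coefficient("weekly usage", math.log(2)))
```

The sentence describes odds, not probability. A doubling of the odds does not mean the outcome becomes twice as likely, and saying so explicitly is part of the translation.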

The best analytical work we've done has been work where the client walked away not just with an answer but with a new way of seeing their data. That only happens when the communication is as rigorous as the analysis.

The question that drives everything

We are drawn to problems that don't have obvious solutions. Problems where the right method isn't clear at the outset. Problems where someone has tried the standard approach and found it insufficient. Problems where the data is messy, the question is hard, and the answer will only emerge from genuine curiosity about what the data actually says.

That curiosity is what connects a study of NHL player careers to an analysis of electrolyte drink reviews to a model built on hundreds of thousands of workplace survey comments. Those problems have almost nothing in common on the surface. Underneath, they share the same structure: a question that matters, data that is harder to analyze than expected, and people who need to understand and trust the answer before they can act on it.

Start with the question. Everything else follows.
