Semantic Debt: A Series

A few years ago I wrote some articles for TDAN and a few blog posts about “semantic debt.”

Most of us who work in data for a living have an intuitive understanding of “semantic debt.” When a company can’t calculate campaign ROI, total units sold last year, or days from order to delivery, it has a systemic problem integrating and using its data. We think of that as semantic debt.

We think it’s a debt because it’s a cost that grows over time, and it will cost money to fix. We call it a semantic debt because the root cause is in the company’s semantic layer. We also think the problem is architectural, because most companies don’t have data architects. (They all work at Raybeam/DEPT!) What they have is software architects. That leaves most modern enterprises with a literal blank spot where their data architecture should be.

The Semantic Layer Ecosystem is made up of at least six different kinds of data architecture.

Software architects don’t usually worry about data architecture, either at the data collection point they work on or anywhere else in the enterprise that the data might need to go. That lack of interest shows up as what most companies call “data problems.”

Most enterprises wait to think through their semantic layer until around the time they can’t figure out how many products they sold last year, or until they’re losing customers because they can’t figure out where the orders are. That’s when they discover they’ve got semantic debt. They either learn to live with it or hire a company like Raybeam/DEPT to start retiring it.

In our practice, the semantic layer is all the data and tools you use to answer questions for your customers, shareholders, and staff. The data all gets collected in systems built by software architects, in the bottom left of the diagram above. But the semantic layer also needs at least one of the systems inside the red zone. Depending on the age and/or size of the company, it might need, or already have, one or more of the others, too.

Most data collection points are now off-the-shelf systems supplied by cloud vendors. Some are still hand-crafted. All of them are customized, on an ongoing basis, by the software architects I mentioned above.

The collection points in the bottom left often have great software architects. On the other hand, if you’ve ever used one of the systems inside the red zone, you know their quality is a lot more variable. Data architects can be hard to find, and it’s even harder to find a team large enough to have experience with all of the systems on this diagram.

A company with a good semantic layer should be able to do the calculations it needs with the data it collects in the bottom left. When a company can’t, it’s because its own particular semantic layer is broken. It has a really bad semantic debt, and it needs to pay that debt off. Sometimes that’s because the only thing its architects know how to build is data collection points.

My first-draft definition of “semantic debt” was:

Semantic debt arises when an organization’s data management systems are conceptually inadequate.

That’s not very good. “Conceptually inadequate” according to whom?

I tried this more recently:

Opportunity cost is the basis of semantic debt.

That’s a little better, because we’ve substituted “opportunity cost” for “conceptually inadequate.” But opportunity cost is pretty nebulous itself, and the definition still doesn’t say which malfunctions in the semantic layer could be quantified as opportunity cost.

From 50 km up, we can think of “total semantic debt” as the depth of the hole a company is in, from a data management standpoint. Some of that debt is caused by customized software applications, the systems in the bottom left. We could also point to disconnected “data warehouse” components as the main cause. We could probably find a dozen causes. But you have to be deep into data modeling, data architecture, or data analysis before you develop much of a sense of what semantic debt is, and it’s usually pretty hard to describe to non-specialists.

So over the years I’ve had a lot of trouble figuring out how to explain succinctly to our clients what semantic debt is. I’ve never been able to describe it with the same precision people use to explain technical debt.

But I think I recently figured out an easy heuristic to describe semantic debt. In subsequent posts I’ll sketch how you can get a sense of the actual cost of your semantic debt. 

First, we’ll take a look at the definition of “technical debt.” Second, we’ll explain why semantic debt isn’t just another kind of technical debt.

And finally (finally!), an explanation of what semantic debt is.
