- Speaker #0
Welcome to Let's Talk Data, the podcast by DQE, data quality everywhere. In each episode, we meet with data experts to exchange insights, experiences, and practical perspectives on data. Whether you're a data professional, a marketer, or a business leader, this podcast helps you better understand and use your data.
- Speaker #1
Ever wondered why your AI pilots stall right after takeoff? I am Philippe Boulanger, and today I am joined by Dylan Anderson to cut through the hype. Dylan is Director of Data Strategy, Analytics and AI at London-based consultancy Atombit. Over the last decade, he's helped organizations turn messy data into measurable outcomes, and he's built a community of more than 50,000 followers on LinkedIn and 9,000 on Substack by mapping what he calls the data ecosystem. In short, Dylan lives where strategy meets data and makes it work. In part one, we are asking a blunt question: if we have more data than ever, why are insights still so rare? Let's start with your background. Tell us about yourself and your career journey in the data space.
- Speaker #2
For sure. So again, my name is Dylan Anderson. I work as a Director of Data Strategy, Analytics, and AI at Atombit in London, UK. I started my career as a business consultant, really understanding what businesses need to drive value and how to build their strategy. I transitioned to data because I saw the future: I felt that data and AI needed to enable a lot of the strategic goals businesses and organizations had, and understanding how to bridge that gap between strategy and data is really where I built my career. I've done so in the consulting space, working with dozens of clients across many different industries, and a lot of their problems end up being the same: not being able to get value from their data, even though they have tons of it on site. On the side, I also do a lot of content creation and thought leadership in the data and AI space, specifically focusing on how data comes together. What does that data ecosystem look like? And how do the different domains of data flow and work together to realize value for organizations? That's something I've dug into deeply in a lot of articles and talks.
- Speaker #1
So, speaking of progress in AI and analytics, why are we still talking about data quality as a foundational issue?
- Speaker #2
So, data quality underpins so much. It underpins everything, and it's been a problem for 20, 30 years. Everything you build on top of data, so your new models, your AI, your analytics, all rests on the quality of the data that goes into it. Yet when people look at the value of data and at what matters, they see graphs and charts and dashboards or AI models; they don't see the underpinning elements of data quality, and therefore it's forgotten about. The investment isn't being made in data quality. Instead, the investments are made in those visible, up-front layers, and the models that flow from that data aren't being built on strong foundations. That's what we're seeing today, and that's why a lot of companies have huge amounts of trouble setting up a value-driven analytics team or actually getting an ROI on the millions they're spending on AI. It comes back down to what it's built on, and that's the data quality and those foundations.
- Speaker #1
So what do you mean by reliable and structured data? What is the source of that?
- Speaker #2
So reliable and structured data really means having data that has context and is set up in a way people can actually use. A lot of companies are really focused right now on unstructured data, because everyone talks about how AI can, I don't know, make sense of unstructured data and really pull insights from it. But what we're missing is that it's so much easier to glean insights from structured data, because people have spent the time to frame it: this is what you get from it, this is why you need it, and this is how it's set up. So taking a step back, structured data is really about making sure your data is clean, organized, and accessible by the people who need to use it. And making it reliable across the organization is the first thing you need to do to set up a data-literate organization or to enable data democratization.
- Speaker #1
How do you address those foundations then for a large company or even for a small company?
- Speaker #2
Funny thing is, it's similar for both. A lot of smaller companies, and even a lot of larger companies, often say, oh, well, we don't have the manpower that a Google has, or we don't have the resources that a Walmart has, and they push aside focusing on the foundations and investing in them. But really, the big thing is: what data are you using, or what data do you need to use, to drive the value you need today? And I would say there are four main things I would focus your foundations on to do that. The first is data architecture. How do the different systems work together? What are your source systems that create data? How does that flow into your warehouse or your storage layer? And how does that then go to consumption, where people use that data to make insights and decisions? The second element is data modeling. How do you structure the data in a way that people can understand it and access it? So it's not just a hundred different point-to-point data engineering pipelines that look like a spaghetti diagram on a chart, because that is honestly where I think 80% of companies are at right now. The third element is data governance. Who owns the data? How do you manage it and govern it? It is not just the problem of IT or data, although that's who people go to first. They'll go to IT and say, I can't access this data, and IT will say, well, we don't actually own that. Setting up the processes to handle that is essential. And then the fourth element, which I think we mentioned before, is data quality and standardizing what that quality means. Too much of data quality right now relies on no standardization. People are just assessing data ad hoc and saying, this looks good, this doesn't. It's left up to chance, left up to subjective opinions. How do you make sure the data is of a high enough quality for the use cases you need that data for?
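To make that fourth point concrete, here is a minimal Python sketch of what standardizing data quality can look like: codified, repeatable rules with measurable pass rates instead of ad hoc eyeballing. The rule names, fields, and sample records are illustrative assumptions, not something discussed in the episode.

```python
from datetime import date

# Each rule is a name plus a predicate applied to one record, so the
# standard is written down once and applied the same way every time.
RULES = {
    "order_id is present": lambda r: bool(r.get("order_id")),
    "amount is non-negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
    "order_date is not in the future": lambda r: r.get("order_date") is not None
        and r["order_date"] <= date.today(),
}

def assess(records):
    """Return the pass rate per rule, so 'good quality' is a number, not an opinion."""
    results = {}
    for name, check in RULES.items():
        passed = sum(1 for r in records if check(r))
        results[name] = passed / len(records) if records else 0.0
    return results

orders = [
    {"order_id": "A1", "amount": 120.0, "order_date": date(2024, 3, 1)},
    {"order_id": None, "amount": -5.0, "order_date": date(2024, 3, 2)},
]
for rule, rate in assess(orders).items():
    print(f"{rule}: {rate:.0%}")
```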
- Speaker #1
So let's say I have a lot of data. What are the strategies I should use to handle it properly? Are we missing something, and should we get more data?
- Speaker #2
I mean, every organization talks about how they want to get more and more data. And then they realize, oh, actually, you know what? We have terabytes of data that we're not using right now. And I think that is a huge problem, because they then think, oh, well, let's use AI to comb through that and really understand it. But they're not able to do that either, because that costs a lot of money and requires a lot of knowledge. So I think the biggest strategy is prioritization, and I think 20% of your data can probably deliver 80% of the value you need from data. So I think about the main use cases: how do you want to use these different pieces of data? What kind of data products do you want to build? What business questions do you want to answer? How do you want to enable teams? By thinking about that, you then figure out, okay, we want to better understand sales and forecast it in the future. Therefore, we need the historical sales data, and potentially customer data to understand customer segments and trends. And you start to identify different data sources that are high priority. By identifying those data sources, you can then work to establish what quality we need that data at, how we structure it, and how we make sure we have access to it in the right models and systems, so that data modeling piece. In the end, you create an overarching view of what matters, and that becomes structured well. Then, as you see fit, if you find new data that becomes important, you can add to it and build something more cohesive that's maintained in the right way. But you start with prioritization and figure out where you can get the most ROI from data, because too many teams just start by building models, and they don't prioritize, and that leaves them wasting a ton of money, and data becomes a cost center and no one sees the value from it.
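As a rough illustration of that prioritization step, here is a hypothetical sketch that ranks candidate data sources by the business value of the use cases they feed against the effort to bring them up to standard. The sources, the scores, and the value-to-effort heuristic are all invented for illustration; they are one crude way to find the 20% of data that carries 80% of the value.

```python
candidates = [
    # (source, business value 1-10, remediation effort 1-10)
    ("historical sales", 9, 3),
    ("customer master", 8, 6),
    ("web clickstream", 5, 8),
    ("legacy ERP extracts", 3, 9),
]

# Rank by value-to-effort ratio: high-value, low-effort sources first.
ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
for source, value, effort in ranked:
    print(f"{source}: value={value}, effort={effort}, ratio={value / effort:.2f}")
```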
- Speaker #1
Yeah, so it's about focus, not restriction.
- Speaker #2
Exactly. I can give an example, too. I've worked with a global logistics company; I mean, they are in the billions of pounds, billions of dollars in terms of turnover. And they don't have the same definition of customer across all their different systems. Everything they do that's related to the customer has to be done manually, and therefore it takes a lot of time, it's inaccurate, and it's riddled with errors. So we're building them a customer data model: how do the systems come together, and how can we create a single source of truth for that customer definition, to allow them to make better decisions, draw insights, and use the data they have instead of letting it just sit there and manually poking at it once in a while?
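The customer definition problem lends itself to a small example. Below is a minimal sketch, assuming a normalized email address can serve as the matching key, of how records for one customer scattered across three systems can be pulled under a single key; the systems and sample data are made up, and real entity resolution usually needs fuzzier matching than this.

```python
from collections import defaultdict

# The same customer as it appears in three hypothetical systems.
crm = [{"name": "ACME Ltd", "email": "Ops@Acme.com"}]
billing = [{"name": "Acme Limited", "email": "ops@acme.com "}]
support = [{"name": "acme", "email": "ops@acme.com"}]

def match_key(record):
    """Derive one canonical key per customer; here, a normalized email."""
    return record["email"].strip().lower()

golden = defaultdict(list)
for system, records in (("crm", crm), ("billing", billing), ("support", support)):
    for r in records:
        golden[match_key(r)].append((system, r["name"]))

for key, sources in golden.items():
    print(key, "->", sources)  # one customer, three spellings, one key
```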
- Speaker #1
What would be the key factors in making decisions based on data?
- Speaker #2
So, key factors. Everybody's focused on AI right now, and they think that needs to sit at the center of anything they do with data. The problem with AI is it's very hype-cycle-driven. We have agents, we've got MCP, RAG models, Gen AI, I don't know what's next. What remains consistent and true is that data is actually at the heart of AI. So while all these companies are focusing on AI, there needs to be a consistent foundation for that AI, which is the data and how you organize it. The other word and phrase I've heard with every client and every company out there is single source of truth. They all want that. It is a dream. Everyone thinks that by getting that single source of truth, everything in the world will be enabled. But it takes a lot of work to get there. So I think the key factor for me, and what I try to stress to a lot of different organizations, is how do you get to that single source of truth? It is a combination of all the things we talked about. And it's not just to enable AI; it's also to enable your analytics and your data across different parts of your business, because you might have a single source of truth within the customer team, but that might not exist within the sales team. Or think of a global organization. I worked with a huge shipping company, and what we did was build them a digital twin of one of their ports in Morocco. The digital twin gave them visibility into their entire operations, allowing them to simulate how to improve efficiency, and for a port, if you improve efficiency by 2 to 3%, that's millions of dollars saved per year. The problem was, we built the digital twin for that port, but we weren't able to scale it, because there was no single source of truth in that organization across different ports. We had to rely on data that was individual per port, meaning that scaling this out was impossible. The data involved in that project was specific to one port; the rest of the organization had different types of data all across the world, organized in different ways, with no single source of truth. So when we built this very powerful, very efficient, and very effective tool for the one port, it couldn't be scaled. They couldn't use it across the organization, and that made all that investment not as worthwhile. And I think that's what's happening right now with AI: the key factor to enabling it, the data foundation, is not wide enough to allow AI to scale within an organization, across its operations or its international reach.
- Speaker #1
So let's say I have listened to this podcast. What do I do to make the best use of my data? How do you make this work in reality?
- Speaker #2
So I think the biggest thing is to talk to stakeholders and listen to them. And those stakeholders should not just be in the data and technology teams; it needs to be across the business. What are people doing in their day-to-day workload? How are they using data, but more importantly, how are they driving value for the organization? Then, as the data team, it's important for you to work with them to figure out what kind of data products and data tools will help solve those problems, make their work easier, create efficiencies, or drive value. Flowing from those data tools, and going back to where we talked about prioritizing use cases: what data feeds into that, and how do we make sure that data is of high quality so we can use it on a consistent basis? Within that, there are a lot of different elements, and we mentioned a ton today on the podcast, but I think the biggest one is data quality. When you identify those use cases and identify which sources of data are most important, it's then taking the time to look at those data sources, assess their quality, and figure out how to create an automated way to feed that data through to the use cases, to reduce errors, improve efficiency, and essentially drive value for the business. And going back to the practicality of doing this, the biggest thing is talking to your stakeholders up front, really understanding what data in the organization should drive that value and how to make that happen, and then building the foundations around that story. The last thing I'll say on that, and this is probably essential if you're going into real, practical delivery, is making sure that other people are on board. That is one reason why you have to ask those questions and work with them very closely, because otherwise you build something and no one uses it, right? And that's what's happening to a lot of organizations right now.
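For the "automated way to feed that data through to the use cases", one common pattern is a validation gate that forwards clean rows downstream and quarantines the rest for the owning team to fix. This is a minimal sketch under assumed field names and checks, not a description of any tool Dylan uses.

```python
def validate(row):
    """Return a list of quality errors for one row; empty means clean."""
    errors = []
    if not row.get("customer_id"):
        errors.append("missing customer_id")
    if row.get("amount") is None or row["amount"] < 0:
        errors.append("bad amount")
    return errors

def quality_gate(rows):
    """Split rows into clean (forwarded) and quarantined (flagged for fixing)."""
    clean, quarantined = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            quarantined.append({**row, "_errors": errors})
        else:
            clean.append(row)
    return clean, quarantined

rows = [{"customer_id": "C1", "amount": 10.0}, {"customer_id": "", "amount": -1}]
clean, quarantined = quality_gate(rows)
print(len(clean), "rows forwarded;", len(quarantined), "quarantined")
```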
- Speaker #1
Today, Dylan Anderson showed why, despite all the excitement around analytics and AI, most companies still struggle to turn data into decisions. The core issue is not tools; it's shaky foundations. Organizations are drowning in data but starved of insight because the information isn't structured, isn't trusted, and often isn't even shareable across teams. Reliability spans the classic dimensions: accuracy, completeness, timeliness, and consistency. Dylan's shipping port digital twin story made it concrete: a breakthrough at one site stalled at others because each port's standards and data quality were different, killing scale and momentum. The lesson? Before you scale, standardize. Now here is your tease for the second part of this podcast. If part one was about fixing the pipes, part two is about turning the tap: how data quality transforms AI from a headache into a competitive edge. We'll dig into why AI is finally democratizing insight, why structured beats unstructured inside the enterprise, and how small specialized models on your own trusted data reduce hallucination and raise accuracy. We'll talk data quality debt, the people and process changes that unlock AI, and real examples of AI harmonizing messy records at scale. If you want AI that actually works in your business, start with the data, and join us in episode 2.