Designing a best-in-class data platform for your business: 7 guiding principles

Data Nick
7 min read · Apr 3, 2023

So you’re building a modern data platform for your business and you want to get it right.

You’ll be thinking about which technology provider to go with (or indeed whether you should build or buy), what set of data and analytics capabilities you will require, the business use cases you will enable and how much it will all cost.

Those are really important considerations but, if you’ll forgive us, we’ll largely ignore them in this article. Most material on building a data platform tends to focus on those very issues so you won’t be short of advice.

Instead, our focus will be on some higher-level guiding principles, some of which may be obvious but some of which are all too often overlooked.

To save you scrolling to the bottom, here’s the punchline: we contend that you’re doing it right if the following statements are true for your data platform:

  1. It lives on public cloud
  2. It abstracts away activities such as onboarding, project creation and resource provisioning with automation and enables self-service of these routine processes
  3. Its tooling supports your business’ complete spectrum of data users, from citizen data analysts through to developers, quants and data scientists
  4. Its user experience is integrated with the rest of your business: the same identity, the same device, and even access through the same company portal
  5. It has easy discoverability and access to existing enterprise assets (not just data sources but also APIs and models) and the ability to self-publish back to these marketplaces
  6. It’s integrated against a dedicated Data RTL (route to live), distinct from a Software RTL, that releases and operationalises data products as business applications
  7. Finally, you only have one platform for your entire business

Let’s spend a little time on each of these.

Public Cloud

It lives on public cloud

A data platform is an entire ecosystem, not just storage, ETL and analytics. You will require data security, networking, an API gateway, a load balancer, authentication and more. It will make your life easier if you can inherit as many of these things as possible from your cloud provider.

Similarly, real-time data feeds and real-time insights should be a staple of your platform. Cloud platforms are real-time ready and save you from having to re-architect your current estate for low-latency workloads.

Advanced analytic techniques such as compute-intensive machine learning will require GPUs for model training. Data Scientists will want a range of GPUs to choose from. Demand for these compute resources often comes in peaks and troughs with intense bursts of compute followed by idle time. As such, this is an elastic resource that is best suited to public cloud.

In fact, with data volumes growing exponentially and demand for data-driven insight surging, in-built scalability is a must for a modern data platform.

If the ambition is to build something modern in a modern way, then cloud is also the home of serverless technology and managed services. Leverage these to increase pace of delivery, minimise your maintenance overhead, achieve cost savings and provide a better developer experience for your platform engineers.

Automation and Self-Service

It abstracts away activities such as onboarding, project creation and resource provisioning with automation and enables self-service of these routine processes

Any task that is routine should be automated.

By automated, we mean translated into a codified workflow or template that can be called upon on demand. With these templates in hand, self-serving users or platform workflows can trigger them, fetching the script from a repository for in-the-moment execution.

Resource provisioning can be achieved at a user persona level, aiming to meet the resource requirements of that particular user base. For example, for a data visualisation user, you could provision read-only access to your enterprise datasets and an account on your business’ data visualisation tool with the ability to create and persist dashboards.

Avoid the inclination to have a human in the loop approving this onboarding request. If this user is already in your organisation, then they should have a digital identity with a range of associated privileges. If those privileges indicate that data visualisation is an activity associated with their role, then onboarding should be instantaneous.
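To make the persona idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the persona names, the grant strings and the `Identity` fields are illustrative assumptions rather than any real product's API. It simply shows onboarding resolved from an existing identity's privileges, with no approval step.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical persona templates: each persona maps to the grants it needs.
PERSONA_TEMPLATES = {
    "data_visualisation": [
        "enterprise_datasets:read",
        "viz_tool:account",
        "viz_tool:dashboards:write",
    ],
    "data_scientist": [
        "enterprise_datasets:read",
        "notebooks:account",
        "gpu_pool:submit",
    ],
}


@dataclass(frozen=True)
class Identity:
    """A user's existing digital identity (field names are illustrative)."""
    user_id: str
    role_privileges: FrozenSet[str] = frozenset()


def onboard(identity: Identity, persona: str) -> List[str]:
    """Return the grants to provision for this persona, with no human approval.

    Raises KeyError for an unknown persona and PermissionError if the
    persona is not among the privileges attached to the user's role.
    """
    if persona not in PERSONA_TEMPLATES:
        raise KeyError(f"unknown persona: {persona}")
    if persona not in identity.role_privileges:
        raise PermissionError(f"{identity.user_id} is not entitled to {persona}")
    return list(PERSONA_TEMPLATES[persona])
```

In a real platform the returned grants would be handed to whatever provisioning workflow you use; the point is only that the decision is made from the identity, not by a person.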

Under the hood, Infrastructure as Code (IaC) should be embraced to avoid any manual provisioning of infrastructure, environments and resources. And wherever possible, you should interact with your services through their APIs for the same benefits. This will provide safety, consistency and speed.
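The core idea behind IaC is that you declare the state you want and let tooling work out the changes. The toy sketch below illustrates that idea only; the resource names and dictionary shapes are made up, and real tools such as Terraform do far more (dependency ordering, state locking, provider APIs).

```python
from typing import Any, Dict


def plan(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Compare declared resources with what exists and list the changes needed."""
    create = {name: spec for name, spec in desired.items() if name not in actual}
    update = {
        name: spec
        for name, spec in desired.items()
        if name in actual and actual[name] != spec
    }
    delete = sorted(name for name in actual if name not in desired)
    return {"create": create, "update": update, "delete": delete}


# Hypothetical declared state, as it might be kept in a repository.
desired_state = {
    "raw_bucket": {"region": "eu-west", "versioning": True},
    "ingest_queue": {"retention_days": 7},
}
```

Because the plan is derived from code, running it again against an environment that already matches produces no changes, which is where the consistency and safety come from.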

Tooling for All

Its tooling supports your business’ complete spectrum of data users, from citizen data analysts through to developers, quants and data scientists

Your best metric for platform success will be adoption, so you’ll need to ensure your platform caters to all your colleagues, including your citizen data users. Enabling these folks is known as data democratisation, and low-code and no-code interfaces should be part of your offering for them.

For your traditional data developers, please don’t just rely on notebooks. Even if they seem quite happy with them, engineers should be encouraged to use a full-featured IDE so that they can produce tidy, tested, debugged and linted code. Consider using a virtualised development workstation to embed IDEs into the native platform offering.

A modern data platform provides users with a rich suite of data and analytic tooling, but it also gives niche users the ability to bring their own analytics. This could include their own interpreter (because they don’t use Python, say), their own libraries and software and even their own IDE.

There should be an environment which provides a playpen experience, with the ability to download, install and test new packages and software, including tools which have not yet been approved for use in your enterprise. Within a segregated and otherwise disconnected playpen, this freedom and flexibility forms the first step in an onboarding pipeline for new tools, ensuring your environment remains relevant and future-proof whilst also providing you with the appropriate oversight.

Don’t forget collaboration tooling. Your platform users are humans that will be working with other humans. They will need access to a messaging channel to communicate and share ideas, as well as to online forums like Stack Overflow.

Integrated User Experience

Its user experience is integrated with the rest of your business: the same identity, the same device, and even access through the same company portal

Integrate your organisation’s existing identity and enable Single Sign On (SSO) authentication so that your users don’t have another set of credentials to remember. Once in place, this will reduce your management overhead too.

Their existing company-issue laptop should suffice. It doesn’t need to be a souped-up developer machine if the platform’s developer tools are all virtualised and the compute is happening on your platform. This saves on hardware upgrade costs or avoids developers having two different devices.

Finally, make sure users can navigate to your platform from your existing company portal rather than a tricky URL. You want your platform to be seen as part of your organisation’s digital fabric.

Sometimes, it’s the little things.

Discoverability

It has easy discoverability and access to existing enterprise assets (not just data sources but also APIs and models) and the ability to self-publish back to these marketplaces

Your business data needs to be discoverable to your platform users from a single place. This means your event streams, your unstructured data in object stores, and of course your structured tables in database stores.

Granting access to the data itself is another question, but discoverability should almost always be enabled so that even the existence of protected or locked-down data can be discovered.

For discovery, physical copies of your business data don’t necessarily need to reside within the walls of your platform, but the metadata should at least be available. Consider taking this further by stitching together a data fabric of your business’ disparate data sources through a virtual data access layer.
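The separation between discovering data and accessing it can be sketched as a metadata-only catalogue. This is an illustrative assumption about how such a catalogue might look: the entry fields, dataset names and privilege strings below are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata only: the data itself can live anywhere in the business."""
    name: str
    kind: str
    description: str
    required_privilege: str


# Hypothetical catalogue covering streams, object stores and tables.
CATALOG = [
    CatalogEntry("orders_stream", "event stream", "Real-time customer orders", "sales:read"),
    CatalogEntry("contracts_bucket", "object store", "Scanned supplier contracts", "legal:read"),
    CatalogEntry("payroll", "table", "Monthly payroll records", "hr:restricted"),
]


def search(term: str) -> List[CatalogEntry]:
    """Discovery ignores privileges: locked-down data is still findable."""
    needle = term.lower()
    return [
        entry
        for entry in CATALOG
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


def can_access(entry: CatalogEntry, privileges: FrozenSet[str]) -> bool:
    """Access is a separate check made against the user's privileges."""
    return entry.required_privilege in privileges
```

A sales analyst searching for "payroll" would see that the table exists and what privilege it requires, even though `can_access` returns `False` for them.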

Data shouldn’t be the only discoverable asset. Approved APIs and models that enrich data or provide insight should also be discoverable. These are often consumed from marketplaces or inventories. Aim to have these all consumable from one place.

A data platform will allow users to generate new data and new insight. These generated assets, once they satisfy certain quality thresholds, should be consumable by other platform users. As such, platform users will need the rights to self-publish and a pipeline should promote reusable assets back into an enterprise store.
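A self-publish pipeline with a quality gate might look something like the sketch below. The specific checks and the 0.95 completeness threshold are assumptions chosen for illustration; your own thresholds would depend on your governance standards.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold: the share of non-null values required to publish.
MIN_COMPLETENESS = 0.95


@dataclass
class DataAsset:
    name: str
    owner: str
    completeness: float  # fraction between 0.0 and 1.0
    tests_passed: bool


def quality_failures(asset: DataAsset) -> List[str]:
    """Return the reasons an asset falls short of the publishing bar."""
    failures = []
    if not asset.owner:
        failures.append("no owner recorded")
    if asset.completeness < MIN_COMPLETENESS:
        failures.append("completeness below threshold")
    if not asset.tests_passed:
        failures.append("validation tests failed")
    return failures


def self_publish(asset: DataAsset, enterprise_store: Dict[str, DataAsset]) -> List[str]:
    """Promote the asset into the enterprise store only if every check passes.

    Returns the list of failures, which is empty when the asset was published.
    """
    failures = quality_failures(asset)
    if not failures:
        enterprise_store[asset.name] = asset
    return failures
```

Returning the failure reasons rather than a bare yes or no lets the publishing user fix the asset and try again without raising a ticket.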

Data RTL

It’s integrated against a dedicated Data RTL (route-to-live), distinct from a Software RTL, that releases and operationalises data products as business applications

Unless you are a new start-up, it is likely most of your business’ digital estate will be predicated on a traditional software application route-to-live. This is largely incompatible with the development and deployment of data products.

Data users tend to begin their development lifecycle by interrogating live data but most organisations keep live data in a locked-down production environment.

Often the solution is to try to force data users through a software engineering route-to-live rather than build out a more appropriate environment framework. The constraints of this approach often catch up with the organisations that take it.

So embrace a Data RTL up front to minimise time to insight and ensure your organisation can fully exploit data and analytics.

You can read more here on how you might set this up.

One Platform

Finally, you only have one platform for your entire business

Good engineering principles around extensibility, reusability and deduplication would suggest this is the right approach.

The purpose of an enterprise data platform is to span your organisation and bring its data assets together for en masse exploitation.

If you’ve built your platform in the right way, there shouldn’t be a need for another one.

Good luck!