How to build a Route to Live (RTL) for data products like Machine Learning models

Data Nick
Published in GoPenAI
5 min read · Mar 31, 2023

More often than not, our enterprise platforms are designed for traditional software application development. They usually consist of four environments (Dev, Test, Pre-Prod and Prod), each more secure than the one before it.

“Dev”, or Development, is the most liberal zone, where developers can typically do as they please. At the other extreme, “Prod”, or Production, is a no-touch zone and usually the only place where live data can reside.

However, these environments tend not to be suitable for building and releasing data products. By ‘data products’ we mean applications where data and code are tightly coupled and depend on each other. Machine learning models are a prime example: data scientists begin the model lifecycle by studying live data, and a model’s parameters depend on the data it was trained on.

Raw (non-anonymised) data is needed at scale in these scenarios so that real-world trends and multi-variable correlations can be identified, so that data can be joined across multiple source systems, and so that ethics testing such as bias detection can take place.

Synthetic or anonymised data falls short here, especially in large organisations with multiple…
