The New Art of AI Engineering

Why the real impact of AI is yet to come and how to make that happen

  • What can we learn from the few front runners that have managed to lead with AI? What are their keys to success?
  • How can we bottle the magic? How can we replicate proven success factors through a structured, repeatable formula?
  • And once we get this formula right, what does success look like? What are the new forms of value it can bring for established companies, and how to bring this about?

1. Adopting AI at scale is a transformation challenge

The AI Productivity Paradox

When Ford introduced its ERP, it did not change its processes accordingly. Procurement orders were still manually checked against delivery notes, which in turn were checked against invoices before payments were made — a laborious process prone to errors. After re-engineering its processes with BPR, Ford started using the ERP as it should be used. A central database now contained all orders, which different departments could update with new information. When goods were received, the delivering company was paid automatically. No more invoices, and no more back-and-forth messages between departments. Once it figured out how the ERP system could support a new way of working and organizing, Ford realized a 75% reduction in headcount.

“Any new technology tends to go through a 25-year adoption cycle.” (Marc Andreessen)

That being said, the leaders in AI are showing the art of the possible. Amazon and Google have AI ingrained in almost every core process. Google’s internal mantra has gone from “mobile first” (referring to the focus on the internet user experience of its services on smartphones) to “machine learning first”. A focus on AI-powered automation is part of their DNA and core to everything they do. These are the obvious examples of digital natives, but in most industry sectors leading incumbents and start-ups are following suit. Yet even though 90% of executives indicate they are investing in AI, only about 40% have seen results to date [1].

Dimension 1: developing and implementing AI solutions

Clearly, AI is not plug-and-play technology. Rather, it both drives and requires a new level of technology-enabled business innovation. Organizations that have embarked on AI initiatives encounter a common set of barriers to be addressed as they progress with AI solutions. We distinguish five levels of maturity in the development and implementation of AI solutions. Companies that take AI development seriously typically use a funnel with similar stage gates that initiatives go through. Lasting impact is achieved only by the initiatives that go all the way. The success rate of these initiatives, however, is not merely dependent on their inherent potential and feasibility. Rather, it depends on the organization’s ability to do this — a new organizational capability.

  • Level 2: Proof of concept developed. The next step is to prioritize a few opportunity areas for PoC development, commonly based on feasibility versus potential value creation. Get ideas flowing, gather data, and develop a prediction model for the opportunity at hand. New insights are created and people are getting excited. Perhaps the results are being piloted through batch initiatives by the business. This is a relatively comfortable place to be and also deceptively easy to do. The risk is that it stops here. There is a huge gap between having a model developed in a sandbox as PoC, and having it running in a production environment ready to drive a business process.
  • Level 3: Production ready. Getting to this level is a crucial step toward sustainable impact. It means having your algorithm running in a fully automated pipeline, either in batch or real time. Data flows are automated and predictions are generated at the right intervals. Model outputs feed into a front-end application or dashboard to aid human decision making.
  • Level 4: Embedded in work flows. Operationalized solutions are embedded in key processes to augment or fully automate decisions. In the case of full automation, humans are taken out of the loop and the predictions feed directly into other systems. These machine learning pipelines essentially manifest as product features; examples include product recommendations in e-commerce, ETA prediction at Uber, or content recommendations at Netflix.
    In the case of augmentation, the solution typically provides machine-learning-based recommendations through a user interface. Users can accept the results, or make adjustments when deemed necessary. This latter form of augmented intelligence in particular suffers from huge adoption challenges. While full automation presents a binary change in processes — either you do it or you don’t — augmentation is prone to user bypasses and retreats to old ways of working.
  • Level 5: Scaled and continuously optimized. The greatest fallacy in AI solution development is to think that at some point you are done. AI systems are never done. They have to be scaled and optimized to reach maximum impact. They have to be maintained to keep up their level of performance. They have to be improved and expanded. That is not to say that some or all of these level-5 imperatives cannot be automated. But in particular for core processes, the algorithms require owners who take care of them. Sure, there are examples of out-of-the-box applications that you can run with, but those are table stakes. The AI solutions that will drive your business model and provide a competitive edge will have to be nurtured, grown and innovated continuously.
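The gap between a sandbox PoC (Level 2) and a production pipeline (Level 3) is worth making concrete. Below is a minimal sketch of a batch scoring pipeline; all names (`extract_features`, `score`, `publish`), the data, and the stand-in linear model are hypothetical illustrations, not from the article.

```python
# Hypothetical sketch of a Level-3 batch scoring pipeline. In production,
# each step would be an automated, scheduled job; here everything runs
# in-memory to show the shape of the flow.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Prediction:
    entity_id: str
    score: float

def extract_features(raw_rows):
    # In production this would read from an automated data flow;
    # here we normalize a list of dicts into (id, feature) pairs.
    return [(row["id"], float(row["usage"])) for row in raw_rows]

def score(features, model: Callable[[float], float]):
    # The model is any callable; a real pipeline would load a trained
    # artifact from a model registry instead.
    return [Prediction(entity_id, model(x)) for entity_id, x in features]

def publish(predictions):
    # Model outputs feed a front-end application or dashboard (Level 3),
    # or another system directly (Level 4).
    return {p.entity_id: round(p.score, 2) for p in predictions}

# Stand-in "model": a trivial linear scoring rule.
model = lambda x: 0.1 * x + 0.5

raw = [{"id": "cust-1", "usage": 12.0}, {"id": "cust-2", "usage": 3.0}]
output = publish(score(extract_features(raw), model))
print(output)  # {'cust-1': 1.7, 'cust-2': 0.8}
```

The point of the decomposition is that each stage can later be replaced by a managed service (a feature store, a model registry, a serving layer) without changing the overall contract between steps.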

Dimension 2: building enterprise AI capabilities

The second dimension of AI transformation deals with the technology infrastructure that allows AI solutions to be developed at scale across an organization. Here again, we observe different maturity levels typically encountered.

  • Level 2: Leapfrog innovation. In an effort to accelerate and escape the difficulties inherent to level 1, some organizations choose to take a well-managed and thoughtful leapfrog to modern, mostly cloud-based infrastructure and architecture (AWS, MS Azure, Google Cloud Platform). This can be the result of an innovation or experimentation effort by IT, or be driven by the business to develop a high-value use case. Done well, this can be a prelude to the next level.
  • Level 3: Platform standards. Companies that aim to really diffuse and scale AI across their enterprise will eventually need a coherent platform of uniform AI capabilities. Such a platform contains many components and tools — as best of breed — that allow users enough flexibility to build solutions for different requirements while maintaining certain standards that can be managed centrally. More on this in section 2.
Figure 1. AI transformation paths.
  • On the other extreme, organizations can start out developing AI solutions from a legacy infrastructure starting point (1a). Strong business leaders who believe in the value of AI mobilize the right resources and mandate to start an initiative. While the pressure is on, such a rush to the front can create a lot of excitement and momentum for change. In most cases, however, the aspired impact is not sustainable for long because production-grade infrastructure is lacking. That in turn makes it hard to bring solutions to the level of maturity required for convincing business adoption. In theory, if all required tooling is in place and can be stitched together, a single use case could travel the last mile (1e). In our experience, however, this is rare, and getting to production level (1c) is already an achievement.

2. The new art of AI engineering

“Specifically, the most impressive capabilities of AI — those based on machine learning — have not yet diffused widely. More importantly, like other general purpose technologies (GPTs), their full effects won’t be realized until waves of complementary innovations are developed and implemented.” (E. Brynjolfsson et al.)

AI as GPT requires new engineering discipline

This section introduces a new paradigm for AI transformation. As mentioned earlier, the information technology wave of the previous century required complementary practices like BPR and Six Sigma to weave automation technology (ERPs) into organizational fabrics. Diffusing AI requires a similar approach to organizational change to make it truly “general purpose” throughout an entire company. It requires a holistic, highly integrated, multidisciplinary engineering approach which we call AI engineering. The practice of AI engineering consists of three key pillars of capabilities that have to work in sync:

  • AI Business Engineering. Identifying where and how AI can power the business and designing the new processes it enables.
  • AI Solution Engineering. Developing and implementing AI solutions to drive those new processes.
  • AI Platform Engineering. Building enterprise AI platform capabilities to sustainably build AI solutions at scale.
Figure 2. The New Art of AI Engineering

AI Business Engineering

When it comes to the impact of AI on business and society, Solow’s paradox springs back to mind. On the one hand, we are already seeing AI being widely adopted: plenty of use cases have been written about, and we experience the benefits of AI-powered personalization every day (e.g. Netflix, Spotify). On the other hand, we hear claims that we are only at the beginning of the AI revolution. Sundar Pichai, CEO of Google and Alphabet, argued at the World Economic Forum in early 2020 that AI will ultimately have a greater impact than electricity.

  • Aim for clusters. In many instances, use cases come in clusters around a common data foundation or machine learning problem. Predicting car retail prices for instance is a stepping stone to forecasting their residual value — a key financial metric for any car leasing company.
  • Smartly balance feasibility with impact. A celebrated method of prioritizing any business case is to score impact versus feasibility. The sweet spot of use cases that score high on both is, of course, easy to prioritize. Beyond that, it can make sense to have a small portfolio of highly feasible front-runners on horizon 1 to show the organization what is possible, creating momentum to go after the real big-ticket opportunity areas on horizons 2 and 3.

AI Solution Engineering

Once you have figured out where and how AI can power your business, it is time to develop the accompanying solutions and change. To overcome the barriers presented in section 1, a proven change approach is needed. Many companies working on AI initiatives develop a step-wise approach of some sort, honed by experience over time. So have we; its key principles are listed below. It would be overkill to fully lay out all the detailed work steps here. Moreover, the best music does not come from playing all the notes exactly right, but from the artist’s own creative interpretation and execution of the underlying essence. So it is for practitioners leading AI initiatives.

  • Start with the business process. A natural tendency, especially for analytical team members, is to dive right in: gather some data, start doing analysis, develop a first model. Although an experimentation mindset is super valuable, when the task is to build a mission-critical AI capability you have to start with the end in mind: What process are we supporting? What decisions are supported by the AI? Will it run fully automated, or will it be augmented decision intelligence where human and machine work together? What feedback do we expect from the results, and how will it be used to train the system? What performance threshold must the algorithm meet to create value? What edge cases do we expect — situations where algorithms perform poorly — and how do we deal with them? These are all questions to address upfront and incorporate in the MVP design.
  • Adopt agile and cross-functional collaboration. This may seem trivial nowadays, but it constitutes a critical success factor. It is paramount to have all stakeholders on the bus from day 1. The business needs to be part of the entire design and implementation journey, both to maximize the chance of adoption and to bring essential domain knowledge to the team. Engineers need to be brought in before the engineering work actually starts, to ensure they are involved in all the design choices and to prevent unpleasant surprises down the road. An iterative approach creates a lot of flexibility, but there is always a dose of path dependency down the line from choices made earlier in the process.
  • Invest in superior integrator talent. AI solutions are very multidimensional, requiring involvement and contributions from many different disciplines. Many organizations come a long way with at least some of the required skills in some subset of people. Successful solution development, however, requires integrators: super-generalists sufficiently knowledgeable across all disciplines (business, modeling, data engineering, process design), combined with strong project and stakeholder management skills. This skill set is rarer than machine learning or data engineering.
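One of the design questions above — how to handle edge cases where the algorithm performs poorly — often boils down to a confidence-threshold routing rule in an augmented workflow. The sketch below is a hypothetical illustration; the threshold value and routing labels are assumptions, not part of any standard API.

```python
# Hypothetical edge-case routing for augmented decision intelligence:
# confident predictions are automated, uncertain ones go to a human.
CONFIDENCE_THRESHOLD = 0.8  # assumed value; set per the business case

def route(prediction: str, confidence: float):
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("auto", prediction)         # safe to act on automatically
    return ("human_review", prediction)     # edge case: keep human in the loop

decisions = [route("approve", 0.95), route("approve", 0.55)]
print(decisions)  # [('auto', 'approve'), ('human_review', 'approve')]
```

Deciding this routing rule upfront — rather than after the model is built — is exactly what "start with the end in mind" means in practice: the threshold defines the performance bar the model must clear to create value.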

AI Platform Engineering

Most organizations starting out on their AI transformation lack the platform infrastructure to really scale AI. The crux is to leverage the requirements of new AI solutions to build capability components step by step. This way, both dimensions of change from the previous section are addressed in a mutually reinforcing way. New components and platform services are built with a sense of urgency, because they are immediately required. The platform is expanded through the lens of the primary users: AI developers. New innovations can be proven “locally” and on-boarded to the platform once proven. On the other hand, AI solutions are built with generic infrastructure components, which in turn ensures robustness and maintainability. We have observed organizations take this approach along two different paths, as shown in figure 3:

  • When such a platform is not yet in the making and you want to accelerate a certain AI solution, taking a leapfrog is a good alternative approach. Set up a cloud environment, preferably using infrastructure-as-code to stay provider-agnostic, and develop your first high-impact AI solution on this environment (2d). Later on, once the first solution is up and running and creating business impact, this platform can be used as a blueprint for enterprise standards (2d to 3e). The benefit of this approach is minimal time-to-market. The only downside is that you might cut some corners that have to be reworked to fully comply with your future standards (3e).
Figure 3. Solution-driven capability development.
  • Development. Generic data sets that are often reused across different use cases, combined with use-case-specific data.
  • Production. Building solution-specific data pipelines. The productionization of algorithms requires at least two different data pipelines: one for (automated) model retraining, and one for model scoring (inference).
  • Bi-directional. Traditional BI only consumes data; the output is a dashboard with insights and metrics. AI systems, however, require two-way interaction with other systems. Model outputs are fed to other systems for operational decisions, and the feedback and results of those processes are in turn fed back to the algorithm to evaluate and improve performance. This implies that the classical separation of operational and analytical data architectures starts to break down and merge: an AI system is not only a data consumer (like BI) but also a data producer. The data architecture (and governance) needs to support that.
  • Real-time. Depending on the specific use case, machine learning predictions may be required in real time, or make use of real-time data. E-commerce sites, for instance, provide live product recommendations based on your real-time browsing behavior.
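The two pipelines and the bi-directional feedback loop described above can be sketched end to end. Everything here is a toy illustration — the slope-fitting "model", the data, and the function names are invented to show the flow, not a real training procedure.

```python
# Toy sketch of the retraining pipeline, the scoring pipeline, and the
# feedback loop that makes an AI system a data producer as well as consumer.
training_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (feature, outcome)

def retrain(data):
    # Retraining pipeline: refit a trivial slope-only "model" on all data.
    slope = sum(y for _, y in data) / sum(x for x, _ in data)
    return lambda x: slope * x

def score(model, x):
    # Scoring (inference) pipeline: output feeds an operational system.
    return model(x)

model = retrain(training_data)
pred = score(model, 4.0)

# Feedback: the observed operational outcome flows back into the training
# set, merging the analytical and operational data flows.
observed = pred * 0.9  # pretend reality differed from the prediction
training_data.append((4.0, observed))
model = retrain(training_data)  # automated retraining on fresher data
```

The key architectural point is that `score` and `retrain` are separate pipelines with separate schedules and data needs, yet they share one data foundation that both reads and writes.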
Figure 4. Key differences in requirements for data architecture between BI and AI
Figure 5. Data Lake versus Data Warehouse
Figure 6. ML code (black box) is only a fraction of the complexity of the entire ML system (source: D. Sculley et al., NIPS 2015)
  • Ensure manageability & governance. Having a central repository and view on all deployed ML models enables proper management of both the models as well as the infrastructure (i.e. the model engine) it runs in.
  • Minimize technical debt. The uncoupling of components helps to minimize creation of any technical debt. The components needed to operationalize the model are part of one harmonized capability, rather than being part of every single AI solution.
  • Performance Management. To ensure prediction performance, various actions can be taken. First, a model can be retrained on a more recent data set; this can happen automatically, based on preset performance-threshold triggers, or periodically. Second, a different model can be put in production: a model engine can run multiple models in parallel as shadow models — only one of them actually drives the operational process — and, based on preset business rules, the best-performing model can be promoted automatically. Third, a data scientist can update the model, for instance by adding new parameters (features) based on new data sources.
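The shadow-model promotion mechanism can be sketched concretely. The models, the recent-outcomes data, the error metric, and the promotion rule below are all invented for illustration; a real model engine would apply its own business rules before swapping models.

```python
# Illustrative shadow-model setup: several models score in parallel, but
# only the champion's output drives the process. The best performer on
# recent outcomes is promoted automatically.
recent = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (feature, actual outcome)

models = {
    "champion": lambda x: 1.8 * x,  # currently drives the process
    "shadow_a": lambda x: 2.0 * x,  # scores in parallel, output unused
}

def mean_abs_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)

errors = {name: mean_abs_error(m, recent) for name, m in models.items()}
champion = min(errors, key=errors.get)  # promotion rule: lowest recent error
print(champion, round(errors[champion], 3))  # shadow_a 0.067
```

Because all candidates score the same live traffic, promotion decisions rest on genuinely comparable evidence rather than offline benchmarks — which is the point of running shadows in the first place.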
Figure 7. High-level components of AI platform architecture

3. How to get started

I believe it was Andrew Ng, the well-known entrepreneur and researcher in the deep learning arena, who first compared AI to electricity. At first I liked the analogy but was skeptical about the reality of that statement. Not anymore. I believe we are rapidly entering an acceleration period in which the deployment of machine learning models for just about any decision process will become the norm in business. If AI is truly general purpose like electricity, it will flow through the veins and arteries of the enterprise, powering every decision cell out there. I believe this to be true because all the prerequisites exist. Cloud providers offer pre-trained (transfer learning) models behind a single API. The same goes for speech, text and image recognition: these are almost commodity capabilities (as long as you don’t seek edge performance). Machine learning libraries are standard material for any Python developer. Platforms are emerging that allow solution building at company-wide scale.
