DataQraft
info@dataqraft.com

MLOps with Dataiku

Deploying Projects to Production

The Dataiku unified deployer manages the movement of project files between

Dataiku design nodes and production nodes for batch and real-time

scoring. Project bundles package everything a project needs from the

design environment to run on the production environment.

With Dataiku, data scientists can see all the deployed bundles, and data

engineers or IT operations teams can quickly see when a new bundle requires

testing and roll-out.

Batch Scoring with Automation Nodes

Dataiku automation nodes are production nodes with advanced

automation capabilities to schedule everyday tasks for production projects

like monitoring, updating data, and retraining models based on a schedule

or triggers.

With automation nodes, AI projects run smoothly, and organizations can

scale the number of AI projects in production.

Real-Time Scoring with API Nodes

The deployment of predictive insights for real-time applications requires a

different set of characteristics than batch scoring, including dynamic

scaling of resources to meet changing needs.

Dataiku API nodes make it easy to deploy API endpoint services on elastic,

highly available infrastructure to support real-time scoring. With API nodes,

organizations can deploy more projects and build downstream

applications and processes powered by AI.
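As a rough illustration of what a downstream application does when it calls a real-time scoring endpoint, the sketch below builds (but does not send) an HTTP request for a single-record prediction. The URL layout, the `features` payload key, and the service and endpoint names are assumptions made for illustration; check them against the documentation of the API node you actually deploy.

```python
import json
import urllib.request


def build_predict_request(base_url, service_id, endpoint_id, features):
    """Build a POST request asking an endpoint to score one record.

    The path layout below is an assumption for illustration only.
    """
    url = f"{base_url.rstrip('/')}/public/api/v1/{service_id}/{endpoint_id}/predict"
    # The record to score is sent as a JSON object of feature name -> value.
    body = json.dumps({"features": features}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical host, service, and endpoint names.
request = build_predict_request(
    "http://localhost:12000", "churn", "predict_churn", {"age": 42, "plan": "basic"}
)
print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON response, with whatever authentication your deployment requires.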

Deployment with ONNX, Even on the Edge

For connectivity, speed, cost, and privacy reasons, more and more use

cases require putting the model, the sensor, and the data on the same

small device, such as a smartphone or an embedded processing unit.

ONNX is an open format created by Facebook and Microsoft to enable

interoperability between common deep learning frameworks. Dataiku

supports model deployment using ONNX for prediction on a variety of

environments, including the edge.

Monitoring and Drift Detection

Once AI projects are up and running in production, the real work begins.

Operational AI projects use pipelines to process data and score in batch and

real time.

Dataiku monitors the pipeline to ensure all processes execute as planned

and alerts operators if there are issues. For models, Dataiku provides data

drift detection to check that scoring data and training data remain similar

so that the model can deliver reliable results.
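To make the idea of comparing scoring data with training data concrete, here is a minimal sketch of one common drift measure, the population stability index (PSI), for a single numeric feature. This is not a description of how Dataiku computes drift; the function name, the bin count, and the 0.25 rule-of-thumb threshold are assumptions chosen for illustration.

```python
import math


def population_stability_index(train, score, bins=10):
    """Compare two samples of one numeric feature; 0.0 means identical histograms."""
    lo, hi = min(train), max(train)
    # Fall back to a unit width when every training value is the same.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = shares(train), shares(score)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training_values = list(range(100))
shifted_values = [v + 50 for v in training_values]
psi = population_stability_index(training_values, shifted_values)
print("drift suspected" if psi > 0.25 else "stable")
```

A per-feature score like this is easy to compute on each scoring batch and to plot over time, which is why it is a frequent starting point before moving to model-based drift tests.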

Automatic Model Retraining

Production models periodically need to be updated based on newer data,

detected data drift, or an appropriate schedule.

Dataiku AI projects include automated retraining based on a schedule or

triggers, such as significant drift. With automatic retraining in place,

operations teams can focus on other pressing issues like troubleshooting

and new projects moving to production.
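The retraining logic described above, a schedule combined with a drift trigger, can be sketched as a small decision function. This is an illustrative assumption about how such a rule might look, not Dataiku's trigger mechanism; `should_retrain`, the 0.25 drift threshold, and the 30-day maximum model age are hypothetical.

```python
from datetime import datetime, timedelta


def should_retrain(drift_score, last_trained, now,
                   drift_threshold=0.25, max_age=timedelta(days=30)):
    """Return (decision, reason) for a production model."""
    # The drift trigger takes priority over the schedule.
    if drift_score >= drift_threshold:
        return True, "drift"
    # Otherwise retrain once the model is older than the allowed age.
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "none"


decision, reason = should_retrain(
    drift_score=0.4,
    last_trained=datetime(2024, 1, 1),
    now=datetime(2024, 1, 10),
)
print(decision, reason)
```

Returning the reason alongside the decision makes it straightforward to log why a retraining run was started.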

Production Project and Model Updates

Updating projects manually in production can be challenging and risky,

resulting in downtime for critical AI initiatives.

Dataiku makes it easy to update production artifacts, including models,

with full Git integration and version management. Dataiku production

nodes also support separate test and production environments, allowing for a

robust dev-test-prod approach to updates with multiple production nodes.

Automate CI/CD with APIs for DevOps

DevOps tools and processes are standard in enterprise software projects.

While AI projects are different in some ways, they still involve code artifacts

and can benefit from a continuous integration and deployment approach.

Dataiku provides a full API to perform programmatic operations from

external management systems used by DevOps teams. Dataiku integrates

with the tools that DevOps teams already use, such as Jenkins, GitLab CI, Travis

CI, or Azure Pipelines, to name a few.