Deploying Projects to Production
The Dataiku unified deployer manages the movement of project files between
Dataiku design nodes and production nodes for batch and real-time
scoring. Project bundles package everything a project needs from the
design environment to run in the production environment.
With Dataiku, data scientists can see all the deployed bundles, and data
engineers or IT operations staff can quickly see when a new bundle requires
testing and rollout.
Batch Scoring with Automation Nodes
Dataiku automation nodes are production nodes with advanced
automation capabilities for scheduling routine tasks in production projects,
such as monitoring, updating data, and retraining models.
With automation nodes, AI projects run smoothly, and organizations can
scale the number of AI projects in production.
Real-Time Scoring with API Nodes
The deployment of predictive insights for real-time applications requires a
different set of characteristics than batch scoring, including dynamic
scaling of resources to meet changing needs.
Dataiku API nodes make it easy to deploy API endpoint services on elastic,
highly available infrastructure to support real-time scoring. With API nodes,
organizations can deploy more projects and build downstream
applications and processes powered by AI.
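
As a rough illustration of what a downstream application does when it calls a real-time scoring endpoint, the sketch below builds and sends a JSON request for a single record. It is a generic HTTP client written with the Python standard library, not Dataiku's client. The endpoint URL is supplied by the caller, and the payload shape (a single "features" object) and the helper names build_scoring_request and score are assumptions to check against the documentation of the deployed service.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, features, timeout_s=2.0):
    """Build (but do not send) a JSON POST request for one record."""
    # Assumed payload shape: one "features" object per request.
    body = json.dumps({"features": features}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return request, timeout_s


def score(endpoint_url, features, timeout_s=2.0):
    """Send the request and return the decoded JSON response."""
    request, timeout = build_scoring_request(endpoint_url, features, timeout_s)
    # A short timeout keeps a slow endpoint from blocking the caller.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload logic testable without a live endpoint.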
Deployment with ONNX, Even on the Edge
For connectivity, speed, cost, and privacy reasons, more and more use
cases require putting the model, the sensor, and the data on the same
small device, such as a smartphone or an embedded processing unit.
ONNX is an open format created by Facebook and Microsoft to enable
interoperability between common deep learning frameworks. Dataiku
supports model deployment using ONNX for prediction in a variety of
environments, including the edge.
Monitoring and Drift Detection
Once AI projects are up and running in production, the real work begins.
Operational AI projects use pipelines to process data and score in batch.
Dataiku monitors each pipeline to ensure all processes execute as planned
and alerts operators if there are issues. For models, Dataiku provides data
drift detection to check that scoring data and training data remain similar
so that the model can deliver reliable results.
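
To make the idea of comparing scoring data with training data concrete, here is a minimal sketch of one common drift measure, the population stability index (PSI), for a single numeric feature. It is a generic technique chosen for illustration, not a description of how Dataiku computes drift. The function name, the bin count, and the small floor applied to empty bins are all assumptions, and both samples are assumed to be non-empty.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training) and a current (scoring) sample."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; guard against zero width.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        eps = 1e-6  # floor so that empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples give a PSI of zero, and the value grows as the two distributions move apart. A frequently quoted rule of thumb treats values above roughly 0.25 as significant drift, but the right threshold depends on the feature and the use case.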
Automatic Model Retraining
Production models periodically need to be updated based on newer data,
detected data drift, or an appropriate schedule.
Dataiku AI projects include automated retraining based on a schedule or
triggers, such as significant drift. With automatic retraining in place,
operations teams can focus on other pressing issues like troubleshooting
and new projects moving to production.
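
The retraining policy described above, a schedule combined with a drift trigger, can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not Dataiku's scenario or trigger configuration: the function name should_retrain, the 30-day maximum model age, and the 0.25 drift threshold are all hypothetical defaults.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, drift_score,
                   max_age=timedelta(days=30), drift_threshold=0.25):
    """Return (decision, reason) for a schedule-or-drift retraining policy."""
    # The drift trigger takes priority: retrain as soon as drift is significant.
    if drift_score >= drift_threshold:
        return True, "drift"
    # Otherwise fall back to the schedule: retrain once the model is too old.
    if now - last_trained >= max_age:
        return True, "schedule"
    return False, "none"
```

For example, a model trained 10 days ago with a drift score of 0.4 would be retrained because of drift, while the same model with a score of 0.05 would wait for its scheduled refresh.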
Production Project and Model Updates
Updating projects manually in production can be challenging and risky,
resulting in downtime for critical AI initiatives.
Dataiku makes it easy to update production artifacts, including models,
with full Git integration and version management. Dataiku production
nodes can also serve as separate test and production environments,
allowing for a robust dev-test-prod approach to updates across multiple
production nodes.
Automate CI/CD with APIs for DevOps
DevOps tools and processes are standard in enterprise software projects.
While AI projects are different in some ways, they still involve code artifacts
and can benefit from a continuous integration and deployment approach.
Dataiku provides a full API to perform programmatic operations from
external management systems used by DevOps teams. Dataiku integrates
with the tools that DevOps teams already use, such as Jenkins, GitLab CI,
Travis CI, or Azure Pipelines, to name a few.