DataOps with Dataiku


Dataiku projects are the central place for users' work and collaboration. Each project has a visual flow that shows the pipeline of datasets and recipes associated with the project.

Users can view the project and associated assets (like dashboards), check the project’s overall status, and view recent activity.

Visual Flow

Organizing data pipelines to transform, prepare, and analyze data is critical for production-ready AI projects.

The Dataiku visual flow lets coders and non-coders alike build data pipelines from datasets, recipes that join and transform those datasets, and predictive models. The flow also supports code and reusable plugin elements for customization and advanced functions.
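
One way to picture a flow is as a dependency graph in which each recipe reads input datasets and writes an output dataset. The sketch below is not Dataiku code: it uses Python's standard `graphlib` module (Python 3.9+), and every dataset name in it is hypothetical. It only illustrates how a build order can follow from those dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each key is a dataset, and each value is the set of
# upstream datasets read by the recipe that produces it.
flow = {
    "orders_clean": {"orders_raw"},                        # prepare recipe
    "customers_clean": {"customers_raw"},                  # prepare recipe
    "orders_joined": {"orders_clean", "customers_clean"},  # join recipe
    "churn_scores": {"orders_joined"},                     # scoring recipe
}


def build_order(flow):
    """Return datasets ordered so that every input comes before its outputs."""
    return list(TopologicalSorter(flow).static_order())


order = build_order(flow)
```

The exact order among independent branches may vary, but raw datasets always come before the cleaned ones, and the joined dataset always comes before the scores.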

Data Quality and Checks

Checks in Dataiku automatically assess flow elements by comparing them with specified or previous values, which helps ensure that automated flows run within expected timeframes and produce expected results. When a pipeline item fails a check, an error is returned, prompting investigation and quick resolution.
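
The two kinds of comparison described above, against a specified value and against a previous value, can be sketched in plain Python. This is an illustration of the idea, not Dataiku's implementation: the `CheckResult` class, the function names, and the outcome labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    outcome: str  # "OK" or "ERROR" (hypothetical labels)
    message: str


def check_range(name, value, minimum=None, maximum=None):
    """Compare a metric with specified bounds."""
    if minimum is not None and value < minimum:
        return CheckResult(name, "ERROR", f"{value} is below minimum {minimum}")
    if maximum is not None and value > maximum:
        return CheckResult(name, "ERROR", f"{value} is above maximum {maximum}")
    return CheckResult(name, "OK", f"{value} is within bounds")


def check_drift(name, value, previous, max_relative_change):
    """Compare a metric with its previous value."""
    if previous == 0:
        return CheckResult(name, "ERROR", "previous value is zero")
    change = abs(value - previous) / abs(previous)
    outcome = "ERROR" if change > max_relative_change else "OK"
    return CheckResult(name, outcome, f"changed by {change:.0%}")


# Hypothetical metric: today's row count versus a floor and yesterday's count.
results = [
    check_range("row_count", 950, minimum=1000),
    check_drift("row_count", 950, previous=1000, max_relative_change=0.10),
]
failed = [r for r in results if r.outcome == "ERROR"]
```

Here the row count fails the specified floor but stays within the allowed change from the previous run, so only the first check would prompt investigation.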

Scenarios and Triggers for Automation

Operating AI projects requires repetitive tasks like loading and processing data, running batch scoring jobs, and more. With Dataiku, scenarios and triggers automate these processes, either on a schedule for periodic execution or when specified conditions are met.
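
The relationship between a scenario, its steps, and its triggers can be sketched as follows. This is a conceptual model in plain Python, not the Dataiku scenario API: the class names, the `tick` method, and the `new_files` state key are hypothetical.

```python
from datetime import datetime, timedelta


class TimeTrigger:
    """Fires when at least `interval` has elapsed since the last run."""

    def __init__(self, interval):
        self.interval = interval

    def should_fire(self, now, last_run, state):
        return last_run is None or now - last_run >= self.interval


class ConditionTrigger:
    """Fires when a predicate over the current state is true."""

    def __init__(self, predicate):
        self.predicate = predicate

    def should_fire(self, now, last_run, state):
        return bool(self.predicate(state))


class Scenario:
    """A named sequence of steps that runs when any of its triggers fires."""

    def __init__(self, name, triggers, steps):
        self.name = name
        self.triggers = triggers
        self.steps = steps
        self.last_run = None

    def tick(self, now, state):
        """Run all steps once if any trigger fires; return True if it ran."""
        if any(t.should_fire(now, self.last_run, state) for t in self.triggers):
            for step in self.steps:
                step(state)
            self.last_run = now
            return True
        return False


log = []
scenario = Scenario(
    "nightly_scoring",
    triggers=[
        TimeTrigger(timedelta(days=1)),
        ConditionTrigger(lambda s: s.get("new_files", 0) > 0),
    ],
    steps=[lambda s: log.append("load"), lambda s: log.append("score")],
)

start = datetime(2024, 1, 1)
ran_first = scenario.tick(start, {})                         # never run before
ran_second = scenario.tick(start + timedelta(hours=1), {})   # nothing due
ran_third = scenario.tick(start + timedelta(hours=2), {"new_files": 3})
```

The first tick runs because the scenario has never run, the second does nothing, and the third runs because the condition is met even though the daily interval has not elapsed.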

With automation in place, production teams can manage more projects and scale to deliver more production AI projects.

Code Notebooks, Recipes, and Environments

Dataiku is for coders and non-coders alike. Developers and advanced data scientists who prefer tools like Python or R can incorporate code into projects via notebooks or directly with code recipes and plugins.

Dataiku supports code notebooks for SQL, Python, and R, and code recipes developed in Python, R, SQL, Hive, Pig, Impala, Spark-Scala, PySpark, Spark/R, SparkSQL, and Shell. Dataiku also supports code environments for Python, R, and Conda, and it has a complete API for R.

Git Integration

Integrating with Git for code version management is required for development projects. Dataiku provides integration with Git, including version control of projects, importing Python and R code, developing reusable plugins, importing plugins, and more.


APIs

Dataiku includes robust APIs to integrate with external systems to create and manage AI and analytics projects. The Dataiku public API lets authorized users interact with Dataiku from an external system for tasks including administration, maintenance, and data access.

The public API is available through a Python API client or an HTTP REST API. Dataiku also includes a complete R API as well as APIs for JavaScript and Scala for specific functions.
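
As a rough illustration of the HTTP REST route, the sketch below builds (but does not send) a request to list projects using only the Python standard library. The host, port, and API key are placeholders. The `/public/api/projects/` path and the use of the API key as the HTTP Basic username with an empty password are assumptions to verify against the Dataiku REST API reference for your version.

```python
import base64
import urllib.request


def build_list_projects_request(base_url, api_key):
    """Build (but do not send) a GET request for the project list."""
    # Assumption: the public REST API is served under /public/api/ and
    # accepts the API key as the Basic-auth username with an empty password.
    url = base_url.rstrip("/") + "/public/api/projects/"
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder host and key, for illustration only.
request = build_list_projects_request(
    "https://dss.example.com:11200", "my-secret-key"
)
```

Sending the request with `urllib.request.urlopen(request)` and decoding the JSON body would complete the call. In practice the Python API client is likely the more convenient option, since it handles authentication and response parsing.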
