Dataiku projects are the central place where users work and collaborate.
Each Dataiku project has a visual flow: the pipeline of datasets and
recipes associated with the project.
From the project, users can view associated assets such as dashboards,
check the project’s overall status, and see recent activity.
Organizing data pipelines to transform, prepare, and analyze data is
critical for production-ready AI projects.
The Dataiku visual flow lets coders and non-coders alike build data
pipelines from datasets and recipes that join and transform them, and
build predictive models within the same flow. The flow can also include
code and reusable plugin elements for customization and advanced functions.
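To make the idea of a flow concrete, the sketch below models one as a directed acyclic graph in plain Python and derives an order in which datasets can be built. This is an illustration only, not how Dataiku is implemented internally, and the recipe and dataset names are hypothetical. It shows why each recipe's inputs must exist before its output can be built.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical flow: each recipe maps its input datasets to one output dataset.
RECIPES = {
    "prepare_orders": (["orders_raw"], "orders_clean"),
    "join_customers": (["orders_clean", "customers"], "orders_enriched"),
    "train_model": (["orders_enriched"], "churn_model"),
}


def build_order(recipes):
    """Return datasets ordered so that every input comes before its output."""
    graph = TopologicalSorter()
    for inputs, output in recipes.values():
        # The output depends on (is built after) all of the recipe's inputs.
        graph.add(output, *inputs)
    return list(graph.static_order())


order = build_order(RECIPES)
print(order)
```

Any valid order places `orders_raw` and `customers` before `orders_enriched`, and `churn_model` last; the exact position of independent datasets may vary.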
Data Quality and Checks
Checks in Dataiku automatically assess flow elements by comparing them
against specified values or previous values, ensuring that automated flows
run within expected timeframes and produce expected results. When a
pipeline item fails a check, an error is returned, prompting investigation
and quick resolution.
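The logic of a check can be sketched in a few lines. The example below is a hypothetical, simplified stand-in for Dataiku's checks, assuming a numeric metric (such as a row count) that is compared against fixed bounds and against its value from the previous run; the class and field names are invented for illustration. A failing check raises an error so the run stops and someone investigates.

```python
from dataclasses import dataclass
from typing import Optional


class CheckFailed(Exception):
    """Raised when a metric falls outside its expected range."""


@dataclass
class RangeCheck:
    # Hypothetical check definition: optional absolute bounds plus an optional
    # maximum relative change compared with the previous run's value.
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    max_relative_change: Optional[float] = None

    def evaluate(self, value: float, previous: Optional[float] = None) -> None:
        if self.minimum is not None and value < self.minimum:
            raise CheckFailed(f"{self.name}: {value} < minimum {self.minimum}")
        if self.maximum is not None and value > self.maximum:
            raise CheckFailed(f"{self.name}: {value} > maximum {self.maximum}")
        # Skip the comparison when there is no previous value (or it is zero,
        # which would make the relative change undefined).
        if self.max_relative_change is not None and previous:
            change = abs(value - previous) / abs(previous)
            if change > self.max_relative_change:
                raise CheckFailed(
                    f"{self.name}: changed {change:.0%} vs previous run "
                    f"(limit {self.max_relative_change:.0%})"
                )


row_count_check = RangeCheck("row_count", minimum=1, max_relative_change=0.5)
row_count_check.evaluate(1200, previous=1000)  # passes: 20% change
try:
    row_count_check.evaluate(300, previous=1000)  # 70% drop
except CheckFailed as err:
    print(f"Check failed, investigate: {err}")
```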
Scenarios and Triggers for Automation
Operating AI projects requires repetitive tasks such as loading and
processing data and running batch scoring jobs. In Dataiku, scenarios and
triggers automate these processes, either by scheduling periodic execution
or by firing when specified conditions are met.
With automation in place, production teams can manage more projects and
scale up the delivery of production AI.
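The relationship between a scenario and its triggers can be illustrated with a small scheduler model. This is a hypothetical sketch, not Dataiku's scenario engine: it assumes a scenario is an ordered list of steps, a periodic trigger fires once a period has elapsed since it last fired, and a condition trigger fires whenever a user-supplied check returns true. All class names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PeriodicTrigger:
    """Fires when at least `period_s` seconds have passed since it last fired."""
    period_s: float
    last_fired: Optional[float] = None

    def should_fire(self, now: float) -> bool:
        if self.last_fired is None or now - self.last_fired >= self.period_s:
            self.last_fired = now
            return True
        return False


@dataclass
class ConditionTrigger:
    """Fires whenever the supplied condition returns True (ignores the clock)."""
    condition: Callable[[], bool]

    def should_fire(self, now: float) -> bool:
        return self.condition()


@dataclass
class Scenario:
    name: str
    steps: List[Callable[[], None]]
    triggers: list = field(default_factory=list)

    def tick(self, now: float) -> bool:
        """Run all steps in order if any trigger fires; return whether it ran."""
        # Evaluate every trigger (no short-circuit) so each can update its state.
        fired = [trigger.should_fire(now) for trigger in self.triggers]
        if any(fired):
            for step in self.steps:
                step()
            return True
        return False


log = []
new_files = {"count": 0}
scenario = Scenario(
    name="daily_scoring",
    steps=[lambda: log.append("load data"), lambda: log.append("score batch")],
    triggers=[
        PeriodicTrigger(period_s=86_400),  # once a day
        ConditionTrigger(lambda: new_files["count"] > 0),  # when new data lands
    ],
)
scenario.tick(0.0)    # first tick: the periodic trigger fires
scenario.tick(60.0)   # nothing fires
new_files["count"] = 3
scenario.tick(120.0)  # the condition trigger fires
print(log)
```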
Code Notebooks, Recipes, and Environments
Dataiku is for coders and non-coders alike. Developers and advanced
data scientists who prefer tools like Python or R can incorporate code into
projects via notebooks or directly with code recipes and plugins.
Dataiku supports code notebooks for SQL, Python, and R, and code
recipes developed in Python, R, SQL, Hive, Pig, Impala, Spark-Scala,
PySpark, Spark/R, SparkSQL, and Shell. Dataiku also supports code
environments for Python, R, and Conda, and it has a complete API for R.
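As an illustration of what a Python code recipe might contain, the sketch below keeps the transformation as a plain function over rows so it can be tested outside Dataiku. Inside a Dataiku Python recipe, inputs and outputs are typically accessed through the `dataiku` package (for example, `dataiku.Dataset("orders").get_dataframe()` to read and `write_with_schema(df)` to write), but that package is only available on a Dataiku instance, so it is not used here. The dataset and column names (`customer_id`, `segment`) are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_orders_to_customers(orders: List[Row], customers: List[Row]) -> List[Row]:
    """Left-join orders to customers on the hypothetical `customer_id` column."""
    by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"], {})
        # Keep every order; `segment` is None when no customer matches.
        joined.append({**order, "segment": customer.get("segment")})
    return joined


orders = [{"order_id": 1, "customer_id": "a"}, {"order_id": 2, "customer_id": "z"}]
customers = [{"customer_id": "a", "segment": "enterprise"}]
print(join_orders_to_customers(orders, customers))
```

Keeping the logic in a pure function like this makes it easy to unit-test and version in Git, while the recipe itself only handles reading and writing datasets.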
Integrating with Git for code version management is required for
development projects. Dataiku provides integration with Git, including
version control of projects, importing Python and R code, developing
reusable plugins, importing plugins, and more.
Dataiku includes robust APIs for integrating with external systems to
create and manage AI and analytics projects. The Dataiku public API lets
authorized users interact with Dataiku from an external system for tasks
including administration, maintenance, and data access.
The public API is available through a Python API client or through the
HTTP REST API. Scala is also supported for specific functions.
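A call to the HTTP REST API can be sketched with the Python standard library. The example builds, but does not send, a request to list projects. It assumes the endpoint path `/public/api/projects/` and HTTP Basic authentication with the API key as the username and an empty password; verify both against the REST API reference for your Dataiku version. The host `dss.example.com` and the key `MY_API_KEY` are placeholders. In practice, the Python client (`dataikuapi.DSSClient`) wraps calls like this one.

```python
import base64
import urllib.request


def build_list_projects_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists projects.

    Assumes the `/public/api/projects/` endpoint and Basic auth with the
    API key as username and an empty password.
    """
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/public/api/projects/",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


req = build_list_projects_request("https://dss.example.com", "MY_API_KEY")
print(req.get_method(), req.full_url)
# Against a reachable instance, the request could then be sent with:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()  # JSON response body
```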