Machine Learning with Dataiku

Feature Engineering

To aid in the feature engineering process, Dataiku AutoML automatically
fills missing values and converts non-numeric data into numerical values
using well-established encoding techniques.
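
As a rough illustration of the two steps named above, the sketch below fills missing numeric values with the column mean and one-hot encodes a categorical column. It is not Dataiku's implementation; the helper names `impute_mean` and `one_hot` and the toy columns are hypothetical, and mean imputation and one-hot encoding are simply two common choices among several.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy columns, not taken from any Dataiku dataset.
ages = [34, None, 52, 40]
plans = ["basic", "pro", "basic", "team"]

filled_ages = impute_mean(ages)             # missing age becomes the mean of the rest
categories, encoded_plans = one_hot(plans)  # one indicator column per plan name
```

Storing the fill value and the category list alongside the model is what makes the same transformation repeatable at scoring time.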

Users can also create new features using formulas, code, or built-in visual
recipes to provide additional signals that improve model accuracy. Once
created, feature engineering steps are stored in recipes for reuse in
scoring and model retraining.

Delivering More Models with AutoML

Automating the model training process using best-practice techniques
combined with built-in guardrails allows business analysts to build and
compare multiple production-ready models.

Dataiku AutoML uses leading algorithms and frameworks like scikit-learn
and XGBoost to find the best modeling results in an easy-to-use interface
for users across the business.
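
To make "build and compare multiple models" concrete, here is a minimal sketch of the general idea: train several candidates, score each with cross-validation, and keep the best. It is an assumption-laden toy, not how Dataiku AutoML works internally; the two model classes, the `cross_val_mae` helper, and the data are all hypothetical and written in plain Python rather than scikit-learn or XGBoost.

```python
from statistics import mean


class MeanBaseline:
    """Always predicts the mean of the training targets."""

    def fit(self, xs, ys):
        self.mean_ = mean(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class SimpleLinearRegression:
    """Ordinary least squares with a single feature."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        var = sum((x - x_bar) ** 2 for x in xs)
        cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = cov / var
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs):
        return [self.intercept_ + self.slope_ * x for x in xs]


def cross_val_mae(make_model, xs, ys, k=3):
    """Mean absolute error averaged over k interleaved folds."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = make_model().fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in test])
        fold_errors.append(mean(abs(p - ys[i]) for p, i in zip(preds, test)))
    return mean(fold_errors)


# Hypothetical, roughly linear toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

candidates = {"mean_baseline": MeanBaseline, "linear_regression": SimpleLinearRegression}
scores = {name: cross_val_mae(cls, xs, ys) for name, cls in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest cross-validated error
```

Scoring every candidate on held-out folds, rather than on the training data, is one simple guardrail against picking a model that merely memorizes.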

Notebook ML

Dataiku supports a variety of Jupyter-based notebooks for code-based
experimentation and model development using Python, R, and Scala.

Dataiku also includes eight prebuilt notebooks for data analysis, covering
statistics, dimensionality reduction, time series, and topic modeling.

Time Series Visualization and Forecasting

Dataiku supports time-series data preparation, including resampling,
windowing, extrema extraction, and interval extraction. Time series
visualization creates line charts to display time-series data for analysis.
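
The preparation steps listed above can be sketched in a few lines of plain Python. This is only an illustration of what resampling, windowing, and extrema extraction mean, not Dataiku's implementation; the function names and the readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def resample_daily_mean(points):
    """Aggregate (timestamp, value) pairs into one mean value per calendar day."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts.date()].append(value)
    return [(day, mean(vals)) for day, vals in sorted(buckets.items())]


def rolling_mean(values, window):
    """Trailing-window mean; positions without a full window are skipped."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def local_maxima(values):
    """Indices of interior points strictly greater than both neighbors."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical irregular sensor readings.
readings = [
    (datetime(2024, 1, 1, 8), 10.0),
    (datetime(2024, 1, 1, 20), 14.0),
    (datetime(2024, 1, 2, 9), 20.0),
    (datetime(2024, 1, 3, 7), 16.0),
    (datetime(2024, 1, 3, 19), 18.0),
    (datetime(2024, 1, 4, 12), 11.0),
]

daily = resample_daily_mean(readings)            # resampling to a daily grid
daily_values = [value for _, value in daily]
smoothed = rolling_mean(daily_values, window=2)  # windowing
peaks = local_maxima(daily_values)               # extrema extraction
```

Resampling first matters here: window and extrema computations assume evenly spaced observations.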

Data scientists can develop forecasting models using the forecasting
plugin or custom code and notebooks. Combining these with data
preparation and visualization in the same project helps ensure the
forecast model is ready for production use.

Deep Learning with Keras and TensorFlow

Dataiku fully supports deep learning with Keras and TensorFlow, including
training and deployment on CPUs and GPUs.

Deep learning models are treated just like any other model created and
managed in Dataiku, making them easy to deploy as part of projects and
business applications.

Custom Models using Python and Scala

Dataiku does not restrict you to the algorithms that are part of its AutoML
capabilities; users can also write custom models using Python or Scala.
Custom models are first-class citizens in Dataiku.

Once deployed in a project, custom models are handled like any other
model. This powerful capability to use custom-coded models opens up
various use cases that may not be easily modeled by other methods (such
as AutoML).
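
For a sense of what a custom-coded model can look like in Python, the sketch below defines a toy classifier with `fit` and `predict` methods. The scikit-learn-style interface is an assumption made for illustration; the exact contract Dataiku expects from custom models should be checked in its documentation. The class name and data are hypothetical.

```python
class NearestCentroidClassifier:
    """Toy classifier: predicts the class whose feature mean is closest."""

    def fit(self, X, y):
        # Accumulate per-class feature sums and row counts.
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (mean feature vector) per class.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


# Hypothetical training data with two well-separated classes.
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 9.0], [9.0, 8.0]]
y_train = ["low", "low", "high", "high"]

model = NearestCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[2.0, 1.0], [7.5, 8.5]])
```

Keeping to a common `fit`/`predict` shape is what lets a platform train, score, and deploy custom code through the same path as built-in algorithms.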

Training on Large Datasets with Spark

Dataiku supports model training on large datasets that don’t fit into
memory using Spark MLlib or H2O Sparkling Water.

Once configured, Spark becomes available to users for model training.
Depending on the configuration, users can then train models using the
algorithms available in MLlib, such as regression and decision trees, or
use H2O Sparkling Water with support for deep learning, GBM, GLM, random
forest, and more.