To aid in the feature engineering process, Dataiku AutoML automatically
fills missing values and converts non-numeric data into numerical values
using well-established encoding techniques.
Users can also create new features using formulas, code, or built-in visual
recipes to provide additional signals to improve model accuracy. Once
created, Dataiku stores feature engineering steps in recipes for reuse in
scoring and model retraining.
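The idea of filling missing values, encoding non-numeric columns, and reusing the same steps at scoring time can be sketched outside Dataiku with scikit-learn. This is a minimal illustration under stated assumptions, not Dataiku's internal implementation: the column names, the toy data, and the choice of median imputation and one-hot encoding are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with gaps in a numeric and a categorical column.
train = pd.DataFrame({
    "age": [34.0, np.nan, 52.0, 41.0],
    "plan": pd.Series(["basic", "premium", np.nan, "basic"], dtype=object),
})

# Numeric columns: fill gaps with the median seen during fitting.
numeric_steps = Pipeline([("impute", SimpleImputer(strategy="median"))])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", categorical_steps, ["plan"]),
])

# Fit once on training data ...
features = preprocess.fit_transform(train)

# ... then reuse the fitted steps unchanged on new rows at scoring time.
new_rows = pd.DataFrame({
    "age": [np.nan],
    "plan": pd.Series(["enterprise"], dtype=object),
})
new_features = preprocess.transform(new_rows)
```

Fitting the preprocessing once and applying it unchanged to new rows is what keeps training and scoring consistent; the unseen category "enterprise" is encoded as all zeros rather than raising an error.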
Delivering More Models with AutoML
Automating the model training process using best-practice techniques
combined with built-in guardrails allows business analysts to build and
compare multiple production-ready models.
Dataiku AutoML uses leading algorithms and frameworks like scikit-learn
and XGBoost to find the best modeling results in an easy-to-use interface
for users across the business.
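Training several candidate models and comparing them on the same metric can be sketched in plain scikit-learn as follows. This is an illustrative assumption about what such a comparison involves, not a description of Dataiku AutoML's actual procedure; the synthetic dataset, the two candidate algorithms, and the ROC AUC metric are hypothetical choices, and XGBoost is left out only to keep the dependencies minimal.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for a prepared dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate models to compare under identical conditions.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with the same 5-fold cross-validation and metric.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
print("best candidate:", best_name)
```

Using cross-validation rather than a single split is one common guardrail against picking a model that only looks good on one lucky partition of the data.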
Dataiku supports a variety of notebooks for code-based experimentation
and model development using Python, R, and Scala, based on Jupyter.
Dataiku also includes eight prebuilt notebooks for data analysis, including
statistics, dimensionality reduction, time series, and topic modeling.
Time Series Visualization and Forecasting
Dataiku supports time-series data preparation, including resampling,
windowing, extrema extraction, and interval extraction. Time series
visualization creates line charts to display time-series data for analysis.
Data scientists can develop forecasting models using the forecasting
plugin or using custom code and notebooks combined with data
preparation and visualization in a project to ensure their forecast model is
ready for production use.
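For readers unfamiliar with the preparation steps named above, resampling, windowing, and extrema extraction can be illustrated with pandas in a code notebook. This is a sketch under stated assumptions, not the Dataiku time-series plugin itself: the readings, the 60-minute grid, and the window size of two are hypothetical.

```python
import pandas as pd

# Hypothetical irregularly spaced sensor readings.
readings = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 14.0],
    index=pd.to_datetime([
        "2024-01-01 00:00",
        "2024-01-01 00:20",
        "2024-01-01 01:10",
        "2024-01-01 02:05",
        "2024-01-01 03:00",
    ]),
)

# Resampling: align the readings to a regular 60-minute grid (mean per bin).
hourly = readings.resample("60min").mean()

# Windowing: rolling mean over the current and previous hourly value.
rolling_mean = hourly.rolling(window=2).mean()

# Extrema extraction: timestamp and value of the global maximum.
peak_time = hourly.idxmax()
peak_value = hourly.max()

print(hourly)
print(rolling_mean)
print("peak:", peak_time, peak_value)
```

The first rolling value is missing because a window of two needs one earlier observation; forecasting code usually has to drop or fill such leading gaps explicitly.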
Deep Learning with Keras and TensorFlow
Dataiku fully supports deep learning with Keras and TensorFlow, including
training and deployment to CPUs and GPUs.
In Dataiku, deep learning models are treated just like any other model
created and managed in Dataiku, making deep learning models easy to
deploy as part of projects and business applications.
Custom Models using Python and Scala
Dataiku does not restrict you to the algorithms that are part of its AutoML
capabilities — it also allows users to write custom models using Python or
Scala. Custom models are first-class citizens in Dataiku.
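As a rough picture of what a custom Python model can look like, the sketch below defines a small classifier with scikit-learn-style fit and predict methods. The exact interface Dataiku expects is not stated here, so the scikit-learn convention is an assumption; the class name, the nearest-centroid logic, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class NearestCentroidSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical custom model: predicts the class whose mean point is closest."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid (feature-wise mean) per class, in the order of classes_.
        self.centroids_ = np.vstack(
            [X[y == label].mean(axis=0) for label in self.classes_]
        )
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every row to every centroid: (n_rows, n_classes).
        distances = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[distances.argmin(axis=1)]


# Toy training data with two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array(["low", "low", "high", "high"])

model = NearestCentroidSketch().fit(X_train, y_train)
predictions = model.predict([[0.5, 0.5], [5.5, 4.0]])
print(predictions)
```

Following the fit/predict convention is what lets generic tooling train, score, and evaluate a hand-written model the same way as a built-in algorithm.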
Once deployed in a project, custom models are handled like any other
model. This powerful capability to use custom-coded models opens up
various use cases that may not be easily modeled by other methods (such
Training on Large Datasets with Spark
Dataiku supports model training on large datasets that don’t fit into
memory using Spark MLlib or H2O Sparkling Water.
Once configured, Spark becomes available to users for model training.
Depending on the configuration, users can then train models using the
available algorithms in MLlib like regression, decision trees, etc., or use
H2O Sparkling Water with support for deep learning, GBM, GLM, random
forest, and more.