H2O-3: Fast Scalable Machine Learning

3.46.0.10 · active · verified Thu Apr 16

H2O-3 is an open-source, in-memory, distributed machine learning platform implemented primarily in Java, with a Python client. It provides common machine learning algorithms, including GLM, Gradient Boosting, Deep Learning, XGBoost, and Isolation Forest. The current version is 3.46.0.10. Releases are frequent, typically on a monthly or bi-monthly cadence.

Quickstart

This quickstart initializes a local H2O cluster, converts a pandas DataFrame into an H2OFrame, trains a Gradient Boosting Machine (GBM) model, makes predictions, and shuts the cluster down. Pay close attention to data type conversions (e.g., `asfactor()`, which marks the target as categorical so that H2O treats the task as classification) and to cluster resource management.

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
import pandas as pd

# Initialize H2O cluster (adjust max_mem_size based on your system's RAM and data size;
# nthreads=-1 uses all available cores)
h2o.init(max_mem_size="4G", nthreads=-1)

# Create a sample Pandas DataFrame
data = {
    'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature2': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
}
df_pandas = pd.DataFrame(data)

# Convert Pandas DataFrame to H2OFrame
df_h2o = h2o.H2OFrame(df_pandas)

# Convert target to factor for classification problems
df_h2o['target'] = df_h2o['target'].asfactor()

# Define predictors and response variables
predictors = ['feature1', 'feature2']
response = 'target'

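# Split data into training and testing sets
# (split_frame assigns rows probabilistically, so the 70/30 ratio is only approximate,
# especially on a frame this small)
train, test = df_h2o.split_frame(ratios=[0.7], seed=42)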

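# Build a Gradient Boosting Machine (GBM) model
# min_rows defaults to 10, and H2O refuses to train when the frame has fewer than
# 2 * min_rows rows, so it is lowered here for the 10-row toy dataset
gbm_model = H2OGradientBoostingEstimator(
    ntrees=50,
    max_depth=5,
    min_rows=1,
    seed=42
)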
gbm_model.train(x=predictors, y=response, training_frame=train)

# Make predictions on the test set
predictions = gbm_model.predict(test)
print("\nPredictions on test data (first 5 rows):\n")
# head() returns 10 rows by default, so request 5 explicitly
print(predictions.head(rows=5))

# Shut down the H2O cluster (crucial for resource management)
# h2o.shutdown() is deprecated in favor of h2o.cluster().shutdown()
h2o.cluster().shutdown(prompt=False)
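
Because the quickstart shuts the cluster down only on its last line, an exception during training or prediction would leave the JVM running. One way to guard against that is a small context manager, sketched below. The name `local_h2o_cluster` is a hypothetical helper, not part of the `h2o` package. It relies only on `h2o.init()` and `h2o.cluster().shutdown()`.

```python
from contextlib import contextmanager


@contextmanager
def local_h2o_cluster(**init_kwargs):
    """Start (or attach to) an H2O cluster and shut it down on exit.

    Hypothetical helper for illustration; not provided by the h2o package.
    """
    # Imported lazily so the helper can be defined without starting a JVM.
    import h2o

    h2o.init(**init_kwargs)
    try:
        # Hand the module back so callers can write `with ... as h2o:`.
        yield h2o
    finally:
        # Runs even if the body of the `with` block raised an exception.
        h2o.cluster().shutdown(prompt=False)
```

With this helper, the quickstart body would go inside `with local_h2o_cluster(max_mem_size="4G", nthreads=-1) as h2o:`. Note that `h2o.init()` attaches to an already running cluster if it finds one, so this pattern is best kept to clusters the script owns. It would also shut down a shared cluster on exit.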
