
Deploying AutoGluon Models with KServe

This guide explains how to deploy an AutoGluon TabularPredictor model with KServe using the autogluon model format and the kserve-autogluonserver runtime.

Prerequisites

Before you begin, make sure you have:

  • A Kubernetes cluster with KServe installed.
  • Access to a storage backend reachable by your cluster (for example, GCS, S3, or Azure Blob).
  • A model saved with TabularPredictor.save(path).
Model Artifacts Must Be a Directory

AutoGluon models must be stored as a directory generated by TabularPredictor.save(path), not as a single file artifact.
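Before uploading to your storage backend, a quick local sanity check can catch the common mistake of pointing `storageUri` at a single pickle file. The sketch below is a heuristic only; the exact files AutoGluon writes (e.g. `predictor.pkl`) vary by version, and the placeholder directory here stands in for one produced by `TabularPredictor.save(path)`.

```python
from pathlib import Path
import tempfile

def looks_like_autogluon_artifact(model_dir: str) -> bool:
    """Heuristic check before uploading to storageUri.

    AutoGluon saves a directory of files, not a single artifact; the
    exact contents vary by version, so this only verifies the path is
    a directory containing at least one pickle file somewhere inside.
    """
    p = Path(model_dir)
    return p.is_dir() and any(p.rglob("*.pkl"))

# Demo with a stand-in directory (a real one comes from TabularPredictor.save).
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "predictor.pkl").touch()  # placeholder file name, not guaranteed by AutoGluon
    print(looks_like_autogluon_artifact(d))                      # True: directory with a .pkl
    print(looks_like_autogluon_artifact(str(Path(d) / "predictor.pkl")))  # False: a single file
```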

Deploy the Model with REST Endpoint

Create an InferenceService with explicit runtime selection:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "autogluon-titanic"
spec:
  predictor:
    model:
      modelFormat:
        name: autogluon
      protocolVersion: v2
      runtime: kserve-autogluonserver
      storageUri: "gs://your-bucket/autogluon-model/"
      resources:
        requests:
          cpu: "100m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"

Apply the manifest:

kubectl apply -f autogluon.yaml
Runtime Availability

The kserve-autogluonserver runtime may not be installed by default in every release bundle. Verify that the ClusterServingRuntime exists in your cluster before deploying the InferenceService.

Run Inference

First, determine the ingress gateway host and port for your cluster, then export them as INGRESS_HOST and INGRESS_PORT.

REST v1 Example

Save the following request payload as autogluon-input-v1.json:

{
  "instances": [
    {
      "PassengerId": 1,
      "Pclass": 3,
      "Sex": "male"
    },
    {
      "PassengerId": 2,
      "Pclass": 1,
      "Sex": "female"
    }
  ]
}
SERVICE_HOSTNAME=$(kubectl get inferenceservice autogluon-titanic -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./autogluon-input-v1.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/autogluon-titanic:predict
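The v1 request file can also be generated programmatically. A minimal sketch using only the standard library; the Titanic column names are illustrative, and in practice the keys must match the features the predictor was trained on:

```python
import json

# Rows as plain dicts; keys must match the model's training features
# (these Titanic columns are illustrative).
rows = [
    {"PassengerId": 1, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 2, "Pclass": 1, "Sex": "female"},
]

# The v1 protocol wraps row-oriented records in an "instances" list.
payload = {"instances": rows}

with open("autogluon-input-v1.json", "w") as f:
    json.dump(payload, f, indent=2)
```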

REST v2 Example

For v2 requests, provide one input tensor per feature. Each tensor name must match the feature name expected by the model, and all features must have a consistent batch length.

Save the payload as autogluon-input-v2.json:

{
  "inputs": [
    { "name": "PassengerId", "shape": [2], "datatype": "INT64", "data": [1, 2] },
    { "name": "Pclass", "shape": [2], "datatype": "INT64", "data": [3, 1] },
    { "name": "Sex", "shape": [2], "datatype": "BYTES", "data": ["male", "female"] }
  ]
}
SERVICE_HOSTNAME=$(kubectl get inferenceservice autogluon-titanic -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./autogluon-input-v2.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/autogluon-titanic/infer
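Because the v2 payload is columnar (one tensor per feature), it is easy to build from column data. A sketch with a minimal type-to-datatype mapping; extend the mapping for floats, booleans, and other types your schema needs:

```python
import json

def v2_datatype(values):
    """Minimal Python-type to KServe v2 datatype mapping (extend as needed)."""
    if all(isinstance(v, int) for v in values):
        return "INT64"
    if all(isinstance(v, float) for v in values):
        return "FP64"
    return "BYTES"  # strings travel as BYTES in the v2 protocol

# One column per feature; all columns must share the same batch length.
columns = {
    "PassengerId": [1, 2],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
}

inputs = [
    {"name": name, "shape": [len(vals)], "datatype": v2_datatype(vals), "data": vals}
    for name, vals in columns.items()
]

with open("autogluon-input-v2.json", "w") as f:
    json.dump({"inputs": inputs}, f, indent=2)
```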

Expected response:

{
  "model_name": "autogluon-titanic",
  "outputs": [
    { "name": "predictions", "datatype": "INT64", "shape": [2], "data": [1, 0] }
  ]
}
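To consume the response, index the output tensors by name and read their `data` fields. A sketch using the response shown above as a literal:

```python
# The v2 response shown above, as a Python dict.
response = {
    "model_name": "autogluon-titanic",
    "outputs": [
        {"name": "predictions", "datatype": "INT64", "shape": [2], "data": [1, 0]}
    ],
}

# Index output tensors by name, then read the prediction for each input row.
outputs = {t["name"]: t["data"] for t in response["outputs"]}
predictions = outputs["predictions"]
print(predictions)  # [1, 0]
```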

Prediction Probabilities

The AutoGluon runtime can return class probabilities instead of class labels when the PREDICT_PROBA=true environment variable is set in the runtime container configuration.

note

When probability output is enabled, the output schema differs from the class-prediction output: each row carries one probability per class rather than a single predicted label.
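For illustration, a sketch of recovering hard predictions from a probability matrix client-side. The shape and label ordering here are assumptions; the actual probability schema depends on the runtime version, so inspect a real response before relying on this:

```python
# Hypothetical probability output: one row per input, one column per class.
# Both the class label order and the values below are assumed for illustration.
class_labels = [0, 1]
proba = [[0.21, 0.79], [0.88, 0.12]]

# Take the most probable class per row to recover hard predictions.
predicted = [class_labels[row.index(max(row))] for row in proba]
print(predicted)  # [1, 0]
```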

Troubleshooting

  • Ensure storageUri points to a directory created by TabularPredictor.save(path).
  • For v2 requests, verify each feature is provided as a separate tensor with matching batch length.
  • If no runtime is selected automatically, set runtime: kserve-autogluonserver explicitly.
