# Deploying AutoGluon Models with KServe
This guide explains how to deploy an AutoGluon `TabularPredictor` model with KServe using the `autogluon` model format and the `kserve-autogluonserver` runtime.
## Prerequisites
Before you begin, make sure you have:
- A Kubernetes cluster with KServe installed.
- Access to a storage backend reachable by your cluster (for example, GCS, S3, or Azure Blob).
- A model saved with `TabularPredictor.save(path)`. AutoGluon models must be stored as a directory generated by `TabularPredictor.save(path)`, not as a single-file artifact.
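Before uploading the artifact to your bucket, it can help to sanity-check that the local path really is a directory and not a single file. This is a minimal sketch; `./autogluon-model` is a placeholder path, and the exact files inside the directory vary by AutoGluon version.

```python
import os

def looks_like_autogluon_model_dir(path: str) -> bool:
    """Return True if `path` is a non-empty directory (as produced by
    TabularPredictor.save(path)) rather than a single-file artifact."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0

# Hypothetical local path; check before uploading to your storage bucket.
print(looks_like_autogluon_model_dir("./autogluon-model"))
```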
## Deploy the Model with REST Endpoint
Create an InferenceService with explicit runtime selection:
```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "autogluon-titanic"
spec:
  predictor:
    model:
      modelFormat:
        name: autogluon
      protocolVersion: v2
      runtime: kserve-autogluonserver
      storageUri: "gs://your-bucket/autogluon-model/"
      resources:
        requests:
          cpu: "100m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
```
Apply the manifest:

```bash
kubectl apply -f autogluon.yaml
```
The `kserve-autogluonserver` runtime may not be installed by default in every release bundle. Verify that the `ClusterServingRuntime` exists in your cluster before deploying the `InferenceService`.
## Run Inference
First, determine the ingress IP and ports, then set `INGRESS_HOST` and `INGRESS_PORT`.
### REST v1 Example

Save the following request body as `autogluon-input-v1.json`:
```json
{
  "instances": [
    {
      "PassengerId": 1,
      "Pclass": 3,
      "Sex": "male"
    },
    {
      "PassengerId": 2,
      "Pclass": 1,
      "Sex": "female"
    }
  ]
}
```
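The v1 payload can also be generated programmatically from row-oriented records; a short sketch (the feature names simply mirror the example above):

```python
import json

# Row-oriented records; feature names mirror the example request above.
rows = [
    {"PassengerId": 1, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 2, "Pclass": 1, "Sex": "female"},
]

# v1 protocol: the request body is simply {"instances": [<row>, ...]}.
payload = {"instances": rows}
print(json.dumps(payload, indent=2))
```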
```bash
SERVICE_HOSTNAME=$(kubectl get inferenceservice autogluon-titanic -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./autogluon-input-v1.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/autogluon-titanic:predict
```
### REST v2 Example

For v2 requests, provide one input tensor per feature. Each tensor name must match the feature name expected by the model, and all tensors must share the same batch length. Save the following request body as `autogluon-input-v2.json`:
```json
{
  "inputs": [
    { "name": "PassengerId", "shape": [2], "datatype": "INT64", "data": [1, 2] },
    { "name": "Pclass", "shape": [2], "datatype": "INT64", "data": [3, 1] },
    { "name": "Sex", "shape": [2], "datatype": "BYTES", "data": ["male", "female"] }
  ]
}
```
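Converting row-oriented records into this columnar v2 layout is mechanical. The sketch below builds one tensor per feature; the datatype mapping is a deliberately simplified assumption covering only integers and strings:

```python
import json

def rows_to_v2_inputs(rows):
    """Build one v2 input tensor per feature from row-oriented records.

    Datatype inference is deliberately simplistic (INT64 for ints,
    BYTES for strings); extend it for floats and other types as needed.
    """
    inputs = []
    for name in rows[0]:
        column = [row[name] for row in rows]  # KeyError => feature missing in a row
        datatype = "BYTES" if isinstance(column[0], str) else "INT64"
        inputs.append({
            "name": name,
            "shape": [len(column)],  # every tensor gets the same batch length
            "datatype": datatype,
            "data": column,
        })
    return {"inputs": inputs}

rows = [
    {"PassengerId": 1, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 2, "Pclass": 1, "Sex": "female"},
]
print(json.dumps(rows_to_v2_inputs(rows), indent=2))
```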
```bash
SERVICE_HOSTNAME=$(kubectl get inferenceservice autogluon-titanic -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d @./autogluon-input-v2.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/autogluon-titanic/infer
```
Expected response:
```json
{
  "model_name": "autogluon-titanic",
  "outputs": [
    { "name": "predictions", "datatype": "INT64", "shape": [2], "data": [1, 0] }
  ]
}
```
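Decoding such a v2 response into per-row predictions takes one dict comprehension; a sketch using the example response above:

```python
import json

# The v2 response body from the example above.
body = '''{
  "model_name": "autogluon-titanic",
  "outputs": [
    { "name": "predictions", "datatype": "INT64", "shape": [2], "data": [1, 0] }
  ]
}'''

response = json.loads(body)
# Map each output tensor name to its data list.
outputs = {t["name"]: t["data"] for t in response["outputs"]}
print(outputs["predictions"])  # [1, 0]
```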
## Prediction Probabilities

The AutoGluon runtime can return class probabilities instead of class labels: set `PREDICT_PROBA=true` in the environment of the runtime container. When probability output is enabled, the response schema differs from the class-prediction output shown above.
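The probability schema is not shown here. If, purely for illustration, probabilities came back as a single FP64 tensor of shape `[batch, n_classes]` with row-major flattened data (an assumption, not documented behavior), they could be split back into per-row distributions like this:

```python
def unflatten_probabilities(tensor):
    """Split a row-major flattened [batch, n_classes] probability tensor
    into one probability list per input row."""
    batch, n_classes = tensor["shape"]
    data = tensor["data"]
    return [data[i * n_classes:(i + 1) * n_classes] for i in range(batch)]

# Hypothetical probability output for a 2-row, 2-class request.
proba = {"name": "predictions", "datatype": "FP64",
         "shape": [2, 2], "data": [0.62, 0.38, 0.11, 0.89]}
print(unflatten_probabilities(proba))  # [[0.62, 0.38], [0.11, 0.89]]
```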
## Troubleshooting
- Ensure `storageUri` points to a directory created by `TabularPredictor.save(path)`.
- For v2 requests, verify each feature is provided as a separate tensor with matching batch length.
- If no runtime is selected automatically, set `runtime: kserve-autogluonserver` explicitly.