AI Endpoints - Batch API


New

AI Endpoints - Batch API

The most cost-efficient choice for large-scale or non-urgent workloads.

Run your requests asynchronously and optimize your budget for background processing or scheduled tasks, while ensuring sovereign data processing on our European infrastructure.

  • Beta started !
  • Free of charge during the Beta phase !
  • Duration : 2 months
  • Quick start
AI endpoints new batch 2

Why AI Endpoints Bach API ?

While AI Endpoints Base API is for interaction, Batch API aims for long-running jobs (minutes to hours), where cost and throughput matter more than latency.

  • Cost efficiency, run jobs for 50% of price (at General Availability)
  • Fixed SLO: All jobs are queued to be completed within a 24 hours target.
  • Resource Efficiency: We bridge the gap between volatile demands and fixed compute capacity.
  • OpenAI Ready: Seamless integration into existing workflows with full API compatibility.

Use cases for AI Endpoints Batch mode

Batch API is optimized for asynchronous Workloads: The perfect engine for tasks where volume beats speed

 

netapp_efs

Mass document processing (classification, extraction, OCR post-processing, tagging)

Scale: Turn mountains of PDFs and scans into structured data.

Features: Automated classification, entity extraction, and OCR post-processing.

Value: Eliminates manual data entry for back-office operations

 

low cost icon OVHcloud

Large-Scale Embeddings

Scale: Convert entire knowledge bases into vectors for RAG or search.

Features: High-throughput indexing for vector databases.

Value: Cost-efficient foundation for semantic search engines.

 

netapp_nas-ha

Dataset Preparation

Scale: Clean and augment massive datasets for machine learning.

Features: Text preprocessing, PII anonymization, and synthetic data generation.

Value: High-quality training data without the premium real-time price tag.

 

How to use Batch mode ?

A batch job is processed in four stages:


1.Prepare a JSONL file where each line describes a single request (model, endpoint, body).
2.Upload this file to AI Endpoints through the Files API (`/v1/files`) with `purpose="batch"`
3.Create a batch (`/v1/batches`) referencing the uploaded file and the target endpoint.
4.Poll the batch status until it is `completed`, then download the output file (and the error file, if any).

 

User Guide Documentation

 

Requirements

AI Endpoints Base and Batch API specifications

 

BATCH

(Beta)

BASE

(Generally available)

Models availableMVP : LLMs & EmbeddingsAll catalog
Uses casesOptimized for large-volume or delayed workloads. Ideal for all your requests that can waitIdeal for interactive workloads with standard throughtput, where users experience usual interactions
Prompts per API Call50k rows max. / File size limited to 200MB1 (max 2 Mo) / 1 - 25 for Embeddings
Response TimeSLO target = 24hStandard response time experience
Rate Limitn.a400 Requests per Minute per Project (higher for embeddings)
Pricing model (per token or per image or per second)Cost-optimized, with batch requests billed at a significant discounted price compared to the Base API rateWith standard price, no commitment required, enjoy a pay-as-you-go consumption mode
API TypeAsynchronousSynchronous
SLANo SLA99.8% uptime 
Pricing50% off vs Basehttps://www.ovhcloud.com/en/public-cloud/prices/#17128
AvailabilityWW (out of US)WW (out of US)
OpenAI compatibilityYESYES (except. image generation models)
custom background image

Feedbacks are welcome

 

Batch mode is still in development. Functionalities are still in progress and supposed to improve. 

Thanks to give us feedbacks through Feature Requests on GitHub.

 

 

FAQ

What is the Batch API ?

The Batch API allows you to batch many tasks or big datasets into one request, run them asynchronously in the background, and fetch all results together. It reduces round trips, eliminates overhead, and is ideal for large processing jobs. For all the requests that can wait, this allows us to deliver discounted prices per token.

How is Batch Mode different from AI Endpoints base (real-time) inference ?

Standard inference returns a response immediately (synchronous). Batch Mode is asynchronous: you submit a job, it is processed in the background, and you poll for completion. The trade-off is latency for cost — Batch Mode is significantly cheaper, has no rate limits per request, and is designed for workloads where a sub-second response time is not required.

Is the API compatible with OpenAI's Batch API

Yes. The Batch Mode API is designed to be compatible with the OpenAI Batch API interface. You can use the official OpenAI Python or TypeScript SDK by pointing base_url to our endpoint and replacing your API key. The completion_window parameter accepts "24h".

What will be the price of AI Endpoint Batch mode ?

AI Endpoints Batch mode is a asynchronous offering allowing customers to benefit from 50% target price of AI Endpoint services.

How long is my request duration?

Batch API requests are processed asynchronously. While the processing time can vary based on the current system load and the size of your payload, our target Service Level Objective (SLO) is 24 hours.

How do I check the status of a running job ?

You can track the status of your batch jobs using either of the following methods:

  • Via the API / Technical Documentation: If you prefer to check the status programmatically, you can follow the step-by-step instructions and find technical details on checking job status in our dedicated  Ai Endpoints Batch mode User Guide

  • Via the Dedicated UI: You can monitor the real-time status and evolution of your jobs directly from our interactive dashboard here: [coming soon].

What happens if a job is not completed within 24 hours ?

If a batch cannot be completed within the 24-hour completion window, it transitions to the expired status. This is a terminal state — the job stops processing and will not be retried automatically. You will need to submit a new batch to reprocess your requests.

For more information about the batch lifecycle, please refer to our documentation

Am I billed for requests that fail or time out ?

No. You are only billed for requests that are successfully processed. Requests that fail due to an internal error, or that are not reached before the 24-hour window closes, are not charged.

  • Alpha
  • Beta
  • General Availability