Call the Hugging Face Inference API.

The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:

- Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
- Image Generation: Easily create customized images, including LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with SOTA embeddings.
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
```yaml
type: "io.kestra.plugin.huggingface.Inference"
```

Use inference for text classification.

```yaml
id: huggingface_inference_text
namespace: company.team

tasks:
- id: huggingface_inference
  type: io.kestra.plugin.huggingface.Inference
  model: cardiffnlp/twitter-roberta-base-sentiment-latest
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "I want a refund"
```

Use inference for image classification.

```yaml
id: huggingface_inference
namespace: company.team

tasks:
- id: huggingface_inference_image
  type: io.kestra.plugin.huggingface.Inference
  model: google/vit-base-patch16-224
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "{{ read('my-base64-image.txt') }}"
  parameters:
    function_to_apply: sigmoid
    top_k: 3
  waitForModel: true
  useCache: false
```
Properties

API Key

Hugging Face API key (e.g., hf_********)

Inputs

The inputs required by the specific model.

Model

Model used for the Inference API (e.g., cardiffnlp/twitter-roberta-base-sentiment-latest, google/gemma-2-2b-it)

Endpoint

Default https://api-inference.huggingface.co/models

The Hugging Face Inference API endpoint.
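To call a different deployment, such as a dedicated Inference Endpoint, the default can be overridden. A minimal sketch, assuming the property is named `endpoint` and the deployment exposes a models-style base URL (the URL below is a placeholder):

```yaml
- id: huggingface_inference_custom_endpoint
  type: io.kestra.plugin.huggingface.Inference
  # Placeholder URL: replace with your own deployment's base URL.
  endpoint: https://my-dedicated-endpoint.example.com/models
  model: cardiffnlp/twitter-roberta-base-sentiment-latest
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "I want a refund"
```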

Options

The options used to customize the HTTP client (detailed below).

Parameters

A map of optional parameters, depending on the model.
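As an illustration, a text-generation model takes its generation settings through this map. A minimal sketch using standard Hugging Face text-generation parameters (max_new_tokens, temperature):

```yaml
- id: huggingface_generate
  type: io.kestra.plugin.huggingface.Inference
  model: google/gemma-2-2b-it
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "Summarize the benefits of workflow orchestration in one sentence."
  parameters:
    # Standard Hugging Face text-generation parameters.
    max_new_tokens: 100
    temperature: 0.7
```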

Use cache

Default true

There is a cache layer on the Inference API to speed up requests when the inputs are exactly the same. Deterministic models, such as classifiers and embedding models, can reuse those cached results as-is because repeated calls return the same output. For a nondeterministic model, however, you can disable the cache so that each call issues a genuinely new query.
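For example, when re-running a generative model where each execution should produce a fresh response:

```yaml
- id: huggingface_fresh_generation
  type: io.kestra.plugin.huggingface.Inference
  model: google/gemma-2-2b-it
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "Write a short motivational quote."
  # Bypass the Inference API cache so repeated runs return new generations.
  useCache: false
```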

Wait for model

Default false

When a model is warm, it is ready to be used and you will get a response relatively quickly. Some models, however, are cold and need to be loaded before they can be used; in that case the API returns a 503 error. Enable this property to wait for the model to load instead of failing, as the image classification example above does.
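To tolerate a cold start instead of failing on the 503:

```yaml
- id: huggingface_cold_start
  type: io.kestra.plugin.huggingface.Inference
  model: google/vit-base-patch16-224
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "{{ read('my-base64-image.txt') }}"
  # Block until the model has loaded rather than failing with a 503.
  waitForModel: true
```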

Outputs

Output

The output returned by the Hugging Face API.

HTTP client options

The following options can be set through the Options property.

Connect timeout

Format duration

The time allowed to establish a connection to the server before failing.

Connection pool idle timeout

Default PT0S
Format duration

The time an idle connection can remain in the client's connection pool before being closed.

Default charset

Default UTF-8

The default charset for the request.

Max content length

Default 10485760

The maximum content length of the response, in bytes.

Read idle timeout

Default PT5M
Format duration

The time allowed for a read connection to remain idle before closing it.

Read timeout

Default PT10S
Format duration

The maximum time allowed for reading data from the server before failing.
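A minimal sketch of tuning the client through the Options map, assuming the keys mirror camel-cased versions of the option names above (an assumption, not confirmed by this page):

```yaml
- id: huggingface_inference_tuned
  type: io.kestra.plugin.huggingface.Inference
  model: cardiffnlp/twitter-roberta-base-sentiment-latest
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "I want a refund"
  options:
    # Assumed key names; values are ISO-8601 durations like the defaults above.
    connectTimeout: PT10S
    readTimeout: PT30S
```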