Inference
Call the Hugging Face Inference API.
The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
- Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
- Image Generation: Easily create customized images, including with LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with state-of-the-art embeddings.
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
type: "io.kestra.plugin.huggingface.Inference"
Examples
Use inference for text classification
id: huggingface_inference_text
namespace: company.team

tasks:
  - id: huggingface_inference
    type: io.kestra.plugin.huggingface.Inference
    model: cardiffnlp/twitter-roberta-base-sentiment-latest
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "I want a refund"
Use inference for image classification
id: huggingface_inference
namespace: company.team

tasks:
  - id: huggingface_inference_image
    type: io.kestra.plugin.huggingface.Inference
    model: google/vit-base-patch16-224
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "{{ read('my-base64-image.txt') }}"
    parameters:
      function_to_apply: sigmoid
      top_k: 3
    waitForModel: true
    useCache: false
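Use inference for text generation
A minimal sketch of a text-generation call: the model is taken from the model property example below, while the prompt and the max_new_tokens parameter (a standard text-generation option; supported parameters vary by model) are illustrative.
id: huggingface_inference_generation
namespace: company.team

tasks:
  - id: huggingface_inference
    type: io.kestra.plugin.huggingface.Inference
    model: google/gemma-2-2b-it
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    # illustrative prompt and parameters
    inputs: "Summarize the benefits of workflow orchestration in one sentence."
    parameters:
      max_new_tokens: 100
    waitForModel: true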
Properties
apiKey (string, required)
API Key
Hugging Face API key (e.g. hf_********)
inputs (string, required)
Inputs
Inputs required by the specific model
model (string, required)
Model
Model used for the Inference API (e.g. cardiffnlp/twitter-roberta-base-sentiment-latest, google/gemma-2-2b-it)
endpoint (string)
Default: https://api-inference.huggingface.co/models
API endpoint
The base URL of the Hugging Face Inference API; defaults to https://api-inference.huggingface.co/models.
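For instance, if you run your own deployment such as a dedicated Inference Endpoint, you can point the task at it. The URL below is hypothetical, and this sketch assumes the task appends the model name to the base URL, as it does with the default endpoint:
- id: huggingface_inference
  type: io.kestra.plugin.huggingface.Inference
  # hypothetical base URL for a self-hosted or dedicated deployment
  endpoint: https://my-company-endpoint.endpoints.huggingface.cloud/models
  model: cardiffnlp/twitter-roberta-base-sentiment-latest
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "I want a refund"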
options (AbstractHttpTask-RequestOptions)
Options
Options to customize the HTTP client
parameters (object)
Parameters
Map of optional parameters, depending on the model
useCache (boolean or string)
Default: true
Use cache
The Inference API has a cache layer that speeds up requests when the inputs are identical. Deterministic models, such as classifiers and embedding models, always produce the same result for the same input, so cached results can be reused as is. For a nondeterministic model, disable the cache to force a genuinely new query on every call.
waitForModel (boolean or string)
Default: false
Wait for model
When a model is warm, it is ready to use and responds quickly. Some models, however, are cold and must be loaded before they can be used; in that case the API returns a 503 error. Set this property to true to wait for the model to load instead of failing.
Outputs
output (object)
Output returned by the Hugging Face API
Definitions
java.nio.charset.Charset
io.kestra.plugin.huggingface.AbstractHttpTask-RequestOptions
connectTimeout (string, duration)
The time allowed to establish a connection to the server before failing.
connectionPoolIdleTimeout (string, duration)
Default: PT0S
The time an idle connection can remain in the client's connection pool before being closed.
defaultCharset (Charset or string)
Default: UTF-8
The default charset for the request.
maxContentLength (integer or string)
Default: 10485760
The maximum content length of the response.
readIdleTimeout (string, duration)
Default: PT5M
The time allowed for a read connection to remain idle before closing it.
readTimeout (string, duration)
Default: PT10S
The maximum time allowed for reading data from the server before failing.
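As an illustration, to tighten HTTP timeouts for a slow model, the task's options block might look like the sketch below; the duration and length values are illustrative, not recommendations:
- id: huggingface_inference
  type: io.kestra.plugin.huggingface.Inference
  model: cardiffnlp/twitter-roberta-base-sentiment-latest
  apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
  inputs: "I want a refund"
  options:
    # illustrative ISO-8601 durations, matching the format of the defaults above
    connectTimeout: PT30S
    readTimeout: PT1M
    # illustrative response size cap in bytes (50 MiB)
    maxContentLength: 52428800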