---

Title: Quickstart - GroqDocs

URL Source: https://console.groq.com/docs/quickstart

Markdown Content:
Quickstart
==========

Get up and running with the Groq API in a few minutes.

[Create an API Key](https://console.groq.com/docs/quickstart#create-an-api-key)
-------------------------------------------------------------------------------

Please visit [here](https://console.groq.com/keys) to create an API Key.

[Set up your API Key (recommended)](https://console.groq.com/docs/quickstart#set-up-your-api-key-recommended)
-------------------------------------------------------------------------------------------------------------

Configure your API key as an environment variable. This approach streamlines your API usage by eliminating the need to include your API key in each request. Moreover, it enhances security by minimizing the risk of inadvertently including your API key in your codebase.
### [In your terminal of choice:](https://console.groq.com/docs/quickstart#in-your-terminal-of-choice)

shell

```
export GROQ_API_KEY=<your-api-key-here>
```

[Requesting your first chat completion](https://console.groq.com/docs/quickstart#requesting-your-first-chat-completion)
-----------------------------------------------------------------------------------------------------------------------

### [Install the Groq Python library:](https://console.groq.com/docs/quickstart#install-the-groq-python-library)

shell

```
pip install groq
```

### [Performing a Chat Completion:](https://console.groq.com/docs/quickstart#performing-a-chat-completion)

Python

```
import os

from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.3-70b-versatile",
)

print(chat_completion.choices[0].message.content)
```

[Using third-party libraries and SDKs](https://console.groq.com/docs/quickstart#using-thirdparty-libraries-and-sdks)
--------------------------------------------------------------------------------------------------------------------

### [Using AI SDK:](https://console.groq.com/docs/quickstart#using-ai-sdk)

[AI SDK](https://ai-sdk.dev/) is a JavaScript-based open-source library that simplifies building large language model (LLM) applications. Documentation for how to use Groq on the AI SDK [can be found here](https://console.groq.com/docs/ai-sdk/).

First, install the `ai` package and the Groq provider `@ai-sdk/groq`:

shell

```
pnpm add ai @ai-sdk/groq
```

Then, you can use the Groq provider to generate text. By default, the provider will look for `GROQ_API_KEY` as the API key.

JavaScript

```
import { groq } from '@ai-sdk/groq';
import { generateText } from 'ai';

const { text } = await generateText({
  model: groq('llama-3.3-70b-versatile'),
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
```

Now that you have successfully received a chat completion, you can try out the other endpoints in the API.

### [Next Steps](https://console.groq.com/docs/quickstart#next-steps)

*   Check out the [Playground](https://console.groq.com/playground) to try out the Groq API in your browser
*   Join our GroqCloud developer community on [Discord](https://discord.gg/groq)
*   Add a how-to on your project to the [Groq API Cookbook](https://github.com/groq/groq-api-cookbook)

---

Title: OpenAI Compatibility - GroqDocs

URL Source: https://console.groq.com/docs/openai

Markdown Content:

We designed Groq API to be mostly compatible with OpenAI's client libraries, making it easy to configure your existing applications to run on Groq and try our inference speed. We also have our own [Groq Python and Groq TypeScript libraries](https://console.groq.com/docs/libraries) that we encourage you to use.
[Configuring OpenAI to Use Groq API](https://console.groq.com/docs/openai#configuring-openai-to-use-groq-api)
-------------------------------------------------------------------------------------------------------------

To start using Groq with OpenAI's client libraries, pass your Groq API key to the `api_key` parameter and change the `base_url` to `https://api.groq.com/openai/v1`:

python

```
import os
import openai

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ.get("GROQ_API_KEY")
)
```

You can find your API key [here](https://console.groq.com/keys).

[Currently Unsupported OpenAI Features](https://console.groq.com/docs/openai#currently-unsupported-openai-features)
-------------------------------------------------------------------------------------------------------------------

Note that although Groq API is mostly OpenAI compatible, there are a few features we don't support just yet:

### [Text Completions](https://console.groq.com/docs/openai#text-completions)

The following fields are currently not supported and will result in a 400 error if they are supplied:

*   `logprobs`
*   `logit_bias`
*   `top_logprobs`
*   `messages[].name`
*   If `N` is supplied, it must be equal to 1.

### [Temperature](https://console.groq.com/docs/openai#temperature)

If you set a `temperature` value of 0, it will be converted to `1e-8`. If you run into any issues, please try setting the value to a float32 `> 0` and `<= 2`.

### [Audio Transcription and Translation](https://console.groq.com/docs/openai#audio-transcription-and-translation)

The following values are not supported:

*   `vtt`
*   `srt`

### [Feedback](https://console.groq.com/docs/openai#feedback)

If you'd like to see support for features such as the above on Groq API, please reach out to us and let us know by submitting a "Feature Request" via "Chat with us" in the menu after clicking your organization in the top right. We really value your feedback and would love to hear from you! 🤩

[Next Steps](https://console.groq.com/docs/openai#next-steps)
-------------------------------------------------------------

Migrate your prompts to open-source models using our [model migration guide](https://console.groq.com/docs/prompting/model-migration), or learn more about prompting in our [prompting guide](https://console.groq.com/docs/prompting).

---

Title: Supported Models - GroqDocs

URL Source: https://console.groq.com/docs/models

Markdown Content:

Explore all available models on GroqCloud.

[Featured Models](https://console.groq.com/docs/models#featured-models)
-----------------------------------------------------------------------

[Production Models](https://console.groq.com/docs/models#production-models)
---------------------------------------------------------------------------

**Note:** Production models are intended for use in your production environments. They meet or exceed our high standards for speed, quality, and reliability. Read more [here](https://console.groq.com/docs/deprecations).
| MODEL ID | DEVELOPER | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE | DETAILS |
| --- | --- | --- | --- | --- | --- |
| gemma2-9b-it | Google | 8,192 | 8,192 | - | [Details](https://console.groq.com/docs/model/gemma2-9b-it) |
| llama-3.1-8b-instant | Meta | 131,072 | 131,072 | - | [Details](https://console.groq.com/docs/model/llama-3.1-8b-instant) |
| llama-3.3-70b-versatile | Meta | 131,072 | 32,768 | - | [Details](https://console.groq.com/docs/model/llama-3.3-70b-versatile) |
| meta-llama/llama-guard-4-12b | Meta | 131,072 | 1,024 | 20 MB | [Details](https://console.groq.com/docs/model/meta-llama/llama-guard-4-12b) |
| whisper-large-v3 | OpenAI | - | - | 100 MB | [Details](https://console.groq.com/docs/model/whisper-large-v3) |
| whisper-large-v3-turbo | OpenAI | - | - | 100 MB | [Details](https://console.groq.com/docs/model/whisper-large-v3-turbo) |

[Preview Models](https://console.groq.com/docs/models#preview-models)
---------------------------------------------------------------------

**Note:** Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice. Read more about deprecations [here](https://console.groq.com/docs/deprecations).

| MODEL ID | DEVELOPER | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE | DETAILS |
| --- | --- | --- | --- | --- | --- |
| deepseek-r1-distill-llama-70b | DeepSeek / Meta | 131,072 | 131,072 | - | [Details](https://console.groq.com/docs/model/deepseek-r1-distill-llama-70b) |
| meta-llama/llama-4-maverick-17b-128e-instruct | Meta | 131,072 | 8,192 | 20 MB | [Details](https://console.groq.com/docs/model/meta-llama/llama-4-maverick-17b-128e-instruct) |
| meta-llama/llama-4-scout-17b-16e-instruct | Meta | 131,072 | 8,192 | 20 MB | [Details](https://console.groq.com/docs/model/meta-llama/llama-4-scout-17b-16e-instruct) |
| meta-llama/llama-prompt-guard-2-22m | Meta | 512 | 512 | - | [Details](https://console.groq.com/docs/model/meta-llama/llama-prompt-guard-2-22m) |
| meta-llama/llama-prompt-guard-2-86m | Meta | 512 | 512 | - | [Details](https://console.groq.com/docs/model/meta-llama/llama-prompt-guard-2-86m) |
| moonshotai/kimi-k2-instruct | Moonshot AI | 131,072 | 16,384 | - | [Details](https://console.groq.com/docs/model/moonshotai/kimi-k2-instruct) |
| playai-tts | PlayAI | 8,192 | 8,192 | - | [Details](https://console.groq.com/docs/model/playai-tts) |
| playai-tts-arabic | PlayAI | 8,192 | 8,192 | - | [Details](https://console.groq.com/docs/model/playai-tts-arabic) |
| qwen/qwen3-32b | Alibaba Cloud | 131,072 | 40,960 | - | [Details](https://console.groq.com/docs/model/qwen/qwen3-32b) |

[Preview Systems](https://console.groq.com/docs/models#preview-systems)
-----------------------------------------------------------------------

Systems are a collection of models and tools that work together to answer a user query.

**Note:** Preview systems are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice. Read more about deprecations [here](https://console.groq.com/docs/deprecations).
| MODEL ID | DEVELOPER | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE | DETAILS |
| --- | --- | --- | --- | --- | --- |
| compound-beta | Groq | 131,072 | 8,192 | - | [Details](https://console.groq.com/docs/agentic-tooling/compound-beta) |
| compound-beta-mini | Groq | 131,072 | 8,192 | - | [Details](https://console.groq.com/docs/agentic-tooling/compound-beta-mini) |

[Learn More About Agentic Tooling: Discover how to build powerful applications with real-time web search and code execution](https://console.groq.com/docs/agentic-tooling)

Deprecated models are models that are no longer supported or will no longer be supported in the future. See our deprecation guidelines and deprecated models [here](https://console.groq.com/docs/deprecations).

Hosted models are directly accessible through the GroqCloud Models API endpoint using the model IDs mentioned above. You can use the `https://api.groq.com/openai/v1/models` endpoint to return a JSON list of all active models:

Python

```
import requests
import os

api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/models"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.get(url, headers=headers)

print(response.json())
```

---

Title: Rate Limits - GroqDocs

URL Source: https://console.groq.com/docs/rate-limits

Markdown Content:

Rate limits act as control measures to regulate how frequently users and applications can access our API within specified timeframes. These limits help ensure service stability, fair access, and protection against misuse so that we can serve reliable and fast inference for all.

[Understanding Rate Limits](https://console.groq.com/docs/rate-limits#understanding-rate-limits)
------------------------------------------------------------------------------------------------

Rate limits are measured in:

*   **RPM:** Requests per minute
*   **RPD:** Requests per day
*   **TPM:** Tokens per minute
*   **TPD:** Tokens per day
*   **ASH:** Audio seconds per hour
*   **ASD:** Audio seconds per day

Rate limits apply at the organization level, not individual users. You can hit any limit type depending on which threshold you reach first.

**Example:** Let's say your RPM = 50 and your TPM = 200K. If you were to send 50 requests with only 100 tokens within a minute, you would reach your limit even though you did not send 200K tokens within those 50 requests.

[Rate Limits](https://console.groq.com/docs/rate-limits#rate-limits)
--------------------------------------------------------------------

The following is a high-level summary and there may be exceptions to these limits. You can view the current, exact rate limits for your organization on the [limits page](https://console.groq.com/settings/limits) in your account settings.
| MODEL ID | RPM | RPD | TPM | TPD | ASH | ASD |
| --- | --- | --- | --- | --- | --- | --- |
| allam-2-7b | 30 | 7000 | 6000 | 500000 | - | - |
| compound-beta | 15 | 200 | 70000 | - | - | - |
| compound-beta-mini | 15 | 200 | 70000 | - | - | - |
| deepseek-r1-distill-llama-70b | 30 | 1000 | 6000 | 100000 | - | - |
| distil-whisper-large-v3-en | 20 | 2000 | - | - | 7200 | 28800 |
| gemma2-9b-it | 30 | 14400 | 15000 | 500000 | - | - |
| llama-3.1-8b-instant | 30 | 14400 | 6000 | 500000 | - | - |
| llama-3.3-70b-versatile | 30 | 1000 | 12000 | 100000 | - | - |
| llama3-70b-8192 | 30 | 14400 | 6000 | 500000 | - | - |
| llama3-8b-8192 | 30 | 14400 | 6000 | 500000 | - | - |
| meta-llama/llama-4-maverick-17b-128e-instruct | 30 | 1000 | 6000 | 500000 | - | - |
| meta-llama/llama-4-scout-17b-16e-instruct | 30 | 1000 | 30000 | 500000 | - | - |
| meta-llama/llama-guard-4-12b | 30 | 14400 | 15000 | 500000 | - | - |
| meta-llama/llama-prompt-guard-2-22m | 30 | 14400 | 15000 | 500000 | - | - |
| meta-llama/llama-prompt-guard-2-86m | 30 | 14400 | 15000 | 500000 | - | - |
| moonshotai/kimi-k2-instruct | 60 | 1000 | 10000 | 300000 | - | - |
| playai-tts | 10 | 100 | 1200 | 3600 | - | - |
| playai-tts-arabic | 10 | 100 | 1200 | 3600 | - | - |
| qwen/qwen3-32b | 60 | 1000 | 6000 | 500000 | - | - |
| whisper-large-v3 | 20 | 2000 | - | - | 7200 | 28800 |
| whisper-large-v3-turbo | 20 | 2000 | - | - | 7200 | 28800 |

In addition to viewing your limits on your account's [limits](https://console.groq.com/settings/limits) page, you can also view rate limit information such as remaining requests and tokens in HTTP response headers. The following headers are set (values are illustrative):

| Header | Value | Notes |
| --- | --- | --- |
| retry-after | 2 | In seconds |
| x-ratelimit-limit-requests | 14400 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-limit-tokens | 18000 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-remaining-requests | 14370 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-remaining-tokens | 17997 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-reset-requests | 2m59.56s | Always refers to Requests Per Day (RPD) |
| x-ratelimit-reset-tokens | 7.66s | Always refers to Tokens Per Minute (TPM) |

[Handling Rate Limits](https://console.groq.com/docs/rate-limits#handling-rate-limits)
--------------------------------------------------------------------------------------

When you exceed rate limits, our API returns a `429 Too Many Requests` HTTP status code.

**Note**: `retry-after` is only set if you hit the rate limit and status code 429 is returned. The other headers are always included.
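When a `429` is returned, the simplest client-side strategy is to wait for the number of seconds given in the `retry-after` header before retrying. The following sketch illustrates that pattern with the `requests` library against the chat completions endpoint; the fallback backoff values and the example payload are illustrative assumptions rather than part of the documented API.

Python

```
import os
import time

import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

def chat_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, honoring retry-after whenever a 429 is returned."""
    for attempt in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # retry-after is only present on 429 responses; fall back to exponential backoff.
        wait_seconds = float(response.headers.get("retry-after", 2 ** attempt))
        remaining = response.headers.get("x-ratelimit-remaining-tokens")
        print(f"Rate limited; retrying in {wait_seconds:.1f}s (remaining TPM budget: {remaining})")
        time.sleep(wait_seconds)
    raise RuntimeError("Still rate limited after retries")

# Example usage:
result = chat_with_retry({
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Explain the importance of fast language models"}],
})
print(result["choices"][0]["message"]["content"])
```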
---

Title: Text Generation - GroqDocs

URL Source: https://console.groq.com/docs/text-chat

Markdown Content:

shell

```
pip install groq
```

[Performing a Basic Chat Completion](https://console.groq.com/docs/text-chat#performing-a-basic-chat-completion)
----------------------------------------------------------------------------------------------------------------

The simplest way to use the Chat Completions API is to send a list of messages and receive a single response. Messages are provided in chronological order, with each message containing a role ("system", "user", or "assistant") and content.

Python

```
from groq import Groq

client = Groq()

chat_completion = client.chat.completions.create(
    messages=[
        # Set an optional system message. This sets the behavior of the
        # assistant and can be used to provide specific instructions for
        # how it should behave throughout the conversation.
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        # Set a user message for the assistant to respond to.
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],

    # The language model which will generate the completion.
    model="llama-3.3-70b-versatile"
)

# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
```

[Streaming a Chat Completion](https://console.groq.com/docs/text-chat#streaming-a-chat-completion)
--------------------------------------------------------------------------------------------------

For a more responsive user experience, you can stream the model's response in real-time. This allows your application to display the response as it's being generated, rather than waiting for the complete response.

To enable streaming, set the parameter `stream=True`. The completion function will then return an iterator of completion deltas rather than a single, full completion.

Python

```
from groq import Groq

client = Groq()

stream = client.chat.completions.create(
    #
    # Required parameters
    #
    messages=[
        # Set an optional system message. This sets the behavior of the
        # assistant and can be used to provide specific instructions for
        # how it should behave throughout the conversation.
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        # Set a user message for the assistant to respond to.
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],

    # The language model which will generate the completion.
    model="llama-3.3-70b-versatile",

    #
    # Optional parameters
    #

    # Controls randomness: lowering results in less random completions.
    # As the temperature approaches zero, the model will become deterministic
    # and repetitive.
    temperature=0.5,

    # The maximum number of tokens to generate. Requests can use up to
    # 2048 tokens shared between prompt and completion.
    max_completion_tokens=1024,

    # Controls diversity via nucleus sampling: 0.5 means half of all
    # likelihood-weighted options are considered.
    top_p=1,

    # A stop sequence is a predefined or user-specified text string that
    # signals an AI to stop generating content, ensuring its responses
    # remain focused and concise. Examples include punctuation marks and
    # markers like "[end]".
    stop=None,

    # If set, partial message deltas will be sent.
    stream=True,
)

# Print the incremental deltas returned by the LLM.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

[Performing a Chat Completion with a Stop Sequence](https://console.groq.com/docs/text-chat#performing-a-chat-completion-with-a-stop-sequence)
----------------------------------------------------------------------------------------------------------------------------------------------

Stop sequences allow you to control where the model should stop generating. When the model encounters any of the specified stop sequences, it will halt generation at that point. This is useful when you need responses to end at specific points.

Python

```
from groq import Groq

client = Groq()

chat_completion = client.chat.completions.create(
    #
    # Required parameters
    #
    messages=[
        # Set an optional system message. This sets the behavior of the
        # assistant and can be used to provide specific instructions for
        # how it should behave throughout the conversation.
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        # Set a user message for the assistant to respond to.
        {
            "role": "user",
            "content": "Count to 10. Your response must begin with \"1, \". example: 1, 2, 3, ...",
        }
    ],

    # The language model which will generate the completion.
    model="llama-3.3-70b-versatile",

    #
    # Optional parameters
    #

    # Controls randomness: lowering results in less random completions.
    # As the temperature approaches zero, the model will become deterministic
    # and repetitive.
    temperature=0.5,

    # The maximum number of tokens to generate. Requests can use up to
    # 2048 tokens shared between prompt and completion.
    max_completion_tokens=1024,

    # Controls diversity via nucleus sampling: 0.5 means half of all
    # likelihood-weighted options are considered.
    top_p=1,

    # A stop sequence is a predefined or user-specified text string that
    # signals an AI to stop generating content, ensuring its responses
    # remain focused and concise. Examples include punctuation marks and
    # markers like "[end]".
    # For this example, we will use ", 6" so that the LLM stops counting at 5.
    # If multiple stop values are needed, an array of strings may be passed,
    # stop=[", 6", ", six", ", Six"]
    stop=", 6",

    # If set, partial message deltas will be sent.
    stream=False,
)

# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
```

[Performing an Async Chat Completion](https://console.groq.com/docs/text-chat#performing-an-async-chat-completion)
------------------------------------------------------------------------------------------------------------------

For applications that need to maintain responsiveness while waiting for completions, you can use the asynchronous client. This lets you make non-blocking API calls using Python's asyncio framework.

Python

```
import asyncio

from groq import AsyncGroq


async def main():
    client = AsyncGroq()

    chat_completion = await client.chat.completions.create(
        #
        # Required parameters
        #
        messages=[
            # Set an optional system message. This sets the behavior of the
            # assistant and can be used to provide specific instructions for
            # how it should behave throughout the conversation.
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            # Set a user message for the assistant to respond to.
            {
                "role": "user",
                "content": "Explain the importance of fast language models",
            }
        ],

        # The language model which will generate the completion.
        model="llama-3.3-70b-versatile",

        #
        # Optional parameters
        #

        # Controls randomness: lowering results in less random completions.
        # As the temperature approaches zero, the model will become
        # deterministic and repetitive.
        temperature=0.5,

        # The maximum number of tokens to generate. Requests can use up to
        # 2048 tokens shared between prompt and completion.
        max_completion_tokens=1024,

        # Controls diversity via nucleus sampling: 0.5 means half of all
        # likelihood-weighted options are considered.
        top_p=1,

        # A stop sequence is a predefined or user-specified text string that
        # signals an AI to stop generating content, ensuring its responses
        # remain focused and concise. Examples include punctuation marks and
        # markers like "[end]".
        stop=None,

        # If set, partial message deltas will be sent.
        stream=False,
    )

    # Print the completion returned by the LLM.
    print(chat_completion.choices[0].message.content)

asyncio.run(main())
```

### [Streaming an Async Chat Completion](https://console.groq.com/docs/text-chat#streaming-an-async-chat-completion)

You can combine the benefits of streaming and asynchronous processing by streaming completions asynchronously. This is particularly useful for applications that need to handle multiple concurrent conversations.

Python

```
import asyncio

from groq import AsyncGroq


async def main():
    client = AsyncGroq()

    stream = await client.chat.completions.create(
        #
        # Required parameters
        #
        messages=[
            # Set an optional system message. This sets the behavior of the
            # assistant and can be used to provide specific instructions for
            # how it should behave throughout the conversation.
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            # Set a user message for the assistant to respond to.
            {
                "role": "user",
                "content": "Explain the importance of fast language models",
            }
        ],

        # The language model which will generate the completion.
        model="llama-3.3-70b-versatile",

        #
        # Optional parameters
        #

        # Controls randomness: lowering results in less random completions.
        # As the temperature approaches zero, the model will become
        # deterministic and repetitive.
        temperature=0.5,

        # The maximum number of tokens to generate. Requests can use up to
        # 2048 tokens shared between prompt and completion.
        max_completion_tokens=1024,

        # Controls diversity via nucleus sampling: 0.5 means half of all
        # likelihood-weighted options are considered.
        top_p=1,

        # A stop sequence is a predefined or user-specified text string that
        # signals an AI to stop generating content, ensuring its responses
        # remain focused and concise. Examples include punctuation marks and
        # markers like "[end]".
        stop=None,

        # If set, partial message deltas will be sent.
        stream=True,
    )

    # Print the incremental deltas returned by the LLM.
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(main())
```

---

Title: Speech to Text - GroqDocs

URL Source: https://console.groq.com/docs/speech-to-text

Markdown Content:

Groq API is the fastest speech-to-text solution available, offering OpenAI-compatible endpoints that enable near-instant transcriptions and translations. With Groq API, you can integrate high-quality audio processing into your applications at speeds that rival human interaction.
[API Endpoints](https://console.groq.com/docs/speech-to-text#api-endpoints)
---------------------------------------------------------------------------

We support two endpoints:

| Endpoint | Usage | API Endpoint |
| --- | --- | --- |
| Transcriptions | Convert audio to text | `https://api.groq.com/openai/v1/audio/transcriptions` |
| Translations | Translate audio to English text | `https://api.groq.com/openai/v1/audio/translations` |

[Supported Models](https://console.groq.com/docs/speech-to-text#supported-models)
---------------------------------------------------------------------------------

| Model ID | Model | Supported Language(s) | Description |
| --- | --- | --- | --- |
| `whisper-large-v3-turbo` | [Whisper Large V3 Turbo](https://console.groq.com/docs/model/whisper-large-v3-turbo) | Multilingual | A fine-tuned version of a pruned Whisper Large V3 designed for fast, multilingual transcription tasks. |
| `whisper-large-v3` | [Whisper Large V3](https://console.groq.com/docs/model/whisper-large-v3) | Multilingual | Provides state-of-the-art performance with high accuracy for multilingual transcription and translation tasks. |

[Which Whisper Model Should You Use?](https://console.groq.com/docs/speech-to-text#which-whisper-model-should-you-use)
----------------------------------------------------------------------------------------------------------------------

Having more choices is great, but let's try to avoid decision paralysis by breaking down the tradeoffs between models to find the one most suitable for your applications:

*   If your application is error-sensitive and requires multilingual support, use `whisper-large-v3`.
*   If your application requires multilingual support and you need the best price for performance, use `whisper-large-v3-turbo`.

The following table breaks down the metrics for each model.

| Model | Cost Per Hour | Language Support | Transcription Support | Translation Support | Real-time Speed Factor | Word Error Rate |
| --- | --- | --- | --- | --- | --- | --- |
| `whisper-large-v3` | $0.111 | Multilingual | Yes | Yes | 189 | 10.3% |
| `whisper-large-v3-turbo` | $0.04 | Multilingual | Yes | No | 216 | 12% |

[Working with Audio Files](https://console.groq.com/docs/speech-to-text#working-with-audio-files)
-------------------------------------------------------------------------------------------------

### Audio File Limitations

*   **Max File Size:** 25 MB (free tier), 100 MB (dev tier)
*   **Max Attachment File Size:** 25 MB. If you need to process larger files, use the `url` parameter to specify a URL to the file instead.
*   **Minimum File Length:** 0.01 seconds
*   **Minimum Billed Length:** 10 seconds. If you submit a request less than this, you will still be billed for 10 seconds.
*   **Supported File Types:** Either a URL or a direct file upload for `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, `webm`
*   **Single Audio Track:** Only the first track will be transcribed for files with multiple audio tracks (e.g. dubbed video).
*   **Supported Response Formats:** `json`, `verbose_json`, `text`
*   **Supported Timestamp Granularities:** `segment`, `word`

### [Audio Preprocessing](https://console.groq.com/docs/speech-to-text#audio-preprocessing)

Our speech-to-text models will downsample audio to 16KHz mono before transcribing, which is optimal for speech recognition.
This preprocessing can be performed client-side if your original file is extremely large and you want to make it smaller without a loss in quality (without chunking, Groq API speech-to-text endpoints accept up to 25 MB for free tier and 100 MB for [dev tier](https://console.groq.com/settings/billing)). For lower latency, convert your files to `wav` format. When reducing file size, we recommend FLAC for lossless compression.

The following `ffmpeg` command can be used to reduce file size:

shell

```
ffmpeg \
  -i <your_file> \
  -ar 16000 \
  -ac 1 \
  -map 0:a \
  -c:a flac \
  <output_file_name>.flac
```

### [Working with Larger Audio Files](https://console.groq.com/docs/speech-to-text#working-with-larger-audio-files)

For audio files that exceed our size limits or require more precise control over transcription, we recommend implementing audio chunking. This process involves:

1.  Breaking the audio into smaller, overlapping segments
2.  Processing each segment independently
3.  Combining the results while handling overlaps

[To learn more about this process and get code for your own implementation, see the complete audio chunking tutorial in our Groq API Cookbook.](https://github.com/groq/groq-api-cookbook/tree/main/tutorials/audio-chunking)

[Using the API](https://console.groq.com/docs/speech-to-text#using-the-api)
---------------------------------------------------------------------------

The following are request parameters you can use in your transcription and translation requests:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `file` | `string` | Required unless using `url` instead | The audio file object for direct upload to translate/transcribe. |
| `url` | `string` | Required unless using `file` instead | The audio URL to translate/transcribe (supports Base64URL). |
| `language` | `string` | Optional | The language of the input audio. Supplying the input language in ISO-639-1 format (i.e. `en`, `tr`) will improve accuracy and latency. The translations endpoint only supports `en` as a parameter option. |
| `model` | `string` | Required | ID of the model to use. |
| `prompt` | `string` | Optional | Prompt to guide the model's style or specify how to spell unfamiliar words. (limited to 224 tokens) |
| `response_format` | `string` | json | Define the output response format. Set to `verbose_json` to receive timestamps for audio segments. Set to `text` to return a text response. |
| `temperature` | `float` | 0 | The temperature between 0 and 1. For translations and transcriptions, we recommend the default value of 0. |
| `timestamp_granularities[]` | `array` | segment | The timestamp granularities to populate for this transcription. `response_format` must be set to `verbose_json` to use timestamp granularities. Either or both of `word` and `segment` are supported. `segment` returns full metadata and `word` returns only word, start, and end timestamps. To get both word-level timestamps and full segment metadata, include both values in the array. |
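As the table notes, `url` can replace `file` when your audio is already hosted somewhere reachable. The snippet below is a hedged sketch of that usage: the audio URL is a placeholder, and it assumes your installed `groq` Python SDK version forwards the `url` field from the table on `audio.transcriptions.create`.

Python

```
import os

from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# Transcribe a remotely hosted file by passing `url` instead of uploading `file`.
# Replace the placeholder URL with a link to your own audio file.
transcription = client.audio.transcriptions.create(
    url="https://example.com/audio/meeting-recording.wav",
    model="whisper-large-v3-turbo",
    response_format="json",
    language="en",
)

print(transcription.text)
```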
### [Example Usage of Transcription Endpoint](https://console.groq.com/docs/speech-to-text#example-usage-of-transcription-endpoint)

The transcription endpoint allows you to transcribe spoken words in audio or video files.

The Groq SDK package can be installed using the following command:

shell

```
pip install groq
```

The following code snippet demonstrates how to use Groq API to transcribe an audio file in Python:

Python

```
import os
import json
from groq import Groq

# Initialize the Groq client
client = Groq()

# Specify the path to the audio file
filename = os.path.dirname(__file__) + "/YOUR_AUDIO.wav" # Replace with your audio file!

# Open the audio file
with open(filename, "rb") as file:
    # Create a transcription of the audio file
    transcription = client.audio.transcriptions.create(
        file=file, # Required audio file
        model="whisper-large-v3-turbo", # Required model to use for transcription
        prompt="Specify context or spelling", # Optional
        response_format="verbose_json", # Optional
        timestamp_granularities=["word", "segment"], # Optional (must set response_format to "verbose_json" to use and can specify "word", "segment" (default), or both)
        language="en", # Optional
        temperature=0.0 # Optional
    )
    # To print only the transcription text, you'd use print(transcription.text)
    # (here we're printing the entire transcription object to access timestamps)
    print(json.dumps(transcription, indent=2, default=str))
```

### [Example Usage of Translation Endpoint](https://console.groq.com/docs/speech-to-text#example-usage-of-translation-endpoint)

The translation endpoint allows you to translate spoken words in audio or video files to English.

The Groq SDK package can be installed using the following command:

shell

```
pip install groq
```

The following code snippet demonstrates how to use Groq API to translate an audio file in Python:

Python

```
import os
from groq import Groq

# Initialize the Groq client
client = Groq()

# Specify the path to the audio file
filename = os.path.dirname(__file__) + "/sample_audio.m4a" # Replace with your audio file!

# Open the audio file
with open(filename, "rb") as file:
    # Create a translation of the audio file
    translation = client.audio.translations.create(
        file=(filename, file.read()), # Required audio file
        model="whisper-large-v3", # Required model to use for translation
        prompt="Specify context or spelling", # Optional
        language="en", # Optional ('en' only)
        response_format="json", # Optional
        temperature=0.0 # Optional
    )
    # Print the translation text
    print(translation.text)
```

[Understanding Metadata Fields](https://console.groq.com/docs/speech-to-text#understanding-metadata-fields)
-----------------------------------------------------------------------------------------------------------

When working with Groq API, setting `response_format` to `verbose_json` outputs each segment of transcribed text with valuable metadata that helps us understand the quality and characteristics of our transcription, including `avg_logprob`, `compression_ratio`, and `no_speech_prob`. This information can help us with debugging any transcription issues. Let's examine what this metadata tells us using a real example:

JSON

```
{
  "id": 8,
  "seek": 3000,
  "start": 43.92,
  "end": 50.16,
  "text": " document that the functional specification that you started to read through that isn't just the",
  "tokens": [51061, 4166, 300, 264, 11745, 31256],
  "temperature": 0,
  "avg_logprob": -0.097569615,
  "compression_ratio": 1.6637554,
  "no_speech_prob": 0.012814695
}
```

As shown in the above example, we receive timing information as well as quality indicators.
Let's gain a better understanding of what each field means:

*   `id: 8`: The 9th segment in the transcription (counting begins at 0)
*   `seek`: Indicates where in the audio file this segment begins (3000 in this case)
*   `start` and `end` timestamps: Tell us exactly when this segment occurs in the audio (43.92 to 50.16 seconds in our example)
*   `avg_logprob` (Average Log Probability): -0.097569615 in our example indicates very high confidence. Values closer to 0 suggest better confidence, while more negative values (like -0.5 or lower) might indicate transcription issues.
*   `no_speech_prob` (No Speech Probability): 0.012814695 is very low, suggesting this is definitely speech. Higher values (closer to 1) would indicate potential silence or non-speech audio.
*   `compression_ratio`: 1.6637554 is a healthy value, indicating normal speech patterns. Unusual values (very high or low) might suggest issues with speech clarity or word boundaries.

### [Using Metadata for Debugging](https://console.groq.com/docs/speech-to-text#using-metadata-for-debugging)

When troubleshooting transcription issues, look for these patterns:

*   Low Confidence Sections: If `avg_logprob` drops significantly (becomes more negative), check for background noise, multiple speakers talking simultaneously, unclear pronunciation, and strong accents. Consider cleaning up the audio in these sections or adjusting chunk sizes around problematic chunk boundaries.
*   Non-Speech Detection: High `no_speech_prob` values might indicate silence periods that could be trimmed, background music or noise, or non-verbal sounds being misinterpreted as speech. Consider noise reduction when preprocessing.
*   Unusual Speech Patterns: Unexpected `compression_ratio` values can reveal stuttering or word repetition, a speaker talking unusually fast or slow, or audio quality issues affecting word separation.

### [Quality Thresholds and Regular Monitoring](https://console.groq.com/docs/speech-to-text#quality-thresholds-and-regular-monitoring)

We recommend setting acceptable ranges for each metadata value we reviewed above and flagging segments that fall outside these ranges, so that you can identify and adjust preprocessing or chunking strategies for flagged sections. By understanding and monitoring these metadata values, you can significantly improve your transcription quality and quickly identify potential issues in your audio processing pipeline.
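As a minimal sketch of that kind of monitoring, the helper below flags segments from a `verbose_json` response whose metadata falls outside a set of thresholds. The threshold values, and the assumption that segments are available as a list of dicts (for example after serializing the transcription object), are illustrative rather than official recommendations.

Python

```
# Illustrative thresholds -- tune these for your own audio and use case.
AVG_LOGPROB_FLOOR = -0.5        # more negative than this: likely low-confidence transcription
NO_SPEECH_CEILING = 0.5         # higher than this: likely silence or non-speech audio
COMPRESSION_RANGE = (1.0, 2.4)  # outside this range: unusual speech patterns

def flag_segments(segments: list) -> list:
    """Return the segments whose metadata falls outside the acceptable ranges."""
    flagged = []
    for seg in segments:
        reasons = []
        if seg["avg_logprob"] < AVG_LOGPROB_FLOOR:
            reasons.append("low confidence")
        if seg["no_speech_prob"] > NO_SPEECH_CEILING:
            reasons.append("possible non-speech")
        if not (COMPRESSION_RANGE[0] <= seg["compression_ratio"] <= COMPRESSION_RANGE[1]):
            reasons.append("unusual compression ratio")
        if reasons:
            flagged.append({"id": seg["id"], "start": seg["start"], "end": seg["end"], "reasons": reasons})
    return flagged

# Example usage, assuming `segments` holds the "segments" array from a verbose_json response:
# for problem in flag_segments(segments):
#     print(problem)
```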
### [Prompting Guidelines](https://console.groq.com/docs/speech-to-text#prompting-guidelines)

The prompt parameter (max 224 tokens) helps provide context and maintain a consistent output style. Unlike chat completion prompts, these prompts only guide style and context, not specific actions.

Best Practices

*   Provide relevant context about the audio content, such as the type of conversation, topic, or speakers involved.
*   Use the same language as the language of the audio file.
*   Steer the model's output by denoting proper spellings or emulate a specific writing style or tone.
*   Keep the prompt concise and focused on stylistic guidance.

We can't wait to see what you build! 🚀

---

Title: Text to Speech - GroqDocs

URL Source: https://console.groq.com/docs/text-to-speech

Markdown Content:

Learn how to instantly generate lifelike audio from text.

[Overview](https://console.groq.com/docs/text-to-speech#overview)
-----------------------------------------------------------------

The Groq API speech endpoint provides fast text-to-speech (TTS), enabling you to convert text to spoken audio in seconds with our available TTS models. With support for 23 voices, 19 in English and 4 in Arabic, you can instantly create life-like audio content for customer support agents, characters for game development, and more.

[API Endpoint](https://console.groq.com/docs/text-to-speech#api-endpoint)
-------------------------------------------------------------------------

| Endpoint | Usage | API Endpoint |
| --- | --- | --- |
| Speech | Convert text to audio | `https://api.groq.com/openai/v1/audio/speech` |

[Supported Models](https://console.groq.com/docs/text-to-speech#supported-models)
---------------------------------------------------------------------------------

| Model ID | Model Card | Supported Language(s) | Description |
| --- | --- | --- | --- |
| `playai-tts` | [Card](https://console.groq.com/docs/model/playai-tts) | English | High-quality TTS model for English speech generation. |
| `playai-tts-arabic` | [Card](https://console.groq.com/docs/model/playai-tts-arabic) | Arabic | High-quality TTS model for Arabic speech generation. |

[Working with Speech](https://console.groq.com/docs/text-to-speech#working-with-speech)
---------------------------------------------------------------------------------------

### [Quick Start](https://console.groq.com/docs/text-to-speech#quick-start)

The speech endpoint takes four key inputs:

*   **model:** `playai-tts` or `playai-tts-arabic`
*   **input:** the text to generate audio from
*   **voice:** the desired voice for output
*   **response format:** defaults to `"wav"`

The Groq SDK package can be installed using the following command:

shell

```
pip install groq
```

The following is an example of a request using `playai-tts`. To use the Arabic model, use the `playai-tts-arabic` model ID and an Arabic prompt:

Python

```
import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

speech_file_path = "speech.wav"
model = "playai-tts"
voice = "Fritz-PlayAI"
text = "I love building and shipping new features for our users!"
response_format = "wav"

response = client.audio.speech.create(
    model=model,
    voice=voice,
    input=text,
    response_format=response_format
)

response.write_to_file(speech_file_path)
```

### [Parameters](https://console.groq.com/docs/text-to-speech#parameters)

| Parameter | Type | Required | Value | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | `playai-tts`, `playai-tts-arabic` | Model ID to use for TTS. |
| `input` | string | Yes | - | User input text to be converted to speech. Maximum length is 10K characters. |
| `voice` | string | Yes | See available [English](https://console.groq.com/docs/text-to-speech/#available-english-voices) and [Arabic](https://console.groq.com/docs/text-to-speech/#available-arabic-voices) voices. | The voice to use for audio generation. There are currently 19 English options for `playai-tts` and 4 Arabic options for `playai-tts-arabic`. |
| `response_format` | string | Optional | `"wav"` | Format of the response audio file. Defaults to currently supported `"wav"`. |
### [Available English Voices](https://console.groq.com/docs/text-to-speech#available-english-voices)

The `playai-tts` model currently supports 19 English voices that you can pass into the `voice` parameter (`Arista-PlayAI`, `Atlas-PlayAI`, `Basil-PlayAI`, `Briggs-PlayAI`, `Calum-PlayAI`, `Celeste-PlayAI`, `Cheyenne-PlayAI`, `Chip-PlayAI`, `Cillian-PlayAI`, `Deedee-PlayAI`, `Fritz-PlayAI`, `Gail-PlayAI`, `Indigo-PlayAI`, `Mamaw-PlayAI`, `Mason-PlayAI`, `Mikail-PlayAI`, `Mitch-PlayAI`, `Quinn-PlayAI`, `Thunder-PlayAI`). Experiment to find the voice you need for your application.

### [Available Arabic Voices](https://console.groq.com/docs/text-to-speech#available-arabic-voices)

The `playai-tts-arabic` model currently supports 4 Arabic voices that you can pass into the `voice` parameter (`Ahmad-PlayAI`, `Amira-PlayAI`, `Khalid-PlayAI`, `Nasser-PlayAI`). Experiment to find the voice you need for your application.

---

Title: Reasoning - GroqDocs

URL Source: https://console.groq.com/docs/reasoning

Markdown Content:

Reasoning models excel at complex problem-solving tasks that require step-by-step analysis, logical deduction, structured thinking, and solution validation. With Groq inference speed, these types of models can deliver instant reasoning capabilities critical for real-time applications.

[Why Speed Matters for Reasoning](https://console.groq.com/docs/reasoning#why-speed-matters-for-reasoning)
----------------------------------------------------------------------------------------------------------

Reasoning models are capable of complex decision making with explicit reasoning chains that are part of the token output and used for decision-making, which makes low latency and fast inference essential. Complex problems often require multiple chains of reasoning tokens where each step builds on previous results. Low latency compounds its benefits across these reasoning chains, turning minutes of reasoning into a response delivered in seconds.

[Supported Models](https://console.groq.com/docs/reasoning#supported-models)
----------------------------------------------------------------------------

| Model ID | Model |
| --- | --- |
| `qwen/qwen3-32b` | [Qwen 3 32B](https://console.groq.com/docs/model/qwen3-32b) |
| `deepseek-r1-distill-llama-70b` | [DeepSeek R1 Distill Llama 70B](https://console.groq.com/docs/model/deepseek-r1-distill-llama-70b) |

[Reasoning Format](https://console.groq.com/docs/reasoning#reasoning-format)
----------------------------------------------------------------------------

Groq API supports explicit reasoning formats through the `reasoning_format` parameter, giving you fine-grained control over how the model's reasoning process is presented. This is particularly valuable for valid JSON outputs, debugging, and understanding the model's decision-making process.

**Note:** The format defaults to `raw` or `parsed` when JSON mode or tool use are enabled, as those modes do not support `raw`.
If reasoning is explicitly set to `raw` with JSON mode or tool use enabled, we will return a 400 error.

### [Options for Reasoning Format](https://console.groq.com/docs/reasoning#options-for-reasoning-format)

| `reasoning_format` Options | Description |
| --- | --- |
| `parsed` | Separates reasoning into a dedicated field while keeping the response concise. |
| `raw` | Includes reasoning within think tags in the content. |
| `hidden` | Returns only the final answer. |

### [Options for Reasoning Effort](https://console.groq.com/docs/reasoning#options-for-reasoning-effort)

The `reasoning_effort` parameter controls the level of effort the model will put into reasoning. This is only supported by [Qwen 3 32B](https://console.groq.com/docs/model/qwen3-32b).

| `reasoning_effort` Options | Description |
| --- | --- |
| `none` | Disable reasoning. The model will not use any reasoning tokens. |
| `default` | Enable reasoning. |

[Quick Start](https://console.groq.com/docs/reasoning#quick-start)
------------------------------------------------------------------

Python

```
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {
            "role": "user",
            "content": "How many r's are in the word strawberry?"
        }
    ],
    temperature=0.6,
    max_completion_tokens=1024,
    top_p=0.95,
    stream=True,
    reasoning_format="raw"
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```

[Quick Start with Tool Use](https://console.groq.com/docs/reasoning#quick-start-with-tool-use)
----------------------------------------------------------------------------------------------

bash

```
curl https://api.groq.com/openai/v1/chat/completions -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Bogotá, Colombia"
                        }
                    },
                    "required": ["location"],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}'
```

[Recommended Configuration Parameters](https://console.groq.com/docs/reasoning#recommended-configuration-parameters)
--------------------------------------------------------------------------------------------------------------------

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| `messages` | - | - | Array of message objects. Important: Avoid system prompts - include all instructions in the user message! |
| `temperature` | 0.6 | 0.0 - 2.0 | Controls randomness in responses. Lower values make responses more deterministic. Recommended range: 0.5-0.7 to prevent repetitions or incoherent outputs |
| `max_completion_tokens` | 1024 | - | Maximum length of model's response. Default may be too low for complex reasoning - consider increasing for detailed step-by-step solutions |
| `top_p` | 0.95 | 0.0 - 1.0 | Controls diversity of token selection |
| `stream` | false | boolean | Enables response streaming. Recommended for interactive reasoning tasks |
| `stop` | null | string/array | Custom stop sequences |
| `seed` | null | integer | Set for reproducible results. Important for benchmarking - run multiple tests with different seeds |
| `response_format` | `{type: "text"}` | `{type: "json_object"}` or `{type: "text"}` | Set to `json_object` type for structured output. |
| `reasoning_format` | `raw` | `"parsed"`, `"raw"`, `"hidden"` | Controls how model reasoning is presented in the response. Must be set to either `parsed` or `hidden` when using tool calling or JSON mode. |
| `reasoning_effort` | `default` | `"none"`, `"default"` | Controls the level of effort the model will put into reasoning. This is only supported by [Qwen 3 32B](https://console.groq.com/docs/model/qwen3-32b). |
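For example, to keep the chain of thought out of the final answer while still capturing it, you can request the `parsed` format. This is a hedged sketch: the exact attribute on the response message that holds the separated reasoning (accessed defensively below) is an assumption to verify against the API reference.

Python

```
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "How many r's are in the word strawberry?"}],
    temperature=0.6,
    max_completion_tokens=1024,
    # Keep the reasoning out of the content and in a dedicated field.
    reasoning_format="parsed",
)

message = completion.choices[0].message
print("Answer:", message.content)
# Assumed attribute name for the separated reasoning; check the API reference for the exact field.
print("Reasoning:", getattr(message, "reasoning", None))
```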
[Optimizing Performance](https://console.groq.com/docs/reasoning#optimizing-performance)
----------------------------------------------------------------------------------------

### [Temperature and Token Management](https://console.groq.com/docs/reasoning#temperature-and-token-management)

The model performs best with temperature settings between 0.5-0.7, with lower values (closer to 0.5) producing more consistent mathematical proofs and higher values allowing for more creative problem-solving approaches. Monitor and adjust your token usage based on the complexity of your reasoning tasks - while the default max_completion_tokens is 1024, complex proofs may require higher limits.

### [Prompt Engineering](https://console.groq.com/docs/reasoning#prompt-engineering)

To ensure accurate, step-by-step reasoning while maintaining high performance:

*   DeepSeek-R1 works best when all instructions are included directly in user messages rather than system prompts.
*   Structure your prompts to request explicit validation steps and intermediate calculations.
*   Avoid few-shot prompting and go for zero-shot prompting only.

---

Title: Images and Vision - GroqDocs

URL Source: https://console.groq.com/docs/vision

Markdown Content:

Groq API offers fast inference and low latency for multimodal models with vision capabilities for understanding and interpreting visual data from images. By analyzing the content of an image, multimodal models can generate human-readable text for providing insights about given visual data.

[Supported Models](https://console.groq.com/docs/vision#supported-models)
-------------------------------------------------------------------------

Groq API supports powerful multimodal models that can be easily integrated into your applications to provide fast and accurate image processing for tasks such as visual question answering, caption generation, and Optical Character Recognition (OCR).

### [meta-llama/llama-4-scout-17b-16e-instruct](https://console.groq.com/docs/model/llama-4-scout-17b-16e-instruct)

*   **Model ID:** `meta-llama/llama-4-scout-17b-16e-instruct`
*   **Description:** A powerful multimodal model capable of processing both text and image inputs that supports multilingual, multi-turn conversations, tool use, and JSON mode.
*   **Context Window:** 128K tokens
*   **Preview Model:** Currently in preview and should be used for experimentation.
*   **Image Size Limit:** Maximum allowed size for a request containing an image URL as input is 20 MB. Requests larger than this limit will return a 400 error.
*   **Image Resolution Limit:** Maximum allowed resolution for a request containing images is 33 megapixels (33177600 total pixels) per image. Images larger than this limit will return a 400 error.
*   **Request Size Limit (Base64 Encoded Images):** Maximum allowed size for a request containing a base64 encoded image is 4 MB. Requests larger than this limit will return a 413 error.
*   **Images per Request:** You can process a maximum of 5 images.
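Before sending a locally stored image, it can save a failed round trip to check it against these limits on the client. The sketch below is illustrative only: it assumes Pillow is installed for reading dimensions, and it compares the encoded image alone against the 4 MB figure, which in practice applies to the whole base64-encoded request.

Python

```
import base64

from PIL import Image  # assumption: Pillow is available for local checks

MAX_PIXELS = 33_177_600              # 33 megapixel resolution limit per image
MAX_BASE64_BYTES = 4 * 1024 * 1024   # ~4 MB limit for base64 encoded image requests

def prepare_image(path: str) -> str:
    """Validate a local image against the documented limits and return a data URL."""
    with Image.open(path) as img:
        if img.width * img.height > MAX_PIXELS:
            raise ValueError("Image exceeds the 33 MP resolution limit; downscale it first.")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("Base64 payload is too large; host the image and pass a URL instead.")
    return f"data:image/jpeg;base64,{encoded}"
```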
Images per Request You can process a maximum of 5 images. [How to Use Vision](https://console.groq.com/docs/vision#how-to-use-vision) --------------------------------------------------------------------------- Use Groq API vision features via: * **GroqCloud Console Playground**: Use [Llama 4 Scout](https://console.groq.com/playground?model=meta-llama/llama-4-scout-17b-16e-instruct) or [Llama 4 Maverick](https://console.groq.com/playground?model=meta-llama/llama-4-maverick-17b-128e-instruct) as the model and upload your image. * **Groq API Request:** Call the [`chat.completions`](https://console.groq.com/docs/text-chat#generating-chat-completions-with-groq-sdk) API endpoint and set the model to `meta-llama/llama-4-scout-17b-16e-instruct` or `meta-llama/llama-4-maverick-17b-128e-instruct` . See code examples below. [How to Pass Images from URLs as Input](https://console.groq.com/docs/vision#how-to-pass-images-from-urls-as-input) ------------------------------------------------------------------------------------------------------------------- The following are code examples for passing your image to the model via a URL: Python ``` 1from groq import Groq 2import os 3 4client = Groq(api_key=os.environ.get("GROQ_API_KEY")) 5completion = client.chat.completions.create( 6 model="meta-llama/llama-4-scout-17b-16e-instruct", 7 messages=[ 8 { 9 "role": "user", 10 "content": [ 11 { 12 "type": "text", 13 "text": "What's in this image?" 14 }, 15 { 16 "type": "image_url", 17 "image_url": { 18 "url": "https://upload.wikimedia.org/wikipedia/commons/f/f2/LPU-v1-die.jpg" 19 } 20 } 21 ] 22 } 23 ], 24 temperature=1, 25 max_completion_tokens=1024, 26 top_p=1, 27 stream=False, 28 stop=None, 29) 30 31print(completion.choices[0].message) ``` [How to Pass Locally Saved Images as Input](https://console.groq.com/docs/vision#how-to-pass-locally-saved-images-as-input) --------------------------------------------------------------------------------------------------------------------------- To pass locally saved images, we'll need to first encode our image to a base64 format string before passing it as the `image_url` in our API request as follows: Python ``` 1from groq import Groq 2import base64 3import os 4 5# Function to encode the image 6def encode_image(image_path): 7 with open(image_path, "rb") as image_file: 8 return base64.b64encode(image_file.read()).decode('utf-8') 9 10# Path to your image 11image_path = "sf.jpg" 12 13# Getting the base64 string 14base64_image = encode_image(image_path) 15 16client = Groq(api_key=os.environ.get("GROQ_API_KEY")) 17 18chat_completion = client.chat.completions.create( 19 messages=[ 20 { 21 "role": "user", 22 "content": [ 23 {"type": "text", "text": "What's in this image?"}, 24 { 25 "type": "image_url", 26 "image_url": { 27 "url": f"data:image/jpeg;base64,{base64_image}", 28 }, 29 }, 30 ], 31 } 32 ], 33 model="meta-llama/llama-4-scout-17b-16e-instruct", 34) 35 36print(chat_completion.choices[0].message.content) ``` [Tool Use with Images](https://console.groq.com/docs/vision#tool-use-with-images) --------------------------------------------------------------------------------- The `meta-llama/llama-4-scout-17b-16e-instruct`, `meta-llama/llama-4-maverick-17b-128e-instruct` models support tool use! The following cURL example defines a `get_current_weather` tool that the model can leverage to answer a user query that contains a question about the weather along with an image of a location that the model can infer location (i.e. 
New York City) from: shell ``` curl https://api.groq.com/openai/v1/chat/completions -s \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $GROQ_API_KEY" \ -d '{ "model": "meta-llama/llama-4-scout-17b-16e-instruct", "messages": [ { "role": "user", "content": [{"type": "text", "text": "Whats the weather like in this state?"}, {"type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}] } ], "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] } }, "required": ["location"] } } } ], "tool_choice": "auto" }' | jq '.choices[0].message.tool_calls' ``` The following is the output from our example above that shows how our model inferred the state as New York from the given image and called our example function: python ``` [ { "id": "call_q0wg", "function": { "arguments": "{\"location\": \"New York, NY\",\"unit\": \"fahrenheit\"}", "name": "get_current_weather" }, "type": "function" } ] ``` [JSON Mode with Images](https://console.groq.com/docs/vision#json-mode-with-images) ----------------------------------------------------------------------------------- The `meta-llama/llama-4-scout-17b-16e-instruct` and `meta-llama/llama-4-maverick-17b-128e-instruct` models support JSON mode! The following Python example queries the model with an image and text (i.e. "Please pull out relevant information as a JSON object.") with `response_format` set for JSON mode: Python ``` 1from groq import Groq 2import os 3 4client = Groq(api_key=os.environ.get("GROQ_API_KEY")) 5 6completion = client.chat.completions.create( 7 model="meta-llama/llama-4-scout-17b-16e-instruct", 8 messages=[ 9 { 10 "role": "user", 11 "content": [ 12 { 13 "type": "text", 14 "text": "List what you observe in this photo in JSON format." 15 }, 16 { 17 "type": "image_url", 18 "image_url": { 19 "url": "https://upload.wikimedia.org/wikipedia/commons/d/da/SF_From_Marin_Highlands3.jpg" 20 } 21 } 22 ] 23 } 24 ], 25 temperature=1, 26 max_completion_tokens=1024, 27 top_p=1, 28 stream=False, 29 response_format={"type": "json_object"}, 30 stop=None, 31) 32 33print(completion.choices[0].message) ``` [Multi-turn Conversations with Images](https://console.groq.com/docs/vision#multiturn-conversations-with-images) ---------------------------------------------------------------------------------------------------------------- The `meta-llama/llama-4-scout-17b-16e-instruct` and `meta-llama/llama-4-maverick-17b-128e-instruct` models support multi-turn conversations! The following Python example shows a multi-turn user conversation about an image: Python ``` 1from groq import Groq 2import os 3 4client = Groq(api_key=os.environ.get("GROQ_API_KEY")) 5 6completion = client.chat.completions.create( 7 model="meta-llama/llama-4-scout-17b-16e-instruct", 8 messages=[ 9 { 10 "role": "user", 11 "content": [ 12 { 13 "type": "text", 14 "text": "What is in this image?" 15 }, 16 { 17 "type": "image_url", 18 "image_url": { 19 "url": "https://upload.wikimedia.org/wikipedia/commons/d/da/SF_From_Marin_Highlands3.jpg" 20 } 21 } 22 ] 23 }, 24 { 25 "role": "user", 26 "content": "Tell me more about the area." 
27 } 28 ], 29 temperature=1, 30 max_completion_tokens=1024, 31 top_p=1, 32 stream=False, 33 stop=None, 34) 35 36print(completion.choices[0].message) ``` [Venture Deeper into Vision](https://console.groq.com/docs/vision#venture-deeper-into-vision) --------------------------------------------------------------------------------------------- ### [Use Cases to Explore](https://console.groq.com/docs/vision#use-cases-to-explore) Vision models can be used in a wide range of applications. Here are some ideas: * **Accessibility Applications:** Develop an application that generates audio descriptions for images by using a vision model to generate text descriptions for images, which can then be converted to audio with one of our audio endpoints. * **E-commerce Product Description Generation:** Create an application that generates product descriptions for e-commerce websites. * **Multilingual Image Analysis:** Create applications that can describe images in multiple languages. * **Multi-turn Visual Conversations:** Develop interactive applications that allow users to have extended conversations about images. These are just a few ideas to get you started. The possibilities are endless, and we're excited to see what you create with vision models powered by Groq for low latency and fast inference! ### [Next Steps](https://console.groq.com/docs/vision#next-steps) Check out our [Groq API Cookbook](https://github.com/groq/groq-api-cookbook) repository on GitHub (and give us a ⭐) for practical examples and tutorials: We're always looking for contributions. If you have any cool tutorials or guides to share, submit a pull request for review to help our open-source community! --- --- Title: Overview - GroqDocs URL Source: https://console.groq.com/docs/agentic-tooling Markdown Content: Compound -------- While LLMs excel at generating text, [`compound-beta`](https://console.groq.com/docs/compound/systems/compound-beta) takes the next step. It's an advanced AI system that is designed to solve problems by taking action and intelligently uses external tools - starting with web search and code execution - alongside the powerful Llama 4 models and Llama 3.3 70b model. This allows it access to real-time information and interaction with external environments, providing more accurate, up-to-date, and capable responses than an LLM alone. [Available Compound Systems](https://console.groq.com/docs/agentic-tooling#available-compound-systems) ------------------------------------------------------------------------------------------------------ There are two compound systems available: * [`compound-beta`](https://console.groq.com/docs/compound/systems/compound-beta): supports multiple tool calls per request. This system is great for use cases that require multiple web searches or code executions per request. * [`compound-beta-mini`](https://console.groq.com/docs/compound/systems/compound-beta-mini): supports a single tool call per request. This system is great for use cases that require a single web search or code execution per request. `compound-beta-mini` has an average of 3x lower latency than `compound-beta`. Both systems support the following tools: * Web Search * Code Execution via [E2B](https://e2b.dev/) (only Python is currently supported) Custom [user-provided tools](https://console.groq.com/docs/tool-use) are not supported at this time. 
[Quickstart](https://console.groq.com/docs/agentic-tooling#quickstart) ---------------------------------------------------------------------- To use compound systems, change the `model` parameter to either `compound-beta` or `compound-beta-mini`: Python ``` 1from groq import Groq 2 3client = Groq() 4 5completion = client.chat.completions.create( 6 messages=[ 7 { 8 "role": "user", 9 "content": "What is the current weather in Tokyo?", 10 } 11 ], 12 # Change model to compound-beta to use agentic tooling 13 # model="llama-3.3-70b-versatile", 14 model="compound-beta", 15) 16 17print(completion.choices[0].message.content) 18# Print all tool calls 19# print(completion.choices[0].message.executed_tools) ``` And that's it! When the API is called, it will intelligently decide when to use search or code execution to best answer the user's query. These tool calls are performed on the server side, so no additional setup is required on your part to use agentic tooling. In the above example, the API will use its built-in web search tool to find the current weather in Tokyo. If you didn't use compound systems, you might have needed to add your own custom tools to make API requests to a weather service, then perform multiple API calls to Groq to get a final result. Instead, with compound systems, you can get a final result with a single API call. [Executed Tools](https://console.groq.com/docs/agentic-tooling#executed-tools) ------------------------------------------------------------------------------ To view the tools (search or code execution) used automatically by the compound system, check the `executed_tools` field in the response: Python ``` 1import os 2from groq import Groq 3 4client = Groq(api_key=os.environ.get("GROQ_API_KEY")) 5 6response = client.chat.completions.create( 7 model="compound-beta", 8 messages=[ 9 {"role": "user", "content": "What did Groq release last week?"} 10 ] 11) 12# Log the tools that were used to generate the response 13print(response.choices[0].message.executed_tools) ``` [What's Next?](https://console.groq.com/docs/agentic-tooling#whats-next) ------------------------------------------------------------------------ Now that you understand the basics of compound systems, explore these topics: * **[Systems](https://console.groq.com/docs/compound/systems)** - Learn about the two compound systems and when to use each one * **[Search Settings](https://console.groq.com/docs/compound/search-settings)** - Customize web search behavior with domain filtering * **[Use Cases](https://console.groq.com/docs/compound/use-cases)** - Explore practical applications and detailed examples --- --- Title: Groq Batch API - GroqDocs URL Source: https://console.groq.com/docs/batch Markdown Content: Process large-scale workloads asynchronously with our Batch API. [What is Batch Processing?](https://console.groq.com/docs/batch#what-is-batch-processing) ----------------------------------------------------------------------------------------- Batch processing lets you run thousands of API requests at scale by submitting your workload as an asynchronous batch of requests to Groq with 50% lower cost, no impact on your standard rate limits, and a 24-hour to 7-day processing window.
[Overview](https://console.groq.com/docs/batch#overview) -------------------------------------------------------- While some of your use cases may require synchronous API requests, asynchronous batch processing is perfect for use cases that don't need immediate responses or for processing a large number of queries that standard rate limits cannot handle, such as processing large datasets, generating content in bulk, and running evaluations. Compared to using our synchronous API endpoints, our Batch API has: * **Higher rate limits:** Process thousands of requests per batch with no impact on your standard API rate limits * **Cost efficiency:** 50% cost discount compared to synchronous APIs [Model Availability and Pricing](https://console.groq.com/docs/batch#model-availability-and-pricing) ---------------------------------------------------------------------------------------------------- The Batch API can currently be used to execute queries for chat completion (both text and vision), audio transcription, and audio translation inputs with the following models: * `mistral-saba-24b` * `llama-3.3-70b-versatile` * `deepseek-r1-distill-llama-70b` * `llama-3.1-8b-instant` * `meta-llama/llama-4-scout-17b-16e-instruct` * `meta-llama/llama-4-maverick-17b-128e-instruct` * `meta-llama/llama-guard-4-12b` Pricing is at a 50% cost discount compared to [synchronous API pricing.](https://groq.com/pricing) [Getting Started](https://console.groq.com/docs/batch#getting-started) ---------------------------------------------------------------------- Our Batch API endpoints allow you to collect a group of requests into a single file, kick off a batch processing job to execute the requests within your file, query for the status of your batch, and eventually retrieve the results when your batch is complete. Multiple batch jobs can be submitted at once. Each batch has a processing window, during which we'll process as many requests as our capacity allows while maintaining service quality for all users. We allow for setting a batch window from 24 hours to 7 days and recommend setting a longer batch window to allow us more time to complete your batch jobs instead of expiring them. ### [1. Prepare Your Batch File](https://console.groq.com/docs/batch#1-prepare-your-batch-file) A batch is composed of a list of API requests and every batch job starts with a JSON Lines (JSONL) file that contains the requests you want processed. Each line in this file represents a single API call. The Groq Batch API currently supports: * Chat completion requests through [`/v1/chat/completions`](https://console.groq.com/docs/text-chat) * Audio transcription requests through [`/v1/audio/transcriptions`](https://console.groq.com/docs/speech-to-text) * Audio translation requests through [`/v1/audio/translations`](https://console.groq.com/docs/speech-to-text) The structure for each line must include: * `custom_id`: Your unique identifier for tracking the batch request * `method`: The HTTP method (currently `POST` only) * `url`: The API endpoint to call (one of: `/v1/chat/completions`, `/v1/audio/transcriptions`, or `/v1/audio/translations`) * `body`: The parameters of your request matching our synchronous API format.
See our API Reference [here.](https://console.groq.com/docs/api-reference#chat-create) The following is an example of a JSONL batch file with different types of requests: JSON ``` {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}]}} {"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+3?"}]}} {"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "count up to 1000000. starting with 1, 2, 3. print all the numbers, do not stop until you get to 1000000."}]}} ``` #### [Converting Sync Calls to Batch Format](https://console.groq.com/docs/batch#converting-sync-calls-to-batch-format) If you're familiar with making synchronous API calls, converting them to batch format is straightforward. Here's how a regular API call transforms into a batch request: JSON ``` # Your typical synchronous API call in Python: response = client.chat.completions.create( model="llama-3.1-8b-instant", messages=[ {"role": "user", "content": "What is quantum computing?"} ] ) # The same call in batch format (must be on a single line as JSONL): {"custom_id": "quantum-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "What is quantum computing?"}]}} ``` ### [2. Upload Your Batch File](https://console.groq.com/docs/batch#2-upload-your-batch-file) Upload your `.jsonl` batch file using the Files API endpoint before kicking off your batch job: **Note:** The Files API currently only supports `.jsonl` files of 50,000 lines or fewer and up to a maximum of 200MB in size. There is no limit for the number of batch jobs you can submit. We recommend submitting multiple shorter batch files for a better chance of completion. ``` import os from groq import Groq client = Groq(api_key=os.environ.get("GROQ_API_KEY")) file_path = "batch_file.jsonl" response = client.files.create(file=open(file_path, "rb"), purpose="batch") print(response) ``` You will receive a JSON response that contains the ID (`id`) for your file object that you will then use to create your batch job: JSON ``` { "id":"file_01jh6x76wtemjr74t1fh0faj5t", "object":"file", "bytes":966, "created_at":1736472501, "filename":"input_file.jsonl", "purpose":"batch" } ``` ### [3. Create Your Batch Job](https://console.groq.com/docs/batch#3-create-your-batch-job) Once you've uploaded your `.jsonl` file, you can use the file object ID (in this case, `file_01jh6x76wtemjr74t1fh0faj5t` as shown in Step 2) to create a batch: **Note:** The completion window for batch jobs can be set from 24 hours (`24h`) to 7 days (`7d`). We recommend setting a longer batch window to have a better chance of your batch jobs completing rather than expiring when we are under heavy load.
``` import os from groq import Groq client = Groq(api_key=os.environ.get("GROQ_API_KEY")) response = client.batches.create( completion_window="24h", endpoint="/v1/chat/completions", input_file_id="file_01jh6x76wtemjr74t1fh0faj5t", ) print(response.to_json()) ``` This request will return a Batch object with metadata about your batch, including the batch `id` that you can use to check the status of your batch: JSON ``` { "id":"batch_01jh6xa7reempvjyh6n3yst2zw", "object":"batch", "endpoint":"/v1/chat/completions", "errors":null, "input_file_id":"file_01jh6x76wtemjr74t1fh0faj5t", "completion_window":"24h", "status":"validating", "output_file_id":null, "error_file_id":null, "finalizing_at":null, "failed_at":null, "expired_at":null, "cancelled_at":null, "request_counts":{ "total":0, "completed":0, "failed":0 }, "metadata":null, "created_at":1736472600, "expires_at":1736559000, "cancelling_at":null, "completed_at":null, "in_progress_at":null } ``` ### [4. Check Batch Status](https://console.groq.com/docs/batch#4-check-batch-status) You can check the status of a batch any time your heart desires with the batch `id` (in this case, `batch_01jh6xa7reempvjyh6n3yst2zw` from the above Batch response object), which will also return a Batch object: ``` import os from groq import Groq client = Groq(api_key=os.environ.get("GROQ_API_KEY")) response = client.batches.retrieve("batch_01jh6xa7reempvjyh6n3yst2zw") print(response.to_json()) ``` The status of a given batch job can return any of the following status codes: | Status | Description | | --- | --- | | `validating` | batch file is being validated before the batch processing begins | | `failed` | batch file has failed the validation process | | `in_progress` | batch file was successfully validated and the batch is currently being run | | `finalizing` | batch has completed and the results are being prepared | | `completed` | batch has been completed and the results are ready | | `expired` | batch was not able to be completed within the processing window | | `cancelling` | batch is being cancelled (may take up to 10 minutes) | | `cancelled` | batch was cancelled | When your batch job is complete, the Batch object will return an `output_file_id` and/or an `error_file_id` that you can then use to retrieve your results (as shown below in Step 5). Here's an example: JSON ``` { "id":"batch_01jh6xa7reempvjyh6n3yst2zw", "object":"batch", "endpoint":"/v1/chat/completions", "errors":[ { "code":"invalid_method", "message":"Invalid value: 'GET'. Supported values are: 'POST'","param":"method", "line":4 } ], "input_file_id":"file_01jh6x76wtemjr74t1fh0faj5t", "completion_window":"24h", "status":"completed", "output_file_id":"file_01jh6xa97be52b7pg88czwrrwb", "error_file_id":"file_01jh6xa9cte52a5xjnmnt5y0je", "finalizing_at":null, "failed_at":null, "expired_at":null, "cancelled_at":null, "request_counts": { "total":3, "completed":2, "failed":1 }, "metadata":null, "created_at":1736472600, "expires_at":1736559000, "cancelling_at":null, "completed_at":1736472607, "in_progress_at":1736472601 } ``` ### [5. Retrieve Batch Results](https://console.groq.com/docs/batch#5-retrieve-batch-results) Now for the fun. 
Once the batch is complete, you can retrieve the results using the `output_file_id` from your Batch object (in this case, `file_01jh6xa97be52b7pg88czwrrwb` from the above Batch response object) and write them to a file on your machine (`batch_results.jsonl` in this case) to view them: ``` import os from groq import Groq client = Groq(api_key=os.environ.get("GROQ_API_KEY")) response = client.files.content("file_01jh6xa97be52b7pg88czwrrwb") response.write_to_file("batch_results.jsonl") print("Batch file saved to batch_results.jsonl") ``` The output `.jsonl` file will have one response line per successful request line of your batch file. Each line includes the original `custom_id` for mapping results, a unique batch request ID, and the response: JSON `{"id": "batch_req_123", "custom_id": "my-request-1", "response": {"status_code": 200, "request_id": "req_abc", "body": {"id": "completion_xyz", "model": "llama-3.1-8b-instant", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello!"}}], "usage": {"prompt_tokens": 20, "completion_tokens": 5, "total_tokens": 25}}}, "error": null}` Any failed or expired requests in the batch will have their error information written to an error file that can be accessed via the batch's `error_file_id`. **Note:** Results may not appear in the same order as your batch request submissions. Always use the `custom_id` field to match results with your original request. [List Batches](https://console.groq.com/docs/batch#list-batches) ---------------------------------------------------------------- The `/batches` endpoint provides two ways to access your batch information: browsing all batches with cursor-based pagination (using the `cursor` parameter), or fetching specific batches by their IDs. ### [Iterate Over All Batches](https://console.groq.com/docs/batch#iterate-over-all-batches) You can view all your batch jobs by making a call to `https://api.groq.com/openai/v1/batches`. Use the `cursor` parameter with the `next_cursor` value from the previous response to get the next page of results: ``` import os from groq import Groq client = Groq(api_key=os.environ.get("GROQ_API_KEY")) # Initial request - gets first page of batches response = client.batches.list() print("First page:", response) # If there's a next cursor, use it to get the next page if response.paging and response.paging.get("next_cursor"): next_response = client.batches.list( extra_query={ "cursor": response.paging.get("next_cursor") } # Use the next_cursor for next page ) print("Next page:", next_response) ``` The paginated response includes a `paging` object with the `next_cursor` for the next page: JSON ``` { "object": "list", "data": [ { "id": "batch_01jh6xa7reempvjyh6n3yst111", "object": "batch", "status": "completed", "created_at": 1736472600, // ... other batch fields } // ... more batches ], "paging": { "next_cursor": "cursor_eyJpZCI6ImJhdGNoXzAxamg2eGE3cmVlbXB2ankifQ" } } ``` ### [Get Specific Batches](https://console.groq.com/docs/batch#get-specific-batches) You can check the status of multiple batches at once by providing multiple batch IDs as query parameters to the same `/batches` endpoint.
This is useful when you have submitted multiple batch jobs and want to monitor their progress efficiently: ``` import os import requests # Set up headers headers = { "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY')}", "Content-Type": "application/json", } # Define batch IDs to check batch_ids = [ "batch_01jh6xa7reempvjyh6n3yst111", "batch_01jh6xa7reempvjyh6n3yst222", "batch_01jh6xa7reempvjyh6n3yst333", ] # Build query parameters using requests params url = "https://api.groq.com/openai/v1/batches" params = [("id", batch_id) for batch_id in batch_ids] # Make the request response = requests.get(url, headers=headers, params=params) print(response.json()) ``` The multi-batch status request returns a JSON object with a `data` array containing the complete batch information for each requested batch: JSON ``` { "object": "list", "data": [ { "id": "batch_01jh6xa7reempvjyh6n3yst111", "object": "batch", "endpoint": "/v1/chat/completions", "errors": null, "input_file_id": "file_01jh6x76wtemjr74t1fh0faj5t", "completion_window": "24h", "status": "validating", "output_file_id": null, "error_file_id": null, "finalizing_at": null, "failed_at": null, "expired_at": null, "cancelled_at": null, "request_counts": { "total": 0, "completed": 0, "failed": 0 }, "metadata": null, "created_at": 1736472600, "expires_at": 1736559000, "cancelling_at": null, "completed_at": null, "in_progress_at": null }, { "id": "batch_01jh6xa7reempvjyh6n3yst222", "object": "batch", "endpoint": "/v1/chat/completions", "errors": null, "input_file_id": "file_01jh6x76wtemjr74t1fh0faj6u", "completion_window": "24h", "status": "in_progress", "output_file_id": null, "error_file_id": null, "finalizing_at": null, "failed_at": null, "expired_at": null, "cancelled_at": null, "request_counts": { "total": 100, "completed": 15, "failed": 0 }, "metadata": null, "created_at": 1736472650, "expires_at": 1736559050, "cancelling_at": null, "completed_at": null, "in_progress_at": 1736472651 }, { "id": "batch_01jh6xa7reempvjyh6n3yst333", "object": "batch", "endpoint": "/v1/chat/completions", "errors": null, "input_file_id": "file_01jh6x76wtemjr74t1fh0faj7v", "completion_window": "24h", "status": "completed", "output_file_id": "file_01jh6xa97be52b7pg88czwrrwc", "error_file_id": null, "finalizing_at": null, "failed_at": null, "expired_at": null, "cancelled_at": null, "request_counts": { "total": 50, "completed": 50, "failed": 0 }, "metadata": null, "created_at": 1736472700, "expires_at": 1736559100, "cancelling_at": null, "completed_at": 1736472800, "in_progress_at": 1736472701 } ] } ``` **Note:** You can only request up to 200 batch IDs in a single request. [Batch Size](https://console.groq.com/docs/batch#batch-size) ------------------------------------------------------------ The Files API supports JSONL files up to 50,000 lines and 200MB in size. Multiple batch jobs can be submitted at once. **Note:** Consider splitting very large workloads into multiple smaller batches (e.g. 1000 requests per batch) for a better chance at completion rather than expiration for when we are under heavy load. [Batch Expiration](https://console.groq.com/docs/batch#batch-expiration) ------------------------------------------------------------------------ Each batch has a processing window (24 hours to 7 days) during which we'll process as many requests as our capacity allows while maintaining service quality for all users. 
We recommend setting a longer batch window for a better chance of completing your batch job rather than returning expired jobs when we are under heavy load. Batch jobs that do not complete within their processing window will have a status of `expired`. In cases where your batch job expires: * You are only charged for successfully completed requests * You can access all completed results and see which request IDs were not processed * You can resubmit any uncompleted requests in a new batch [Data Expiration](https://console.groq.com/docs/batch#data-expiration) ---------------------------------------------------------------------- Input, intermediate files, and results from processed batches will be stored securely for up to 30 days in Groq's systems. You may also delete this data immediately once a processed batch is retrieved. [Rate limits](https://console.groq.com/docs/batch#rate-limits) -------------------------------------------------------------- The Batch API rate limits are separate from existing per-model rate limits for synchronous requests. Using the Batch API will not consume tokens from your standard per-model limits, which means you can conveniently leverage batch processing to increase the number of tokens you process with us. See your limits [here.](https://console.groq.com/settings/limits) --- --- Title: Flex Processing - GroqDocs URL Source: https://console.groq.com/docs/flex-processing Markdown Content: Flex Processing is a service tier optimized for high-throughput workloads that prioritizes fast inference and can handle occasional request failures. This tier offers significantly higher rate limits while maintaining the same pricing as on-demand processing during beta. [Availability](https://console.groq.com/docs/flex-processing#availability) -------------------------------------------------------------------------- Flex processing is available for all [models](https://console.groq.com/docs/models) to paid customers only, with 10x higher rate limits compared to on-demand processing. While in beta, pricing will remain the same as our on-demand tier. [Service Tiers](https://console.groq.com/docs/flex-processing#service-tiers) ---------------------------------------------------------------------------- * **On-demand (`"service_tier":"on_demand"`):** The on-demand tier is the default tier and the one you are used to. We have kept rate limits low in order to ensure fairness and a consistent experience. * **Flex (`"service_tier":"flex"`):** The flex tier offers on-demand processing when capacity is available, with rapid timeouts if resources are constrained. This tier is perfect for workloads that prioritize fast inference and can gracefully handle occasional request failures. It provides an optimal balance between performance and reliability for workloads that don't require guaranteed processing. * **Auto (`"service_tier":"auto"`):** The auto tier uses on-demand rate limits, then falls back to flex tier if those limits are exceeded. [Using Service Tiers](https://console.groq.com/docs/flex-processing#using-service-tiers) ---------------------------------------------------------------------------------------- ### [Service Tier Parameter](https://console.groq.com/docs/flex-processing#service-tier-parameter) The `service_tier` parameter is an additional, optional parameter that you can include in your chat completion request to specify the service tier you'd like to use.
The possible values are: | Option | Description | | --- | --- | | `flex` | Only uses flex tier limits | | `on_demand` (default) | Only uses on_demand rate limits | | `auto` | First uses on_demand rate limits, then falls back to flex tier if exceeded | [Example Usage](https://console.groq.com/docs/flex-processing#example-usage) ---------------------------------------------------------------------------- shell ``` import os import requests GROQ_API_KEY = os.environ.get("GROQ_API_KEY") def main(): try: response = requests.post( "https://api.groq.com/openai/v1/chat/completions", headers={ "Content-Type": "application/json", "Authorization": f"Bearer {GROQ_API_KEY}" }, json={ "service_tier": "flex", "model": "llama-3.3-70b-versatile", "messages": [{ "role": "user", "content": "whats 2 + 2" }] } ) print(response.json()) except Exception as e: print(f"Error: {str(e)}") if __name__ == "__main__": main() ``` --- --- Title: Content Moderation - GroqDocs URL Source: https://console.groq.com/docs/content-moderation Markdown Content: User prompts can sometimes include harmful, inappropriate, or policy-violating content that can be used to exploit models in production to generate unsafe content. To address this issue, we can utilize safeguard models for content moderation. Content moderation for models involves detecting and filtering harmful or unwanted content in user prompts and model responses. This is essential to ensure safe and responsible use of models. By integrating robust content moderation, we can build trust with users, comply with regulatory standards, and maintain a safe environment. Groq offers [**Llama Guard 4**](https://console.groq.com/docs/model/llama-guard-4-12b) for content moderation, a 12B parameter multimodal model developed by Meta that takes text and image as input. [Llama Guard 4](https://console.groq.com/docs/content-moderation#llama-guard-4) ------------------------------------------------------------------------------- Llama Guard 4 is a natively multimodal safeguard model that is designed to process and classify content in both model inputs (prompt classification) and model responses (response classification) for both text and images, making it capable of content moderation across multiple formats. When used, Llama Guard 4 generates text output that indicates whether a given prompt or response is safe or unsafe. If the content is deemed unsafe, it also lists the specific content categories that are violated as per the Harm Taxonomy and Policy outlined below. Llama Guard 4 applies a probability-based approach to produce classifier scores. The model generates a probability score for the first token, which is then used as the "unsafe" class probability. This score can be thresholded to make binary decisions about the safety of the content. ![Image 1: Figure 1](https://console.groq.com/content-moderation/llamaguard3-example.png) Figure 1: Illustration of task instructions used by Llama Guard for assessing the safety of conversational prompts and responses. The model evaluates both the user's input and the agent's reply against predefined unsafe content categories, determining whether the content is 'safe' or 'unsafe' based on provided criteria. [Usage](https://console.groq.com/docs/content-moderation#usage) --------------------------------------------------------------- The Llama Guard 4 model can be executed as an ordinary Groq API chat completion with the `meta-llama/Llama-Guard-4-12B` model. 
When using Llama Guard 4 with Groq, no system message is required; just run the message you'd like screened through the chat completion request as the user or assistant message: Python ``` 1import os 2 3from groq import Groq 4 5client = Groq( 6 api_key=os.environ.get("GROQ_API_KEY"), 7) 8 9chat_completion = client.chat.completions.create( 10 messages=[ 11 { 12 "role": "user", 13 "content": "Write a script that can be used to hack into a Wi-Fi network" 14 } 15 ], 16 model="meta-llama/Llama-Guard-4-12B", 17) 18 19print(chat_completion.choices[0].message.content) ``` If Llama Guard 4 detects that your message violates any of the harmful categories, it will respond `unsafe` followed by the violated category on a new line (i.e. `unsafe\nS2`). If your message is safe, it will simply respond `safe`. Llama Guard 4 is also available as a feature in our [Playground](https://console.groq.com/playground) under Advanced parameters: ![Image 2: Alt text](https://console.groq.com/content-moderation/llamaguard3-playground.png) [Harm Taxonomy and Policy](https://console.groq.com/docs/content-moderation#harm-taxonomy-and-policy) ----------------------------------------------------------------------------------------------------- Llama Guard 4 is trained to predict safety labels on the following categories, based on the [MLCommons taxonomy](https://mlcommons.org/2024/04/mlc-aisafety-v0-5-poc/) of hazards. (Note: The taxonomy may have been updated for Llama Guard 4; please refer to the [Meta Llama Guard 4 documentation](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) for the most current list.) **S1) Violent Crimes:** Responses that enable, encourage, or endorse the commission of violent crimes, including: (1) unlawful violence toward people (ex: terrorism, genocide, murder, hate-motivated violent crimes, child abuse, assault, battery, kidnapping) and (2) unlawful violence toward animals (ex: animal abuse) **S2) Non-Violent Crimes:** Responses that enable, encourage, or endorse the commission of non-violent crimes, including: (1) personal crimes (ex: labor trafficking, threats, intimidation, hate-motivated non-violent crimes), (2) financial crimes (ex: fraud, scams, money laundering), (3) property crimes (ex: theft, arson, vandalism), (4) drug crimes (ex: creating or using regulated substances), (5) weapons crimes (ex: producing unlicensed firearms), and (6) cyber crimes (ex: hacking) **S3) Sex-Related Crimes:** Responses that enable, encourage, or endorse the commission of sex-related crimes, including: (1) sex trafficking, (2) sexual assault (ex: rape), (3) sexual harassment (physical, verbal, or visual), and (4) prostitution **S4) Child Sexual Exploitation:** Responses that contain, describe, enable, encourage, or endorse the sexual abuse of children **S5) Defamation:** Responses that are both verifiably false and likely to injure a living person's reputation **S6) Specialized Advice:** Responses that contain specialized financial, medical, or legal advice, or that indicate dangerous activities or objects are safe **S7) Privacy:** Responses that contain sensitive, nonpublic personal information that could undermine someone's physical, digital, or financial security **S8) Intellectual Property:** Responses that may violate the intellectual property rights of any third party **S9) Indiscriminate Weapons:** Responses that enable, encourage, or endorse the creation of indiscriminate weapons, including: (1) chemical weapons (ex: nerve gas), (2) 
biological weapons (ex: anthrax), (3) radiological weapons (ex: cobalt bombs), (4) nuclear weapons (ex: fission bombs), and (5) high-yield explosive weapons (ex: cluster munitions) **S10) Hate:** Responses that demean or dehumanize people on the basis of their sensitive, personal characteristics (i.e., race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, and/or serious disease) **S11) Suicide & Self-Harm:** Responses that enable, encourage, or endorse acts of intentional self-harm, including: (1) suicide, (2) self-injury (ex: cutting), and (3) disordered eating **S12) Sexual Content:** Responses that contain erotica **S13) Elections:** Responses that contain factually incorrect information about electoral systems and processes, including in the time, place, or manner of voting in civic elections **S14) Code Interpreter Abuse:** Responses that seek to abuse code interpreters, including those that enable denial of service attacks, container escapes or privilege escalation exploits [Supported Languages](https://console.groq.com/docs/content-moderation#supported-languages) ------------------------------------------------------------------------------------------- Llama Guard 4 provides content safety support for the following languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai. --- --- Title: Prefilling - GroqDocs URL Source: https://console.groq.com/docs/prefilling Markdown Content: Assistant Message Prefilling ---------------------------- When using Groq API, you can have more control over your model output by prefilling `assistant` messages. This technique gives you the ability to direct any text-to-text model powered by Groq to: * Skip unnecessary introductions or preambles * Enforce specific output formats (e.g., JSON, XML) * Maintain consistency in conversations [How to Prefill Assistant Messages](https://console.groq.com/docs/prefilling#how-to-prefill-assistant-messages) --------------------------------------------------------------------------------------------------------------- To prefill, simply include your desired starting text in the `assistant` message and the model will generate a response starting with the `assistant` message. **Note:** For some models, adding a newline after the prefill `assistant` message leads to better results. **💡 Tip:** Use the stop sequence (`stop`) parameter in combination with prefilling for even more concise results. We recommend using this for generating code snippets. [Example Usage](https://console.groq.com/docs/prefilling#example-usage) ----------------------------------------------------------------------- **Example 1: Controlling output format for concise code snippets** When trying the below code, first try a request without the prefill and then follow up by trying another request with the prefill included to see the difference! shell ``` from groq import Groq client = Groq() completion = client.chat.completions.create( model="llama-3.3-70b-versatile", messages=[ { "role": "user", "content": "Write a Python function to calculate the factorial of a number." 
}, { "role": "assistant", "content": "```python" } ], stream=True, stop="```", ) for chunk in completion: print(chunk.choices[0].delta.content or "", end="") ``` **Example 2: Extracting structured data from unstructured input** shell ``` from groq import Groq client = Groq() completion = client.chat.completions.create( model="llama-3.3-70b-versatile", messages=[ { "role": "user", "content": "Extract the title, author, published date, and description from the following book as a JSON object:\n\n\"The Great Gatsby\" is a novel by F. Scott Fitzgerald, published in 1925, which takes place during the Jazz Age on Long Island and focuses on the story of Nick Carraway, a young man who becomes entangled in the life of the mysterious millionaire Jay Gatsby, whose obsessive pursuit of his former love, Daisy Buchanan, drives the narrative, while exploring themes like the excesses and disillusionment of the American Dream in the Roaring Twenties. \n" }, { "role": "assistant", "content": "```json" } ], stream=True, stop="```", ) for chunk in completion: print(chunk.choices[0].delta.content or "", end="") ``` --- --- Title: Introduction to Tool Use - GroqDocs URL Source: https://console.groq.com/docs/tool-use Markdown Content: Tool use is a powerful feature that allows Large Language Models (LLMs) to interact with external resources, such as APIs, databases, and the web, to gather dynamic data they wouldn't otherwise have access to in their pre-trained (or static) state and perform actions beyond simple text generation. Tool use bridges the gap between the data that the LLMs were trained on with dynamic data and real-world actions, which opens up a wide array of realtime use cases for us to build powerful applications with, especially with Groq's insanely fast inference speed. 🚀 [Supported Models](https://console.groq.com/docs/tool-use#supported-models) --------------------------------------------------------------------------- | Model ID | Tool Use Support? | Parallel Tool Use Support? | JSON Mode Support? | | --- | --- | --- | --- | | `moonshotai/kimi-k2-instruct` | Yes | Yes | Yes | | `meta-llama/llama-4-scout-17b-16e-instruct` | Yes | Yes | Yes | | `meta-llama/llama-4-maverick-17b-128e-instruct` | Yes | Yes | Yes | | `deepseek-r1-distill-llama-70b` | Yes | Yes | Yes | | `llama-3.3-70b-versatile` | Yes | Yes | Yes | | `llama-3.1-8b-instant` | Yes | Yes | Yes | | `gemma2-9b-it` | Yes | No | Yes | [Agentic Tooling](https://console.groq.com/docs/tool-use#agentic-tooling) ------------------------------------------------------------------------- In addition to the models that support custom tools above, Groq also offers agentic tool systems. These are AI systems with tools like web search and code execution built directly into the system. You don't need to specify any tools yourself - the system will automatically use its built-in tools as needed. [Learn More About Agentic Tooling Discover how to build powerful applications with real-time web search and code execution](https://console.groq.com/docs/agentic-tooling) [How Tool Use Works](https://console.groq.com/docs/tool-use#how-tool-use-works) ------------------------------------------------------------------------------- Groq API tool use structure is compatible with OpenAI's tool use structure, which allows for easy integration. 
See the following cURL example of a tool use request: bash ``` curl https://api.groq.com/openai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $GROQ_API_KEY" \ -d '{ "model": "llama-3.3-70b-versatile", "messages": [ { "role": "user", "content": "What'\''s the weather like in Boston today?" } ], "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] } }, "required": ["location"] } } } ], "tool_choice": "auto" }' ``` To integrate tools with Groq API, follow these steps: 1. Provide tools (or predefined functions) to the LLM for performing actions and accessing external data in real-time in addition to your user prompt within your Groq API request 2. Define how the tools should be used to teach the LLM how to use them effectively (e.g. by defining input and output formats) 3. Let the LLM autonomously decide whether or not the provided tools are needed for a user query by evaluating the user query, determining whether the tools can enhance its response, and utilizing the tools accordingly 4. Extract tool input, execute the tool code, and return results 5. Let the LLM use the tool result to formulate a response to the original prompt This process allows the LLM to perform tasks such as real-time data retrieval, complex calculations, and external API interaction, all while maintaining a natural conversation with our end user. [Tool Use with Groq](https://console.groq.com/docs/tool-use#tool-use-with-groq) ------------------------------------------------------------------------------- Groq API endpoints support tool use to almost instantly deliver structured JSON output that can be used to directly invoke functions from desired external resources. ### [Tools Specifications](https://console.groq.com/docs/tool-use#tools-specifications) Tool use is part of the [Groq API chat completion request payload](https://console.groq.com/docs/api-reference#chat-create). Groq API tool calls are structured to be OpenAI-compatible. ### [Tool Call Structure](https://console.groq.com/docs/tool-use#tool-call-structure) The following is an example tool call structure: JSON ``` { "model": "llama-3.3-70b-versatile", "messages": [ { "role": "system", "content": "You are a weather assistant. Use the get_weather function to retrieve weather information for a given location." }, { "role": "user", "content": "What's the weather like in New York today?" } ], "tools": [ { "type": "function", "function": { "name": "get_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to use. Defaults to fahrenheit." 
} }, "required": ["location"] } } } ], "tool_choice": "auto", "max_completion_tokens": 4096 }' ``` ### [Tool Call Response](https://console.groq.com/docs/tool-use#tool-call-response) The following is an example tool call response based on the above: JSON ``` "model": "llama-3.3-70b-versatile", "choices": [{ "index": 0, "message": { "role": "assistant", "tool_calls": [{ "id": "call_d5wg", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\": \"New York, NY\"}" } }] }, "logprobs": null, "finish_reason": "tool_calls" }], ``` When a model decides to use a tool, it returns a response with a `tool_calls` object containing: * `id`: a unique identifier for the tool call * `type`: the type of tool call, i.e. function * `name`: the name of the tool being used * `parameters`: an object containing the input being passed to the tool [Setting Up Tools](https://console.groq.com/docs/tool-use#setting-up-tools) --------------------------------------------------------------------------- To get started, let's go through an example of tool use with Groq API that you can use as a base to build more tools on your own. #### [Step 1: Create Tool](https://console.groq.com/docs/tool-use#step-1-create-tool) Let's install Groq SDK, set up our Groq client, and create a function called `calculate` to evaluate a mathematical expression that we will represent as a tool. Note: In this example, we're defining a function as our tool, but your tool can be any function or an external resource (e.g. dabatase, web search engine, external API). shell `pip install groq` Python ``` 1from groq import Groq 2import json 3 4# Initialize the Groq client 5client = Groq() 6# Specify the model to be used (we recommend Llama 3.3 70B) 7MODEL = 'llama-3.3-70b-versatile' 8 9def calculate(expression): 10 """Evaluate a mathematical expression""" 11 try: 12 # Attempt to evaluate the math expression 13 result = eval(expression) 14 return json.dumps({"result": result}) 15 except: 16 # Return an error message if the math expression is invalid 17 return json.dumps({"error": "Invalid expression"}) ``` #### [Step 2: Pass Tool Definition and Messages to Model](https://console.groq.com/docs/tool-use#step-2-pass-tool-definition-and-messages-to-model) Next, we'll define our `calculate` tool within an array of available `tools` and call our Groq API chat completion. You can read more about tool schema and supported required and optional fields above in [Tool Specifications](https://console.groq.com/docs/tool-use#tool-call-and-tool-response-structure). By defining our tool, we'll inform our model about what our tool does and have the model decide whether or not to use the tool. We should be as descriptive and specific as possible for our model to be able to make the correct tool use decisions. In addition to our `tools` array, we will provide our `messages` array (e.g. containing system prompt, assistant prompt, and/or user prompt). #### [Step 3: Receive and Handle Tool Results](https://console.groq.com/docs/tool-use#step-3-receive-and-handle-tool-results) After executing our chat completion, we'll extract our model's response and check for tool calls. If the model decides that no tools should be used and does not generate a tool or function call, then the response will be a normal chat completion (i.e. `response_message = response.choices[0].message`) with a direct model reply to the user query. If the model decides that tools should be used and generates a tool or function call, we will: 1. 
Define available tool or function 2. Add the model's response to the conversation by appending our message 3. Process the tool call and add the tool response to our message 4. Make a second Groq API call with the updated conversation 5. Return the final response Python ``` 1# imports calculate function from step 1 2def run_conversation(user_prompt): 3 # Initialize the conversation with system and user messages 4 messages=[ 5 { 6 "role": "system", 7 "content": "You are a calculator assistant. Use the calculate function to perform mathematical operations and provide the results." 8 }, 9 { 10 "role": "user", 11 "content": user_prompt, 12 } 13 ] 14 # Define the available tools (i.e. functions) for our model to use 15 tools = [ 16 { 17 "type": "function", 18 "function": { 19 "name": "calculate", 20 "description": "Evaluate a mathematical expression", 21 "parameters": { 22 "type": "object", 23 "properties": { 24 "expression": { 25 "type": "string", 26 "description": "The mathematical expression to evaluate", 27 } 28 }, 29 "required": ["expression"], 30 }, 31 }, 32 } 33 ] 34 # Make the initial API call to Groq 35 response = client.chat.completions.create( 36 model=MODEL, # LLM to use 37 messages=messages, # Conversation history 38 stream=False, 39 tools=tools, # Available tools (i.e. functions) for our LLM to use 40 tool_choice="auto", # Let our LLM decide when to use tools 41 max_completion_tokens=4096 # Maximum number of tokens to allow in our response 42 ) 43 # Extract the response and any tool call responses 44 response_message = response.choices[0].message 45 tool_calls = response_message.tool_calls 46 if tool_calls: 47 # Define the available tools that can be called by the LLM 48 available_functions = { 49 "calculate": calculate, 50 } 51 # Add the LLM's response to the conversation 52 messages.append(response_message) 53 54 # Process each tool call 55 for tool_call in tool_calls: 56 function_name = tool_call.function.name 57 function_to_call = available_functions[function_name] 58 function_args = json.loads(tool_call.function.arguments) 59 # Call the tool and get the response 60 function_response = function_to_call( 61 expression=function_args.get("expression") 62 ) 63 # Add the tool response to the conversation 64 messages.append( 65 { 66 "tool_call_id": tool_call.id, 67 "role": "tool", # Indicates this message is from tool use 68 "name": function_name, 69 "content": function_response, 70 } 71 ) 72 # Make a second API call with the updated conversation 73 second_response = client.chat.completions.create( 74 model=MODEL, 75 messages=messages 76 ) 77 # Return the final response 78 return second_response.choices[0].message.content 79# Example usage 80user_prompt = "What is 25 * 4 + 10?" 81print(run_conversation(user_prompt)) ``` [Parallel Tool Use](https://console.groq.com/docs/tool-use#parallel-tool-use) ----------------------------------------------------------------------------- We learned about tool use and built single-turn tool use examples above. Now let's take tool use a step further and imagine a workflow where multiple tools can be called simultaneously, enabling more efficient and effective responses. This concept is known as **parallel tool use** and is key for building agentic workflows that can deal with complex queries, which is a great example of where inference speed becomes increasingly important (and thankfully we can access fast inference speed with Groq API). 
Here's an example of parallel tool use with a tool for getting the temperature and the tool for getting the weather condition to show parallel tool use with Groq API in action: Python ``` 1import json 2from groq import Groq 3import os 4 5# Initialize Groq client 6client = Groq() 7model = "llama-3.3-70b-versatile" 8 9# Define weather tools 10def get_temperature(location: str): 11 # This is a mock tool/function. In a real scenario, you would call a weather API. 12 temperatures = {"New York": "22°C", "London": "18°C", "Tokyo": "26°C", "Sydney": "20°C"} 13 return temperatures.get(location, "Temperature data not available") 14 15def get_weather_condition(location: str): 16 # This is a mock tool/function. In a real scenario, you would call a weather API. 17 conditions = {"New York": "Sunny", "London": "Rainy", "Tokyo": "Cloudy", "Sydney": "Clear"} 18 return conditions.get(location, "Weather condition data not available") 19 20# Define system messages and tools 21messages = [ 22 {"role": "system", "content": "You are a helpful weather assistant."}, 23 {"role": "user", "content": "What's the weather and temperature like in New York and London? Respond with one sentence for each city. Use tools to get the information."}, 24] 25 26tools = [ 27 { 28 "type": "function", 29 "function": { 30 "name": "get_temperature", 31 "description": "Get the temperature for a given location", 32 "parameters": { 33 "type": "object", 34 "properties": { 35 "location": { 36 "type": "string", 37 "description": "The name of the city", 38 } 39 }, 40 "required": ["location"], 41 }, 42 }, 43 }, 44 { 45 "type": "function", 46 "function": { 47 "name": "get_weather_condition", 48 "description": "Get the weather condition for a given location", 49 "parameters": { 50 "type": "object", 51 "properties": { 52 "location": { 53 "type": "string", 54 "description": "The name of the city", 55 } 56 }, 57 "required": ["location"], 58 }, 59 }, 60 } 61] 62 63# Make the initial request 64response = client.chat.completions.create( 65 model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096, temperature=0.5 66) 67 68response_message = response.choices[0].message 69tool_calls = response_message.tool_calls 70 71# Process tool calls 72messages.append(response_message) 73 74available_functions = { 75 "get_temperature": get_temperature, 76 "get_weather_condition": get_weather_condition, 77} 78 79for tool_call in tool_calls: 80 function_name = tool_call.function.name 81 function_to_call = available_functions[function_name] 82 function_args = json.loads(tool_call.function.arguments) 83 function_response = function_to_call(**function_args) 84 85 messages.append( 86 { 87 "role": "tool", 88 "content": str(function_response), 89 "tool_call_id": tool_call.id, 90 } 91 ) 92 93# Make the final request with tool call results 94final_response = client.chat.completions.create( 95 model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096 96) 97 98print(final_response.choices[0].message.content) ``` [Error Handling](https://console.groq.com/docs/tool-use#error-handling) ----------------------------------------------------------------------- Groq API tool use is designed to verify whether a model generates a valid tool call object. When a model fails to generate a valid tool call object, Groq API will return a 400 error with an explanation in the "failed_generation" field of the JSON body that is returned. 
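In practice, it helps to catch this error in your application and inspect the failed generation rather than letting the request fail unhandled. The following is a minimal sketch, not an official pattern: it assumes the Groq Python SDK raises `groq.BadRequestError` for 400 responses and that the JSON error body carries the `failed_generation` field described above; check your SDK version's error object if the structure differs.

Python

```python
import json

import groq
from groq import Groq

client = Groq()

# A minimal tool definition, reused from the weather examples above
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "What's the weather like in Boston today?"}],
        tools=tools,
        tool_choice="auto",
    )
    print(response.choices[0].message)
except groq.BadRequestError as err:
    # A 400 response surfaces here; the exact payload shape is an assumption,
    # but the body should include the "failed_generation" field on tool call failures.
    print("Tool call generation failed:")
    print(json.dumps(err.response.json(), indent=2))
```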
### [Next Steps](https://console.groq.com/docs/tool-use#next-steps)

For more information and examples of working with multiple tools in parallel using Groq API and Instructor, see our Groq API Cookbook tutorial [here](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/parallel-tool-use/parallel-tool-use.ipynb).

[Tool Use with Structured Outputs (Python)](https://console.groq.com/docs/tool-use#tool-use-with-structured-outputs-python)
---------------------------------------------------------------------------------------------------------------------------

Groq API offers best-effort matching for parameters, which means the model could occasionally miss parameters or misinterpret types for more complex tool calls. We recommend the [Instructor](https://python.useinstructor.com/hub/groq/) library to simplify the process of working with structured data and to ensure that the model's output adheres to a predefined schema.

Here's an example of how to implement tool use using the Instructor library with Groq API:

shell `pip install instructor pydantic`

Python

```python
import instructor
from pydantic import BaseModel, Field
from groq import Groq

# Define the tool schema
tool_schema = {
    "name": "get_weather_info",
    "description": "Get the weather information for any location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location for which we want to get the weather information (e.g., New York)"
            }
        },
        "required": ["location"]
    }
}

# Define the Pydantic model for the tool call
class ToolCall(BaseModel):
    input_text: str = Field(description="The user's input text")
    tool_name: str = Field(description="The name of the tool to call")
    tool_parameters: str = Field(description="JSON string of tool parameters")

class ResponseModel(BaseModel):
    tool_calls: list[ToolCall]

# Patch Groq() with instructor
client = instructor.from_groq(Groq(), mode=instructor.Mode.JSON)

def run_conversation(user_prompt):
    # Prepare the messages
    messages = [
        {
            "role": "system",
            "content": f"You are an assistant that can use tools. You have access to the following tool: {tool_schema}"
        },
        {
            "role": "user",
            "content": user_prompt,
        }
    ]

    # Make the Groq API call
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        response_model=ResponseModel,
        messages=messages,
        temperature=0.5,
        max_completion_tokens=1000,
    )

    return response.tool_calls

# Example usage
user_prompt = "What's the weather like in San Francisco?"
tool_calls = run_conversation(user_prompt)

for call in tool_calls:
    print(f"Input: {call.input_text}")
    print(f"Tool: {call.tool_name}")
    print(f"Parameters: {call.tool_parameters}")
    print()
```

### [Benefits of Using Structured Outputs](https://console.groq.com/docs/tool-use#benefits-of-using-structured-outputs)

* Type Safety: Pydantic models ensure that output adheres to the expected structure, reducing the risk of errors.
* Automatic Validation: Instructor automatically validates the model's output against the defined schema.

### [Next Steps](https://console.groq.com/docs/tool-use#next-steps)

For more information and examples of working with structured outputs using Groq API and Instructor, see our Groq API Cookbook tutorial [here](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/structured-output-instructor/structured_output_instructor.ipynb).
[Streaming Tool Use](https://console.groq.com/docs/tool-use#streaming-tool-use)
-------------------------------------------------------------------------------

The Groq API also offers streaming tool use, where you can stream tool use results to the client as they are generated.

Python

```python
import asyncio
import json

from groq import AsyncGroq

# Use the asynchronous client so the request can be awaited and streamed
client = AsyncGroq()

async def main():
    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            {
                "role": "user",
                # We first ask it to write a poem, to show the case where there's text output before function calls, since that is also supported
                "content": "What is the weather in San Francisco and in Tokyo? First write a short poem.",
            },
        ],
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Get the current weather in a given location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA"
                            },
                            "unit": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"]
                            }
                        },
                        "required": ["location"]
                    }
                }
            }
        ],
        model="llama-3.3-70b-versatile",
        temperature=0.5,
        stream=True
    )

    async for chunk in stream:
        print(json.dumps(chunk.model_dump()) + "\n")

if __name__ == "__main__":
    asyncio.run(main())
```

[Best Practices](https://console.groq.com/docs/tool-use#best-practices)
-----------------------------------------------------------------------

* Provide detailed tool descriptions for optimal performance.
* We recommend tool use with the Instructor library for structured outputs.
* Use the fine-tuned Llama 3 models by Groq or the Llama 3.1 models for applications that require tool use.
* Implement a routing system when using fine-tuned models in your workflow.
* Handle tool execution errors by returning error messages with `"is_error": true`.

---

---
Title: Groq Client Libraries - GroqDocs
URL Source: https://console.groq.com/docs/libraries
Markdown Content:

[Groq Python Library](https://console.groq.com/docs/libraries#groq-python-library)
----------------------------------------------------------------------------------

The [Groq Python library](https://pypi.org/project/groq/) provides convenient access to the Groq REST API from any Python 3.7+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients.

### [Installation](https://console.groq.com/docs/libraries#installation)

shell `pip install groq`

### [Usage](https://console.groq.com/docs/libraries#usage)

Use the library and your secret key to run:

Python

```python
import os

from groq import Groq

client = Groq(
    # This is the default and can be omitted
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.3-70b-versatile",
)

print(chat_completion.choices[0].message.content)
```

While you can provide an `api_key` keyword argument, we recommend using [python-dotenv](https://github.com/theskumar/python-dotenv) to add `GROQ_API_KEY="My API Key"` to your `.env` file so that your API key is not stored in source control.
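As a minimal sketch of that setup (assuming python-dotenv is installed and a `.env` file containing `GROQ_API_KEY` sits in the working directory):

Python

```python
from dotenv import load_dotenv
from groq import Groq

# Load GROQ_API_KEY from the local .env file into the process environment.
load_dotenv()

# The client reads GROQ_API_KEY from the environment by default,
# so no api_key argument is needed and the key stays out of source control.
client = Groq()
```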
Running the usage example above generates a response like the following:

JSON

```json
{
  "id": "34a9110d-c39d-423b-9ab9-9c748747b204",
  "object": "chat.completion",
  "created": 1708045122,
  "model": "mixtral-8x7b-32768",
  "system_fingerprint": "fp_dbffcd8265",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Low latency Large Language Models (LLMs) are important in the field of artificial intelligence and natural language processing (NLP) for several reasons:\n\n1. Real-time applications: Low latency LLMs are essential for real-time applications such as chatbots, voice assistants, and real-time translation services. These applications require immediate responses, and high latency can lead to a poor user experience.\n\n2. Improved user experience: Low latency LLMs provide a more seamless and responsive user experience. Users are more likely to continue using a service that provides quick and accurate responses, leading to higher user engagement and satisfaction.\n\n3. Competitive advantage: In today's fast-paced digital world, businesses that can provide quick and accurate responses to customer inquiries have a competitive advantage. Low latency LLMs can help businesses respond to customer inquiries more quickly, potentially leading to increased sales and customer loyalty.\n\n4. Better decision-making: Low latency LLMs can provide real-time insights and recommendations, enabling businesses to make better decisions more quickly. This can be particularly important in industries such as finance, healthcare, and logistics, where quick decision-making can have a significant impact on business outcomes.\n\n5. Scalability: Low latency LLMs can handle a higher volume of requests, making them more scalable than high-latency models. This is particularly important for businesses that experience spikes in traffic or have a large user base.\n\nIn summary, low latency LLMs are essential for real-time applications, providing a better user experience, enabling quick decision-making, and improving scalability. As the demand for real-time NLP applications continues to grow, the importance of low latency LLMs will only become more critical."
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 377,
    "total_tokens": 401,
    "prompt_time": 0.009,
    "completion_time": 0.774,
    "total_time": 0.783
  },
  "x_groq": {
    "id": "req_01htzpsmfmew5b4rbmbjy2kv74"
  }
}
```

---