Model name: llama_instruct_local

About Llama

Llama is a decoder-only transformer model designed for text generation tasks. It uses the standard Transformer architecture without an encoder, processing input tokens autoregressively to predict the next token in the sequence. Pretrained on a large-scale corpus of publicly available text, Llama can handle a variety of natural language tasks, including chat, code generation, summarization, and question answering.

Read more about Llama on Wikipedia.

Supported aidb operations

  • decode_text
  • decode_text_batch

Supported models

  • TinyLlama/TinyLlama-1.1B-Chat-v1.0 (default)
  • HuggingFaceTB/SmolLM2-135M-Instruct

Creating the default model

SELECT aidb.create_model('my_llama_model', 'llama_instruct_local');
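
This creates a model named my_llama_model that uses the default Llama model, TinyLlama/TinyLlama-1.1B-Chat-v1.0.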

Creating a specific model

SELECT aidb.create_model(
  'another_llama_model',
  'llama_instruct_local',
  '{"model": "HuggingFaceTB/SmolLM2-135M-Instruct", "revision": "main"}'::JSONB
);

Running the model

SELECT aidb.decode_text('my_llama_model', 'Why is the sky blue?');
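
The decode_text_batch operation generates text for multiple prompts in a single call. A minimal sketch, assuming it takes the model name and an array of prompts, mirroring decode_text (the exact signature may differ):

-- Assumed batch form: one generated text per input prompt.
SELECT aidb.decode_text_batch(
  'my_llama_model',
  ARRAY[
    'Why is the sky blue?',
    'Why is the grass green?'
  ]
);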

Model configuration settings

The following configuration settings are available for Llama models. An example that combines several of them follows the list.

  • model: The Llama model to use. The default is TinyLlama/TinyLlama-1.1B-Chat-v1.0.
  • revision: The revision of the model to use. The default is main.
  • system_prompt: Optional. Foundational instructions that guide the model's responses.
  • use_flash_attention: Whether the model uses flash attention. The default is false.
  • seed: The random seed to use for sampling. The default is 1599222198345926291.
  • temperature: The temperature to use for sampling. The default is 0.2.
  • sample_len: The maximum number of tokens to generate. The default is 64.
  • repeat_last_n: The number of preceding tokens to consider for the repetition penalty. The default is 64.
  • repeat_penalty: The repetition penalty to use. The default is 1.1.
  • top_p: The cumulative probability threshold used to filter the token distribution (nucleus sampling). The default is 0.9.
  • use_kv_cache: Enables reuse of attention key/value pairs during generation for faster decoding. The default is true.
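
Several of these settings can be supplied together when the model is created, through the same JSONB configuration argument shown earlier. A minimal sketch; the model name tuned_llama_model and the option values are illustrative, not recommendations:

SELECT aidb.create_model(
  'tuned_llama_model',
  'llama_instruct_local',
  -- Illustrative values; the defaults are listed above.
  '{"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "system_prompt": "You are a concise assistant.",
    "temperature": 0.7,
    "sample_len": 128,
    "top_p": 0.95}'::JSONB
);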

Model credentials

No credentials are required for local Llama models.

