Llama
Model name: llama_instruct_local
About Llama
LLaMA is a decoder-only transformer model designed for text generation tasks. It uses the standard Transformer architecture without an encoder, processing input tokens autoregressively to predict the next token in sequence. Pre-trained on a large-scale corpus of publicly available text, LLaMA is capable of handling various natural language tasks, including chat, code generation, summarization, and question answering.
Read more about Llama on Wikipedia.
Supported aidb operations
- decode_text
- decode_text_batch
Supported models
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- HuggingFaceTB/SmolLM2-135M-Instruct
- HuggingFaceTB/SmolLM2-360M-Instruct
- HuggingFaceTB/SmolLM2-1.7B-Instruct
Creating the default model
SELECT aidb.create_model('my_llama_model', 'llama_instruct_local');
Creating a specific model
SELECT aidb.create_model('another_llama_model', 'llama_instruct_local', '{"model": "HuggingFaceTB/SmolLM2-135M-Instruct", "revision": "main"}'::JSONB);
Running the model
SELECT aidb.decode_text('my_llama_model', 'Why is the sky blue?');
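To generate completions for several prompts in one call, use the batch operation listed above. A minimal sketch, assuming decode_text_batch takes the model name and an array of prompts, and that the model was created as my_llama_model as shown earlier:

SELECT aidb.decode_text_batch('my_llama_model', ARRAY[
    'Why is the sky blue?',
    'Why is grass green?'
]);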
Model configuration settings
The following configuration settings are available for Llama models:
- model: The Llama model to use. The default is TinyLlama/TinyLlama-1.1B-Chat-v1.0.
- revision: The revision of the model to use. The default is main.
- system_prompt: Optional. Foundational instructions to guide general LLM responses.
- use_flash_attention: Indicates whether the model uses flash attention. The default is false.
- seed: The random seed to use for sampling. The default is 1599222198345926291.
- temperature: The temperature to use for sampling. The default is 0.2.
- sample_len: The maximum number of tokens to generate. The default is 64.
- repeat_last_n: The number of tokens to consider for the repetition penalty. The default is 64.
- repeat_penalty: The repetition penalty to use. The default is 1.1.
- top_p: The cumulative probability threshold for filtering the token distribution. The default is 0.9.
- use_kv_cache: Enables reuse of attention key/value pairs during generation for faster decoding. The default is true.
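These settings are passed in the JSONB options argument of aidb.create_model, alongside model and revision. A minimal sketch combining a few of them; the model name my_tuned_llama_model and the specific values (the prompt text, 0.7, 128) are illustrative, not recommendations:

SELECT aidb.create_model(
    'my_tuned_llama_model',
    'llama_instruct_local',
    '{"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
      "system_prompt": "You are a concise assistant.",
      "temperature": 0.7,
      "sample_len": 128}'::JSONB
);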
Model credentials
No credentials are required for local Llama models.