# ✍️ ezpz-generate
> [!WARNING]
> Experimental / not well tested. The flow works in simple cases but has not seen broad coverage; treat it as best-effort and be ready to fall back to your own HF script if needed.
Interactive text generation loop for Hugging Face causal language models.

- Loads a model and tokenizer via 🤗 `transformers`
- Moves the model to the device detected by `ezpz.get_torch_device_type()`
- Prompts you for text and a max length, then streams a single completion

## Usage
Usage¶
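A typical invocation, assuming the package installs a console script named `ezpz-generate` (as the page title suggests):

```shell
# Launch with the defaults (meta-llama/Llama-3.2-1B in bfloat16)
ezpz-generate

# Or choose the model and dtype explicitly
ezpz-generate --model_name meta-llama/Llama-3.2-1B --dtype float16
```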
## Flags

- `--model_name` (default: `meta-llama/Llama-3.2-1B`): Hugging Face repo/model to load.
- `--dtype` (default: `bfloat16`; choices: `float16`, `bfloat16`, `float32`): Torch dtype for the model.

At runtime the script will prompt for:

- `prompt`: Text to feed the model.
- `max length`: Token limit passed to `model.generate`.
## Notes

- Expects `torch` and `transformers` to be installed and a compatible accelerator to be available (GPU strongly recommended).
- The tokenizer's `pad_token` is set to `eos_token` before generation.
- Type "exit" at the prompt or press `Ctrl+C` to quit.
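The behavior described above can be sketched as a minimal loop. This is an illustration under the documented flags and notes, not the actual source of `ezpz-generate`; the prompt strings and loop structure are assumptions, while `ezpz.get_torch_device_type()`, the flag defaults, and the `pad_token = eos_token` step come from this page.

```python
import argparse

DTYPE_CHOICES = ("float16", "bfloat16", "float32")


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="ezpz-generate")
    parser.add_argument("--model_name", default="meta-llama/Llama-3.2-1B",
                        help="Hugging Face repo/model to load")
    parser.add_argument("--dtype", default="bfloat16", choices=DTYPE_CHOICES,
                        help="Torch dtype for the model")
    return parser


def main() -> None:
    # Heavy imports kept local so the parser above is importable on its own.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import ezpz

    args = build_parser().parse_args()
    dtype = getattr(torch, args.dtype)  # e.g. "bfloat16" -> torch.bfloat16

    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    tokenizer.pad_token = tokenizer.eos_token  # pad_token set to eos_token, per the notes
    model = AutoModelForCausalLM.from_pretrained(args.model_name, torch_dtype=dtype)
    model = model.to(ezpz.get_torch_device_type())  # detected accelerator

    try:
        while True:
            prompt = input("prompt> ")
            if prompt.strip().lower() == "exit":
                break
            max_length = int(input("max length> "))
            inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
            output = model.generate(**inputs, max_length=max_length)
            # Prints the completion in one go; the real script streams tokens.
            print(tokenizer.decode(output[0], skip_special_tokens=True))
    except KeyboardInterrupt:  # Ctrl+C also exits cleanly
        pass
```

Call `main()` to start the loop; the CLI layer is separated out so the flag handling can be reused or tested without loading a model.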