RapidAI/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Stars: 1
Language: Python