andfoy/GPTQModel
Production-ready LLM compression and quantization toolkit with hardware-accelerated inference on both CPU and GPU via HF, vLLM, and SGLang.
Stars: 0
Language: Python