ExLlama / ExLlamaV2
Author: bCloud LLC
Version 0.3.2 + Free with Support on Ubuntu 24.04
ExLlama / ExLlamaV2 is a high-performance Python library designed for running large language models (LLMs) efficiently on NVIDIA GPUs. It provides optimized CUDA extensions, fast tokenization, and tensor management to enable low-latency inference for AI and NLP workloads.
Features of ExLlama / ExLlamaV2:
- GPU-accelerated inference for large language models using optimized CUDA extensions.
- Support for tokenization and tensor operations for seamless integration with Python workflows.
- Efficient memory utilization for transformer-based models.
- Modular design to support NLP tasks such as text generation, summarization, and AI content creation.
- Easy integration with Python ML pipelines and research projects.
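As a rough illustration of the text-generation workflow described above, the sketch below follows the typical ExLlamaV2 loading-and-generation pattern. It is a hedged example, not the image's official usage: the model path is a placeholder, and the import is guarded so the script degrades gracefully on a machine where the library (or a CUDA GPU) is unavailable.

```python
# Hedged sketch of a typical ExLlamaV2 generation flow.
# Assumptions: an EXL2-quantized model directory at a placeholder path,
# and the exllamav2 package installed in the active environment.
try:
    from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
    from exllamav2.generator import ExLlamaV2DynamicGenerator
except ImportError:
    print("exllamav2 is not installed; activate the image's virtualenv first")
else:
    config = ExLlamaV2Config("/path/to/exl2-quantized-model")  # placeholder path
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # allocate KV cache as layers load
    model.load_autosplit(cache)                # split weights across available GPUs
    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
    print(generator.generate(prompt="Hello, world", max_new_tokens=32))
```

On this VM image the package is expected to live under /opt, so activate the bundled virtual environment before running the script.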
To Check Version:
$ sudo apt update
$ cd /opt/exllama/exllamav2
$ source /opt/venv/bin/activate
$ export PYTHONPATH=/opt/exllamav2-0.3.2:$PYTHONPATH
$ python -c 'import exllamav2_ext; print("ExLlamaV2 imported successfully! Version: 0.3.2")'
$ ls /opt | grep exllamav2
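Note that the command above prints a hardcoded version string; it only confirms that the compiled extension imports. If the package was installed with pip, a portable alternative is to query the installed distribution metadata (this assumes a pip-managed install, which may not match a source-tree layout under /opt):

```python
# Query the installed exllamav2 version from package metadata.
# Assumption: the package was installed via pip into the active environment.
from importlib import metadata

try:
    print(f"exllamav2 {metadata.version('exllamav2')} is installed")
except metadata.PackageNotFoundError:
    print("exllamav2 is not installed in the active environment")
```

Run this inside the activated virtual environment so the correct site-packages is consulted.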
Disclaimer: ExLlama / ExLlamaV2 is an open-source AI library provided under its respective license. It is offered "as is," without any warranty, express or implied. Users are responsible for ensuring compatibility with their hardware (CUDA-enabled GPUs) and Python environment.