Microsoft Marketplace | cloud solutions, AI apps, and agents

vLLM with Open WebUI - CIS Level 1 hardened on Ubuntu 24.04 LTS with SBOM and CIS Report.

What is vLLM

vLLM is an open-source, high-throughput inference engine for large language models, written in Python on top of PyTorch. It implements PagedAttention, continuous batching, and tensor parallelism, and serves Hugging Face transformer models from its list of supported architectures (Llama, Mistral, Qwen, Phi, Gemma, OPT, GPT-J, Falcon, and 100+ more) through an OpenAI-compatible REST API. Existing OpenAI clients (openai-python, openai-node, LangChain, LlamaIndex, AnythingLLM) work without code changes: point the base URL at this VM and pass the local Bearer token. This image ships the CPU build of vLLM bundled with Open WebUI, a browser chat front end pre-wired to the local vLLM. The default model, facebook/opt-125m (~250 MB), is preloaded so chat and API work immediately with no Hugging Face token required. Any other supported model that fits in the VM's RAM can be swapped in via /etc/vllm/server.env; gated models additionally need a Hugging Face token.
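
Because the API is OpenAI-compatible, a client needs only the base URL and the Bearer token. As a minimal sketch using just the Python standard library, the function below builds (but does not send) a request for the /v1/completions endpoint. The completions endpoint is chosen because facebook/opt-125m is a base model that does not necessarily have a chat template configured; the function name, placeholder URL, and max_tokens value are illustrative assumptions, not part of the image.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your VM's public IP and the API key
# from /root/vllm-credentials.txt.
BASE_URL = "https://<PUBLIC_IP>/v1"
API_KEY = "<API_KEY>"


def build_completion_request(base_url: str, api_key: str, prompt: str,
                             model: str = "facebook/opt-125m",
                             max_tokens: int = 32) -> urllib.request.Request:
    """Build an OpenAI-style POST {base_url}/completions request (not sent)."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # vLLM started with --api-key checks this header on /v1/* routes.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request(BASE_URL, API_KEY, "The capital of France is")
    print(req.get_method(), req.full_url)
```

Sending it is then a single urllib.request.urlopen(req, context=...) call; with the image's self-signed certificate, the SSL context either has to skip verification (like curl's -k) or trust that certificate explicitly.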

Why self-host vLLM

Self-hosting keeps every prompt, document, embedding, and API key inside your own tenant: no third-party SaaS sees your customer data, internal knowledge bases, or model traffic. It suits teams with data residency requirements, organisations working under regulatory and compliance frameworks (HIPAA, GDPR, ISO 27001), and AI labs that need full visibility into the inference path. vLLM (Apache-2.0) and Open WebUI (MIT) are open source, so the stack is fully auditable with no vendor lock-in.

What this VM image adds

Security hardening:

Random 32-byte API key generated at first boot - written to /root/vllm-credentials.txt, never baked into the image; the same key is injected into Open WebUI so the chat UI authenticates to vLLM transparently
vLLM bound to 127.0.0.1:8000 - reachable only through Nginx with TLS; Bearer-token auth (--api-key) is enforced on every /v1/* request
Open WebUI bound to 127.0.0.1:8080 - reachable only through Nginx with TLS
First registered user in Open WebUI becomes the workspace administrator - no admin baked in
Nginx reverse proxy - self-signed TLS, HTTP-to-HTTPS redirect, WebSocket upgrade for streaming chat, security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy)
Loading splash page - served while the model warms up on first request
Anonymous telemetry disabled - via the VLLM_NO_USAGE_STATS, DO_NOT_TRACK and ANONYMIZED_TELEMETRY environment variables
UFW firewall - only TCP 22, 80, 443 exposed; 8000 and 8080 explicitly denied
fail2ban - SSH brute-force protection
AppArmor - mandatory access control
Trivy CVE scan - every image is scanned for vulnerabilities before release
Trivy secret scan - blocks any image that ships with leaked credentials
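
The listing does not show how the first-boot key is produced. A minimal sketch of the idea, assuming Python's secrets module as the randomness source and a hypothetical API_KEY=... line format for the credentials file: 32 random bytes become a 64-character hex string, and the file is created exclusively with owner-only permissions.

```python
import os
import secrets
import tempfile


def generate_api_key(num_bytes: int = 32) -> str:
    """Return num_bytes from the OS CSPRNG, hex-encoded (64 chars for 32 bytes)."""
    return secrets.token_hex(num_bytes)


def write_credentials(path: str, api_key: str) -> None:
    """Create `path` with mode 0600 and write the key into it.

    O_EXCL makes creation fail if the file already exists, so a key written on
    first boot is never silently overwritten on a later boot.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        os.fchmod(f.fileno(), 0o600)  # enforce 0600 regardless of the umask
        f.write(f"API_KEY={api_key}\n")  # hypothetical line format


if __name__ == "__main__":
    # Demo in a throwaway directory; the image itself uses /root/vllm-credentials.txt.
    demo_path = os.path.join(tempfile.mkdtemp(), "vllm-credentials.txt")
    write_credentials(demo_path, generate_api_key())
    print(oct(os.stat(demo_path).st_mode & 0o777))  # 0o600
```

Generating the key at first boot rather than at image build time is what keeps it out of the published image, which is also what the Trivy secret scan checks for.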

OS hardening (CIS Level 1):

CIS Level 1 - CIS Ubuntu 24.04 LTS Benchmark, Level 1 profile, applied with ansible-lockdown
auditd - system call auditing for critical paths
SSH hardening - PasswordAuthentication disabled, key-only access
Kernel hardening - SYN cookies, ASLR, rp_filter reverse-path filtering; TCP BBR congestion control is also enabled
/tmp as tmpfs - nosuid, nodev, noexec
Azure platform endpoints - egress rules pre-configured for the Instance Metadata Service (169.254.169.254) and the Azure WireServer (168.63.129.16)
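
The kernel and /tmp settings above can be spot-checked from inside the VM. A minimal sketch, assuming the expected values below (common CIS Level 1 recommendations plus BBR); the authoritative values are whatever the tailored profile at /usr/share/doc/lynxroute/CIS_TAILORED_PROFILE.md specifies. Sysctl values are read from /proc/sys and mount options from /proc/mounts.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Set, Tuple

# Assumed expected values; verify against the shipped tailored CIS profile.
EXPECTED_SYSCTLS = {
    "net.ipv4.tcp_syncookies": "1",
    "kernel.randomize_va_space": "2",
    "net.ipv4.conf.all.rp_filter": "1",
    "net.ipv4.conf.default.rp_filter": "1",
    "net.ipv4.tcp_congestion_control": "bbr",
}
REQUIRED_TMP_OPTIONS = {"nosuid", "nodev", "noexec"}


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_proc_sysctl(name: str) -> Optional[str]:
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:  # key absent (e.g. inside some containers)
        return None


def check_sysctls(read: Callable[[str], Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, actual)} for every mismatching key."""
    mismatches = {}
    for name, expected in EXPECTED_SYSCTLS.items():
        actual = read(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


def tmp_mount_problems(mounts_text: str) -> Set[str]:
    """Check /tmp in a /proc/mounts-style listing; empty set means compliant."""
    fstype, options = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/tmp":
            fstype, options = fields[2], set(fields[3].split(","))  # last mount wins
    if options is None:
        return {"/tmp is not a separate mount"}
    problems = {f"missing {opt}" for opt in REQUIRED_TMP_OPTIONS - options}
    if fstype != "tmpfs":
        problems.add(f"fstype is {fstype}, not tmpfs")
    return problems


if __name__ == "__main__":
    print("sysctl mismatches:", check_sysctls(read_proc_sysctl))
    try:
        print("/tmp problems:", tmp_mount_problems(Path("/proc/mounts").read_text()))
    except OSError as exc:  # e.g. not running on Linux
        print("cannot read /proc/mounts:", exc)
```

Passing the reader in as a function keeps the comparison logic testable with canned values instead of a live /proc.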

Compliance artifacts (inside the VM):

SBOM - CycloneDX 1.6 at /etc/lynxroute/sbom.json
CIS Conformance Report - OpenSCAP HTML at /etc/lynxroute/cis-report.html
Tailored CIS profile - /usr/share/doc/lynxroute/CIS_TAILORED_PROFILE.md
Credentials file - /root/vllm-credentials.txt with the API key and connection details
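
The SBOM is plain CycloneDX JSON, so it can be queried without extra tooling. A minimal sketch that loads /etc/lynxroute/sbom.json, counts top-level components by their `type` field, and looks up one package by name; the "openssl" lookup is only an example, and nested sub-components are not walked.

```python
import json
from collections import Counter
from typing import List, Optional, Tuple


def load_sbom(path: str = "/etc/lynxroute/sbom.json") -> dict:
    with open(path, encoding="utf-8") as f:
        bom = json.load(f)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX JSON document")
    return bom


def summarize_components(bom: dict) -> Counter:
    """Count top-level components by CycloneDX component type."""
    return Counter(c.get("type", "unknown") for c in bom.get("components", []))


def find_component(bom: dict, name: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (name, version, purl) for each top-level component with this name."""
    return [
        (c.get("name"), c.get("version"), c.get("purl"))
        for c in bom.get("components", [])
        if c.get("name") == name
    ]


if __name__ == "__main__":
    try:
        bom = load_sbom()
    except OSError:
        print("SBOM not readable here; run this on the VM")
    else:
        print("spec version:", bom.get("specVersion"))
        print(summarize_components(bom).most_common())
        print(find_component(bom, "openssl"))  # example lookup
```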

Quick Start

Deploy VM (Standard_D4s_v3 recommended; minimum Standard_D2s_v3 with 8 GB RAM)
Open NSG: allow TCP 443 only from YOUR IP until you have registered (the first account to sign up becomes the administrator); allow SSH 22 only from your management IPs
SSH: ssh -i key.pem <username>@<PUBLIC_IP> (default user: azureuser)
Read connection details: sudo cat /root/vllm-credentials.txt
Open https://<PUBLIC_IP>/, accept the self-signed certificate, click "Sign up" - the first registered user becomes the workspace administrator

OpenAI API direct: curl https://<PUBLIC_IP>/v1/models -H "Authorization: Bearer <API_KEY>" -k. The -k flag skips certificate verification and is needed only while the self-signed certificate is in place; replace it with a CA-signed certificate for production.
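
The same /v1/models call from Python, as a sketch using only the standard library. The insecure context mirrors curl's -k; the strict context verifies against the system trust store or against a certificate file you supply (for example a local copy of the VM's self-signed certificate, which only passes hostname checking if its subject alternative names cover the address you connect to). The VLLM_BASE_URL and VLLM_API_KEY environment variable names are hypothetical.

```python
import json
import os
import ssl
import urllib.request
from typing import List, Optional


def make_tls_context(cafile: Optional[str] = None,
                     insecure: bool = False) -> ssl.SSLContext:
    """Return an SSL context for talking to the VM's Nginx endpoint."""
    ctx = ssl.create_default_context(cafile=cafile)
    if insecure:
        # Equivalent of curl -k: no certificate or hostname verification.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def list_models(base_url: str, api_key: str, ctx: ssl.SSLContext) -> List[str]:
    """GET {base_url}/models and return the model ids it reports."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        payload = json.load(resp)
    # OpenAI-style list response: {"object": "list", "data": [{"id": ...}, ...]}
    return [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__":
    base = os.environ.get("VLLM_BASE_URL")  # hypothetical, e.g. https://<PUBLIC_IP>/v1
    key = os.environ.get("VLLM_API_KEY")    # hypothetical
    if base and key:
        print(list_models(base, key, make_tls_context(insecure=True)))
    else:
        print("set VLLM_BASE_URL and VLLM_API_KEY to run against the VM")
```

Once a CA-signed certificate is installed, drop insecure=True and the default context verifies the server like any other HTTPS endpoint.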

vLLM with WebUI - Hardened Self-Hosted LLM Server

by Lynxroute