Trino - Hardened Distributed SQL Query Engine
by Lynxroute
Trino 481 - CIS Level 1 hardened distributed SQL query engine on Ubuntu 24.04 LTS
What is Trino
Trino is a fast, distributed SQL query engine for federated analytics, maintained by the Trino Software Foundation. In this image it runs as a single Java (JVM) process that acts as both coordinator and worker, so one node is a complete cluster. Trino speaks ANSI SQL, plans queries with a cost-based optimiser, and executes them in parallel through pluggable connectors, so a single query can join data that lives in object storage, relational databases, data lakes and streams without copying or ETL. The full distribution ships 40+ connectors (object storage, PostgreSQL, MySQL, Iceberg, Delta Lake, Hive, Kafka and more). Trino is stateless: it stores no data of its own and queries external sources directly. This image bundles the tpch (benchmark) and memory connectors so you can validate it immediately. Trino is released under the Apache-2.0 license, with no vendor lock-in.
Why self-host Trino
Running Trino on a VM you control keeps your query traffic and the credentials to every connected data source inside your own tenant rather than in a managed query service. Self-hosting suits teams with data residency requirements, organisations under GDPR, HIPAA or ISO 27001, and any analytics workload where the data and its access keys must stay within your own perimeter, with no per-query fees.
What this VM image adds
Security hardening:
- Unique administrator password generated per instance at first launch - no default credential - stored in /root/trino-credentials.txt (mode 0600)
- File-based password authentication - CLI, JDBC and BI clients authenticate over TLS with the generated credential; the query engine never accepts anonymous access
- Trino bound behind the firewall - the HTTP query port (8080) is reachable only from the instance itself; nginx terminates TLS on port 443 and proxies the Web UI and the REST/JDBC endpoint
- Unique internal cluster shared secret generated per instance at first boot - internal communication is never left on a default key
- JVM heap and query memory auto-sized to the instance RAM at first boot - avoids out-of-memory failures on first launch
- Self-signed TLS certificate generated at first launch and replaceable with your own CA-signed certificate (certbot is pre-installed)
- UFW firewall - TCP 443 open externally for client access, TCP 22 for SSH; all other inbound traffic dropped
- fail2ban - SSH brute-force protection
- AppArmor - mandatory access control
- CVE scan - every image is scanned with Trivy before release
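The memory auto-sizing item above can be illustrated with a small sketch. The image's actual formula is not documented here, so the 70% heap fraction, the 50% query fraction and the function name size_trino_memory are assumptions for illustration only; -Xmx, query.max-memory-per-node and query.max-memory are the standard JVM flag and Trino properties that such a step would typically set.

```python
def size_trino_memory(total_ram_mib: int,
                      heap_fraction: float = 0.70,
                      query_fraction: float = 0.50) -> dict:
    """Derive illustrative JVM and query memory settings from instance RAM.

    Both fractions are assumptions for this sketch, not the values the
    image uses. The query limit is kept well below the heap so that part
    of the heap stays free for non-query allocations.
    """
    if total_ram_mib <= 0:
        raise ValueError("total_ram_mib must be positive")
    if not (0 < heap_fraction < 1 and 0 < query_fraction < 1):
        raise ValueError("fractions must be between 0 and 1")

    heap_mib = int(total_ram_mib * heap_fraction)
    query_mib = int(heap_mib * query_fraction)
    return {
        # Line for jvm.config
        "jvm_flag": f"-Xmx{heap_mib}M",
        # Lines for config.properties; on a single node both limits match
        "query.max-memory-per-node": f"{query_mib}MB",
        "query.max-memory": f"{query_mib}MB",
    }
```

For a 16 GiB instance such as Standard_D4s_v3, size_trino_memory(16384) yields a heap of roughly 11 GiB under these assumed fractions.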
OS hardening (CIS Level 1):
- CIS Ubuntu 24.04 LTS Level 1 Benchmark via ansible-lockdown
- auditd for system call auditing of critical paths
- SSH hardening - PasswordAuthentication disabled, key-only access, PermitRootLogin no, LoginGraceTime 60
- Kernel hardening - SYN cookies, ASLR, rp_filter, kexec disabled, IPv6 off
- /tmp as tmpfs with nosuid, nodev, noexec
Compliance artifacts (inside the VM):
- SBOM - CycloneDX 1.6 at /etc/lynxroute/sbom.json with Trino pinned by version, PURL, Apache-2.0 license, supplier, and hash
- CIS Conformance Report at /etc/lynxroute/cis-report.html (OpenSCAP, Azure tailoring profile, 0 FAIL rules)
- Tailored CIS profile at /usr/share/doc/lynxroute/CIS_TAILORED_PROFILE.md
- Operator credentials file at /root/trino-credentials.txt (mode 0600) with the admin username and password and the Trino Web UI HTTPS URL
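To check the SBOM listed above, a short script can print the fields it is described as carrying. This is a minimal sketch that assumes Trino appears in the top-level components array of the CycloneDX JSON; the function names load_bom and summarize_components are hypothetical, and reading the file may require sudo depending on its permissions.

```python
import json
from pathlib import Path


def load_bom(path: str = "/etc/lynxroute/sbom.json") -> dict:
    """Parse the CycloneDX JSON document at the given path."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_components(bom: dict) -> list[dict]:
    """Return name, version, PURL, supplier, licenses and hashes per component."""
    rows = []
    for component in bom.get("components", []):
        licenses = []
        for entry in component.get("licenses", []):
            # CycloneDX allows either a license object or an SPDX expression.
            license_obj = entry.get("license", {})
            value = (license_obj.get("id")
                     or license_obj.get("name")
                     or entry.get("expression"))
            if value:
                licenses.append(value)
        rows.append({
            "name": component.get("name"),
            "version": component.get("version"),
            "purl": component.get("purl"),
            "supplier": component.get("supplier", {}).get("name"),
            "licenses": licenses,
            # Map each hash algorithm to its hex digest
            "hashes": {h.get("alg"): h.get("content")
                       for h in component.get("hashes", [])},
        })
    return rows
```

On the VM, summarize_components(load_bom()) returns one dictionary per component, which can be filtered by name to find the Trino entry.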
Quick Start
- Deploy VM from Azure Marketplace (Standard_D4s_v3 recommended for the JVM heap)
- Open NSG: TCP 443 from your trusted sources, TCP 22 from your management IPs only
- SSH: ssh -i key.pem azureuser@<PUBLIC_IP>, then sudo cat /root/trino-credentials.txt for the admin password
- Open https://<PUBLIC_IP>/ui/ in your browser, accept the self-signed certificate warning, and log in as admin with the password from the credentials file
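- Connect a SQL client over TLS: trino --server https://<PUBLIC_IP> --user admin --password, or JDBC jdbc:trino://<PUBLIC_IP>:443?SSL=true. While the self-signed certificate is in place, clients reject it by default: add --insecure to the CLI or SSLVerification=NONE to the JDBC URL for initial testing only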
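The last step can also be scripted against the REST endpoint that nginx proxies. The sketch below uses only the Python standard library and follows the Trino client protocol: POST the SQL text to /v1/statement with HTTP Basic credentials, then follow nextUri until it is absent. The function names are hypothetical, and it assumes the proxy returns nextUri values that are reachable from the client. The insecure flag disables certificate verification and is meant only for the self-signed certificate during initial testing.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_statement_request(base_url: str, user: str, password: str,
                            sql: str) -> urllib.request.Request:
    """Create the initial POST that submits a SQL statement."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/statement",
        data=sql.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": basic_auth_header(user, password),
            "X-Trino-User": user,
        },
    )


def run_query(base_url: str, user: str, password: str, sql: str,
              insecure: bool = False) -> list:
    """Submit a query and collect all result rows."""
    context = ssl.create_default_context()
    if insecure:
        # Accept the self-signed certificate; do not use in production.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    request = build_statement_request(base_url, user, password, sql)
    rows = []
    while request is not None:
        with urllib.request.urlopen(request, context=context, timeout=30) as resp:
            page = json.load(resp)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        # Each follow-up is a GET carrying the same credentials.
        request = (
            urllib.request.Request(
                next_uri,
                headers={"Authorization": basic_auth_header(user, password)},
            )
            if next_uri else None
        )
    return rows
```

A first smoke test against the bundled tpch catalog could be run_query("https://<PUBLIC_IP>", "admin", password, "SELECT name FROM tpch.tiny.nation", insecure=True), with the password taken from the credentials file.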
Trino runs as a single-node coordinator and worker; the HTTP query port 8080 stays bound to the instance and nginx is the TLS perimeter on port 443 - do not expose 8080 directly. Bundled catalogs are tpch and memory; add your own under /etc/trino/catalog/ to query object storage, PostgreSQL, Iceberg, Delta Lake, Kafka and more. Replace the self-signed certificate with a CA-signed one for production.
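Adding a catalog means placing a properties file in /etc/trino/catalog/, where the file name (minus .properties) becomes the catalog name. As a hedged illustration, the sketch below renders such a file for the PostgreSQL connector. The host, database, user and password are placeholders, the helper name render_catalog_properties is hypothetical, and it assumes the PostgreSQL connector plugin is present in the image.

```python
def render_catalog_properties(properties: dict[str, str]) -> str:
    """Render key=value lines in the format Trino catalog files use.

    Values are written as-is; this sketch does not escape special
    characters, so keep values to plain ASCII without line breaks.
    """
    for key, value in properties.items():
        if "\n" in key or "\n" in value:
            raise ValueError(f"line break not allowed in property {key!r}")
    return "".join(f"{key}={value}\n" for key, value in properties.items())


# Placeholder values: replace with your own database endpoint and credentials.
postgres_catalog = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://db.example.internal:5432/analytics",
    "connection-user": "trino_reader",
    "connection-password": "CHANGE_ME",
}

catalog_text = render_catalog_properties(postgres_catalog)
```

Writing catalog_text to /etc/trino/catalog/pg.properties and restarting Trino would expose the database as a catalog named pg. Because the file holds a database password, restrict its permissions in the same way as the credentials file.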