Synthetic Data Generator
by Pacific Data Integrators
Synthetic Data Generator (SDG) creates realistic, privacy-safe test datasets from Oracle, MySQL, PostgreSQL, SQL Server, and SFTP sources.
Synthetic Data Generator (SDG) is a self-hosted web application that produces realistic, privacy-safe synthetic datasets from your own production data, without moving that data outside your network. Point SDG at an Oracle, MySQL, PostgreSQL, SQL Server, or SFTP source; it profiles the real data, auto-detects columns containing PII (names, emails, phones, SSNs, and custom patterns you define), and generates synthetic output that preserves statistical distributions, data types, and key relationships.
Key capabilities:
1. Broad source coverage: Oracle, MySQL, PostgreSQL, SQL Server, and SFTP (.csv, .xml, .json, .dat). Configure a connection once in the UI; SDG introspects schemas, tables, and columns automatically.
2. Automated PII detection: Microsoft Presidio engine plus extensible custom rules. Catches the usual identifiers (names, emails, phones, SSNs, credit cards) out of the box, and lets your team add domain-specific patterns such as policy numbers, account IDs, and tax IDs so nothing sensitive slips through untagged.
3. Faithful synthesis: Preserves column distributions, column-pair correlations, and primary/foreign-key relationships. Downstream tests behave the way they would against production data, without any of the exposure.
- Runs in your account: Deployed from a single AMI into your VPC; data never leaves your AWS boundary and no third-party service sees it.
- Web UI: Angular frontend for configuring connections, browsing schemas, selecting tables, reviewing detected PII, and triggering generation. Generated data can be written back to a target database or downloaded as files.
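To make the idea of domain-specific custom patterns concrete, here is a minimal sketch of how a column might be tagged when most of its sampled values match a rule. It uses only Python's standard library and is not SDG's implementation or the Presidio API; the rule names, regular expressions, the `tag_columns` helper, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical custom rules: names and regexes are illustrative only,
# not patterns shipped with SDG or Presidio.
CUSTOM_RULES = {
    "POLICY_NUMBER": re.compile(r"POL-\d{8}"),
    "ACCOUNT_ID": re.compile(r"ACCT\d{10}"),
}


def tag_columns(rows, rules=CUSTOM_RULES, threshold=0.8):
    """Return {column: rule_name} for columns whose non-empty sampled
    values match a rule at least `threshold` of the time."""
    tags = {}
    if not rows:
        return tags
    for column in rows[0]:
        # Ignore missing and empty values so sparse columns are not penalized.
        values = [str(r[column]) for r in rows if r.get(column) not in (None, "")]
        if not values:
            continue
        for name, pattern in rules.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits / len(values) >= threshold:
                tags[column] = name
                break  # first matching rule wins in this sketch
    return tags


sample = [
    {"id": 1, "policy": "POL-00012345", "note": "renewal due"},
    {"id": 2, "policy": "POL-00067890", "note": "call customer"},
    {"id": 3, "policy": "POL-00011111", "note": ""},
]
print(tag_columns(sample))  # {'policy': 'POLICY_NUMBER'}
```

Matching on a share of sampled values rather than requiring every value to match is one plausible way to tolerate dirty data; a real deployment would tune the threshold and patterns to its own identifiers.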
Who it's for: Data, QA, and platform teams who need production-realistic datasets for development, testing, training, or demos but can't legally or safely use real customer data. SDG is especially useful in regulated industries (finance, insurance, healthcare) where data-sharing boundaries and PII handling are audited.
What you get: A hardened Ubuntu 24.04 AMI with SDG preinstalled (gunicorn + nginx), CloudWatch-ready logging, and a built-in 7-day trial license. Full licenses are issued through PDI's sales team.
At a glance