Deequ with Apache Spark
by bCloud LLC
Version 1.2.0 + Free Support on Ubuntu 24.04
Deequ is an open-source data quality library developed by Amazon that runs on top of Apache Spark. It lets organizations define, automate, and enforce data quality checks on large datasets, using Spark’s distributed computing to scale the analysis. Deequ helps ensure the reliability, consistency, and completeness of data in production pipelines.
Features of Deequ with Apache Spark:
- Data quality framework: Enables definition of constraints and metrics such as completeness, uniqueness, and consistency to validate data quality.
- Distributed computation: Utilizes Apache Spark’s distributed processing for scalable verification of large datasets.
- Historical tracking: Supports storing verification results over time to monitor trends in data quality.
- Declarative API: Allows users to define checks and analysis in a concise and readable manner, while integrating seamlessly into Spark pipelines.
- Environment: Runs on Ubuntu 24.04, with a Python virtual environment for isolated PyDeequ package management.
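To make the completeness and uniqueness metrics named above concrete, here is a minimal plain-Python sketch of what they measure. It is not Deequ's implementation (Deequ computes these as Spark aggregations over a DataFrame). The function names and sample values are hypothetical, and the definitions assume Deequ's usual meaning: completeness is the fraction of non-null values, and uniqueness is the fraction of values that occur exactly once.

```python
from collections import Counter


def completeness(values):
    """Fraction of entries that are not None (assumes a non-empty list)."""
    if not values:
        raise ValueError("completeness is undefined for an empty column")
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Fraction of entries whose value occurs exactly once (assumes a non-empty list)."""
    if not values:
        raise ValueError("uniqueness is undefined for an empty column")
    counts = Counter(values)
    return sum(1 for c in counts.values() if c == 1) / len(values)


# Hypothetical sample columns, for illustration only.
emails = ["a@example.com", "b@example.com", None, "c@example.com"]
order_ids = [101, 102, 103, 103]

print(completeness(emails))   # 3 of 4 entries are non-null
print(uniqueness(order_ids))  # 101 and 102 occur once; 103 is repeated
```

A check such as "the column is complete" then amounts to asserting that the metric equals 1.0, which is the kind of constraint the framework evaluates at scale.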
Deequ Usage with PySpark:
$ sudo su
$ source /opt/pydeequ-venv/bin/activate
$ /opt/spark/bin/pyspark --packages com.amazon.deequ:deequ:2.0.7-spark-3.5
Once Spark starts successfully, test the installation from the PySpark prompt:
>>> import pydeequ
>>> print(pydeequ.__version__)
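Beyond the version check, a small verification run shows the declarative API in use. The following is a sketch under stated assumptions, not a tested recipe for this image: it assumes the `spark` session that the PySpark shell provides, the sample rows and the `run_checks` helper are hypothetical, and recent PyDeequ releases are understood to read a `SPARK_VERSION` environment variable at import time, so the value `"3.5"` is an assumption chosen to match the Deequ package coordinate used above. The PyDeequ imports sit inside the helper so that the definitions load even where Spark is unavailable.

```python
import os

# Assumption: PyDeequ looks up SPARK_VERSION when it is imported.
# "3.5" is chosen to match the deequ:2.0.7-spark-3.5 package.
os.environ.setdefault("SPARK_VERSION", "3.5")

# Hypothetical sample data: (id, name, price). One name is missing on purpose.
SAMPLE_ROWS = [
    (1, "widget", 9.99),
    (2, "gadget", 24.50),
    (3, None, 5.00),
]


def run_checks(spark, rows=SAMPLE_ROWS):
    """Run a few Deequ constraints and return the results as a DataFrame."""
    # Imported here so this module can be loaded without a running JVM.
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationResult, VerificationSuite

    df = spark.createDataFrame(rows, schema="id INT, name STRING, price DOUBLE")

    check = (
        Check(spark, CheckLevel.Warning, "sample checks")
        .hasSize(lambda n: n >= 3)  # at least three rows
        .isComplete("id")           # no null ids
        .isUnique("id")             # no repeated ids
        .isComplete("name")         # expected to fail: one name is None
        .isNonNegative("price")     # no negative prices
    )

    result = VerificationSuite(spark).onData(df).addCheck(check).run()
    return VerificationResult.checkResultsAsDataFrame(spark, result)
```

From the PySpark prompt, `run_checks(spark).show(truncate=False)` would list one row per constraint with its status. The `isComplete("name")` constraint is included deliberately so that the output contains a failing case.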
Disclaimer: Deequ is an independent open-source project developed by Amazon and is not affiliated with, endorsed by, or sponsored by Apache Spark or any Linux distribution.