PySpark

by kCloudHub LLC

(1 rating)

Version 4.1.1 + Free Support on Ubuntu 24.04

PySpark is an open-source Python API for Apache Spark that enables large-scale data processing and distributed computing. It allows developers and data engineers to use Python to work with big data efficiently across clusters.

Key Features of PySpark:

  • Open-source distributed data processing framework with Python support.
  • Scalable processing of large datasets across clusters.
  • Built-in libraries for SQL queries, machine learning, and streaming.
  • High-performance in-memory data processing.
  • Integration with data sources such as Hadoop, cloud storage, and databases.
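As a rough illustration of the SQL and in-memory processing features listed above, here is a minimal sketch that builds a small DataFrame and queries it with Spark SQL. It assumes a local Spark runtime (including Java) is available in the environment. The function name `run_demo`, the app name, the view name `people`, and the sample rows are all hypothetical and not part of this listing.

```python
def run_demo():
    """Create a tiny DataFrame, query it with Spark SQL, and return matching names."""
    # Imported inside the function so this file still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    # Local mode: runs Spark on this machine only, with no cluster required.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("listing-demo")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # Hypothetical sample data, held in memory.
        df = spark.createDataFrame(
            [("alice", 34), ("bob", 45), ("carol", 29)],
            ["name", "age"],
        )
        # Register the DataFrame so it can be queried by name from SQL.
        df.createOrReplaceTempView("people")
        rows = spark.sql(
            "SELECT name FROM people WHERE age > 30 ORDER BY name"
        ).collect()
        return [row["name"] for row in rows]
    finally:
        # Always release the Spark session, even if the query fails.
        spark.stop()
```

Calling `run_demo()` from the activated environment should return the names of the sample people older than 30. Pointing `.master(...)` at a cluster URL instead of `local[*]` is how the same code would be run on a cluster.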

PySpark Usage:

$ sudo su
$ cd /opt/pyspark
$ source pyspark-env/bin/activate
$ python3 -c "import pyspark; print(pyspark.__version__)"
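
The version check above can also be done from a script. The sketch below uses the standard library's `importlib.metadata`, so it reports a missing installation instead of raising an import error. The function name `installed_pyspark_version` is hypothetical, and the script is assumed to run inside the activated `pyspark-env` environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pyspark_version():
    """Return the installed PySpark version string, or None if it is not installed."""
    try:
        return version("pyspark")
    except PackageNotFoundError:
        return None


found = installed_pyspark_version()
print(found if found else "pyspark is not installed in this environment")
```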
  

Disclaimer:
PySpark is the Python API of Apache Spark, an independent open-source project maintained by the Apache Software Foundation.