
Arrow

by kCloudHub LLC

Version 24.0.0 + Free Support on Ubuntu 24.04

Apache Arrow 24.0.0 is a high-performance, open-source columnar memory format and multi-language data processing framework designed for fast data interchange, in-memory analytics, and efficient data storage workflows.

The solution supports common data engineering and analytics workflows including columnar data processing, Parquet file handling, data interchange between systems, Python-based analytics, and integration with tools such as pandas, NumPy, Spark, and other big data platforms. It is ideal for developers, data engineers, and analytics teams working with high-performance data pipelines on Azure.

Version: Apache Arrow / PyArrow 24.0.0

Features of Apache Arrow:

  • High-performance columnar memory format for analytics workloads.
  • Fast data interchange between programming languages and systems.
  • Python support through PyArrow for data processing and analytics.
  • Apache Parquet read and write support.
  • Integration with pandas, NumPy, and other data science tools.
  • Suitable for big data, machine learning, ETL, and in-memory analytics workflows.
  • No web dashboard or default application port required.

Usage instructions for Apache Arrow:
$ sudo su
$ cd /opt
$ python3 -m venv arrow-env
$ source /opt/arrow-env/bin/activate
$ python3 -m pip install --upgrade pip setuptools wheel
$ pip install "pyarrow==24.0.*" pandas
$ python3 -c "import pyarrow as pa; print(pa.__version__)"

Expected output (a later 24.0.x patch release may be shown instead, since the install command pins 24.0.*):
24.0.0

Testing Apache Arrow installation:
$ cat > /opt/test_arrow.py <<'EOF'
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small in-memory Arrow table.
table = pa.table({"id": [1, 2, 3], "name": ["azure", "apache", "arrow"]})

# Write the table to a Parquet file, then read it back.
pq.write_table(table, "/opt/arrow_test.parquet")
result = pq.read_table("/opt/arrow_test.parquet")

# Print the installed version and the table that was read.
print("Apache Arrow Version:", pa.__version__)
print(result)
print("Apache Arrow test completed successfully.")
EOF
$ python3 /opt/test_arrow.py

Access information:
Apache Arrow is a library-based solution and does not provide a web dashboard.
No browser URL is required.
No application port is required.

Required Azure inbound port:
SSH Port: 22

Disclaimer: Apache Arrow is provided “as is” under applicable open-source licenses. Users are responsible for proper installation, configuration, validation of data workflows, and secure handling of application data. This solution is best suited for data engineering, analytics, ETL, Parquet processing, and high-performance in-memory data workflows in development and production environments.