Getting Started with Zalando Direct Data Sharing

Overview

Zalando Direct Data Sharing enables Partners to access their Zalando data directly using the Delta Sharing protocol. With Direct Data Sharing, you can work with up-to-date data while maintaining security and performance.

Key Benefits

  • Instant Access: Access shared data programmatically, then process and store it in your own data platform, reporting, or Business Intelligence (BI) solution.
  • Minimized Manual Work: Eliminate the need to download reports manually from Zalando portals.
  • Flexible Compatibility: Direct Data Sharing makes it easy to access data whether you're using SQL, Python, or BI tools.
  • Efficient Data Handling: Enable deduplication and delta updates to simplify data ingestion and processing. Zalando provides a created_at column for every shared dataset, which lets you identify the version of the data available.
  • Secure and Controlled: Access is granted through Zalando's merchant center and covers the Direct Data Sharing credentials and all data shared under the partner's Zalando account.
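
The deduplication idea can be sketched as follows. This is a minimal illustration, assuming pandas and a hypothetical key column named order_id; the actual key columns depend on the dataset, and only created_at comes from this guide. For each key, it keeps the row with the most recent created_at value.

```python
import pandas as pd


def latest_rows(df: pd.DataFrame, key_columns: list[str]) -> pd.DataFrame:
    """Keep, for each key, only the row with the most recent created_at."""
    return (
        df.sort_values("created_at")
        .drop_duplicates(subset=key_columns, keep="last")
        .reset_index(drop=True)
    )


# Illustrative data only: order_id and status are hypothetical column names.
sample = pd.DataFrame(
    {
        "order_id": [1, 1, 2],
        "status": ["open", "shipped", "open"],
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    }
)

deduplicated = latest_rows(sample, ["order_id"])
print(deduplicated)
```

The same function could be applied to a DataFrame loaded from a shared dataset, once you know which columns identify a record.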

Prerequisites

Before you begin, ensure you have:

  1. A zdirect application with Direct Data Sharing API permissions

  2. Your application credentials:

    • Client ID
    • Client Secret

Note

For more information on configuring client applications with scopes and merchants, see Creating and Managing Apps.

Quick Start

The code snippets below use Python 3.11+ with the following packages:

  • delta-sharing
  • pandas or pyspark

Note

The auth_token_url and dds_url in the example snippets below use the sandbox endpoints. When accessing your actual Zalando data, replace the sandbox URLs with:

  • Authentication URL: https://api.merchants.zalando.com/auth/token
  • Direct Data Sharing Token URL: https://api.merchants.zalando.com/dds-tokens
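
One way to avoid editing URLs by hand is to keep both endpoint sets in one place and select them by name. The sketch below is only an illustration: the URLs are the ones given in this guide, while the "sandbox" and "production" labels and the get_endpoints helper are hypothetical names.

```python
# URLs are taken from this guide; the environment labels and the helper
# name are illustrative assumptions, not part of any Zalando API.
ENDPOINTS = {
    "sandbox": {
        "auth_token_url": "https://api-sandbox.merchants.zalando.com/auth/token",
        "dds_url": "https://api-sandbox.merchants.zalando.com/dds-tokens",
    },
    "production": {
        "auth_token_url": "https://api.merchants.zalando.com/auth/token",
        "dds_url": "https://api.merchants.zalando.com/dds-tokens",
    },
}


def get_endpoints(environment: str) -> dict[str, str]:
    """Return the endpoint URLs for the given environment label."""
    try:
        return ENDPOINTS[environment]
    except KeyError:
        valid = ", ".join(sorted(ENDPOINTS))
        raise ValueError(f"Unknown environment {environment!r}; use one of: {valid}") from None


endpoints = get_endpoints("production")
auth_token_url = endpoints["auth_token_url"]
dds_url = endpoints["dds_url"]
print(auth_token_url, dds_url)
```

Raising on an unknown label keeps a typo from silently falling back to the wrong environment.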

Step 1: Get Your Initial Access Token

Retrieve your Direct Data Sharing access credentials (.share file) to connect to the Delta Sharing service.

import requests

# Configuration - replace with your actual credentials
auth_token_url = 'https://api-sandbox.merchants.zalando.com/auth/token'
dds_url = 'https://api-sandbox.merchants.zalando.com/dds-tokens'
client_id = "<your-client-id>"
client_secret = "<your-client-secret>"

# Get the zdirect authentication token
auth_response = requests.post(
    auth_token_url,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "access_token_only"
    }
)

# Fail early on HTTP errors instead of continuing without a token
auth_response.raise_for_status()
access_token = auth_response.json()["access_token"]
print("Access token retrieved successfully.")

# Request the Direct Data Sharing token (first-time setup)
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json"
}

# Call the dds-tokens endpoint with GET
dds_response = requests.get(dds_url, headers=headers)

# Raise on HTTP errors, otherwise print the response
dds_response.raise_for_status()
response = dds_response.json()
print(f"Direct Data Sharing Token Response: {response}")

The response contains the token activation link, from which you can download the .share file.

Caution

Security Best Practices:

  • Never share your .share file via email or chat.
  • Tokens expire after 90 days, so set up automated rotation (see Token Management).

Please refer to the Direct Data Sharing Best Practices Guide for more information on securing the share file.
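
As a starting point for rotation, a scheduled job could check how long the current token remains valid. The sketch below assumes the .share file follows the standard Delta Sharing profile format, a JSON document with an optional expirationTime field; whether Zalando's file includes that field is an assumption to verify against your own file. The function name and the demo values are hypothetical.

```python
import json
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def days_until_expiry(profile_path, now=None):
    """Return the days left before the profile expires, or None if no expiry is set."""
    profile = json.loads(Path(profile_path).read_text(encoding="utf-8"))
    raw = profile.get("expirationTime")  # assumed optional, per the profile format
    if raw is None:
        return None
    # Normalize e.g. "2025-04-01T00:00:00.0Z": drop fractional seconds, map Z to +00:00
    normalized = re.sub(r"\.\d+", "", raw).replace("Z", "+00:00")
    expires_at = datetime.fromisoformat(normalized)
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).total_seconds() / 86400


# Demo with a throwaway profile; every value here is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo_profile = Path(tmp) / "demo.share"
    demo_profile.write_text(
        json.dumps(
            {
                "shareCredentialsVersion": 1,
                "endpoint": "https://example.invalid/delta-sharing/",
                "bearerToken": "<placeholder>",
                "expirationTime": "2025-04-01T00:00:00.0Z",
            }
        ),
        encoding="utf-8",
    )
    remaining = days_until_expiry(
        demo_profile, now=datetime(2025, 3, 22, tzinfo=timezone.utc)
    )

print(f"Days until expiry: {remaining:.0f}")
```

In a real job you would pass the path of your own .share file, omit the now argument, and trigger rotation once the result drops below a threshold of your choosing.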

Step 2: List Available Datasets

Discover what data is available to you. The example below uses the delta-sharing Python client to list the available datasets.

import delta_sharing

# Path to your downloaded credentials file
credentials_file = "path/to/credentials.share"

# Create a Delta Sharing client
client = delta_sharing.SharingClient(credentials_file)

# List all available datasets
datasets = client.list_all_tables()

print("Available datasets:")
for dataset in datasets:
    print(f"  - {dataset.share}.{dataset.schema}.{dataset.name}")

Example Output:

Available datasets:
  - cxm_country_assessment_share.direct_data_sharing.cxm_country_assessment
  - return_order_in_transit_snapshot_share.direct_data_sharing.return_order_in_transit_snapshot
  - sales_performance.direct_data_sharing.sales_performance_kpi_pp
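
Each listed name has the form share.schema.table. To read a table, delta-sharing expects a table URL made of the credentials file path, a "#" separator, and that three-part name. A small helper can assemble it; build_table_url is a hypothetical name used only for this sketch.

```python
def build_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Assemble '<profile_path>#<share>.<schema>.<table>' for delta-sharing."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Names taken from the example output in this guide.
table_url = build_table_url(
    "path/to/credentials.share",
    "sales_performance",
    "direct_data_sharing",
    "sales_performance_kpi_pp",
)
print(table_url)
```

The share, schema, and name attributes of each entry returned by list_all_tables() can be passed straight into this helper.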

Step 3: Read Data from a Dataset

Load data from any available dataset.

import delta_sharing
import pandas as pd

# Path to your credentials file
credentials_file = "path/to/credentials.share"

# Specify the dataset you want to read
# Format: credentials_file + "#<share_name>.<schema_name>.<table_name>"
table_url = credentials_file + "#cxm_country_assessment_share.direct_data_sharing.cxm_country_assessment"

# Load the data into a Pandas DataFrame
df = delta_sharing.load_as_pandas(table_url)

# Display the first few rows
print(df.head())

# Now you can work with the data
print(f"Total rows: {len(df)}")
print(f"Columns: {list(df.columns)}")

For more information on accessing large datasets, please refer to the Direct Data Sharing Best Practices Guide - Data Fetching.

Working with Large Datasets - The PySpark Approach

For large-scale data processing, you can use Apache Spark instead of pandas:

# Install the required dependencies first (run this in a shell, not in Python):
#   pip install delta-sharing pyspark

from pyspark.sql import SparkSession
import delta_sharing

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Delta Sharing Example") \
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.13:4.1.0") \
    .getOrCreate()

# Path to your credentials file
credentials_file = "path/to/credentials.share"

# Define the table URL
table_url = credentials_file + "#cxm_country_assessment_share.direct_data_sharing.cxm_country_assessment"

# Read data into a Spark DataFrame
df = spark.read.format("deltaSharing").load(table_url)

# Display the data
df.show()

# Perform Spark operations
df.printSchema()

For more details on working with Spark and advanced use cases, see Data Loading Best Practices.

Additional Resources

Contact Support