Best practices for Direct Data Sharing
Best practices for the credential files (.share files) used for Direct Data Sharing:
- Store your delta sharing tokens (.share files) in a secure location with limited access.
- Use environment variables to reference the file paths of your .share files in your code, rather than hardcoding file paths.
- Regularly rotate your delta sharing tokens to minimize the risk of unauthorized access. You can use the Direct Data Sharing Credentials Management API to rotate your tokens without service interruption (see Token Rotation Guide).
- Limit the number of people who have access to the .share files and ensure that those who do have access are aware of the sensitive nature of the data and the importance of keeping the files secure.
- If you suspect that your delta sharing tokens have been compromised, revoke them immediately using the PATCH /dds-tokens method of the Direct Data Sharing Credentials Management API with the payload {"existing_token_expiry_time_in_seconds": 0}. This invalidates all existing tokens and prevents unauthorized access to your data.
- Regularly review and audit access to your .share files and the usage of your delta sharing tokens to ensure that they are being used appropriately and that there are no signs of unauthorized access.
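The revocation call above can be sketched with Python's standard library. Note that the base URL and bearer-token authentication shown here are assumptions for illustration; consult the Direct Data Sharing Credentials Management API documentation for your actual endpoint and authentication scheme.

```python
import json
import urllib.request

def build_revoke_request(base_url, api_token):
    """Build a PATCH /dds-tokens request that expires all existing
    delta sharing tokens immediately.

    base_url and the Authorization header format are assumptions;
    check the Credentials Management API docs for your deployment.
    """
    payload = json.dumps({"existing_token_expiry_time_in_seconds": 0}).encode()
    return urllib.request.Request(
        f"{base_url}/dds-tokens",
        data=payload,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )

# To send the request (not executed here):
# urllib.request.urlopen(build_revoke_request("https://<your-api-host>", "<api-token>"))
```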
Refer to the Direct Data Sharing Credentials Management API documentation and Direct Data Sharing Getting Started Guide for more information on how to manage and rotate your delta sharing tokens.
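The environment-variable recommendation above can be sketched as follows. The variable name DDS_SHARE_FILE is a hypothetical choice; use whatever name fits your deployment conventions.

```python
import os

def get_profile_path():
    """Resolve the .share file location from an environment variable
    instead of hardcoding the path in source code."""
    path = os.environ.get("DDS_SHARE_FILE")
    if path is None:
        raise RuntimeError(
            "Set the DDS_SHARE_FILE environment variable to the location "
            "of your .share credential file"
        )
    return path

# The resolved path can then be passed to the delta-sharing client, e.g.:
# client = delta_sharing.SharingClient(get_profile_path())
```

Keeping the path out of source control also reduces the chance of accidentally committing a reference to a machine-specific or shared-drive location.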
Best practices for efficient data fetching:
Selective column fetching
Avoid select * queries on the shared table so that non-breaking changes (such as newly added columns) do not break your integration. Always specify the exact columns when performing any operation on the shared object.
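The principle can be illustrated with a plain-Python sketch (the table and column names are hypothetical): by projecting onto an explicit column list, a new column added by the data provider never reaches your downstream code. With Spark, the equivalent is calling .select("order_id", "amount") on the loaded DataFrame instead of relying on all columns.

```python
NEEDED = ("order_id", "amount")  # the columns this integration actually uses

def project(rows, columns=NEEDED):
    """Keep only the named columns, so an upstream non-breaking change
    (a newly added column) never alters the shape of our output."""
    return [{c: row[c] for c in columns} for row in rows]

v1 = [{"order_id": 1, "amount": 9.5}]
v2 = [{"order_id": 1, "amount": 9.5, "new_col": "x"}]  # provider added a column

# Both versions of the shared table yield identical output downstream.
assert project(v1) == project(v2)
```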
Partition pruning
For datasets with additional partition keys available, make sure to filter on these partition keys to reduce the amount of data scanned and transferred. This will significantly speed up the data fetch on the user side.
For delta-sharing with Spark, include the partition column in the filter condition of your query:
from pyspark.sql.functions import col
df = (
spark
.read
.format("deltaSharing")
.load(table_url)
.filter(col("dt") == "2026-03-25") # Example filter on the partition column
)
For delta-sharing with pandas, include the partition column in the jsonPredicateHints filter condition of your query:
import delta_sharing
hint = '''{
  "op": "equal",
  "children": [
    {"op": "column", "name": "dt", "valueType": "date"},
    {"op": "literal", "value": "2026-03-25", "valueType": "date"}
  ]
}'''
df = delta_sharing.load_as_pandas(table_url, jsonPredicateHints=hint)
For more details, refer to the delta-sharing documentation.
Contact Support