dagster-spark library
- dagster_spark.define_spark_config [source]
Spark configuration.
See the Spark documentation for reference: https://spark.apache.org/docs/latest/submitting-applications.html
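The config schema produced by define_spark_config mirrors spark-submit's options. The sketch below shows the general shape of such a run configuration; the exact key names are assumptions modeled on spark-submit flags, not a verbatim copy of the library's schema.

```python
# Sketch of the kind of run configuration a Spark job config schema
# validates. Key names here are illustrative assumptions based on
# spark-submit options (--master, --deploy-mode, the application jar).
spark_run_config = {
    "master_url": "local[*]",            # corresponds to --master
    "deploy_mode": "client",             # corresponds to --deploy-mode
    "application_jar": "target/app.jar", # the jar passed to spark-submit
    "application_arguments": ["--date", "2024-01-01"],
    "spark_conf": {
        # nested Spark properties, e.g. spark.executor.memory
        "spark": {"executor": {"memory": "4g"}},
    },
}
```

See the linked Spark documentation for the authoritative list of submit options.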
- dagster_spark.construct_spark_shell_command [source]
Constructs the spark-submit command for a Spark job.
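Conceptually, building a spark-submit command means translating the validated config into CLI flags followed by the application jar and its arguments. The following stdlib sketch illustrates that assembly; it is not the library's actual implementation, and the function name and signature are hypothetical.

```python
from typing import Optional, Sequence


def build_spark_submit_command(
    application_jar: str,
    main_class: Optional[str] = None,
    master_url: Optional[str] = None,
    deploy_mode: Optional[str] = None,
    application_arguments: Sequence[str] = (),
) -> list:
    """Illustrative sketch of assembling a spark-submit command line.

    Mirrors what construct_spark_shell_command does conceptually; the real
    function's signature and output format may differ.
    """
    command = ["spark-submit"]
    if master_url:
        command += ["--master", master_url]
    if deploy_mode:
        command += ["--deploy-mode", deploy_mode]
    if main_class:
        command += ["--class", main_class]
    # The application jar comes after all flags; its own arguments follow it.
    command.append(application_jar)
    command.extend(application_arguments)
    return command


cmd = build_spark_submit_command(
    "target/app.jar",
    main_class="com.example.Main",
    master_url="local[*]",
    application_arguments=["--date", "2024-01-01"],
)
```

The flag ordering matters: spark-submit treats everything after the jar path as arguments to the application itself.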
Spark Declarative Pipelines
class dagster_spark.SparkDeclarativePipelineComponent [source] - preview
This API is currently in preview, and may have breaking changes in patch version releases. This API is not considered ready for production use.
State-backed component for Spark Declarative Pipelines (SDP).
Discovers datasets via spark-pipelines dry-run (or source_only), caches state, and builds a multi_asset that runs spark-pipelines run and yields MaterializeResults.
class dagster_spark.components.spark_declarative_pipeline.SparkPipelinesResource [source] - preview
Dagster resource for Spark Declarative Pipelines, supporting dataset discovery and pipeline runs.
Use discover_datasets to get datasets from spark-pipelines dry-run (or source_only). Use run_and_observe inside an asset to run the pipeline and yield MaterializeResults.
class dagster_spark.components.spark_declarative_pipeline.SparkDeclarativePipelineScaffolder [source] - preview
Scaffolds a defs.yaml file and pipeline spec path for a Spark Declarative Pipeline component.