dagster-spark library

class dagster_spark.SparkOpError [source]
dagster_spark.define_spark_config [source]

Spark configuration.

See the Spark documentation for reference: https://spark.apache.org/docs/latest/submitting-applications.html

dagster_spark.create_spark_op [source]
dagster_spark.construct_spark_shell_command [source]

Constructs the spark-submit command for a Spark job.
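To make the shape of that command concrete, here is a minimal, illustrative sketch of assembling a spark-submit invocation per the linked Spark documentation. The function name, parameters, and defaults below are assumptions for illustration, not the actual implementation of construct_spark_shell_command.

```python
def build_spark_submit_command(
    application_jar,
    main_class=None,
    master=None,
    deploy_mode=None,
    spark_conf=None,
    application_arguments=None,
):
    """Return a spark-submit argv as a list of strings (illustrative only)."""
    command = ["spark-submit"]
    if master:
        command += ["--master", master]
    if deploy_mode:
        command += ["--deploy-mode", deploy_mode]
    if main_class:
        command += ["--class", main_class]
    # Spark configuration is passed as repeated --conf key=value flags.
    for key, value in sorted((spark_conf or {}).items()):
        command += ["--conf", f"{key}={value}"]
    # The application jar comes after all flags, followed by its arguments.
    command.append(application_jar)
    command += list(application_arguments or [])
    return command
```

Returning an argv list rather than a single shell string avoids quoting issues when the command is eventually handed to a subprocess.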

Spark Declarative Pipelines

class dagster_spark.SparkDeclarativePipelineComponent [source]
preview

This API is currently in preview, and may have breaking changes in patch version releases. This API is not considered ready for production use.

State-backed component for Spark Declarative Pipelines (SDP).

Discovers datasets via spark-pipelines dry-run (or source_only), caches state, and builds a multi_asset that runs spark-pipelines run and yields MaterializeResults.
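The discover-then-cache flow described above can be sketched in plain Python. The CLI invocation, its JSON output shape, the --source-only flag, and the cache layout below are all assumptions for illustration, not the component's actual implementation; the CLI call is injected as a callable so the sketch stays self-contained.

```python
import json
from pathlib import Path

def discover_datasets(run_cli, cache_path, source_only=False):
    """Return discovered dataset names, reading from cache when present.

    run_cli: callable taking an argv list and returning the CLI's stdout.
    """
    cache = Path(cache_path)
    if cache.exists():
        # Cached state lets asset definitions be rebuilt without
        # re-running discovery against the pipeline.
        return json.loads(cache.read_text())

    argv = ["spark-pipelines", "dry-run"]
    if source_only:
        argv.append("--source-only")  # hypothetical flag for illustration
    datasets = json.loads(run_cli(argv))
    cache.write_text(json.dumps(datasets))
    return datasets
```

In the real component, the discovered datasets would feed the specs of the generated multi_asset; here they are just a list of names.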

class dagster_spark.components.spark_declarative_pipeline.SparkPipelinesResource [source]
preview

This API is currently in preview, and may have breaking changes in patch version releases. This API is not considered ready for production use.

Dagster resource for Spark Declarative Pipelines: discovery and run.

Use discover_datasets to get datasets from spark-pipelines dry-run (or source_only). Use run_and_observe inside an asset to run the pipeline and yield MaterializeResults.
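The run-and-observe pattern above can be sketched as a generator: invoke the pipeline once, then yield one result per discovered dataset. The plain dicts below stand in for Dagster's MaterializeResult, and the CLI call and per-dataset metadata are assumptions for illustration, not the resource's actual behavior.

```python
def run_and_observe(run_cli, datasets):
    """Run the pipeline once, then yield a result for each dataset.

    run_cli: callable taking an argv list; datasets: iterable of names.
    """
    run_cli(["spark-pipelines", "run"])
    for name in datasets:
        # In the real resource this would be a dagster.MaterializeResult
        # keyed to the asset corresponding to `name`.
        yield {"asset_key": name, "metadata": {"pipeline": "spark-pipelines"}}
```

Yielding one result per dataset is what lets a single pipeline run materialize every asset produced by the generated multi_asset.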

class dagster_spark.components.spark_declarative_pipeline.SparkDeclarativePipelineScaffolder [source]
preview

This API is currently in preview, and may have breaking changes in patch version releases. This API is not considered ready for production use.

Scaffolds a defs.yaml for a Spark Declarative Pipeline component, along with its pipeline spec path.