Slides

https://github.com/keypointt/reading/blob/master/spark/2017_spark_spring_SF_Databricks_structured_streaming.pdf

Notes

1. Metric processing overview

overview

Tech stack:

  • Kafka
  • Parquet

2. Watermarking example

watermarking

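Watermark: data newer than the watermark may arrive late but is still allowed into the aggregation; data older than the watermark is considered too late and is dropped.

  • A moving threshold on how late data is expected to be, which also determines when old state can be dropped
  • Trails behind the maximum event time seen so far
  • The trailing gap is configurable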
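
The behaviour described above can be sketched in plain Python. This is only a simplified model of the semantics, not Spark code: the function name `run_with_watermark`, the integer event times, and the tumbling-window count are all assumptions made for illustration. It also assumes the watermark is recomputed once per batch, after the batch has been processed. In Spark itself the trailing gap is set with `withWatermark(eventTimeColumn, delayThreshold)` on a streaming DataFrame.

```python
from collections import defaultdict


def run_with_watermark(batches, delay, window):
    """Count events per tumbling window, dropping events behind the watermark.

    batches: list of lists of integer event times (e.g. minutes) -- hypothetical input
    delay:   trailing gap between the max seen event time and the watermark
    window:  tumbling window size, in the same unit as the event times
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    watermark = None  # no watermark exists until the first batch has been seen

    for batch in batches:
        for t in batch:
            window_start = (t // window) * window
            window_end = window_start + window
            # An event is too late if its window already ended at or before the watermark.
            if watermark is not None and window_end <= watermark:
                dropped.append(t)
                continue
            counts[window_start] += 1

        # Assumption: the watermark only moves after a batch completes.
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
            watermark = max_event_time - delay

    return dict(counts), dropped, watermark


# Three batches; the event at time 13 is late but still counted,
# while the event at time 4 arrives after the watermark has passed its window.
counts, dropped, watermark = run_with_watermark(
    batches=[[12, 14], [25, 13], [4, 26]], delay=10, window=10
)
print(counts, dropped, watermark)
```

With a gap of 10, the watermark after the second batch is 25 - 10 = 15, so the window [0, 10) is already closed when the event at time 4 arrives and that event is dropped. The event at time 13 arrives while the watermark is still 4, so it is aggregated into window [10, 20) even though it is out of order.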

Reference: