Monitoring Databricks Structured Streaming Queries in Datadog

Pablo Beltran
Fri Apr 29 2022

At Statsig, we recently transitioned to using structured streaming for our ETL. As the number of streaming queries grew, we wanted a centralized place where we could quickly view a snapshot of all our pipelines.

What we want to monitor

When it comes to monitoring our queries, we are primarily interested in answering three questions:

  1. What is the input rate?
  2. What is the processing rate?
  3. What is the age of the freshest data being processed?

Pre-existing Structured Streaming UI

Structured Streaming queries on Databricks already come with a UI attached to them when they are started, which helps monitor the input rate and processing rate (#1 and #2) at the job level:

These graphs help look at the performance of a specific job, but when you have a lot of queries, it is not practical to individually check each one. Additionally, this UI does not answer question number three: What is the age of the freshest data being processed?
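The two rates the UI plots are also available programmatically on each query's progress report, which is one way to check many queries without opening each page. A minimal sketch, assuming a progress dict shaped like PySpark's `StreamingQuery.lastProgress`; the helper name `summarize_progress` is hypothetical and not part of our pipeline:

```python
def summarize_progress(progress):
    """Pull the input and processing rates out of one progress report.

    `progress` is assumed to be a dict shaped like PySpark's
    `StreamingQuery.lastProgress`, which is None until a batch completes.
    """
    if not progress:
        return None
    return {
        "name": progress.get("name"),
        "input_rows_per_second": progress.get("inputRowsPerSecond", 0.0),
        "processed_rows_per_second": progress.get("processedRowsPerSecond", 0.0),
    }


# On a cluster, one summary per active query (needs a live SparkSession):
# for q in spark.streams.active:
#     print(summarize_progress(q.lastProgress))
```

This still leaves the freshness question unanswered, since the progress report does not know which column holds your event time.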

Setting up the Datadog Agent

Before monitoring the query, we first need to install the Datadog Agent on the cluster. You can follow the ‘Driver only’ instructions on the Datadog Databricks integration guide. Before running the script provided by Datadog, you need to modify it to set ‘enable_query_name_tag’ to true. This is placed under ‘instances’ like this:

instances:
    - spark_url: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
      spark_cluster_mode: spark_standalone_mode
      cluster_name: \${hostip}
      streaming_metrics: true
      enable_query_name_tag: true  # <-- add this line

This will tag your metrics with the QueryName you provide, allowing you to view them individually even if there is more than one query per cluster. You also need to enable `dogstatsd` by adding the following two lines to the script:

# Enable dogstatsd
echo "use_dogstatsd: true" >> /etc/datadog-agent/datadog.yaml
echo "dogstatsd_port: 8125" >> /etc/datadog-agent/datadog.yaml
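For context on what those two lines enable: DogStatsD accepts small plain-text datagrams over UDP on the configured port. The sketch below builds and sends a gauge by hand to show the wire format. In practice the `datadog` client library does this for you; `format_gauge` and `send_gauge` are hypothetical helper names, and the host default assumes the Agent runs on the same machine.

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a DogStatsD gauge datagram: <name>:<value>|g|#<tag>,<tag>."""
    payload = f"{name}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Fire-and-forget one gauge to the local Agent over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_gauge(name, value, tags).encode("utf-8"), (host, port))
```

Because UDP is connectionless, a send succeeds even when nothing is listening, so a missing metric in Datadog usually means the Agent side needs checking rather than the sender.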

Running the Datadog agent on your cluster

To run the Datadog Agent on your cluster, configure the install script you generated as a cluster init script, enable streaming metrics, and pass in your Datadog API key:

How to set the install script to run at startup
Enable streaming metrics and provide your API key

Setting up the query for monitoring

In order to select a specific query when making a dashboard, we need to provide a QueryName for the query to use. This can be done when writing the query:

(
    query
    .writeStream
    .queryName(query_name)  # <-- name the query here
    .outputMode("append")
    .format("delta")
    .option("checkpointLocation", checkpoint)
    .toTable(tablename)
)

Checking the data freshness

Input rate and processing rate are automatically tracked for you. However, if you want to know your data freshness, you need to track it using foreachBatch. First, you need to pip install datadog. Then you can track the freshness like this:

from datetime import datetime

from datadog import statsd

def record_freshness(df, epoch_id):
    # Sample one row from the micro-batch; skip the metric for empty batches.
    rows = df.select("enqueued_time").limit(1).collect()
    if rows:
        freshness = (datetime.now() - rows[0]["enqueued_time"]).total_seconds()
        statsd.gauge(
            "streaming.freshness_seconds",
            freshness,
            tags=["query_name:" + query_name],
        )
    # foreachBatch takes over as the sink, so append the batch to the table here.
    df.write.format("delta").mode("append").saveAsTable(tablename)

(
    query
    .writeStream
    .queryName(query_name)
    .outputMode("append")
    .foreachBatch(record_freshness)
    .option("checkpointLocation", checkpoint)
    .start()
)
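The freshness arithmetic is simple, but time zones are an easy place to go wrong: subtracting a naive datetime from a timezone-aware one raises a `TypeError`, and clock skew can produce negative ages. A minimal sketch of a defensive version follows. The helper name `freshness_seconds` is hypothetical, and treating naive timestamps as UTC is an assumption you should adjust to your session time zone.

```python
from datetime import datetime, timezone


def freshness_seconds(event_time, now=None):
    """Age of an event in seconds, clamped at zero.

    Naive datetimes are treated as UTC here, which is an assumption.
    """
    now = now or datetime.now(timezone.utc)
    if event_time.tzinfo is None:
        event_time = event_time.replace(tzinfo=timezone.utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    # Clamp so small clock skew never reports a negative age.
    return max(0.0, (now - event_time).total_seconds())
```

For example, an event enqueued at 12:00:00 and observed at 12:00:42 has a freshness of 42 seconds.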

If your query does not use ‘foreachBatch’, you can create a second query that reads the updates from your first query and records metrics:

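(
    spark.readStream
    .format("delta")
    .option("startingVersion", "latest")
    .table(tablename)
    .select("event_time")
    .writeStream
    # update_freshness is a foreachBatch function like record_freshness above,
    # reading the event_time column instead of enqueued_time.
    .foreachBatch(update_freshness)
    .start()
)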
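One possible shape for `update_freshness` is sketched below. It reports the age of the newest `event_time` in each micro-batch rather than an arbitrary row. The factory name `make_update_freshness` is hypothetical, and the gauge function is passed in so the sketch does not depend on the Datadog client being importable.

```python
from datetime import datetime


def make_update_freshness(gauge, query_name, column="event_time"):
    """Return a foreachBatch function that reports the newest event's age.

    `gauge` is any callable with the shape of `datadog.statsd.gauge`:
    gauge(metric, value, tags=[...]).
    """
    def update_freshness(df, epoch_id):
        # Dict-style aggregation: max of the timestamp column in this batch.
        newest = df.agg({column: "max"}).collect()[0][0]
        if newest is None:  # empty micro-batch, nothing to report
            return
        age = (datetime.now() - newest).total_seconds()
        gauge("streaming.freshness_seconds", age, tags=["query_name:" + query_name])

    return update_freshness


# Usage on a cluster (assumes `pip install datadog`):
# from datadog import statsd
# update_freshness = make_update_freshness(statsd.gauge, query_name)
```

Using the maximum keeps the metric tied to the freshest data in the batch, which is the third question from the start of the post.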

Creating the dashboard

Now that we have all the metrics we care about being tracked, we can build a dashboard on Datadog. We can track all of these metrics using time-series graphs. Here is a quick guide on setting up a graph.

Final Results

This is what the dashboard could look like when all the charts are set up:

You can quickly swap to a different $query_name to view graphs for different queries. These dashboards will allow you to ensure that your queries are keeping up with incoming data and track how changes are affecting performance.
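As a rough illustration of how each graph can be scoped to the selected query, the sketch below builds metric query strings that filter on the `$query_name` template variable. The helper `timeseries_query` is hypothetical. `streaming.freshness_seconds` is the custom gauge from this post, while the two `spark.structured_streaming.*` names are assumed to be what the Spark integration reports, so check them against your own metric list.

```python
def timeseries_query(metric, template_var="query_name", aggregator="avg"):
    """Build a Datadog metric query scoped to a dashboard template variable."""
    return f"{aggregator}:{metric}{{${template_var}}}"


# One query per graph on the dashboard.
QUERIES = [
    timeseries_query("streaming.freshness_seconds", aggregator="max"),
    timeseries_query("spark.structured_streaming.input_rate"),
    timeseries_query("spark.structured_streaming.processing_rate"),
]
```

Taking the `max` for freshness is a design choice: it surfaces the worst case in each interval rather than smoothing it away.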

