
KQL is easy to learn, efficient, readable, and perfect for daily hunting and incident response. It powers queries across Microsoft Sentinel, Azure Monitor Logs, and the advanced hunting experience in Microsoft Defender XDR. Every time you open the Logs blade or run a query in the portal, you are using KQL to explore the Analytics Tier.
For most analysts, KQL is enough. But sometimes you need more flexibility to automate an analysis, train a model, or build visualizations that go beyond what the portal can do. That is where Python and notebooks come in.
Working Locally in VS Code
The easiest way to begin is by running Python in a local desktop environment. Visual Studio Code (VS Code) is the most popular option, though Jupyter Notebook and PyCharm work just as well.
Running code locally means the development environment is on your computer, but the data or compute resources do not have to be local. You can connect to cloud services such as Microsoft Sentinel, Azure Monitor, or Azure Machine Learning while still writing and executing the code from your desktop.
In this context, I am referring to a true local run where data is loaded into a pandas DataFrame in memory and processed using local CPU and memory. This helps demonstrate how pandas operates as a local processing engine before scaling up to cloud-hosted notebooks or distributed compute environments.
After installing Python, add a few important libraries that make working with Sentinel data simple.
```shell
pip install pandas numpy matplotlib azure-identity msticpy
```
| Library | Purpose |
|---|---|
| pandas | The core library for working with tabular data. It allows you to add, rename, and filter columns, modify values, sort, summarize, join, and combine data. |
| numpy | Provides fast numerical operations that make pandas calculations efficient. |
| matplotlib | A charting library for creating plots, histograms, and time-series visualizations. |
| azure-identity | Handles Azure authentication tokens for accessing APIs securely. |
| msticpy | Microsoft Threat Intelligence library for connecting directly to Sentinel and other security data sources. |
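After installing, a quick import check confirms the core data libraries are available (azure-identity and msticpy import as `azure.identity` and `msticpy`, respectively; the snippet below checks only the three libraries that ship version strings the same way):

```python
# Quick sanity check that the core data libraries imported correctly.
import pandas as pd
import numpy as np
import matplotlib

print(pd.__version__, np.__version__, matplotlib.__version__)
```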
Loading Sentinel Data into pandas
You can analyze Sentinel data by loading it into a pandas DataFrame in several ways. The simplest is to export data manually from Sentinel after running a KQL query in the portal, then open it directly.
```python
import pandas as pd

df = pd.read_csv("SecurityEvent.csv")
```

That single line loads your exported CSV file into memory for analysis (for a JSON export, use `pd.read_json` the same way).
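For a JSON export, the flow is the same. A small self-contained sketch; the column names here (`TimeGenerated`, `Computer`, `EventID`) are typical Sentinel fields used for illustration:

```python
import io
import pandas as pd

# Stand-in for a small JSON export; real files come from the portal's export option.
raw = io.StringIO(
    '[{"TimeGenerated": "2025-10-18T09:00:00Z", "Computer": "DC01", "EventID": 4624},'
    ' {"TimeGenerated": "2025-10-18T09:05:00Z", "Computer": "WS02", "EventID": 4625}]'
)

df = pd.read_json(raw)
# Exported timestamps arrive as strings; convert them for time-based analysis.
df["TimeGenerated"] = pd.to_datetime(df["TimeGenerated"], utc=True)
print(df.dtypes)
```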
You can also connect programmatically for live queries.
Using the Log Analytics API
The Log Analytics API is a REST endpoint that allows you to run KQL queries programmatically against your Sentinel or Azure Monitor workspace. Instead of typing a query in the portal, you send it to the API using an HTTPS request. The API executes the KQL server-side and returns the results as structured JSON.
This is useful for automation, scheduled queries, or integrating Sentinel data into external tools or notebooks. In Python, it is often used with the requests library and azure-identity for authentication.
```python
import requests
import pandas as pd
from azure.identity import DefaultAzureCredential

# Acquire a token for the Log Analytics API
token = DefaultAzureCredential().get_token("https://api.loganalytics.io/.default")

url = "https://api.loganalytics.io/v1/workspaces/<workspace-id>/query"
headers = {"Authorization": f"Bearer {token.token}"}
query = {"query": "SecurityEvent | take 100"}

r = requests.get(url, headers=headers, params=query)
data = r.json()

# Flatten the JSON response into a DataFrame
columns = [c["name"] for c in data["tables"][0]["columns"]]
df = pd.DataFrame(data["tables"][0]["rows"], columns=columns)
```
Using MSTICPy
MSTICPy (Microsoft Threat Intelligence Python library) is an open-source library developed by Microsoft’s security team. It provides ready-made functions for connecting to Microsoft Sentinel, Microsoft Defender, and other data sources.
MSTICPy handles authentication automatically, runs KQL queries directly from Python, and returns results as pandas DataFrames. It also includes enrichment tools for IP lookups, geolocation, timeline charts, and entity mapping.
These methods let you pull data directly from Sentinel into a DataFrame for further analysis.
```python
from msticpy.data import QueryProvider

qry = QueryProvider("MSSentinel")
qry.connect(workspace="your-workspace-id")

df = qry.exec_query("SecurityEvent | summarize count() by Computer")
```
Understanding pandas
pandas is not a query language like KQL. Rather, it is a library for performing table operations similar to KQL operators. You can add or rename columns, modify values, filter rows, sort, summarize, join, and union tables.
```python
df_filtered = df[df["EventID"] == 4624]          # Keep only successful logons
df_summary = df_filtered.groupby("Computer").size().reset_index(name="Logons")
df_summary.sort_values("Logons", ascending=False).head()
```
Where KQL is usually run in a cloud portal, pandas loads data into memory as a DataFrame (often shortened to df). This makes it powerful but also limited by available RAM. On a typical workstation, you can comfortably work with datasets up to a few million rows. Larger data may slow down or crash depending on system memory.
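To gauge whether a dataset fits comfortably, you can measure a DataFrame's footprint with `memory_usage`, and fall back to chunked reading when a file is too large to load whole. A self-contained sketch using synthetic data in place of a real export:

```python
import io
import pandas as pd

# Small synthetic CSV standing in for an exported Sentinel table.
raw = "Computer,EventID\n" + "\n".join(f"DC{i % 3},4624" for i in range(1000))

df = pd.read_csv(io.StringIO(raw))

# How much RAM does this DataFrame actually use?
mem_mb = df.memory_usage(deep=True).sum() / 1024**2
print(f"{mem_mb:.2f} MB in memory")

# For files too large to load at once, process in chunks and keep only aggregates.
counts = pd.Series(dtype="int64")
for chunk in pd.read_csv(io.StringIO(raw), chunksize=250):
    counts = counts.add(chunk["Computer"].value_counts(), fill_value=0)
print(counts.sort_values(ascending=False))
```

The chunked pattern trades convenience for headroom: only one chunk plus the running aggregate is ever in memory.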
Because data is local, security is also a concern. Sentinel logs may contain sensitive information, so exporting them to a personal device should only be done when approved. Use local notebooks for learning or testing, not for production hunting or regulated data.
Since pandas is used in local notebooks, Sentinel notebooks, and data lake Spark notebooks, it is worth learning the basics. Many KQL concepts translate directly, and with tools like GitHub Copilot, even users with limited programming experience can get started quickly.
Expanding Beyond Queries with pandas
Unlike KQL, which is optimized for queries, pandas is both a query and data analysis library. Once data is in a DataFrame, you can run table-level analysis that goes beyond simple filtering or aggregation. For example, you can calculate correlations between columns, detect outliers, normalize or scale values, compute statistical summaries, and perform time-based analysis. pandas also supports merging multiple data sources, reshaping tables, pivoting, and creating rolling windows for trend analysis.
```python
df.describe()                                    # Summary statistics
df.corr(numeric_only=True)                       # Correlation between numeric columns
df["EventID"].value_counts(normalize=True)       # Distribution of event types
df.select_dtypes("number").rolling(7).mean()     # 7-row rolling average (7-day if rows are daily)
```
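The merging and pivoting mentioned above can be sketched with two tiny synthetic tables (the column names are hypothetical):

```python
import pandas as pd

events = pd.DataFrame({
    "Computer": ["DC01", "DC01", "WS02", "WS02"],
    "EventID":  [4624, 4625, 4624, 4624],
    "Count":    [10, 3, 7, 5],
})
owners = pd.DataFrame({"Computer": ["DC01", "WS02"], "Owner": ["it-ops", "alice"]})

# Join event counts to an asset-owner lookup (KQL: join kind=inner)
merged = events.merge(owners, on="Computer", how="inner")

# Pivot: one row per computer, one column per EventID
pivoted = merged.pivot_table(index="Computer", columns="EventID",
                             values="Count", aggfunc="sum", fill_value=0)
print(pivoted)
```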
These operations make pandas a bridge between log analysis and machine learning. Once the data is structured and explored, you can hand it off directly to modeling or visualization libraries without leaving your notebook.
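As a small hand-off example, matplotlib (installed earlier) can chart a summarized series straight from pandas; the numbers below are synthetic stand-ins for a logons-per-day query:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in an interactive notebook
import matplotlib.pyplot as plt

# Synthetic daily logon counts standing in for a summarized Sentinel query.
idx = pd.date_range("2025-10-01", periods=14, freq="D")
daily = pd.Series([120, 135, 98, 143, 150, 88, 92,
                   131, 127, 104, 149, 160, 95, 101], index=idx)

rolling = daily.rolling(7).mean()  # smooth out day-of-week noise

ax = daily.plot(label="Daily logons")
rolling.plot(ax=ax, label="7-day average")
ax.set_ylabel("Logons")
ax.legend()
plt.savefig("logons.png")
```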
Comparing KQL and pandas Operations
| KQL Operation | pandas Equivalent | Example |
|---|---|---|
| project | df[["col1", "col2"]] | Select columns |
| extend | df["NewCol"] = df["col1"] * 2 | Add calculated column |
| where | df[df["col1"] > 10] | Filter rows |
| summarize count() | df.groupby("col1").size() | Aggregate |
| summarize avg(col1) | df.groupby("col2")["col1"].mean() | Aggregate with average |
| sort by | df.sort_values("col1") | Sort |
| join | pd.merge(df1, df2, on="key") | Join two tables |
| union | pd.concat([df1, df2]) | Append tables |
| distinct | df.drop_duplicates() | Remove duplicates |
| top | df.nlargest(10, "col1") | Top results |
| take | df.head(10) | First n rows |
If you already know KQL, this table makes pandas far less intimidating.
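As a worked example, the hunting query `SecurityEvent | where EventID == 4625 | summarize Failures = count() by Account | top 5 by Failures` maps operator for operator onto pandas (synthetic data below):

```python
import pandas as pd

df = pd.DataFrame({
    "Account": ["alice", "bob", "alice", "eve", "bob", "alice"],
    "EventID": [4625, 4625, 4625, 4624, 4625, 4625],
})

top_failures = (
    df[df["EventID"] == 4625]                 # where EventID == 4625
      .groupby("Account").size()              # summarize count() by Account
      .reset_index(name="Failures")
      .nlargest(5, "Failures")                # top 5 by Failures
)
print(top_failures)
```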
Sentinel Notebooks
Sentinel notebooks provide a cloud-hosted IDE for creating and running Python notebooks directly within Microsoft Sentinel or the Defender XDR portal. They are powered by Azure Machine Learning, which provides a managed compute environment where you can edit, run, and share notebooks securely in the cloud.
These notebooks connect directly to your Sentinel workspace and come preloaded with libraries such as pandas, numpy, matplotlib, and MSTICPy. You can query Sentinel data with KQL through MSTICPy, analyze results using pandas, and visualize findings without leaving the portal.
Because they run on Azure Machine Learning compute, your data stays inside Azure under the same security and compliance controls that govern Sentinel. This combines the flexibility of Jupyter-style development with the scale and security of a managed cloud service.
The main advantages are:
- Your data stays within Azure security boundaries.
- You can query live Sentinel data with KQL through MSTICPy.
- You can install additional packages with pip: `!pip install duckdb polars --quiet`
- You can shut down compute when finished to stop billing.
Since the compute runs on an Azure VM, performance is similar to a powerful local machine. You pay only while the notebook is running.
While there is no dedicated Sentinel notebook extension for Visual Studio Code, you can develop and manage Sentinel notebooks in local VS Code using the Jupyter and Azure Machine Learning extensions. This lets you edit and test notebooks locally, then connect to your Azure ML workspace to run or deploy them on the same compute used by Sentinel notebooks.
Scheduling is not built directly into Sentinel notebooks, but you can trigger them using Azure Machine Learning pipelines, Azure Logic Apps, or Azure Automation.
Spark Notebooks and the data lake
Spark notebooks are used when your data is stored in the Sentinel data lake (SDL) rather than in the standard Analytics Tier. The data lake stores Sentinel tables as Parquet files in Azure Data Lake Storage, optimized for large-scale analytics. Spark notebooks can query this data directly using SQL or PySpark, making them ideal for historical analysis, correlation, and machine learning workloads.
Example:
```python
df = spark.read.parquet("abfss://sentinel@storageaccount.dfs.core.windows.net/SecurityEvent/")
df.createOrReplaceTempView("SecurityEvent")

spark.sql("""
    SELECT Computer, COUNT(*) AS Events
    FROM SecurityEvent
    WHERE ingestion_date = '2025-10-18'
    GROUP BY Computer
    ORDER BY Events DESC
""").show()
```
You can also use pandas-on-Spark for a more familiar syntax while still taking advantage of distributed compute:
```python
import pyspark.pandas as ps

psdf = ps.read_parquet("abfss://sentinel@storageaccount.dfs.core.windows.net/SecurityEvent/")
summary = psdf.groupby("Computer")["EventID"].count().sort_values(ascending=False)
summary.head()
```
Spark notebooks cannot run KQL queries; they are limited to SQL and PySpark operations. They do, however, scale across multiple nodes, allowing you to analyze months or years of data efficiently. Costs are based on cluster runtime and storage, and compute stops automatically when the session ends.
The MicrosoftSentinelProvider Class
The MicrosoftSentinelProvider class is included with the Microsoft Sentinel extension for Visual Studio Code and provides a simplified way to read Sentinel data in Spark notebooks. It abstracts the complexity of Spark read operations, letting you load tables from either the Analytics Tier or the data lake without writing raw Spark or storage path code.
Unlike MSTICPy, which connects to Sentinel using KQL through the Log Analytics API, the MicrosoftSentinelProvider operates entirely within Spark. It does not support KQL queries and instead uses PySpark DataFrames for all data access.
Core Methods
| Method | Description |
|---|---|
| read_table(table_name, tier) | Reads a Sentinel table into a Spark DataFrame from the analytics or lake tier. |
| get_table_list(tier) | Lists all available tables in the selected tier. |
| save_as_table(dataframe, table_name, tier, mode) | Writes a Spark DataFrame to a table (optional use). |
| drop_table(table_name, tier) | Deletes a specified table from the selected tier. |
Example Usage
```python
from microsoftsentinel import MicrosoftSentinelProvider

# Initialize the provider
provider = MicrosoftSentinelProvider()

# Read a table from the Sentinel data lake
df = provider.read_table("SecurityEvent", tier="lake")

# Display record count and columns
print(f"Rows: {df.count()}, Columns: {df.columns}")

# Query using Spark SQL
df.createOrReplaceTempView("SecurityEvent")
spark.sql("""
    SELECT Account, COUNT(*) AS Logons
    FROM SecurityEvent
    GROUP BY Account
    ORDER BY Logons DESC
""").show()
```
The provider can also read from the Analytics Tier using the same syntax:
```python
df = provider.read_table("SecurityEvent", tier="analytics")
```
Choosing the Right Environment
| Environment | Language | Scale | Security | Cost | Best For |
|---|---|---|---|---|---|
| KQL in Azure Monitor | KQL | GB | High | Included | Daily hunting and detections |
| Local VS Code / pandas | Python | MB–GB | Moderate (risk of local data) | Free | Learning, testing, small data |
| Sentinel Notebook | Python + KQL | GB | High | Pay per active VM | Secure analytics, automation |
| Spark Notebook (data lake) | SQL / PySpark | TB–PB | High | Pay per cluster runtime | Long-term analytics, ML |
Summary
KQL is simple, fast, and ideal for everyday security operations. pandas extends those same table operations into Python, allowing analysts to enrich, visualize, and model Sentinel data in new ways.
Working locally in VS Code or Jupyter is great for learning and small projects. Sentinel notebooks keep everything inside Azure and make it easier to analyze data securely at scale. Spark notebooks take that a step further, letting teams explore and model massive datasets in the data lake using distributed compute.
For those already comfortable with KQL, pandas introduces powerful new capabilities that feel familiar but go much further. With tools like GitHub Copilot, it has never been easier to learn Python and start writing pandas code, even for those without a strong programming background.
In a follow-up article, I’ll take a deeper look at pandas, exploring practical examples and techniques for transforming Sentinel data into new insights.