Ways to reduce Entra ID (AAD) log size in Microsoft Sentinel.

By | August 27, 2023

Are your Entra ID (AAD) logs getting too expensive in Sentinel? I have a theory on this: these logs appear to have been designed primarily for reporting, and optimal sizing may not have been a primary consideration. In my experience, these logs can grow rapidly and may become too expensive to retain in Sentinel.

In terms of size, the two primary Entra logs are the interactive and non-interactive sign-in logs, the latter being the larger of the two. Interactive sign-in logs record user-driven activity. Non-interactive sign-in logs record activity performed by a client app or OS component on behalf of a user.
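To check which of the two tables is larger in your own workspace, a query along these lines against the workspace's Usage table should work. It is a sketch: it assumes the standard Usage table, where Quantity is reported in MB, and the 30-day window is an arbitrary choice.

```kql
// Compare 30 days of billable ingestion for the two sign-in tables.
// Usage.Quantity is reported in MB; dividing by 1024 gives an approximate GB figure.
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| where DataType in ("SigninLogs", "AADNonInteractiveUserSignInLogs")
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by DataType
| sort by IngestedGB desc
```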

Interactive log volume is largely driven by user behavior patterns, conditional access policies, and token refresh limits (90 days by default, but many organizations set a much shorter interval).

Non-interactive sign-in frequency is determined by the apps themselves, which is largely outside our control. This is especially true of apps such as Teams and Exchange Online.
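To see which apps drive non-interactive volume in your tenant, here is a minimal sketch that counts events and distinct users per app over the last day. The one-day window and top-10 cutoff are arbitrary choices for illustration.

```kql
// Rank apps by non-interactive sign-in events over the last day.
// EventsPerUser highlights apps that sign in frequently on each user's behalf.
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1d)
| summarize Events = count(), Users = dcount(UserId) by AppDisplayName
| extend EventsPerUser = round(todouble(Events) / Users, 1)
| top 10 by Events
```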

These logs are available in Entra for 90 days, and Entra includes a reporting tool to help you explore them.

Why would you send these logs to Sentinel?

  • For dashboarding? – Entra provides an excellent dashboard (way better than any workbook).
  • For alerting? – If you integrate with M365D, that may not be required. See the post "AADIP now integrates with M365D" for a description of the M365D integration with Entra ID.
  • For archival? – Entra only stores the data for 90 days, though some of this data may also be present in the M365D logs.

So alerting and dashboarding are already covered elsewhere, and the logs may be too large and expensive to justify Sentinel ingestion and archival.

What are your options when the cost gets too high?

  1. Turn off Entra ID log integration with Sentinel (or, at a minimum, turn off the large non-interactive logs), relying on M365D alone for alerting and archival.
  2. Filter the data using an ingestion-time (pre-ingestion) transformation to reduce its size in Sentinel without sacrificing forensic value.
  3. Set the table(s) to use the Basic tier in Log Analytics (though these tables may not be supported yet). This lowers the ingestion price, but the logs are archived after 8 days, and it will not reduce archival costs.
  4. Send these logs to Azure Data Explorer via an event hub. This can significantly lower ingestion and archival costs. The data remains accessible but is not directly available to Sentinel alert rules, and this option can be more complex to configure.
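If you go with option 4, the data can still be queried from the Sentinel workspace with a cross-service query using the adx() function. This is a sketch only: the cluster URL, database name (SigninArchive), and table name (NonInteractiveSignIns) are hypothetical placeholders, and the columns available depend entirely on how you mapped the event hub data into Azure Data Explorer.

```kql
// Cross-service query from Log Analytics into Azure Data Explorer.
// Hypothetical cluster, database and table names: replace with your own.
adx("https://mycluster.westeurope.kusto.windows.net/SigninArchive").NonInteractiveSignIns
| take 10
```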

If you want to look into this further, the key non-interactive log columns are Identity, Status, ResourceDisplayName, IPAddress, AuthenticationProcessingDetails, and AppDisplayName.

Summarize by these columns, either alone or combined with Identity, for example:

AADNonInteractiveUserSignInLogs
| summarize count() by Identity, ResourceDisplayName | sort by count_

AADNonInteractiveUserSignInLogs
| summarize count() by Status, AuthenticationProcessingDetails | sort by count_

The goal here is to identify noisy inputs and, ideally, to correct or trim them.
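Counting rows shows where the volume is, but cost follows bytes rather than rows. This variation weights each group by an estimated row size using the built-in estimate_data_size() function. The estimate approximates, but won't exactly match, the billed size, and the one-day window and top-20 cutoff are arbitrary.

```kql
// Rank Identity/AppDisplayName pairs by estimated data size rather than row count.
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1d)
| extend RowBytes = estimate_data_size(*)
| summarize Events = count(), EstimatedMB = round(sum(RowBytes) / 1048576.0, 2) by Identity, AppDisplayName
| top 20 by EstimatedMB
```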

If you are filtering with a transformation, I would start by dropping the following columns, which are low-value, duplicated, or blank:

project-away ConditionalAccessPolicies, Type, TokenIssuerType, TokenIssuerName, TenantId, SourceSystem, SignInIdentifierType, SignInEventTypes, ServicePrincipalId, RiskEventTypes_V2, RiskEventTypes, ResourceGroup, OperationVersion, OperationName, MfaDetail, Level, UserDisplayName, DurationMs, Category, AutonomousSystemNumber, AppliedEventListeners, AlternateSignInName, RiskDetail, IsRisky, AuthenticationProtocol, AuthenticationMethodsUsed, RiskLevelDuringSignIn
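In a workspace transformation (a data collection rule attached to the table), the incoming records are referenced as source, so the column list becomes the body of the transform. A sketch under stated assumptions: that AADNonInteractiveUserSignInLogs supports ingestion-time transformations in your workspace, and that Type, TenantId, and SourceSystem are best left out because the workspace manages those standard columns. That last point is my assumption, so test the transform before relying on it.

```kql
// Ingestion-time transformation sketch for AADNonInteractiveUserSignInLogs.
// "source" is the incoming stream. Type, TenantId and SourceSystem are omitted
// on the assumption that they are workspace-managed standard columns.
source
| project-away ConditionalAccessPolicies, TokenIssuerType, TokenIssuerName,
    SignInIdentifierType, SignInEventTypes, ServicePrincipalId,
    RiskEventTypes_V2, RiskEventTypes, ResourceGroup, OperationVersion,
    OperationName, MfaDetail, Level, UserDisplayName, DurationMs, Category,
    AutonomousSystemNumber, AppliedEventListeners, AlternateSignInName,
    RiskDetail, IsRisky, AuthenticationProtocol, AuthenticationMethodsUsed,
    RiskLevelDuringSignIn
```

To preview the effect before deploying, replace source with AADNonInteractiveUserSignInLogs and add a time filter such as | where TimeGenerated > ago(1h).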

You might also consider dropping AuthenticationDetails, since its content is duplicated in parsed form, along with ConditionalAccessPolicies (already included in the list above).

The CA column lists every CA policy that applies to the user, even policies that are disabled or in report-only mode. I can't see enough forensic value in this column to justify its cost; it exists mainly to support the Entra reporting dashboard that helps evaluate CA policies. I suspect dropping these columns could cut the non-interactive log size by 50% without sacrificing forensic value.
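To test the 50% estimate against your own data before committing to a transform, a sketch like the following compares the estimated size of the ConditionalAccessPolicies and AuthenticationDetails columns with the estimated size of the whole record. estimate_data_size() returns an approximation, so treat the percentages as indicative rather than as the exact billed saving.

```kql
// Estimate what share of non-interactive record size the two candidate columns take up.
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(1h)   // a small sample is usually enough
| extend RowBytes = estimate_data_size(*)
| extend CABytes = estimate_data_size(ConditionalAccessPolicies),
         AuthDetailsBytes = estimate_data_size(AuthenticationDetails)
| summarize TotalBytes = sum(RowBytes), CATotal = sum(CABytes), AuthDetailsTotal = sum(AuthDetailsBytes)
| extend CAPercent = round(100.0 * CATotal / TotalBytes, 1),
         AuthDetailsPercent = round(100.0 * AuthDetailsTotal / TotalBytes, 1)
```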

You could also check for duplicate records. If you see a large number of duplicate logs, it may warrant a support case.

AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(4h) // only a few hours of data is needed
| extend T2 = replace_string(tostring(CreatedDateTime), "/", "")
| extend CheckString = strcat(Identity, AppDisplayName, AppId, HomeTenantId, AutonomousSystemNumber, T2)
| summarize count() by CheckString
| where count_ > 1 // keep only records that appear more than once
| sort by count_ desc
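To put a single number on the duplication, here is a variation of the same idea: group on the same fields, then compare the total record count with the number of distinct groups. The four-hour window is an arbitrary short sample.

```kql
// Estimate the share of duplicate non-interactive records over a short window.
// Uses AutonomousSystemNumber, so run it before any transform that drops that column.
AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(4h)
| summarize Copies = count() by Identity, AppDisplayName, AppId, HomeTenantId, AutonomousSystemNumber, CreatedDateTime
| summarize TotalRecords = sum(Copies), DistinctRecords = count()
| extend DuplicatePercent = round(100.0 * (TotalRecords - DistinctRecords) / TotalRecords, 1)
```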
