Step-by-Step Guide to integrate DataDog in .NET Core Services on kubernetes (AKS)

Integrating DataDog with your apps is not quite straightforward, mostly because of, surprise, too much documentation. There are so many options for installing and configuring the agent and the tracer that you can easily end up with a broken setup. Here are the steps that worked for me for this particular combination:

  1. AKS (or any other kubernetes cluster - just a small difference in agent configuration).

  2. DataDog agent installed on kubernetes as a deployment or daemonset via helm charts

  3. .NET Core services - combination of web applications as well as Azure Service Bus consumers and cronjobs.

DataDog agent installation (This was Agent v7, helm chart version 3.61.0)

Create a namespace named datadog.

Grab the API key from the DataDog portal and create a secret named datadog-secret in the datadog namespace.
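These two steps can be done with kubectl along these lines (a sketch; the `api-key` key name is what the chart's `apiKeyExistingSecret` setting expects by default, and `<DD_API_KEY>` is a placeholder for your actual key):

```shell
# Create the namespace the agent will run in
kubectl create namespace datadog

# Store the API key from the DataDog portal as a secret in that namespace
kubectl create secret generic datadog-secret \
  --from-literal api-key=<DD_API_KEY> \
  -n datadog
```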

Install the datadog helm chart, passing the following datadog-values.yaml file:

helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog-agent -f datadog-values.yaml datadog/datadog -n datadog

datadog-values.yaml:

datadog:
  apiKeyExistingSecret: datadog-secret
  site: us3.datadoghq.com
  kubelet:
    host:
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    hostCAPath: /etc/kubernetes/certs/kubeletserver.crt
  apm:
    instrumentation:
      enabled: true
      enabledNamespaces:
      - dev
      - qa
  logs:
    enabled: true
    containerCollectAll: true
    autoMultiLineDetection: true
  serviceMonitoring:
    enabled: true
  networkMonitoring:
    enabled: true
  systemProbe:
    enableOOMKill: true
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true
  env:
  - name: DD_CLOUD_PROVIDER_METADATA
    value: "azure"
agents:
  containers:
    agent:
      resources:
        requests:
          cpu: 100m
          memory: 512Mi
        limits:
          cpu: 250m
          memory: 1Gi
    traceAgent:
      env:
      - name: DD_APM_IGNORE_RESOURCES
        value: "GET /health/liveness,GET /health/readiness,GET /health"
providers:
  aks:
    enabled: true

As you can see, this is a minimal yaml file that overrides the default values of the datadog helm chart. Three pieces are AKS-specific: the kubelet hostCAPath, the DD_CLOUD_PROVIDER_METADATA env var on the cluster agent, and the providers.aks.enabled flag.

A very important piece is logging and log-trace correlation. I use Serilog, log to the console, and write no log files. The simplest way to handle this is to use CompactJsonFormatter. Once the agent is up and running, add the relevant packages to the application:

Nugets:

Datadog.Trace

Serilog

Serilog.AspNetCore

Serilog.Formatting.Compact

Serilog.Sinks.Console

Here's the relevant section from appsettings.json

"Serilog": {
  "Using": [ "Serilog.Sinks.Console" ],
  "MinimumLevel": {
    "Default": "Information",
    "Override": {
      "Azure.Messaging.ServiceBus": "Warning",
      "Microsoft": "Warning",
      "System": "Warning",
      "Polly": "Warning"
    }
  },
  "WriteTo": [
    {
      "Name": "Console",
      "Args": {
        "formatter": "Serilog.Formatting.Compact.CompactJsonFormatter, Serilog.Formatting.Compact"
      }
    }
  ],
  "Filter": [
    {
      "Name": "ByExcluding",
      "Args": {
        "expression": "RequestPath like '%swagger%' OR RequestPath like '%health%'"
      }
    }
  ]
}
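With those packages in place, the Serilog wiring in Program.cs can be as small as the following sketch (minimal-hosting style; the endpoint is just illustrative). ReadFrom.Configuration picks up the Console sink with CompactJsonFormatter, the level overrides, and the filter from the section above; note that the "ByExcluding" expression filter in configuration typically also needs the Serilog.Expressions package.

```csharp
using Serilog;

var builder = WebApplication.CreateBuilder(args);

// Read the whole "Serilog" section (sink, minimum levels, filter)
// from appsettings.json.
builder.Host.UseSerilog((context, loggerConfiguration) =>
    loggerConfiguration.ReadFrom.Configuration(context.Configuration));

var app = builder.Build();

// Emits one summary event per request, carrying a RequestPath property
// that the swagger/health exclusion filter can match on.
app.UseSerilogRequestLogging();

app.MapGet("/health", () => Results.Ok());
app.Run();
```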

Now create a Dockerfile. Note that this is a simple Alpine-based Dockerfile where the .NET build happens on the Azure DevOps build agent, and the published artifacts are just copied into the image. There is nothing DataDog-specific here.

FROM mcr.microsoft.com/dotnet/aspnet:8.0.6-alpine3.20 AS runtime

RUN apk update && apk upgrade --no-cache --available \
    && apk add --no-cache icu-libs curl \
    && addgroup -S --gid 1000 myapp && adduser -D -S --uid 1000 myapp -G myapp

# Configure the application
ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=false
ENV ASPNETCORE_HTTP_PORTS=8080
ENV ASPNETCORE_HTTPS_PORTS=

# Build runtime image for api
FROM runtime AS api
WORKDIR /app
COPY --chown=myapp:myapp Api/ .
USER myapp
EXPOSE 8080
ENTRYPOINT ["dotnet", "Api.dll"]
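On the build agent, the sequence that feeds this Dockerfile would look roughly like the following (a sketch; the project path, output folder, and registry name are assumptions, not from the original pipeline):

```shell
# Publish on the Azure DevOps agent; the output folder name must match
# the COPY source in the Dockerfile (Api/ here)
dotnet publish src/Api/Api.csproj -c Release -o Api

# Build the api stage of the multi-stage Dockerfile
docker build --target api -t myregistry.azurecr.io/api:1.0.1 .
```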

The final piece is the helm chart for the .NET service - this was the most troublesome part, as certain things were not picked up automatically. For example, perhaps because I used an Alpine image, the source of the service was not set to csharp automatically, and that made log-trace correlation difficult.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    tags.datadoghq.com/env: "dev" #Imp
    tags.datadoghq.com/service: "api" #Imp
    tags.datadoghq.com/version: "1.0.1" #Imp
spec:
  template:
    metadata:
      labels:
        tags.datadoghq.com/env: "dev" #Imp
        tags.datadoghq.com/service: "api" #Imp
        tags.datadoghq.com/version: "1.0.1" #Imp
        admission.datadoghq.com/enabled: "true" # Imp
      annotations:
        ad.datadoghq.com/api.logs: '[{"source":"csharp"}]'  # Imp
        admission.datadoghq.com/dotnet-lib.version: v2.53.0-musl # Imp
    spec:
      containers:
        - name: <CONTAINER_NAME>
        ...
        ...
        ...
          env:
          - name: DD_LOGS_INJECTION
            value: "true"
          - name: DD_PROFILING_ENABLED
            value: "true"
          - name: DD_RUNTIME_METRICS_ENABLED
            value: "true"

The dotnet-lib version has -musl appended because this is an Alpine-based (musl libc) image.

Another important piece is Serilog support in the DataDog library that gets injected because of "admission.datadoghq.com/dotnet-lib.version: v2.53.0-musl". Minor Serilog version upgrades are generally not a problem, but a major version upgrade of Serilog requires you to check that DataDog has added support for it.

Now, with these changes, log-trace correlation should work perfectly fine for HTTP-based calls. Spans are created automatically, and API invocations across services are also handled. However, to deal with non-HTTP requests, you need to additionally pass and retrieve the DataDog metadata yourself.

Here's a sample of adding these headers while sending and receiving messages from Azure Service Bus, using some extension methods:

public static class DatadogServiceBusExtensions
{
    public static async Task AddDatadogContextAndSendMessageAsync(this ServiceBusSender sender, ServiceBusMessage serviceBusMessage, CancellationToken cancellationToken)
    {
        using var scope = Tracer.Instance.StartActive("servicebus.send");
        SetTagsAndAppProperties(scope, serviceBusMessage);
        await sender.SendMessagesAsync([serviceBusMessage], cancellationToken);
    }

    public static async Task AddDatadogContextAndSendMessagesAsync(this ServiceBusSender sender, IEnumerable<ServiceBusMessage> serviceBusMessages, CancellationToken cancellationToken)
    {
        using var scope = Tracer.Instance.StartActive("servicebus.send");
        foreach (var serviceBusMessage in serviceBusMessages)
        {
            SetTagsAndAppProperties(scope, serviceBusMessage);
        }

        await sender.SendMessagesAsync(serviceBusMessages, cancellationToken);
    }

    private static void SetTagsAndAppProperties(IScope scope, ServiceBusMessage serviceBusMessage)
    {
        scope.Span.SetTag("message.id", serviceBusMessage.MessageId);
        foreach (var prop in serviceBusMessage.ApplicationProperties.Keys)
        {
            scope.Span.SetTag($"message.{prop}", serviceBusMessage.ApplicationProperties[prop].ToString());
        }

        // Inject datadog active span context into the message application properties
        serviceBusMessage.ApplicationProperties["dd-trace-id"] = scope.Span.TraceId.ToString();
        serviceBusMessage.ApplicationProperties["dd-span-id"] = scope.Span.SpanId.ToString();
    }

    public static IScope ExtractDatadogContextAndStartScope(this ServiceBusReceivedMessage message)
    {
        ulong parentTraceId = 0, parentSpanId = 0;
        if (message.ApplicationProperties.TryGetValue("dd-trace-id", out var traceId))
        {
            ulong.TryParse(traceId.ToString(), out parentTraceId);
        }
        if (message.ApplicationProperties.TryGetValue("dd-span-id", out var spanId))
        {
            ulong.TryParse(spanId.ToString(), out parentSpanId);
        }

        var parentSpanContext = new SpanContext(parentTraceId, parentSpanId);
        var scope = Tracer.Instance.StartActive("servicebus.receive", new SpanCreationSettings { Parent = parentSpanContext });
        scope.Span.SetTag("message.id", message.MessageId);
        return scope;
    }
}
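On the consuming side, the extension methods above would be used along these lines (a sketch; the processor handler and HandleAsync are illustrative names, not part of the original code):

```csharp
// Hypothetical ServiceBusProcessor message handler.
private async Task ProcessMessageAsync(ProcessMessageEventArgs args)
{
    // Starts a servicebus.receive span parented to the span that sent the
    // message, so logs written inside this scope correlate with the
    // sender's trace.
    using (var scope = args.Message.ExtractDatadogContextAndStartScope())
    {
        await HandleAsync(args.Message); // your business logic
        await args.CompleteMessageAsync(args.Message);
    }
}
```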

There is a newer DD_TRACE_OTEL_ENABLED env variable in the .NET Datadog.Trace library that automatically adds the required metadata when publishing messages to Azure Service Bus over the AMQP protocol, but it also requires enabling the experimental ActivitySource support in the Azure SDK via the env var AZURE_EXPERIMENTAL_ENABLE_ACTIVITY_SOURCE.

If you use this, you can avoid explicitly setting dd-trace-id and dd-span-id as above, but you would still need to start the trace scope on the receiving side.
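In the deployment manifest, that amounts to something like the following fragment (a sketch, added alongside the DD_* env vars shown earlier):

```yaml
env:
- name: DD_TRACE_OTEL_ENABLED
  value: "true"
- name: AZURE_EXPERIMENTAL_ENABLE_ACTIVITY_SOURCE
  value: "true"
```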

Ref#1: .NET and .NET Core Compatibility Requirements (datadoghq.com)

Ref#2: azure-sdk-for-net/sdk/core/Azure.Core/samples/Diagnostics.md at main · Azure/azure-sdk-for-net (github.com)

Hope that helps someone who is struggling to correlate logs and traces in DataDog and .NET!