Step-by-Step Guide to integrate DataDog in .NET Core Services on Kubernetes (AKS)
It is not quite straightforward to integrate DataDog with your apps, simply because there is, surprise, too much documentation. There are so many options for installing and configuring the agent and the tracer that you can easily end up with the wrong configuration and something broken. Here are the steps that worked for me for this particular combination:
AKS (or any other Kubernetes cluster - just a small difference in agent configuration).
DataDog agent installed on Kubernetes as a Deployment or DaemonSet via Helm charts.
.NET Core services - a combination of web applications as well as Azure Service Bus consumers and cronjobs.
DataDog agent installation (This was Agent v7, helm chart version 3.61.0)
Create the datadog namespace.
Pick the API key from the DataDog portal and create a datadog-secret in the datadog namespace.
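For example, with kubectl (the secret must contain the key api-key, which is what the chart's apiKeyExistingSecret setting expects; the API key value is a placeholder):
kubectl create namespace datadog
kubectl create secret generic datadog-secret \
  --from-literal=api-key=<DATADOG_API_KEY> \
  -n datadog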
Install the Helm chart for DataDog, passing the following datadog-values.yaml file:
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install datadog-agent -f datadog-values.yaml datadog/datadog -n datadog
datadog:
  apiKeyExistingSecret: datadog-secret
  site: us3.datadoghq.com
  kubelet:
    host:
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    hostCAPath: /etc/kubernetes/certs/kubeletserver.crt
  apm:
    instrumentation:
      enabled: true
      enabledNamespaces:
        - dev
        - qa
  logs:
    enabled: true
    containerCollectAll: true
    autoMultiLineDetection: true
  serviceMonitoring:
    enabled: true
  networkMonitoring:
    enabled: true
  systemProbe:
    enableOOMKill: true
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true
  env:
    - name: DD_CLOUD_PROVIDER_METADATA
      value: "azure"
agents:
  containers:
    agent:
      resources:
        requests:
          cpu: 100m
          memory: 512Mi
        limits:
          cpu: 250m
          memory: 1Gi
    traceAgent:
      env:
        - name: DD_APM_IGNORE_RESOURCES
          value: "GET /health/liveness,GET /health/readiness,GET /health"
providers:
  aks:
    enabled: true
As you can see, this is a minimal YAML file that overrides the default values of the DataDog Helm chart. There are three sections that are AKS specific:
The kubelet certificate section: https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=helm#aks-kubelet-certificate
The cluster agent env variable for cloud provider
The provider section.
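Before instrumenting the services, it is worth confirming that the agent pods came up healthy. A quick check (the pod name is a placeholder):
kubectl get pods -n datadog
kubectl exec -it <AGENT_POD_NAME> -n datadog -- agent status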
A very important piece is logging and log-trace correlation. I use Serilog, log to the console, and write no log files. The simplest way to handle this is to use CompactJsonFormatter. Once the agent is up and running, add the relevant packages to the application:
NuGet packages:
Datadog.Trace
Serilog
Serilog.AspNetCore
Serilog.Formatting.Compact
Serilog.Sinks.Console
Serilog.Expressions (needed for the ByExcluding expression filter in the configuration below)
Here's the relevant section from appsettings.json
"Serilog": {
"Using": [ "Serilog.Sinks.Console" ],
"MinimumLevel": {
"Default": "Information",
"Override": {
"Azure.Messaging.ServiceBus": "Warning",
"Microsoft": "Warning",
"System": "Warning",
"Polly": "Warning"
}
},
"WriteTo": [
{
"Name": "Console",
"Args": {
"formatter": "Serilog.Formatting.Compact.CompactJsonFormatter, Serilog.Formatting.Compact"
}
}
],
"Filter": [
{
"Name": "ByExcluding",
"Args": {
"expression": "RequestPath like '%swagger%' OR RequestPath like '%health%'"
}
}
]
}
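This configuration still needs to be wired up at startup. A minimal sketch for Program.cs, assuming a standard ASP.NET Core 8 host (the health endpoint is illustrative):
using Serilog;

var builder = WebApplication.CreateBuilder(args);

// Read sinks, levels, filters and the compact JSON formatter from the "Serilog" section
builder.Host.UseSerilog((context, loggerConfiguration) =>
    loggerConfiguration.ReadFrom.Configuration(context.Configuration));

var app = builder.Build();

// Emit one summary log event per HTTP request
app.UseSerilogRequestLogging();

app.MapGet("/health", () => Results.Ok());

app.Run();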
Now create a Dockerfile. Note that this is a simple Alpine-based Dockerfile where the .NET build happens on the Azure DevOps build agent, and the published artifacts are just copied into the Docker image. There is nothing specific to DataDog here.
FROM mcr.microsoft.com/dotnet/aspnet:8.0.6-alpine3.20 AS runtime
RUN apk update && apk upgrade --no-cache --available \
&& apk add --no-cache icu-libs curl\
&& addgroup -S --gid 1000 myapp && adduser -D -S --uid 1000 myapp -G myapp
# Configure the application
ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=false
ENV ASPNETCORE_HTTP_PORTS=8080
ENV ASPNETCORE_HTTPS_PORTS=
# Build runtime image for api
FROM runtime AS api
WORKDIR /app
COPY --chown=myapp:myapp Api/ .
USER myapp
EXPOSE 8080
ENTRYPOINT ["dotnet", "Api.dll"]
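For completeness, the build side looks roughly like this on the build agent (project path, registry and tag are placeholders; the publish output lands in Api/ so the COPY above picks it up):
dotnet publish ./src/Api/Api.csproj -c Release -o ./Api
docker build --target api -t <ACR_NAME>.azurecr.io/api:1.0.1 .
docker push <ACR_NAME>.azurecr.io/api:1.0.1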
The final piece is the Helm chart for the .NET service - this was the most troublesome part, as certain things were not picked up automatically. For example, possibly because I used an Alpine-based image, the source of the service was not set to csharp automatically, which made log-trace correlation difficult.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    tags.datadoghq.com/env: "dev" #Imp
    tags.datadoghq.com/service: "api" #Imp
    tags.datadoghq.com/version: "1.0.1" #Imp
spec:
  template:
    metadata:
      labels:
        tags.datadoghq.com/env: "dev" #Imp
        tags.datadoghq.com/service: "api" #Imp
        tags.datadoghq.com/version: "1.0.1" #Imp
        admission.datadoghq.com/enabled: "true" # Imp
      annotations:
        ad.datadoghq.com/api.logs: '[{"source":"csharp"}]' # Imp
        admission.datadoghq.com/dotnet-lib.version: v2.53.0-musl # Imp
    spec:
      containers:
        - name: <CONTAINER_NAME>
          ...
          ...
          ...
          env:
            - name: DD_LOGS_INJECTION
              value: "true"
            - name: DD_PROFILING_ENABLED
              value: "true"
            - name: DD_RUNTIME_METRICS_ENABLED
              value: "true"
The dotnet-lib version has -musl appended because the image is Alpine-based (musl libc).
Another important piece is Serilog support in the DataDog library that gets injected because of "admission.datadoghq.com/dotnet-lib.version: v2.53.0-musl". Minor Serilog version upgrades are generally not a problem, but a major Serilog upgrade requires you to check that DataDog has added support for it.
Now, with these changes, log-trace correlation should work fine for HTTP-based calls: spans are created automatically, and API invocations across services are handled as well. To deal with non-HTTP requests, however, you need to additionally pass and retrieve the DataDog trace metadata yourself.
Here's a sample of adding these headers while sending and receiving messages from Azure Service Bus, using some extension methods:
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Datadog.Trace;

public static class DatadogServiceBusExtensions
{
    public static async Task AddDatadogContextAndSendMessageAsync(this ServiceBusSender sender, ServiceBusMessage serviceBusMessage, CancellationToken cancellationToken)
    {
        using var scope = Tracer.Instance.StartActive("servicebus.send");
        SetTagsAndAppProperties(scope, serviceBusMessage);
        await sender.SendMessagesAsync([serviceBusMessage], cancellationToken);
    }

    public static async Task AddDatadogContextAndSendMessagesAsync(this ServiceBusSender sender, IEnumerable<ServiceBusMessage> serviceBusMessages, CancellationToken cancellationToken)
    {
        using var scope = Tracer.Instance.StartActive("servicebus.send");
        foreach (var serviceBusMessage in serviceBusMessages)
        {
            SetTagsAndAppProperties(scope, serviceBusMessage);
        }
        await sender.SendMessagesAsync(serviceBusMessages, cancellationToken);
    }

    private static void SetTagsAndAppProperties(IScope scope, ServiceBusMessage serviceBusMessage)
    {
        scope.Span.SetTag("message.id", serviceBusMessage.MessageId);
        foreach (var prop in serviceBusMessage.ApplicationProperties.Keys)
        {
            scope.Span.SetTag($"message.{prop}", serviceBusMessage.ApplicationProperties[prop].ToString());
        }

        // Inject the active Datadog span context into the message application properties
        serviceBusMessage.ApplicationProperties["dd-trace-id"] = scope.Span.TraceId.ToString();
        serviceBusMessage.ApplicationProperties["dd-span-id"] = scope.Span.SpanId.ToString();
    }

    public static IScope ExtractDatadogContextAndStartScope(this ServiceBusReceivedMessage message)
    {
        ulong parentTraceId = 0, parentSpanId = 0;
        if (message.ApplicationProperties.TryGetValue("dd-trace-id", out var traceId))
        {
            ulong.TryParse(traceId.ToString(), out parentTraceId);
        }
        if (message.ApplicationProperties.TryGetValue("dd-span-id", out var spanId))
        {
            ulong.TryParse(spanId.ToString(), out parentSpanId);
        }

        var parentSpanContext = new SpanContext(parentTraceId, parentSpanId);
        var scope = Tracer.Instance.StartActive("servicebus.receive", new SpanCreationSettings { Parent = parentSpanContext });
        scope.Span.SetTag("message.id", message.MessageId);
        return scope;
    }
}
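A usage sketch of these extensions (the connection string, queue name and payload are illustrative, not from the original services):
using Azure.Messaging.ServiceBus;

// Illustrative setup - connection string and queue name are placeholders
await using var client = new ServiceBusClient("<SERVICE_BUS_CONNECTION_STRING>");

// Sending: starts a servicebus.send span and injects dd-trace-id / dd-span-id
ServiceBusSender sender = client.CreateSender("orders");
var message = new ServiceBusMessage(BinaryData.FromString("{\"orderId\":42}"));
await sender.AddDatadogContextAndSendMessageAsync(message, CancellationToken.None);

// Receiving: continues the trace as a child of the sender's span
ServiceBusReceiver receiver = client.CreateReceiver("orders");
ServiceBusReceivedMessage received = await receiver.ReceiveMessageAsync();
using (received.ExtractDatadogContextAndStartScope())
{
    // Handle the message here; with DD_LOGS_INJECTION enabled, logs written
    // inside this scope are correlated to the servicebus.receive span.
}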
There is also a newer DD_TRACE_OTEL_ENABLED environment variable in the .NET Datadog.Trace library that automatically adds the required metadata when publishing messages to Azure Service Bus over the AMQP protocol; it additionally requires enabling the experimental activity source support in the Azure SDK via the env var AZURE_EXPERIMENTAL_ENABLE_ACTIVITY_SOURCE. If you use this, you can avoid explicitly setting dd-trace-id and dd-span-id as above, but you would still need to start a span when receiving the message.
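If you go that route, both variables can be set next to the other DD_* variables in the container spec of the deployment shown earlier (a sketch of just the additional entries):
env:
  - name: DD_TRACE_OTEL_ENABLED
    value: "true"
  - name: AZURE_EXPERIMENTAL_ENABLE_ACTIVITY_SOURCE
    value: "true"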
Ref#1: .NET and .NET Core Compatibility Requirements (datadoghq.com)
Ref#2: azure-sdk-for-net/sdk/core/Azure.Core/samples/Diagnostics.md at main · Azure/azure-sdk-for-net (github.com)
Hope that helps someone who is struggling to correlate logs and traces in DataDog and .NET!