Azure Application Insights Private Availability Testing for Highly Regulated Enterprises

Azure Application Insights is a fundamental service for DevOps teams. It is part of the overall Azure Monitor stack and provides several extremely useful capabilities for instrumenting and monitoring your application workloads.

When adopting any cloud service in a highly regulated enterprise, some common requirements start to appear; one of them is that the service must support private connectivity. In Azure, one of the ways a service does this is by adding support for Private Link, and in general the Azure Monitor services already support this scenario via Azure Monitor Private Link Scopes.

According to the official documentation, Azure Monitor Private Link support allows enterprises to do the following:

  • Connect privately to Azure Monitor without opening up any public network access.
  • Ensure your monitoring data is only accessed through authorized private networks.
  • Prevent data exfiltration from your private networks by defining specific Azure Monitor resources that connect through your private endpoint.
  • Securely connect your private on-premises network to Azure Monitor using ExpressRoute and Private Link.
  • Keep all traffic inside the Microsoft Azure backbone network.

A key point to note in this scenario is that private network traffic is mostly supported in the outbound direction, from your Azure virtual network to the relevant Azure Monitor service, i.e. telemetry and metrics ingested from Application Insights or other Azure Monitor agents running in your network.

For the remainder of this post I will focus on Application Insights availability testing, a great capability that allows you to monitor the availability of your applications. Out of the box, the service currently supports the following two types of tests:

  • URL ping test - A simple HTTP status check with options for configuring the request timeout, expected status code, retries and parsing of dependent resources.
  • Multi-step web test - We can record an HTTP session and save it as a .webtest file, which is then used to execute multiple steps during the test.
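To make the URL ping test concrete, here is a minimal Python sketch of the same idea using only the standard library: request a URL with a timeout, compare the HTTP status with the expected code, retry on failure and record the duration. The names `ping_test` and `PingResult` are hypothetical, parsing of dependent resources is left out, and this is not how the Availability service itself is implemented.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class PingResult:
    success: bool
    duration_ms: float
    message: str


def ping_test(url: str, expected_status: int = 200, timeout: float = 120.0,
              retries: int = 2) -> PingResult:
    """Hypothetical URL ping test: GET the URL and compare the status code."""
    result = PingResult(False, 0.0, "not run")
    for attempt in range(retries + 1):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                status = response.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses raise HTTPError but still carry a status code.
            status = err.code
        except OSError as err:
            # URLError (DNS failure, refused connection) and socket timeouts
            # are all OSError subclasses; record the failure and retry.
            elapsed_ms = (time.perf_counter() - start) * 1000
            result = PingResult(False, elapsed_ms, f"attempt {attempt + 1}: {err}")
            continue
        elapsed_ms = (time.perf_counter() - start) * 1000
        if status == expected_status:
            return PingResult(True, elapsed_ms, f"HTTP {status}")
        result = PingResult(False, elapsed_ms,
                            f"attempt {attempt + 1}: expected {expected_status}, got HTTP {status}")
    return result
```

Note that `urllib` follows redirects by default, so the status compared here is that of the final response.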

One of the benefits of these built-in test types is that enterprises do not need to manage deployment of an agent or any special infrastructure to enable these scenarios. Another benefit is that these tests can also be configured to originate from one or more Azure Regions which are supported by the Availability testing service.

Unfortunately, not all enterprises are able to leverage the functionality described above: network traffic generated by the Availability testing service originates from a pool of IP addresses assigned to the service from a Microsoft-managed network, so the application under test must be reachable from the public internet.

There are a couple of alternatives for these enterprises:

  1. Azure Service Tags - It's possible to allow inbound traffic from the Application Insights Availability service by using the "ApplicationInsightsAvailability" service tag. Keep in mind that this still requires a public IP address, and DNS needs to be publicly resolvable.
  2. Use the Application Insights SDK to create and run custom availability tests.
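For option 1, the allow rule is usually a network security group rule that uses the service tag directly as its source. If you need to check a source address yourself, Microsoft also publishes the service tag ranges as a downloadable JSON file. The sketch below assumes that file's layout (a `values` array whose entries have a `name` and `properties.addressPrefixes`) and uses a small inline sample with documentation-range addresses, not the real Application Insights ranges; the helper names are hypothetical.

```python
import ipaddress
import json

# Hypothetical excerpt in the assumed shape of the published Service Tags file.
# The prefixes are documentation ranges, NOT the real Application Insights ranges.
SAMPLE_SERVICE_TAGS = json.loads("""
{
  "values": [
    {
      "name": "ApplicationInsightsAvailability",
      "properties": {"addressPrefixes": ["192.0.2.0/27", "2001:db8::/126"]}
    },
    {
      "name": "AzureMonitor",
      "properties": {"addressPrefixes": ["198.51.100.0/24"]}
    }
  ]
}
""")


def tag_networks(service_tags: dict, tag_name: str):
    """Return the ip_network objects listed for a service tag (empty if absent)."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            prefixes = entry.get("properties", {}).get("addressPrefixes", [])
            return [ipaddress.ip_network(p) for p in prefixes]
    return []


def is_allowed(source_ip: str, service_tags: dict,
               tag_name: str = "ApplicationInsightsAvailability") -> bool:
    """True if source_ip falls inside any prefix of the given service tag."""
    address = ipaddress.ip_address(source_ip)
    # `in` returns False when IP versions differ, so mixed v4/v6 lists are safe.
    return any(address in network for network in tag_networks(service_tags, tag_name))
```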

For the remainder of this post, I am going to focus on providing a complete solution leveraging the concepts documented in Option 2. The source code for the solution I am proposing can be found here.

In Azure we have quite a few interesting options for executing our tests:  

I lean towards a container-based approach as a container image is a neat way for sharing this functionality amongst teams and simplifying private test setup for development teams.

Scheduling Private Test Runners with Kubernetes

Let's go through the process of setting up the proposed solution in a lab environment. Before we get started, we require the following pre-requisites:

Typically, enterprise cloud customers start with a hub and spoke network topology. These hubs are normally regional and may also be isolated by environment, e.g. Production or Development. Azure workloads running in the spokes can communicate back to on-premises via their hub network and its ExpressRoute connection.

A hub is a perfect location for shared compute infrastructure, which could be used to run an internal managed service for testing the availability of line-of-business applications. Depending on where your development teams deploy their workloads, and where their clients use their services from, you may want to run availability tests from multiple locations.

In our lab scenario we will deploy two AKS clusters, one in West Europe and the other in Central US.

Before we get started setting up this solution in our lab environment, we need to clone the Git repository.

PS:> git clone https://github.com/keyoke/AzureApplicationInsights.git

Once we have a local copy of the solution, we can start deploying our private test runners. First, we need to create a "testruns.json" file with the following contents, updated to target your test application.

{
  "PingTests": [
    {
      "Name": "Test Website #1",
      "Url": "http://www.microsoft.com",
      "StatusCode": 200,
      "Timeout": 120,
      "ParseDependentRequests": false,
      "Locations": [ "West Europe", "Central US" ]
    },
    {
      "Name": "Test Website #2",
      "Url": "http://www.bing.com",
      "StatusCode": 200,
      "Timeout": 120,
      "ParseDependentRequests": false,
      "Locations": [ "North Europe", "Central US" ]
    }
  ]
}

It's also possible to host multiple test run files and configure each test runner at execution time to use a specific set of test runs.

As we want to share the test run configuration across multiple test runners in different regions, we will upload the "testruns.json" file to our Storage Account.

Open the Storage Account in the Azure Portal, navigate to Blob service -> Containers and create a new container. Enter the container name and click Create.

The test runners read the test run data from the storage container using a Managed Identity (or a Service Principal), so that identity needs read access: navigate to Access Control (IAM) -> Add Role Assignment, enter the identity name and select the Storage Blob Data Reader role.

We can now upload the "testruns.json" file to our Storage Container.

We need a container image to deploy to the cluster. You can use the following Dockerfile to build one and push it to your own container registry, or you can use my publicly hosted image on Docker Hub.

# Build stage: restore and publish the console test runner
FROM mcr.microsoft.com/dotnet/sdk:3.1 AS build
WORKDIR /privatetestrunner/
COPY . ./
RUN dotnet publish privatetestrunner.console -c Release --output output

# Runtime stage: copy only the published output into the smaller runtime image
FROM mcr.microsoft.com/dotnet/runtime:3.1 AS runtime
WORKDIR /app
COPY --from=build /privatetestrunner/output .
ENTRYPOINT ["dotnet", "privatetestrunner.console.dll"]

Once we have our image, there are many ways to schedule our test runners in Kubernetes. The simplest is a CronJob, which runs a container image on a schedule; in the example below I have used a cron expression that runs every 15 minutes.

apiVersion: batch/v1            # batch/v1beta1 CronJob was removed in Kubernetes 1.25
kind: CronJob
metadata:
  name: privatetestrunner
spec:
  schedule: "*/15 * * * *"       # every 15 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: privatetestrunner
            image: docker.io/keyoke/privatetestrunner:0.5
            imagePullPolicy: IfNotPresent
            env:
            - name: TestRunner__StorageContainerEndpoint
              value: "https://[ACCOUNT_NAME].blob.core.windows.net/[CONTAINER_NAME]"
            - name: TestRunner__StorageBlobName
              value: "testruns.json"
            - name: TestRunner__InstrumentationKey
              value: "[APP_INSIGHTS_IKEY]"
            - name: TestRunner__Location
              value: "[CLUSTER_REGION]"
            - name: AZURE_TENANT_ID     # Optional - either manually provide the Service Principal credentials or leverage Managed Identity/Pod Identity
              value: "[AAD_TENANT_ID]"
            - name: AZURE_CLIENT_ID     # Optional
              value: "[AAD_CLIENT_ID]"
            - name: AZURE_CLIENT_SECRET # Optional
              value: "[AAD_CLIENT_SECRET]"
          restartPolicy: OnFailure
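Kubernetes CronJob schedules use the standard five-field cron syntax (minute, hour, day of month, month, day of week), so "*/15 * * * *" fires at minutes 0, 15, 30 and 45 of every hour. The short Python sketch below expands just the minute field to make that arithmetic explicit. It supports only `*`, `*/n`, single values and comma lists (a deliberately partial subset of cron; ranges are not handled), and the helper names are hypothetical.

```python
from typing import List


def expand_minute_field(field: str) -> List[int]:
    """Expand a cron minute field ('*', '*/n', 'm' or 'a,b,c') to sorted minutes.

    Only a subset of cron syntax is supported; ranges such as '5-10' are not.
    """
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            step = int(part[2:])
            if step <= 0:
                raise ValueError(f"invalid step in {part!r}")
            minutes.update(range(0, 60, step))
        else:
            value = int(part)
            if not 0 <= value <= 59:
                raise ValueError(f"minute out of range: {value}")
            minutes.add(value)
    return sorted(minutes)


def runs_per_day(minute_field: str) -> int:
    """Runs per day when the hour, day, month and weekday fields are all '*'."""
    return len(expand_minute_field(minute_field)) * 24
```

Four runs an hour works out to 96 scheduled jobs per day for each regional runner.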

Before applying the "cronjob.yaml" file, we need to set the following values:

  • Storage Container Endpoint - The URI of the storage container where your test run configuration is stored.
  • Storage Blob Name - The name of the test run file in that container, e.g. testruns.json.
  • Application Insights Instrumentation Key - The iKey of the central Application Insights resource where all availability telemetry should be published.
  • Location - The location where this test runner is deployed.
  • Tenant Id - Optional. The Azure Active Directory tenant of the manually supplied Service Principal.
  • Client Id - Optional. The client ID of the manually supplied Service Principal.
  • Client Secret - Optional. The client secret of the manually supplied Service Principal.
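The "TestRunner__" prefix on these variables follows the .NET configuration convention, where the environment variable provider treats a double underscore as a section separator: "TestRunner__Location" maps to the "Location" key of the "TestRunner" section, the same section that appears in the runner's appsettings.json. The Python sketch below only imitates that mapping to show how the manifest values combine with a settings file; it is not the runner's own code, `apply_env_overrides` is a hypothetical name, and it assumes (as with the default .NET host) that environment variables are applied after the JSON file and therefore win.

```python
import copy
from typing import Dict, Mapping


def apply_env_overrides(settings: Dict, environ: Mapping[str, str]) -> Dict:
    """Return a copy of `settings` with '__'-delimited environment variables applied.

    Mimics the .NET convention: TestRunner__Location -> settings["TestRunner"]["Location"].
    Key matching is case-insensitive, as in .NET configuration. Values stay strings.
    """
    merged = copy.deepcopy(settings)
    for name, value in environ.items():
        if "__" not in name:
            # This sketch only handles '__'-delimited names; single-underscore
            # names such as AZURE_CLIENT_ID are skipped here.
            continue
        *sections, leaf = name.split("__")
        node = merged
        for section in sections:
            # Reuse an existing key regardless of case, otherwise create a new section.
            match = next((k for k in node if k.lower() == section.lower()), section)
            child = node.get(match)
            if not isinstance(child, dict):
                child = {}
                node[match] = child
            node = child
        match = next((k for k in node if k.lower() == leaf.lower()), leaf)
        node[match] = value
    return merged
```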

Once you have set these values, deploy the cron job to the cluster. Repeat this for each region you wish to support.

PS:> kubectl apply -f .\cronjob.yaml 

As the availability tests are executed you will see the results in the Application Insights Availability blade in the Azure Portal.

Running standalone test runners

It's also possible to run a test runner in standalone mode. The console and function test runners both look in a local "appsettings.json" file for their test run configuration; in the example below, the storage settings are left empty and the tests are defined inline under "TestRuns".

{
  "TestRunner": {
    "StorageContainerEndpoint": "",
    "StorageBlobName": "",
    "InstrumentationKey": "[APP_INSIGHTS_IKEY]",
    "EndpointAddress": "https://dc.services.visualstudio.com/v2/track",
    "Location": "[LOCATION]",
    "TestRuns": {
      "PingTests": [
        {
          "Name": "Test Website #1",
          "Url": "http://www.microsoft.com",
          "StatusCode": 200,
          "Timeout": 120,
          "ParseDependentRequests": false,
          "Locations": [ "West Europe", "Central US" ]
        },
        {
          "Name": "Test Website #2",
          "Url": "http://www.bing.com",
          "StatusCode": 200,
          "Timeout": 120,
          "ParseDependentRequests": false,
          "Locations": [ "North Europe", "Central US" ]
        }
      ]
    }
  }
}
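Whichever mode a runner uses, each result ends up as availability telemetry sent to the ingestion endpoint configured in "EndpointAddress". Option 2 relies on the Application Insights SDK, whose `TelemetryClient.TrackAvailability` method records such results. For readers curious about what such an item contains, the Python sketch below builds one in the shape of the Application Insights ingestion schema as I understand it. The envelope name, field names and "d.hh:mm:ss.fff" duration format are assumptions, not taken from the repository, and the function only builds the item; it does not send it.

```python
import uuid
from datetime import datetime, timezone


def format_duration(milliseconds: float) -> str:
    """Format a duration as d.hh:mm:ss.fff, a TimeSpan-style string."""
    total_ms = int(round(milliseconds))
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}.{hours:02}:{minutes:02}:{seconds:02}.{ms:03}"


def availability_envelope(ikey: str, test_name: str, run_location: str,
                          success: bool, duration_ms: float, message: str = "") -> dict:
    """Build one availability telemetry item (schema details are assumptions)."""
    return {
        "name": "Microsoft.ApplicationInsights.Availability",
        "time": datetime.now(timezone.utc).isoformat(),
        "iKey": ikey,
        "data": {
            "baseType": "AvailabilityData",
            "baseData": {
                "ver": 2,
                "id": str(uuid.uuid4()),
                "name": test_name,
                "duration": format_duration(duration_ms),
                "success": success,
                "runLocation": run_location,
                "message": message,
            },
        },
    }
```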

I hope you have found this post useful. I wish you all the best on your journey to the cloud!