VSTS: Collect Telemetry for build and release tasks using Application Insights

Developers of extensions for Visual Studio Team Services (or Team Foundation Server) often use Application Insights to collect telemetry about their extensions.

Metrics such as usage, performance, and errors show which features are used, how they perform, and whether they fail, so you can improve them and deliver more value to your end users.

There are several types of extensions:

  • Dashboard widgets
  • Hubs
  • Actions
  • Work Item form elements
  • Build and release tasks

 

Except for build and release tasks, all other extension types run in a web browser, so collecting telemetry is easy and Application Insights is a natural fit.

 

There is plenty of information on how to integrate Application Insights with your extensions (here and here, just to reference a few). If you use the ALM | DevOps Rangers generator-vsts-extension to generate the skeleton of your extension, Application Insights support is (optionally) added for you automatically.

 

image

 

At first glance, it might seem that we cannot use Application Insights for build & release tasks, since they do not run in a browser, and that we are stuck with the statistics the Visual Studio Marketplace provides us (install and uninstall numbers).

In this post I’m going to show how you can use Application Insights to measure usage of build & release tasks (referred to as tasks from now on).

Tasks are implemented in either PowerShell or JavaScript. Tasks should preferably be implemented in JavaScript, since they can then be executed on all platforms (the build and release agent is cross-platform and can run on Windows, Linux, or macOS), unless there is a strong reason to implement them in PowerShell (which only runs on Windows agents).

 

I’m going to explain how Application Insights can be used in a task implemented in JavaScript (using TypeScript, to be more exact), but the same technique can be used in PowerShell.

 

Below you can see the most straightforward implementation of a task (in TypeScript):

import tl = require('vsts-task-lib/task');
 
async function run() {
    try {
        // task logic goes here
    } catch (err) {
        tl.setResult(tl.TaskResult.Failed, err.message);
    }
}
 
run();

 

In order to collect telemetry we need to install the Application Insights for Node.js NPM module (e.g.: npm install applicationinsights --save).

Next, we need to import the module and initialize it by adding the following snippet (outside the run function). Don’t forget to enter your Application Insights instrumentation key (or externalize it):

import appInsights = require('applicationinsights');
 
appInsights.setup('INSERT APPLICATION INSIGHTS INSTRUMENTATION KEY')
    .setAutoDependencyCorrelation(false)
    .setAutoCollectRequests(false)
    .setAutoCollectPerformance(false)
    .setAutoCollectExceptions(false)
    .setAutoCollectDependencies(false)
    .setAutoCollectConsole(false)
    .setUseDiskRetryCaching(false)
    .start();
 
var client = appInsights.defaultClient;

The Application Insights SDK is now initialized, auto collection has been disabled, and data collection has started.

We now need to explicitly collect the information we want.

Tracking Usage

 

For our example, these are the requirements

  • Track how many times a task was executed
  • Track how many accounts/collections are actively using the extension
  • Get errors, in order to track issues that users don’t bother to report
  • Don’t collect any user data nor use any information that may lead to user identification; all information is anonymous

 

You may have others, but these are the ones we are going to solve in this post.

There are several kinds of events we can send to Application Insights; we can track things like page views, events, metrics, exceptions, requests, log traces, or dependencies.

Since we only want to track usage, we have two choices: track a request or track a custom event.

 

Track using a request event

 

Conceptually, the execution of a task is not a request; a request represents a web request. Semantics aside, a request is a suitable way to track task executions, even if we are stretching the request definition a little.

If we use a request, these are some things we get out of the box:

  • We can track the response time of the task execution (typically this doesn’t matter much, since a task may be executed on machines with very different specs, or against very different data)
  • Usage (and performance) data is visible on the Application Insights summary blade

 

image

 

We need to call the track request API and provide the following parameters:

  • Name: the name of the request; we can use the task name in this field.
  • URL: since we don’t have a URL, we can use anything we want here (it doesn’t need to be a valid URI), so we can use either Build or Release to know whether the task was executed in a build or a release.
  • Duration: the execution time (if we want to track performance; otherwise use 0).
  • Success status: whether the request succeeded or failed.
  • Result code: the result code (use anything you want, but you need to specify it, otherwise the request is ignored).

 

There is just one thing missing: how can we track usage across accounts? For that, we can use properties that are sent with every event as custom dimensions. This would be the implementation of our task:

 

async function run() {
    try {
 
        //do your actions

        let taskType = tl.getVariable("Release.ReleaseId") ? "Release Task" : "Build Task";
 
        client.commonProperties = {
            collection: tl.getVariable("system.collectionId"), projectId: tl.getVariable("system.teamProjectId")
        };
 
 
        client.trackRequest({ name: "taskName", url: taskType, duration: 0, success: true, resultCode: "OK" });
 
    } catch (err) {
        tl.setResult(tl.TaskResult.Failed, err.message);
    } finally {
        client.flush();
    }
}
 
run();

 

Notice we send the collection id and the team project id. These are opaque GUIDs; they do not reveal any client information and can’t be used to track clients. But if you are worried, you can be extra cautious and pass them through a hash function for further anonymization.
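As a sketch of that extra anonymization step (the choice of hash and the helper name are my assumptions, not from the post), Node’s built-in crypto module can one-way hash the GUIDs before they are attached to the telemetry:

```typescript
import * as crypto from 'crypto';

// Hypothetical helper: one-way hash an identifier before sending it as a
// custom dimension, so even the opaque GUID never leaves the agent as-is.
function anonymize(id: string): string {
    return crypto.createHash('sha256').update(id).digest('hex');
}

// Usage (illustrative): hash the ids before assigning commonProperties
// client.commonProperties = {
//     collection: anonymize(tl.getVariable("system.collectionId")),
//     projectId: anonymize(tl.getVariable("system.teamProjectId"))
// };
```

The hash is deterministic, so counts of distinct collections and projects still work; only the ability to correlate the value back to a real account is lost.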

 

Tracking using a custom Event

 

A custom event can be used to send data points to Application Insights; you can associate custom data with an event. You can use data that can be aggregated (to be viewed in Metrics Explorer) or data available in Search. Both are queryable in Application Insights Analytics.

async function run() {
    try {
 
        //do your actions

        let taskType = tl.getVariable("Release.ReleaseId") ? "Release Task" : "Build Task";
 
        client.trackEvent({ name: "Task Execution", properties: { taskType: taskType, taskName: "taskname", collection: tl.getVariable("system.collectionId"), projectId: tl.getVariable("system.teamProjectId") } });
 
    } catch (err) {
        tl.setResult(tl.TaskResult.Failed, err.message);
    } finally {
        client.flush();
    }
}

With the event, I opted to use Task Execution for the event name. It allows us to quickly count the number of task executions (regardless of whether it is in a release or a build context), and if we need the context where the task was executed, we can get it from the taskType property.

 

Errors Telemetry

 

Finally, we want to get error information, so we add the following code to the catch handler:

 

catch (err) {
    client.trackException({ exception: err });
 
    tl.setResult(tl.TaskResult.Failed, err.message);
}

 

If we just send the exception event, we will miss the task execution failures, so we still need to track the event (either trackEvent or trackRequest), or we will not have usage data for failed tasks.

If we are using events, this would be the code for the catch handler:

catch (err) {
    client.trackException({ exception: err });
 
    let taskType = tl.getVariable("Release.ReleaseId") ? "Release Task" : "Build Task";
 
    client.trackEvent({ name: "Task Execution", properties: { failed: true, taskType: taskType, taskName: "taskname", collection: tl.getVariable("system.collectionId"), projectId: tl.getVariable("system.teamProjectId") } });
 
    tl.setResult(tl.TaskResult.Failed, err.message);
}

Notice we added a failed property to the properties object.

If we are using request events, this would be the catch handler:

catch (err) {
    client.trackException({ exception: err });
 
    let taskType = tl.getVariable("Release.ReleaseId") ? "Release Task" : "Build Task";
 
    client.commonProperties = {
        collection: tl.getVariable("system.collectionId"), projectId: tl.getVariable("system.teamProjectId")
    };
 
    client.trackRequest({ name: "taskName", url: taskType, duration: 0, success: false, resultCode: "Error" });
 
    tl.setResult(tl.TaskResult.Failed, err.message);
}

The only difference is that we are setting the success property to false, so the request appears among the failed requests.

 

Sending Data

 

To make sure data is sent (the SDK batches data), we call the flush method in the finally handler to guarantee data is sent to Application Insights before the task execution finishes.

} finally {
    client.flush();
}

Opting In/Opting out

 

Optionally, you can allow users to opt in or opt out via a task parameter, so users can decide whether they want to contribute anonymous telemetry data.

 

image

 

Telemetry can be disabled via the disableAppInsights property of the client’s config property:

var client = appInsights.defaultClient;
 
client.config.disableAppInsights = !enabled;
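A minimal sketch of wiring the opt-in parameter to that flag (the input name enableTelemetry is an assumption; in a real task you would read it with tl.getBoolInput). Task inputs reach the agent as strings, and the task lib treats a case-insensitive "true" as true:

```typescript
// Hypothetical: parse the checkbox value the agent passes to the task
// (vsts-task-lib exposes the same logic as tl.getBoolInput("enableTelemetry")).
function parseBoolInput(value: string | undefined): boolean {
    return (value || '').trim().toLowerCase() === 'true';
}

// Inputs are surfaced to the task process as INPUT_* environment variables.
const enabled = parseBoolInput(process.env['INPUT_ENABLETELEMETRY']);
// client.config.disableAppInsights = !enabled;
```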

Analyzing Data

 

After deploying the tasks with telemetry collection enabled we are now ready to analyze usage data.

 

We have several ways to visualize or analyze Application Insights data; we can use the Azure Portal or Application Insights Analytics.

Note: this is not a primer on Application Insights, it is just a glimpse of some ways to analyze the data collected from tasks.

Azure Portal

 

If you decided to go the track request route, all executions are immediately visible on the Overview blade:

 

image

 

If you decided to go the event route, you can get a similar graphic by opening Metrics Explorer and configuring the chart to display Events, grouped by Event Name.

 

SNAGHTML89c0d6d

 

You can also group by Operating System to see whether tasks were executed on a Windows, Linux, or macOS agent.

If you wish to see event details, just open Search and click on the event [1] you wish to inspect:

 

SNAGHTML8ad657d

 

Under Custom data, you can see the custom event data we have sent: the collection identifier, the project identifier, the name of the task, and the context where the task was executed (Build in this case).

 

Application Insights Analytics

 

Application Insights Analytics provides you with a search and query tool to analyze Application Insights data.

 

Let’s start with the simplest query: list all the requests, ordered by date.

 

image
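The query behind that list is essentially a one-liner (a sketch in the Analytics query language; requests is the standard Application Insights table name):

```
requests
| order by timestamp desc
```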

 

To get an idea of task executions over time and which platform the agent is running on, we can render the following chart (data is grouped into 5-minute intervals):
image
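A query along these lines would produce such a chart (a sketch; client_OS is the standard Application Insights column holding the operating system of the machine that sent the event):

```
requests
| summarize count() by bin(timestamp, 5m), client_OS
| render timechart
```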

 

However, we can use it to answer more elaborate questions, like how many executions each task had across distinct team projects (we could also get how many different accounts, but it is easier to demonstrate with a single account):

 

image
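A sketch of a query answering that question (projectId is the custom dimension we attached earlier via commonProperties):

```
requests
| where timestamp > ago(6h)
| summarize executions = count(), projects = dcount(tostring(customDimensions.projectId)) by name
```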

 

In the last six hours, the task called “taskName” has been executed 60 times in two different team projects.

VSTS: Generating badges for Releases

Visual Studio Team Services (and Team Foundation Server) supports build badges out of the box, giving you a URL for an image that can be embedded in a web page (e.g. a wiki or a GitHub page) to visualize the (always up to date) status of a build.

 

image

 

Unfortunately, Release Management doesn’t offer the same support for releases. Fortunately, it is easy to implement a release badge generator with a few lines of code, using VSTS built-in extensibility mechanisms.

 

This is what I’m going to show you in this post: how I built a VSTS release badge generator using VSTS web hooks, an Azure Function to generate the badge, and Azure Blob storage to store the badges, practically for free (it will cost you at most a few cents per month for a huge number of badges/accesses).

 badge example

 

Architecture

 

We want a system that:

  • Is fast
  • Is free, or very cheap, to run
  • Has very few moving parts
  • Doesn’t require access to VSTS data
  • Doesn’t require maintenance or management
  • Is stateless; we don’t want to manage any data besides the badge itself

 

The architecture consists of a system that statically generates the badge every time a release is deployed; the badge is regenerated only when there is a new release.

This means that when a user sees a badge, they are only accessing a static file: no computation is needed, so no extra costs are incurred (computation costs are much higher than storage costs), the badge is not generated in real time, and there is no need to have access to VSTS. Not only is there no need to manage credentials to access VSTS, the code is simpler, and above all there is no attack surface against VSTS via the generator.

Since the files are statically generated and accessed via HTTP/HTTPS, they can be cached (in proxies and in browsers), improving access speed to badges and saving bandwidth (and storage costs). By default the cache is configured with a low value, but it can be adjusted to different (expected) release frequencies.

 

This means we can generate badges while being decoupled from VSTS, and generate badges for many team projects (and even different accounts) without having access to them; the only thing we need is to provide the endpoint of the generator and configure a web hook.

 

It is an event-driven architecture, with all the advantages and disadvantages that entails.

 

This design has some drawbacks:

  • We only have badges after the generator is configured and a release has been deployed; it doesn’t work for past deployments. If you can’t wait for the next release, you can force a redeploy just to generate the badge.
  • The badges are totally decoupled from the release definition and environments. If a release definition or an environment is deleted, the badges are not deleted.
    • We could have a process to clean up orphaned badges, but for simplicity we just leave them alone (storage is cheap).

 

Requirements

 

An Azure account to host the code and at least one VSTS account to generate the badges for.

Service Hooks

 

VSTS Service Hooks allow integrating VSTS with other applications: every time a certain event happens in VSTS (a work item has changed, a build has started, a pull request was created, and all sorts of other events), the application is notified in near real time.

 

Service hooks can be used to integrate out of the box with services like AppVeyor, Azure App Service, Campfire, Jenkins, Microsoft Teams, Slack, Office 365, or ZenDesk, just to name a few. In our case we are interested in a generic integration, so we are going to use a Web Hook to receive an event every time a release deployment finishes.

 

We can create one web hook [1] with the Release deployment completed [2] event per release definition (if we wish to generate release badges only for some releases or environments), or a web hook for any release definition [3] (this will generate a badge for all releases in any environment).

 

SNAGHTML27677d27

 

The event has a JSON payload; the schema varies with the type of event being received. The Release deployment completed event contains all the information we need to generate a badge, including among other things:

  • Release definition id/name
  • The release number
  • The environment
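As an illustration of pulling those fields out of the payload in TypeScript (the property paths below are my assumption of the payload shape, not copied from the actual event schema; inspect a real event before relying on them):

```typescript
// Hypothetical, partial shape of the "Release deployment completed" payload;
// only the fields the badge generator cares about.
interface ReleaseDeploymentEvent {
    resource: {
        environment: {
            name: string;                                  // e.g. "Production"
            releaseDefinition: { id: number; name: string };
            release: { name: string };                     // the release number
        };
    };
}

// Extract the three values needed to render a badge.
function badgeInfo(evt: ReleaseDeploymentEvent) {
    const env = evt.resource.environment;
    return {
        definition: env.releaseDefinition.name,
        releaseNumber: env.release.name,
        environment: env.name
    };
}
```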

 

With a (near) real-time event every time a release is deployed, we just need to generate the badge based on the event payload.

 

Generate the badge using an Azure Function

 

We need to generate the badge every time an event is received, and an Azure Function is very suitable for this task. Some of its advantages (among others):

  • It’s very cheap to run (see hosting plan comparisons)
    • If you already have an Azure App Service you can host it there (since you are already paying for it), at no extra cost
    • Pay per use: pay only for what you use, with a generous free monthly cap (which means you can run it for free unless you have a big number of deployments occurring)
  • No need to manage infrastructure or care about scalability
  • First-class integration with Azure components: we can use other Azure resources with only a few lines of code, or declaratively

 

We need to implement a function that, upon receiving a JSON payload via HTTP/HTTPS, generates a badge based on the event data and stores it somewhere publicly accessible via HTTP/HTTPS.

 

I’m not going into the details of creating an Azure Function; there is plenty of information available online.

 

We need to implement an Azure Function that is triggered by an HTTP request; when the event is received, we generate the badge based on its data (a badge with the name of the release definition and the release number).

 

image

Generating the badge

 

We can either manipulate image files to generate the badge image ourselves, use an external library, or use an external service to do the heavy lifting for us.

 

I’ve chosen the latter and used the Shields IO service to generate badges. Shields IO is a service that can generate badges in several image formats (jpeg, gif, svg, …) for dozens of services (e.g. Travis, AppVeyor, CircleCI, etc.), but it can also generate generic badges, and that is what we will use to generate our badges with a single HTTP call.
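For a generic badge, Shields IO only needs a URL of the form https://img.shields.io/badge/label-message-color.svg, where in each segment dashes are doubled, underscores are doubled, and spaces become underscores. A small sketch of building that URL (the helper names are mine):

```typescript
// Escape a badge segment per the Shields IO static-badge rules:
// "-" becomes "--", "_" becomes "__", and spaces become "_".
function escapeSegment(s: string): string {
    return s.replace(/-/g, '--').replace(/_/g, '__').replace(/ /g, '_');
}

// Build the URL for a generic (static) Shields IO badge.
function badgeUrl(label: string, message: string, color: string): string {
    return `https://img.shields.io/badge/${escapeSegment(label)}-${escapeSegment(message)}-${color}.svg`;
}

// e.g. badgeUrl("My Definition", "1.0.42", "green")
```

The function then just fetches that URL and gets the badge image back in the response body.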

 

Storing the generated badges

 

After the badge has been generated, we need to store it somewhere where people can access it. Azure Storage is a natural choice:

  • It’s dirt cheap
  • Badges can be accessed via HTTP/HTTPS anywhere in the world
  • It’s fast (files are pre-generated) and you can control time to live in browser caches to save bandwidth costs
  • It has built-in replication mechanisms to guarantee durability and high availability

 

So we will use Azure Storage blobs; all generated badges will be stored in a container.

 

Writing a file from an Azure Function takes only a few lines of code. We could use Azure Functions’ declarative capabilities to write the file into Azure Storage with no code at all, but we want to control not only the storage container and file path, but also the browser cache directives at a finer grain. Even so, writing the file to storage is just a few lines of code:
image
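A sketch of that storage step (the post’s actual function code is not reproduced here; the SDK calls in the trailing comment assume the @azure/storage-blob package, and the blob path and max-age value are illustrative):

```typescript
// Build the blob HTTP headers for a badge: SVG content type plus a short
// Cache-Control TTL, so proxies/browsers cache it but still pick up new releases.
function badgeBlobOptions(maxAgeSeconds: number) {
    return {
        blobHTTPHeaders: {
            blobContentType: 'image/svg+xml',
            blobCacheControl: `public, max-age=${maxAgeSeconds}`
        }
    };
}

// In the function body (the container must allow anonymous blob read), roughly:
//   const blob = containerClient.getBlockBlobClient(`${definition}/${environment}.svg`);
//   await blob.upload(svgBuffer, svgBuffer.length, badgeBlobOptions(300));
```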

 

The container where the badges are stored has only one requirement to fulfill: anonymous access must be enabled.

 

image

 

Security

 

By default, functions are set to use function-level security (if you use Get Function URL in the portal, the default code is automatically added to the URL).

 

image

 

If you wish to have subscriptions from multiple accounts or team projects, you may want to have different keys, so you can revoke them if needed without disrupting all other subscriptions. It is advisable to set up multiple keys.

 

Learn how to work with function or host keys

 

Be aware that anyone you share the function URL with can flood the function with fake events, not only generating fake data but also (potentially) making you incur extra costs (compute and storage).

 

Show me the code

 

You must be wondering: so much talk and practically no details at all about how this was implemented. I haven’t gone into details since the code is quite simple and has plenty of comments.

 

The solution is not bulletproof; for example, it doesn’t deal with prolonged failures or unavailability of the Shields IO service (transient failures will not prevent badge generation, since the Service Hooks built-in retry mechanism takes care of those).

 

The code works and can be used in production, but it has been written as a learning experiment, since I wanted to try some things with Azure Functions and at the same time produce something useful.

 

All the code is available as open source under an MIT license on GitHub; the repo contains not only the source code but also a PowerShell script (which uses an ARM template) to automatically provision and configure the Azure Function. Setting up Web Hook subscriptions and deploying the code are manual tasks.

 

The readme file in the repo has all the instructions (way longer than this post) on how to provision, deploy the code, and configure Web Hooks, as well as how to parameterize the generator (different cache settings, badge styles, etc.).

 

Provisioning the function and deploying the generator is very easy to automate using VSTS Release Management; it can be automated with only two tasks.

 

image