Wednesday 30 December 2020

Azure Pipelines meet Jest

This post explains how to integrate the tremendous test runner Jest with the continuous integration platform Azure Pipelines. Perhaps we're setting up a new project and we've created a new React app with Create React App. This ships with Jest support out of the box. How do we get that plugged into Pipelines such that:

  1. Tests run as part of our pipeline
  2. A failing test fails the build
  3. Test results are reported in Azure Pipelines UI?

Tests run as part of our pipeline

First of all, let's get the tests running. Crack open your azure-pipelines.yml file and, in the appropriate place, add the following:

        - task: Npm@1
          displayName: npm run test
          inputs:
            command: 'custom'
            workingDir: 'src/client-app'
            customCommand: 'run test'

The above will, when run, trigger an npm run test in the src/client-app folder of my project (this is where my React app lives). You'd imagine this would just work™️ - but life is not that simple. This is because Jest, by default, runs in watch mode. This is blocking and so not appropriate for CI.

In our src/client-app/package.json let's create a new script that runs the tests but not in watch mode:

        "test:ci": "npm run test -- --watchAll=false",

and switch our azure-pipelines.yml to use it:

        - task: Npm@1
          displayName: npm run test
          inputs:
            command: 'custom'
            workingDir: 'src/client-app'
            customCommand: 'run test:ci'

Boom! We're now running tests as part of our pipeline. And also, failing tests will fail the build, because of Jest's default behaviour of exiting with status code 1 on failed tests.

Test results are reported in Azure Pipelines UI

Pipelines has a really nice UI for reporting test results. If you're using something like .NET then you'll find that test results just magically show up there. We'd like that for our Jest tests as well. And we can have it.

The way we achieve this is by:

  1. Producing test results in a format that can be subsequently processed
  2. Using those test results to publish to Azure Pipelines

The way that you configure Jest test output is through the use of reporters. Create React App doesn't support these directly; however that's not an issue, as the marvellous Dan Abramov demonstrates here.

We need to install the jest-junit package to our client-app:

npm install jest-junit --save-dev

And we'll tweak our test:ci script to use the jest-junit reporter as well:

        "test:ci": "npm run test -- --watchAll=false --reporters=default --reporters=jest-junit",

We also need to add some configuration to our package.json in the form of a jest-junit element:

    "jest-junit": {
        "suiteNameTemplate": "{filepath}",
        "outputDirectory": ".",
        "outputName": "junit.xml"
    }

The above configuration will use the name of the test file as the suite name in the results, which should speed up tracking down a failing test. The other values specify where the test results should be written: in this case, the root of our client-app with the filename junit.xml.

Now that our CI is producing test results, how do we get them into Pipelines? For that we need the Publish test results task and a new step in our azure-pipelines.yml after our npm run test step:

        - task: Npm@1
          displayName: npm run test
          inputs:
            command: 'custom'
            workingDir: 'src/client-app'
            customCommand: 'run test:ci'

        - task: PublishTestResults@2
          displayName: 'supply npm test results to pipelines'
          condition: succeededOrFailed() # because otherwise we won't know what tests failed
          inputs:
            testResultsFiles: 'src/client-app/junit.xml'

This will read the test results from our src/client-app/junit.xml file and pump them into Pipelines. Do note that we're always running this step; so if the previous step failed (as it would in the case of a failing test) we still pump out the details of what that failure was.

And that's it! Azure Pipelines and Jest integrated.

Tuesday 22 December 2020

Prettier your CSharp with dotnet-format and lint-staged

Consistent formatting is a good thing. It makes code less confusing to newcomers and it allows whoever is working on the codebase to reliably focus on the task at hand. Not "fixing curly braces because Janice messed them up with her last commit". (A git commit message that would be tragic in so many ways.)

Once you've agreed that you want to have consistent formatting, you want it to be enforced. Enter, stage left, Prettier, the fantastic tool for formatting code. It rocks; I've been using it on my JavaScript / TypeScript for the longest time. But what about C#? Well, there is a Prettier plugin for C#.... Sort of. It appears to be abandoned and contains this worrying message in the README.md:

Please note that this plugin is under active development, and might not be ready to run on production code yet. It will break your code.

Not a ringing endorsement.

dotnet-format: a new hope

Margarida Pereira recently pointed me in the direction of dotnet-format, a formatter for .NET. It's a .NET tool which:

is a code formatter for dotnet that applies style preferences to a project or solution. Preferences will be read from an .editorconfig file, if present, otherwise a default set of preferences will be used.

It can be installed with:

dotnet tool install -g dotnet-format
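
Once installed, you can format a whole project or solution by running the tool from the directory that contains it; something like:

dotnet format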

The VS Code C# extension will make use of this formatter; you just need to set the following in your settings.json:

    "omnisharp.enableRoslynAnalyzers": true,
    "omnisharp.enableEditorConfigSupport": true

Customising your formatting

If you'd like to deviate from the default formatting options then create yourself a .editorconfig file in the root of your project. Let's say you prefer more of the K & R style approach to braces instead of the C# default of Allman style. To make dotnet-format use that you'd set the following:

# Remove the line below if you want to inherit .editorconfig settings from higher directories
root = true

# See https://github.com/dotnet/format/blob/master/docs/Supported-.editorconfig-options.md for reference
[*.cs]
csharp_new_line_before_open_brace = none
csharp_new_line_before_catch = false
csharp_new_line_before_else = false
csharp_new_line_before_finally = false
csharp_new_line_before_members_in_anonymous_types = false
csharp_new_line_before_members_in_object_initializers = false
csharp_new_line_between_query_expression_clauses = true

With this in place it's K & R all the way baby!
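
To make the difference concrete, here's a contrived before / after (this snippet isn't from a real project):

// Allman style - the C# default
if (shouldFormat)
{
    Console.WriteLine("braces get their own lines");
}

// K & R style - what the configuration above produces
if (shouldFormat) {
    Console.WriteLine("the opening brace stays on the same line");
}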

lint-staged integration

It's become somewhat standard to use the marvellous husky and lint-staged to enforce code quality. To quote the docs:

Run linters against staged git files and don't let 💩 slip into your code base!

So, before I happened upon dotnet-format I was already enforcing TypeScript / JavaScript style with the following entry in my package.json:

"husky": {
    "hooks": {
        "pre-commit": "lint-staged"
    }
},
"lint-staged": {
    "*.{js,ts,tsx}": "prettier --write"
}

The above configuration runs Prettier against files which have been staged for commit, provided they have the suffix .js or .ts or .tsx. How can we get dotnet-format in the mix also? Like so:

"husky": {
    "hooks": {
        "pre-commit": "lint-staged --relative"
    }
},
"lint-staged": {
    "*.cs": "dotnet format --include",
    "*.{js,ts,tsx}": "prettier --write"
}

We've done two things here. First, we've changed the lint-staged command to include the parameter --relative. This is because dotnet-format only deals with relative paths. Prettier is pretty flexible, so we can make this change without breaking anything.

Secondly, we've added the *.cs task of dotnet format --include. This is the task that will be run on commit. When lint-staged runs, it will pass a list of relative file paths to dotnet format; --include accepts a list of relative file or folder paths to include in formatting. So if you'd staged two files it might end up executing a command like this:

dotnet format --include src/server-app/Server/Controllers/UserController.cs src/server-app/Server/Controllers/WeatherForecastController.cs

By and large we don't have to think about this; the important take home is that we're now enforcing standardised formatting for all C# files upon commit. Everything that goes into the codebase will be formatted in a consistent fashion.

Monday 21 December 2020

How to make Azure AD 403

This post is about how you can customise ASP.NET's integration with Azure Active Directory; specifically, the behaviour that redirects unauthorized requests to the AccessDenied endpoint. If you're using the tremendous Azure Active Directory for authentication with ASP.NET then there's a good chance you're using the Microsoft.Identity.Web library. It's this that allows us to drop the following statement into the ConfigureServices method of our Startup class:

services.AddMicrosoftIdentityWebAppAuthentication(Configuration);

Which (combined with configuration in our appsettings.json files) hooks us up with Azure AD for authentication. This is 95% awesome. The 5% is what we're here for. Here's the scenario that troubles us:

We've made a request to /WeatherForecast, a secured endpoint (a controller decorated with the Authorize attribute). We're authenticated; the app knows who we are. But we're not authorized / allowed to access this endpoint. We don't have permission. The HTTP specification caters directly for this scenario with status code 403 Forbidden:

The 403 (Forbidden) status code indicates that the server understood the request but refuses to authorize it.

However, Microsoft.Identity.Web is ploughing another furrow. Instead of returning 403, it's returning 302 Found and redirecting the browser to https://localhost:5001/Account/AccessDenied?ReturnUrl=%2FWeatherForecast. Now the intentions here are good. If you wanted to implement a page in your application at that endpoint that displayed some kind of helpful message, this would be just what you need. However, what if you want the more HTTP-y behaviour instead? In the case of an HTTP request triggered by JavaScript (typical for Single Page Applications) this redirect isn't that helpful. JavaScript doesn't really know what to do with the 302 and whilst you could code around this, it's not desirable.

We want 403 - we don't want 302.

Give us 403

You can have this behaviour by dropping the following code after your services.AddMicrosoftIdentityWebAppAuthentication:

services.Configure<CookieAuthenticationOptions>(CookieAuthenticationDefaults.AuthenticationScheme, options =>
{
    options.Events.OnRedirectToAccessDenied = new Func<RedirectContext<CookieAuthenticationOptions>, Task>(context =>
    {
        context.Response.StatusCode = StatusCodes.Status403Forbidden;
        return context.Response.CompleteAsync();
    });
});

This code hijacks the redirect to AccessDenied and transforms it into a 403 instead. Tremendous! Make the same request to /WeatherForecast now and a 403 Forbidden comes back; this is the behaviour we want!

Extra customisation bonus points

You may want to have some nuance to the way you handle unauthorized requests. Because of the nature of OnRedirectToAccessDenied this is entirely possible; you have complete access to the incoming requests, which you can use to direct behaviour. To take a single example, let's say that for normal browsing behaviour (AKA humans clicking about in Chrome) we want to redirect unauthorized requests to a given screen, and otherwise return 403s. What would that look like?

services.Configure<CookieAuthenticationOptions>(CookieAuthenticationDefaults.AuthenticationScheme, options =>
{
    options.Events.OnRedirectToAccessDenied = new Func<RedirectContext<CookieAuthenticationOptions>, Task>(context =>
    {
        var isRequestForHtml = context.Request.Headers["Accept"].ToString().Contains("text/html");
        if (isRequestForHtml) {
            context.Response.StatusCode = StatusCodes.Status302Found;
            context.Response.Headers["Location"] = "/unauthorized";
        }
        else {
            context.Response.StatusCode = StatusCodes.Status403Forbidden;
        }

        return context.Response.CompleteAsync();
    });
});

So above, we check the request Accept headers and see if they contain "text/html", which we're using as a signal that the request came from a user browsing. (This may not be bulletproof; better suggestions gratefully received.) If the request does contain a "text/html" Accept header then we redirect the client to an /unauthorized screen, otherwise we return 403 as we did before. Super flexible and powerful!

Sunday 20 December 2020

Nullable reference types; CSharp's very own strictNullChecks

'Tis the season to play with new compiler settings! I'm a very keen TypeScript user and have been merrily using strictNullChecks since it shipped. I was dimly aware that C# was also getting a similar feature by the name of nullable reference types.

It's only now that I've got round to taking a look at this marvellous feature. I thought I'd share what moving to nullable reference types looked like for me, and what code changes I found myself making as a consequence.

Turning on nullable reference types

To turn on nullable reference types in a C# project you should pop open the .csproj file and ensure it contains <Nullable>enable</Nullable>. So if you had a .NET Core 3.1 codebase it might look like this:

  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <Nullable>enable</Nullable>
  </PropertyGroup>

When you compile from this point forward, possible null reference issues are reported as warnings. Consider this C#:

    [ApiController]
    public class UserController : ControllerBase
    {
        private readonly ILogger<UserController> _logger;

        public UserController(ILogger<UserController> logger)
        {
            _logger = logger;
        }

        [AllowAnonymous]
        [HttpGet("UserName")]
        public string GetUserName()
        {
            if (User.Identity.IsAuthenticated) {
                _logger.LogInformation("{User} is getting their username", User.Identity.Name);
                return User.Identity.Name;
            }

            _logger.LogInformation("The user is not authenticated");
            return null;
        }
    }

A dotnet build results in this:

dotnet build --configuration release

Microsoft (R) Build Engine version 16.7.1+52cd83677 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /Users/jreilly/code/app/src/server-app/Server/app.csproj (in 471 ms).
Controllers/UserController.cs(38,24): warning CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
Controllers/UserController.cs(42,20): warning CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
  app -> /Users/jreilly/code/app/src/server-app/Server/bin/release/netcoreapp3.1/app.dll
  app -> /Users/jreilly/code/app/src/server-app/Server/bin/release/netcoreapp3.1/app.Views.dll

Build succeeded.

Controllers/UserController.cs(38,24): warning CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
Controllers/UserController.cs(42,20): warning CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
    2 Warning(s)
    0 Error(s)

You see the two "Possible null reference return." warnings? Bingo.

Really make it hurt

This is good - information is being surfaced. But it's a warning. I could ignore it. I like compilers to get really up in my face and force me to make a change. I'm not into warnings; I'm into errors. Know what works for you. If you're similarly minded, you can upgrade nullable reference warnings to errors by tweaking the .csproj a touch further. Add yourself a <WarningsAsErrors>nullable</WarningsAsErrors> element. So maybe your .csproj now looks like this:

  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <Nullable>enable</Nullable>
    <WarningsAsErrors>nullable</WarningsAsErrors>
  </PropertyGroup>

And a dotnet build will result in this:

dotnet build --configuration release

Microsoft (R) Build Engine version 16.7.1+52cd83677 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored /Users/jreilly/code/app/src/server-app/Server/app.csproj (in 405 ms).
Controllers/UserController.cs(38,24): error CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
Controllers/UserController.cs(42,20): error CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]

Build FAILED.

Controllers/UserController.cs(38,24): error CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
Controllers/UserController.cs(42,20): error CS8603: Possible null reference return. [/Users/jreilly/code/app/src/server-app/Server/app.csproj]
    0 Warning(s)
    2 Error(s)

Yay! Errors!

What do they mean?

"Possible null reference return" isn't the clearest of errors. What does that actually amount to? Well, it amounts to the compiler saying "you're a liar! (maybe)". Let's look again at the code where this error is reported:

        [AllowAnonymous]
        [HttpGet("UserName")]
        public string GetUserName()
        {
            if (User.Identity.IsAuthenticated) {
                _logger.LogInformation("{User} is getting their username", User.Identity.Name);
                return User.Identity.Name;
            }

            _logger.LogInformation("The user is not authenticated");
            return null;
        }

We're getting that error reported where we're returning null and where we're returning User.Identity.Name which may be null. And we're getting that because as far as the compiler is concerned string has changed. Before we turned on nullable reference types the compiler considered string to mean string OR null. Now, string means string.

This is the same sort of behaviour as TypeScript's strictNullChecks. With TypeScript, before you turn on strictNullChecks, as far as the compiler is concerned, string means string OR null OR undefined (JavaScript didn't feel one null-ish value was enough and so has two - don't ask). Once strictNullChecks is on, string means string.
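
For comparison, here's roughly what the equivalent looks like in TypeScript with strictNullChecks switched on (illustrative only; not from the project above):

function getUserName(isAuthenticated: boolean): string {
    if (isAuthenticated) {
        return "jane.doe";
    }
    // Error: Type 'null' is not assignable to type 'string'.
    // Widening the return type to `string | null` makes the intent explicit.
    return null;
}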

It's a lot clearer. And that's why the compiler is getting antsy. The method signature says string, but it can see null potentially being returned. It doesn't like it. By and large that's good. We want the compiler to notice this; that's the entire point. We want to catch accidental nulls before they hit a user. This is great! However, what do you do if you have a method (as we do) that legitimately returns a string or null?

Widening the type to include null

We change the signature from this:

        public string GetUserName()

To this:

        public string? GetUserName()

That's right, the simple addition of ? marks a reference type (like a string) as potentially being null. Adding that means that we're potentially returning null, but we're explicit about it; there's intention here - it's not accidental. Wonderful!
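
So the method from earlier, now explicit about the fact that it may return null, ends up looking like this:

        [AllowAnonymous]
        [HttpGet("UserName")]
        public string? GetUserName()
        {
            if (User.Identity.IsAuthenticated) {
                _logger.LogInformation("{User} is getting their username", User.Identity.Name);
                return User.Identity.Name;
            }

            _logger.LogInformation("The user is not authenticated");
            // no longer an error; null is part of the declared return type
            return null;
        }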

Wednesday 9 December 2020

azure-pipelines-task-lib and isOutput setVariable

Some blog posts are insightful treatises on the future of web development, some are "here's how I solved my problem". This is most assuredly the latter.

I'm writing a custom pipelines task extension for Azure Pipelines. It's written with TypeScript and the azure-pipelines-task-lib.

The task needs to output a variable. Azure Pipelines does that using the setvariable logging command combined with isOutput=true. This looks something like this: ##vso[task.setvariable variable=myOutputVar;isOutput=true]this is the value

The bad news is that the lib doesn't presently support isOutput=true. Gosh it makes me sad. Hopefully in future it will be resolved. But what now?

For now we can hack ourselves a workaround:

import * as tl from 'azure-pipelines-task-lib/task';
import * as tcm from 'azure-pipelines-task-lib/taskcommand';
import * as os from 'os';

/**
 * Sets a variable which will be output as well.
 *
 * @param     name    name of the variable to set
 * @param     val     value to set
 * @param     secret  whether variable is secret.  Multi-line secrets are not allowed.  Optional, defaults to false
 * @returns   void
 */
export function setOutputVariable(name: string, val: string, secret = false): void {
    // use the implementation of setVariable to set all the internals,
    // then subsequently set the output variable manually
    tl.setVariable(name, val, secret);

    const varValue = val || '';

    // write the command
    // see https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#set-a-multi-job-output-variable
    _command(
        'task.setvariable',
        { variable: name || '', isOutput: 'true', issecret: (secret || false).toString() },
        varValue
    );
}

const _outStream = process.stdout;

function _writeLine(str: string): void {
    _outStream.write(str + os.EOL);
}

function _command(command: string, properties: any, message: string) {
    const taskCmd = new tcm.TaskCommand(command, properties, message);
    _writeLine(taskCmd.toString());
}

The above is effectively a wrapper for the existing setVariable. Once it has called into the original implementation, setOutputVariable writes out the same variable once more, but this time bolting on isOutput=true.
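
Usage from inside a task then looks something like this (the module path and variable names here are made up for illustration):

import { setOutputVariable } from './setOutputVariable';

// somewhere in the task's run function
setOutputVariable('myOutputVar', 'this is the value');
// downstream jobs can then read it with the documented multi-job output
// variable syntax, something like:
//   $[ dependencies.MyJob.outputs['MyStep.myOutputVar'] ]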

Finally, I've raised a PR to see if isOutput can be added directly to the library. You can track progress on that here.

Saturday 28 November 2020

Images in MarkDown for the Azure DevOps Marketplace!

I've recently found myself developing custom pipelines task extensions for Azure DevOps. The extensions being developed end up in the Azure DevOps Marketplace. What you see there when you look at existing extensions is some pretty lovely documentation.

How can our tasks look as lovely?

That, my friends, is the question to answer. Good documentation is key to success. Here's the ask: when our custom task is published and becomes available in the marketplace, we want it to:

  • contain documentation
  • that documentation should support images... For a picture, famously, speaks a thousand words

Mark(Down) our manifest

To get documentation showing up in the marketplace, we need to take a look at the vss-extension.json file which lies at the root of our extension folder. It's a kind of manifest file and is documented here.

Tucked away in the docs, you'll find mention of a content property and the words:

Dictionary of content files that describe your extension to users...
Each file is assumed to be in GitHub Flavored Markdown format. The path of each item is the path to the markdown file in the extension. Valid keys: details.

This means we can have a MarkDown file in our repo which documents our task. To stay consistent with most projects, a solid choice is to use the README.md that sits in the root of the project to this end.

So the simple addition of this:

{
    //...
    "content": {
        "details": {
            "path": "README.md"
        }
    },
    //...
}

Gives us documentation in the marketplace. Yay!

Now the images...

If we are referencing images in our README.md then, as it stands right now, they won't show up in the marketplace. It'll be broken link city. Imagine some MarkDown like this:

![alt text](images/screenshot.png)

This is entirely correct and supported, but won't work by default. This is because these images need to be specified in the files property of the vss-extension.json.

{
    //...
    "content": {
        "details": {
            "path": "README.md"
        }
    },
    "files": [
        {
            "path": "images",
            "addressable": true
        }
    ]
    //...
}

Consider the above; the images path includes everything inside the images folder of the extension. However, it's crucial that "addressable": true is present as well. It's this that makes the files in this path URL-addressable. Without it, the images won't be displayed.

That's it! We're done! We can have rich, image inclusive, documentation in our custom tasks.

A final note: it's possible to specify individual files rather than whole folders in the files property, and you might want to do that if you're being very careful about file size. There is a maximum size for a custom task and it's easy to breach it. But by and large I find that "allowlisting" a single directory is easier.
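
For completeness, a sketch of what listing individual files might look like (the filenames here are made up):

    "files": [
        { "path": "images/screenshot.png", "addressable": true },
        { "path": "images/configuration.png", "addressable": true }
    ]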

Saturday 14 November 2020

Bulletproof uniq with TypeScript generics (yay code reviews!)

Never neglect the possibilities of a code review. There are times when you raise a PR and all you want is for everyone to hit approve so you can merge, merge and ship, ship! This can be a missed opportunity. For as much as I'd like to imagine my code is perfect, it's patently not. There's always scope for improvement.

"What's this?"

This week afforded me that opportunity. I was walking through a somewhat complicated PR on a call and someone said "what's this?". They'd spotted an expression much like this in my code:

const myValues = [...new Set(allTheValuesSupplied)];

What is that? Well, it's a number of things:

  1. It's a way to get the unique values in a collection.
  2. It's a pro-tip and a coding BMX trick.

What do I mean? Well, this is indeed a technique for getting the unique values in a collection. But it relies upon you knowing a bunch of things:

  • Set contains unique values. If you add multiple identical values, only a single value will be stored.
  • The Set constructor takes iterable objects. This means we can new up a Set with an array that we want to "unique-ify" and we will have a Set that contains those unique values.
  • If you want to go on to do filtering / mapping etc on your unique values, you'll need to get them out of the Set. This is because (regrettably) ECMAScript iterables don't implicitly support these operations, and neither are methods such as these part of the Set API. The easiest way to do that is to spread into a new array which you can then operate upon. The short sketch after this list shows these behaviours in action.
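
A minimal illustration of those three points:

const values = new Set([1, 1, 2]);
values.size; // 2 - the duplicate 1 is only stored once
values.add(2);
values.size; // still 2 - Set ignores values it already contains
const asArray = [...values]; // [1, 2] - spread into an array to get map / filter back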

I have this knowledge. Lots of people have this knowledge. But whilst this may be the case, using this technique goes against what I would generally consider to be a good tenet of programming: comprehensibility. When you read this code above, it doesn't immediately tell you what it's doing. This is a strike against it.

Further to that, it's "noisy". Even if the reader does have this knowledge, as they digest the code, they have to mentally unravel it. "Oh it's a Set, we're passing in values, then spreading it out, it's probably intended to get the unique values.... Right, cool, cool.... Continue!"

Margarida Pereira explicitly called this out and I found myself agreeing. Let's make a uniq function!

uniq v1

I wrote a very simple uniq function which looked like this:

/**
 * Return the unique values found in the passed iterable
 */
function uniq<TElement>(iterableToGetUniqueValuesOf: Iterable<TElement>) {
    return [...new Set(iterableToGetUniqueValuesOf)];
}

Usage of this was simple:

uniq([1,1,1,3,1,1,2]) // produces [1, 3, 2]
uniq(["John", "Guida", "Ollie", "Divya", "John"]) // produces ["John", "Guida", "Ollie", "Divya"]

And I thought this was tremendous. I committed and pushed. I assumed there was no more to be done. Guida (Margarida) then made this very helpful comment:

BTW, I found a big bold warning that new Set() compares objects by reference (unless they're primitives) so it might be worth adding a comment to warn people that uniq/distinct compares objects by reference:
https://codeburst.io/javascript-array-distinct-5edc93501dc4

She was right! If a caller was to, say, pass a collection of objects to uniq then they'd end up highly disappointed. Consider:

uniq([{ name: "John" }, { name: "John" }]) // produces [{ name: "John" }, { name: "John" }]

We can do better!

uniq v2

I like compilers shouting at me. Or more accurately, I like compilers telling me when something isn't valid / supported / correct. I wanted uniq to mirror the behaviour of Set - to only support primitives such as string, number etc. So I made a new version of uniq that hardened up the generic constraints:

/**
 * Return the unique values found in the passed iterable
 */
function uniq<TElement extends string | number | bigint | boolean | symbol>(
    iterableToGetUniqueValuesOf: Iterable<TElement>
) {
    return [...new Set(iterableToGetUniqueValuesOf)];
}

With this in place, the compiler started shouting in the most helpful way. When I re-attempted [{ name: "John" }, { name: "John" }] the compiler hit me with:

Argument of type '{ name: string; }[]' is not assignable to parameter of type 'Iterable<string | number | bigint | boolean | symbol>'.

Take a look.

This is good. This is descriptive code that only allows legitimate inputs. It should lead to less confusion and a reduced likelihood of issues in Production. It's also a nice example of how code review can result in demonstrably better code. Thanks Guida!

Tuesday 10 November 2020

Throttling data requests with React Hooks

When an application loads data, typically relatively few HTTP requests will be made. For example, if we imagine we're making a student administration application, then a "view" screen might make a single HTTP request to load that student's data before displaying it.

Occasionally there's a need for an application to make a large number of HTTP requests. Consider a reporting application which loads data and then aggregates it for presentation purposes.

This need presents two interesting problems to solve:

  1. how do we load data gradually?
  2. how do we present loading progress to users?

This post will talk about how we can tackle these and demonstrate using a custom React Hook.

Let's bring Chrome to its knees

We'll begin our journey by spinning up a TypeScript React app with Create React App:

npx create-react-app throttle-requests-react-hook --template typescript

Because we're going to be making a number of asynchronous calls, we're going to simplify the code by leaning on the widely used react-use for a useAsync hook.

cd throttle-requests-react-hook
yarn add react-use

We'll replace the App.css file with this:

.App {
  text-align: center;
}

.App-header {
  background-color: #282c34;
  min-height: 100vh;
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  font-size: calc(10px + 2vmin);
  color: white;
}

.App-labelinput > * {
  margin: 0.5em;
  font-size:24px;
}

.App-link {
  color: #61dafb;
}

.App-button {
  font-size: calc(10px + 2vmin);
  margin-top: 0.5em;
  padding: 1em;
  background-color: cornflowerblue;
  color: #ffffff;
  text-align: center;
}

.App-progress {
  padding: 1em;
  background-color: cadetblue;
  color: #ffffff;
}

.App-results {
  display: flex;
  flex-wrap: wrap;
}

.App-results > * {
  padding: 1em;
  margin: 0.5em;
  background-color: darkblue;
  flex: 1 1 300px;
}

Then we'll replace the App.tsx contents with this:

import React, { useState } from "react";
import { useAsync } from "react-use";
import "./App.css";

function use10_000Requests(startedAt: string) {
  const responses = useAsync(async () => {
    if (!startedAt) return;

    // make 10,000 unique HTTP requests
    const results = await Promise.all(
      Array.from(Array(10_000)).map(async (_, index) => {
        const response = await fetch(
          `/manifest.json?querystringValueToPreventCaching=${startedAt}_request-${index}`
        );
        const json = await response.json();
        return json;
      })
    );

    return results;
  }, [startedAt]);

  return responses;
}


function App() {
  const [startedAt, setStartedAt] = useState("");
  const responses = use10_000Requests(startedAt);

  return (
    <div className="App">
      <header className="App-header">
        <h1>The HTTP request machine</h1>
        <button
          className="App-button"
          onClick={(_) => setStartedAt(new Date().toISOString())}
        >
          Make 10,000 requests
        </button>
        {responses.loading && <div>loading...</div>}
        {responses.error && <div>Something went wrong</div>}
        {responses.value && (
          <div className="App-results">
            {responses.value.length} requests completed successfully
          </div>
        )}
      </header>
    </div>
  );
}

export default App;

The app that we've built is very simple; it's a button which, when you press it, fires 10,000 HTTP requests in parallel using the Fetch API. The data being requested in this case is an arbitrary JSON file; the manifest.json. If you look closely you'll see we're doing some querystring tricks with our URL to avoid getting cached data.

In fact, for this demo we're not interested in the results of these HTTP requests; rather we're interested in how the browser copes with this approach. (Spoiler: not well!) It's worth considering that requesting a text file from a server running on the same machine as the browser should be fast.

So we'll run yarn start and go to http://localhost:3000 to get to the app. Running with DevTools open results in an unhappy affair.

It took 20 seconds for the first request to be fired; prior to that, Chrome was unresponsive. When requests did start to fire, a significant number failed with net::ERR_INSUFFICIENT_RESOURCES. Further to that, those requests that were fired sat in a "Stalled" state prior to being executed. This is a consequence of Chrome limiting the number of connections - all browsers do this:

There are already six TCP connections open for this origin, which is the limit. Applies to HTTP/1.0 and HTTP/1.1 only.

In summary, the problems with the current approach are:

  1. the browser becoming unresponsive
  2. failing HTTP requests due to insufficient resources
  3. no information displayed to the user around progress

Throttle me this

Instead of hammering the browser by firing all the requests at once, we could instead implement a throttle. A throttle is a mechanism which allows you to limit the rate at which operations are performed. In this case we want to limit the rate at which HTTP requests are made. A throttle will tackle problems 1 and 2 - essentially keeping the browser free and easy and ensuring that requests are all successfully sent. We also want to keep our users informed around how progress is going. It's time to unveil the useThrottleRequests hook:

import { useMemo, useReducer } from "react";
import { AsyncState } from "react-use/lib/useAsync";

/** Function which makes a request */
export type RequestToMake = () => Promise<void>;

/**
 * Given an array of requestsToMake and a limit on the number of max parallel requests
 * queue up those requests and start firing them
 * - inspired by Rafael Xavier's approach here: https://stackoverflow.com/a/48007240/761388
 *
 * @param requestsToMake
 * @param maxParallelRequests the maximum number of requests to make - defaults to 6
 */
async function throttleRequests(
  requestsToMake: RequestToMake[],
  maxParallelRequests = 6
) {
  // queue up simultaneous calls
  const queue: Promise<void>[] = [];
  for (let requestToMake of requestsToMake) {
    // fire the async function, add its promise to the queue,
    // and remove it from queue when complete
    const promise = requestToMake().then((res) => {
      queue.splice(queue.indexOf(promise), 1);
      return res;
    });
    queue.push(promise);

    // if the number of queued requests matches our limit then
    // wait for one to finish before enqueueing more
    if (queue.length >= maxParallelRequests) {
      await Promise.race(queue);
    }
  }
  // wait for the rest of the calls to finish
  await Promise.all(queue);
}

/**
 * The state that represents the progress in processing throttled requests
 */
export type ThrottledProgress<TData> = {
  /** the number of requests that will be made */
  totalRequests: number;
  /** the errors that came from failed requests */
  errors: Error[];
  /** the responses that came from successful requests */
  values: TData[];
  /** a value between 0 and 100 which represents the percentage of requests that have been completed (whether successfully or not) */
  percentageLoaded: number;
  /** whether the throttle is currently processing requests */
  loading: boolean;
};

function createThrottledProgress<TData>(
  totalRequests: number
): ThrottledProgress<TData> {
  return {
    totalRequests,
    percentageLoaded: 0,
    loading: false,
    errors: [],
    values: [],
  };
}

/**
 * A reducing function which takes the supplied `ThrottledProgress` and applies a new value to it
 */
function updateThrottledProgress<TData>(
  currentProgress: ThrottledProgress<TData>,
  newData: AsyncState<TData>
): ThrottledProgress<TData> {
  const errors = newData.error
    ? [...currentProgress.errors, newData.error]
    : currentProgress.errors;

  const values = newData.value
    ? [...currentProgress.values, newData.value]
    : currentProgress.values;

  const percentageLoaded =
    currentProgress.totalRequests === 0
      ? 0
      : Math.round(
          ((errors.length + values.length) / currentProgress.totalRequests) * 100
        );

  const loading =
    currentProgress.totalRequests === 0
      ? false
      : errors.length + values.length < currentProgress.totalRequests;

  return {
    totalRequests: currentProgress.totalRequests,
    loading,
    percentageLoaded,
    errors,
    values,
  };
}

type ThrottleActions<TValue> =
  | {
      type: "initialise";
      totalRequests: number;
    }
  | {
      type: "requestSuccess";
      value: TValue;
    }
  | {
      type: "requestFailed";
      error: Error;
    };

/**
 * Create a ThrottleRequests and an updater
 */
export function useThrottleRequests<TValue>() {
  function reducer(
    throttledProgressAndState: ThrottledProgress<TValue>,
    action: ThrottleActions<TValue>
  ): ThrottledProgress<TValue> {
    switch (action.type) {
      case "initialise":
        return createThrottledProgress(action.totalRequests);

      case "requestSuccess":
        return updateThrottledProgress(throttledProgressAndState, {
          loading: false,
          value: action.value,
        });

      case "requestFailed":
        return updateThrottledProgress(throttledProgressAndState, {
          loading: false,
          error: action.error,
        });
    }
  }

  const [throttle, dispatch] = useReducer(
    reducer,
    createThrottledProgress<TValue>(/** totalRequests */ 0)
  );

  const updateThrottle = useMemo(() => {
    /**
     * Update the throttle with a successful request
     * @param values from request
     */
    function requestSucceededWithData(value: TValue) {
      return dispatch({
        type: "requestSuccess",
        value,
      });
    }

    /**
     * Update the throttle upon a failed request with an error message
     * @param error error
     */
    function requestFailedWithError(error: Error) {
      return dispatch({
        type: "requestFailed",
        error,
      });
    }

    /**
     * Given an array of requestsToMake and a limit on the number of max parallel requests
     * queue up those requests and start firing them
     * - based upon https://stackoverflow.com/a/48007240/761388
     *
     * @param requestsToMake
     * @param maxParallelRequests the maximum number of requests to make - defaults to 6
     */
    function queueRequests(
      requestsToMake: RequestToMake[],
      maxParallelRequests = 6
    ) {
      dispatch({
        type: "initialise",
        totalRequests: requestsToMake.length,
      });

      return throttleRequests(requestsToMake, maxParallelRequests);
    }

    return {
      queueRequests,
      requestSucceededWithData,
      requestFailedWithError,
    };
  }, [dispatch]);

  return {
    throttle,
    updateThrottle,
  };
}

The useThrottleRequests hook returns 2 properties:

  • throttle - a ThrottledProgress<TData> that contains the following data:

    • totalRequests - the number of requests that will be made
    • errors - the errors that came from failed requests
    • values - the responses that came from successful requests
    • percentageLoaded - a value between 0 and 100 which represents the percentage of requests that have been completed (whether successfully or not)
    • loading - whether the throttle is currently processing requests
  • updateThrottle - an object which exposes 3 functions:

    • queueRequests - the function to which you pass the requests that should be queued and executed in a throttled fashion
    • requestSucceededWithData - the function which is called if a request succeeds to provide the data
    • requestFailedWithError - the function which is called if a request fails to provide the error

That's a lot of words to describe our useThrottleRequests hook. Let's see what it looks like in practice by migrating our use10_000Requests hook to (no pun intended) use it. Here's a new implementation of App.tsx:

import React, { useState } from "react";
import { useAsync } from "react-use";
import { useThrottleRequests } from "./useThrottleRequests";
import "./App.css";

function use10_000Requests(startedAt: string) {
  const { throttle, updateThrottle } = useThrottleRequests();
  const [progressMessage, setProgressMessage] = useState("not started");

  useAsync(async() => {
      if (!startedAt) return;

      setProgressMessage("preparing");

      const requestsToMake = Array.from(Array(10_000)).map(
        (_, index) => async () => {
          try {
            setProgressMessage(`loading ${index}...`);

            const response = await fetch(
              `/manifest.json?querystringValueToPreventCaching=${startedAt}_request-${index}`
            );
            const json = await response.json();

            updateThrottle.requestSucceededWithData(json);
          } catch (error) {
            console.error(`failed to load ${index}`, error);
            updateThrottle.requestFailedWithError(error);
          }
        }
      );

      await updateThrottle.queueRequests(requestsToMake);

  }, [startedAt, updateThrottle, setProgressMessage]);

  return { throttle, progressMessage };
}

function App() {
  const [startedAt, setStartedAt] = useState("");

  const { progressMessage, throttle } = use10_000Requests(startedAt);

  return (
    <div className="App">
      <header className="App-header">
        <h1>The HTTP request machine</h1>
        <button
          className="App-button"
          onClick={(_) => setStartedAt(new Date().toISOString())}
        >
          Make 10,000 requests
        </button>
        {throttle.loading && <div>{progressMessage}</div>}
        {throttle.values.length > 0 && (
          <div className="App-results">
            {throttle.values.length} requests completed successfully
          </div>
        )}
        {throttle.errors.length > 0 && (
          <div className="App-results">
            {throttle.errors.length} requests errored
          </div>
        )}
      </header>
    </div>
  );
}

export default App;

Looking at the new use10_000Requests hook, there are a few subtle differences from our prior implementation. First of all, we're now exposing the throttle; a ThrottledProgress<TData>. Our updated hook also exposes a progressMessage: a simple string stored with useState that we update as our throttle runs. In truth the information being surfaced here isn't that interesting. The progressMessage is in place just to illustrate that you could capture some data from your requests as they complete for display purposes; a running total, for instance.

So, how does our new hook approach perform?

Very well indeed! If we look back at the problems we faced with the prior approach, how do we compare?

  1. the browser becoming unresponsive - the browser remains responsive.
  2. failing HTTP requests due to insufficient resources - the browser does not experience failing HTTP requests.
  3. no information displayed to the user around progress - details of progress are displayed to the user throughout.

Tremendous!

What shall we build?

Our current example is definitely contrived. Let's try and apply our useThrottleRequests hook to a more realistic scenario. We're going to build an application which, given a repo on GitHub, lists all the contributors' blogs. (You can specify a blog URL on your GitHub profile; many people use this to specify their Twitter profile.)

We can build this thanks to the excellent GitHub REST API. It exposes two endpoints of interest given our goal.

1. List repository contributors

List repository contributors lists contributors to the specified repository at this URL: GET https://api.github.com/repos/{owner}/{repo}/contributors. The response is an array of objects, crucially featuring a url property that points to the user in question's API endpoint:

[
  // ...
  {
    // ...
    "url": "https://api.github.com/users/octocat",
    // ...
  },
  // ...
]

2. Get a user

Get a user is the API that the url property above is referring to. When called it returns an object representing the publicly available information about a user:

{
  // ...
  "name": "The Octocat",
  // ...
  "blog": "https://github.blog",
  // ...
}

Blogging devs v1.0

We're now ready to build our blogging devs app; let's replace the existing App.tsx with:

import React, { useCallback, useMemo, useState } from "react";
import { useAsync } from "react-use";
import { useThrottleRequests } from "./useThrottleRequests";
import "./App.css";

type GitHubUser = { name: string; blog?: string };

function timeout(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function useContributors(contributorsUrlToLoad: string) {
  const { throttle, updateThrottle } = useThrottleRequests<GitHubUser>();
  const [progressMessage, setProgressMessage] = useState("");

  useAsync(async () => {
    if (!contributorsUrlToLoad) return;

    setProgressMessage("loading contributors");

    // load contributors from GitHub
    const contributorsResponse = await fetch(contributorsUrlToLoad);
    const contributors: { url: string }[] = await contributorsResponse.json();

    setProgressMessage(`loading ${contributors.length} contributors...`);

    // For each entry in result, retrieve the given user from GitHub
    const requestsToMake = contributors.map(({ url }, index) => async () => {
      try {
        setProgressMessage(
          `loading ${index} / ${contributors.length}: ${url}...`
        );

        const response = await fetch(url);
        const json: GitHubUser = await response.json();

        // wait for 1 second before completing the request
        // - makes for better demos
        await timeout(1000);

        updateThrottle.requestSucceededWithData(json);
      } catch (error) {
        console.error(`failed to load ${url}`, error);
        updateThrottle.requestFailedWithError(error);
      }
    });

    await updateThrottle.queueRequests(requestsToMake);

    setProgressMessage("");
  }, [contributorsUrlToLoad, updateThrottle, setProgressMessage]);

  return { throttle, progressMessage };
}

function App() {
  // The owner and repo to query; we're going to default
  // to using DefinitelyTyped as an example repo as it
  // is one of the most contributed to repos on GitHub
  const [owner, setOwner] = useState("DefinitelyTyped");
  const [repo, setRepo] = useState("DefinitelyTyped");
  const handleOwnerChange = useCallback(
    (event: React.ChangeEvent<HTMLInputElement>) =>
      setOwner(event.target.value),
    [setOwner]
  );
  const handleRepoChange = useCallback(
    (event: React.ChangeEvent<HTMLInputElement>) => setRepo(event.target.value),
    [setRepo]
  );

  const contributorsUrl = `https://api.github.com/repos/${owner}/${repo}/contributors`;

  const [contributorsUrlToLoad, setUrlToLoad] = useState("");
  const { progressMessage, throttle } = useContributors(contributorsUrlToLoad);

  const bloggers = useMemo(
    () => throttle.values.filter((contributor) => contributor.blog),
    [throttle]
  );

  return (
    <div className="App">
      <header className="App-header">
        <h1>Blogging devs</h1>

        <p>
          Show me the{" "}
          <a
            className="App-link"
            href={contributorsUrl}
            target="_blank"
            rel="noopener noreferrer"
          >
            contributors for {owner}/{repo}
          </a>{" "}
          who have blogs.
        </p>

        <div className="App-labelinput">
          <label htmlFor="owner">GitHub Owner</label>
          <input
            id="owner"
            type="text"
            value={owner}
            onChange={handleOwnerChange}
          />
          <label htmlFor="repo">GitHub Repo</label>
          <input
            id="repo"
            type="text"
            value={repo}
            onChange={handleRepoChange}
          />
        </div>

        <button
          className="App-button"
          onClick={(e) => setUrlToLoad(contributorsUrl)}
        >
          Load bloggers from GitHub
        </button>

        {progressMessage && (
          <div className="App-progress">{progressMessage}</div>
        )}

        {throttle.percentageLoaded > 0 && (
          <>
            <h3>Behold {bloggers.length} bloggers:</h3>
            <div className="App-results">
              {bloggers.map((blogger) => (
                <div key={blogger.name}>
                  <div>{blogger.name}</div>
                  <a
                    className="App-link"
                    href={blogger.blog}
                    target="_blank"
                    rel="noopener noreferrer"
                  >
                    {blogger.blog}
                  </a>
                </div>
              ))}
            </div>
          </>
        )}

        {throttle.errors.length > 0 && (
          <div className="App-results">
            {throttle.errors.length} requests errored
          </div>
        )}
      </header>
    </div>
  );
}

export default App;

The application gives users the opportunity to enter the organisation and repository of a GitHub project. Then, when the button is clicked, it:

  • loads the contributors
  • for each contributor it loads the individual user (separate HTTP request for each)
  • as it loads it communicates how far through the loading progress it has got
  • as users are loaded, it renders a tile for each user with a listed blog

Just to make the demo a little clearer, we've artificially slowed the duration of each request by a second. Putting it all together:

We have built a React Hook which allows us to:

  • gradually load data
  • without blocking the UI of the browser
  • and which provides progress data to keep users informed.

This post was originally published on LogRocket.

Saturday 31 October 2020

Azure DevOps Node API: The missing episodes

I've been taking a good look at the REST API for Azure DevOps. I'm delighted to say that it's a very full API. However, there are quirks.

I'm writing a tool that interrogates Azure DevOps so it can construct release documentation. We'd like to publish that release documentation to the project wiki.

To make integration with Azure DevOps even easier, the ADO team have put a good amount of work into client libraries that allow you to code in your language of choice. In my case I'm writing a Node.js tool (using TypeScript) and happily the client lib for Node is written and published with TypeScript too. Tremendous! However, there is a "but" coming....

Wiki got a big ol' "but"

As I've been using the Node client lib, I've found minor quirks, such as GitApi.getRefs missing the pagination parts of the API.

Whilst the GitApi was missing some parameters on a method, the WikiApi was missing whole endpoints, such as the Pages - Create Or Update one. The various client libraries are auto-generated, which makes contribution a difficult game. The lovely Matt Cooper has alerted the team:

These clients are generated from the server-side controllers, and at a glance, I don't understand why those two parameters weren't included. Full transparency, we don't dedicate a lot of cycles here, but I will get it on the team's radar to investigate/improve.

In the meantime, I still had a tool to write.

Handrolled Wiki API

Whilst the Node.js client lib was missing some crucial pieces, there did seem to be a way forward: using the API directly, with axios doing our HTTP rather than the client lib. Happily the types we needed were still available to be leveraged.

Looking at the docs it seemed it ought to be simple:

https://docs.microsoft.com/en-us/rest/api/azure/devops/?view=azure-devops-rest-6.1#assemble-the-request

But when I attempted this I found my requests erroring out with 203 Non-Authoritative Information responses. It didn't make sense. I couldn't get a single request to be successful; they all failed. It occurred to me that the answer was hiding in node_modules. I'd managed to make successful requests to the API using the client lib. What was it doing that I wasn't?

The answer ended up being an authorization one-liner:

    const request = await axios({
        url,
        headers: {
            Accept: 'application/json',
            'Content-Type': 'application/json',
            // This!
            Authorization: `Basic ${Buffer.from(`PAT:${adoPersonalAccessToken}`).toString('base64')}`,
            'X-TFS-FedAuthRedirect': 'Suppress',
        },
    });

With this in hand everything started to work and I found myself able to write my own clients to fill in the missing pieces from the client lib:

import axios from 'axios';
import { WikiPage, WikiPageCreateOrUpdateParameters, WikiType } from 'azure-devops-node-api/interfaces/WikiInterfaces';
import { IWikiApi } from 'azure-devops-node-api/WikiApi';

async function getWikiPage({
    adoUrl,
    adoProject,
    adoPat,
    wikiId,
    path,
}: {
    adoUrl: string;
    adoProject: string;
    adoPat: string;
    wikiId: string;
    path: string;
}) {
    try {
        const url = `${makeBaseApiUrl({
            adoUrl,
            adoProject,
        })}/wiki/wikis/${wikiId}/pages?${apiVersion}&path=${path}&includeContent=True&recursionLevel=full`;
        const request = await axios({
            url,
            headers: makeHeaders(adoPat),
        });

        const page: WikiPage = request.data;
        return page;
    } catch (error) {
        return undefined;
    }
}

async function createWikiPage({
    adoUrl,
    adoProject,
    adoPat,
    wikiId,
    path,
    data,
}: {
    adoUrl: string;
    adoProject: string;
    adoPat: string;
    wikiId: string;
    path: string;
    data: WikiPageCreateOrUpdateParameters;
}) {
    try {
        const url = `${makeBaseApiUrl({
            adoUrl,
            adoProject,
        })}/wiki/wikis/${wikiId}/pages?${apiVersion}&path=${path}`;

        const request = await axios({
            method: 'PUT',
            url,
            headers: makeHeaders(adoPat),
            data,
        });

        const newPage: WikiPage = request.data;
        return newPage;
    } catch (error) {
        return undefined;
    }
}

const apiVersion = "api-version=6.0";

/**
* Create the headers necessary to make Azure DevOps happy
* @param adoPat Personal Access Token from ADO
*/
function makeHeaders(adoPat: string) {
    return {
        Accept: 'application/json',
        'Content-Type': 'application/json',
        Authorization: `Basic ${Buffer.from(`PAT:${adoPat}`).toString('base64')}`,
        'X-TFS-FedAuthRedirect': 'Suppress',
    };
}

/**
* eg https://dev.azure.com/{organization}/{project}/_apis
*/
function makeBaseApiUrl({ adoUrl, adoProject }: { adoUrl: string; adoProject: string }) {
    return `${adoUrl}/${adoProject}/_apis`;
}

With this I was able to write code like this:

    let topLevelPage = await getWikiPage({
        adoUrl,
        adoProject,
        adoPat,
        wikiId,
        path: config.wikiTopLevelName,
    });

    if (!topLevelPage)
        topLevelPage = await createWikiPage({
            adoUrl,
            adoProject,
            adoPat,
            wikiId,
            path: config.wikiTopLevelName,
            data: { content: '' },
        });

and the wikis were ours!

Monday 19 October 2020

The Mysterious Case of Safari and the Empty Download

TL;DR: Safari wants a Content-Type header in responses. Even if the response is Content-Length: 0. Without this, Safari can attempt to trigger an empty download. Don't argue; just go with it; some browsers are strange.

The longer version

Every now and then a mystery presents itself. A puzzle which just doesn't make sense and yet stubbornly continues to exist. I happened upon one of these the other day and to say it was frustrating does it no justice at all.

It all came back to the default iOS and Mac browser: Safari. When our users log into our application, they are redirected to a shared login provider which, upon successful authentication, hands over a cookie containing auth details and redirects back to our application. A middleware in our app reads what it needs from the cookie and then creates a cookie of its own which is to be used throughout the session. As soon as the cookie is set, the page refreshes and the app boots up in an authenticated state.

That's the background. This mechanism had long been working fine with Chrome (which the majority of our users browse with), Edge, Firefox and Internet Explorer. But we started to get reports from Safari users that, once they'd supplied their credentials, they'd not be authenticated and redirected back to our application. Instead they'd be prompted to download an empty document and the redirect would not take place.

As a team we could not fathom why this should be the case; it just didn't make sense. There followed hours of experimentation before Hennie noticed something. It was at the point when the redirect back to our app from the login provider took place. Specifically the initial response that came back which contained our custom cookie and a Refresh: 0 header to trigger a refresh in the browser. There was no content in the response, save for headers. It was Content-Length: 0 all the way.

Hennie noticed that there was no Content-Type set and wondered if that was significant. It didn't seem like it would be a necessary header given there was no content. But Safari reckons not with logic. As an experiment we tried setting the response header to Content-Type: text/html. It worked! No mystery download, no failed redirect (which it turned out was actually a successful redirect which wasn't being surfaced in Safari's network request tab).
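
To illustrate, here's a minimal sketch of the fix, assuming an Express-style handler (the route, cookie name and value are hypothetical; our real middleware is rather more involved):

import express from 'express';

const app = express();

// Sketch of the redirect response: the body is intentionally empty, but declaring
// a Content-Type stops Safari from offering up an empty download.
app.get('/auth/callback', (_req, res) => {
    res.cookie('our-session-cookie', 'value-derived-from-the-login-provider');
    res.set('Refresh', '0'); // trigger the refresh in the browser
    res.set('Content-Type', 'text/html'); // the line that keeps Safari happy
    res.status(200).end(); // no body; Content-Length: 0
});

app.listen(3000);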

It appears that always providing a Content-Type header in your responses is wise, if only for Safari's sake. In practice it's rare for this header not to be set, but, as we experienced, it can happen. Hopefully we've suffered so you don't have to.

Friday 2 October 2020

Autofac 6, integration tests and .NET generic hosting

I blogged a little while ago about how to support integration tests using Autofac. That post was specific to Autofac, but it documented a workaround for a long-standing issue with ConfigureTestContainer that was introduced in .NET Core 3.0 and affects all third-party containers that use ConfigureTestContainer in their tests.

I'll not repeat the contents of the previous post - it all still stands. However, with Autofac 6 the approach documented there will cease to work. This is because the previous approach relied upon ContainerBuilder not being sealed. As of Autofac 6 it is.

Happily the tremendous Alistair Evans came up with an alternative approach which is listed below:

    /// <summary>
    /// Based upon https://github.com/dotnet/AspNetCore.Docs/tree/master/aspnetcore/test/integration-tests/samples/3.x/IntegrationTestsSample
    /// </summary>
    /// <typeparam name="TStartup"></typeparam>
    public class AutofacWebApplicationFactory<TStartup> : WebApplicationFactory<TStartup> where TStartup : class
    {
        protected override IHost CreateHost(IHostBuilder builder)
        {
            builder.UseServiceProviderFactory<ContainerBuilder>(new CustomServiceProviderFactory());
            return base.CreateHost(builder);
        }
    }

    /// <summary>
    /// Based upon https://github.com/dotnet/aspnetcore/issues/14907#issuecomment-620750841 - only necessary because of an issue in ASP.NET Core
    /// </summary>
    public class CustomServiceProviderFactory : IServiceProviderFactory<ContainerBuilder>
    {
        private AutofacServiceProviderFactory _wrapped;
        private IServiceCollection _services;

        public CustomServiceProviderFactory()
        {
            _wrapped = new AutofacServiceProviderFactory();
        }

        public ContainerBuilder CreateBuilder(IServiceCollection services)
        {
            // Store the services for later.
            _services = services;

            return _wrapped.CreateBuilder(services);
        }

        public IServiceProvider CreateServiceProvider(ContainerBuilder containerBuilder)
        {
            var sp = _services.BuildServiceProvider();
#pragma warning disable CS0612 // Type or member is obsolete
            var filters = sp.GetRequiredService<IEnumerable<IStartupConfigureContainerFilter<ContainerBuilder>>>();
#pragma warning restore CS0612 // Type or member is obsolete

            foreach (var filter in filters)
            {
                filter.ConfigureContainer(b => { })(containerBuilder);
            }

            return _wrapped.CreateServiceProvider(containerBuilder);
        }        
    }

Using this in place of the previous approach should allow you to continue running your integration tests with Autofac 6. Thanks Alistair!

Concern for third-party containers

Whilst this gets us back up and running, Alistair pointed out that this approach depends upon a deprecated interface: IStartupConfigureContainerFilter, which has been marked as Obsolete since mid-2019. What this means is that, at some point, this approach will stop working.

The marvellous David Fowler has said that the ConfigureTestContainer issue should be resolved in .NET. However, it's worth noting that this has been an issue since .NET Core 3 shipped, and unfortunately the wonderful Chris Ross has advised that it's not likely to be fixed for .NET 5.

I'm very keen that this does get resolved in .NET. Building tests upon an interface marked Obsolete doesn't fill me with confidence. I'm a long-time user of Autofac and I'd like to continue to be. Here's hoping that's made possible by a fix landing in .NET. If this is something you care about, it may be worth upvoting / commenting on the issue in GitHub so the team is aware of the desire to see this resolved.

Friday 4 September 2020

Why your team needs a newsfeed

I'm part of a team that builds an online platform. I'm often preoccupied by how to narrow the gap between our users and "us" - the people that build the platform. It's important we understand how people use and interact with what we've built. If we don't then we're liable to waste our time and energy building the wrong things. Or the wrong amount of the right things.

On a recent holiday I spent a certain amount of time pondering how to narrow the gap between our users and us. We have lots of things that help us: we use analytics tools like Mixpanel, we've got a mini analytics platform of our own, we have Teams notifications that pop up client feedback and so on. They are all great, but they're somewhat disparate; they don't give us a clear insight as to who uses our platform and how they do so. The information is there, but it's tough to grok. It doesn't make for a joined-up story.

Reaching around for how to solve this I had an idea: what if our platform had a newsfeed? The kind of thing that social media platforms the likes of Twitter and Facebook have used to great effect: a stream of mini-activities which show how the community interacts with the product. People logging in, browsing around, using features on the platform. If we could see this in near real time we'd be brought closer to our users; we'd have something that would help us have real empathy and understanding. We'd see our product as the stories of users interacting with it.

How do you build a newsfeed?

This was an experiment that seemed worth pursuing. So I decided to build a proof of concept and see what happened. Now I intended to put the "M" into MVP with this; I went in with a number of intentional constraints:

  1. The news feed wouldn't auto update (users have the F5 key for that)
  2. We'd host the newsfeed in our own mini analytics platform (which is already used by the team to understand how people use the platform)
  3. News stories wouldn't be stored anywhere; we'd generate them on the fly by querying various databases / APIs. The cost of this would be that our news stories wouldn't be "persistent"; you wouldn't be able to address them with a URL; there'd be no way to build "like" or "share" functionality.

All of the above constraints are, importantly, reversible decisions. If we want auto update it could be built later. If we want the newsfeed to live somewhere else we could move it. If we wanted news stories to be persisted then we could do that.

Implementation

With these constraints in mind, I turned my attention to the implementation. I built a NewsFeedService that would be queried for news stories. The interface I decided to build looked like this:

NewsFeedService.getNewsFeed(from: Date, to: Date): NewsFeed

type NewsFeed = {
    startedAt: Date;
    endedAt: Date;
    stories: NewsStory[];
};

type NewsStory = {
    /** When the story happened */
    happenedAt: Date;
    /** A code that represents the type of story this is; eg USER_SESSION */
    storyCode: string;
    /** The story details in Markdown format */
    story: string;
};

Each query to NewsFeedService.getNewsFeed would query various databases / APIs related to our product, looking for interesting events; whether it be users logging in, users performing some kind of action, whatever. For each interesting event a news story like this would be produced:

Jane Smith logged in at 10:03am for 25 minutes. They placed an order worth £3,000.

Now the killer feature here is Markdown. Our stories are written in Markdown. Why is Markdown cool? Well to quote the creators of Markdown:

Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).

This crucially includes support for links. That was significant because I want us to be able to click on pieces of information in the stories and be taken to the relevant place in the platform to see more details, just as status updates on, for example, Twitter lead you on to more details.

Again consider this example news story:

Jane Smith logged in at 10:03am for 25 minutes. They placed an order worth £3,000.

Consider that story but without a link. It's not the same, is it? A newsfeed without links would be missing a trick. Markdown gives us links. And happily, due to my extensive work down the open source mines, I speak it like a native.
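
To make that concrete, here's a minimal sketch of how getNewsFeed might assemble stories, assuming a hypothetical queryUserSessions data source (the real service aggregates several databases / APIs):

// A sketch only: queryUserSessions stands in for the various databases / APIs we query.
type UserSession = { userName: string; userUrl: string; loggedInAt: Date; minutes: number };

declare function queryUserSessions(from: Date, to: Date): Promise<UserSession[]>;

async function getNewsFeed(from: Date, to: Date): Promise<NewsFeed> {
    const sessions = await queryUserSessions(from, to);

    const stories: NewsStory[] = sessions.map((session) => ({
        happenedAt: session.loggedInAt,
        storyCode: 'USER_SESSION',
        // Markdown, so the user's name links to the relevant place in the platform
        story: `[${session.userName}](${session.userUrl}) logged in at ${session.loggedInAt.toLocaleTimeString()} for ${session.minutes} minutes.`,
    }));

    return { startedAt: from, endedAt: to, stories };
}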

The first consumer of the newsfeed was to be our own mini analytics platform, which is a React app. Converting the Markdown stories to React is a solved problem thanks to the wonderful react-markdown. You can simply sling Markdown at it and out comes HTML. Et voilà: a news feed!
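
To give a flavour, rendering might look like this minimal sketch (the NewsFeedView component is hypothetical, and depending on the react-markdown version you pass the Markdown as children or via a source prop):

import React from 'react';
import ReactMarkdown from 'react-markdown';

// Hypothetical component: each story's Markdown (links included) is converted to HTML.
const NewsFeedView = ({ newsFeed }: { newsFeed: NewsFeed }) => (
    <ul>
        {newsFeed.stories.map((story) => (
            <li key={`${story.storyCode}-${story.happenedAt.toISOString()}`}>
                <ReactMarkdown>{story.story}</ReactMarkdown>
            </li>
        ))}
    </ul>
);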

What's next?

So that's it! We've built a (primitive) news feed. We can now see in real time how our users are getting on. We're closer to them; we understand them better as a consequence. If we want to take it further there's a number of things we could do:

  1. We could make the feed auto-update
  2. We could push news stories to other destinations. Markdown is a gloriously portable format which can be used in a variety of environments. For instance, the likes of Slack and Teams accept it, and apps like these are generally open on people's desktops and phones all the time anyway. Another way to narrow the gap between us and our users (see the sketch below).
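
As a flavour of what pushing a story elsewhere could look like, here's a sketch assuming a Slack incoming webhook (the webhook URL and function are hypothetical, and Slack's "mrkdwn" is a variant of Markdown, so links may need a little translation):

import axios from 'axios';

// Sketch only: push a single story to a (hypothetical) Slack incoming webhook.
async function pushStoryToSlack(webhookUrl: string, story: NewsStory) {
    await axios.post(webhookUrl, { text: story.story });
}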

It's very exciting!

Sunday 9 August 2020

Devcontainers AKA performance in a secure sandbox

Many corporate machines arrive in engineers' hands with a preponderance of pre-installed background tools: from virus checkers to backup utilities to port blockers; the list is long.

The reason that these tools are installed is generally noble. However, the implementation can often be problematic. The tools may be set up in such a way that they impact and interfere with one another. Really powerful machines with 8 CPUs and hardy SSDs can be slowed to a crawl. Put simply: the good people responsible for ensuring security are rarely encouraged to incentivise performance alongside it. And so they don't.

The unfortunate consequence of considering the role of security without regard to performance is this: sluggish computers. The further consequence (and this is the one I want you to think long and hard about) is low developer productivity. And that sucks. It impacts what an organisation is able to do, how fast an organisation is able to move. Put simply: it can be the difference between success and failure.

The most secure computer is off. But you won't ship much with it. Encouraging your organisation to consider tackling security with performance in mind is worthwhile. It's a long game though. In the meantime what can we do?

"Hide from the virus checkers* in a devcontainer"

Devcontainers, the infrastructure as code equivalent for developing software, have an underappreciated quality: unlocking your machine's performance.

Devcontainers are isolated secure sandboxes in which you can build software. To quote the docs:

A devcontainer.json file in your project tells VS Code how to access (or create) a development container with a well-defined tool and runtime stack. This container can be used to run an application or to sandbox tools, libraries, or runtimes needed for working with a codebase.

Workspace files are mounted from the local file system or copied or cloned into the container.

We're going to set up a devcontainer to code an ASP.NET Core application with a JavaScript (well, TypeScript) front end. If there's one thing that's sure to catch a virus checker's beady eye, it's node_modules. node_modules contains more files than a black hole has mass. Consider a project with 5,000 source files. One trusty yarn later and the folder now has a tidy 250,000 files. The virus checker is now really sitting up and taking notice.

Our project has a git commit hook set up with Husky that formats our TypeScript files with Prettier. On every commit the files are formatted to align with the project standard. With all the virus checkers in place a git commit takes around 45 seconds. Inside a devcontainer we can drop this to 5 seconds. That's nine times faster. I'll repeat that: that's nine times faster!

The "cloned into the container" above is key to what we're going to do. We're not going to mount our local file system into the devcontainer. Oh no. We're going to build a devcontainer with ASP.NET Core and JavaScript in. Then, inside there, we're going to clone our repo. Then we can develop, build and debug all inside the container. It will feel like we're working on our own machine because VS Code does such a damn fine job. In reality, we're connecting to another computer (a Linux computer to boot) that is running in isolation from our own. In our case that machine is sharing our hardware; but that's just an implementation detail. It could be anywhere (and in the future may well be).

Make me a devcontainer...

Enough talk... We're going to need a .devcontainer/devcontainer.json:

{
  "name": "my devcontainer",
  "dockerComposeFile": "../docker-compose.devcontainer.yml",
  "service": "my-devcontainer",
  "workspaceFolder": "/workspace",

  // Set *default* container specific settings.json values on container create.
  "settings": {
    "terminal.integrated.shell.linux": "/bin/zsh"
  },

  // Add the IDs of extensions you want installed when the container is created.
  "extensions": [
    "ms-dotnettools.csharp",
    "dbaeumer.vscode-eslint",
    "esbenp.prettier-vscode",
    "ms-mssql.mssql",
    "eamodio.gitlens",
    "ms-azuretools.vscode-docker",
    "k--kato.docomment",
    "Leopotam.csharpfixformat"
  ],

  // Use 'postCreateCommand' to clone the repo into the workspace folder when the devcontainer starts
  // and copy in the .env file
  "postCreateCommand": "git clone git@github.com:my-org/my-repo.git . && cp /.env /workspace/.env"

  // "remoteUser": "vscode"
}

Now the docker-compose.devcontainer.yml which lives in the root of the project. It provisions a SQL Server container (using the official image) and our devcontainer:

version: "3.7"
services:
  my-devcontainer:
    image: my-devcontainer
    build: 
      context: .
      dockerfile: Dockerfile.devcontainer
    command: /bin/zsh -c "while sleep 1000; do :; done"
    volumes:
      # mount .zshrc from home - make sure it doesn't contain Windows line endings
      - ~/.zshrc:/root/.zshrc

    # user: vscode
    ports:
      - "5000:5000"
      - "8080:8080"
    environment:
      - CONNECTIONSTRINGS__MYDATABASECONNECTION
    depends_on:
      - db
  db:
    image: mcr.microsoft.com/mssql/server:2019-latest
    privileged: true
    ports:
      - 1433:1433
    environment:
      SA_PASSWORD: "Your_password123"
      ACCEPT_EULA: "Y"         

The devcontainer will be built with the Dockerfile.devcontainer in the root of our repo. It relies upon your SSH keys and a .env file being available to be copied in:


#-----------------------------------------------------------------------------------------------------------
# Based upon: https://github.com/microsoft/vscode-dev-containers/tree/master/containers/dotnetcore
#-----------------------------------------------------------------------------------------------------------
ARG VARIANT="3.1-bionic"
FROM mcr.microsoft.com/dotnet/core/sdk:${VARIANT}

# Because MITM certificates
COPY ./docker/certs/. /usr/local/share/ca-certificates/
ENV NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/mitm.pem
RUN update-ca-certificates 

# This Dockerfile adds a non-root user with sudo access. Use the "remoteUser"
# property in devcontainer.json to use it. On Linux, the container user's GID/UIDs
# will be updated to match your local UID/GID (when using the dockerFile property).
# See https://aka.ms/vscode-remote/containers/non-root-user for details.
ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# Options for common package install script
ARG INSTALL_ZSH="true"
ARG UPGRADE_PACKAGES="true"
ARG COMMON_SCRIPT_SOURCE="https://raw.githubusercontent.com/microsoft/vscode-dev-containers/master/script-library/common-debian.sh"
ARG COMMON_SCRIPT_SHA="dev-mode"

# Settings for installing Node.js.
ARG INSTALL_NODE="true"
ARG NODE_SCRIPT_SOURCE="https://raw.githubusercontent.com/microsoft/vscode-dev-containers/master/script-library/node-debian.sh"
ARG NODE_SCRIPT_SHA="dev-mode"

# ARG NODE_VERSION="lts/*"
ARG NODE_VERSION="14"
ENV NVM_DIR=/usr/local/share/nvm

# Have nvm create a "current" symlink and add to path to work around https://github.com/microsoft/vscode-remote-release/issues/3224
ENV NVM_SYMLINK_CURRENT=true
ENV PATH=${NVM_DIR}/current/bin:${PATH}

# Configure apt and install packages
RUN apt-get update \
    && export DEBIAN_FRONTEND=noninteractive \
    #
    # Verify git, common tools / libs installed, add/modify non-root user, optionally install zsh
    && apt-get -y install --no-install-recommends curl ca-certificates 2>&1 \
    && curl -sSL ${COMMON_SCRIPT_SOURCE} -o /tmp/common-setup.sh \
    && ([ "${COMMON_SCRIPT_SHA}" = "dev-mode" ] || (echo "${COMMON_SCRIPT_SHA} */tmp/common-setup.sh" | sha256sum -c -)) \
    && /bin/bash /tmp/common-setup.sh "${INSTALL_ZSH}" "${USERNAME}" "${USER_UID}" "${USER_GID}" "${UPGRADE_PACKAGES}" \
    #
    # Install Node.js
    && curl -sSL ${NODE_SCRIPT_SOURCE} -o /tmp/node-setup.sh \
    && ([ "${NODE_SCRIPT_SHA}" = "dev-mode" ] || (echo "${NODE_SCRIPT_SHA} */tmp/node-setup.sh" | sha256sum -c -)) \
    && /bin/bash /tmp/node-setup.sh "${NVM_DIR}" "${NODE_VERSION}" "${USERNAME}" \
    #
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -f /tmp/common-setup.sh /tmp/node-setup.sh \
    && rm -rf /var/lib/apt/lists/* \
    #
    # Workspace
    && mkdir workspace \
    && chown -R ${USERNAME}:root workspace


# Install Vim
RUN apt-get update && apt-get install -y \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Set up a timezone in the devcontainer - necessary for anything timezone dependent
ENV TZ=Europe/London
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone \
 && apt-get update \
 && apt-get install --no-install-recommends -y \
    apt-utils \
    tzdata  \
 && apt-get autoremove -y \
 && apt-get clean -y \
 && rm -rf /var/lib/apt/lists/* 

ENV DOTNET_RUNNING_IN_CONTAINER=true 

# Copy across SSH keys so you can git clone
RUN mkdir /root/.ssh
RUN chmod 700 /root/.ssh

COPY .ssh/id_rsa /root/.ssh
RUN chmod 600 /root/.ssh/id_rsa

COPY .ssh/id_rsa.pub /root/.ssh
RUN chmod 644 /root/.ssh/id_rsa.pub

COPY .ssh/known_hosts /root/.ssh
RUN chmod 644 /root/.ssh/known_hosts  

# Disable initial git clone prompt
RUN echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config

# Copy across .env file so you can customise environment variables
# This will be copied into the root of the repo post git clone
COPY .env /.env
RUN chmod 644 /.env  

# Install dotnet entity framework tools
RUN dotnet tool install dotnet-ef --tool-path /usr/local/bin --version 3.1.2 

With this devcontainer you're good to go for an ASP.NET Core / JavaScript developer setup that is blazing fast! Remember to fire up Docker and give it goodly access to the resources of your host machine. All the CPUs, lots of memory and all the performance that there ought to be.

* "virus checkers" is a euphemism here for all the background tools that may be running. It was that or calling them "we are legion"