Phil Marius

Data Scientist, Data Engineer, Linux, and OSS Fan

04 Mar 2021

Azure Functions Environment Separation with Linux Apps

At my current company, we use Azure Functions as the runtime for a number of our data collection pipelines. Serverless technology has been a recent adoption here and it’s worked out well for us so far. We’re big fans of Python in the data engineering team, and it lets us build robust collection services quickly.

Until it gets to maintaining these services. The only real split Azure Functions offers between development / staging / production (dev/stg/prod) environments is running a function locally (development) or publishing it and running it on Azure infrastructure (production). Deployment Slots are Azure’s official method of maintaining function app environments but, in typical Microsoft fashion, they’re not available for apps running on the Linux consumption plan.

This arguably makes Functions redundant outside of Windows apps and C#. Looking at the forums, Linux slots are supposedly under development according to this post; however, that same post also says they were due by the end of 2020.

So who knows.

Comparing this to other serverless technologies: AWS Lambda requires a bit of setup but works with all languages, while GCP takes a different approach and suggests using different projects for different environments; a tutorial is available here.

Here’s my solution to this issue.

Three Different Functions

Yep, you heard that right.

Overview

I decided to have one separate function for each of development, staging, and production with automated processes to deploy each one. The full diagram is as follows:

The above image describes the full process for one of our data collection functions, called “collector”. The process starts when I clone the remote git repository to my local machine. Using the following command, I retrieve the function settings from the dev-collector function:

func azure functionapp fetch-app-settings dev-collector

This creates the local.settings.json file containing all the configuration settings from the deployed dev-collector function on Azure. The various connection strings / API keys / etc. are all Azure Key Vault references accessible to the Azure Function.
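
For illustration, the fetched file might look roughly like this (a sketch; every setting name here except environment is a placeholder, and the Key Vault URIs are made up):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "storage_location": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "some_api_key": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "environment": "development"
  }
}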

Storage Locations

One configuration setting is not an Azure Key Vault reference: it’s called environment and decides which environment the function is running in. It is one of:

  • development
  • staging
  • production

And they refer to containers in our Azure Data Lake Storage Gen 2 (ADLS) instance. Our ADLS is configured so that the different environments’ data stores are separate containers within the same instance. This enables us to use the same folder structure across all three locations and change only the environment in which a given service runs. That’s what the environment configuration setting controls: which container to talk to.
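
As a rough Python sketch of the idea (assuming the azure-storage-file-datalake SDK and that storage_location holds a connection string; the file path is illustrative):

import os

from azure.storage.filedatalake import DataLakeServiceClient

# "environment" decides which ADLS container (file system) the function
# talks to: development, staging, or production.
container = os.environ["environment"]

service = DataLakeServiceClient.from_connection_string(os.environ["storage_location"])
filesystem = service.get_file_system_client(file_system=container)

# The folder structure is identical in every container, so only the
# container name changes between environments.
file_client = filesystem.get_file_client("collector/output/latest.csv")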

This setting can then be referenced in various places, including blob trigger bindings, using the following %environment% syntax:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "inblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "%environment%/xxx/xxx/xxx/{name}.json",
      "connection": "storage_location"
    },
    {
      "name": "outblob",
      "type": "blob",
      "path": "%environment%/xxx/xxx/xxx/xxx/{name}.csv",
      "connection": "storage_location",
      "direction": "out"
    }
  ]
}

This allows each function to communicate with the correct container.
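
For context, the Python entry point behind those bindings might look something like this (a sketch; the actual transformation is elided):

import azure.functions as func


def main(inblob: func.InputStream, outblob: func.Out[str]) -> None:
    # inblob is the JSON blob that fired the trigger in the current
    # environment's container; outblob writes a CSV back to the same one.
    raw = inblob.read().decode("utf-8")
    csv_data = raw  # placeholder: the real function transforms JSON to CSV here
    outblob.set(csv_data)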

Back to my local development collector: it’s currently in development mode on my local machine and has "environment": "development" in local.settings.json. I replace the Azure Key Vault references with my personal access tokens for the various services I need to interact with, as the references cannot be resolved locally. Using func start, I can run the function locally and have the data saved to the development container on our ADLS instance to verify it works as intended.

Then, when satisfied with the result, I’d like to test how well it works on the timer trigger it’s configured with on Azure. We run a variation of git flow in many of our workflows, so I create a new branch feature/ticket-name, commit my changes, and push to the remote repository. I open a PR to the develop branch, it’s approved by a colleague when they eventually check their emails, and it’s finally merged.

Continuous Deployment

At this point, GitHub Actions kicks in and automatically deploys the function to its Azure-hosted instance using the following CI/CD pipeline:
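
In outline, the pipeline looks something like this (a sketch; the action versions, Python version, and exact test/audit commands are assumptions):

name: Build and deploy collector

on:
  push:
    branches:
      - develop
      - master
      - 'release/**'

env:
  PYTHON_VERSION: '3.8'  # assumption

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Set environment
        run: echo "set AZURE_FUNCTIONAPP_NAME and PUBLISH_PROFILE_NAME here"  # expanded in the sketch below

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run tests
        run: pytest  # assumption: any test runner works here

      - name: Run audits
        run: pip-audit  # assumption: any dependency audit tool works here

      - name: Check black compliance
        run: black --check .

      - name: Publish to Azure
        uses: Azure/functions-action@v1
        with:
          app-name: ${{ env.AZURE_FUNCTIONAPP_NAME }}
          package: '.'
          publish-profile: ${{ secrets[env.PUBLISH_PROFILE_NAME] }}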

The above workflow does a number of things. Firstly, the build-and-deploy job is triggered on a push to one of three branches:

  • develop
  • master
  • release/** (any branch beginning with “release/”)

As my PR was merged to develop, this CI/CD pipeline is triggered. Then, the following is executed:

  1. Check out code
  2. Set up the Python environment using the PYTHON_VERSION environment variable set at the top of the workflow
  3. Set the environment (I’ll come onto this)
  4. Install dependencies
  5. Run tests
  6. Run audits
  7. Check the code is black compliant (no mercy for crappy code)
  8. Publish the function to Azure infrastructure

Step 3 is where the magic happens. This is the step where the workflow configures itself for the right environment depending on the branch it was called on. The ${{ github.ref }} GitHub Actions variable contains information about the branch the workflow is running on. Step 3 takes this information and sets the following two variables:

  • AZURE_FUNCTIONAPP_NAME
  • PUBLISH_PROFILE_NAME

The two variables determine which Azure-hosted function app this workflow deploys to. AZURE_FUNCTIONAPP_NAME is the name of the function itself (i.e. dev-collector for the develop branch) and PUBLISH_PROFILE_NAME is the name of the secret that contains the publishing profile of the function (i.e. DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for develop), not to be confused with the publishing profile itself. This is then called with ${{ secrets[env.PUBLISH_PROFILE_NAME] }} to get the secret itself. The reason for setting the name rather than fetching the secret directly is that GitHub Actions doesn’t obscure a secret if it’s set to an environment variable and retrieved as one.
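
One way to implement that step is a small shell branch on the ref (a sketch; prod-collector is an assumed name, as the text only names dev-collector and stg-collector):

- name: Set environment
  run: |
    if [[ "${{ github.ref }}" == "refs/heads/develop" ]]; then
      echo "AZURE_FUNCTIONAPP_NAME=dev-collector" >> $GITHUB_ENV
      echo "PUBLISH_PROFILE_NAME=DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    elif [[ "${{ github.ref }}" == refs/heads/release/* ]]; then
      echo "AZURE_FUNCTIONAPP_NAME=stg-collector" >> $GITHUB_ENV
      echo "PUBLISH_PROFILE_NAME=STG_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    else  # master
      echo "AZURE_FUNCTIONAPP_NAME=prod-collector" >> $GITHUB_ENV  # assumed name
      echo "PUBLISH_PROFILE_NAME=PROD_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    fi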

All three publishing profiles for the three different function apps are saved as GitHub secrets. These are:

  • DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for develop
  • STG_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for staging
  • PROD_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for production

The CI/CD process is based on the one documented in the Azure docs for GitHub Actions.

Release Process

The next step is moving the function from development to staging to production. This process takes its inspiration from Rebecca Franks’ blog post on automating her release process using GitHub Actions (I highly recommend reading that first).

Using the workflow_dispatch trigger in GitHub Actions, we can manually trigger a new release process with the following GitHub Actions workflow:
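
The trigger section of that workflow might look something like this (a sketch; the input names are assumptions):

name: Release

on:
  workflow_dispatch:
    inputs:
      versionName:
        description: 'Name of function version (e.g. 2.3.0)'
        required: true
      corvidaeVersion:
        description: 'Corvidae release version (e.g. 7)'
        required: false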

Starting with the trigger, workflow dispatch surfaces as a form in the GitHub interface. It allows us to choose which release process yml file to use (really useful when updating the CI/CD pipelines) and has two inputs for whoever triggers the release to fill in:

  • Name of function version (required, e.g. 2.3.0) - this details the version of the function being updated.
  • Corvidae release version (not required, e.g. 7) - if this function release relates to a Corvidae release version, it can be noted here.

When this is triggered, the following is executed (sketched after the list):

  1. Check out code
  2. Create a new release branch (e.g. release/v2.3.0) named after the version (I’ll come onto this)
  3. Save the version name from the workflow dispatch trigger to a VERSION file in the repo
  4. Update the changelog (we use the keepachangelog format)
  5. Configure a git user (GitHub bot credentials)
  6. Commit the changelog and VERSION file to the repository
  7. Push the new branch to the repository (this is very important, I totally forgot to do this and only realised 10 contributions later)
  8. Create a PR into master from the release branch (git flow)
  9. Create a PR into develop from the release branch (git flow)
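
Wired together, those steps might look something like this (a sketch; the changelog step is elided, the input name matches the trigger sketch above, and the gh CLI calls assume the default GITHUB_TOKEN can open PRs):

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Create release branch
        run: git checkout -b release/v${{ github.event.inputs.versionName }}

      - name: Save version
        run: echo "${{ github.event.inputs.versionName }}" > VERSION

      - name: Update changelog
        run: echo "update CHANGELOG.md in keepachangelog format here"  # elided

      - name: Configure git user
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Commit changelog and version
        run: |
          git add VERSION CHANGELOG.md
          git commit -m "Prepare release v${{ github.event.inputs.versionName }}"

      - name: Push release branch
        run: git push origin release/v${{ github.event.inputs.versionName }}

      - name: Open PRs into master and develop
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh pr create --base master --head release/v${{ github.event.inputs.versionName }} --title "Release v${{ github.event.inputs.versionName }}" --body "Release PR"
          gh pr create --base develop --head release/v${{ github.event.inputs.versionName }} --title "Release v${{ github.event.inputs.versionName }}" --body "Release PR"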

Staging

Once we have this release/v2.3.0 branch ready to be merged to master and develop, the GitHub CI/CD Action from before gets triggered on the prefix match of release/** and the staging function (stg-collector) gets updated.

The staging function is connected to our staging database which, in turn, is connected to the staging web app the backend engineers develop. At this point, the data can be loaded through this collection pipeline into the database which can then be seen on the front end.

Anything that needs immediate fixing can be handled here by merging fixes into the release branch.

Production

Then, when happy with the release, the two PRs can be merged into the master / main and develop branches. Once this happens, the next GitHub Action is triggered:
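
In sketch form, it might look like this (the release-notes extraction is elided; in practice it could be a short script or a community action that reads the keepachangelog-format CHANGELOG.md):

name: Tag release

on:
  push:
    branches:
      - master

jobs:
  tag-release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Configure git user
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Read version
        run: echo "VERSION=$(cat VERSION)" >> $GITHUB_ENV

      - name: Extract release notes
        run: |
          # Placeholder: in reality this pulls the current version's section
          # out of CHANGELOG.md into release_notes.md
          echo "release notes placeholder" > release_notes.md

      - name: Create release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh release create "v${{ env.VERSION }}" --title "v${{ env.VERSION }}" --notes-file release_notes.md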

This action is the release tagging process and is triggered on merges to master. The process is:

  1. Check out code
  2. Initialise git configuration again
  3. Pull the version information from the VERSION file and save to a new environment variable
  4. Extract release notes
  5. Create a release using the version number and release notes

At the same time this runs, the CI/CD pipeline from earlier is run again but this time for the production function.

Conclusion

And that’s about it! A full development, staging, and production separation between Azure Function apps, which we’ve been working with recently.

Next steps:

  • Configure access control lists on our Azure Data Lake Storage Gen 2 instance so that we can begin restricting the access each function has to each environment (access is currently via connection strings)