Phil Marius

Data Scientist, Data Engineer, Linux, and OSS Fan

04 Mar 2021

Azure Functions Environment Separation with Linux Apps

At my current company, we use Azure Functions as the runtime for a number of our data collection pipelines. Serverless technology has been a recent adoption here and it’s worked out well for us so far. We’re big fans of Python in the data engineering team, and it lets us build robust collection services quickly.

Until it gets to maintaining these services. The only real split Azure Functions offers between development / staging / production (dev/stg/prod) environments is running a function locally (development) or publishing it and running it on Azure infrastructure (production). Deployment Slots are Azure’s official method of maintaining function app environments but, in typical Microsoft fashion, they’re not available for apps running on the Linux consumption plan.

This arguably makes Functions redundant outside of Windows apps and C#. Looking at the forums, Linux slots are supposedly under development according to this post; however, that same post also says they were due by the end of 2020.

So who knows.

Comparing this to other serverless technologies: AWS Lambda requires a bit of setup but works with all languages, while GCP takes a different approach and suggests using different projects for different environments; a tutorial is available here.

Here’s my solution to this issue.

Three Different Functions

Yep, you heard that right.

Overview

I decided to have one separate function for each of development, staging, and production with automated processes to deploy each one. The full diagram is as follows:

The above image describes the full process for one of our data collection functions, called “collector”. The process starts when I clone the remote git repository to my local machine. Using the following command, I retrieve the function settings from the dev-collector function:

func azure functionapp fetch-app-settings dev-collector

This creates the local.settings.json file containing all the configuration settings from the deployed dev-collector function on Azure. The various connection strings / API keys / etc. are all Azure Key Vault references accessible to the Azure Function.
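
For illustration, the fetched file might look roughly like this (a sketch; every setting name here except environment is a placeholder, and the Key Vault URIs are made up):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "storage_location": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "some_api_key": "@Microsoft.KeyVault(SecretUri=https://xxx.vault.azure.net/secrets/xxx/)",
    "environment": "development"
  }
}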

Storage Locations

One configuration setting is not an Azure Key Vault reference: it’s called environment and decides which environment the function is running in. It is one of:

  • development
  • staging
  • production

And they refer to containers in our Azure Data Lake Storage Gen 2 (ADLS) instance. Our ADLS is configured so that the different environments’ data stores are separate containers within the same instance. This enables us to use the same folder structure across all three locations and change only the environment in which a given service runs. That’s what the environment configuration setting controls: which container to talk to.
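
As a rough Python sketch of the idea (assuming the azure-storage-file-datalake SDK and that storage_location holds a connection string; the file path is illustrative):

import os

from azure.storage.filedatalake import DataLakeServiceClient

# "environment" decides which ADLS container (file system) the function
# talks to: development, staging, or production.
container = os.environ["environment"]

service = DataLakeServiceClient.from_connection_string(os.environ["storage_location"])
filesystem = service.get_file_system_client(file_system=container)

# The folder structure is identical in every container, so only the
# container name changes between environments.
file_client = filesystem.get_file_client("collector/output/latest.csv")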

This setting can then be referenced in various places, including blob trigger bindings, using the following %environment% syntax:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "inblob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "%environment%/xxx/xxx/xxx/{name}.json",
      "connection": "storage_location"
    },
    {
      "name": "outblob",
      "type": "blob",
      "path": "%environment%/xxx/xxx/xxx/xxx/{name}.csv",
      "connection": "storage_location",
      "direction": "out"
    }
  ]
}

This allows each function to communicate with the correct container.
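
For context, the Python entry point behind those bindings might look something like this (a sketch; the actual transformation is elided):

import azure.functions as func


def main(inblob: func.InputStream, outblob: func.Out[str]) -> None:
    # inblob is the JSON blob that fired the trigger in the current
    # environment's container; outblob writes a CSV back to the same one.
    raw = inblob.read().decode("utf-8")
    csv_data = raw  # placeholder: the real function transforms JSON to CSV here
    outblob.set(csv_data)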

Back to my local development collector: it’s currently in development mode on my local machine and has "environment": "development" in local.settings.json. I replace the Azure Key Vault references with my personal access tokens for the various services I need to interact with, as the references cannot be resolved locally. Using func start, I can run the function locally and have the data saved to the development container on our ADLS instance to verify it works as intended.

Then, when satisfied with the result, I’d like to test how well it works on the timer trigger it’s configured with on Azure. We run a variation of git flow in many of our workflows, so I create a new branch feature/ticket-name, commit my changes, and push to the remote repository. I open a PR to the develop branch, it’s approved by a colleague when they eventually check their emails, and it’s finally merged.

Continuous Deployment

At this point, GitHub Actions kicks in and automatically deploys the function to its Azure-hosted instance using the following CI/CD pipeline:
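
In outline, the pipeline looks something like this (a sketch; the action versions, Python version, and exact test/audit commands are assumptions):

name: Build and deploy collector

on:
  push:
    branches:
      - develop
      - master
      - 'release/**'

env:
  PYTHON_VERSION: '3.8'  # assumption

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Set environment
        run: echo "set AZURE_FUNCTIONAPP_NAME and PUBLISH_PROFILE_NAME here"  # expanded in the sketch below

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run tests
        run: pytest  # assumption: any test runner works here

      - name: Run audits
        run: pip-audit  # assumption: any dependency audit tool works here

      - name: Check black compliance
        run: black --check .

      - name: Publish to Azure
        uses: Azure/functions-action@v1
        with:
          app-name: ${{ env.AZURE_FUNCTIONAPP_NAME }}
          package: '.'
          publish-profile: ${{ secrets[env.PUBLISH_PROFILE_NAME] }}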

The above workflow does a number of things. Firstly, the build-and-deploy job is triggered on a push to one of three branches:

  • develop
  • master
  • release/** (any branch beginning with “release/”)

As my PR was merged to develop, this CI/CD pipeline is triggered. Then, the following is executed:

  1. Check out code
  2. Set up the Python environment using the PYTHON_VERSION environment variable set at the top of the workflow
  3. Set the environment (I’ll come onto this)
  4. Install dependencies
  5. Run tests
  6. Run audits
  7. Check the code is black compliant (no mercy for crappy code)
  8. Publish the function to Azure infrastructure

Step 3 is where the magic happens. This is the step where the workflow configures itself for the right environment depending on the branch it was called on. The ${{ github.ref }} GitHub Actions variable contains information about the branch the workflow is running on. Step 3 takes this information and sets the following two variables:

  • AZURE_FUNCTIONAPP_NAME
  • PUBLISH_PROFILE_NAME

The two variables determine which Azure-hosted function app this workflow deploys to. AZURE_FUNCTIONAPP_NAME is the name of the function itself (i.e. dev-collector for the develop branch) and PUBLISH_PROFILE_NAME is the name of the secret that contains the publishing profile of the function (i.e. DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for develop), not to be confused with the publishing profile itself. This is then called with ${{ secrets[env.PUBLISH_PROFILE_NAME] }} to get the secret itself. The reason for setting the name rather than fetching the secret directly is that GitHub Actions doesn’t obscure a secret if it’s set to an environment variable and retrieved as one.
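
One way to implement that step is a small shell branch on the ref (a sketch; prod-collector is an assumed name, as the text only names dev-collector and stg-collector):

- name: Set environment
  run: |
    if [[ "${{ github.ref }}" == "refs/heads/develop" ]]; then
      echo "AZURE_FUNCTIONAPP_NAME=dev-collector" >> $GITHUB_ENV
      echo "PUBLISH_PROFILE_NAME=DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    elif [[ "${{ github.ref }}" == refs/heads/release/* ]]; then
      echo "AZURE_FUNCTIONAPP_NAME=stg-collector" >> $GITHUB_ENV
      echo "PUBLISH_PROFILE_NAME=STG_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    else  # master
      echo "AZURE_FUNCTIONAPP_NAME=prod-collector" >> $GITHUB_ENV  # assumed name
      echo "PUBLISH_PROFILE_NAME=PROD_AZURE_FUNCTIONAPP_PUBLISH_PROFILE" >> $GITHUB_ENV
    fi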

All three publishing profiles for the three different function apps are saved as GitHub secrets. These are:

  • DEV_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for develop
  • STG_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for staging
  • PROD_AZURE_FUNCTIONAPP_PUBLISH_PROFILE for production

The CI/CD process is based on the one documented in the Azure docs for GitHub Actions.

Release Process

The next step is moving the function from development to staging to production. This process takes its inspiration from Rebecca Franks’ blog post on automating her release process using GitHub Actions (I highly recommend reading that first).

Using the workflow_dispatch trigger in GitHub Actions, we can manually trigger a new release process with the following GitHub Actions workflow:
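
The trigger section of that workflow might look something like this (a sketch; the input names are assumptions):

name: Release

on:
  workflow_dispatch:
    inputs:
      versionName:
        description: 'Name of function version (e.g. 2.3.0)'
        required: true
      corvidaeVersion:
        description: 'Corvidae release version (e.g. 7)'
        required: false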

Starting with the trigger, workflow dispatch surfaces as a form in the GitHub interface. It allows us to choose which release process yml file to use (really useful when updating the CI/CD pipelines) and has two inputs for whoever triggers the release to fill in:

  • Name of function version (required, e.g. 2.3.0) - this details the version of the function being updated.
  • Corvidae release version (not required, e.g. 7) - if this function release relates to a Corvidae release version, it can be noted here.

When this is triggered, the following is executed (sketched after the list):

  1. Check out code
  2. Create a new release branch (e.g. release/v2.3.0) named after the version (I’ll come onto this)
  3. Save the version name from the workflow dispatch trigger to a VERSION file in the repo
  4. Update the changelog (we use the keepachangelog format)
  5. Configure a git user (GitHub bot credentials)
  6. Commit the changelog and VERSION file to the repository
  7. Push the new branch to the repository (this is very important, I totally forgot to do this and only realised 10 contributions later)
  8. Create a PR into master from the release branch (git flow)
  9. Create a PR into develop from the release branch (git flow)
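
Wired together, those steps might look something like this (a sketch; the changelog step is elided, the input name matches the trigger sketch above, and the gh CLI calls assume the default GITHUB_TOKEN can open PRs):

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Create release branch
        run: git checkout -b release/v${{ github.event.inputs.versionName }}

      - name: Save version
        run: echo "${{ github.event.inputs.versionName }}" > VERSION

      - name: Update changelog
        run: echo "update CHANGELOG.md in keepachangelog format here"  # elided

      - name: Configure git user
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Commit changelog and version
        run: |
          git add VERSION CHANGELOG.md
          git commit -m "Prepare release v${{ github.event.inputs.versionName }}"

      - name: Push release branch
        run: git push origin release/v${{ github.event.inputs.versionName }}

      - name: Open PRs into master and develop
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh pr create --base master --head release/v${{ github.event.inputs.versionName }} --title "Release v${{ github.event.inputs.versionName }}" --body "Release PR"
          gh pr create --base develop --head release/v${{ github.event.inputs.versionName }} --title "Release v${{ github.event.inputs.versionName }}" --body "Release PR"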

Staging

Once we have this release/v2.3.0 branch ready to be merged to master and develop, the GitHub CI/CD Action from before gets triggered on the prefix match of release/** and the staging function (stg-collector) gets updated.

The staging function is connected to our staging database which, in turn, is connected to the staging web app the backend engineers develop. At this point, the data can be loaded through this collection pipeline into the database which can then be seen on the front end.

Anything that needs immediate fixing can be handled here by merging fixes into the release branch.

Production

Then, when happy with the release, the two PRs can be merged into the master / main and develop branches. Once this happens, the next GitHub Action is triggered:
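
In sketch form, it might look like this (the release-notes extraction is elided; in practice it could be a short script or a community action that reads the keepachangelog-format CHANGELOG.md):

name: Tag release

on:
  push:
    branches:
      - master

jobs:
  tag-release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Configure git user
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Read version
        run: echo "VERSION=$(cat VERSION)" >> $GITHUB_ENV

      - name: Extract release notes
        run: |
          # Placeholder: in reality this pulls the current version's section
          # out of CHANGELOG.md into release_notes.md
          echo "release notes placeholder" > release_notes.md

      - name: Create release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh release create "v${{ env.VERSION }}" --title "v${{ env.VERSION }}" --notes-file release_notes.md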

This action is the release tagging process and is triggered on merges to master. The process is:

  1. Check out code
  2. Initialise git configuration again
  3. Pull the version information from the VERSION file and save to a new environment variable
  4. Extract release notes
  5. Create a release using the version number and release notes

At the same time this runs, the CI/CD pipeline from earlier is run again but this time for the production function.

Conclusion

And that’s about it! A full development, staging, and production separation between Azure Function apps, which we’ve been working with recently.

Next steps:

  • Configure access control lists on our Azure Data Lake Storage Gen 2 instance so that we can begin restricting the access each function has to each environment (access is currently via connection strings)