Building an Automated CI/CD Pipeline for Serverless Machine Learning on AWS
A step-by-step guide to automating the infrastructure pipeline for an AWS Lambda architecture
By Kuriko IWAI

Table of Contents
Introduction
What is a CI/CD Pipeline
The Workflow in Action
Testing and Building Workflow
Deployment Workflow
Monitoring with Grafana
Wrapping Up

Introduction
A CI/CD pipeline is a set of automated processes that helps machine learning teams deliver models more reliably and efficiently.
This automation is crucial for ensuring that new model versions are continuously integrated, tested, and deployed to production without manual intervention.
In this article, I'll explore a step-by-step guide on integrating an infrastructure CI/CD pipeline for a machine learning application deployed on a serverless Lambda architecture.
What is a CI/CD Pipeline
A CI/CD (Continuous Integration / Continuous Delivery) pipeline is an automated process that helps deliver code changes more reliably and efficiently by automating the steps of building, testing, and deploying software.
Continuous Integration (CI) focuses on the practice of developers regularly merging code changes into a central repository.
After each merge, an automated build and a series of tests like unit tests are run to ensure the new code doesn't break the existing application.
Continuous Delivery (CD) automates the process of taking the code that passed CI and getting it ready for release.
In the process, the software is built, tested, and packaged into a release-ready state.
Then, Continuous Deployment (CD) automatically deploys the code that passed all automated tests to production without human intervention.
Using a CI/CD pipeline is critical in DevOps practices, providing benefits like:
Faster releases, as automation eliminates manual, time-consuming operational tasks,
Reduced risk, as automated tests run on every code change,
Improved collaboration through a shared, automated pipeline that provides a consistent process, and
Improved code quality through the immediate feedback from the automated tests.
The Workflow in Action
To establish a robust CI/CD pipeline for an ML application, it is critical to automate the entire lifecycle of the infrastructure, models, and data.
This process is referred to as MLOps, which extends traditional DevOps practices to cover the unique challenges of machine learning, such as data and model versioning.
In this article, I’ll focus on building an infrastructure CI/CD pipeline for the dynamic pricing system built on AWS Lambda:

Figure A. Infrastructure CI/CD pipelines (Created by Kuriko IWAI)
The pipeline covers four stages:
Source: A code change committed to GitHub triggers the pipeline,
Test: Automated tests and security scans run against the committed code,
Build: The committed code is compiled into a deployable artifact (a container image), and
Deploy: The approved artifact is deployed to a staging or production environment.
All code is hosted on GitHub, where it's protected by branch protection rules and enforced pull request reviews.
Once a change is ready, a GitHub Actions workflow (green box in the diagram) is triggered to run the testing and building processes.
To prevent errors from reaching production, I added a human review phase (pink box in the diagram) between the build and deployment workflows, ensuring any issues are addressed before the final deployment.
If the code passes the human review, another GitHub Actions workflow is manually triggered to deploy the code as a Lambda function in staging or production.
This entire process is enhanced with comprehensive monitoring and security checkups (orange boxes).
Testing and Building Workflow
I'll first configure the GitHub Actions workflow to trigger testing and building on every push and pull request.
This automation process involves three phases:
Environment Setup:
Setting up Python,
Installing dependencies,
Configuring AWS credentials using OIDC,
Test Phase:
Running PyTest,
Running Static Application Security Testing (SAST),
Scanning dependencies with Software Composition Analysis (SCA), and
Build Phase:
Once the code passes all tests, triggering AWS CodeBuild to build the container image and push it to Amazon ECR.
These phases are configured in the build_test.yml script stored in the .github/workflows folder at the root of the project directory:
.github/workflows/build_test.yml
name: Build and Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  API_ENDPOINT: ${{ secrets.API_ENDPOINT }}
  CLIENT_A: ${{ secrets.CLIENT_A }}


# set permissions for oidc (open id connect) authentication with aws
permissions:
  id-token: write        # for requesting the jwt token from GitHub's OIDC provider.
  contents: read         # for checking out the code in the repo
  security-events: write

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    timeout-minutes: 60

    steps:
      # environment setup
      - name: checkout repository code
        uses: actions/checkout@v4

      - name: set up python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'

      - name: install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r requirements_dev.txt

      # config aws credentials using oidc
      - name: configure aws credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-region: ${{ secrets.AWS_REGION_NAME }}
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_ARN }} # iam role for github actions
          role-session-name: GitHubActions-Build-Test-${{ github.run_id }}

      - name: test aws access
        run: |
          aws sts get-caller-identity
          echo "✅ oidc authentication successful"

      # testing
      - name: run pytest
        run: pytest
        env:
          CORS_ORIGINS: 'http://localhost:3000,http://127.0.0.1:3000'
          PYTEST_RUN: true

      # dependency scanning (sca) and static analysis (sast) with snyk
      - name: run snyk sca (open source dependencies)
        uses: snyk/actions/python@master
        with:
          command: test
          args: --severity-threshold=high --policy-path=.snyk --python-version=3.12 --skip-unresolved --file=requirements.txt
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      - name: run snyk sast (code analysis)
        uses: snyk/actions/python@master
        with:
          command: code test
          args: --severity-threshold=high --policy-path=.snyk --python-version=3.12 --skip-unresolved --file=requirements.txt
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      # building - trigger aws codebuild to start the project named ${{ secrets.CODEBUILD_PROJECT }}
      - name: trigger aws codebuild
        uses: aws-actions/aws-codebuild-run-build@v1
        id: codebuild
        with:
          project-name: ${{ secrets.CODEBUILD_PROJECT }}
          source-version-override: ${{ github.sha }}
          env-vars-for-codebuild: | # pass the env vars to buildspec.yml. set BUILD_TYPE to test so the deployment is not triggered
            GITHUB_SHA=${{ github.sha }},
            BUILD_TYPE=test

      - name: check codebuild status
        if: always()
        run: |
          BUILD_ID="${{ steps.codebuild.outputs.aws-build-id }}"
          echo "codebuild id: $BUILD_ID"

          BUILD_STATUS=$(aws codebuild batch-get-builds --ids "$BUILD_ID" \
            --query 'builds[0].buildStatus' --output text)
          echo "build status: $BUILD_STATUS"

          aws codebuild batch-get-builds --ids "$BUILD_ID" \
            --query 'builds[0].phases[].{Phase:phaseType,Status:phaseStatus,Duration:durationInSeconds}' \
            --output table

          if [ "$BUILD_STATUS" != "SUCCEEDED" ]; then
            echo "❌ codebuild failed with status: $BUILD_STATUS"
            exit 1
          else
            echo "✅ codebuild completed successfully"
          fi

      - name: upload build artifacts
        if: always()
        run: |
          echo "build completed for commit: ${{ github.sha }}"
          echo "branch: ${{ github.ref_name }}"
          echo "build ID: ${{ steps.codebuild.outputs.aws-build-id }}"
Next, I'll add supporting components to make the workflow run successfully.
This process involves:
Adding PyTest scripts,
Configuring the Snyk credential for SAST and SCA tests, and
AWS-related configuration:
Setting up OIDC for AWS credentials,
Defining an IAM role for GitHub Actions, and
Configuring AWS CodeBuild.
◼ Adding PyTest Scripts
I’ll start the process by adding PyTest scripts to the tests folder located at the root of the project repository.
For demonstration, I'll add two test files to evaluate the main script and the Flask app scripts:
tests/main_test.py (Testing the main script)
import os
import shutil
import numpy as np
import pytest
from unittest.mock import patch, MagicMock

import src.main as main_script


def test_data_loading_and_preprocessor_saving(mock_data_handling, mock_s3_upload, mock_joblib_dump):
    """tests that data loading is called and the preprocessor is saved and uploaded."""

    main_script.run_main()

    # verify that data_handling.main_script was called
    mock_data_handling.assert_called_once()

    # verify preprocessor is dumped in mock file
    mock_joblib_dump.assert_called_once_with(mock_data_handling.return_value[-1], PREPROCESSOR_PATH)

    # verify preprocessor is uploaded to mock s3
    mock_s3_upload.assert_any_call(file_path=PREPROCESSOR_PATH)


def test_model_optimization_and_saving(mock_data_handling, mock_model_scripts, mock_s3_upload):
    """tests that each model's optimization script is called and the results are saved and uploaded."""

    mock_torch_script, mock_sklearn_script = mock_model_scripts
    main_script.run_main()

    # verify each model's main_script was called
    assert mock_torch_script.called
    assert mock_sklearn_script.call_count == len(main_script.sklearn_models)

    # verify that each model file exists and s3_upload was called for it
    ## dfn
    assert os.path.exists(DFN_FILE_PATH)
    mock_s3_upload.assert_any_call(file_path=DFN_FILE_PATH)

    ## svr model
    assert os.path.exists(SVR_FILE_PATH)
    mock_s3_upload.assert_any_call(file_path=SVR_FILE_PATH)

    ## elastic net
    assert os.path.exists(EN_FILE_PATH)
    mock_s3_upload.assert_any_call(file_path=EN_FILE_PATH)

    ## light gbm
    assert os.path.exists(GBM_FILE_PATH)
    mock_s3_upload.assert_any_call(file_path=GBM_FILE_PATH)
tests/app_test.py (Testing the Flask app scripts)
import os
import json
import io
import pandas as pd
import numpy as np
from unittest.mock import patch, MagicMock

# import scripts to test
import app

# add cors origin
os.environ['CORS_ORIGINS'] = 'http://localhost:3000, http://127.0.0.1:3000'


@patch('app.t.scripts.load_model')
@patch('torch.load')
@patch('app._redis_client', new_callable=MagicMock)
@patch('app.joblib.load')
@patch('app.s3_load_to_temp_file')
@patch('app.s3_load')
def test_predict_endpoint_primary_model(
    mock_s3_load,
    mock_s3_load_to_temp_file,
    mock_joblib_load,
    mock_redis_client,
    mock_torch_load,
    mock_load_model,
    flask_client,
):
    """test a prediction from the primary model without cache hit."""

    # mock return values for file loading
    mock_preprocessor = MagicMock()
    mock_joblib_load.return_value = mock_preprocessor
    mock_s3_load.return_value = io.BytesIO(b'dummy_data')
    mock_s3_load_to_temp_file.return_value = 'dummy_path'

    # config redis cache for cache miss
    mock_redis_client.get.return_value = None

    # config the model and torch mock
    mock_torch_model = MagicMock()
    mock_load_model.return_value = mock_torch_model
    mock_torch_load.return_value = {'state_dict': 'dummy'}

    # mock model's prediction array
    num_rows = 1200
    num_bins = 100
    expected_length = num_rows * num_bins
    mock_prediction_array = np.random.uniform(1.0, 10.0, size=expected_length)

    # mock the return chain for the model's forward pass
    mock_torch_model.return_value.cpu.return_value.numpy.return_value.flatten.return_value = mock_prediction_array

    # create a mock dataframe
    mock_df_expanded = pd.DataFrame({
        'stockcode': ['85123A'] * num_rows,
        'quantity': np.random.randint(50, 200, size=num_rows),
        'unitprice': np.random.uniform(1.0, 10.0, size=num_rows),
        'unitprice_min': np.random.uniform(1.0, 3.0, size=num_rows),
        'unitprice_median': np.random.uniform(4.0, 6.0, size=num_rows),
        'unitprice_max': np.random.uniform(8.0, 12.0, size=num_rows),
    })

    # set global variables used by the app endpoint
    app.X_test = mock_df_expanded.drop(columns='quantity')
    app.preprocessor = mock_preprocessor

    with patch.object(pd, 'read_parquet', return_value=mock_df_expanded):
        response = flask_client.get('/v1/predict-price/85123A')

    # assertion
    assert response.status_code == 200

    data = json.loads(response.data)
    assert isinstance(data, list)
    assert len(data) == num_bins
    assert data[0]['stockcode'] == '85123A'
    assert 'predicted_sales' in data[0]
In the app_test.py script, I used the @patch decorators from Python's unittest.mock library to temporarily replace functions and objects with mock objects.
This allows the tests to run without depending on external resources like files or object storage.
In practice, these tests need to be updated with every code change to make sure new changes do not introduce errors.
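The test functions above also rely on shared pytest fixtures such as flask_client, mock_data_handling, mock_s3_upload, mock_joblib_dump, and mock_model_scripts, which pytest resolves from a conftest.py in the tests folder. The article does not show that file, so here is a minimal, hypothetical sketch of what it could look like; the patch targets (src.main.data_handling, src.main.s3_upload, and so on) and the app.app Flask instance are assumptions based on the imports used in the tests above:
tests/conftest.py (a hypothetical sketch)
# tests/conftest.py - hypothetical shared fixtures; the real patch targets
# and fixture names depend on the project's module layout.
import pytest
from unittest.mock import patch, MagicMock


@pytest.fixture
def flask_client():
    """return a flask test client (assumes the flask instance is app.app)."""
    import app
    app.app.config['TESTING'] = True
    with app.app.test_client() as client:
        yield client


@pytest.fixture
def mock_data_handling():
    """patch the data-handling entry point called by src.main.run_main()."""
    with patch('src.main.data_handling') as mock:
        # the last element stands in for the fitted preprocessor
        mock.return_value = (MagicMock(), MagicMock(), MagicMock())
        yield mock


@pytest.fixture
def mock_s3_upload():
    """patch the s3 upload helper so no network calls are made."""
    with patch('src.main.s3_upload') as mock:
        yield mock


@pytest.fixture
def mock_joblib_dump():
    """patch joblib.dump so nothing is written to disk."""
    with patch('src.main.joblib.dump') as mock:
        yield mock


@pytest.fixture
def mock_model_scripts():
    """patch the torch and sklearn optimization scripts."""
    with patch('src.main.torch_script') as mock_torch, \
         patch('src.main.sklearn_script') as mock_sklearn:
        yield mock_torch, mock_sklearn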
◼ Configuring the Snyk Credential for SAST and SCA Tests
For SAST and SCA, I'll use Snyk, a security platform, to find and fix vulnerabilities in the code and its dependencies.
Snyk's primary goal is to shift left by integrating security into the development workflow as early as possible.
So, the GitHub Actions workflow must run the Snyk SAST and SCA scans before triggering the build process.
To configure the Snyk credential, visit the Snyk account page, copy the Auth Token, and store it in the GitHub repository secrets as SNYK_TOKEN.

Figure B. Screenshot of the Snyk account page
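As an alternative to the web UI, the token can be stored with the GitHub CLI. A minimal sketch, assuming the gh CLI is installed and authenticated against the repository; the secret name SNYK_TOKEN matches the one referenced in build_test.yml:
# store the snyk auth token as a repository secret named SNYK_TOKEN
gh secret set SNYK_TOKEN --body "<PASTE_THE_SNYK_AUTH_TOKEN>"

# list repository secrets to confirm it was created (values are never shown)
gh secret list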
◼ Setting Up OIDC for AWS Credentials
Next, I'll configure AWS credential handling with OIDC (OpenID Connect).
OIDC is a security practice that avoids storing long-lived AWS credentials in the environment by leveraging a federated identity approach with an external identity provider (IdP).
The IdP generates a temporary, short-lived token, which is exchanged with AWS for temporary security credentials that grant access to specific resources for a limited time.
To make the process work, I’ll first add the identity provider to the AWS account.
Visit AWS's IAM console > Identity Providers:
Provider type: Select OpenID Connect
Provider URL: https://token.actions.githubusercontent.com
Audience: sts.amazonaws.com
Click Add provider.
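The same identity provider can also be registered with the AWS CLI instead of the console. A minimal sketch, assuming the caller has IAM permissions; the thumbprint value is a placeholder (the IAM console derives it automatically):
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list <PROVIDER_THUMBPRINT>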
◼ Configuring an IAM Role for GitHub Actions
Next, I'll add an IAM role for GitHub Actions.
An IAM role is a security entity in AWS that defines a set of permissions for making service requests.
To let GitHub Actions access the AWS resources the project needs, the IAM role must have permissions for:
Retrieving the identity from the AWS Security Token Service (STS): GetCallerIdentity
Running AWS CodeBuild commands: BatchGetBuilds, BatchGetProjects, StartBuild
Logging the CodeBuild project: GetLogEvents, FilterLogEvents, DescribeLogStreams
Retrieving parameters from the Systems Manager (SSM) Parameter Store: GetParameter, GetParameters, GetParametersByPath
Retrieving and modifying project related resources:
Lambda function: GetFunction, UpdateFunctionCode, UpdateFunctionConfiguration, InvokeFunction
ECR: ListImages, DescribeImages, DescribeRepositories
I'll configure these permissions as an inline policy github_actions_permissions in JSON format.
The inline policy narrows the security scope to the minimum required, even though some of these permissions are also covered by broader AWS managed policies like AWSCodeBuildDeveloperAccess and CloudWatchLogsReadOnlyAccess.
IAM console > Roles > Create role > Add permissions > Create inline policy > JSON > github_actions_permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "codebuild:BatchGetBuilds",
        "codebuild:StartBuild",
        "codebuild:BatchGetProjects"
      ],
      "Resource": [
        "ADD_CODEBUILD_PROJECT_ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:GetLogEvents",
        "logs:DescribeLogStreams",
        "logs:DescribeLogGroups",
        "logs:FilterLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:AWS_ACCOUNT_ID:log-group:/aws/codebuild/*:*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:GetParametersByPath"
      ],
      "Resource": [
        "ADD_SSM_PARAMETER_ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:GetFunction",
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:InvokeFunction"
      ],
      "Resource": "ADD_LAMBDA_FUNCTION_ARN"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:DescribeRepositories"
      ],
      "Resource": "ADD_ECR_ARN"
    }
  ]
}
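In addition to this inline permissions policy, the GitHub Actions role needs a trust policy that lets GitHub's OIDC provider assume it through sts:AssumeRoleWithWebIdentity. A minimal sketch is shown below; <AWS ACCOUNT ID>, <GITHUB ORG>, and <REPOSITORY NAME> are placeholders, and the sub condition restricts the role to workflows from a single repository:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<AWS ACCOUNT ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:<GITHUB ORG>/<REPOSITORY NAME>:*"
        }
      }
    }
  ]
}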
◼ Configuring AWS CodeBuild
Lastly, I'll configure the AWS CodeBuild project from the AWS console.
The process involves:
Step 1. Adding an IAM role for CodeBuild,
Step 2. Creating a CodeBuild project, and
Step 3. Configuring a buildspec.yml file to customize the build process.
▫ Step 1. Adding an IAM Role for CodeBuild
The IAM role for CodeBuild needs to have permissions for:
Using the CodeConnections connection to GitHub: UseConnection
Creating logs for the CodeBuild project: CreateLogGroup, CreateLogStream, PutLogEvents
Managing parameters in the SSM Parameter Store: PutParameter, GetParameter, GetParameters, DeleteParameter, DescribeParameters
Reading and writing objects in the CodePipeline S3 buckets: GetObject, GetObjectVersion, PutObject
Retrieving and modifying project related resources:
Lambda function: GetFunction, UpdateFunctionCode, UpdateFunctionConfiguration
ECR: pulling and pushing images (GetAuthorizationToken, BatchCheckLayerAvailability, GetDownloadUrlForLayer, BatchGetImage, InitiateLayerUpload, UploadLayerPart, CompleteLayerUpload, PutImage)
Similar to the GitHub Actions Role, these permissions are defined as an inline policy:
IAM console > Roles > Create role > Add permissions > Create inline policy > JSON > codebuild_permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "codeconnections:UseConnection"
      ],
      "Resource": "ADD_CONNECTION_ARN"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:<CODEBUILD_PROJECT_ARN>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:PutParameter",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:DeleteParameter",
        "ssm:DescribeParameters"
      ],
      "Resource": [
        "ADD_SSM_ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::codepipeline-us-east-1-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:GetFunction",
        "lambda:UpdateFunctionConfiguration"
      ],
      "Resource": "ADD_LAMBDA_FUNCTION_ARN"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:GetAuthorizationToken",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "ADD_ECR_ARN"
    }
  ]
}
▫ Step 2. Creating a CodeBuild Project
Visit Developer Tools > CodeBuild > Build projects > Create build project and create a new CodeBuild project:
Project name: pj-sales-pred (or any name, as long as it matches the project name referenced by the CODEBUILD_PROJECT secret in the workflow files),
Project type: Default project,
Source 1: GitHub (follow the instructions to connect the GitHub account to CodeBuild)
Repository: https://github.com/<YOUR GITHUB ACCOUNT>/<REPOSITORY NAME> (the CodeBuild project needs to know where to get the source code)
Service Role: Choose the IAM role created in Step 1
Build Spec: Choose the Use a buildspec file option and specify buildspec.yml
Configure environment:
Environment image: Managed image
Operating system: Amazon Linux 2
Runtime(s): Standard
Image: aws/codebuild/amazonlinux2-x86_64-standard:5.0
Image version: Always use the latest image for this runtime version
Environment type: Linux
Compute: 3 GB memory, 2 vCPUs (BUILD_GENERAL1_SMALL)
Check "Privileged"

Figure C. Screenshot of the AWS CodeBuild console
▫ Step 3. Adding the buildspec File
Lastly, add the buildspec.yml file at the root of the project repository.
The buildspec.yml file configures the CodeBuild process by defining key components:
version: Specifies the buildspec version.
env: Defines environment variables.
phases: Defines the commands to run:
pre_build: Commands to run before the main build: log in to ECR and create the repository if it does not exist.
build: The main part of the build, where the Docker image is built and tagged.
post_build: Commands to run after the main build is complete: the Docker image is pushed to ECR.
artifacts: Specifies the files or directories that store the build output; the artifacts are passed to the next stage of the CI/CD pipeline, the deployment stage.
cache: Defines files or directories to cache between builds to speed up the process.
AWS CodeBuild automatically reads this file and executes the commands accordingly.
buildspec.yml
version: 0.2

phases:
  pre_build:
    commands:
      # login to ecr
      - echo "=== Pre-build Phase Started ==="
      - AWS_ACCOUNT_ID=$(echo $CODEBUILD_BUILD_ARN | cut -d':' -f5)
      - ECR_REGISTRY="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION > /tmp/ecr_password
      - cat /tmp/ecr_password | docker login --username AWS --password-stdin $ECR_REGISTRY
      - rm /tmp/ecr_password
      - REPOSITORY_URI="$ECR_REGISTRY/$ECR_REPOSITORY_NAME"

      # use github sha or codebuild commit hash as an image tag
      - |
        if [ -n "$GITHUB_SHA" ]; then
          COMMIT_HASH=$(echo $GITHUB_SHA | cut -c 1-7)
          echo "Using GitHub SHA: $GITHUB_SHA"
        else
          COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
          echo "Using CodeBuild SHA: $CODEBUILD_RESOLVED_SOURCE_VERSION"
        fi
      - IMAGE_TAG="${COMMIT_HASH:-latest}"

      # store image tag in aws ssm parameter store
      - |
        aws ssm put-parameter --name "/my-app/image-tag" --value "$IMAGE_TAG" --type "String" --overwrite

      # create an ecr repository if it does not exist
      - |
        aws ecr describe-repositories --repository-names $ECR_REPOSITORY_NAME --region $AWS_DEFAULT_REGION || \
        aws ecr create-repository --repository-name $ECR_REPOSITORY_NAME --region $AWS_DEFAULT_REGION

  build:
    commands:
      - echo "=== Build Phase Started ==="
      # build docker image
      - docker build -t $ECR_REPOSITORY_NAME -f Dockerfile.lambda .
      - docker tag $ECR_REPOSITORY_NAME:latest $REPOSITORY_URI:$IMAGE_TAG
      - docker images | grep $ECR_REPOSITORY_NAME

  post_build:
    commands:
      - echo "=== Post-build Phase Started ==="

      # push the docker image to ecr
      - docker push ${REPOSITORY_URI}:${IMAGE_TAG}

artifacts:
  files:
    - '**/*'
  name: ml-sales-prediction-$(date +%Y-%m-%d)

cache:
  paths:
    - '/root/.cache/pip/**/*'
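The buildspec above builds the image from a Dockerfile.lambda at the repository root, which the article does not show. A minimal, hypothetical sketch of such a file for a Python Lambda container image could look like the following; the handler name app.handler is an assumption and should match the project's actual entry point:
Dockerfile.lambda (a hypothetical sketch)
# start from the public aws lambda python base image
FROM public.ecr.aws/lambda/python:3.12

# install dependencies into the image
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the application code into the lambda task root
COPY . ${LAMBDA_TASK_ROOT}

# set the handler (module.function); adjust to the project's entry point
CMD ["app.handler"]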
After a successful test and build run of the GitHub Actions workflow, triggered by a push to the GitHub repository, the CodeBuild project now shows a build history:

Figure D. Screenshot of the CodeBuild console
This concludes the build_test.yml workflow.
Deployment Workflow
After a human review of the build results, the container image is finally deployed as a Lambda function using another GitHub Actions workflow.
The process involves:
Environment Setup:
Setting up Python,
Installing dependencies,
Configuring AWS credentials using OIDC, and
Extracting a shortened version of the Git commit SHA into the SHORT_SHA environment variable
Deployment:
Checking if the Lambda function exists,
Retrieving the latest image tag from the SSM Parameter Store,
If the image tag is found, updating the Lambda function with the retrieved image,
If not, starting a new CodeBuild run to rebuild the container image, and
Updating the Lambda function with the newly built image.
Verification and Testing:
Checking that the Lambda function is updated, and
Testing the updated Lambda function.
Configuration Update:
After a successful test run, updating the environment variables of the Lambda function, and
Cleaning up temporary files to ensure a clean state for the next run.
.github/workflows/deploy.yml
name: Deploy Containerized Lambda

on:
  workflow_dispatch: # manual run
    inputs:
      branch:
        description: 'The branch to deploy from'
        required: true
        default: 'develop'
        type: choice
        options:
          - main
          - develop

env:
  GITHUB_SHA: ${{ github.sha }}

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      ### environment setup ###
      - name: checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.branch }}

      - name: set up python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'

      # configure aws credentials using oidc
      - name: configure aws credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-region: ${{ secrets.AWS_REGION_NAME }}
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_ARN }}

      - name: set environment variables
        run: |
          echo "SHORT_SHA=${GITHUB_SHA::8}" >> $GITHUB_ENV

      ### deployment ###
      - name: check lambda function exists
        run: |
          aws lambda get-function --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} --region ${{ secrets.AWS_REGION_NAME }}

      - name: retrieve image tag and validate image
        id: validate_image
        run: |
          IMAGE_TAG=$(aws ssm get-parameter --name "/my-app/image-tag" --query "Parameter.Value" --output text || echo "")
          echo "IMAGE_TAG=$IMAGE_TAG" >> $GITHUB_ENV

          if [[ -z "$IMAGE_TAG" ]]; then
            echo "has_image=false" >> $GITHUB_OUTPUT
          else
            echo "... checking for image with tag: $IMAGE_TAG"
            IMAGE_URI=${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION_NAME }}.amazonaws.com/${{ secrets.ECR_REPOSITORY }}:${IMAGE_TAG}

            if aws ecr describe-images --repository-name ${{ secrets.ECR_REPOSITORY }} --image-ids imageTag=$IMAGE_TAG --region ${{ secrets.AWS_REGION_NAME }} > /dev/null 2>&1; then
              echo "has_image=true" >> $GITHUB_OUTPUT
              echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_OUTPUT
            else
              echo "has_image=false" >> $GITHUB_OUTPUT
            fi
          fi

      - name: update lambda function with existing image
        if: ${{ steps.validate_image.outputs.has_image == 'true' }}
        run: |
          aws lambda update-function-code \
            --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
            --region ${{ secrets.AWS_REGION_NAME }} \
            --image-uri ${{ steps.validate_image.outputs.IMAGE_URI }}
          echo "...lambda function updated with existing image ..."

      - name: start codebuild for container build
        if: ${{ steps.validate_image.outputs.has_image == 'false' }} # run only when the image is not found
        uses: aws-actions/aws-codebuild-run-build@v1
        id: codebuild
        with:
          project-name: ${{ secrets.CODEBUILD_PROJECT }}
          source-version-override: ${{ github.event.inputs.branch }}
          env-vars-for-codebuild: |
            [
              {
                "name": "GITHUB_REF",
                "value": "refs/heads/${{ github.event.inputs.branch }}"
              },
              {
                "name": "BRANCH_NAME",
                "value": "${{ github.event.inputs.branch }}"
              },
              {
                "name": "ECR_REPOSITORY_NAME",
                "value": "${{ secrets.ECR_REPOSITORY }}"
              },
              {
                "name": "LAMBDA_FUNCTION_NAME",
                "value": "${{ secrets.LAMBDA_FUNCTION_NAME }}"
              }
            ]

      - name: update lambda function with a new image (after build)
        if: ${{ steps.validate_image.outputs.has_image == 'false' }} # run only when the image is not found
        run: |
          LATEST_IMAGE_URI=$(aws ecr describe-images --repository-name ${{ secrets.ECR_REPOSITORY }} --query 'sort_by(imageDetails,&imagePushedAt)[-1].imagePushedAt' | xargs -I {} aws ecr describe-images --repository-name ${{ secrets.ECR_REPOSITORY }} --query 'imageDetails[?imagePushedAt==`{}`].imageUri' --output text)

          if [[ -z "$LATEST_IMAGE_URI" ]]; then
            echo "... failed to retrieve the new image uri ..."
            exit 1
          fi

          aws lambda update-function-code \
            --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
            --region ${{ secrets.AWS_REGION_NAME }} \
            --image-uri "$LATEST_IMAGE_URI"
          echo "... lambda function updated with newly built image ..."

      ### verification and testing ###
      - name: verify lambda updates
        run: |
          CURRENT_IMAGE=$(aws lambda get-function \
            --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
            --region ${{ secrets.AWS_REGION_NAME }} \
            --query 'Code.ImageUri' \
            --output text)

          if [[ $CURRENT_IMAGE == *"dkr.ecr"* ]]; then
            echo "✅ lambda function successfully updated with new image"
          else
            echo "❌ lambda function update may have failed"
            exit 1
          fi

      - name: test lambda function
        if: github.event.inputs.branch == 'main'
        run: |
          aws lambda invoke \
            --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
            --region ${{ secrets.AWS_REGION_NAME }} \
            --payload '{"test": true}' \
            --cli-binary-format raw-in-base64-out \
            response.json

          cat response.json

      ### update the lambda func env vars ###
      - name: update lambda environment variables
        if: github.event.inputs.branch == 'main'
        run: |
          DEPLOY_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
          IMAGE_TAG="${{ env.IMAGE_TAG }}"

          echo "ENVIRONMENT: production"
          echo "VERSION: $IMAGE_TAG"
          echo "DEPLOY_TIME: $DEPLOY_TIME"

          # wait until the function is ready before updating the env vars
          MAX_ATTEMPTS=30
          ATTEMPT=1

          while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do
            echo "... attempt $ATTEMPT/$MAX_ATTEMPTS: checking function state ..."

            FUNCTION_STATE=$(aws lambda get-function \
              --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
              --region ${{ secrets.AWS_REGION_NAME }} \
              --query 'Configuration.State' \
              --output text)

            LAST_UPDATE_STATUS=$(aws lambda get-function \
              --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
              --region ${{ secrets.AWS_REGION_NAME }} \
              --query 'Configuration.LastUpdateStatus' \
              --output text)

            if [ "$FUNCTION_STATE" = "Active" ] && [ "$LAST_UPDATE_STATUS" = "Successful" ]; then
              echo "✅ function is ready for configuration update"
              break
            elif [ "$LAST_UPDATE_STATUS" = "Failed" ]; then
              echo "❌ function update failed"
              exit 1
            else
              echo "function not ready yet, waiting 30 seconds..."
              sleep 30
              ATTEMPT=$((ATTEMPT + 1))
            fi
          done

          if [ $ATTEMPT -gt $MAX_ATTEMPTS ]; then
            echo "❌ Timeout waiting for function to be ready"
            exit 1
          fi

          aws lambda update-function-configuration \
            --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} \
            --region ${{ secrets.AWS_REGION_NAME }} \
            --environment "Variables={ENVIRONMENT=production,VERSION=$IMAGE_TAG,DEPLOY_TIME=$DEPLOY_TIME}"

      # clean up temp files for a clean state for the next run
      - name: cleanup
        if: always()
        run: |
          echo "=== Cleanup ==="
          rm -f response.json
          echo "✅ cleanup completed"
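Because the workflow is triggered by workflow_dispatch, it can be run from the GitHub UI (Actions tab > Deploy Containerized Lambda > Run workflow) or from the GitHub CLI. A minimal sketch, assuming the gh CLI is authenticated and the workflow file is named deploy.yml:
# trigger the deployment workflow manually, passing the branch input
gh workflow run deploy.yml --ref main -f branch=main

# follow the progress of the latest run
gh run watch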
That’s all for the Infrastructure CI/CD pipeline integration.
Next, I'll configure Grafana for more advanced monitoring. This step is optional, as AWS CloudWatch can also cover your monitoring needs.
Monitoring with Grafana
Lastly, I’ll configure Grafana for advanced logging and monitoring on top of AWS CloudWatch.
Grafana is an open-source data visualization and analytics tool.
It allows you to query, visualize, alert on, and understand metrics no matter where they are stored.
The configuration process involves:
Creating an IAM user,
Attaching roles and policies to the IAM user, and
Connecting the data source to Grafana.
◼ Creating AWS IAM User for Grafana
First, I’ll add a new IAM User dedicated to the Grafana integration and grant it read-only access to various AWS services.
This limits the permissions to the specific resources Grafana needs to access, following the principle of least privilege.
Visit IAM console > Users > Create user > Add user name “grafana” > Attach policies directly > Create policy > JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListAllLogGroups",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AccessSpecificLogGroups",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents",
        "logs:StartQuery",
        "logs:StopQuery",
        "logs:GetQueryResults",
        "logs:DescribeMetricFilters",
        "logs:GetLogGroupFields",
        "logs:DescribeExportTasks",
        "logs:DescribeDestinations"
      ],
      "Resource": [
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:/aws/lambda/*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:/aws/codebuild/*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:/aws/apigateway/*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:<RDS NAME>*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:<PROJECT NAME>*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:application/*",
        "arn:aws:logs:*:<AWS ACCOUNT ID>:log-group:custom/*"
      ]
    },
    {
      "Sid": "CloudWatchLogsQueryOperations",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeQueries",
        "logs:DescribeResourcePolicies",
        "logs:DescribeSubscriptionFilters"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchMetricsAccess",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DescribeAlarmsForMetric",
        "cloudwatch:GetDashboard",
        "cloudwatch:ListDashboards",
        "cloudwatch:DescribeAlarmHistory",
        "cloudwatch:GetMetricWidgetImage",
        "cloudwatch:ListTagsForResource"
      ],
      "Resource": "*"
    },
    {
      "Sid": "EC2DescribeAccess",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRegions",
        "ec2:DescribeTags",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVolumes",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ResourceGroupsAccess",
      "Effect": "Allow",
      "Action": [
        "resource-groups:ListGroups",
        "resource-groups:GetGroup",
        "resource-groups:ListGroupResources",
        "resource-groups:SearchResources"
      ],
      "Resource": "*"
    },
    {
      "Sid": "LambdaDescribeAccess",
      "Effect": "Allow",
      "Action": [
        "lambda:ListFunctions",
        "lambda:GetFunction",
        "lambda:ListTags",
        "lambda:GetAccountSettings",
        "lambda:ListEventSourceMappings"
      ],
      "Resource": "*"
    },
    {
      "Sid": "APIGatewayDescribeAccess",
      "Effect": "Allow",
      "Action": [
        "apigateway:GET"
      ],
      "Resource": [
        "arn:aws:apigateway:*::/restapis",
        "arn:aws:apigateway:*::/restapis/*/stages",
        "arn:aws:apigateway:*::/restapis/*/resources",
        "arn:aws:apigateway:*::/domainnames",
        "arn:aws:apigateway:*::/usageplans"
      ]
    },
    {
      "Sid": "ECSDescribeAccess",
      "Effect": "Allow",
      "Action": [
        "ecs:ListClusters",
        "ecs:DescribeClusters",
        "ecs:ListServices",
        "ecs:DescribeServices",
        "ecs:ListTasks",
        "ecs:DescribeTasks"
      ],
      "Resource": "*"
    },
    {
      "Sid": "RDSDescribeAccess",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "rds:DescribeDBClusters",
        "rds:ListTagsForResource"
      ],
      "Resource": "*"
    },
    {
      "Sid": "TaggingAccess",
      "Effect": "Allow",
      "Action": [
        "tag:GetResources",
        "tag:GetTagKeys",
        "tag:GetTagValues"
      ],
      "Resource": "*"
    },
    {
      "Sid": "XRayAccess",
      "Effect": "Allow",
      "Action": [
        "xray:BatchGetTraces",
        "xray:GetServiceGraph",
        "xray:GetTimeSeriesServiceStatistics",
        "xray:GetTraceSummaries"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SNSAccess",
      "Effect": "Allow",
      "Action": [
        "sns:ListTopics",
        "sns:GetTopicAttributes"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SQSAccess",
      "Effect": "Allow",
      "Action": [
        "sqs:ListQueues",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "*"
    }
  ]
}
◼ Attaching Roles and Policies
After creating the IAM user, I'll create a standalone policy by visiting IAM console > Policies > Create policy > JSON and pasting the same policy JSON attached to the IAM user in the previous step.
Then, configure a new role by visiting IAM console > Roles > Create role > Custom trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS ACCOUNT ID>:user/grafana"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Then, attach the policy created in the previous step.
An IAM role trust policy defines who can assume a specific IAM role, answering the question “Who is trusted to use this role’s permissions?”
Here, I configured the trust policy to allow the grafana IAM user created in the first step to assume the role.
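The same role and policy wiring can also be scripted with the AWS CLI. A minimal sketch, assuming the trust policy above is saved as trust-policy.json and the read-only policy from the previous step was named grafana-readonly (both names are placeholders):
# create the role with the trust policy that allows the grafana user to assume it
aws iam create-role \
  --role-name grafana-cloudwatch-role \
  --assume-role-policy-document file://trust-policy.json

# attach the read-only monitoring policy created earlier
aws iam attach-role-policy \
  --role-name grafana-cloudwatch-role \
  --policy-arn arn:aws:iam::<AWS ACCOUNT ID>:policy/grafana-readonly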
◼ Connecting Data Source to Grafana
Lastly, I’ll configure the data source for Grafana.
Visit Grafana console > Data sources > CloudWatch > Add:
Access Key ID: Access key ID of the IAM User grafana
Secret Access Key: Secret access key of the IAM User grafana
Assume Role ARN: ARN of the IAM Role created in the previous step.
Click save & test.
This allows Grafana to import data from the relevant AWS resources:

Figure E. Screenshot of the Grafana dashboard
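Once the data source is connected, Grafana panels can run CloudWatch Logs Insights queries against the Lambda log group. A minimal sketch of a query that surfaces recent errors (select the function's log group, e.g. /aws/lambda/<LAMBDA_FUNCTION_NAME>, in the panel's query editor):
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20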
Wrapping Up
In this article, we demonstrated how to integrate a robust CI/CD pipeline into a machine learning application.
While the specific services used may vary depending on the project's needs, the principles remain the same: automating the process and detecting errors as early as possible before the actual deployment.
The next step would be to extend this pipeline to include the crucial aspects of model and data CI/CD.
Continue Your Learning
If you enjoyed this blog, these related entries will complete the picture:
Architecting Production ML: A Deep Dive into Deployment and Scalability
Data Pipeline Architecture: From Traditional DWH to Modern Lakehouse
Building a Production-Ready Data CI/CD Pipeline: Versioning, Drift Detection, and Orchestration
Related Books for Further Understanding
These books cover a wide range of theories and practices, from fundamentals to the PhD level.

Linear Algebra Done Right

Foundations of Machine Learning, second edition (Adaptive Computation and Machine Learning series)

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
Share What You Learned
Kuriko IWAI, "Building an Automated CI/CD Pipeline for Serverless Machine Learning on AWS" in Kernel Labs
https://kuriko-iwai.com/integrating-cicd-pipelines
Looking for Solutions?
- Deploying ML Systems 👉 Book a briefing session
- Hiring an ML Engineer 👉 Drop an email
- Learn by Doing 👉 Enroll AI Engineering Masterclass
Written by Kuriko IWAI. All images, unless otherwise noted, are by the author. All experimentations on this blog utilize synthetic or licensed data.


