CI/CD Pipeline Orchestration
Releasing Stale Terraform State Locks in AWS DynamoDB After Pipeline Failures
Last Tuesday at 2 a.m., a notification hit my phone that made me, as a DevOps engineer, sick to my stomach: our CI/CD pipeline’s Terraform apply had timed out. Jenkins had sent the runner a SIGTERM, which killed the job and left the state file locked. Every subsequent pipeline run had…
Mitigating GitHub Actions “API rate limit exceeded for installation” Errors in Matrix Builds
I have hit this issue myself and know how frustrating it is: you run a cross-platform matrix that fires off 60 jobs at once, and half of them start failing with a 403 error. The logs tell you “API rate limit exceeded for installation ID”,…
Diagnosing ArgoCD “OutOfSync” and “Degraded” Status for Helm CustomResourceDefinitions
Half of my Saturday had been wasted, but I only realized it after looking at ArgoCD. My Helm-based application was OutOfSync and Degraded, with a little heart icon pulsing red. The resource with that status was a CustomResourceDefinition (CRD). By all indications the CRD was installed properly, its controller was functioning, and a kubectl get…
Fixing GitLab Runner “Dial tcp: lookup gitlab.com: no such host” DNS Resolution Errors
You pushed a critical hotfix at 2 AM, and the pipeline turned into a wall of red: it failed with “dial tcp: lookup gitlab.com: no such host”, and your CI/CD pipeline is effectively down. You’re mad at Docker DNS because it worked fine 20 minutes ago, and you have wasted your entire night searching for “gitlab runner dial tcp lookup…
Resolving Jenkins Java Heap Space OutOfMemoryError During Large Maven Builds
It is 2:00 a.m., and you have restarted Jenkins twice while trying to build your large Maven multilanguage project, yet you still get out-of-memory errors: “Java heap space” or “PermGen space.” I understand how frustrating this can be. After the restarts and the troubleshooting, you may be wondering whether there is something…
Deploying Ephemeral Self-Hosted GitHub Action Runners on Proxmox VE
Once again, at 2:00 a.m., I have to deal with a dead runner. My deployment has failed, and the GitHub Actions interface shows my persistent self-hosted runner as ‘offline’, with a stale registration token. I have gone through this before… A home server runs self-hosted runners 24/7, and the runners eventually…
Structuring Jenkins Pipeline Shared Libraries for Modular CI/CD Workflows
In 2019 I discovered a collection of 40+ microservices that each carried a nearly identical Jenkinsfile. All the generic boilerplate to build and tag Docker images, upload artifacts to S3, and send notifications to Slack was being copy/pasted across all the microservices sharing this repo. You could manually change logic in one location (like changing the notification…
Automating Terraform State Management via GitHub Actions OIDC Authentication
Around a year ago, I made the mistake of pushing an AWS access key to a public repository. The key belonged to an extremely old IAM user that had full admin access (don’t ask how that happened!). Thankfully, I caught and fixed the leak within minutes by rotating the key, but it meant that…
Implementing GitOps with ArgoCD for Kubernetes Cluster Synchronization
At 3 a.m., I experienced a major breakdown: a production namespace was lost when our Jenkins pipeline, running kubectl apply --force, overwrote a manual hotfix that a team member had applied moments earlier. The push model of deploying to a cluster had failed to deliver on its promise, and I resolved that I would never…
Architecting a GitLab CI/CD Pipeline for Multi-Stage Docker Builds
Last month I found myself staring at a Docker image for our Go API that was 1.4 GB and took the GitLab pipeline eleven minutes to build on every push to the repository. No, that is not a typo: the image had everything from the full Go toolchain to half of apt and even…