CI/CD Complete Guide 2026: GitHub Actions, GitLab CI, ArgoCD, GitOps & Expert Interview Q&A
Updated May 2026 · 23 min read · CI/CD · GitHub Actions · ArgoCD · GitOps · DevOps
Dhanush R, Senior DevOps Engineer
4.5+ years designing and maintaining production CI/CD pipelines with GitHub Actions, GitLab CI, Jenkins, and ArgoCD across teams of 5 to 50 engineers. Every pattern here comes from real pipelines, not from documentation screenshots.
CI/CD (Continuous Integration and Continuous Delivery/Deployment) is the pipeline backbone of modern DevOps. Every code commit triggers a chain of automated processes (tests, security scans, container builds, vulnerability checks, and deployments) that delivers software reliably and repeatably. This guide covers what you need to understand, build, and explain production CI/CD pipelines, including in technical interviews.
What is CI/CD? The Core Concepts
Continuous Integration (CI) is the practice of automatically building and testing every code change as it is committed to the repository. The goal is to catch integration problems early: when a developer merges a branch, automated tests verify that the new code does not break existing functionality. CI gives teams fast feedback (within minutes) instead of discovering integration bugs days or weeks later during manual testing cycles.
Continuous Delivery (CD) means every passing build is automatically prepared for release to any environment, so the software is always in a deployable state. A human decides when to deploy to production (for example, by clicking an approval button), but the deployment process itself is fully automated. Regulated industries (finance, healthcare) commonly use this approach.
Continuous Deployment goes one step further: every passing build is automatically deployed to production without human intervention. Any commit that passes all tests, security scans, and quality gates is live within minutes. Companies such as Netflix, Amazon, and Google are known to deploy hundreds of times per day across their services using this model. It requires very high confidence in automated testing and robust rollback mechanisms.
The pipeline rule: If a step in your pipeline is not automated, it is a bottleneck. Manual testing, manual security reviews, and manual deployment approvals all create queues, introduce human error, and slow the team down. Automate everything that can be objectively verified, and reserve human judgement only for decisions that genuinely require it.
GitHub Actions: The Modern Standard
GitHub Actions is the dominant CI/CD platform for teams using GitHub. It uses YAML workflow files stored in .github/workflows/ and triggered by GitHub events (push, pull_request, schedule, workflow_dispatch). It has the largest marketplace of reusable actions, native GitHub integration, and excellent OIDC support for cloud authentication without stored secrets.
# .github/workflows/ci.yml - production-grade CI pipeline
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: 123456789.dkr.ecr.ap-south-1.amazonaws.com
  IMAGE_NAME: api-service

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        with: { name: coverage, path: coverage/ }

  security-scan:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in production
        with:
          scan-type: 'fs'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'  # fail the pipeline on CRITICAL/HIGH findings

  build-push:
    runs-on: ubuntu-latest
    needs: [test, security-scan]
    if: github.ref == 'refs/heads/main'
    permissions:
      id-token: write  # required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-ecr-push
          aws-region: ap-south-1
      - uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push Docker image
        id: build
        run: |
          IMAGE_TAG=${{ github.sha }}
          docker build -t $REGISTRY/$IMAGE_NAME:$IMAGE_TAG .
          docker push $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
          # Also tag as latest for easy reference (deployments use the SHA tag)
          docker tag $REGISTRY/$IMAGE_NAME:$IMAGE_TAG $REGISTRY/$IMAGE_NAME:latest
          docker push $REGISTRY/$IMAGE_NAME:latest
          echo "image=$REGISTRY/$IMAGE_NAME:$IMAGE_TAG" >> "$GITHUB_OUTPUT"

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-push
    environment: staging  # protection rules configured on this environment apply here
    permissions:
      contents: write  # GITHUB_TOKEN must be allowed to push the manifest change
    steps:
      - uses: actions/checkout@v4
      - name: Update image tag in Kubernetes manifests
        run: |
          sed -i "s|image:.*|image: $REGISTRY/$IMAGE_NAME:${{ github.sha }}|g" k8s/staging/deployment.yaml
          git config user.email "ci@company.com"
          git config user.name "CI Bot"
          git commit -am "ci: update staging image to ${{ github.sha }}"
          git push  # ArgoCD detects the change and deploys
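The sed command in the deploy-staging job rewrites every image: line in the file, which works for a single-container Deployment but would also overwrite a sidecar's image. A slightly safer variant is sketched below in Python. set_image_tag is a hypothetical helper (not part of any tool mentioned in this guide) that only touches lines referencing one repository; the manifest fragment and tags are placeholders.

```python
import re


def set_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Point every image: line that uses `repository` at `new_tag`.

    Lines referencing other images (sidecars, init containers) are left untouched.
    Digest references (repo@sha256:...) are not matched by this sketch.
    """
    pattern = re.compile(
        r"^(?P<prefix>[ \t]*-?[ \t]*image:[ \t]*)"
        + re.escape(repository)
        + r":[^\s@]+[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: f"{m.group('prefix')}{repository}:{new_tag}", manifest)


# Hypothetical manifest fragment: the app container plus an unrelated sidecar.
manifest = """\
spec:
  containers:
    - name: api
      image: 123456789.dkr.ecr.ap-south-1.amazonaws.com/api-service:0f3c2a1
    - name: proxy
      image: envoyproxy/envoy:v1.30.0
"""
updated = set_image_tag(
    manifest, "123456789.dkr.ecr.ap-south-1.amazonaws.com/api-service", "9b8e7d6"
)
print(updated)  # only the api container's tag changes
```

If the manifests are managed with Kustomize, kustomize edit set image makes the same change without a hand-written regex.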
GitLab CI: Enterprise Standard
GitLab CI/CD is deeply integrated with GitLab's source control, merge requests, security scanning, and container registry. It uses a single .gitlab-ci.yml file at the repository root. GitLab CI is popular in enterprises using self-hosted GitLab for data residency requirements.
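# .gitlab-ci.yml - multi-stage pipeline with environments
stages:
  - test
  - build
  - deploy-staging
  - deploy-production

variables:
  DOCKER_BUILDKIT: "1"
  DOCKER_TLS_CERTDIR: "/certs"  # shared TLS cert dir for the docker:dind service and client
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

test:
  stage: test
  image: node:20-alpine
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths: [.npm/]  # npm ci deletes node_modules, so cache npm's download cache instead
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint
    - npm test -- --coverage
  # Assumes Jest/Istanbul with the text-summary and cobertura coverage reporters enabled
  coverage: '/Statements\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

build:
  stage: build
  image: docker:24
  services: [docker:24-dind]
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    # Inline cache metadata lets later builds reuse layers from :latest
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --build-arg BUILDKIT_INLINE_CACHE=1 -t $IMAGE -t $CI_REGISTRY_IMAGE:latest .
    - docker push $IMAGE
    - docker push $CI_REGISTRY_IMAGE:latest
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy-staging:
  stage: deploy-staging
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]  # the image's entrypoint is kubectl; clear it so GitLab can run the script
  environment:
    name: staging
    url: https://staging.company.com
  script:
    # Assumes cluster access is configured (e.g. a KUBECONFIG CI/CD variable or the GitLab agent)
    - kubectl set image deployment/api api=$IMAGE -n staging
    - kubectl rollout status deployment/api -n staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy-production:
  stage: deploy-production
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: production
    url: https://api.company.com
  script:
    - kubectl set image deployment/api api=$IMAGE -n production
    - kubectl rollout status deployment/api -n production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual  # requires a human approval click in the GitLab UI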
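GitLab reads each job's coverage percentage from the job log using the coverage regex in the test job. The sketch below reproduces that extraction with Python's re module and adds a minimum-coverage gate. The log format assumes Istanbul's text-summary reporter, the 80% threshold is the common baseline suggested for the test stage in this guide, extract_coverage and coverage_gate are hypothetical names, and taking the last match when several lines match is a choice made for this sketch.

```python
import re
from typing import Optional

# Same pattern as the coverage: key in the .gitlab-ci.yml above.
COVERAGE_RE = re.compile(r"Statements\s*:\s*(\d+\.?\d*)%")
MIN_COVERAGE = 80.0  # assumed threshold; tune per project


def extract_coverage(job_log: str) -> Optional[float]:
    """Return statement coverage from the last matching log line, or None."""
    matches = COVERAGE_RE.findall(job_log)
    return float(matches[-1]) if matches else None


def coverage_gate(job_log: str, minimum: float = MIN_COVERAGE) -> bool:
    """True only if coverage was reported and meets the minimum."""
    coverage = extract_coverage(job_log)
    return coverage is not None and coverage >= minimum


log = (
    "=============================== Coverage summary ===============================\n"
    "Statements   : 85.5% ( 171/200 )\n"
    "Branches     : 70% ( 35/50 )\n"
)
print(extract_coverage(log), coverage_gate(log))  # 85.5 True
```

Treating a missing coverage line as a failure is deliberate: a misconfigured reporter should not silently pass the gate.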
ArgoCD: GitOps for Kubernetes
ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. Instead of pushing deployments imperatively from CI pipelines (kubectl set image), ArgoCD continuously monitors a Git repository containing Kubernetes manifests and reconciles the cluster to match the desired state in Git. Git becomes the single source of truth for cluster state: every change is a Git commit, providing a complete audit trail, easy rollback (git revert), and clear ownership.
The GitOps pattern separates CI from CD: CI builds the image and updates the image tag in the manifest repository, and ArgoCD detects the Git change and handles the cluster deployment. Because ArgoCD runs inside the cluster and pulls from Git, the CI pipeline needs only write access to the manifest repository, not cluster credentials, which removes a high-value secret from CI entirely.
# ArgoCD Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: production  # must reference an existing AppProject
  source:
    repoURL: https://github.com/company/k8s-manifests
    targetRevision: main
    path: apps/api-service/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual kubectl changes
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m
        factor: 2
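The retry block above waits longer after each failed sync: each delay is the previous one multiplied by factor, capped at maxDuration, for at most limit retries. A sketch of that arithmetic with the manifest's values (it models the schedule only and is not Argo CD's own code; retry_delays is a hypothetical name):

```python
def retry_delays(limit: int, duration_s: float, factor: float, max_duration_s: float) -> list:
    """Wait before each retry under exponential backoff with an upper cap."""
    delays = []
    delay = duration_s
    for _ in range(limit):
        delays.append(min(delay, max_duration_s))
        delay *= factor
    return delays


# limit: 5, duration: 5s, factor: 2, maxDuration: 3m (180 s)
schedule = retry_delays(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(schedule)       # [5, 10, 20, 40, 80]
print(sum(schedule))  # 155
```

With these settings a persistently failing sync gives up after about two and a half minutes of cumulative waiting; the 3-minute cap would only start to matter from the seventh retry (5 s * 2^6 = 320 s).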
Production Pipeline Stages
A mature production CI/CD pipeline has these stages in order. Each stage must pass before the next begins:
- Source checkout and dependency installation: Check out the code at the specific commit SHA. Install dependencies from lock files (package-lock.json, go.sum, Pipfile.lock) and cache them between runs to reduce build time.
- Linting and static analysis: ESLint, golangci-lint, flake8, or other language-specific linters catch code quality issues. Fail fast: these run in seconds and catch obvious mistakes before slower tests run.
- Unit and integration tests with coverage: Run the full test suite. Enforce a minimum coverage threshold (80% is a common baseline) and fail the build if coverage drops below it. Upload test reports as artifacts.
- Security scanning (SAST and dependency vulnerabilities): SAST (Static Application Security Testing) tools such as Semgrep, Snyk Code, or CodeQL scan source code for security vulnerabilities. Dependency scanners (npm audit, OWASP Dependency-Check, Snyk) identify known CVEs in third-party libraries. Block merges on critical vulnerabilities.
- Docker image build: Build the production image with BuildKit, using multi-stage builds and layer caching. Tag it with the git commit SHA for full traceability.
- Container image vulnerability scan: Scan the built image with Trivy, Grype, or Snyk Container for OS package and application dependency CVEs. Fail on CRITICAL severity by default; consider also failing on HIGH in mature security programmes.
- Push to registry: Push the scanned image to the private container registry (ECR, Google Artifact Registry, ACR). Only images that have passed all previous stages are pushed.
- Deploy to staging: Deploy automatically to the staging environment, then run smoke and integration tests against the deployed service.
- Deploy to production: Either automatic (Continuous Deployment) or behind a manual approval gate (Continuous Delivery). Use a rolling deployment gated on readiness probes, monitor error rates and latency during rollout, and roll back automatically on an error-rate spike.
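"Each stage must pass before the next begins" is a fail-fast sequence. A minimal sketch, where run_pipeline is a hypothetical runner and the stage names and lambda checks are placeholders for real lint, test, and scan commands:

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (succeeded, names of the stages that actually ran).
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            print(f"stage '{name}' failed; skipping remaining stages")
            return False, executed
    return True, executed


stages = [
    ("lint", lambda: True),
    ("unit-tests", lambda: True),
    ("image-scan", lambda: False),  # e.g. a CRITICAL CVE was found
    ("push", lambda: True),         # never reached, so the image is never pushed
]
print(run_pipeline(stages))  # (False, ['lint', 'unit-tests', 'image-scan'])
```

Ordering cheap checks first means most broken commits fail within seconds instead of after a full image build.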
Secrets Management in CI/CD Pipelines
Never store secrets as plain text in CI/CD configuration files, as unmasked variables in the CI/CD UI, or (most critically) in Git repositories. Common mistakes I see in real pipelines:
Never do this: store long-lived AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values as GitHub Actions secrets or GitLab CI variables. Long-lived IAM access keys are a critical security risk: if they leak (through logs, a compromised runner, or a supply chain attack on an Action), an attacker has persistent AWS access. Instead, use OIDC federation: GitHub Actions can assume an AWS IAM role via OIDC without any stored secrets. The credentials are temporary (a 1-hour session by default), limited to the permissions granted to the role, and issued fresh on every pipeline run.
# GitHub Actions OIDC: no stored AWS credentials
# (the job needs permissions: id-token: write)
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789:role/github-deploy-role
    role-session-name: github-ci-${{ github.run_id }}
    aws-region: ap-south-1
# IAM role trust policy allows only your specific repo + branch
# "StringEquals": {"token.actions.githubusercontent.com:sub": "repo:company/api:ref:refs/heads/main"}
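The trust-policy condition is what stops other repositories, branches, or pull-request runs from assuming the role. The check can be approximated in Python: StringEquals is an exact comparison, and StringLike allows the * and ? wildcards. This is a simplified model for reasoning about which sub claims a policy admits, not AWS's policy evaluation engine; string_like and sub_allowed are hypothetical names and the repository names are placeholders.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """IAM-style StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def sub_allowed(sub: str, operator: str, allowed: str) -> bool:
    """Approximate one condition on token.actions.githubusercontent.com:sub."""
    if operator == "StringEquals":
        return sub == allowed
    if operator == "StringLike":
        return string_like(allowed, sub)
    raise ValueError(f"unsupported operator: {operator}")


main_only = "repo:company/api:ref:refs/heads/main"
print(sub_allowed("repo:company/api:ref:refs/heads/main", "StringEquals", main_only))  # True
print(sub_allowed("repo:company/api:pull_request", "StringEquals", main_only))         # False
```

A broad pattern such as repo:company/* under StringLike would let every repository in the organisation assume the role, which is why the exact-match form suits production deploy roles.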
Deployment Strategies
- Rolling Update: Gradually replaces old Pods with new ones. This is the Kubernetes default. Zero downtime if readiness probes are configured correctly. Risk: old and new versions run simultaneously for minutes, which requires backward-compatible APIs and database schemas.
- Blue-Green: Two identical environments (blue = live, green = new). Deploy to green, run tests, then switch traffic from blue to green at once. Rollback is just switching traffic back. Requires double the infrastructure cost while both environments are running.
- Canary: Route a small percentage of production traffic (1%, 5%, 10%) to the new version. Monitor error rates and latency, and gradually increase to 100% if metrics are healthy. Argo Rollouts automates this with Prometheus-based analysis. Catches bugs that only appear under real production traffic patterns.
- Feature Flags: Deploy code to production but keep it disabled behind a feature flag. Enable the flag for 1% of users, monitor, and gradually increase. This decouples deployment from release. LaunchDarkly, Flagsmith, and Unleash are popular feature flag platforms.
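Percentage rollouts rely on deterministic bucketing, so a given user stays in the rollout as the percentage grows. The sketch below hashes a hypothetical flag key and user ID into 100 buckets; it illustrates the idea and is not how LaunchDarkly, Flagsmith, or Unleash implement it internally.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """On for roughly rollout_percent% of users; raising the percentage only adds users."""
    return bucket(flag_key, user_id) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
at_1 = {u for u in users if is_enabled("new-checkout", u, 1)}
at_10 = {u for u in users if is_enabled("new-checkout", u, 10)}
print(len(at_1), len(at_10), at_1 <= at_10)  # the 1% cohort is contained in the 10% cohort
```

Including the flag key in the hash means the same users are not always the first 1% for every flag.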
12 CI/CD Interview Questions with Expert Answers
Q1: What is the difference between Continuous Delivery and Continuous Deployment?
Both terms describe the CD in CI/CD, but with a key distinction. Continuous Delivery means every successful build is automatically prepared and validated for production release; deploying to production is possible at any time with a single click or approval. Continuous Deployment means every successful build is automatically deployed to production without any human intervention. In other words, Continuous Delivery keeps a human approval gate before production, and Continuous Deployment removes that gate entirely. The choice depends on risk tolerance and regulatory requirements: regulated industries (banking, healthcare) typically use Continuous Delivery with approval workflows, while consumer software companies with mature testing often use Continuous Deployment. The two terms are often used interchangeably, so knowing the precise distinction shows depth.
Q2: What is GitOps and how does ArgoCD implement it?
GitOps is an operational framework where Git is the single source of truth for the desired state of infrastructure and applications. All changes (application code, Kubernetes manifests, infrastructure configuration) are made through Git commits and pull requests, and a GitOps operator (ArgoCD, Flux) continuously reconciles the cluster state to match Git. This gives you a complete audit trail (every change is a commit with an author and message), fast rollback (git revert the problematic commit), enforced pull request review (no direct kubectl apply to production), and disaster recovery (rebuild any environment from Git). ArgoCD implements GitOps by watching a Git repository containing Kubernetes manifests, comparing them to the live cluster state, and, with automated sync enabled, syncing any differences. With selfHeal: true, any manual kubectl change to the cluster is automatically reverted to match Git within seconds: Git always wins.
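The reconciliation step can be modelled as a diff between the desired state from Git and the live state in the cluster. In this simplified sketch resources are plain dicts keyed by a hypothetical "kind/name" string; plan_sync and should_auto_sync are illustrative functions, with prune and self_heal mirroring the Argo CD options of the same names. It reflects the documented behaviour that, without selfHeal, automated sync reacts to new Git commits but not to live-state drift alone.

```python
def plan_sync(desired: dict, live: dict, prune: bool) -> list:
    """Actions needed to make `live` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    if prune:  # only delete resources that no longer exist in Git when pruning is on
        for name in sorted(live.keys() - desired.keys()):
            actions.append(("delete", name))
    return actions


def should_auto_sync(git_changed: bool, out_of_sync: bool, self_heal: bool) -> bool:
    """Automated sync runs for new Git commits; drift alone triggers it only with selfHeal."""
    return out_of_sync and (git_changed or self_heal)


desired = {"deploy/api": {"image": "api:v2"}, "svc/api": {"port": 80}}
live = {"deploy/api": {"image": "api:v1"}, "svc/api": {"port": 80}, "cm/old": {}}
print(plan_sync(desired, live, prune=True))  # [('update', 'deploy/api'), ('delete', 'cm/old')]
```
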
Q3: How do you handle database migrations in a CI/CD pipeline?
Database migrations are among the most common sources of deployment-related incidents. A reliable approach: (1) Run migrations as a Kubernetes Job (not as part of application startup) before the new Deployment rolls out. If the migration fails, the deployment is blocked and the old version keeps serving traffic. (2) Write migrations to be backward-compatible: the old version of the application must keep working against the new schema during the rolling update, because both versions run simultaneously. Never rename a column directly; instead add the new column, backfill it, update the app to read the new column, then drop the old column in a later release. (3) Test migrations in staging against a copy of production data before deploying to production. (4) Use a migration tool (Flyway, Liquibase, Alembic, golang-migrate) that records applied migrations in a schema history table, preventing duplicate runs.
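Point (4) is the core idea behind tools like Flyway or golang-migrate: a history table records applied versions, so rerunning the migration step is a no-op. A minimal sketch with Python's built-in sqlite3, using a hypothetical schema_history table and the expand phase of a column rename (add the new column, then backfill it); the old column stays in place to be dropped in a later release.

```python
import sqlite3

# Ordered (version, SQL) pairs. Expand phase only: the old fullname column stays.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN display_name TEXT"),
    (3, "UPDATE users SET display_name = fullname WHERE display_name IS NULL"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply pending migrations in order; return the versions applied this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    applied = []
    for version, sql in migrations:
        if version in done:
            continue  # already recorded, so never run twice
        with conn:  # commits on success; rolls back uncommitted changes on error
            conn.execute(sql)
            conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # [1, 2, 3]
print(migrate(conn))  # []
```
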
Q4: What is a pipeline artifact and why are artifacts important?
A pipeline artifact is a file or set of files produced by one pipeline stage and consumed by a later stage or stored for later analysis. Examples: compiled binaries, Docker images pushed to a registry, test coverage reports (HTML and Cobertura XML), SAST scan reports (SARIF format for the GitHub Security tab), Terraform plan files, and signed release archives. Artifacts matter for three reasons: (1) Immutability: the Docker image tagged with the git commit SHA that passed all tests in CI is exactly the image deployed to staging and then production. You never rebuild from source for each environment. (2) Traceability: for any production incident, you can trace the running image tag to the exact commit, the CI run that built it, its test results, and its security scan results. (3) Speed: downloading a pre-built artifact is usually much faster than rebuilding from source in every stage.
Q5: How do you implement zero-downtime deployments in a CI/CD pipeline?
Five requirements must all be satisfied at once: (1) A Kubernetes rolling update with maxUnavailable=0 (and maxSurge of at least 1) so capacity never drops below 100% during rollout. (2) Readiness probes that accurately report when a new Pod is ready to serve traffic; the rollout waits for each new Pod to pass readiness before removing an old one. (3) A preStop lifecycle hook (for example, sleep 5) that delays SIGTERM until the Pod has been removed from Service endpoints and load balancer targets, so new requests stop arriving before shutdown begins. (4) A terminationGracePeriodSeconds long enough for the application to finish in-flight requests and drain gracefully. (5) Backward-compatible API changes, with database schema migrations run before the deployment, since the old version must work against the new schema during the transition. In the CI/CD pipeline, add a post-deployment health check: curl the /health endpoint and watch error rates in CloudWatch or Prometheus for 5 minutes after rollout, triggering an automatic rollback if the error rate exceeds a threshold.
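Requirement (1) is simple arithmetic: with maxUnavailable: 0 and maxSurge: 1, the controller may add one extra Pod but never removes an old one until a replacement is Ready. A simplified model of the rollout (rolling_update is a hypothetical helper; it ignores probe timing and treats every new Pod as Ready immediately):

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> list:
    """Return (old_ready, new_ready) after each rollout step."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Surge: create new Pods up to the maxSurge budget (assumed Ready at once).
        new = min(replicas, new + max(0, max_total - (old + new)))
        # Scale down old Pods only as far as the availability floor allows.
        old = min(old, max(0, min_available - new))
        history.append((old, new))
    return history


print(rolling_update(3, 1, 0))  # [(3, 0), (2, 1), (1, 2), (0, 3)]
```

Every step in the first example keeps three Ready Pods, while rolling_update(4, 0, 1) dips to three of four: maxUnavailable trades capacity for not needing spare room in the cluster.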
Q6: What is a canary deployment and how do you automate it?
A canary deployment routes a small percentage of production traffic to the new version while the majority continues to the stable version. The name comes from the "canary in a coal mine": an early warning of danger before it becomes widespread. Implementation with Argo Rollouts: define a Rollout resource instead of a Deployment, specify the canary strategy with steps (10% → pause 5 min → analyse → 50% → pause → analyse → 100%), and configure an AnalysisTemplate that queries Prometheus for error-rate and latency SLOs. Argo Rollouts automatically promotes the canary if metrics stay within bounds, or aborts and rolls back if they breach thresholds, with no human involvement after the initial deployment trigger. For HTTP traffic splitting, the AWS Load Balancer Controller (formerly the ALB Ingress Controller) supports weighted target groups, and NGINX Ingress supports canary-weight annotations. Service meshes (Istio, Linkerd) provide the most precise, request-level traffic control.
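The promote-or-abort decision at each step can be sketched as a loop over traffic weights with an analysis callback. run_canary is a hypothetical function, the weights follow the example steps above with pauses omitted, and the analyse callback stands in for an AnalysisTemplate query such as an error-rate check against Prometheus.

```python
from typing import Callable, List, Tuple


def run_canary(weights: List[int], analyse: Callable[[int], bool]) -> Tuple[str, int]:
    """Shift traffic through `weights`; abort on the first failed analysis.

    Returns ("promoted", 100) or ("aborted", weight_at_failure).
    """
    for weight in weights:
        # In a real rollout the traffic router now sends `weight`% to the canary.
        if not analyse(weight):
            return "aborted", weight  # traffic goes back to the stable version
    return "promoted", 100


# Hypothetical error rates observed at each step; the SLO is < 1% errors.
error_rates = {10: 0.002, 50: 0.031, 100: 0.0}
print(run_canary([10, 50, 100], lambda w: error_rates[w] < 0.01))  # ('aborted', 50)
```
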
Q7: How do you prevent secrets from leaking in CI/CD pipeline logs?
Use multiple layers of protection: (1) Never echo or print secret values in pipeline scripts, and rely on secret masking (GitHub Actions automatically masks values stored as secrets; values created at runtime can be masked with the ::add-mask:: workflow command). (2) Use OIDC federation instead of stored IAM credentials: temporary per-run credentials mean there is no persistent secret to leak. (3) Mark Terraform variables as sensitive = true so they are redacted in plan output. (4) Avoid shell tracing (set -x) around commands that handle secrets, and redirect sensitive command output to /dev/null. (5) Run a secret scanner (TruffleHog, Gitleaks) as a pre-commit hook and as a CI step that scans every commit for accidentally committed secrets, and enable GitHub's secret scanning, which detects and alerts on hundreds of secret patterns (AWS keys, private keys, API tokens) in repositories. (6) Immediately rotate any secret that may have been exposed, even if you think it was masked: masking can fail.
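Log masking amounts to replacing every known secret value with *** before a line is written. The sketch below shows the idea and why it is not a complete defence: anything that was never registered, such as a token fetched during the job, passes straight through. mask is a hypothetical function and the values are placeholders, not real credentials.

```python
def mask(line: str, secrets: list) -> str:
    """Replace every occurrence of each registered secret with '***'."""
    # Longest first, so a secret that contains another secret is masked whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        line = line.replace(secret, "***")
    return line


registered = ["s3cr3t-api-key"]       # placeholder, like a repository secret
session_token = "tmp-session-abc123"  # fetched at runtime, not registered yet

print(mask("calling API with s3cr3t-api-key", registered))  # calling API with ***
print(mask(f"session={session_token}", registered))         # leaks the token
registered.append(session_token)  # the equivalent of ::add-mask::
print(mask(f"session={session_token}", registered))         # session=***
```
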
Q8: What is the difference between a pipeline trigger on push vs pull request?
Push triggers run the pipeline when a commit is pushed to a branch, including main, develop, and feature branches. They are used for CI validation on all branches and for CD deployments when code lands on main. Pull request (PR) triggers run when a PR is opened, updated, or synchronized; the pipeline validates that the proposed change is safe to merge. PR pipelines typically run tests, linting, security scans, and terraform plan, but do not deploy. A push to main triggers the deploy pipeline after the PR is approved and merged. Using both gives you the best of each: the PR pipeline catches issues before they merge (fast feedback that protects main-branch quality), and the push-to-main pipeline deploys validated code (continuous delivery). In GitHub Actions, use on.push.branches for deployment workflows and on.pull_request.branches for validation workflows, and use branch protection rules to require a passing PR pipeline before merging.
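The split between validation and deployment can be written as a small decision function. This one mirrors the GitHub Actions ci.yml earlier in this guide (tests and scans on pushes to main or develop and on PRs into main; build and deploy only for pushes to main); jobs_for is an illustrative name, not part of GitHub Actions.

```python
def jobs_for(event: str, branch: str) -> list:
    """Jobs that run for a GitHub event under the ci.yml triggers.

    `branch` is the pushed branch for push events, or the PR's base branch.
    """
    validate = ["test", "security-scan"]
    if event == "pull_request":
        return validate if branch == "main" else []
    if event == "push":
        if branch == "main":
            return validate + ["build-push", "deploy-staging"]
        if branch == "develop":
            return validate
    return []


print(jobs_for("pull_request", "main"))  # ['test', 'security-scan']
print(jobs_for("push", "main"))          # ['test', 'security-scan', 'build-push', 'deploy-staging']
```
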
Q9: How do you handle environment-specific configuration in CI/CD?
Three patterns in increasing order of sophistication: (1) Environment variables injected at runtime: the CI/CD platform stores dev, staging, and prod variable sets, and the application reads its config from environment variables (12-Factor App methodology). The Docker image is identical across environments; only the environment variables differ. (2) Kubernetes ConfigMaps and Secrets per namespace: one Deployment manifest, with environment-specific config mounted from ConfigMaps in the dev, staging, and prod namespaces. ArgoCD ApplicationSets can generate per-environment Applications from a single template. (3) Helm values files: a base chart with environment-specific values files (values-staging.yaml, values-production.yaml). CI/CD runs helm upgrade with the appropriate values file for each environment. This gives you templating, version pinning, and a clean separation between the app's Kubernetes structure and its environment-specific configuration.
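With pattern (3), the environment's values file is layered over the chart defaults: nested maps merge key by key, and other values (including lists) are replaced. A small deep-merge sketch shows the effect; deep_merge and the keys are hypothetical, and Helm's own merge has extra rules (for example, setting a key to null removes it) that are not modelled here.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with override applied: dicts merge recursively, everything else replaces."""
    result = copy.deepcopy(base)  # never mutate the chart defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {
    "replicas": 1,
    "image": {"repository": "api-service", "tag": "set-by-ci"},
    "env": {"LOG_LEVEL": "debug"},
}
production = {"replicas": 4, "env": {"LOG_LEVEL": "info"}}
print(deep_merge(base, production))
# {'replicas': 4, 'image': {'repository': 'api-service', 'tag': 'set-by-ci'}, 'env': {'LOG_LEVEL': 'info'}}
```
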
Q10: What is trunk-based development and why does CI/CD require it?
Trunk-based development is a branching strategy where all developers commit to a single main branch (the "trunk") multiple times per day, either directly or through feature branches that live for at most one or two days before merging. Contrast this with Gitflow or long-lived feature branches that can diverge for weeks. CI/CD depends on it because long-lived branches accumulate large diffs that are painful and risky to merge, which prevents the frequent small merges that CI is designed for. With trunk-based development, every commit is small, tested, and integrated continuously. Feature flags handle partially complete features that need to be merged before they are ready for users. Google (with a single monorepo trunk), Facebook, and Netflix are commonly cited as practitioners. For teams not ready for pure trunk-based development, short-lived feature branches (under two days) with squash merges to main are an acceptable intermediate approach.
Q11: How do you roll back a bad deployment quickly?
Multiple rollback mechanisms, from fastest to most comprehensive: (1) Kubernetes rollout undo: kubectl rollout undo deployment/api -n production scales the previous ReplicaSet back up, typically within 30-60 seconds. This is the fastest option, but if ArgoCD manages the Deployment with selfHeal enabled, it will re-apply the version in Git, so follow up with (2). (2) GitOps rollback: revert the offending commit in the manifest repository (git revert) and let ArgoCD sync it, which keeps a full audit trail. Syncing an older revision directly (argocd app sync api-service --revision <previous-sha>) is also possible, but while automated sync with selfHeal is enabled ArgoCD will return the app to the tip of the tracked branch. (3) Pipeline-triggered rollback: CI/CD monitors the post-deployment error rate via Prometheus or CloudWatch and, if it exceeds a threshold in the 5 minutes after deployment, automatically triggers a rollback job. Argo Rollouts does this automatically for canary and blue-green deployments. (4) Feature flag disable: if the bad behaviour is behind a feature flag, turn the flag off in LaunchDarkly or Unleash; this takes seconds and requires no deployment. For the fastest possible rollback, design deployments to be stateless (no database migrations that need reverting) and always keep the previous image tag available in ECR.
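Mechanism (3) comes down to comparing the post-deployment error rate with a threshold over a watch window. A minimal sketch with hypothetical per-minute request counts; in practice the numbers would come from a Prometheus or CloudWatch query, and a True result would trigger the rollback job. should_roll_back, Sample, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    requests: int
    errors: int


def should_roll_back(samples: List[Sample], max_error_rate: float = 0.01, min_requests: int = 100) -> bool:
    """Roll back if the error rate over the whole window exceeds the threshold.

    Windows with too little traffic are not judged, to avoid rolling back on noise.
    """
    total = sum(s.requests for s in samples)
    if total < min_requests:
        return False
    errors = sum(s.errors for s in samples)
    return errors / total > max_error_rate


healthy = [Sample(1000, 3)] * 5                     # 0.3% errors over 5 minutes
spike = [Sample(1000, 3)] * 4 + [Sample(1000, 200)]  # last minute goes bad
print(should_roll_back(healthy), should_roll_back(spike))  # False True
```

Judging the whole window rather than a single minute makes the check less twitchy; a stricter pipeline could also flag any single minute above a higher threshold.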
Q12: How do you measure CI/CD pipeline quality? What metrics matter?
The four DORA (DevOps Research and Assessment) metrics are the industry standard for measuring CI/CD effectiveness; the elite thresholds below follow DORA's 2021 State of DevOps report. (1) Deployment frequency: how often you deploy to production. Elite teams deploy on demand, often multiple times per day. Measured by counting production deployments per day or week. (2) Lead time for changes: time from code commit to running in production. Elite: under 1 hour. Measured from a change's commit to its production deployment. (3) Change failure rate: the percentage of deployments that cause production incidents requiring a hotfix or rollback. Elite: 0-15%. Measured as (rollbacks + hotfixes) / total deployments. (4) Mean time to recovery (MTTR): how long it takes to restore service after a production incident. Elite: under 1 hour. Measured from incident detection to resolution. In addition to the DORA metrics, track pipeline duration (to identify slow stages), test flakiness rate (flaky tests erode trust in CI), and security scan coverage (the percentage of builds with container scanning).
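Given a log of production deployments, the four metrics are straightforward to compute. A sketch over a hypothetical record format (commit time, deploy time, whether the deployment failed, and the detection-to-resolution time for failures); real data would come from the CI/CD system and the incident tracker, and using medians rather than means is a choice made here to resist outliers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    time_to_restore: Optional[timedelta] = None  # detection -> resolution, if failed


def dora(deploys: List[Deployment], period_days: int) -> Dict[str, float]:
    hour = timedelta(hours=1)
    lead_times = [(d.deployed_at - d.committed_at) / hour for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.time_to_restore / hour for d in failures if d.time_to_restore]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_h": median(lead_times) if lead_times else 0.0,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "median_time_to_restore_h": median(restores) if restores else 0.0,
    }


t0 = datetime(2026, 5, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=45), failed=True, time_to_restore=timedelta(minutes=20)),
    Deployment(t0, t0 + timedelta(minutes=90)),
    Deployment(t0, t0 + timedelta(minutes=60)),
]
print(dora(deploys, period_days=2))
```
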