Building a CI/CD Pipeline That Actually Works

Every team starts with the same CI/CD dream: push code, tests run, deployment happens, everyone's happy. But after setting up pipelines for 50+ projects at Vaarak, I can tell you that the gap between a "working" pipeline and a production-grade one is enormous. A bad CI/CD pipeline is worse than no pipeline — it's slow, flaky, and creates a false sense of security.

This article shares the patterns, anti-patterns, and hard-won lessons from building CI/CD systems across Next.js apps, Python microservices, mobile apps, embedded firmware, and monorepos. Whether you're starting from scratch or improving an existing pipeline, these principles apply regardless of stack.

Figure: GitHub Actions and CI/CD workflow. A well-designed pipeline balances speed, reliability, and developer experience.

Principle 1: Speed Is a Feature

If your CI pipeline takes 30 minutes, developers will stack PRs, context-switch to other tasks, and lose the feedback loop that makes CI valuable. Our target: under 5 minutes for the critical path (lint + type-check + unit tests), under 15 minutes for the full pipeline including integration tests and deployment.

Parallelize everything: Run lint, type-check, and tests in parallel jobs, not sequentially
Cache aggressively: npm/yarn/pnpm cache, Docker layer cache, build cache (Next.js .next/cache, Turborepo)
Only test what changed: Use tools like Turborepo, Nx, or Jest's --changedSince to run only affected tests
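Use larger runners: GitHub's 4-core runners cost 2x per minute but, in our experience, run 3-4x faster than the default 2-core runners
Fail fast: Gate slower jobs (build, integration tests, deployment) on the fastest checks (lint, type-check) so a failure stops the pipeline before the expensive work starts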
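Turborepo and Nx decide what to test by walking the workspace dependency graph: a change to a package affects that package and everything that depends on it, directly or transitively. A minimal Python sketch of that idea follows; the package names, directories, and dependency map are hypothetical, and this is not how either tool is implemented internally.

```python
from collections import deque

# Hypothetical monorepo layout: package name -> directory prefix.
PACKAGE_DIRS = {
    "ui": "packages/ui/",
    "api-client": "packages/api-client/",
    "web": "apps/web/",
    "admin": "apps/admin/",
}

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDS_ON = {
    "ui": set(),
    "api-client": set(),
    "web": {"ui", "api-client"},
    "admin": {"ui"},
}


def directly_changed(changed_files, package_dirs):
    """Map changed file paths to the packages that own them."""
    return {
        pkg
        for path in changed_files
        for pkg, prefix in package_dirs.items()
        if path.startswith(prefix)
    }


def affected_packages(changed_files, package_dirs, depends_on):
    """Return changed packages plus every package that transitively depends on them."""
    # Invert the graph so we can walk from a package to its dependents.
    dependents = {pkg: set() for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(pkg)

    # Breadth-first walk outward from the directly changed packages.
    affected = directly_changed(changed_files, package_dirs)
    queue = deque(affected)
    while queue:
        pkg = queue.popleft()
        for dependent in dependents[pkg]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    changed = ["packages/ui/src/Button.tsx", "README.md"]
    print(sorted(affected_packages(changed, PACKAGE_DIRS, DEPENDS_ON)))
```

In a real pipeline the changed-file list would come from something like `git diff --name-only origin/main...HEAD`, and the resulting set would decide which test jobs run. Files outside every package (README.md here) select nothing; some teams prefer to treat changes to root config files as "test everything".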

.github/workflows/ci.yml

name: CI
on:
  pull_request:
    branches: [main]

# Cancel in-progress runs for the same PR
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # Fast checks run in parallel (< 2 min each)
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint

  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npx tsc --noEmit

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test -- --coverage

  # Build only runs after all checks pass
  build:
    needs: [lint, type-check, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build

Principle 2: Eliminate Flaky Tests

Flaky tests are the silent killer of CI/CD. When tests randomly fail, developers start ignoring failures, re-running pipelines "one more time," and eventually losing trust in the entire system. We have a zero-tolerance policy for flaky tests: if a test fails non-deterministically, it gets quarantined immediately and fixed or deleted within 48 hours.
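A practical test for "fails non-deterministically" is to re-run the test on unchanged code and check whether it both passes and fails. The Python sketch below assumes tests are plain callables that raise on failure; the `classify` and `quarantine_deadline` helpers and the 20-run default are hypothetical, while the 48-hour window mirrors the policy described here.

```python
from datetime import datetime, timedelta, timezone
from itertools import count

QUARANTINE_SLA = timedelta(hours=48)  # fix-or-delete window from the flaky-test policy


def classify(test_fn, runs=20):
    """Re-run test_fn on unchanged code and label it 'pass', 'fail', or 'flaky'."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("passed")
        except Exception:  # assertion errors, timeouts, connection errors, ...
            outcomes.add("failed")
    if outcomes == {"passed"}:
        return "pass"
    if outcomes == {"failed"}:
        return "fail"
    return "flaky"  # same code, different results: quarantine it


def quarantine_deadline(detected_at):
    """Deadline by which a quarantined test must be fixed or deleted."""
    return detected_at + QUARANTINE_SLA


# Hypothetical test that simulates a race condition by failing every third run.
_calls = count()


def sometimes_fails():
    assert next(_calls) % 3 != 0
```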

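A test that fails 5% of the time fails about once every 20 runs on average, and at least once in roughly 64% of 20-run batches (1 − 0.95^20 ≈ 0.64). With 100 independent tests at a 5% flake rate each, only about 0.6% of pipeline runs come back fully green (0.95^100 ≈ 0.006), so nearly every run hits a flaky failure. The chance of a clean run shrinks exponentially with the number of flaky tests.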
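The arithmetic, assuming each run and each test fails independently of the others:

```python
def p_at_least_one_failure(flake_rate, runs):
    """Probability that `runs` independent trials produce at least one flaky failure."""
    return 1 - (1 - flake_rate) ** runs


# One test at a 5% flake rate, run 20 times: fails at least once in about 64% of batches.
print(f"{p_at_least_one_failure(0.05, 20):.1%}")

# 100 independent tests at 5% each: well under 1% of pipeline runs are fully green.
print(f"{1 - p_at_least_one_failure(0.05, 100):.1%}")
```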

Common causes of flaky tests and their fixes: race conditions in async tests (await every promise and use assertions that wait for the expected state instead of fixed sleeps), time-dependent tests (mock Date.now and use fake timers), database state leakage between tests (wrap each test in a transaction and roll it back afterward), and port conflicts in integration tests (bind to port 0 so the OS assigns a free port).
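A Python sketch of three of these fixes, under stated assumptions: the clock is passed in as a parameter (dependency injection in place of Jest-style mocking of `Date.now`), the test server binds port 0 so the OS picks a free port, and an in-memory SQLite database stands in for the real one. `is_expired`, `start_test_server`, and `run_in_rollback` are hypothetical names.

```python
import socketserver
import sqlite3
from datetime import datetime, timezone


def utc_now():
    return datetime.now(timezone.utc)


def is_expired(expires_at, now=utc_now):
    """Time-dependent logic takes the clock as a parameter so tests can freeze it."""
    return now() >= expires_at


def start_test_server():
    """Port 0 asks the OS for any free port; the server keeps the socket open,
    so nothing else can claim the port between allocation and use."""
    server = socketserver.TCPServer(("127.0.0.1", 0), socketserver.BaseRequestHandler)
    return server, server.server_address[1]


def run_in_rollback(conn, test_body):
    """Run test_body inside a transaction that is always rolled back,
    so rows written by one test never leak into the next."""
    conn.execute("BEGIN")
    try:
        test_body(conn)
    finally:
        conn.execute("ROLLBACK")


if __name__ == "__main__":
    # Frozen clock: the result no longer depends on when the test runs.
    frozen = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(is_expired(datetime(2023, 12, 31, tzinfo=timezone.utc), now=lambda: frozen))

    server, port = start_test_server()
    print(port)
    server.server_close()

    # isolation_level=None: BEGIN and ROLLBACK are issued explicitly above.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (name TEXT)")
    run_in_rollback(conn, lambda c: c.execute("INSERT INTO users VALUES ('alice')"))
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Having the server bind port 0 itself and reading the port back avoids the race in the common "find a free port, close it, then reuse it" pattern, where another process can claim the port in between.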

Figure: Development team collaboration. A reliable CI pipeline builds team confidence and enables faster iteration cycles.

Principle 3: Environment Parity

"It works on my machine" is a meme for a reason. We ensure that CI, staging, and production environments are as identical as possible. Docker containers, pinned dependency versions, and infrastructure-as-code eliminate entire categories of environment-specific bugs.

Dockerfile

# Multi-stage build for consistent environments
FROM node:20-alpine AS base
WORKDIR /app

# Dependencies layer (cached unless package*.json changes)
FROM base AS deps
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts

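# Build layer
FROM base AS builder
COPY --from=deps /app/node_modules ./node_modules
# Exclude node_modules and .next in .dockerignore so this COPY
# doesn't overwrite the clean dependencies from the deps stage
COPY . .
# Requires output: "standalone" in next.config.js to produce .next/standalone
RUN npm run build

# Production image
FROM base AS runner
ENV NODE_ENV=production
# The standalone server.js reads PORT and HOSTNAME; bind to all interfaces
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
RUN addgroup -g 1001 -S nodejs && adduser -S nextjs -u 1001
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public

USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]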
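`npm ci` already installs the exact tree recorded in package-lock.json, so the following is a stricter, optional policy: flag range specifiers in package.json itself so that upgrades only happen through deliberate edits. A minimal Python sketch; the `unpinned` helper and the exact-version rule are assumptions, and git or URL dependencies would also be reported as unpinned.

```python
import json
import re
import sys

# Exact semver such as 1.2.3, 1.2.3-beta.1 or 1.2.3+build.5; anything else
# (^1.2.3, ~1.2.0, >=1, 1.x, *, latest, git or file URLs) counts as unpinned.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def unpinned(package_json_text):
    """Return {name: spec} for every dependency whose spec is not an exact version."""
    manifest = json.loads(package_json_text)
    found = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                found[name] = spec
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python check_pins.py package.json
    with open(sys.argv[1], encoding="utf-8") as f:
        problems = unpinned(f.read())
    for name, spec in sorted(problems.items()):
        print(f"unpinned: {name}@{spec}")
    sys.exit(1 if problems else 0)
```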

Principle 4: Progressive Deployment

Never deploy directly to 100% of users. We use a progressive deployment strategy: merge to main triggers a deployment to a canary environment (5% of traffic), automated health checks monitor error rates and latency for 10 minutes, then traffic gradually shifts to 25%, 50%, and 100%. If any health check fails, traffic automatically rolls back.

PR merged → Build and push Docker image with Git SHA tag
Deploy to canary (5% traffic) with automated smoke tests
10-minute monitoring window: error rate < 0.1%, P99 latency < target
Progressive rollout: 25% → 50% → 100% with monitoring at each stage
Full rollout with automated Slack notification to the team
If any check fails: automatic rollback to previous version within 30 seconds
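The rollout loop above can be sketched as a small controller. `set_traffic`, `observe`, and `rollback` are hypothetical hooks that would wrap your load balancer or service mesh and your metrics backend; `observe` is assumed to block for the monitoring window and return aggregated metrics. The thresholds follow the plan: error rate under 0.1% and P99 latency under a per-service target.

```python
from dataclasses import dataclass
from typing import Callable

STAGES = [5, 25, 50, 100]  # percent of traffic at each step


@dataclass
class Health:
    error_rate: float  # fraction of failed requests, e.g. 0.001 == 0.1%
    p99_latency_ms: float


def healthy(h: Health, p99_target_ms: float, max_error_rate: float = 0.001) -> bool:
    return h.error_rate < max_error_rate and h.p99_latency_ms < p99_target_ms


def progressive_rollout(
    set_traffic: Callable[[int], None],
    observe: Callable[[int], Health],
    rollback: Callable[[], None],
    p99_target_ms: float,
) -> bool:
    """Shift traffic stage by stage; roll back and stop at the first unhealthy stage."""
    for percent in STAGES:
        set_traffic(percent)
        # observe() stands in for the monitoring window at this traffic level.
        if not healthy(observe(percent), p99_target_ms):
            rollback()
            return False
    return True
```

Keeping the hooks as parameters means the same loop can be exercised in unit tests with fakes, as below, before it ever touches production traffic.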

Principle 5: Security in the Pipeline

Security scanning should be automated and blocking. We run four types of security checks in every pipeline: dependency vulnerability scanning (npm audit / Snyk), static code analysis (CodeQL / Semgrep), secret detection (GitLeaks), and container image scanning (Trivy). Critical and high vulnerabilities block deployment; medium and below generate tickets for the next sprint.
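A sketch of the block-or-ticket rule for the dependency-scanning step, in Python. It assumes the `npm audit --json` report format from npm 7 and later, where `metadata.vulnerabilities` holds per-severity counts; npm calls the medium level `moderate`. The `triage` helper is hypothetical. If all you need is the blocking half, `npm audit --audit-level=high` already exits non-zero on high or critical findings.

```python
import json
import sys

BLOCKING = ("critical", "high")
# npm reports "medium" findings under the name "moderate".
TICKETED = ("moderate", "low", "info")


def triage(audit_json):
    """Return (block_deploy, {severity: count}) for an npm audit --json report."""
    report = json.loads(audit_json)
    counts = report.get("metadata", {}).get("vulnerabilities", {})
    block = any(counts.get(sev, 0) > 0 for sev in BLOCKING)
    tickets = {sev: counts[sev] for sev in TICKETED if counts.get(sev, 0) > 0}
    return block, tickets


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: npm audit --json > audit.json; python audit_gate.py audit.json
    with open(sys.argv[1], encoding="utf-8") as f:
        block, tickets = triage(f.read())
    for sev, n in tickets.items():
        print(f"open ticket: {n} {sev} finding(s)")
    sys.exit(1 if block else 0)
```

Trivy, Semgrep, and the other scanners emit their own report formats, so each would need its own small adapter feeding the same block-or-ticket decision.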

The Template We Use for Every Project

After 50+ projects, we've converged on a standard pipeline template that we customize for each stack. The core structure is always the same: fast checks in parallel, build after checks pass, security scanning, deployment to staging with automated tests, progressive production rollout. The specifics (test frameworks, build commands, deployment targets) are configurable per project.
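The shared template is essentially a dependency graph of jobs. A Python sketch using the standard library's `graphlib` (Python 3.9+) shows how that graph yields waves of jobs that can run in parallel; the job names and the `waves` and `customize` helpers are illustrative, not a copy of the real template.

```python
from graphlib import TopologicalSorter

# Illustrative default template: job -> jobs it must wait for.
TEMPLATE = {
    "lint": set(),
    "type-check": set(),
    "unit-tests": set(),
    "build": {"lint", "type-check", "unit-tests"},
    "security-scan": {"build"},
    "deploy-staging": {"security-scan"},
    "staging-tests": {"deploy-staging"},
    "production-rollout": {"staging-tests"},
}


def customize(template, overrides):
    """Per-project tweaks: add or replace jobs without editing the shared template."""
    return {**template, **overrides}


def waves(graph):
    """Group jobs into waves; every job in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for i, wave in enumerate(waves(TEMPLATE)):
        print(i, wave)
```

A Python service might, for example, add an `integration-tests` job that depends on `build` and make `deploy-staging` wait for it; `waves` then schedules it alongside `security-scan` without any change to the shared template.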

“The best CI/CD pipeline is one that developers don't have to think about. It should be fast enough that they don't context-switch, reliable enough that they trust it, and comprehensive enough that nothing slips through. Build the pipeline once, maintain it continuously.”
— Marcus Rodriguez, Vaarak DevOps