Building a CI/CD Pipeline That Actually Works
Lessons from setting up CI/CD for 50+ projects across different tech stacks.
Every team starts with the same CI/CD dream: push code, tests run, deployment happens, everyone's happy. But after setting up pipelines for 50+ projects at Vaarak, I can tell you that the gap between a "working" pipeline and a production-grade one is enormous. A bad CI/CD pipeline is worse than no pipeline — it's slow, flaky, and creates a false sense of security.
This article shares the patterns, anti-patterns, and hard-won lessons from building CI/CD systems across Next.js apps, Python microservices, mobile apps, embedded firmware, and monorepos. Whether you're starting from scratch or improving an existing pipeline, these principles apply universally.
Principle 1: Speed Is a Feature
If your CI pipeline takes 30 minutes, developers will stack PRs, context-switch to other tasks, and lose the feedback loop that makes CI valuable. Our target: under 5 minutes for the critical path (lint + type-check + unit tests), under 15 minutes for the full pipeline including integration tests and deployment.
- Parallelize everything: Run lint, type-check, and tests in parallel jobs, not sequentially
- Cache aggressively: npm/yarn/pnpm cache, Docker layer cache, build cache (Next.js .next/cache, Turborepo)
- Only test what changed: Use tools like Turborepo, Nx, or Jest's --changedSince to run only affected tests
- Use larger runners: GitHub's 4-core runners cost 2x but run 3-4x faster than default 2-core
- Fail fast: Put the fastest checks (lint, type-check) first and cancel on failure
name: CI
on:
  pull_request:
    branches: [main]
# Cancel in-progress runs for the same PR
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
jobs:
  # Fast checks run in parallel (< 2 min each)
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint
  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npx tsc --noEmit
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test -- --coverage
  # Build only runs after all checks pass
  build:
    needs: [lint, type-check, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build
Principle 2: Eliminate Flaky Tests
Flaky tests are the silent killer of CI/CD. When tests randomly fail, developers start ignoring failures, re-running pipelines "one more time," and eventually losing trust in the entire system. We have a zero-tolerance policy for flaky tests: if a test fails non-deterministically, it gets quarantined immediately and fixed or deleted within 48 hours.
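A test that flakes 5% of the time fails, on average, once every 20 runs, and has roughly a 64% chance of failing at least once in any given batch of 20. With 100 tests at a 5% flake rate each, the chance that a pipeline run stays green is 0.95^100, or about 0.6%, so you'll see a flaky failure in nearly every run. The probability of a clean run shrinks exponentially with the number of flaky tests.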
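The arithmetic behind these flake rates can be checked with a few lines of TypeScript. This is a minimal sketch that assumes failures are independent of each other, which real flaky tests only approximate; the function name is illustrative.

```typescript
// Probability that at least one of `trials` independent attempts fails,
// when each attempt fails with probability `failureRate`.
function probabilityOfAtLeastOneFailure(failureRate: number, trials: number): number {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  if (!Number.isInteger(trials) || trials < 0) {
    throw new RangeError("trials must be a non-negative integer");
  }
  // P(at least one failure) = 1 - P(every attempt passes)
  return 1 - Math.pow(1 - failureRate, trials);
}

// One test with a 5% flake rate, run 20 times: roughly a 64% chance of at least one failure.
const oneTestTwentyRuns = probabilityOfAtLeastOneFailure(0.05, 20);

// 100 tests, each 5% flaky, in a single pipeline run: roughly a 99.4% chance of a red build.
const hundredTestsOneRun = probabilityOfAtLeastOneFailure(0.05, 100);

console.log(oneTestTwentyRuns.toFixed(3), hundredTestsOneRun.toFixed(3));
```

The same function covers both readings of the problem: one test across many runs, or many flaky tests within one run.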
Common causes of flaky tests and their fixes:
- Race conditions in async tests: use proper await and assertion libraries
- Time-dependent tests: mock Date.now and use fake timers
- Database state leakage between tests: use transactions with rollback
- Port conflicts in integration tests: use random port allocation
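For the time-dependent case, one framework-independent alternative to mocking Date.now is to pass the clock in as a parameter, so tests control time explicitly. The sketch below is an illustration of that idea rather than the setup used on any particular project; `Session`, `isSessionExpired`, and `createFakeClock` are hypothetical names.

```typescript
// A clock is any function returning the current time in milliseconds since the epoch.
type Clock = () => number;

interface Session {
  createdAtMs: number;
  ttlMs: number;
}

// Production callers omit `now` and get the real clock; tests pass a controlled one.
function isSessionExpired(session: Session, now: Clock = Date.now): boolean {
  return now() >= session.createdAtMs + session.ttlMs;
}

// A controllable clock for tests: time only moves when the test advances it.
function createFakeClock(startMs: number): { now: Clock; advance: (ms: number) => void } {
  let currentMs = startMs;
  return {
    now: () => currentMs,
    advance: (ms: number) => {
      currentMs += ms;
    },
  };
}

const fakeClock = createFakeClock(1000000);
const testSession: Session = { createdAtMs: fakeClock.now(), ttlMs: 30000 };

// Checked at creation time, before the TTL has elapsed.
const expiredBefore = isSessionExpired(testSession, fakeClock.now);

// Move time forward by exactly the TTL and check again.
fakeClock.advance(30000);
const expiredAfter = isSessionExpired(testSession, fakeClock.now);
```

Because the test never reads the wall clock, the result does not depend on how fast the CI runner happens to be.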
Principle 3: Environment Parity
"It works on my machine" is a meme for a reason. We ensure that CI, staging, and production environments are as identical as possible. Docker containers, pinned dependency versions, and infrastructure-as-code eliminate entire categories of environment-specific bugs.
# Multi-stage build for consistent environments
FROM node:20-alpine AS base
WORKDIR /app
# Dependencies layer (cached unless package*.json changes)
FROM base AS deps
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts
# Build layer
FROM base AS builder
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
# Production image
FROM base AS runner
ENV NODE_ENV=production
RUN addgroup -g 1001 -S nodejs && adduser -S -u 1001 -G nodejs nextjs
# .next/standalone exists only when next.config.js sets output: "standalone"
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]
Principle 4: Progressive Deployment
Never deploy directly to 100% of users. We use a progressive deployment strategy: merge to main triggers a deployment to a canary environment (5% of traffic), automated health checks monitor error rates and latency for 10 minutes, then traffic gradually shifts to 25%, 50%, and 100%. If any health check fails, traffic automatically rolls back.
- PR merged → Build and push Docker image with Git SHA tag
- Deploy to canary (5% traffic) with automated smoke tests
- 10-minute monitoring window: error rate < 0.1%, P99 latency < target
- Progressive rollout: 25% → 50% → 100% with monitoring at each stage
- Full rollout with automated Slack notification to the team
- If any check fails: automatic rollback to previous version within 30 seconds
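The promote-or-roll-back decision at each stage can be sketched as a small pure function. This is an illustrative model of the steps above, not the actual rollout controller: the type names, `decideNextStep`, and the 500 ms latency target are assumptions, while the stage percentages and the 0.1% error-rate limit come from the list.

```typescript
// Traffic percentages from the rollout steps described above.
const ROLLOUT_STAGES: readonly number[] = [5, 25, 50, 100];

interface HealthMetrics {
  errorRate: number; // fraction of failed requests, e.g. 0.001 means 0.1%
  p99LatencyMs: number;
}

interface HealthThresholds {
  maxErrorRate: number;
  maxP99LatencyMs: number; // the latency "target"; the value is project-specific
}

type RolloutDecision =
  | { action: "promote"; trafficPercent: number }
  | { action: "complete" }
  | { action: "rollback" };

function isHealthy(metrics: HealthMetrics, thresholds: HealthThresholds): boolean {
  return (
    metrics.errorRate < thresholds.maxErrorRate &&
    metrics.p99LatencyMs < thresholds.maxP99LatencyMs
  );
}

// Given the stage that just finished its monitoring window, decide what happens next.
function decideNextStep(
  currentPercent: number,
  metrics: HealthMetrics,
  thresholds: HealthThresholds
): RolloutDecision {
  const index = ROLLOUT_STAGES.indexOf(currentPercent);
  if (index === -1) {
    throw new Error("Unknown rollout stage: " + currentPercent + "%");
  }
  if (!isHealthy(metrics, thresholds)) {
    return { action: "rollback" };
  }
  const nextPercent = ROLLOUT_STAGES[index + 1];
  return nextPercent === undefined
    ? { action: "complete" }
    : { action: "promote", trafficPercent: nextPercent };
}

// Hypothetical thresholds: 0.1% error rate from the list above, 500 ms chosen for illustration.
const exampleThresholds: HealthThresholds = { maxErrorRate: 0.001, maxP99LatencyMs: 500 };
const healthyMetrics: HealthMetrics = { errorRate: 0.0002, p99LatencyMs: 180 };
const degradedMetrics: HealthMetrics = { errorRate: 0.004, p99LatencyMs: 180 };

const afterCanary = decideNextStep(5, healthyMetrics, exampleThresholds);
const afterBadStage = decideNextStep(25, degradedMetrics, exampleThresholds);
```

Keeping the decision separate from the code that shifts traffic makes the policy easy to unit-test without touching a load balancer.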
Principle 5: Security in the Pipeline
Security scanning should be automated and blocking. We run four types of security checks in every pipeline: dependency vulnerability scanning (npm audit / Snyk), static code analysis (CodeQL / Semgrep), secret detection (Gitleaks), and container image scanning (Trivy). Critical and high vulnerabilities block deployment; medium and below generate tickets for the next sprint.
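The blocking rule can be expressed as a small gate that runs after the scanners have reported. This is a sketch under assumptions: the `Finding` shape is hypothetical and does not match the output format of any specific scanner, so real reports would need to be normalized into it first.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Hypothetical normalized finding; each scanner's own report format differs.
interface Finding {
  id: string;
  severity: Severity;
  source: "dependencies" | "static-analysis" | "secrets" | "container";
}

interface GateResult {
  blockDeployment: boolean;
  blocking: Finding[]; // critical and high: must be fixed before deploying
  ticketed: Finding[]; // medium and below: tracked for the next sprint
}

const BLOCKING_SEVERITIES: readonly Severity[] = ["critical", "high"];

function isBlocking(finding: Finding): boolean {
  return BLOCKING_SEVERITIES.indexOf(finding.severity) !== -1;
}

function applySecurityGate(findings: readonly Finding[]): GateResult {
  const blocking = findings.filter((finding) => isBlocking(finding));
  const ticketed = findings.filter((finding) => !isBlocking(finding));
  return { blockDeployment: blocking.length > 0, blocking, ticketed };
}

// Example input with made-up identifiers.
const exampleFindings: Finding[] = [
  { id: "finding-1", severity: "high", source: "dependencies" },
  { id: "finding-2", severity: "medium", source: "static-analysis" },
  { id: "finding-3", severity: "low", source: "container" },
];

const gateResult = applySecurityGate(exampleFindings);
```

A CI step could exit with a non-zero status when `blockDeployment` is true and hand the `ticketed` list to whatever issue tracker the team uses.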
The Template We Use for Every Project
After 50+ projects, we've converged on a standard pipeline template that we customize for each stack. The core structure is always the same: fast checks in parallel, build after checks pass, security scanning, deployment to staging with automated tests, progressive production rollout. The specifics (test frameworks, build commands, deployment targets) are configurable per project.
“The best CI/CD pipeline is one that developers don't have to think about. It should be fast enough that they don't context-switch, reliable enough that they trust it, and comprehensive enough that nothing slips through. Build the pipeline once, maintain it continuously.”
— Marcus Rodriguez, Vaarak DevOps
Marcus Rodriguez
DevOps Engineering Lead