Architecture
How everything fits together: cloud, network, services, data. A guide to the boring decisions and the seams between them, written so a new engineer can read it once and know where to look the next time something breaks.
1. The shape of it
GitApplied is a job-search atelier: a kanban board for job postings, an LLM-assisted scrape-and-tailor pipeline, a document library, and a Chrome extension that captures jobs while the user browses. The system is a single AWS account in us-east-1 with two EC2 instances, a Postgres database, an S3 bucket, and a small set of managed services for email, billing, and DNS. Everything else is application code.
┌─────────────────────────────────────────┐
│ User · browser · ext │
└────────────────┬────────────────────────┘
│ HTTPS
▼
Route 53 ──► app.githired.com
│
▼
┌──────────────────────────────────────────┐
│ ALB · TLS 1.3 · ACM cert · 2 AZs │
│ /api/* → API target group │
│ default → Web target group │
└────────┬─────────────────────┬───────────┘
│ │
private subnet private subnet
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Web EC2 │ │ API EC2 │
│ nginx + │ │ Go + Gin │
│ React SPA │ │ chromedp │
└──────────────┘ └──────┬───────┘
│
┌───────────────┬───────────────────┼──────────────┐
▼ ▼ ▼ ▼
Secrets Manager RDS Postgres S3 bucket SES v2
(DB + app JSON) 16.4 · gp3 documents/ transactional
single-AZ prefix only email
│
▼
┌─────────────────┐
│ OpenAI API │
│ Stripe API │
└─────────────────┘
Region us-east-1, VPC 10.0.0.0/16 over us-east-1a and us-east-1b, three-tier subnet layout (public / app / db). All compute lives in private subnets with no public IPs; outbound internet goes through a single NAT gateway.
2. AWS & cloud topology
Everything in production is provisioned with Terraform under infra/terraform/. The list below is everything AWS knows about us, grouped by purpose.
Compute
| Service | Role | Notes |
|---|---|---|
| EC2 · web | nginx + React SPA | t4g.small, arm64 AL2023, gp3 20GB encrypted, IMDSv2 required. Runs the built React bundle via docker compose. |
| EC2 · api | Go API server | t4g.small, arm64 AL2023. Runs the API container plus an embedded headless Chromium for scraping JS-heavy job postings. |
| ALB | Public edge | 2-AZ ALB. TLS 1.3 termination via ACM cert. Path-based routing: /api/* → API TG, default → Web TG. drop_invalid_header_fields on. |
| NAT Gateway | Egress for private subnets | Single NAT (one AZ). Egress is limited to container image pulls and outbound HTTPS, so a single point of failure is acceptable today. |
Data & storage
| Service | Role | Notes |
|---|---|---|
| RDS Postgres 16 | Primary datastore | db.t4g.micro, gp3 20GB autoscaling to 100GB, single-AZ, encrypted, 7-day backups, deletion protection on. Schema migrations live in internal/database/migrations and run on API boot. |
| S3 · documents bucket | User documents | All resumes and cover letters are stored under the documents/ prefix. IAM policy on the EC2 role restricts ListBucket to that prefix only. Reads are served via short-lived pre-signed URLs. |
| S3 · docs bucket | Public docs site | Created out-of-band in the console (see Out-of-band AWS below). |
| ECR · api / web | Image registry | Two repos, scan-on-push, lifecycle policy keeps the last 20 images. |
Identity & secrets
| Service | Role | Notes |
|---|---|---|
| Secrets Manager · db | Postgres credentials | JSON blob with username, password, host, port, dbname, and a pre-rendered url. Password generated by Terraform random_password. |
| Secrets Manager · app | Non-DB secrets | JWT signing key, OpenAI key, S3 bucket name, all six Stripe variables. The instance's user-data renders .env from this blob on every boot, so a rotation is a reboot, not a redeploy. |
| IAM · ec2 role | Instance role | Reads the two specific secret ARNs, pulls from ECR (AmazonEC2ContainerRegistryReadOnly), and gets PutObject/GetObject/DeleteObject scoped to documents/*. SSM Managed Instance Core attached for shell-in-without-keys. |
| IAM · GitAppliedDeploy | CI deploy role | OIDC trust to GitHub Actions. Used to push images to ECR and restart EC2. Created in the console; not in Terraform. |
DNS, TLS, edge
| Service | Role | Notes |
|---|---|---|
| Route 53 | Authoritative DNS | Apex, www., and app. records are A-aliases to the ALB. Cert validation records also live here. |
| ACM | TLS certificate | SAN covering apex + www + app. DNS-validated. Attached to the HTTPS listener, which uses the ELBSecurityPolicy-TLS13-1-2-2021-06 policy (TLS 1.3 preferred, TLS 1.2 still accepted). |
Messaging
| Service | Role | Notes |
|---|---|---|
| SES v2 | Transactional email | Password reset, welcome, email verification. SDK call from internal/email/mailer.go. Falls back to a stdout logger in dev when EMAIL_FROM_ADDRESS is empty. |
Out-of-band AWS (not in Terraform)
- S3 docs bucket for the marketing/docs site.
- GitAppliedDeploy OIDC role used by GitHub Actions.
- gh-api IAM user for any non-OIDC automation.
- SSH keypairs used for break-glass instance access.
These were created in the console because they predate the Terraform layout or because rotating them via Terraform would be more risk than benefit. If you create new infrastructure, default to Terraform.
3. Network & security
VPC layout
VPC CIDR 10.0.0.0/16, three subnet tiers spread across us-east-1a and us-east-1b:
- Public: the ALB and the NAT gateway live here. Inbound from the internet on 80/443 only.
- App (private): the web and API EC2 instances. No public IPs. Outbound goes through the NAT.
- Database (private): RDS subnet group only. No route to the internet.
Security-group chain
Four security groups, chained by reference rather than by IP so the rules survive any IP change.
internet ──► alb-sg :443, :80
                 │
                 ▼  (referenced by SG, not CIDR)
      web-sg :80           api-sg :8080
                                │
                                ▼  (referenced by SG)
                           rds-sg :5432
The web and API SGs accept ingress only from the ALB SG. The RDS SG accepts ingress only from the API SG. The ALB SG is the only thing open to the internet. Adding a worker tomorrow is one SG ingress rule.
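The chain can be modeled in a few lines of Go. This is a sketch and is not generated from infra/terraform/. The SG names and ports are copied from the diagram, "internet" stands in for the 0.0.0.0/0 rule on alb-sg, and worker-sg is hypothetical. The model shows why a new tier is one rule away.

```go
package main

import "fmt"

// ingressRule admits traffic on Port from anything in SourceSG. Sources are
// security groups, not CIDRs, mirroring the chain above.
type ingressRule struct {
	SourceSG string
	Port     int
}

// rules maps each destination SG to its ingress rules. "internet" stands in
// for the 0.0.0.0/0 rule on alb-sg, the only CIDR-based rule in the chain.
var rules = map[string][]ingressRule{
	"alb-sg": {{"internet", 443}, {"internet", 80}},
	"web-sg": {{"alb-sg", 80}},
	"api-sg": {{"alb-sg", 8080}},
	"rds-sg": {{"api-sg", 5432}},
}

// allowed reports whether something in srcSG may connect to dstSG on port.
func allowed(srcSG, dstSG string, port int) bool {
	for _, r := range rules[dstSG] {
		if r.SourceSG == srcSG && r.Port == port {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowed("api-sg", "rds-sg", 5432))   // true
	fmt.Println(allowed("web-sg", "rds-sg", 5432))   // false: web never talks to the DB
	fmt.Println(allowed("internet", "api-sg", 8080)) // false: only the ALB is public

	// A hypothetical worker-sg gets database access with exactly one new rule.
	rules["rds-sg"] = append(rules["rds-sg"], ingressRule{"worker-sg", 5432})
	fmt.Println(allowed("worker-sg", "rds-sg", 5432)) // true
}
```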
Hardening defaults
- IMDSv2 required on both EC2 instances.
- EBS volumes encrypted by default.
- RDS storage encrypted, deletion protection on, 7-day backups, final snapshot.
- ALB drop_invalid_header_fields enabled.
- TLS policy ELBSecurityPolicy-TLS13-1-2-2021-06.
- S3 IAM scoped to the documents/* prefix; the role cannot enumerate anything else.
Application-level guards
- CORS reflects either the configured APP_URL or any chrome-extension:// origin; credentials are allowed and the allow-origin is set per request (no wildcard).
- Three rate-limiter buckets (token bucket, in-process): an auth limiter keyed by IP, a per-user API limiter, and a tighter AI limiter shared across /extract, /tailor, application-question generation, and skill-match endpoints.
- JWT cookies for the SPA, Bearer API tokens for the Chrome extension. The bearer token is hashed at rest, and last_used_at is touched on a best-effort basis.
- Per-job ownership middleware wraps every /jobs/:id/... route so child handlers can trust the path param.
- Feature-tier middleware gates endpoints that need at least Base (job posting breakdown, skill match, document editor, interview prep) and Premium-only endpoints (resume tailor, cover letter, application answers).
- Stripe webhook is mounted outside the v1 group; the Stripe signature is the authentication.
4. Data plane & storage
Postgres
One Postgres 16 database under the githired schema. Migrations are versioned SQL files embedded into the binary at build time and run on API boot via golang-migrate. The current series ends at migration 35; the latest migrations cover job submissions, user preferences, resume profile snapshots, requirement matches, cover-letter drafts, and per-column timestamps.
Repositories live in internal/database/, one file per aggregate. Notable entities:
- Users: auth, profile, tier, trial window, preferences, Stripe customer ID.
- Jobs: the kanban card. Has scraping status, autofill-ready flag, outcome, normalized URL, and a per-job example cover letter.
- Job-scoped children: contacts, interview rounds, prep notes, job notes (with pinning and threaded comments), application questions, generated documents (multi-version resumes + cover letters), job posting highlights, requirement matches, cover-letter drafts, submissions.
- Documents: the user’s uploaded resumes and cover letters; the actual bytes live in S3.
- Resume profile: structured snapshot of a parsed resume (experiences, accomplishments, education). There is a base profile per user and a per-job snapshot under job_resume_profiles.
- Auth tables: password reset tokens, email verification tokens, API tokens (hashed), Stripe billing fields.
S3 documents
Every uploaded resume / cover letter and every generated export is keyed under documents/<user>/<doc>. The API serves bytes by issuing a short-lived pre-signed GET URL; the browser fetches the file directly. The same wrapper supports PutObject, CopyObject, and DeleteObject, and falls back to the EC2 instance-role credential chain when static keys are not configured.
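The key layout makes per-user scoping a pure prefix check. The sketch below assumes the app enforces per-user ownership on top of the IAM documents/* scoping; the document only states the IAM part. documentKey and ownedBy are hypothetical names. The trailing slash in the prefix is what keeps user u1 from matching u10's objects.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const documentsPrefix = "documents/"

// documentKey builds documents/<user>/<doc>. Rejecting "/" inside either ID
// keeps every key at exactly three segments.
func documentKey(userID, docID string) (string, error) {
	for _, id := range []string{userID, docID} {
		if id == "" || strings.Contains(id, "/") {
			return "", errors.New("invalid id")
		}
	}
	return documentsPrefix + userID + "/" + docID, nil
}

// ownedBy is a pure prefix check; the trailing "/" stops u1 matching u10.
func ownedBy(key, userID string) bool {
	return strings.HasPrefix(key, documentsPrefix+userID+"/")
}

func main() {
	key, err := documentKey("u42", "resume-v3.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(key)                 // documents/u42/resume-v3.pdf
	fmt.Println(ownedBy(key, "u42")) // true
	fmt.Println(ownedBy(key, "u4"))  // false
}
```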
Secrets & configuration
The API reads its full configuration from environment variables (pkg/config). In dev, a .env file is loaded from one of several relative paths; in prod, the EC2 user-data script reads the two Secrets Manager blobs and writes /opt/gitapplied/.env before docker compose up.
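Here is a sketch of the render step, written in Go rather than the shell the user-data script presumably uses. It assumes each secret is a flat JSON object. The key upper-casing and the DB_ prefix are a hypothetical naming convention. The double-quote escaping should be checked against the docker compose .env syntax before relying on it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// envKey is the usual shape of an environment variable name.
var envKey = regexp.MustCompile(`^[A-Z_][A-Z0-9_]*$`)

// quote escapes backslashes and double quotes for a double-quoted value.
var quote = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// renderEnv turns a flat Secrets Manager JSON object into KEY="value" lines.
// Keys are upper-cased and prefixed (hypothetical: "DB_" for the lowercase DB
// blob, "" for the app blob), then sorted so the file is stable across boots.
func renderEnv(secretJSON []byte, prefix string) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(secretJSON))
	dec.UseNumber() // keep "port": 5432 as 5432 rather than a float
	var blob map[string]any
	if err := dec.Decode(&blob); err != nil {
		return "", err
	}
	lines := make([]string, 0, len(blob))
	for k, raw := range blob {
		name := prefix + strings.ToUpper(k)
		if !envKey.MatchString(name) {
			return "", fmt.Errorf("%q is not a valid env var name", name)
		}
		switch raw.(type) {
		case map[string]any, []any, nil:
			return "", fmt.Errorf("%s: only flat string/number/bool values are supported", name)
		}
		val := fmt.Sprint(raw)
		if strings.ContainsAny(val, "\r\n") {
			return "", fmt.Errorf("%s: value contains a newline", name)
		}
		lines = append(lines, name+`="`+quote.Replace(val)+`"`)
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n") + "\n", nil
}

func main() {
	out, err := renderEnv([]byte(`{"port": 5432, "dbname": "app"}`), "DB_")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	// DB_DBNAME="app"
	// DB_PORT="5432"
}
```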
5. Application services
The Go API is a single binary (cmd/server) using Gin as the HTTP layer. There is no microservice split today; instead, the binary is composed of a handful of cohesive packages under internal/.
| Package | Responsibility |
|---|---|
| internal/auth | Tier & feature catalog. EffectiveTier resolves a stored tier + trial window into the tier the user should be treated as at request time. Mirrored in web/src/auth/features.ts. |
| internal/middleware | JWT/Bearer auth, per-IP and per-user rate limiters, feature-gate, job-ownership. |
| internal/handlers | HTTP handlers, one file per resource. The router in cmd/server/server.go wires them all up. |
| internal/database | SQL repositories, the embedded migration source, and the CardDataLoader that fans out per-job sub-reads. |
| internal/services | S3 wrapper and the LLM-backed services (skill match, tailor, resume text extraction). |
| internal/extractor | The scrape-and-parse pipeline. A site-specific scraper handles Greenhouse / Lever / Ashby / LinkedIn / Indeed / Workday; chromedp renders JS-heavy pages; the LLM enrichment step uses an OpenAI model to fill in skills, responsibilities, benefits, company, salary, etc. |
| internal/billing | Stripe SDK wrapper. Treats Stripe as the source of truth for subscription state; the app stores Stripe IDs and reacts to webhook events. |
| internal/email | SES v2 mailer for transactional email, with a stdout fallback in dev. Disposable-domain blocklist lives next to it. |
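This sketch shows how the extractor might pick a scraper and normalize a URL; the Jobs table stores a normalized URL. The hostname suffixes, the utm_ stripping rule, and the "generic" fallback to the chromedp-plus-LLM path are assumptions, not the extractor's actual match table.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// boardHosts maps a hostname suffix to a site-specific scraper. The suffixes
// are illustrative guesses, not the extractor's real match table.
var boardHosts = []struct{ suffix, scraper string }{
	{"greenhouse.io", "greenhouse"},
	{"lever.co", "lever"},
	{"ashbyhq.com", "ashby"},
	{"linkedin.com", "linkedin"},
	{"indeed.com", "indeed"},
	{"myworkdayjobs.com", "workday"},
}

// normalizeURL lower-cases the host and drops the fragment, utm_* params, and
// a trailing slash, so the same posting saved twice yields the same string.
func normalizeURL(raw string) (*url.URL, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	q := u.Query()
	for k := range q {
		if strings.HasPrefix(k, "utm_") {
			q.Del(k) // deleting during range is safe for Go maps
		}
	}
	u.RawQuery = q.Encode() // Encode also sorts the remaining params
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u, nil
}

// pickScraper returns the site-specific scraper, or "generic" for the
// render-with-chromedp-then-LLM path.
func pickScraper(u *url.URL) string {
	host := u.Hostname()
	for _, b := range boardHosts {
		if host == b.suffix || strings.HasSuffix(host, "."+b.suffix) {
			return b.scraper
		}
	}
	return "generic"
}

func main() {
	u, err := normalizeURL("https://Boards.Greenhouse.io/acme/jobs/123/?utm_source=x#apply")
	if err != nil {
		panic(err)
	}
	fmt.Println(u, pickScraper(u)) // https://boards.greenhouse.io/acme/jobs/123 greenhouse
}
```

Matching on "."+suffix instead of a bare suffix keeps a lookalike host such as notlever.co from being routed to the Lever scraper.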
The hot paths
- POST /api/v1/extract: scrape a URL or parse pasted text, run the LLM enrichment, return a structured job. Rate-limited per user against the shared AI bucket.
- POST /api/v1/jobs/from-extension: the Chrome extension posts the active tab’s URL and HTML; the API reuses the LLM enrichment without re-scraping and creates a pending job card.
- POST /api/v1/tailor: Premium-gated. Tailors a resume against a job. Uses OpenAI.
- POST /api/v1/jobs/:id/auto-skill-match: Base-gated. Matches the user’s resume bullets to the job posting’s requirements.
- POST /api/v1/jobs/:id/questions/generate and /polish: Premium-gated. Drafts and polishes free-form application answers.
- POST /api/v1/billing/checkout and /portal: redirect to Stripe-hosted Checkout and Customer Portal. We never see the card.
- POST /api/v1/billing/webhook: Stripe webhook receiver. Mounted outside the auth-required group; the signature is the auth.
6. Frontend & extension
Web app
React 19 + TypeScript SPA under web/src/, built with react-scripts and a Tailwind CSS layer compiled separately. State management uses Zustand stores for auth, board sort, board data, bullet selection, and theme. The rich-text experience uses Tiptap; drag-and-drop on the kanban uses dnd-kit; resumes are rendered with docx-preview, parsed with mammoth, and exported with html2pdf.js.
The SPA ships as an nginx container with a single try_files $uri $uri/ /index.html rewrite for client-side routing. Hashed static assets are cached one year, immutable.
Chrome extension
Manifest V3, built with Vite + CRXJS under extension/. Three execution contexts:
- Popup: user-facing UI for “save this job posting” and account status.
- Background service worker: calls the API with the Bearer token and proxies messages from content scripts.
- Content scripts: injected on the supported job boards (Greenhouse, Lever, Ashby, LinkedIn, Indeed, Workday). They grab the page’s full HTML (document.documentElement.outerHTML) on demand so the API can parse pages it cannot reach from outside, such as postings behind the user’s login.
The extension authenticates via a one-time token issued from Settings → Connections. The public half of the signing key is pinned in manifest.config.ts so the extension ID is deterministic across dev, CI, and the Chrome Web Store.
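Pinning works because Chrome derives the extension ID from the public key alone. It takes SHA-256 of the DER-encoded key, keeps the first 16 bytes, and spells each hex digit with the letters a through p. The sketch below assumes the input is the base64 "key" value as it appears in the manifest. extensionID is a hypothetical helper, not part of the build.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// extensionID derives a Chrome extension ID from a manifest "key" value
// (base64 of the DER-encoded public key): SHA-256 the DER bytes, keep the
// first 16 bytes, and spell each 4-bit nibble as a letter from 'a' to 'p'.
func extensionID(manifestKey string) (string, error) {
	der, err := base64.StdEncoding.DecodeString(manifestKey)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(der)
	id := make([]byte, 0, 32)
	for _, b := range sum[:16] {
		id = append(id, 'a'+b>>4, 'a'+b&0x0f) // high nibble, then low nibble
	}
	return string(id), nil
}

func main() {
	// Replace with the base64 "key" value pinned in manifest.config.ts.
	id, err := extensionID("AAEC")
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(id))
}
```

The same key therefore yields the same ID in dev, CI, and the Chrome Web Store. This lets CORS rules and OAuth-style redirect allowlists refer to one stable origin.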
Mocks
This document, and everything else under web/mocks/, is a static HTML mock: no React, no build step, no shared component library, just a self-contained design-token CSS file (mocks.css) that all the mocks share. The mocks ship the design system; the production SPA is migrating onto the same tokens.
7. Third-party services
| Service | Use | How it’s wired |
|---|---|---|
| OpenAI | LLM enrichment, resume tailor, cover-letter drafting, skill match, application-answer drafting and polish | OPENAI_API_KEY in Secrets Manager. Direct HTTPS calls from the API container. Costs are bounded by the AI rate-limiter bucket (per-user) on the server. |
| Stripe | Subscription billing for Base and Premium tiers, monthly and yearly | Six secrets in Secrets Manager: the secret key, the webhook secret, and four price IDs. Checkout and Customer Portal sessions are server-created; the app never touches a card. Tier state is driven by webhooks. |
| SES v2 | Password reset, welcome, email verification | SDK call from the API. AWS region inherited from AWS_REGION. Falls back to a stdout logger in dev. |
| Google Fonts | Inter, Source Serif 4, JetBrains Mono | Linked from every HTML entry-point with preconnect. |
| Lucide icons | Iconography in mocks and the SPA | UMD bundle in mocks; lucide-react in the SPA. |
| GitHub Actions | CI & deploy | Authenticates to AWS via the GitAppliedDeploy OIDC role. Builds two images, pushes to ECR, restarts EC2. |
| Chrome Web Store | Extension distribution | Manifest V3 zip, deterministic extension ID via pinned public key. |
8. Build, deploy & release
Images
Two Dockerfiles, one per service:
- Dockerfile (API): multi-stage Go build (Go 1.26-alpine) producing a static binary, packaged on a debian:13-slim runtime that ships chromium, ca-certificates, fonts, and tini. tini is non-optional: chromedp spawns chromium subprocesses that would otherwise pile up as zombies under the Go server.
- web/Dockerfile: multi-stage Node 25 build producing a static React bundle, served by nginx:1.29-alpine with the SPA-fallback rewrite.
Deploy flow
- A push to develop triggers a CI build via the GitAppliedDeploy OIDC role.
- Both images are built for linux/arm64 (matching t4g.*) and pushed to their ECR repos, where scan-on-push runs.
- The /deploy workflow opens a release branch (release-YYYYMMDD-<sha>) and an associated PR from develop into main; merging that PR is the production cutover.
- EC2 instances are restarted, or replaced by Terraform when their user-data changes (user_data_replace_on_change). Either way, user-data pulls the new image and writes a fresh .env from Secrets Manager.
Schema migrations
SQL files in internal/database/migrations are embedded into the API binary at build time. On startup, the API runs migrate up against the githired schema. There is no separate migration job; deploying the new API is the migration.
9. Identity, auth & tiers
Two credential paths
- JWT cookie: signed with the JWT secret from Secrets Manager and issued at login. Secure flag in production. Used by the SPA.
- Bearer API token: user-issued from Settings → Connections and stored hashed (HashAPIToken) in api_tokens. Used by the Chrome extension. last_used_at is touched asynchronously on every request.
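Here is a sketch of issuing and hashing a bearer token. The document does not say what HashAPIToken actually does. This sketch assumes SHA-256, which is adequate for 256-bit random tokens: unlike passwords, they cannot be guessed, so a slow KDF buys nothing. newAPIToken and hashToken are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// newAPIToken returns the plaintext to show the user once and the hash to
// store. 32 random bytes cannot be brute-forced, so a fast SHA-256 is enough;
// a slow KDF only matters for low-entropy secrets like passwords.
func newAPIToken() (plain, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	plain = base64.RawURLEncoding.EncodeToString(buf)
	return plain, hashToken(plain), nil
}

// hashToken is applied to the incoming Bearer value too; the lookup is then
// an equality match on the stored hash column.
func hashToken(plain string) string {
	sum := sha256.Sum256([]byte(plain))
	return hex.EncodeToString(sum[:])
}

func main() {
	plain, hash, err := newAPIToken()
	if err != nil {
		panic(err)
	}
	// Show plain to the user once; persist only hash.
	fmt.Println(len(plain), len(hash)) // 43 64
}
```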
Tiers
Three tiers: Free, Base, and Premium. Tier is stored on the user row; trial state is a trial_ends_at timestamp. The effective tier is computed per request:
- Free + active trial → treated as Premium until expiry.
- Base → Base. Premium → Premium. Free + no trial → Free.
Features map to a minimum tier in internal/auth/features.go. The RequireFeature middleware looks up the effective tier and returns 402 if the gate fails. The list is mirrored in web/src/auth/features.ts so the UI can disable controls before the request goes out.
Email verification & password reset
Both flows use a single-use token persisted in Postgres with an expiry; the token is sent by SES with a deep link into the SPA, where the user posts back to /api/v1/auth/verify-email or /api/v1/auth/reset-password.
10. Observability & operations
Honest answer: minimal. The app relies on the free CloudWatch metrics AWS already publishes for ALB, EC2, and RDS, plus gin’s default request logging. There are no CloudWatch alarms in Terraform yet. That’s appropriate for “no users yet” and inappropriate the moment that changes.
The planned monitoring increments, in order:
- ALB access logs to S3: turn on before opening signups. Unlocks every retroactive “was something weird happening at 02:14?” question.
- ~10 CloudWatch alarms wired to SNS → email: ALB 5xx rate, ALB unhealthy host count, RDS free storage, RDS CPU, EC2 status check failures.
- Application metrics: per-route request duration, DB query time, error rates. CloudWatch Embedded Metric Format from the Go binary, no agent.
- External synthetic check on /health every minute from outside the region, paging on two consecutive failures.
Break-glass access
SSM Managed Instance Core is attached to the EC2 role, so aws ssm start-session works without SSH keys for routine debugging. SSH keypairs created in the console exist as a fallback.
11. Scaling path
Up to the first thousand active users, we scale vertically: bump instance_type and db_instance_class. These are one-line changes in terraform.tfvars plus a brief restart.
After that, the order is:
- ASG at the web/API tier: replace the single aws_instance.{web,api} with launch templates + autoscaling groups. Target groups already exist; the wiring change is small. Trigger: sustained target response time p95 > 500ms for 15 min, or CPU > 70%.
- Multi-AZ RDS: flip multi_az = true. Cost roughly doubles. Trigger: first paying customers, or the first scheduled maintenance we can’t take an outage for.
- Second NAT gateway, one per AZ. Trigger: we lose a deploy because one AZ had a NAT outage.
- CloudFront in front of the ALB: reuse the ACM cert (possible because it already lives in us-east-1, the only region CloudFront accepts certificates from), cache static asset paths, leave /api/* as pass-through. Trigger: regular non-US latency > 200ms p95, or ALB egress starts to dominate.
- AWS WAF managed rules (Common, KnownBadInputs, SQLi). Trigger: first credential-stuffing pattern in ALB access logs.
- In-API LRU → read replica → ElastiCache, in that order, only if the DB is the bottleneck. Trigger: RDS CPU > 60% sustained, or specific endpoints have DB time dominating their p95.
- ECS on Fargate: the moment we’re running more than ~4 services, or rolling back is costing real human time. We skip EKS unless there’s a concrete platform feature we need from it.
- Distributed rate limiter: today the token-bucket limiters are in-process; switch to Redis the moment we run more than one API replica.
12. Principles
The choices above are downstream of a small set of principles. Naming them keeps future decisions consistent.
The cost of an architecture isn’t its AWS bill, it’s the surface area you have to keep in your head.
- Boring beats clever. EC2 + Docker over ECS until we feel the pain. SES over a third-party email vendor. Postgres over a managed search index. Each “we should also…” gets a counter-question: what signal am I waiting for that says this is now worth the cost?
- Stripe is the source of truth for billing. The app stores Stripe IDs and reacts to webhook events; it does not mutate subscription tier directly. The card never touches our servers.
- One Go binary, many handlers. A microservice split is justified by independent scaling or independent ownership. We have neither, so we don’t pay for either.
- Security groups by reference, not by IP. If we add a worker, it joins the right SG and the rules just work.
- Rotation is a reboot. User-data renders .env from Secrets Manager on every boot, so a credential rotation never requires a redeploy.
- Mock UI first. Every user-facing change starts as a static mock in web/mocks/ on fixture data, then earns its schema and API.
- Write the trigger down. Every “we deliberately didn’t build X” is paired with the signal that will change our minds. Otherwise “not yet” quietly becomes “never.”