Playbook

What we actually produce, in plain English.

The most honest way to judge how a team thinks is to read what they write. The samples below are fictional but representative of the documents you would receive on a real engagement: a project audit memo, an architecture decision record, a weekly status update, and a handoff checklist. None of them reference a real client.

Sample 1 of 4

Project audit memo

Delivered after roughly one week of audit work, this is the document a careful CTO uses to decide whether to repair, stabilize, or rebuild.

Sample · fictional

MEMO  ·  Audit of inherited customer portal  ·  Prepared for: Director of Engineering, [Client]  ·  Prepared by: AppStartDev  ·  Day 8 of 8


Executive summary

The portal is salvageable. The current production build is functional for roughly 70% of customer flows but carries three serious risks that should be addressed in the next 30 days regardless of longer-term direction. We recommend a stabilization sprint, then an incremental rebuild of the order workflow over a 12 to 16 week horizon. A full rewrite is not justified.

Top three risks

  1. Authentication is one library version away from a known CVE. The auth layer relies on a deprecated JWT library. Upgrade is non-trivial because token-shape assumptions are baked into multiple services (a decoupling sketch follows this list). Severity: high. Effort: 1 to 2 weeks.
  2. No backups of the production database have been verified in 4 months. Backups appear to be running, but a restore has never been tested. Severity: high. Effort: 2 to 3 days to verify, longer to remediate if backups are actually broken.
  3. A single developer has the only deploy key. If that developer leaves or loses their laptop, releases stop. Severity: medium-high. Effort: 1 day to rotate and document.
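
To make the first risk concrete: the usual way out of baked-in token-shape assumptions is a single verification module that every service calls, so the library and the claim layout can change in one place. A minimal TypeScript sketch, assuming the jsonwebtoken package and its type definitions; the AuthContext shape and claim names are illustrative, not the client's actual code:

  import { verify, JwtPayload } from "jsonwebtoken";

  // The shape the rest of the codebase is allowed to depend on.
  export interface AuthContext {
    userId: string;
    roles: string[];
  }

  // Single choke point: the JWT library and the raw claim layout can change
  // here without touching the services that consume AuthContext.
  export function authenticate(token: string, secret: string): AuthContext {
    const payload = verify(token, secret) as JwtPayload;
    return {
      userId: String(payload.sub),
      roles: Array.isArray(payload.roles) ? payload.roles : [],
    };
  }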

By area

  • Codebase. React 18 frontend, Node.js + Express API, Postgres. Code style is inconsistent across modules but generally readable. Test coverage is 12% and concentrated in non-critical utilities. Critical paths (checkout, auth) have no automated tests.
  • Hosting. AWS, three EC2 instances behind an ALB, RDS Postgres single-AZ. Cost is reasonable for current load. No autoscaling. No disaster recovery plan documented.
  • Deployment. Manual SSH plus a shell script. CI runs lint only, not tests. Releases happen ad hoc, on the developer's schedule. No staging environment.
  • Backlog. 247 open tickets, of which roughly 30 are duplicates and 80 are stale (no activity in 9+ months). The 60 most recent tickets describe real customer pain, mostly in the order workflow.
  • Operational health. No error tracking. Logs are in CloudWatch, retained 7 days. No alerting on production errors. Support tickets are the de facto monitoring system.

Recommendation

Path A (recommended): Stabilize, then rebuild the order workflow incrementally. 4 weeks of stabilization (auth upgrade, backup verification, deploy hardening, error tracking, basic CI). Then 12 weeks rebuilding the order workflow as a new service while the legacy portal continues to serve other flows. Customers stay live throughout. Estimated total effort: 16 weeks of one full team.

Path B (not recommended): Full rewrite. 6 to 9 months. The codebase is not bad enough to justify it, and a full rewrite forfeits two years of accumulated business logic and edge-case handling that the current code has internalized.

Path C (only if speed matters more than scope): Stabilize only. 4 weeks. Reduces risk but does not address the order workflow pain that drives the most support tickets.

Quick wins for the first two weeks

  • Rotate the single deploy key, document the deploy process, give two engineers access (1 day)
  • Verify backup restore on a sandbox database, document the result (2 days)
  • Wire Sentry or equivalent to the API; alert on 5xx error rate (half a day; a wiring sketch follows this list)
  • Add a staging environment and a smoke test before each deploy (3 days)
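
As an illustration of the third quick win, wiring an error tracker into an Express API is a small change; the 5xx alert rule then lives in the tracker, not in code. A minimal sketch, assuming the @sentry/node SDK's v7-style Express handlers and a SENTRY_DSN environment variable (illustrative of the approach, not a prescription):

  import express from "express";
  import * as Sentry from "@sentry/node";

  const app = express();
  Sentry.init({ dsn: process.env.SENTRY_DSN });

  // The request handler goes first so every request is captured.
  app.use(Sentry.Handlers.requestHandler());

  // ...existing portal routes mount here...
  app.get("/health", (_req, res) => res.send("ok"));

  // The error handler goes after the routes; it reports unhandled errors,
  // which is what the 5xx alert rule keys off.
  app.use(Sentry.Handlers.errorHandler());

  app.listen(3000);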

Whether we are the right team

We are a fit for Path A. We have built and stabilized similar Node.js + React portals. If you choose Path B, we would recommend a partner with deeper experience in your specific domain. We are not the right team for Path C alone, since a project that is stabilized but not invested in further usually ends up back in the same place within a year.

Sample 2 of 4

Architecture decision record

One ADR per non-trivial decision, lives in the repo. Reads like a memo, not a debate. Captures the alternatives that were considered, so the next person on the project does not relitigate decided questions.

Sample · fictional

ADR-0014  ·  Database for the order service  ·  Status: Accepted  ·  Date: 2026-04-12  ·  Authors: J. Patel, M. Chen


Context

The new order service needs a primary data store. Expected load is 200 to 500 orders per minute peak, with strict requirements around order ledger immutability and a need to query orders by customer, date range, and status. The team has Postgres expertise. The existing portal also uses Postgres.

Decision

Use Postgres 16 on RDS with a single-writer multi-reader topology. Use logical replication to feed an analytics replica. Append-only ledger table for order events, with derived state in a normalized orders table.
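
For readers who want the shape of that pattern, a minimal sketch of the write path follows, using the node-postgres (pg) client. Table names, columns, and the status derivation are illustrative, not the actual schema: the event is appended immutably and the derived row is refreshed in the same transaction.

  import { Pool } from "pg";

  const pool = new Pool(); // connection settings come from the usual PG* env vars

  // Append an immutable order event and refresh the derived orders row atomically.
  export async function recordOrderEvent(
    orderId: string,
    eventType: string,
    payload: object
  ): Promise<void> {
    const client = await pool.connect();
    try {
      await client.query("BEGIN");

      // 1. Append-only ledger: rows are never updated or deleted.
      await client.query(
        "INSERT INTO order_events (order_id, event_type, payload) VALUES ($1, $2, $3)",
        [orderId, eventType, JSON.stringify(payload)]
      );

      // 2. Derived state: the queryable table the API reads from.
      //    (Status derivation is simplified for the sketch.)
      await client.query(
        "UPDATE orders SET status = $2, updated_at = now() WHERE id = $1",
        [orderId, eventType]
      );

      await client.query("COMMIT");
    } catch (err) {
      await client.query("ROLLBACK");
      throw err;
    } finally {
      client.release();
    }
  }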

Alternatives considered

  • DynamoDB. Strong on horizontal scale and predictable latency. Rejected because flexible queries (date range, status filter, customer history) would require many secondary indexes or downstream sync to a query store. The added complexity outweighs the scale headroom we do not currently need.
  • CockroachDB. Postgres-compatible with horizontal scale. Rejected because it adds operational complexity, vendor lock-in, and a learning curve for a team that does not yet need its scale ceiling. We can revisit if order volume grows 10x.
  • MongoDB. Considered briefly. Rejected because order ledger semantics are naturally relational and the team has no Mongo experience.

Consequences

  • Single-writer Postgres becomes a vertical-scale boundary. We expect to hit its ceiling around 5x current load; this ADR will be revisited at that point.
  • Backups, point-in-time recovery, and observability are mature in Postgres. Less custom tooling required.
  • The order ledger pattern (append-only events plus derived state) gives auditability for free and makes downstream analytics easier.
  • Consistency guarantees across services will be eventual, not transactional, since the order service does not share transactions with billing or inventory.

When to revisit

When sustained write load exceeds 1,500 orders per minute, when a multi-region active-active deployment becomes a hard requirement, or when the team finds itself building scaling workarounds that no longer feel cheaper than a different store.

Sample 3 of 4

Weekly status update

Sent every Friday by the engagement lead. Designed to be read in 90 seconds. Replaces, rather than supplements, a status meeting.

Sample · fictional

STATUS  ·  Week 7 of 16  ·  Order workflow rebuild  ·  Sent: Friday, 17:30


TL;DR

On track. Order creation is feature-complete in staging. Two risks moved from yellow to green. One new risk on payment provider sandbox stability. We need a decision from product on tax line item formatting by Wednesday.

Shipped this week

  • Order creation API: full happy path plus 12 of 14 error cases. Behind a feature flag in staging.
  • Inventory reservation flow: integrated and tested with the existing inventory service.
  • Sentry wired to the order service. First production-like errors caught and triaged in staging.

In flight next week

  • Remaining 2 error cases on order creation (concurrent reservation collision, payment timeout)
  • Begin order modification API (cancellations and partial refunds)
  • Load test against staging with synthetic traffic at 1.5x peak load

Decisions needed

  • Tax line item formatting on receipts. Two valid options. Product owner: M. Reyes. Need by Wednesday EOD or it slips into week 9.

Risk register changes

  • Resolved: Auth library upgrade landed. CVE risk closed.
  • Green: Backup restore verified on sandbox. Documented runbook added.
  • New: Payment provider sandbox returned 503s for ~20 minutes Tuesday. Vendor confirmed. Adding retry-with-backoff to the payment client.
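
For context on the new risk, the planned mitigation is the standard retry-with-backoff wrapper around the payment client. A minimal sketch; the attempt count, delays, and call names are illustrative, not the actual client code:

  // Retry a transient failure with exponential backoff plus jitter.
  async function withRetry<T>(
    call: () => Promise<T>,
    attempts = 3,
    baseDelayMs = 500
  ): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt < attempts; attempt++) {
      try {
        return await call();
      } catch (err) {
        lastError = err;
        if (attempt === attempts - 1) break; // out of retries
        // In practice, only retry errors that look transient (503s, timeouts).
        const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
    throw lastError;
  }

  // Usage: wrap only calls that are safe to repeat (idempotent or idempotency-keyed),
  // e.g. withRetry(() => paymentClient.authorize(order)).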

Blockers

None this week.

Looking ahead

Week 8 ends with the order workflow ready for an internal beta. We need a list of internal users and a feedback intake plan from product by Tuesday.

Sample 4 of 4

Handoff checklist

Walked together at the end of an engagement, regardless of whether the next phase continues with us or moves to your team. Designed so a new engineer could pick up the project on Monday morning without us in the room.

Sample · fictional

HANDOFF  ·  Order workflow rebuild  ·  Engagement end: Week 16  ·  Walked: Tuesday, 90 minutes


Repository and source

  • Repo handed to client GitHub org. AppStartDev access removed by [date].
  • Branch protection rules documented. Main requires PR + 1 review + CI green.
  • README in each service explains purpose, runbook entry point, and primary maintainer.
  • CONTRIBUTING.md describes commit style, PR template, and review expectations.

Infrastructure and access

  • AWS account access transferred. AppStartDev IAM users removed.
  • Terraform state stored in client S3 bucket. Walked through plan / apply with two engineers.
  • Production deploy keys rotated. Two client engineers can deploy.
  • Domain ownership and DNS records confirmed in client registrar.

Vendors and credentials

  • Sentry, Resend, Stripe, payment provider, observability vendor: account ownership confirmed.
  • API keys rotated where AppStartDev had access. New keys live in client's secret store.
  • Vendor inventory document updated with pricing tier, renewal date, and primary contact.

Operations

  • Runbooks for the top 8 production issues live in the repo.
  • On-call rotation handed to client engineering lead.
  • Alerting routes documented. Sentry, uptime checker, payment failures all alert to client's pager.
  • Logs and metrics dashboards bookmarked in shared docs.

Documentation

  • 14 ADRs live in /docs/decisions, dated and indexed.
  • Architecture overview updated. Includes service diagram, data flow, and integration map.
  • Onboarding guide for a new engineer joining the project. Walked through with a volunteer engineer who confirmed it works end to end.

Known issues

  • Payment provider sandbox occasionally returns 503s. Retry-with-backoff is in place; production is unaffected.
  • Order modification API for partial refunds works but is rate-limited by the provider. Higher rate tier is on file pending approval.
  • Two non-blocking lint warnings in the inventory client. Tracked in ticket [#].

Recommended next steps

  • Add chaos test for payment provider failover (effort: 1 to 2 weeks).
  • Move analytics replica to a dedicated read instance once analytics load grows.
  • Plan a quarterly dependency upgrade pass. First one due in 60 days.

Sign-off

Client engineering lead and AppStartDev engagement lead both signed the handoff. Final invoice issued. Support window through [date] covers questions, not new work.

Want to see what one of these would look like for your project?

Tell us what you are building, fixing, or stuck on. If an audit is the right starting point, we will scope one. If it is not, we will say so.