Sample · fictional
MEMO · Audit of inherited customer portal · Prepared for: Director of Engineering, [Client] · Prepared by: AppStartDev · Day 8 of 8
Executive summary
The portal is salvageable. The current production build is functional for roughly 70% of customer flows but carries three serious risks that should be addressed in the next 30 days regardless of longer-term direction. We recommend a stabilization sprint, then an incremental rebuild of the order workflow over a 12 to 16 week horizon. A full rewrite is not justified.
Top three risks
- Authentication is one library version away from a known CVE. The auth layer relies on a deprecated JWT library, and upgrading is non-trivial because token-shape assumptions are baked into multiple services. Severity: high. Effort: 1 to 2 weeks.
- Production database backups have not been verified in 4 months, and a restore has never been tested. Backup jobs appear to run, but until a restore succeeds the backups must be treated as unproven. Severity: high. Effort: 2 to 3 days to verify; longer to remediate if the backups turn out to be broken.
- A single developer has the only deploy key. If that developer leaves or loses their laptop, releases stop. Severity: medium-high. Effort: 1 day to rotate and document.
By area
- Codebase. React 18 frontend, Node.js + Express API, Postgres. Code style is inconsistent across modules but generally readable. Test coverage is 12% and concentrated in non-critical utilities. Critical paths (checkout, auth) have no automated tests.
- Hosting. AWS, three EC2 instances behind an ALB, RDS Postgres single-AZ. Cost is reasonable for current load. No autoscaling. No disaster recovery plan documented.
- Deployment. Manual SSH plus a shell script. CI runs lint only, not tests. Releases happen ad hoc, on the developer's schedule. No staging environment.
- Backlog. 247 open tickets, of which roughly 30 are duplicates and 80 are stale (no activity in 9+ months). The 60 most recent tickets describe real customer pain, mostly in the order workflow.
- Operational health. No error tracking. Logs are in CloudWatch, retained 7 days. No alerting on production errors. Support tickets are the de facto monitoring system.
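Until a real error tracker is wired in, even a crude in-process 5xx-rate check beats support tickets as a monitoring system. The sketch below shows the shape of such a check; the 60-second window and 5% threshold are illustrative assumptions, not client-agreed values.

```javascript
// Sliding-window 5xx alarm: alert when the error rate over the last
// windowMs exceeds the threshold. Window and threshold are assumptions.
function makeErrorRateAlarm({ windowMs = 60000, threshold = 0.05 } = {}) {
  const events = []; // { at: timestamp ms, isError: boolean }
  return {
    // Call once per completed request with its HTTP status code.
    record(statusCode, now = Date.now()) {
      events.push({ at: now, isError: statusCode >= 500 });
    },
    // True when the recent 5xx rate crosses the threshold.
    shouldAlert(now = Date.now()) {
      while (events.length && events[0].at < now - windowMs) events.shift();
      if (events.length === 0) return false;
      const errors = events.filter((e) => e.isError).length;
      return errors / events.length > threshold;
    },
  };
}
```

A tracker such as Sentry replaces this entirely; the point is only that the alerting logic is small enough that "we have nothing" is not a defensible interim state.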
Recommendation
Path A (recommended): Stabilize, then rebuild the order workflow incrementally. 4 weeks of stabilization (auth upgrade, backup verification, deploy hardening, error tracking, basic CI). Then 12 weeks rebuilding the order workflow as a new service while the legacy portal continues to serve other flows. Customers stay live throughout. Estimated total effort: 16 weeks for one full team.
Path B (not recommended): Full rewrite. 6 to 9 months. The codebase is not bad enough to justify it, and a full rewrite forfeits two years of accumulated business logic and edge-case handling that the current code has internalized.
Path C (only if speed matters more than scope): Stabilize only. 4 weeks. Reduces risk but does not address the order workflow pain that drives the most support tickets.
Quick wins for the first two weeks
- Rotate the single deploy key, document the deploy process, give two engineers access (1 day)
- Verify backup restore on a sandbox database, document the result (2 days)
- Wire Sentry or equivalent to the API; alert on 5xx error rate (half a day)
- Add a staging environment and a smoke test before each deploy (3 days)
Whether we are the right team
We are a fit for Path A. We have built and stabilized similar Node.js + React portals. If you choose Path B, we would recommend a partner with deeper experience in your specific domain. We are not the right team for Path C alone: stabilization without follow-on investment usually lands back in the same place within a year.