Why the landing page randomly returns 500

Effect Healthcare — root-cause summary, May 2026

The symptom

On Azure App Service, the website occasionally returns 500 errors on pages that have nothing to do with master data — the homepage, sign-in, pricing. Restarting fixes it for a while. Then it comes back.

The root cause, in one paragraph

The Node.js process that runs the whole website has one thread for JavaScript. Every request — homepage, sign-in, pricing, master-data — shares that single thread. When a PWA client opens and starts pulling master data, it fires 16 endpoints in sequence, paginated. Each call runs through Payload's ORM, which holds the thread for hundreds of milliseconds to several seconds while it joins relations, runs hooks, and shapes the response. While the thread is busy with that work, every other request — including the landing page — sits waiting in a queue. If a request waits longer than Azure's gateway timeout, the user sees a 500.
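The queueing effect described above can be illustrated with a small model. This is a simplified sketch, not the site's code: it treats each handler as one uninterrupted block of work on the single thread, and the request names, arrival times, and service times are made-up numbers chosen only to show the mechanism.

```typescript
// Hypothetical model: one worker (the JavaScript thread) serves requests
// strictly in arrival order. All numbers below are illustrative only.
interface SimRequest {
  name: string;
  arrivalMs: number; // when the request reaches the process
  serviceMs: number; // how long it occupies the thread
}

interface SimResult {
  name: string;
  waitMs: number; // time spent queued behind other work
  latencyMs: number; // wait plus the request's own service time
}

function simulateSingleThread(requests: SimRequest[]): SimResult[] {
  // Process requests in the order they arrive.
  const ordered = [...requests].sort((a, b) => a.arrivalMs - b.arrivalMs);
  let threadFreeAt = 0;
  return ordered.map((req) => {
    // A request cannot start until the thread has finished earlier work.
    const startMs = Math.max(req.arrivalMs, threadFreeAt);
    threadFreeAt = startMs + req.serviceMs;
    return {
      name: req.name,
      waitMs: startMs - req.arrivalMs,
      latencyMs: threadFreeAt - req.arrivalMs,
    };
  });
}

const results = simulateSingleThread([
  { name: "master-data page", arrivalMs: 0, serviceMs: 2000 },
  { name: "homepage", arrivalMs: 100, serviceMs: 10 },
]);
// The homepage needs only 10 ms of work but waits 1900 ms for the thread.
```

In this model the homepage's own cost never changes; all of its latency comes from waiting behind the master-data call, which is the pattern the measurements below show.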

Proof — one PWA is enough to break the homepage

Tested over real HTTP against a production build (pnpm build && pnpm start). One PWA client doing a real sync; meanwhile, a probe hits the homepage repeatedly.

Condition                  | Homepage avg | Homepage peak
Idle (no PWA load)         | 11 ms        | 16 ms
1 PWA syncing              | 2 160 ms     | 4 939 ms
2 PWAs syncing in parallel | 4 059 ms     | 11 892 ms

Sign-in and pricing degrade the same way because they all share the same thread. With two PWAs syncing, the slowest homepage response took almost 12 seconds. Anything over Azure's gateway timeout becomes a 5xx.
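A homepage probe of the kind used for these measurements could be sketched as follows. The actual probe script is not shown in this summary, so the function names and the shape of the `hit` callback are assumptions; the sketch only shows how average and peak figures are derived from sequential timings.

```typescript
// Hypothetical probe helper; names are illustrative, not the real script.
interface LatencySummary {
  count: number;
  avgMs: number;
  peakMs: number;
}

// Reduce raw latency samples to the two figures reported in the table.
function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("no samples to summarize");
  }
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    count: samplesMs.length,
    avgMs: Math.round(total / samplesMs.length),
    peakMs: Math.max(...samplesMs),
  };
}

// Call `hit` repeatedly, one request at a time, and time each call.
// `hit` is supplied by the caller, for example a function that fetches
// the homepage and reads the full response body.
async function probe(
  hit: () => Promise<unknown>,
  count: number,
): Promise<LatencySummary> {
  const samples: number[] = [];
  for (let i = 0; i < count; i++) {
    const started = Date.now();
    await hit();
    samples.push(Date.now() - started);
  }
  return summarizeLatencies(samples);
}
```

The probe issues requests sequentially so that it adds almost no load of its own; any latency it records comes from work already occupying the server's thread.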

What it is NOT

The fix

Replace the 16 paginated master-data endpoints with one cached bundle endpoint that returns everything in a single response, using direct database queries (no ORM in the hot path) and an in-memory cache invalidated by Payload hooks when admins edit data.

Net effect: instead of holding the Node thread for tens of seconds per PWA login, master-data work collapses to one short request — or a cache hit. The landing page stays responsive.
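The caching part of the fix could be sketched as below. This is not the code on the branch: `BundleCache`, `MasterDataBundle`, and `loadBundleFromDb` are hypothetical names, and the loader is a stub standing in for the direct database queries. The assumption is that a Payload collection hook (for example `afterChange`) would call `invalidate()` when an admin edits data.

```typescript
// Hypothetical shape of the bundle; the real fields are not given here.
interface MasterDataBundle {
  loadedAt: number;
  collections: Record<string, unknown[]>;
}

// In-memory cache that runs the loader at most once until invalidated.
// Caching the promise (not the value) means concurrent requests share
// one in-flight load instead of each hitting the database.
class BundleCache<T> {
  private cached: Promise<T> | undefined;
  private readonly loader: () => Promise<T>;

  constructor(loader: () => Promise<T>) {
    this.loader = loader;
  }

  get(): Promise<T> {
    if (this.cached !== undefined) {
      return this.cached;
    }
    const pending = this.loader();
    this.cached = pending;
    // If the load fails, forget it so the next request retries.
    pending.catch(() => {
      if (this.cached === pending) {
        this.cached = undefined;
      }
    });
    return pending;
  }

  // Intended to be called from a hook when an admin changes master data.
  invalidate(): void {
    this.cached = undefined;
  }
}

// Stub loader standing in for the direct database queries (assumption).
let dbLoads = 0;
async function loadBundleFromDb(): Promise<MasterDataBundle> {
  dbLoads += 1;
  return { loadedAt: Date.now(), collections: {} };
}

const bundleCache = new BundleCache<MasterDataBundle>(loadBundleFromDb);
```

With this shape, the bundle endpoint handler only awaits `bundleCache.get()`. Repeated PWA logins between admin edits are served from memory, so the thread is held only for serializing the response.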

Already implemented on branch feature/master-data-bundle.