User can't log in. The browser console shows two POST requests to the login endpoint - the first returns a 401 with "Invalid credentials", the second returns a 500 Internal Server Error. The 401 is expected for a wrong password attempt. The 500 is the problem.
The twist: authentication was actually succeeding. The user's password was correct. The 500 happened after the password check, during a legacy data import that ran synchronously inside the login response.
What the Browser Shows
The frontend API client maps a 401 to an "Invalid credentials" field error and a 500 to a generic toast pulled from the Hydra error response. The handling is correct - the problem was entirely server-side.
CloudWatch Investigation
The platform runs on AWS Lambda in eu-north-1. The first step was pulling the production API logs from CloudWatch and filtering by the affected user's email address. The structured JSON logs told the full story in chronological order.
At 08:59:55, the login request came in and the route matched. One second later, Symfony's JsonLoginAuthenticator verified the credentials successfully. The user had two roles: ROLE_USER and ROLE_LEGACY_USER - that second role turned out to be key.
Immediately after authentication, Symfony Messenger dispatched an ImportLegacyUserData command using the sync transport. Not queued to SQS. Not handled by a background worker. Synchronous, inline, inside the HTTP request.
The handler tried to make an HTTP GET request to app.[REDACTED].se to fetch migration data for the user. DNS couldn't resolve the host. A TransportException was thrown, wrapped into a HandlerFailedException by the DoctrineTransactionMiddleware, and surfaced as a 500 Internal Server Error.
The host app.[REDACTED].se no longer exists. The legacy platform was decommissioned. But the import code was still running on every login for users flagged as legacy.
The Kill Chain
The Code Path
Two event listeners fire on the authentication success event. The first, AddUserDetailsListener, serializes the user data into the JWT response using Symfony's normalizer. Works fine.
The second is the problem. LegacyUserLogInListener checks if the authenticated user should be imported from the legacy platform. If the user is a legacy user and import hasn't started yet, it dispatches an ImportLegacyUserData command to the message bus.
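A minimal sketch of what that listener plausibly looks like, assuming Symfony's LoginSuccessEvent and attribute-based registration - the class name, the command, and the shouldBeImported check come from the code path above; everything else (namespaces, constructor, getId) is illustrative:

```php
<?php
// Hypothetical reconstruction, not the actual application code.

namespace App\EventListener;

use App\Entity\User;
use App\Message\ImportLegacyUserData;
use Symfony\Component\EventDispatcher\Attribute\AsEventListener;
use Symfony\Component\Messenger\MessageBusInterface;
use Symfony\Component\Security\Http\Event\LoginSuccessEvent;

#[AsEventListener(event: LoginSuccessEvent::class)]
final class LegacyUserLogInListener
{
    public function __construct(private readonly MessageBusInterface $bus)
    {
    }

    public function __invoke(LoginSuccessEvent $event): void
    {
        $user = $event->getUser();

        if ($user instanceof User && $user->shouldBeImported()) {
            // Routed to the sync transport, this dispatch blocks the login
            // response until the handler finishes - or throws.
            $this->bus->dispatch(new ImportLegacyUserData($user->getId()));
        }
    }
}
```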
The shouldBeImported check on the User entity returns true when two conditions are met: the imported boolean is true, meaning the user was migrated from the old platform, and importStartedAt is null, meaning the import process hasn't been initiated yet.
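On the entity side, the guard is simple enough to reconstruct - a sketch, with only the two fields named above taken from the source:

```php
// User entity - plausible shape of the shouldBeImported check.
public function shouldBeImported(): bool
{
    // Migrated from the old platform, and the import has never been kicked off.
    return $this->imported && null === $this->importStartedAt;
}
```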
The messenger routing config explicitly maps ImportLegacyUserData to the sync transport. That's the fatal decision. Every other command in the config - Ping, CreditInvoice, GeneratePayslipPdf, and a dozen others - routes to the workqueue transport backed by SQS. ImportLegacyUserData was the sole exception.
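In messenger.yaml terms, the routing looked roughly like this - the transport and message names come from the post, while the namespaces and DSNs are placeholders:

```yaml
# config/packages/messenger.yaml - simplified sketch, not the actual file
framework:
    messenger:
        transports:
            workqueue: '%env(MESSENGER_TRANSPORT_DSN)%'   # SQS-backed
            sync: 'sync://'
        routing:
            'App\Message\Ping': workqueue
            'App\Message\CreditInvoice': workqueue
            'App\Message\GeneratePayslipPdf': workqueue
            # ...a dozen more commands, all on workqueue...
            'App\Message\ImportLegacyUserData': sync      # the lone exception
```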
The sync transport means the handler runs inline, inside the HTTP request. The handler calls the Legacy[REDACTED]Client to fetch user export data from the legacy platform. The client makes a GET request to a path on app.[REDACTED].se, authenticated with a self-signed JWT token.
The client has error handling - it wraps the HTTP call in a try-catch, logs any failures as critical, and re-throws wrapped in a Legacy[REDACTED]Exception. In an async context, that re-throw would be caught by the messenger retry system: retried four times with exponential backoff, then moved to the failed transport. In a synchronous context inside an HTTP request, that re-throw propagates through the DoctrineTransactionMiddleware, gets wrapped in a HandlerFailedException, and kills the response with a 500.
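The pattern inside the client is roughly the following - a sketch only, since the real class and exception names are redacted, and the endpoint path and JWT helper are placeholders:

```php
// Simplified sketch of the catch/log/re-throw pattern described above.
public function fetchUserExport(string $userId): array
{
    try {
        $response = $this->httpClient->request('GET', '/export/users/'.$userId, [
            'headers' => ['Authorization' => 'Bearer '.$this->createSelfSignedJwt()],
        ]);

        return $response->toArray();
    } catch (\Throwable $e) {
        // In a worker this is exactly what you want: the retry system sees the
        // exception and backs off. Inside the login request it only adds a log
        // line before the 500.
        $this->logger->critical('Failed to fetch user from legacy app', [
            'exception' => $e,
        ]);

        throw new LegacyClientException('Legacy export fetch failed', 0, $e);
    }
}
```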
The base URL points to app.[REDACTED].se, configured via the [REDACTED]_API_ENDPOINT environment variable and a scoped HTTP client in http_client.yaml. The domain doesn't resolve. DNS fails instantly. The entire chain collapses within the same second that authentication succeeded.
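A scoped client of that shape is configured along these lines - the option keys are standard Symfony http_client config, the client name is a placeholder, and the env variable name stands in for the redacted one above:

```yaml
# config/packages/http_client.yaml - sketch of the scoped client
framework:
    http_client:
        scoped_clients:
            legacy.client:
                # Resolves to https://app.[REDACTED].se - a host that no longer exists.
                base_uri: '%env(LEGACY_API_ENDPOINT)%'
```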
Why Only Some Users
Not every user hits this. The imported flag is only true for users that were bulk-imported from the legacy platform via a CSV migration. The ROLE_LEGACY_USER role gets dynamically added in the User entity's getRoles method when imported is true. Regular users who signed up natively don't have this flag and never trigger the listener.
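In the entity, the dynamic role looks something like this - a sketch, with property names beyond imported assumed:

```php
// User entity - ROLE_LEGACY_USER is added at runtime, never stored.
public function getRoles(): array
{
    $roles = $this->roles;
    $roles[] = 'ROLE_USER';

    if ($this->imported) {
        // Only migrated users get this role, which is why only they
        // could reach the broken code path.
        $roles[] = 'ROLE_LEGACY_USER';
    }

    return array_unique($roles);
}
```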
This explains why the bug went unnoticed. Most users sign up natively and log in without issues. Only legacy users - migrated from the old system and not yet fully imported - trigger the listener. To anyone who didn't know the flag existed, the login endpoint looked intermittently broken.
The Impersonation Path
The logs also revealed a parallel failure. An admin was trying to impersonate the affected user through the admin panel, hitting the impersonate endpoint. That flow also triggers the ImportLegacyUserData dispatch. Same dead host, same TransportException, same 500 - but from the admin side.
The admin retried multiple times within a few minutes. Every attempt produced the same critical log entry: "Failed to fetch user from legacy app" with the TransportException pointing at the unresolvable hostname. The impersonation endpoint was just as broken as the login.
Secondary Finding: Database Permissions
While filtering through CloudWatch, a separate issue surfaced for a different user entirely. A Doctrine DriverException was being thrown with an insufficient privilege error on the expense and invoice_payment tables. The application's PostgreSQL user lacked SELECT privileges on those tables, causing 500 errors on the invoices listing and the unpaid sum calculation.
Different root cause, different user, but surfaced during the same log investigation. Likely a migration or database role change that didn't propagate correctly to the application's database user.
The Fix
The legacy platform is gone. The domain doesn't resolve. Every feature that depends on it is already broken.
The removal scope covers the full dependency tree. The LegacyUserLogInListener that triggers imports on login. The ImportLegacyUserData and ImportLegacyUserBankDetails commands and their handlers. The UpdateLegacyUserRecipientData command and handler for recipient re-imports. The entire Legacy[REDACTED]Client service along with all its models and exception classes in the Legacy[REDACTED] directory. Three console commands: ImportLegacy[REDACTED]Users for bulk CSV import, ReimportLegacyUserRecipientsCommand, and the GetLegacy[REDACTED]Command debug tool. The DownloadLegacyAssetController for PDF downloads from the dead host. And the LegacyUserInfo API resource with its state provider.
Configuration changes include removing the ImportLegacyUserData and UpdateLegacyUserRecipientData routing from messenger.yaml, removing the Legacy[REDACTED]Client service definition from services.yaml, removing the scoped http_client.[REDACTED] configuration from http_client.yaml, and cleaning up the shouldBeImported method and importStartedAt field from the User entity. The MondayBoardSelector also needs adjusting since it checks isImported for board routing.
The imported boolean and ROLE_LEGACY_USER role stay for now. They're data - they tell us which users came from the legacy platform, and the Monday.com integration uses that for board selection. Removing data is a separate conversation.
Messenger Transport: The Real Lesson
The root cause wasn't that app.[REDACTED].se went down. Hosts go down. The root cause was that a non-critical background operation was configured to run synchronously inside the login HTTP request.
Every other command in the messenger config uses the workqueue transport backed by SQS. ImportLegacyUserData was the sole exception, routed to the sync transport. If it had been async, the login would have succeeded and the import would have failed silently in the background worker - logged, retried with exponential backoff, and eventually moved to the failed transport after four attempts.
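For comparison, routing it like every other command would have looked roughly like this - the retry numbers mirror the four-attempts-with-backoff behavior described above, and the DSNs are placeholders:

```yaml
# config/packages/messenger.yaml - how the command could have been routed
framework:
    messenger:
        transports:
            workqueue:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'   # SQS
                retry_strategy:
                    max_retries: 4
                    multiplier: 2        # exponential backoff
            failed: 'doctrine://default?queue_name=failed'
        failure_transport: failed
        routing:
            'App\Message\ImportLegacyUserData': workqueue   # instead of sync
```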
The Legacy[REDACTED]Client already had proper error handling. It catches any throwable, logs it as critical with the full exception context, then re-throws wrapped in a Legacy[REDACTED]Exception. That's correct behavior for an async handler - the messenger retry system catches the re-throw and handles retries. In a synchronous context inside an HTTP request, that same re-throw just adds a log line before crashing the response.
The error handling was designed for the right context. The transport configuration put it in the wrong one.
Takeaways
First, synchronous message handlers in HTTP requests are a liability. If it can fail - and anything involving network calls can fail - it should be async. The login endpoint is the worst possible place for a synchronous side effect.
Second, decommissioned services leave landmines. The legacy platform was shut down, but nobody audited what still depended on it. A recursive grep for the old hostname across the codebase would have flagged every reference.
Third, CloudWatch told the entire story. The structured JSON logging with timestamps, levels, channels, and full exception context made diagnosis fast. The log entries showed authentication succeeding, the sync dispatch happening, the DNS failure, and the exception propagation - all within the same second.
Fourth, the imported flag created an invisible user segmentation. Native users logged in fine. Legacy users hit the 500. Without knowing the flag existed, you'd think the login endpoint was randomly broken.
Fifth, catch-and-rethrow is only useful if something upstream catches the rethrow. The client's error handling was correct for an async context. In a sync context, it just logged before crashing.