SoundCloud recounts the end of the public API Strangler

SoundCloud recently announced that it has completed an 8-year migration journey, using the strangler pattern, from a monolithic codebase to a full-fledged Backend for Frontend (BFF), an architecture pattern that originated at SoundCloud.

The announcement examines the SoundCloud team’s steps to a successful migration, lessons learned from the migration journey, and the benefits and risks of the Strangler model.

SoundCloud's motivation for adopting the Strangler model dates back to 2014, when the team noticed that its Rails application did not perform well when interacting with multiple microservices to serve user traffic. The Strangler public API BFF was therefore introduced to intercept and augment public API responses by calling additional services as needed.

Architecture diagram of the BFF Strangler model – source: BFF @SoundCloud

This adoption of the Strangler model was driven more by immediate need than by planning for a future without the public API monolith. Consequently, the SoundCloud team continued developing its microservice API while leaving both the Strangler and the original monolith largely unmaintained. Over time, a host of unwanted issues arose: code duplication, inconsistent API behaviors, and security risks. Motivated to deal with the situation, the SoundCloud team began migrating the Strangler to a full-fledged BFF in January 2020.

Evolution of the BFF public API – source: SoundCloud Developer Blog

To understand the scope of the migration work, the SoundCloud team first added telemetry to see which endpoints were still in use. Next, they explicitly declared all known public API routes in the Strangler codebase and added a fallback that called the public API for any undeclared route. The fallback was removed once the team was satisfied that it had identified all routes. Routes that were neither used nor documented on the developer portal were removed. Finally, with the full set of endpoints to be ported known, each one was migrated to call existing microservices instead of the public API.
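The route-discovery steps described above can be illustrated with a small sketch. This is not SoundCloud's code: the `StranglerRouter` class, its method names, and the example paths are hypothetical, and handlers are reduced to plain functions that take a path and return a string. It shows explicitly declared routes, a fallback to the monolith for undeclared ones, and counters standing in for telemetry.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical handler type: takes a request path, returns a response body.
Handler = Callable[[str], str]


class StranglerRouter:
    """Illustrative router: declared routes first, optional monolith fallback."""

    def __init__(self, fallback: Optional[Handler]) -> None:
        self.routes: Dict[str, Handler] = {}
        self.fallback = fallback      # set to None once all routes are known
        self.hits: Counter = Counter()        # telemetry: declared route usage
        self.undeclared: Counter = Counter()  # telemetry: routes still to declare

    def declare(self, path: str, handler: Handler) -> None:
        self.routes[path] = handler

    def dispatch(self, path: str) -> str:
        handler = self.routes.get(path)
        if handler is not None:
            self.hits[path] += 1
            return handler(path)
        # Undeclared route: record it, then fall back to the monolith if allowed.
        self.undeclared[path] += 1
        if self.fallback is None:
            raise LookupError(f"no route declared for {path}")
        return self.fallback(path)

    def unused_routes(self) -> List[str]:
        """Declared routes that received no traffic: candidates for removal."""
        return sorted(p for p in self.routes if self.hits[p] == 0)
```

In this sketch, inspecting `undeclared` reveals routes that still need declaring, `unused_routes()` lists removal candidates (which, per the article, would also have to be checked against the developer portal documentation), and setting `fallback` to `None` corresponds to removing the fallback.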

To avoid unwanted changes to the public API, the SoundCloud team deployed each ported implementation alongside the old code that proxies to the public API. Incoming requests run both code paths: the old one (using the proxy) and the new one. The response from the proxy call to the public API is returned to the caller. At the same time, the responses from the proxy and the new code are compared for consistency. If they do not match, a telemetry event is triggered and the difference is logged for developer inspection. The developer may then need to adjust the ported implementation until they are sure that the new code matches the original functionality. At that point the proxy can be removed and the ported response is returned instead.
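The parallel-run comparison described above might look roughly like the following sketch. The names `make_parallel_run_handler`, `legacy_proxy`, `ported`, and `on_mismatch` are hypothetical and not taken from SoundCloud's codebase, and responses are modeled as plain dictionaries. Treating an exception in the new code as a mismatch is an assumption of this sketch, not something the article states.

```python
from typing import Any, Callable, Dict

# Hypothetical response type: a decoded JSON object.
Response = Dict[str, Any]


def make_parallel_run_handler(
    legacy_proxy: Callable[[str], Response],
    ported: Callable[[str], Response],
    on_mismatch: Callable[[str, Response, Any], None],
) -> Callable[[str], Response]:
    """Build a handler that runs both code paths but always serves the old one."""

    def handle(path: str) -> Response:
        old = legacy_proxy(path)  # proxy call to the public API (source of truth)
        try:
            new = ported(path)    # ported implementation calling microservices
        except Exception as exc:
            # Assumption: a failure in the new path is reported like a mismatch
            # and must never affect the caller.
            on_mismatch(path, old, exc)
            return old
        if new != old:
            # Stand-in for the telemetry event and logged difference.
            on_mismatch(path, old, new)
        return old                # callers only ever see the legacy response

    return handle
```

A real deployment would likely run the new path asynchronously so that it does not add latency to the request, and would normalize fields such as timestamps before comparing; both are omitted here for brevity.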

The Strangler is now a full-fledged BFF, and the entire public API codebase has been removed. As a result, SoundCloud now has a codebase that most engineers can contribute to, that doesn't negatively impact project scope, that matches its microservices architecture, and that helps ensure consistency and data security.

The Strangler model carried significant risks. The SoundCloud team suffered from a long fallow period with very little maintenance or planning for the public API, which led to an unhealthy codebase, increased security risks, and greater complexity in feature development. Ultimately, when deciding whether or not to adopt the Strangler model, one must consider whether such disruptions to the business outweigh the ultimate benefits of the work.

Victor L. Jones