RetailMeNot, part of Ziff Davis, Inc., makes everyday life more affordable for shoppers. They are the leading savings destination providing online and in-store coupons and cashback offers. RetailMeNot serves millions of monthly active users through their desktop, mobile web, native (iOS & Android) apps, and browser extension (Deal Finder™) experiences.
Modernizing their stack by adopting GraphQL
In 2019, RetailMeNot began a project to modernize its technology stack to be better prepared to serve future experiences and grow company revenue. Kartik Kumar Gujarati is a Senior Software Engineer at RetailMeNot, and he describes this evolution on RMN’s engineering blog:
“For the last 10 years, RetailMeNot’s engineering teams have built several highly performant, scalable, and efficient systems to bring savings data to our users through our experiences. And like many companies, RetailMeNot used REST APIs to serve the data. However, as the company and the systems grew, we started to experience problems like versioning, over-fetching, and under-fetching with REST APIs. Ultimately, this translated to performance limitations.”
RetailMeNot began with a monolithic GraphQL API as a solution to power their web and native app experiences, along with a new browser extension, Deal Finder™. They had a single team responsible for governance and standards for this graph. Client teams were given access to contribute to the monolithic graph, and the central graph team would review their contributions.
The monolithic graph turns into a bottleneck
Over time, however, the monolithic graph became a bottleneck for RetailMeNot. The API team found itself overwhelmed by the volume of contributions. Hannah Shin, a Senior Software Engineer on the API team, describes the challenge:
“My team was in charge of maintaining that monolith. But we had many different client teams working on it. We started to become a bottleneck trying to coordinate different release cadences, conflicting features, and the desire to ensure that features were tested properly.”
Hannah’s team was also responsible for maintaining core data sources and their event-driven architecture. However, her team was bogged down with code reviews.
“Our team of three engineers spent around 75% of our time reviewing code changes to our GraphQL monolith, which left us very little time to innovate on our backend platform.”
Adopting a supergraph with Apollo Federation
RetailMeNot realized their monolith was no longer scaling, and they wanted a solution that would work well for the growing number of teams building on top of their graph. They chose Apollo Federation because it empowered each subgraph team to build and maintain its own portion of the unified supergraph schema. As Hannah puts it:
“We wanted a way for the teams to not be tightly coupled. Having maintained our monolith, we saw so many inconsistencies and inefficiencies in our data structures. For example, we had a web offer card, then an app offer card, and another type of offer card. We believed that having the shared graph and consolidated ownership of shared types, would help us as an organization better understand and model our data.”
The process to migrate from their monolith was incremental. Kartik describes this process in depth in his blog post:
“Here are the incremental steps that we took for this migration:
- First, we transformed the existing GraphQL schema in the monolithic service to support federation specifications. This allowed us to support both the federation spec and schema stitching in the same service. Here are the open-source libraries that provide support for federation spec.
- We then set up a new Gateway service that simply forwarded the traffic to the monolithic service.
- Using a weighted routing technique, we then controlled the amount of traffic that would hit the new Gateway service vs the monolithic service. Once we were confident with the changes and validations in our lower environments (stage/pre-production), we then switched over to 100% of our traffic going through the Gateway service in the production. At this point, our monolithic service was completely behind the gateway.
- Finally, we started to break apart the schema in our monolithic service (which now became a subgraph) and migrated the entities over to more cohesive and smaller subgraph services.”
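The weighted routing step above can be pictured with a minimal sketch. The source does not say which routing layer RetailMeNot used or how its weights were configured, so the names here (`Upstream`, `makeWeightedRouter`, `simulate`) and the per-request random split are assumptions made purely for illustration, not a description of their setup.

```typescript
// Hypothetical upstream names; the source does not name its services or routing layer.
type Upstream = "gateway" | "monolith";

// Returns a router that sends roughly `gatewayWeight` (0..1) of requests to the gateway.
// `random` is injectable so the split can be tested deterministically.
function makeWeightedRouter(
  gatewayWeight: number,
  random: () => number = Math.random,
): () => Upstream {
  if (!(gatewayWeight >= 0 && gatewayWeight <= 1)) {
    throw new RangeError("gatewayWeight must be between 0 and 1");
  }
  return () => (random() < gatewayWeight ? "gateway" : "monolith");
}

// Counts where `requests` simulated requests land for a given weight.
function simulate(
  gatewayWeight: number,
  requests: number,
  random?: () => number,
): Record<Upstream, number> {
  const route = makeWeightedRouter(gatewayWeight, random);
  const counts: Record<Upstream, number> = { gateway: 0, monolith: 0 };
  for (let i = 0; i < requests; i++) {
    counts[route()] += 1;
  }
  return counts;
}
```

Raising the weight in small increments (for example 0.1, then 0.5, then 1) mirrors the gradual cutover described in the quote: at a weight of 1, every request goes through the gateway and the monolith sits entirely behind it.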
The supergraph: a faster and safer way to ship products
After adopting Apollo Federation and Apollo Studio, RetailMeNot saw immediate benefits. With schema reviews and deployments automated through Apollo Studio, the API team at RetailMeNot no longer spent the majority of its time reviewing code.
Hannah Shin said, “After adopting managed federation and schema checks, we went from three engineers spending 75% of their time reviewing code to less than 10%.”
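To give a rough sense of what an automated schema check replaces in a manual review, here is a deliberately simplified sketch. It is not how Apollo Studio implements schema checks, which do considerably more than this; it only flags types or fields that a proposed schema removes. The `SchemaShape` model, the `findBreakingChanges` function, and the `Offer` fields are all hypothetical names chosen for illustration.

```typescript
// Simplified schema model for illustration only: type name -> field names.
type SchemaShape = Record<string, string[]>;

// Lists types and fields that exist in `current` but are missing from `proposed`.
// Additions are ignored because they do not break existing clients.
function findBreakingChanges(current: SchemaShape, proposed: SchemaShape): string[] {
  const problems: string[] = [];
  for (const [typeName, fields] of Object.entries(current)) {
    if (!(typeName in proposed)) {
      problems.push(`type ${typeName} was removed`);
      continue;
    }
    const proposedFields = proposed[typeName];
    for (const field of fields) {
      if (!proposedFields.includes(field)) {
        problems.push(`field ${typeName}.${field} was removed`);
      }
    }
  }
  return problems;
}

// Hypothetical schemas: the proposal drops Offer.code and adds Offer.expiresAt.
const currentSchema: SchemaShape = { Offer: ["id", "title", "code"], Query: ["offer"] };
const proposedSchema: SchemaShape = { Offer: ["id", "title", "expiresAt"], Query: ["offer"] };
const breaking = findBreakingChanges(currentSchema, proposedSchema);
```

Running a check like this in a deployment pipeline and rejecting any change with a non-empty result is the general idea behind catching breaking changes before they reach production, rather than relying on a reviewer to spot them.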
Their platform team could focus on innovating. Instead of doing code reviews, they focused on establishing best practices for contributing to the graph. They began regular education sessions for developers, engineering leaders, and the product team on GraphQL benefits. Over time, the discussions pivoted to focus on improving their supergraph. They invested more in observability along with templatizing experiences and content distribution.
RetailMeNot has also seen a significant increase in reliability after migrating to their supergraph. Hannah says, “It’s been over a year since we’ve had any breaking changes. Prior to adopting Apollo, we had breaking changes as frequently as every month. We once took down our mobile home page for six hours.”
The days of painful rollbacks and war rooms have been replaced with much more confidence in their GraphQL release process. As a result, the team saw a significant improvement in developer velocity after moving from their GraphQL monolith to their supergraph. Kartik says, “Working in monolith means that you have to be very careful about your changes, moving to a subgraph architecture allows you to move much faster. We are able to get features out of the door 40% faster since we migrated to Apollo Federation.”
Adopting a supergraph has empowered RetailMeNot to continuously onboard new teams and services. RetailMeNot plans to keep innovating by further modularizing its architecture. Currently, they are building a templating system that is integrated into their content management system. Creating new pages has typically taken their operations team up to a month because it required writing custom feature code; a template-based approach will replace that work. Soon their operations team will be able to create new pages and experiences on a self-service basis, without having to request new capabilities. Longer term, they will be able to change a template once and have the change propagate to all of their different use cases, which should make experimentation much easier.
As the RetailMeNot engineering team continues to scale their supergraph, their focus is on helping educate and onboard new teams. Summing up the move to the supergraph, Kartik calls out the following benefits in his blog post:
- “Faster product iteration: The user-facing application teams can move much faster as the bottleneck from a single “GraphQL API team” is removed.
- Concern-based separation: Moving to a federated architecture enabled different teams to work on different product areas without affecting/blocking each other while contributing to a single graph.
- Similar tech-stack across services: Moving to a federated architecture helped us standardize the tech-stack across multiple services at RetailMeNot. This also led to high collaboration across teams.
- Developer experience: With standardized tooling and a common tech stack, developer experience has been improved a lot.”
Want to learn more about how RetailMeNot made the switch to the supergraph? Watch this webinar discussing their journey.