A quick guide to A/B testing your product discovery platform
Constructor is unique among product discovery platforms in that our foremost goal is not just speed or ‘relevance’ but rather driving the KPIs our customers care about most.
This means customers frequently roll out Constructor with an a/b test, both to validate the lift and to reduce the risk of switching 100% of traffic to a new solution before testing.
A/b testing product discovery and search may seem simple, but it is actually quite challenging to run an accurate and successful test. Users interact with the Constructor product in every corner of a site, the UI must be equivalent between systems, and additional purchases and revenue need to be verified across devices.
This guide is intended to help customers design, implement, run and interpret a/b tests of the Constructor product (and any other product) more effectively.
Below is a list of topics to consider as you set up your test.
Provide Constructor with access to analytics
Differentiating ourselves from other product discovery vendors, we don’t count success when we sign a contract, or even when we’re live with a customer, but rather when we’re live with a meaningful lift to business KPIs.
For this reason, we ask any customer running an a/b test to provide access to their analytics system of record, whether this is Omniture (Adobe Analytics), Google Analytics, Amplitude or some other solution. We ask for this so that we’re looking at the same data in the same system, and there is no question about the source of truth for the test.
Constructor is a cutting-edge search, autosuggest, browse and recommendations solution, and while we provide analytics to help drive insights on improvements, we understand that our customers often want to use a third party tool to verify our lifts and are happy to oblige. Our motto, after all, is to make us prove it.
Access to your analytics allows us to:
- Help troubleshoot any issues that may arise during the test
- Share a common reference point in discussing test results
- Investigate outliers or inconsistencies in the data
Test the Constructor platform as a whole
It may seem more beneficial to test each piece of the Constructor platform in isolation (search, autosuggest, browse and recommendations separately). However, we recommend testing the full platform together. When defining success metrics for a test, we recommend looking at overall metrics like revenue-per-visitor and conversion rate.
There are two reasons for this:
- First, each component learns from the others, so results with all product capabilities together will be greater than a single one in isolation.
- Second, improving conversion rate for users who touch, for example, search, means nothing if overall revenue decreases.
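As a rough illustration of the overall metrics mentioned above, the sketch below computes conversion rate and revenue-per-visitor for two cells and the relative lift between them. The `CellTotals` type, its field names and the sample figures are assumptions made up for this example; in practice the totals would come from your analytics system of record.

```python
from dataclasses import dataclass


@dataclass
class CellTotals:
    """Aggregate totals for one test cell (hypothetical structure)."""
    visitors: int     # unique visitors assigned to the cell
    converters: int   # visitors who placed at least one order
    revenue: float    # total revenue attributed to the cell


def conversion_rate(cell: CellTotals) -> float:
    """Share of visitors in the cell who purchased."""
    return cell.converters / cell.visitors


def revenue_per_visitor(cell: CellTotals) -> float:
    """Total revenue divided by all visitors, purchasers or not."""
    return cell.revenue / cell.visitors


def relative_lift(control_value: float, variant_value: float) -> float:
    """Relative change of the variant over the control."""
    return (variant_value - control_value) / control_value


# Made-up numbers, for illustration only.
control = CellTotals(visitors=100_000, converters=3_000, revenue=240_000.0)
variant = CellTotals(visitors=100_000, converters=3_150, revenue=259_200.0)

cr_lift = relative_lift(conversion_rate(control), conversion_rate(variant))
rpv_lift = relative_lift(revenue_per_visitor(control), revenue_per_visitor(variant))
print(f"conversion rate lift: {cr_lift:.1%}, RPV lift: {rpv_lift:.1%}")
```

Both metrics use all visitors in the cell as the denominator, not only users who touched search, which is what keeps a gain in one component from hiding a loss in overall revenue.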
QA the implementation thoroughly
Because search, browse, recommendations, and autosuggest touch so many components of a site and so many systems, it is critical to QA the experience extensively.
Machine learning will help optimize the products and their ranking to each user, but it won’t improve metrics if there are broken links, UI problems like displaying incorrect prices, links to products that lead to error pages, etc. For the a/b test to be legitimate, it really has to be an apples to apples comparison.
Achieve data parity between analytics systems
Because machine learning solutions require user data to train and optimize results, it’s crucial to ensure parity in data between your analytics system and Constructor. During this phase, we will work to validate consistency in behavioral data between your analytics system of record and Constructor. This will be a collaborative effort, but will include:
- Checking for items added to cart or purchased that are not present in the Constructor catalog.
- Validating the count of products with a given brand or facet is the same between Constructor and the incumbent solution.
- Ensuring key metrics like search, add to cart and purchase events are similar (within 1%) between variations.
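Two of the checks above can be sketched in a few lines: finding purchased items that are missing from the catalog, and flagging event counts that differ by more than 1% between two sources. The function names, the dictionary layout and the sample counts are assumptions for illustration; the two sources could be two variations, or your analytics export and the counts Constructor reports.

```python
def missing_from_catalog(purchased_ids, catalog_ids):
    """Item ids seen in cart or purchase events but absent from the catalog."""
    return sorted(set(purchased_ids) - set(catalog_ids))


def relative_difference(count_a, count_b):
    """Absolute difference relative to the larger of the two counts."""
    if count_a == count_b:
        return 0.0
    return abs(count_a - count_b) / max(count_a, count_b)


def parity_report(counts_a, counts_b, tolerance=0.01):
    """Map each event name to (count_a, count_b, difference, within_tolerance)."""
    report = {}
    for event in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(event, 0)   # an event missing from one side counts as 0
        b = counts_b.get(event, 0)
        diff = relative_difference(a, b)
        report[event] = (a, b, diff, diff <= tolerance)
    return report


# Made-up counts, for illustration only.
source_a = {"search": 1000, "add_to_cart": 400, "purchase": 100}
source_b = {"search": 1005, "add_to_cart": 380, "purchase": 100}

for event, (a, b, diff, ok) in parity_report(source_a, source_b).items():
    print(f"{event}: {a} vs {b} ({diff:.2%}) {'ok' if ok else 'INVESTIGATE'}")

print(missing_from_catalog(["sku-1", "sku-9"], ["sku-1", "sku-2"]))
```

Dividing by the larger count is one of several reasonable ways to define "within 1%"; whichever definition you pick, use the same one for every event and every day of the test.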
Communicate test cell data
Send tracking calls for the test to Constructor in both cells, with a parameter for the cell a user experienced. This will allow Constructor to quickly identify any result issues or experience gaps. Work with your dedicated Constructor integration engineer on the value to send for your particular test.
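One way to picture this is to attach the cell label to every tracking event before it is sent. The sketch below is not Constructor's API: the `test_cell` key, the salt value and the hash-based assignment are all hypothetical, and the actual parameter name and value should come from your integration engineer. It assumes a user is assigned to a cell deterministically from a stable user id, so the same user always reports the same cell.

```python
import hashlib

CELLS = ("A", "B")  # hypothetical cell labels


def assign_cell(user_id: str, salt: str = "discovery-test-1") -> str:
    """Deterministically map a stable user id to a cell (assumed scheme)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(CELLS)
    return CELLS[bucket]


def tag_event(event: dict, user_id: str) -> dict:
    """Return a copy of the event with the user's cell attached.

    "test_cell" is a placeholder key, not a documented parameter name.
    """
    tagged = dict(event)
    tagged["test_cell"] = assign_cell(user_id)
    return tagged


event = {"action": "search", "term": "running shoes"}
print(tag_event(event, user_id="user-123"))
```

Because the assignment depends only on the salt and the user id, it gives the same answer on every server, which also helps with the experience-jumping problem that multi-region deployments can cause.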
Run an AA/BB test
The single best test practice to catch test setup issues, ensure valid results and help avoid the human tendency to interpret random fluctuations as meaningful signals is to implement an AA/BB test. Also known as a ‘split control,’ this allows you to look at two ostensibly identical experiences and validate that statistics are indeed the same between them.
An AA/BB test will allow you to identify the following:
- Is user assignment biased?
- Are a few ‘super-purchasers’ skewing revenue numbers in one cell or another?
- Are there tracking issues skewing results?
- Are customers jumping between experiences because servers are in multiple places with poor replication of data, so that a customer can get load balanced to a different location?
Once you validate that A1 and A2, and B1 and B2, have broadly similar results, you can continue with the test readout as you’d normally do. While each sub-cell (A1, A2, B1 and B2) will have less data available, you can bucket these in common segments in most analytics tools, evaluating cell A (composed of A1 and A2 users) vs cell B (composed of B1 and B2 users). We highly, highly recommend this, as it has surfaced issues in test setup in more than half the tests we’ve seen.
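A simple way to check whether two ostensibly identical sub-cells are "broadly similar" on conversion rate is a two-proportion z-test. The sketch below is one possible check, not a prescribed method; the function name, the sample counts and the 0.05 threshold are assumptions for illustration, and most analytics or testing tools offer an equivalent comparison.

```python
import math


def two_proportion_z_test(conv_1, n_1, conv_2, n_2):
    """Two-sided z-test for a difference in conversion rate between two cells.

    conv_* are converting visitors, n_* are total visitors.
    Returns (z statistic, p-value).
    """
    p_1 = conv_1 / n_1
    p_2 = conv_2 / n_2
    pooled = (conv_1 + conv_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for sub-cells A1 and A2, for illustration only.
z, p_value = two_proportion_z_test(conv_1=1500, n_1=50_000, conv_2=1530, n_2=50_000)
if p_value < 0.05:
    print(f"A1 and A2 differ more than expected (p={p_value:.3f}); check the setup")
else:
    print(f"A1 and A2 look consistent (p={p_value:.3f})")
```

Keep in mind that with a 0.05 threshold, roughly one in twenty healthy AA comparisons will still be flagged by chance, so a single flag is a prompt to investigate rather than proof of a broken setup.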
Perform a burn-in validation test
Due to the complexity of integrating a product discovery platform, we recommend an initial shorter test of the solution to identify any configuration or integration errors not caught in the QA period, prior to the longer test of lift.
Examples of what would be identified in this test:
- Difference in error page numbers between cells.
- Browse requests that don’t correspond to valid categories or facets.
At the end of the day, the burn-in validation allows us to test the product discovery platform independent of integration errors.
Evaluate the data for outliers
When optimizing for metrics like revenue-per-visitor or total products purchased, it is important to identify and discount outlier users that would reduce the test’s statistical power. Certain verticals will see large orders or many orders from one individual user, sometimes purchasing to resell domestically or in international markets. This can mean 2 or 3 individuals in one cell could represent a few percent of the overall revenue in that cell and/or the test as a whole. When looking to evaluate an expected lift in overall revenue that can be in the single digit percentages, these outlier users can throw things off in a big way.
To account for this factor and maintain statistical validity of RPV figures, we recommend identifying and holding back the top purchasing 0.1% of users.
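The effect of holding back the top purchasers can be sketched as follows. This assumes you can export one row per user with a cell label and total revenue; the tuple layout, function names and sample data are made up for the example. The cutoff here is computed across all users in the test rather than per cell, which is one reasonable reading of the recommendation.

```python
from collections import defaultdict


def drop_top_purchasers(users, fraction=0.001):
    """Remove the top `fraction` of users by revenue, pooled across cells.

    `users` is a list of (user_id, cell, revenue) tuples.
    """
    n_drop = round(len(users) * fraction)
    ranked = sorted(users, key=lambda user: user[2], reverse=True)
    return ranked[n_drop:]


def rpv_by_cell(users):
    """Revenue-per-visitor for each cell, counting zero-revenue users too."""
    revenue = defaultdict(float)
    visitors = defaultdict(int)
    for _user_id, cell, user_revenue in users:
        revenue[cell] += user_revenue
        visitors[cell] += 1
    return {cell: revenue[cell] / visitors[cell] for cell in visitors}


# Made-up data: 1,000 users spending 10.0 each, plus one reseller in cell A.
users = [(f"u{i}", "A" if i % 2 == 0 else "B", 10.0) for i in range(1000)]
users[0] = ("u0", "A", 5000.0)

print("raw RPV:    ", rpv_by_cell(users))
print("trimmed RPV:", rpv_by_cell(drop_top_purchasers(users)))
```

In this toy data a single user nearly doubles the RPV of cell A, and removing the top 0.1% brings both cells back to the same figure. Reporting both the raw and trimmed numbers makes it clear how much of any measured lift depends on a handful of users.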
Adobe Test & Target DOES NOT automatically filter outliers. Optimizely generally does.
Ensure experience is consistent between cells
For a test to be valid, everything in the experience save for the result display should be identical.
Some examples of interface elements to check:
- Are product images the same? Are they both on models or off? Are there badges in both?
- Are pricing and user reviews displayed consistently? Do both cells correctly display sale prices, or does one have the sale price and the other have the base price?
- Has the integration added a delay to Constructor results (these should return extremely fast)?
Constructor is committed to helping your business improve the metrics that matter most to you. By following the guidelines above, we can help ensure a successful and efficient outcome that proves the value of Constructor’s platform.