Conducting performance tests for a leading eCommerce platform in three weeks
eCommerce platforms must ensure a seamless experience for their customers, from the home page to the order confirmation email. As the revenue of these platforms hinges on customer purchases, anything less than that can have a negative impact on revenue. Performance plays an important role in this context. If users are confronted with a slow or unavailable checkout, for example, they might turn to competitors.
Conscious of the importance of this subject, a renowned eCommerce platform with a large user base in Europe, Africa and South America contacted spriteCloud to undertake performance tests. This company has a clear ambition for significant revenue increase through a market-leading eCommerce solution.
These performance tests included complex scripting, realistic workload model creation, the design of multiple test scenarios, test execution, monitoring and analysis of the results, and were executed in a record three weeks.
In this case study, we will explain how spriteCloud supported the client’s eCommerce solution expansion by performing a capacity planning exercise. To start we will look into the exact issue the client was facing, the strategy used on this project, the approach to the testing scenarios, and finally we will discuss what spriteCloud was able to discover.
‘The spriteCloud team helped us regain confidence in our current eCommerce solution and has provided us with the insights needed to continue our customer onboarding plans.’
– Senior Product Architect
Identifying the impact of an architectural change
An architectural change was proposed for the client’s existing eCommerce solution. Multiple vendors were involved in the application’s development and infrastructure provisioning. However, nobody was sure about the impact of the change on the performance of the application.
The customer wanted to know how many concurrent users the platform could handle before it broke or before significant issues arose, such as frontend slowness, database slowness, CPU bottlenecks and database table locking. Since the existing eCommerce solution ran on Azure, it was important to keep a close eye on cost optimization as well as on the potential impact of scaling the environment up or down.
spriteCloud was chosen to find out how the architectural change would impact performance because of the experience and expertise of our engineers, who have carried out extensive Automation and Performance testing work for other eCommerce platforms. This project was the continuation of over seven years of partnering with this client and more than 40 projects done together. This gave the client full confidence that the performance test of their application was in safe hands.
One performance test engineer was assigned to the project. His mission was to get an understanding of the application’s complete architecture and of the various components of its landscape. The spriteCloud team brainstormed the best possible scenarios to test and prepared a performance test strategy. A test manager from spriteCloud was available throughout the project whenever required and presented the strategy to the various stakeholders.
Test strategy and tooling
spriteCloud analysed the production workload recorded in Google Analytics on peak days for the different OpCos and created a similar workload model to load test the application. OctoPerf was used as the load testing tool, and the client already had New Relic set up as its application performance monitoring tool.
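A workload model of this kind can be sketched as a traffic mix plus a target request rate per user action. The snippet below is only an illustration: the action names and page-view counts are hypothetical and are not the client's figures, and it assumes that peak-hour page views are a reasonable proxy for load.

```python
# Hypothetical peak-hour page views per user action (illustrative numbers only).
peak_hour_page_views = {
    "home": 40_000,
    "search": 30_000,
    "product_detail": 20_000,
    "add_to_cart": 6_000,
    "checkout": 4_000,
}


def workload_model(page_views, load_factor=1.0):
    """Return {action: (share_of_traffic, target_requests_per_second)}.

    load_factor scales the observed peak, e.g. 1.5 for a 150% load test.
    """
    total = sum(page_views.values())
    model = {}
    for action, views in page_views.items():
        share = views / total
        rps = views * load_factor / 3600  # hourly count -> per-second rate
        model[action] = (share, rps)
    return model


if __name__ == "__main__":
    for action, (share, rps) in workload_model(peak_hour_page_views, 1.5).items():
        print(f"{action:15s} share={share:.0%} target={rps:.2f} req/s")
```

The shares determine how virtual users are distributed across scripts, while the scaled rates give a target to check the generated load against.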
spriteCloud proposed five test scenarios. It was suggested to put 150% load on the application and monitor both the application and its system performance. Monitoring covered the analytics, CPU, memory, database transactions and database logs. It was recommended to repeat this scenario with back-office load to measure the degradation in response time. Additionally, a stress test for the individual geographies (OpCos) and a combined stress test for the different OpCos were performed to find the breaking point of the system.
spriteCloud’s testing strategy was as follows:
- The test started with 100% load on the system to verify the performance of the new architectural changes.
- If 100% load was successful, the load was increased to 150% and the performance was validated using pre-defined KPIs.
- The same test was repeated with back-office load in parallel. Pre-defined KPIs were examined to detect degradation.
- A stress test for a single geography (OpCo) was performed to determine the breaking point of the application in terms of concurrent users.
- A stress test for all OpCos was carried out to find the breaking point of the application in terms of concurrent users.
- The front-end page load pattern was analyzed during load and stress tests to find out how the user experience was affected.
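The staged approach in the list above can be sketched in a few lines. This is a simplified illustration and not the OctoPerf configuration used on the project: `run_stage` is a hypothetical callback that runs a test at the given number of users and returns whether the pre-defined KPIs were met, and the user counts are made up.

```python
def run_load_stages(baseline_users, run_stage, factors=(1.0, 1.5)):
    """Run each load stage in order; stop at the first stage whose KPIs fail.

    Returns the user counts of the stages that passed.
    """
    completed = []
    for factor in factors:
        users = round(baseline_users * factor)
        if not run_stage(users):
            break
        completed.append(users)
    return completed


def stress_ramp(start_users, step_users, max_users):
    """Yield step-wise increasing user counts for a stress test."""
    users = start_users
    while users <= max_users:
        yield users
        users += step_users


if __name__ == "__main__":
    # Pretend every stage meets its KPIs.
    print(run_load_stages(1000, lambda users: True))
    print(list(stress_ramp(1000, 500, 3000)))
```

Gating the 150% stage on the 100% stage avoids spending test time on a heavier run when the baseline already fails its KPIs.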
After rigorous testing, spriteCloud gathered all the data from the load testing tool and the Application Performance Monitoring (APM) tools to analyse the results. During the stress test, spriteCloud found the breaking point of the application: the point at which throughput stopped growing even though the load kept increasing. Beyond that point, the application was not able to process any more requests.
spriteCloud correlated the increase in response time with CPU exhaustion on the application’s servers. The application was not able to handle more load because its CPUs were utilized to full capacity.
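One simple way to locate such a saturation point in stress test results is to look for the first load step after which throughput barely grows. The sketch below assumes the results are available as (concurrent users, throughput) pairs; the 5% threshold and the sample numbers are assumptions for illustration, not values from this project.

```python
def find_saturation_point(samples, min_gain=0.05):
    """Return the user count after which throughput stops growing.

    samples: list of (concurrent_users, throughput_rps), sorted by users.
    A step counts as saturated when throughput grows by less than
    min_gain (relative) between one load level and the next.
    Returns None if no saturation is observed.
    """
    for (users, tput), (_, next_tput) in zip(samples, samples[1:]):
        if tput > 0 and (next_tput - tput) / tput < min_gain:
            return users
    return None


if __name__ == "__main__":
    # Hypothetical stress test results.
    results = [(500, 100), (1000, 195), (1500, 280), (2000, 290), (2500, 288)]
    print(find_saturation_point(results))
```

In practice the throughput curve would be read alongside response times and error rates, since a flat throughput line with rising response times is the typical signature of a saturated resource.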
The reason for the high CPU utilization shown in the graph was excessive garbage collection: 35% of the CPU was consumed by garbage collection activity and only 65% of it was used for real application processing. We recommended tuning the configuration of the G1 garbage collector (G1 GC), as this would help optimise CPU utilization so that CPU time could be spent on application processing instead of garbage collection. Additionally, spriteCloud pointed out a few slow-running queries and Java methods which could become bottlenecks under peak load.
As part of the test strategy, spriteCloud also performed front-end performance tests in parallel with the load and stress tests. Here we recommended some front-end optimizations, such as image and font compression and image resizing.
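To get a feel for what reducing garbage collection overhead could be worth, the CPU share left for the application can be compared before and after tuning. The 35% figure comes from the measurements above; the 10% target is purely hypothetical, and the estimate assumes the workload is CPU-bound and scales linearly with available CPU time.

```python
def app_cpu_headroom(gc_share_now, gc_share_target):
    """Ratio of CPU available to application code after vs. before GC tuning."""
    return (1 - gc_share_target) / (1 - gc_share_now)


if __name__ == "__main__":
    # 35% GC overhead was measured; 10% is an assumed target, not a result.
    ratio = app_cpu_headroom(0.35, 0.10)
    print(f"Application CPU share would grow by a factor of {ratio:.2f}")
```

Under these assumptions, moving from 65% to 90% application CPU share would leave roughly 38% more CPU time for real request processing on the same servers.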
In addition to assessing the impact of an architectural change on the performance of the client’s eCommerce platform, spriteCloud suggested turning the scripts into a continuous performance monitoring setup. This was possible thanks to the reusable scripts that spriteCloud created. They allowed us to run the tests on demand and compare the results with previous (benchmarked) performance test results. These tests could be run after each deployment or on a schedule, so that changes in the performance of the platform are detected easily and early in the development pipeline.
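The comparison against a benchmark can be as simple as flagging transactions whose response time has grown beyond a tolerance. The following is a minimal sketch under assumed inputs: the transaction names, timings and 10% tolerance are hypothetical, and a real setup would read these values from the load testing tool's reports.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {transaction: (baseline_ms, current_ms)} for regressions.

    A transaction regresses when its current response time exceeds the
    benchmarked value by more than the given relative tolerance.
    Transactions missing from the current run are skipped.
    """
    regressions = {}
    for name, base_ms in baseline.items():
        cur_ms = current.get(name)
        if cur_ms is not None and cur_ms > base_ms * (1 + tolerance):
            regressions[name] = (base_ms, cur_ms)
    return regressions


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    baseline = {"home": 400, "checkout": 900}
    current = {"home": 420, "checkout": 1100}
    print(find_regressions(baseline, current))
```

A check like this can fail a pipeline step after a deployment, which is what makes the early detection described above practical.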
A successful testing exercise
spriteCloud respected the tight project timeline and finished the whole performance testing exercise, which included complex scripting, realistic workload model creation, the design of multiple test scenarios, test execution, monitoring and result analysis, in a record three weeks. We gave a detailed technical walkthrough of the results to all the stakeholders. Issues related to CPU utilization, slow-performing APIs and database queries were explained, as were the maximum number of concurrent users and the maximum number of orders that could be placed.
The development team accepted the recommendations and the management team was happy with the effort spriteCloud put into testing the various scenarios. Through this capacity planning exercise, spriteCloud gave the management confidence that the application, with its new architectural changes, could perform well with the current workload. This allowed the client to expand its eCommerce solution to serve twice as many concurrent users as before.
Besides that, spriteCloud transformed the scripts that were created for the performance tests into a continuous performance monitoring setup. This allowed the tests to be run on demand to easily spot changes in performance early on in the development pipeline.