
Macy’s online sales have been growing at a blazingly fast rate since their rollout in 2000. During one holiday season, Macy’s website, macys.com, experienced serious issues. Macy’s technical team worked overtime to find the root cause, but to no avail.
THE CHALLENGE
Identify & fix the root cause behind the hefty increase in the number of sessions during peak season.

• One of the largest US Retailers
• 512 Stores across 43 States
• $25 Billion Revenue in 2019/$5 Billion Online
The observed trouble symptoms were caused by the database table space running short. That, in turn, was caused by a burst of application server sessions, which increased more than tenfold from the time the new software release was put into production just before the holiday period.
Macy’s load testing team, using LoadRunner, had not observed anything that indicated an issue with the software release rolled into production and had earlier reported its performance characteristics to be similar to those of the previous application software version. The problem was “patched” by increasing the table space size, at some cost to site responsiveness.
Several months later, after failing to even reproduce the issue in the lab, Macy’s engaged the performance practice team from IBM Global Services (IGS). The IGS team used IBM’s Rational Performance Tester (RPT) with a large number of load generators to generate production-sized load, but RPT could not scale to the desired load. IGS then switched to LoadRunner and conducted several tests. That did not help replicate the production situation either, so the root cause remained out of reach.
NetStorm, the Cavisson Systems, Inc. Performance Testing Solution, Finds the Root Cause Where RPT & LoadRunner Failed
THE SOLUTION
NetStorm, using its advanced simulation features, simulated the production load model and helped recreate the production behavior in the lab. Now the team could see the hefty increase in session count with the production release, and the problem disappearing with the previous application software version.
Once the issue could be reproduced at will, it was much simpler to get to the root cause. A problem that had been pending for more than 8 months was identified and fixed within 12 hours.
NetStorm’s arrival rate modelling, among other features, helped recreate the production issue, and its built-in monitoring helped isolate it: JSESSION counts were seen to be proportional to page views rather than to virtual user sessions. Tracing a user session suggested that the HTTP cookie was not being set by the server, resulting in each page hit being treated as a new session.
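The symptom described above can be illustrated with a minimal sketch. It is not Macy’s application code or NetStorm output; `ToyServer`, `handle_request`, `count_sessions`, and the user and page counts are hypothetical. The toy server creates a new session whenever a request arrives without a known session cookie, so if the cookie never reaches the client, the session count tracks page views rather than users.

```python
import itertools


class ToyServer:
    """Hypothetical server that creates a session for any request lacking a known cookie."""

    def __init__(self, sets_cookie):
        self.sets_cookie = sets_cookie      # False models the faulty release
        self.sessions = set()
        self._ids = itertools.count(1)

    def handle_request(self, cookie):
        """Return the session cookie the client should store (or None)."""
        if cookie in self.sessions:
            return cookie                   # known session, reuse it
        session_id = next(self._ids)        # no valid cookie: start a new session
        self.sessions.add(session_id)
        return session_id if self.sets_cookie else None


def count_sessions(sets_cookie, users, pages_per_user):
    """Simulate `users` visitors, each viewing `pages_per_user` pages."""
    server = ToyServer(sets_cookie)
    for _ in range(users):
        cookie = None                       # each visitor starts without a cookie
        for _ in range(pages_per_user):
            cookie = server.handle_request(cookie)
    return len(server.sessions)


healthy = count_sessions(sets_cookie=True, users=100, pages_per_user=10)
faulty = count_sessions(sets_cookie=False, users=100, pages_per_user=10)
print(healthy, faulty)  # prints: 100 1000
```

With the cookie set, sessions equal users (100); without it, sessions equal page views (1000), which matches the proportionality the monitoring revealed.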
Next Year’s Holiday Readiness: The NetStorm Difference
The production candidate software release was load tested independently by the IGS consulting team using LoadRunner and by the Macys.com test team using NetStorm.
To Macy’s surprise, the two teams’ test results were exactly opposite. The LoadRunner results predicted the production candidate release to be about 30% worse than the previous software release on almost all counts, including response time, system resource utilization, and Java garbage collection performance. NetStorm predicted the opposite, indicating the production candidate release to be 25% better than the previous application software version.
After much discussion, and based on the better end-to-end reasonability of the test data, including NetStorm’s deep insights into the system under test, Macy’s team accepted the NetStorm test results and went into the holiday season with the NetStorm-recommended software configuration.
To improve system performance, the IBM team advised using a smaller heap and more application instances (8 instances, each with a 1 GB heap; the baseline was 4 instances with a 2 GB heap) on the grounds of faster GC times owing to the smaller heap. Cavisson suggested fewer instances with a larger heap (2 instances with a 4 GB heap): the application was using a Coherence cache, and the front cache inside each application instance is more effective (more cache hits) when more requests hit the same instance.
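The trade-off between the three configurations can be sketched with simple arithmetic. This is an illustration under stated assumptions, not a measurement from the Macy’s tests: it assumes requests are spread uniformly at random across instances and that each instance keeps its own independent front cache, so a repeat request finds a warm front cache only if it lands on the instance that served the item before. The `summarize` helper and the configuration labels are hypothetical.

```python
from fractions import Fraction

# (instances, heap per instance in GB), as described in the text
configs = {
    "IBM proposal": (8, 1),
    "baseline": (4, 2),
    "Cavisson proposal": (2, 4),
}


def summarize(instances, heap_gb):
    """Return total heap and the chance a repeat request hits the same instance."""
    total_heap_gb = instances * heap_gb
    # Assumption: uniform random routing, one independent front cache per instance.
    same_instance_chance = Fraction(1, instances)
    return total_heap_gb, same_instance_chance


for name, (instances, heap_gb) in configs.items():
    total, chance = summarize(instances, heap_gb)
    print(f"{name}: {instances} x {heap_gb} GB = {total} GB total, "
          f"repeat request hits same instance with probability {chance}")
```

All three layouts use the same 8 GB of heap in total; what changes is how requests, and therefore cached entries, are divided among instances. Under these assumptions the chance of a repeat request reaching the same instance is 1/8, 1/4, and 1/2 respectively.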
NetStorm employed production access log replay modelling to ensure that the product distribution was production-like. Monitoring of the production systems during the holiday period vindicated NetStorm’s prediction that the production release would be much better than the previous release.
NetStorm Proves Its Effectiveness, Again & Again
On several occasions, the test results produced by the IGS team using LoadRunner were quite at variance with those produced by NetStorm. On every occasion, the production rollout proved the NetStorm predictions correct.
Macys.com performed much better during peak holiday traffic than it had in the immediately preceding non-holiday period, which ran the previous software release under much lighter user traffic.
Customers using legacy solutions such as LoadRunner are often misled by false predictions that stem from the failure to create production-like load models and environments in the test lab. In light of this, and of the Cavisson solution’s issue identification capabilities, Macy’s switched completely to the Cavisson solution.
WHY NETSTORM?
It has become critical to test applications and systems that are likely to face web traffic (web application servers and web intermediaries such as firewalls, load balancers, security devices, etc.) under real-life conditions such as those described above.
NetStorm from Cavisson Systems, Inc. helps you answer vital questions that are essential for determining the production readiness of a web application or system, such as:
- Will the system handle the anticipated load from real-world users?
- Will the client-perceived response time be within reasonable bounds per SLAs (Service Level Agreements)?
- Will a proposed change in the system (such as adding an index on a DB table column) improve the client-perceived response time or system throughput in real-world situations?
- Will the system handle stress conditions (caused by momentarily increased user concurrency or network congestion) and quickly recover once the overload condition goes away? In effect, is the system resilient?
NetStorm is an extremely powerful load generator appliance that provides an accurate simulation of the web environment, including the Internet/intranet, web traffic patterns, web browsers, and web user behavior. It enables web application and system providers to do load testing, performance analysis, and capacity tuning of their systems proactively.
A look at some industry-defining features that are part of NetStorm’s extensive product capabilities:
- Massive, internet-scale load generation: NetStorm makes testing with millions of active users not only possible, but easy to manage. NetStorm’s advanced technology can simulate millions of concurrent users with a single server, with no limit on additional load generators.
- Internet-True™ load modeling: The number of concurrent users “spikes” dramatically when the responsiveness of the enterprise application or network degrades, as each user takes more processing time. This results in an avalanche of sessions and transaction load. To enable modeling of such scenarios, NetStorm’s test scenario design is based on user arrival rate.
- Production Scenario Recreation: NetStorm can recreate production scenarios using access log replay, either to mimic real-life production traffic and test applications with a more practical approach, or to recreate production issues in order to validate a fix.
- Baseline Tracking: Showing baseline metric graphs in the background, with the graphs of the current execution tracking them, helps quickly identify whether the currently executing test has strayed too far from the golden baseline, so that the team can quickly get to the bottom of the deviation. Looking at the system only after the test is over may render the whole test invalid (for example, because of an unwanted batch job).
- In-Moment Data Collection: NetStorm collects not only many client-side metrics but also server-side data, to help identify the root cause with auto diagnosis. The in-moment data collection capability allows NetStorm to collect deeper data on detecting certain conditions, such as collecting stack traces of Java instances when the number of idle threads approaches zero. Because stack traces can consume a lot of disk space, collecting them continuously may not be practically feasible. Collecting them at “interesting times”, however, is not only a smart use of disk space but can also prove quite valuable in resolving issues.
- Goal-based scenario design: With NetStorm, test scenarios can be designed using causal criteria (e.g. number of concurrent users) or symptomatic criteria (e.g. load the system while keeping the 95th percentile of response times under 300 ms, or the server CPU less than 80% busy).
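The arrival-rate item in the list above can be made concrete with Little’s law, which states that the average number of users in a system equals the arrival rate multiplied by the average time each user spends in the system. The sketch below is illustrative only and does not use NetStorm; the `concurrent_users` helper, the arrival rate, and the session durations are made-up assumptions.

```python
def concurrent_users(arrival_rate_per_sec, avg_session_seconds):
    """Little's law: L = lambda * W (average number of users in the system)."""
    return arrival_rate_per_sec * avg_session_seconds


ARRIVAL_RATE = 50.0  # hypothetical: new users arriving per second

# Users keep arriving at the same rate regardless of how slow the site is,
# so longer sessions translate directly into higher concurrency.
for session_seconds in (60, 120, 300):
    users = concurrent_users(ARRIVAL_RATE, session_seconds)
    print(f"avg session {session_seconds:>3}s -> {users:.0f} concurrent users")
```

In a test that fixes the number of virtual users, concurrency is an input rather than an outcome, so this growth cannot appear. Driving the test by arrival rate lets concurrency rise on its own when responses slow down.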