There have been many high-profile outages across major websites. E-commerce sites can have many individual moving pieces: content management, product management, search capabilities, payment, fulfilment, account management, fraud detection, and a whole host of other third-party systems that are combined to create a seamless experience for their customers. If any one of these components fails it can spell disaster for the end user, who will readily get frustrated and take their money elsewhere.
Additionally, some of our clients transact more than 50% of their online business through sales promotions during the holiday period. This puts a lot of pressure on business and technology teams to ensure successful outcomes during this time. It all adds up to a lot of stress, but with strong upfront planning it needn’t.
We help several of our clients with peak preparation, and the following is a brief 10-step guide to the areas to focus on:
Ensure your monitoring and alerting systems are fully set up and tested. Create a set of core dashboards and ensure staff know how to read and act upon this data.
Example systems that you want to make sure are in place include Splunk, Splunk Synthetics, Akamai mPulse, New Relic, Dynatrace and PagerDuty.
Load test your websites based on your peak traffic projections. Add a generous additional allowance for contingencies (the cost of failure is greater than the cost of redundant resources).
Load test models should be based on traffic patterns (get these from your analytics data) and business forecasts. They should also include a contingency of AT LEAST 20% on top of the projected peak.
Load tests should also be re-run if any code changes are made prior to your peak event. Ideally, build them into your CI/CD pipeline if you run one.
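As a rough illustration of the arithmetic above, the sketch below turns a baseline peak from analytics, a business growth forecast, and a contingency into a load test target. The function name, parameters, and example figures are hypothetical; only the 20% minimum contingency comes from the guidance above.

```python
def peak_load_target(baseline_peak_rps: int, forecast_growth_pct: int,
                     contingency_pct: int = 20) -> int:
    """Return a load test target in requests per second.

    baseline_peak_rps:   busiest observed rate from analytics (assumed input)
    forecast_growth_pct: business forecast uplift, e.g. 50 for +50% (assumed input)
    contingency_pct:     safety margin on top of the projection (at least 20)
    """
    if baseline_peak_rps < 0 or forecast_growth_pct < -100:
        raise ValueError("baseline and growth must describe a non-negative load")
    if contingency_pct < 20:
        raise ValueError("contingency should be at least 20%")
    # Integer arithmetic avoids floating point rounding surprises.
    scaled = baseline_peak_rps * (100 + forecast_growth_pct) * (100 + contingency_pct)
    # Ceiling division: always round the target up, never down.
    return -(-scaled // 10_000)
```

For example, a baseline of 1,200 requests per second with a forecast uplift of 50% gives a projected peak of 1,800, and a target of 2,160 once the 20% contingency is added.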
Optimize your frontend performance to the best of your team’s ability. A widely cited rule of thumb is that around 80% of end-user response time is spent on the frontend, so start there.
Disable any 3rd parties not critical to the journey (e.g. A/B testing, social media etc). Work with key 3rd party vendors to ensure that they will be able to cope with your traffic levels. Keep them informed of what to expect during your peak event.
This is especially important for 3rd parties that are required for the user to complete their journey and also any 3rd party that is placed in the critical rendering path (especially if it is blocking page render). To improve performance, we recommend temporarily removing anything that is not 100% required.
Put a waiting room in place so that you can manage a sudden surge of traffic (as a failover) and have a clear, documented plan about when and how it will be deployed.
The splash page should contain a clear message to the user. Ideally, while users are in a holding pattern, the page displays a timer and automatically refreshes to the full website experience once their holding time is up. Examples of technology that covers this include Akamai Visitor Prioritization and Queue-it.
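To make the holding pattern concrete, here is a minimal in-memory sketch of first-come, first-served admission. It does not represent how Akamai Visitor Prioritization or Queue-it work internally; the `WaitingRoom` class, its method names, and the capacity figure are all hypothetical.

```python
from collections import deque
from typing import List, Optional


class WaitingRoom:
    """Admit visitors up to a fixed capacity; queue the rest in arrival order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity      # max concurrent visitors on the full site
        self.active = set()           # visitors currently admitted
        self.queue = deque()          # visitors held on the splash page

    def arrive(self, visitor_id: str) -> str:
        """Return "admitted" or "waiting" for this visitor."""
        if visitor_id in self.active:
            return "admitted"
        # Only admit directly when there is room and nobody is already queued.
        if len(self.active) < self.capacity and not self.queue:
            self.active.add(visitor_id)
            return "admitted"
        if visitor_id not in self.queue:
            self.queue.append(visitor_id)
        return "waiting"

    def position(self, visitor_id: str) -> Optional[int]:
        """1-based place in the queue, or None if the visitor is not waiting."""
        try:
            return self.queue.index(visitor_id) + 1
        except ValueError:
            return None

    def leave(self, visitor_id: str) -> List[str]:
        """Release a slot and return the visitors admitted from the queue."""
        self.active.discard(visitor_id)
        admitted = []
        while self.queue and len(self.active) < self.capacity:
            next_visitor = self.queue.popleft()
            self.active.add(next_visitor)
            admitted.append(next_visitor)
        return admitted
```

The splash page could poll `position` to drive the message or timer shown to the user and switch to the full site once `arrive` reports "admitted". A production waiting room would also need shared state across servers and session expiry, which this sketch leaves out.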
Temporarily increase CDN caching to offload more of your traffic to your CDN.
Offloading more traffic to your CDN helps to ensure the health of your web, application and database servers. This also helps to make your site more scalable and less prone to issues during peak traffic events. During peak traffic times the traffic can often surge for a short period, and this can temporarily overwhelm your core infrastructure.
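One way to raise CDN offload temporarily is to lengthen the shared-cache lifetime sent in the `Cache-Control` response header while a peak mode switch is on. The sketch below assumes an origin that sets its own headers; the page types and TTL values are hypothetical, and it is worth checking which directives your CDN honours before relying on them.

```python
# Hypothetical shared-cache TTLs in seconds, keyed by page type.
NORMAL_TTL = {"static_asset": 86_400, "product_page": 60, "category_page": 60}
PEAK_TTL = {"static_asset": 604_800, "product_page": 600, "category_page": 900}

# Personalised pages must never be stored in a shared cache.
NEVER_CACHE = {"cart", "checkout", "account"}


def cache_control(page_type: str, peak_mode: bool) -> str:
    """Build a Cache-Control header value for the given page type."""
    if page_type in NEVER_CACHE:
        return "private, no-store"
    ttl = (PEAK_TTL if peak_mode else NORMAL_TTL).get(page_type, 0)
    if ttl == 0:
        # Unknown page types fall back to revalidating on every request.
        return "no-cache"
    # s-maxage applies to shared caches such as a CDN; stale-if-error lets
    # the CDN keep serving a stale copy if the origin starts failing.
    return f"public, s-maxage={ttl}, stale-if-error={ttl}"
```

With these example values, a product page is cached at the CDN for 60 seconds in normal operation and for 10 minutes in peak mode, while cart, checkout and account pages are never shared.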
Ensure accountability with your team. Have a backup resource in place for all key roles should anyone not be available. Create a culture of knowledge sharing between these people and provide clear communication channels that are open during your Peak Traffic Event.
We achieve this by working with clients to create a common incident response plan, a roster that puts people in place, and a list of technology and business SMEs, with people flagged as on-call or in active monitoring mode. Making use of online bridges with multiple shared screens means that this can be managed over a Zoom call, which is especially important right now.
Introduce redundancy into your systems, and be able to fail over quickly should you need to. Having multiple payment providers, multiple CDNs and multiple DNS providers is a good idea.
We start by creating a list of critical infrastructure, and then working with SMEs in each of the technology areas to create a redundancy and a disaster recovery plan.
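As a simple illustration of failing over between redundant providers, the sketch below tries each payment provider in priority order and moves on when one raises an error. The function, the exception class, and the two stand-in providers are hypothetical and do not represent any real payment API.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def charge_with_failover(
    providers: Sequence[Tuple[str, Callable[[int], str]]],
    amount_cents: int,
) -> Tuple[str, str]:
    """Try providers in order; return (provider name, transaction reference)."""
    errors: List[str] = []
    for name, charge in providers:
        try:
            return name, charge(amount_cents)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def flaky_primary(amount_cents: int) -> str:
    raise ConnectionError("primary provider timed out")


def healthy_backup(amount_cents: int) -> str:
    return f"backup-txn-{amount_cents}"
```

Calling `charge_with_failover([("primary", flaky_primary), ("backup", healthy_backup)], 500)` falls through to the backup provider. In practice, retrying a payment against a second provider needs care (for example, confirming the first attempt did not go through) to avoid double charging.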
Consider simplifying or temporarily disabling parts of your site that are computationally expensive.
Endless scrolling creates lots of database queries, so disabling features like this can reduce load on your backend. The same applies to expensive search features. We also recommend disabling any batch jobs that are not critical to taking orders. We’ve seen websites taken down by batch jobs that were triggered towards the end of a peak event.
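A lightweight way to switch expensive features and non-critical batch jobs off for the peak window is a time-based feature flag, sketched below. The dates, feature names, and the cool-down period after the event are all assumptions for illustration; the cool-down is there so that jobs do not fire while traffic is still tailing off.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical peak window and cool-down, in UTC.
PEAK_START = datetime(2030, 11, 29, 0, 0, tzinfo=timezone.utc)
PEAK_END = datetime(2030, 12, 3, 0, 0, tzinfo=timezone.utc)
COOL_DOWN = timedelta(hours=6)

# Hypothetical names for features and jobs that are not critical to taking orders.
DISABLED_DURING_PEAK = {"endless_scroll", "search_suggestions", "nightly_report_job"}


def is_enabled(feature: str, now: datetime) -> bool:
    """Return False for non-critical features during the peak window and cool-down."""
    in_peak = PEAK_START <= now < PEAK_END + COOL_DOWN
    return not (in_peak and feature in DISABLED_DURING_PEAK)
```

A batch scheduler or page template could call `is_enabled` before running a job or rendering a feature. Anything not listed, such as checkout, stays enabled throughout.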
If you do have an outage, ensure you have communication channels open with your customers, and a communications plan in place. Capture user details so they can be emailed to resume their session once the outage has cleared.
We hope this has given you some good tips to get started with your Cyber Week and Peak Event planning. Don’t forget to check out the various services we offer and please reach out to us if you have any questions or would like to discuss your particular issues with us.