Archive for November 2009
30
Cyber Monday verdict is in: Kohls having serious trouble keeping up
No comments · Posted by Patrick Lightbody in Industry News
Using our own browser-based monitoring service, we’ve been watching some of the top retailers this morning to see how they fare during the Cyber Monday onslaught as US shoppers return to work after the Thanksgiving holiday. Of the sites that we’ve been monitoring, very few have had any problems so far.
While Amazon, Kmart, Macy’s, Walmart, and Best Buy all experienced minor site performance problems, only Kohls has outright failed so far. Starting around 7AM PST, their home page response time went from 2.5 seconds to, in some cases, over 25 seconds (a 10X slowdown). At one point it was completely offline.
Check out some of the screenshots our monitoring service captured:



No tags
24
Optimizing WordPress performance by disabling plugins and using BrowserMob’s monitoring service
No comments · Posted by Patrick Lightbody in Industry News
We were recently delighted to see a detailed writeup about optimizing WordPress by Rob Havasy, a blogger by night and business analyst by day. He’s been running his WordPress-powered blog since May 2009 and everything had been going quite nicely until recently:
But I noticed a sudden decrease in performance earlier this week and couldn’t understand why. I was having both resource issues on the server (PHP was consuming too much processor capacity and my host automatically killing the process occasionally) and an overall slowness on the pages. How would I track down what was going on?
Rob goes on to explain how one of the plugins he was using had begun to have performance issues (apparently due to a third-party API slowing down). His WordPress-based site’s performance improved dramatically only after he disabled the plugin. The best part? He discovered the problem and confirmed the fix worked by using BrowserMob’s free monitoring service.
No tags
16
Why your site could be slow, even with low CPU/RAM/disk utilization
3 Comments · Posted by Patrick Lightbody in Load Testing Tips
We often have customers ask us why their site appeared to slow down significantly even though their CPU, RAM, and disk utilization did not rise significantly. While those three metrics are often good indicators of why systems can “slow down”, there are many other causes of performance problems. Today, we’re going to discuss one common root cause for slow websites that often gets overlooked: connection management.
Until very recently, most web browsers would only open a maximum of two connections per host, as recommended by the original HTTP/1.1 specification. This meant that if 1000 users all hit your home page at the same time, you could expect ~2000 open connections to your server. Let’s suppose that each connection consumes, on average, 0.01% of the server’s CPU and causes no significant RAM or disk activity.
That would mean that 2000 connections should be consuming 20% of the CPU, leaving a full 80% ready to handle additional load – or that the server should be able to handle another 4X load (4000 more users). However, this type of analysis fails to account for many other variables, most importantly the web server’s connection management settings.
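The back-of-the-envelope estimate above can be written out as a short Python sketch. The figures are the illustrative ones from the text (1000 users, two connections each, 0.01% CPU per connection), not measurements, and the function name is made up for this example.

```python
# Illustrative figures from the text above; none of these are measured values.
USERS = 1000
CONNECTIONS_PER_USER = 2          # older browsers: two connections per host
CPU_PER_CONNECTION_PCT = 0.01     # assumed average CPU cost per connection


def cpu_headroom(users, conns_per_user, cpu_per_conn_pct):
    """Return (open connections, CPU % used, extra users that CPU alone allows)."""
    connections = users * conns_per_user
    cpu_used_pct = connections * cpu_per_conn_pct
    cpu_free_pct = 100.0 - cpu_used_pct
    extra_connections = cpu_free_pct / cpu_per_conn_pct
    # Round to the nearest whole user to absorb floating-point noise.
    extra_users = round(extra_connections / conns_per_user)
    return connections, cpu_used_pct, extra_users


connections, cpu_used, extra_users = cpu_headroom(
    USERS, CONNECTIONS_PER_USER, CPU_PER_CONNECTION_PCT
)
print(f"{connections} connections use about {cpu_used:.0f}% CPU")
print(f"CPU alone suggests room for about {extra_users} more users")
```

This only models CPU. It says nothing about how many connections the web server is actually configured to accept, which is the gap the rest of this post is about.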
Just about every web server available today (Apache, IIS, nginx, lighttpd, etc.) has one or more settings that control how connections are handled. These include connection pooling, maximum allowed connections, Keep-Alive timeout values, etc. They all work basically the same way:
- When a request (connection) comes in to the server, the server will look at the maximum active connections setting (e.g., MaxClients in Apache) and decide if it can handle the request.
- If it can, the request is processed and the number of active connections is incremented by one.
- If it can’t, the request is placed into a queue, where it will wait in line until it can finally be processed.
- If that queue is too long (also a configuration setting in the server), the request will be rejected outright, usually with a 503 response code.
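The four steps above can be sketched as a toy Python model. This is not how any particular web server is implemented; the class name, method names, and the `max_queue` parameter are all hypothetical and exist only to make the accept / queue / reject decision concrete.

```python
from collections import deque


class ConnectionGate:
    """Toy model of a web server's admission logic (not any real server's code)."""

    def __init__(self, max_active, max_queue):
        self.max_active = max_active   # plays the role of a setting like MaxClients
        self.max_queue = max_queue     # hypothetical limit on the waiting queue
        self.active = 0
        self.queue = deque()

    def accept(self, request_id):
        """Decide what happens to a newly arrived request."""
        if self.active < self.max_active:
            self.active += 1
            return "processing"
        if len(self.queue) < self.max_queue:
            self.queue.append(request_id)
            return "queued"
        return "rejected (503)"

    def finish_one(self):
        """One active request completes; promote the oldest queued request, if any."""
        if self.active == 0:
            return None
        self.active -= 1
        if self.queue:
            self.active += 1
            return self.queue.popleft()
        return None


# Tiny limits so every branch is exercised with only four requests.
gate = ConnectionGate(max_active=2, max_queue=1)
outcomes = [gate.accept(i) for i in range(4)]
print(outcomes)
promoted = gate.finish_one()
print("promoted from queue:", promoted)
```

With limits this small, the first two requests are processed, the third waits, and the fourth is turned away. Real servers add timeouts and Keep-Alive handling on top of this, which the sketch leaves out.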
It’s this queue that can make your site appear slow, despite low server utilization. Say the server allows up to 256 concurrent requests and each request takes 1 second to complete. That means if 1000 users visited the site at the same time, causing 2000 requests, then the first 128 (256/2) users would get a 1 second response time, the second 128 users would get a 2 second response time, and the last user would get an EIGHT SECOND response time.
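To check those numbers, here is a minimal Python sketch of the same scenario. It assumes a deliberately simplified model: requests are served strictly in arrival order, in full "waves" of 256, and every request takes exactly 1 second. The function name and parameters are invented for this example.

```python
def response_time_seconds(user_index, conns_per_user, max_concurrent, seconds_per_request):
    """Response time for the user at zero-based position user_index in arrival order.

    Simplified model: requests are handled in waves of max_concurrent, and a
    user is done when the wave containing their last request finishes.
    """
    last_request = (user_index + 1) * conns_per_user                 # 1-based request number
    wave = (last_request + max_concurrent - 1) // max_concurrent     # integer ceiling
    return wave * seconds_per_request


USERS = 1000
times = [response_time_seconds(u, 2, 256, 1) for u in range(USERS)]

print("first user:", times[0], "s")
print("user 129:  ", times[128], "s")
print("last user: ", times[-1], "s")
```

Under this model, the 2000 requests need eight waves (2000 / 256 rounds up to 8), so the last users wait eight seconds even though each individual request still takes only one second of server time.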
The simple solution is to raise the concurrent request limit. However, be careful here: if you raise it too high, your server may not have enough CPU or RAM to handle all the requests, resulting in all users being affected (rather than just some of them, as in the last example).
Also remember that not all requests are equal: a request for a dynamic search result will be much more expensive than one for a static CSS file. This is why larger sites optimize their hosting by placing static files on dedicated web servers with different configurations, usually with host names like images.example.com, while leaving their more complex content to be handled by a larger number of servers with fewer concurrent requests on each server.
So next time you’re wondering why your site is slow, take a look at more than just CPU and RAM. Find out how the server is processing the content and see if perhaps your web server is the bottleneck.
No tags
