Traffic management upgrade
As you may or may not know, here at PlusNet we employ a traffic management system to ensure that network traffic for "interactive" applications (like web browsing, gaming and instant messaging) is treated with higher priority than, for instance, peer-to-peer traffic. We also use this system to prioritise traffic for customers on different products, to manage peak-time and off-peak traffic, and to slow down users who exceed their usage limits. This last point is one of the reasons why we are open about our network management, have clearly defined usage limits and offer our customers the ability to purchase additional bandwidth should they use more than their product allows - we cannot fairly manage users' usage unless they have a clear understanding of what their limits are.

The system we use is supplied by Arbor Networks and uses an Oracle database to store users' profiles and usage counts, split by protocol. Until recently this Oracle database ran on a single server, with another server acting as a disaster-recovery (DR) failover: if the main database server died in any way, we could continue running the system on the DR box. This failover, however, would have had to be done manually, and could have caused problems for our customers (slow browsing, high latency for gamers and so on) until we had the other database server running. Of course, as sod's law would have it, this would probably have happened in the middle of the night, resulting in a massive callout for our on-call engineers as well as many others in the business. All in all, it would not have been much fun for anybody.

We have now deployed a much more resilient system (just typing this makes me feel that I'm tempting fate). The live database now consists of:
- Two SunFire X4100 servers, with dual Opteron processors and 8 GB of memory
- Two Sun StorEdge 3510 Disk Packs, with 12 x 144 GB FCAL disks and dual RAID controllers
These servers are running the latest version of Oracle 10g, along with Oracle's ASM (Automatic Storage Management) and RAC (Real Application Clusters). This setup means that both servers access the same database, and that database is simultaneously updated on both disk packs. With two servers, two disk packs, four RAID controllers and this Oracle setup, the system is protected from:
- Complete loss of one of the servers;
- Complete loss of one of the disk packs;
- Loss of one of the RAID controllers
and would carry on working with minimal impact on the service we provide. We can also fix whatever component has failed while the system is still running! We also still have a disaster-recovery setup (to protect against the loss of multiple servers or disk packs), which consists of Oracle 10g running on a single server with a single disk pack.

As well as being more resilient, the new system has much higher data throughput. This is most clearly demonstrated by our usage accounting servers, which now take around 10 minutes to process an hour's worth of usage data; before the upgrade they took around 30 minutes.

I hope this blog has given you an insight into how we at PlusNet manage network traffic, along with an idea of the behind-the-scenes improvements that we constantly make and that (mostly) only affect you, our customers, in a positive way.

Thanks for reading,
Alan Langridge
PlusNet Operational DBA
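
As a postscript for the curious, the resilience claims above can be checked with a toy model. This is only a rough sketch and has nothing to do with how Oracle RAC or ASM actually work internally: the component names are made up for illustration, and the rule that the service needs at least one server plus at least one disk pack with a working RAID controller is an assumption drawn from the description in this post.

```python
from itertools import combinations

# Hypothetical component names; only the counts come from the post
# (two servers, two disk packs, two RAID controllers per disk pack).
SERVERS = ("server_a", "server_b")
PACKS = ("pack_1", "pack_2")
CONTROLLERS = {pack: (f"{pack}_ctrl_a", f"{pack}_ctrl_b") for pack in PACKS}
COMPONENTS = sorted(
    set(SERVERS) | set(PACKS) | {c for ctrls in CONTROLLERS.values() for c in ctrls}
)


def pack_usable(pack, failed):
    """A disk pack is usable if it is up and at least one of its controllers is up."""
    if pack in failed:
        return False
    return any(ctrl not in failed for ctrl in CONTROLLERS[pack])


def service_up(failed):
    """Assumed rule: the service needs one live server and one usable disk pack."""
    server_ok = any(server not in failed for server in SERVERS)
    storage_ok = any(pack_usable(pack, failed) for pack in PACKS)
    return server_ok and storage_ok


def fatal_failures(n):
    """Return every set of n simultaneous component failures that stops the service."""
    return [
        set(combo)
        for combo in combinations(COMPONENTS, n)
        if not service_up(set(combo))
    ]
```

Under these assumptions, `fatal_failures(1)` comes back empty, so no single server, disk pack or RAID controller failure stops the service. `fatal_failures(2)` lists only the loss of both servers or both disk packs, which are the cases the disaster-recovery setup is there to cover.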