The night of April 13th could have been devastating for the average ISP. Thanks to world-class back-office management, you are only hearing about it now, and your subscribers, blissfully unaware, never lost a single bit of data.
A critical core server in our infrastructure (the primary database server) had a hard drive meltdown. For many ISPs this could have resulted in tremendous data loss, or at the very least significant downtime while the server was repaired and backups were restored. I’ve seen ISPs go out of business entirely after a crash like this because they could not restore service fast enough and their customers left for a different ISP. The great news about this crash: for visp.net ISPs, the failure and the subsequent repair went 100% unnoticed by ISPs and subscribers alike. Business went on as usual, as if nothing had happened.
ISPs and subscribers did not notice this outage for many reasons:
- The database server had redundant drives that instantaneously took up the load.
- Database load is distributed via multiple (also redundant) load balancers.
- Had there been a multiple-drive failure or a system-wide crash, a backup database server is standing by with a replicated (mirrored) set of data, ready to take over automatically.
- The maintenance to replace the defective drive also went completely unnoticed because the redundant server gracefully handled all queries in real time while the defective drive was being replaced and the RAID array was rebuilt.
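To make the failover idea concrete, here is a minimal sketch of a query path that falls back from a primary database to a replicated standby. It is only an illustration: the names `DatabaseNode`, `NodeDown`, and `run_query` are hypothetical, the "database" is an in-memory dictionary, and this is not visp.net's actual software.

```python
class NodeDown(Exception):
    """Raised when a database node cannot serve queries."""


class DatabaseNode:
    """Toy stand-in for a database server holding a replicated data set."""

    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = dict(data)  # each node keeps its own mirrored copy
        self.healthy = healthy

    def get(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return self.data[key]


def run_query(nodes, key):
    """Try each node in priority order; return (value, serving node name)."""
    for node in nodes:
        try:
            return node.get(key), node.name
        except NodeDown:
            continue  # fall through to the next replica
    raise NodeDown("all database nodes are unavailable")


# Hypothetical sample record, mirrored on both nodes.
records = {"subscriber:1001": "active"}
primary = DatabaseNode("primary", records)
standby = DatabaseNode("standby", records)

before = run_query([primary, standby], "subscriber:1001")
primary.healthy = False  # simulate the primary going offline for repair
after = run_query([primary, standby], "subscriber:1001")
print(before, after)
```

The caller receives the same answer before and after the simulated failure; only the name of the serving node changes. That is the property that lets a repair go unnoticed.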
visp.net’s highly experienced technicians came in at 12:00 AM on April 14th to replace the defective hard drive, and not a single visp.net ISP had to manage it, think about it, or wake up to make sure it happened. It just happened, as have many other hardware replacements in the past, and went completely unnoticed by ISPs and subscribers. The result is that ISPs and subscribers experience incredibly reliable service.
This one server is just a single example of how visp.net’s redundancy protects your business and your subscribers’ data. visp.net’s servers and routers have their fair share of problems and outages just like those in any other data center; however, through our extensive redundancy measures we are able to ensure these outages have little or no impact on your business. Every critical system at visp.net has the same or similar redundancy features as the database server to maximize uptime. Take a look at some of the levels of redundancy we provide.
Power: Every server is protected by an enterprise-grade UPS (uninterruptible power supply). The UPS protects the servers and routers against power fluctuations and power outages. For extended outages we have a backup generator connected to an automatic transfer switch, which moves our NOC onto generator power when utility power fails.
Internet Connectivity: The visp.net NOC currently has three separate high-grade fiber connections to the internet. These connections are load balanced and fail over automatically by means of the BGPv4 protocol. The circuits come in via separate physical paths to minimize the chance of an outage caused by accidental damage to a physical circuit. These circuits terminate in our facility on redundant Cisco routers, which also have redundant power supplies. The routers employ powerful load balancing protocols to ensure that even if one router fails completely, the other routers automatically handle all the internet traffic.
Load Balancers: We don’t have just one load balancer, but several. One could do the job; however, just like servers, load balancers contain electronics which could fail. That’s very unlikely because of their solid-state design, but visp.net isn’t taking any chances. We run multiple units so that in the unlikely event one fails, the backup automatically takes its place and resumes distributing traffic within just a few heartbeats.
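For readers curious what "within just a few heartbeats" can look like in practice, here is a small sketch of heartbeat-based failover. The class name `HeartbeatMonitor`, the one-second interval, and the three-missed-beats threshold are assumptions chosen for illustration, not the settings of our actual load balancers.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat received from the active load balancer."""

    def __init__(self, interval=1.0, missed_allowed=3):
        self.interval = interval              # seconds between heartbeats (assumed)
        self.missed_allowed = missed_allowed  # missed beats tolerated (assumed)
        self.last_seen = 0.0
        self.active = "lb-primary"

    def heartbeat(self, now):
        """Record that the primary reported in at time `now`."""
        self.last_seen = now

    def check(self, now):
        """Promote the backup once too many heartbeats have been missed."""
        silent_for = now - self.last_seen
        tolerance = self.interval * self.missed_allowed
        if self.active == "lb-primary" and silent_for > tolerance:
            self.active = "lb-backup"
        return self.active


monitor = HeartbeatMonitor(interval=1.0, missed_allowed=3)
monitor.heartbeat(now=10.0)
state_ok = monitor.check(now=12.0)      # 2 s of silence: within tolerance
state_failed = monitor.check(now=13.5)  # 3.5 s of silence: backup takes over
print(state_ok, state_failed)
```

Timestamps are passed in explicitly rather than read from the system clock, which keeps the example deterministic. A shorter interval or a lower threshold speeds up failover but increases the chance of a false alarm.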
RAID: RAID stands for Redundant Array of Independent Disks. Hard drives tend to be the most common failure point in any server. In our NOC we primarily make use of RAID-1; however, on ultra-high-performance machines we also use RAID-10. RAID-1 has two benefits. First, it can provide up to double the read speed: because each disk holds a duplicate set of the data, the system can read from whichever disk is least loaded. Second, it provides the best level of redundancy because it keeps identical copies of the data on two identical disks. In the event of a failure of one disk, all reads and writes automatically fail over to the functional disk. RAID-10 has the same level of redundancy as RAID-1, and it also provides additional speed by striping data across mirrored pairs.
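The mirroring behavior can be modeled in a few lines. The sketch below is a toy, not a storage driver: `Raid1Mirror` is a hypothetical class, the "disks" are dictionaries, load is approximated by a simple read counter, and rebuilding a replaced disk is not modeled.

```python
class Raid1Mirror:
    """Toy model of a two-disk mirror."""

    def __init__(self):
        self.disks = [{}, {}]        # two identical copies of the data
        self.online = [True, True]
        self.reads_served = [0, 0]   # crude stand-in for per-disk load

    def write(self, block, value):
        # Every write goes to every working disk so the copies stay identical.
        for i, disk in enumerate(self.disks):
            if self.online[i]:
                disk[block] = value

    def read(self, block):
        # Serve the read from the least loaded disk that is still online.
        candidates = [i for i in range(2) if self.online[i]]
        if not candidates:
            raise IOError("both disks have failed; restore from backup")
        chosen = min(candidates, key=lambda i: self.reads_served[i])
        self.reads_served[chosen] += 1
        return self.disks[chosen][block], chosen

    def fail(self, index):
        """Simulate a drive failure."""
        self.online[index] = False


array = Raid1Mirror()
array.write(0, "billing record")
first = array.read(0)    # both disks idle, so disk 0 serves the read
second = array.read(0)   # disk 1 is now the least loaded
array.fail(0)            # simulate a drive meltdown
third = array.read(0)    # the surviving disk still returns the data
print(first, second, third)
```

Reads alternate between the two disks while both are healthy, which is where the read-speed benefit comes from, and the data remains available after one disk fails.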
Redundant Servers: For mission-critical applications such as authentication, databases, and email filtering, there are multiple servers distributing the load. Just as RAID-1 mirrors data across two independent disks, our mission-critical services run on multiple servers and fail over automatically in the event of a full server outage. This means critical services are protected on multiple levels: first by RAID for disk problems, and second by redundant servers for problems that extend beyond the disks.
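A little arithmetic shows why stacking these layers helps. The figures below are hypothetical placeholders, not measured visp.net numbers, and the calculation assumes failures are independent, which real hardware only approximates.

```python
def redundant_availability(single, copies):
    """Chance that at least one of `copies` independent units is working."""
    return 1 - (1 - single) ** copies


# Hypothetical figures for illustration only.
disk = 0.99              # availability of one disk
rest_of_server = 0.99    # everything in a server other than its disks

# Layer 1: two mirrored disks (RAID-1) inside one server.
mirrored_disks = redundant_availability(disk, 2)

# A server works only if its storage AND the rest of the machine work.
server = mirrored_disks * rest_of_server

# Layer 2: two such servers, either of which can carry the service.
server_pair = redundant_availability(server, 2)

print(f"single disk:     {disk:.6f}")
print(f"mirrored disks:  {mirrored_disks:.6f}")
print(f"one server:      {server:.6f}")
print(f"redundant pair:  {server_pair:.6f}")
```

Under these assumptions, mirroring removes most of the disk risk, but a lone server is still limited by its other components. Only the second server recovers that loss, which is why both layers are worth having.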
Enterprise Grade Equipment: If you walk around our NOC you will see or hear a lot of terms: Cisco, Xeon, Opteron, ECC, ES Class, and so on. These are names or acronyms for the equipment we use in our NOC. Our servers are of far higher quality than the typical workstation. A lot of ISPs choose to run servers on workstation-class hardware, which in the long run leads to data loss. We choose to run enterprise-level gear, in addition to all the redundancy discussed here, to ensure your customers experience the best class of service available.
Backups: Despite all the redundancy and protections listed here, we still account for catastrophic situations. If multiple hard drives in a server failed at once, defeating the RAID-1, and the backup servers simultaneously suffered the same unlikely failure, we would risk data loss. Although the odds of this happening are virtually nonexistent, we still maintain frequent backups of all data. Typically every server gets a full weekly backup, which is stored both on-site and off-site. Additionally, servers which contain extremely important data (for instance, your Ultimate Front Office billing data) are backed up more frequently.
All of this means our NOC has nearly twice as many servers as are required and two to three times as many hard drives as are required. We invest in these protections because we know computers are prone to failure, and we want to take every precaution to make sure all visp.net ISPs are able to provide the best class of uninterrupted service available.