High Availability Authentication Upgrade

Visp.net’s authentication systems have been completely upgraded and re-architected to deliver extraordinary uptime.
Uptime support systems:

  • Monitoring and alerts: 36 authentication points are monitored by three separate monitoring systems. Load thresholds are set at conservative limits. If a threshold is passed or a system failure is detected, our system administrators are notified.
  • Backup alerts: A secondary alert system also notifies our system admin team if a threshold has been reached or a failure has been detected.
  • Auth Proofs: We’ve implemented end-to-end authentication tests, from both inside and outside our network. Currently every twelve seconds, a test simulates a live authentication to prove that a request from the NAS completes the full round trip and that all systems (or auto-failover systems) in the authentication loop are functional. If there’s a failure, we get an alert. We also graph the success and latency of each authentication.
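
For readers who want to picture the Auth Proofs loop, here is a minimal Python sketch. It is not Visp.net’s actual tooling: the names run_probe, probe_loop, alert, and record are hypothetical, and the authenticate callable stands in for whatever sends a real test login through the authentication path. Only the twelve-second interval comes from the description above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

PROBE_INTERVAL_SECONDS = 12  # interval stated in the post


@dataclass
class ProbeResult:
    ok: bool
    latency_ms: float
    error: Optional[str] = None


def run_probe(authenticate: Callable[[], bool],
              clock: Callable[[], float] = time.monotonic) -> ProbeResult:
    """Run one synthetic authentication and measure how long it took."""
    start = clock()
    try:
        ok = authenticate()  # hypothetical: sends a test login end to end
        error = None if ok else "authentication rejected"
    except Exception as exc:  # a timeout or network error counts as a failure
        ok, error = False, str(exc)
    latency_ms = (clock() - start) * 1000.0
    return ProbeResult(ok=ok, latency_ms=latency_ms, error=error)


def probe_loop(authenticate: Callable[[], bool],
               alert: Callable[[str], None],
               record: Callable[[ProbeResult], None],
               rounds: int,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Probe repeatedly, recording every result and alerting on failures."""
    for i in range(rounds):
        result = run_probe(authenticate)
        record(result)  # e.g. feed a success/latency graph
        if not result.ok:
            alert(f"auth probe failed: {result.error}")
        if i < rounds - 1:
            sleep(PROBE_INTERVAL_SECONDS)
```

Running the same loop from a host inside the network and from one outside it would cover both vantage points mentioned above.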

Architecture as of April 13, 2017

  • Load balancers: Proxy servers automatically route your authentication requests to the most available of the (currently two) redundant RADIUS servers. If one of the RADIUS servers fails, requests are automatically routed to the other server.
  • Redundant proxies: In the event one proxy server fails (due to underlying hardware failure, for instance), your NAS can fail over to a secondary proxy.
  • High capacity: As of the date of this post, each proxy server and RADIUS server is running at less than 1% CPU load. The database is averaging well under 5% CPU load, with peaks rarely over 10%.
  • Separate data center zones: Each authentication path operates from a separate “zone”, meaning it operates on separate power circuits, separate backup power, separate internet circuits, etc. Only the UBO Database currently runs in a single zone (2a) and is therefore a single point of failure, but it is already pre-staged to be multi-zoned once the upgrades are complete.
  • Self-restarting servers: In the event our monitoring systems detect an unresponsive server, the server is automatically restarted on the fly to maintain maximum uptime with no system administrator involvement. Meanwhile, your auth requests fail over to the redundant path.
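
The routing described in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production configuration: the RadiusServer and Proxy classes, the load field used to decide which server is “most available”, and the pick_server and route functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RadiusServer:
    name: str
    healthy: bool
    load: float  # assumed metric: 0.0 (idle) to 1.0 (saturated)


@dataclass
class Proxy:
    name: str
    up: bool
    servers: List[RadiusServer]


def pick_server(servers: List[RadiusServer]) -> Optional[RadiusServer]:
    """Return the healthy server with the lowest load, or None if all are down."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.load)


def route(proxies: List[Proxy]) -> Optional[Tuple[str, str]]:
    """Try proxies in order, as a NAS with a primary and secondary would."""
    for proxy in proxies:
        if not proxy.up:
            continue  # NAS fails over to the next proxy
        server = pick_server(proxy.servers)
        if server is not None:
            return proxy.name, server.name
    return None  # no working path among these proxies
```

With two proxies and two RADIUS servers, a request still finds a path after any single proxy or server failure.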

Server failures happen. It’s a fact of life; hardware fails and software fails. But with HA architecture, your authentications are redundant and highly fault tolerant. Here’s a sample of a recent failure in which the server was automatically restarted, within just a few minutes, using this technology. In this case the redundant backup server took over authentication while the failed server recovered itself, quickly restoring redundancy. This is what it looks like in our monitoring systems:

  • Self-healing servers: In the event our monitoring systems detect a hardware failure as we’ve experienced in the past, the server is automatically rebuilt on the fly on other hardware as needed to maintain maximum uptime with no system administrator involvement. Meanwhile, your auth requests fail over to the redundant path.
  • Redundant cloud providers: In the event authentications fail on both redundant systems, or the entire cloud fails, authentications are routed to backup authentication systems on the Google cloud, where you can choose among three different speeds to best match your SLA for a given NAS. When this backup system is reached, the backup RADIUS authorizes all requests with a one-hour timeout until the primary authentication systems are recovered. This makes a total of three separate authentication paths to ensure your paid subscribers always have access and to minimize phone calls caused by erroneous authentication failures.
  • Rapidly upgradable: Each of these servers is rapidly upgradable to handle changes in load and to accommodate new features.
  • Fully mirrored staging environment: We’ve built a fully mirrored staging environment for thorough testing of other new improvements.
  • Direct access to our team: For anyone interested, Visp.net hosts a RADIUS channel, called the “RAD” channel, on Skype for direct communication with our support teams.
  • Team hours and cross training: Visp.net keeps technical staff on duty 18 hours a day, over two shifts, five days a week. A minimum of three technical staff are on call after hours. Key technical staff are cross-trained to diagnose and recover these mission-critical systems.
  • Phone system improvement: An issue with the phone system that misdirected after-hours support calls during certain hours to our daytime greeting has been identified and corrected.
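
The three authentication paths described in the list above (two redundant primary paths, then the backup cloud) can be pictured with the following Python sketch. It is a simplified illustration, not the actual implementation: the AuthDecision type, the authenticate function, and the use of ConnectionError to signal an unreachable path are assumptions. Only the accept-all behavior and the one-hour timeout come from the post. In RADIUS terms the timeout would presumably be delivered as a Session-Timeout attribute, but the post does not say so.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

BACKUP_SESSION_TIMEOUT_SECONDS = 3600  # the one-hour timeout from the post

# A path returns True (accept) or False (reject), or raises ConnectionError
# when it cannot be reached at all (assumed convention for this sketch).
AuthPath = Callable[[str, str], bool]


@dataclass
class AuthDecision:
    accepted: bool
    source: str                      # "primary" or "backup"
    session_timeout: Optional[int]   # seconds; None means no forced timeout


def authenticate(username: str, password: str,
                 primary_paths: Sequence[AuthPath]) -> AuthDecision:
    """Try each primary path in order, then fall back to the backup."""
    for path in primary_paths:
        try:
            accepted = path(username, password)
        except ConnectionError:
            continue  # this path is down, try the next one
        # A working path gave a real answer; a reject here is final.
        return AuthDecision(accepted, "primary", None)
    # Every primary path was unreachable: the backup accepts all requests
    # but limits the session so users re-authenticate against the primary
    # systems once those have recovered.
    return AuthDecision(True, "backup", BACKUP_SESSION_TIMEOUT_SECONDS)
```

Note the design choice in the sketch: only an unreachable path triggers failover. A rejection from a healthy primary path is treated as a genuine answer, so the accept-all backup applies only during an outage.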

Many of our best ideas come from you, and these result in improvements that build value for the entire WISP industry. So we invite you to join us for an open forum to review and help us improve these systems and to explore other ways to streamline your operations. The Technical CEO Roundtable is held on the first and third Tuesday of each month at 3 PM Pacific. Simply contact dpacquiao@visp.net to have you and your key staff included in the invitation.