Self-serve Backup Configuration Automation Application

While working for a major financial company, I became the development owner of an application that let sys admins configure backups of their own application servers.

The organization had approximately 20,000 servers, with petabytes of backups running nightly.

A dedicated team configured backups by hand, and they were crushed by the load.

To configure a backup, the team received a request, and technicians then began vetting the requested server. They performed a variety of checks: verifying that the backup software was installed on the requested machine, verifying that the machine was listed in the company asset inventory, ensuring that chargeback information was sufficient, making sure the intended backup server could reach the machine, and about 15 more.

The software I built flipped the process on its head: it moved all responsibility for proper preparation and validation onto the sys admins.

The “Netbackup Request Portal” let a sys admin enter the name of a server. The web-based software would then run a series of tests. Here are some I remember:

  • Ping the server from the web server.
  • Ping the server from the backup server.
  • Look up the server in the corporate directory to make sure it was properly registered.
  • Verify that administrative contacts were on file.
  • Verify that no backup already existed for the server.
  • Verify that the machine’s location was in a listed datacenter.
  • Connect remotely to the backup server that would ultimately serve the machine, then run ping and tracert and open a telnet session to the client software, to ensure the client software was configured and no firewalls were in the way.

Overall there were about 20 tests like these.
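A minimal Python sketch of the telnet-style test, assuming all that matters is whether a TCP connection to the client software's port opens. The original tool was written in C#.NET and drove ping, tracert and telnet; `check_tcp_port` is a hypothetical name, and the port is left as a parameter because the right number depends on the NetBackup version and configuration. To test the path the backup will actually use, a check like this has to run on the backup server, which is presumably why the portal connected remotely to it.

```python
import socket
from typing import Tuple


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try to open a TCP connection to host:port, as a telnet session would.

    Success suggests the client software is listening and nothing on this
    network path is blocking the port. Failure does not say which of those
    two is the cause; the message carries the OS error for the user.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"{host}:{port} accepted a connection"
    except OSError as exc:  # refused, timed out, and DNS failures all land here
        return False, f"{host}:{port} unreachable: {exc}"
```

Returning a message alongside the boolean leaves room for the results screen to show the user why the check failed, not just that it did.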

Next, the portal presented the results of all tests. I’m a big proponent of telling someone not only what is wrong, but also what they need to do to fix it, so the results screen included guidance on how to fix each failed test.
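The batch approach can be sketched in a few lines of Python, standing in for the original C#.NET. Each check carries its own remediation text, every check runs even after an earlier one fails, and the report pairs each failure with what to do about it. `Check`, `validate`, `report`, and the sample inventory and contact data are hypothetical illustrations, not the portal's real code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    run: Callable[[str], bool]  # returns True when the server passes
    remedy: str                 # what the sys admin should do on failure


@dataclass
class Outcome:
    name: str
    passed: bool
    remedy: Optional[str]       # only set for failed checks


def validate(server: str, checks: List[Check]) -> List[Outcome]:
    """Run every check, even after a failure, and collect all results."""
    outcomes = []
    for check in checks:
        try:
            ok = bool(check.run(server))
        except Exception:       # a check that crashes counts as a failure
            ok = False
        outcomes.append(Outcome(check.name, ok, None if ok else check.remedy))
    return outcomes


def report(outcomes: List[Outcome]) -> str:
    """Format results, attaching the fix to each failure."""
    lines = []
    for o in outcomes:
        line = f"[{'PASS' if o.passed else 'FAIL'}] {o.name}"
        if not o.passed:
            line += f" -> {o.remedy}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical data standing in for the asset inventory and contact records.
inventory = {"app01", "app02"}
contacts = {"app01"}
checks = [
    Check("asset inventory", lambda s: s in inventory,
          "Register the server in the asset inventory."),
    Check("admin contacts", lambda s: s in contacts,
          "Add administrative contacts for the server."),
]
print(report(validate("app02", checks)))
# prints:
# [PASS] asset inventory
# [FAIL] admin contacts -> Add administrative contacts for the server.
```

Keeping the remedy next to the check itself means a new test in a later release brings its own fix-it text with it.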

If the user had any failures at this point, they would go off and fix them.

The application ran for a few years before we turned it over to a different development group. It averaged about 20 configured backups per day. We went through 4-5 releases, mostly adding tests to ensure a higher backup configuration success rate (measured by a successful backup the following night).

Innovations:

  • Running all tests and presenting the results as a batch instead of stopping at the first failure.
  • Detecting when a backup might be configured in a way that would send backup traffic across the WAN.
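One plausible way to flag WAN backup traffic, sketched in Python under two assumptions: that WAN traffic arises whenever a client and its assigned backup server sit in different datacenters, and that a hostname-to-datacenter mapping is available (the portal already checked each machine's datacenter). `wan_risk`, `suggest_servers`, and the location data are hypothetical; the original detection logic is not described in detail.

```python
from typing import Iterable, List, Mapping, Optional


def wan_risk(client: str, backup_server: str,
             location: Mapping[str, str]) -> Optional[str]:
    """Return a warning if the backup would likely cross the WAN, else None.

    `location` maps hostnames to datacenter codes (hypothetical source,
    e.g. the asset inventory). Hosts with no known location are reported,
    since the check cannot rule out WAN traffic for them.
    """
    client_dc = location.get(client)
    server_dc = location.get(backup_server)
    missing = [host for host, dc in ((client, client_dc), (backup_server, server_dc))
               if dc is None]
    if missing:
        return "location unknown for " + ", ".join(missing)
    if client_dc != server_dc:
        return (f"{client} ({client_dc}) would back up across the WAN "
                f"to {backup_server} ({server_dc})")
    return None


def suggest_servers(client: str, servers: Iterable[str],
                    location: Mapping[str, str]) -> List[str]:
    """List backup servers in the client's own datacenter, as a suggested fix."""
    dc = location.get(client)
    if dc is None:
        return []
    return [s for s in servers if location.get(s) == dc]
```

Returning same-datacenter alternatives follows the same idea as the results screen: tell the user what to do, not just what is wrong.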

Technology:
Network command-line tools, C#.NET, T-SQL, Perl, NetBackup command-line querying.