How IT Monitoring Tools Would Have Eliminated Costly Downtime for CPF

2015-12-15 NetGain Systems

CPF (Central Provident Fund) and Microsoft announced that the downtime of the CPF E-Services was caused by a failed component that “did not function as intended under situations of high member traffic”. Would an IT monitoring solution have prevented such downtime?

Microsoft had to rely on its experts to find the root cause, and it was reported that some CPF E-Services were still down two weeks after the downtime was first reported.

An IT monitoring solution, like NetGain Enterprise Manager, would have alerted the CPF IT team as soon as the traffic threshold was crossed.

Such downtime highlights how a failure in a single component can result in a loss of confidence and reputation for both organisations. It would not be surprising if they had to shift resources to deal with the non-technical fallout of the downtime, such as the CPF members’ unhappiness.

In a network infrastructure, a component failure rarely happens suddenly. A component will usually go down after a series of cascading issues. It is not possible to monitor such components with the naked eye; an IT monitoring solution polls the devices via SNMP and displays the status of the components on screen.

In the case of CPF E-Services, it was noted that high member traffic was the cause of the issue. If a threshold level had been set up in an IT monitoring solution to alert the IT team of high member traffic, the team would have had more time to react and prepare for the surge in traffic.
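The idea of a threshold alert can be illustrated with a minimal Python sketch. It is not NetGain Enterprise Manager's implementation; the `Threshold` class, the `classify` and `first_alert` helpers, the warning and critical levels, and the traffic figures are all hypothetical and chosen only to show how a warning level gives a team notice before a critical level is reached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Threshold:
    """Hypothetical warning and critical levels for one metric."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Return 'ok', 'warning' or 'critical' for a single sample."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def first_alert(samples: List[float], threshold: Threshold) -> Optional[int]:
    """Return the index of the first sample at or above the warning level."""
    for index, value in enumerate(samples):
        if classify(value, threshold) != "ok":
            return index
    return None


# Made-up requests-per-second samples showing a gradual surge in traffic.
traffic = [120, 150, 310, 480, 720, 950]
limits = Threshold(warning=400, critical=900)
alert_at = first_alert(traffic, limits)
```

In this made-up series the warning fires at the fourth sample, two polling intervals before traffic reaches the critical level, which is the extra reaction time the paragraph above describes.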


The NetGain IT monitoring solution sends out SMS alerts when threshold levels are crossed!

A NetGain Systems customer shared that he was alerted to unusually high CPU utilisation on his public server and immediately logged in to the server remotely to check. Upon going through the checklist, he found that the public server was at the start of a DDoS attack. The early alert gave him more time to react to the issue before the downtime reached the end users.

Group your IT devices into a service for faster root cause analysis.


NetGain Systems’ Business View feature also allows IT teams to group their devices into a service-related hierarchy. This visual presentation helps IT teams quickly discover how a down component or device can affect a particular service.
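A service-related hierarchy of this kind can be sketched in a few lines of Python. This is only an illustration of the concept, not how Business View works internally; the `DEPENDS_ON` mapping, the device and service names, and the `affected_by` helper are hypothetical, and the sketch assumes a service counts as affected whenever anything beneath it is down.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy: each service or tier lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "e-services-portal": ["web-tier", "database"],
    "web-tier": ["load-balancer", "web-server-1", "web-server-2"],
    "database": ["db-server", "storage-array"],
}


def affected_by(component: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every entry that directly or indirectly depends on component."""
    affected: Set[str] = set()
    changed = True
    # Repeat until a full pass adds nothing new, so indirect parents are found.
    while changed:
        changed = False
        for parent, children in depends_on.items():
            if parent in affected:
                continue
            if any(c == component or c in affected for c in children):
                affected.add(parent)
                changed = True
    return affected


impacted = affected_by("storage-array", DEPENDS_ON)
```

Here a failed storage array is traced up to the database tier and then to the portal that members actually use, which is the kind of link between a down device and a service that speeds up root cause analysis.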

As more government agencies go online to provide E-services and more citizens come to accept them, downtime is no longer an option for these organisations.