The definition of monitoring has been changing over time. In the not-so-distant past, network monitoring meant watching an organization's communication interfaces and being notified when they broke down. A simple "ping" command, for instance, can tell whether an addressed network device is unavailable. Such network monitoring is reactive in nature: it is helpful only in pointing out a device that needs to be fixed. This may be an acceptable technique for organizations that can afford to have their network down while repairs are made. It is unacceptable where uptime and performance requirements are tighter, and it will usually fail for modern network designs.
The CERN network is a complex, dynamic and highly redundant multi-vendor environment made of continuously evolving components and technologies: state-of-the-art equipment using multilink connections, load balancing and mesh topologies.
As in any complex system, faults may occur, but:
- Users should not be the primary detection mechanism.
- The cause of the fault has to be identified and presented to the right people, in a language they understand, so that corrective action can be taken on a 24x7 basis.
- Time to repair must be reduced.
Continuous innovation in the organization's goals brings a need to sustain network operations in an environment that is moving faster, is more outwardly focused, and comprises both internal and external services. The challenge for the Communication Systems group is to deliver and manage an infrastructure that is both reliable and capable of sustaining a high rate of change. This is only possible through careful infrastructure design and the implementation of management processes. We are also continuing to improve the level of management automation by deploying the 'best of breed' monitoring tools available in the Network and Systems Management industry.