In this post, we will discuss various methods of collecting network telemetry and identify best practices. We will wrap up with some of the design decisions in Anuta ATOM that optimize telemetry collection.
Introduction to Network Telemetry
The demand for network monitoring and optimization grows every day. Monitoring software has to collect network state in real time and act on anomalies to keep the network healthy and reliable. There are many collection methods, such as CLI, SNMP, NetFlow, Syslog, NETCONF, sFlow, Streaming Telemetry, and PCEP. Reliable and efficient collection is the first phase of Closed Loop Automation.
The traditional way of fetching operational data via SNMP is a pull model: the collector polls each device at intervals, so the data is not real-time. It also requires complex parsing logic and suffers from performance and scale issues.
Streaming Telemetry is quickly emerging as a viable alternative to SNMP. Many leading vendors such as Cisco, Arista, and Juniper support model-driven telemetry collection, in which the device pushes structured data that is efficient to transport and easy to decode. The result is a scalable monitoring infrastructure with analytics-ready data.
SNMP is Dead… Long live SNMP…
So, are we ready to write off SNMP monitoring? Not yet. Streaming Telemetry is still evolving, vendor support is limited, and standardization is incomplete. And then there is the human resistance to embracing new technology. So, we will have both SNMP and Streaming Telemetry based network monitoring for the foreseeable future.
So, what is the best method to collect operational data? The answer depends on the organization's requirements and platform choices. As we go through the list, you will find that no single collection option is complete and that each has its own merits.
Horses for Courses:
1. Streaming Telemetry is best suited for collecting operational data from the data plane and control plane. It enables granular statistics such as NPU metrics and real-time counters for every protocol. If you run mission-critical networks, Streaming Telemetry is your first choice. Below are some use cases that benefit from Streaming Telemetry.
- To collect the CPU and memory usage of each line card or node so that the software can alert the ops team when usage deviates from baseline behavior.
- To receive NPU counters that reveal link oversubscription and packet drops, so that QoS policies can be adjusted as part of traffic engineering.
- To monitor BGP state in real time and report neighbor and route information.
- To monitor RIB/FIB and FIB/FIB consistency between the RSP and line cards.
2. SNMP and Syslog should be used for general-purpose monitoring and alerting.
- To collect device inventory information such as serial number, platform, and system uptime.
- To receive connected neighbor information (e.g., CDP and LLDP neighbors).
And for devices that don’t support streaming telemetry, SNMP is often the only practical way to collect statistics.
3. IPFIX should be used to collect flow-based information. Depending on the device vendor, flow records may instead be exported in NetFlow or sFlow format. For example, flow data can be used to:
- Analyze the network flows for capacity planning and traffic engineering.
- Plan and monitor QoS policy.
- Find the top talkers in the network and report them.
4. PCEP should be used for path computation as well as generating the network topology information.
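The top-talkers use case in item 3 above reduces to a simple aggregation once flow records have been collected. The following is a minimal sketch only: the FlowRecord shape and its field names are hypothetical, and real IPFIX, NetFlow, or sFlow records carry many more fields and would first need to be decoded by a collector.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class FlowRecord(NamedTuple):
    # Hypothetical, simplified flow record used for illustration only.
    src_ip: str
    dst_ip: str
    byte_count: int


def top_talkers(flows: Iterable[FlowRecord], n: int = 3) -> list[tuple[str, int]]:
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


flows = [
    FlowRecord("10.0.0.1", "10.0.1.9", 1_200),
    FlowRecord("10.0.0.2", "10.0.1.9", 300),
    FlowRecord("10.0.0.1", "10.0.2.5", 800),
    FlowRecord("10.0.0.3", "10.0.1.9", 5_000),
]
print(top_talkers(flows, n=2))
# [('10.0.0.3', 5000), ('10.0.0.1', 2000)]
```

Aggregating by source address is only one choice; the same pattern works for destination address, application port, or source/destination pairs by changing the key.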
Apart from the mode of collection, the frequency of collection depends on multiple factors, including the device vendor, the use case, which information is available through each protocol, and whether the device supports periodic sampling at millisecond intervals or event-based streaming.
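To make the difference between periodic sampling and event-based streaming concrete, here is a small sketch. It takes a periodically sampled series and reduces it to the updates an event-based stream would carry: one when the value changes, plus a heartbeat so the collector knows the source is still alive. The function name, the sample format, and the heartbeat parameter are assumptions made for this illustration and do not correspond to any vendor's API.

```python
from typing import Iterable, Iterator, Optional

Sample = tuple[float, str]  # (timestamp in seconds, observed value)


def on_change(samples: Iterable[Sample], heartbeat: float) -> Iterator[Sample]:
    """Yield a sample when its value changes, or when `heartbeat`
    seconds have passed since the last emitted sample."""
    last_value: Optional[str] = None
    last_emit: Optional[float] = None
    for ts, value in samples:
        changed = value != last_value
        stale = last_emit is not None and ts - last_emit >= heartbeat
        if last_emit is None or changed or stale:
            yield ts, value
            last_value, last_emit = value, ts


# Interface state sampled every 10 seconds (periodic collection).
periodic = [(0, "up"), (10, "up"), (20, "up"), (30, "up"), (40, "down")]
events = list(on_change(periodic, heartbeat=30))
print(events)
# [(0, 'up'), (30, 'up'), (40, 'down')]
```

Five periodic samples shrink to three updates here. The trade-off is that slowly changing state suits event-based updates, while fast-moving counters usually still need periodic sampling.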
Anuta ATOM support for Network Telemetry
Anuta ATOM provides a platform that collects telemetry data from all the above sources and stores it in a time-series database (TSDB). This enables closed-loop automation for multi-vendor infrastructure using advanced functionality such as:
- Filtering the data while persisting it into the time-series database.
- Retaining the more granular data according to the configured retention policies.
- Encrypting confidential information before it is stored in the TSDB, based on the organization's requirements.
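The filtering and retention steps above can be pictured with a short sketch. This is not ATOM's implementation, which the post does not describe. It assumes a hypothetical Point record, an allow-list of metric-name prefixes for the filter, and downsampling into fixed-size averages as one common way to keep older data at coarser granularity.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical shape of a point about to be written to a TSDB.
    metric: str
    timestamp: int  # seconds since epoch
    value: float


def keep(point: Point, allowed_prefixes: tuple[str, ...]) -> bool:
    """Filter step: persist only metrics that match an allow-list."""
    return point.metric.startswith(allowed_prefixes)


def downsample(points: list[Point], bucket_seconds: int) -> list[Point]:
    """Roll raw points up into per-bucket averages for longer retention."""
    buckets = defaultdict(list)
    for p in points:
        bucket_start = p.timestamp - p.timestamp % bucket_seconds
        buckets[(p.metric, bucket_start)].append(p.value)
    return [Point(m, ts, mean(vals)) for (m, ts), vals in sorted(buckets.items())]


raw = [
    Point("interface.in_octets", 0, 100.0),
    Point("interface.in_octets", 30, 300.0),
    Point("interface.in_octets", 60, 500.0),
    Point("debug.trace_count", 0, 7.0),
]
filtered = [p for p in raw if keep(p, ("interface.", "cpu."))]
rolled_up = downsample(filtered, bucket_seconds=60)
print(rolled_up)
```

In this sketch the debug metric is dropped before it is ever stored, and the two samples in the first minute are averaged into a single point.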
Network Telemetry Applications
Anuta ATOM enables:
- Network Capacity Management: Delivers real-time inventory of available capacity and usage reports to aid capacity planning.
- Performance and Latency: Collects real-time telemetry data to detect application slowness and generates alerts.
- Closed Loop Automation: Automates remediation steps based on latency, security and availability requirements.
- Predictive Analytics: Builds a historical baseline of network behavior and uses it to identify anomalous traffic patterns and access violations.
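Several of the applications above depend on comparing a new measurement with baseline behavior. A minimal sketch of that idea follows, assuming a simple standard-deviation test over recent history. The function name and the threshold of 3 are illustrative choices, and production systems typically use more robust methods that account for seasonality and trends.

```python
from statistics import mean, stdev


def deviates(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True if `latest` is more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


cpu_history = [41.0, 39.5, 40.2, 42.1, 40.8, 39.9]  # CPU usage in percent
print(deviates(cpu_history, 41.5))  # False: within the normal range
print(deviates(cpu_history, 88.0))  # True: far above the baseline
```

A result of True is the kind of signal that could raise an alert or trigger a remediation step in a closed-loop workflow.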
In summary, to collect comprehensive network telemetry, network administrators will have to rely on multiple options, including SNMP, Streaming Telemetry, PCEP, and flow export via IPFIX, NetFlow, or sFlow. Anuta ATOM delivers a platform that unifies disparate telemetry sources and introduces closed-loop automation to ensure SLAs in a multi-vendor infrastructure.