IT leaders recognize that migrating applications to Hybrid Multi-Cloud (HMC) environments is a critical aspect of digital transformation, delivering tangible operational benefits such as increased business agility, scalability on-demand, and OPEX savings. However, this migration requires multi-year planning and coordination, underpinned by a Telemetry and Network Automation solution that insulates the business from changes in the underlying technology to facilitate a smooth transition from private data centers to HMC environments.
Anuta ATOM is a vendor-agnostic, multi-domain, multi-cloud network automation solution that includes low-code automation, real-time analytics, and closed-loop automation. In this webinar replay, customers of Anuta Networks share best practices in building smart, predictable, and responsive networks for your current and future infrastructure needs.
In this on-demand webinar you will learn:
- How to build massive-scale telemetry and analytics using model-driven telemetry
- How to transform into a proactive network operations team with closed-loop automation
- How to deploy quickly with low-code automation
- How to ensure quality, achieve compliance and drive down operational costs for immediate ROI
- Best practices for multi-domain, multi-cloud orchestration and analytics
Introduction to Hybrid Multi-Cloud
Steve: 00:00 So, for those of you here in the US or North America, good morning; if anybody’s calling in from outside North America, I guess it would be good afternoon or good evening. Welcome to today’s ONUG webinar, titled “Model-Driven Telemetry and Low-Code Automation Power Responsive Networks for Hybrid Multi-Cloud Migration.” It’s a very interesting and relevant topic for the ONUG community. I’m Steve Collins, and I serve as the working group CTO for ONUG, which means I’m very involved in the work going on in several different areas related to defining a set of use cases and supporting requirements for various aspects of hybrid multi-cloud migration. And this topic is directly relevant to the issues that the ONUG community is dealing with. So, an interesting webinar here. Let’s move on to the next slide. The agenda is that we’re going to talk about hybrid multi-cloud deployments.
And the good news is we have a couple of Anuta Networks customers online who are going to talk about their experience in this area. We’re going to learn a little bit about what Anuta Networks does and what their solution is all about, and about the operational experience the two customers have had with the product. And we will have time for Q&A at the end, hopefully about ten minutes or so. If you have questions, feel free to ask them at any time: just go to the Zoom dashboard, go to the Q&A tab, and enter your question; we will track those as they come in, and then we’ll have a session at the end. Let’s move to the next slide. So, I just want to set the stage here. Look at hybrid multi-cloud infrastructure: these are the types of systems and software we’re all going to be migrating to, whether it’s the service providers or the enterprises; the leading web scalers are already there. And I think everybody recognizes the power of this type of infrastructure and all the benefits that will accrue over time.
But the flip side is that it also introduces a lot of complexity. And that manifests itself in a number of different dimensions: there’s the fundamental issue of scale; the fact that the stack is very multi-layer in nature; it crosses multiple domains; and you have a high degree of distribution of workloads across systems and networks. Of course, everything has been disaggregated into various hardware and software components, different planes, and different layers. And the whole thing is very dynamic.
So, there’s a lot of complexity, and that translates into cost, right? And that’s what this webinar is all about: how can we tackle this complexity? One of the real fundamental keys here is automation. Next slide. So, I think at the end of the day, we’re all going to be looking for a model for hybrid multi-cloud automation that follows what I have in this diagram. We’ve got systems here; we’re going to be instrumenting them, pulling metrics from that instrumentation using telemetry mechanisms, collecting those metrics, putting them into a repository, and performing analytics on that data, hopefully in real time, so that we can monitor the state of the infrastructure.
04:09 And if there’s a change to that state, we can feed that into an orchestration system, and the orchestration system can react appropriately, which usually translates into some type of control or reconfiguration of the underlying infrastructure. So, this is the model we want to work towards, and I think we’ll work towards it in different aspects of the infrastructure, whether it’s the application level, the network level, the systems or ITOps level, and even the security level. Ultimately, this model is going to be pervasive across the entire infrastructure. So, with that, I just wanted to set the stage a little bit; this is all directly relevant to what we’re going to be discussing on this call. Let’s move on to the next slide. I’d like to introduce the panel today: we have Peter Juffernholz, who is VP of Virtualized Network Services at Tata Communications; we have Matt Wilson, who’s the Director of Network Engineering at NeuStar; and we have Praveen Vengalam, who’s the VP of Engineering at Anuta Networks. The format here is that we’re going to have both Peter and Matt talk about the challenges they faced in their environments, what drove them to look at various automation solutions, and what their experience has been and what they’ve learned deploying Anuta Networks, and Praveen will participate in that discussion as well. So, let’s move on to the next slide. At this point, I’m going to hand it over to Peter and let him describe what he’s done at Tata Communications. Peter?
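[Editor’s note] The closed-loop pattern Steve outlines (instrument, collect telemetry, analyze, orchestrate a reaction) can be sketched roughly as follows. This is a minimal illustration only; the metric names, threshold, and remediation action are invented placeholders, not part of any product discussed here.

```python
# Minimal sketch of the closed-loop automation model described above:
# instrument -> collect telemetry -> analyze -> orchestrate a reaction.
# All device names, metrics, and thresholds are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metric:
    device: str
    name: str
    value: float


def analyze(metrics: List[Metric], thresholds: Dict[str, float]) -> List[Metric]:
    """Return the metrics that violate their configured threshold."""
    return [m for m in metrics if m.value > thresholds.get(m.name, float("inf"))]


def closed_loop(metrics: List[Metric],
                thresholds: Dict[str, float],
                remediate: Callable[[Metric], str]) -> List[str]:
    """One loop iteration: analyze collected state, react to anomalies."""
    return [remediate(m) for m in analyze(metrics, thresholds)]


# One example iteration over fabricated telemetry samples.
samples = [
    Metric("edge-rtr-1", "cpu_util", 97.0),
    Metric("edge-rtr-2", "cpu_util", 35.0),
]
actions = closed_loop(samples, {"cpu_util": 90.0},
                      lambda m: f"reconfigure {m.device} ({m.name}={m.value})")
print(actions)  # one remediation action, for edge-rtr-1 only
```

In a real deployment the `remediate` callback would invoke an orchestration workflow rather than return a string; the point is only the shape of the loop.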
Tata Communications – Business Value and Network Automation Challenges
Peter: 05:56 Yes, thank you, Steve. Hello, this is Peter Juffernholz at Tata Communications, and for those who are not familiar with Tata Communications, I just wanted to give you a very quick overview. These days, we describe ourselves as a digital transformation provider of enterprise services, with all those hybrid cloud and connectivity moves highlighted earlier as the backdrop of the changes in our industry. So, our target is to assist our customers in their digital transformation journey across various aspects, everything from network through security and UCC services, etc., that are required for customers to stay relevant, to have a competitive edge in deploying technology that is agile and future-proof, and to de-risk onboarding new technology as much as we can. We have a history of being an internet service provider and MPLS service provider for many years, and have done so on a global scale.
We are an Indian company; however, we have significant assets outside of India, actually employ people equally across the globe, and have a global customer base, with slightly more revenue actually achieved outside of India than in India. And we have serviced global MNCs for a very long time, 15 years or so, with everything from connectivity to SD-WAN services, security, etc. More recently, I think our engagement with Anuta now goes back almost four years, and I think the deployment has been in place for three years. We onboarded Anuta to help us with hybrid managed services in general and SD-WAN in particular: an SD-WAN Prime service, which allows for some path selection and congestion management options on standard Cisco routers. So, that was a project we built in parallel to Cisco IWAN, with the advantage of not requiring any overlay tunnels on the MPLS side and limiting the overhead transmitted on the network to basically BGP communities and some other protocols that are not very heavy.
We have since onboarded additional SDN technologies: we have a Versa offering in the market, and we are also in the process of launching a Cisco-based service for the Viptela variant of the product. Now, the key issue that drove our automation was to cut down, in those days, on delivery times, and to reduce the error rates that affected those delivery times, not just in a multi-network, multi-service environment, but in configuring complex scenarios on routers: we had up to, I think, four hundred individual CLI commands that had to go into routers to get them ready for our service. And so the time to deliver service, and the possibility of errors, was relatively high.
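[Editor’s note] A config push of that size is typically tamed by templating: device-specific values are rendered into a vetted template instead of hand-typing hundreds of CLI lines. A minimal sketch of the idea, using Python’s standard-library `string.Template`; the template text and variable names are invented for illustration, not an actual Tata or Anuta configuration:

```python
# Sketch: generate a per-site CLI snippet from a vetted template instead of
# hand-entering hundreds of commands. Template and variables are illustrative.
from string import Template

CPE_TEMPLATE = Template("""\
hostname $hostname
interface $wan_if
 description WAN uplink for $customer
 ip address $wan_ip $wan_mask
router bgp $asn
 neighbor $peer_ip remote-as $peer_asn
""")

# Per-site variables would normally come from an inventory or IPAM system.
site = {
    "hostname": "cpe-nyc-01", "customer": "ACME",
    "wan_if": "GigabitEthernet0/0",
    "wan_ip": "203.0.113.10", "wan_mask": "255.255.255.252",
    "asn": "64512", "peer_ip": "203.0.113.9", "peer_asn": "65000",
}

config = CPE_TEMPLATE.substitute(site)
print(config)
```

`substitute` raises `KeyError` if any variable is missing, which is exactly the kind of guard rail that manual CLI entry lacks.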
Yes, you script a lot, but mastery of those scripts was limited to the people who actually wrote them, and it still depended on manually injecting certain information for the proper configuration. So, that was the onset of the discussions we had about how we could automate this and build from there with Anuta’s help. We went through the diligence process, looked at different options, and did a market survey of vendors in that field, and Anuta was ultimately the choice we came to because of the flexibility of their platform, and because Anuta as a company, and the team we worked with, are very outcome-focused and very skilled in their work. I will speak about that a little bit more, I think, as we go forward.
Steve: 10:58 Off to you Matt, I think.
Neustar – Challenges in Automating the DDoS mitigation network
Matt: Yeah, hi. So, this is Matt Wilson, I’m from NeuStar. I suspect a lot of folks haven’t really heard of us; we’re a company with a suite of products that help our customers really guide, grow, and protect their service offerings, you know, their lines of business. I particularly sit on the protect side of the business. On the protect side, we offer a suite of services around DDoS protection, which is my expertise, and DNS services as well, both managed authoritative and recursive DNS. So, if you step back several years ago, we were in the process of really transforming our DDoS business: we were going from being a small network that had just a couple of terabits of bandwidth, with about four nodes globally and, you know, fairly straightforward infrastructure, not overly complex. We made this decision to get ahead of the market, right? We wanted to really differentiate ourselves; we wanted to grow our network.
So, we went out and we built a 14-node DDoS network, now at 11 terabits per second. And in the process of doing this, we also rather drastically changed the architecture. We did this in order to enable us to more rapidly support new product development, you know, new forms of connecting with our customers, adding additional features; that was really the goal of the entire project. Well, one of the things that we identified very early on was that the scale was going to become more problematic. It was one thing when you had, basically, four or five routers that you were managing: you can script a fair amount, and it really wasn’t overly complex. Well, fast forward to the new architecture, where we have multiple vendors and platforms, you know, the same across all 14 nodes, but with different capabilities and different levels of bandwidth at every single node: we needed a more effective way to manage this.
And so, for us, I think very similar to Peter, this was very much about being able to provide a standardized set of services, so that we could cut down on human error and deploy very quickly. That’s why we looked at a number of solutions. We had scripting as well, and we looked at doing more scripting, but one of the things we really wanted was something that could maintain the state of the network, right? I wanted this entire platform to be something that was watching the network, keeping track of what every single device was supposed to be doing and what its configuration was supposed to look like, and that could give us the difference between what we expected and what was actually there. And, you know, as we deploy things, we’re talking about deploying things globally, right? So, we’re talking about having the ability to push changes at a service level.
And when I say at a service level, I mean, like, I want to push a configuration for one customer, or I want to push a brand-new customer out to all 14 nodes simultaneously, have the exact same actions take place, and then be able to verify and double-check that exactly what got put in there is exactly what I expected. And if not, then I needed to be able to roll back, at a very detailed level, exactly what happened. You know, we’re constantly making changes in the DDoS world; the customers and we are adapting and responding to DDoS attacks, tweaking settings, things like that. We needed the ability to not have to roll back an entire router configuration and then reconstruct only the stuff I wanted to keep in there; I wanted to be able to go back, have the transactional records of exactly what got done, and roll back just that one transaction that I might have done two hours, or two weeks, or two months before. So, we were looking for a solution that would enable us to do that. We looked at all sorts of automation platforms, service automation platforms, things like Ansible Tower, things along those lines. For us, they didn’t quite meet what we were looking for, because, again, I needed something that natively maintained state.
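[Editor’s note] The transaction-level rollback Matt describes can be sketched as a journal of change sets, each reversible on its own. This is a simplified, in-memory illustration (not ATOM’s actual mechanism); a real implementation must also handle later transactions that touch the same keys.

```python
# Sketch: journal each config change as its own transaction so that one
# customer's change can be reverted later without restoring the whole
# router configuration. Deliberately simplified.
class ConfigStore:
    def __init__(self):
        self.config = {}    # device config modeled as key/value pairs
        self.journal = []   # list of (txn_id, snapshot-of-prior-values)

    def apply(self, txn_id, changes):
        """Apply a change set, journaling prior values for later rollback."""
        before = {k: self.config.get(k) for k in changes}
        self.journal.append((txn_id, before))
        self.config.update(changes)

    def rollback(self, txn_id):
        """Revert exactly one earlier transaction, leaving others intact."""
        for tid, before in reversed(self.journal):
            if tid == txn_id:
                for k, v in before.items():
                    if v is None:
                        self.config.pop(k, None)   # key didn't exist before
                    else:
                        self.config[k] = v
                return True
        return False


store = ConfigStore()
store.apply("txn-1", {"acl.customer-a": "permit 198.51.100.0/24"})
store.apply("txn-2", {"rate-limit.customer-b": "2gbps"})
store.rollback("txn-1")   # undo only customer-a's change
print(store.config)       # customer-b's setting survives
```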
At about the same time that we were looking and starting our deployment, our implementation of this, Anuta came out with ATOM. And so we’ve been in the process of identifying all the use cases behind the data side of this. Being, you know, very similar to Peter, on the service provider side of this, it’s about having the visibility into what’s happening, knowing exactly what’s going on in our infrastructure, and being able to do that within the same platform where we’re grabbing all the pieces of data. Streaming telemetry is in it, and it really runs the gamut, right? We have log data, we have streaming telemetry, we have a lot of different types of data coming off of our different boxes, and being able to pull that into one platform has really been valuable in assisting our automation, like automated responses. So, we’re starting to automate some of our responses based on that data.
Steve: 17:16 Yep, thank you, Matt, that’s an excellent overview. Thank you, Peter, as well. I think at this point, let’s bring Praveen in. Praveen, why don’t you provide a little more background on Anuta Networks and ATOM? And then we’ll segue into Matt and Peter talking about their deployment experience.
Introduction to Anuta Networks and Anuta ATOM Overview
Praveen: 17:43 Sure, hi, thanks for joining the webinar. This is Praveen, I am VP of Engineering at Anuta Networks. We’ve been in business helping large enterprise customers, large SPs, and MSPs with their automation and orchestration needs, and the main product that we deliver is ATOM. It delivers assurance, telemetry, and orchestration for multi-vendor, multi-domain networks. We cater to lots of different domains: it could be data center, MPLS, or WAN core; it’s pretty much an open platform that can be deployed in any domain, and it handles multi-vendor networks. As we have seen from the descriptions from both Matt and Peter, one is an MSP and the other is an enterprise, so they place different requirements on the automation platform, and that’s actually what we will discuss on the next slide. This slide summarizes the major capabilities of ATOM: catering to these different domains, with different vendors and multiple touch points, and being able to orchestrate at a granular level and also at a higher level, that is, more at a service level. So, we need to provide a lot of mechanisms to our customers where we help them describe what they want, and we can help achieve the automation.
And apart from being able to automate, we also need to ensure that whatever is automated stays deployed. That’s where we observe all the configuration changes happening in the infrastructure and ensure that any accidental changes can be rolled back. On top of that, we need to look into some of the performance aspects of the infrastructure, do some baselining, and then take remediation actions. So, there are a lot of these lifecycle events that have to be taken care of. And at the same time, since this automation software is going to be in the main path of the business, right, it is a main business enabler, the software itself has to be highly resilient, it has to scale, and it has to be open: it has to provide APIs so that we can tie into the different ecosystem partners out there.
And ecosystem integration is fairly important, because we do not expect ATOM to be doing everything out there; we’re going to be required to integrate with IP address management solutions, NMS solutions, SDN controllers, and SD-WAN controllers. Towards that, it’s very important for the platform to be open; it should be able to seamlessly integrate with multiple different endpoints. So, all of that becomes imperative for the software.
So, if you see, from left to right, the lifecycle starts pretty much from onboarding a device: being able to understand what the device capabilities are, being able to seed that infrastructure element with configuration, even doing some routine maintenance activities like image management, which is something we provide in ATOM. Then, once the devices are onboarded, we take care of the Day-1 to Day-N activities, like being able to deploy services. Those could be one-time, stateless activities, like an SNMP refresh or maybe a security patch, or we can have scenarios that are stateful; both Peter and Matt have described services that are more stateful in nature, where they onboard an external customer and push a service, and the service has a life of its own.
In those scenarios, we have a very rich way to declare intent: a YANG model-driven service model that supports updates, deletes, and ongoing maintenance of the service. And the services that we deploy in ATOM have to be available via APIs so that northbound self-service portals can tie into them; that’s why we have a very rich API to ATOM. And once these services are deployed, we are constantly listening to the underlying infrastructure for SNMP traps, syslog, and telemetry, and if there are any anomalies, we can define baselines and emit alerts. From there, the alerts can be tied into remediation actions. In some scenarios the alerts can just be emails, or they can be posted as notes on a Slack channel.
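[Editor’s note] The core idea of a model-driven service is that the platform stores a declared desired state and derives create/update/delete operations by diffing it against what is deployed. A toy illustration of that diff step (the field names are invented, and real systems diff structured YANG data rather than flat dictionaries):

```python
# Toy illustration of intent-driven service lifecycle: derive the
# create/update/delete operations needed to make the deployed state
# match the declared (desired) state. Field names are invented.
def diff_intent(desired: dict, deployed: dict) -> dict:
    """Compute the operations needed to make 'deployed' match 'desired'."""
    ops = {"create": {}, "update": {}, "delete": []}
    for key, value in desired.items():
        if key not in deployed:
            ops["create"][key] = value        # new attribute to provision
        elif deployed[key] != value:
            ops["update"][key] = value        # attribute drifted or changed
    ops["delete"] = [k for k in deployed if k not in desired]
    return ops


deployed = {"vlan": 100, "bandwidth": "100mbps", "acl": "legacy-acl"}
desired  = {"vlan": 100, "bandwidth": "200mbps", "qos": "gold"}

ops = diff_intent(desired, deployed)
print(ops)  # bandwidth is updated, qos is created, the stale acl is deleted
```

A northbound portal would only ever submit a new `desired` document; the platform owns translating the resulting operations into device configuration.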
But eventually, for some of the alerts, we need to trigger a remediation action; that could be where we go in and invoke a workflow. ATOM has a very rich workflow engine that’s BPMN-compliant; it can also support low-code automation, where we can define the MOPs: let’s say a BGP neighbor is flapping, an interface is flapping, or there’s high utilization and I want to steer the traffic to a secondary link. All those kinds of remediation actions can be defined, and ATOM can take care of doing that fairly well. All in all, it’s a highly scalable, open, and customizable platform. And as we saw, there are multiple vendors at play in both customers’ scenarios, and different kinds of integrations: networking devices, NMS solutions, SD-WAN solutions. We’ve been fairly successful in catering to diverse deployment options, from a single site to deployments across multiple geographic locations; we have done that. And that’s about where I would like to pause.
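[Editor’s note] The alert-to-remediation mapping Praveen describes can be sketched as a dispatch table: each alert type routes to a remediation routine (a "MOP"), with a notify-only default. The alert names and actions below are illustrative placeholders, not ATOM’s actual workflow API.

```python
# Sketch of alert-to-remediation dispatch in a closed-loop workflow:
# each alert type maps to a remediation routine; unknown alerts are
# only reported. All names here are hypothetical.
def steer_to_secondary(alert):
    return f"steer {alert['device']} traffic to secondary link"

def bounce_bgp_session(alert):
    return f"clear BGP session to {alert['peer']} on {alert['device']}"

def notify_only(alert):
    return f"post alert to ops channel: {alert['type']} on {alert['device']}"

REMEDIATIONS = {
    "high_utilization": steer_to_secondary,
    "bgp_neighbor_flap": bounce_bgp_session,
}

def handle_alert(alert):
    """Route an alert to its remediation MOP; default to notification."""
    action = REMEDIATIONS.get(alert["type"], notify_only)
    return action(alert)


result = handle_alert({"type": "high_utilization", "device": "wan-rtr-3"})
print(result)  # "steer wan-rtr-3 traffic to secondary link"
```

In a BPMN engine each entry would be a workflow definition rather than a function, but the routing logic is the same shape.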
Steve: 23:33 Yeah. So, with that background on Anuta and ATOM, let’s turn it over to Matt again and let him talk about some of his actual deployment experience.
Neustar – Experience Deploying Anuta ATOM
Matt: 23:46 Yeah, so for us, when we started down this process, one of the things we identified pretty quickly was that we really needed this entire platform to be very fault-tolerant, right? We’re talking about nodes all over the world, and there’s a fair amount of latency between certain nodes. So, we chose a model that was somewhat multi-tiered, where we had agents in every single node as well as multiple platforms on the back end, because we needed the redundancy on the back end; we had multiple redundant agents in each node talking to those. So, that way, we had a fair amount of fail-over resiliency.
One of the things we also realized is that we were engineering some of this a little bit on the fly. So, when we first started with Anuta, we stepped back and realized we needed to better identify what we were looking for. Where Praveen talked about the YANG modeling, we had to sit down and better understand exactly what each service model needed to look like, the service model being what the YANG modeling describes. So, we had to understand what a service model was.
So, we sat down and did a big analysis of our infrastructure to ask: what are the granular steps we take? What are some of the things we choose to do? We identified the ones most critical for us, in order to enable us to go to market with what we wanted to do, and then we had a big list of future items that we wanted to do as well. So, we prioritized and went out and worked on all those first-priority ones. Anuta has been great about working with us to tweak those as we go. You know, we identified that maybe sometimes our requirements were not the clearest or not the most well defined, so we’d sit back and tweak them, using the professional services.
We have been able to fix some of the little nuances as we move forward, and frankly, as we change our services, we have to tweak these things just a little bit. So, it’s been a very valuable tool for us. We’re also able to use it as an ad hoc tool for the SOC: while we wrote service models for the most critical and important things we knew we needed as part of our model, we’ve also worked with them on some direct, ad hoc tooling, where, you know, we have some basic APIs and UIs around access to certain devices, and that has also proven to be really valuable.
And we’re just really starting to hit the tip of the iceberg on the telemetry and analytics side of this as well. But all in all, it’s been a pretty pleasant experience, and, you know, it’s not without its complexities, as you can imagine anything like this is, but they make it pretty easy to work with, and right off the bat the deployment was pretty straightforward. We got it testing in our lab; we have a full lab for everything that we do. So, we got it up and working in the lab, and we were able to simulate some of those long-distance links and make sure that we were handling high-latency connections and fail-over types of scenarios. For us, the deployment went well; we were able to identify the issues very early on and move things forward.
Steve: 27:52 That’s it. Thanks, Matt. And we’ll talk a little bit more about what you learned from this whole experience shortly. Let’s flip it over to Peter and let him talk about his deployment experience next.
Tata Communications reduces CPE on-boarding time with Anuta ATOM
Peter: 28:08 Yeah, certainly. So, from our requirements and wish list at the onset, it was all about being able to automate the delivery of CPEs and service activation, doing so in greenfield and brownfield scenarios, with zero-touch provisioning, back-end OSS integration, and scalability, as we are a service provider with thousands, or tens of thousands, I should say, of customer sites deployed, and then having the ability to verify the work that has happened and also discover operational issues and, you know, failures and so forth. And, as Matt also pointed out, to have the ability to roll back certain scenarios whe