Network Function Virtualization (NFV): 5 Major Risks
Network Function Virtualization (NFV) is the new holy grail for service providers and, increasingly, for enterprise IT. Instead of deploying inflexible physical network components such as routers, switches, and firewalls, it is now possible to run virtualized versions of the same devices on commodity hardware, and even to design customized network services and launch them at the click of a button. NFV promises exactly what organizations need to address dynamic user requirements, growing workloads, and the complexity of agile development.
It is a great story, but in reality, organizations face serious risks when implementing NFV architectures. Below, we examine five of those risks and suggest a way to mitigate several of them.
Risk #1: Traffic hidden from monitoring
In virtual-compute environments, a significant portion of network traffic never touches a physical link. Cisco estimates that by 2019, about 73% of data-center traffic will remain within the data center, most of it virtual machine to virtual machine (VM-to-VM) communication buried deep inside physical hosts. Known as "east-west traffic," it is essentially invisible to traditional monitoring architectures and creates a large blind spot for network operations. The result can be difficulty diagnosing network performance issues, or failure to spot malicious agents operating inside a virtualized data center.
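To illustrate why a physical tap misses so much, consider a toy flow classifier (the subnets and flows below are hypothetical, not from any real deployment) that splits flows by whether both endpoints live inside the data center:

```python
from ipaddress import ip_address, ip_network

# Hypothetical data-center subnets; a real deployment would pull these
# from the fabric's IPAM or orchestration inventory.
DC_SUBNETS = [ip_network("10.0.0.0/8"), ip_network("172.16.0.0/12")]

def is_internal(addr: str) -> bool:
    """True if the address belongs to a data-center subnet."""
    ip = ip_address(addr)
    return any(ip in net for net in DC_SUBNETS)

def classify(flow):
    """Label a (src, dst) flow as east-west or north-south."""
    src, dst = flow
    return "east-west" if is_internal(src) and is_internal(dst) else "north-south"

flows = [
    ("10.1.2.3", "10.1.2.9"),     # VM to VM inside a host: never hits a physical link
    ("10.1.2.3", "203.0.113.7"),  # VM to the internet: crosses a physical link
]
labels = [classify(f) for f in flows]
# A tap on a physical link only ever sees the north-south portion.
```

A monitoring strategy built solely on physical taps and SPAN ports captures only the second category; the first stays inside the hypervisor.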
Another obstacle is the difficulty of isolating production traffic from monitoring traffic. In a virtual environment, there are compute costs, and possibly input/output (I/O) tradeoffs, to consider when making copies of a VM's network traffic. In the physical networking world, packet forwarding is usually handled by hardware application-specific integrated circuits (ASICs), so production traffic can be prioritized over monitoring traffic. In an NFV network, capturing and processing monitored traffic consumes central processing unit (CPU), memory, and network resources on your production VNF hosts, which can degrade their performance.
Bottom line: Most of the traffic in an NFV network will be invisible using traditional monitoring strategies.
Risk #2: New security risks and a security information explosion
NFV creates several new security challenges. According to a technical paper from Alcatel-Lucent, there are four major NFV-specific security issues operators must be aware of:
- New software components that did not exist in the traditional network model: the hypervisor and various management/orchestration elements. These create a "longer chain of trust."
- Reduced isolation: In NFV, almost all network elements are capable of communicating directly with each other, at least at the physical level (as opposed to traditional networks, in which different network segments were often physically separated and unable to communicate).
- Sharing of risk between multiple unrelated components due to resource pooling: An attack on a certain virtual network function (VNF) might affect other VNFs running on the same VM or physical server.
- The issue of “key escrow”: How to effectively share keys and security credentials between hosted network functions in ways that will preclude access by attackers.
In addition, NFV environments are characterized by additional complexity—systems are recursive, built up of complex services on top of elementary ones. There are three layers that need to be secured:
- The physical layer including compute, storage, and networking, as well as management systems, such as lifecycle, orchestration, and application program interface (API) access.
- Virtualized network zones defined by virtual firewalls or other network segmentation functions.
- Carrier application security: Virtualized functions used by applications, such as the evolved packet core (EPC), software-defined networking controller (SDNC), and home subscriber server (HSS), are placed in the established security zones. Their security is assured by a combination of native application security controls and those provided at the network-zone layer, with additional security provided by the physical/platform layer.
With this three-dimensional complexity, even if all components are properly secured, there is far too much information to rely on manual processing; operators of NFV implementations will have to rely on centralized, automated security processes to ensure policy is correctly applied across the entire network. Acquiring monitoring data for analysis by security tools is also a challenge. Special-purpose security tools provide a valuable service by checking for threats that could be hiding within the virtual data center.
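A minimal sketch of what such centralized, automated checking might look like, assuming a hypothetical inventory format (the zone names, control names, and VNF entries are invented for illustration; a real implementation would query the orchestrator's API):

```python
# Hypothetical security-zone inventory and policy.
ZONES = {
    "core":   {"controls": {"vfw", "ids", "encryption"}},
    "access": {"controls": {"vfw"}},
}
REQUIRED_CONTROLS = {"vfw"}  # the minimum every zone must provide

VNFS = [
    {"name": "epc-1", "zone": "core"},
    {"name": "hss-1", "zone": "core"},
    {"name": "cdn-1", "zone": "edge"},  # misconfigured: zone is undefined
]

def audit(vnfs, zones):
    """Return a list of policy violations across the whole inventory."""
    violations = []
    for zone, spec in zones.items():
        missing = REQUIRED_CONTROLS - spec["controls"]
        if missing:
            violations.append(f"zone {zone} lacks controls: {sorted(missing)}")
    for vnf in vnfs:
        if vnf["zone"] not in zones:
            violations.append(f"{vnf['name']} assigned to undefined zone {vnf['zone']}")
    return violations

issues = audit(VNFS, ZONES)
```

The point is not the specific checks but that they run automatically across all layers on every change, rather than relying on a human to review each zone assignment.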
Bottom line: NFV architecture exposes an organization to several new security issues that need to be addressed. NFV is inarguably more complex than a traditional networking model and introduces additional software layers that increase the likelihood of bugs and security breaches. This makes it more difficult to enforce and monitor security policies in an NFV environment.
Risk #3: Performance bottlenecks—the vSwitch and hardware layers
NFV has evolved. While at the outset there were major performance issues at the physical host and VM level and in the communication between host and guest operating systems, these concerns have largely been alleviated. But there are still performance challenges centered on two areas.
The first is the virtual switch (vSwitch). In NFV, the vSwitch is a "train station" at which all packets must stop, whether they are "north-south" packets traveling into and out of the physical host or "east-west" packets flowing between VMs and network services. The vSwitch is an obvious bottleneck, and its performance can be affected by the type of traffic flowing through the network: streaming video and audio, bidirectional video and audio, and browser traffic differ dramatically in their networking behavior. Industry efforts such as the VSperf project, run as part of the Open Platform for NFV (OPNFV) consortium, aim to benchmark vSwitch performance in different scenarios.
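The kind of measurement VSperf-style benchmarks perform can be sketched with a toy software forwarder (the forwarding table and traffic profiles are invented; real benchmarks drive an actual vSwitch with a hardware traffic generator):

```python
import time

# Toy forwarding table: destination MAC -> output port.
FIB = {f"mac-{i}": i % 4 for i in range(256)}

def forward(packets):
    """Look up each packet's destination and tally bytes delivered per port."""
    ports = [0, 0, 0, 0]
    for dst, size in packets:
        ports[FIB[dst]] += size
    return ports

def benchmark(profile, n=100_000):
    """Measure forwarding rate in packets per second for a traffic profile."""
    packets = [(f"mac-{i % 256}", size) for i, size in
               zip(range(n), profile * (n // len(profile) + 1))]
    start = time.perf_counter()
    forward(packets[:n])
    return n / (time.perf_counter() - start)

# Per-packet lookup cost is the same regardless of size, so small-packet
# workloads need far more forwarding work per delivered byte than
# MTU-size streaming-video frames do.
pps_small = benchmark([64])    # VoIP-like 64-byte packets
pps_video = benchmark([1400])  # streaming-video-like packets
```

Running such a harness against different traffic mixes is how a project like VSperf quantifies where a given vSwitch configuration saturates.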
The second is hardware acceleration techniques. There are several methods for accelerating physical-host performance at the hardware level—for example, ASICs, field-programmable gate arrays (FPGAs), network processors (NPUs), and graphics processors (GPUs). These devices can be deployed on physical hosts across the network, but the challenge is determining how many of them you need, which ones, and how to elastically provision them to the VNFs that need them most at any point in time.
Two newer methods for accelerating packet processing are the Data Plane Development Kit (DPDK) and single-root input/output virtualization (SR-IOV). DPDK reduces the overhead of packet processing on x86 CPUs and can accelerate data-plane performance by a factor of ten or more. Software modules in the NFV environment, including virtual switches, can be rewritten to take advantage of DPDK; Open vSwitch, for example, now has a DPDK-enhanced version that can forward roughly 10x more traffic.
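The core idea behind DPDK-style acceleration, amortizing fixed per-packet overhead by handling packets in bursts, can be illustrated in spirit (real DPDK is C code using poll-mode drivers and calls like burst receive; the toy lookup below is purely illustrative):

```python
def process_one(pkt, table):
    # Per-packet path: every packet pays the full call and lookup overhead.
    return table.get(pkt % 16, 0)

def process_burst(pkts, table):
    # Burst path: fixed costs (function call, cache warm-up) are paid once
    # per batch, mimicking DPDK's burst-receive model.
    return [table.get(p % 16, 0) for p in pkts]

table = {i: i * 2 for i in range(16)}
pkts = list(range(64))

one_by_one = [process_one(p, table) for p in pkts]
bursts = []
for i in range(0, len(pkts), 32):  # a typical DPDK burst size
    bursts.extend(process_burst(pkts[i:i + 32], table))

# Both paths produce identical results; the burst path simply spreads
# the fixed per-call cost across 32 packets at a time.
```

DPDK combines this batching with polling (no interrupts), huge pages, and kernel bypass, which is where the order-of-magnitude gains come from.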
SR-IOV bypasses the vSwitch altogether, creating a direct path from the VM to the physical NIC. SR-IOV-capable physical NICs make it possible for multiple VMs to simultaneously share the same physical NIC. This improves performance, reduces latency, and provides more predictable overall behavior. SR-IOV is the technology behind the Enhanced Networking feature that provides low network latency on Amazon EC2.
The “building blocks” for hardware acceleration are there, but until the methods are found to provide for them effectively, operators will continue to experience performance concerns at the hardware level.
Bottom line: NFV performance is improving, but the vSwitch remains a potential central bottleneck, and hardware acceleration techniques still need fine-tuning before organizations can fully utilize the large hardware pools an NFV deployment requires.
Risk #4: What to do when something breaks?
In traditional networks, there are established practices for failover, redundancy, detection of problems, and remediation. How does this work in NFV?
NFV is based on the concept that components in the infrastructure are generic, replaceable, and prone to failure. Unlike the traditional network model, in which network elements like routers and switches were hardened appliances with dedicated software to handle failure, in NFV the virtualized forms of those routers and switches run on generic VMs, which in turn run on commodity hardware that may fail at any time. It is now the responsibility of the Management and Orchestration (MANO) layer to detect such failures and ensure high availability in different ways: for example, by instantly spawning an identical unit of a working VM and routing the relevant traffic to it.
The concern is ensuring the system is working properly and understanding how it reacts to different failure scenarios. Network operations teams have become really good at spotting a failed router and know exactly what to do to solve the problem quickly without affecting user traffic. In NFV, however, everything works differently.
When a host fails, the VMs running on it need to be transitioned to other hosts. There are also other dimensions of failure—one of the VMs composing a certain virtualized network function (VNF) might fail, while the other VMs might still be running. In this case, the NFV MANO layer should sense that a VNF component has failed and attempt to heal the VNF, for example, by restarting the failed VM, or by replacing the failed VM with a newly-spun VM instance.
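A hedged sketch of the healing logic described above (the component names, restart policy, and escalation threshold are hypothetical, not taken from any specific MANO implementation):

```python
from dataclasses import dataclass, field

@dataclass
class VNFComponent:
    name: str
    healthy: bool = True
    restarts: int = 0

@dataclass
class VNF:
    name: str
    components: list = field(default_factory=list)

def heal(vnf, max_restarts=3):
    """Restart failed components; escalate when a component keeps failing."""
    alerts = []
    for comp in vnf.components:
        if not comp.healthy:
            if comp.restarts < max_restarts:
                comp.restarts += 1
                comp.healthy = True  # stand-in for spawning a replacement VM
            else:
                alerts.append(f"{vnf.name}/{comp.name}: manual intervention required")
    return alerts

vnf = VNF("vRouter", [VNFComponent("control-plane"), VNFComponent("data-plane")])
vnf.components[1].healthy = False  # simulate a failed VM
alerts = heal(vnf)
# The failed component is respawned automatically; no operator alert yet.
```

The operational questions in the next paragraph map directly onto the knobs here: is `max_restarts` right, how long does the respawn take, and does the alert actually reach someone who knows what to do?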
Are the correct rules set up to govern these transitions? How long does it take for VNFs to resume service, and is there an interruption of service—especially critical for low-latency applications? In cases where something does not work well, are the appropriate alerts sent to operations staff, and do they know what to do to resolve a problem?
This is a significant risk, because failures will happen and ops teams need visibility into how they are treated by the MANO layer, whether the correct configuration is in place, what the current status is, and when an intervention is required.
Bottom line: It is non-trivial to detect and respond to failures in NFV. Although most failures are supposed to be handled automatically, network operations staff need to know how to verify that everything went well and must re-learn how to deal with problems in the new framework.
Risk #5: Maturity and complexity issues in OpenStack
A majority of NFV implementations rely, in part or in whole, on OpenStack, which provides a virtual infrastructure manager (VIM) and elements of the MANO layer above the hypervisor and virtualized resources. But OpenStack, today's de facto standard for building private clouds, is also notoriously difficult to learn, deploy, and use.
In the OpenStack User Survey 2016, users commented on the difficulty and complexity of working with OpenStack, although the platform is improving in maturity. Among user comments quoted in the survey:
- “There is a fair amount of complexity that needs to be tackled if one wants to use [OpenStack].”
- “While it is very powerful and flexible, it has also got a very high barrier to entry/learning curve.”
- “New users do not have the flexibility to spend weeks delving into source code to figure out how to do common tasks, especially in the areas of orchestration.”
- “Takes a lot of work to decide on deployment architecture, deploy, and maintain the software.”
- “Frequent releases; keeping up in an operational deployment model is hard to achieve.”
- “No synergies between the sub-projects.”
Bottom line: Most NFV projects will face a steep learning curve in getting their infrastructure to work as expected because of their heavy dependency on OpenStack.
Mitigating NFV risk with Ixia
Ixia is a leader in network testing, visibility, and security with a strong track record in testing and validating virtualized network devices and infrastructures. We work with 47 of the world’s top 50 telecom carriers and with all of the world’s top 15 network equipment makers. Many of the leading companies that are introducing new NFV infrastructure on a large scale are testing and validating it with Ixia.
How do you mitigate the risks inherent in NFV—hidden traffic, security, performance, and recovering from failure? While no one has a complete cure to these problems, we can suggest a path organizations can take to dramatically reduce risk and prevent issues further down the road. That path is continuous testing and validation for your NFV infrastructure, as part of your continuous integration and continuous deployment (CI/CD) pipeline.
The root cause of most concerns with NFV and software-defined networking (SDN) is the unknown: IT organizations have limited experience with how these systems function and how they behave under various scenarios, such as high loads or security attacks. Ixia offers a unique methodology and set of tools for simulating these scenarios while systems are still under development and testing, and for discovering the technical solutions and operational processes that can prevent problems.
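In a CI/CD pipeline, such scenario tests reduce to ordinary gating assertions. A minimal sketch, assuming invented KPI thresholds and a stand-in scenario runner (Ixia's actual tools expose their own interfaces; the canned numbers below are illustrative only):

```python
def run_scenario(name, load_gbps):
    """Stand-in for a traffic generator; returns canned, illustrative KPIs."""
    canned = {
        ("baseline", 1):   {"latency_ms": 2.0, "loss_pct": 0.0},
        ("high-load", 10): {"latency_ms": 8.5, "loss_pct": 0.4},
    }
    return canned[(name, load_gbps)]

# Hypothetical KPI budget the build must meet before deployment.
MAX_LATENCY_MS = 10.0
MAX_LOSS_PCT = 0.5

def gate(scenarios):
    """Pass the build only if every scenario stays within its KPI budget."""
    for name, load in scenarios:
        kpis = run_scenario(name, load)
        if kpis["latency_ms"] > MAX_LATENCY_MS or kpis["loss_pct"] > MAX_LOSS_PCT:
            return False
    return True

passed = gate([("baseline", 1), ("high-load", 10)])
```

Wiring a gate like this into every build is what turns scenario testing from a one-off lab exercise into continuous validation.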
For example, one Ixia tool used to vet NFV deployments is BreakingPoint®—a versatile testing platform that lets you simulate many types of “good” user traffic, as well as “bad” traffic from security attacks or network malfunctions and see how the network behaves in many different scenarios.
Ixia can also help uncover hidden traffic and expose network behavior, even in complex virtualized systems. Ixia’s Phantom vTap™ helps you achieve breakthrough visibility and control of traffic flowing between virtual machines for greater security, compliance, and performance. It gives you complete access to inter-VM east-west traffic and lets you forward packets to any endpoint tool of choice, whether physical or virtual, local or remote, achieving full visibility and verification across your networks. Visibility is key to discovering problems with NFV infrastructure and solving them early.
Ixia’s tools let you do all this as an integral part of your CI/CD pipeline. Solutions are available as virtual appliances that can be deployed together with any version of your NFV infrastructure and used to automatically test even the most complex scenarios as part of your build/test/deploy pipeline. This is a crucial capability for truly continuous testing that can turn the unknown into the known.