Deploying a network intrusion detection system in the cloud (part 1 in a series)
This is a series about how you can implement a network-based Intrusion Detection System (NIDS) and a robust network visibility architecture in the public cloud.
For brevity, "IDS" refers to a network-based IDS (NIDS) throughout. This post provides background on why you might want to implement an IDS, the common network visibility architecture patterns, and the challenges with network visibility in the public cloud. I will cover cloud-native network visibility architecture and IDS in subsequent posts.
- An IDS should be part of your defense-in-depth security strategy. It serves as a safety net alongside your other defenses.
- A lift-and-shift strategy for network visibility is widely advocated. It extends visibility to your cloud-based workloads but overlooks the additional friction of deploying and operating it.
- A lift-and-shift strategy for network visibility keeps you from fully realizing the agility and flexibility of the public cloud and cloud-native technologies.
- There is an alternative, cloud-native architecture for network visibility and IDS that can be applied to both the cloud and your existing infrastructure.
I have always benefited from other writers’ insights, which help me learn and think differently. Likewise, I want to share insights from working on CloudLens and to provoke thinking about visibility architecture in a different light. My other motivation is simply to start writing, to work on a craft I have always found challenging: translating what’s in my head into words.
My day-to-day work over the past few years has let me take on different roles, from architect to software engineer to DevOps to SRE, across many technology stacks. Much of what I share is aimed at readers in a blend of these roles and at anyone with a general interest in network visibility.
I assume you have a high-level understanding of what an IDS is. If not, have a look at this.
Many application developers wonder why they should monitor at the network level when they already have context on their upstream and downstream dependencies. But do you really have full visibility?
- You may not be able to implement the same consistent monitoring and observability at the application level across all your dependencies, whether in a microservices or monolithic architecture.
- How would you know if a service that you didn’t write, but deployed to your environment, has just started communicating over the network with some mothership on the internet? Is it legitimate or malicious?
- Even if a development team owns every line of code in the services it uses and has implemented the same observability mechanism everywhere, can it guarantee that observability coverage stays at 100% as the services’ features grow over time?
- How can you tell if someone is using a 0-day exploit against your web server before a patch is available?
Now consider the perspective of an engineer who controls policy configurations on perimeter or application firewalls, security groups in the public cloud, or a cloud-based policy compliance engine. I use the generic term engineer because this can be an application developer, a SecOps engineer, a DevOps engineer, or an infrastructure engineer.
- How do you know whether missing or incorrect policy configuration in any of these entities has introduced anomalies?
- How can you be sure that no exploit takes advantage of legitimate openings through unknown or unpatched weaknesses?
If you zoom out from these roles and focus on security, you probably want a more generic safety-net detection mechanism that looks for suspicious activity that different engineers may have conveniently abstracted out of their day-to-day concerns. You need to extend your defense-in-depth strategy beyond the inline checks in your network and the application-layer checks that developers have put in place.
One way to achieve this is to implement an IDS that passively analyzes the network traffic in your environment.
Some examples of detection actions that an IDS can take are:
- Look for anomalies in traffic patterns
- Flag traffic that leaked through firewalls or other inline devices
- Flag traffic within internal network segments that doesn’t traverse inline checkpoints
- Detect traffic sent to or from unexpected locations
- Detect illegitimate activity over legitimate paths
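As a rough illustration of one of these actions, detecting traffic sent to or from unexpected locations, here is a minimal Python sketch that checks flow records against an allowlist of network ranges. The `Flow` record, the function name, and the assumption that your expected locations fit in a CIDR allowlist are all hypothetical; a real IDS would express this as rules in its own rule language.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, List


@dataclass(frozen=True)
class Flow:
    """Hypothetical minimal flow record; real IDS logs carry far more fields."""
    src: str
    dst: str
    dst_port: int


def flag_unexpected_locations(flows: Iterable[Flow], allowed_cidrs: Iterable[str]) -> List[Flow]:
    """Return flows whose source or destination falls outside every allowed CIDR."""
    allowed = [ip_network(cidr) for cidr in allowed_cidrs]

    def expected(addr: str) -> bool:
        ip = ip_address(addr)
        # Membership is simply False when the IP version differs from the network's.
        return any(ip in net for net in allowed)

    return [f for f in flows if not (expected(f.src) and expected(f.dst))]
```

For example, with `["10.0.0.0/16"]` as the allowlist, a flow from `10.0.1.5` to `203.0.113.9` is flagged while internal east-west flows are not.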
The two primary factors in an IDS implementation are the placement of IDS sensors and the quality of detection.
Before implementation, consider the questions below.
Placement means putting sensors in the right spots in your infrastructure so that you can monitor the traffic that matters to you.
- Do you have blind spots where you can’t get packets?
- Do you have unnecessary duplication of monitored traffic?
- Do you have unnecessary analysis of monitored traffic?
Quality of detection matters once monitored traffic reaches the IDS sensors: your ability to detect unusual activity is only as good as the IDS’s analysis and its configuration for that traffic.
- Does the IDS tool you select have a rich set of signatures and anomaly detection? If not, do you need more than one IDS tool?
- Does the analysis scale as traffic load varies?
- Does the IDS tool you select handle false positives well?
- How often are the signature and anomaly-detection feeds updated?
- How easy is it to adjust your signatures and detection configuration?
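Handling false positives depends partly on the tool and partly on your own tuning loop. A minimal sketch of that loop, assuming analysts triage each alert as a true or false positive: list the signatures whose false-positive ratio exceeds a threshold as candidates for tuning. The input format, function name, and default thresholds are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisy_signatures(
    triaged: Iterable[Tuple[int, bool]],
    max_fp_ratio: float = 0.9,
    min_alerts: int = 20,
) -> List[int]:
    """Return signature IDs whose false-positive ratio exceeds max_fp_ratio.

    `triaged` yields (signature_id, is_true_positive) pairs from analyst review.
    Signatures with fewer than `min_alerts` alerts are skipped as too sparse to judge.
    """
    total: Counter = Counter()
    false_pos: Counter = Counter()
    for sig_id, is_true_positive in triaged:
        total[sig_id] += 1
        if not is_true_positive:
            false_pos[sig_id] += 1
    return sorted(
        sig_id
        for sig_id, count in total.items()
        if count >= min_alerts and false_pos[sig_id] / count > max_fp_ratio
    )
```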
Be clear about which of these questions matter to your objectives, and explore the tradeoffs of your solution with respect to placement, quality, and cost.
It is common to place at least one IDS sensor behind the perimeter firewall with a broad detection configuration. Other sensors are deployed on internal network segments with narrower detection configurations to detect lateral movement. Traffic reaches these sensors through physical or virtual taps, or through SPAN ports on physical or virtual switches.
Below is an example diagram, courtesy of bestofnetworksecurity, showing IDS sensors deployed directly on different network segments.
There are several potential issues with this type of architecture.
- Duplicate analysis of monitored traffic: when an entity on one segment talks to an entity on another segment, sensors on both segments analyze the same traffic.
- Deployment friction grows as you add branches or subnets, because you must deploy an IDS device onto every segment you want to monitor.
To address these issues, some organizations add another layer of indirection, a data monitoring switch, to reduce the tight coupling of the IDS sensors to the production network. The trendy name for this type of switch with additional superpowers is Network Packet Broker (NPB).
Below is a conceptual diagram showing how an NPB architecture logically separates the tools network from the monitored network via intermediary NPBs.
While it solves the earlier challenges, this solution introduces new concerns.
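The duplicate-analysis problem can be pictured with a small filter that drops a packet if an identical copy was already seen within a short time window. This is a hypothetical sketch, not how any particular sensor or broker implements it, and the class name and window size are assumptions. Hashing the whole packet only catches byte-identical copies, such as the same frame captured by two taps; a packet routed between segments arrives with rewritten MAC addresses, a decremented TTL, and a recomputed header checksum, so a real implementation would mask those mutable fields before hashing.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Drops packets byte-identical to one seen within `window` seconds.

    Hypothetical sketch; assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window: float = 0.05) -> None:
        self.window = window
        # digest -> first-seen timestamp, oldest entry first
        self._seen: "OrderedDict[bytes, float]" = OrderedDict()

    def accept(self, packet: bytes, ts: float) -> bool:
        """Return True if the packet should be analyzed, False if it is a duplicate."""
        # Evict digests that have aged out of the window.
        while self._seen:
            oldest_ts = next(iter(self._seen.values()))
            if ts - oldest_ts <= self.window:
                break
            self._seen.popitem(last=False)
        digest = hashlib.sha256(packet).digest()
        if digest in self._seen:
            return False
        self._seen[digest] = ts
        return True
```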
- As your monitored traffic grows in volume, the NPB layer can become a bottleneck.
- You still have a tight coupling at the IP layer from your NPB to your tool layer.
- You now have the additional responsibility to operate and maintain the NPB layer.
The lift-and-shift NPB approach tries to retain the same logical layering. This entails creating entities in the cloud at the NPB layer to intermediate the monitored traffic. The diagram below illustrates a couple of options, depending on whether the IDS itself is deployed in the cloud.
You can backhaul monitored cloud traffic through a cloud NPB to your datacenter, and then forward the traffic to an IDS tool in your existing environment, as indicated by the orange arrows. Alternatively, you can forward the monitored cloud traffic directly from the cloud NPB to your cloud IDS tool as indicated by the blue arrows.
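The backhaul path carries a recurring data-transfer cost. A back-of-the-envelope sketch of how much data it moves, assuming a sustained average mirrored rate in decimal gigabits per second; the per-GB price is whatever your provider charges, and the function names are illustrative.

```python
def monthly_backhaul_gb(avg_mirrored_gbps: float, days: int = 30) -> float:
    """Decimal gigabytes (10**9 bytes) moved by a sustained mirrored-traffic rate."""
    gigabytes_per_second = avg_mirrored_gbps / 8  # 8 bits per byte
    return gigabytes_per_second * days * 24 * 3600


def monthly_backhaul_cost(avg_mirrored_gbps: float, price_per_gb: float, days: int = 30) -> float:
    """Estimated transfer charge; price_per_gb is an input you supply."""
    return monthly_backhaul_gb(avg_mirrored_gbps, days) * price_per_gb
```

At a sustained 1 Gbit/s of mirrored traffic, that is 324,000 GB in a 30-day month, before any tunnel encapsulation overhead.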
Networking challenges in the public cloud
With the lift-and-shift architecture in mind, the immediate problem is how to get packets from the cloud apps to the virtual NPB. Several challenges are specific to getting access to packets:
- The public cloud infrastructure doesn’t deliver packets that aren’t addressed to you, so you can’t sniff.
- How do you get east-west visibility into communication between VMs and containers on the same host?
- You don’t have access to the physical or virtual switches as you do in your own environment, so you can’t span or tap.
- How do you get the monitored traffic to the virtual NPB?
Given these constraints, you conclude that you need an agent co-located in the same runtime as your workload. The agent’s main job is to tap the workload’s traffic, establish a tunnel to the virtual NPB, and then transport the monitored traffic through that tunnel. Below is a diagram depicting how the agent would work.
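The post doesn’t prescribe a tunnel protocol. As one assumed option, here is a sketch of the transport half of such an agent using VXLAN (RFC 7348), which wraps each tapped Layer 2 frame in an 8-byte header and sends it over UDP. Capturing the frames is OS-specific and omitted, and the function names are hypothetical.

```python
import socket
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned VXLAN port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_encapsulate(frame: bytes, vni: int) -> bytes:
    """Prefix a tapped Ethernet frame with an 8-byte VXLAN header."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + frame


def send_mirrored_frame(sock: socket.socket, frame: bytes, npb_addr: tuple, vni: int) -> None:
    """Ship one tapped frame to the NPB's tunnel endpoint over UDP."""
    sock.sendto(vxlan_encapsulate(frame, vni), npb_addr)
```

An agent would open one UDP socket (`socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`) and call `send_mirrored_frame` for each tapped frame, addressed to the NPB at port 4789. The VNI lets the NPB tell apart traffic from different workloads or tenants.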
Now that you’ve gotten past the networking challenges, let’s look at other challenges.
The commonly cited benefits of public cloud adoption are agility, flexibility, and manageability. With a lift-and-shift approach to your visibility architecture, examine how well it takes advantage of these benefits as you embrace the public cloud.
- Is your configuration for the virtual NPB and agents tightly coupled to IP addresses? If so, you must keep tunnel configurations current, because either end of a tunnel can move onto a different physical runtime with a new, ephemeral IP address.
- How quickly can you adapt to a changing environment? Consider your ability to handle change when your workload runs in a VM versus a container, since VMs and containers can have very different lifecycle durations.
- Can you handle fluctuating monitoring bandwidth demands at your NPB and tool layers by leveraging elasticity?
- Is the runtime you deploy your NPB and tool layers to cost-optimized?
- Are your NPB and Tool’s control paths (middleware and API/UI layer) scalable?
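One way to loosen the coupling to IP addresses is to configure a tunnel endpoint by name and re-resolve it periodically, so a moved NPB is picked up without hand-editing configuration. A minimal sketch, assuming the endpoint has a DNS name that something (for example your cloud’s service discovery) keeps current; the class, TTL, and hostnames are hypothetical, and the resolver and clock are injectable so the behavior can be tested.

```python
import socket
import time
from typing import Callable, Optional, Tuple

Addr = Tuple[str, int]


def dns_resolve(name: str, port: int) -> Addr:
    """Ask the system resolver for an IPv4 UDP address for `name`."""
    infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_DGRAM)
    host, resolved_port = infos[0][4][:2]
    return host, resolved_port


class TunnelEndpoint:
    """Tunnel endpoint configured by name, re-resolved every `ttl` seconds."""

    def __init__(self, name: str, port: int, ttl: float = 30.0,
                 resolve: Callable[[str, int], Addr] = dns_resolve,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name, self.port, self.ttl = name, port, ttl
        self._resolve, self._clock = resolve, clock
        self._addr: Optional[Addr] = None
        self._resolved_at = 0.0

    def address(self) -> Addr:
        """Return the cached address, refreshing it once the TTL has expired."""
        now = self._clock()
        if self._addr is None or now - self._resolved_at >= self.ttl:
            self._addr = self._resolve(self.name, self.port)
            self._resolved_at = now
        return self._addr
```

The agent calls `address()` before each send (or on send failure), so when the NPB instance is replaced and its DNS record updated, traffic follows within one TTL.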
Next, gauge how much friction stands between you and bringing that agility and flexibility to your cloud visibility. Let’s look at each of the layers involved: the monitored network layer where the agents run, the NPB layer, and the tool layer.
- How easy is it to deploy the agent that captures traffic on a Linux- or Windows-based OS in a VM or container?
- How easy is it to handle agent updates?
- How easy is it to set up the tunnels to the intermediary NPB?
- How easy is it to do all of the above at scale when you have a non-trivial number of agents?
- Do you want to incur the bandwidth cost of backhauling the traffic to your existing tool?
- At the NPB layer, as with the monitored network layer, how do you deploy it? Is it a software package, a virtual machine image, or an instance image?
- Is there one data-plane virtual packet broker (VPB) instance, or more than one?
- Is there a control-plane instance to manage the data-plane VPB instances?
- How do you scale these entities both vertically and horizontally when the monitored network bandwidth fluctuates?
- How do you monitor the health of the new entities introduced in this layer?
- How easy is it to set up the exit endpoint of the tunnel from the agent to the NPB?
- How easy is it to set up the entry endpoint of the tunnel from the NPB to the tool?
- Does the IDS tool you choose allow or supply a cloud workload rule set?
- Do the IDS sensors scale vertically and horizontally?
- Do the IDS control plane and user API/UI layers scale with varying traffic load?
- How easy is it to update rules in the sensors to match the monitored traffic?
- How easy is it to terminate the tunnel from the NPB at the tool?
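Several of the questions above ask how the NPB and IDS sensor layers scale horizontally as monitored bandwidth fluctuates. At its simplest, that is a sizing calculation an autoscaler would re-run as traffic changes. A minimal sketch; the per-instance throughput and headroom are hypothetical inputs you would measure for your own instance types.

```python
import math


def instances_needed(
    peak_gbps: float,
    per_instance_gbps: float,
    headroom: float = 0.3,
    minimum: int = 1,
) -> int:
    """Instances required to absorb peak_gbps while keeping `headroom` spare capacity."""
    if per_instance_gbps <= 0:
        raise ValueError("per_instance_gbps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_gbps = per_instance_gbps * (1 - headroom)
    return max(minimum, math.ceil(peak_gbps / usable_gbps))
```

Re-running this as measured peak bandwidth changes, and reconciling the running instance count to the result, is the core of horizontal scaling; vertical scaling changes `per_instance_gbps` instead. In a lift-and-shift deployment you would need this loop, plus health monitoring, for both the NPB and the IDS layers.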
On the surface, the lift-and-shift visibility strategy gives you an easy path to monitoring your cloud workloads. Even leaving the tool layer’s own issues aside, this is a hefty list of toil just to fit into the lift-and-shift architecture, and the architecture lacks a clear path to harness the agility and flexibility of the public cloud.
In the next post, I will explain an alternative, cloud-native visibility architecture that offers a cloud-first, evolvable path for your visibility infrastructure footprint. Stay tuned.