Bro Feature Requests (post 8 in a series)
In my previous posts in this series, I laid out my plan to enable Threat Hunting in a scalable way for a cloud environment by integrating Bro IDS with CloudLens, hosted on Kubernetes, with Elasticsearch and Kibana as the user interface. I also gave brief overviews of the key components, explained how to configure CloudLens to deliver network packets to Bro, and discussed how Bro will be set up to fit on Kubernetes.
For my adaptation of Bro to Kubernetes, I kept the Bro cluster-layout.bro as-is and made it work. I took this approach to avoid modifying Bro. Some trade-offs come with that approach, though, and I'd like to use this post to dig into what they are.
The primary concern is that the explicit settings in
cluster-layout.bro sidestep some of the "Kubernetes way" of doing things. Let me lay out the specific issues.
Pods reference each other directly
If you look at the config for a worker, for example, it explicitly sets the manager, logger, and proxy that the worker should talk to. It does this through references to other entries within the same config file.
For example, in this entry for a worker, the names
proxy-1 reference other entries in the same file.
["worker"] = [$node_type=Cluster::WORKER, $ip=10.38.0.35,
For Kubernetes purposes, it would be greatly preferable for these settings to refer not to entries in this file, but rather to a DNS hostname, which would correspond directly to a Kubernetes service.
- Kubernetes service discovery is via DNS lookup. It shouldn’t be necessary to explicitly name a particular manager instance, for example. As long as the manager service has a well-known name, a pod could just do a DNS lookup on that service name, and let DNS resolution take it from there.
- A pod should not connect directly to another pod. It’s the service’s job to find one or more pods that support that service, and route your connection to one of them. Let the proxy service determine which proxy pod a worker will connect to. Circumventing the Service object circumvents the load balancing scheme of the service. It also breaks use cases like cross-cluster federation, where you won’t find the pod you’re looking for locally.
- Services have a consistent address even as pods change underneath, so if you’re using services, it eliminates much of the need to have this config be dynamically generated. In a Kubernetes environment this implementation is unnecessarily complex.
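To make the service-discovery point concrete, here is a minimal sketch in Python (not Bro script) of what "just do a DNS lookup" amounts to from a pod's point of view. The function name `resolve_service` and the injectable `resolver` parameter are assumptions made for illustration; `localhost` stands in for a service name so the example runs outside a cluster.

```python
import socket


def resolve_service(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses behind a hostname.

    `resolver` is injectable so the function can be exercised without a cluster.
    """
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    try:
        # Inside a cluster, a service name such as "manager" would go here.
        print(resolve_service("localhost", 47761))
    except socket.gaierror as exc:
        print(f"lookup failed: {exc}")
```

Nothing in this lookup depends on which pod currently backs the service, which is why the caller's configuration can stay static.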
Pods address each other by IP
This is closely related to the previous concern, but slightly different. When a node looks up another node in the file, like when the worker above looks up the proxy, it finds in that record the IP address and uses that directly to connect. This info has to be an IP rather than a hostname, because the data structure used by Bro specifically uses an IP address.
Really Bro would fit into Kubernetes more seamlessly if this IP address setting could be eliminated entirely. As mentioned above, when a particular node is seeking to connect to another node, it should do a DNS lookup on a service to find the IP address it should talk to, rather than looking it up in this file.
The only remaining purpose for the IP in this file, then, is for a node to know its own IP address. It should be possible to determine that automatically by looking at the local network settings. An explicit IP setting should be optional, and used only to select a specific interface.
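As a rough illustration of how a node could work out its own address, here is a Python sketch of one common technique. The helper names `local_ip` and `node_ip` and the probe address are assumptions for illustration only, not anything Bro provides. In Kubernetes the pod IP can also be injected as an environment variable through the downward API (`status.podIP`), which avoids guessing altogether.

```python
import ipaddress
import socket


def local_ip(probe_addr=("10.255.255.255", 1)):
    """Best-effort guess at this host's primary IPv4 address.

    Connecting a UDP socket sends no packets; it only asks the kernel which
    local address it would route from. Falls back to loopback if no route exists.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(probe_addr)
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def node_ip(explicit_ip=None):
    """Use an explicit setting when given (to pin an interface), else auto-detect."""
    if explicit_ip:
        return str(ipaddress.ip_address(explicit_ip))  # validates the override
    return local_ip()
```

The explicit argument plays the role of the optional setting described above: it is only needed when a host has several interfaces and the automatic choice is wrong.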
This matters because with Kubernetes the IP address is not predictable, and is not known until the scheduler launches the pod. Eliminating the need for an explicit IP in the config file eliminates another reason to dynamically generate this file.
Nodes are listed explicitly
cluster-layout.bro lists every individual node explicitly. This is mostly redundant. For example if you have multiple workers, does each one really need individually customized settings? In reality, probably not. Not if it's just connecting to service names known well ahead of time rather than IPs that are allocated dynamically.
Listing every node explicitly again requires that this file be generated dynamically. If there were simply a generic 'worker' section that applied to all worker pods / nodes, the file wouldn't need to change as new pods are launched.
The result of these concerns is that the
cluster-layout.bro file must be generated dynamically after the pods are allocated and the networking information becomes known.
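To show what "generated dynamically" involves, here is a hedged Python sketch that renders a layout file from pod details collected after scheduling. The `$node_type` and `$ip` fields appear in the entry quoted earlier; the `redef Cluster::nodes` wrapper, the `$p` port field, and the `$manager` / `$proxy` references are assumptions that should be checked against your Bro version, and the pod data and port number are hypothetical.

```python
def render_node(name, node_type, ip, port, refs=None):
    """Render one Cluster::nodes entry as a line of Bro script."""
    fields = [f"$node_type=Cluster::{node_type}", f"$ip={ip}", f"$p={port}/tcp"]
    # refs maps a field such as "manager" to the *name* of another entry in the file.
    for key, target in (refs or {}).items():
        fields.append(f'${key}="{target}"')
    joined = ", ".join(fields)
    return f'    ["{name}"] = [{joined}],'


def render_layout(pods):
    """Render the whole file from a list of pod dicts discovered after scheduling."""
    lines = ["redef Cluster::nodes = {"]
    for pod in pods:
        lines.append(
            render_node(pod["name"], pod["type"], pod["ip"], pod["port"], pod.get("refs"))
        )
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pod data; in practice this would come from the Kubernetes API.
    print(render_layout([
        {"name": "worker", "type": "WORKER", "ip": "10.38.0.35", "port": 47764,
         "refs": {"manager": "manager", "proxy": "proxy-1"}},
    ]))
```

Every input to `render_layout` is only known once the pods exist, which is exactly why the file has to be regenerated whenever a pod appears or disappears.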
This interacts poorly with another property of Bro - it does not support reloading config files, and must be restarted when the config files change. Since this config file needs to change every time a pod of any kind appears or disappears, this can be quite disruptive.
Recommendation - Services
Long term it would be a better approach to essentially remove the entire
cluster-layout.bro file, using Kubernetes services instead to set up connections between the components.
This would work something like this. For the logger, one would define a logger service, naming it literally "logger". On the service definition, you'd set a port number, which could be 47761 to match the existing convention. You'd tell the service to select pods that are tagged as the logger nodes.
- name: logger
Kubernetes will create an IP address for this logger service, visible only within the cluster. The service will keep track of the set of running pods that match its selection criteria. It will listen on the defined port, and any incoming connections will be load balanced across the available pods. If a pod dies, it will automatically rebalance connections over the remaining or replacement pods.
What's more, Kubernetes will create a DNS entry for this service, like
<service name>.<namespace>.svc.<cluster-domain>, where the cluster domain defaults to cluster.local. For example it might look like
logger.default.svc.cluster.local. (Most of the time you can just refer to it as
logger rather than using the FQDN, since you'll want to talk to the service in the same namespace and on the same cluster, which is the default.) It will also create a DNS SRV record that can be used to look up the service port.
# dig logger.default.svc.cluster.local SRV
;; QUESTION SECTION:
;logger.default.svc.cluster.local. IN SRV
;; ANSWER SECTION:
logger.default.svc.cluster.local. 5 IN SRV 0 100 47761 logger.default.svc.cluster.local.
;; ADDITIONAL SECTION:
logger.default.svc.cluster.local. 5 IN A 10.101.22.6
;; Query time: 1 msec
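For illustration, here is a small Python sketch that pulls the port and target out of an SRV answer line like the one in the dig output above. The `SrvRecord` type and `parse_srv_line` function are hypothetical helpers; a real client would more likely use a DNS library than parse dig's text output.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line):
    """Parse one SRV answer line in dig's presentation format."""
    parts = line.split()
    # Layout: name ttl class type priority weight port target
    if len(parts) != 8 or parts[2] != "IN" or parts[3] != "SRV":
        raise ValueError(f"not an SRV answer line: {line!r}")
    name, ttl, _cls, _type, priority, weight, port, target = parts
    return SrvRecord(
        name.rstrip("."), int(ttl), int(priority), int(weight), int(port), target.rstrip(".")
    )


if __name__ == "__main__":
    answer = ("logger.default.svc.cluster.local. 5 IN SRV 0 100 47761 "
              "logger.default.svc.cluster.local.")
    record = parse_srv_line(answer)
    print(record.target, record.port)
```

With the port available from DNS, a client needs only the service name to find both where and on which port to connect.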
You'd then create similar service definitions for the proxy and for the manager.
Now, rather than having Bro nodes connect to each other using explicit IPs and ports, those nodes should simply do a DNS lookup. You'll still want the hostname of the service to be configurable, but now it can be set at a global level that doesn't need to change as components are restarted and move around the cluster. When a logger dies, for example, the other nodes don't need to have their configs rewritten and restart. They just reconnect to the same service, and are directed to the new replacement logger.
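The reconnect behavior described here can be sketched in Python as follows. This is an illustration of the idea, not how Bro's own communication layer works; the function name `connect_with_retry` and its retry parameters are assumptions.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.5, timeout=2.0):
    """Connect to host:port, re-resolving the name on every attempt.

    socket.create_connection performs a fresh lookup each call, so a service
    name keeps working even when the pod behind it has been replaced.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

A worker would call something like `connect_with_retry("logger", 47761)` at startup and again whenever the connection drops, with no config rewrite or restart involved.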
In my project, I have gone ahead and created these services in anticipation of moving to this model. However, Bro is not really using them. I would have to modify Bro to reduce its dependency on
cluster-layout.bro to begin to take advantage of the Kubernetes services.
One other thing
While I'm discussing areas where I'd like to see changes in Bro, one other issue, as I mentioned, is that Bro does not automatically reread its config files when they change. In fact, while there used to be a command to explicitly tell Bro (via BroControl) to reread its config files, that command is deprecated. BroControl handles this instead by completely restarting the Bro process.
That's definitely not ideal in a cloud environment where I expect changes to happen fairly regularly.
For my project I've put in place a small wrapper script that starts Bro in a loop, notices when the config changes, and kills Bro so that the loop restarts it. This is in keeping with the Kubernetes philosophy of making sure an object takes care of the details on its own. That approach more or less works but could hide crashing problems in Bro itself, and it means there are time windows where Bro could miss packets or where a freshly started Bro has lost context. A better option would be for Bro to notice config changes on its own and remove the need for this workaround entirely.
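The general shape of such a wrapper can be sketched in Python. This is not the actual script from the project; the function names, the polling approach, and the use of a content hash are all assumptions chosen for illustration.

```python
import hashlib
import subprocess
import time


def file_digest(path):
    """Hash the file contents so that touch-only changes are ignored."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def run_until_config_changes(cmd, config_path, poll_interval=1.0):
    """Run cmd once; stop it if config_path changes.

    Returns "config_changed" if we stopped the process, or "exited" if it
    ended on its own (which may indicate a crash worth surfacing).
    """
    baseline = file_digest(config_path)
    proc = subprocess.Popen(cmd)
    try:
        while proc.poll() is None:
            time.sleep(poll_interval)
            if file_digest(config_path) != baseline:
                proc.terminate()
                try:
                    proc.wait(timeout=10)
                except subprocess.TimeoutExpired:
                    proc.kill()
                    proc.wait()
                return "config_changed"
        return "exited"
    finally:
        # Never leave the child running if we leave this function abnormally.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


def supervise(cmd, config_path, poll_interval=1.0):
    """Restart cmd forever, logging why each run ended."""
    while True:
        reason = run_until_config_changes(cmd, config_path, poll_interval)
        print(f"{cmd[0]} stopped: {reason}")
        if reason == "exited":
            time.sleep(poll_interval)  # avoid a tight loop if the process keeps crashing
```

Logging the reason for each restart keeps unexpected exits distinguishable from deliberate restarts, which addresses part of the concern about hidden crashes. A hypothetical invocation would be `supervise(["bro", "-i", "eth0"], "cluster-layout.bro")`.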
Next, I'll get into specifics about how to create a cluster and get it ready to host Bro.