Elasticsearch and Kibana (post 5 in a series)
In my previous posts in this series, I laid out my plan to enable Threat Hunting in a scalable way for a cloud environment by integrating Bro IDS with CloudLens, hosted on Kubernetes, with Elasticsearch and Kibana as the user interface. I then explained why Bro will serve as the intrusion detection system in my project, how CloudLens provides visibility into the cloud-hosted network, and how Kubernetes will host the project.
In this post I'll talk a bit about Elasticsearch and Kibana, and the role they'll play in the project.
Elasticsearch is a scalable, open source, full-text search and analytics engine. You populate it with documents, and it's very flexible about the structure of those documents. Once indexed, those documents become searchable, and searchable quickly.
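As a concrete sketch of what "populating Elasticsearch with documents" means here: one Bro conn.log line becomes one JSON document. The index name `bro-conn` and the trimmed field list below are my own choices for illustration, not anything Bro or Elasticsearch mandates.

```python
import json

# Field order for the leading columns of a Bro conn.log entry.
# (Real conn.log lines carry more columns; this is a trimmed sketch.)
CONN_FIELDS = ["ts", "uid", "id.orig_h", "id.orig_p",
               "id.resp_h", "id.resp_p", "proto", "service"]

def conn_line_to_doc(line):
    """Turn one tab-separated conn.log line into a dict ready for indexing."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(CONN_FIELDS, values))

# A made-up conn.log-style line for illustration.
line = "1511800000.123\tCbX7a1\t10.0.0.5\t49152\t93.184.216.34\t80\ttcp\thttp"
doc = conn_line_to_doc(line)

# Indexing is then a single HTTP call, e.g.
#   POST http://localhost:9200/bro-conn/_doc
# with json.dumps(doc) as the request body.
print(json.dumps(doc, indent=2))
```

In practice a shipper like Logstash or Filebeat does this transformation for you, but the shape of the result is the same: flat log line in, structured document out.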
The thing about threat hunting is, it relies on logs. Lots and lots of logs. Each log describes some event or action in your network. Hunting for a threat means digging through those logs looking for events or actions that are suspicious. Or more likely, correlating a few different events and activities that, taken together, add up to something suspicious.
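To make "correlating a few different events" concrete, here's roughly what such a search looks like in Elasticsearch's query DSL. The index layout and field names are assumptions based on a conn.log-style mapping; adjust them to your own.

```python
import json

# A bool query that correlates a few conditions at once:
# connections from one internal host, to an unusual destination port,
# within the last day. Field names assume a conn.log-style mapping.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"id.orig_h": "10.0.0.5"}},
                {"term": {"id.resp_p": 4444}},
                {"range": {"ts": {"gte": "now-1d"}}},
            ]
        }
    }
}

# You'd POST json.dumps(query) to an endpoint like
#   http://localhost:9200/bro-conn/_search
print(json.dumps(query, indent=2))
```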
It's not that unusual with Bro to just go straight to the logs for this. Just use `grep` and `awk` and `cut` and other tools to dig around, right in the text files.
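That kind of by-hand digging looks something like this (sketched in Python over a made-up conn.log snippet; real logs have header lines and many more columns):

```python
from collections import Counter

# A few fake tab-separated conn.log-style lines (trimmed columns:
# ts, uid, orig_h, orig_p, resp_h, resp_p, proto).
log_lines = [
    "1511800000.1\tC1\t10.0.0.5\t49152\t93.184.216.34\t80\ttcp",
    "1511800001.2\tC2\t10.0.0.5\t49153\t93.184.216.34\t443\ttcp",
    "1511800002.3\tC3\t10.0.0.9\t49154\t198.51.100.7\t443\ttcp",
]

# Roughly "awk -F'\t' '{print $6}' conn.log | sort | uniq -c":
# tally up destination ports across every line.
dest_ports = Counter(line.split("\t")[5] for line in log_lines)
print(dest_ports.most_common())
```

Note that every question you ask this way is another full linear scan of the log files.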
You can do that, but you'll be spending both brain power on constructing your searches and clock time waiting for them to run. What Elasticsearch does is free up that time and cognitive effort so you can spend it on deeper analysis instead.
Elasticsearch is just an engine. It provides APIs. Kibana adds a user interface on top. It's not a user interface geared to a specific use case. Instead, it's very flexible. You can run ad-hoc queries, you can create charts and visualizations of your data, and you can aggregate them into a dashboard. You just adapt Kibana to support your specific need.
You can find premade dashboards for Kibana for many purposes. Sometimes those can be a bit tricky to actually use: the field names don't quite match your data, or there's a version mismatch, or the like. But it's typically not terribly difficult to use the template as inspiration and pull together a similar dashboard of your own.
The end result
Once you're pushing data into Elasticsearch, it becomes much quicker to dig around and try to find things. This is especially true when you scale up to a larger deployment. Instead of having grep scan linearly through reams of logs, you're letting a multi-node cluster divide up the work, and work against indexed data. Searches turn around quickly, which gives fewer opportunities for coffee breaks, but means you can try more things in a day.
Then, with the easy charts and visualizations that Kibana provides, it becomes easier to spot higher-level patterns like trends and changes. Anomalies like "what's this blip on the graph?" might focus your attention on something you would never have noticed poking around directly in the logs.
At this point in the series I've given high level descriptions of all of the key infrastructure that I'll be pulling into my project. Next up, we'll start digging into the details of how it all fits together, and how you can replicate it in your environment.
- What is Elasticsearch? - https://www.elastic.co/blog/describe-elasticsearch
- Kibana overview - https://www.elastic.co/products/kibana