IDS Best Practices
July 16, 2010
Intrusion detection systems (IDSs) have a bad reputation. Yes, both network- and host-based products can be noisy and generate lots of false positives. But they are very useful to have at the WAN edge and within your LAN, and you can improve the signal-to-noise ratio through proper tuning and by understanding your environment. In fact, knowing your environment is the foundation of everything we as security professionals do. If we don't understand what data flows between two points or what servers live in which subnets, we can't really know what to protect and how to protect it. When implementing an IDS or its cousin, the intrusion prevention system (IPS), the same principle applies. Here are some best practices for implementing these tools that I've learned on the job.
First, monitor and tune one IDS sensor at a time. This saves you from being overwhelmed by alerts and false positives. Make sure you thoroughly understand the systems or networks you are monitoring. Talk to the network and system admins and users, and learn what they use those systems for and how they use them. You'll also want to know about major events, such as when software is updated, and whether there are processes that only run at certain times, such as monthly or quarterly. These operations might not be captured in the IDS's initial monitoring period and will raise a flag when they happen.
This rule is especially apt for IPSs, which can block traffic or processes. Most commercial IPS solutions can be placed in a monitor-only mode, which provides the device with a baseline of normal network activity, and lets you tune rules without actually stopping any traffic. Once you are comfortable with your rule set, enable the active protection.
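
The difference between monitor-only and active protection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular IPS product works: the `Rule` class, the `evaluate` function, and the directory paths are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: executables may only run from approved directories."""
    name: str
    allowed_dirs: tuple          # directory prefixes, each ending in "/"
    enforce: bool = False        # False = monitor-only: record, never block
    would_have_blocked: list = field(default_factory=list)


def evaluate(rule: Rule, exe_path: str) -> str:
    """Return 'allow', 'log', or 'block' for one process-launch event."""
    if exe_path.startswith(rule.allowed_dirs):
        return "allow"
    if rule.enforce:
        return "block"
    # Monitor-only: remember what enforcement would have stopped,
    # so the rule can be tuned before it is switched on.
    rule.would_have_blocked.append(exe_path)
    return "log"


# Example: run in monitor-only mode first, then review the findings.
rule = Rule("exec-location", allowed_dirs=("/usr/bin/", "/opt/java/"))
evaluate(rule, "/usr/bin/python")
evaluate(rule, "/data/app/jre/bin/java")
```

Reviewing `would_have_blocked` at the end of the monitoring period shows which legitimate activity the rule would break, so you can widen `allowed_dirs` before setting `enforce` to true.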
I broke a production system once because I didn't monitor long enough before I started setting protection rules. I missed the fact that users of the system made calls to a second version of Java installed in a location that the security software believed should not have any executables. Had I allowed learning mode to run for another week, I would've saved myself a late-night wake-up call from the team that was attempting to roll out new software but couldn't because the rules were blocking them.
Keep in mind that you'll need to retune your IDS over time. Your network and systems will change, as will the risks to your organization, which means your rules will also have to be adjusted to keep alerts relevant.
Second, have alerts of a certain priority sent directly to you so you know when you are being attacked, or when other events might require your attention. To reduce the noise, set alerts only for the risks you are most concerned about, and don't rely on out-of-the-box settings. A vendor might put an IIS attack at the top of the priority list, but if you only use Apache, you can let those IIS alarms just sail on by.
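
As a rough sketch of that routing decision, assuming alerts arrive as simple records: the field names (`target_software`, `priority`), the `DEPLOYED` set, and the threshold below are assumptions for illustration, not any vendor's alert format.

```python
# Software actually running in this (hypothetical) environment.
DEPLOYED = {"apache", "openssh"}

# Assumed convention: priority 1 is most severe; page on 1 or 2.
PAGE_THRESHOLD = 2


def route(alert: dict) -> str:
    """Decide what to do with one alert: 'page', 'log', or 'drop'."""
    if alert["target_software"] not in DEPLOYED:
        return "drop"            # e.g. IIS alerts on an Apache-only network
    if alert["priority"] <= PAGE_THRESHOLD:
        return "page"            # send straight to the analyst
    return "log"                 # keep for later review, no notification


iis_alert = {"target_software": "iis", "priority": 1}
apache_alert = {"target_software": "apache", "priority": 1}
route(iis_alert)
route(apache_alert)
```

The point of the design is that relevance to your environment is checked before the vendor's priority is considered, so a top-priority alert for software you don't run never reaches your pager.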
Third, I strongly recommend employing a log and alert correlation product in conjunction with your IDS. These correlation products can do several things. For one, they can group alerts so you don't receive a page or an e-mail every 20 seconds. Instead, you can get batches of alerts or events in more manageable increments. They also provide insight across multiple platforms, including network and host IDSs, firewalls, and syslog events from other systems. This is important because it can provide you with a clearer picture of what's happening. For example, they can put together events across multiple network IDS sensors, host-based sensors and syslog to determine that an attacker succeeded in compromising a server but was then denied access by the host protection when attempting to perform some task. Correlation can provide better insight on events, and enable faster response with less work--that's a huge ROI. If you can't afford one of the big log management or correlation packages, check out open-source solutions such as OSSIM.
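
The grouping step alone can be illustrated with a short sketch. Real correlation products do far more than this; the function name `batch_alerts`, the alert fields (`ts`, `signature`, `source`), and the five-minute window are assumptions chosen for the example.

```python
from collections import defaultdict


def batch_alerts(alerts, window_seconds=300):
    """Group alerts by (signature, time window) and summarize each group.

    Each alert is a dict with 'ts' (integer seconds), 'signature', and
    'source' (the sensor or log that reported it).
    """
    groups = defaultdict(list)
    for alert in alerts:
        bucket = alert["ts"] // window_seconds
        groups[(alert["signature"], bucket)].append(alert)

    return [
        {
            "signature": signature,
            "window_start": bucket * window_seconds,
            "count": len(items),
            # Which sensors saw it: a first hint of cross-platform correlation.
            "sources": sorted({item["source"] for item in items}),
        }
        for (signature, bucket), items in sorted(groups.items())
    ]


alerts = [
    {"ts": 0, "signature": "sql-injection", "source": "nids-1"},
    {"ts": 20, "signature": "sql-injection", "source": "nids-2"},
    {"ts": 40, "signature": "sql-injection", "source": "hids-web01"},
    {"ts": 400, "signature": "sql-injection", "source": "nids-1"},
]
summaries = batch_alerts(alerts)
```

With these inputs, the three alerts in the first five minutes collapse into one summary and the later alert forms a second, so the analyst gets two notifications instead of four.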
Fourth, have a system in place to ensure that IDS event logs are reviewed regularly. The devil's minions--PCI QSAs--will demand an audit trail to prove someone reviewed the logs. You may find it silly to have to prove you are doing your job, but there are benefits here. An audit trail ensures your team is on the ball. It provides metrics around workloads, which can help justify more headcount. The audit trail also has operational value because it helps ensure that jobs that need to be done are being completed. To keep it non-intrusive, integrate the process into your workflows.
I don't have a product that provides an audit trail of exactly which alerts I reviewed, so I've improvised. All alerts that I receive are also sent to the help desk ticketing software and directly into my queue. During the day, I review the alerts when I'm in my office and close the tickets. This meets my audit requirements, allows me to pull metrics from the ticketing system to show how much work we do, lets the boss know the team is working, and ensures I don't miss anything I should have reviewed.
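
Pulling those metrics can be as simple as the sketch below. It assumes tickets have been exported as records with `id`, `opened`, and `closed` fields; those names and the `review_metrics` function are hypothetical and not tied to any specific ticketing product.

```python
from datetime import datetime


def review_metrics(tickets):
    """Summarize alert-review tickets for an audit report.

    Each ticket is a dict with 'id', 'opened' (datetime), and
    'closed' (datetime, or None if the alert is still unreviewed).
    """
    closed = [t for t in tickets if t["closed"] is not None]
    hours = [
        (t["closed"] - t["opened"]).total_seconds() / 3600 for t in closed
    ]
    return {
        "reviewed": len(closed),
        "unreviewed_ids": [t["id"] for t in tickets if t["closed"] is None],
        "avg_hours_to_review": sum(hours) / len(hours) if hours else None,
    }


tickets = [
    {"id": 101, "opened": datetime(2010, 7, 1, 9, 0),
     "closed": datetime(2010, 7, 1, 11, 0)},
    {"id": 102, "opened": datetime(2010, 7, 1, 10, 0),
     "closed": datetime(2010, 7, 1, 14, 0)},
    {"id": 103, "opened": datetime(2010, 7, 2, 8, 0), "closed": None},
]
metrics = review_metrics(tickets)
```

The list of unreviewed ticket IDs is the part that catches anything you missed, while the counts and review times are the figures to hand an auditor or use when making the case for headcount.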
Finally, if you place a network IPS inline or use a tap to monitor traffic, be sure it won't bring down the network. I recently witnessed a network outage because the power to the network tap was accidentally removed. The tap should have failed open, but it didn't. Check to ensure you've set your devices to fail open (if this is your policy), and then test it. Simulate a power outage for that device to ensure traffic continues to flow--all you have to do is pull the power cable.