One of the things I look for when doing network assessments is the availability of syslog data. I’m always amazed at how few sites seem to be using this valuable source of information! Over the years, Terry Slattery and I have written a lot about syslog and syslog-NG, both in private reports and in our blog posts. It feels like it’s again time to blog about it and encourage readers not to be “syslog slackers!” I’ll be brief, and then point you to some references.
I hope you’re not turning up your nose at standard Linux syslog (old, basic, whatever). It’s simple; it works. Yes, finding the golden needles in the haystack takes tools. Doable!
Our basic recommendations:
- Feed all syslog to one or more central syslog receivers, preferably into one big log file. Distributed logs sitting on routers are hard to fetch and painful to search across. The same goes for Kiwi or other syslog storage in separate files and folders, and for putting syslog into a database that requires SQL for queries. Just throw it all into a flat file if you lack other syslog/log analysis tools.
- Keep one big log file, timestamped with the local system time each message was received. The point of the One Big File is easy time correlation (what was happening around when the outage occurred). Roll it over periodically with a script such as logrot: rotate it weekly or daily, keeping at least, say, 60 days’ worth.
- Use a tool to convert Windows events to syslog and feed them in, too.
- The more you pool log messages, the more informed you will be.
- Use syslog-NG, free on Linux systems. Filter noise (messages that are not actionable) — not all; life’s too short to set up that degree of filtering. Filter out the high-volume ones. That’s what syslog-NG is good for. Also for notifications and for multiplexing the downstream feed to multiple syslog consumers (Splunk, your security SIEM, whatever), if needed.
- Multiplexing also solves another problem that’s been occurring lately: when syslog feeds only the SIEM, network people either lack access or have to use SQL to query it, both significant barriers.
- Use syslog-NG to filter audit trails and similar info into separate files, unless there’s a good reason not to. That data may be worth keeping, but it is unlikely to be directly actionable. So save it where you (or Splunk) won’t waste time (money) looking at it! And cut your Splunk bill!
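For the rotation step, here is a minimal sketch using the stock Linux logrotate tool (an alternative to the logrot script mentioned above). The file path, retention period, and service name are assumptions; adjust for your system:

```
# /etc/logrotate.d/netsyslog -- hypothetical example; adjust path and retention
/var/log/net/all.log {
    daily
    rotate 60
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # signal syslog-ng to reopen the log file after rotation
        /usr/bin/systemctl kill -s HUP syslog-ng.service
    endscript
}
```

With `daily` and `rotate 60`, you keep roughly the 60 days’ worth suggested above.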
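The syslog-NG recommendations above (filter high-volume noise, split off audit-trail messages, multiplex to downstream consumers) can be sketched as a config fragment. Everything here is an illustrative assumption, not a drop-in config: the message patterns, file paths, ports, and the SIEM hostname all need to match your environment:

```
# Hypothetical syslog-ng.conf fragment -- patterns, paths, and hosts are assumptions
source s_net { udp(ip(0.0.0.0) port(514)); };

# Drop one example of high-volume, non-actionable noise (tune for your network)
filter f_noise { not match("%LINEPROTO-5-UPDOWN" value("MESSAGE")); };

# Route config/audit-trail messages to their own file
filter f_audit { match("%SYS-5-CONFIG_I" value("MESSAGE")); };

destination d_all   { file("/var/log/net/all.log"); };
destination d_audit { file("/var/log/net/audit.log"); };
destination d_siem  { network("siem.example.com" port(514) transport("udp")); };

# Audit messages go to their own file and stop there (flags(final))
log { source(s_net); filter(f_audit); destination(d_audit); flags(final); };
# Everything else, minus noise, goes to the One Big File and to the SIEM
log { source(s_net); filter(f_noise); destination(d_all); destination(d_siem); };
```

The second log path is the multiplexing: one incoming feed, fanned out to the flat file and to Splunk/SIEM or whatever else needs it.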
Terry Slattery put me onto the summarize-nmslog script by Darin Davis. I’ve been hacking on a version of it for years, though it’s not really in a state where I want to expose it in public. Darin’s posted version seems to have fallen off Google’s radar by now. So: if you email me, I’ll be glad to send you the script “as is” (some Perl/regex skills needed).
The catch: every time I visit a new site, I seem to spend one to two hours hacking the regular expressions to match the local date/time format. Even so, the script produces pretty useful results: frequency counts by Cisco message type overall, then by type per router. At one large site I visited, the Splunk expert was able to produce something similar in about a day.
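The core idea of such a summarizer (not Darin’s actual script) is a few lines of Python: match each line’s Cisco message mnemonic and source router, and tally. The line format, regex, and sample messages below are illustrative assumptions; expect to tweak the regex for your local timestamp format, as noted above:

```python
import re
from collections import Counter

# Assumed line format: "<Mon> <day> <hh:mm:ss> <router> <seq>: %FAC-SEV-MNEMONIC: text"
MSG_RE = re.compile(r"^\S+ \d+ [\d:]+ (?P<router>\S+) .*?(?P<type>%[A-Z0-9_]+-\d-[A-Z0-9_]+):")

def summarize(lines):
    """Return (counts by message type, counts by (router, message type))."""
    by_type = Counter()
    by_router_type = Counter()
    for line in lines:
        m = MSG_RE.search(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        by_type[m.group("type")] += 1
        by_router_type[(m.group("router"), m.group("type"))] += 1
    return by_type, by_router_type

# Illustrative sample log lines, not real captures
sample = [
    "Mar 3 10:01:02 rtr1 1234: %OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/1 from FULL to DOWN",
    "Mar 3 10:01:05 rtr2 99: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on Gi0/2",
    "Mar 3 10:02:11 rtr1 1235: %OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/1 from LOADING to FULL",
]
by_type, by_router_type = summarize(sample)
for msgtype, count in by_type.most_common():
    print(msgtype, count)
```

Run against the One Big File, the overall counts point you at the noisy message types, and the per-router counts point you at the noisy boxes.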
A great tweak to this, thanks to someone at a customer site (Nikolay!): take the bottom part of the output (the per-message, per-router counts), add column headers, import it into Excel, and set it up as a pivot table. Makes it all much easier to read.
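That Excel tweak amounts to emitting the per-router, per-type counts as a three-column table with headers, which Excel can pivot directly (rows = router, columns = message type, values = count). A small hedged sketch, with made-up counts standing in for real script output:

```python
import csv
import io
from collections import Counter

# Hypothetical per-(router, message type) counts; in practice these come
# from the summarizer's bottom section, not hard-coded values
counts = Counter({
    ("rtr1", "%OSPF-5-ADJCHG"): 42,
    ("rtr2", "%OSPF-5-ADJCHG"): 7,
    ("rtr2", "%CDP-4-DUPLEX_MISMATCH"): 3,
})

# Write the column headers Excel's pivot-table wizard needs
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["router", "msgtype", "count"])
for (router, msgtype), n in sorted(counts.items()):
    writer.writerow([router, msgtype, n])
csv_text = buf.getvalue()
print(csv_text)
```

Save the output as a .csv, open it in Excel, and insert a pivot table over the three columns.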
Why you should care: At one site, I saw a huge number of OSPF adjacency changes over one week. Time to go look at the router(s) involved. Duplex mismatches, ditto. CDP-reported VLAN mismatches on trunks, ditto. STP instability, ditto. All stuff that might show up as performance problems or connectivity dropouts, and much of it never shows up with SNMP polling. With syslog, you have a chance to get proactive!
Highly recommended; there are nuances the above does not address. (I did say “brief.”)
This article originally appeared on the NetCraftsmen blog.