Logs are the indispensable offal of IT. They are the link between Dev and Ops, the unsung hero of virtuoso troubleshooting, and often the ultimate arbiter of truth when assigning complex root cause. Logs are also a pain to manage well. They can be too terse or inscrutably verbose, evaporate with the termination of ephemeral resources like containers, or sit inaccessible in highly distributed architectures like the Internet of Things.
While mature production environments eventually solve these issues, fiddling with logging distracts developers from their primary goal: getting new apps off the ground quickly. Logging as a Service (LaaS) makes developers more efficient and operations more reliable by addressing three main requirements: access, aggregation, and alerting.
console.log('Hello World')
Before you can put your unikernel/microservice/Docker Swarm/Kubernetes deconstructed, cloud-native services wunderkind into continuously delivered production bliss, you have to build the thing in the first place. Or if you run on-premises, before you bless third-party applications as production ready, you'll need a not-insignificant amount of familiarization, config, and testing. In either case, there are a lot of moving parts to master during the R&D/testing phase, and logs are the go-to tool we rely on to understand what's really happening.
How many times have you hit the keyboard in a potentially breakthrough moment of inspiration, then lost your spark re-implementing yet another foundational component just to get started? Sure, you can pick the log exporter of your choice, configure it to watch the files you care about, hook it into systemd, set up a collector instance, configure the network for transfer, and test. But that seems like a lot of work just to test a microservice. Or maybe you're trying to collect MySQL request errors to help the Dev team debug an app, or need Windows Security events to live somewhere other than the local box. A swarm of individual console windows is an option, but not a good one. Well, it's not as bad as turning everything into syslog, but it's close.
But what if logging was a service? What if developers, sysadmins and the network team could focus on transformative work, not prerequisites? LaaS allows teams to standardize the collection, persistence, and analysis of events, and accelerate delivery without sacrificing quality. It also solves application troubleshooting issues for ephemeral resources like containers, or host-shifting resources like VMs.
Logs on a server are great, but logs inside of something on a server are tricky. The container boundary intentionally limits access, and auto-scaled resources terminated en masse feel no guilt about taking their logging with them. You could run SSH in your container and pull logs from the outside, but you would be a bad person. For packaged apps, vendors often recommend specific tools, which they'll conveniently add to invoices. That means either more integration work or inefficient cross-team tool expertise.
By configuring logging export services inside container images, or using VM OS agent automation, administrators can remove the need to dive into running instances entirely, along with a lot of privileged-access and auditing headaches. Autonomous processes, machines doing work on our behalf, should publish events, warts and all, somewhere it's easy to preserve, observe, search, and act on them.
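The pattern underneath this is simple: the process inside the container emits structured events to stdout, and a log driver or agent outside the boundary ships them to the service. A minimal sketch in Node, with an invented event shape (no particular vendor's schema is implied):

```javascript
// Minimal structured-logging sketch: emit one JSON event per line to stdout.
// A log driver or agent outside the container ships these lines onward, so
// nothing ever needs to exec or SSH into the running instance.
function formatEvent(level, message, fields = {}) {
  return JSON.stringify({
    ts: new Date().toISOString(), // when it happened
    level,                        // "debug" | "info" | "warn" | "error"
    message,
    ...fields,                    // service name, request id, etc.
  });
}

function log(level, message, fields) {
  process.stdout.write(formatEvent(level, message, fields) + "\n");
}

log("info", "worker started", { service: "billing", pid: process.pid });
log("error", "payment gateway timeout", { service: "billing", attempt: 3 });
```

Because the events are self-describing JSON, the same lines work whether the shipper is a Docker log driver, a Kubernetes agent, or a plain file tailer.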
Logs are also critical for IoT, or really any tiny thing on a network, yet they are particularly difficult to access there. Tiny things with tiny CPUs, OSes, and apps dump details to tiny storage. What data they deliver is, and for security's sake should be, pushed to a server, not open to query from the world. Just because they're small and cheap, however, doesn't mean their operational details are disposable. Distributed microsystem log data is hugely valuable for debugging, and LaaS offers an efficient conduit for securely transporting thousands or even millions of discrete logs for analysis.
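A device-side sketch of that push model, with an invented batch shape and a hypothetical device ID; a real implementation would POST each batch over TLS to the collector rather than printing it:

```javascript
// Tiny device-side buffer: accumulate events, flush as a single batch push.
// The device initiates the connection; nothing on it listens for queries.
const MAX_BATCH = 50; // illustrative cap for a memory-constrained device

function makeBuffer() {
  const events = [];
  return {
    add(level, message) {
      events.push({ ts: Date.now(), level, message });
      return events.length >= MAX_BATCH; // caller should flush when true
    },
    drain() {
      // "sensor-0042" is a placeholder device identifier.
      const batch = { device: "sensor-0042", count: events.length, events: events.splice(0) };
      // In a real device this batch would be POSTed over TLS to the collector.
      return batch;
    },
  };
}

const buf = makeBuffer();
buf.add("info", "boot complete");
buf.add("warning", "battery at 15%");
console.log(JSON.stringify(buf.drain()));
```

Batching keeps radio and CPU use low on the device while still getting every event off-board before tiny storage rolls over.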
Aggregating all the things
If you're a larger, high-performing IT organization, you may already wield a centralized logging aggregator. But for the majority of IT, and certainly for individual developers standing up proof-of-concept services, managing aggregation technologies over time is a challenge. Budget is limited enough for application availability, data durability, and multi-standard compatibility; for logging, it's scarcer still.
On the other hand, if log collection is simplified into a single multi-OS/protocol/stack pile, aggregation is a natural and encouraged consequence of reducing complexity for admins and developers. We as an IT species are always happy to remove drudgery and are motivated to integrate and offload, resulting in a glorious, giant pile of useful data on somebody else’s servers. It’s quality thought-curated laziness. Management benefits, too: A single source of event truth is indispensable for team efficiency, and even risk reduction when compliance is an issue.
Where LaaS platforms really shine, though, is navigating an impossible duality: collecting verbosely, but alerting tersely. Inconsistent or time-consuming manual log configuration is the single most frustrating troubleshooting issue. Too often, even when logs are collected, log levels are set too high to capture the events that would illuminate bedeviling, infrequent, or novel errors. The usual remedy is to crank up the detail and hope to catch the problem again while local filesystems fill with unnecessary trace and debug events. Step two: dust off your regex spell book and grep on your lunch break.
LaaS platforms, on the other hand, encourage the inverse: send all the details and let an alerting engine do the work. First, they offer admins optimized search tools, far more efficient than grepping dozens of files. In-memory analysis, tagging, normalization, and saved searches remove repetitive work and also allow reuse across teams. It’s pretty handy as an app developer to reuse a nuanced, platform-specific magic pattern created by an Ops guru.
Better yet, LaaS query analysis engines run continuously, allowing engineers to define real-time alerts spanning all the systems in the pool. Users can peruse the event data that interests them and decide for themselves what's meaningful and how they want to consume it. Ops admins may opt to send specific failure messages to their mobile devices, while developers may dump a larger set of filtered log lines from test or staging to a file for later analysis. The key benefit is converting what was once disparate, isolated, and often lost data into an available fount of comprehensive, discrete events. It delivers proactive detail on demand, without alert-spamming everyone on the team.
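Conceptually, the alerting engine is just a set of rules continuously evaluated against the event stream: everything is retained for search, but only matches notify anyone. A toy sketch, with invented rule and event shapes (not any vendor's API):

```javascript
// Toy alert engine: every event flows in and is kept; only events matching
// a rule trigger a notification. Rules and channels here are illustrative.
const rules = [
  { name: "prod-5xx",   match: (e) => e.level === "error" && /5\d\d/.test(e.message), notify: "pager" },
  { name: "auth-fails", match: (e) => /authentication failure/i.test(e.message),      notify: "email" },
];

function evaluate(event) {
  return rules
    .filter((r) => r.match(event))
    .map((r) => ({ rule: r.name, notify: r.notify, event }));
}

// Three events arrive; only two produce alerts, and to different channels.
const alerts = [
  { level: "debug", message: "cache miss for key user:42" },
  { level: "error", message: "upstream returned 503" },
  { level: "warn",  message: "Authentication failure for admin" },
].flatMap(evaluate);

console.log(alerts.map((a) => `${a.notify}: ${a.rule}`).join("\n"));
// → pager: prod-5xx
//   email: auth-fails
```

The debug noise is still searchable later; it just never pages anyone.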
If you haven't experimented with Logging as a Service, I urge you to, even as a side project. Once you have a taste for it, you'll find that centralized logging lets you focus on what drew you to technology in the first place: building. Personally, I started with the Papertrail free tier for Docker, Raspberry Pi, and AWS side projects. In general, you'll find that creating a one-time SNS topic exporter, configuring an extra line in Winston for Node, and getting your systemd journalctl events out where they can do some good is easier than you think. It also has tangible benefits: better operations quality, shorter time-to-deploy, and maybe even longer weekends.