What Was Microsoft Thinking?
March 20th, 2007 Posted in Rants, Windows Server
I’m generally not one to bash on Microsoft. Support of their OSes pays my bills, so I have a hard time hating it. Let’s face it, Unix & Linux are great, stable, flexible, will do everything you ask of them, and are the only way to go when it comes to server operating systems. Unfortunately until someone can create a GUI that’s usable by users of all ability levels and that can obscure the underlying filesystem, it just isn’t going to catch on as a ubiquitous desktop operating system. The Mac fanboys are likely going to tear me a new one in the comments here proclaiming that Apple has done just that, but let’s face it, there are far fewer machines running OS X than there are Windows. The Mac has it’s place that’s for sure, and it’s gaining ground, but for now Windows is still the reigning champ. I never really had any major complaints about Windows (other than the occasional BSOD, driver issues, and less than perfect security) but that came to an abrupt end last week. Whoever designed the event system for Windows Server needs to die in a fiery car crash.
It came to our attention last week that one of our domain admins had their password compromised by a student. As the AD administrator, it was my responsibility to gather the necessary information for the investigation and prosecution of the offending punk. Other than taking forever because the security log is 500 MB so we can can keep more than a day’s worth of events, limiting the events to a particular username was easy. Here’s my first complaint: why in god’s name wouldn’t Windows Server implement log rotation like every other sane operating system and server application? It seems the only way to do log rotation is with a custom script or 3rd party application.
Things got rough when I was asked to determine who logged into the computer in question as well as the computers around it from a few hours before the incident to an hour after. The good news is that there is a computer field in the event viewer so in theory I could just search on the computer names. The bad news is that the only computer name that shows up in that column is the name of the domain controller … what’s the point? In the message content of the logon event (Event ID 540) is the IP address of the client making the logon request and the username. Hallelujah! All the info I need is there. But here’s the rub; in all of their infinite wisdom, Microsoft didn’t put a mechanism in for searching this information … thanks a lot guys. So now how am I supposed to filter a log that’s just a tad over 500MB to find logons for 5 machines during a 5 hour period?
Enter log parser 2.2. This is a tool created by Microsoft specifically for parsing event logs (intuitive name eh?) and providing them in snazzy formats like text, comma separated values, and even HTML. Log parser uses SQL queries to extract the information you’re looking for from the logs. After stumbling into this, I thought I had hit the motherload, that is until I tried to run it. It’s a command line only tool and has nearly zero documentation. The sql queries aren’t self explanatory even when you figure out how to formulate one. A prime example of this is the “strings” field which tokenizes the event message based on a delimiter. Sounds good right? Well not all log messages are in the same format, so the field that gives you an ip address for a logon event, doesn’t give you anything for a logoff event. Without being an sql guru, it would be extremely difficult to get a list of all log on and log off events together.
I finally formulated two queries to extract the data I needed, one for logons and one for logoffs, and started tracing back the events by time. That’s about the time I nearly flipped my lid. There were logon and logoff events for the same user from the same computer at exactly the same second! How does that happen?!? After doing some searching on Event 538, I finally found a document that explained that it was a logoff event, but that it should never be trusted, and that it’s nearly impossible to tell when a logoff actually occurred. Why is this so hard? Shouldn’t it be easy to contact the domain controller when the user clicks “log off” and register and event with the username, computer name, and ip address? Barring the occasional situation where a user pulls the plug, or there’s a power or network outage, this should be pretty reliable. It’s not even possible with event viewer to filter the log and then print or export the full content of the error message with the details, you can only get the time, type of error, event id, and name of the computer that it occurred on. It honestly seems like logging was a complete afterthought to the OS design.
I’m currently in the process of exploring Microsoft Operations Manager 2005 (MOM 2005), hoping that it will help remedy this problem. Just the promise of a central logging facility and alerts is a major step forward. With any luck, it will also make analyzing those logs much easier than it is currently with the event viewer or log parser. If you’ve got any tips, tricks, or suggestions on 3rd party software to make life easier in this respect, please leave a comment; I’d love to hear them.
tags: mom, microsoft operations manager, event id 538, event id 540, event viewer, log parser, log parser 2.2, logging, rants, windows, windows server
One Response to “What Was Microsoft Thinking?”
By luke on Jan 10, 2008
I use ntsyslog (http://ntsyslog.sourceforge.net/) on all of my windows servers and log to a linux server. This makes life very easy because not only is syslog a standard format so you can find many parsers, but I like having the logs in two places, just in case the windows box gets rooted and someone deletes the logs or whatnot.