Languistics' B-Monitor Keeps Spam and Malicious Code from Infiltrating E-Mail

Using natural language processing methods for e-mail filtering, B-Monitor bypasses problems of the keyword approach. Here's an exclusive peek.

July 8, 2002

Beyond Keywords

Most e-mail-filtering systems build policies from keyword-based rules. These rules group words or phrases into standard or forbidden classifications -- such as gambling, offensive language, pornography, racism and sexual harassment -- and then apply actions, such as deletion or quarantine, to messages containing those keywords.

That approach, however, has its limits in the English language, which is rife with ambiguous words. Blow, bust, joint and puff may be innocuous, or they may embarrass, humiliate or inflame. Using such keywords can generate false positives; avoiding them, however, can lead to false negatives -- that is, the forbidden content gets through the e-mail system.
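The false-positive problem can be illustrated with a minimal sketch of a context-free keyword filter. The word list and the function name `keyword_flags` are hypothetical and are not taken from any product; the list simply reuses the ambiguous words named above.

```python
import re

# Hypothetical forbidden-word list, built from the ambiguous words named above.
FORBIDDEN = {"blow", "bust", "joint", "puff"}

def keyword_flags(message):
    """Return the forbidden keywords found in the message, ignoring context."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return words & FORBIDDEN

# A harmless business message is still flagged, because the filter
# sees only the word "joint" and not the phrase "joint venture".
innocuous = "The joint venture team will meet after the budget review."
print(keyword_flags(innocuous))  # {'joint'}
```

Dropping "joint" from the list would stop this false positive, but it would also let genuinely objectionable uses of the word through, which is the false-negative side of the same trade-off.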

B-Monitor applies NLP (natural language processing) techniques to e-mail filtering. Using Languistics' XML-based !metaMarker, B-Monitor automatically extracts and organizes text information to find contextual meaning. The software uses descriptive tags to classify words as parts of speech and analyze messages' explicit and implicit language content.

B-Monitor tags standard items, such as author, subject, date and time; standard violations, such as offensive language, sexual harassment and gambling; and configurable features, such as product names and transaction types. The tags also can classify the sender's intention, goal or mood.

In the labs I installed B-Monitor in an enterprise messaging environment running Sendmail under Solaris on a Sun Fire 280R server and a Sun Ultra 10 workstation. I changed the appliance's IP address and manually configured the bmonitor.xml and adminclient.xml files with the primary mail server's IP address. I set the device to monitor and filter mail for the test domain (w2k.nwc.com) and to handle all incoming and outgoing mail for the primary mail server by configuring sendmail.cf and DNS MX records. To generate mail traffic, I used an SMTP mailer (Blat version 1.9.4) for Windows 2000 Professional on 10 Dell Celeron 500-MHz computers and a mail relay.
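For readers who want to generate comparable test traffic, here is a rough sketch that uses Python's standard `smtplib` and `email` modules in place of Blat. It is not the tool used in the test. The addresses, the directory layout (one `.txt` file per message) and the function names are assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

def build_message(path, sender, recipient):
    """Wrap one text file in an e-mail message; the file name becomes the subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = Path(path).stem
    msg.set_content(Path(path).read_text(encoding="utf-8", errors="replace"))
    return msg

def send_directory(directory, sender, recipient, host, port=25):
    """Send every .txt file in a directory through the given SMTP host.

    The host would be the filtering device or relay under test (an assumption).
    Returns the number of messages handed to the server.
    """
    sent = 0
    with smtplib.SMTP(host, port) as smtp:
        for path in sorted(Path(directory).glob("*.txt")):
            smtp.send_message(build_message(path, sender, recipient))
            sent += 1
    return sent
```

Running `send_directory` from several machines at once would approximate the multi-client load described above.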

Languistics supplied a test collection of 10,000 text files designed to violate an enterprise's acceptable use policy. These files, classified into categories based on content, contained offensive, discriminatory, drug-related and racist language.

After doing a random check to verify the text files' abusiveness, I set up B-Monitor's PolicyBuilder and PolicyMonitor. These Java applications create, view, manage and report on e-mail policies and rules. Using Samba, I exported the applications' directory to a Windows 2000 Server using J2RE 1.3.1_02.

Policies use conditions, actions and exceptions to define specific violations. Conditions are states of a message that must exist to trigger a rule. Actions occur when those conditions are met. You can configure exceptions to any rule. For example, you may want to exclude certain rules from applying to some users, like your CEO or CIO. When it ships, B-Monitor will include a Java API to let customers create custom actions, such as sending SMS-based alerts. Unfortunately, B-Monitor does not support Active Directory, LDAP or other directory schemas.

5,000 Nasty Messages

I used B-Monitor's sample policy and rules to quarantine messages. Using the SMTP mailer on each of the Dell Celerons, I sent more than 5,000 of the preclassified text files to users on the primary mail server through B-Monitor.
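To make the condition, action and exception model concrete, here is a minimal sketch of a quarantine rule. It does not reproduce B-Monitor's policy format or its Java API. The `Message` and `Rule` classes, the tag strings and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    sender: str
    recipient: str
    tags: set  # content tags from some upstream classifier (hypothetical)

@dataclass
class Rule:
    name: str
    condition: Callable[[Message], bool]  # state that must hold to trigger the rule
    action: str                           # e.g. "quarantine" or "delete"
    exceptions: set = field(default_factory=set)  # senders exempt from the rule

    def evaluate(self, msg):
        """Return the rule's action if it triggers, otherwise "deliver"."""
        if msg.sender in self.exceptions:
            return "deliver"
        return self.action if self.condition(msg) else "deliver"

def apply_policy(rules, msg):
    """Apply rules in order; the first rule that triggers decides the outcome."""
    for rule in rules:
        outcome = rule.evaluate(msg)
        if outcome != "deliver":
            return outcome
    return "deliver"

offensive = Rule(
    name="offensive-language",
    condition=lambda m: "offensive language" in m.tags,
    action="quarantine",
    exceptions={"ceo@example.com"},
)
flagged = Message("user@example.com", "peer@example.com", {"offensive language"})
print(apply_policy([offensive], flagged))  # quarantine
```

First-match-wins ordering is a design choice made for this sketch; a real product could just as well collect every triggered action.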

After delivery, I viewed the messages that were quarantined with the PolicyMonitor and compared them to a master list of messages and their classification. B-Monitor delivered 3,011 of 5,135 messages, only 75 of which were false negatives that should have been caught. B-Monitor quarantined 2,124 messages; just 51 were false positives. For the entire test, B-Monitor showed impressive recall (.97), precision (.98) and accuracy (.97) in identifying content that fell outside acceptable use.
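The reported scores can be rechecked from the counts above. This sketch assumes that "quarantined" is the positive class, so a false positive is a clean message that was quarantined and a false negative is a violating message that was delivered. The variable names are illustrative.

```python
# Counts taken from the test results above.
delivered = 3011
quarantined = 2124
false_negatives = 75   # violating messages that were delivered
false_positives = 51   # clean messages that were quarantined

total = delivered + quarantined                    # 5,135 messages
true_positives = quarantined - false_positives     # correctly quarantined
true_negatives = delivered - false_negatives       # correctly delivered

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
accuracy = (true_positives + true_negatives) / total

print(f"recall={recall:.3f} precision={precision:.3f} accuracy={accuracy:.3f}")
```

Under these assumptions, recall comes to about 0.965 and precision to about 0.976, in line with the .97 and .98 reported. Accuracy comes to about 0.975, which sits between .97 and .98 depending on how it is rounded.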

In another test, I used a policy with 13 rules to see how it would affect mail processing. I sent 1,000, 5,000 and more than 95,000 text files as messages. B-Monitor analyzed and passed two to three messages to the primary mail server per second. Although the mail servers did not do reverse-DNS lookups, as they would in the real world, B-Monitor took action on more than 40,000 of the 95,000-plus messages, and delivered them all in approximately 24 hours. All the while, it maintained between 70 percent and 80 percent utilization.

B-Monitor's PolicyMonitor logs message violations and provides administrators a view of all violations, quarantined messages and administrative alerts.

Sean Doherty is a technology editor and lawyer based at our Syracuse University Real-World Labs®. A former project manager and IT engineer at Syracuse University, he helped develop centrally supported applications and storage systems. Send your comments on this article to him at [email protected].
