stargeek


Bayesian spam filtering for the masses 10/30/2003 11:59 PM

Spam, or unsolicited commercial e-mail, is now a sad part of everyday life online. Research companies estimate that more than 50% of worldwide e-mail traffic is spam. As a result, it is becoming steadily more difficult and time-consuming to sort legitimate e-mail from the deluge of commercial messages. But there are ways to fight back. In this series, we'll walk through choosing and setting up a highly effective package for screening out spam.

Similar Items

Grok Headline matches for Bayesian spam filtering for the masses

[OT] Safe spam filtering methods (was: Is predictable spam filtering a vulnerability?) 06/22/2004 11:56 PM
The Fungi (Jun 20 2004)

Bayesian Pattern Filtering Library 0.1.0alpha 03/13/2003 06:02 PM
A C++ library for building Bayesian Filters.

Microsoft calls for outbound spam filtering against spam 06/04/2004 10:42 AM
Computer Weekly Jun 4 2004 2:14PM GMT

E-texts used against Bayesian spam-filters 12/02/2003 07:37 AM
Bayesian anti-spam filters count word frequencies in a suspect message and compare the results to profiles of word frequency in spam and ham. Defeating this requires that your spam include a lot of natural human prose. So spammers have started to mine Project Gutenberg and other sources of human-generated ASCII, dumping random hunks of literature into their messages to get around the filters.
Blogger and journalist Clive Thompson found an excerpt from Chapter 20 of The Master Key by Wizard of Oz author L Frank Baum in a message that had as its subject line "the big unit" (no prizes for guessing what the rest of it was hawking).
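
The word-frequency comparison described in this excerpt can be sketched in a few lines of Python. This is only an illustrative toy, not the code of any particular filter: the training messages, the function names (`tokenize`, `train`, `spam_log_odds`) and the add-one smoothing are all assumptions made for the example. It also shows why padding a pitch with ordinary prose pulls the score toward ham.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(messages):
    """Count how often each word appears across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

def spam_log_odds(text, spam_counts, ham_counts):
    """Sum per-word log-likelihood ratios; a positive total leans spam.

    Add-one smoothing (an assumption of this sketch) keeps words that
    were never seen in one corpus from producing log(0).
    """
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values()) + vocab
    ham_total = sum(ham_counts.values()) + vocab
    score = 0.0
    for word in tokenize(text):
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical toy corpora standing in for a user's trained profiles.
spam_counts = train(["buy cheap pills now", "cheap pills cheap offer"])
ham_counts = train(["the meeting is at noon", "please review the chapter draft"])

pitch = "cheap pills"
padded = pitch + " the chapter is at noon please review the draft"

pitch_score = spam_log_odds(pitch, spam_counts, ham_counts)
padded_score = spam_log_odds(padded, spam_counts, ham_counts)
print(pitch_score, padded_score)
```

With these toy corpora the bare pitch scores on the spam side, while the same pitch padded with ham-like words scores on the ham side, which is the effect the pasted literature is meant to achieve.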

bogofilter -- Fast Bayesian Spam Filter 03/16/2003 01:31 PM
bogofilter-0.11.1.3 - new stable release

SpamProbe - fast bayesian spam filter 03/27/2005 10:08 AM
spamprobe-1.1x6 released

Bayesian spam rumination: when word-frequency-histograms attack! 06/29/2004 10:40 AM
Ed Felten has posted an intriguing rumination on the possible failure modes of Bayesian spam-filtering -- filtering that uses word-frequency statistics to classify email as spam or ham. As Ed points out, Bayesian filters are trained by the spammers, who, by choosing the vocabulary of their messages carefully, can make messages containing certain words or phrases undeliverable on the Internet.
Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users' Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user's filter.
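
The poisoning effect can be illustrated with a rough Python sketch. It assumes a simple per-word estimate of how spammy a word looks (the share of spam messages containing it, relative to the share of ham messages containing it); real filters use more elaborate scoring, and the excerpt's "negative score" corresponds here to a probability above 0.5. The messages and the helper name `word_spam_probability` are hypothetical.

```python
def word_spam_probability(word, spam_msgs, ham_msgs):
    """Estimate P(spam | word) from how often the word appears in each corpus.

    Returns 0.5 (no evidence either way) if the word was never seen.
    """
    def rate(msgs):
        # Fraction of messages in the corpus that contain the word.
        if not msgs:
            return 0.0
        return sum(word in m.lower().split() for m in msgs) / len(msgs)

    spam_rate, ham_rate = rate(spam_msgs), rate(ham_msgs)
    if spam_rate + ham_rate == 0:
        return 0.5
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical training data: "invoice" starts out as a purely legitimate word.
ham = ["invoice attached for march", "lunch on friday", "see the invoice from accounting"]
spam = ["cheap pills now", "win a prize"]
before = word_spam_probability("invoice", spam, ham)

# The attacker sends spam salted with the target word; the user marks it as spam.
poisoned_spam = spam + ["cheap pills invoice now"] * 8
after = word_spam_probability("invoice", poisoned_spam, ham)
print(before, after)
```

In this toy run the target word moves from looking entirely legitimate to looking more spammy than not, without the user's real correspondence changing at all.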

Pre-Filtering The Spam 06/29/2004 03:54 PM
My anti-spam system now uses a variety of server and client side filters to help keep the damn stuff out of the inbox. Now, some are suggesting an even earlier level "pre-filter" for spam. HP Labs has made a fairly simple discovery that even without various email authentication systems, it's pretty easy to get a quick determination as to whether or not an email is spam. The system they developed looks at whether or not the server sending you the email normally sends good emails or sends spam. With that one determination, they can properly pre-classify emails at a pretty high success rate. It's not a replacement for a spam filter at all. They know it's not that good. However, what it can do is do a pre-sort for prioritization purposes -- so that good emails tend to make it through the real spam filters faster. In a number of ways, it's pretty sad that we now need "quality of service" setups for our email.
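
As a rough illustration of the pre-sort idea, the Python sketch below keeps a per-server history of good and spam messages and orders the incoming queue so that mail from servers with a good record reaches the real filters first. The details of the HP Labs system are not given in the excerpt, so everything here (the `ServerReputation` class, the smoothing, the example server names) is an assumption for illustration only.

```python
from collections import defaultdict

class ServerReputation:
    """Tracks, per sending server, how much of its past mail was good or spam."""

    def __init__(self):
        # server name -> [good_count, spam_count]
        self.history = defaultdict(lambda: [0, 0])

    def record(self, server, is_spam):
        """Update the server's history after a message has been classified."""
        self.history[server][1 if is_spam else 0] += 1

    def good_fraction(self, server):
        """Smoothed share of good mail; unknown servers get a neutral 0.5."""
        good, spam = self.history.get(server, (0, 0))
        return (good + 1) / (good + spam + 2)

    def prioritize(self, queue):
        """Order (server, message) pairs so likely-good mail is filtered first."""
        return sorted(queue, key=lambda item: self.good_fraction(item[0]), reverse=True)

reputation = ServerReputation()
for _ in range(9):
    reputation.record("mail.example.org", is_spam=False)
    reputation.record("bulk.example.net", is_spam=True)
reputation.record("mail.example.org", is_spam=True)
reputation.record("bulk.example.net", is_spam=False)

queue = [
    ("bulk.example.net", "m1"),
    ("unknown.example.com", "m2"),
    ("mail.example.org", "m3"),
]
ordered = [message for _, message in reputation.prioritize(queue)]
print(ordered)
```

Nothing is rejected at this stage; the sketch only reorders the queue, in keeping with the point that this is a prioritization step rather than a replacement for a spam filter.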

ISPs Hesitate Over Spam Filtering 06/03/2004 02:24 AM
As the spam battle rages on, most of the focus is on end-users and law enforcement. Not too many people seem to focus on the role of ISPs, who sometimes do take a more proactive role in stopping spam. The problem, though, is that when ISPs filter spam, they often run into issues with false positives. If the filters are too loose (to avoid false positives) then too much spam gets through, and users are upset. If the filters are too tight, important messages go missing, and users are upset. Many ISPs are realizing that, at the very least, they need to let the end-user have access to the spam folder, so they can occasionally sort through it for false positives - but very few users ever bother to look through it. Some ISPs don't offer any kind of filtering at all, claiming that they don't see how to make money off of it - which seems especially short-sighted. If they can offer sufficient spam filtering, they're much more likely to keep customers than if they simply let everything through when customers are looking to their providers to provide protection from the onslaught of spam. No matter what, it's becoming clear that the spam fight needs to be approached from various angles, and many customers are likely to bail out on ISPs that don't at least offer a spam filtering option.

New Method of Spam Filtering 02/19/2004 02:06 PM

Spam filtering, the next chapter 05/24/2004 09:17 AM
I've been experiencing good results filtering out spam with a combination of Popfile (a Bayesian classifier) and the built-in filter in Eudora 6.  I get well over 1,000 spams a day, so I need accuracy both in identifying spams and avoiding false positives with legitimate mail. 

The biggest problem with my setup is that it all runs on the client side.  Popfile works as a transparent proxy that runs on my Windows machine.  I don't see the spam in my inbox, but I still have to download it before it can be filtered.  As the spam volumes have increased, that has become an increasingly significant burden.  Every check pulls down scores of messages, most of which wind up in the trash.  I've had several cases where the sheer numbers crash Eudora. Getting email through my Treo is basically a waste of time, because it doesn't have the filters.  If I'm on the road and don't check my mail, there are thousands of messages waiting for me when I get back. 

I finally had time this weekend to set up filtering on the server side.  Werbach.com and my other domains run through a Web hosting provider, Pair Networks, which offers a version of SpamAssassin.  The tricky part was configuring it to automatically filter or delete messages, using procmail, rather than just putting something in the email header for later processing on the client. 

I think I have it working now.  I'm using SpamAssassin on a forgiving setting, because the client-side filters are still running after the mail goes through.  If I can just weed out 60% of my spams before they reach my machine, life would be much better.  So far, it looks like I can do significantly better than that. 

I'm still tweaking the set-up, so it's possible some legitimate email will get stuck in the filters.  If you write to me and don't get a response for a while, please try again. 

Using AI for Spam Filtering (w/ Source Code) 07/11/2004 09:20 AM

Spam filtering with a human touch 09/21/2004 01:11 PM
One company is offering a novel solution to the problem of spam. Will spam filtering done by humans be a hit?

Re: Is predictable spam filtering a vulnerability? 06/18/2004 01:01 PM
Joel Eriksson (Jun 17 2004)

Verizon sued over spam filtering 02/01/2005 08:53 PM
Upset Verizon customers have filed a class-action lawsuit over the telco's aggressive spam filtering. Verizon's blacklists are allegedly blocking all mail from some countries.

A Unique Approach to Spam Filtering 07/06/2004 03:03 AM
Frontgate MX brings a New Level of Simplicity to Personal E-mail Protection With a Unique "Single Step" Approach to Spam Filtering. [PRWEB Jul 6, 2004]

Human-Powered Spam Filtering 09/20/2004 10:28 AM

Is predictable spam filtering a vulnerability? 06/17/2004 03:44 AM
R Armiento (Jun 16 2004)

I thought that our spam filtering had suddenly got... 12/29/2003 10:31 PM
I thought that our spam filtering had suddenly gotten way better, but turns out, my pyra.com mail just started bouncing instead of forwarding. Bummer. If you tried to reach me at pyra.com try doing so at google.com. Or just wait (upon DNS updating, it should be fixed).

Extreme Spam Filtering – When Filters and Blacklists Are Not Enough. 02/07/2005 01:05 AM
Protect Multiple POP, Yahoo, Hotmail, Gmail, or IMAP E-mail Accounts from Spammers with 0Spam.com. Compatible with all E-mail clients and operating systems. [PRWEB Feb 6, 2005]

Mailsmith gets server-side spam filtering, more 07/21/2004 11:18 AM
Bare Bones Software today announced the release of Mailsmith 2.1.2, the latest version of its powerful e-mail client...

Microsoft pushes spam-filtering technology 06/24/2005 03:25 PM
ZDNet Jun 23 2005 2:00AM GMT

Microsoft calls for outbound filtering against spam 06/04/2004 07:29 AM
SAN JOSE, California -- In its continuing fight against unsolicited commercial e-mail, Microsoft Corp. plans to filter outgoing messages on its consumer mail services and is busy developing new "proofing" technologies, the software maker's chief spam fighter said Thursday.

Notice to customers using e-mail filtering "SPAM" software 11/15/2003 11:03 AM
...

PowerMail's user interface, spam filtering updated 05/24/2004 09:10 AM
CTM Development has released PowerMail 5.0, a major upgrade of the Mac OS X mail client...

Mailsmith 2.1.2 adds server-side spam filtering, more 07/21/2004 11:12 AM
Bare Bones Software Inc. on Wednesday released an update to Mailsmith, their e-mail client for Mac OS X. New features in this release include support for server-side spam filtering, the ability to process incoming messages with Unix tools during download, and new preferences and interface enhancements.

Eudora 6.0: E-Mail Favorite Gets Built-In Spam Filtering But Still Shows Its Age 12/19/2003 11:32 AM
Eudora is an undeniably powerful product. It's fast -- especially when searching thousands of archived messages -- and quite flexible once you take the time to learn its quirks. Its new spam-filtering features are first-rate, especially since they support third-party spam-filtering tools. By Jason Snell (Macworld via MyAppleMenu)

Re: Is predictable spam filtering a vulnerability? (silently dropping messages) 06/22/2004 08:18 PM
Martin Mačok (Jun 22 2004)

Re: Is predictable spam filtering a vulnerability? (silently dropping messages) 06/24/2004 04:28 PM
Stephen Warren (Jun 24 2004)

Spam, spam, spam, spam ... Canada targets unwanted email (AFP) 05/12/2004 04:17 AM
AFP - Canada unveiled a new action plan to combat unsolicited commercial e-mail, nicknamed spam, which jams inboxes and clogs Internet traffic worldwide.

Java Bayesian 04/01/2005 06:56 AM

Is there a decent open source Java Bayesian package that is not GPL or similarly restricted from commercial use?  I am aware of only Classifier4J.  Preferably, it should be optimized for server applications and high performance.


Bayesian Aggregator 12/02/2003 08:47 AM
In a comment, Kevin Jordan writes: 348North News is a normal aggregator in much of the way you think of it. However, it allows me to identify keywords or themes that it puts together into phrases — and then matches up the phrases with like articles. Like a cross between Google News and Daypop (but that makes it sound much more complex than it is). If you want to see an "interests" based summary for me, check out the Phrase Index. I use fairly general keywords so as not to miss out on the future items. I haven't tried...

Subconsciously, People may be Bayesian 01/22/2004 02:48 AM
DAVID LEONHARDT writes in a NY Times article about people playing the odds of everyday life with Bayesian Analysis. He describes new research, recently published in Nature, "which stands out because it offers a detailed window into how the Bayesian thought process works, showing the point when uncertainty becomes great enough to give past experience an edge over current observation." Bayesian Analysis, among researchers, is "the combining of new information with conventional wisdom." I do agree about their reliance on past observations, but I believe that they have underestimated the role of future orientation in the whole mix of decision making.

Bayesian Aggregation, Part I 02/14/2003 03:23 PM
On Monday I configured Scenario 3 of my Bayesian Aggregation experiment, building a "good" corpus of my weblog entries and...

Bayesian Filter Library 03/13/2003 11:34 AM
0.1.0alpha release

Working with Bayesian Categorizers 11/19/2003 08:11 PM
Bayesian classification has proved a powerful weapon against spam. Jon Udell tries to find out whether it can be put to use in other spheres of content categorization.

Bayesian Aggregation, Part II 02/23/2003 05:22 PM
iFile finally classified something as belonging on my weblog, but I have no idea why... Justin Rudd's Busy weekend complicated...

Working with Bayesian categorizers 12/02/2003 01:38 AM
There's been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven't been wildly successful, I want to report them because I think the idea may have merit. [Full story: O'Reilly Network: Working with Bayesian Categorizers]
This month's O'Reilly Network column was a struggle because categorization itself is a struggle. I remain convinced that the automated classifiers that are doing such a good job beating back the tide of spam will also turn out to be more generally useful. But finding the right synergy between an automated assistant and a human overseer is a subtle and tricky thing. ...

Devshed: Implement Bayesian Inference with PHP 01/06/2005 09:24 AM
DevShed has a new article posted today for all of those interested in a better way to filter out information/spam messages from your data - Implement Bayesian inference using PHP, Part 1.