stargeek


Bayesian spam rumination: when word-frequency-histograms attack!







06/29/2004 10:40 AM

Ed Felten has posted an intriguing rumination on the possible failure modes of Bayesian spam-filtering -- filtering that uses word-frequency statistics to classify email as spam or ham. As Ed points out, Bayesian filters are in effect trained by the spammers: by choosing the vocabulary of their messages carefully, they can make messages containing certain words or phrases undeliverable on the Internet.

Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users' Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user's filter.
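
The poisoning effect Ed describes can be sketched with a toy filter. This is a minimal illustration, not how any particular product works: the class `ToyBayesFilter`, the training messages, the target word "acme", and the clamping bounds are all made up for the example, and the scoring is a simplified per-word combination in the spirit of Bayesian filters rather than a full naive Bayes model.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Hypothetical word-frequency filter, for illustration only."""

    def __init__(self):
        self.docs = {"spam": 0, "ham": 0}
        self.words = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.docs[label] += 1
        self.words[label].update(set(text.lower().split()))

    def word_spamminess(self, word):
        # Fraction of spam vs. ham messages containing the word.
        s = self.words["spam"][word] / max(self.docs["spam"], 1)
        h = self.words["ham"][word] / max(self.docs["ham"], 1)
        if s + h == 0:
            return None  # never-seen word: contributes no evidence
        # Clamp so a single word can never be absolutely decisive (assumed bounds).
        return min(0.99, max(0.01, s / (s + h)))

    def spam_probability(self, text):
        # Combine per-word evidence as a sum of log-odds, assuming equal priors.
        log_odds = 0.0
        for word in set(text.lower().split()):
            p = self.word_spamminess(word)
            if p is not None:
                log_odds += math.log(p / (1 - p))
        return 1 / (1 + math.exp(-log_odds))


filt = ToyBayesFilter()
for text in ["lunch meeting moved to friday",
             "notes after the friday meeting",
             "see you at lunch"]:
    filt.train(text, "ham")
for text in ["cheap pills buy now",
             "buy cheap watches now",
             "win money now"]:
    filt.train(text, "spam")

target = "acme price quote"  # a legitimate message the attacker wants blocked
before = filt.spam_probability(target)

# The attacker sprinkles the target word into spam, which the user then
# dutifully classifies as spam.
for _ in range(20):
    filt.train("cheap pills acme now", "spam")

after = filt.spam_probability(target)
print(f"before poisoning: {before:.2f}, after poisoning: {after:.2f}")
```

In this sketch the target message starts out neutral, because the filter has never seen any of its words, and ends up scored as near-certain spam purely because of the poisoned word. A real filter weighs many more words per message, so an actual attack would likely need far more volume.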










Similar Items


Grok Headline matches for Bayesian spam rumination: when word-frequency-histograms attack!

Bayesian spam filtering for the masses 10/30/2003 11:59 PM
Spam, or unsolicited commercial e-mail, is now a sad part of everyday life online. Research companies estimate that more than 50% of the worldwide e-mail traffic is spam. As a result, it's becoming constantly more difficult and time-consuming to sort out legitimate e-mails from the deluge of commercial messages we're being flooded with. But there are ways to fight back. In this series, we'll walk through choosing and setting up a highly effective package for screening out spam.

E-texts used against Bayesian spam-filters 12/02/2003 07:37 AM
Bayesian anti-spam filters count word-frequency in suspect messages and compare the results to profiles of word-frequency in spam and ham. Defeating this requires that your spam include a lot of natural human prose. So spammers have started to mine the Gutenberg Project and other sources of human-generated ASCII and dump random hunks of literature into their messages to get around the filters.
Blogger and journalist Clive Thompson found an excerpt from Chapter 20 of The Master Key by Wizard of Oz author L. Frank Baum in a message that had as its subject line "the big unit" (no prizes for guessing what the rest of it was hawking).

bogofilter -- Fast Bayesian Spam Filter 03/16/2003 01:31 PM
bogofilter-0.11.1.3 - new stable release

SpamProbe - fast bayesian spam filter 03/27/2005 10:08 AM
spamprobe-1.1x6 released

FlashTrax incorporates image histograms 08/23/2004 10:48 AM
SmartDisk's FlashTrax device now displays histograms, the graphic representation of the range of image tones in a photograph...

Mac under fake Word 2004 attack 05/12/2004 12:39 PM

Macworld UK - Mac under fake Word 2004 attack 05/13/2004 06:32 AM
Users who deserve to lose all their data .. MacWorld UK .. alerting .. *cough* .. quote:

macworld.co.uk/news/top_news_item.cfm?NewsID=8664


Comment Spam Attack 02/05/2005 09:12 PM

So, apparently I'm not the only one that was hit by some bleepity-bleep-bleep spammer trying to post 400+ comment spams to my blogs. MT-B blocked about 300 of them, moderated 80, and let 4 through. That's pretty decent. The other 80 all had the same base domain so future attacks will fail for that one domain. There are also regular expressions in place now that should moderate the more ... interesting ones.

Your comments may get moderated if you include any terms relating to animal sex or incest. If so, I'll notice when I check my mail next and approve/reject it, so don't worry. A little delay is all. Keep those illegal-in-Alabama discussions going! Woo!

That said, I'm wondering if going TypeKey-only is the way to go. Yes, it makes you make an account (boo-hoo) but it keeps things a little more sane on the management end. If I get two more of these full-on assaults I'll do it, but not until then. It will alienate the more lazy amongst you.


A SPAM DoS Attack and Corporate Responsibility 02/04/2003 12:26 AM
Craig points to a K5 article about SPAM DoS Attacks, or what happens if a spammer forges your address on thousands or millions of mail messages. The result is that you'll get a ton of bounces, complaints, and a general...

Police hit in iPlod spam attack 12/05/2003 05:32 PM
Personal Computer World Dec 5 2003 4:19PM ET

LINX Leads New Attack on Spam Web Sites 08/19/2004 03:14 PM
theWHIR Aug 19 2004 6:56PM GMT

Racist spam attack hits Germany 06/11/2004 11:35 AM
ZDNet UK Jun 11 2004 3:23PM GMT

MovableType going after Spam Attack Server Loading 12/24/2004 12:46 PM

It appears we are just a few days away from a release that will help sites like this one that gets hammered on a daily basis by comment and trackback spam.

I know that 99% of you hate the Typekey registration service and I wish they also could come up with a way to allow comments and not cause all of you to have to go register on another site for permission to post. But until someone builds a better mousetrap we are stuck using Typepad. We thank you for your patience. [MovableType]


AOL's Sunshine State spam attack thwarted 12/31/2003 01:17 PM
File another day

Spam Opt-out Link Triggers Malicious Code Attack 09/22/2004 11:53 AM

German hate mail spam attack stuns experts 06/11/2004 04:57 AM
Virus spreads racist propaganda

Disgruntled employee takes down ex-firm's Web site with spam attack 07/13/2004 05:30 AM
ZDNet UK Jul 13 2004 10:15AM GMT

Spam, spam, spam, spam ... Canada targets unwanted email (AFP) 05/12/2004 04:17 AM
AFP - Canada unveiled a new action plan to combat unsolicited commercial e-mail, nicknamed spam, which jams inboxes and clogs Internet traffic worldwide.

OSD CPU frequency monitor 06/24/2005 05:47 PM
Initial release

Bayesian Aggregator 12/02/2003 08:47 AM
In a comment, Kevin Jordan writes: 348North News is a normal aggregator in much of the way you think of it. However, it allows me to identify keywords or themes that it puts together into phrases — and then matches up the phrases with like articles. Like a cross between Google News and Daypop (but that makes it sound much more complex than it is). If you want to see an "interests" based summary for me, check out the Phrase Index. I use fairly general keywords so as not to miss out on the future items. I haven't tried...

Java Bayesian 04/01/2005 06:56 AM

Is there a decent open source Java Bayesian package that is not GPL or similarly restricted from commercial use?  I am aware of only Classifier4J.  Preferably, it should be optimized for server applications and high performance.


Changing the reporting frequency in the MOM 09/21/2004 05:11 PM

Adjusting the Sampling Frequency 05/12/2004 04:22 AM

CPU frequency scaler for Linux 2.4 11/16/2003 04:46 AM
sstepd 0.1.7 released

Subconsciously, People may be Bayesian 01/22/2004 02:48 AM
David Leonhardt writes in a NY Times article about people playing the odds of everyday life with Bayesian Analysis. He describes new research, recently published in Nature, "which stands out because it offers a detailed window into how the Bayesian thought process works, showing the point when uncertainty becomes great enough to give past experience an edge over current observation." Bayesian Analysis, among researchers, is "the combining of new information with conventional wisdom." I do agree about their reliance on past observations, but I believe that they have underestimated the role of future orientation in the whole mix of decision making.

Bayesian Filter Library 03/13/2003 11:34 AM
0.1.0alpha release

Working with Bayesian Categorizers 11/19/2003 08:11 PM
Bayesian classification has proved a powerful weapon against spam. Jon Udell tries to find out whether it can be put to use in other spheres of content categorization.

Working with Bayesian categorizers 12/02/2003 01:38 AM
There's been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven't been wildly successful, I want to report them because I think the idea may have merit. [Full story: O'Reilly Network: Working with Bayesian Categorizers]
This month's O'Reilly Network column was a struggle because categorization itself is a struggle. I remain convinced that the automated classifiers that are doing such a good job beating back the tide of spam will also turn out to be more generally useful. But finding the right synergy between an automated assistant and a human overseer is a subtle and tricky thing. ...

Bayesian Aggregation, Part I 02/14/2003 03:23 PM
On Monday I configured Scenario 3 of my Bayesian Aggregation experiment, building a "good" corpus of my weblog entries and...

Bayesian Aggregation, Part II 02/23/2003 05:22 PM
iFile finally classified something as belonging on my weblog, but I have no idea why... Justin Rudd's Busy weekend complicated...

Intel Processor Frequency ID Utility 7.1 07/20/2004 07:57 AM

IBM and Sun Put Radio Frequency ID to Test On Their Own Turfs 04/28/2004 08:12 PM
ZDNet Apr 29 2004 0:53AM GMT

FCC to open 50 MHz of frequency in the 3600 MHz range. 04/16/2004 04:56 AM
The FCC is looking to open up the 3.6 GHz spectrum for wireless ISP operators. With the current spectrum becoming...

Intel Processor Frequency ID Utility 7.0 05/28/2004 03:26 PM

Frequency 2.0 adds built-in FTP transfers, more 04/09/2004 03:55 PM
Developer Brad Rhine has made Frequency 2.0 available. Frequency is used to update weblogs, or blogs. The new release adds built-in FTP file transfers, the ability to simultaneously manage more than one blog, support for many international characters, one button weblog updates, HTML buttons that add styles and links to posts and more.

DNS Update Frequency to Increase Dramatically 07/12/2004 07:27 AM
Verisign will increase the number of root server updates from 2 per day to 30 per minute.

Bayesian Noise Reduction Library 2.0.2 01/02/2005 04:17 AM
An implementation of the Bayesian Noise Reduction algorithm.

Implement Bayesian inference using PHP, Part 1 04/09/2004 04:05 PM
This article discusses how Bayesian inference can be used to build an online PHP-based wizard that guides a user through the process of making a medical diagnosis. This three-part series features interesting applications designed to help you appreciate the power and potential of Bayesian inference concepts.

octo's bayesian mail filter 0.05 12/21/2003 08:25 PM
A multi-user spam filter with a database backend.