stargeek


Working with Bayesian categorizers








Working with Bayesian categorizers 12/02/2003 01:38 AM

There's been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven't been wildly successful, I want to report them because I think the idea may have merit. [Full story: O'Reilly Network: Working with Bayesian Categorizers]
This month's O'Reilly Network column was a struggle because categorization itself is a struggle. I remain convinced that the automated classifiers that are doing such a good job beating back the tide of spam will also turn out to be more generally useful. But finding the right synergy between an automated assistant and a human overseer is a subtle and tricky thing. ...




This is a GrokNews Entry: (what is grok?)





Similar Items

Working with Bayesian categorizers

Grok Headline matches for Working with Bayesian categorizers

Working with Bayesian Categorizers


Working with Bayesian Categorizers 11/19/2003 08:11 PM
Bayesian classification has proved a powerful weapon against spam. Jon Udell tries to find out whether it can be put to use in other spheres of content categorization.

The W3C RDF Data Access Working Group
has published the first public working
draft of SPARQL Variable Binding


The W3C RDF Data Access Working Group
has published the first public working
draft of SPARQL Variable Binding
01/02/2005 11:31 AM
xmlhack Jan 2 2005 1:45PM GMT

Quality Assurance Working Group Updates
Three Working Drafts


Quality Assurance Working Group Updates
Three Working Drafts
11/08/2002 08:17 PM
8 November 2002: The Quality Assurance (QA) Working Group has updated three Working Drafts in its seven-part QA Framework: the Introduction, Process and Operational Guidelines; and Specification Guidelines. Learn more about the QA Activity and the roadmap for ensuring that W3C technologies are well implemented. (News archive)

Bayesian Aggregator


Bayesian Aggregator 12/02/2003 08:47 AM
In a comment, Kevin Jordan writes: 348North News is a normal aggregator in much of the way you think of it. However, it allows me to identify keywords or themes that it puts together into phrases — and then matches up the phrases with like articles. Like a cross between Google News and Daypop (but that makes it sound much more complex than it is). If you want to see an "interests" based summary for me, check out the Phrase Index. I use fairly general keywords so as not to miss out on the future items. I haven't tried...

Java Bayesian


Java Bayesian 04/01/2005 06:56 AM

Is there a decent open source Java Bayesian package that is not GPL or similarly restricted from commercial use?  I am aware of only Classifier4J.  Preferably, it should be optimized for server applications and high performance.


Subconsciously, People may be Bayesian


Subconsciously, People may be Bayesian 01/22/2004 02:48 AM
David Leonhardt writes in a NY Times article about people playing the odds of everyday life with Bayesian Analysis. He describes new research, recently published in Nature, "which stands out because it offers a detailed window into how the Bayesian thought process works, showing the point when uncertainty becomes great enough to give past experience an edge over current observation." Bayesian Analysis, among researchers, is "the combining of new information with conventional wisdom." I do agree about their reliance on past observations, but I believe that they have underestimated the role of future orientation in the whole mix of decision making.

Bayesian Aggregation, Part I


Bayesian Aggregation, Part I 02/14/2003 03:23 PM
On Monday I configured Scenario 3 of my Bayesian Aggregation experiment, building a "good" corpus of my weblog entries and...

Bayesian Aggregation, Part II


Bayesian Aggregation, Part II 02/23/2003 05:22 PM
iFile finally classified something as belonging on my weblog, but I have no idea why... Justin Rudd's Busy weekend complicated...

Bayesian Filter Library


Bayesian Filter Library 03/13/2003 11:34 AM
0.1.0alpha release

Bayesian Noise Reduction Library 1.2


Bayesian Noise Reduction Library 1.2 07/26/2004 10:38 AM
An implementation of the Bayesian Noise Reduction algorithm.

Bayesian Noise Reduction Library 1.0.0


Bayesian Noise Reduction Library 1.0.0 07/22/2004 12:49 PM
An implementation of the Bayesian Noise Reduction algorithm.

octo's bayesian mail filter 0.05


octo's bayesian mail filter 0.05 12/21/2003 08:25 PM
A multi-user spam filter with a database backend.

Implement Bayesian inference using PHP,
Part 1


Implement Bayesian inference using PHP,
Part 1
04/09/2004 04:05 PM
This article discusses how Bayesian inference can be used to build an online PHP-based wizard that guides a user through the process of making a medical diagnosis. This three-part series features interesting applications designed to help you appreciate the power and potential of Bayesian inference concepts.

IBM DeveloperWorks: Bayesian Inference,
Part 3


IBM DeveloperWorks: Bayesian Inference,
Part 3
05/12/2004 08:28 AM
In a new submission, Paul Meagher lets us know that the third part of his series has been posted - Implement Bayesian inference using PHP: Part 3.

Devshed: Implement Bayesian Inference
with PHP


Devshed: Implement Bayesian Inference
with PHP
01/06/2005 09:24 AM
DevShed has a new article posted today for all of those interested in a better way to filter out information/spam messages from your data - Implement Bayesian inference using PHP, Part 1.

IBM DeveloperWorks: Bayesian Inference,
Part 2


IBM DeveloperWorks: Bayesian Inference,
Part 2
04/14/2004 07:45 AM
Paul Meagher wrote in this morning to tell us about Part Two of his Bayesian inference using PHP article series.

Bayesian Network Classifiers in Java


Bayesian Network Classifiers in Java 06/10/2004 10:07 PM
JavaBayes v.0346.1

Bayesian spam filtering for the masses


Bayesian spam filtering for the masses 10/30/2003 11:59 PM
Spam, or unsolicited commercial e-mail, is now a sad part of everyday life online. Research companies estimate that more than 50% of the worldwide e-mail traffic is spam. As a result, it's becoming constantly more difficult and time-consuming to sort out legitimate e-mails from the deluge of commercial messages we're being flooded with. But there are ways to fight back. In this series, we'll walk through choosing and setting up a highly effective package for screening out spam.

Bayesian Noise Reduction Library 2.0.0


Bayesian Noise Reduction Library 2.0.0 12/28/2004 11:39 PM
An implementation of the Bayesian Noise Reduction algorithm.

E-texts used against Bayesian
spam-filters


E-texts used against Bayesian
spam-filters
12/02/2003 07:37 AM
Bayesian anti-spam filters count word frequencies in a suspect message and compare the results to profiles of word frequency in spam and ham. Defeating this requires that your spam include a lot of natural human prose. So spammers have started to mine Project Gutenberg and other sources of human-generated ASCII, dumping random hunks of literature into their messages to get around the filters.
Blogger and journalist Clive Thompson found an excerpt from Chapter 20 of The Master Key by Wizard of Oz author L Frank Baum in a message that had as its subject line "the big unit" (no prizes for guessing what the rest of it was hawking).
Link
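The word-frequency comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular filter: the tiny training messages, the helper names (`tokenize`, `train`, `spam_log_odds`), the add-one smoothing, and the assumption of equal prior odds for spam and ham are all made up for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude whitespace tokenizer; real filters tokenize far more carefully.
    return text.lower().split()

def train(messages):
    """Count how often each word appears across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

def spam_log_odds(text, spam_counts, ham_counts):
    """Sum per-word log-likelihood ratios (add-one smoothing, equal priors assumed).

    A positive score means the text looks more like the spam profile,
    a negative score means it looks more like the ham profile.
    """
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    score = 0.0
    for word in tokenize(text):
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical training data, invented for this sketch.
spam_counts = train(["buy cheap pills now", "cheap pills cheap offer"])
ham_counts = train(["meeting notes attached", "lunch meeting at noon"])

plain_spam = "cheap pills"
# The same pitch padded with ordinary, ham-like prose.
padded_spam = "cheap pills meeting notes lunch at noon attached"

plain_score = spam_log_odds(plain_spam, spam_counts, ham_counts)
padded_score = spam_log_odds(padded_spam, spam_counts, ham_counts)
print(plain_score, padded_score)
```

With this toy data the padded message scores lower than the bare pitch, which is the effect the spammers are after when they paste in literature.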

Bayesian Noise Reduction Library 2.0.2


Bayesian Noise Reduction Library 2.0.2 01/02/2005 04:17 AM
An implementation of the Bayesian Noise Reduction algorithm.

Bayesian decision-making rules our
unconscious


Bayesian decision-making rules our
unconscious
01/22/2004 02:47 AM
Bayesian statistical modelling is a tool used to compare new events to past experience, something useful for applications as diverse as predicting whether a message is spam and whether a Web-page is relevant to a given subject. New research indicates that we do a lot of Bayesian comparisons in our heads, particularly when engaged in athletic tasks:
"Most decisions in our lives are done in the presence of uncertainty," Dr. Körding said. "In all these cases, the prior knowledge we have can be very helpful. If the brain works in the Bayesian way, it would optimally use the prior knowledge."

The researchers drew the analogy to tennis in their paper, and it is not the first study to suggest that athletes have a more sophisticated understanding of mathematics than even they may realize.

Link (via K5)
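As a rough illustration of what "optimally using prior knowledge" can mean, here is one standard textbook model. It is an assumption made for this note, not something taken from the article: the prior belief is Gaussian with mean \mu_p and variance \sigma_p^2, and the current observation x_o carries Gaussian noise with variance \sigma_o^2. The best estimate is then a weighted average of the two:

```latex
\hat{x} \;=\; \frac{\sigma_o^2}{\sigma_p^2 + \sigma_o^2}\,\mu_p
        \;+\; \frac{\sigma_p^2}{\sigma_p^2 + \sigma_o^2}\,x_o
```

The noisier the observation (larger \sigma_o^2), the more weight shifts to the prior mean \mu_p, which is the sense in which past experience gains an edge as uncertainty grows.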

Bayesian Pattern Filtering Library
0.1.0alpha


Bayesian Pattern Filtering Library
0.1.0alpha
03/13/2003 06:02 PM
A C++ library for building Bayesian Filters.

SpamProbe - fast bayesian spam filter


SpamProbe - fast bayesian spam filter 03/27/2005 10:08 AM
spamprobe-1.1x6 released

bogofilter -- Fast Bayesian Spam Filter


bogofilter -- Fast Bayesian Spam Filter 03/16/2003 01:31 PM
bogofilter-0.11.1.3 - new stable release

Bitflux Blog: PHP Naive Bayesian Filter


Bitflux Blog: PHP Naive Bayesian Filter 03/31/2005 09:56 AM
On the Bitflux Blog today (Christian Stocker) there's a posting about a simple, easy-to-implement PHP naive Bayesian filter.

"Spammers use text from novels to fool
Bayesian filters"


"Spammers use text from novels to fool
Bayesian filters"
12/03/2003 09:47 PM

Bayesian spam rumination: when
word-frequency-histograms attack!


Bayesian spam rumination: when
word-frequency-histograms attack!
06/29/2004 10:40 AM
Ed Felten has posted an intriguing rumination on the possible failure modes of Bayesian spam-filtering -- filtering that uses word-frequency statistics to classify email as spam or ham. As Ed points out, Bayesian filters are trained by the spammers, who, by choosing the vocabulary of their messages carefully, can make messages containing certain words or phrases undeliverable on the Internet.
Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users' Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user's filter.

Link
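The poisoning scenario above can be sketched with a toy per-word estimate. Everything here is hypothetical: the function `word_spam_probability`, the target word "invoice", the message counts, and the smoothing are chosen only for illustration, and a word "more likely spam" in this sketch corresponds to what the excerpt calls a negative score.

```python
from collections import Counter

def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word) from how many spam and ham messages contain the word.

    Uses add-one smoothing and assumes spam and ham are equally likely a priori.
    """
    spam_rate = (spam_counts[word] + 1) / (n_spam + 2)
    ham_rate = (ham_counts[word] + 1) / (n_ham + 2)
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical starting state: "invoice" shows up in 8 of 10 ham messages
# and in none of 10 spam messages, so it looks like a trustworthy word.
spam_counts = Counter()
ham_counts = Counter({"invoice": 8})
n_spam, n_ham = 10, 10

before = word_spam_probability("invoice", spam_counts, ham_counts, n_spam, n_ham)

# The attacker sends 40 spam messages that all contain the target word,
# and the user dutifully marks every one of them as spam.
poison_messages = 40
spam_counts["invoice"] += poison_messages
n_spam += poison_messages

after = word_spam_probability("invoice", spam_counts, ham_counts, n_spam, n_ham)
print(before, after)
```

In this toy setup the estimate for the target word moves from roughly 0.10 to just over 0.5, even though the user's legitimate mail has not changed at all.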

Working Hard, Hardly Working


Working Hard, Hardly Working 06/15/2004 10:07 AM
Three years ago, I was working at a small company as the unofficial IT director / all-purpose computer bitch. I was laid off in early 2003, but to this day, the job presents me with difficulties; namely, that of telling prospective employers what I did, and for that matter, what the company itself did. I have virtually no idea what this company's function was, despite working there for over a year and a half, although I did learn how to spew an amazing amount of marketing jargon without thinking. As for my role there, it was essentially vast tracts of doing absolutely nothing, punctuated erratically by moments of panicking and crisis-defusion, usually involving something truly earth-shattering like the CEO not being able to print her email. When asked by interviewers "What did your company do?" I am forced to mumble vagaries about consulting and hope they leave the issue alone.

Working for Your ISP


Working for Your ISP 01/27/2003 12:53 PM
Interesting conflict-of-interest questions when you get hired by the local ISP for web work.

Working in the UK


Working in the UK 07/17/2004 07:23 PM
The regularly scheduled weblog is being interrupted for an urgent personal request for help. If you know anything about getting...

Working together


Working together 05/23/2004 03:19 PM

Along the lines of my last entry.... Brent Simmons' and Adriaan Tijsseling's children have shared a playground for some time now, and today their parents taught them how to play well together. ecto's new 1.1.5 version allows users to add new feeds to NetNewsWire, while NetNewsWire in its 2.0 incarnation can be told to use ecto as the default weblog editor instead of the built-in one.

Thanks, Ado and Brent. [a preponderance of evidence - What Willis Wuz' Talkin' 'Bout]

Right on to Adriaan and Brent!

We need more cooperation like this in our industry!


Working with XML in .NET


Working with XML in .NET 11/15/2002 03:40 AM
CNET Nov 15 2002 2:09AM ET

Working for you on the web


Working for you on the web 06/14/2004 10:16 AM
Manchester Online Jun 14 2004 2:27PM GMT

Is your antivirus app working? Are you
sure?


Is your antivirus app working? Are you
sure?
06/15/2004 05:56 PM
ZDNet Jun 15 2004 9:50PM GMT

Bally Not Working Out


Bally Not Working Out 05/10/2004 02:40 PM
The latest results at Bally Total Fitness are a little weak.

Wi-Fi Means: Never Having to Say "I'm
Not Working"


Wi-Fi Means: Never Having to Say "I'm
Not Working"
01/16/2004 11:01 AM
The New York Times offers this cheery piece that suggests you can still ignore your kids while in the same room [reg. required]: I'm sounding cynical, but this article does extol the virtues of being able to be connected all of the time and work all of the time, even when in physical proximity to your family. Seriously, however, the notion that you can get necessary work done and not have to hole yourself up in a basement or at a specific location is one of the great benefits of a home wireless network. Oddly, the piece opens looking at Oren Michels, identifying him as the president of a human resources benefits administration firm. I knew that name, so I performed a Google search and found that he is also president and CEO of WiFinder, a Wi-Fi directory site. (Disclosure: I'm the senior editor at JiWire, an editorial and directory site focused on wireless that competes for ad/sponsor dollars with WiFinder.) I shot a note to Oren to confirm that he was still in that role at WiFinder, which he is. Like WiFinder's chairman and founder Scott Rafer, Oren wears a few hats. My point here is not that it's odd that Oren has multiple jobs, but rather it's an odd choice of the reporter to not mention that Oren Michels is the head of a company that's devoted to spreading information about Wi-Fi. It's not bias; it's just a strange omission, n'est-ce pas?...

Working with Filters


Working with Filters 06/18/2004 04:11 PM
See how you can use filters to help users easily retrieve the Breeze content they need.

Why e-government isn't working


Why e-government isn't working 04/06/2005 03:50 PM
silicon.com Apr 6 2005 4:50PM GMT