stargeek


Latent semantic indexing explained







Latent semantic indexing explained 05/01/2004 07:41 AM

In response to my blogging about pages not saying what they're about, Hanan Cohen points us to an exceptionally well-written article by Clara Yu, John Cuadrado, Maciej Ceglowski and J. Scott Payne about latent semantic indexing (not to be confused with latex cement and indenting). Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close,...
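The word-overlap idea in the excerpt can be illustrated with a minimal sketch. This is not full latent semantic indexing, which additionally applies a singular value decomposition to the term-document matrix; it only shows the first step, scoring documents as close when they share many words. The sample documents and the function names `term_counts` and `cosine_similarity` are hypothetical, chosen for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic words in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical three-document collection.
docs = [
    "search engines index web documents",
    "web search engines rank indexed documents",
    "bake the bread in a hot oven",
]
vectors = [term_counts(doc) for doc in docs]

# Documents 0 and 1 share several words; document 2 shares none with document 0.
close = cosine_similarity(vectors[0], vectors[1])
far = cosine_similarity(vectors[0], vectors[2])
print(close > far)
```

Raw overlap treats "index" and "indexed" as unrelated words; the decomposition step in real LSI is what lets related terms pull documents together.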









Similar Items


Grok Headline matches for Latent semantic indexing explained

Semantic Indexing


Semantic Indexing 09/17/2004 08:43 AM
http://www.nitle.org/semantic_search.php

Semantic indexing is their name for a family of techniques for searching and organizing large data collections. The goal of semantic indexing is to find patterns in unstructured data (documents without descriptors such as keywords or special tags) and use those patterns to offer more effective search and categorization services. Semantic indexing techniques are language-agnostic, so data collections don't have to be in English, or even in any human language at all. For example, they have had good preliminary results in protein structure prediction using algorithms adapted from a text search engine. Latent Semantic Indexing (LSI or LSA, for latent semantic analysis) was originally described in a 1990 paper by Deerwester, Dumais, Furnas, Landauer, and Harshman, and is a topic of active study. You can find links to journal articles and other LSI websites on our references page. This has been added to the semantics web section of Deep Web Research Subject Tracer™ Information Blog and Bot Research Subject Tracer™ Information Blog.

Long Tail of Latent Demand


Long Tail of Latent Demand 12/27/2004 11:16 PM
I'm a huge fan of The Long Tail, but the demand it represents is nothing new.  What's new is how we discover it. Latent Demand (also known as Induced Demand) is the potential earnings if a market is served efficiently. ...

Magpie - The Semantic Filter and Tool For the Semantic Web


Magpie - The Semantic Filter and Tool For the Semantic Web 12/28/2004 06:58 AM
http://kmi.open.ac.uk/projects/magpie/main.html

Magpie uses ontology infrastructure to semantically mark up web documents on-the-fly. The existing technologies in this problem domain tend to be rather heavyweight, and often modify the appearance of the actual webpage. Whilst these modifications may sometimes be acceptable, sometimes they may be a cause of serious annoyance to the user. Often, the existing technologies rely on one very specific ontology... To alleviate some of these issues, they started work on the Magpie technology that would be lightweight and provide sufficiently robust and flexible features for semantically enriched browsing. The Magpie tool aims to identify and filter out the concepts-of-interest from any webpage it is given. The current set of concepts can be influenced by a selection of a particular ontology of concepts and relations. In addition to identifying the concepts-of-interest that are relevant from the perspective of a particular ontology, each such concept may provide an applicable set of relations or commands that can be executed. Such relationships are both determined and evaluated dynamically by querying the ontology server. Another feature they believe improves the user's experience is the ability to turn the semantic menus ON or OFF, to highlight all instances belonging to a particular ontological class, and to follow and semantically process the links embedded in the document. This has been added to the Semantic Web Research section of Deep Web Research Subject Tracer™ Information Blog.

Semantic Blogging: Spreading the Semantic Web Meme


Semantic Blogging: Spreading the Semantic Web Meme 05/08/2004 06:20 AM
Semantic Blogging: Spreading the Semantic Web Meme by Steve Cayzer
http://snipurl.com/66yj

Steve is a research engineer at Hewlett-Packard's (HP) laboratories in Bristol, England. He is interested in the intersection of semantic web technologies and machine learning techniques, such as automated classification and metadata enrichment. He also has a semantic blog. This paper is about semantic blogging, an application of the semantic web to blogging. The semantic web promises to make the web more useful by endowing metadata with machine processable semantics. Blogging is a lightweight web publishing paradigm which provides a very low barrier to entry, useful syndication and aggregation behaviour, a simple to understand structure and decentralized construction of a rich information network. Semantic blogging builds upon the success and clear network value of blogging by adding additional semantic structure to items shared over the blog channels. In this way we add significant value allowing view, navigation and query along semantic rather than simply chronological or serendipitous connections. Our vision is to use semantic web tools and ideas to help move blogging beyond communal diary browsing to rich information sharing scenarios. We have built a simple prototype as an illustration of this vision. This has been added to the Semantic Web Research section of the Deep Web Research Subject Tracer™ Information Blog.

Indexing TV


Indexing TV 12/17/2004 06:31 PM
From a blinkx press release: blinkx is the first search engine to make such TV programs fully searchable on demand. Because blinkx captures and indexes the entire video stream directly from the television, consumers can get straight to the exact clip they want. blinkx TV can be accessed at http://www.blinkx.tv/ Blinkx says it "captures and indexes video streams across news, sports and entertainment programming from 22 channels, including Fox News, ESPN and Biography." I haven't had a chance to try it, and I'm in the air all day (= 24 hours from takeoff till final landing) today......

B plus tree indexing in C# with .NET


B plus tree indexing in C# with .NET 06/08/2004 03:47 PM
pre alpha uploaded for the brave :)

Google Indexing IRC?


Google Indexing IRC? 11/04/2003 11:00 AM
Tony Collen reports (confirmed by Google and others, apparently) that Google is sending robots to IRC channels (internet chat rooms) as part of some experiment. Since things said in chat rooms are traditionally considered very private, I can't imagine what they plan to do....

The Soundex Indexing System


The Soundex Indexing System 03/30/2005 11:26 AM
http://www.archives.gov/research_room/genealogy/census/soundex.html

To use the census soundex to locate information about a person, you must know his or her full name and the state or territory in which he or she lived at the time of the census. It is also helpful to know the full name of the head of the household in which the person lived because census takers recorded information under that name. The soundex is a coded surname (last name) index based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same, but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings. This has been added to Finding People Subject Tracer™ and Genealogy Resources Subject Tracer™ Information Blogs.
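The coding idea can be sketched in a few lines. The entry itself does not list the coding rules, so this sketch assumes the commonly described American Soundex scheme: keep the first letter, map the remaining consonants to the digits 1 through 6, skip vowels, collapse adjacent letters with the same code, and pad or truncate to one letter plus three digits. The function name `soundex` and the table name `CODES` are hypothetical.

```python
# Assumed American Soundex letter groups (not spelled out in the entry above).
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(surname):
    """Return a four-character Soundex code such as 'S530', or '' for no letters."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


# The two spellings from the entry are filed under the same code.
print(soundex("SMITH") == soundex("SMYTH"))
```

Because the code depends only on the first letter and the consonant pattern, variant spellings land together in the index, which is the property the entry describes.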

Fix Mail.app indexing issues under 10.3


Fix Mail.app indexing issues under 10.3 08/02/2004 01:58 PM
This is an update to the Reduce mailbox sizes in Mail hint published last year, by deleting the index files and letting Mail.app recreate them. Now, under 10.3, everything's different... In ~/Library/Mail, you will see vario...

Time For An Indexing Summit?


Time For An Indexing Summit? 01/06/2005 10:09 PM

Yahoo Indexing One or Two Pages Only


Yahoo Indexing One or Two Pages Only 04/08/2005 10:04 AM
Search Engine Journal Apr 8 2005 12:32PM GMT

Indexing SQL Server for speed


Indexing SQL Server for speed 02/16/2004 08:00 AM
CNET Feb 16 2004 11:11AM GMT

Briefcase: Indexing Google


Briefcase: Indexing Google 09/03/2004 11:23 PM
IHT Sep 4 2004 2:50AM GMT

Google Indexing Quirk


Google Indexing Quirk 09/05/2004 04:11 AM

Google is dying: I think the title of this bit may be a little dramatic, but the author points out an interesting phenomenon with Google pages lately.

On sites with more than a few thousand pages, Google is not indexing anywhere from ten percent to seventy percent of the pages it knows about. These pages show up in Google's main index as a listing of the URL, which means that the Googlebot is aware of the page. But they do not show up as an indexed page.

When the page is listed but not indexed, the only way to find it in a search is if your search terms hit on words in the URL itself. Even if they do hit, these listed pages rank so poorly compared to indexed pages, that they are almost invisible. This is true even though the listed pages still retain their usual PageRank. [...]

Google is dying. It broke sixteen months ago and hasn't been fixed. It looks to me as if pages that have been noted by the crawler cannot be indexed until some other indexed page gives up its docID number.



Oracle, IBM Offer XML Indexing


Oracle, IBM Offer XML Indexing 06/24/2005 07:37 PM
Oracle and IBM aim to help customers rein in new applications not easily integrated with traditional data structure models.

xindy - A Flexible Indexing System


xindy - A Flexible Indexing System 08/07/2004 08:44 AM
Mailing list moved to Sourceforge

Google tests tool to aid Web indexing


Google tests tool to aid Web indexing 06/05/2005 11:35 PM
ZDNet Jun 3 2005 5:17PM GMT

Google Testing New Indexing Approach


Google Testing New Indexing Approach 06/05/2005 11:59 PM
Since its inception, Google has tried to make sense of billions of Web documents using advanced in-house technology. But now, Google is experimenting with a new concept to better its search crawlers: ask webmasters for help. The program, called Google Sitemaps, could revolutionize how the Web is indexed.

"Google's indexing of syndication
feeds "


"Google's indexing of syndication
feeds "
04/23/2004 02:43 AM

Google Indexing Stop Words


Google Indexing Stop Words 06/27/2004 09:32 AM
This goes back to changes made when Google introduced stemming. Google is now indexing stop words, even though the message says it doesn't.

How To Stop Yahoo! From Indexing PDF Files


How To Stop Yahoo! From Indexing PDF Files 09/14/2004 07:35 AM

Google Indexing and Listing WML Files


Google Indexing and Listing WML Files 04/29/2004 12:02 PM
WML was never as big as the prehype that accompanied it, but it is nice to see the support in the Google index.

Google Print Now Indexing Magazine Articles?


Google Print Now Indexing Magazine Articles? 04/09/2004 04:12 PM
Kudos to Greg Notess, who pointed out that Google Print, that beta endeavor that's focused on indexing book content, is now indexing magazine articles. You can read his article at...

Google Indexing Redirect Problems Continue


Google Indexing Redirect Problems Continue 06/05/2005 11:27 PM
We had high hopes that the issue was resolved, but not only is it not resolved, Google itself is a victim of a redirect via scraped content: "... it definitely should serve as a wake up call for Google and hopefully this will find a remedy for the situation once and for all. If someone can highjack a PR 9 page of Google's own site, with a lousy PR 5 domain, it means any page, from any site, can be highjacked."

GoDaddy Blocks GoogleBot Indexing on Domains


GoDaddy Blocks GoogleBot Indexing on Domains 11/07/2003 09:56 AM
A long-running mystery about sites disappearing from Google has been solved. The main theme found by members was that the sites were all hosted by GoDaddy. Google has confirmed that GoDaddy is blocking GoogleBot.

Google indexing system exploit allows position hijacking


Google indexing system exploit allows position hijacking 06/06/2005 12:06 AM

peterme.com: Mob indexing? Folk categorization? Social tagging?


peterme.com: Mob indexing? Folk categorization? Social tagging? 01/04/2005 09:18 AM
peterme.com: Mob indexing? Folk categorization? Social tagging? .. Peterme .. HT

peterme.com/archives/000444.html


Incremental Indexing Shows Minor Reshuffling of Google Index


Incremental Indexing Shows Minor Reshuffling of Google Index 06/05/2005 11:27 PM
Update or Not? It is hard to determine these days. Either way, it looks like a small shuffling of the Google index is taking place at several data centers.

Capitals, Character Sets, Filenames and Search Engine Indexing.


Capitals, Character Sets, Filenames and Search Engine Indexing. 08/26/2002 10:33 AM
Could character set issues be stopping you from being indexed? An interesting thread on filename and character set issues surrounding search engine indexing.

KM re-explained


KM re-explained 06/05/2005 11:10 PM
The K stands for blogs. The M stands for tags. Put 'em together and you get "KM." [Technorati tag: km]...

RSS explained


RSS explained 11/17/2003 05:47 AM
RSS -- the little technology with the big, big list of acronym-expansions -- has even more acronym expansions than heretofore suspected; Google definitions has the scoop:
Repetitive stress syndrome that is caused by repetitive movement (causes Carpal Tunnel Syndrome when occurs at wrist/hand)

(Regional Subscription System) The system by which U S WEST processes Equal Access information that allows end users to obtain service from their Interexchange Carrier of Choice.

Radio Science Subsystem (orbiter science investigation)

Link (via EvHead)

"Points" Explained


"Points" Explained 07/19/2004 09:31 AM
Learn what points are before you pay them.

Options Explained


Options Explained 09/10/2004 09:22 AM
Learn how these potentially risky investments work before considering them.

PCI Express Explained


PCI Express Explained 08/19/2004 10:12 AM

Enterprise Value Explained


Enterprise Value Explained 06/08/2004 08:46 AM
Don't neglect debt and cash when determining a company's price tag.

Wolfram explained


Wolfram explained 07/23/2004 01:04 PM
I just came across a Forbes article by Michael S. Malone, dated 11.27.00, called "God, Stephen Wolfram and Everything Else." It's a good, non-technical introduction to Wolfram. Nicely done. Critics of Wolfram won't find much to like in it, and I still think Ray Kurzweil's piece is the best analysis/intro I've read, but Malone puts Wolfram into a useful perspective....

"brilliantly explained how we can win"


"brilliantly explained how we can win" 08/10/2004 09:25 PM

XML Namespaces Explained


XML Namespaces Explained 11/26/2002 11:28 PM
WebmasterBase Nov 26 2002 10:44PM ET

Hacking Explained


Hacking Explained 12/30/2003 01:29 AM

Grok Description matches for Latent semantic indexing explained
GrokA matches for Latent semantic indexing explained

Fixing a Corrupted Iuident.cab File


Fixing a Corrupted Iuident.cab File 04/11/2004 10:43 AM

What Is File System Journaling?


What Is File System Journaling? 12/15/2003 01:16 AM
Have you ever wondered, "What is File System Journaling"? Don't worry, most of us have. File System Journaling was originally a feature only available to the server world. Finally, File System Journaling has made it to our favorite home computers. Join us to learn more about FSJ.


File system journaling won't prevent all disk damage


File system journaling won't prevent all disk damage 01/16/2004 11:05 AM
Today I discovered that having file system journaling enabled doesn't mean one can skip checking such HDs from time to time with a disk repair utility, or that a journaled HD cannot get directory damag...

Fixing a Corrupted Money File


Fixing a Corrupted Money File 08/19/2004 12:04 AM

Test Drive a Bunch of Open Source Content Management Systems


Test Drive a Bunch of Open Source Content Management Systems 09/15/2004 09:19 AM
Now THIS is a marvelous idea. Put a bunch of content management systems online with screenshots, and descriptions. Make the admin passwords public so that anyone can log in and...

Tender: UFI needs Content Management System Application Software


Tender: UFI needs Content Management System Application Software 05/07/2004 07:34 AM
PublicTechnology.net May 7 2004 11:53AM GMT

Drupal, an open source platform and content management system. Must investigate


Drupal, an open source platform and content management system. Must investigate 05/15/2004 05:52 AM
drupal.org community plumbing .. Drupal 4.2 .. Drupal

drupal.org


IBM updates SAN File System software


IBM updates SAN File System software 05/25/2004 08:42 AM
IBM Corp. next month will release a new version of its TotalStorage SAN (storage area network) File System Software designed to work with a wider variety of new server and storage environments.

IBM to update SAN File System software


IBM to update SAN File System software 05/25/2004 04:31 PM
Unlike earlier versions of the SAN File System, which supported only IBM products, Version 2.1 will work with storage devices from IBM rivals including EMC, HP and Hitachi Data Systems.

Free File: AlbumWrap Extractor


Free File: AlbumWrap Extractor 09/20/2004 04:34 AM
G4 Tech TV Sep 20 2004 8:24AM GMT

Practical File System Design with the Be File System


Practical File System Design with the Be File System 05/10/2004 04:21 PM

CyberGuard's Webwasher Prevents Microsoft JPEG Exploit; Content Management Products Filter Files for Malicious Code Regardless of File Extension


CyberGuard's Webwasher Prevents Microsoft JPEG Exploit; Content Management Products Filter Files for Malicious Code Regardless of File Extension 09/21/2004 10:36 AM

Global Evaluation Campaign of a Content Management System Announced By XITEX Software


Global Evaluation Campaign of a Content Management System Announced By XITEX Software 04/15/2005 04:41 AM
XITEX Software has announced a two-month Global Evaluation Campaign for the new release of Xitex WebContent M1. Its aim is to involve companies and sole developers in evaluating the new release. [PRWEB Apr 15, 2005]

St. Bernard Software Announces Open File Manager 9.3 with Support for 64-bit Computing Platforms


St. Bernard Software Announces Open File Manager 9.3 with Support for 64-bit Computing Platforms 08/11/2004 03:11 PM
AMD Zone Aug 11 2004 7:07PM GMT

Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS)


Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) 08/15/2004 06:13 AM
Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko
http://eprints.osti.gov/cgi-bin/dexpldcgi?qry1123892181;12

Abstract:
The most exciting challenge for CRIS is to create a service for research information that is widespread, distributed and up to date like Google, but at the same time structured and trusted, with complex search and navigation similar to today's CRIS applications. The core technology for such a "new" CRIS is semantic web technology, which integrates database contents with HTML and XML web pages so they can be provided to the research-interested public. One (at the moment the best) possible way is to use RDF (Resource Description Framework), which is also recommended by the W3 consortium. This has been added to the articles section of Deep Web Research Subject Tracer™ Information Blog.

Free File: Hungry Frog Freeware Arcade Math Game


Free File: Hungry Frog Freeware Arcade Math Game 09/25/2004 02:04 AM
G4 Tech TV Sep 25 2004 5:14AM GMT

1U Rack-Mountable Pentium 4 Embedded Development Platform Available for OEMs, Software Developers, and System Integrators


1U Rack-Mountable Pentium 4 Embedded Development Platform Available for OEMs, Software Developers, and System Integrators 12/19/2004 03:45 PM
Ideal for networking, Internet appliance, industrial automation, and point-of-service (POS) applications, PL-01022 is a rack-mountable embedded platform that features a Pentium 4 processor and up to 1 GB DDR RAM. A high-performance, economical system with several customization options, PL-01022 supports three Gigabit (10/100/1000) Ethernet and two 10/100 LAN, a CompactFlash socket, two serial ports, and two USB ports. [PRWEB Dec 15, 2004]

Hot Banana Wins 2005 e-Content Award - Best Content Management System - CMS


Hot Banana Wins 2005 e-Content Award - Best Content Management System - CMS 04/08/2005 04:55 AM
Hot Banana Software Inc., a leading North American Web Content Management Suite (CMS) company, announced today that it has won the 2005 e-Content award for the best Content Management System. The Canadian e-Content Awards are sponsored by the e-Content Institute and were created to recognize and honor e-content products and services used by Canadian organizations and individuals. [PRWEB Apr 8, 2005]

OSS Chicago: Content Management Systems


OSS Chicago: Content Management Systems 02/17/2004 07:45 PM
Submission by OSS Chicago This month OSS Chicago will be tackling the broad topic of open source content management systems. The meeting starts at 7pm on Thursday February 19th, 2004. If you're in the Chicagoland area, you should come check it out. OSSC will be demoing several of the more sophisticated open source content management systems. Some possible candidates are Plone (python), WebGUI (perl), PostNuke (php), and Magnolia (java). We'll also discuss the differences between website management systems, portals, blogs, and content management frameworks. And finally we'll provide many resources for evaluating content management systems for your needs.

Scenarios and Procedures for Microsoft Systems Management Server 2003: Software Distribution and Patch Management


Scenarios and Procedures for Microsoft Systems Management Server 2003: Software Distribution and Patch Management 09/19/2004 05:52 PM

ZFS, the Last Word in File Systems?


ZFS, the Last Word in File Systems? 09/16/2004 12:40 PM

RO Content Management System


RO Content Management System 12/29/2003 06:47 PM
RoNuke v0.3 released!

Why is a Content Management System bad for SEO?


Why is a Content Management System bad for SEO? 12/19/2004 03:08 PM

hel Content Management System


hel Content Management System 04/18/2005 11:33 PM
Work began today on hel!

File-sharing systems in legal win


File-sharing systems in legal win 08/20/2004 06:19 AM
A US court has ruled that file-sharing firms are not responsible for what users do with their software.

Sun, Microsoft Take Different Tacks on File Systems


Sun, Microsoft Take Different Tacks on File Systems 06/03/2004 08:35 PM
Sun's recently announced Dynamic File System and Microsoft's WinFS, found in Windows Longhorn, take different tacks to storage resource management, compatibility and performance.

File Management Framework


File Management Framework 08/29/2004 02:12 PM
pyfmf: First Release

Contenido. Web Content Management System


Contenido. Web Content Management System 10/29/2003 11:26 AM
Contenido 4.4.1 is out!


The following phrases have been identified by the grok system as matching this entry: "ipod corrupted file" open egov freeware "content management system" un-redact software decode iraq the meaning of iuident aolx screen name extractor example os x journaling lsi semantic indexing adium "no mountable file systems"
















