Latent semantic indexing explained
Grok Headline matches for Latent semantic indexing explained
Semantic Indexing
09/17/2004 08:43 AM
http://www.nitle.org/semantic_search.php
Semantic indexing is their name for
a family of techniques for searching and organizing large data
collections. The goal of semantic indexing is to find patterns in
unstructured data (documents without descriptors such as keywords or
special tags) and use those patterns to offer more effective search
and categorization services. Semantic indexing techniques are
language-agnostic, so data collections don't have to be in English, or
even in any human language at all. For example, they have had good
preliminary results in protein structure prediction using algorithms
adapted from a text search engine. Latent Semantic Indexing (LSI or
LSA, for latent semantic analysis) was originally described in a 1990
paper by Deerwester, Dumais, Furnas, Landauer, and Harshman, and
is a topic of active study. You can find links to journal articles and
other LSI websites on our references page. This has been added to the
Semantic Web section of
Deep Web Research Subject
Tracer™ Information Blog and
Bot Research Subject
Tracer™ Information Blog.
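The core idea of LSI can be sketched in a few lines. This is only an illustration, assuming the textbook formulation from the 1990 paper (a truncated singular value decomposition of a term-document count matrix); the toy corpus, the function names, and the use of NumPy are assumptions made for the example and are not taken from the NITLE implementation.

```python
import numpy as np

# Toy corpus, invented for this illustration (not from the NITLE site).
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana fruit smoothie",
    "fruit banana bread",
]


def term_document_matrix(documents):
    """Return (term -> row index, matrix of raw term counts)."""
    vocab = sorted({word for doc in documents for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for word in doc.split():
            counts[index[word], j] += 1.0
    return index, counts


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsi_similarities(query, documents, k=2):
    """Similarity of the query to each document in a k-dimensional latent space."""
    index, counts = term_document_matrix(documents)
    # Truncated SVD: counts ~= U_k diag(s_k) V_k^T
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    u_k, s_k, doc_vectors = u[:, :k], s[:k], vt[:k, :].T
    # Fold the query into the latent space: q_hat = q^T U_k diag(s_k)^-1
    q = np.zeros(counts.shape[0])
    for word in query.split():
        if word in index:
            q[index[word]] += 1.0
    q_hat = (q @ u_k) / s_k
    return [cosine(q_hat, d) for d in doc_vectors]


if __name__ == "__main__":
    for doc, sim in zip(DOCS, lsi_similarities("car", DOCS)):
        print(f"{sim:6.3f}  {doc}")
```

With this toy corpus, a query for "car" scores "automobile engine repair" highly even though that document never contains the word "car", because the two terms co-occur with the same neighbours. The fruit documents score near zero. No language-specific knowledge is used, which is why the technique carries over to non-English (or non-linguistic) data.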
Long Tail of Latent Demand
12/27/2004 11:16 PM
I'm a huge fan of The Long Tail, but the demand it represents is
nothing new. What's new is how we discover it. Latent Demand
(also known as Induced Demand) is the potential earnings if a market
is served efficiently. ...
Magpie - The Semantic Filter and Tool
For the Semantic Web
12/28/2004 06:58 AM
http://kmi.open.ac.uk/projects/magpie/main.html
Magpie uses ontology
infrastructure to semantically mark up web documents on the fly. The
existing technologies in this problem domain tend to be rather
heavyweight, and often modify the appearance of the actual webpage.
While these modifications may sometimes be acceptable, at other times
they can seriously annoy the user. Often, the
existing technologies rely on one very specific ontology... To
alleviate some of these issues, they started work on the Magpie
technology that would be lightweight and provide sufficiently robust
and flexible features for semantically enriched browsing. The Magpie tool
aims to identify and filter out the concepts-of-interest from any
webpage it is given. The current set of concepts can be influenced by
selecting a particular ontology of concepts and relations. In
addition to identifying the concepts-of-interest that are relevant
from the perspective of a particular ontology, each such concept may
provide an applicable set of relations or commands that can be
executed. Such relationships are both determined and evaluated
dynamically by querying the ontology server. Other features they
believe improve the user's experience are the ability to turn the
semantic menus ON or OFF, to highlight all instances belonging to a
particular ontological class, and to follow and semantically process the
links embedded in the document. This has been added to the Semantic
Web Research section of
Deep Web Research Subject
Tracer™ Information Blog.
Semantic Blogging: Spreading the
Semantic Web Meme
05/08/2004 06:20 AM
by Steve Cayzer
http://snipurl.com/66yj
Steve is a research engineer at Hewlett-Packard's (HP) laboratories
in Bristol, England. He is interested in the intersection of semantic
web technologies and machine learning techniques, such as automated
classification and metadata enrichment. He also has a semantic blog.
This paper is about semantic blogging, an application of the semantic
web to blogging. The semantic web promises to make the web more useful
by endowing metadata with machine processable semantics. Blogging is a
lightweight web publishing paradigm which provides a very low barrier
to entry, useful syndication and aggregation behaviour, a simple to
understand structure and decentralized construction of a rich
information network. Semantic blogging builds upon the success and
clear network value of blogging by adding additional semantic
structure to items shared over the blog channels. In this way we add
significant value allowing view, navigation and query along semantic
rather than simply chronological or serendipitous connections. Our
vision is to use semantic web tools and ideas to help move blogging
beyond communal diary browsing to rich information sharing scenarios.
We have built a simple prototype as an illustration of this vision.
This has been added to the Semantic Web Research section of the
Deep Web Research Subject
Tracer™ Information Blog.
Indexing TV
12/17/2004 06:31 PM
From a blinkx press release: blinkx is the first search engine to make
such TV programs fully searchable on demand. Because blinkx captures
and indexes the entire video stream directly from the television,
consumers can get straight to the exact clip they want. blinkx TV can
be accessed at http://www.blinkx.tv/ Blinkx says it "captures and
indexes video streams across news, sports and entertainment
programming from 22 channels, including Fox News, ESPN and Biography."
I haven't had a chance to try it, and I'm in the air all day (= 24
hours from takeoff till final landing) today......
B plus tree indexing in C# with .NET
06/08/2004 03:47 PM
pre alpha uploaded for the brave :)
Google Indexing IRC?
11/04/2003 11:00 AM
Tony Collen reports (confirmed by Google and others, apparently) that
Google is sending robots to IRC channels (internet chat rooms) as part
of some experiment. Since things said in chat rooms are traditionally
considered very private, I can't imagine what they plan to do....
The Soundex Indexing System
03/30/2005 11:26 AM
http://www.archives.gov/research_room/genealogy/census/soundex.html
To use the census soundex to locate information
about a person, you must know his or her full name and the state or
territory in which he or she lived at the time of the census. It is
also helpful to know the full name of the head of the household in
which the person lived because census takers recorded information
under that name. The soundex is a coded surname (last name) index
based on the way a surname sounds rather than the way it is spelled.
Surnames that sound the same, but are spelled differently, like SMITH
and SMYTH, have the same code and are filed together. The soundex
coding system was developed so that you can find a surname even though
it may have been recorded under various spellings. This has been added
to
Finding People Subject
Tracer™ and
Genealogy Resources
Subject Tracer™ Information Blogs.
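The coding idea can be sketched as follows. The excerpt above only states that like-sounding surnames share a code. The letter groups and the treatment of H, W, and vowels below are the commonly documented American Soundex rules, stated here as an assumption rather than quoted from the excerpt, and the function name `soundex` is illustrative.

```python
# Letter groups of the commonly documented American Soundex scheme
# (an assumption here; the excerpt above does not list them).
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(surname):
    """Return a letter-plus-three-digits code for a surname, or "" if it has no letters."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous coded letter, "" after a vowel
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    # Pad with zeros or truncate so the result is always four characters.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ("SMITH", "SMYTH", "Ashcraft", "Pfister"):
        print(name, soundex(name))
```

Under these rules SMITH and SMYTH both come out as S530, which is the filing behaviour the excerpt describes.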
Fix Mail.app indexing issues under 10.3
08/02/2004 01:58 PM
This is an update to the Reduce mailbox sizes in Mail hint published
last year, by deleting the index files and letting Mail.app recreate
them.
Now, under 10.3, everything's different...
In ~/Library/Mail, you will see vario...
Time For An Indexing Summit?
01/06/2005 10:09 PM
Yahoo Indexing One or Two Pages Only
04/08/2005 10:04 AM
Search Engine Journal Apr 8 2005 12:32PM GMT
Indexing SQL Server for speed
02/16/2004 08:00 AM
CNET Feb 16 2004 11:11AM GMT
Briefcase: Indexing Google
09/03/2004 11:23 PM
IHT Sep 4 2004 2:50AM GMT
Google Indexing Quirk
09/05/2004 04:11 AM
Google is dying: I
think the title of this bit may be a little dramatic, but the author
points out an interesting phenomenon with Google pages lately.
On sites with more than a few thousand pages, Google is not
indexing anywhere from ten percent to seventy percent of the pages it
knows about. These pages show up in Google's main index as a listing
of the URL, which means that the Googlebot is aware of the page. But
they do not show up as an indexed page.
When the page is listed but not indexed, the only way to find it in
a search is if your search terms hit on words in the URL itself. Even
if they do hit, these listed pages rank so poorly compared to indexed
pages, that they are almost invisible. This is true even though the
listed pages still retain their usual PageRank. [...]
Google is dying. It broke sixteen months ago and hasn't been fixed.
It looks to me as if pages that have been noted by the crawler cannot
be indexed until some other indexed page gives up its docID
number.
Oracle, IBM Offer XML Indexing
06/24/2005 07:37 PM
Oracle and IBM aim to help customers rein in new applications not
easily integrated with traditional data structure models.
xindy - A Flexible Indexing System
08/07/2004 08:44 AM
Mailing list moved to Sourceforge
Google tests tool to aid Web indexing
06/05/2005 11:35 PM
ZDNet Jun 3 2005 5:17PM GMT
Google Testing New Indexing Approach
06/05/2005 11:59 PM
Since its inception, Google has tried to make sense of billions of Web
documents using advanced in-house technology. But now, Google is
experimenting with a new concept to better its search crawlers: ask
webmasters for help. The program, called Google Sitemaps, could
revolutionize how the Web is indexed.
"Google's indexing of syndication
feeds"
04/23/2004 02:43 AM
Google Indexing Stop Words
06/27/2004 09:32 AM
This goes back to changes made when Google introduced stemming. Google
is now indexing stop words - even though the message says it doesn't.
How To Stop Yahoo! From Indexing PDF
Files
09/14/2004 07:35 AM
Google Indexing and Listing WML Files
04/29/2004 12:02 PM
WML was never as big as the pre-hype that accompanied it, but it is
nice to see the support in the Google index.
Google Print Now Indexing Magazine
Articles?
04/09/2004 04:12 PM
Kudos to Greg Notess, who pointed out that Google Print, that beta
endeavor that's focused on indexing book content, is now indexing
magazine articles. You can read his article at...
Google Indexing Redirect Problems
Continue
06/05/2005 11:27 PM
We had high hopes that the issue was resolved, but not only is it not
resolved, Google itself is a victim of a redirect via scraped
content: "... it definitely should serve as a wake up call for Google
and hopefully this will find a remedy for the situation once and for
all. If someone can highjack a PR 9 page of Google's own site, with a
lousy PR 5 domain, it means any page, from any site, can be
highjacked."
GoDaddy Blocks GoogleBot Indexing on
Domains
11/07/2003 09:56 AM
A long running mystery has been solved with sites disappearing from
Google. The main theme found by members was that the sites were all
hosted by GoDaddy. Google has confirmed that GoDaddy is blocking
GoogleBot.
Google indexing system exploit allows
position hijacking
06/06/2005 12:06 AM
peterme.com: Mob indexing? Folk
categorization? Social tagging?
01/04/2005 09:18 AM
peterme.com: Mob indexing? Folk categorization? Social tagging? ..
Peterme .. HT
peterme.com/archives/000444.html
Incremental Indexing Shows Minor
Reshuffling of Google Index
06/05/2005 11:27 PM
Update or Not? It is hard to determine these days. Either way, it
looks like a small shuffling of the Google index is taking place at
several data centers.
Capitals, Character Sets, Filenames and
Search Engine Indexing.
08/26/2002 10:33 AM
Could character set issues be stopping you from being indexed? An
interesting thread on filename and character set issues surrounding
search engine indexing.
KM re-explained
06/05/2005 11:10 PM
The K stands for blogs. The M stands for tags. Put 'em together and
you get "KM." [Technorati tag: km]...
RSS explained
11/17/2003 05:47 AM
RSS -- the little technology with the big, big list of
acronym-expansions -- has even more acronym expansions than heretofore
suspected; Google definitions has the scoop:
Repetitive stress syndrome that is caused by repetitive movement
(causes Carpal Tunnel Syndrome when occurs at wrist/hand)
(Regional Subscription System) The system by which U S WEST processes
Equal Access information that allows end users to obtain service from
their Interexchange Carrier of Choice.
Radio Science Subsystem (orbiter science investigation)
Link (via EvHead)
"Points" Explained
07/19/2004 09:31 AM
Learn what points are before you pay them.
Options Explained
09/10/2004 09:22 AM
Learn how these potentially risky investments work before considering
them.
PCI Express Explained
08/19/2004 10:12 AM
Enterprise Value Explained
06/08/2004 08:46 AM
Don't neglect debt and cash when determining a company's price tag.
Wolfram explained
07/23/2004 01:04 PM
I just came across a Forbes article by Michael S. Malone, dated
11.27.00, called "God, Stephen Wolfram and Everything Else." It's a
good, non-technical introduction to Wolfram. Nicely done. Critics of
Wolfram won't find much to like in it, and I still think Ray
Kurzweil's piece is the best analysis/intro I've read, but Malone puts
Wolfram into a useful perspective....
"brilliantly explained how we can win"
08/10/2004 09:25 PM
XML Namespaces Explained
11/26/2002 11:28 PM
WebmasterBase Nov 26 2002 10:44PM ET
Hacking Explained
12/30/2003 01:29 AM
Grok Description matches for Latent semantic indexing explained
GrokA matches for Latent semantic indexing explained
Fixing a Corrupted Iuident.cab File
04/11/2004 10:43 AM
What Is File System Journaling?
12/15/2003 01:16 AM
Have you ever wondered, "What is File System Journaling"? Don't worry,
most of us have. File System Journaling was originally a feature only
available to the server world. Finally, File System Journaling has
made it to our favorite home computers.
Join us to learn more about FSJ.
File system journaling won't prevent all
disk damage
01/16/2004 11:05 AM
Today I discovered that just because one has file system Journaling
enabled, that doesn't mean one still shouldn't check such HDs from
time to time with a disk repair utility, or that a Journaled HD cannot
get directory damag...
Fixing a Corrupted Money File
08/19/2004 12:04 AM
Test Drive a Bunch of Open Source
Content Management Systems
09/15/2004 09:19 AM
Now THIS is a marvelous idea. Put a bunch of content management
systems online with screenshots, and descriptions. Make the admin
passwords public so that anyone can log in and...
Tender: UFI needs Content Management
System Application Software
05/07/2004 07:34 AM
PublicTechnology.net May 7 2004 11:53AM GMT
Drupal, an open source platform and
content management system. Must
investigate
05/15/2004 05:52 AM
drupal.org community plumbing .. Drupal 4.2 .. Drupal
drupal.org
IBM updates SAN File System software
05/25/2004 08:42 AM
IBM Corp. next month will release a new version of its TotalStorage
SAN (storage area network) File System Software designed to work with
a wider variety of new server and storage environments.
IBM to update SAN File System software
05/25/2004 04:31 PM
Unlike earlier versions of the SAN File System, which supported only
IBM products, Version 2.1 will work with storage devices from IBM
rivals including EMC, HP and Hitachi Data Systems.
Free File: AlbumWrap Extractor
09/20/2004 04:34 AM
G4 Tech TV Sep 20 2004 8:24AM GMT
Practical File System Design with the Be
File System
05/10/2004 04:21 PM
CyberGuard's Webwasher Prevents
Microsoft JPEG Exploit; Content
Management Products Filter Files for
Malicious Code Regardless of File
Extension
09/21/2004 10:36 AM
Global Evaluation Campaign of a Content
Management System Announced By XITEX
Software
04/15/2005 04:41 AM
XITEX Software has announced a two-month Global Evaluation Campaign
for the new release of Xitex WebContent M1. Its aim is to
involve companies and sole developers in evaluating the new release
of Xitex WebContent M1. [PRWEB Apr 15, 2005]
St. Bernard Software Announces Open File
Manager 9.3 with Support for 64-bit
Computing Platforms
08/11/2004 03:11 PM
AMD Zone Aug 11 2004 7:07PM GMT
Semantic Web Content Accessibility
Guidelines for Current Research
Information Systems (CRIS)
08/15/2004 06:13 AM
by A. Lopatenko
http://eprints.osti.gov/cgi-bin/dexpldcgi?qry1123892181;12
Abstract: The most exciting challenge for CRIS
is to create a service for research information which should be
widespread, distributed and up to date like Google, but at the same time
structured, trusted, with complex search and navigation similar to
today's CRIS applications. The core technology for such a "new" CRIS is
semantic web technology, which integrates database contents with HTML
and XML web pages so they can be provided to the research-interested
public. One possible way (at the moment the best) is to use RDF
(Resource Description Framework), which is also recommended by the W3
consortium. This has been added to the articles section of
Deep Web Research Subject
Tracer™ Information Blog.
Free File: Hungry Frog Freeware Arcade
Math Game
09/25/2004 02:04 AM
G4 Tech TV Sep 25 2004 5:14AM GMT
1U Rack-Mountable Pentium 4 Embedded
Development Platform Available for OEMs,
Software Developers, and System
Integrators
12/19/2004 03:45 PM
Ideal for networking, Internet appliance, industrial automation, and
point-of-service (POS) applications, PL-01022 is a rack-mountable
embedded platform that features a Pentium 4 processor and up to 1 GB
DDR RAM. A high-performance, economical system with several
customization options, PL-01022 supports three Gigabit (10/100/1000)
Ethernet and two 10/100 LAN, a CompactFlash socket, two serial ports,
and two USB ports. [PRWEB Dec 15, 2004]
Hot Banana Wins 2005 e-Content Award -
Best Content Management System - CMS
04/08/2005 04:55 AM
Hot Banana Software Inc., a leading North American Web Content
Management Suite (CMS) company, announced today that it has won the
2005 e-Content award for the best Content Management System. The
Canadian e-Content Awards are sponsored by the e-Content Institute and
were created to recognize and honor e-content products and services
used by Canadian organizations and individuals. [PRWEB Apr 8, 2005]
OSS Chicago: Content Management Systems
02/17/2004 07:45 PM
Submission by OSS Chicago
This month OSS Chicago will be tackling the broad topic of open source
content management systems. The meeting starts at 7pm on Thursday
February 19th, 2004. If you're in the Chicagoland area, you should
come check it out.
OSSC will be demoing several of the more sophisticated open source
content management systems. Some possible candidates are Plone
(python), WebGUI (perl), PostNuke (php), and Magnolia (java). We'll
also discuss the differences between website management systems,
portals, blogs, and content management frameworks. And finally we'll
provide many resources for evaluating content management systems for
your needs.
Scenarios and Procedures for Microsoft
Systems Management Server 2003: Software
Distribution and Patch Management
09/19/2004 05:52 PM
ZFS, the Last Word in File Systems?
09/16/2004 12:40 PM
RO Content Management System
12/29/2003 06:47 PM
RoNuke v0.3 released!
Why is a Content Management System bad
for SEO?
12/19/2004 03:08 PM
hel Content Management System
04/18/2005 11:33 PM
Work begun today on hel!
File-sharing systems in legal win
08/20/2004 06:19 AM
A US court has ruled that file-sharing firms are not responsible for
what users do with their software.
Sun, Microsoft Take Different Tacks on
File Systems
06/03/2004 08:35 PM
Sun's recently announced Dynamic File System and Microsoft's WinFS,
found in Windows Longhorn, take different tacks to storage resource
management, compatibility and performance.
File Management Framework
08/29/2004 02:12 PM
pyfmf: First Release
Contenido. Web Content Management System
10/29/2003 11:26 AM
Contenido 4.4.1 is out!