For a PHP weblog, there haven't been many PHP articles or links
recently. This is because I feel most recent PHP articles I read have
nothing fresh to say, repeating material I linked to 2 or 3 years ago.
Perhaps I'm getting jaded. So to keep things fresh, here's a new
article, mostly original, and hopefully of some interest to everyone!
Last year, Tim Bray, one of the co-authors of the XML spec,
mentioned that he used Perl regular expressions to parse
XML:
"Now here's the dirty secret; most of it is
machine-generated XML, and in most cases, I use the perl regexp engine
to read and process it."
I was struck by this because I would have thought XPath or SAX
would provide better performance,
as they are APIs tuned specifically for XML.
I decided to run some benchmarks to determine which technique was
fastest. I also wanted a realistic test, so I benchmarked parsing the
RSS feed of this
web site, searching for the contents of all title tags, and returning
the contents as an array. The RSS file is from Nov 2003 (yes, I did
this benchmark that long ago), is about 20K, and has 12 title tags,
so the returned array will have 12 title strings.
I compared five techniques:
1. Regex, using a Perl-compatible regular expression to match the
title tags.
2. explode('<title>', $rss), then strip the matching </title>
tag using strpos() and substr().
3. XPath, using $title_nodes = $ctx->xpath_eval("//title");
4. SAX, with an element handler function that matched and
processed the title tag.
5. DOM, using $titles = $dom->get_elements_by_tagname('title').
Intuitively, this should have been the slowest, as the whole tree is
generated.
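For readers who want to try the comparison themselves, here is a rough sketch of the five techniques. It is not the original benchmark source: the PHP 4 DOM XML calls quoted above no longer exist in current PHP, so this assumes PHP 5.3 or later and swaps in DOMDocument, DOMXPath and the xml_parser_* functions. The sample feed and the titles_* function names are hypothetical.

```php
<?php
// Sketch only, assuming PHP 5.3+ with the dom and xml extensions.
// The feed below and all titles_* names are made up for illustration.
$rss = <<<'XML'
<rss version="2.0"><channel>
<title>Example weblog</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>
XML;

// 1. Regex: capture the text between <title> and </title>.
function titles_regex($xml) {
    preg_match_all('|<title>(.*?)</title>|s', $xml, $m);
    return $m[1];
}

// 2. explode(): split on the opening tag, then cut at the closing tag.
function titles_explode($xml) {
    $parts = explode('<title>', $xml);
    array_shift($parts); // drop the text before the first <title>
    $titles = array();
    foreach ($parts as $part) {
        $end = strpos($part, '</title>');
        if ($end !== false) {
            $titles[] = substr($part, 0, $end);
        }
    }
    return $titles;
}

// 3. XPath: build the tree, then query it for every title element.
function titles_xpath($xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $titles = array();
    foreach ($xpath->query('//title') as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}

// 4. SAX: event handlers collect character data while inside <title>.
function titles_sax($xml) {
    $titles = array();
    $inTitle = false;
    $parser = xml_parser_create();
    // Keep element names in their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$titles, &$inTitle) {
            if ($name === 'title') { $inTitle = true; $titles[] = ''; }
        },
        function ($p, $name) use (&$inTitle) {
            if ($name === 'title') { $inTitle = false; }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$titles, &$inTitle) {
            if ($inTitle) { $titles[count($titles) - 1] .= $data; }
        }
    );
    xml_parse($parser, $xml, true);
    return $titles;
}

// 5. DOM: build the whole tree, then collect elements by tag name.
function titles_dom($xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $titles = array();
    foreach ($dom->getElementsByTagName('title') as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}
```

Note that the two string-based versions return the raw text between the tags, so they will not decode entities or unwrap CDATA sections the way the XML APIs do. That is fine for machine-generated feeds with a known layout, which is the case being tested here.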
Results
Here are the timings for processing the RSS file 1000 times. Lower
is better.

Technique   Seconds   Relative to REGEX
REGEX        0.1080     1.00
EXPLODE      0.1696     1.57
DOM          6.3212    58.53
XPATH        8.3417    77.24
SAX         10.0851    93.38
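
Timings like these can be collected with a simple loop around microtime(). The harness below is only a sketch of that idea, not the code used for the numbers above; time_runs() and relative_to() are hypothetical names, and it assumes PHP 5.3 or later.

```php
<?php
// Hypothetical harness: run a callable $iterations times and
// return the elapsed wall-clock time in seconds.
function time_runs($fn, $iterations = 1000) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn);
    }
    return microtime(true) - $start;
}

// Express a timing as a multiple of a baseline timing,
// rounded to two decimals like the "Relative to REGEX" column.
function relative_to($seconds, $baseline) {
    return round($seconds / $baseline, 2);
}

// Usage: time a trivial regex match on a made-up string.
$sample = '<title>Example</title>';
$regexSeconds = time_runs(function () use ($sample) {
    preg_match_all('|<title>(.*?)</title>|s', $sample, $m);
});
printf("REGEX %.4f %.2f\n", $regexSeconds, relative_to($regexSeconds, $regexSeconds));
```

The relative column is just each timing divided by the regex timing, for example 0.1696 / 0.1080 for EXPLODE.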
Conclusion
Intuitively, I would have thought that XPath would be the fastest,
as XPath expressions can be compiled and tuned for XML. But the best
performance was achieved using regular expressions, which is
what Tim is using.
It appears that the DOM, SAX and XPath libraries remain immature
(compared to the Perl-compatible regex library) and are not highly
optimized. Strangely enough, DOM performance is better than XPath and
SAX! Perhaps someone else can explain why.
If anyone is interested, I can post the source.
Test platform: Windows 2000, PHP 4.3.3. I also tested on Linux, PHP
4.3.2, with similar results.