

Pull Parsing in C# and Java








Pull Parsing in C# and Java 05/23/2002 10:39 PM




This is a GrokNews entry.





Similar Items

Pull Parsing in C# and Java

Grok Headline matches for Pull Parsing in C# and Java

Boost performance when parsing (Java Pro)


Boost performance when parsing (Java Pro) 09/12/2002 07:48 AM

Java command line option parsing suite


Java command line option parsing suite 04/12/2005 11:54 AM
JArgs 1.0 Released

Parsing XML with Perl


Parsing XML with Perl 07/21/2002 10:36 PM
CNET Jul 21 2002 10:12PM ET

Article on Parsing RSS


Article on Parsing RSS 11/18/2002 12:58 PM
I have put up an article on how I parse RSS files. Also, in the same article I provide my RSS parser as a free download. I'd appreciate any feedback on it. The WAI compliancy will have to wait another day. One thing the Bobby accessibility validator doesn't like about my site is the links below every new post. The "permalink" and "comments" are specifically what it doesn't like. This is because the same text is repeated for each news post, although each one points to something (slightly) different. I don't want to get rid of these links, so I'm looking for a suitable (perhaps graphical) alternative.

Parsing OWL in RDF/XML Published


Parsing OWL in RDF/XML Published 01/22/2004 03:25 AM
2004-01-21: The Web Ontology Working Group has released Parsing OWL in RDF/XML as a Working Group Note. The OWL language is used to publish and share sets of terms called ontologies, supporting advanced Web search, software agents and knowledge management. This document describes a strategy for OWL-RDF parsers. Read about the Semantic Web.

Parsing RSS At All Costs


Parsing RSS At All Costs 01/22/2003 07:41 PM
In his second Dive into XML column, Mark Pilgrim describes his parse-at-all-costs parser of ill-formed RSS feeds, using Python's sgmllib.

More XML: Parsing with Evolt.org


More XML: Parsing with Evolt.org 08/14/2002 08:16 AM

Independently Parsing Perl


Independently Parsing Perl 06/17/2005 04:30 PM
Stodgy, boring languages have great editors. What's keeping Perl from refactoring support, perfect syntax highlighting, and other advanced transformation techniques? It's really difficult to parse Perl. Fortunately, Adam Kennedy's PPI project provides a standalone Perl parser that operates correctly on all but 28 of the 38,000 CPAN modules. Here's how it works and what you can do with it.

RSS native parsing in the next Firebird


RSS native parsing in the next Firebird 02/10/2004 02:42 AM

This is new to me. I was checking out the nightly builds of Firebird 0.8 betas (Windows and Linux, Mac) and they've got an RSS button and panel that parses RSS, with titles linking to the main window. Slick, but they need to let you track which ones have new/old items.

update: It turns out I'm actually a dumbass. I installed this RSS extension so long ago I forgot about it, and because I never saw it show up in any menu, I figured it never "took" on my Firebird install. Then when I had the new nightly build the toolbars were out of whack on first run so I went to customize them and saw the RSS button for the first time, and assumed it came with Firebird 0.8. My bad.


Parsing a Querystring With Perl


Parsing a Querystring With Perl 12/19/2002 07:40 PM
Stickysauce Dec 19 2002 6:46PM ET

Parsing the News.com RSS feed with PHP


Parsing the News.com RSS feed with PHP 12/11/2003 02:48 AM
CNET Dec 11 2003 2:44AM ET

dtddoc step 1: Parsing a DTD


dtddoc step 1: Parsing a DTD 10/02/2002 09:35 AM
Our quest to build a better automatic DTD documentation tool begins with a quick look at some of the available DTD parsers for Java, Perl, and PHP. By Michael Classen.

Simple XML parsing with SAX and DOM (OnJava.com)


Simple XML parsing with SAX and DOM (OnJava.com) 07/01/2002 08:28 AM

Python parsing module


Python parsing module 12/18/2003 01:00 PM
pyParsing Python library - version 1.0.1 released

BitFlux Blog: Parsing Bad XML in PHP 5.1


BitFlux Blog: Parsing Bad XML in PHP 5.1 08/19/2004 10:10 AM
In a new note from the BitFlux blog, Christian Stocker has information about the latest patch committed to the PHP 5.1 branch, which allows you to parse XML documents that are not well-formed and adds the missing elements, e.g. missing closing tags.

Functional XML Parsing Framework 5.1


Functional XML Parsing Framework 5.1 09/16/2004 09:22 PM
SAX/DOM/SXML parsers with support for XML namespaces and validation.

Features: Non-Extractive Parsing for XML


Features: Non-Extractive Parsing for XML 05/19/2004 07:15 PM
Changing the way XML parsers are written can make parsing more efficient and more flexible.

Making the News: Parsing RSS Feeds With PHP


Making the News: Parsing RSS Feeds With PHP 11/13/2002 08:59 AM

dtddoc step 1: Parsing a DTD (WebReference.com)


dtddoc step 1: Parsing a DTD (WebReference.com) 10/08/2002 09:14 AM

Liberal XML parsing related to personality?


Liberal XML parsing related to personality? 02/12/2004 07:41 PM
The heat of the discussion on liberal XML parsing has subsided, so this is actually a little late. That's because I wasn't sure if I should post this. But a post by Dave Winer today convinced me to post it anyway. Let me just say up front that I could be completely wrong.

Introduction to Event-Driven XML Parsing


Introduction to Event-Driven XML Parsing 02/10/2004 02:49 AM
Apple documents the new-in-Panther NSXMLParser class.

S-exp-based XML parsing/query/conversion


S-exp-based XML parsing/query/conversion 09/16/2004 07:33 PM
SSAX-SXML Release 5.1

Re: Internet Explorer URL parsing vulnerability


Re: Internet Explorer URL parsing vulnerability 12/09/2003 03:45 PM
soulshok_at_hippie.dk (Dec 09 2003)

Flaw in Microsoft JPEG Parsing


Flaw in Microsoft JPEG Parsing 09/14/2004 06:12 PM

WPkontakt message parsing error


WPkontakt message parsing error 12/24/2004 12:36 PM
Jaroslaw Sajko (Dec 23 2004)

Parsing XML documents with Perl's XML::Simple


Parsing XML documents with Perl's XML::Simple 09/20/2004 12:46 AM
CNET Sep 20 2004 4:09AM GMT

BitFlux Blog: Parsing Bad XML - Part 2


BitFlux Blog: Parsing Bad XML - Part 2 08/20/2004 08:31 AM
In response to his introduction of the non-well-formed XML patch the other day, Christian Stocker has a new posting with a bit of a rebuttal on the subject.

Warcraft III Replay Parsing Library


Warcraft III Replay Parsing Library 08/09/2004 11:30 AM
W3RepLib 0.9 beta released!

The State of the Union Parsing Tool


The State of the Union Parsing Tool 02/05/2005 09:55 PM

style.org/stateoftheunion/parse


Parsing XML documents with Perl (Builder.com)


Parsing XML documents with Perl (Builder.com) 07/18/2002 07:34 PM

Internet Explorer URL parsing vulnerability


Internet Explorer URL parsing vulnerability 12/09/2003 01:22 PM
bugtraq_at_zapthedingbat.com (Dec 09 2003)

URL Parsing Bug in IE Invites Phishing Attacks


URL Parsing Bug in IE Invites Phishing Attacks 06/11/2004 09:09 PM
The bug, which affects fully patched versions of IE, lets malicious sites assume the privileges of more trusted zones.

RE: Internet Explorer URL parsing vulnerability


RE: Internet Explorer URL parsing vulnerability 12/10/2003 01:52 PM
http-equiv_at_excite.com (Dec 09 2003)

High Speed XML Parsing is Not Intuitive


High Speed XML Parsing is Not Intuitive 02/11/2004 03:58 AM
For a PHP weblog, there haven't been many PHP articles or links recently. This is because I feel most recent PHP articles I read have nothing fresh to say, repeating material I linked to 2 or 3 years ago. Perhaps I'm getting jaded. So to keep things fresh, here's a new article, mostly original, and hopefully of some interest to everyone!

Last year, Tim Bray, one of the co-authors of the XML spec, mentioned that he used Perl regular expressions to parse XML.

"Now here's the dirty secret; most of it is machine-generated XML, and in most cases, I use the perl regexp engine to read and process it."

I was struck by this because I would have thought XPath or SAX would provide better performance as they are APIs tuned specifically for XML.

I decided to run some benchmarks to determine which techniques were faster. I also wanted a realistic test, so I benchmarked parsing the RSS feed of this website, searching for the contents of all title tags, and returning the contents as an array. The RSS file is from Nov 2003 (yes, I did this benchmark that long ago), is about 20K, and has 12 title tags, so the returned array will have 12 title strings.

The techniques used were:

1. Regular expression: preg_match_all('/<title>([^<]*)/', $rss, $titles_arr)

2. explode('<title>', $rss), then cut each piece off at its matching </title> tag using strpos() and substr().

3. XPath, using $title_nodes = $ctx->xpath_eval("//title");

4. SAX, using an element handler function that matched and processed the title tag.

5. DOM, using $titles = $dom->get_elements_by_tagname('title'). Intuitively, this should have been the slowest, as the whole tree is generated.
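
The first two techniques in the list can be sketched as follows. This is a minimal illustration, not the author's benchmark source: it is written for current PHP (7 or later) rather than PHP 4.3, and the function names titles_by_regex() and titles_by_explode(), as well as the tiny sample feed, are assumptions made up for the example.

```php
<?php
// Hypothetical sample feed. The original benchmark used a ~20K RSS file with 12 titles.
$rss = '<rss><channel><title>Site</title>'
     . '<item><title>First post</title></item>'
     . '<item><title>Second post</title></item>'
     . '</channel></rss>';

// Technique 1: one regular expression captures the text that follows each <title> tag.
function titles_by_regex(string $rss): array
{
    preg_match_all('/<title>([^<]*)/', $rss, $matches);
    return $matches[1];
}

// Technique 2: split on the opening tag, then cut each piece at its closing tag.
function titles_by_explode(string $rss): array
{
    $chunks = explode('<title>', $rss);
    array_shift($chunks);               // text before the first <title> is not a title
    $titles = [];
    foreach ($chunks as $chunk) {
        $end = strpos($chunk, '</title>');
        if ($end !== false) {
            $titles[] = substr($chunk, 0, $end);
        }
    }
    return $titles;
}

print_r(titles_by_regex($rss));
print_r(titles_by_explode($rss));
```

Both functions treat the feed as a plain string, so neither checks well-formedness. Note that the pattern [^<]* stops at the first "<", so a title wrapped in a CDATA section would come back empty from the regex version.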

Results

Here are the timings for processing the RSS file 1000 times. Faster is better.

Technique   Seconds         Relative to REGEX
REGEX       0.1080          1.00
EXPLODE     0.1696          1.57
DOM         6.3212         58.53
XPATH       8.3417         77.24
SAX        10.0851         93.38
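
A timing loop of the kind described might look like the sketch below. The post does not show its harness, so the helper names time_runs() and relative_to() and the stand-in feed are assumptions. Measured times will vary by machine and PHP version, so no output is shown for the timed run. The second half only recomputes the "relative" column from the published seconds.

```php
<?php
// Run $fn $runs times and return the elapsed wall-clock seconds.
function time_runs(callable $fn, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Cost relative to a baseline, rounded to two decimals like the table above.
function relative_to(float $seconds, float $baseline): float
{
    return round($seconds / $baseline, 2);
}

// Hypothetical stand-in for the RSS file: 12 title tags, as in the post.
$rss = str_repeat('<item><title>Example</title></item>', 12);

$regexSeconds = time_runs(function () use ($rss) {
    preg_match_all('/<title>([^<]*)/', $rss, $matches);
});
printf("REGEX on this machine: %.4f seconds for 1000 runs\n", $regexSeconds);

// Seconds as published in the table above; the relative column follows by division.
$published = [
    'REGEX'   => 0.1080,
    'EXPLODE' => 0.1696,
    'DOM'     => 6.3212,
    'XPATH'   => 8.3417,
    'SAX'     => 10.0851,
];
foreach ($published as $name => $seconds) {
    printf("%-8s %8.4f %8.2f\n", $name, $seconds, relative_to($seconds, $published['REGEX']));
}
```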

Conclusion

Intuitively, I would have thought that XPath would be the fastest, as XPath expressions can be compiled and tuned for XML. But the best performance was achieved using regular expressions, which is what Tim is using.

It appears that the DOM, SAX and XPath libraries remain immature (compared to the Perl-compatible regex library) and are not highly optimized. Strangely enough, DOM performance is better than XPath and SAX! Perhaps someone else can explain why.

If anyone is interested, I can post the source.

Test platform: Windows 2000, PHP 4.3.3. I also tested on Linux, PHP 4.3.2, with similar results.


OpenSSL ASN.1 parsing bugs PoC / brute forcer


OpenSSL ASN.1 parsing bugs PoC / brute forcer 01/16/2004 10:59 AM
Bram Matthys (Syzop) (Jan 15 2004)

Codewalkers.com: Parsing INI Files Made Easy


Codewalkers.com: Parsing INI Files Made Easy 12/16/2003 08:58 AM
When working with a PHP script, sometimes it's just easier to have some of the configuration options outside of the source. Not only does this make things a bit more friendly for the user, but it also means less debugging for you in the long run. But, to harness this feature, you might need a shove in the right direction - and that's where this new article comes in.

[OpenSSL Advisory] Denial of Service in ASN.1 parsing


[OpenSSL Advisory] Denial of Service in ASN.1 parsing 11/04/2003 12:13 PM
Mark J Cox (Nov 04 2003)

[ESA-20031104-029] 'openssl' ASN.1 parsing denial of service


[ESA-20031104-029] 'openssl' ASN.1 parsing denial of service 11/04/2003 01:23 PM
EnGarde Secure Linux (Nov 04 2003)

Incremental XML Parsing and Validation in a Text Editor


Incremental XML Parsing and Validation in a Text Editor 12/15/2003 02:29 AM
On 10 December 2003 at XML 2003 in Philadelphia, James Clark presented the ideas and implementation behind his nXML XML editing mode for GNU Emacs.