stargeek


PHPKitchen: Validating URLs with PHP








PHPKitchen: Validating URLs with PHP 08/16/2004 08:45 AM

In a pointer from PHPKitchen.com today, there's a script highlighted for anyone looking to validate URLs of any kind against just about any set of rules.
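The highlighted script isn't reproduced in this pointer. As a rough illustration of what "validating a URL" can mean, here is a minimal Python sketch (not the PHPKitchen script) that assumes two hypothetical rules: the scheme must be http or https, and there must be a host. Parsing the URL first and then checking its parts is usually easier to get right than one large regular expression.

```python
from urllib.parse import urlsplit

# Hypothetical rule set for this sketch: only web URLs are accepted.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_url(url: str, allowed_schemes=ALLOWED_SCHEMES) -> bool:
    """Return True if url has an allowed scheme, a host, and no whitespace."""
    if any(ch.isspace() for ch in url):
        return False  # urlsplit would happily accept embedded spaces
    try:
        parts = urlsplit(url)
        parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    if parts.scheme not in allowed_schemes:  # urlsplit lowercases the scheme
        return False
    # hostname excludes any user:password@ prefix and the port number
    return bool(parts.hostname)
```

Further rules (allowed domains, a maximum length, a required path) would slot in as extra checks after parsing.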









Similar Items


phpKitchen: Plotting with PHP


phpKitchen: Plotting with PHP 02/12/2003 09:25 AM

Validating HTML


Validating HTML 06/05/2004 07:34 AM

I've been playing around with the W3C HTML validator, and I've found, sadly, that there's no easy way to get this page to validate. There were some problems that I fixed, but when I try to validate against 4.01 Transitional, I get about 50 errors related to the use of "&" in URLs.

Apparently you're supposed to use the HTML entity for the ampersand ("&amp;") even in URLs. But since this entity isn't present in the URL in the browser's address bar, and that's where you generally copy the URL from, how are you supposed to convert these without manually picking through every URL you use? You could try to get funky with regular expressions, but I can't imagine that would work perfectly in every case.
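Hand-editing each link isn't necessary: the escaping can be done once, when the page is generated. A minimal Python sketch, assuming the raw URL is exactly what was copied from the address bar (PHP's htmlspecialchars() does the same job); the function name is hypothetical:

```python
from html import escape


def href_attr(url: str) -> str:
    """Build an href attribute whose value is safe to drop into HTML."""
    # escape() rewrites & < > " ' as entities, so "&" becomes "&amp;".
    # The browser decodes the entity again before following the link.
    return 'href="' + escape(url, quote=True) + '"'


print(href_attr("http://example.com/search?q=php&page=2"))
# href="http://example.com/search?q=php&amp;page=2"
```

Escape only once, at output time; running it on a URL that already contains "&amp;" produces "&amp;amp;".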

This brings up a larger point in that you can't really expect to validate a site where a large part of the HTML of the page is provided by people other than the original Web developer. Every entry on this page — comprising the entire middle section — can be entered by someone else, and how can I make sure they're entering valid HTML markup?

This is where HTML Tidy integration will work very well in PHP 5. Using this tool, you can validate HTML that people enter before you store it in the database, or before you output it. You can make sure all tags are closed, all tags match, etc. so perhaps you can hope for some sort of valid markup.
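Tidy does far more than this, but the "all tags closed, all tags match" part can be approximated with a stack. The Python sketch below uses the standard library's html.parser instead of Tidy (a swapped-in technique, not PHP 5's Tidy extension), and the class and function names are hypothetical. It is deliberately strict: every element except the void ones listed must be closed explicitly, which is stricter than HTML 4.01, where closing tags such as </p> and </li> are optional.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML 4.01
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record unmatched or stray closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br/>-style tags close themselves

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif self.stack:
            self.errors.append(f"unexpected </{tag}> (expected </{self.stack[-1]}>)")
        else:
            self.errors.append(f"stray </{tag}>")


def check_markup(fragment: str) -> list:
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Anything still on the stack was never closed (innermost first)
    return checker.errors + [f"unclosed <{tag}>" for tag in reversed(checker.stack)]
```

The messages could be shown to the commenter before the entry is stored, so the markup is fixed at the source.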

But, in an even larger sense, does validation matter much? I've never gotten any comment from anyone about the validation of this site. So what if I'm throwing 50 errors because of ampersands in URLs? Can someone provide me with a valid (excuse the pun) reason why this matters?

I understand problems can occur from gross misuse of the HTML spec, but are all validation errors created equal? My apparent misuse of ampersands has got to rank pretty low on the sin list.



Class-Validating-0.01


Class-Validating-0.01 01/04/2005 06:25 PM

PHPKitchen: The Best Of...PHP App. Frameworks


PHPKitchen: The Best Of...PHP App. Frameworks 12/18/2002 09:05 AM

Class-Validating-0.02


Class-Validating-0.02 01/06/2005 07:04 AM

Validating a Custom DTD


Validating a Custom DTD 02/01/2005 09:28 PM
In his article in this issue, Peter-Paul Koch proposes adding custom attributes to form elements to allow triggers for specialized behaviors. The W3C validator won't validate a document with these attributes, as they aren't part of the XHTML specification. Not to worry! This article will show you how to create a custom DTD that will add those custom attributes, and will also show you how to validate documents that use those new attributes.

PHPKitchen: Comparison of Two Giants


PHPKitchen: Comparison of Two Giants 01/21/2003 08:55 AM

Really validating XML with DTDs (XML Journal)


Really validating XML with DTDs (XML Journal) 06/18/2002 02:20 PM

PHPKitchen: On Installing LAMP


PHPKitchen: On Installing LAMP 09/22/2004 08:39 AM
In a new posting from PHPKitchen:

PHPKitchen.com: Interview - Zeev & Andi


PHPKitchen.com: Interview - Zeev & Andi 06/25/2004 07:17 AM
From PHPKitchen.com:

Planet Perl feeds not validating


Planet Perl feeds not validating 02/10/2004 03:56 PM
It was brought to our attention that the Planet Perl RSS feeds don't validate. Neither feed is encoding my name from the configuration file, and the RSS 2.0 feed is apparently using an invalid date format. This is where you, dear reader, come in. I know you are bouncing in your chair with excitement to fix the Python code already. The PlanetPlanet site is empty, so I'm not quite sure where to send patches, but when they are made...
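The post doesn't say which date format the feed emitted. For reference, RSS 2.0 expects <pubDate> values in RFC 822 format, and Python's standard library can produce that directly. The function name below is hypothetical; this is a sketch, not a patch against PlanetPlanet's code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_pubdate(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 822 date for <pubDate>."""
    if dt.tzinfo is None:
        # Without a zone the output would say "-0000" (unknown), so refuse.
        raise ValueError("pubDate needs a timezone-aware datetime")
    return format_datetime(dt)


print(rss_pubdate(datetime(2004, 2, 10, 15, 56, tzinfo=timezone.utc)))
# Tue, 10 Feb 2004 15:56:00 +0000
```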

PHPKitchen: Running a Successful OS Project


PHPKitchen: Running a Successful OS Project 08/23/2004 08:21 AM
Most developers enjoy hacking around on scripts, sometimes altering others' Open Source code to meet their needs. What happens, though, if you decide to take the next step and launch your own Open Source project? How can you do it? What resources do you need? Well, this new article (via a link from PHPKitchen) can give you all of the answers and more.

PHPkitchen: The Back Button Problem


PHPkitchen: The Back Button Problem 09/15/2004 07:53 AM
PHPKitchen has an interesting post covering one of the banes of most web developers' existence: the "back button problem".

PHPKitchen: Making Pretty Code


PHPKitchen: Making Pretty Code 02/17/2003 09:10 AM

Validating a Credit Card Number with JavaScript


Validating a Credit Card Number with JavaScript 05/23/2002 10:39 PM

Community News: Cocina.phpkitchen.com Launched


Community News: Cocina.phpkitchen.com Launched 02/18/2003 04:17 PM

Microsoft to employ pro-merchant whitelist technology for validating spam


Microsoft to employ pro-merchant whitelist technology for validating spam 05/05/2004 10:40 PM
Microsoft will begin using a whitelisting service for spam filtering. Marketers will have to post a bond with IronPort to send to Hotmail and MSN inboxes. Will it work?

Internet Explorer Tools for Validating XML and Viewing XSLT Output


Internet Explorer Tools for Validating XML and Viewing XSLT Output 03/20/2003 08:33 AM

No more usernames in URLs


No more usernames in URLs 02/10/2004 02:44 AM

This one could get very interesting. Microsoft have announced that an upcoming update to Internet Explorer will remove the ability to include usernames in URLs completely. This is in response to the growing problem of so called "phishing" scams, which use trick URLs to con important information such as passwords and credit card details out of unsuspecting browser users.

Phishing is big business. In this article on SecurityFocus, a loose transcript is provided of a talk by an FBI agent who explains how phishing is used by organised crime gangs in Eastern Europe:

This is bad enough and it's also cruelly funny, but the scary part came in when Dave started talking about the other group behind the explosion of viruses and Trojans: Eastern European hackers, backed by organized crime, such as the Russian mafia. In other words, the professionals.

These people are after one thing: money. The easiest way to illegally acquire money now is through the use of online tools like Trojans, or through phishing: set up a fake Web site for PayPal or eBay or Amazon, and then convince the naive to enter their usernames, passwords, and credit card information. Viruses and spam also intersect in this nasty spiderweb. Viruses help spread Trojans, and Trojans are used to turn unsuspecting users' computers into spam factories, or hosts for phishing expeditions, and thus furthering the spread of all the elements in this process: viruses, Trojans, spam, and phishing. It's a vicious cycle, and unfortunately, it appears to be getting worse. The FBI is working as hard as it can, but the nations of Eastern Europe are somewhat powerless to solve the problem at this time.

IE is so susceptible to this kind of attack that it's not even funny. In addition to the "invisible username" bug I covered last month, a recent discovery compounds the problem by allowing dangerous executable files to pose as safe file types when downloaded from the web. New Explorer hole could be devastating has the full details.

Microsoft's solution is drastic to say the least. Passing the username as part of a URL has been part of the makeup of the internet since at least 1994, and the ability is baked into a huge range of web client and server software. It's described in RFC 2396. The feature is rarely used, however, and the overall effect of its removal from IE is hard to judge. Off the top of my head I can think of only one site that uses it for legitimate reasons: FilePlanet, which incorporates it into the site's download queuing system (at least last time I checked).
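The phishing trick works because everything before the "@" in the authority part of a URL is userinfo, not the server name. A short Python sketch (the function name and the phisher.example address are made up; .example is a reserved domain) shows how a parser reads such a link:

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Separate the userinfo a reader may mistake for a site from the real host."""
    parts = urlsplit(url)
    return {
        "username": parts.username,  # userinfo before "@", minus any ":password"
        "hostname": parts.hostname,  # the server the browser actually contacts
    }


info = describe_url("http://www.paypal.com@phisher.example/login")
print(info)  # {'username': 'www.paypal.com', 'hostname': 'phisher.example'}
```

A casual reader sees "www.paypal.com" first, while the request goes to phisher.example.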

There's an interesting contrast to be made here between open and closed development methodologies. The Mozilla project has had a bug open on this issue for over two years, which has drawn over 170 comments with plenty of great ideas but no approved solution. Microsoft, on the other hand, have remained silent on the issue until (we can only assume) the bad publicity surrounding it forced them to act, at which point they announced a fix that appears to fly in the face of commonly accepted web standards - but does undoubtedly solve the problem. Of course, with no chance for user feedback prior to the decision it amounts to little less than a decree from God - which correlates directly to their inarguable domination of the browser market, at least in terms of market share.

Of course, the millions of IE users who decline to upgrade their browser will remain just as susceptible as they always were (unless they stop clicking links) - a fact for which we can hardly blame Microsoft. It does however mean that phishing will remain a lucrative scam for a long time to come.


URLs vs. XHTML


URLs vs. XHTML 03/11/2003 02:00 PM
After linking a few items on Amazon.com, my XHTML has been broken for who–knows–how–long. It popped up as I redesigned,...

How to Obscure URLs


How to Obscure URLs 04/19/2004 09:57 PM

How to Obscure Any URL: Great, great page on how spammers and scammers obscure URLs so most people don't know where they're going.

These tricks are known to the spammers and scammers, and they're used freely in unsolicited mails. You'll also see them in ad-related URLs and occasionally on web pages where the writer hopes to avoid recognition of a linked address for whatever reason. Now, I'm making these tricks known to you.
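Two tricks commonly described in such guides (assumed here; the linked page's full list isn't reproduced) are writing an IPv4 address as a single decimal number and percent-encoding the letters of a host name. This Python sketch, with hypothetical function names and the documentation address 192.0.2.1, converts in both directions so a suspicious host can be decoded:

```python
import ipaddress
from urllib.parse import unquote


def dword_host(dotted: str) -> str:
    """Write a dotted IPv4 address as one decimal number."""
    return str(int(ipaddress.IPv4Address(dotted)))


def reveal_host(host: str) -> str:
    """Undo %-encoded characters and turn a decimal IPv4 host back into dotted form."""
    host = unquote(host)  # e.g. "%77%77%77" -> "www"
    if host.isdigit():
        # Raises ValueError if the number is too large to be an IPv4 address
        return str(ipaddress.IPv4Address(int(host)))
    return host


print(dword_host("192.0.2.1"))  # 3221225985
```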

Also worth noting is that this is a great page dedicated to substance over style. One page, very long, full of information with no worries about overly-frilly presentation. We need more pages like this.

Via Don Park.



URLs Set in Stone


URLs Set in Stone 01/05/2005 01:19 AM

I've often wondered whether or not you should change blog posts once they're published. While I often do just because I'm anal, part of me thinks that a blog post is a historical record and should be frozen in time.

It's sort of that way for the titles of posts on Gadgetopia since they're used for the URL. For instance, I screwed up the title of this post (it should be "ALT Attributes," not "ALT tags"), but I can't change it because that would change the URL. I'd have to put in a redirect because it gets a lot of traffic.

As annoying as it is on the surface, there's something...pure in this that I like. The title of this entry is frozen in time. It is how it was originally published. Just like a newspaper publisher can't take something back once it hits the newsstand, I can't change this title.

This gets me thinking that it would be an interesting...experiment to MD5 hash the entire body of an entry and use the result as the URL. This means that you couldn't change a single character of the entry after it was published without completely changing the URL.
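The idea is easy to try. A minimal Python sketch, assuming a hypothetical base URL and UTF-8 text:

```python
import hashlib

BASE_URL = "http://example.com/entry/"  # hypothetical URL prefix


def content_hash_url(body: str) -> str:
    """Derive an entry's URL from the MD5 digest of its full text."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()  # 32 hex characters
    return BASE_URL + digest
```

Changing "ALT tags" to "ALT Attributes" anywhere in the body yields a completely different digest and therefore a new URL. MD5 is adequate as a fingerprint here, though it is no longer considered collision-resistant for security uses.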

I find this idea intriguing. Not enough to try it, mind you, but still interesting to consider.


Autolink URLs in MT Entries


Autolink URLs in MT Entries 08/16/2004 05:58 PM

One of the things I really hate about reading newspaper Web sites is that they often include URLs but don’t link them. So you have to copy and paste to open them (or just right-click with the right Firefox extension).

I don’t want my site to look like a big, dumb newspaper. So I wrote a filter using Brad’s regex plugin to autolink URLs in entries. It hasn’t been extensively tested, but it has worked for the half-dozen or so entries in the Project X blog.

Install the regex plugin and then add this to the top of your templates…

<MTAddRegex name="autolink">s!([> ])(http://[^<" ]+)!$1<a href="$2">$2</a>!g</MTAddRegex>

Then add the attribute regex="autolink" to your MTEntryBody and MTEntryMore tags in your templates: <MTEntryBody regex="autolink">.
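For testing the pattern outside Movable Type, the same kind of substitution can be written in Python. This sketch captures the character before the URL and puts it back, so a leading ">" or space is not swallowed; the function name is hypothetical:

```python
import re

# An http:// URL preceded by ">" or a space, running until "<", a quote, or a space
URL_RE = re.compile(r'([> ])(http://[^<" ]+)')


def autolink(html: str) -> str:
    """Wrap bare http:// URLs in <a> tags, keeping the character before each one."""
    return URL_RE.sub(r'\1<a href="\2">\2</a>', html)
```

Like the MT filter, it only matches http:// URLs preceded by ">" or a space, includes trailing punctuation such as a final period in the link, and will wrap a URL that is already the visible text of an existing link.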


Notes and Tips: ".Mac" URLs


Notes and Tips: ".Mac" URLs 04/06/2005 12:19 PM
Here's more about a new ".Mac" URL problem and workarounds.

Why Blogger redirects some URLs


Why Blogger redirects some URLs 05/11/2004 01:58 AM
The new Blogger redirects a lot of its links through another server. Ev explains why: it's to keep down comment-spam, to avoid apportioning unwarranted PageRank, and to protect Google's intranet.
Since blogger.com is linked from google.com, any sites we link to could pass on a fairly high PageRank value. (PageRank is one of the factors that determines what results show up in what order for searches.) In order to remove any possibility of unequal ranking of Blogger-powered blogs in the Google main search index, we send links through a URL from which Google knows to ignore PageRank. This way, Blogger blogs earn PageRank only on the basis of their content and other people linking to them, not because they're powered by a tool owned by Google.
Link (via EvHead)

CGI Redirected URLs and PageRank


CGI Redirected URLs and PageRank 06/19/2002 08:56 AM
When a directory listing with a CGI redirect points to your site, does this benefit your PageRank?

Generating One-Time URLs with PHP


Generating One-Time URLs with PHP 12/05/2002 08:50 PM
Not everything on the Internet is designed for archival. Some data is time- or recipient-sensitive and should be protected. Daniel Solin demonstrates how to generate URL access keys for sensitive data with PHP.
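Solin's code isn't shown here. The general pattern can be sketched in Python under these assumptions: a random key is issued per request, stored server-side with an expiry time, and deleted the first time it is used. The in-memory dictionary, the base URL, and the function names are hypothetical stand-ins for a database table and a real download script.

```python
import secrets
import time

# Hypothetical in-memory store: key -> (resource id, expiry time).
# A real site would keep this in a database table.
_keys = {}


def make_one_time_url(resource_id: str, ttl_seconds: int = 3600,
                      base: str = "http://example.com/fetch?key=") -> str:
    """Issue a random, unguessable key for one resource and return its URL."""
    key = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe base64
    _keys[key] = (resource_id, time.time() + ttl_seconds)
    return base + key


def redeem(key: str):
    """Return the resource id the first time a valid key is used, else None."""
    entry = _keys.pop(key, None)  # pop() removes the key, so it works only once
    if entry is None:
        return None
    resource_id, expires = entry
    return resource_id if time.time() <= expires else None
```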

Sweet Mother of URLs


Sweet Mother of URLs 07/23/2004 11:32 PM

sweetmotheroffrothygoodnessthatsbadnewsforbud.com: I saw this URL on a Miller Lite ad. It actually resolves. Sadly, it redirects, so you'll never see it in the address bar.



how URLs and ideas propagate through bl0gs,


how URLs and ideas propagate through bl0gs, 03/06/2004 01:53 AM
Blog Epidemic Analyzer: a dedicated tool for this.

www-idl.hpl.hp.com/blogstuff/index.html


Alf makes grabbing MP3 URLs really easy!


Alf makes grabbing MP3 URLs really easy! 01/17/2004 11:21 PM

m3u generator bookmarklet. Alf Eaton has come up with an m3u generator bookmarklet which will harvest the links to mp3s on a page you're viewing in the browser and give you a playlist. Drag that last link to your links bar, and try it on this page of songs from Les Ogres de Barback or this page of songs from the klezmer band Sirba (found thanks to Lucas).

[Seb's Open Research]
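The bookmarklet itself runs as JavaScript in the browser. The same harvesting step can be sketched in Python against a saved page; this is not Alf Eaton's code, and the class and function names are hypothetical. A plain .m3u playlist is just one URL per line:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class Mp3LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that point at .mp3 files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension
        if href and href.split("?", 1)[0].lower().endswith(".mp3"):
            self.links.append(urljoin(self.base_url, href))


def m3u_playlist(page_html: str, base_url: str) -> str:
    """Return a plain M3U playlist: one URL per line."""
    collector = Mp3LinkCollector(base_url)
    collector.feed(page_html)
    collector.close()
    return "".join(link + "\n" for link in collector.links)
```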

 

Coolio - once again Alf is leading the way!  I could have used that util over the past few weeks - building all the jukeboxes I've been up to.

Now Alf can take those MP3s he's grabbing and put them into a Laszlo SoundBlox - just like the one I got in my gutter (and Barlow has in his - too!)


Web Sites That Shorten Long URLs


Web Sites That Shorten Long URLs 06/20/2004 08:14 AM
Web Sites That Shorten Long URLs
http://notlong.com/links/

These free web sites can take a long URL and give you back a shorter URL without requiring registration. Since these sites forward a click from one link to another, they are also known as URL forwarders and some do subdomain forwarding. Any of these services will do a decent job, but if you want to study them before you pick one, here is an informal survey of the competitive landscape. [beSpacific June 15, 2004]
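The survey doesn't describe how each service builds its codes. One common approach, assumed here for illustration, is to store each long URL in a numbered table row and encode the row number in base 62; a click on the short URL decodes the code, looks up the row, and answers with an HTTP redirect to the long URL. A Python sketch of the encoding step, with hypothetical function names:

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters


def encode_id(n: int) -> str:
    """Turn a non-negative row id into a short base-62 code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode_id(code: str) -> int:
    """Invert encode_id: turn a base-62 code back into the row id."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```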

Using ForceType For Nicer Page URLs


Using ForceType For Nicer Page URLs 06/06/2002 07:37 AM
Apache has features that allow us to set up easy-to-remember URLs for our web site's pages. In this article Joe shows us how easy it is to do with Apache and a little PHP.

Get mailto URLs to open in mutt


Get mailto URLs to open in mutt 05/25/2004 10:14 AM
If you want URLs like mail foo@bar.tld about stuff to work with mutt (or pine, if you patch the code, but blech), you can download a small program I wrote. Documentation is lacking, as is customizability, but hey, for my firs...
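The program itself isn't shown. Its core job, turning a mailto: URL into a mutt invocation, could look like this hypothetical Python sketch, which only handles the recipient and the subject header (passed with mutt's -s option):

```python
from urllib.parse import parse_qs, unquote, urlsplit


def mailto_to_mutt_args(url: str) -> list:
    """Turn a mailto: URL into an argument list for mutt (recipient plus -s subject)."""
    parts = urlsplit(url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto: URL")
    recipient = unquote(parts.path)
    if not recipient or recipient.startswith("-"):
        # Refuse values mutt could mistake for command-line options
        raise ValueError("missing or suspicious recipient")
    args = ["mutt"]
    subject = parse_qs(parts.query).get("subject")  # parse_qs also decodes %20 etc.
    if subject:
        args += ["-s", subject[0]]
    args.append(recipient)
    return args
```

Passing the list to subprocess.run(args) starts mutt without going through a shell, so characters in the subject are not interpreted by the shell.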

Canonical URLs and network effects


Canonical URLs and network effects 09/27/2004 08:57 AM
After retracing his steps in order to correctly credit a link he had recently cited, Darren Barefoot wondered whether it had been worth the trouble:
Generally, I just choose the site closest to the source, and credit them. That probably doesn't make sense, as I should be crediting the source where I found them. Or is it important to show the entire 'chain of evidence'? Ultimately, who really cares? [Darren Barefoot: The (boring) problem of attribution]
I think that it is worth the trouble, and that publishing platforms and blogging tools ought to conspire to help automate the tedious chore. The reason usually given is that the original source deserves credit, and that it's unfair to redirect that credit. That's true, but there's a deep systemic principle at work here too. Canonical URLs create powerful network effects that we dilute at our peril. ...

Multiple URLs to Same Page in Google


Multiple URLs to Same Page in Google 12/19/2004 03:08 PM
Wild variations on a URL are showing up in the Google index. Most often these are the result of an incorrectly configured server, but some feel there is something wrong on Google's end.
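Whatever the cause, collapsing trivial variations into one form is the usual remedy. The Python sketch below applies a few assumed normalization rules (lowercase scheme and host; drop the default port, a trailing dot, any user:password prefix, and the fragment; use "/" for an empty path). Query-string order and the www prefix are left alone because those are site-specific decisions, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Collapse trivial variations of a URL into one form (domain names and IPv4 only)."""
    parts = urlsplit(url)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    # hostname is lowercased and excludes userinfo; IPv6 brackets are not restored
    netloc = (parts.hostname or "").rstrip(".")
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    path = parts.path or "/"
    # The fragment never reaches the server, so drop it
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

A server would typically answer requests for any non-canonical form with a 301 redirect to the canonical one.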

Get mutt to open URLs in Safari


Get mutt to open URLs in Safari 05/10/2004 10:19 AM
If you're a mutt user (a very popular terminal-based email application) and you would like to open URLs from your emails in Safari as opposed to viewing them with lynx or elinks (neither is bundled with OS X), here's what you...

Friendly URLs in Movable Type


Friendly URLs in Movable Type 02/01/2005 08:40 PM
Arve has written a very nice tutorial covering how to set up Movable Type to use search-engine and user-friendly URLs. Not only does he show how to set up Movable Type so you can customise the URLs yourself,...

Announcing Link-Fu: Battle of the Bizarro URLs


Announcing Link-Fu: Battle of the Bizarro URLs 11/04/2003 02:35 PM
OK. Listen up, freaks -- here are the rules. Link-Fu is an online competition where during a specific, pre-established period of time -- in this case, Thursday, November 6 from 9AM-12PM, Eastern Time -- you send us one url that links to some very weird something somewhere. Something so bizarre and wild and intriguing and fascinating, that no-one else (or as few nobodies as possible) has seen.

Judges: Warren Ellis, Invisible Cowgirl, Mark, Cory, Pesco, and yours truly. We declare a winner based on whatever we happen to like best. Not the grossest, not necessarily Farkish or Rotten. Just the flat-out most bizarre -- though grotesquery is not necessarily out of the question. In fact, here was last week's barfbag winner (WARNING: extremely disgusting, NSFW, Cowgirl found it). The winner wins the title of High Master of Link-Fu, until we hold the next battle.

So, if you'd like to compete in the website smackdown -- e-mail the funkiest, most potently bizarro url-age you can find to linkfubattle@yahoo.com on Thursday, November 6 from 9AM-12PM, Eastern Time (US). We will announce the winner Friday morning. May the best link win. (disclaimer: the whole thing was Warren and Cowgirl's idea.)

Slugs: Decrufting Movable Type URLs


Slugs: Decrufting Movable Type URLs 02/01/2005 10:08 PM
A tutorial on how to migrate from the old, numeric Movable Type URIs, to search-engine and user-friendly URLs without file extensions, and with proper, custom slug text.
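The tutorial relies on Movable Type's own configuration. Outside MT, slug text is commonly derived from the title along these lines; this Python sketch, its function name, and its 60-character limit are assumptions, not MT's algorithm:

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Turn an entry title into lowercase, hyphen-separated slug text."""
    # Decompose accented letters and drop the accents (e.g. "é" -> "e")
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Any run of characters other than a-z and 0-9 becomes a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug[:max_length].rstrip("-")
```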