HTML: Escaping &'s in URLs in HTML
Warning: Failure to ignore the following validation warning may result in lost productivity!
Concerning ampersands (&'s) in URLs, the following is what I wrote in the Aquarium documentation:
The short answer is, if you have a URL with more than one parameter, you should wrap it with $htmlent when you embed it in HTML if you want to pass DTD validation. If you don't care, then it really won't matter. What follows is an explanation of why I can't make it any easier on you.
1. You must escape &'s in URLs in order to pass DTD validation. Per the spec, a browser could look at http://a.com/?a=b&copy=2 and interpret the &copy= as part of the value of the a variable instead of a new variable named copy, because &copy; is an HTML entity.
2. To handle #1, Aquarium used to escape the &'s in every URL automatically.
3. However, #2 broke redirects if you redirected to a URL with more than one parameter. I did this so rarely that I didn't know about the bug for a good year.
4. To fix the bug in #3, we came up with a scheme to always escape &'s but deal with the redirect case specially. Now imagine that you create an HTML link whose URL has a parameter named referrer that is set to your current URL, which itself happens to have two parameters. When the user clicks on the link, Aquarium now has a GET parameter named referrer that is a URL. The programmer can use that URL directly in HTML (in which case the &'s must be escaped) or he might redirect to it (in which case the &'s must not be escaped). The programmer is never going to remember whether the URL is already escaped (per #3, it already is) and whether he needs to escape it for a link or unescape it for a redirect. His brain would core dump.
   When it comes to escaping things, a good general rule is to escape things at the last possible moment. Because Aquarium violated this rule, bad things were happening.
5. We could force engineers to wrap every URL in HTML with $htmlent, but that would suck. Too much existing code doesn't do that.
6. Browsers are smart, and most of the time, if you don't escape the &'s, the browser won't get confused. In fact, you can't generate a URL like http://a.com/?a=b&copy;=2 with Aquarium anyway, because the ; will get urlencoded to %3B. Hence, it's not possible to get Aquarium to generate a URL that would confuse the browser. Programmers who are worried about passing DTD validation and have a URL with more than one parameter will just have to use $htmlent. That's better than forcing every programmer to think about the problem every time he generates a URL since, practically speaking, the warning is pedantic.
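
To make the "escape at the last possible moment" rule concrete, here is a minimal sketch using only the Python standard library. It is not Aquarium's API: html.escape stands in for $htmlent, and build_url, html_link, and redirect_header are hypothetical helper names made up for this illustration. The URL stays raw until the moment it is embedded in HTML, so the same value can also be used for a redirect without unescaping.

```python
from html import escape
from urllib.parse import urlencode


def build_url(base, params):
    """Return a raw (not HTML-escaped) URL; names and values are urlencoded."""
    return base + "?" + urlencode(params)


def html_link(url, text):
    """Escape for HTML only at the moment the URL is embedded in markup."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


def redirect_header(url):
    """A redirect uses the raw URL; HTML escaping does not apply here."""
    return "Location: " + url


url = build_url("http://a.com/", {"a": "b", "copy": "2"})
print(url)                     # http://a.com/?a=b&copy=2
print(html_link(url, "next"))  # <a href="http://a.com/?a=b&amp;copy=2">next</a>
print(redirect_header(url))    # Location: http://a.com/?a=b&copy=2

# A ";" in a parameter name is urlencoded, so "&copy;" cannot appear literally.
print(build_url("http://a.com/", {"a": "b", "copy;": "2"}))
# http://a.com/?a=b&copy%3B=2
```

Because the stored URL is never HTML-escaped, nobody has to remember its state: a link escapes it on the way out, and a redirect uses it untouched.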