URL Encoding
(or: 'What are those "%20" codes in URLs?')

= Index DOT Html by Brian Wilson [indexdot@blooberry.com] =




RFC 1738: Uniform Resource Locators (URL) specification
The specification for URLs (RFC 1738, Dec. '94) poses a problem: it restricts the characters allowed in URLs to a small subset of the US-ASCII character set:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set to be used in documents, and HTML4 expands the allowable range to include the entire Unicode character set.
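The RFC 1738 rule quoted above can be expressed as a small classifier. This is a minimal Python sketch, assuming single-character input; the names `SAFE_CHARS`, `RESERVED_CHARS` and `may_appear_unencoded` are hypothetical, and the reserved set is the one listed under "Reserved characters" below.

```python
import string

# Characters the quoted RFC 1738 rule allows unencoded anywhere:
# alphanumerics plus the special characters $-_.+!*'(),
SAFE_CHARS = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

# Reserved characters: allowed unencoded only in their reserved role.
RESERVED_CHARS = set(";/?:@=&")


def may_appear_unencoded(ch: str, reserved_role: bool = False) -> bool:
    """Return True if `ch` may appear unencoded in a URL.

    `reserved_role` (hypothetical parameter) says whether a reserved
    character is being used for its syntactic purpose, e.g. "/" as a
    path separator.
    """
    if ch in SAFE_CHARS:
        return True
    if ch in RESERVED_CHARS:
        return reserved_role
    return False


print(may_appear_unencoded("a"))                      # True
print(may_appear_unencoded("/", reserved_role=True))  # True
print(may_appear_unencoded("/"))                      # False
print(may_appear_unencoded(" "))                      # False
```

Anything that is neither safe nor reserved-in-role falls through to False, which matches the quote's "may be used unencoded" wording: everything else must be encoded.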

What characters need to be encoded and why?
"Reserved characters"
     Why: URLs give some characters a special meaning in their syntax. When these characters are not used in that special role inside a URL, they need to be encoded.
Characters: ";", "/", "?", ":", "@", "=" and "&"
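In practice a library routine usually does this encoding. The sketch below uses Python's standard `urllib.parse.quote`; passing `safe=""` matters because the default (`safe="/"`) leaves "/" unencoded on the assumption that it is a path separator. The example value is made up for illustration.

```python
from urllib.parse import quote

# A data value that happens to contain reserved characters.
value = "fish&chips=1/2"

# Default safe="/": "/" is kept, assuming it separates path segments.
print(quote(value))           # fish%26chips%3D1/2

# safe="": every reserved character is encoded, as needed for plain data.
print(quote(value, safe=""))  # fish%26chips%3D1%2F2

# All seven reserved characters listed above, encoded as data:
print(quote(";/?:@=&", safe=""))  # %3B%2F%3F%3A%40%3D%26
```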
ASCII Control characters
     Why: These characters are not printable.
Characters: Includes the ISO-8859-1 (ISO-Latin) character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal).
Non-ASCII characters
     Why: These are by definition not legal in URLs since they are not in the ASCII set.
Characters: Includes the entire "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal).
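Both of these categories are defined purely by code point ranges, so a range test is enough to detect them. A minimal Python sketch, assuming input restricted to the ISO-Latin (ISO-8859-1) set; the function name `in_must_encode_range` is hypothetical.

```python
def in_must_encode_range(ch: str) -> bool:
    """True for ASCII control characters (00-1F and 7F hex)
    and for the non-ASCII top half of ISO-Latin (80-FF hex)."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("not an ISO-Latin (ISO-8859-1) character")
    is_control = code <= 0x1F or code == 0x7F
    is_non_ascii = 0x80 <= code <= 0xFF
    return is_control or is_non_ascii


print(in_must_encode_range("\t"))    # True  (09 hex, control character)
print(in_must_encode_range("\x7f"))  # True  (7F hex, DEL)
print(in_must_encode_range("é"))     # True  (E9 hex, non-ASCII)
print(in_must_encode_range("A"))     # False
```

Note that the space character (20 hex) is not caught by this test; it is covered by the "unsafe" list instead.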
"Unsafe characters"
     Why: For various reasons, some characters may be misunderstood or altered when they appear in URLs. These characters should always be encoded as well.
Characters:
Character                        Code point (hex)   Code point (dec)
Space                            20                 32
     Why: Significant sequences of spaces may be lost in some uses (especially multiple spaces).
Quotation mark ('"')             22                 34
'Less Than' symbol ("<")         3C                 60
'Greater Than' symbol (">")      3E                 62
     Why: These characters are often used to delimit URLs in plain text.
'Pound' character ("#")          23                 35
     Why: This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins.
Percent character ("%")          25                 37
     Why: This is used to URL encode/escape other characters, so it should itself also be encoded.
Misc. characters:
   Left Curly Brace ("{")        7B                 123
   Right Curly Brace ("}")       7D                 125
   Vertical Bar/Pipe ("|")       7C                 124
   Backslash ("\")               5C                 92
   Caret ("^")                   5E                 94
   Tilde ("~")                   7E                 126
   Left Square Bracket ("[")     5B                 91
   Right Square Bracket ("]")    5D                 93
   Grave Accent ("`")            60                 96
     Why: Some systems can possibly modify these characters.
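The table above can be turned into a simple lookup for spotting characters that should be encoded. This Python sketch is illustrative only: `UNSAFE_CHARS` and `find_unsafe` are hypothetical names, and the set is copied from the table, nothing more. It is meant for text that will become part of a URL (a file name, say), not for a finished URL, where "#" and "%" may legitimately be doing their jobs.

```python
# Unsafe characters from the table above, mapped to their code points.
UNSAFE_CHARS = {
    " ": 0x20, '"': 0x22, "<": 0x3C, ">": 0x3E,
    "#": 0x23, "%": 0x25,
    "{": 0x7B, "}": 0x7D, "|": 0x7C, "\\": 0x5C, "^": 0x5E,
    "~": 0x7E, "[": 0x5B, "]": 0x5D, "`": 0x60,
}

# Sanity check: every stored code point matches the character.
assert all(ord(ch) == code for ch, code in UNSAFE_CHARS.items())


def find_unsafe(text: str) -> list:
    """Return the unsafe characters in `text`, in order of appearance."""
    return [ch for ch in text if ch in UNSAFE_CHARS]


print(find_unsafe("my file#1.html"))  # [' ', '#']
print(find_unsafe("index.html"))      # []
```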


How are characters URL encoded?
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
Example
  • Space = decimal code point 32 in the ISO-Latin set.
  • 32 decimal = 20 in hexadecimal
  • The URL encoded representation will be "%20"
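The worked example translates directly into code. A minimal Python sketch, assuming single ISO-Latin characters (Python's "latin-1" codec is ISO-8859-1); `url_encode_char` is a hypothetical name. Uppercase hex digits are an arbitrary choice, since the encoding is case-insensitive.

```python
def url_encode_char(ch: str) -> str:
    """Percent-encode one ISO-Latin character as "%" plus two hex digits."""
    # Encoding to ISO-8859-1 yields exactly one byte: the ISO-Latin code point.
    # Characters outside ISO-Latin raise UnicodeEncodeError; strings that are
    # not exactly one character raise ValueError during unpacking.
    (code,) = ch.encode("latin-1")
    return "%{:02X}".format(code)


print(url_encode_char(" "))  # %20  (32 decimal = 20 hex)
print(url_encode_char("#"))  # %23
print(url_encode_char("é"))  # %E9
```

Characters outside ISO-Latin are rejected here on purpose, because the description above is defined in terms of ISO-Latin code points.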


Browser Peculiarities
  • Internet Explorer is notoriously relaxed about requiring spaces in URLs to be encoded, which encourages sloppy URL authoring. Keep in mind that Netscape and Opera are much stricter on this point: spaces MUST be encoded for the URL to be correct.

