Base64, URL encoding, HTML entity encoding

Beginner · 5 min read · php-03-011
security

Concept

Encoding functions transform data between representations — they are not encryption, they do not provide security, and they are fully reversible. Understanding what each encoding does (and does not) protect against is essential for writing secure PHP code.
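
To make the "not encryption" point concrete, the sketch below round-trips one value through all three encodings without any key. The `$secret` value is a made-up placeholder, not something from a real system.

```php
<?php
declare(strict_types=1);

// Hypothetical sensitive value, used only for illustration.
$secret = 'p@ss & <word>';

// Each call changes the representation, not the confidentiality.
$asBase64 = base64_encode($secret);
$asUrl    = rawurlencode($secret);   // "p%40ss%20%26%20%3Cword%3E"
$asHtml   = htmlspecialchars($secret, ENT_QUOTES | ENT_HTML5, 'UTF-8'); // "p@ss &amp; &lt;word&gt;"

// Anyone holding the encoded form can reverse it; no key is involved.
$fromBase64 = base64_decode($asBase64, true);
$fromUrl    = rawurldecode($asUrl);
$fromHtml   = htmlspecialchars_decode($asHtml, ENT_QUOTES | ENT_HTML5);
```

If the value must stay confidential, it needs real encryption or hashing; encoding only makes it safe to carry through a particular text context.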

Base64 (base64_encode/base64_decode) converts binary data to printable ASCII using a 64-character alphabet (A-Z, a-z, 0-9, +, /), with = as padding. Output is about 33% larger than the input, because every 3 bytes become 4 characters. Use it to embed binary data in text contexts: email attachments, data URIs, JSON payloads. A base64url_encode helper (not built in, but a common pattern) replaces + with - and / with _, and usually strips the = padding, for URL-safe usage. This matters when passing Base64 in query strings, where a literal + is decoded as a space.
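
Base64 maps every 3 input bytes to 4 output characters, which is where the roughly 33% overhead comes from. The sketch below checks that arithmetic against base64_encode and builds a small data URI. The helper name `base64_length` is hypothetical, and the payload is only the first four bytes of a PNG signature, not a valid image.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: length of padded Base64 output for $bytes input bytes.
function base64_length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3); // 4 * ceil($bytes / 3)
}

// Compare the prediction with what base64_encode actually produces.
foreach ([1, 3, 32, 1000] as $n) {
    $actual = strlen(base64_encode(str_repeat("\x00", $n)));
    printf("%d bytes -> %d chars (predicted %d)\n", $n, $actual, base64_length($n));
}

// Embedding binary data in a text context: a data URI.
$payload = "\x89PNG";
$dataUri = 'data:application/octet-stream;base64,' . base64_encode($payload);
```

For large files the overhead adds up, so Base64 is best kept for cases where the transport genuinely cannot carry raw bytes.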

URL encoding (urlencode/urldecode) converts characters that have special meaning in URLs (&, =, +, %, etc.) to percent-encoded sequences (%XX). urlencode encodes spaces as +, following the application/x-www-form-urlencoded convention, so use it for individual query parameter values. rawurlencode encodes spaces as %20 in line with RFC 3986 and is preferred for path segments.
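
The two encoders differ only in how they treat spaces, but mixing up their decoders can silently change data. A minimal sketch, assuming a value that contains both a literal plus and a space; `example.com` and the file name are placeholders.

```php
<?php
declare(strict_types=1);

$value = 'a+b c';

$form = urlencode($value);    // "a%2Bb+c"   (space becomes +)
$raw  = rawurlencode($value); // "a%2Bb%20c" (space becomes %20)

// Each decoder reverses its own encoder.
$formBack = urldecode($form);
$rawBack  = rawurldecode($raw);

// Mismatch: rawurldecode leaves the form-style "+" as a literal plus sign.
$mixed = rawurldecode($form); // "a+b+c"

// Path segment and query string in one URL, each with its own encoder.
$url = 'https://example.com/files/' . rawurlencode('my report.pdf')
     . '?' . http_build_query(['q' => $value]);
```

Here `$url` comes out as `https://example.com/files/my%20report.pdf?q=a%2Bb+c`: %20 in the path, + in the query.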

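HTML entity encoding (htmlspecialchars/htmlspecialchars_decode) converts <, >, &, " and ' to their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, and &#039; or &apos; depending on the document-type flag). This is the primary defense against XSS when user-supplied data is output into HTML text or quoted attribute values; it is not sufficient on its own inside JavaScript, CSS, or URL contexts. Pass the flags and the encoding explicitly: htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'). Before PHP 8.1 the default flags did not include ENT_QUOTES, so single quotes were left unescaped unless you asked for it. htmlentities converts every character that has a named entity; htmlspecialchars converts only the five dangerous ones, which is usually what you want.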
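
The sketch below contrasts htmlspecialchars with htmlentities and shows why quote escaping matters in an attribute value. The wrapper name `e_html` is hypothetical, and the htmlentities call uses the HTML 4.01 table purely to keep the comparison simple.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the recommended call.
function e_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$text = 'café <b>';

// htmlspecialchars touches only the markup-significant characters.
$special = e_html($text); // "café &lt;b&gt;"

// htmlentities also replaces characters that merely have a named entity.
$entities = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8'); // "caf&eacute; &lt;b&gt;"

// Attribute context: with the quote escaped, the payload cannot close the value.
$name = "x' onmouseover='alert(1)";
$tag  = "<input value='" . e_html($name) . "'>";
```

With UTF-8 output there is rarely a reason to turn é into an entity, which is why htmlspecialchars is the usual choice.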

Code Example

php
<?php
declare(strict_types=1);

// Base64 — embed binary in text
$binary = random_bytes(32);
$encoded = base64_encode($binary);
echo $encoded; // "xK3m7..." — 44 chars for 32 bytes

$decoded = base64_decode($encoded);
assert($decoded === $binary);

// URL-safe base64 (no built-in)
function base64url_encode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}
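function base64url_decode(string $data): string
{
    // Strict mode rejects characters outside the alphabet. base64_decode()
    // returns false on failure, which would violate the string return type.
    $decoded = base64_decode(strtr($data, '-_', '+/'), true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Invalid base64url input');
    }
    return $decoded;
}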

// URL encoding — query parameters
$params = ['q' => 'PHP & Laravel', 'sort' => 'date+time'];
$query = http_build_query($params);
echo $query; // "q=PHP+%26+Laravel&sort=date%2Btime"

echo urlencode('hello world & more'); // "hello+world+%26+more"
echo rawurlencode('hello world & more'); // "hello%20world%20%26%20more"

// Path vs query string encoding
$path = '/search/' . rawurlencode('héllo wörld'); // path segment
$qs   = '?q=' . urlencode('héllo wörld');          // query string

// HTML entity encoding — XSS prevention
$userInput = '<script>alert("xss")</script> and "quotes" & \'apostrophes\'';

// CORRECT: pass ENT_QUOTES and the encoding explicitly
echo htmlspecialchars($userInput, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt; and &quot;quotes&quot; &amp; &apos;apostrophes&apos;"
// (ENT_HTML5 emits &apos; for single quotes; the default ENT_HTML401 emits &#039;)

// RISKY: relying on the default flags. Before PHP 8.1 the default was ENT_COMPAT,
// which leaves single quotes unescaped; since 8.1 the default includes ENT_QUOTES.
echo htmlspecialchars($userInput);

// In Laravel Blade: {{ $var }} calls htmlspecialchars automatically
// {!! $var !!} is UNESCAPED — only use for trusted HTML content