Unicode Character Lookup
Search and explore Unicode characters by name, codepoint, or category. Find emoji, symbols, and special characters with their HTML entities and CSS escape codes.
How to Use Unicode Character Lookup
1. Search by character name (e.g. "snowflake"), codepoint (e.g. U+2744), or paste a character.
2. See the character's name, category, block, and encoding details.
3. Copy the HTML entity, CSS escape, or JavaScript escape code.
4. Browse Unicode blocks and categories.
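The lookup steps above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, not this tool's implementation. The helper names `resolve` and `describe` are hypothetical, and the block name is left out because `unicodedata` does not expose block data.

```python
import re
import unicodedata


def resolve(query: str) -> str:
    """Map a query (U+XXXX codepoint, a pasted character, or a name) to one character.

    Hypothetical helper for illustration; raises ValueError or KeyError on bad input.
    """
    query = query.strip()
    match = re.fullmatch(r"[Uu]\+([0-9A-Fa-f]{4,6})", query)
    if match:
        # chr() raises ValueError for values above U+10FFFF
        return chr(int(match.group(1), 16))
    if len(query) == 1:
        # A single pasted character is used as-is
        return query
    # Otherwise treat the query as a character name, e.g. "snowflake"
    return unicodedata.lookup(query.upper())


def describe(ch: str) -> dict:
    """Collect the details the lookup shows (minus the block name)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        "category": unicodedata.category(ch),
        "utf8": ch.encode("utf-8").hex(" ").upper(),  # bytes.hex(sep) needs Python 3.8+
    }


if __name__ == "__main__":
    for q in ("snowflake", "U+2744", "\u2744"):
        print(q, describe(resolve(q)))
```

All three queries resolve to the same character, so they print the same details. A multi-codepoint paste (such as an emoji ZWJ sequence) falls through to the name lookup and raises `KeyError` in this sketch.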
Frequently Asked Questions
What is Unicode?
Unicode is a universal character encoding standard that assigns a unique number (a codepoint) to every character in the world's writing systems. Unicode 15.1 defines 149,813 characters across 161 scripts, including Latin, Arabic, Han (Chinese), Hiragana and Katakana (Japanese), Hangul (Korean), Devanagari, and many historic scripts, plus emoji, mathematical symbols, and other symbols. Codepoints are written as U+ followed by four to six hexadecimal digits (e.g., U+0041 = A, U+1F600 = 😀). UTF-8, UTF-16, and UTF-32 are encoding forms that store those codepoints as sequences of bytes.
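In Python, `ord()` and `chr()` convert between a character and its codepoint, so U+XXXX notation is just uppercase hexadecimal padded to at least four digits. A minimal sketch; the function names are hypothetical.

```python
def to_u_notation(ch: str) -> str:
    """Format a single character as U+ followed by at least four uppercase hex digits."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Parse U+XXXX (four to six hex digits) back into the character."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX, got {text!r}")
    return chr(int(text[2:], 16))


if __name__ == "__main__":
    for ch in ("A", "\U0001F600"):
        # The same codepoint in U+ notation and in decimal
        print(to_u_notation(ch), ord(ch))
```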
What is the difference between Unicode and UTF-8?
Unicode is the abstract standard: it assigns numbers to characters. UTF-8 is a concrete encoding: it converts those numbers to bytes. In UTF-8, ASCII characters (U+0000 to U+007F) use 1 byte; U+0080 to U+07FF use 2 bytes; U+0800 to U+FFFF use 3 bytes; and U+10000 to U+10FFFF use 4 bytes. UTF-8 is backward-compatible with ASCII and is the dominant encoding on the web (used by roughly 98% of websites). UTF-16 uses 2 or 4 bytes per codepoint (one 16-bit code unit, or a surrogate pair of two). JavaScript and Java strings are sequences of UTF-16 code units, which is why "😀".length is 2 in JavaScript.
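The byte ranges above translate directly into a small function. This Python sketch (hypothetical helper names) checks the range table against Python's built-in UTF-8 and UTF-16 encoders.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 needs for a codepoint, following the ranges above.

    Note: surrogates U+D800-U+DFFF are not valid in UTF-8; this sketch does not reject them.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode codepoint: {cp:#x}")
    if cp <= 0x7F:
        return 1  # ASCII
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4  # supplementary planes


def utf16_code_units(cp: int) -> int:
    """1 code unit (2 bytes) in the BMP, otherwise a surrogate pair (4 bytes)."""
    return 1 if cp <= 0xFFFF else 2


if __name__ == "__main__":
    for ch in ("A", "\u00E9", "\u20AC", "\U0001F600"):
        cp = ord(ch)
        # Cross-check the range table against Python's own encoders
        assert utf8_length(cp) == len(ch.encode("utf-8"))
        assert utf16_code_units(cp) * 2 == len(ch.encode("utf-16-le"))
        print(f"U+{cp:04X}", utf8_length(cp), "UTF-8 bytes,", utf16_code_units(cp), "UTF-16 units")
```

The `utf-16-le` codec is used instead of `utf-16` so the byte count does not include a byte order mark.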
What is a Unicode codepoint and how do I escape it?
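A codepoint is the unique number assigned to each character: U+0041 = decimal 65 = the letter A. Common escape forms:
- HTML: &#65; (decimal) or &#x41; (hex). Named entities exist only for some characters, e.g. &amp; for & (U+0026).
- JavaScript: \u0041 (BMP only) or \u{1F600} (any codepoint, ES2015+).
- CSS: \41 or \000041. A short escape absorbs one following space, so add a space after \41 (or use the six-digit form) when the next character is a hex digit.
- Python: \u0041 (exactly four hex digits) or \U0001F600 (exactly eight hex digits).
- JSON: \u0041. Codepoints above U+FFFF must be written as a UTF-16 surrogate pair, e.g. \uD83D\uDE00 for U+1F600.
- URL: %41. Percent-encoding applies to the UTF-8 bytes, so U+1F600 becomes %F0%9F%98%80.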
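Each escape form listed above is string formatting over the codepoint. This Python sketch generates them; `escapes` is a hypothetical helper, not this tool's code. The URL form deliberately percent-encodes every UTF-8 byte to match the %41 example, although URL encoders normally leave unreserved characters such as A unencoded.

```python
def escapes(ch: str) -> dict[str, str]:
    """Build common escape forms for a single character (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one codepoint")
    cp = ord(ch)
    if cp > 0xFFFF:
        # Split a supplementary codepoint into a UTF-16 surrogate pair for JSON
        offset = cp - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        json_escape = f"\\u{high:04X}\\u{low:04X}"
        js_escape = f"\\u{{{cp:X}}}"  # ES2015+ code point escape
        py_escape = f"\\U{cp:08X}"    # \U always takes exactly 8 hex digits
    else:
        json_escape = js_escape = py_escape = f"\\u{cp:04X}"
    return {
        "html_dec": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        "css": f"\\{cp:X}",  # needs a trailing space if a hex digit follows
        "js": js_escape,
        "python": py_escape,
        "json": json_escape,
        # Percent-encode every UTF-8 byte
        "url": "".join(f"%{b:02X}" for b in ch.encode("utf-8")),
    }


if __name__ == "__main__":
    for ch in ("A", "\U0001F600"):
        print(escapes(ch))
```

The surrogate-pair arithmetic is the standard UTF-16 split: subtract 0x10000, then put the high 10 bits in a D800-based unit and the low 10 bits in a DC00-based unit.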
What is a Unicode block?
Unicode is divided into 328 blocks (Unicode 15.1), each a contiguous range of codepoints reserved for a related group of characters. Examples: Basic Latin (U+0000-U+007F), Latin-1 Supplement (U+0080-U+00FF), Greek and Coptic (U+0370-U+03FF), Cyrillic (U+0400-U+04FF), CJK Unified Ideographs (U+4E00-U+9FFF, 20,992 characters), and Emoticons (U+1F600-U+1F64F), one of several blocks that contain emoji. The Basic Multilingual Plane (BMP) covers U+0000 to U+FFFF; it is plane 0 of the 17 planes of 65,536 codepoints each.
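Python's standard library has no block lookup, so a block finder needs its own range table. This sketch hard-codes only the blocks named above (the full list lives in Blocks.txt in the Unicode Character Database) and uses binary search over the sorted start codepoints. The table and function names are illustrative assumptions.

```python
from bisect import bisect_right
from typing import Optional

# Partial table: only the blocks named in the text, sorted by start codepoint.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [start for start, _, _ in BLOCKS]


def block_of(cp: int) -> Optional[str]:
    """Return the block containing cp, or None if it is not in this partial table."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


def plane_of(cp: int) -> int:
    """Each plane holds 65,536 (0x10000) codepoints; plane 0 is the BMP."""
    return cp >> 16


if __name__ == "__main__":
    for cp in (0x41, 0x4E2D, 0x1F600, 0x0200):
        print(f"U+{cp:04X}", block_of(cp), "plane", plane_of(cp))
```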
What are Unicode categories?
Unicode assigns each character a two-letter General Category, grouped into seven major classes:
- L (Letter): Lu=uppercase, Ll=lowercase, Lt=titlecase, Lm=modifier, Lo=other.
- M (Mark): Mn=nonspacing, Mc=spacing combining, Me=enclosing.
- N (Number): Nd=decimal digit, Nl=letter number, No=other.
- P (Punctuation): Pc=connector, Pd=dash, Ps=open, Pe=close, Pi=initial quote, Pf=final quote, Po=other.
- S (Symbol): Sm=math, Sc=currency, Sk=modifier, So=other.
- Z (Separator): Zs=space, Zl=line, Zp=paragraph.
- C (Other): Cc=control, Cf=format, Cs=surrogate, Co=private use, Cn=unassigned.
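Python exposes the General Category through `unicodedata.category()`, and the first letter gives the major class. A minimal sketch with a hypothetical helper name; results depend on the Unicode version bundled with your Python (`unicodedata.unidata_version`), so a character unassigned in an older version reports Cn.

```python
import unicodedata

MAJOR_CLASSES = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def general_category(ch: str) -> tuple[str, str]:
    """Return the two-letter General Category and its major class name."""
    gc = unicodedata.category(ch)
    return gc, MAJOR_CLASSES[gc[0]]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    samples = [
        "A", "a", "\u01C5",   # Lu, Ll, Lt
        "\u0301",             # combining acute accent (Mn)
        "5", "\u216B",        # Nd, Nl (Roman numeral twelve)
        "_", "(", "-",        # Pc, Ps, Pd
        "+", "\u20AC", "\u2744",  # Sm, Sc, So
        " ", "\u2028",        # Zs, Zl
        "\n", "\u200B", "\uE000", "\u0378",  # Cc, Cf, Co, Cn
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}", *general_category(ch))
```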