Remove Text Diacritics

Remove accent marks and diacritical symbols from text.

What It Does

The Remove Text Diacritics tool strips all diacritical marks — accents, umlauts, tildes, cedillas, and other combining characters — from letters, converting them to their plain ASCII equivalents. Characters like é become e, ñ becomes n, ü becomes u, ç becomes c, and ã becomes a, while the rest of your text remains completely intact. This process, often called diacritic removal or accent stripping, is essential whenever you need to work with text in systems that only support the basic ASCII character set.

Whether you're building a search index, generating URL slugs, normalizing a database of international names, or exporting data to a legacy system that chokes on extended Unicode characters, this tool handles the conversion instantly. It supports the full range of Unicode combining characters, so it works correctly for Latin-based languages including French, Spanish, German, Portuguese, Italian, Polish, Czech, Romanian, Vietnamese, and dozens more.

The tool is particularly valuable for developers and data engineers who need to pre-process multilingual text before feeding it into downstream systems, as well as content managers who need clean, ASCII-safe versions of titles and headings for use in file names, identifiers, or database keys. Unlike a simple find-and-replace approach, it uses Unicode normalization under the hood to identify and remove all combining diacritical marks in a single pass, ensuring no accented character slips through.

How It Works

The Remove Text Diacritics tool applies Unicode Canonical Decomposition (NFD) to your input, which separates each precomposed character such as é into its base letter and a separate combining accent mark. It then filters out the combining marks, leaving only the base letters.

Because this is a fixed, deterministic transformation, the same input always produces the same output, so results are stable and easy to verify.

All processing happens in your browser, so your input stays on your device during the transformation.
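A minimal sketch of this approach in JavaScript — not the tool's actual source, just the standard NFD-plus-filter technique it describes:

```javascript
// Decompose precomposed characters with NFD, then delete the
// combining marks (Unicode block U+0300–U+036F).
function removeDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeDiacritics("café naïve jalapeño")); // "cafe naive jalapeno"
```

String.prototype.normalize is built into every modern browser and Node.js, which is what makes a fully client-side implementation possible.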

Common Use Cases

  • Generating clean URL slugs from article titles or product names that contain accented characters, ensuring links are readable and compatible with all web servers.
  • Normalizing customer name databases that contain entries from multiple countries, so search and matching logic works consistently regardless of how a name was originally entered.
  • Preparing text for export to legacy database systems, CSV files, or APIs that only accept standard ASCII and reject or corrupt extended Unicode characters.
  • Building search indexes where users should be able to find 'café' by typing 'cafe', or 'naïve' by typing 'naive', without needing to enter the exact accented form.
  • Creating file names from user-generated content, since most operating systems and file systems handle accented characters inconsistently across platforms.
  • Cleaning up OCR-scanned documents where accented characters may have been misread or inconsistently encoded before further text processing.
  • Standardizing transliterated names and addresses from international forms for use in downstream systems that require consistent ASCII-only formatting.

How to Use

  1. Paste or type any text containing accented or diacritically marked characters into the input field — you can paste entire paragraphs, lists of names, or any block of multilingual text.
  2. The tool immediately processes the input and displays the normalized result in the output field, with all diacritical marks removed and base letters preserved in their original positions.
  3. Review the output to confirm the conversion looks correct — every accented character should now appear as its plain ASCII counterpart while punctuation, spacing, and non-accented characters remain unchanged.
  4. Click the copy button to copy the cleaned text to your clipboard, ready to paste directly into your database, code editor, URL field, or any other destination.
  5. If you need to process a large volume of text, you can paste the entire block at once — the tool handles multi-line input and long documents, though very large inputs may take a moment to process.

Features

  • Full Unicode diacritic coverage — correctly removes combining marks across all Latin-based scripts, including acute, grave, circumflex, tilde, umlaut, cedilla, breve, caron, and ring accents.
  • Non-destructive base character preservation — only the diacritical marks are removed; the underlying letter and all surrounding text remain exactly as they were.
  • Instant real-time processing — the conversion happens as you type or paste, with no submit button required and no delay even for lengthy text blocks.
  • Handles multi-language input in a single pass — you can mix French, Spanish, German, Portuguese, and Czech text in one input and all diacritics will be correctly stripped.
  • Preserves all non-Latin characters and symbols — numbers, punctuation, spaces, and characters from non-Latin scripts are left completely untouched.
  • One-click copy to clipboard — copy the normalized output instantly without needing to manually select the text.
  • Works entirely in the browser — no text is sent to a server, keeping your data private and the tool fast regardless of your internet connection.

Examples

Below is a representative input and output so you can see the transformation clearly.

Input
café naïve jalapeño
Output
cafe naive jalapeno

Edge Cases

  • Very large inputs may take a few seconds to process in the browser. If performance slows, split the input into smaller batches.
  • Mixed formatting (tabs, line breaks, or inconsistent delimiters) can affect output. Normalize spacing first if needed.
  • Remove Text Diacritics is deterministic: the same input always yields the same output. If the output looks unexpected, check whether the input contains visually similar standalone characters that are not Latin letters with combining marks.

Troubleshooting

  • Output looks unchanged: confirm the input actually contains accented Latin characters; text that is already plain ASCII passes through untouched.
  • Output differs from a previous run: confirm the inputs are byte-for-byte identical; visually identical strings can be encoded differently, while the transformation itself is deterministic.
  • Unexpected characters: check for hidden whitespace or encoding issues in the input and try normalizing first.
  • Slow processing: reduce input size or try a modern browser with more available memory.

Tips

When using diacritic removal for URL slug generation, combine this tool with a lowercase converter and a spaces-to-hyphens replacement to produce clean, SEO-friendly slugs in one workflow. For search normalization, apply diacritic removal to both the stored text and the search query at query time — this way users can search without worrying about typing the correct accent. Be aware that diacritic removal is a lossy operation: once marks are removed, the original form cannot be recovered, so always keep a copy of the original text if you may need it later. For languages like Vietnamese where diacritics carry tonal meaning, stripping marks changes the semantic meaning of words — only remove diacritics in contexts where that loss is acceptable, such as slug generation, rather than in contexts where meaning matters.
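The slug workflow described above can be sketched in a few lines of JavaScript; the `slugify` name and the exact replacement rules are illustrative assumptions, not part of this tool:

```javascript
// Combine diacritic removal, lowercasing, and space-to-hyphen
// replacement into one slug-generation pass.
function slugify(title) {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse non-alphanumeric runs to hyphens
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

console.log(slugify("Café au Lait: A Beginner's Guide")); // "cafe-au-lait-a-beginner-s-guide"
```

Keeping the original title alongside the generated slug preserves the accented form for display, since slug generation is lossy.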

Diacritical marks are small visual symbols added to letters to indicate a change in pronunciation, tone, or meaning. Found across hundreds of languages that use the Latin alphabet, they include the acute accent (é), grave accent (è), circumflex (ê), umlaut or diaeresis (ü), tilde (ñ), cedilla (ç), caron (š), breve (ă), ring (å), and many others. In written language, these marks are essential — removing them changes spelling and sometimes meaning. But in the world of software systems, data processing, and the internet, diacritical marks frequently cause problems that make stripping them a practical necessity.

The root of the issue is ASCII, the American Standard Code for Information Interchange, which was defined in 1963 and encodes only 128 characters: the basic Latin alphabet, digits, punctuation, and control characters. No accented letters were included. For decades, ASCII was the lingua franca of computing, and enormous amounts of infrastructure — from database engines to URL parsing rules to file systems — were built around the assumption that text would be ASCII. Even today, with Unicode now the universal standard, countless legacy systems, APIs, and tools either break or silently corrupt text when they encounter characters outside the ASCII range. This is why diacritic removal remains a relevant and widely used text normalization technique: rather than fighting with a legacy system over encoding, you simply pre-process your text to strip the marks before they reach the system.

The most common application is URL slug generation: web URLs technically support Unicode through percent-encoding, but a URL like `/article/caf%C3%A9-au-lait` is far less readable and shareable than `/article/cafe-au-lait`. Most CMS platforms and blogging tools automatically strip diacritics when generating slugs for exactly this reason. Search normalization is another major use case. In information retrieval, it's generally desirable for a search for "resume" to match documents containing "résumé", and for "uber" to match "Über". Search engines accomplish this through a process called Unicode folding or accent folding, applied at both index time and query time — essentially automated diacritic removal applied symmetrically to the indexed content and the search terms.

Diacritic removal is closely related to, but distinct from, transliteration. Transliteration converts characters from one script to another — for example, converting Cyrillic or Arabic text into Latin characters. Diacritic removal only operates within the Latin script, removing combining marks from letters that already have a Latin base. For purely Latin-script languages like French, Spanish, or German, diacritic removal is sufficient for ASCII normalization. For non-Latin scripts, transliteration is needed first.

Under the hood, proper diacritic removal works by first applying Unicode Canonical Decomposition (NFD normalization), which separates each precomposed character like é into its base character (e) and its combining accent mark (the acute accent as a separate code point). Once decomposed, the combining marks — which fall in the Unicode range U+0300 to U+036F — can be filtered out, leaving only the base characters. This is far more reliable than maintaining a manual lookup table of accented characters, because it handles the full scope of Unicode's combining diacritical marks automatically, including rare accents that a hand-crafted table might miss.

It's worth noting when NOT to remove diacritics: in any context where linguistic accuracy matters, such as publishing, education, or displaying names correctly, diacritics should be preserved. Removing the accent from a person's name like "José" or a place name like "Zürich" when displaying it to users is considered disrespectful and incorrect. The right approach is to store and display the original form, and only apply diacritic removal internally for technical purposes like indexing or slug generation.
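The symmetric accent folding described above can be sketched like this; the toy in-memory document list and the `fold`/`search` names are illustrative assumptions:

```javascript
// Fold both the indexed documents and the query, so "resume"
// matches "résumé" and "uber" matches "Über".
const fold = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();

const documents = ["My Résumé", "Über uns", "Plain text"];
const folded = documents.map(fold); // built once, at "index time"

function search(query) {
  const q = fold(query); // the same folding, applied at "query time"
  return documents.filter((_, i) => folded[i].includes(q));
}

console.log(search("resume")); // ["My Résumé"]
console.log(search("uber"));   // ["Über uns"]
```

Note that the original, accented documents are returned to the user; the folded copies exist only for matching.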

Frequently Asked Questions

What are diacritical marks, and which ones does this tool remove?

Diacritical marks are symbols added to letters to modify their pronunciation or meaning. Common examples include the acute accent (é), grave accent (è), circumflex (ê), umlaut (ü), tilde (ñ), cedilla (ç), caron (š), and ring above (å). This tool removes all Unicode combining diacritical marks in the range U+0300–U+036F, which covers the full set of accents used in Latin-script languages including French, Spanish, German, Portuguese, Italian, Polish, Czech, Romanian, and many more.

Why would I need to remove diacritics from text?

The most common reasons are technical compatibility and normalization. Many legacy systems, databases, APIs, and file systems were built around ASCII and either reject or silently corrupt accented characters. Removing diacritics before passing text to these systems prevents encoding errors. Other common use cases include generating clean URL slugs, building search indexes that should match regardless of whether users type accented characters, and normalizing names in databases to improve matching and deduplication.

Does removing diacritics change the meaning of words?

In some languages, yes — diacritics can distinguish between different words or indicate tonal differences, so removing them creates an ambiguous or altered form. For example, in Spanish, 'si' (if) and 'sí' (yes) differ only in the accent. In Vietnamese, diacritics are critical to conveying tone and meaning, so accent stripping significantly changes words. For this reason, diacritic removal should only be used for technical purposes like slug generation or search indexing — never for displaying text to users where linguistic accuracy matters.

How is diacritic removal different from transliteration?

Diacritic removal only works on characters that already use a Latin base letter, stripping away the combining accent marks. Transliteration converts characters from one writing system to another entirely — for example, converting the Russian word 'Москва' into the Latin-script form 'Moskva'. If your text is already in a Latin-script language like French or Spanish, diacritic removal is all you need for ASCII normalization. If your text uses Cyrillic, Arabic, Greek, or other non-Latin scripts, you need transliteration, not just diacritic removal.

Will this tool affect characters that aren't accented, like numbers or punctuation?

No — the tool only removes Unicode combining diacritical marks. Plain ASCII characters, numbers, punctuation, spaces, and line breaks are passed through completely unchanged. Characters from non-Latin scripts (such as Chinese, Arabic, or Cyrillic) are also left intact, since they don't have Latin base letters for the tool to normalize. Only characters that are composed of a Latin base letter plus a combining diacritic will be affected.
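A quick illustrative check, using the same NFD-based technique and hypothetical sample strings:

```javascript
const strip = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

// Punctuation, digits, currency symbols, and non-Latin scripts pass through;
// only the Latin combining marks are removed.
console.log(strip("Prix: 12,50 € (François & Co.)")); // "Prix: 12,50 € (Francois & Co.)"
console.log(strip("北京 / Москва"));                   // "北京 / Москва" (unchanged)
```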

Is diacritic removal the same as Unicode normalization?

They are related but not the same thing. Unicode normalization refers to a family of standardization processes (NFC, NFD, NFKC, NFKD) that ensure the same character is always represented in the same way in memory — since some accented characters can be encoded either as a single precomposed code point or as a base letter plus a combining mark. Diacritic removal uses NFD decomposition as a step in the process, but then goes further by actually deleting the combining marks rather than just standardizing how they're stored. The end result is plain ASCII base letters, which is more aggressive than normalization alone.
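The distinction can be seen directly at the code-point level; a small sketch in JavaScript:

```javascript
const precomposed = "\u00e9"; // "é" as a single code point (NFC form)
const decomposed = "e\u0301"; // "é" as base letter + combining acute (NFD form)

// Normalization standardizes the representation...
console.log(precomposed === decomposed);                  // false: different code points
console.log(precomposed.normalize("NFD") === decomposed); // true: same after normalization

// ...while diacritic removal goes further and deletes the mark entirely.
console.log(precomposed.normalize("NFD").replace(/[\u0300-\u036f]/g, "")); // "e"
```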