Zero-Width Characters Explained (ZWS, ZWNJ, ZWJ, BOM)

Understanding zero width characters: why they exist and why they break your text

Zero width characters live in a strange corner of Unicode. They occupy no visible space, survive copy and paste unnoticed and yet profoundly alter how text behaves across platforms. When a line refuses to break, when a hashtag becomes unclickable, when a URL silently fails or when an AI generated paragraph feels off without any obvious reason, a zero width character is often the culprit. Although they were originally created for legitimate linguistic reasons, these characters now frequently appear in modern workflows that involve copying, pasting and generating text, especially when content passes through Slack, Google Docs, messaging apps or large language models. Understanding them is essential to controlling text predictability, formatting consistency and cross platform compatibility.

ZWS, ZWNJ, ZWJ and BOM: the core zero width set

Four characters dominate the landscape of invisible formatting anomalies: ZWS (zero width space, U+200B), ZWNJ (zero width non joiner, U+200C), ZWJ (zero width joiner, U+200D) and BOM (byte order mark, U+FEFF). Each behaves differently. Each can quietly shift the way text renders. Each also appears more often than most writers expect, especially when text has touched AI systems, collaborative editors or export pipelines.
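To make the four characters concrete, here is a minimal Python sketch that scans a string and reports where each one occurs. The `ZERO_WIDTH` table and the `find_zero_width` helper are illustrative names, not part of any tool mentioned in this article.

```python
import unicodedata

# The four characters discussed above, keyed by code point.
ZERO_WIDTH = {
    "\u200b": "ZWS",
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    "\ufeff": "BOM",
}


def find_zero_width(text):
    """Return (index, short label, official Unicode name) for each hit."""
    return [
        (i, ZERO_WIDTH[ch], unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


sample = "zero\u200bwidth"  # looks like "zerowidth" on screen
print(find_zero_width(sample))  # [(4, 'ZWS', 'ZERO WIDTH SPACE')]
```

Note that the official Unicode name of U+FEFF is ZERO WIDTH NO-BREAK SPACE, which is why the BOM belongs to the same family.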

The zero width space (ZWS)

ZWS is a ghost separator. It marks a place where a line may break but takes up no width. In scripts that are written without spaces between words, such as Thai or Khmer, ZWS is a legitimate tool for marking word boundaries. In English content, however, it becomes a silent saboteur. A ZWS inside a hashtag produces a broken hashtag. A ZWS inside a URL makes the link unusable. A ZWS in an SEO title can create indexing anomalies or snippet truncation. When copied from Slack or Google Docs, ZWS often embeds itself next to emojis or at formatting transitions. It is also common in AI generated paragraphs because large language models can emit hidden Unicode characters when they imitate sophisticated punctuation or typography.

The zero width non joiner (ZWNJ)

ZWNJ was designed for scripts where characters naturally join, such as Arabic or Persian. Its role is to interrupt the join without creating a visible space. In English text, ZWNJ can trigger strange layout behaviour, including words that fail to wrap, hyphens that behave unpredictably or pasted lines that refuse to break in Markdown or HTML. ZWNJ does not belong in most Western writing pipelines, yet it frequently appears when text flows through PDF exports, OCR systems or non Latin keyboard integrations. Once it is inserted, it can travel from platform to platform without detection.
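As an illustration of the legitimate use, the Persian word for "I want" is conventionally written with a ZWNJ between its prefix and stem. The short sketch below builds the word from explicit code points so the invisible character is visible in the source; it shows why blindly deleting every ZWNJ would change correctly written Persian.

```python
ZWNJ = "\u200c"

prefix = "\u0645\u06cc"                    # Persian prefix "mi"
stem = "\u062e\u0648\u0627\u0647\u0645"    # stem "khaham"

with_zwnj = prefix + ZWNJ + stem   # conventional spelling
without_zwnj = prefix + stem       # letters join across the boundary

print(with_zwnj == without_zwnj)           # False
print(len(with_zwnj), len(without_zwnj))   # 8 7
```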

The zero width joiner (ZWJ)

ZWJ is the inverse of ZWNJ. Instead of breaking a join, it forces one. ZWJ is now widely used in emoji composition. Sequences such as family emojis, couples or profession emojis are produced by linking individual emojis with ZWJ. Skin tone variants, by contrast, are built with separate modifier characters rather than ZWJ, although a ZWJ sequence may also contain them. Copying a ZWJ sequence from WhatsApp, Messenger, Teams or Slack carries the joiners along in the clipboard. When those ZWJ characters drift into standard text, layout inconsistencies emerge. A paragraph may reflow differently on iPhone than on desktop Chrome. A title may shift position by a few pixels. A bullet list may collapse unexpectedly. The character is invisible but the impact is visible on every platform that interprets Unicode differently.
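A small Python sketch shows what sits behind a composite emoji. It builds the man, woman, girl family sequence from its code points; whether it renders as a single glyph depends on the font and platform.

```python
ZWJ = "\u200d"

# Man + ZWJ + woman + ZWJ + girl: one family glyph where supported.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

print(len(family))        # 5 code points behind one visible glyph
print(family.count(ZWJ))  # 2 invisible joiners
print([f"U+{ord(ch):04X}" for ch in family])
# ['U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467']
```

If a paste operation keeps the joiners but loses one of the emojis, the leftover U+200D has nothing to join and becomes exactly the kind of stray character described above.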

The byte order mark (BOM)

BOM is a relic of Unicode encoding history. Its original purpose is to signal byte order (endianness) at the start of a UTF-16 or UTF-32 stream. In UTF-8, where byte order is not an issue, it survives only as an optional encoding signature. Today it appears mainly when exporting text from code editors, PDFs or certain mobile apps. A BOM at the beginning of a file can change how browsers interpret encoding, which leads to mojibake, meta tag misreading or JSON that strict parsers reject. In content workflows, a BOM usually appears when copying fragments from older systems or when AI models generate output with subtle encoding artefacts.
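The JSON failure is easy to reproduce. The sketch below uses only the Python standard library: it prepends a UTF-8 BOM to a small JSON payload, shows that the parser rejects the decoded text, and then decodes with the `utf-8-sig` codec, which drops a leading BOM if one is present. The payload itself is made up for the example.

```python
import codecs
import json

payload = '{"title": "Hello"}'
raw = codecs.BOM_UTF8 + payload.encode("utf-8")  # file saved "with BOM"

# Plain UTF-8 decoding keeps the BOM as a leading U+FEFF character.
text = raw.decode("utf-8")
print(repr(text[0]))  # '\ufeff'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("parse failed:", exc)

# "utf-8-sig" strips a leading BOM and is harmless when none is present.
clean = raw.decode("utf-8-sig")
print(json.loads(clean))  # {'title': 'Hello'}
```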

How zero width characters enter modern content workflows

The most surprising aspect of zero width characters is not their existence but their ubiquity. None of them were designed for LinkedIn posts, tweets, SEO metadata or blog content. They appear everywhere because modern text pipelines are chaotic by nature and move content through AI systems, apps, editors and export layers. Each layer can inject behaviour that was never intended for English language publishing.

Slack as a source of Unicode ambiguity

Slack uses a hybrid markup engine that blends Markdown, emoji processing and proprietary formatting rules. When copying a message that contains emojis, italic markers or user mentions, Slack may insert ZWJ or ZWS to preserve visual grouping. These characters survive the copy paste operation. When the text is then pasted into LinkedIn, WordPress or Notion, invisible alignment bugs appear. Line breaks refuse to collapse, hashtags misbehave and emojis appear stuck to the preceding word. Because the characters are invisible, most users never suspect Slack as the source of the corruption.

Google Docs as a hidden Unicode factory

Google Docs is one of the largest unintentional producers of zero width characters. Its collaboration system adds formatting metadata inside the text stream and part of that metadata uses ZWS or ZWNJ. When content is exported to web content management systems, Docs introduces joining or non joining marks that alter line wrapping in subtle ways. When the text is cleaned, it suddenly behaves as expected. When it remains uncleaned, behaviour diverges across browsers. This is why blog content pasted from Docs can look correct but render incorrectly.

PDFs and OCR pipelines

Zero width characters are common byproducts of PDF extraction or OCR. Converters attempt to reconstruct original spacing and sometimes insert ZWS to approximate word boundaries. When pasted into HTML, these characters create layout instability. They also produce inconsistent tokenisation when processed by AI models, which means AI rewrites based on PDF content can propagate hidden characters indefinitely.

Messaging apps and emoji variants

Apps such as WhatsApp, iMessage and Messenger rely heavily on ZWJ sequences for emoji customisation. Copying a message that contains composite emojis drags those ZWJ links into the clipboard. If the next destination interprets ZWJ differently, the emoji may split into components or the layout may shift. In some environments, the emoji continues to display correctly while the invisible ZWJ remains embedded in surrounding text and quietly alters formatting rules.

Why zero width characters affect AI generated text

Large language models operate on token prediction. Some tokens map directly to invisible Unicode characters. Others represent sequences that, once decoded, produce characters such as ZWS or ZWJ. When a model attempts to mimic human writing patterns, stylistic spacing or emoji composition, it may choose sequences that incidentally include zero width characters. The model does not understand layout. It predicts tokens that statistically match the patterns it has seen. This is why AI generated paragraphs sometimes contain spacing anomalies or stubborn formatting bugs that persist even after several rewrites.

The ghost token phenomenon

Some AI models occasionally produce zero width characters as a side effect of token boundary rules. These characters behave like ghosts inside the text stream and cause unexpected wrapping in CMS editors, SEO fields and mobile layouts. They are invisible to the eye but not to the rendering engine. InvisibleFix detects and removes these ghosts before the content reaches production.

Emoji chains and ZWJ leakage

When AI generates text that includes emojis or lists of emoji variants, it may output ZWJ sequences unintentionally. Even a single invisible joiner embedded inside a paragraph can produce layout inconsistencies. Some mobile browsers interpret ZWJ as a hint that modifies preceding glyphs, which leads to spacing or kerning variations. Cleaning these sequences stabilises rendering.

How zero width characters damage real world publishing

The silent nature of zero width characters makes them particularly dangerous in production workflows. They create broken links, unstable layouts, inconsistent SEO previews and platform dependent rendering. Because they do not appear visually, most creators assume the issue lies inside the CMS, the browser or the platform. In reality, the culprit is often a single invisible character.

Broken hashtags and dead URLs

A ZWS inside a hashtag cuts the tag in half. A ZWNJ inside a URL makes the string unrecognisable to parsers. When content originates from AI tools or Slack, these anomalies are common. Cleaning restores structural integrity.
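A rough Python sketch makes both failures visible. The `HASHTAG` pattern is a simplified stand-in for what a platform might use, not any platform's actual rule, and the tag and URL are invented for the example.

```python
import re

# Simplified hashtag rule: "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")

clean_tag = "#OpenSource"
broken_tag = "#Open\u200bSource"  # ZWS hidden after "Open"

print(HASHTAG.findall(clean_tag))   # ['#OpenSource']
print(HASHTAG.findall(broken_tag))  # ['#Open']

clean_url = "https://example.com/docs"
broken_url = "https://exam\u200cple.com/docs"  # ZWNJ hidden in the host

print(broken_url == clean_url)          # False
print("example.com" in broken_url)      # False
```

Both broken strings look identical to the clean ones on screen, yet the tag is truncated and the host name no longer matches.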

Unstable line wrapping on mobile

A paragraph that contains ZWNJ or ZWJ may wrap differently in Chrome, Safari and mobile browsers. This creates unpredictable UX on social platforms where character counts and line breaks matter. Removing zero width characters ensures consistent layout across devices.

SEO metadata corruption

Meta titles and descriptions that contain zero width characters may appear correct in WordPress but break inside search snippet generators. Invisible characters change pixel width calculations and can cause truncated or malformed search results. Cleaning metadata is essential for stable SEO previews.
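How a given snippet generator measures width is not something this sketch can reproduce, but a plain character count already shows the mismatch between what an author sees and what a tool receives. The title below is made up for the example.

```python
visible = "Zero Width Guide"
title = "Zero\u200b Width\u200d Guide"  # same text on screen, two hidden characters

print(len(visible))                 # 16 characters the author can see
print(len(title))                   # 18 characters the tool receives
print(len(title.encode("utf-8")))   # 22 bytes, since each hidden character is 3 bytes in UTF-8
```

Any length budget enforced on the raw string, whether in characters or bytes, is therefore spent partly on content nobody can see.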

How InvisibleFix removes zero width characters reliably

InvisibleFix detects zero width characters with a sanitisation engine built specifically for mixed Unicode streams. Instead of relying on simple regular expressions, the engine analyses actual byte sequences to avoid false positives or accidental removal of legitimate script features. This makes cleaning safe for multilingual workflows. When sanitisation is applied, ZWS, ZWNJ, ZWJ and BOM are removed without changing visible text, which produces a normalised, platform stable output ready for LinkedIn, WordPress, Slack and SEO pipelines.
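InvisibleFix's engine is not shown here. Purely as an illustration of the idea of cleaning without damaging legitimate sequences, the following Python sketch strips ZWS and BOM everywhere and removes ZWNJ and ZWJ only when they are not sandwiched between two non-ASCII characters. That neighbour test is a deliberately crude assumption of this sketch, and `strip_zero_width` is a hypothetical helper name.

```python
ZWS, ZWNJ, ZWJ, BOM = "\u200b", "\u200c", "\u200d", "\ufeff"

# ZWS and BOM are dropped everywhere in this sketch.
ALWAYS_STRIP = {ord(ZWS): None, ord(BOM): None}
JOINERS = {ZWNJ, ZWJ}


def _is_non_ascii_glyph(ch: str) -> bool:
    """True for a non-ASCII character that is not itself a joiner."""
    return ch != "" and ord(ch) > 127 and ch not in JOINERS


def strip_zero_width(text: str) -> str:
    text = text.translate(ALWAYS_STRIP)
    kept = []
    for i, ch in enumerate(text):
        if ch in JOINERS:
            before = text[i - 1] if i > 0 else ""
            after = text[i + 1] if i + 1 < len(text) else ""
            # Keep a joiner only between two non-ASCII glyphs, a crude
            # stand-in for "inside an emoji sequence or a joining script".
            if not (_is_non_ascii_glyph(before) and _is_non_ascii_glyph(after)):
                continue
        kept.append(ch)
    return "".join(kept)


print(strip_zero_width("\ufeff#Open\u200bSource"))  # #OpenSource
print(strip_zero_width("exam\u200cple.com"))        # example.com
```

A family emoji or a Persian word that relies on ZWNJ passes through unchanged, while joiners stranded in English text are removed. A production cleaner would need to consult real Unicode script and emoji properties rather than an ASCII test.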

Zero width characters are invisible but their impact never is

Zero width characters represent an invisible layer of complexity inside modern text workflows. They originate from legitimate linguistic needs but now permeate AI tools, messaging apps and collaborative editors. They break formatting, disrupt SEO, interfere with links and create subtle rendering differences across browsers. Cleaning them is not optional. It is a critical step in preparing text for publication, whether the content is meant for websites, social platforms or internal documentation. InvisibleFix simplifies this process by exposing the hidden layer and removing these anomalies before they reach production.
