Display an HTML encoded String in a UITextView without changing characters to emoji

I wanted to display some text in a UITextView, and one of the glyphs was a Unicode checkmark, which I didn’t think anything of. But what I was finding was that when iOS rendered the checkmark, it was an emoji version instead of the boring old Unicode variant.

Before I ran into that issue, though, I had to work out how to turn an HTML-encoded string into something usable. A string like this:

&lt;b&gt;Hello&lt;/b&gt;&lt;br&gt;&lt;p&gt;This is normal text.&lt;/p&gt;&lt;br&gt;

So I wrote an extension on NSAttributedString to solve this problem, which uses some hackery to coax iOS into decoding and rendering the string for me:

extension NSAttributedString {
    convenience init?(htmlEncodedString: String) throws {
        // First pass: let the HTML importer decode the entities (&lt; becomes <).
        guard let data = htmlEncodedString.data(using: .unicode) else { return nil }
        let rawHTML = try NSAttributedString(data: data, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil).string
        let styledHTML = "\(rawHTML)"

        // Second pass: parse the decoded markup into styled text.
        guard let htmlData = styledHTML.data(using: .unicode) else { return nil }
        try self.init(data: htmlData, options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil)
    }
}

It abuses the fact that NSAttributedString can be initialized from data containing HTML, and that the importer decodes HTML entities along the way, so you can grab the decoded markup from its string property. At runtime, the rawHTML variable will look like:

<b>Hello</b><br><p>This is normal text.</p><br>
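To make that first pass less mysterious, here is a rough pure-Swift sketch of what the entity decoding amounts to. The function name decodingBasicEntities is hypothetical, and it only handles four entities. The real HTML importer handles far more, so treat this as an illustration rather than a replacement:

```swift
import Foundation

// Hypothetical helper, not part of the original extension.
// Handles only four common entities; a real HTML parser covers many more.
func decodingBasicEntities(_ encoded: String) -> String {
    // "&amp;" is replaced last so that "&amp;lt;" decodes to "&lt;" and not "<".
    let replacements = [("&lt;", "<"), ("&gt;", ">"), ("&quot;", "\""), ("&amp;", "&")]
    var decoded = encoded
    for (entity, character) in replacements {
        decoded = decoded.replacingOccurrences(of: entity, with: character)
    }
    return decoded
}

let encoded = "&lt;b&gt;Hello&lt;/b&gt;&lt;br&gt;"
print(decodingBasicEntities(encoded))  // prints "<b>Hello</b><br>"
```

The output is still markup, not styled text, which is why the extension needs a second pass to turn it into an attributed string.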

After that, NSAttributedString can understand this HTML and turn it into something you can render for the user. The weirdness comes when you include Unicode characters like the following:

✉✔✌✍❤☀☂☯☢☎❄▶◀

I cannot guarantee that these render the same way on your machine, particularly if you are reading this in a browser on iOS, which may show the same behavior. For the sake of precision, the Unicode code points for the first eleven of them are:

0x2709, 0x2714, 0x270C, 0x270D, 0x2764, 0x2600, 0x2602, 0x262F, 0x2622, 0x260E, 0x2744
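If you want to check other glyphs yourself, a small helper along these lines can print the scalars in any string. This is only a sketch: codePointList is a made-up name, and the input uses escapes so there is no doubt about what is being inspected:

```swift
// Hypothetical helper (not from the original post): lists the Unicode
// scalars in a string as hex literals.
func codePointList(of text: String) -> String {
    return text.unicodeScalars
        .map { "0x" + String($0.value, radix: 16, uppercase: true) }
        .joined(separator: ", ")
}

// Escapes are used here so the input is unambiguous.
print(codePointList(of: "\u{2709}\u{2714}"))  // prints "0x2709, 0x2714"
```

A glyph that already carries a variation selector shows up as two entries, with 0xFE0E or 0xFE0F listed after the base value.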

But in the UITextView, all of these were rendering as emoji! I found that the solution was to use the scantily documented Unicode variation selector U+FE0E (VARIATION SELECTOR-15, which requests text presentation), suffixing each offending Unicode value with it. Granted, I do not know if this is a definitive list of the Unicode values that do this on iOS, but I’ve wrapped all this up in an extension which you can use for your own purposes:

extension String {
    var escapingCharactersWithVariationSelector0E: String {
        var newStr = ""
        for unicodeScalar in unicodeScalars {
            switch unicodeScalar.value {
            case 0x2709, 0x2714, 0x270C, 0x270D, 0x2764, 0x2600, 0x2602, 0x262F, 0x2622, 0x260E, 0x2744:
                var escapedScalar = String(Character(unicodeScalar))
                escapedScalar.append("\u{0000FE0E}")
                newStr.append(escapedScalar)
            default:
                newStr.append(Character(unicodeScalar))
            }
        }

        return newStr
    }
}
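One thing the extension above does not guard against is input that already contains a selector: running it twice appends U+FE0E twice. Below is a hedged sketch of a variant that skips scalars already followed by a variation selector and takes the target set as a parameter. The name forcingTextPresentation(of:) is hypothetical, and leaving an existing emoji selector (U+FE0F) alone is my assumption about the desired behavior:

```swift
extension String {
    /// Hypothetical variant of the extension above. Appends U+FE0E after each
    /// target scalar unless a variation selector already follows it.
    func forcingTextPresentation(of targets: Set<UInt32>) -> String {
        let textSelector: Unicode.Scalar = "\u{FE0E}"
        let emojiSelector: Unicode.Scalar = "\u{FE0F}"
        let scalars = Array(unicodeScalars)
        var result = String.UnicodeScalarView()

        for (index, scalar) in scalars.enumerated() {
            result.append(scalar)
            guard targets.contains(scalar.value) else { continue }

            // Assumption: an explicit selector in the input is left untouched.
            let next = index + 1 < scalars.count ? scalars[index + 1] : nil
            if next != textSelector && next != emojiSelector {
                result.append(textSelector)
            }
        }
        return String(result)
    }
}

let checkmarkOnly: Set<UInt32> = [0x2714]
let escapedOnce = "a\u{2714}b".forcingTextPresentation(of: checkmarkOnly)
let escapedTwice = escapedOnce.forcingTextPresentation(of: checkmarkOnly)
print(escapedOnce.unicodeScalars.count, escapedTwice.unicodeScalars.count)  // prints "4 4"
```

Passing the set in as a parameter also makes it easier to add more code points as they are discovered, without editing a switch statement.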

To wrap this all up, if you’d like to display:

var str = "&lt;b&gt;Hello&lt;/b&gt;&lt;br&gt;&lt;p&gt;This is normal text.&lt;/p&gt;&lt;br&gt; ✉✔✌✍❤☀☂☯☢☎❄▶◀"

You can use:

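do {
    textView.attributedText = try NSAttributedString(htmlEncodedString: str.escapingCharactersWithVariationSelector0E)
} catch {
    // Parsing failed; log the error rather than silently discarding it.
    print("Could not parse HTML: \(error)")
}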

Please share the other Unicode values that do this! And check out the attached Playground which demonstrates the effect.

StringPlayground.playground
