String.count vs NSString.length
Watch out for this difference in behaviour between String count and NSString length
I recently came across a bug in Lyrcs that caused an incorrect rhyme highlight to be applied.
Mat and cataa do not rhyme, yet Lyrcs is showing they do. The cause is simple but easy to miss: String.count and NSString.length do not always return the same value, especially when emoji are involved. Lyrcs was performing the following:
let string: String = ...
// Bug: string.count is a Character count, but NSRange is measured in UTF-16 code units.
let range = NSRange(location: 0, length: string.count)
(string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
    ...
}
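This meant that range was a few UTF-16 code units short of the whole of string whenever the text contained characters, such as emoji, that take more than one code unit, so the end of the text was left out of the enumeration.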
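To make the shortfall concrete, here is a minimal sketch using a made-up string (not the actual lyric from the Lyrcs bug). It sizes an NSRange with String.count, as the buggy code did, and shows which part of the text that range really covers.

```swift
import Foundation

// Hypothetical sample text: U+1F408 (cat emoji) is one Character
// but two UTF-16 code units.
let lyric = "\u{1F408} cat"

let characterCount = lyric.count               // grapheme clusters
let utf16Length = (lyric as NSString).length   // UTF-16 code units

// The buggy range: measured in UTF-16 units, but sized with a Character count.
let shortRange = NSRange(location: 0, length: characterCount)
let covered = (lyric as NSString).substring(with: shortRange)

print(characterCount, utf16Length)  // 5 6
print(covered)                      // the emoji, a space, then "ca": the final "t" is missing
```

Every character that needs two code units pushes the end of the range one unit further from the true end of the string.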
An excellent resource to learn more about this is Counting Characters from the Swift programming language book:
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
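As a quick illustration of that passage, the sketch below compares the different ways of counting the same string. The two sample strings are my own choices, picked because each is a single extended grapheme cluster built from more than one Unicode scalar.

```swift
import Foundation

// Two regional indicator scalars (G + B) that render as a single flag.
let flag = "\u{1F1EC}\u{1F1E7}"
// "e" followed by a combining acute accent, rendered as a single letter.
let accented = "e\u{301}"

// Characters, Unicode scalars, UTF-16 code units
print(flag.count, flag.unicodeScalars.count, flag.utf16.count)              // 1 2 4
print(accented.count, accented.unicodeScalars.count, accented.utf16.count)  // 1 2 2

// NSString.length agrees with the UTF-16 view, not with count.
print((flag as NSString).length == flag.utf16.count)  // true
```

So the mismatch is not limited to emoji: any combining sequence can make count smaller than length.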
There are a couple of ways to solve this:
Use a Swift range created from string:
let string: String = ...
let range = string.startIndex..<string.endIndex
string.enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
    ...
}
Use NSString.length:
let string: String = ...
let range = NSRange(location: 0, length: (string as NSString).length)
(string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
    ...
}
Use String.utf16.count:
let string: String = ...
let range = NSRange(location: 0, length: string.utf16.count)
(string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
    ...
}
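The options above should all describe the same span of text. The sketch below checks that on a made-up sample string. It also uses Foundation's NSRange(_:in:) initializer to turn the Swift range into an NSRange for comparison; that conversion is my addition and is not something the original Lyrcs code is known to use.

```swift
import Foundation

// Hypothetical sample text containing one emoji (U+1F408).
let text = "mat \u{1F408} cat"

let swiftRange = text.startIndex..<text.endIndex
let viaLength = NSRange(location: 0, length: (text as NSString).length)
let viaUTF16 = NSRange(location: 0, length: text.utf16.count)

// Convert the Swift range into UTF-16 offsets so it can be compared.
let bridged = NSRange(swiftRange, in: text)

print(NSEqualRanges(viaLength, viaUTF16))  // true
print(NSEqualRanges(bridged, viaUTF16))    // true
print(viaUTF16.length, text.count)         // 10 9
```

The last line shows the one-unit gap that the original String.count-based range would have had for this string.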
The pure Swift approach is the first one, but there are times when you're dealing with Objective-C APIs and it may be easier to resort to the trusty as NSString cast. You'll be pleased to know this has since been fixed in Lyrcs 3.5.1.