String.count vs NSString.length

Watch out for this difference in behaviour between String count and NSString length

I recently came across a bug in Lyrcs that caused an incorrect rhyme highlight to be applied.

Mat and cataa do not rhyme, yet Lyrcs is showing that they do. The cause is simple but easy to miss: String.count and NSString.length do not always return the same value, especially when emoji are involved. Lyrcs was doing the following:

let string: String = ...
let range = NSRange(location: 0, length: string.count)
(string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
    ...
}

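NSRange is measured in UTF-16 code units, while string.count counts Characters. Whenever the string contained emoji, the range therefore fell a few code units short of covering the whole string.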
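To make the shortfall concrete, here is a minimal sketch using a made-up lyric (the text and variable names are illustrative and not taken from Lyrcs). It builds the range the same way as the snippet above and checks which part of the string that range actually covers.

```swift
import Foundation

// Hypothetical lyric: two cat emoji (U+1F408), then two words.
let lyric = "\u{1F408}\u{1F408} mat cat"

// Each cat emoji is one Character but two UTF-16 code units.
let characterCount = lyric.count      // 10
let utf16Length = lyric.utf16.count   // 12

// The buggy range: its length is measured in Characters,
// but NSString interprets it as UTF-16 code units.
let shortRange = NSRange(location: 0, length: characterCount)
let covered = NSString(string: lyric).substring(with: shortRange)

// The range stops two code units early, so the final "at" of "cat" is missing.
print(covered)
```

With this input the range ends after the "c" of "cat", so any word enumeration over it never sees the full last word.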

An excellent resource to learn more about this is the Counting Characters section of The Swift Programming Language book:

The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
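A quick way to see the difference described in that passage is to compare the two values directly. The sample strings below are illustrative. NSString(string:) is used here to read length; on Apple platforms it should give the same result as the as NSString cast used elsewhere in this post.

```swift
import Foundation

let plain = "cat"
let withEmoji = "cat\u{1F408}"  // "cat" followed by a cat emoji
// A family emoji: three people joined by zero-width joiners (U+200D).
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"

// Each pair is (String.count, NSString.length).
let plainCounts = (plain.count, NSString(string: plain).length)
let emojiCounts = (withEmoji.count, NSString(string: withEmoji).length)
let familyCounts = (family.count, NSString(string: family).length)

print(plainCounts)   // (3, 3): ASCII only, so the two values agree
print(emojiCounts)   // (4, 5): the emoji is 1 Character but 2 UTF-16 code units
print(familyCounts)  // (1, 8): one grapheme cluster, eight UTF-16 code units
```

The gap grows with every emoji in the string, which is why the bug only shows up on some input.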

There are a few ways to solve this:

  1. Use a Swift range created from string

     let string: String = ...
     let range = string.startIndex..<string.endIndex
     string.enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
         ...
     }
    
  2. Use NSString.length

     let string: String = ...
     let range = NSRange(location: 0, length: (string as NSString).length)
     (string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
         ...
     }
    
  3. Use String.utf16.count

     let string: String = ...
     let range = NSRange(location: 0, length: string.utf16.count)
     (string as NSString).enumerateSubstrings(in: range, options: .byWords) { substring, range, _, _ in
         ...
     }
    

The first approach is the pure Swift one, but there are times when you're dealing with Objective-C APIs and it may be easier to resort to the trusty as NSString cast. You'll be pleased to know this has since been fixed in Lyrcs 3.5.1.
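When an Objective-C API needs an NSRange, Foundation also offers an NSRange initializer that converts a Swift range of String.Index values, which avoids counting by hand. This is an additional option beyond the three listed above, sketched here with an illustrative string:

```swift
import Foundation

// Illustrative input: a word followed by a cat emoji (U+1F408).
let text = "mat cat\u{1F408}"

// Convert the full Swift range into an NSRange measured in UTF-16 code units.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

// Going the other way: Range(_:in:) returns nil when the NSRange
// does not map onto valid String indices.
let swiftRange = Range(fullRange, in: text)

print(fullRange.length == text.utf16.count)  // true
```

The resulting NSRange can be passed to NSString methods such as enumerateSubstrings(in:options:using:) in the same way as the ranges built in options 2 and 3.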