Natural sort order comparison – Part 1

Introduction

One of the things that are at the origin of my passion for programming is the surprise factor. No matter how many experience and studies you make any new assignment brings something new, something not completely explored.
My excursions to the UI are not so frequent and because of that I do not encounter so often some of classical problems that usually are related to UI, in particular sorting cases.
Recently I worked on a project that in UI lists different versions of one well defined entity. The main property of the entity is the version number and it is type of string. Mentioned property, along with others, is shown inside a ListView control. Sorting the list using the standard string comparison was not an option. By using a standard string comparison, numerical part of the string is not treated as a number, instead is treated as a string, which gives that the string “v1.9” comes after “v1.10”. Actually, if we consider a dot as a decimal separator, this is a correct result. However, in our case we need to consider a dot as a major-minor separator and as this we need to treat the minor part as an integer. In order to make the example easier, we will use only major number as version indicator. The entity we are going to list will contain 13 different versions, from “v1” to “v13”. Using a standard string comparison, once we order our list on version field, the result will be the following:

v1
v10
v11
v12
v13
v2

v9

Actually this is correct result if we consider the numbers as ASCII characters, but if we, as usually humans do, would like to consider the numeric part as an integer, this is an undesirable result. What we are expecting to have is the following:

v1
v2

v9
v10
v11
v12
v13

You will find a similar behavior in your Windows Explorer each time you order the files by name.

Now how do we solve this task?

Getting started

In the front of a similar problem, I don’t like just to Google a couple of keywords and use the first solution that appears on the search list. Also, the application is flexible enough so we can easily encounter different user settings and the sorting requirements may change. In order to see a bigger picture I decided to explore this area a bit more in detail.
At the end I found two viable approaches to this problem

  • Using a StrCmpLogicalW function via P/Invoke
  • Writing a custom class that implements IComparer interface and a custom comparison algorithm
    Both methods are valid, each one with its own pros and cons. So let’s see them both in detail.

StrCmpLogicalW via P/Invoke

Inside the shlwapi.dll you will find a method called StrCmpLogicalW, which, as said in method’s description, “Compares two Unicode strings. Digits in the strings are considered as numerical content rather than text.”. This is a pretty handy method, it is easy to implement and it is fast. However it has couple of cons that can make you choose different approach, as example it shows different behavior on Windows XP instead of Windows Vista. Also it is not implemented in Windows 2000 and numerical content is always treated as integer. However, it is very unlikely that you will find yourself in any of listed situations and that the standard functionalities offered by StrCmpLogicalW will be a limitation.

The implementation

For the less experienced programmers, I will show a decent way to create a class that implements IComparer interface and uses the StrCmpLogicalW class in order to calculate the comparison result.
As first we should declare a helper class that will make the calls to the StrCmpLogicalW easier.

[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);
}

Actually there is not a lot to say about this class. It is an internal (the class can be accessed by any code in the same assembly, but not from another assembly) and is declared as static (indicating that it contains only static members). SuppressUnmanagedCodeSecurity attribute is primarily used to increase performance, for further details consult the following page http://msdn.microsoft.com/en-us/library/ms182311(v=vs.80).aspx.
Now we have all the necessary in order to implement the following class

public sealed class NaturalOrderComparerNative : IComparer, IComparer
{
    public int Compare(string a, string b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a, b);
    }
 
    public int Compare(object x, object y)
    {
        string a = x as string;
        string b = y as string;
 
        return Compare(a, b);
    }
}

Implementing the IComparer interface is straight forward as it requires only Compare method to be coded. Compare method has the following signature public int Compare(object x, object y), it compares given objects and returns a value indicating whether first is less than, equal to, or greater than the second one. In case that x is less than y it will return -1, if x equals y will return 0 (zero) and in case x is greater than y will return 1.
I also implemented a generic version of this interface, IComparer. I think that there are no more explanations to do about this small piece of code. If not, just comment under the article and I will try to do my best in order to make this clear.
So let’s summarize:

PROS CONS
Performances Platform dependent
Simple implementation Non customizable
Proven algorithm

In the second part of this article (blog post) we will see how to implement a custom algorithm and some usage examples, together with full source code in C# and VB.

Follow us so you can be notified about the new posts.