natsort 2.2.0

Sort lists naturally

Latest Version: 3.1.1

Natural sorting for Python. natsort requires Python 2.6 or greater (this includes Python 3.x). On Python 2.6, the argparse module is also required.

natsort comes with a shell script that is described below. You can also execute natsort from the command line with python -m natsort.

Synopsis

The natsort package provides a key that helps sort lists "naturally"; that is, it sorts alphabetically and numerically, and not by ASCII. It provides support for ints and floats (including negatives and exponential notation) and basic support for sorting version numbers (1.2.3, no letters).

When you try to sort a list of strings that contain numbers, the normal python sort algorithm sorts by ASCII, so you might not get the results that you expect:

>>> a = ['a2', 'a8', 'a7', 'a5', 'a9', 'a1', 'a4', 'a10', 'a3', 'a6']
>>> sorted(a)
['a1', 'a10', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9']

Notice that it has the order ('1', '10', '2')? This is because the list is being sorted in ASCII order, which sorts numbers like you would letters (i.e. 'a', 'at', 'b'). It would be better if you had a sorting algorithm that recognized numbers as numbers and treated them like numbers, not letters. This is where natsort comes in:

>>> from natsort import natsorted
>>> a = ['a2', 'a8', 'a7', 'a5', 'a9', 'a1', 'a4', 'a10', 'a3', 'a6']
>>> natsorted(a)
['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10']
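
To illustrate the idea, here is a minimal standard-library sketch of an integer-only natural key. It is not natsort's actual implementation; the name simple_natural_key is hypothetical, and the sketch assumes that splitting on runs of digits is enough for the input at hand:

```python
import re

def simple_natural_key(text):
    """Hypothetical helper: split text into alternating text and integer chunks."""
    parts = re.split(r'(\d+)', text)
    # re.split with one capturing group puts the digit runs at odd positions,
    # so strings are only ever compared with strings and ints with ints.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

a = ['a2', 'a8', 'a7', 'a5', 'a9', 'a1', 'a4', 'a10', 'a3', 'a6']
print(sorted(a, key=simple_natural_key))
```

Because 'a10' becomes ['a', 10, ''] and 'a2' becomes ['a', 2, ''], the comparison is made on the integers 10 and 2 rather than on the characters '1' and '2'.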

natsort identifies the numbers and sorts them separately from the letters. It not only works on integers, but on floats as well (even in exponential notation!):

>>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
>>> sorted(a)
['a5.034e1', 'a50', 'a50.300', 'a50.4', 'a51.']
>>> natsorted(a)
['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.']
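
A float-aware key can be sketched the same way with the standard library. This is only an approximation of the behavior shown above, not natsort's own code; the regular expression and the name float_natural_key are assumptions made for illustration, and signs are deliberately ignored here:

```python
import re

# Assumed pattern: digits, an optional fractional part, an optional exponent.
FLOAT_RE = re.compile(r'(\d+\.?\d*(?:[eE][-+]?\d+)?)')

def float_natural_key(text):
    """Hypothetical helper: split text into alternating text and float chunks."""
    parts = FLOAT_RE.split(text)
    # Captured numbers sit at odd positions; convert them so that
    # '5.034e1' compares as 50.34 and '51.' compares as 51.0.
    return [float(p) if i % 2 else p for i, p in enumerate(parts)]

a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
print(sorted(a, key=float_natural_key))
```

With this key the strings are ordered by the values 50, 50.3, 50.34, 50.4 and 51, which matches the natsorted result above.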

Last, natsort can also handle version numbers appropriately, provided they are in the format "M.m.p" where M=major version number, m=minor version number, and p=patch. The patch can be just a number, or can also be a number followed by a letter. You must include the patch, otherwise natsort will think it is a float:

>>> a = ['1.9.9a', '1.11', '1.9.9b', '1.11.4', '1.10.1']
>>> sorted(a)
['1.10.1', '1.11', '1.11.4', '1.9.9a', '1.9.9b']
>>> natsorted(a)
['1.9.9a', '1.9.9b', '1.10.1', '1.11.4', '1.11']
>>> # If you had used 1.11.0 instead of 1.11 this would work properly
>>> a = ['1.9.9a', '1.11.0', '1.9.9b', '1.11.4', '1.10.1']
>>> natsorted(a)
['1.9.9a', '1.9.9b', '1.10.1', '1.11.0', '1.11.4']

natsort is not necessarily optimized for speed, but it is designed to be as flexible as possible.

A Note About Sorting Version Numbers

The algorithm that natsort uses is optimized to find negative numbers and floating point numbers (including those with exponentials). Because of this, you might not get results you expect when sorting version numbers. For example:

>>> available_versions = ['1.8.1-r26', '1.8.1-r30', '2.0-r2', '2.0-r7', '2.0-r11']
>>> natsorted(available_versions)
['1.8.1-r26', '1.8.1-r30', '2.0-r2', '2.0-r7', '2.0-r11']

The above works fine, but adding a prefix can mess things up if you are not careful:

>>> natsorted(['my-package-{0}'.format(v) for v in available_versions])
['my-package-2.0-r2', 'my-package-2.0-r7', 'my-package-2.0-r11', 'my-package-1.8.1-r26', 'my-package-1.8.1-r30']

This is not in the order you might expect. natsort sees the '-' before the number and starts to look for a float. It finds -2.0 and -1.8, and then sorts them in increasing order, so the 2.0 entries come before the 1.8.1 entries. Obviously, this is not what you want for version numbers (but would be good for floats). This can be fixed by not using a dash as a separator:

>>> natsorted(['my-package{0}'.format(v) for v in available_versions])
['my-package1.8.1-r26', 'my-package1.8.1-r30', 'my-package2.0-r2', 'my-package2.0-r7', 'my-package2.0-r11']
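
The effect of the dash can be reproduced with a small standard-library sketch. The pattern below is an assumed stand-in for a signed-float matcher, not the expression natsort actually uses, and first_number is a hypothetical helper:

```python
import re

# Assumed pattern: an optional sign directly in front of the digits
# is treated as part of the number.
SIGNED_FLOAT_RE = re.compile(r'[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?')

def first_number(text):
    """Hypothetical helper: return the first number found in text, or None."""
    match = SIGNED_FLOAT_RE.search(text)
    return float(match.group()) if match else None

print(first_number('my-package-2.0-r2'))     # -2.0
print(first_number('my-package-1.8.1-r26'))  # -1.8
print(first_number('my-package2.0-r2'))      # 2.0
print(first_number('my-package1.8.1-r26'))   # 1.8
```

With the dash in place the numbers read as -2.0 and -1.8, and -2.0 is the smaller of the two, so the 2.0 packages sort first. Without the dash the numbers are positive and the order is the expected one.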

If you find that you need to be able to sort version numbers more reliably, I recommend taking a look at the naturalsort package which will give you what you expect for version numbers.

API

The natsort package provides three functions: natsort_key, natsorted, and index_natsorted.

natsort_key

Using natsort_key is just like any other sorting key in python:

>>> from natsort import natsort_key
>>> a = ['num3', 'num5', 'num2']
>>> a.sort(key=natsort_key)
>>> a
['num2', 'num3', 'num5']

Of course, you can chain natsort_key with other functions to sort by some attribute of a class (for example). The easiest way is to make a lambda expression that calls natsort_key:

>>> class Foo:
...    def __init__(self, bar):
...        self.bar = bar
...    def __repr__(self):
...        return "Foo('{0}')".format(self.bar)
>>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
>>> b.sort(key=lambda x: natsort_key(x.bar)) # Get attribute explicitly
>>> b
[Foo('num2'), Foo('num3'), Foo('num5')]
>>> c = [Foo('num3'), Foo('num5'), Foo('num2')]
>>> from operator import attrgetter
>>> f = attrgetter('bar') # Using the operator module gives more flexibility
>>> c.sort(key=lambda x: natsort_key(f(x)))
>>> c
[Foo('num2'), Foo('num3'), Foo('num5')]

natsorted

natsorted is simply a wrapper for sorted(list, key=natsort_key):

>>> from natsort import natsorted
>>> a = ['num3', 'num5', 'num2']
>>> natsorted(a)
['num2', 'num3', 'num5']

natsorted also supports a key argument just like the sorted function. Using our Foo class from above:

>>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
>>> natsorted(b, key=attrgetter('bar'))
[Foo('num2'), Foo('num3'), Foo('num5')]
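
The wrapper relationship described above can be sketched without natsort installed. Everything here is illustrative: stand_in_key is a simplified integer-only substitute for natsort_key, and natsorted_sketch is a hypothetical name, not the library's code:

```python
import re
from operator import attrgetter

def stand_in_key(text):
    """Simplified, integer-only substitute for natsort_key (an assumption)."""
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def natsorted_sketch(seq, key=None):
    """Sort seq naturally, optionally extracting the sort string with key."""
    if key is None:
        return sorted(seq, key=stand_in_key)
    # Apply the caller's key first, then the natural key to its result.
    return sorted(seq, key=lambda item: stand_in_key(key(item)))

class Foo:
    def __init__(self, bar):
        self.bar = bar
    def __repr__(self):
        return "Foo('{0}')".format(self.bar)

print(natsorted_sketch(['num3', 'num5', 'num2']))
print(natsorted_sketch([Foo('num3'), Foo('num5'), Foo('num2')], key=attrgetter('bar')))
```

Composing the two key functions inside the wrapper is what saves the caller from writing the lambda shown in the natsort_key section.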

index_natsorted

Use index_natsorted if you want to sort multiple lists by the sorting of one list:

>>> from natsort import index_natsorted
>>> a = ['num3', 'num5', 'num2']
>>> b = ['foo', 'bar', 'baz']
>>> index = index_natsorted(a)
>>> index
[2, 0, 1]
>>> # Sort both lists by the sort order of a
>>> [a[i] for i in index]
['num2', 'num3', 'num5']
>>> [b[i] for i in index]
['baz', 'foo', 'bar']

Again, index_natsorted accepts a key argument:

>>> c = [Foo('num3'), Foo('num5'), Foo('num2')]
>>> index_natsorted(c, key=attrgetter('bar'))
[2, 0, 1]
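
One way to picture what index_natsorted returns is as a sort of the positions rather than the values. The sketch below assumes a simplified integer-only key (stand_in_key) in place of natsort_key, and index_natsorted_sketch is a hypothetical name:

```python
import re

def stand_in_key(text):
    """Simplified, integer-only substitute for natsort_key (an assumption)."""
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def index_natsorted_sketch(seq, key=None):
    """Return the indices that would put seq into natural order."""
    getter = key if key is not None else (lambda item: item)
    # Sort the positions 0..len(seq)-1 by the natural key of the element there.
    return sorted(range(len(seq)), key=lambda i: stand_in_key(getter(seq[i])))

a = ['num3', 'num5', 'num2']
b = ['foo', 'bar', 'baz']
index = index_natsorted_sketch(a)
print(index)
print([b[i] for i in index])
```

The same index list can then be applied to any number of parallel lists, which is the use case described above.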

Shell Script

For your convenience, there is a natsort shell script supplied to you that allows you to call natsort from the command-line. natsort was written to aid in computational chemistry research so that it would be easy to analyze large sets of output files named after the parameter used:

$ ls *.out
mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out

(Obviously, in reality there would be more files, but you get the idea.) Notice that the shell sorts in ASCII order. This is the behavior of programs like find as well as ls. The problem is that files passed to an analysis program in this order do not appear in numerical order, which can lead to bad analysis. To remedy this, use natsort:

# This won't get you what you want
$ foo *.out
# This will sort naturally
$ natsort *.out
mode744.43.out
mode943.54.out
mode1000.35.out
mode1243.34.out
$ natsort *.out | xargs foo

You can also filter out numbers using the natsort command-line script:

$ natsort *.out -f 900 1100 # Select only numbers between 900-1100
mode943.54.out
mode1000.35.out

If needed, you can exclude specific numbers:

$ natsort *.out -e 1000.35 # Exclude 1000.35 from search
mode744.43.out
mode943.54.out
mode1243.34.out
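
For readers who want the same kind of selection inside a Python program, the filter and exclude options can be approximated as follows. This is a sketch under assumptions, not the script's implementation: the helper names are hypothetical, the number pattern is unsigned, and a file is kept when any number found in its name falls in the range:

```python
import re

# Assumed pattern: unsigned number with optional fraction and exponent.
NUMBER_RE = re.compile(r'\d+\.?\d*(?:[eE][-+]?\d+)?')

def numbers_in(name):
    """Return every number found in name as a float."""
    return [float(m) for m in NUMBER_RE.findall(name)]

def keep_in_range(names, low, high):
    """Keep names containing at least one number between low and high."""
    return [n for n in names if any(low <= x <= high for x in numbers_in(n))]

def drop_exact(names, value):
    """Drop names containing a number equal to value."""
    return [n for n in names if value not in numbers_in(n)]

files = ['mode744.43.out', 'mode943.54.out', 'mode1000.35.out', 'mode1243.34.out']
print(keep_in_range(files, 900, 1100))
print(drop_exact(files, 1000.35))
```

On the four example files this selects the same names as the two command-line calls above.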

For other options, use natsort --help.

natsort also accepts piped input, and it sorts each directory in a PATH independently of the others. Files in the current directory are listed before files in subdirectories.

Author

Seth M. Morton

History

6-25-2013 v. 2.2.0

  • Added key attribute to natsorted and index_natsorted so that it mimics the functionality of the built-in sorted
  • Added tests to reflect the new functionality, as well as tests demonstrating how to get similar functionality using natsort_key.

12-5-2012 v. 2.1.0

  • Reorganized package
  • Now using a platform independent shell script generator (entry_points from distribute)
  • Can now execute natsort from command line with python -m natsort as well

11-30-2012 v. 2.0.2

  • Added the use_2to3 option to setup.py
  • Added distribute_setup.py to the distribution
  • Added dependency to the argparse module (for python2.6)

11-21-2012 v. 2.0.1

  • Reorganized directory structure
  • Added tests into the natsort.py file itself

11-16-2012 v. 2.0.0

  • Updated sorting algorithm to support floats (including exponentials) and basic version number support
  • Added better README documentation
  • Added doctests
 
File                        Type    Py Version  Uploaded on  Size
natsort-2.2.0.tar.gz (md5)  Source              2013-06-26   13KB
natsort-2.2.0.zip (md5)     Source              2013-06-26   22KB