Customizable Natural-Order Sort

Version 3.4.5 (80.5 KB) by Stephen23
Alphanumeric sort of a cell/string/categorical array, with customizable number format.
4.9K downloads
Updated 13 Jul 2023

Editor's note: This file was selected as a MATLAB Central Pick of the Week.

To sort filenames or folder names, use NATSORTFILES.
To sort the rows of a string/cell array, use NATSORTROWS.
Summary
Alphanumerically sorts the text in a string/cell/categorical array. The text is sorted by character code, taking into account the values of any number substrings. Compare, for example:
X = {'a2', 'a10', 'a1'};
sort(X)
ans = 'a1' 'a10' 'a2'
natsort(X)
ans = 'a1' 'a2' 'a10'
By default, NATSORT interprets all consecutive digits as integer numbers. The number substring recognition can be customized using a regular expression, allowing the number substrings to have:
  • a +/- sign
  • a decimal point and decimal fraction
  • E-notation exponent
  • decimal, octal, hexadecimal or binary notation
  • Inf or NaN values
  • criteria supported by regular expressions: lookarounds, quantifiers, etc.
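To illustrate how the regular expression decides what counts as a number substring, the following sketch uses only base MATLAB REGEXP; it does not call NATSORT. The sample text is hypothetical, and the patterns are adapted from the ones used in the examples on this page.

```matlab
% Illustration only: base MATLAB REGEXP, not a call to NATSORT.
txt = 'test-1.4E3x';   % hypothetical sample text

% Runs of digits only: the sign, point and exponent split the number apart.
m1 = regexp(txt, '\d+', 'match');                          % {'1','4','3'}

% Optional sign and decimal fraction: the exponent is still separate.
m2 = regexp(txt, '[-+]?\d+\.?\d*', 'match');               % {'-1.4','3'}

% Optional sign, decimal fraction and E-notation exponent: one number.
m3 = regexp(txt, '[-+]?\d+\.?\d*(E[-+]?\d+)?', 'match');   % {'-1.4E3'}

% The matched text can then be converted to a numeric value.
v3 = sscanf(m3{1}, '%f');                                  % -1400
```

The broader the pattern, the more of the text is treated as one number and compared by value rather than by character code.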
And of course the sorting itself can also be controlled:
  • ascending/descending sort direction
  • character case sensitivity/insensitivity
  • relative order of numbers vs. characters
  • relative order of numbers vs. NaNs
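The idea of comparing text by character code and number substrings by value can be sketched in a few lines of base MATLAB. This is not the NATSORT implementation. It is a minimal sketch that assumes every element consists of one non-digit prefix followed by one integer, and all variable names are hypothetical.

```matlab
% Illustration only: a simplified natural-order sort, not NATSORT itself.
% Assumes each element is a non-digit prefix followed by one integer.
X = {'a2', 'a10', 'a1'};

% Split each element into its text prefix and its digit run.
tok = regexp(X, '^(\D*)(\d+)$', 'tokens', 'once');
pre = cellfun(@(t) t{1}, tok, 'UniformOutput', false);   % {'a','a','a'}
num = cellfun(@(t) sscanf(t{2}, '%f'), tok);             % [2 10 1]

% Rank the prefixes in character-code order, then sort by (prefix, value).
[~, ~, rnk] = unique(pre);
keys = [rnk(:), num(:)];
[~, idxA] = sortrows(keys, [1 2]);     % ascending on both keys
[~, idxD] = sortrows(keys, [-1 -2]);   % descending on both keys

Yasc = X(idxA);   % {'a1','a2','a10'}
Ydsc = X(idxD);   % {'a10','a2','a1'}
```

Text with several number substrings, mixed case, or NaN values needs more bookkeeping than this, which is what the options listed above control.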
Examples
%% Multiple integers (e.g. release version numbers):
>> A = {'v10.6', 'v9.10', 'v9.5', 'v10.10', 'v9.10.20', 'v9.10.8'};
>> sort(A) % for comparison.
ans = 'v10.10' 'v10.6' 'v9.10' 'v9.10.20' 'v9.10.8' 'v9.5'
>> natsort(A)
ans = 'v9.5' 'v9.10' 'v9.10.8' 'v9.10.20' 'v10.6' 'v10.10'
%% Integer, decimal, NaN, or Inf numbers, possibly with +/- signs:
>> B = {'test+NaN', 'test11.5', 'test-1.4', 'test', 'test-Inf', 'test+0.3'};
>> sort(B) % for comparison.
ans = 'test' 'test+0.3' 'test+NaN' 'test-1.4' 'test-Inf' 'test11.5'
>> natsort(B, '[-+]?(NaN|Inf|\d+\.?\d*)')
ans = 'test' 'test-Inf' 'test-1.4' 'test+0.3' 'test11.5' 'test+NaN'
%% Integer or decimal numbers, possibly with an exponent:
>> C = {'0.56e007', '', '43E-2', '10000', '9.8'};
>> sort(C) % for comparison.
ans = '' '0.56e007' '10000' '43E-2' '9.8'
>> natsort(C, '\d+\.?\d*(E[-+]?\d+)?')
ans = '' '43E-2' '9.8' '10000' '0.56e007'
%% Hexadecimal numbers (with '0X' prefix):
>> D = {'a0X7C4z', 'a0X5z', 'a0X18z', 'a0XFz'};
>> sort(D) % for comparison.
ans = 'a0X18z' 'a0X5z' 'a0X7C4z' 'a0XFz'
>> natsort(D, '0X[0-9A-F]+', '%i')
ans = 'a0X5z' 'a0XFz' 'a0X18z' 'a0X7C4z'
%% Binary numbers:
>> E = {'a11111000100z', 'a101z', 'a000000000011000z', 'a1111z'};
>> sort(E) % for comparison.
ans = 'a000000000011000z' 'a101z' 'a11111000100z' 'a1111z'
>> natsort(E, '[01]+', '%b')
ans = 'a101z' 'a1111z' 'a000000000011000z' 'a11111000100z'
%% Case sensitivity:
>> F = {'a2', 'A20', 'A1', 'a10', 'A2', 'a1'};
>> natsort(F, [], 'ignorecase') % default
ans = 'A1' 'a1' 'a2' 'A2' 'a10' 'A20'
>> natsort(F, [], 'matchcase')
ans = 'A1' 'A2' 'A20' 'a1' 'a2' 'a10'
%% Sort order:
>> G = {'2', 'a', '', '3', 'B', '1'};
>> natsort(G, [], 'ascend') % default
ans = '' '1' '2' '3' 'a' 'B'
>> natsort(G, [], 'descend')
ans = 'B' 'a' '3' '2' '1' ''
>> natsort(G, [], 'num<char') % default
ans = '' '1' '2' '3' 'a' 'B'
>> natsort(G, [], 'char<num')
ans = '' 'a' 'B' '1' '2' '3'
%% UINT64 numbers (with full precision):
>> natsort({'a18446744073709551615z', 'a18446744073709551614z'}, [], '%lu')
ans = 'a18446744073709551614z' 'a18446744073709551615z'
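
The format argument in the hexadecimal and UINT64 examples matters because of how the matched text is converted to a number. The sketch below uses base MATLAB SSCANF directly, on the assumption (suggested by the release notes for 1.3.0.0) that the format is handed to SSCANF. It does not call NATSORT, and the variable names are hypothetical.

```matlab
% Illustration only: base MATLAB SSCANF, not a call to NATSORT.

% '%i' reads a 0X prefix as base 16.
hexVal = sscanf('0X7C4', '%i');   % 1988

% Two adjacent values near the top of the UINT64 range.
a = '18446744073709551615';       % intmax('uint64')
b = '18446744073709551614';

% Parsed as double, both round to 2^64 and compare equal.
sameAsDouble = sscanf(a, '%f') == sscanf(b, '%f');     % true

% Parsed with '%lu', SSCANF returns uint64 and the values stay distinct.
sameAsUint64 = sscanf(a, '%lu') == sscanf(b, '%lu');   % false
```

With the default double conversion the two names in the UINT64 example would tie on their numeric value, so the '%lu' format is what allows them to be ordered correctly.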

Cite As

Stephen23 (2024). Customizable Natural-Order Sort (https://www.mathworks.com/matlabcentral/fileexchange/34464-customizable-natural-order-sort), MATLAB Central File Exchange.

MATLAB Release Compatibility
Created with R2010b
Compatible with R2009b and later releases
Platform Compatibility
Windows macOS Linux

Version Published Release Notes
3.4.5

* Accept decimal comma as well as decimal point.
* HTML examples use string arrays.

3.4.4

* Add testcases.

3.4.3

* Now R2009b compatible.

3.4.2

* Edit description & help.

3.4.1

* Edit description & help.

3.4.0

* Add plenty of testcases.
* Fix bug in descending sort with an empty input array.

3.3.0

* Improve test function, add test cases.

3.2.0

* Update TESTFUN.

3.1.0

* More robust TESTFUN pretty-print code.
* Improve option checking.

3.0.5

* Improve examples.

3.0.4

* Correct summary.

3.0.3

* Improve string handling.

3.0.2

* Simplify numeric class handling.
* Add permutations test examples.

3.0.1

* Handle single element with no number.

3.0.0

* Accepts and sorts a string array, categorical array, cell array of char, etc.
* Regular expression and optional arguments may be string or char.
* Simplify char<num algorithm.
* Simplify debugging output cell array.

2.1.2

* Consistent alignment tab/spaces.

2.1.1

* Add error IDs.

2.1.0

* Fix handling of char<num.

2.0.0

* Total rewrite: faster and uses less memory.
* Remove 'asdigit' option.
* Rename 'beforechar' and 'afterchar' to 'num<char' and 'char<num'.
* Add options 'num<NaN' and 'NaN<num'.
* Improve HTML documentation.
* Include testcases.

1.11.0.0

* Consistent internal variable names.

1.10.0.0

* Minor help edit.
* Improve input checking.
* Improve blurb and HTML.
* Add HTML documentation.

1.9.0.0

* Improve binary numeric handling.
* Improve handling of skipped fields.
* Add an example of skipped field usage.

1.8.0.0

* Improved binary substring parsing.
* Better examples.

1.7.0.0

- Update documentation only, improve examples.

1.6.0.0

- Add binary numeric parsing.
- Improve input checking.
- Replace multiple debugging output arrays with one cell array.
- Allow lookarounds in regular expression.

1.5.0.0

- Simplify hexadecimal example.
- Correct output summary.

1.4.0.0

- Now parses hexadecimal and octal substrings.
- int64 and uint64 parsed at full precision.
- Allow <options> in any order.
- For debugging: return indices of character and numeric arrays.

1.3.0.0

- Implement more compact sort algorithm.
- "sscanf" numeric format can be controlled by an optional input argument.
- Provide use examples.
- Output debugging arrays now char+numeric.

1.1.0.0

- Add examples showing different numeric tokens.
- Case-insensitive sort is now default.

1.0.0.0