
Linguistic Sorting in SAS Proc Sort

I just took a look at the linguistic sorting features of the SAS SORT procedure and found some neat options for my task. For example, I want to sort the ID variable in the following dataset:

data t1;
    input ID $ ;
datalines;
T20
T4
T3
T1
;

and I want the same intuitive ordering that Windows 7 uses when sorting file names in a directory (T1, T3, T4, T20).


But when I apply the default sort:

proc sort data=t1 out=t2;
    by ID;
run;

I get:

T1
T20
T3
T4

To produce the expected order, add the SORTSEQ= option:

proc sort data=t1 out=t3
    SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON); /* compare digit runs by numeric value */
    by ID;
run;

T1
T3
T4
T20

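In the first PROC SORT step, the default order is determined by each character's position in the EBCDIC or ASCII collating sequence (depending on the operating system), and the values are compared one character at a time. That is why T20 sorts before T3: the first difference is at the second character, and '2' precedes '3'. To override this default collating sequence, the second step specifies a linguistic collation with the NUMERIC_COLLATION=ON rule, which orders runs of digits by their numeric value rather than character by character.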
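To make the character-by-character comparison concrete, here is a small sketch (my own illustration, not part of the original sort code) that uses the RANK function, which returns a character's position in the ASCII or EBCDIC collating sequence of the host, to compare the characters at which T20 and T3 first differ. The variable names c20, c3, r20 and r3 are hypothetical and chosen only for this example.

```sas
/* Sketch: why the default sort puts T20 before T3.          */
/* Both values start with 'T'; the first difference is the   */
/* second character, so that character decides the order.    */
data _null_;
    length c20 c3 $ 1;
    c20 = substr('T20', 2, 1);   /* '2' */
    c3  = substr('T3',  2, 1);   /* '3' */
    r20 = rank(c20);             /* 50 under ASCII, 242 under EBCDIC */
    r3  = rank(c3);              /* 51 under ASCII, 243 under EBCDIC */
    put c20= r20= c3= r3=;       /* write both ranks to the log */
run;
```

Under either collating sequence r20 is smaller than r3, so the comparison ends there and the trailing '0' in T20 never comes into play. NUMERIC_COLLATION=ON changes exactly this step by comparing the digit runs 20 and 3 as numbers.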

For details, see the corresponding sections of the SAS SORT procedure documentation and "Collating Sequence" in SAS® 9.3 National Language Support (NLS), along with a great paper on the topic.