
A SAS Note for Length Limit of Strings in CDISC Datasets

Clinical programmers are familiar with the string length limits in CDISC-compliant datasets, such as

  • #1: variable names: <= 8 characters
  • #2: variable labels: <= 40 characters
  • #3: data set labels: <= 40 characters
  • #4: data value of a single variable: <= 200 characters

These limits come from the SAS XPORT transport file format (v5). Other rules are less explicit than rules #1-4, for example,

  • #5: For domain LB, data value of variable LBTESTCD: <= 8 characters
  • #6: For domain LB, data value of variable LBTEST: <= 40 characters
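
As a rough illustration, rules #1-4 can be checked against the SAS dictionary tables. This is a minimal sketch, not a full compliance check: it assumes the datasets to inspect sit in the WORK library (a hypothetical choice; substitute your own libref).

```sas
/* Assumption: datasets to check are in WORK; change the libref as needed. */
proc sql;
    /* Rules #1, #2, #4: variable name > 8, label > 40, or character length > 200 */
    select memname, name, label, length
        from dictionary.columns
        where libname = 'WORK'
          and (length(name) > 8
               or length(label) > 40
               or (type = 'char' and length > 200));

    /* Rule #3: dataset label > 40 characters */
    select memname, memlabel
        from dictionary.tables
        where libname = 'WORK'
          and length(memlabel) > 40;
quit;
```

Any row returned by either query points to a name, label, or variable length that would not fit the v5 transport limits listed above.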

Rules #5-6 are much stricter than rule #4, where 200 characters is the maximum. In fact, the SDTM Implementation Guide states the rationale clearly (“4.1.5.3.1 TEST NAME (--TEST) GREATER THAN 40 CHARACTERS”):

Since the --TEST variable is meant to serve as a label for a --TESTCD when a Findings dataset is transposed to a more horizontal format, the length of --TEST is normally limited to 40 characters to conform to the limitations of the SAS V5 Transport format currently used for submission datasets.

The following code illustrates the point: rules #5-6 exist so that the transposed dataset still satisfies rules #1-2.

data lb;
    input usubjid $ lbtestcd $  lbtest $ lborres visitnum;
datalines;
001 AA aaaaa 12 1
001 BB bbbbb 20 1
001 CC ccccc 3  1
001 AA aaaaa 4  2
001 BB bbbbb 40 2
001 CC ccccc 35 2
;

/* LBTESTCD values become variable names; LBTEST values become their labels */
proc transpose data=lb out=lb2 (drop=_name_);
    by usubjid visitnum;
    var lborres;
    id lbtestcd;
    idlabel lbtest;
run;
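
To see the effect, one could inspect the transposed dataset and screen the source values beforehand. The sketch below assumes the LB and LB2 datasets from the code above exist in WORK.

```sas
/* List the variables of the transposed dataset with their labels. */
proc contents data=lb2 varnum;
run;

/* Flag test codes or test names that would break rules #5-6 before transposing. */
proc sql;
    select distinct lbtestcd, lbtest
        from lb
        where length(lbtestcd) > 8 or length(lbtest) > 40;
quit;
```

In LB2 each LBTESTCD value (AA, BB, CC) has become a variable name and the matching LBTEST value its label, so an 8-character limit on LBTESTCD and a 40-character limit on LBTEST keep the horizontal dataset within rules #1-2.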
