
XML and SAS

Last month I gave a talk, "XML: the SAS Approach", at CDISC Interchange China 2010 (held at the Medical School of Fudan University, Shanghai, 2010-09-15). The FDA favors CDISC and HL7, two XML-based standards, so SAS programmers in the biopharmaceutical industry need to add XML technology to their toolboxes. Fortunately, you don't need to be an XML expert to work with XML day to day: the SAS system offers several tools and applications for handling XML files, i.e. importing and exporting XML data:

  • SAS data step approach: import and export
  • SAS XML Libname engine: import and export
  • SAS ODS XML statement (ODS MARKUP): export
  • PROC CDISC: import and export
  • SAS XML Mapper: import
  • SAS CDISC Viewer: import, in effect

The SAS CDISC Viewer and PROC CDISC are little more than toys; the rest are genuinely useful. The talk also covered a Perl Regular Expression (PRX) approach to exporting and importing XML data.
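For comparison with the data step route, here is a minimal sketch of a round trip through the XML Libname engine. The librefs xmlout and xmlin and the file name class.xml are hypothetical names chosen for illustration, and SASHELP.CLASS is used only because it ships with SAS; treat this as a sketch under those assumptions rather than the code shown in the talk.

```sas
/* Sketch only: xmlout, xmlin and class.xml are hypothetical names. */

/* Export: the XML engine writes the dataset as an XML document. */
libname xmlout xml "class.xml";

data xmlout.class;
    set sashelp.class;
run;

libname xmlout clear;

/* Import: point a second libref at the same file and read it back. */
libname xmlin xml "class.xml";

data work.class_in;
    set xmlin.class;
run;

libname xmlin clear;
```

The member name (class) has to match the element the engine wrote, which is why the same name is used on both sides.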

A simple demo. First, use FILE and PUT statements to generate an XML file:

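data _null_;
    file "export.xml";
    /* root, item and text are placeholder element names */
    put '<?xml version="1.0" encoding="windows-1252" ?>';
    put '<root>';
    put '<item>';
    put '<text> Welcome to CDISC Interchange 2010 China </text>';
    put '</item>';
    put '<item>';
    put '<text> We are in Shanghai! </text>';
    put '</item>';
    put '</root>';
run;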

Then read the whole XML file into a SAS dataset:

data import0;
    /* truncover keeps short lines instead of jumping to the next record */
    infile "export.xml" truncover lrecl = 1024;
    input line $1024.;
    /* drop empty lines */
    if line = '' then delete;
run;

Third, extract the information you want (the text between an element's opening and closing tags) using Perl Regular Expressions:

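data import (keep = line);
     retain queName rx1;
     set import0;

     /* compile both patterns once, on the first observation */
     if _n_ = 1 then do;
            /* use PRX to capture the structure of the XML data;   */
            /* text is a placeholder name for the element to match */
            queName = prxparse('/^<text> /');
            /* substitution that strips any XML tag */
            rx1 = prxparse('s/<.*?>//');
     end;
     queNameN = prxmatch(queName, line);

     /* use PRX to remove the XML tags (-1 replaces every match) */
     if queNameN > 0 then do;
        call prxchange(rx1, -1, line);
        output;
     end;
run;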

The logic of the PRX approach to processing XML data is very simple and can easily be adapted to your needs:

  • extend the PRX patterns to capture the hierarchical structure of the XML data.
  • remove the XML tags and output the information to a SAS dataset.
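The two points above can be sketched in a single data step. This is only an illustration under stated assumptions: the element names root, item and text, the dataset name structure, and the variables depth, tag and value are all hypothetical, and the sample XML is supplied inline through DATALINES so the step runs on its own. It tracks nesting depth with one pattern for opening tags and one for closing tags, and uses capture buffers (PRXPOSN) to pull out the element name and its text.

```sas
/* Sketch only: element names and variable names are hypothetical. */
data structure (keep = depth tag value);
    length tag $32 value $200;
    retain rxLeaf rxOpen rxClose;
    retain depth 0;
    infile datalines truncover;
    input line $200.;

    /* compile the patterns once; \s*$ tolerates SAS trailing blanks */
    if _n_ = 1 then do;
        rxLeaf  = prxparse('/^<(\w+)>\s*(.*?)\s*<\/\1>\s*$/'); /* <tag>text</tag>  */
        rxOpen  = prxparse('/^<(\w+)>\s*$/');                  /* opening tag only */
        rxClose = prxparse('/^<\/(\w+)>\s*$/');                /* closing tag only */
    end;

    if prxmatch(rxLeaf, line) then do;
        tag   = prxposn(rxLeaf, 1, line);  /* element name           */
        value = prxposn(rxLeaf, 2, line);  /* text between the tags  */
        output;                            /* depth = parent's level */
    end;
    else if prxmatch(rxOpen, line)  then depth = depth + 1;
    else if prxmatch(rxClose, line) then depth = depth - 1;
    datalines;
<root>
<item>
<text> Welcome to CDISC Interchange 2010 China </text>
</item>
<item>
<text> We are in Shanghai! </text>
</item>
</root>
;
run;
```

With this sample input the step should write one row per text element, each carrying the depth of its enclosing item. Real XML with attributes, elements spanning several lines, or several elements per line would need more elaborate patterns, or one of the dedicated tools listed earlier.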