
Face Off: Review OpenCDISC XML files

OpenCDISC, the first open source CDISC validator, is already in the toolbox of FDA reviewers (CDER/CBER; see CDISC Standards in the Regulatory Submission Process, 26 January 2012, p. 33). The key feature of OpenCDISC is a dichotomy between validation rules (XML based) and application logic. Currently OpenCDISC Validator (version 1.2.1) officially supports the following four CDISC modules:

You can get the corresponding configuration files (validation rules) online or in the software folder (in ..opencdisc-validatorconfig, with the extension .xml). Since SDTM 3.1.2 has the richest set of validation rules, drawn from Janus, WebSDM and of course OpenCDISC's own additional rules, its configuration file (config-sdtm-3.1.2.xml) deserves more attention. A better understanding of config-sdtm-3.1.2.xml is the first step toward customizing the software to business needs. The following are some personal tips and tricks for playing with, and even “torturing”, the file, using Notepad++, web browsers (IE and Firefox), Excel with MSXML, and SAS XML Mapper.

1. DON’T use the Windows default Notepad to open and edit the XML file


The reason:

if you open an XML file in Notepad, you get almost nothing but strings and more strings.

For another supporting reason, see the picture below.

2. USE Notepad++ or other REAL text editors to open and edit it


Notepad++ makes the difference. It supports a multi-tab view, XML syntax highlighting, XML tag matching and other fancy stuff never found in plain Notepad. And like OpenCDISC, it’s free, in the sense of both free beer and free speech.

Other real text editors include Vim, UltraEdit and the like, but for most users I still think Notepad++ is the handiest.

3. First, use a web browser to review it


This is the web view of config-sdtm-3.1.2.xml. The secret is a style file, define-1.0.xsl, in ..opencdisc-validatorconfigschematron. This is another story of dichotomy: the config-sdtm-3.1.2.xml file itself only stores metadata (machine-readable), while the style file (also an XML file) tells the viewer how to display it (human-readable). Web browsers with suitable built-in support can render it (I tested IE and Firefox; Google Chrome doesn’t work). Excel can also render this XML file well (tested only on Excel 2010 and 2007), though the web view is much better.
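To see how an XML file points at its style file, you can list its xml-stylesheet processing instructions. The following is only a minimal Python sketch using the standard library; the SAMPLE text and its href value are stand-ins I made up for illustration, not the actual content of config-sdtm-3.1.2.xml.

```python
from xml.dom import minidom

# Hypothetical stand-in for the top of an XML file; the href is an assumption.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="schematron/define-1.0.xsl"?>
<ODM><Study OID="demo"/></ODM>"""


def stylesheet_instructions(xml_text):
    """Return the data string of every xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


if __name__ == "__main__":
    for data in stylesheet_instructions(SAMPLE):
        print(data)
```

For a real file, read its text first and pass it to stylesheet_instructions; an empty list means the file does not name a stylesheet at the document level.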


4. The real awesome job: use Microsoft XML parser or other XML parsers to dig into XML structure


I use Excel 2010 with the Microsoft XML parser (MSXML 6.0). You can check your MSXML version by visiting this website in IE; other web browsers will give different results, because Firefox and Chrome use other parsers.

You can also get an instance of each XML tag.
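If Excel and MSXML are not at hand, a similar tag-level view can be approximated with any XML parser. Here is a rough Python sketch that counts each tag and keeps the attributes of its first instance. The SAMPLE document is a small illustration I wrote using ODM-style element names; it is not taken from the OpenCDISC configuration file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative sample only; not the real config-sdtm-3.1.2.xml content.
SAMPLE = """<ODM>
  <Study OID="demo">
    <ItemGroupDef OID="IG.DM" Name="DM"/>
    <ItemGroupDef OID="IG.AE" Name="AE"/>
    <ItemDef OID="IT.USUBJID" Name="USUBJID"/>
  </Study>
</ODM>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def tag_inventory(xml_text):
    """Count every tag and remember the attributes of its first instance."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    first_seen = {}
    for elem in root.iter():  # document order, root included
        name = local_name(elem.tag)
        counts[name] += 1
        first_seen.setdefault(name, dict(elem.attrib))
    return counts, first_seen


if __name__ == "__main__":
    counts, first_seen = tag_inventory(SAMPLE)
    for name, count in counts.items():
        print(name, count, first_seen[name])
```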


5. The real awesome job: use SAS XML Mapper to get the tabulation view

You may also want to extract all the tables in the XML file in a tabulation view, ideally as SAS datasets:

For example, the first few rows in config-sdtm-3.1.2.xml:


and the corresponding SAS dataset:


Actually, you can put all the data in the XML file into one big dataset, but with lots of redundancy. To use SAS XML Mapper (the latest version is 9.3), you need to design a mapping file that describes the structure of the XML file. For a simple ODM dataset, you specify the table name and, for each column, its name, path, type and length:


It is never fun to play with XML files. SAS XML Mapper is supposed to read CDISC ODM based XML files automatically (OpenCDISC XML files are said to be ODM compliant), but at least for this config-sdtm-3.1.2.xml it failed, which is why we have to create a mapping file (see above) ourselves. Fortunately you don’t need to write it from scratch (it would be thousands of lines of code):

  • find a CDISC ODM based XML file that SAS XML Mapper can read automatically; e.g., at http://www.cdisc.org/define-xml, a file named define-example1.xml works well.
  • use the AutoMap function in SAS XML Mapper to generate the mapping file.
  • modify the mapping file to fit your needs.
  • for details, refer to the SAS XML mapping syntax.
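The idea behind a mapping file (a table name, plus a name, path, type and length for each column) can be imitated outside SAS. The sketch below is a Python analogy, not SAS XMLMap syntax: the MAPPING dictionary, its keys and the SAMPLE document are all hypothetical names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; not the real config-sdtm-3.1.2.xml content.
SAMPLE = """<ODM>
  <Study OID="demo">
    <ItemGroupDef OID="IG.DM" Name="DM"/>
    <ItemGroupDef OID="IG.AE" Name="AE"/>
  </Study>
</ODM>"""

# Hypothetical mapping: one table, one row per matched element,
# each column given as (column name, source attribute, type, max length).
MAPPING = {
    "table": "ItemGroupDef",
    "row_path": ".//ItemGroupDef",
    "columns": [
        ("OID", "OID", str, 32),
        ("Name", "Name", str, 8),
    ],
}


def tabulate(xml_text, mapping):
    """Turn the elements matched by row_path into a list of row dictionaries."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.findall(mapping["row_path"]):
        row = {}
        for column, attribute, cast, length in mapping["columns"]:
            raw = elem.get(attribute)
            if raw is None:
                row[column] = None  # attribute missing on this element
                continue
            value = cast(raw)
            # Truncate character values to the declared length.
            row[column] = value[:length] if isinstance(value, str) else value
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in tabulate(SAMPLE, MAPPING):
        print(row)
```

Each entry in MAPPING["columns"] plays roughly the role of one column definition in a mapping file; a real XMLMap carries more detail, so treat this only as a way to picture the table, column and path relationship.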

6. Final Notes for Excel

Right-click config-sdtm-3.1.2.xml, then open with “Microsoft Excel”:


Option 2 leads to the web view in section 3. If you go with option 1:


Options 1-1 and 1-2: the tabulation view in section 5.

Option 1-3: the tag view in section 4.