Fraktal SAS Programming

From phenixxenia.org


Preface

The SAS System (SAS) is an impressively powerful ecosystem of languages, tools and programs that puts every means at the user’s disposal to work with data and satisfy their curiosity, whether it is of scientific origin or simply driven by work orders in a top-down organization.

Given the above, it is not surprising that

  1. SAS license fees appear high, and
  2. the individual trying to start a user career feels pretty lonesome.

Just as no one would buy a modern smartphone simply to make phone calls, it is inappropriate to use SAS solely as a

  • SQL database system
  • basket of tabulation programs
  • graphics toolbox
  • web publishing agent
  • data-warehouse platform
  • statistics package
  • metadata manager
  • source code generator

Indeed, SAS can perform any of these functions and more; even worse, a small team of SAS geeks can deliver any combination of them as a scenario-tailored application in an awesomely short time frame.

Of course, the result will be a dynamic, self-documenting, metadata-driven and generic sort of thing.

That’s why SAS beginners feel lonesome, and hence mature users have organized themselves in non-commercial networks worldwide, the largest of which is PhUSE, the Pharmaceutical Users Software Exchange.

Are you ready?

Welcome to the club!

Coding

Rules?

While there is no technical reason to introduce and follow coding rules and typographical conventions, doing so has proven helpful, depending on the working context and the purpose pursued.

“SAS is freedom” is good news for most ad-hoc programmers who want results the same minute.

“SAS is freedom” is bad news for all team leads and managers who bear responsibility for the sustainable use of resources and for maintaining programs written by individuals who will most likely leave some day.

Throughout this tutorial we will therefore adhere to a set of rules that might seem superfluous at first sight but help the reader grasp the structure and process implemented in a program without deep-diving into the code.

Standards!

SAS supports modular coding very well because code processing follows a block or “group” structure, as the architects at SAS Institute Inc. would put it. Let’s jump directly into this topic:

data basix;
city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
city='Berlin'; lat="052° 031′ N"; long="013° 024′ E"; output;
city='Tokyo'; lat="035° 041′ N"; long="139° 046′ E"; output;
proc sort; by lat;
proc print; run;

This appears to be an easy-to-read, straightforwardly written program, and it definitely is. Indeed, this code completes without error messages and produces a formatted list of three cities along with their latitude and longitude.

But this is not the program that is processed by SAS.

What does SAS see?

Groups

The SAS compiler processes submitted source code in so-called steps, which in turn are composed of groups of code terminated by semicolons. If users do not code complete steps, SAS completes the code to a certain extent; for example, a new DATA or PROC statement implicitly ends the preceding step, just as a RUN statement would.

Code terminated by a semicolon is called a statement. A single line may hold several statements, as the DATA step above shows.

Steps composed of statements like the ones above are called run groups.

Logically, the code submitted in the example above is transformed into three run groups that are executed as discrete steps. Syntax checking and feedback to the user in the LOG are handled separately for each step.

/* run group 1: the DATA step, now ended by an explicit RUN */
data basix;
city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
city='Berlin'; lat="052° 031′ N"; long="013° 024′ E"; output;
city='Tokyo'; lat="035° 041′ N"; long="139° 046′ E"; output;
run;
/* run group 2: DATA= and OUT= spell out the defaults (last data set, sorted in place) */
proc sort data=basix out=basix; 
by lat;
run;
/* run group 3 */
proc print data=basix; 
run;

Segments

As mentioned earlier, a workflow coded in SAS is processed as a sequence of blocks or groups. Since this processing structure is used everywhere in SAS, we will refer to these blocks and groups as segments throughout the remainder of this text.

Because of the various languages available inside SAS, particular segments may have a very distinctive appearance. The run group from the example above is merely one of them.

Segments written in different syntaxes may be hierarchically nested.

Segments may not intersect, with one exception.
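As a minimal sketch of such nesting (the macro name THREESTEPS and the data sets STEP1 to STEP3 are made up for illustration), a MACRO segment can enclose a %DO segment, which in turn encloses a DATA step run group:

/* MACRO segment: %MACRO ... %MEND */
%MACRO threesteps;
  %LOCAL n;
  /* %DO segment nested inside the MACRO segment */
  %DO n = 1 %TO 3;
    /* run group nested inside the %DO segment: creates STEP1, STEP2, STEP3 */
    data step&n.;
      x = &n.;
    run;
  %END;
%MEND threesteps;
%THREESTEPS;

Each segment is closed before the segment that encloses it: RUN before %END, %END before %MEND.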

Macro

Straightforward Coding

Because it is the most prominent segment type in the lifetime production of a professional senior SAS programmer (the œuvre), we will describe SAS Macro ("MACRO") coding as the first segment type.

As we remember from the run group example, code segments are enclosed verbatim between a specific start string and a matching termination string. MACRO definitions are delimited by these two specific statements, after which the MACRO is invoked by its name:

%MACRO name;
  /* program code */
%MEND name;
%NAME;  /* invocation: macro names are not case-sensitive */
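Filled with real program code, the skeleton might look like the following sketch; the macro name PRINTCITIES is hypothetical, and the data set BASIX from the example above is assumed to exist:

%MACRO printcities;
  /* the two run groups from the example, now wrapped in a MACRO segment */
  proc sort data=basix out=basix;
    by lat;
  run;
  proc print data=basix;
  run;
%MEND printcities;
%PRINTCITIES;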

Generalized Approach

It is worth stressing here that a MACRO does not execute the program code it contains; it passes that code to the SAS compiler, which by default performs a Compile-and-Go step. Nevertheless, it would be premature to assume that this mechanism requires the code to be SAS code.
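The system option MPRINT makes this hand-over visible: it writes the code generated by a MACRO to the LOG before the compiler processes it. A minimal sketch, with the hypothetical macro SHOWGEN and the data set BASIX assumed to exist:

options mprint;              /* echo MACRO-generated code to the LOG */
%MACRO showgen(ds);
  /* this text is generated, not executed, by the MACRO */
  proc print data=&ds.;
  run;
%MEND showgen;
%SHOWGEN(basix);

The LOG then shows the generated PROC PRINT statements marked with MPRINT(SHOWGEN), followed by the notes of the step the compiler actually ran.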

Instead, it is possible to GENERATE-and-PASS any code.

SAS provides means and concepts to direct generated source code to appropriate agents, be it external programs or the operating system itself. OS functions may be called explicitly or implicitly, or code may be written to a text file that is executed later.
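Writing code to a text file and executing it later can be sketched as follows; the fileref GENCODE is an arbitrary name, the data set BASIX is assumed to exist, and the generated lines are simple %PUT statements chosen for illustration:

filename gencode temp;               /* temporary text file for the generated code */
data _null_;
  file gencode;
  set basix;
  length stmt $200;
  /* single quotes keep %put from being resolved now: it is only written as text */
  stmt = catx(' ', '%put NOTE:', city, 'lies at', lat, long, ';');
  put stmt;
run;
/* execute the generated code; SOURCE2 echoes the included lines to the LOG */
%include gencode / source2;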

Of the numerous options, the following two may prove quite useful.

Utilize OS Functions

1. Access results from OS commands as a data source.

filename myfref pipe "dir c:\ /d";

This statement assigns a file reference of device type PIPE. A pipe dynamically accesses the output of an OS command as a data stream that can be used as a text input file inside a DATA step.
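Reading such a pipe works like reading any text file. A minimal sketch for Windows; the data set name DIRLIST is arbitrary, and it is assumed that the session permits OS commands (the XCMD system option), which some installations disable:

filename myfref pipe "dir c:\ /d";
data dirlist;
  infile myfref truncover;          /* each line written by DIR becomes one record */
  input line $char200.;             /* keep the raw text, including leading blanks */
run;
proc print data=dirlist;
run;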

2. Perform an operation at the OS level.

systask command "mkdir c:\&MYDIR.";  /* double quotes let &MYDIR. resolve before the command reaches the OS */

The SYSTASK statement is a powerful means to initiate and control background tasks. With its WAIT/NOWAIT options it makes direct use of OS multitasking, for example by starting parts of complex SAS code as background tasks.
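The following sketch starts two batch SAS jobs in parallel and waits for both. The job files under c:\jobs are hypothetical, and it is assumed that the sas executable can be found via the PATH and that XCMD is in effect:

/* hypothetical job files; "sas" is assumed to be on the PATH */
systask command "sas -sysin c:\jobs\part1.sas" nowait taskname=part1 status=rc1;
systask command "sas -sysin c:\jobs\part2.sas" nowait taskname=part2 status=rc2;
/* block until both background tasks have finished */
waitfor _all_ part1 part2;
/* STATUS= stores each task's return code in a macro variable */
%put NOTE: part1 returned &rc1., part2 returned &rc2.;

With NOWAIT both jobs run concurrently; WAITFOR turns them back into a synchronization point before the main program continues.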

Write Vector Graphics

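A DATA _NULL_ step with FILE and PUT statements can write any text format; the following step writes a minimal SVG document:

/* &MYPTH. and &MYGPH. are assumed to hold the target folder and file name */
filename _xml_ "&MYPTH.\&MYGPH..svg";
data _null_;
file _xml_;
put '<?xml version="1.0" standalone="no"?>';
put '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">';
/* the slash (/) moves the output pointer to the next line of the file */
put '<svg xmlns="http://www.w3.org/2000/svg" version="1.1"'
  / ' width="29.7cm" height="21cm"'
  / ' viewBox="-200 -100 1200 800">'
  / ' <desc> Example anim01 - demonstrate animation elements </desc>'
  / ' <title> SpotGrid </title>';
put ' <rect id="OuterBorder" x="-4" y="-4" width="904" height="604" fill="rgb(255,255,255)" stroke="rgb(0,0,0)" stroke-width="8">'
  / ' </rect>';
put '</svg>';
run;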
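Because PUT can write any text, the same technique turns SAS data into graphics. A minimal sketch, assuming a hypothetical data set SPOTS with circle centres X, Y and radius R, writes one SVG circle per observation to a temporary file:

/* hypothetical input: circle centres and radii in user units */
data spots;
  input x y r;
  datalines;
100 100 40
450 300 80
800 500 40
;
run;

filename svgout temp;                /* replace TEMP by a path to keep the file */
data _null_;
  file svgout;
  set spots end=last;
  length tag $200;
  if _n_ = 1 then
    put '<svg xmlns="http://www.w3.org/2000/svg" version="1.1" viewBox="0 0 900 600">';
  /* one <circle> element per observation; CATS strips blanks around the numbers */
  tag = cats('<circle cx="', x, '" cy="', y, '" r="', r, '" fill="rgb(200,0,0)"/>');
  put tag;
  if last then put '</svg>';
run;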

Advanced Coding

While the crude approach to MACRO programming is the first choice for any ad-hoc implementation, it will never result in a piece of software that survives in a quality-controlled environment. Moreover, when used as a component in a modularized system, it will not produce predictable results and will very likely mess things up at run time.

Why?

The key reason is that SAS languages, like other programming languages, do use variable properties, but without forcing the programmer to supply this information by declaring everything beforehand in a header section or a similar place.

SAS code is executed regardless of whether an explicit declaration is found. When none is found, SAS applies built-in rules to declare the variable automatically and then operates on that declaration. Properties assigned this way might not conform to the programmer’s expectations or to the system’s design requirements.
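The DATA step shows this mechanism for data set variables. In this minimal sketch (data set names TRUNC and NOTRUNC are illustrative), the first assignment declares the length of CITY when no LENGTH statement is present, so a longer value assigned later is truncated:

data trunc;
  city='Tokyo';      output;   /* compile time: CITY is declared as $5 */
  city='Washington'; output;   /* stored as 'Washi' */
run;

data notrunc;
  length city $20;             /* explicit declaration before first use */
  city='Tokyo';      output;
  city='Washington'; output;   /* stored in full */
run;

In the BASIX example the longest city name happens to come first, which is why no truncation occurs there.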

That’s why!

Symbol Tables

Control information tokens, referred to as parameters, are called macro variables ("variables") when writing MACROs. Macro variables are stored in tables, which have been given the name symbol tables ("tables").

When a SAS process starts, a global symbol table is created and populated with control information used by the session and by non-MACRO programs.

When a MACRO is invoked, a local symbol table is created and kept alive while the MACRO runs. A local symbol table disappears when "its" MACRO terminates.

Symbol tables are two-column, character-type matrices with one single property, the scope, which is either global or local. MACRO variable names are stored in the first column, MACRO variable values in the second column.

MACRO symbol tables are stored in memory.

MACRO functions are processed in memory.
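The contents of both kinds of table can be inspected with %PUT. In this minimal sketch (the names MYPTH, MYGPH and SHOWTABLES are illustrative), %PUT _USER_ lists scope, name and value of every user-defined macro variable:

%let mypth = c:\temp;        /* open code: stored in the global symbol table */
%MACRO showtables;
  %let mygph = spotgrid;     /* found in no table: stored in the local table of SHOWTABLES */
  %put _user_;               /* lists the GLOBAL entry and the SHOWTABLES entry */
%MEND showtables;
%SHOWTABLES;
%put _user_;                 /* the local table is gone: only the GLOBAL entry remains */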

Parameter Scope

Since scope is the only property of MACRO variables, declaration is easy: simply assign each variable used to one of these two scopes.
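Declaration uses the %GLOBAL and %LOCAL statements; parameters listed in the %MACRO statement are always local. A minimal sketch with the hypothetical macro DECLARE:

%MACRO declare(mygph);            /* parameter MYGPH: always in the local table */
  %global mypth;                  /* explicitly assigned to the global table */
  %local i n;                     /* explicitly assigned to the local table */
  %let mypth = c:\temp;
  %let n = 3;
  %do i = 1 %to &n.;
    %put NOTE: &mygph. pass &i. of &n.;
  %end;
%MEND declare;
%DECLARE(spotgrid);
%put NOTE: after the call MYPTH=&mypth.;   /* still available: it is global */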

However, there is a set of rules requiring your attention:

  • A particular variable name may appear in an unlimited number of tables.
  • MACROs may be nested to form unlimited invocation hierarchies.
  • A calling MACRO’s local table appears global to the called MACRO.
  • Read references to variables are resolved first against the local table.
  • Write references are processed likewise: local first, global second.
  • Write references not met in the invocation hierarchy generate a local variable.
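The following sketch (macro names OUTER and INNER are illustrative) shows why these rules matter. INNER uses I without declaring it, so its write reference finds I in the local table of OUTER, which appears global to INNER, and overwrites OUTER’s loop counter:

%MACRO inner;
  /* no %LOCAL I here: the write reference finds I in OUTER's local table */
  %do i = 1 %to 2;
    %put NOTE: inner i=&i.;
  %end;
%MEND inner;

%MACRO outer;
  %local i;
  %do i = 1 %to 3;
    %inner
    %put NOTE: outer i=&i.;   /* I has been changed by INNER */
  %end;
%MEND outer;
%OUTER;

As written, INNER leaves I at 3, so OUTER’s loop ends after its first pass instead of running three times. Adding %LOCAL I; to INNER gives it its own copy and restores the intended behavior.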

Obviously, variable declaration is a critical issue in a validated environment. If it is not done completely, the validation status of the whole system is questionable.

Extending Control

Now, with proper declaration, it is safe to run your MACRO as a component in a validated system. However, it is still difficult to follow its results and to discover failure risks or unwanted behavior.

You might therefore find it useful to add functionality such as:

  1. apply logic to check parameters for appropriate values
  2. navigate through the ecosystem by reading and processing metadata
  3. document the workflow by writing comments to the LOG
  4. inform the persons responsible about the invocation by sending an email
  5. write a text file that contains the plain code the MACRO generated

We will now implement these requirements step by step and thereby touch on relevant parts of the so-called SAS Macro Facility.