Data Step Programming (from Fraktal SAS Programming): Difference between revisions

From phenixxenia.org

Current revision as of 16 July 2014, 14:07

What is this?

The three-part name "Data Step Programming" is best explained word by word:

  • DATA is the SAS technical term for values operated on.
  • STEP is the SAS conceptual name for a self-contained segment in a segment-wise organized program.
  • PROGRAMMING is the SAS term for coding a scripted (not compiled) algorithm.

Data Step Programming is done using the "Data Step Language" (DSL). The Data Step Language is a fully equipped third-generation language, modelled on IBM Corporation's "PL/1", which was once regarded as a candidate successor to FORTRAN.

What does it do?

Put simply, SAS Data Step Programming processes one "observation" at a time when generating a "dataset". An observation is a line of data, known as a "row" to the RDBMS specialist coding SQL; the concept derives from the punch card of the pioneering age of IT. A dataset, then, is a table made of observations that share a common structure.

Observations are processed in a one-line register called the "Program Data Vector" (PDV).

Generally speaking, each line of DSL code applies some function to the PDV. The altered content of the PDV is then written to the generated dataset, either implicitly or explicitly by means of an "OUTPUT" statement.
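
The difference between implicit and explicit writing can be sketched as follows. This is a minimal illustration, not taken from the original page; the dataset names work.one_row and work.three_rows and the variables x, i and square are made up for the example.

```sas
/* Implicit output: the step has no OUTPUT statement, so the PDV  */
/* content is written once, when the end of the step is reached.  */
data work.one_row;          /* hypothetical dataset name */
  x = 42;
run;

/* Explicit output: as soon as an OUTPUT statement is coded, the  */
/* implicit write is switched off and only OUTPUT writes records. */
data work.three_rows;       /* hypothetical dataset name */
  do i = 1 to 3;
    square = i * i;
    output;                 /* writes the current PDV as one observation */
  end;
run;
```

The first step yields a single observation, the second one observation per loop iteration, because each OUTPUT call copies whatever the PDV holds at that moment.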

The Data Step

A SAS code segment written in DSL is called a Data Step.

  • A Data Step starts with a "DATA" statement, containing up to 32 names for datasets to be created in this step.
  • When reading from an already existing data source in
    • text file format, an "INFILE" statement is used together with a file reference that points to the text file, followed by an "INPUT" statement to code two-dimensional reading;
    • tabular format, a "SET" statement is used together with the concatenated library reference and table name; this initiates looping over the records (or "observations" or "rows") in the data source, applying the DSL-coded algorithm to each record.
  • When all processing is coded, a "RUN" statement is issued to invoke the SAS DSL compiler, which then performs a "Compile and Go" and immediately produces results such as the generated datasets.

Because it is terminated with a RUN statement, each Data Step is also called a "Run Group".
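
The two ways of reading described above might look like the following sketch. The file reference rawtxt, the file name people.txt, the dataset names and the variables name and age are assumptions made for illustration; the text file is assumed to hold one blank-separated name and age per line.

```sas
/* Hypothetical file reference; point it at an existing text file. */
filename rawtxt 'people.txt';

/* Text file format: INFILE names the file reference,             */
/* INPUT describes how the fields of each line are read.          */
data work.people;
  infile rawtxt;
  input name $ age;      /* list input: character field, then numeric field */
run;

/* Tabular format: SET takes library reference and table name,    */
/* joined by a period, and loops over the observations.           */
data work.adults;
  set work.people;
  if age >= 18;          /* subsetting IF: only matching rows are written */
run;
```

Each block from DATA to RUN is compiled and executed on its own, so work.people already exists when the second step reads it.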


Hello World

Not surprisingly, the standard introduction to coding looks like PL/1 code, except for the DATA and RUN statements:

data _NULL_;
 put 'hello world';
run;

Not important here, but quite useful once in a while, is the dataset name "_null_": no dataset is created, but the statements are still executed. Here that is only the PUT statement, which writes its text to the SAS log.
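
As a hedged illustration of where "_null_" becomes useful, the following sketch lists values from an existing table in the log without creating a new dataset. It assumes the sample table sashelp.class, which is delivered with typical SAS installations, is available.

```sas
/* No dataset is created; the step only loops over the input      */
/* table and writes two variables of each observation to the log. */
data _null_;
  set sashelp.class;     /* sample table, assumed to be installed */
  put name= age=;        /* named output: variable name, equals sign, value */
run;
```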


Documented Code Examples

  • Create Dataset
  • Read Text File
  • Process Data
