Process Data using DSL (from Fraktal SAS Programming): Difference between revisions

Revision as of 14 July 2014, 16:22

Contents

1 General Remarks on table-formatted Data Repositories
2 Table-to-Table
3 Table-to-two-Table
4 Results

General Remarks on table-formatted Data Repositories

In present-day data processing, data are usually kept in databases managed by server software, which is covered elsewhere in these guidelines. Hence, the most common scenario SAS coders meet is working on data that are already in tabular format. Such tables are a younger concept than, but closely resemble, the proprietary table format SAS uses, known as SAS datasets. Introduced in 1990, the SAS Multiple Engine Architecture provides a steadily growing number of ways to integrate data tables managed by third-party database servers seamlessly. To the SAS programmer these are known as engines; they mainly provide read/write access to most data sources used in professional IT.

See chapter DBMS Interaction for technically more detailed information on this topic.

As a result of the aforementioned situation, we put no emphasis on data server technology or vendor when discussing the processing of tabular data sources. Everything behind a library reference is assumed to have a compatible structure and is therefore treated equally throughout these guidelines.
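To illustrate the point about library references, here is a minimal sketch of two librefs backed by different engines. The libref names, the path, the ODBC data source name and the schema are all hypothetical placeholders, and the second statement assumes that SAS/ACCESS Interface to ODBC is licensed and configured at your site.

```sas
/* Hypothetical library references: path, DSN and schema are placeholders */
libname local "/path/to/sasdata";                   /* default Base SAS engine     */
libname dwh odbc dsn="warehouse" schema=sales;      /* SAS/ACCESS Interface to ODBC */

/* The data step reads and writes in the same way regardless of the engine
   behind each libref */
data local.hotels_2;
  set dwh.hotels;
run;
```

Once the librefs are assigned, the data step itself contains nothing that depends on the server technology or vendor.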

Welcome to record-wise data processing!

Table-to-Table

Code executed	Function performed
data hotels_2;	Generate a new data table hotels_2
set hotels;	Use SET statement to read lines from data table hotels
run;	End data step run-group

This minimal table-to-table data step uses a SET statement to loop over the observations (rows) of the source dataset, processing each one and writing the result to the target dataset. No OUTPUT statement is needed here, since the data step writes the current observation automatically at the end of each iteration unless an explicit OUTPUT statement is used.
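For comparison, the same step can be written with the output made explicit. This is only a sketch to show what the automatic behaviour amounts to; it is not needed in practice.

```sas
data hotels_2;
  set hotels;
  output;   /* explicit OUTPUT: takes the place of the automatic output
               at the end of each iteration */
run;
```

Keep in mind that as soon as a data step contains any explicit OUTPUT statement, the automatic output is switched off for the whole step, so observations are written only where OUTPUT is actually executed.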

Since no processing is performed here, dataset hotels_2 is a perfect copy of dataset hotels. To copy whole datasets, SAS provides other means such as PROC COPY; generating copies line by line is the least efficient way to do this.
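A minimal sketch of the PROC COPY alternative follows. It assumes that hotels lives in the WORK library and that a second library is available as the target; the libref backup and its path are hypothetical. PROC COPY keeps the member name, so the copy arrives as backup.hotels rather than as hotels_2.

```sas
/* Assumption: "backup" is a hypothetical target library; the path is a placeholder */
libname backup "/path/to/backup";

proc copy in=work out=backup memtype=data;
  select hotels;   /* copy only this member instead of the whole library */
run;
```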

Table-to-two-Table

Code executed	Function performed
data random_lo random_hi;	Generate new data tables random_lo and random_hi
set random;	Use SET statement to read lines from data table random
select (group); when('A','B','C','D') do;	Branch over values of group to partition random based on uniformly distributed noise
if noise < 0.5 then output random_lo; else output random_hi;	Write lower interval of uniformly distributed noise to random_lo and upper interval to random_hi
end;	Terminate uniform noise branch
otherwise do;	Branch over remaining values of group to partition random based on normally distributed noise
if noise < 0 then output random_lo; else output random_hi;	Write lower interval of normally distributed noise to random_lo and upper interval to random_hi
end;	Terminate normal noise branch
end;	Terminate branching over group
run;	End data step run-group

This table-to-two-table data step uses a SET statement to loop over the observations of the source dataset random and routes each one to exactly one of the two target datasets. Explicit OUTPUT statements are required here: they name the target dataset, and once a data step contains an explicit OUTPUT statement, the automatic output at the end of each iteration no longer takes place.

The SELECT group chooses the threshold by the value of group: observations in groups A to D, which carry uniformly distributed noise, are split at 0.5, and all remaining observations, which carry normally distributed noise, are split at 0. Every observation from random therefore ends up in either random_lo or random_hi.
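Assembled into one program, the step from the table above might look like the sketch below. The first data step is not part of the original text: it builds a hypothetical random dataset for trying the code out, and it assumes that groups A to D carry uniform noise on (0,1) while the remaining groups carry standard normal noise. The seed, the row count and the group labels E to H are arbitrary choices.

```sas
/* Hypothetical test data (assumption: not part of the original guidelines) */
data random;
  call streaminit(2014);                 /* arbitrary seed for reproducibility */
  length group $1;
  do i = 1 to 1000;
    /* pick one of eight group labels at random */
    group = substr('ABCDEFGH', ceil(8 * rand('uniform')), 1);
    if group in ('A','B','C','D') then noise = rand('uniform');
    else noise = rand('normal');
    output;
  end;
  drop i;
run;

/* The table-to-two-table step as listed in the table above */
data random_lo random_hi;
  set random;
  select (group);
    when ('A','B','C','D') do;           /* uniform noise: split at 0.5 */
      if noise < 0.5 then output random_lo;
      else output random_hi;
    end;
    otherwise do;                        /* normal noise: split at 0 */
      if noise < 0 then output random_lo;
      else output random_hi;
    end;
  end;
run;
```

Because every branch ends in an OUTPUT statement, the observation counts of random_lo and random_hi add up to the count of random.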

Results

@@ Line 64: / Line 64: @@
 |
   select (group);
-  when('A','B','C','D')
+  when('A','B','C','D') do;
- else output random_hi;
+| '''Branch over values of ''group'' to partition ''random'' based on uniformly distributes noise'''
-| '''Write lower interval of equally distributed noise to ''random_lo'' and vice versa'''
 |-
 |
@@ Line 73: / Line 72: @@
   else output random_hi;
 | '''Write lower interval of equally distributed noise to ''random_lo'' and vice versa'''
+|-
+|
+ end;
+| '''Terminate ''uniform noise'' branch'''
+|-
+|
+ otherwise do;
+| '''Branch over remaining values of ''group'' to partition ''random'' based on normally distributes noise'''
 |-
 |
@@ Line 79: / Line 86: @@
   else output random_hi;
 | '''Write lower interval of normal distributed noise to ''random_lo'' and vice versa'''
+|-
+|
+ end;
+| '''Terminate ''normal noise'' branch'''
+|-
+|
+ end;
+| '''Terminate branching over ''group'''''
 |-
 |
