Wednesday, September 21, 2011

Running DataStage from outside of DataStage

Another good article from Vincent McBurney:

This is a follow-up to comments on my parameter week post on 101 uses of job parameters. This post is about calling DataStage jobs and the range of job control options.

The go-to command for interacting with DataStage from the command line, from scripts or from other products is the dsjob command. The documentation for dsjob is buried in the Server Job Developers Guide; it is cunningly placed there to keep Enterprise users, who would never think to read the Server Edition guide, in a state of perpetual bewilderment.

I was born in a state of bewilderment so I am in my zone. 

I am not going to go into the job control API or mobile device job control; refer to your documentation for those options! I will cover the more commonly used methods.

Sequence Jobs and the DataStage Director
The easiest out of the box job control comes from the DataStage Director product and the Sequence Job. The Sequence job puts jobs in the right order and passes them all a consistent set of job parameters. The DataStage Director runs the Sequence job according to the defined schedule and lets the user set the job parameters at run time.

A lot of additional stages within the Sequence Job provide dynamic parameter setting, after job notification, conditional triggers to control job flow, looping, waiting for files and access to the DataStage BASIC programming language.

Third Party Scheduling and Scripting
DataStage comes with a scheduling tool, the Director. It provides a front end for viewing jobs, running jobs and looking at job log results. Under the covers it adds scheduled jobs to the operating system scheduler. The main advantage of it over a third party scheduling tool is the job run options screen that lets you enter job parameter values when you schedule the job.

In third party scheduling tools you need to set job parameters as you run the job in some type of scripting language. Jobs are executed by scheduling tools using the dsjob command. This command can require a lot of arguments so it is often run via a script or batch file.
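As a rough illustration of why dsjob calls tend to end up inside a wrapper, here is a minimal sketch that assembles the argument list for a `dsjob -run` call. It uses the `-run`, `-mode`, `-param` and `-jobstatus` options of dsjob; the project name, job name and parameter names are made up for the example, and the helper `build_dsjob_command` is hypothetical, not part of any DataStage tooling.

```python
import shlex


def build_dsjob_command(project, job, params=None, mode="NORMAL"):
    """Return the argument list for a `dsjob -run` call."""
    args = ["dsjob", "-run", "-mode", mode]
    # Each job parameter becomes its own "-param name=value" pair.
    for name, value in (params or {}).items():
        args += ["-param", f"{name}={value}"]
    # -jobstatus makes dsjob wait for the job and report the job's
    # finishing status through its exit code.
    args.append("-jobstatus")
    # Project and job name always come last.
    args += [project, job]
    return args


if __name__ == "__main__":
    cmd = build_dsjob_command(
        "dwh_project",         # hypothetical project name
        "seq_load_customers",  # hypothetical Sequence job name
        {"SourceDir": "/data/in", "RunDate": "2011-09-21"},
    )
    # Print a shell-quoted command line that a scheduler could call.
    print(shlex.join(cmd))
```

Building a list rather than one long string keeps parameter values with spaces intact when the list is later handed to something like `subprocess.run`.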

The mother of all DataStage run scripts can be found in this dsxchange thread. Written by Ken Bland and Steve Boyce, it will start jobs, set run time parameters from a parameter ini file, check the status of finished jobs, service your car, solve the Da Vinci code and run an audit process after the job has finished.

This script is run from a scheduling tool to make the setup of the scheduling easier. 
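The "check the status of finished jobs" part of such a script usually comes down to reading the exit code of `dsjob -run -jobstatus`. The sketch below is not the dsxchange script; it is a small stand-in that assumes the commonly documented status values (1 for finished OK, 2 for finished with warnings, 3 for aborted), which you should confirm against the documentation for your version. The helper names are hypothetical.

```python
import subprocess

# Exit codes reported by `dsjob -run -jobstatus`, as commonly documented.
# Assumption: verify these values against your DataStage version.
STATUS_TEXT = {
    1: "finished OK",
    2: "finished with warnings",
    3: "aborted",
}


def job_succeeded(exit_code, allow_warnings=True):
    """Decide whether a dsjob exit code counts as a successful run."""
    if exit_code == 1:
        return True
    if exit_code == 2:
        # Whether warnings are acceptable is a site decision.
        return allow_warnings
    return False


def describe(exit_code):
    """Translate an exit code into text for a log or alert."""
    return STATUS_TEXT.get(exit_code, f"unexpected status {exit_code}")


def run_and_check(command):
    """Run a prepared dsjob argument list and report success or failure."""
    result = subprocess.run(command)
    # By dsjob convention the job name is the last argument.
    print(f"{command[-1]}: {describe(result.returncode)}")
    return job_succeeded(result.returncode)
```

Note that a "successful" dsjob run returns a non-zero exit code here, which is the opposite of what most schedulers expect, so the wrapper normally translates the result before handing it back.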

The mother of all job run scripts sets parameters that are saved in an ini file. Parameters can also be saved in a database table, with a job extracting the settings to an ini file before a batch run. 
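To make the ini file idea concrete, here is a minimal sketch of reading `name=value` lines into a dictionary. The file layout is an assumption: one parameter per line, with blank lines, `#` comments and lines without an `=` sign ignored. The real script on dsxchange may use a different format, and the parameter names are invented.

```python
import io


def read_parameter_file(lines):
    """Parse name=value lines into a dict of job parameters."""
    params = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments and anything that is not name=value
        # (for example a [section] header).
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


if __name__ == "__main__":
    sample = io.StringIO(
        "# hypothetical parameter file\n"
        "SourceDir = /data/in\n"
        "RunDate=2011-09-21\n"
    )
    for name, value in read_parameter_file(sample).items():
        print(f"-param {name}={value}")
```

The same function works on an open file handle, so a job that dumps a database table to this format can feed it directly.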

They can also be stored as environment parameters in a user's .profile file. These environment parameters can be passed into the job via a script or they can be accessed directly in the job by adding environment job parameters and setting the value to the magic word $ENV.
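For the first route, where a script passes environment values into the job, a wrapper might do something like the sketch below. The variable names are hypothetical, and the helper only builds `-param` arguments; the $ENV route needs no script at all because DataStage resolves the value itself.

```python
import os


def params_from_environment(names, environ=None):
    """Build dsjob -param arguments from environment variables."""
    environ = os.environ if environ is None else environ
    args = []
    missing = []
    for name in names:
        if name in environ:
            args += ["-param", f"{name}={environ[name]}"]
        else:
            missing.append(name)
    # Fail early rather than start a job with half its settings.
    if missing:
        raise KeyError("not set in environment: " + ", ".join(missing))
    return args


if __name__ == "__main__":
    # Hypothetical settings that a .profile would normally export.
    fake_env = {"DW_SOURCE_DIR": "/data/in", "DW_DB_NAME": "dwprod"}
    print(params_from_environment(["DW_SOURCE_DIR", "DW_DB_NAME"], fake_env))
```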

They can also be stored as project specific environment parameters as we saw during the exhilarating job parameter week, where we brought job parameters to life and struggled to come up with a good theme motto. These job parameters are much like environment parameters but use the magic word $PROJDEF.

Job Control Code and Old School DataStage
Old school DataStage programmers, those who know who Ardent are and remember the days when you only needed one Developer Guide, will be accomplished at writing job control code. This uses a BASIC programming language based on the UniVerse database code to prepare, execute and audit jobs.

The DataStage BASIC language has better access to jobs than operating system scripts. While an external script has to do everything through the dsjob and dsadmin commands, the BASIC language has access to a much larger number of DataStage commands. Like dsjob, these commands are cunningly hidden in the Server Job Developers Guide.

Before the days of sequence jobs (DataStage 5?), and before sequence jobs became quite useful in version 7.5, this job control code was far more prevalent and easier to code than job control in external scripts. It was extremely useful for putting jobs in the right sequence, retrieving job parameters from files, checking the results of jobs and shelling out to execute operating system commands.

Job control code is still widely used even when external scripts or sequence jobs are in use. It fills gaps in functionality by providing job auditing, setting dynamically calculated parameter values, checking for files and so on. It is a very powerful language.

Also from the dsxchange forum we can find examples of job control code. This time from Arnd:
* Routine GetJobParameter(ParameterName); the result is returned in Ans
EQUATE ProgramName TO 'GetJobParameter'
* Open the name=value parameter file for sequential reading
OPENSEQ 'ParameterFile' TO InFilePtr ELSE CALL DSLogFatal('Oh No, cannot open file',ProgramName)
Finished = 0
Ans = ''
* READSEQ reads one line; the ELSE clause fires at end of file
READSEQ InRecord FROM InFilePtr ELSE Finished = 1
LOOP UNTIL Finished
   FileParameterName = TRIM(FIELD(InRecord,'=',1))
   * Take fields 2 onwards so a value containing '=' stays intact
   FileParameterValue = TRIM(FIELD(InRecord,'=',2,99))
   IF (FileParameterName = ParameterName) THEN
      Finished = 1
      Ans = FileParameterValue
   END
   READSEQ InRecord FROM InFilePtr ELSE Finished = 1
REPEAT
CLOSESEQ InFilePtr
* Compare with '' so that a legitimate value of 0 is not treated as missing
IF Ans = '' THEN CALL DSLogFatal('Could not find value for "':ParameterName:'".',ProgramName)


What are you comfortable with?
People from a Unix background are most comfortable with Unix scheduling tools, .profile environment parameters and running and auditing of jobs from within Unix scripts using the dsjob command.

People from database backgrounds like to have parameters in database tables and may even put an entire job schedule into a table with dependencies and sequencing. They need a bridge between the database and DataStage so they still need a layer of either Unix scripts or job control code to run the jobs.
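A schedule table with dependencies still has to be turned into a run order by that bridging layer. The sketch below shows one way to do it, using an in-memory SQLite table as a stand-in for the real database and Python's `graphlib` (3.9 or later) for the ordering. The table name, column names and job names are all hypothetical.

```python
import sqlite3
from graphlib import TopologicalSorter


def load_schedule(conn):
    """Read (job, prerequisite) rows into a dependency graph."""
    graph = {}
    rows = conn.execute("SELECT job_name, depends_on FROM job_schedule")
    for job, depends_on in rows:
        graph.setdefault(job, set())
        if depends_on is not None:
            graph[job].add(depends_on)
            # Make sure the prerequisite is itself a node in the graph.
            graph.setdefault(depends_on, set())
    return graph


def run_order(graph):
    """Return the jobs with every prerequisite ahead of its dependants."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_schedule (job_name TEXT, depends_on TEXT)")
    conn.executemany(
        "INSERT INTO job_schedule VALUES (?, ?)",
        [
            ("extract_customers", None),
            ("load_customers", "extract_customers"),
            ("build_report", "load_customers"),
        ],
    )
    for job in run_order(load_schedule(conn)):
        # A real bridge would call dsjob here instead of printing.
        print(job)
```

A circular dependency in the table makes `static_order` raise `graphlib.CycleError`, which is a useful check to run before the batch starts.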

People from programming backgrounds will be very comfortable with the DataStage BASIC programming language and find it can do just about anything regarding the starting, stopping and auditing of jobs. They can retrieve settings and parameters from files or databases.

The method I currently prefer is Sequence Jobs for all job dependencies; project specific environment variables for most slowly changing job parameters; some job control routines for job auditing, dynamic parameters and external operating system commands; and a dsjob script for starting Sequence Jobs from a third party scheduling tool or from the command line.

What I like about project specific environment parameters is that the job can be called up from anywhere without requiring any parameter settings. It can be called up from within the Designer by developers, from ad hoc testing scripts by testers and from third party scheduling tools in production.
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
