Friday, February 3, 2012

Ten Reasons Why You Need DataStage 8.5

Source: it.toolbox.com - Vincent

I have taken a look through the new functions and capabilities of DataStage 8.5 and come up with a top ten list of why you should upgrade to it.
Information Server 8.5 came out a couple of weeks ago and is currently available on IBM Passport Advantage for existing customers and from IBM PartnerWorld for IM partners.  The XML pack described below is available as a separate download from the IBM Fix Central website.
This is a list of the ten best things in DataStage 8.5.  Most of these are improvements to DataStage Parallel jobs only, while a couple of them will help Server job customers as well.

1. It’s Faster

Faster, faster, faster.  Many tasks in DataStage 8.5 are at least 40% faster than in 8.1: starting DataStage, opening a job and running a Parallel job have all improved, and so has runtime performance.

2. It is now an XML ETL Tool

Previous versions of DataStage were mediocre at processing XML.  DataStage 8.5 is a great XML processing tool: it can open, understand and store XML schema files.  I wrote a longer post about just this pack in New Hierarchical Transformer makes DataStage a great XML Tool, and if you have XML files without schemas you can follow a tip at the DataStage Real Time blog: The new XMLPack in 8.5….generating xsd’s….
The new XML read and transform stages are much better at reading large and complex XML files and processing them in parallel:
DataStage 8.5 XML Job

 

3. Transformer Looping

The best Transformer yet.  The DataStage 8.5 parallel Transformer adds new functions for looping inside a Transformer and for performing transformations across a grouping of records.
With looping inside a Transformer you can output multiple rows for each input row.  In this example a record has a company name and four revenue sales figures for four regions – the loop will go through each column and output a row for each value if it is populated:
DataStage 8.5 Transformer Looping

Transformer Remembering
DataStage 8.5 Transformer has remembering and key change detection, something that ETL experts have been manually coding into DataStage for years using some well-known workarounds.  A key change in a DataStage job involves a group of records with a shared key where you want to process that group as a type of array inside the overall recordset.
I am going to write a longer post about that later, but there are two new cache functions inside a Transformer – SaveInputRecord() and GetSavedInputRecord() – that let you save a record and retrieve it later on to compare two or more records inside a Transformer.
There are new system variables and functions for looping and key change detection: @ITERATION, LastRow(), which indicates the last row in a job, and LastRowInGroup(InputColumn), which indicates that a particular column value will change in the next record.
Here is an aggregation example where rows are looped through and an aggregate row is written out when the key changes:
DataStage 8.5 Transformer Aggregation

 

4. Easy to Install

Easier to install and more robust.  DataStage 8.5 has the best installer of any version of DataStage ever.  Mind you – I jumped aboard the DataStage train in version 3.6 so I cannot vouch for earlier installers, but 8.5 has the best wizard, the best prerequisite checking and the best recovery.  It also has the IBM Support Assistant packs for Information Server that make debugging and reporting of PMRs to IBM much easier.  There is also a Guide to Migrating to InfoSphere Information Server 8.5 that explains how to migrate from most earlier versions.
See my earlier blog post Why Information Server 8.5 is Easier to Install than Information Server 8.1.
Patch Merge – that’s right, patch merge.  The new installer can merge patches and fixes into the installation, making them much easier to manage.

 

5. Check In and Check Out Jobs

Check in and check out version control.  DataStage 8.5 Manager comes with direct access to the source control functions of CVS and Rational ClearCase in an Eclipse workspace.  You can send artefacts to the source control system and pull a DataStage component back out of it when you need to restore or replace it.
DataStage 8.5 Check In
DataStage 8.5 comes with out of the box menu integration with CVS and Rational ClearCase but for other source control systems you need to use the Eclipse source control plugins.

 

6. High Availability Easier than ever

High Availability – the version 8.5 installation guide has over thirty pages on Information Server topologies including a bunch of high availability scenarios across all tiers of the product.  On top of that there are new chapters for the high availability of the metadata repository, the services layer and the DataStage engine.
  • Horizontal and vertical scaling and load balancing.
  • Cluster support for WebSphere Application Server.
  • Cluster support for the XMETA repository: DB2 HADR/cluster or Oracle RAC.
  • Improved failover support on the engine.

 

7. New Information Architecture Diagramming Tool

InfoSphere Blueprint Director – DataStage 8.5 comes with a free new product for creating diagrams of an information architecture and linking elements in the diagram directly to DataStage jobs and Metadata Workbench metadata.  Solution architects can draw a diagram of a data integration solution including sources, warehouses and repositories.
DataStage 8.5 Blueprint Director

8. Vertical Pivot

There are people out there who have been campaigning for vertical pivot for a long time – you know who you are!  It is now available: it pivots multiple input rows that share a common key into output rows with multiple columns, with support for key-based groups, columnar pivot and aggregate functions.
You can also do this type of vertical pivoting in the new Transformer using the column change detection and row cache – but the Vertical pivot stage makes it easier as a specialised stage.

9. Z/OS File Stage

The new stage makes it easier to process complex flat files by providing native support for mainframe files: VSAM files (KSDS, ESDS, RRDS), sequential QSAM, BDAM and BSAM files, fixed and variable length records, and single or multiple record type files.


DataStage 8.5 zOS File Stage

10.  Balanced Optimizer Comes Home

In DataStage 8.5 the Balanced Optimizer has been merged into the Designer, and a number of usability improvements turn DataStage into a better ETLT or ELT option.  Balanced Optimizer looks at a normal DataStage job and comes up with a version that pushes some of the steps down onto a source or target database engine, i.e. it balances the load across the ETL engine and the database engines.
Version 8.5 has improved logging, improved impact analysis support and easier management of optimised versions of jobs in terms of creating, deleting, renaming, moving, compiling and deploying them.
DataStage 8.5 Balanced Optimizer

Wednesday, September 21, 2011

UNIX Script to execute DataStage job

From Kenneth Bland:

#!/bin/ksh
#######
################################################################################
#######
####### FILE: MasterControl.ksh
#######
####### DESCRIPTION: Starts a DataStage MasterControl type job passing
#######              all runtime parameter values
#######
#######
####### Date       Version   Developer      Description
####### ---------- --------- -------------- ------------------------------------
####### 2002-05-15 1.0       Ken Bland      Initial release
####### 2002-06-27 2.2       Ken Bland      FileSetDate/ProcessDate modifications
####### 2002-07-10 2.2       Steve Boyce    Added ProcessDate as 14th parameter
####### 2002-08-16 2.3       Steve Boyce    Now calls stored procedure
#######                                     GET_NEW_BATCH_NBR in datastage schema
#######                                     instead of deriving it and inserting
#######                                     here.
#######                                     Uses comSQLPlus.ksh and comPLSQL.ksh
#######                                     instead of SQLPlusStub.ksh.ksh
#######

PROG=`basename ${0}`
EXIT_STATUS=0

NOW=`date`
echo "${NOW} ${PROG} Initialization..."
echo

#######
####### CONFIGURATION ##########################################################
#######

if [ ${#} -ne 14 ]; then
   echo "${NOW} ${PROG} : Invalid parameter list."
   echo "${NOW} ${PROG} : The script needs 14 parameters:"
   echo "${NOW} ${PROG} :    JobName"
   echo "${NOW} ${PROG} :    ParameterFile"
   echo "${NOW} ${PROG} :    FileSetDate (YYYY-MM-DD)"
   echo "${NOW} ${PROG} :    BatchNumber"
   echo "${NOW} ${PROG} :    JobHierarchyFile"
   echo "${NOW} ${PROG} :    SourceSystemList"
   echo "${NOW} ${PROG} :    SubjectAreaList"
   echo "${NOW} ${PROG} :    ClearWorkArea"
   echo "${NOW} ${PROG} :    StartingMilestone"
   echo "${NOW} ${PROG} :    EndingMilestone"
   echo "${NOW} ${PROG} :    DebugMode"
   echo "${NOW} ${PROG} :    JobLinkStatisticChecksFile"
   echo "${NOW} ${PROG} :    ResurrectLogFile"
   echo "${NOW} ${PROG} :    ProcessDate (NULL|YYYY-MM-DD H24:MI:SS)"
   exit 99
fi

JobName="${1}"
ParameterFile="${2}"
FileSetDate="${3}"
BatchNumber="${4}"
JobHierarchyFile="${5}"
SourceSystemList="${6}"
SubjectAreaList="${7}"
ClearWorkArea="${8}"
StartingMilestone="${9}"
EndingMilestone="${10}"
DebugMode="${11}"
JobLinkStatisticChecksFile="${12}"
ResurrectLogFile="${13}"
ProcessDate="${14}"

echo "${NOW} ${PROG} JobName ${JobName}"
echo "${NOW} ${PROG} ParameterFile ${ParameterFile}"
echo "${NOW} ${PROG} FileSetDate ${FileSetDate}"
echo "${NOW} ${PROG} BatchNumber ${BatchNumber}"
echo "${NOW} ${PROG} JobHierarchyFile ${JobHierarchyFile}"
echo "${NOW} ${PROG} SourceSystemList ${SourceSystemList}"
echo "${NOW} ${PROG} SubjectAreaList ${SubjectAreaList}"
echo "${NOW} ${PROG} ClearWorkArea ${ClearWorkArea}"
echo "${NOW} ${PROG} StartingMilestone ${StartingMilestone}"
echo "${NOW} ${PROG} EndingMilestone ${EndingMilestone}"
echo "${NOW} ${PROG} DebugMode ${DebugMode}"
echo "${NOW} ${PROG} JobLinkStatisticChecksFile ${JobLinkStatisticChecksFile}"
echo "${NOW} ${PROG} ResurrectLogFile ${ResurrectLogFile}"
echo "${NOW} ${PROG} ProcessDate ${ProcessDate}"
echo

# Look up each directory path and connection setting in the parameter file.
UserID=`whoami`
BinFileDirectory=`cat /.dshome`/bin
LogFileDirectory=`grep -w LogFileDirectory ${ParameterFile}|cut -d "=" -f2`
TempFileDirectory=`grep -w TempFileDirectory ${ParameterFile}|cut -d "=" -f2`
CommonScriptFileDirectory=`grep -w CommonScriptFileDirectory ${ParameterFile}|cut -d "=" -f2`
CommonLogFileDirectory=`grep -w CommonLogFileDirectory ${ParameterFile}|cut -d "=" -f2`
LogFileName=${CommonLogFileDirectory}/${PROG}_${JobName}.log
TEMPBATCHNBRLOG=${TempFileDirectory}/${PROG}_${JobName}_start.log
DATASTAGEPROJECT=`grep -w DATASTAGEPROJECT ${ParameterFile}|cut -d "=" -f2`
DSSERVER=`grep -w DSSERVER ${ParameterFile}|cut -d "=" -f2`
DSUSERID=`grep -w DSUSERID ${ParameterFile}|cut -d "=" -f2`
DSPASSWORD=`grep -w DSPASSWORD ${ParameterFile}|cut -d "=" -f2`

NOW=`date`
echo "${NOW} ${PROG} UserID ${UserID}"
echo "${NOW} ${PROG} BinFileDirectory ${BinFileDirectory}"
echo "${NOW} ${PROG} LogFileDirectory ${LogFileDirectory}"
echo "${NOW} ${PROG} TempFileDirectory ${TempFileDirectory}"
echo "${NOW} ${PROG} CommonScriptFileDirectory ${CommonScriptFileDirectory}"
echo "${NOW} ${PROG} CommonLogFileDirectory ${CommonLogFileDirectory}"
echo "${NOW} ${PROG} LogFileName ${LogFileName}"
echo "${NOW} ${PROG} TEMPBATCHNBRLOG ${TEMPBATCHNBRLOG}"
echo "${NOW} ${PROG} DATASTAGEPROJECT ${DATASTAGEPROJECT}"
echo "${NOW} ${PROG} DSSERVER ${DSSERVER}"
echo "${NOW} ${PROG} DSUSERID ${DSUSERID}"
echo "${NOW} ${PROG} DSPASSWORD *Protected*"
echo

#######
####### PARAMETER BUILD Without batch number ##################################
#######

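# Default ProcessDate to the current timestamp when the caller passes NULL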
if [ "${ProcessDate}" = "NULL" ]; then
   StartTimestamp=`date '+%Y-%m-%d %H:%M:%S'`
else
   StartTimestamp="${ProcessDate}"
fi
ParamList=" -param ParameterFile=${ParameterFile}"
ParamList="${ParamList} -param ProcessDate=\"${StartTimestamp}\""
ParamList="${ParamList} -param FileSetDate=${FileSetDate}"
ParamList="${ParamList} -param JobHierarchyFile=${JobHierarchyFile}"
ParamList="${ParamList} -param SourceSystemList=${SourceSystemList}"
ParamList="${ParamList} -param SubjectAreaList=${SubjectAreaList}"
ParamList="${ParamList} -param ClearWorkArea=${ClearWorkArea}"
ParamList="${ParamList} -param StartingMilestone=${StartingMilestone}"
ParamList="${ParamList} -param EndingMilestone=${EndingMilestone}"
ParamList="${ParamList} -param DebugMode=${DebugMode}"
ParamList="${ParamList} -param JobLinkStatisticChecksFile=${JobLinkStatisticChecksFile}"
ParamList="${ParamList} -param ResurrectLogFile=${ResurrectLogFile}"

#######
####### Get Batch Number and create ETL_BATCH_AUDIT record #####################
#######

echo "${NOW} ${PROG} About to get new BATCH_NBR and insert it into ETL_BATCH_AUDIT..."
${CommonScriptFileDirectory}/comPLSQL.ksh ${ParameterFile} "IRDSN" "IRUserID" \
                                           GET_NEW_BATCH_NBR \
                                              "${JobName}" \
                                              "${StartTimestamp}" \
                                              "${UserID}" \
                                              "${SourceSystemList}" \
                                              "${SubjectAreaList}" \
                                              "${ParamList}" \
                                              "${FileSetDate}" > ${TEMPBATCHNBRLOG}
SQL_EXIT_STATUS=$?
cat ${TEMPBATCHNBRLOG}
if [ "${SQL_EXIT_STATUS}" != 0 ]; then
   NOW=`date`
   echo "${NOW} ${PROG} Failure to connect/insert into ETL_Batch_Audit table!"
   exit ${SQL_EXIT_STATUS}
fi

#######
####### Get BATCH_NBR from batch number log file ##############################
#######
BatchNumber=`grep -w BATCH_NBR ${TEMPBATCHNBRLOG}|cut -d "=" -f2`
if [ -z "${BatchNumber}" ]; then
   NOW=`date`
   echo "${NOW} ${PROG} Failure to retrieve BATCH_NBR from ${TEMPBATCHNBRLOG}"
   exit ${SQL_EXIT_STATUS}
fi

#######
####### Add batch number to list of parameters #################################
#######
ParamList="${ParamList} -param BatchNumber=${BatchNumber}"

NOW=`date`
echo
echo ${NOW} ${PROG} Parameter list: ${ParamList}
echo

#######
####### DataStage EXECUTION ####################################################
#######

NOW=`date`
echo "${NOW} ${PROG} Executing DataStage dsjob program..."

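# Write the dsjob command line to the log file and echo it to the console,
# then run it with eval so the quoted parameter list expands correctly.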
echo ${BinFileDirectory}/dsjob -server ${DSSERVER} -user ${DSUSERID} -password ${DSPASSWORD} -run -wait ${ParamList} ${DATASTAGEPROJECT} ${JobName} 2>&1 > ${LogFileName}
echo
echo "${BinFileDirectory}/dsjob -server ${DSSERVER} -user ${DSUSERID} -password ${DSPASSWORD} -run -wait ${ParamList} ${DATASTAGEPROJECT} ${JobName} 2>&1 > ${LogFileName}"
echo

eval ${BinFileDirectory}/dsjob -server ${DSSERVER} -user ${DSUSERID} -password ${DSPASSWORD} -run -wait ${ParamList} ${DATASTAGEPROJECT} ${JobName} 2>&1 >> ${LogFileName}

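# dsjob -run -wait writes "Waiting for job..." to the log once it has started
# the job; if that text is missing, the job failed to start.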
jobwaiting=`grep "Waiting for job..." ${LogFileName}`
if [ "${jobwaiting}" != "Waiting for job..." ]; then
   NOW=`date`
   echo ${NOW} ${PROG} "DataStage failed to start the job"
   failedstart=1
else
   NOW=`date`
   echo ${NOW} ${PROG} "DataStage successfully started the job"
   failedstart=0
fi
NOW=`date`
echo ${NOW} ${PROG} "Retrieving job information"

${BinFileDirectory}/dsjob -server ${DSSERVER} -user ${DSUSERID} -password ${DSPASSWORD} -jobinfo ${DATASTAGEPROJECT} ${JobName} >> ${LogFileName}

#######
####### CHECK STATUS ###########################################################
#######

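# Extract the numeric status code between parentheses from the "Job Status" line
# reported by dsjob -jobinfo (strip everything up to "(" and everything after ")").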
ERROR=`grep "Job Status" ${LogFileName}`
ERROR=${ERROR##*\(}
ERROR=${ERROR%%\)*}

if [ "${failedstart}" != 0 ]; then
   NOW=`date`
   echo ${NOW} ${PROG} "The job failed to start"
   AuditStatus="FAILURE"
   Comments="MasterControl aborted"
   EXIT_STATUS=1
else
   if [ "${ERROR}" = 1 -o "${ERROR}" = 2 ]; then
      NOW=`date`
      echo ${NOW} ${PROG} "The job completed successfully"
      AuditStatus="SUCCESS"
      Comments=""
      EXIT_STATUS=0
   else
      NOW=`date`
      echo ${NOW} ${PROG} "The job aborted"
      AuditStatus="FAILURE"
      Comments="MasterControl aborted"
      EXIT_STATUS=1
   fi
fi

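# Scan the job's own log for FAILED child processes and its history file for
# "JOB STREAM STOPPED" markers; either condition marks the run as failed.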
FailedJobCount=`grep -i FAILED ${LogFileDirectory}/${JobName}.log|wc -l|cut -b1-9`
FailedJobCount=`expr ${FailedJobCount} + 0`
echo ${NOW} ${PROG} The number of failed jobs is [${FailedJobCount}]

if [ "${FailedJobCount}" != 0 ]; then
   NOW=`date`
   echo ${NOW} ${PROG} "The job had failed processes"
   AuditStatus="FAILURE"
   Comments="MasterControl had ${FailedJobCount} failed processes"
   EXIT_STATUS=1
fi
StoppedJobStreamCount=`grep "JOB STREAM STOPPED" ${LogFileDirectory}/${JobName}.his|wc -l|cut -b1-9`
StoppedJobStreamCount=`expr ${StoppedJobStreamCount} + 0`
if [ "${StoppedJobStreamCount}" != 0 ]; then
   NOW=`date`
   echo ${NOW} ${PROG} "The job stream was STOPped or KILLed"
   AuditStatus="FAILURE"
   Comments="MasterControl job stream was STOPped or KILLed"
   EXIT_STATUS=1
fi

#######
####### AUDIT ##################################################################
#######

echo
echo "${NOW} ${PROG} About to update ETL_BATCH_AUDIT with status information..."
EndTimestamp=`date '+%Y-%m-%d %H:%M:%S'`

SQLstring="UPDATE ETL_BATCH_AUDIT A \
              SET A.END_TIMESTAMP = TO_DATE('${EndTimestamp}','YYYY-MM-DD HH24:MI:SS'), \
                         A.STATUS = '${AuditStatus}', \
                       A.COMMENTS = '${Comments}', \
                A.RUNTIMESETTINGS = '${ParamList}' \
            WHERE (A.BATCH_NBR = ${BatchNumber});"

NOW=`date`
echo ${NOW} ${PROG} Audit SQL ${SQLstring}
SQLScriptFileName=${TempFileDirectory}/${PROG}_${JobName}_end.sql
echo ${SQLstring} > ${SQLScriptFileName}

${CommonScriptFileDirectory}/comSQLPlus.ksh ${ParameterFile} IRDSN IRUserID ${SQLScriptFileName}
SQL_EXIT_STATUS=$?
if [ "${SQL_EXIT_STATUS}" != 0 ]; then
   NOW=`date`
   echo ${NOW} ${PROG} Failure to connect/update into ETL_Batch_Audit table!
   exit ${SQL_EXIT_STATUS}
fi

#######
####### EXIT ###################################################################
#######

NOW=`date`
echo ${NOW} ${PROG} Complete, exiting with status [${EXIT_STATUS}]
exit ${EXIT_STATUS}
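For reference, here is a hedged sketch of how a scheduler or an operator might invoke this script from the command line. Every value below (the job name, paths, dates and flags) is illustrative only and not taken from the original post; the BatchNumber argument is a placeholder because the script replaces it with the value returned by GET_NEW_BATCH_NBR.

# Illustrative invocation only - the positional arguments are:
#   JobName ParameterFile FileSetDate BatchNumber JobHierarchyFile
#   SourceSystemList SubjectAreaList ClearWorkArea StartingMilestone
#   EndingMilestone DebugMode JobLinkStatisticChecksFile ResurrectLogFile ProcessDate
./MasterControl.ksh MasterControlJob /etl/config/parameters.ini 2011-09-21 0 \
   /etl/config/job_hierarchy.txt ALL ALL N 0 99 N \
   /etl/config/link_stat_checks.txt /etl/logs/resurrect.log NULL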

Running DataStage from outside of DataStage

Another good article from Vincent McBurney :

This is a followup from comments on my parameter week post on 101 uses of job parameters. This post is about calling DataStage jobs and the range of job control options.

The go to command for interacting with DataStage from the command line, from scripts or from other products is the dsjob command. The documentation for dsjob is buried in the Server Job Developer's Guide; it is cunningly placed there to keep Enterprise users, who would never think to read the Server Edition guide, in a state of perpetual bewilderment.

I was born in a state of bewilderment so I am in my zone. 

I am not going to go into the job control API or mobile device job control, refer to your documentation for those options! I will cover the more commonly used methods.

Sequence Jobs and the DataStage Director
The easiest out of the box job control comes from the DataStage Director product and the Sequence Job. The Sequence Job puts jobs in the right order and passes them all a consistent set of job parameters. The DataStage Director runs the Sequence Job according to the defined schedule and lets the user set the job parameters at run time.

A lot of additional stages within the Sequence Job provide dynamic parameter setting, after-job notification, conditional triggers to control job flow, looping, waiting for files and access to the DataStage BASIC programming language.

Third Party Scheduling and Scripting
DataStage comes with a scheduling tool, the Director. It provides a front end for viewing jobs, running jobs and looking at job log results. Under the covers it adds scheduled jobs to the operating system scheduler. The main advantage of it over a third party scheduling tool is the job run options screen that lets you enter job parameter values when you schedule the job.

In third party scheduling tools you need to set job parameters as you run the job in some type of scripting language. Jobs are executed by scheduling tools using the dsjob command. This command can require a lot of arguments so it is often run via a script or batch file.
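As a rough, hedged sketch of what such a script boils down to (the server name, credentials, project and job names below are placeholders, not values from any real installation), the core is a single dsjob call followed by a status query:

#!/bin/ksh
# Minimal dsjob wrapper - every name and value here is a placeholder
DSHOME=`cat /.dshome`
${DSHOME}/bin/dsjob -server etlserver -user dsadm -password secret \
   -run -wait \
   -param ProcessDate="2011-09-21 00:00:00" \
   -param SourceSystemList=ALL \
   MyProject MySequenceJob
RC=$?
# dsjob returns 0 when the run request succeeded; ask for the finishing status separately
${DSHOME}/bin/dsjob -server etlserver -user dsadm -password secret \
   -jobinfo MyProject MySequenceJob
exit ${RC}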

The mother of all DataStage run scripts can be found in this dsxchange thread. Written by Ken Bland and Steve Boyce it will start jobs, set run time parameters from a parameter ini file, check the status of finished jobs, service your car, solve the Da Vinci code and run an audit process after the job has finished.

This script is run from a scheduling tool to make the setup of the scheduling easier. 

The mother of all job run scripts sets parameters that are saved in an ini file. Parameters can also be saved in a database table, with a job extracting the settings to an ini file before a batch run. 

They can also be stored as environment parameters in a user's .profile file. These environment parameters can be passed into the job via a script, or they can be accessed directly in the job by adding environment job parameters and setting the value to the magic word $ENV.
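As a small, hedged illustration (the variable and parameter names here are invented for the example, and dsjob is assumed to be on the PATH), an environment value set in .profile can either be passed to dsjob explicitly or picked up by an environment job parameter whose default is $ENV:

# In the scheduling user's .profile (illustrative name)
export SOURCE_FILE_DIR=/data/landing

# Option 1: the wrapper script passes it explicitly as a job parameter
dsjob -server etlserver -user dsadm -password secret \
   -run -wait -param SourceFileDir=${SOURCE_FILE_DIR} MyProject MyJob

# Option 2: add an environment job parameter for SOURCE_FILE_DIR to the job
# and set its default value to $ENV so the engine reads it at run time.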

They can also be stored as project specific environment parameters, as we saw during the exhilarating job parameter week, where we brought job parameters to life and struggled to come up with a good theme motto. These job parameters are much like environment parameters but use the magic word $PROJDEF.

Job Control Code and Old School DataStage
Old school DataStage programmers, those who know who Ardent are and remember the days when you only needed one Developer Guide, will be accomplished at writing job control code. This uses a BASIC programming language based on the Universe database code to prepare, execute and audit jobs.

The DataStage BASIC language has better access to jobs than operating system scripts do. While an external script has to do everything through the dsjob and dsadmin commands, the BASIC language has access to a much larger number of DataStage commands. Like dsjob, these commands are cunningly hidden in the Server Job Developer's Guide.

Before the days of sequence jobs (DataStage 5?), and before sequence jobs became quite useful in version 7.5, this job control code was far more prevalent and easier to code than job control in external scripts. It was extremely useful for putting jobs in the right sequence, retrieving job parameters from files, checking the results of jobs and shelling out to execute operating system commands.

Job control code is still widely used even when external scripts or sequence jobs are in use. It fills in gaps of functionality by providing job auditing, setting dynamic calculated parameter values, checking for files and so on. It is a very powerful language.

Also from the dsxchange forum we can find examples of job control code. This time from Arnd:
GetJobParameter(ParameterName)
EQUATE ProgramName TO 'GetJobParameter'
* Open the parameter file; abort with a fatal log message if it cannot be opened
OPENSEQ 'ParameterFile' TO InFilePtr ELSE CALL DSLogFatal('Oh No, cannot open file', ProgramName)
Finished = 0
Ans = ''
READSEQ InRecord FROM InFilePtr ELSE Finished = 1
LOOP UNTIL Finished
   * Each record is Name=Value; compare the name against the requested parameter
   FileParameterName = TRIM(FIELD(InRecord, '=', 1))
   FileParameterValue = TRIM(FIELD(InRecord, '=', 2, 99))
   IF (FileParameterName = ParameterName)
   THEN
      Finished = 1
      Ans = FileParameterValue
   END
   READSEQ InRecord FROM InFilePtr ELSE Finished = 1
REPEAT
IF NOT(Ans) THEN CALL DSLogFatal('Could not find value for "':ParameterName:'".', ProgramName)
CLOSESEQ InFilePtr


What are you comfortable with?
People from a Unix background are most comfortable with Unix scheduling tools, .profile environment parameters and running and auditing of jobs from within Unix scripts using the dsjob command.

People from database backgrounds like to have parameters in database tables and may even put an entire job schedule into a table with dependencies and sequencing. They need a bridge between the database and DataStage, so they still need a layer of either Unix scripts or job control code to run the jobs.

People from programming backgrounds will be very comfortable with the DataStage BASIC programming language and find it can do just about anything regarding the starting, stopping and auditing of jobs. They can retrieve settings and parameters from files or databases.

The method I currently prefer is Sequence Jobs for all job dependencies; project specific environment variables for most slowly changing job parameters; some job control routines for job auditing, dynamic parameters and external operating system commands; and a dsjob script for starting Sequence Jobs from a third party scheduling tool or from the command line.

What I like about project specific environment parameters is that the job can be called up from anywhere without requiring any parameter settings. It can be called up from within the Designer by developers, from ad hoc testing scripts by testers and from third party scheduling tools in production.
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Thursday, September 15, 2011

IBM WebSphere DataStage known problems and workarounds

Source : IBM Public Library

The currently known limitations, problems, and workarounds for WebSphere® DataStage®.
At time of publication, the following information describes the currently known limitations, problems, and workarounds for WebSphere DataStage. Any limitations and restrictions might or might not apply to other releases of the product.
Starting WebSphere DataStage clients fails when the Java™ Virtual Machine is not loaded (eCase 131388)
WebSphere DataStage clients do not start when the Java Virtual Machine is not loaded. An error message that begins with the following text is displayed:
DataStage Repository Error: Unable to create COM ASB services
A common cause for this error is that the Java Virtual Machine is not able to allocate the memory that it is initially configured to use.
Workaround
Change the memory allocation of the JVM by editing the Information_Server_install_dir\ASBNode\conf\proxy.xml file on the client system. Reduce the values of the InitialHeapSize and MaximumHeapSize settings, save the file, and restart IBM® Information Server console.
Connectivity requires binding the executable to DB2® to use DB2Z in WebSphere DataStage (eCase 126562)
After the IBM Information Server installation is complete, perform these post installation commands to use DB2Z in DataStage.
  • On the server layer, type these commands to bind the executable to DB2:
  • db2 connect to dbname user dbuser using dbpassword
    db2 bind db2zesql.bnd datetime iso blocking all
    db2 terminate
  • Use the DataStage Administrator on the WebSphere DataStage client layer to add sqllib/lib to the LIBPATH.
XML report is not generated with the enablegeneratexml option (eCase 126768)
Operating systems: All
The XML report is not generated with the following command:
dsadmin -enablegeneratexml TRUE project_name
Workaround
Contact IBM customer support and request client side patch 126768.
Datasets from previous versions are read-only after upgrading WebSphere DataStage (eCase 118650)
Operating systems: All
If you upgrade WebSphere DataStage to any other version, you cannot modify data sets that were produced before the upgrade. Parallel datasets created in earlier versions of WebSphere DataStage can be read and deleted. No other operations, such as appending or overwriting, are supported.
Workaround
Full compatibility is available only between releases of the same version.
OCI8TO9.B utility is not supported (eCase 125155)
Operating systems: All
The WebSphere DataStage Connectivity Guide for Oracle Databases incorrectly states that the OCI8TO9.B utility can be run on WebSphere DataStage release 6.0 and later. The OCI8TO9.B utility is supported for IBM Information Server versions up to version 7.5.3.
Workaround
If you need to convert an Oracle 8 Project to Oracle 9 or 10, use the OCI8TO9.B utility prior to installing IBM Information Server 8.0.1.
Documentation incorrectly lists a Windows® directory for the UNIX®, Linux® user privileges for running parallel jobs (eCase 125997)
Operating systems: UNIX, Linux
The IBM Information Server Installation, Configuration, and Planning Guide incorrectly includes the /ishome/Server/Datasets and /ishome/Server/DSEngine/uvtemp directories in the list of directories that require read, write, and modify access to run parallel jobs.
Workaround
The correct list of UNIX and Linux directories that require read, write, and modify user access privileges is:
  • /ishome/Server/Scratch
  • /ishome/Server/Datasets
  • /tmp
Existing WebSphere DataStage Japanese projects are not visible in the Project tab after IBM Information Server 8.0.1 is installed (eCase 111937)
WebSphere DataStage projects with Japanese messages are not supported.
When using the Teradata connector, the PrimaryKey/PrimaryIndex attribute is not imported (eCase 126375)
When using the Teradata connector, the PrimaryKey/PrimaryIndex attribute is not imported when the Share metadata when importing from Connector option is disabled.
Workaround
Use another table import option, such as Plug-in Metadata Definitions or Orchestrate® Schema Definitions.
Teradata connector jobs with Load Type = Stream option end with error (eCase 125768)
Teradata connector jobs with Load Type = Stream option end with the following error:
received signal SIGBUS
Workaround
Use one of the other load methods, or use the Teradata MultiLoad plug-in with the TPump option.
  • When you specify the sort option in the Aggregator stage and the data to be sorted is not in order, the job ends without displaying an error. (eCase 120088)
  • WebSphere DataStage WAVES jobs do not run on SUSE Linux for PowerPC®. (eCase 120082)
  • In the DataStage parallel canvas, Java Pack jobs fail with the null pointer error. (eCase 120773)
  • Models that are created by the resource estimation tool are not automatically updated when you modify a job, even if you recompile the job. If you request a projection based on an existing model after a job is modified, the system might generate a runtime error. Click the Continue button to navigate to other models. (eCase 101786)
  • WebSphere DataStage parallel jobs give mapping errors for these characters: 301C, 2014, 2016, 2212, or 00A6. If you require mapping for any of these characters, select the 1999 mapping. (eCase 119793)
  • The dssearch command line utility is not supported.
  • In a Flat File stage job, you must specify a complete path name for the surrogate key state file. If you only specify the name of the surrogate key state file, the job fails with a permission error. (eCase 120830)
  • The dsjob -import option is not supported. (eCase 94401)
  • The WebSphere DataStage version control component is not supported.
  • Released jobs are not supported.
  • Limitation on support for the Turkish locale. For the WebSphere DataStage server canvas, a memory management problem occurs during casing operations when the CType locale category name is set to TR-TURKISH. Do not use this setting. For the WebSphere DataStage parallel canvas, locale support is available only for collation operations. Locale-based case operations are not supported for any locale. For casing operations in the Turkish locale, you cannot set the CType locale category name to tr_TR for parallel jobs. The special casing rules for Turkish letters dotted-I and dotless-I cannot be enforced.
  • The automatic stage validation functionality in the Designer client (enabled by the Show stage validation errors button on the Toolbar) has been dropped.
  • ClickPack is no longer supported. Existing jobs are not supported at runtime.
  • XML Pack 1.0 is no longer supported. Users of XML Pack 1.0 must migrate to XML Pack 2.0. Information about compatibility with earlier versions is provided below:
    • XML Pack 2.0 is not compatible with XML Pack 1.0. Consequently, table definitions that are created by the previous version of the XML Pack cannot be reused with the new version. Create new table definitions by using the new XML Meta Data Importer (installed by default).
    • Automatic migration between XML Pack 1.0 and XML Pack 2.0 is not provided because each pack supports a different type of XML metadata. The 1.0 Pack supports XML DTDs, an XML legacy definition, while the 2.0 Pack supports XSDs. Use a third party tool to generate an XSD compliant definition from your existing DTD.
  • GB18030 restrictions in version 8.0.1 (eCases 105680, 105675, 107838, and 107609) GB18030 is a standard for encoding Chinese character data. The DataStage server runtime environment processes a subset of the full phase 1 GB18030 characters. The DataStage parallel runtime environment supports the full range of required phase 1 GB18030 characters.
    The DataStage and QualityStage™ clients (Designer, Director, and Administrator) include a large number of UI components that cannot handle GB18030 character data. The unsupported character data is incorrectly mapped within these controls. These UI controls can only handle characters that are supported in the Microsoft® 936 (Simplified Chinese) code page and all other characters are mapped to ? (question mark) characters. As a result, the original character encoding is lost.
    This limitation affects a large part of the design environment (Designer), including stage property editors, the design canvas (naming of stages and links), and other windows and dialog boxes. Because of this limitation, unsupported characters cannot be entered or used in any programming logic that relies on values entered via the UI. For example, you cannot use the unsupported characters in a literal string in a transformer expression. Therefore, you cannot compare a data field in a row being processed at runtime, with a user-entered literal in the unsupported range. DataStage Director and Administrator client applications are also similarly affected.
  • Possible failure connecting DataStage clients to a remote WebSphere Metadata Server or remote DataStage server The DataStage components can be installed across different systems in different subnets or networks. For example, a site might install the WebSphere Metadata Server on one dedicated server and the DataStage runtime server on a different dedicated server. The DataStage clients can be installed on desktop workstations in a different subnet from the servers. For a DataStage client to connect to the WebSphere Metadata Server or the DataStage runtime server, the client system must be able to resolve the host names of these servers.
    The DataStage runtime server must be able to resolve the host name of WebSphere Metadata Server. DataStage uses unqualified host names (no domain). If the DataStage and QualityStage client is unable to address the servers by the unqualified host name (host short name), the domain name of each server must be added as a DNS suffix to the client's TCP/IP properties settings in the network configuration of the DataStage and QualityStage client. In some network configurations, the DNS server might not be able to resolve the host name to the IP address of the server computer (because each computer is in a different subnet or network, or they are just unknown to the DNS server) causing the DataStage connection to fail. In these configurations, you must modify the hosts file on the DataStage runtime server computer and every DataStage and QualityStage client computer to complete name resolution successfully.
    1. Modifying the DNS suffix settings:
      1. Open the Network Connection properties.
        • Go to Control Panel > Network Connections.
        • Right click the LAN connection you use and click Properties.
      2. Select Internet Protocol (TCP/IP) and click Properties.
      3. In the Internet Protocol (TCP/IP) Properties window, click Advanced under the General tab.
      4. In the Advanced TCP/IP Settings window, click the DNS tab.
      5. Select Append these DNS suffixes (in order) option and click Add.
      6. Type the domain suffix and click Add. Add additional suffixes, if different, for the remaining servers. If any of the suffixes are subsets of other suffixes in the list, use the arrow buttons to order these so that the longest ones are above the shorter ones.
      7. Click OK in each open window and close the Network Connections properties window.
    2. Modifying the hosts file:
      1. You must add entries for both the WebSphere Metadata Server computer and the DataStage runtime server computer to the hosts file. The reason is that DataStage and QualityStage client computers must be able to resolve host names for both the WebSphere Metadata Server computer and the DataStage runtime server computer. The below example shows the entries that must be added to the \Windows\system32\drivers\etc\hosts file of each client computer:
        <IP Address> <Name of MetaDataServer>
        For example:
         193.200.200.1 MetaDataServer
        
        <IP Address> <Name of DSRuntimeServer>
        For example:
         193.200.200.2 DSRuntimeServer
      2. The DataStage runtime server computer must be able to resolve the host name of the Metadata server computer. The following entry must be added to the \Windows\system32\drivers\etc\hosts file on the DataStage runtime server computer:
        <IP Address> <Name of MetaDataServer>
        For example:
         193.200.200.1 MetaDataServer
      3. In some network configurations, a computer might be known by different names, such as local name and the name listed in the DNS server for that computer. In this case, you must include both host names in the hosts file. Below is an example of such entries:
        <IP Address> <Name of MetaDataServerDNS>
        For example:
        193.200.200.1 MetaDataServerDNS
        
        <IP Address> <Name of MetaDataServerLocal>
        For example:
        193.200.200.1 MetaDataServerLocal
        
        <IP Address> <Name of DSRuntimeServerDNS>
        For example:
        193.200.200.2 DSRuntimeServerDNS
        
        <IP Address> <Name of DSRuntimeServerLocal>
        For example:
        193.200.200.2 DSRuntimeServerLocal
  • Oracle direct path load compatibility
    It is not possible to perform an Oracle direct path load using an Oracle 10.2 client to an Oracle 9 server. Starting with Oracle 9i, the client version must be the same as or earlier than the server version. If you upgrade from an earlier version of WebSphere DataStage and have jobs that use the Oracle Enterprise stage or the Oracle OCI Load plug-in stage, these jobs might not work correctly for a direct path load unless the Oracle client and server version requirements are met. An alternative is not to use the direct path load feature. For the Oracle Enterprise stage, configure the APT_ORACLE_LOAD_OPTIONS environment variable, for example:
    APT_ORACLE_LOAD_OPTIONS= 'OPTIONS(DIRECT=FALSE,PARALLEL=TRUE)'
  • Multi-Client Manager only supported on DataStage and QualityStage client installations. The Multi-Client Manager can switch between versions 7 and 8 of the DataStage and QualityStage client on a single system. As such, the Multi-Client Manager does not expect a DataStage server to be installed on the same computer as the clients. The server must be on a separate computer. If you use the Multi-Client Manager on a computer that has both client and server software installed, the Multi-Client Manager will not switch between clients correctly.
  • Data type restrictions in SQL Builder
    The following functions are not supported:
    Oracle
    • SUBSTR2
    • SUBSTR4
    • NCHAR
    • LENGTH2
    • LENGTH4
    • INSTR2
    • INSTR4
    • CAST
    • NEW_TIME
    • RPAD
    • MONTHS_BETWEEN
    • Functions having an OVER clause
    Teradata Enterprise stage
    • EXTRACT
    • OCTET_LENGTH
    • CURRENT_DATE
    • CURRENT_TIME
    • CURRENT_TIMESTAMP
    Sybase Enterprise stage
    • CAST
    • CORR
    • SUBSTRING
    • CURRENT_DATE
    • CURRENT_TIME
    • CURRENT_TIMESTAMP
    ODBC Enterprise stage
    • CURRENT_DATE
    • CURRENT_TIME
    • CURRENT_TIMESTAMP

  • CONVERT and CORR functions in SQL Builder for SQL Server Enterprise stage are not supported. If you try to use these functions, the "The row is invalid" error is generated. (eCase 116121)

  • The WebSphere TX Map stage fails when used with the Java class adapter. (eCases 111030 and 111160)

WebSphere DataStage general operation
  • When you install the WebSphere MQ plug-in in the IBM Information Server console mode, the navigation controls are not available. You must type numbers to select the corresponding options: press 1 for Next, 2 for Previous, 3 for Cancel, and 5 to Redisplay. (eCase 120401)
  • When using WebSphere MQ CC 6.0, you might encounter an error when creating a queue manager. If you encounter this error, apply Fix Pack 6.0.2 for WebSphere MQ. (eCase 119818)
  • The default Chinese locale of zh_CN.EUC is not supported. You must change the locale setting to use the zh_CN.GBK locale setting. (eCase 117549)
  • Problem displaying parts of the Advanced Find window or the expanded repository view in the Designer client on some systems
    Symptoms include missing column headings in the Advanced Find window and missing object names in the expanded repository view in the Designer client.
    Workaround
    • Upgrade the graphics drivers to the newest versions.
    • Reduce the graphics hardware acceleration to the midway setting, which disables all DirectDraw and Direct3D accelerations, as well as all cursor and advanced drawing accelerations. Select Display Properties > Advanced > Troubleshoot tab on the client system.
    (eCase 83211)
  • Problems can occur if you import or create table definitions from a shared table in which the database, schema, table, or column names include characters that do not conform to DataStage naming constraints (eCase 93585):
    • First character must be an underscore ( _ ) or alpha (A-Z).
    • Subsequent characters can be underscore, alphanumeric, $ (dollar sign), or period.
    • Do not use the following characters: | (vertical bar), # (number sign), / (forward slash) and quotation marks.
  • There is a problem importing DataStage export files in XML format that contain parameter sets.
    Workaround
    Export these objects using the .dsx format. (eCase 115457)

  • If running in a single-node configuration, a job might hang or fail to run with the error "A cycle was found in the combined graph. Failed describing Operator Combinations."
    Workaround
    Set the APT_NO_ONE_NODE_COMBINING_OPTIMIZATION environment variable. (eCase 116847)

  • ITAG is supported in DataStage version 8.0.1. However, when you log into DataStage and QualityStage Administrator for the first time, you must manually specify the DataStage server port number for the tagged instance in conjunction with the server name in this form: "HostName:PortNumber". (eCase 115789)

  • When the DataStage ODBC stage is used as a lookup, the parameters are bound in the order that the keys are defined in the generated SQL. If you want to reorder the key positions, for example to have better performance, you need to use user-defined SQL. (eCase 100695)
    Workaround
    Specify the user-defined environment variable ODBCBindingOrder = 1. If ODBCBindingOrder = 1 is not defined, the parameters are bound according to the generated SQL. For example, a job using an ODBC stage lookup processes four columns, two of which are keys with order DB2ADMIN.A1.COL1, DB2ADMIN.A1.UCN. The generated SQL is as follows:
    SELECT DB2ADMIN.A1.COL1, DB2ADMIN.A1.COL2, DB2ADMIN.A1.UCN, 
    DB2ADMIN.A2.R_UCN FROM A1, A2  
    WHERE (DB2ADMIN.A1.COL1 = ? AND DB2ADMIN.A1.UCN = ?);
    The user-defined SQL (with the order of the keys switched by the user) is as follows:
    SELECT DB2ADMIN.A1.COL1, DB2ADMIN.A1.COL2, DB2ADMIN.A1.UCN,  
    DB2ADMIN.A2.R_UCN FROM A1, A2 
    WHERE  (DB2ADMIN.A1.UCN = ? AND DB2ADMIN.A1.COL1 = ? );
    To run the user-defined SQL, add the ODBCBindingOrder environment variable, and set the value to 1.

  • On the reference link of the ODBC stage, a LEFT OUTER JOIN SQL statement with parameter markers is not supported. (eCase 85505) For example, the following SQL does not work:
    SELECT testdb.dbo.Table2.f1, testdb.dbo.Table2.f2, 
    testdb.dbo.Table2.f4 FROM testdb.dbo.Table2 
    LEFT OUTER JOIN testdb.dbo.Table1 on 
    (testdb.dbo.Table2.f1 = ? 
    AND testdb.dbo.Table2.f2 = ?);
    Workaround
    A LEFT OUTER JOIN SQL statement without parameter marker can be used. For example:
    SELECT testdb.dbo.Table2.f1, testdb.dbo.Table2.f2, 
    testdb.dbo.Table2.f4 
    FROM testdb.dbo.Table2 LEFT OUTER JOIN testdb.dbo.Table1 
    on (testdb.dbo.Table2.f1 = 
    testdb.dbo.Table1.f1 AND testdb.dbo.Table2.f2 = 
    testdb.dbo.Table1.f2)

  • Problems using the scheduler on systems with languages other than English
    If you run DataStage on a system with a language other than English, you might encounter problems when scheduling jobs to run on specific days of the week.
    Workaround
    Localize the days of the week for each project. (The AT® command, which performs the Windows scheduling, accepts day names only in the local language.) To localize the day names:
    1. Go to the project directory for your first project. This directory is on the DataStage server, by default in folder \IBM\InformationServer\Server\Projects.
    2. Edit the DSParams file in a text editor.
    3. Add the localized days of the week to the end of the file. The following is an example of what you might add for a French system:
      [SCHEDULER]  
      MONDAY=L  
      TUESDAY=M  
      WEDNESDAY=ME  
      THURSDAY=J  
      FRIDAY=V  
      SATURDAY=S  
      SUNDAY=D 
      You might need to experiment with which day names the local AT command accepts. If in doubt, enter the full name (for example, LUNDI, MARDI, and so on).
    4. Repeat the process for each of your projects.
    You might receive an error message stating that there are no entries in the list when you use the scheduler on a system with languages other than English. This message is output by the AT command and passed on by the Director client. To prevent this message from being displayed:
    1. Identify a unique part of the message that the AT command outputs (for example, est vide in French).
    2. For each project, add the following line to its DSParams file:
      NO ENTRIES=est vide 
    The AT command typically accepts keywords other than the days of the week in English. If your system does not, you can add localized versions of the additional keywords NEXT, EVERY, and DELETE to your projects as follows:
    1. Edit the DSParams file for each project.
    2. Add a line of the form:
      KEYWORD=localized_keyword 
      For example: NEXT=Proxima

  • Incorrect number of rows displayed for parallel Complex Flat File stage. When you draw multiple output links for the Complex Flat File stage on the parallel canvas, the number of rows that are shown as processed is not depicted correctly. Only the first output link that is drawn shows the correct number of rows. Other output links incorrectly show zero rows. In addition, if the output link is configured for de-normalization of arrays or a constraint, the number of rows shown on the link is the number of imported rows before de-normalization or filtering for a constraint. (eCase 111912)

  • With Internet Explorer version 7, running a difference comparison of two jobs, and attempting to follow a link from the report to a stage, causes the DataStage and QualityStage Designer to hang.
    Workaround
    Apply a fix from Microsoft. To apply the fix, see the Microsoft knowledge base article number 930828. (eCase 110705)

  • Quick find or advanced find does not correctly find matching objects when the search text contains the German Sharp-S character. Specifically, objects with the German Sharp-S character in the name are not found. (eCase 109056)

  • Quick find and advanced find do not consider the characters ue and ü to be equivalent. For example, searching for Duerst does not match the object called Düerst and vice-versa. (eCase 109667)

  • If you use parameter sets with the same name but different case, conflicts can occur (for example, each parameter set shares the value files of the other). (eCase 96682)

  • Parameter sets do not work correctly when used in a data connection. (eCase 90504)

  • The create table definition function from a shared table skips any table that is deleted from the shared metadata repository by another user after the list of tables for the create table operation was created. (eCase 107844)

  • When the DataStage engine and the project are on different drives, errors can occur when you view or compile parallel jobs.
    Workaround
    Create a TMP directory on each separate drive or use the TMPDIR environment variable to point to an existing directory. (eCase 106782)

  • The performance data file for collecting performance analysis data is not generated at run time.
    Workaround
    Request job performance data at the time the job is compiled. (eCase 101790)

  • Surrogate key jobs end abnormally for a DB2 database type if the source name contains a Japanese character. (eCase 106968)

  • A stage property that uses a backslash (\) before a job parameter does not get resolved correctly, for example, a file name in a Sequential File stage that has a backslash before a job parameter. (eCase 106636)

  • Do not use the UTF16 character map in jobs on the parallel canvas.
    Workaround
    Use the UTF16LE or UTF16BE maps. (eCase 109459)

  • Sometimes the job monitor for the parallel engine does not start when Windows starts up. Jobs run without row counts.
    Workaround
    1. Make sure that all DataStage client programs are disconnected. You can confirm that the job monitor is running by opening a DOS command window and issuing the command:
      ps -ef | grep JobMon 
      The system displays the process ID of the JobMon process.
    2. Manually start the job monitor by opening the DataStage control panel.
    3. Stop all the services.
    4. Restart the services.
    (eCase 110757)

  • Communication between the DataStage and QualityStage client and the DataStage server for WebSphere TX Map stage. When the DataStage and QualityStage client is installed on a different computer than the DataStage server, the WebSphere TX Map stage editor can copy files to and from the computer on which the DataStage server is installed. If the DataStage server is on UNIX, the files are copied by using FTP. If the DataStage server is on Windows, files are copied by using the operating system of the computer with the DataStage client installation. Use UNC file paths to refer to the files and directories on the computer with the DataStage server installation. For this to work, certain conditions must be fulfilled:
    For a DataStage server on Windows:
    • On the DataStage and QualityStage client computer, log on to Windows by using a domain or workgroup logon that is also valid for the DataStage server computer.
    For a DataStage server on UNIX:
    • Log on to the DataStage and QualityStage client by using a user name and password that are also valid for logging into the FTP server on DataStage server.
    (eCase 115349)

  • Running a Where used... (deep) analysis on a table definition does not display the paths in the output link constraint of a job when viewed in the dependency viewer.
    Workaround
    Perform a Where used...? (deep) and then examine the paths. (eCase 115039)

  • You can set a default value for date, time, and timestamp on the Job Properties - Default tab of a WebSphere DataStage job. The default value that you set might be different from the default value set by your system. If you set this default value, you must not enclose the pattern string within single quotation marks ( ' ' ), because the system assumes that this is an error and generates a warning message. If you want to quote the string, you must use double quotation marks ( " " ). (eCase 115596)

  • Job sequences from releases before DataStage 7.5 that use jobs which rely on job parameters do not run following import into a post-7.5.1 system.
    Workaround
    Edit the job activity stages in the job sequence. Enter the parameter names again where specified. The sequence runs after re-compilation. (eCase 114789)

  • When running DataStage and QualityStage Designer and performing a find or impact analysis operation, you might encounter the below message: QueryInterface failed for _Collection or VBA._Collection failed
    This error is caused by a problem with the registration of a Visual Basic runtime.
    Workaround
    Register the .dll file again from a command window by using the below command:
    regsvr32 C:\WINDOWS\system32\msvbvm60.dll
    (eCases 117025 and 89472)

  • Several character maps are provided by the parallel canvas (PX) to describe record or field string data encoded in the UTF-16 character set, depending on whether the data is stored in big endian or little endian format, and whether a Byte Order Mark (BOM) appears at the beginning of the record or field (the endian format of the data can be determined from the BOM). When UTF-16BE is specified as the character set for a record or field, data is assumed to be stored in big endian UTF-16 format. When UTF-16LE is specified as the character set for a record or field, data is assumed to be stored in little endian UTF-16 format. No BOM appears at the beginning of the record or field data on input, and no BOM is written on the output. When UTF-16 is specified as the character set, a BOM might optionally appear at the beginning of the record or field on the input to indicate the endian format of the data. On the output, a BOM is always written at the beginning of the data stream for the specified record or field. Since field data typically does not contain UTF byte order marks, use either the UTF-16BE or UTF-16LE map instead of UTF-16 in most situations. The analyst must determine whether the UTF-16 data is stored in big endian or little endian format. The behavior of the UTF-16, UTF-16BE, and UTF-16LE character maps is inherited from ICU (International Components for Unicode). (eCase 109459)

  • In an MPP environment, the DataStage parallel engine must be able to run the remote shell command (rsh) without a password on all processing nodes. The parallel engine searches the following paths on a processing node, in the below order, to find the remote shell command:
    • $APT_ORCHHOME/etc/remsh (if it exists)
    • /usr/ucb/rsh
    • /usr/bin/remsh
    • /bin/remsh
    • /usr/bin/rsh
    where $APT_ORCHHOME is the directory in which the DataStage parallel engine is installed.
    On Solaris 2.10, the remsh provided by the system might not run successfully within the DataStage parallel engine. This situation can lead to errors such as "rsh issued, no response received".
    Workaround
    To specify the location of the rsh command, copy or rename the file $APT_ORCHHOME/etc/remsh.example supplied by the DataStage parallel engine to $APT_ORCHHOME/etc/remsh. The file contains the following shell script:
    #!/bin/sh 
    # Example apt/etc/remsh 
    exec /usr/bin/rsh "$@"
    As written, this shell script invokes /usr/bin/rsh.
    Edit the last line of this script exec /usr/bin/rsh "$@" to invoke your specific remote shell command. All users should be able to run the script. To ensure this, use chmod:
    #chmod 755 $APT_ORCHHOME/etc/remsh
    (eCase 117475)


DataStage connectivity

The following are known problems with DataStage connectivity:
  • Informix® , Teradata, Classic Federation, and Netezza enterprise stages are not available on SUSE Linux for PowerPC.
  • Netezza enterprise stage is not available on HP-UX 11i v2 on Intel® Itanium®.

BCPLoad stage (eCase 119124)

BCPLoad jobs do not run on the Sybase ASE client version 12.5.4. However, BCPLoad jobs work with Sybase ASE client version 12.5.2 and Sybase IQ version 12.6.

Netezza Enterprise

  • When you specify an incorrect close command, the job completes and displays the status as OK. The job should end with status=failed. (eCase 79755)
  • For the nzload utility to work on a SUSE Linux system, you must replace the libstdc++-3-libc6.2-2-2.10.0.so file in /usr/lib directory with the most current libstdc++-3-libc6.2-2-2.10.0.so file. Otherwise, nzload fails with the following error:
    undefined symbol   _dynamic_cast_2
    To obtain the most current libstdc++-3-libc6.2-2-2.10.0.so file, contact Netezza customer support. (eCase 85585)
    Note: This known issue is specific to SUSE Linux for System x™.

Informix Enterprise (eCase 117177)

To prevent termination of Informix enterprise stage HPL jobs, set LDR_CNTRL=MAXDATA=0 in the environment parameters of the job.
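As a sketch, the same setting expressed in the dsenv style used elsewhere in these notes (only relevant if you export it in the shell that starts the job; normally you add it to the job's environment parameters as described above):
LDR_CNTRL=MAXDATA=0; export LDR_CNTRL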

DB2 Enterprise (eCase 116556)

If a job contains a DB2 stage that uses user-defined SQL and a job parameter that contains a single quote, the single quotes are stripped out and the SQL statement becomes invalid.

Sybase Enterprise

  • Lookup fails for the microseconds timestamp data type for Sybase IQ. (eCase 110464)
  • The Sybase enterprise stage fails to append for a primary key constraint for Sybase IQ. (eCase 100132)
  • BigInt, unsigned Int, unsigned bigInt, and unsigned smallInt data types are not supported for Sybase ASE version 15.0.
  • A parallel job with a Sybase enterprise stage that selects from a table whose name, or whose column names, exceed 30 characters finishes but is not successful.
    Workaround
    Use table and column names that are shorter than 30 characters; the job then completes successfully. (eCase 104960)

  • A parallel job using the Sybase ASE stage ends abnormally when using Sybase ASE database version 15.0. The naming convention of the libraries in $SYBASE/$SYBASE_OCS/dll changed: the library names now begin with "libsyb" instead of "lib" as in previous versions. A script in the $SYBASE/$SYBASE_OCS/scripts directory creates links to the libraries in $SYBASE/$SYBASE_OCS/dll of Sybase ASE 15.0. For Windows, the script is copylibs.bat and for UNIX it is lnsyblibs. Run the script before connecting to the Sybase ASE 15.0 server from a Sybase ASE 15.0 client. Instructions on how to run the script are provided in the "New Features Open Server 15.0 and SDK 15.0 for Windows, Linux and UNIX" documentation on the Sybase Web site. (eCase 109024)


iWay Enterprise

  • Sparse Lookup fails to create values for the following DB2 data type boundary values: Varchar minimum value, bigInt minimum and maximum values, decimal minimum and maximum values, and float minimum and maximum values. (eCase 105044)
  • A lookup operation fails when all of the following are true:
    • You select Lookup Failure = Reject.
    • You select Lookup type = Sparse.
    • You drag any column from the reference link into the output stage without dragging that column from the primary link.

pFTP Enterprise (eCase 110196)

  • The pFTP operator (in sFTP mode) fails when a wildcard "?" (question mark) is used in the URI field. (eCase 102854)
  • On Windows Server 2003 Service Pack 1, the error "Rename Temp File calloc:File exists" might be encountered when trying to get a file and save it on a different drive.
    Workaround
    Apply Microsoft fix KB899679. (eCase 91302)

  • The restart feature in the FTP enterprise stage does not support the restarting of a job that experienced network failure. (eCase 81730)

  • The pFTP enterprise stage does not support the Internet Protocol version 6 (IPv6) FTP client.
    The default Internet protocol for the FTP client installed on SUSE version 10 is IPv6. This FTP client is not compatible with the pFTP enterprise stage. If the FTP client is configured with IPv6 as the default Internet protocol, you receive the following error message:
    "FTP_Enterprise_1,0: error: ftp returned 500 
    ftp msg: 500 'EPSV': command not understood."
    Workaround
    Activate the -disable IPv6 option when you install the FTP client. (eCase 120483)


Teradata Enterprise

  • FLOAT, REAL, and DOUBLE PRECISION data types do not work with Teradata write jobs. (eCase 106763).
  • If a table contains a single column of CHAR data type, a write job works correctly. If a table contains a column of CHAR data type with columns of other data types, a write job gives unrecoverable errors, and an empty table is created. This error occurs only with data other than English. (eCase 106860)

ODBC Enterprise

To enable NcharSupport in an Oracle database, modify the odbc.ini file of the Oracle data sources and set EnableNcharSupport=1. (eCase 101140)
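A minimal sketch of the odbc.ini change (the data source name OracleDSN is hypothetical; keep your existing Driver and connection entries and add only the EnableNcharSupport line):
[OracleDSN]
EnableNcharSupport=1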

ODBC Connector

  • Reading data from tables containing LOB columns is supported (for SQL Server) only if the LOB columns are the last columns in the SELECT statement. (eCase 105289)
  • On Linux systems, jobs that use the DB2 enterprise stage together with the ODBC connector and the DB2 wire driver to access the same DB2 system do not run successfully.
    Workaround
    Configure jobs so that they use either the DB2 enterprise stage, or the ODBC connector with the DB2 wire driver, but not both. (eCase 103387)

  • Rejecting records due to check constraint violations might not function correctly with the IBM DB2 Wire Driver. The driver reports a message "Unknown error: SQLCODE -545". (eCase 101228)

  • In the connector stage editor, if the properties tree is highlighted, pressing F1 does not display help information. Use the Help button to display help information. (eCase 97244)

  • The .odbc.ini file on AIX® contains an incorrect file name for the Teradata driver. The file lists the file name as ivtera22.so. The correct name is VMTera22.so.
    Workaround
    When you set up a data source entry in .odbc.ini to Teradata, modify the "Driver" entry as follows:
    [Teradata]
    Driver=/export/aix52qa01sand0/IBM/InformationServer/Server/branded_odbc/lib/VMTera22.so
    (eCase 117034)

  • Running jobs on a Linux server using the ODBC connector on a SQL Server 2005 database using the SQL Server wire driver can result in unexpected errors from the driver. (eCase 115699)

  • You cannot import table definitions from a UTF-8 DB2 database using the ODBC connector and the DB2 wire driver when the server is installed on a Japanese AIX system. (eCase 116769)

  • An issue exists with accessing the Teradata driver through the ODBC connector from the connector stage editor or connector import wizard. The ASBAgent can stop, requiring it to be restarted manually. (eCase 115699)

  • There is a known problem when multiple DataStage Designer or WebSphere Information Analyzer clients attempt to access the ODBC connector or WebSphere MQ connector simultaneously. The problem can occur when accessing the connectors for any purpose, such as to browse database tables or message queues, test connections, or import table definitions. A message similar to the following might be displayed:
    com.ascential.asb.cas.shared.ConnectorServiceException: 
    An exception occurred while trying to receive the response from the handler: 
    Error unmarshaling return header: java.io.EOFException
    The workaround to permit simultaneous access to connectors from these clients is as follows (a combined sketch follows these steps):
    1. Go to ASBNode/bin directory in your Information Server installation.
    2. Export the following environment variable: export CC_MSG_LEVEL=6
    3. Restart the agent daemons by running the command ./NodeAgents.sh restart
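    As a sketch, the same workaround as shell commands (the /opt/IBM/InformationServer path is the default installation location and might differ on your system):
    cd /opt/IBM/InformationServer/ASBNode/bin
    export CC_MSG_LEVEL=6
    ./NodeAgents.sh restart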

  • ODBC driver issues
    • Teradata driver on Windows and Linux (eCase 109492)
      When you use the ODBC connector with Teradata 6.1 and this driver, a COMMIT WORK unrecoverable error is generated, which causes the job to end.
      Workaround
      Use the Teradata ODBC driver instead.


  • DB2 ODBC driver
    When you import tables with the connector metadata import wizard, using the native DB2 ODBC driver for DB2 9.1, you cannot select a table and use its Related Tables hyperlink to select associated tables (eCase 111730).
    Workaround
    Modify the client configuration file (db2cli.ini in the root directory of the DB2 installation). For the relevant data source, add ReturnAliases=0.
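    A minimal sketch of the db2cli.ini entry (the data source name SAMPLEDB is hypothetical; only the ReturnAliases line comes from this note):
    [SAMPLEDB]
    ReturnAliases=0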

  • Microsoft text driver (eCase 80352) The Microsoft text driver is not supported.


Oracle Enterprise

  • To use the Oracle OCI or Oracle enterprise stages, users in the primary dstage group must have read and execute permissions on the libraries in $ORACLE_HOME/lib and $ORACLE_HOME/bin, and read permissions on all files in $ORACLE_HOME. Otherwise, users might experience problems using the Oracle OCI and Oracle enterprise stages to connect to Oracle 10.1.0.5.
  • For Oracle 10g operators that use local Oracle 10g servers on AIX and Solaris, you must disable signal handling in Oracle Net 8. To disable signal handling, set bequeath_detach=yes in the $ORACLE_HOME/network/admin/sqlnet.ora file. (eCase 119712)
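    A sketch of the corresponding entry in $ORACLE_HOME/network/admin/sqlnet.ora (the comment is illustrative; any other settings already in the file are left unchanged):
    # disable signal handling for bequeath connections (Oracle Net 8)
    bequeath_detach=yes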

SQL Server Enterprise (eCase 116355)

The combination of the close statement and any one of the upsert statements that has a single query (such as insert only, delete only, or update only) does not work.

Third party directories are not found by DataStage Engine and ASB Agent

The WebSphere MQ library directory is not in the library path environment variable.
If the WebSphere MQ connector cannot connect to WebSphere MQ in client or server mode, you receive the following error in the log:
IIS-CONN-WSMQ-000005
Windows
An exception occurred while trying to receive the response from the handler:
An exception was received from the handler: 
System call LoadLibraryEx() failed with OS error 126 
(The specified module could not be found). 
UNIX
An exception occurred while trying to receive the response from the handler: 
An exception was received from the handler: 
System call dlopen() failed with OS error 2 
(No such file or directory)
Plug-in jobs terminate with the following error:
main_program: Fatal Error: Fatal: Shared library (mqs.so) failed to load: errno = (2),
system message = (Unable to find library 'libimqi23ah_r.so'.)
Workaround
Make the WebSphere MQ libraries accessible to the connector. By default, the libraries are in the following directories. Contact your WebSphere MQ Administrator if you cannot find the libraries in these paths.
  • Windows: C:\Program Files\IBM\WebSphereMQ\bin
  • UNIX (32-bit): /opt/mqm/lib
  • UNIX (64-bit): /opt/mqm/lib64
After you identify the directory location of the WebSphere MQ libraries, modify the library path environment variable to include the appropriate directory for the user that runs the connector or plug-in process. One way to achieve this is to update the dsenv script as described below.
The library path is in one of the environment variables in the following table:
Table 1. Environment variables by operating system
  Operating system                        Environment variable
  Windows                                 Path
  AIX                                     LIBPATH
  HP-UX on PA-RISC (32-bit)               SHLIB_PATH
  HP-UX on Intel Itanium (64-bit)         LD_LIBRARY_PATH
  Solaris/Linux                           LD_LIBRARY_PATH
Add the WebSphere MQ library directory to the library path environment variable to use WebSphere MQ with the WebSphere MQ connector that runs in a DataStage parallel job.
  1. Add the WebSphere MQ library directory to the corresponding library path environment variable in the dsenv script in /opt/IBM/InformationServer/Server/DSEngine (a sketch follows these steps).
  2. Log in as root.
  3. Type the following command to source the dsenv script:
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv
  4. Type the following commands to restart DataStage Engine and Agent services.
    cd /opt/IBM/InformationServer/Server/DSEngine/bin
    ./uv -admin -stop
    ./uv -admin -start
    cd /opt/IBM/InformationServer/ASBNode/bin
    . ./NodeAgents_env_DS.sh
    ./NodeAgents.sh stopAgent
    ./NodeAgents.sh start
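A minimal sketch of the step 1 edit to dsenv for AIX with a 32-bit WebSphere MQ client (use the variable from Table 1 and the library directory that match your platform):
LIBPATH=$LIBPATH:/opt/mqm/lib; export LIBPATH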
Workaround for MQ Series Plug-in and MQ Connector on HP-UX 11i v2 on Intel Itanium and SUSE Linux Enterprise Server 10 for IBM zSeries®
Set the following variables in $DSHOME/dsenv or in the WebSphere DataStage administrator environment variables:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mqm/lib64; export LD_LIBRARY_PATH
PATH=$PATH:/opt/mqm/bin; export PATH
(eCases 120292, 124837, and 126854)

Miscellaneous stages

  • The Command stage fails to copy a file containing special characters such as & (ampersand). (eCase 108820)
  • The XML input stage does not support namespace variable usage when the XPath expression is provided with the column description. (eCase 108470)
  • The UniData® stage is displayed on the server canvas on Linux even though it is not supported on this platform. (eCase 111345)
Unable to run XML Input Stage on AIX (eCase 122076)
Jobs terminate because they fail to load the xmli.so library file. Both server and parallel jobs are affected.
Workaround
Contact IBM Support.
When using the Teradata connector, the Metadata Importer (Connector Wizard) method of importing metadata might not work for Asian language character sets (eCase 123832)
Workaround
  1. Click DataStage Designer > Import > Table Definitions.
  2. Click Plug-in Metadata Definitions or click Orchestrate Schema Definitions to import table definitions from a variety of data sources.
When using the Teradata connector, the WebSphere DataStage 4-byte float is not the same as the Teradata 8-byte float, which causes write jobs to insert incorrect float values (eCase 123831)
If you write a float value and then read that value, the result might not be identical to the original value that was written. This is because WebSphere DataStage uses a 4-byte float, whereas Teradata uses an 8-byte float.
Workaround
Modify the job to use a double float. WebSphere DataStage uses an 8-byte double float, so this matches the Teradata float.
View Data exception when Generate SQL = YES (eCase 123370)
View data functionality does not work from within the connector stage editor when the property Generate SQL is set to Yes. Clicking the View data hyperlink results in an error.
Workaround
Set Generate SQL to No and enter the statement manually or use the SQL Builder. Alternatively, View data functionality is accessible from the Connector Import Wizard.
ODBC Connector problem on AIX when writing to CLOB data (eCase 117097)
The job can hang indefinitely during the write operation.
Workaround
Use an alternate stage.
DB2 API extract and date handling of type 0001-01-01 problem (eCase 118621)
When DB2 date values are represented by the default value of 0001-01-01, these values are incorrectly read as an integer equivalent of -718432 (instead of 0). PX jobs reading these values produce job log warnings similar to the following:
APT_CombinedOperatorController,0: Resource bundle corresponding to message key DSTAGE-TODC-00103 not found! Check that DSHOME or APT_RESPATH is set.
DB2_UDB_API_0,0: ?? (1; 0; 1721424)
This causes zero records to be processed.
Workaround
Set the Date column to CHAR.
Error loading orchoracle when using Oracle Enterprise stage (eCase 115580)
Workaround
Type the following commands to load the Oracle library correctly (substitute your installation path for the /scratch example):
PX_DBCONNECTHOME=/scratch/IBM/InformationServer/Server/DSComponents; export PX_DBCONNECTHOME
./install.liborchoracle
Parallel job with Java Client stage does not work (eCase 118191)
Workaround
Set the DSHOME, DATASTAGE_JRE and DATASTAGE_JVM environment variables as follows (adjust appropriately per install):
For Windows operating systems:
  1. Set DATASTAGE_JRE
    DATASTAGE_JRE=$ASBNode/apps/jre
  2. Set DSHOME
    DSHOME=C:\IBM\InformationServer\Server\DSEngine
  3. Modify LIBPATH inside dsenv so that $ASBHOME/jre/bin/sovvm appears before $ASBNode/apps/jre/bin.
  4. Do not set DATASTAGE_JVM.
For Linux, Solaris, and UNIX operating systems:
  1. Set DATASTAGE_JRE
    DATASTAGE_JRE=$ASBNode/apps 
  2. Set DSHOME
    DSHOME=/opt/IBM/InformationServer/Server/DSEngine
  3. Set DATASTAGE_JVM
    DATASTAGE_JVM=jre/bin/j9vm  
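As a sketch, the UNIX settings above expressed as dsenv exports (assumes $ASBNode is set to the ASBNode directory of your installation and that the engine is installed under the default /opt/IBM path):
DATASTAGE_JRE=$ASBNode/apps; export DATASTAGE_JRE
DSHOME=/opt/IBM/InformationServer/Server/DSEngine; export DSHOME
DATASTAGE_JVM=jre/bin/j9vm; export DATASTAGE_JVM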
Chinese locale zh_CN.EUC on Solaris is not supported (eCase 117549)
Workaround
On Solaris, the locale should be set to the zh_CN.GBK locale, which is a superset of the older zh_CN.EUC locale.
Chinese installation not supported for Linux environment (eCase 114895)
UTF-8 locale on Chinese Linux systems is not supported. Only zh_CN.GB2312 is supported.
Shared Metadata Management allows multiple databases to be created under the same host using the same database name and instance (eCase 115223)
In SQL Builder, selected columns are not dragged to the update column selection grid for Upsert Mode/Order as Update then Insert (eCase 115234)
In SQL Builder, the update values are lost when Validation is ON (eCase 116587)
In the update column grid, the update values are lost when values are added to the respective select columns. This problem occurs when the Validation button on the toolbar is turned ON before building the query.
Workaround
Turn the Validation button OFF before building the query.
Note: The default state of Validation is OFF.
In SQL Builder, certain Oracle 9i functions are displaying an error (eCase 115989)
These functions are not supported.
Oracle Enterprise: Making a delete statement after making an upsert statement results in SQL Builder opening in upsert mode (eCase 116968)
Building a DELETE query after making an upsert causes the wrong SQL Builder window to open with the Insert, Update and SQL tabs.
Workaround
To build a DELETE query after making an upsert, cancel and return to the Stage Editor, and then invoke SQL Builder for DELETE.
Creating a project using the Administrator is very slow when using the default kernel parameters (eCase 116480)
Workaround
Use the recommended settings to configure the kernel parameters on Solaris.
  1. Make a backup copy of /etc/system before you modify it.
  2. Update /etc/system with the suggested kernel parameters and values.
  3. Restart the computer.
Suggested kernel values for installing WebSphere Application Server on Solaris [1]:
set shmsys:shminfo_shmmax = 4294967295
set shmsys:shminfo_shmseg = 1024
set shmsys:shminfo_shmmni = 1024
set semsys:seminfo_semaem = 16384
set semsys:seminfo_semmni = 1024
set semsys:seminfo_semmap = 1026
set semsys:seminfo_semmns = 16384
set semsys:seminfo_semmsl = 100
set semsys:seminfo_semopm = 100
set semsys:seminfo_semmnu = 2048
set semsys:seminfo_semume = 256
set msgsys:msginfo_msgmap = 1026
set msgsys:msginfo_msgmax = 65535
Suggested kernel values for db2osconf:
set msgsys:msginfo_msgmni = 2560
set semsys:seminfo_semmni = 3072
set shmsys:shminfo_shmmax = 5747768524
set shmsys:shminfo_shmmni = 3072
Parallel job not supported when running jobs from WebSphere Information Services Directory (eCase 117012)
It is not possible to run a Parallel job that contains either a Server Shared Container or a Basic Transformer stage when those stages are connected to a WebSphere Information Services Director Input stage. The Parallel engine produces a warning similar to the following: Operator nnn is not wave aware; the operator is reset and rerun on each wave if multiple waves present.
Unable to import metadata from IBM WebSphere Metadata Server database (eCase 117747)
Import metadata from IBM WebSphere Metadata Server database does not work.
IBM XL C/C++ Enterprise Edition for AIX, V9.0 compiler is not supported (eCase 122662)
This function is not supported.
Invalid Search Expression alert is displayed sequentially when you click the Refresh button (eCase 115991)
An error message is displayed when an invalid search expression is entered in the SQL Builder utility.
Workaround
Click the Refresh icon on the toolbar.
Errors result when switching between client and server (eCase 124080)
When using the MQ Connector, switching between client and server modes and then selecting a queue or test connect might result in an error.
Workaround
Restart the agent.
A connection problem occurs with an ITAG installation of WebSphere DataStage when you use the Admin Client after installing a second ITAG version (eCase 116898)
Informix XPS Load stage fails on the server and parallel canvas on HP-UX 11i v2 on PA-RISC (eCase 127034)
Workaround
Obtain and install the patch available from IBM support.
MQ plug-ins terminate for jobs run on the server canvas on HP-UX 11i v2 on PA-RISC (eCase 127036)
The queue manager fails on WebSphere MQ on HP-UX 11i v2 on PA-RISC when the Java heap size is larger than 1024.
Workaround
Install the WebSphere MQ V6.0 Fix Pack 6.0.2.1 (http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg1IY88573).
Sybase server connection fails with a "Login fails with invalid packet size" error message (eCase 127022)
When connecting from the SybaseOC Plug-in to the Sybase server, the login fails with an invalid packet size error. The Plug-in stage property SOURCE_TABLE is not defined for input links.
Plug-in jobs terminate on HP-UX 11i v2 on Intel Itanium (eCase 126733)
  • Informix plug-in jobs end after creating the table successfully. The jobs end with the following error:
    Abnormal termination of stage informix.
    Informix_CLI_0.IDENT1 detected.
  • Bulk load jobs end when used from the DRS plug-in. The jobs end with the following library loading error:
    Could not load drsdb2bl.so.
XPS plug-in jobs end when run in manual mode
Workaround
Obtain the patch from IBM customer support to use the XPS plug-in.
Configure WebSphere DataStage and WebSphere QualityStage for use with PAM (eCase 127126)
Operating system: Linux Enterprise Server 10 for IBM zSeries
Add the following entry to the /etc/pam.d/dsepam file to enable WebSphere DataStage to use PAM (dsepam) services on Linux Enterprise Server 10 for IBM zSeries:
#%PAM-1.0
auth      required  pam_unix2.so  nullok #set_secrpc
password  required  pam_unix2.so  nullok #set_secrpc
account   required  pam_unix2.so  nullok #set_secrpc