Sunday, July 24, 2011

How to Debug Your DataStage Parallel Jobs

Here are some good tips for debugging a DataStage parallel job:

Steps

Enable the following environment variables in DataStage Administrator (a command-line sketch follows the list):
  • APT_PM_PLAYER_TIMING – shows how much CPU time each stage uses
  • APT_PM_SHOW_PIDS – shows the process ID of each stage
  • APT_RECORD_COUNTS – shows record counts in the log
  • APT_CONFIG_FILE – switches the configuration file (for example, between a one-node and a multi-node file)
  • OSH_DUMP – shows the OSH code for your job, which reveals any unexpected settings applied by the GUI
  • APT_DUMP_SCORE – shows all processes and inserted operators in your job
  • APT_DISABLE_COMBINATION – prevents multiple stages from being combined into one process, which makes it easier to see where errors occur
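
These variables can also be exported in the engine environment (for example, in the dsenv file or a wrapper shell script) before a command-line run; project-wide defaults are still best set in Administrator. The sketch below assumes a Unix engine, and the one-node .apt path is only a placeholder for your own configuration file.

  # Sketch: enable the debugging variables for a session (a value of 1 turns each one on)
  export APT_PM_PLAYER_TIMING=1      # CPU time used by each stage
  export APT_PM_SHOW_PIDS=1          # process ID of each stage
  export APT_RECORD_COUNTS=1         # record counts in the log
  export OSH_DUMP=1                  # generated OSH code
  export APT_DUMP_SCORE=1            # job score: processes and inserted operators
  export APT_DISABLE_COMBINATION=1   # one process per stage
  # Placeholder path: point this at your own one-node configuration file
  export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/one_node.apt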

Other debugging tips:

  • Use a Copy stage to dump data to intermediate Peek stages or sequential debug files. Copy stages are removed at compile time, so they do not add overhead.
  • Use a Row Generator stage to generate sample data.
  • Look at the phantom files for additional error messages: c:\datastage\project_folder\&PH&
  • To catch partitioning problems, run your job with a single-node configuration file and compare the output with your multi-node run. You can simply compare file sizes, or sort the data for a more detailed comparison with the Unix sort and diff commands (see the sketch after this list).
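
For that one-node versus multi-node comparison, a couple of Unix commands are usually enough. The file names below are placeholders for data exported from each run.

  # Sketch: compare the output of a single-node run and a multi-node run
  wc -c single_node_out.txt multi_node_out.txt      # quick size check
  sort single_node_out.txt > single_sorted.txt      # sort so that row order does not matter
  sort multi_node_out.txt  > multi_sorted.txt
  diff single_sorted.txt multi_sorted.txt           # no output means the data matches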
