Datastage Exam and Examination Questions
Following by the official IBM website (http://www-03.ibm.com/certify/tests/obj415.shtml) below are areas to prepare in order to pass the test:
Test 415 - IBM WebSphere IIS DataStage Enterprise Edition v7.5
Section 1 - Installation and Configuration (5%)
Describe how to properly install and configure DataStage EE
Describe users and groups
Describe the environment (e.g., dsenv, ODBC)
Setup database connectivity
Describe OS configuration/kernel
Describe USS configuration
Identify required components for server
C++ compiler
Identify the tasks required to create and configure a project to be used for EE jobs.
Project location
Assign DataStage EE roles
Environment defaults
Given a configuration file, identify its components and its overall intended purpose.
Project location
Assign DataStage EE roles
Environment defaults
List the steps necessary to start/stop DataStage EE properly.
netstat -a|grep ds
Section 2 - Metadata (5%)
Demonstrate knowledge of Orchestrate schema.
Distinguish internal data type (Orchestrate schema) vs external data type
Describe how to set extended properties for table definition
Import metadata using plug-ins vs orchdbutil
Explain nullable mapping rules (e.g., source vs target)
NLS data types
Identify the method of importing metadata.
Flat sources (e.g., sequential file, Orchestrate schema, ODBC, etc.)
COBOL CopyBook
XML
Given a scenario, demonstrate knowledge of runtime column propagation.
Usage
Impact on stage mapping and target databases
Section 3 - Persistent Storage (10%)
Given a scenario, explain the process of importing/exporting data to/from framework (e.g., sequential file, external source/target).
Explain use of various file stages (e.g., file, CFF, fileset, dataset) and where appropriate to use
If USS, define the native file format (e.g., EBCDIC, VSDM)
Given a scenario, describe proper use of a sequential file.
Read in parallel (e.g., reader per node, multiple files)
Handle various formats (e.g., fix vs variable, delimeted vs nondelimited, etc.)
Describe how to import and export nullable data
Explain how to identify and capture rejected records (e.g., log counts, using reject link, options for rejection)
Given a scenario, describe proper usage of CFF (native not plug-in).
Explain how to import data from a file that has multiple varying record types (e.g., COBOL CopyBook, EBCDIC to ASCII)
Describe proper usage of FileSets and DataSets.
Explain differences and similarities of FileSet and DataSet (e.g., header data file segments, internal DS vs external format (FS))
Determine which tools can be used to manage FileSets and DataSets (GUI and CLI)
Describe use of FTP stage for remote data (e.g., how to parallel, plug-in vs enterprise).
Restructure stages (e.g., column import/export)
Identify importing/exporting of XML data.
XML stage options and usage
XPATH and XLS
Section 4 - Parallel Architecture (10%)
Given a scenario, demonstrate proper use of data partitioning and collecting.
Partitioning goals (e.g., data distribution, meeting data grouping requirements)
Explain the differences and usage of round robin, entire, same, and hash partitioning
Explain the differences between auto, ordered, and sort collector
Identify partitioning type, parallel/sequential by analyzing a DataStage EE screen shot
Explain the differences between partitioning keys and sorting (stage) keys through a scenario where these do not match
Demonstrate proper use of partitioning methods based on given business requirements and DataStage EE technical requirements
Explain the differences between a funnel state and a collector
Describe input of partitioning and re-partitioning in an MPP/cluster environment
Given a scenario, demonstrate knowledge of parallel execution.
Given a job design and configuration file, provide estimates of the number of processes generated at runtime
Explain the purpose and use of resource and node pools
Given a source DataSet, describe the degree of parallelism using auto and same partitioning
Section 5 - Databases (15%)
Given a scenario, demonstrate proper selection of database stages and database specific stage properties.
DB2
Based on DB2 connectivity requirements, select appropriate stage (e.g., DB2 API, DB2 Enterprise)
List environment variables needed to successfully run a DB2 job
Teradata
Based on functional requirements, select appropriate Teradata stage (e.g., TD Enterprise, TD MultiLoad (target only), TD API)
Requested sessions and total sessions
DRS
ODBC
Informix
Sybase
Oracle
Remote Server
Partition table
Given a scenario using Oracle, load target and proper use of index mode and $API_ORA_LOAD_OPTIONS
Identify source database options.
Identify which stages/options read in parallel
Explain the degree of parallelism for Oracle Enterprise and DB2 Enterprise
Identify the use of "sparse" vs "normal" lookup
When to use (e.g., 1:100)
How to construct sparse lookup and SQL statements
Given a scenario, demonstrate knowledge of target database options.
Articulate benefits and limitations of using parallel load method (e.g., constraints, speed, exclusive locks, etc.)
Explain the differences between upsert and load methods on target database stages
Describe how to control restart of DB target (separate the "L" from ETL)
Use OPEN, CLOSE, create temp, Load, SELECT FROM... INSERT INFO, DROP
Identify the impact of RCP when target has fewer columns than job flow
Separate update and insert records before target upsert
Given a scenario, describe how to design EE ETL jobs that will extract data from DB2 (or any DBMS), combine with data from another source and load to another DBMS target.
Demonstrate knowledge of working with NLS database sources and targets.
Section 6 - Data Transformation (15%)
Given a scenario, demonstrate knowledge of default type conversions, output mappings, and associated warnings.
Demonstrate appropriate use of modify for NULL handling, string trim, non-default conversions
Given a scenario, demonstrate proper selections of Transformer stage vs other stages.
Copy stage/output mapping
No BASIC Transformer
Filter vs Transformer constraint (e.g., filter vs interpreted)
Given a scenario, describe Transformer stage capabilities (including stage variables, link variables, DataStage macros, constraints, system variables, link ordering, @PART NUM, functions).
Explain NULL handling within parallel transformer (e.g., reject rule, NULL functions)
Demonstrate the use of Transformer stage variables (e.g., to identify key grouping boundaries on incoming data
Use a Transformer only to generate a sequence of numbers in parallel
Identify processes to add functionality not provided by existing DataStage stages (e.g., wrapper, buildops, user def functions/routines).
Given a scenario, demonstrate no closed loop scenario when updating a reference table.
Section 7 - Combining and Sorting Data (10%)
Demonstrate knowledge of Join, Lookup, and Merge stages.
Explain the differences between Lookup, Join, and Merge stages and demonstrate when to use each
Explain link (e.g., input/output/reject) requirements
Explain sorting requirements
Explain full outer join usage
Demonstrate understanding of link ordering with JOIN and MERGE stages
Lookup - continue (also outer JOIN) with non-nullable input columns
Explain what happens with duplicate input column names on JOIN and MERGE
Given a scenario, demonstrate knowledge of SORT stage.
Describe the differences between parallel SORT, SORT, and sequential SORT
Describe the differences between unique option in osrt, RemDup stage
Demonstrate understanding of SORT key column properties (e.g., don't sort, previously sorted)
Demonstrate understanding of framework inserted sorts
Examine score
Set $APT_SORT_INSERTION_CHECK_ONLY
Distinguish between SQL ORDER BY vs parallel sort
Explain NLS collation sequences
Given a scenario, demonstrate understanding of Aggregator stage.
Memory impact of sort vs hash
Demonstrate use of Aggregator properties
Count vs calculation
Explicitly create output columns and map
Take output of Aggregator and join with detail records
Describe the proper usage of change capture/change apply.
Separate delta rows (e.g., inserts vs updates)
Section 8 - Automation and Production Deployment (10%)
Given a scenario, articulate the change control process.
"Dev to QA to prod"
Explain the differences between compile and force compile
Use of protected projects
Import/Export
Including compiled Transformer
Managing custom components (e.g., buildops, wrappers)
USS deployment topics (e.g., remote deployment, how to deploy)
Identify the use of the dsjob command line utility.
Explain how a DataStage job may be run using a third party scheduler (e.g., autosys, control U, CRON, etc.)
Explain how performance run statistics from a job run can be captured
Capture log records to a text file (XML format)
Explain how to use parameters to simplify the creation and maintenance of job designs.
Explain the processes necessary to run multiple copies of the source (job multi-instance)
Given a scenario, demonstrate the ability to use job sequencers (e.g., exception hunting, re-startable, dependencies, passing return value from routing, parameter passing and job status)
Create different designs using Job Sequencer
Section 9 - Monitoring and Troubleshooting (10%)
Given a scenario, demonstrate knowledge of parallel job score.
Show how it is captured and interpret its results
Identify and interpret its contents
Number of processes
Number of nodes
Number of DataSets
Combined operators
Parallel/Sequential execution
Inserted buffer and sort operators
Given a scenario, identify and define environment variables that control EE with regard to added functionality and reporting.
Articulate ability to identify runtime metadata vs design metadata
APT_PM_SHOW_PIDS, etc.
Given a process list, identify conductor, section leader, and player processes.
Given a scenario, identify areas that may improve performance (e.g., buffer size, repartitioning, config files, operator combination, etc.).
Section 10 - Job Design (10%)
Demonstrate knowledge of shared containers.
Reuse using RCP
Using usage analysis and multi-job compile to recompile jobs that use a modified shared container
Given a scenario, describe how to minimize SORTS and repartitions.
Demonstrate knowledge of creating restart points across jobs using intermediate DataSets.
Given a scenario, demonstrate proper use of standards.
Naming conventions for stages and links
Differences between annotation and description annotation