September 24, 2018

Sreekanth B

CA Technologies Pentaho Recently Asked Interview Questions Answers

Define Pentaho Reporting Evaluation?

Pentaho Reporting Evaluation is a package containing a subset of the Pentaho Reporting capabilities, designed for typical first-phase evaluation activities such as accessing sample data, creating and editing reports, and viewing and interacting with reports.

How To Perform Database Join With PDI (Pentaho Data Integration)?

PDI supports joining two tables from the same database using a ‘Table Input’ step, performing the join in SQL only.

On the other hand, to join two tables in different databases, users implement the ‘Database Join’ step. However, in a database join, a query is executed on the target system for each input row from the main stream, so performance drops as the number of queries executed on the DB increases.

To avoid this situation, there is another option for joining rows from two different Table Input steps: use the ‘Merge Join’ step, with SQL queries that include an ‘ORDER BY’ clause. Remember that the rows must be perfectly sorted on the join keys before the merge join is performed.
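The idea behind a merge join can be sketched in a few lines of Python. This is only an illustration of the general technique, not PDI's implementation; the function name, the sample rows, and the "id" key are assumptions made for the example. It shows why both inputs must already be sorted: the algorithm only ever moves forward through each list.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dict rows that are already sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left key too small: advance left
        elif lk > rk:
            j += 1          # right key too small: advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Hypothetical rows, as if returned by two Table Input steps with ORDER BY id.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 4, "name": "Dee"}]
orders = [{"id": 1, "total": 10}, {"id": 1, "total": 25}, {"id": 3, "total": 7}, {"id": 4, "total": 5}]
joined = merge_join(customers, orders, "id")
```

If either input were unsorted, matching rows could be skipped silently, which is why the sort requirement matters.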

Explain How To Sequentialize Transformations?

Because PDI transformations execute all of their steps in parallel, it is impossible to sequentialize a transformation in Pentaho. Making this possible would require changing the core architecture, which would actually result in slower processing.

Explain Pentaho Report Designer (PRD)?

PRD is a graphical tool for editing reports and creating simple and advanced reports, which users can export as PDF, Excel, HTML and CSV files. PRD includes a Java-based report engine offering data integration, portability and scalability, so it can be embedded in Java web applications and in application servers such as the Pentaho BA Server.

Explain The Benefits Of Data Integration?

The biggest benefit is that integrating data improves consistency and reduces conflicting and erratic data in the DB. Integration of data allows users to fetch exactly what they are looking for, enabling them to use and work with what they have collected.

Accurate data extraction, which further facilitates flexible reporting and monitoring of the available volumes of data.

Helps meet deadlines for effective business management.

Track customer’s information and buying behavior to improve traffic and conversions in the future, thus advancing your business performance.

Define Pentaho And Its Usage?

Regarded as one of the most efficient and resourceful data integration (DI) tools, Pentaho supports virtually all available data sources and allows scalable data clustering and data mining. It is a lightweight Business Intelligence suite providing Online Analytical Processing (OLAP) services, ETL functions, report and dashboard creation, and other data analysis and visualization operations.

Explain The Important Features Of Pentaho?

Pentaho is capable of creating Advanced Reporting Algorithms regardless of their input and output data format.
It supports various report formats, including Excel spreadsheets, XML, PDF documents and CSV files.
It is a Professionally Certified DI Software rendered by the renowned Pentaho Company headquartered in Florida, United States.
Offers enhanced functionality and in-Hadoop functionality.
Allows dynamic drill down into larger and greater information.
Rapid Interactive response optimization.
Explore and view multidimensional data.

Name Major Applications Comprising Pentaho BI Project?

Business Intelligence Platform.
Dashboards and Visualizations.
Reporting.
Data Mining.
Data Analysis.
Data Integration and ETL (also called Kettle).
Data Discovery and Analysis (OLAP).

What Is MDX And Its Usage?

MDX is an acronym for ‘Multi-Dimensional Expressions,’ the standard query language introduced by Microsoft SQL Server OLAP Services. MDX is an integral part of the XML for Analysis API and has a different structure than SQL.

A basic MDX query is:

SELECT {[Quantity].[Unit Sales], [Quantity].[Store Sales]} ON COLUMNS,
       {[Product].members} ON ROWS
FROM [Sales]
WHERE [Time].[1999].[Q2]
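To make the shape of that query's result more concrete, here is a rough Python sketch that mimics it over a tiny in-memory fact table. The data and the function name are invented for illustration, and the sketch is simplified: it treats [Product].members as a flat list of products and ignores hierarchy levels. The WHERE clause acts as a slicer on time, products land on rows, and the two measures land on columns.

```python
# Hypothetical fact rows: (year, quarter, product, unit_sales, store_sales)
facts = [
    ("1999", "Q1", "Food", 10, 25.0),
    ("1999", "Q2", "Food", 12, 30.0),
    ("1999", "Q2", "Drink", 5, 11.5),
    ("1999", "Q2", "Food", 3, 7.5),
    ("1998", "Q2", "Drink", 9, 20.0),
]


def sales_by_product(facts, year, quarter):
    """Aggregate both measures per product for one (year, quarter) slice."""
    grid = {}
    for y, q, product, units, sales in facts:
        if (y, q) != (year, quarter):
            continue  # the slicer: keep only the requested time member
        cell = grid.setdefault(product, {"Unit Sales": 0, "Store Sales": 0.0})
        cell["Unit Sales"] += units
        cell["Store Sales"] += sales
    return grid


# Rows are products, columns are the two measures, sliced to 1999 Q2.
result = sales_by_product(facts, "1999", "Q2")
```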

Define Three Major Types Of Data Integration Jobs?

Transformation Jobs : Used for preparing data, and only when the data must not change until the transformation job has finished.

Provisioning Jobs : Used for transferring large volumes of data, and only when the data must not change until the job has finished and the provisioning requirement is large.

Hybrid Jobs : Perform both transformation and provisioning. There are no limitations on data changes; the data can be updated regardless of success or failure. The transformation and provisioning requirements are not large in this case.

Illustrate The Difference Between Transformations And Jobs?

While transformations move and transform rows from a source system to a target system, jobs perform high-level operations such as running transformations, transferring files via FTP, sending mails, etc.

Another significant difference is that a transformation executes its steps in parallel, whereas a job executes its entries in order.
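The contrast can be sketched in Python. This is an analogy under stated assumptions, not how PDI is implemented: each transformation step is modeled as a thread connected to its neighbors by queues, so all steps run at once and rows stream through them, while a job is a plain loop that runs one entry after another. All names here are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def step(func, inbox, outbox):
    """Run one step: read rows until DONE, apply func, pass rows on."""
    while True:
        row = inbox.get()
        if row is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(row))


def run_transformation(rows, funcs):
    """Start every step at once; rows stream between them through queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=step, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for row in rows:
        queues[0].put(row)
    queues[0].put(DONE)
    out = []
    while True:
        row = queues[-1].get()
        if row is DONE:
            break
        out.append(row)
    for t in threads:
        t.join()
    return out


def run_job(entries):
    """Run entries strictly one after another, stopping at the first failure."""
    log = []
    for name, entry in entries:
        ok = entry()
        log.append((name, ok))
        if not ok:
            break
    return log


rows_out = run_transformation([1, 2, 3], [lambda r: r * 2, lambda r: r + 1])
job_log = run_job([
    ("load", lambda: True),
    ("ftp", lambda: False),
    ("mail", lambda: True),
])
```

In the job, the "mail" entry never runs because the "ftp" entry failed first; that kind of ordering cannot be expressed between the steps of the transformation.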

Define Pentaho Report Types?

There are several categories of Pentaho reports :

Transactional Reports : Data comes from transactions. The objective is to publish detailed and comprehensive data on an organization’s day-to-day activities, such as purchase orders and sales reporting.

Tactical Reports : Data comes from daily or weekly summaries of transactional data. The objective is to present short-term information for immediate decision making, such as replacing merchandise.

Strategic Reports : Data comes from stable and reliable sources to create long-term business information reports, such as seasonal sales analysis.

Helper Reports : Data comes from various sources and includes images and videos to present a variety of activities.

What Are Variables And Arguments In Transformations?

The transformation settings dialog box contains two tables: one for arguments and one for variables. Arguments are the values specified on the command line during batch processing, whereas PDI variables are objects set in a previous transformation or job, or in the operating system.
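As a rough illustration of the difference, the Python sketch below treats arguments as a positional list (as they would arrive from a command line) and variables as named values that are substituted into text. It assumes the ${NAME} reference syntax for variables; the function, the variable names, and the paths are hypothetical.

```python
import re


def substitute(text, variables):
    """Replace ${NAME} with its value; unknown names are left untouched."""
    def repl(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\$\{([^}]+)\}", repl, text)


# Arguments: positional values given on the command line (hypothetical).
arguments = ["2018-09-24", "sales"]

# Variables: named values set earlier, e.g. by a previous job (hypothetical).
variables = {"INPUT_DIR": "/data/in", "DB_SCHEMA": "staging"}

path = substitute(
    "${INPUT_DIR}/" + arguments[1] + "_" + arguments[0] + ".csv",
    variables,
)
```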

How To Configure JNDI For Pentaho DI Server?

Pentaho offers JNDI connection configuration for local DI so that an application server does not have to run continuously during the development and testing of transformations. Edit the properties in the jdbc.properties file located at …\data-integration-server\pentaho-solutions\system\simple-jndi.


How Do You Duplicate A Field In A Row In A Transformation?

Several solutions exist:

Use a “Select Values” step that selects the original field and also selects it again under a new name. The original field is then duplicated under the other name.

For example, selecting fieldA once as is, once renamed to fieldB, and once renamed to fieldC duplicates fieldA to fieldB and fieldC.

Use a Calculator step with, for example, the NVL(A,B) operation, taking fieldA as input A and writing the result to a new field.

This will have the same effect as the first solution: 3 fields in the output which are copies of each other: fieldA, fieldB, and fieldC.

Use a JavaScript step to copy fieldA into the new fields fieldB and fieldC.

This will have the same effect as the previous solutions: 3 fields in the output which are copies of each other: fieldA, fieldB, and fieldC.
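Whichever step is used, the effect on a row is the same. The following Python sketch models a row as a dictionary to show the outcome; it is not PDI code, and the function name and sample values are made up. It also refuses to create a field whose name already exists, mirroring the rule that field names in a row must be unique.

```python
def duplicate_field(row, source, copies):
    """Return a new row with `source` copied under each name in `copies`."""
    out = dict(row)
    for name in copies:
        if name in out:
            # A row cannot hold two fields with the same name.
            raise ValueError(f"field {name!r} already exists")
        out[name] = row[source]
    return out


row = {"fieldA": "x", "qty": 3}
duplicated = duplicate_field(row, "fieldA", ["fieldB", "fieldC"])
```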

Why Can’t I Duplicate Fieldnames In A Single Row?

You can’t. PDI will complain in most cases if a row contains duplicate field names. Before PDI v2.5.0 you could force duplicate fields, but even then only the first value of the duplicated fields could ever be used.

I’ve Got A Transformation That Doesn’t Run Fast Enough, But It Is Hard To Tell In What Order To Optimize The Steps. What Should I Do?

Transformations stream data through their steps, which means that the slowest step determines the speed of the whole transformation.
So optimize the slowest steps first. To tell which step is the slowest, look at the size of the input buffer in the log view.
In the latest 3.1.0-M1 nightly build you will also find a graphical overview of this: HTTP://WWW.IBRIDGE.BE/?P=92
(the “graph” button at the bottom of the log view will show the details).
A slow step will have consistently large input buffer sizes. A fast step will have consistently small input buffer sizes.
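If you note down the input buffer sizes from the log view a few times during a run, ranking the steps is straightforward. The Python sketch below assumes such samples have been collected by hand; the step names and numbers are invented, and averaging is just one simple way to capture "consistently large".

```python
from statistics import mean


def rank_steps(samples):
    """Sort step names by average input buffer size, largest first."""
    return sorted(samples, key=lambda name: mean(samples[name]), reverse=True)


# Hypothetical input buffer sizes sampled from the log view during one run.
buffer_samples = {
    "Table Input": [0, 0, 0, 0],
    "Database Lookup": [9800, 10000, 9900, 10000],
    "Text File Output": [120, 80, 150, 90],
}

# The first name in the ranking is the best candidate to optimize first.
ranking = rank_steps(buffer_samples)
```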

We Will Be Using PDI Integrated In A Web Application Deployed On An Application Server. We’ve Created A JNDI Datasource In Our Application Server. Of Course Spoon Doesn’t Run In The Context Of The Application Server, So How Can We Use The JNDI Data Source In PDI?

If you look in the PDI main directory you will see a sub-directory “simple-jndi”, which contains a file called “jdbc.properties”. You should change this file so that the JNDI information matches the one you use in your application server.

After that, in the connection tab of Spoon, set the “Method of access” to JNDI, the “Connection type” to the type of database you’re using, and the “Connection name” to the name of the JNDI datasource (as used in “jdbc.properties”).
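Entries in “jdbc.properties” follow a name/property=value pattern, where the part before the slash is the datasource name you later enter as the “Connection name”. The Python sketch below parses a sample of that format to show how the entries group together; the datasource name, driver, URL and credentials are placeholders, not values from any real installation.

```python
def parse_simple_jndi(text):
    """Group name/property=value lines by datasource name."""
    sources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        name, _, prop = key.strip().partition("/")
        sources.setdefault(name, {})[prop] = value.strip()
    return sources


# Hypothetical entry; replace every value with your own datasource settings.
SAMPLE = """\
# example datasource
MyDataSource/type=javax.sql.DataSource
MyDataSource/driver=org.postgresql.Driver
MyDataSource/url=jdbc:postgresql://localhost:5432/mydb
MyDataSource/user=etl_user
MyDataSource/password=secret
"""

sources = parse_simple_jndi(SAMPLE)
```

Here "MyDataSource" is the name that would have to match the JNDI name configured in the application server.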

Mention Major Features Of Pentaho?

Direct Analytics on MongoDB: It enables business analysts and IT to access, analyze, and visualize MongoDB data.

Data Science Pack: Pentaho’s Data Science Pack operationalizes analytical modeling and machine learning while allowing data scientists and developers to offload the labor of data preparation to Pentaho Data Integration.

Full YARN Support for Hadoop: Pentaho’s YARN integration enables organizations to exploit the full computing power of Hadoop while leveraging existing skillsets and technology investments.

Define Tuple?

A tuple is a finite ordered list of elements.
