September 5, 2017

Srikaanth Kumar

Database Optimization Advanced Interview Questions


Reasons of poor performance of query.

Following are the reasons for the poor performance of a query:

- No indexes.

- Excess recompilations of stored procedures.

- Procedures and triggers without SET NOCOUNT ON.

- Poorly written query with unnecessarily complicated joins.

- Highly normalized database design.


- Excess usage of cursors and temporary tables.

- Queries with predicates that use comparison operators between different columns of the same table.

- Queries with predicates that use operators, and any one of the following are true:

1. There are no statistics on the columns involved on either side of the operators.

2. The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. This situation can be especially true if the operator is anything other than the equality (=) operator.

3. The predicate uses the not equal to (!=) comparison operator or the NOT logical operator.

- Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.

- Queries that involve joining columns through arithmetic or string concatenation operators.

- Queries that compare variables whose values are not known when the query is compiled and optimized.


What are the ways to code efficient transactions?

Should not allow input from users during a transaction.

- All the information that is required from the user should be gathered before the transaction starts. If some additional information is required rollback the transaction and restart the transaction after the input is provided.

Transactions should not be opened while browsing through data.

- Till all the preliminary data analysis is not completed the transactions should not be started.

Keep the transaction as short as possible.

- After the modifications are made start the transaction and execute the modified statements and immediately rollback or commit and do not open the transaction unless it is required.

Try to use lower transaction isolation levels.

A system with a low probability of concurrent updates there is an occasional error “somebody else changed your data after you read it” can be much lower than the overhead of always locking rows as they are read.

Accessing the least amount of data possible while in a transaction.

The smaller the amount of data that is accessed in the transaction the fewer the number of rows that will be locked by reducing the contention between the transactions.

Explain Execution Plan.

- SQL Server caches the plan of execution of query or stored procedure which it uses in subsequent call.

- This is a very important feature with regards to performance enhancement.

- You can view execution plan of data retrieval graphically or textually.

- Execution plans, describes the SQL command EXPLAIN PLAN, and explains how to interpret its output.

- The EXPLAIN PLAN statement displays execution plans chosen by the Oracle optimizer for SELECT, UPDATE, INSERT, and DELETE statements.

- A statement's execution plan is the sequence of operations Oracle performs to run the statement.

- The row source tree is the core of the execution plan.

It shows the following information:

1. An ordering of the tables referenced by the statement.

2. An access method for each table mentioned in the statement.

3. A join method for tables affected by join operations in the statement.

4. Data operations like filter, sort, or aggregation

In addition to the row source tree, the plan table contains information about the following:

1. Optimization, such as the cost and cardinality of each operation.

2. Partitioning, such as the set of accessed partitions.

3. Parallel execution, such as the distribution method of join inputs.

- The EXPLAIN PLAN results let you determine whether the optimizer selects a particular execution plan, such as, nested loops join.

- It also helps to understand the optimizer decisions, such as why the optimizer chose a nested loops join instead of a hash join, and lets you understand the performance of a query.

What are B-trees?

- A method of placing and locating files (called records or keys) in a database is known as a B-Tree.
- The number of times a medium must be accessed to locate a desired record, thereby speeding up the process is minimized by the B-tree algorithm.

Uses of a B-tree in databases:

1. The keys are kept in sorted order for sequential traversing
2. A hierarchical index is used to minimize the number of disk reads
3. Partially full blocks are used to speed insertions and deletions
4. Keep the index balanced with an elegant recursive algorithm

- A B-tree also minimizes the waste by making sure that the interior nodes are at least half full.
- An arbitrary number of insertions and deletions can be handled by a B-tree.

Explain Table Scan and Index Scan.

- We use queries to improve the performance.

- SQL Server has to process more data to find the records that meet the queries criteria because of one common problem that exists is the lack of indexes or incorrect indexes .
- These issues are known as Index Scans and Table Scans.
- When SQL Server has to scan the data or index pages to find the appropriate records it is known as an index scan or table scan .
- A seek uses the index to pinpoint the records that are needed to satisfy the query and a scan is totally opposite of a seek.
- We need to find and fix scans as they generally require more I/O and also take longer to process.
- At its first release the performance is great, but as time passes and as more data is added the index scans take longer and longer to complete.
- These issues can be found by running Profiler or setting up a server side trace and look for statements that have high read values.
- Once the statements are identified then they can be seen at the query plan to see if there are scans occurring.

Describe FillFactor concept in indexes

- The value that determines the percentage of space on each leaf-level page to be filled with data is known as Fill Factor.

- In SQL Server, the smallest unit is a page, which is size 8K.
- This page can store one or more rows based on the size of the row.
- The default value of Fill Factor is 100, which has the same value as 0.
- The SQL Server allows to fill the leaf-level pages of an index with the maximum numbers of the rows it can fit by the default Fill Factor(100 or 0).
- When the fill factor is 100 there will be no or very little empty space left in the page.
- The "page split" event happens to accommodate new data if the page is completely filled and new data is inserted in the table which belongs to the completely filled page.
- SQL server splits the page in two pages by dividing the data in half and when new data arrives it accommodates the new data, and belongs to the page which is filled up completely.
- It tells us that a completely filled page is now two half-filled pages.
- As soon as the page split process is over, the new data is inserted at its logical place.
- The process of page split is expensive in terms of the resources.
- More amount of storage space is required to accommodate the small amount of data as there is a lock on the page for a brief period.

What are Index statistics?

- Optimizer statistics are a collection of data that describe more details about the database and the objects in the database.

- These statistics are used by the query optimizer to choose the best execution plan for each SQL statement. Optimizer statistics include the following:

1. Table statistics

a. Number of rows
b. Number of blocks
c. Average row length

2. Column statistics

a. Number of distinct values (NDV) in column
b. Number of nulls in column
c. Data distribution (histogram)

3. Index statistics

a. Number of leaf blocks
b. Levels
c. Clustering factor

4. System statistics

a. I/O performance and utilization
b. CPU performance and utilization

Describe Fragmentation

- The storage of the data can be done by fragmenting the whole database into several pieces called fragments.
- Each of this fragment is stored at a different site.
- The logical data units stored at various sites in a distributed database system are known as fragments.

Advantages of fragmentation

1. Usage

- The applications work with views rather than the entire relations.
- It is appropriate to work with subsets of relation as the unit of distribution for data distribution.

2. Efficiency

- The data is stored where it is frequently used.
- Data which is not required by the local applications is not stored.

3. Parallelism

- A transaction can be divided into several sub queries that operate on fragments by using these fragments as a unit of distribution.
- This will help to increase the degree of concurrency, or parallelism, in the system, which in turn allows transactions to execute safely in parallel.

4. Security

- Data which is not required by the local applications is not stored which in turn is not available to unauthorized users.

Disadvantages of fragmentation

1. Performance

- The performance of global application which is required by the data from the several fragments which are located at different sites may be slower.

2. Integrity

- The integrity control can be difficult if the data and the functional dependencies are fragmented and are located at different sites.

Explain Nested Join, Hash Join, and Merge Join in SQL Query Plan.

Nested Loops Joins

- Nested loops join an outer data set to an inner data set.
- For each row in the outer data set that matches the single-table predicates, the database retrieves all rows in the inner data set that satisfy the join predicate.
- If an index is available, then the database can use it to access the inner data set by rowid.

Hash Joins

- The database uses a hash join to join larger data sets.
- The optimizer uses the smaller of two data sets to build a hash table on the join key in memory, using a deterministic hash function to specify the location in the hash table in which to store each row.
- The database then scans the larger data set, probing the hash table to find the rows that meet the join condition.

Sort Merge Joins

- A sort merge join is a variation on a nested loops join.
- The database sorts two data sets (the SORT JOIN operations), if they are not already sorted.
- For each row in the first data set, the database probes the second data set for matching rows and joins them (the MERGE JOIN operation), basing its start position on the match made in the previous iteration.



Subscribe to get more Posts :