BI Interview Questions

keeking發表於2012-06-13

Recently I had to find a BI resource to do some ETL work for me.  I wanted to make sure I had the right person for the job so I came up with a list of interview questions that proved out to be spot on.  It really helped me weed out the folks that knew a lot of buzz words versus the folks that had actually worked with the different aspects of the BI ETL process.  I've provided some of the answers to a few of the questions.  You would be surprised at what little people really know when they claim to be an expert at BI.  I did find the perfect person using these questions and the project is well under way and on target! 

Can you explain normalization?

First Normal Form. (1NF)

First normal form. (1NF) sets the very basic rules for an organized database:

  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form. (2NF)

Second normal form. (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form. (3NF)

Third normal form. (3NF) goes one large step further:

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.

 

What is de-normalization?

A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.Only one valid reason exists for denormalizing a relational design - to enhance performance

Differences between Stored Procedures and Functions

  1. Stored procedures can't return a table variable where as function can do that.

  2. You can use stored procedures to alter the server environment parameters where as using functions you can't.

A stored procedure is like a miniture program in SQL Server. It can be as simple as a select statement, or as complex as a long script. that adds, deletes, updates, and/or reads data from multiple tables in a database. (Stored procedures can also implement loops and cursors which both allow you to work with smaller results or row by row operations on data.)

The SQL Server functions are option for doing certain operations in SQL Server. They can not be used to update, delete, or add records to the database. They simply return a single value or a table value. They can only be use to select records. However, they can be called very easily from within standard SQL

Function can not do delete/update/insert

What is a surrogate key?

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Explain what a star schema is?

 A highly de-normalized technique. A star schema has one fact table and is associated with numerous dimensions table and depicts a star.

Explain snow flaking?

The normalized principles applied star schema is known as Snow flake schema. Every dimension table is associated with sub dimension table.

Differences:

  • A dimension table will not have parent table in star schema, whereas snow flake schemas have one or more parent tables.
  • The dimensional table itself consists of hierarchies of dimensions in star schema, where as hierarchies are split into different tables in snow flake schema. The drilling down data from top most hierarchies to the lowermost hierarchies can be done.

What features in SQL Server do you use to tune queries?

The following SQL Server query tuning features can be used with the missing indexes feature:

  • sys.dm_db_index_usage_stats and sys.dm_exec_query_stats dynamic management views

  • Database Engine Tuning Advisor


Are clustered index scans beneficial to an execution plan?

What is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?


Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.


Indexes are of two types. Clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so do clustered indexes), with the leaf level nodes having the index key and it's row locater. The row located could be the RID or the Clustered index key, depending up on the absence or presence of clustered index on the table.


Are there any performance issues with having a lot of indexes on a table? 

If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.

Can you explain the MERGE command and how it is used?

Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.

 

--確定目標表
Merge Into Demo_AllProducts p
--從資料來源查詢編碼相同的產品
using Demo_Shop1_Product s on p.DCode=s.DCode
--如果編碼相同,則更新目標表的名稱
When Matched and P.DName<>s.DName Then Update set P.DName=s.DName
--如果目標表中不存在,則從資料來源插入目標表
When Not Matched By Target Then Insert (DName,DCode,DDate) values (s.DName,s.DCode,s.DDate)
--如果資料來源的行在源表中不存在,則刪除源錶行
When Not Matched By Source Then Delete;



What’s the difference in using a table variable and a temp table?

The first difference is that transaction logs are not recorded for the table variables.  The second major difference is that any procedure with a temporary table cannot be pre-compiled, while an execution plan of procedures with table variables can be statically compiled in advance. Pre-compiling a script. gives a major advantage to its speed of execution.  Finally, table variables exist only in the same scope as variables. Contrary to the temporary tables, they are not visible in inner stored procedures and in exec(string) statements.

What are the steps you will take to improve performance of a poor performing query?


This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.


Some of the tools/ways that help you troubleshooting performance problems are:


  • SET SHOWPLAN_ALL ON, 
  • SET SHOWPLAN_TEXT ON, 
  • SET STATISTICS IO ON, 
  • SQL Server Profiler, 
  • Windows NT /2000 Performance monitor, 
  • Graphical execution plan in Query Analyzer.

Download the white paper on performance tuning SQL Server from Microsoft web site.

What are the benefits or detriments to using cursors?

Cursors allow row-by-row processing of the resultsets.


Types of cursors:


Static, 
Dynamic, 
Forward-only, 
Keyset-driven.

Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, where as a normal SELECT query makes only one round trip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (results in more IO operations). Further, there are restrictions on the SELECT statements that can be used with some types of cursors.

What is a deadlock and what is a live lock? How will you go about resolving deadlocks?


Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process  would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.


A livelock is one, where a  request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.


Check out SET DEADLOCK_PRIORITY and "Minimizing Deadlocks"  in SQL Server books online. Also check out the article Q169960 from Microsoft knowledge base

What is foreach-loop container? Give an example of where it can be used.


What is a sequence container? Give an example of where it can be used.

What task do you use to send an email?  Send Mail task.

Mention a few mapping operations that the Character Map transformation supports? 

The Character Map transformation applies string functions, such as conversion from lowercase to uppercase, to character data. This transformation operates only on column data with a string data type. The Character Map transformation can convert column data in place or add a column to the transformation output and put the converted data in the new column. You can apply different sets of mapping operations to the same input column and put the results in different columns. For example, you can convert the same column to uppercase and lowercase and put the results in two different columns.

 

Explain the functionality of: SCD transformation.

The Slowly Changing Dimension transformation provides the following functionality for managing slowly changing dimensions:

  • Matching incoming rows with rows in the lookup table to identify new and existing rows.
  • Identifying incoming rows that contain changes when changes are not permitted.
  • Identifying inferred member records that require updating.
  • Identifying incoming rows that contain historical changes that require insertion of new records and the updating of expired records.
  • Detecting incoming rows that contain changes that require the updating of existing records, including expired ones.


Explain the functionality of: Union All transformation.


What is the “Lookup” transformation used for?


What is the use of “package configurations” in SSIS?


What are the different ways in which configuration details can be stored?


How to deploy a package from development server to production server?


How to deploy packages to file system?


How to deploy packages to SQL Server? What database will packages be stored?

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7899089/viewspace-732728/,如需轉載,請註明出處,否則將追究法律責任。

相關文章