SQL Server 2008效能故障排查(四)——TempDB

發糞塗牆發表於2012-07-18

接著上一章:I/O

TempDB:

         TempDB是一個全域性資料庫,儲存內部和使用者物件還有零食表、物件、在SQLServer操作過程中產生的儲存過程等資料。在一個SQLServer例項中,只有一個TempDB。所以容易成為效能和磁碟空間瓶頸。TempDB可能因為空間可用程度和過量的DDL和DML操作而變得超負荷。這可能導致不相關的應用執行緩慢甚至失敗。

         常見的TempDB問題如下:

l  TempDB空間超支。

l  因為TempDB的I/O瓶頸而導致查詢緩慢。這可以檢視前面的I/O瓶頸章節。

l  過渡的DDL操作導致系統表產生瓶頸。

l  資源分配爭用。

在開始診斷TempDB問題之前,先看看TempDB的空間是如何使用的。可以總結為4部分:

Category

Description

User Objects(使用者物件)

由使用者會話顯式建立並且在系統目錄中被跟蹤的物件。包括:

表及索引;

全域性臨時表(##t1)及其索引;

區域性臨時表(#t1)及其索引;

會話(session)範圍:包括會話範圍及在儲存過程中的範圍;

表變數(@t1)範圍:包括會話範圍及在儲存過程中的範圍;

Internal Objects(內部物件)

這是語句範圍的物件,存在和消失於SQLServer處理的查詢中。包括:

工作檔案(hash join);

執行排序;

工作表(遊標、離線(spool)和LOB(大物件資料型別)型別儲存);

從優化角度,當工作表被刪除時,一個IAM也和一個區將被儲存用於新的工作表。

Version Store(版本儲存)

這部分用於儲存行版本、MARS、聯機索引、觸發器、基於快照的隔離級別的行版本。

Free Space(空餘空間)

TempDB的可用空間

TempDB的總使用空間等於使用者物件(userobjects)加上內建物件(internal objects)加上版本儲存(version store)加上可用空間。

可用空間等於效能計數器中tempdb 的可用空間值。

 

監控Tempdb空間(Monitoring tempdb Space):

提早發現問題總比出現了再解決要強。你可以使用效能計數器:Free Space in tempdb(KB)來監控TempDB的空間使用情況。這個計數器按KB來跟蹤TempDB。DBA可以使用這個指標來判斷tempdb是否執行在低空間環境。但是,標識不同類別,就像簽名定義的一樣,tempdb使用磁碟空間的情況是非常豐富的。下面的查詢返回tempdb被使用者和內建物件使用情況,注意,這僅僅適用於tempdb:

 

Select

    SUM(user_object_reserved_page_count)*8 asuser_objects_kb,

    SUM(internal_object_reserved_page_count)*8 asinternal_objects_kb,

    SUM(version_store_reserved_page_count)*as version_store_kb,

    SUM(unallocated_extent_page_count)*8 as freespace_kb

From sys.dm_db_file_space_usage

Where database_id = 2

本機結果:

user_objects_kb     internal_objects_kb version_store_kb     freespace_kb

-------------------- ---------------------------------------- --------------------

NULL                NULL                 NULL                 NULL

注意這些資料是不包含混合區的計算,混合區被分配給使用者和內建物件。

 

空間分配故障排查:

使用者物件、內建物件和版本儲存能引起tempdb的空間申請,下面我們看看如何檢查每部分的故障問題。

使用者物件(User objects):

因為使用者物件不屬於任何特定會話(specific sessions),你需要理解規範的應用程式應該根據特定的要求建立和調整使用者物件。你可以通過執行exec sp_spaceused @objname=’<user-object>’來找到由個別使用者物件使用的空間。比如,執行以下指令碼來列舉所有tempdb 物件:

DECLARE userobj_cursor CURSOR FOR

select

     sys.schemas.name + '.' + sys.objects.name

from sys.objects, sys.schemas

where object_id > 100 and

     type_desc = 'USER_TABLE'and

     sys.objects.schema_id = sys.schemas.schema_id

go

 

open userobj_cursor

go

 

declare @name varchar(256)

fetch userobj_cursor into @name

while (@@FETCH_STATUS = 0)

begin

    exec sp_spaceused@objname = @name

       fetch userobj_cursor into @name  

end

close userobj_cursor

 

版本儲存(Version Store):

SQLServer2008 提供一個行版本框架,目前為止,以下特性被用於行版本框架:

l  觸發器

l  MARS

l  聯機索引

l  基於行版本隔離級別:需要在資料庫級別設定選項

更多資訊請檢視聯機叢書:RowVersioning Resource Usage

行版本在會話過程中是共享的,建立者也沒有許可權去回收行版本。你需要找到並可能的情況下停止執行最久的事務來保證行版本的清除。下面的插敘是返回執行最久的基於行版本儲存的兩個事務:

 

select top 2

   transaction_id,

   transaction_sequence_num,

   elapsed_time_seconds

from sys.dm_tran_active_snapshot_database_transactions

order by elapsed_time_seconds DESC

 

以下是示例結果:

 

transaction_id       transaction_sequence_numelapsed_time_seconds

-------------------- --------------------------------------------

8609                 3                          6523

20156                25                         783

 

因為第二個活動事務在一個短時期內被引用,所以你應該把時間花在第一個事務中。但是,沒有辦法預估多少版本空間將被停止的事務釋放。所以建議停止多一點的事務來清空這部分的空間。

可以通過特定賬號來固定tempdb中版本儲存的空間或者靠清除,如果可能,應該清除執行最久的快照隔離事務或者使用已提交讀快照隔離的執行最久的查詢。可以使用以下公式大概預估行版本儲存所需空間:

 

[Sizeof version store] = 2 * [version store data generated per minute] * [longestrunning time (minutes) of the transaction]

 

在所有允許使用行版本隔離級別的資料庫中,版本儲存每分鐘產生的事務和日誌資料產生的相同。但是也有例外:在更新時的日誌記錄;還有最近插入的資料是沒有形成版本,但會被記錄日誌。如果是大容量日誌操作,並且恢復模式不是完全恢復,你可以使用Version Generation Rate 和Version Cleanup Rate效能計數器來計算。如果VersionCleanup Rate為0,則執行久的事務會防止行版本儲存被清空。附帶說明,在發生tempdb空間不足的錯誤前,SQLServer2008會堅持到最後,防止行版本儲存被收縮。在收縮過程中,執行最久的事務卻沒產生任何行版本的將被標記為“受害者”,並清空這部分的空間。訊息3967就是在錯誤日誌中顯示每個受害事務的資訊。一旦事務被標記為受害者,將不能建立或者訪問行版本資訊。訊息3966記錄受害事務嘗試讀取行版本時將被回滾的情況。當收縮行版本儲存成功後,tempdb將有更多的可用空間,否則,tempdb將耗盡。

 

內建物件(Internal Objects):

內建物件是被每個語句建立或銷燬的物件,處理在前面說道的部分之外,其他都會建立。如果你發現在tempdb中有一個很大的空間被分配,你就要檢查哪個會話或任務消耗了這部分空間,然後儘可能校正。

SQLServer2008提供DMVs:

sys.dm_db_session_space_usage 和sys.dm_db_task_space_usage

來追蹤tempdb空間被哪些會話或者任務分配了。雖然任務是在會話環境下執行,但是任務的空間使用在任務完成之後才被會話佔用的。可以使用以下查詢來找到排行前列的會話分配。注意這些結果只包含任務已經完成的會話:

 

select

   session_id,

   internal_objects_alloc_page_count,

   internal_objects_dealloc_page_count

from sys.dm_db_session_space_usage

order byinternal_objects_alloc_page_count DESC

 

可以使用下面的查詢前列會話中分配給內建物件,包含目前活動的任務:

 

SELECT

   t1.session_id,

    (t1.internal_objects_alloc_page_count + task_alloc) as allocated,

    (t1.internal_objects_dealloc_page_count + task_dealloc) as

   deallocated

from sys.dm_db_session_space_usage ast1,

    (select session_id,

       sum(internal_objects_alloc_page_count)

           as task_alloc,

    sum(internal_objects_dealloc_page_count) as

       task_dealloc

     from sys.dm_db_task_space_usagegroup bysession_id) ast2

where t1.session_id = t2.session_id and t1.session_id >50

order by allocated DESC

 

下面是示例輸出:

 

session_id allocated            deallocated

---------- -------------------------------------

52          5120                 5136

51           16                   0

 

當你有一個隔離的任務或者產生大量內建物件分配的任務時,可以使用下面語句來發現這些語句和他們的詳細執行計劃:

 

select

   t1.session_id,

   t1.request_id,

   t1.task_alloc,

   t1.task_dealloc,

   t2.sql_handle,

   t2.statement_start_offset,

   t2.statement_end_offset,

   t2.plan_handle

from (Select session_id,

            request_id,

            sum(internal_objects_alloc_page_count) as task_alloc,

            sum(internal_objects_dealloc_page_count) as task_dealloc

     from sys.dm_db_task_space_usage

     group bysession_id, request_id)as t1,

     sys.dm_exec_requests ast2

where t1.session_id = t2.session_id and

     (t1.request_id =t2.request_id)

order by t1.task_alloc DESC

 

示例輸出:

 

session_id request_id  task_alloc           task_dealloc 

---------------------------------------------------------      

52        0           1024                 1024                

 

sql_handle                     statement_start_offset

-----------------------------------------------------------------------

0x02000000D490961BDD2A8BE3B0FB81ED67655EFEEB360172   356 

 

statement_end_offset  plan_handle     

---------------------------------                                      

-1                   0x06000500D490961BA8C19503000000000000000000000000

 

可以使用sql_handle和plan_handle列來得到語句的執行計劃:

 

select text from sys.dm_exec_sql_text(@sql_handle)

select * fromsys.dm_exec_query_plan(@plan_handle)

 

注意,當你想訪問這些執行計劃的時候可能它們不再快取中,為了保證執行計劃的可用性,快取的執行計劃會頻繁輪詢和儲存結果,以便更好地使用。所以它可能會在後來才查到。

當SQLServer重啟時,tempdb會初始化並增長到配置大小。這可能導致tempdb出現碎片,和引起間接開銷,包含資料庫自動增長時申請新區而導致的阻塞,擴張tempdb空間。這可能導致你的工作負載增大而影響效能。建議預先設定tempdb到適合的大小。

 

過度的DDL和分配操作:

Tempdb爭論的兩個點為:

建立和刪除大資料量的臨時表或者表變數會引起源資料的爭用。在SQLServer2008中,區域性臨時表和表變數只是快取最小的源資料。但是,下面的條件必須滿足。否則,這些臨時物件將不會被快取:

l  沒有建立命名約束

l  作用在表上的DDL語句,在臨時表建立後沒有執行,比如CREATE INDEX或者CREATE STATISTICS語句。

l  沒有使用動態SQL建立的臨時物件,如sp_executesqlN’create table #t(a int)’。

l  在別的物件中建立的臨時物件,比如儲存過程、觸發器或者使用者自定義函式、或者在臨時物件中返回使用者自定義函式、表值函式。

具有代表性的是,幾乎所有的在堆中的臨時/工作表都有這種情況。所以,一個增、刪、或者drop操作都會英氣PFS(空頁面空間)頁面的嚴重資源爭用。如果大部分這些表都小於64KB和使用混合區來分配空間,會給SGAM(共享全域性分配對映)頁也帶來很重的負擔。

SQLServer2008快取一個資料頁和一個IAM頁給均不臨時表作為最小分配資源。工作表的快取改進了。當一個查詢執行時,計劃也會被快取,工作表在多個執行中的計劃裡面被使用,但很少清空。此外,第一個工作表的9個頁面會被保留。

因為SGAM和PFS頁發生在資料檔案中固定間隔發生。所以容易找到它們的資源描述。所以,比如2:1:1表示在tempdb中的第一個PFS頁(databaseid=2,fileid=1,pageid=1),2:1:3表示第一個SGAM頁。SGAM頁在每511232個頁面後產生一個。PFS頁會在每8088個頁面後產生一個。你可以通過這個特性去tempdb中超找所有PFS和SGAM頁。任何時候一個任務都會等待得到這些頁上的閂鎖(latch),這些資訊儲存在sys.dm_os_waiting_tasks表中。由於閂鎖等待是很短暫的,所以你可以經常查詢這些表(大概10秒一次)。並且收集這些資訊做後續分析。比如,你可以使用下面面查詢去載入所有在tempdb頁中等待的任務到Analysis資料庫的waiting_tasks表中:

 

-- get the current timestamp

declare @now datetime

select @now = getdate()

 

-- insert data into a table forlater analysis

insert into analysis..waiting_tasks

     select

         session_id,

         wait_duration_ms,

         resource_description,

         @now

     from sys.dm_os_waiting_tasks

     where wait_type like‘PAGE%LATCH_%and

           resource_description like ‘2:%

 

任何時候當你在表中發現tempdb頁中的latch申請,你就能分析是否基於PFS/SGAM頁。如果是,意味著在tempdb中存在分配爭用。如果看到爭用在tempdb的其他頁,並且如果你能識別這些也屬於系統表,意味著存在過多的DDL操作引起了資源爭用。

在tempdb物件分配造成的不正常增長,也可以監控下面的效能計數器:

1.   SQL Server:Access Methods\Workfiles Created /Sec

2.   SQL Server:Access Methods\Worktables Created /Sec

3.   SQL Server:Access Methods\Mixed Page Allocations /Sec

4.   SQL Server:General Statistics\Temp Tables Created /Sec

5.   SQL Server:General Statistics\Temp Tables for destruction

 

解決:

如果tempdb由於過度的DDL操作引起資源爭用。你可以檢查應用程式和看看是否最小化DDL操作。可以嘗試以下建議:

l  從SQLServer2005開始,臨時物件在前面所說的情況下被快取。但是,如果你依然遇到重大的DDL爭用。你就需要查詢哪些臨時物件沒有被快取和為什麼會發生這種情況。如果這些物件發生在迴圈或者儲存過程裡面,考慮把它們移出儲存過程或者迴圈中。

l  檢查執行計劃,是否有一些計劃建立了大量的臨時物件、假離線、排序或者工作表。對此,你需要把一些臨時物件清理掉。比如,在列中建立用於order by的索引可以考慮移除排序。

如果爭用是由於SGAM/PFS頁引起,可以通過以下方式減緩:

l  增加tempdb資料檔案,來平衡磁碟和檔案的負載。理想的情況下,應該和CPU個數持平。

使用TF-1118來移除混合區的分配。


下一章:記憶體


原文

tempdb
tempdb globally stores both internal and user objects and the temporary tables, objects, and stored procedures that are created during SQL Server operation.
There is a single tempdb for each SQL Server instance. It can be a performance and disk space bottleneck. tempdb can become overloaded in terms of space available and excessive DDL and DML operations. This can cause unrelated applications running on the server to slow down or fail.
Some of the common issues with tempdb are as follows:
• Running out of storage space in tempdb.
• Queries that run slowly due to the I/O bottleneck in tempdb. This is covered under I/O Bottlenecks earlier in this paper.
• Excessive DDL operations leading to a bottleneck in the system tables.
• Allocation contention.
Before we start diagnosing problems with tempdb, let us first look at how the space in tempdb is used. It can be grouped into four main categories.

Category Description
User objects These are explicitly created by user sessions and are tracked in system catalog. They include the following:
Table and index.
Global temporary table (##t1) and index.
Local temporary table (#t1) and index.
Session scoped.
Stored procedure scoped in which it was created.
Table variable (@t1).
Session scoped.
Stored procedure scoped in which it was created.

Internal objects These are statement scoped objects that are created and destroyed by SQL Server to process queries. These are not tracked in the system catalog. They include the following:
Work file (hash join)
Sort run
Work table (cursor, spool and temporary large object data type (LOB) storage)
As an optimization, when a work table is dropped, one IAM page and an extent is saved to be used with a new work table.
There are two exceptions: The temporary LOB storage is batch scoped, and the cursor worktable is session scoped.
Version store This is used for storing row versions. MARS, online index, triggers, and snapshot-based isolation levels are based on row versioning.
Free space This represents the disk space that is available in tempdb.

The total space used by tempdb equal to the user objects plus the internal objects plus the version store plus the free space.
This free space is same as the performance counter free space in tempdb.

Monitoring tempdb Space
It is better to prevent a problem than it is to work to solve it later. You can use the Free Space in tempdb (KB) performance counter to monitor the amount of space tempdb is using. This counter tracks free space in tempdb in kilobytes. Administrators can use this counter to determine whether tempdb is running low on free space.
However, identifying how the different categories, as defined earlier, are using the disk space in tempdb is a more interesting, and productive, question.
The following query returns the tempdb space used by user and by internal objects. Currently, it provides information for tempdb only.

Select
SUM (user_object_reserved_page_count)*8 as user_objects_kb,
SUM (internal_object_reserved_page_count)*8 as internal_objects_kb,
SUM (version_store_reserved_page_count)*8 as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb
From sys.dm_db_file_space_usage
Where database_id = 2

Here is one sample output (with space in KBs).

user_objets_kb internal_objects_kb version_store_kb freespace_kb
---------------- -------------------- ------------------ ------------
8736 128 64 448

Note that these calculations don’t account for pages in mixed extents. The pages in mixed extents can be allocated to user and internal objects.


Troubleshooting Space Issues
User objects, internal objects, and version storage can all cause space issues in tempdb. In this section, we consider how you can troubleshoot each of these categories.
User Objects
Because user objects are not owned by any specific sessions, you need to understand the specifications of the application that created them and adjust the tempdb size requirements accordingly. You can find the space used by individual user objects by executing exec sp_spaceused @objname='<user-object>'. For example, you can run the following script to enumerate all the tempdb objects.

DECLARE userobj_cursor CURSOR FOR
select
sys.schemas.name + '.' + sys.objects.name
from sys.objects, sys.schemas
where object_id > 100 and
type_desc = 'USER_TABLE'and
sys.objects.schema_id = sys.schemas.schema_id
go

open userobj_cursor
go

declare @name varchar(256)
fetch userobj_cursor into @name
while (@@FETCH_STATUS = 0)
begin
exec sp_spaceused @objname = @name
fetch userobj_cursor into @name
end
close userobj_cursor



Version Store
SQL Server 2008 provides a row versioning framework. Currently, the following features use the row versioning framework:
• Triggers
• MARS
• Online index
• Row versioning-based isolation levels: requires setting an option at the database level
For more information about these features, see Row Versioning Resource Usage (http://msdn.microsoft.com/en-us/library/ms175492.aspx) in SQL Server 2008 Books Online.
Row versions are shared across sessions. The creator of the row version has no control over when the row version can be reclaimed. You will need to find and then possibly stop the longest-running transaction that is preventing the row version cleanup.
The following query returns the top two longest-running transactions that depend on the versions in the version store.

select top 2
transaction_id,
transaction_sequence_num,
elapsed_time_seconds
from sys.dm_tran_active_snapshot_database_transactions
order by elapsed_time_seconds DESC

Here is a sample output that shows that a transaction with XSN 3 and Transaction ID 8609 has been active for 6,523 seconds.

transaction_id transaction_sequence_num elapsed_time_seconds
-------------------- ------------------------ --------------------
8609 3 6523
20156 25 783

Because the second transaction has been active for a relatively short period, you might be able to free up a significant amount of version store by stopping the first transaction. However, there is no way to estimate how much version space will be freed up by stopping this transaction. You may need to stop few a more transactions to free up significant space.

You can mitigate this problem by either sizing your tempdb properly to account for the version store or by eliminating, where possible, long-running transactions with snapshot isolation or long-running queries with read-committed-snapshot isolation. You can roughly estimate the size of the version store that is needed by using the following formula. (A factor of two is needed to account for the worst-case scenario, which occurs when the two longest-running transactions overlap.)

[Size of version store] = 2 * [version store data generated per minute] *
[longest running time (minutes) of the transaction]

In all databases that are enabled for row versioning based isolation levels, the version store data generated per minute for a transaction is about the same as log data generated per minute. However, there are some exceptions: Only differences are logged for updates; and a newly inserted data row is not versioned, but it might be logged, if it is a bulk-logged operation and the recovery mode is not full recovery.
You can also use the Version Generation Rate and Version Cleanup Rate performance counters to fine-tune your computation. If your Version Cleanup Rate is 0, a long-running transaction could be preventing the version store cleanup.
Incidentally, before generating an out-of-tempdb-space error, SQL Server 2008 makes a last-ditch attempt by forcing the version stores to shrink. During the shrink process, the longest-running transactions that have not yet generated any row versions are marked as victims. This frees up the version space used by them. Message 3967 is generated in the error log for each such victim transaction. If a transaction is marked as a victim, it can no longer read the row versions in the version store or create new ones. Message 3966 is generated and the transaction is rolled back when the victim transaction attempts to read row versions. If the shrink of the version store succeeds, more space is available in tempdb. Otherwise, tempdb runs out of space.

Internal Objects
Internal objects are created and destroyed for each statement, with exceptions as outlined in the table in tempdb earlier in this paper. If you notice that a huge amount of tempdb space is allocated, you should determine which session or tasks are consuming the space and then possibly take corrective action.
SQL Server 2008 provides two DMVs, sys.dm_db_session_space_usage and sys.dm_db_task_space_usage, to track tempdb space that is allocated to sessions and tasks, respectively. Though tasks are run in the context of sessions, the space used by tasks is accounted for under sessions only after the tasks complete.
You can use the following query to find the top sessions that are allocating internal objects. Note that this query includes only the tasks that have been completed in the sessions.

select
session_id,
internal_objects_alloc_page_count,
internal_objects_dealloc_page_count
from sys.dm_db_session_space_usage
order by internal_objects_alloc_page_count DESC

You can use the following query to find the top user sessions that are allocating internal objects, including currently active tasks.

SELECT
t1.session_id,
(t1.internal_objects_alloc_page_count + task_alloc) as allocated,
(t1.internal_objects_dealloc_page_count + task_dealloc) as
deallocated
from sys.dm_db_session_space_usage as t1,
(select session_id,
sum(internal_objects_alloc_page_count)
as task_alloc,
sum (internal_objects_dealloc_page_count) as
task_dealloc
from sys.dm_db_task_space_usage group by session_id) as t2
where t1.session_id = t2.session_id and t1.session_id >50
order by allocated DESC



Here is sample output.

session_id allocated deallocated
---------- -------------------- --------------------
52 5120 5136
51 16 0

After you have isolated the task or tasks that are generating a lot of internal object allocations, you can find out which Transact-SQL statement it is and its query plan for a more detailed analysis.

select
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
t2.sql_handle,
t2.statement_start_offset,
t2.statement_end_offset,
t2.plan_handle
from (Select session_id,
request_id,
sum(internal_objects_alloc_page_count) as task_alloc,
sum (internal_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group by session_id, request_id) as t1,
sys.dm_exec_requests as t2
where t1.session_id = t2.session_id and
(t1.request_id = t2.request_id)
order by t1.task_alloc DESC



Here is sample output.

session_id request_id task_alloc task_dealloc
---------------------------------------------------------
52 0 1024 1024

sql_handle statement_start_offset
-----------------------------------------------------------------------
0x02000000D490961BDD2A8BE3B0FB81ED67655EFEEB360172 356

statement_end_offset plan_handle
---------------------------------
-1 0x06000500D490961BA8C19503000000000000000000000000

You can use the sql_handle and plan_handle columns to get the SQL statement and the query plan as follows.

select text from sys.dm_exec_sql_text(@sql_handle)
select * from sys.dm_exec_query_plan(@plan_handle)

Note that it is possible that a query plan may not be in the cache when you want to access it. To guarantee the availability of the query plans, poll the plan cache frequently and save the results, preferably in a table, so that it can be queried later.
When SQL Server is restarted, the tempdb size goes back to the initially configured size and it grows based on the requirements. This can lead to fragmentation of the tempdb and can incur overhead, including the blocking of the allocation of new extents during the database auto-grow, and expanding the size of the tempdb. This can impact the performance of your workload. We recommend that you preallocate tempdb to the appropriate size.

Excessive DDL and Allocation Operations
Two sources of contention in tempdb can result in the following situations.
Creating and dropping large numbers of temporary tables and table variables can cause contention on metadata. In SQL Server 2008, local temporary tables and table variables are cached to minimize metadata contention. However, the following conditions must be satisfied; otherwise, the temp objects are not cached:
• Named constraints are not created.
• DDL statements that affect the table are not run after the temp table has been created, such as the CREATE INDEX or CREATE STATISTICS statements.
• The temp object is not created by using dynamic SQL, such as: sp_executesql N'create table #t(a int)'.
• The temp object is created inside another object, such as a stored procedure, trigger, or user-defined function; or the temp object is the return table of a user-defined, table-valued function.
Typically, most temporary/work tables are heaps; therefore, an insert, delete, or drop operation can cause heavy contention on Page Free Space (PFS) pages. If most of these tables are smaller than 64 KB and use mixed extent for allocation or deal location, this can put heavy contention on Shared Global Allocation Map (SGAM) pages. SQL Server 2008 caches one data page and one IAM page for local temporary tables to minimize allocation contention. Worktable caching is improved. When a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated. In addition, the first nine pages for the work table are kept.
Because SGAM and PFS pages occur at fixed intervals in data files, it is easy to find their resource description. So, for example, 2:1:1 represents the first PFS page in the tempdb (database-id = 2, file-id =1, page-id = 1) and 2:1:3 represents the first SGAM page. SGAM pages occur after every 511,232 pages, and each PFS page occurs after every 8,088 pages. You can use this to find all other PFS and SGAM pages across all files in tempdb. Any time a task is waiting to acquire latch on these pages, it shows up in sys.dm_os_waiting_tasks. Because latch waits are transient, you should query this table frequently (about once every 10 seconds) and collect this data for analysis later. For example, you can use the following query to load all tasks waiting on tempdb pages into a waiting_tasks table in the analysis database.

-- get the current timestamp
declare @now datetime
select @now = getdate()

-- insert data into a table for later analysis
insert into analysis..waiting_tasks
select
session_id,
wait_duration_ms,
resource_description,
@now
from sys.dm_os_waiting_tasks
where wait_type like ‘PAGE%LATCH_%’ and
resource_description like ‘2:%’

Any time you see tasks waiting to acquire latches on tempdb pages, you can analyze to see whether it is due to PFS or SGAM pages. If it is, this implies allocation contention in tempdb. If you see contention on other pages in tempdb, and if you can identify that a page belongs to the system table, this implies contention due to excessive DDL operations.
You can also monitor the following Performance Monitor counters for any unusual increase in the temporary objects allocation/deal location activity:
• SQL Server:Access Methods\Workfiles Created /Sec
• SQL Server:Access Methods\Worktables Created /Sec
• SQL Server:Access Methods\Mixed Page Allocations /Sec
• SQL Server:General Statistics\Temp Tables Created /Sec
• SQL Server:General Statistics\Temp Tables for destruction
Resolution
If the contention in tempdb is due to excessive DDL operation, you should look at your application and see whether you can minimize the DDL operation. You can try the following suggestions:
• Starting with SQL Server 2005, the temporary objects are cached under conditions as described earlier. However, if you are still encountering significant DDL contention, you need to look at what temporary objects are not being cached and where do they occur. If such objects occur inside a loop or a stored procedure, consider moving them out of the loop or the stored procedure.
• Look at query plans to see if some plans create lot of temporary objects, spools, sorts, or worktables. You may need to eliminate some temporary objects. For example, creating an index on a column that is used in ORDER BY might eliminate the sort.
If the contention is due to the contention in SGAM and PFS pages, you can mitigate it by trying the following:
• Increase the tempdb data files by an equal amount to distribute the workload across all of the disks and files. Ideally, you want to have as many files as there are CPUs (taking into account the affinity).
• Use TF-1118 to eliminate mixed extent allocations.

相關文章