SQL Server允許并發(fā)操作,BLOCKING是指在某一操作沒(méi)有完成之前,其他操作必須等待,以便于保證數(shù)據(jù)的完整性。BLOCKING的解決方法要查看BLOCKING的頭是什么,為什么BLOCKING頭上的語(yǔ)句執(zhí)行的很慢。通常來(lái)講只要我們能找到BLOCKING頭上的語(yǔ)句,我們總能夠想出各種各種的辦法,來(lái)提升性能,緩解或解決BLOCKING的問(wèn)題。
?
但是問(wèn)題的關(guān)鍵是,我們不知道BLOCKING什么時(shí)候會(huì)發(fā)生。用戶(hù)跟我們抱怨數(shù)據(jù)庫(kù)性能很差,等我們連上數(shù)據(jù)庫(kù)去查看的時(shí)候,那時(shí)候有可能BLOCKING可能就已經(jīng)過(guò)去了。性能又變好了。或者由于問(wèn)題的緊急性,我們直接重新啟動(dòng)服務(wù)器以恢復(fù)運(yùn)營(yíng)。但是問(wèn)題并沒(méi)有最終解決,我們不知道下次問(wèn)題會(huì)在什么時(shí)候發(fā)生。
?
BLOCKING問(wèn)題的后果比較嚴(yán)重。因?yàn)榻K端用戶(hù)能直接體驗(yàn)到。他們提交一個(gè)訂單的時(shí)候,無(wú)論如何提交不上去,通常幾秒之內(nèi)能完成的一個(gè)訂單提交,甚至要等待十幾分鐘,才能提交完成。更有甚者,極嚴(yán)重的BLOCKING能導(dǎo)致SQL Server停止工作。如下面的SQL ERRORLOG所表示,?在短短的幾分鐘之內(nèi),SPID數(shù)據(jù)從158增長(zhǎng)到694,?并馬上導(dǎo)致SQL Server打了一個(gè)dump,?停止工作。我們很容易推斷出問(wèn)題的原因是由于BLOCKING導(dǎo)致的,但是我們無(wú)法得知BLOCKING HEADER是什么,我們必須要等下次問(wèn)題重現(xiàn)時(shí),輔之以工具,才能得知BLOCKING HEADER在做什么事情。如果信息抓取時(shí)機(jī)不對(duì),我們可能要等問(wèn)題發(fā)生好幾次,才能抓到。這時(shí)候,客戶(hù)和經(jīng)理就會(huì)有抱怨了。因?yàn)槲覀兊南到y(tǒng)是生產(chǎn)系統(tǒng),問(wèn)題每發(fā)生一次,都會(huì)對(duì)客戶(hù)帶來(lái)?yè)p失。
?
2011-06-01 16:22:30.98 spid1931??? Alert There are 158 Active database sessions which is too high.
2011-06-01 16:23:31.16 spid3248??? Alert There are 342 Active database sessions which is too high.
2011-06-01 16:24:31.06 spid3884??? Alert There are 517 Active database sessions which is too high.
2011-06-01 16:25:31.08 spid3688??? Alert There are 694 Active database sessions which is too high.
2011-06-01 16:26:50.93 Server????? Using 'dbghelp.dll' version '4.0.5'
2011-06-01 16:26:50.97 Server????? **Dump thread -?spid ?= 0, EC = 0x0000000000000000
2011-06-01 16:26:50.97 Server????? ***Stack Dump being sent to D:\MSSQL10.INSTANCE\MSSQL\LOG\SQLDump0004.txt
2011-06-01 16:26:50.97 Server????? * *******************************************************************************
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97?Server ????? * BEGIN STACK DUMP:
2011-06-01 16:26:50.97 Server????? *?? 06/01/11 16:26:50?spid ?4124
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97 Server????? * Deadlocked Schedulers
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97 Server????? * *******************************************************************************
2011-06-01 16:26:50.97 Server????? * -------------------------------------------------------------------------------
2011-06-01 16:26:50.97 Server????? * Short Stack Dump
2011-06-01 16:26:51.01 Server????? Stack Signature for the dump is 0x0000000000000258
?
BLOCKING的信息抓取有很多種方法。這里羅列了幾種。并且對(duì)每種分析它的優(yōu)缺點(diǎn)。以便我們選擇。在枚舉方法之前,我們先簡(jiǎn)單演示一下BLOCKING.
?
我們首先創(chuàng)建一個(gè)測(cè)試表:
DROP ? TABLE ?[TESTTABLE]
GO
?
CREATE ? TABLE ?[dbo] . [TESTTABLE] (
??????[ID] [int]? NULL,
??????[NAME] [nvarchar] ( 50 ) ? NULL
)
GO
?
INSERT ? INTO ?TESTTABLE? VALUES? ( 1 , ? 'aaaa' )
GO
?
然后打開(kāi)一個(gè)查詢(xún)窗口,執(zhí)行下面的語(yǔ)句,?該語(yǔ)句修改一行數(shù)據(jù),并等待3分鐘,然后在結(jié)束transaction
BEGIN ? TRANSACTION
UPDATE ?TESTTABLE? SET ?[NAME]? = ? 'bbbb' ? WHERE ?[ID]? = ?1
WAITFOR ?? DELAY ? '00:03:00'
COMMIT ? TRANSACTION
?
這時(shí)候,如果打開(kāi)另外一個(gè)查詢(xún)窗口,執(zhí)行下面的語(yǔ)句,下面的語(yǔ)句就會(huì)被BLOCK住。
UPDATE ?TESTTABLE? SET ?[NAME]? = ? 'cccc' ? WHERE ?[ID]? = ?1
?
?
方法一,?抓取SQL Profiler
======================
SQL Profiler里面包含大量的信息。其中有一個(gè)事件在Errors and Warnings->Blocked Process Report專(zhuān)門(mén)用來(lái)獲得blocking的情況。但是因?yàn)樾畔⒘勘容^大,而且我們并不能很好的估算在什么時(shí)候會(huì)產(chǎn)生blocking,另外在生產(chǎn)環(huán)境使用Profiler,?對(duì)性能可能會(huì)有影響,所以SQL Profiler并不是最合適的工具。我們?cè)谶@里并不對(duì)它贅述。
?
方法二,?執(zhí)行查詢(xún)
================
如果我們檢查問(wèn)題的時(shí)候,blocking還存在,那么,我們可以直接可以運(yùn)行幾個(gè)查詢(xún),得知BLOCKING HEADER的信息
?
SELECT ? * ? FROM ? sys . sysprocesses ? where ?spid > 50
?
上述查詢(xún)只是告訴我們,BLOCKING HEADER的頭是SPID=53,?但是并沒(méi)有告訴我們SPID=53在做什么事情。我們可以用下面的查詢(xún),得到SPID=53的信息
DBCC ?INPUTBUFFER ( 53 )
?
我們可以把上述的兩個(gè)查詢(xún)合并起來(lái),用下面的查詢(xún):
?
SELECT ?SPID = p . spid ,
???????DBName? = ? convert ( CHAR ( 20 ), d . name ),
???????ProgramName? = ? program_name ,
???????LoginName? = ? convert ( CHAR ( 20 ), l . name ),
???????HostName? = ? convert ( CHAR ( 20 ), hostname ),
??????? Status ? = ?p . status ,
???????BlockedBy? = ?p . blocked ,
???????LoginTime? = ?login_time ,
???????QUERY? = ? CAST ( TEXT ? AS ? VARCHAR ( MAX ))
FROM ??? MASTER . dbo . sysprocesses ?p
??????? INNER ? JOIN ? MASTER . dbo . sysdatabases ?d
????????? ON ?p . dbid ? = ?d . dbid
??????? INNER ? JOIN ? MASTER . dbo . syslogins ?l
????????? ON ?p . sid ? = ?l . sid
??????? CROSS ? APPLY ? sys . dm_exec_sql_text ( sql_handle )
WHERE ??p . blocked? = ?0
??????? AND ? EXISTS ? ( SELECT ?1
??????????????????? FROM ??? MASTER .. sysprocesses ?p1
??????????????????? WHERE ??p1 . blocked? = ?p . spid )
?
?
這樣,一次執(zhí)行,就能告訴我們BLOCKING header的SPID信息,以及該SPID在做的語(yǔ)句。我們可以進(jìn)一步研究該語(yǔ)句,以理解為什么該語(yǔ)句執(zhí)行很慢。
?
用這個(gè)方法有一個(gè)缺點(diǎn),就是使用的時(shí)候,要求BLOCKING是存在的。如果BLOCKING已經(jīng)消失了,那么我們的方法就不管用了。
?
?
方法三,長(zhǎng)期執(zhí)行一個(gè)BLOCKING SCRIPT
==================================
因?yàn)槲覀兺ǔo(wú)法知道BLOCKING什么時(shí)候會(huì)產(chǎn)生,所以通常的辦法是我們長(zhǎng)期運(yùn)行一個(gè)BLOCKING SCRIPT,?這樣,等下次發(fā)生的時(shí)候,我們就會(huì)有足夠的信息。長(zhǎng)期運(yùn)行BLOCKING SCRIPT對(duì)性能基本上是沒(méi)有影響的。因?yàn)槲覀兠扛?0秒鐘抓取一次信息。缺點(diǎn)是,如果問(wèn)題一個(gè)月才發(fā)生一次,那么,我們的BLOCKING日志信息會(huì)很大。所以這種方法適用于幾天之內(nèi)能重現(xiàn)問(wèn)題。
?
運(yùn)行方法如下:
?
如果要停止運(yùn)行,我們按ctrl+c就可以了。
BLOCKING的信息存在log.out這個(gè)文件中
?
我們可以打開(kāi)log.out這個(gè)文件,?會(huì)發(fā)現(xiàn)SPID 54被?SPID 53給Block住了。
?
?
而隨后,我們可以看到SPID=53在做什么事情:
?
?
下面是BLOCKING SCRIPT的腳本,?我們可以把它存為blocking.sql
?
use ? master
go
while ?1? = 1
begin
print ? 'Start time: ' ? + ? convert ( varchar ( 26 ), ? getdate (), ?121 )
Print ? 'Running processes'
select ?spid , ?blocked , ?waittype , ?waittime , ?lastwaittype , ?waitresource , ? dbid , uid , ?cpu , ?physical_io , ?memusage , ?login_time , ?last_batch ,
open_tran , ? status , ?hostname , ? program_name , ?cmd , ?net_library , ?loginame
from ? sysprocesses
--where (kpid <> 0 ) or (spid < 51)
-- Change it if you only want to see the working processes
print ? '*********lockinfor***********'
select ? convert ? ( smallint , ?req_spid ) ? As ?spid ,
rsc_dbid? As ? dbid ,
rsc_objid? As ? ObjId ,
rsc_indid? As ?IndId ,
substring ? ( v . name , ?1 , ?4 ) ? As ? Type ,
substring ? ( rsc_text , ?1 , ?16 ) ? as ? Resource ,
substring ? ( u . name , ?1 , ?8 ) ? As ?Mode ,
substring ? ( x . name , ?1 , ?5 ) ? As ? Status
from ? master . dbo . syslockinfo ,
master . dbo . spt_values v ,
master . dbo . spt_values x ,
master . dbo . spt_values u
where ? master . dbo . syslockinfo . rsc_type? = ?v . number
and ?v . type ? = ? 'LR'
and ? master . dbo . syslockinfo . req_status? = ?x . number
and ?x . type ? = ? 'LS'
and ? master . dbo . syslockinfo . req_mode? + ?1? = ?u . number
and ?u . type ? = ? 'L'
order ? by ?spid
print ? 'inputbuffer for running processes'
declare ?@spid? varchar ( 6 )
declare ?ibuffer? cursor ? fast_forward ? for
select ? cast ? ( spid? as ? varchar ( 6 )) ? as ?spid? from ? sysprocesses ? where ?spid? > 50
open ?ibuffer
fetch ? next ? from ?ibuffer? into ?@spid
while? ( @@fetch_status ? != ? - 1 )
begin
print ? ''
print ? 'DBCC INPUTBUFFER FOR SPID ' ? + ?@spid
exec? ( 'dbcc inputbuffer (' ? + ?@spid? + ? ')' )
fetch ? next ? from ?ibuffer? into ?@spid
end
deallocate ?ibuffer
waitfor ? delay ? '0:0:10'
End
?
這種方法的缺陷就是,log.out會(huì)比較巨大,會(huì)占用很大的空間,如果blocking一個(gè)月甚至更長(zhǎng)時(shí)間才發(fā)生一次,那我們的這個(gè)方法就不太適宜。
?
?
方法四,我們用Agent Job來(lái)檢查BLOCKING
=====================================
長(zhǎng)期運(yùn)行一個(gè)BLOCKING SCRIPT的缺點(diǎn)是我們每隔一段時(shí)間,去查詢(xún)信息,但是大多數(shù)收集的信息是無(wú)用的。所以會(huì)導(dǎo)致日志文件巨大,對(duì)于一個(gè)生產(chǎn)系統(tǒng)來(lái)講,磁盤(pán)空間滿可不是個(gè)好事情,另外,有一些客戶(hù)對(duì)于用命令行來(lái)長(zhǎng)期運(yùn)行TSQL腳本有所顧忌,所以我們做了一個(gè)改進(jìn)。這次,我們只收集有用的信息。對(duì)于無(wú)用的信息我們不關(guān)注。這樣能極大減少日志大小。
?
我們首先創(chuàng)建一個(gè)觀察數(shù)據(jù)庫(kù),然后建立兩張表格?Blocking_sysprocesses和Blocking_SQLText,?建立一個(gè)存儲(chǔ)過(guò)程和一個(gè)Job,?該Job每隔一段時(shí)間去調(diào)用存儲(chǔ)過(guò)程。只有發(fā)現(xiàn)有blocking的,我們才記錄到表格Blocking_sysprocesses和Blocking_SQLText這兩個(gè)表格中。如果跟blocking無(wú)關(guān),我們就不對(duì)它進(jìn)行記錄。下面是TSQL語(yǔ)句:
?
?
CREATE ? DATABASE ?[MonitorBlocking]
GO
?
USE ?[MonitorBlocking]
GO
?
CREATE ? TABLE ?Blocking_sysprocesses (
??????[spid]? smallint ,
??????[kpid]? smallint ,
??????[blocked]? smallint ,
??????[waitType]? binary ( 2 ),
??????[waitTime]? bigInt ,
??????[lastWaitType]? nchar ( 32 ),
??????[waitResource]? nchar ( 256 ),
??????[dbID]? smallint ,
??????[uid]? smallint ,
??????[cpu]? int ,
??????[physical_IO]? int ,
??????[memusage]? int ,
??????[login_Time]? datetime ,
??????[last_Batch]? datetime ,
??????[open_Tran]? smallint ,
??????[status]? nchar ( 30 ),
??????[sid]? binary ( 86 ),
??????[hostName]? nchar ( 128 ),
??????[program_Name]? nchar ( 128 ),
??????[hostProcess]? nchar ( 10 ),
??????[cmd]? nchar ( 16 ),
??????[nt_Domain]? nchar ( 128 ),
??????[nt_UserName]? nchar ( 128 ),
??????[net_Library]? nchar ( 12 ),
??????[loginName]? nchar ( 128 ),
??????[context_Info]? binary ( 128 ),
??????[sqlHandle]? binary ( 20 ),
??????[CapturedTimeStamp]? datetime
)
GO
CREATE ? TABLE ?[dbo] . [Blocking_SqlText] (
??????[spid] [smallint] ,
??????[sql_text] [nvarchar] ( 2000 ),
??????[Capture_Timestamp] [datetime]
)
GO
?
CREATE ? PROCEDURE ?[dbo] . [checkBlocking]
AS
BEGIN
?
SET ? NOCOUNT ? ON ;
SET ? TRANSACTION ? ISOLATION ? LEVEL ? READ ? UNCOMMITTED
?
declare ?@Duration??? int ? -- in milliseconds, 1000 = 1 sec
declare ?@now???????????? datetime
declare ?@Processes?? int
?
select ??@Duration? = ?100?? -- in milliseconds, 1000 = 1 sec
select ??@Processes? = ?0
?
select ?@now? = ? getdate ()
?
CREATE ? TABLE ?#Blocks_rg (
??????[spid]? smallint ,
??????[kpid]? smallint ,
??????[blocked]? smallint ,
??????[waitType]? binary ( 2 ),
??????[waitTime]? bigInt ,
??????[lastWaitType]? nchar ( 32 ),
??????[waitResource]? nchar ( 256 ),
??????[dbID]? smallint ,
??????[uid]? smallint ,
??????[cpu]? int ,
??????[physical_IO]? int ,
??????[memusage]? int ,
??????[login_Time]? datetime ,
??????[last_Batch]? datetime ,
??????[open_Tran]? smallint ,
??????[status]? nchar ( 30 ),
??????[sid]? binary ( 86 ),
??????[hostName]? nchar ( 128 ),
??????[program_Name]? nchar ( 128 ),
??????[hostProcess]? nchar ( 10 ),
??????[cmd]? nchar ( 16 ),
??????[nt_Domain]? nchar ( 128 ),
??????[nt_UserName]? nchar ( 128 ),
??????[net_Library]? nchar ( 12 ),
??????[loginName]? nchar ( 128 ),
??????[context_Info]? binary ( 128 ),
??????[sqlHandle]? binary ( 20 ),
??????[CapturedTimeStamp]? datetime
) ????
?????
INSERT ? INTO ?#Blocks_rg?
SELECT
??????[spid] ,
??????[kpid] ,
??????[blocked] ,
??????[waitType] ,
??????[waitTime] ,
??????[lastWaitType] ,
??????[waitResource] ,
??????[dbID] ,
??????[uid] ,
??????[cpu] ,
??????[physical_IO] ,
??????[memusage] ,
??????[login_Time] ,
??????[last_Batch] ,
??????[open_Tran] ,
??????[status] ,
??????[sid] ,
??????[hostName] ,
??????[program_name] ,
??????[hostProcess] ,
??????[cmd] ,
??????[nt_Domain] ,
??????[nt_UserName] ,
??????[net_Library] ,
??????[loginame] ,
??????[context_Info] ,
??????[sql_Handle] ,
??????@now? as ?[Capture_Timestamp]
FROM ? master. . sysprocesses ? where ?blocked? <> ?0
AND ?waitTime? > ?@Duration?????
?????
SET ?@Processes? = ? @@rowcount
?
INSERT ? into ?#Blocks_rg
SELECT
?
??????src . [spid] ,
??????src . [kpid] ,
??????src . [blocked] ,
??????src . [waitType] ,
??????src . [waitTime] ,
??????src . [lastWaitType] ,
??????src . [waitResource] ,
??????src . [dbID] ,
??????src . [uid] ,
??????src . [cpu] ,
??????src . [physical_IO] ,
??????src . [memusage] ,
??????src . [login_Time] ,
??????src . [last_Batch] ,
??????src . [open_Tran] ,
??????src . [status] ,
??????src . [sid] ,
??????src . [hostName] ,
??????src . [program_name] ,
??????src . [hostProcess] ,
??????src . [cmd] ,
??????src . [nt_Domain] ,
??????src . [nt_UserName] ,
??????src . [net_Library] ,
??????src . [loginame] ,
??????src . [context_Info] ,
??????src . [sql_Handle]
?????? , @now? as ?[Capture_Timestamp]
FROM ?? master. . sysprocesses ?src? inner ? join ?#Blocks_rg trgt
??????? on ?trgt . blocked? = ?src . [spid]
?
if ?@Processes? > ?0
BEGIN
?????? INSERT ?[dbo] . [Blocking_sysprocesses]
?????? SELECT ? * ? from ?#Blocks_rg
?????
DECLARE ?@SQL_Handle? binary ( 20 ), ?@SPID? smallInt ;
DECLARE ?cur_handle? CURSOR ? FOR ? SELECT ?sqlHandle , ?spid? FROM ?#Blocks_rg ;
OPEN ?cur_Handle
FETCH ? NEXT ? FROM ?cur_handle? INTO ?@SQL_Handle , ?@SPID
WHILE? ( @@FETCH_STATUS ? = ?0 )
BEGIN
?
INSERT ?[dbo] . [Blocking_SqlText]
SELECT ??????@SPID , ? CONVERT ( nvarchar ( 4000 ), ?[text] ) ? , @now? as ?[Capture_Timestamp] from ? :: fn_get_sql ( @SQL_Handle )
?
FETCH ? NEXT ? FROM ?cur_handle? INTO ?@SQL_Handle , ?@SPID
END
CLOSE ?cur_Handle
DEALLOCATE ?cur_Handle
?
END
?
DROP ? table ?#Blocks_rg
?
END
?
GO
?
?
?
?
USE ?msdb ;
GO
?
EXEC ?dbo . sp_add_job
?????? @job_name? = ? N'MonitorBlocking' ;
GO
?
EXEC ? sp_add_jobstep
?????? @job_name? = ? N'MonitorBlocking' ,
??????@step_name? = ? N'execute blocking script' , ?
??????@subsystem? = ? N'TSQL' ,
??????@command? = ? N'exec checkBlocking' ,
@database_name = N'MonitorBlocking';
GO ???
?
EXEC ? sp_add_jobSchedule
?????? @name? = ? N'ScheduleBlockingCheck' ,
??????@job_name? = ? N'MonitorBlocking' ,
??????@freq_type? = ?4 , ? -- daily
??????@freq_interval? = ?1 ,
??????@freq_subday_type? = ?4 ,
??????@freq_subday_interval? = ?1
?
EXEC ? sp_add_jobserver ? @job_name? = ? N'MonitorBlocking' , ?@server_name? = ? N'(local)'
?
當(dāng)Blocking發(fā)生一段時(shí)間后,我們可以查詢(xún)下面的兩個(gè)表格,以得知當(dāng)時(shí)問(wèn)題發(fā)生時(shí)的blocking信息:
?
?
use ?MonitorBlocking
GO ???
SELECT ? * ? from ?Blocking_sqlText
SELECT ? * ? FROM ?Blocking_sysprocesses
更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主
微信掃碼或搜索:z360901061

微信掃一掃加我為好友
QQ號(hào)聯(lián)系: 360901061
您的支持是博主寫(xiě)作最大的動(dòng)力,如果您喜歡我的文章,感覺(jué)我的文章對(duì)您有幫助,請(qǐng)用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點(diǎn)擊下面給點(diǎn)支持吧,站長(zhǎng)非常感激您!手機(jī)微信長(zhǎng)按不能支付解決辦法:請(qǐng)將微信支付二維碼保存到相冊(cè),切換到微信,然后點(diǎn)擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對(duì)您有幫助就好】元
