@CopyLeft by ICANTH , I Can do ANy THing that I CAN THink !~
Author :WenHui ,WuHan University ,2012-6-15
PDF version: http://www.docin.com/p1-424285718.html
Ordinary spinlocks
The most common use of a spinlock is to create a critical section:
static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags;
spin_lock_irqsave(&xxx_lock, flags);
... critical section here ..
spin_unlock_irqrestore(&xxx_lock, flags);
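One thing to watch when using spinlocks: if a spinlock is used to protect access to a shared variable, the protection holds only if every part of the system that accesses that variable does so between a matched spin_lock and spin_unlock pair on that same lock.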
NOTE! The spin-lock is safe only when you _also_ use the lock itself to do locking across CPU's, which implies that EVERYTHING that touches a shared variable has to agree about the spinlock they want to use.
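In Linux 2.6.15.5 the spinlock data structure is organised as follows.
raw_spinlock_t is a structure containing an slock variable only when CONFIG_SMP is configured. The slock field records whether the spinlock is free and is what arbitrates concurrent lock requests from multiple CPUs. Without CONFIG_SMP there is a single CPU, concurrent requests for the spinlock cannot occur, and raw_lock is an empty structure.
spinlock_t has a break_lock field only when both CONFIG_SMP and CONFIG_PREEMPT are configured. break_lock marks the contention state of the lock: break_lock = 0 means no other execution path is contending for the lock, and break_lock = 1 means another process is busy-waiting for it. On SMP a CPU may fail to obtain the spinlock and spin, but the kernel must keep spin_lock atomic, so when CONFIG_PREEMPT is configured kernel preemption has to be disabled while the lock is taken.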
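Function | Description
spin_lock_init(lock) | Initialises a spinlock; the lock starts out in the unlocked state
spin_lock(lock) | Acquires the spinlock; returns once it succeeds, otherwise spins until the lock becomes free
spin_unlock(lock) | Releases the spinlock lock, setting it back to the unlocked state
spin_is_locked(lock) | Tests whether the spinlock is currently locked
spin_unlock_wait(lock) | Spins until the spinlock lock becomes available
spin_trylock(lock) | Tries to acquire the spinlock lock; returns 0 if it cannot, otherwise takes the lock and returns 1
spin_can_lock(lock) | Tests whether the spinlock lock is free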
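The return-value conventions in the table can be tried out in user space. The following is a minimal sketch, assuming a simple test-and-set lock built on C11 atomics; every toy_* name is hypothetical and only mirrors the kernel function of the same suffix, it is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model; toy_* names are NOT kernel APIs. */
typedef struct { atomic_int locked; } toy_spinlock_t;   /* 0 = free, 1 = held */

static void toy_spin_lock_init(toy_spinlock_t *l) { atomic_init(&l->locked, 0); }

static int toy_spin_is_locked(toy_spinlock_t *l) { return atomic_load(&l->locked) != 0; }
static int toy_spin_can_lock(toy_spinlock_t *l)  { return !toy_spin_is_locked(l); }

/* Returns 1 and takes the lock if it was free, otherwise returns 0 at once. */
static int toy_spin_trylock(toy_spinlock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

/* Busy-waits until the lock can be taken. */
static void toy_spin_lock(toy_spinlock_t *l)
{
    while (!toy_spin_trylock(l))
        ;
}

/* Puts the lock back into the unlocked state. */
static void toy_spin_unlock(toy_spinlock_t *l)
{
    assert(toy_spin_is_locked(l));   /* unlocking a free lock is a bug */
    atomic_store(&l->locked, 0);
}
```

In this model toy_spin_trylock never blocks, which is why it is the only acquiring call that reports failure through its return value.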
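[Figure: relationship between spin_lock and spin_unlock]
On the UP architecture no real lock is needed to keep other CPUs out, so the spin operations do nothing more than disable and re-enable kernel preemption.
In Linux 2.6.35 the spin lock is implemented as a ticket lock. Apart from members used for kernel debugging, the spin_lock data structure has a single field: raw_spinlock rlock.
The ticket spinlock splits the rlock field into two parts, Next and Owner.
Next is the next ticket number to hand out, and Owner is the ticket number currently allowed to hold the spinlock. To lock, a CPU takes the current Next as its ticket and increments rlock.Next. It then compares its ticket with Owner: if they are equal the lock is acquired, otherwise it spins until its ticket equals rlock.Owner. Unlocking simply increments Owner.
[Figure: call relationships of spin_lock and spin_unlock]
Source code analysis of ordinary spinlocks
Source file relationships
/include/linux/spinlock.h chooses which spinlock definitions and operations to pull in according to whether CONFIG_SMP is configured: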
/*
 * include/linux/spinlock.h - generic spinlock/rwlock declarations
 *
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the arch_spinlock_t/arch_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the arch_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the arch_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

/*
 * Pull the arch_spin*() functions/declarations (UP-nondebug doesnt need them):
 */
#ifdef CONFIG_SMP
# include <asm/spinlock.h>
#else
# include <linux/spinlock_up.h>
#endif
typedef struct spinlock {
        union {
                struct raw_spinlock rlock;
        };
} spinlock_t;

static inline void spin_lock(spinlock_t *lock)
{
        raw_spin_lock(&lock->rlock);
}

#define raw_spin_lock(lock)     _raw_spin_lock(lock)

static inline void spin_unlock(spinlock_t *lock)
{
        raw_spin_unlock(&lock->rlock);
}

#define raw_spin_unlock(lock)           _raw_spin_unlock(lock)
UP architecture
On the UP architecture, spin_lock ultimately expands to the following:
/include/linux/spinlock_api_up.h
#define _raw_spin_lock(lock)                    __LOCK(lock)

/*
 * In the UP-nondebug case there's no real locking going on, so the
 * only thing we have to do is to keep the preempt counts and irq
 * flags straight, to suppress compiler warnings of unused lock
 * variables, and to add the proper checker annotations:
 */
#define __LOCK(lock) \
  do { preempt_disable(); __acquire(lock); (void)(lock); } while (0)
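preempt_disable is an empty function when CONFIG_PREEMPT is not configured; otherwise it disables kernel preemption. __acquire() is an annotation used for static checking when the kernel is built. (void)(lock) is there to stop the compiler from warning that lock is unused.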
On the UP architecture, spin_unlock ultimately expands to the following:
#define __UNLOCK(lock) \
  do { preempt_enable(); __release(lock); (void)(lock); } while (0)
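The shape of these UP macros can be reproduced outside the kernel. The sketch below is only an illustration under stated assumptions: sketch_preempt_count and the SKETCH_* macros are hypothetical stand-ins for the kernel's preempt counter and __LOCK/__UNLOCK, and the sparse annotations are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's preempt counter; not kernel code. */
static int sketch_preempt_count;
#define sketch_preempt_disable() (sketch_preempt_count++)
#define sketch_preempt_enable() \
    (assert(sketch_preempt_count > 0), sketch_preempt_count--)

/* The real UP non-debug lock type is an empty structure. */
typedef struct { int unused; } sketch_up_lock_t;

/* (void)(lock) only silences "unused variable" warnings. */
#define SKETCH_UP_LOCK(lock) \
    do { sketch_preempt_disable(); (void)(lock); } while (0)
#define SKETCH_UP_UNLOCK(lock) \
    do { sketch_preempt_enable(); (void)(lock); } while (0)

/*
 * The do { } while (0) wrapper turns each macro into a single statement,
 * so it is safe in an unbraced if/else like the one below.
 * Returns the preempt count seen inside the critical section, or -1.
 */
static int sketch_up_critical_section(sketch_up_lock_t *lock, int enter)
{
    int seen = -1;

    if (enter)
        SKETCH_UP_LOCK(lock);
    else
        return seen;

    seen = sketch_preempt_count;   /* "critical section" */
    SKETCH_UP_UNLOCK(lock);
    return seen;
}
```

No lock word is ever touched here: the only state that changes is the counter, which matches the statement that UP spin operations merely disable and re-enable preemption.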
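SMP architecture: ticket spin lock implementation
In Linux 2.6.24 a spinlock is represented by a single integer. A value of 1 means the lock is free, and every spin_lock() decrements it by 1, so a value <= 0 means the lock is held and callers are busy-waiting. Nothing decides which waiter gets the lock next, so this scheme is unfair. Starting with Linux 2.6.25 the integer is split into a 16-bit value with two 8-bit fields, Next in the high byte and Owner in the low byte.
This mechanism is called "ticket spinlocks". Next is the ticket number that will be given to the next lock request, and Owner is the ticket number that may currently take the lock; both start at 0. When lock.Next = lock.Owner the lock is free. spin_lock performs the following steps, the first two as a single atomic operation:
1. my_ticket = slock.next
2. slock.next++
3. wait until my_ticket = slock.owner
spin_unlock performs the following step:
1. slock.owner++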
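The steps above can be sketched in portable C11. This is a minimal model rather than the kernel's implementation: it assumes two separate full-width counters instead of the packed 16-bit word, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the steps above; names are illustrative only. */
typedef struct {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint owner;   /* ticket currently allowed to hold the lock */
} sketch_ticket_lock_t;

static void sketch_ticket_init(sketch_ticket_lock_t *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Steps 1 and 2 as one atomic operation: read the ticket, then increment. */
static unsigned sketch_take_ticket(sketch_ticket_lock_t *l)
{
    return atomic_fetch_add(&l->next, 1);
}

/* Step 3 is a loop around this test. */
static int sketch_is_my_turn(sketch_ticket_lock_t *l, unsigned my_ticket)
{
    return atomic_load(&l->owner) == my_ticket;
}

static void sketch_ticket_lock(sketch_ticket_lock_t *l)
{
    unsigned my_ticket = sketch_take_ticket(l);

    while (!sketch_is_my_turn(l, my_ticket))
        ;   /* spin */
}

static void sketch_ticket_unlock(sketch_ticket_lock_t *l)
{
    /* the lock must be held: some ticket is outstanding */
    assert(atomic_load(&l->owner) != atomic_load(&l->next));
    atomic_fetch_add(&l->owner, 1);
}
```

Because tickets are handed out in increasing order and Owner only ever advances by one, waiters are served strictly in arrival order, which is the fairness property the older integer lock lacked.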
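This layout has a limitation: with 8-bit fields at most 255 CPUs can contend for the lock. The kernel therefore uses conditional compilation to provide two versions of ticket_spin_lock and ticket_spin_unlock: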
#if (NR_CPUS < 256)
#define TICKET_SHIFT 8
...
#else
#define TICKET_SHIFT 16
...
#endif
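A short worked example shows why the field width matters. The sketch below assumes the 8-bit layout (Next in the high byte, Owner in the low byte) and uses hypothetical sketch_* helpers; it only does the arithmetic and takes no real lock.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TICKET_SHIFT 8   /* mirrors the NR_CPUS < 256 case */

/* Unpack Next (high byte) and Owner (low byte) of a 16-bit lock word. */
static uint8_t sketch_next(uint16_t slock)  { return (uint8_t)(slock >> SKETCH_TICKET_SHIFT); }
static uint8_t sketch_owner(uint16_t slock) { return (uint8_t)(slock & 0xff); }

/* Holder plus waiters, modulo 256 because each field is 8 bits wide. */
static unsigned sketch_outstanding(uint16_t slock)
{
    return (uint8_t)(sketch_next(slock) - sketch_owner(slock));
}

/* Hand out n tickets from a lock word, with no unlock in between. */
static uint16_t sketch_take_tickets(uint16_t slock, unsigned n)
{
    while (n--)
        slock = (uint16_t)(slock + (1u << SKETCH_TICKET_SHIFT)); /* Next++ */
    return slock;
}

/* Usage: 255 outstanding tickets are still distinguishable, 256 are not. */
static void sketch_wraparound_demo(void)
{
    assert(sketch_outstanding(sketch_take_tickets(0, 3)) == 3);
    assert(sketch_outstanding(sketch_take_tickets(0, 255)) == 255);
    /* Next has wrapped back onto Owner: the word looks like a free lock. */
    assert(sketch_outstanding(sketch_take_tickets(0, 256)) == 0);
}
```

With 256 tickets outstanding, Next equals Owner again and a held lock would be indistinguishable from a free one, which is why the wider TICKET_SHIFT of 16 is selected for larger CPU counts.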
SMP architecture: SPIN LOCK (TICKET_SHIFT 8)
#ifdef CONFIG_INLINE_SPIN_LOCK
#define _raw_spin_lock(lock) __raw_spin_lock(lock)
#endif
/include/linux/spinlock_api_smp.h:
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
        preempt_disable();
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
__raw_spin_lock first disables kernel preemption and then invokes the LOCK_CONTENDED macro:
#define LOCK_CONTENDED(_lock, try, lock)                        \
do {                                                            \
        if (!try(_lock)) {                                      \
                lock_contended(&(_lock)->dep_map, _RET_IP_);    \
                lock(_lock);                                    \
        }                                                       \
        lock_acquired(&(_lock)->dep_map, _RET_IP_);             \
} while (0)
In other words, _raw_spin_lock first calls do_raw_spin_trylock to try to take the lock, and only if that fails does it go on to call do_raw_spin_lock. The implementation of do_raw_spin_xxx is platform-specific.
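The try-then-lock pattern can be illustrated with a small single-threaded sketch. Everything here is hypothetical: sketch_lock_t, its counters, and the stand-in fast and slow paths only mimic the control flow of LOCK_CONTENDED, with the two counters taking the place of the lock_contended() and lock_acquired() hooks.

```c
#include <assert.h>

/* Hypothetical sketch of the try-then-lock pattern; not kernel code. */
typedef struct {
    int held;             /* 1 while the lock is taken */
    unsigned contended;   /* how often the fast path failed */
    unsigned acquired;    /* how often the lock was obtained */
} sketch_lock_t;

#define SKETCH_LOCK_CONTENDED(_lock, try, lock)          \
do {                                                     \
    if (!try(_lock)) {                                   \
        (_lock)->contended++;   /* statistics hook */    \
        lock(_lock);            /* slow path */          \
    }                                                    \
    (_lock)->acquired++;                                 \
} while (0)

/* Fast path: take the lock if it is free, never wait. */
static int sketch_trylock(sketch_lock_t *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

/*
 * Stand-in slow path. A real lock would spin here until the holder
 * releases; this single-threaded model just pretends that happened.
 */
static void sketch_slow_lock(sketch_lock_t *l)
{
    assert(l->held);   /* only reached when the fast path found it taken */
    l->held = 1;
}
```

A call looks like SKETCH_LOCK_CONTENDED(&l, sketch_trylock, sketch_slow_lock); the contended counter only moves when the fast path fails, so the uncontended case pays for nothing beyond the trylock.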
/include/linux/spinlock.h
static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
{
        __acquire(lock);
        arch_spin_lock(&lock->raw_lock);
}

static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
{
        return arch_spin_trylock(&(lock)->raw_lock);
}
On the x86 platform, do_raw_spin_lock and do_raw_spin_trylock are backed by the following two functions:
/arch/x86/include/asm/spinlock.h
static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
        __ticket_spin_lock(lock);
}

static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
        return __ticket_spin_trylock(lock);
}
058 #if (NR_CPUS < 256)
059 #define TICKET_SHIFT 8
061 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
062 {
063         short inc = 0x0100;
064
065         asm volatile (
066                 LOCK_PREFIX "xaddw %w0, %1\n"
067                 "1:\t"
068                 "cmpb %h0, %b0\n\t"
069                 "je 2f\n\t"
070                 "rep ; nop\n\t"
071                 "movb %1, %b0\n\t"
072                 /* don't need lfence here, because loads are in-order */
073                 "jmp 1b\n"
074                 "2:"
075                 : "+Q" (inc), "+m" (lock->slock)
076                 :
077                 : "memory", "cc");
078 }
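Line 066: LOCK_PREFIX is empty on UP and expands to lock on SMP. The lock prefix makes the xaddw on line 066 a single atomic read-modify-write with respect to other CPUs, and copies of the lock word held in other CPUs' caches are invalidated. xaddw works as follows:
xaddw src, dst ==
tmp = dst
dst = dst + src
src = tmp
[Figure: experiment verifying the xaddw semantics]
xaddw exchanges and adds %0 and %1 as one 16-bit word: %0 (inc) receives the old value of slock, and %1 (slock) becomes slock + 0x0100, which increments Next in the high byte.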
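The exchange-and-add semantics can be checked with plain C. The sketch below models xaddw on ordinary variables, so it shows the data movement only and not the atomicity; sketch_xaddw and the sample lock word 0x0302 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models "xaddw src, dst" on plain variables (semantics only, not atomic). */
static void sketch_xaddw(uint16_t *src, uint16_t *dst)
{
    uint16_t tmp = *dst;

    *dst = (uint16_t)(*dst + *src);
    *src = tmp;
}

static void sketch_xaddw_demo(void)
{
    uint16_t slock = 0x0302;   /* Next = 3, Owner = 2: one CPU holds the lock */
    uint16_t inc   = 0x0100;

    sketch_xaddw(&inc, &slock);

    assert(inc == 0x0302);     /* inc carries the old slock: my ticket is 3 */
    assert(slock == 0x0402);   /* Next advanced to 4 for the next requester */
    assert((inc >> 8) != (inc & 0xff));   /* ticket 3 != Owner 2: must spin */
}
```

After the exchange, inc holds both the caller's ticket (high byte) and a snapshot of Owner (low byte), which is exactly what the cmpb on line 068 compares.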
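Line 068 compares the caller's own ticket (the Next value now held in inc) with the Owner ticket. If they are equal, the caller owns the spinlock and the loop ends.
Lines 070 to 073: if Owner is not yet the caller's ticket, execute a no-op, re-read Owner from slock, and jump back to line 068 to test again.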
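The whole function can be restated in C11 atomics to make the byte-level layout easier to follow. This is a sketch under assumptions, not the kernel's code: sketch_arch_lock_t is a hypothetical stand-in for arch_spinlock_t, atomic_fetch_add plays the role of the locked xaddw, and the pause hint of "rep ; nop" is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_spinlock_t: Next in the high byte,
 * Owner in the low byte. */
typedef struct { _Atomic uint16_t slock; } sketch_arch_lock_t;

static void sketch_ticket_spin_lock(sketch_arch_lock_t *lock)
{
    /* line 066: atomically add 0x0100 and keep the old value */
    uint16_t inc = atomic_fetch_add(&lock->slock, 0x0100);
    uint8_t my_ticket = (uint8_t)(inc >> 8);
    uint8_t owner = (uint8_t)(inc & 0xff);

    /* lines 068-073: spin, re-reading only the Owner byte */
    while (owner != my_ticket)
        owner = (uint8_t)(atomic_load(&lock->slock) & 0xff);
}

static void sketch_lock_demo(void)
{
    sketch_arch_lock_t l;

    atomic_init(&l.slock, 0x0505);   /* Next == Owner: the lock is free */
    sketch_ticket_spin_lock(&l);     /* returns at once with ticket 5 */
    assert(atomic_load(&l.slock) == 0x0605);
}
```

Note that the ticket is claimed unconditionally before any comparison, so a caller that finds the lock busy has still modified the lock word and must wait for its turn.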
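Why use the LOCK_PREFIX macro instead of writing the lock prefix directly? The aim is to avoid the cost of executing lock when a kernel built with CONFIG_SMP actually runs on a UP system. The kernel builds an SMP alternatives table in the .smp_locks section that holds a pointer to every lock instruction in the kernel. At run time, when switching from SMP to UP, the lock prefixes can be hot-patched into nop instructions by walking the .smp_locks pointer table, and the same table allows switching back from UP to SMP while the system is running. See the LOCK_PREFIX article in the references for details.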
009 /*
010  * Alternative inline assembly for SMP.
011  *
012  * The LOCK_PREFIX macro defined here replaces the LOCK and
013  * LOCK_PREFIX macros used everywhere in the source tree.
014  *
015  * SMP alternatives use the same data structures as the other
016  * alternatives and the X86_FEATURE_UP flag to indicate the case of a
017  * UP system running a SMP kernel.  The existing apply_alternatives()
018  * works fine for patching a SMP kernel for UP.
019  *
020  * The SMP alternative tables can be kept after boot and contain both
021  * UP and SMP versions of the instructions to allow switching back to
022  * SMP at runtime, when hotplugging in a new CPU, which is especially
023  * useful in virtualized environments.
024  *
025  * The very common lock prefix is handled as special case in a
026  * separate table which is a pure address list without replacement ptr
027  * and size information.  That keeps the table sizes small.
028  */
029
030 #ifdef CONFIG_SMP
031 #define LOCK_PREFIX_HERE \
032                 ".section .smp_locks,\"a\"\n"   \
033                 ".balign 4\n"                   \
034                 ".long 671f - .\n" /* offset */ \
035                 ".previous\n"                   \
036                 "671:"
037
038 #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
039
040 #else /* ! CONFIG_SMP */
041 #define LOCK_PREFIX_HERE ""
042 #define LOCK_PREFIX ""
043 #endif
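Line 032, ".section .smp_locks,"a"", says that what follows is emitted into the .smp_locks section; "a" stands for allocatable.
Lines 033 to 034, ".balign 4" and ".long 671f - .", align to 4 bytes and store in the .smp_locks section the offset from the table entry to label 671. Label 671 is the address of the lock instruction in the code section, so each entry is in effect a pointer to a lock instruction.
Line 035, the ".previous" directive, switches back to the previous section, that is, the code section. Line 038 therefore emits the lock instruction itself into the code section.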
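The idea of patching through a table of recorded locations can be shown with a toy byte buffer. This sketch follows the article's description (lock prefix replaced by nop) and makes several simplifying assumptions that are not kernel facts: offsets are relative to the start of the buffer rather than to each table entry, the buffer holds placeholder bytes instead of real instructions, and sketch_patch_locks is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOCK_BYTE 0xF0   /* x86 lock prefix */
#define SKETCH_NOP_BYTE  0x90   /* x86 one-byte nop */

/*
 * Patch every recorded lock prefix: to nop for UP, back to lock for SMP.
 * `offsets` plays the role of the .smp_locks table; here each offset is
 * relative to the start of `text`, a simplification of "671f - .".
 */
static void sketch_patch_locks(uint8_t *text, const size_t *offsets,
                               size_t n, int smp)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t *p = text + offsets[i];

        /* the table must only ever point at a patchable prefix byte */
        assert(*p == SKETCH_LOCK_BYTE || *p == SKETCH_NOP_BYTE);
        *p = smp ? SKETCH_LOCK_BYTE : SKETCH_NOP_BYTE;
    }
}
```

Since the table stores only locations and no replacement bytes, each entry stays small, which is the point made in the header comment about the lock prefix being handled as a special case.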
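LOCK_CONTENDED first tries to take the lock with __ticket_spin_trylock and falls back to __ticket_spin_lock only if that fails. The reason for not calling __ticket_spin_lock directly is the following.
trylock does not modify the ticket in lock.slock up front. It checks twice: 1) it reads slock and tests whether the lock is free; 2) it executes a LOCK-prefixed atomic operation that checks whether another CPU has changed slock's Next in the meantime, and if not, takes the lock and increments lock.slock.Next.
spin_lock always starts with a LOCK-prefixed atomic operation to claim a ticket, whereas trylock first tests slock.Next == slock.Owner, which reduces how often the second, LOCK-prefixed comparison has to run.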
080 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
081 {
082         int tmp, new;
083
084         asm volatile("movzwl %2, %0\n\t"
085                      "cmpb %h0,%b0\n\t"
086                      "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
087                      "jne 1f\n\t"
088                      LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
089                      "1:"
090                      "sete %b1\n\t"
091                      "movzbl %b1,%0\n\t"
092                      : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
093                      :
094                      : "memory", "cc");
095
096         return tmp;
097 }
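Line 084 loads the value of lock.slock into tmp.
Line 085 compares tmp.next with tmp.owner to test whether the spinlock is currently free.
Line 086: leal (load effective address) evaluates an address expression and stores the result without accessing memory. For example, "leal 0x10(%eax, %eax, 2), %edx" gives %edx = 0x10 + %eax + %eax * 2. Unlike movl with a memory operand, nothing is fetched from memory and only registers are read. Depending on REG_PTR_MODE, line 086 reads "leal 0x100(%k0), %1" on 32-bit x86 and "leal 0x100(%q0), %1" otherwise. Ignoring the "k" or "q" operand modifier, the line is equivalent to:
%1 = %0 + 0x100, so that new = { tmp.Next + 1, tmp.Owner }.
Line 087: if tmp.next != tmp.owner the spinlock is not free, so execution jumps to label 1 on line 089, after which 0 ends up in tmp and is returned.
Line 088 executes cmpxchgw atomically to detect whether another CPU has changed lock.slock in the meantime. If there was a competitor the operation fails; otherwise the lock is taken and Next is incremented. The whole sequence is atomic. cmpxchgw is described as follows:
Compares the accumulator ( 8-32 bits ) with "dest". If equal the "dest" is loaded with "src", otherwise the accumulator is loaded with "dest". (On IA32, %EAX is the accumulator.)
So "cmpxchgw %w1, %2" is equivalent to:
"tmp == lock.slock ? lock.slock = new : tmp = lock.slock"
If slock has not changed since it was read, lock.slock is updated to new, which in effect increments the Next field of slock.
Line 090 executes sete. If the zero flag is set, which happens only when cmpb found the lock free and cmpxchgw then succeeded, the lowest byte of new (%b1) is set to 1; otherwise it is set to 0. sete is described as:
Sets the byte in the operand to 1 if the Zero Flag is set, otherwise sets the operand to 0.
Line 091: movzbl (move with zero-extend, byte to long) copies %b1 into tmp and clears the remaining bits, so tmp ends up as 0 or 1.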
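The walkthrough above maps onto C11 atomics fairly directly. The following is a sketch under assumptions rather than the kernel's code: sketch_arch_lock_t is a hypothetical stand-in for arch_spinlock_t, atomic_compare_exchange_strong plays the role of the locked cmpxchgw, and the unlock helper is added only so the example can be exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_spinlock_t: Next high byte, Owner low byte. */
typedef struct { _Atomic uint16_t slock; } sketch_arch_lock_t;

static int sketch_ticket_spin_trylock(sketch_arch_lock_t *lock)
{
    uint16_t tmp = atomic_load(&lock->slock);            /* line 084 */

    if ((uint8_t)(tmp >> 8) != (uint8_t)(tmp & 0xff))    /* lines 085, 087 */
        return 0;              /* busy: no locked operation is issued */

    uint16_t new = (uint16_t)(tmp + 0x0100);             /* line 086 */

    /* line 088: succeeds only if slock still equals tmp */
    return atomic_compare_exchange_strong(&lock->slock, &tmp, new);
}

/* Release: bump Owner (low byte) only, with no carry into Next. */
static void sketch_ticket_spin_unlock(sketch_arch_lock_t *lock)
{
    uint16_t old = atomic_load(&lock->slock), new;

    assert((uint8_t)(old >> 8) != (uint8_t)(old & 0xff));   /* must be held */
    do {
        new = (uint16_t)((old & 0xff00) | ((old + 1) & 0x00ff));
    } while (!atomic_compare_exchange_weak(&lock->slock, &old, new));
}
```

A failed trylock leaves the lock word untouched, unlike the blocking lock path, which always claims a ticket first.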
SMP architecture: SPIN UNLOCK (TICKET_SHIFT 8)
/include/linux/spinlock_api_smp.h
#ifdef CONFIG_INLINE_SPIN_UNLOCK
#define _raw_spin_unlock(lock) __raw_spin_unlock(lock)
#endif
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
        spin_release(&lock->dep_map, 1, _RET_IP_);
        do_raw_spin_unlock(lock);
        preempt_enable();
}
spin_unlock therefore ends up calling do_raw_spin_unlock to release the spinlock.
/include/linux/spinlock.h
static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
{
        arch_spin_unlock(&lock->raw_lock);
        __release(lock);
}
On the x86 IA32 platform, arch_spin_unlock is implemented as follows:
/arch/x86/include/asm/spinlock.h
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        __ticket_spin_unlock(lock);
}

058 #if (NR_CPUS < 256)
059 #define TICKET_SHIFT 8
099 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
100 {
101         asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
102                      : "+m" (lock->slock)
103                      :
104                      : "memory", "cc");
105 }
Line 101 increments the Owner field of lock->slock, which lets the CPU holding the next ticket number take the lock.
#if defined(CONFIG_X86_32) && \
        (defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE))
/*
 * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
 * (PPro errata 66, 92)
 */
# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
#else
# define UNLOCK_LOCK_PREFIX
#endif
References
Spinlocks
"spinlocks.txt", /Documentation/spinlocks.txt
"Ticket spinlocks", http://lwn.net/Articles/267968/
"An analysis of the Linux x86 spinlock implementation", http://blog.csdn.net/david_henry/article/details/5405093
"The meaning of LOCK_PREFIX in the Linux kernel", http://blog.csdn.net/ture010love/article/details/7663008
"The Intel 8086 / 8088/ 80186 / 80286 / 80386 / 80486 Instruction Set", http://zsmith.co/intel.html