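1. Introduction

A typical system has several computing devices. The device types TensorFlow supports include CPU and GPU. They are represented as strings, for example:

"/cpu:0": the machine's CPU
"/device:GPU:0": the machine's GPU, if you have only one
"/device:GPU:1": the machine's second GPU

If a TensorFlow operation has both a CPU and a GPU implementation, the GPU device is given priority when the operation is assigned. For example, matmul has both CPU and GPU kernels, so on a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.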
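
The naming convention and the GPU-first rule described above can be sketched in plain Python. This is only a toy model under stated assumptions: `parse_device` and `pick_device` are hypothetical helpers written for this illustration, not TensorFlow functions, and the real placement logic considers more than this.

```python
import re

# Matches the trailing device part of strings such as "/cpu:0",
# "/device:GPU:1" or "/job:localhost/replica:0/task:0/device:GPU:0".
_DEVICE_RE = re.compile(r"(?:^|/)(?:device:)?(cpu|gpu):(\d+)$", re.IGNORECASE)


def parse_device(spec):
    """Hypothetical helper: return (type, index), e.g. ("GPU", 1)."""
    match = _DEVICE_RE.search(spec)
    if match is None:
        raise ValueError("unrecognized device string: %r" % spec)
    return match.group(1).upper(), int(match.group(2))


def pick_device(kernel_types, available):
    """Hypothetical helper: choose a device for an op, trying GPU before CPU.

    kernel_types: device types the op has kernels for, e.g. {"CPU", "GPU"}.
    available: device strings present on the machine.
    """
    parsed = [(parse_device(d), d) for d in available]
    for wanted in ("GPU", "CPU"):
        if wanted not in kernel_types:
            continue
        candidates = [(idx, d) for (typ, idx), d in parsed if typ == wanted]
        if candidates:
            return min(candidates)[1]  # the lowest index wins
    raise ValueError("no available device has a kernel for this op")


devices = ["/cpu:0", "/device:GPU:0"]
print(parse_device("/device:GPU:1"))
print(pick_device({"CPU", "GPU"}, devices))  # op with both kernels
print(pick_device({"CPU"}, devices))         # op with a CPU kernel only
```

An op that only has a CPU kernel stays on the CPU even when a GPU is present, which is why the kernel set is passed in separately from the device list.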

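2. Logging Device Placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.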


import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Build a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the operation.
print(sess.run(c))

You should see the following output:



Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
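
When a graph has many operations, it can help to turn such a log into a lookup table. The sketch below assumes the log lines have exactly the `name: /job:...` shape shown in this article; other TensorFlow versions may format them differently. `parse_placement_log` is a hypothetical helper, not part of TensorFlow.

```python
def parse_placement_log(text):
    """Hypothetical helper: return {op_name: device} from a placement log."""
    placements = {}
    for line in text.splitlines():
        name, sep, device = line.strip().partition(": ")
        # Keep only op lines such as "MatMul: /job:.../device:GPU:0".
        # Device-mapping lines are skipped because their left-hand side
        # contains spaces ("... -> device").
        if sep and device.startswith("/job:") and " " not in name:
            placements[name] = device
    return placements


sample_log = """\
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""

placements = parse_placement_log(sample_log)
for op_name, device in sorted(placements.items()):
    print(op_name, "->", device)
```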

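3. Manual Device Placement

If you want a particular operation to run on a device of your choice instead of the automatically selected one, you can use tf.device to create a device context. All operations created within that context get the same device assignment.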


# Build a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the operation.
print(sess.run(c))

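You will see that a and b are now assigned to cpu:0. Since no device was explicitly specified for the matmul operation, TensorFlow chooses one based on the operation and the available devices (gpu:0 in this example), and automatically copies tensors between devices if required.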



Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

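4. Allowing GPU Memory Growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory on the devices more efficiently by reducing memory fragmentation.

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which allocates GPU memory based on runtime needs: it starts out allocating very little memory, and as Sessions run and need more GPU memory, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since that can make memory fragmentation worse. To turn this option on, set it in the ConfigProto as follows: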


config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

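The second is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that each visible GPU should be allocated. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU as follows: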


config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
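
To make the two options more concrete, here is a small plain-Python illustration. The GPU sizes are invented for the example, and `memory_cap_bytes` and `GrowOnlyPool` are hypothetical names; the pool is only a toy model of "grow on demand, never release", not TensorFlow's allocator.

```python
GIB = 1024 ** 3


def memory_cap_bytes(total_bytes, fraction):
    """Bytes a process may use on one GPU for a given memory fraction."""
    return int(total_bytes * fraction)


class GrowOnlyPool:
    """Toy model of allow_growth: the reserved region only ever grows."""

    def __init__(self):
        self.reserved = 0  # memory held by the process
        self.in_use = 0    # memory currently backing live tensors

    def allocate(self, n):
        self.in_use += n
        self.reserved = max(self.reserved, self.in_use)

    def free(self, n):
        self.in_use -= n  # reserved is intentionally left unchanged


# Assumed machine: one 12 GiB GPU and one 6 GiB GPU (made-up sizes).
gpu_totals = {"/device:GPU:0": 12 * GIB, "/device:GPU:1": 6 * GIB}
for name, total in gpu_totals.items():
    cap = memory_cap_bytes(total, 0.4)
    print("%s: %.1f GiB of %.1f GiB" % (name, cap / GIB, total / GIB))

pool = GrowOnlyPool()
pool.allocate(100)
pool.allocate(50)
pool.free(120)
print(pool.in_use, pool.reserved)  # 30 150
```

The last line shows the trade-off mentioned above: after freeing, only 30 units are in use, but the process still holds the 150 units it grew to.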

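5. Using a Single GPU on a Multi-GPU System

If your system has more than one GPU, the GPU with the lowest ID is selected by default. To run on a different GPU, you need to specify the preference explicitly: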


# Build a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the operation.
print(sess.run(c))

Here a, b, and the MatMul operation are all created inside the tf.device block, so all three are requested on /device:GPU:2.

If the device you specified does not exist, you get an InvalidArgumentError:


InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
  [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
  values: 1 2 3...>, _device="/device:GPU:2"]()]]

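If you would like TensorFlow to automatically choose an existing, supported device to run the operations when the specified one does not exist, set allow_soft_placement to True in the configuration options when creating the session.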


# Build a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Create a session with allow_soft_placement and log_device_placement
# set to True.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
# Run the operation.
print(sess.run(c))
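
The difference between strict and soft placement can be summarised with a small decision function. This is a sketch under assumptions: `resolve_device` and `DeviceNotFoundError` are hypothetical names (the latter stands in for TensorFlow's InvalidArgumentError), and the "first GPU, else first listed device" fallback is a guess made for illustration, not the documented fallback order.

```python
class DeviceNotFoundError(ValueError):
    """Hypothetical stand-in for TensorFlow's InvalidArgumentError."""


def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on in this toy model."""
    if requested in available:
        return requested
    if not allow_soft_placement:
        raise DeviceNotFoundError(
            "Could not satisfy explicit device specification %r" % requested)
    # Soft placement (assumed policy): fall back to a device that exists,
    # taking the first GPU in the list if there is one.
    gpus = [d for d in available if "gpu" in d.lower()]
    return gpus[0] if gpus else available[0]


available = ["/cpu:0", "/device:GPU:0"]
print(resolve_device("/device:GPU:2", available, allow_soft_placement=True))
try:
    resolve_device("/device:GPU:2", available)
except DeviceNotFoundError as err:
    print("error:", err)
```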

6. Using Multiple GPUs

To run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:


# Build a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Create a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the operation.
print(sess.run(sum))

You will see the following output:



Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
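
The numbers in this output can be checked by hand. The plain-Python sketch below repeats the arithmetic of the two towers without TensorFlow: each tower computes the same 2x3 by 3x2 product, and the CPU step adds the two results element by element. The helper names `matmul` and `add_n` are chosen here to echo the TensorFlow operations; they are ordinary list-based functions.

```python
def matmul(x, y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

# One product per tower; the device names are labels only in this sketch.
towers = [matmul(a, b) for _ in ("/device:GPU:2", "/device:GPU:3")]
total = add_n(towers)

print(towers[0])  # [[22.0, 28.0], [49.0, 64.0]]
print(total)      # [[44.0, 56.0], [98.0, 128.0]]
```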

[Reposted from 磐創AI; translated by fendouai; original link: https://mp.weixin.qq.com/s/x3OTTwaEno0i-_z1DBExlQ]
