Configuring GPU acceleration for Theano on macOS 10.12: Intel graphics

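First I checked, and CUDA only supports NVIDIA graphics cards, so I had to give that up. I went with the gpuarray backend instead. It has no release yet, only development versions.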

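According to the official site, you first need to install cmake, cython, nose and a few other tools and Python libraries. I had already installed cmake, and since I use anaconda, the Python libraries were all there too. Very convenient.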

Now for the installation:

# This later turns out to be a big pitfall!
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
make install
cd ..

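This step looks simple enough.
make install creates a gpuarray/ directory under /usr/local/include/, which holds the header files needed by the builds below. It also creates two particularly important libraries under libgpuarray/lib: the shared library libgpuarray.dylib and the static library libgpuarray-static.a.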
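Before moving on, it can help to confirm that those files really exist. A minimal sketch: the helper name missing_paths is my own, and the paths are the ones mentioned above, with the library paths taken relative to the libgpuarray checkout.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Run this from inside the libgpuarray checkout.
    expected = [
        "/usr/local/include/gpuarray",
        "lib/libgpuarray.dylib",
        "lib/libgpuarray-static.a",
    ]
    for path in missing_paths(expected):
        print("missing:", path)
```

If nothing is printed, all three locations exist.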

Next, install pygpu. Note that you may first need to change the two variables include_dirs and library_dirs in setup.py as follows:

include_dirs = ["/usr/local/include", np.get_include()]
library_dirs = ["lib"]

Otherwise the build may complain that it cannot find the header files or the shared library. Then run:

python setup.py build
python setup.py install

That completes the pygpu installation.

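The next step is to test whether the GPU works properly.
Create the following check1.py file. What it does is simple: it computes exp of every element of a random array of length vlen.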

### check1.py
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Test CPU performance:

THEANO_FLAGS=device=cpu python check1.py

Result:

[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.219283 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
1.62323284]
Used the cpu

Test GPU performance:

THEANO_FLAGS=device=opencl0:1 python check1.py

This produced the following error:

Wrong major API version for gpuarray:-9997 Make sure Theano and libgpuarray/pygpu are in sync.

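It looks like a version mismatch. After a lot of googling I found the cause: I had just installed the latest gpuarray from GitHub, while my theano was 0.8.2, which was probably no longer the latest. So I had to update theano: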

pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
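To see which versions are actually installed before and after an upgrade like this, something along the following lines may help. It is only a sketch: module_version is a helper name I made up, and it assumes the packages expose a __version__ attribute, which not every build does.

```python
import importlib


def module_version(name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # Import can fail for reasons other than "not installed",
        # for example a broken native extension.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("theano", "pygpu"):
        print(name, module_version(name))
```

A None for either package means the import itself failed, which is worth fixing before chasing version mismatches.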

With theano updated, I ran the command above again, and there was still a problem:

clang -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/Users/flybywind/anaconda3/lib/python3.5/site-packages/pygpu-0.2.1-py3.5-macosx-10.6-x86_64.egg/pygpu -I/Users/flybywind/anaconda3/lib/python3.5/site-packages/numpy/core/include -I/Users/flybywind/anaconda3/lib/python3.5/site-packages/numpy/core/include -I/Users/flybywind/anaconda3/include/python3.5m -I/Users/flybywind/anaconda3/lib/python3.5/site-packages/theano/gof -L/Users/flybywind/anaconda3/lib -fvisibility=hidden -o /Users/flybywind/.theano/compiledir_Darwin-16.0.0-x86_64-i386-64bit-i386-3.5.2-64/tmppv5z0wy8/mb4366a5a742592cc8864699a71f9f43c.so /Users/flybywind/.theano/compiledir_Darwin-16.0.0-x86_64-i386-64bit-i386-3.5.2-64/tmppv5z0wy8/mod.cpp -lgpuarray
/Users/flybywind/.theano/compiledir_Darwin-16.0.0-x86_64-i386-64bit-i386-3.5.2-64/tmppv5z0wy8/mod.cpp:4:10: fatal error: 'gpuarray/array.h' file not found

This error is similar to the earlier one. I could not be bothered to track down where the -I flags are set, so I simply copied /usr/local/include/gpuarray into /Users/flybywind/anaconda3/lib/python3.5/site-packages/pygpu-0.2.1-py3.5-macosx-10.6-x86_64.egg/pygpu.

Then I ran it again, and it failed again, this time with:
ld: library not found for -lgpuarray
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Same trick again: I copied the libraries from libgpuarray/lib (see above) into /Users/flybywind/anaconda3/lib.
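An untested alternative to copying files around would be to point the compiler at the original locations through the CPATH and LIBRARY_PATH environment variables, which clang and gcc consult for headers and link-time libraries. The sketch below only builds such an environment. The function name with_search_paths and the /path/to/libgpuarray/lib placeholder are my own, and I have not verified that this is enough for Theano's compile step.

```python
import os


def with_search_paths(include_dir, lib_dir, base_env=None):
    """Return a copy of the environment with include_dir prepended to
    CPATH and lib_dir prepended to LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    for var, new_dir in (("CPATH", include_dir), ("LIBRARY_PATH", lib_dir)):
        old = env.get(var)
        env[var] = new_dir + os.pathsep + old if old else new_dir
    return env


if __name__ == "__main__":
    # Placeholder library path: replace with your own libgpuarray checkout.
    env = with_search_paths("/usr/local/include", "/path/to/libgpuarray/lib")
    print("CPATH=" + env["CPATH"])
    print("LIBRARY_PATH=" + env["LIBRARY_PATH"])
```

The resulting dict could be passed as env= to subprocess.call when launching check1.py. It only covers compile and link time; finding libgpuarray.dylib at run time may need separate handling.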

I tried again, and it finally worked:

Mapped name None to device opencl0:1: Iris
PCI Bus ID: (unsupported for device opencl0:1)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.042960 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
1.62323284]
Used the gpu
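For a rough sense of the gain, the two timings can be compared directly. This is just arithmetic on the numbers printed above (3.219283 s on the CPU, 1.042960 s on opencl0:1), which works out to roughly a 3x speedup. The function name speedup is my own.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


cpu_seconds = 3.219283  # device=cpu run
gpu_seconds = 1.042960  # device=opencl0:1 run
print("speedup: %.2fx" % speedup(cpu_seconds, gpu_seconds))
```

Keep in mind this is a single run of a tiny benchmark, so the figure is indicative at best.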

OK, two questions remain:

What does opencl0:1 actually mean?

Device specifiers are composed of the type string and the device id like so:

"cuda0"
"opencl0:1"

For opencl the device id is the platform number, a colon (:) and the device number. There are no widespread and/or easy way to list available platforms and devices. You can experiement with the values, unavaiable ones will just raise an error, and there are no gaps in the valid numbers.

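In other words, opencl is the type, just like cuda. For opencl, though, you also have to give the platform number and the device number, separated by ":". The numbers are contiguous, so you can simply try both starting from 0 [source]. The platform is usually 0, so I tried 0:0 and found that something was off: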

Mapped name None to device opencl0:0: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
PCI Bus ID: (unsupported for device opencl0:0)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.459022 seconds
Result is [ 1.23178029  1.61879325  1.52278078 ...,  2.20771813  2.29967737
1.62323272]
Used the gpu

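This is interesting! The run time did go down, the graph shows GpuElemwise, and the check at the end of the script reports the GPU, yet the device shown is the CPU. It looks like some kind of hybrid... Presumably this is because Apple's OpenCL platform also exposes the CPU as an OpenCL device, so the gpuarray code path runs, just on the CPU.
So I tried 0:1 next, and this time it was right: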

Mapped name None to device opencl0:1: Iris
PCI Bus ID: (unsupported for device opencl0:1)
[GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, (False,))>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.042960 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
1.62323284]
Used the gpu

Now the device shown is correct as well, and the run time is shorter still!
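The trial and error over platform and device numbers can be written down as a small sketch. Both helper names (parse_opencl_device, candidate_devices) and the default limits are my own choices for illustration. The script only prints the commands to try and does not talk to any device.

```python
import re


def parse_opencl_device(spec):
    """Split a specifier such as "opencl0:1" into (platform, device)."""
    match = re.fullmatch(r"opencl(\d+):(\d+)", spec)
    if match is None:
        raise ValueError("not an OpenCL device specifier: %r" % spec)
    return int(match.group(1)), int(match.group(2))


def candidate_devices(max_platforms=1, max_devices=2):
    """List specifiers to try, in order, starting from platform 0, device 0."""
    return ["opencl%d:%d" % (p, d)
            for p in range(max_platforms)
            for d in range(max_devices)]


if __name__ == "__main__":
    for spec in candidate_devices():
        print("THEANO_FLAGS=device=%s python check1.py" % spec)
```

For each command, the "Mapped name None to device ..." line in the output tells you which piece of hardware you actually got.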

What does PCI Bus ID: (unsupported for device opencl0:1) mean?
With cuda, the latest gpuarray can display the PCI bus id:

Mapped name None to device cuda: GeForce 840M;

PCI Bus ID: 0000:0A:00.0

With opencl, though, this is all you get. So, the end! I have finally solved all the problems.