Setting Up a Deep-Learning Server Environment

Installing Ubuntu and Windows as a Dual-Boot System

The dual-boot installation itself is not covered in detail here (see the linked article), but a few points during installation deserve attention:

  • Disable Secure Boot in the BIOS before installing Ubuntu; otherwise, after the NVIDIA driver is installed and the machine is rebooted, you will get stuck in a login loop.

  • When partitioning the disk you can create just four partitions: swap, EFI, `/`, and `/home`. A separate `/home` is optional, since mounting `/` automatically creates `/home` and the rest of the tree; however, if you later reinstall the system you will not be able to remount your old `/home` partition and its data will be lost (think of a separate `/home` as the equivalent of a non-system drive on Windows).

  • Before reinstalling Ubuntu, first remove the Ubuntu boot entry from within Windows using the EasyUEFI tool.


Installing the NVIDIA Driver

Install the dependencies required by the NVIDIA driver

Run the following commands in a terminal, one by one, to install the required dependencies:

```shell
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install git cmake build-essential
```

If some packages keep failing to download, refresh the package index first:

```shell
sudo apt-get update
```

Disabling the nouveau driver that ships with Ubuntu

Ubuntu 16.04 ships with the open-source nouveau driver, which cannot be used with CUDA, so it has to be removed and replaced. If you have already reinstalled the NVIDIA driver you can skip this step; otherwise follow along.

nouveau must be disabled first; only then can the NVIDIA driver be installed cleanly. Disabling it is done by adding a blacklist entry to /etc/modprobe.d/blacklist-nouveau.conf. Open the file with:

```shell
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
```

The file will be empty; write the following into it:

```shell
blacklist nouveau
options nouveau modeset=0
```

Save and close the file. Note that the following command must still be run for the nouveau blacklist to actually take effect:

```shell
sudo update-initramfs -u
```

Then run the command below; if it prints nothing, nouveau has been disabled successfully (after a reboot the screen may go black, because the open-source display driver has been disabled and no NVIDIA driver is installed yet):

```shell
lsmod | grep nouveau
```
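If you ever need to undo this step (for example, to get a working display back before the NVIDIA driver is in place), deleting the blacklist file and regenerating the initramfs reverses it; a minimal sketch:

```shell
sudo rm /etc/modprobe.d/blacklist-nouveau.conf  # remove the blacklist created above
sudo update-initramfs -u                        # rebuild the initramfs so the change takes effect
```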

Installing the official NVIDIA driver

Press Ctrl + Alt + F1 to switch to a text console and log in with your account and password (Ctrl + Alt + F7 returns to the graphical session). After logging in at the console, first stop the desktop service:

```shell
sudo service lightdm stop
```

You will be asked for your password. If pressing Ctrl + Alt + F7 no longer brings back the graphical session, the desktop service has been stopped successfully. This step matters a great deal for the NVIDIA driver installation that follows: make sure the desktop service is really down. Press Ctrl + Alt + F1 to get back to the console and remove any previous driver first (note that under zsh the unquoted `nvidia*` glob may not be handled; switch to a bash shell first):

```shell
sudo apt-get purge nvidia*
```

Add the official PPA:

```shell
sudo add-apt-repository ppa:graphics-drivers/ppa
```

Then refresh the package lists and install the driver:

```shell
sudo apt-get update
sudo apt-get install nvidia-418 nvidia-settings nvidia-prime # CUDA 10.1
#sudo apt-get install nvidia-390 nvidia-settings nvidia-prime # CUDA 8.0 or CUDA 9.0
#sudo apt-get install nvidia-415 nvidia-settings nvidia-prime # CUDA 9.0
```

Reboot, then inspect the GPU with:

```shell
nvidia-settings
```

Configuring the NVIDIA environment variables

Open the configuration file with gedit:

```shell
sudo gedit ~/.bashrc
```

Append the following two lines at the end of the file:

```shell
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
```

Save and exit, then make the variables take effect:

```shell
source ~/.bashrc
```

Checking the NVIDIA driver version

```shell
cat /proc/driver/nvidia/version
```

(screenshot: driver version output)
or

```shell
nvidia-smi
```

(screenshot: nvidia-smi output)


Installing CUDA

CUDA installation steps

Downloading the .run installer is recommended.

  • Installing CUDA 9.0 or earlier
    Once the display driver is installed, the CUDA Toolkit and samples can be installed separately; run the installer directly from a terminal (no need to switch to text mode):

    ```shell
    sudo sh cuda_9.0.176_384.81_linux.run --no-opengl-libs
    ```

    After about a minute the license agreement appears, starting at 0%; hold Enter until it reaches 100%, then follow the prompts. Type accept first, and answer no when asked whether to install the bundled display driver:

    ```
    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 387.26?
    (y)es/(n)o/(q)uit: n
    ```

    Accept the defaults or answer y to everything else.

    (screenshot: CUDA installation complete)

    When the installation finishes, look under /usr/local/: besides the version-specific cuda-9.0 directory, a cuda symlink pointing to it is created as well:
    (screenshot: contents of /usr/local/)

  • Installing CUDA 10.1
    Install the NVIDIA 418 driver as described above, then run:

    ```shell
    chmod 777 cuda_10.1.105_418.39_linux.run
    sudo sh ./cuda_10.1.105_418.39_linux.run
    ```

    Type accept to enter the installer screen:
    (screenshot: CUDA 10.1 installer)
    Do not install the NVIDIA driver bundled with CUDA: move the cursor to the Driver entry, press Space to deselect it, then move to Install and press Enter.
    (screenshot: Driver entry deselected)
    If an older CUDA version is already installed, the following prompt appears; type yes to continue:
    (screenshot: upgrade prompt)
    On success the installer prints a summary like:
    (screenshot: installation succeeded)

    ```
    ===========
    = Summary =
    ===========

    Driver: Not Selected
    Toolkit: Installed in /usr/local/cuda-10.1/
    Samples: Installed in /home/andy/, but missing recommended libraries

    Please make sure that
    - PATH includes /usr/local/cuda-10.1/bin
    - LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

    To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

    Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
    To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

    Logfile is /var/log/cuda-installer.log
    ```

    After a successful installation, look under /usr/local/: besides the cuda-10.1 directory, a cuda symlink is created (or the existing cuda symlink that pointed to cuda-9.0 is re-pointed to cuda-10.1):
    (screenshot: /usr/local/ after installing CUDA 10.1)

Editing the configuration file

After installation, configure the CUDA environment variables; open the file with vim:

```shell
vim ~/.bashrc
```

Append the following lines and save:

```shell
# CUDA
export PATH=/usr/local/cuda/bin:$PATH # /usr/local/cuda and /usr/local/cuda-10.1 are the same directory, connected by a symlink
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Make the configuration take effect:

```shell
source ~/.bashrc
```

To verify that CUDA is installed correctly, build and run the deviceQuery sample:

```shell
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
```

  • CUDA 9.0 PASS:
    (screenshot: deviceQuery Result = PASS on CUDA 9.0)

  • CUDA 10.1 PASS:
    (screenshot: deviceQuery Result = PASS on CUDA 10.1)
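Besides deviceQuery, the installed samples include other small utilities that exercise the GPU; bandwidthTest is a common second check. A minimal sketch, assuming the samples were installed to the default location by the .run installer:

```shell
cd /usr/local/cuda/samples/1_Utilities/bandwidthTest
sudo make
./bandwidthTest   # should also end with "Result = PASS"
```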

Checking the CUDA version

```shell
cat /usr/local/cuda/version.txt
```

  • CUDA 9.0
    (screenshot: CUDA 9.0 version string)

  • CUDA 10.1
    (screenshot: CUDA 10.1 version string)

Uninstalling CUDA

```shell
cd /usr/local/cuda/bin

# CUDA 9.0
sudo ./uninstall_cuda_9.0.pl

# CUDA 10.1
sudo ./cuda-uninstaller
```

If the uninstaller reports "Not removing directory, it is not empty: /usr/local/cuda-9.0" and you intend to reinstall CUDA 9.0, delete that directory manually. From /usr/local/ run:

```shell
# CUDA 9.0
sudo rm -rf cuda-9.0

# CUDA 10.1
sudo rm -rf cuda-10.1
```

Problems encountered while installing CUDA

CUDA 10.1 reports that the installation failed:
(screenshot: installer failure)
Inspecting /var/log/cuda-installer.log (e.g. with vim) shows:
(screenshot: log details)
The message ERROR: You appear to be running an X server; please exit X appears because the NVIDIA driver bundled with CUDA was selected while X was still running. There are two ways around it:
(1) Do not select the bundled NVIDIA driver when installing CUDA;
(2) If you do want the bundled driver, press Ctrl + Alt + F1 and install from the text console:

```shell
sudo service lightdm stop
bash # switch from the zsh environment to a bash environment
sudo apt-get purge nvidia*
sudo sh ./cuda_10.1.105_418.39_linux.run
```

If CUDA was installed from the text console, run the following after the installation succeeds:

```shell
sudo service lightdm start
```

Then press Ctrl + Alt + F7 to return to the graphical session.


Installing cuDNN

Downloading and installing cuDNN

The cuDNN build must match your CUDA version and platform. On Ubuntu 16.04 (other Ubuntu versions are similar), download the .tgz archive from the cuDNN website; the .deb packages are not recommended. If you installed the .deb version by mistake, remove it with:

```shell
dpkg -l | grep -i libcudnn* # list cuDNN packages installed from .deb
sudo apt-get purge libcudnn*
```

The following installs **cuDNN v7.5.0** as an example; other versions work the same way, just change the version number:
![cuDNN Download](cudnn.png)

Extract `cudnn-10.1-linux-x64-v7.5.0.56.tgz` into the current directory; you get a `cuda` folder containing an `include` and a `lib64` directory:
![cuDNN folder](cuDNN-folder.png)

**If multiple CUDA versions are installed, pay close attention to which version `/usr/local/cuda` is currently symlinked to.**

In a terminal, change into the extracted `include` directory and copy the header:

```shell
cd ~/Download/cuda/include/
sudo cp cudnn.h /usr/local/cuda/include/ # copy the header file
```

Then change into cuda/lib64 (inspecting cuda/lib64, e.g. with Beyond Compare, shows that libcudnn.so, libcudnn.so.7 and libcudnn.so.7.5.0 are really the same file under different names) and run:

```shell
cd ~/Download/cuda/lib64/
sudo cp lib* /usr/local/cuda/lib64/ # copy the shared libraries
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.7 # remove the old links
sudo ln -s libcudnn.so.7.5.0 libcudnn.so.7 # create the symlink
sudo ln -s libcudnn.so.7 libcudnn.so # create the symlink
```

(screenshot: resulting libcudnn files)

Next, add /usr/local/cuda/lib64 to the dynamic linker configuration, in two steps:
1) Install vim:

```shell
sudo apt-get install vim-gtk
```

2) Edit the configuration file:

```shell
sudo vim /etc/ld.so.conf.d/cuda.conf
```

and add the line:

```shell
/usr/local/cuda/lib64
```

Save and exit, then run the following to apply it:

```shell
sudo ldconfig
```

Afterwards you can run nvcc -V to confirm that the CUDA toolchain is available; output like the following indicates success:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
```

Checking the cuDNN version:

```shell
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
```

(screenshot: cuDNN version macros)
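Note that in newer cuDNN releases (8.x and later) the version macros were moved out of cudnn.h into a separate header, so if the command above prints nothing, check cudnn_version.h instead:

```shell
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
```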

Common cuDNN problems

```
Error : Failed to get convolution algorithm.
This is probably because cuDNN failed to initialize,
so try looking to see if a warning log message was printed above.
```

This error means the installed cuDNN version does not match the version required by your CUDA/TensorFlow combination; reinstall the cuDNN version they expect.

References

Official cuDNN installation guide


Managing Multiple CUDA Versions

Some experiments require a CUDA/cuDNN version different from the one currently active, so several versions have to be installed side by side. This section deals with managing multiple CUDA versions at the same time.

  1. First install the CUDA version your experiment depends on, together with the matching cuDNN, following the instructions above.

  2. By default /usr/local/cuda is symlinked to the most recently installed CUDA directory:
    (screenshot: default cuda symlink)

  3. Delete the existing /usr/local/cuda symlink and point it at the CUDA-X.0 directory you need; for example, for CUDA 9.0 (a small helper script for this is sketched after this list):

    ```shell
    cd /usr/local/
    sudo rm cuda
    sudo ln -s /usr/local/cuda-9.0 /usr/local/cuda
    ```
    ![cuda3](cuda9-cuda10.1.png)

  4. Since /usr/local/cuda was already added to the environment variables when CUDA was installed, nothing further needs to be added.

  5. Check the CUDA version:

    ```shell
    cat /usr/local/cuda/version.txt
    ```

    (screenshot: reported CUDA version)
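Because switching boils down to re-pointing a single symlink, the whole procedure can be wrapped in a tiny script. A minimal sketch (the script name switch-cuda.sh and the /usr/local prefix are assumptions; adjust to your layout):

```shell
#!/usr/bin/env bash
# switch-cuda.sh -- re-point /usr/local/cuda at a given CUDA version (sketch)
# Usage: sudo ./switch-cuda.sh 9.0
set -e
VERSION="$1"
TARGET="/usr/local/cuda-${VERSION}"

if [ ! -d "$TARGET" ]; then
    echo "Error: $TARGET does not exist" >&2
    exit 1
fi

rm -f /usr/local/cuda               # remove only the old symlink
ln -s "$TARGET" /usr/local/cuda     # point the symlink at the requested version
echo "Now using: $(readlink /usr/local/cuda)"
```

After running it (e.g. `sudo ./switch-cuda.sh 9.0`), `nvcc -V` should report the selected version.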


Anaconda

Installing Anaconda

Download the Anaconda installer script, Anaconda3-5.2.0-Linux-x86_64.sh, then run:

```shell
chmod a+x ./Anaconda3-5.2.0-Linux-x86_64.sh # or: chmod 777 ./Anaconda3-5.2.0-Linux-x86_64.sh
bash Anaconda3-5.2.0-Linux-x86_64.sh
```

or

```shell
chmod 777 Anaconda3-5.3.1-Linux-x86_64.sh
bash Anaconda3-5.3.1-Linux-x86_64.sh
```

A command such as conda install -c menpo opencv3 sometimes fails with a permission issue. This happens when Anaconda was installed with sudo; fix the ownership of the anaconda3 directory:

```shell
sudo chown -R <your-username> /home/<your-username>/anaconda3
```

Disabling Anaconda

```shell
vim ~/.bashrc
```

After commenting the relevant lines out, the file should look like this:

```shell
# added by Anaconda3 5.3.1 installer
#export PATH="/home/andy/anaconda3/bin:$PATH"
#export LD_LIBRARY_PATH=~/anaconda3/lib:$LD_LIBRARY_PATH
#export CPLUS_INCLUDE_PATH=~/anaconda3/include/python3.6m
```

This really comes down to how Linux searches for executables. The line PATH="/home/andy/anaconda3/bin:$PATH" puts /home/andy/anaconda3/bin in front of the original $PATH, so when a command is run the system looks in /home/andy/anaconda3/bin first and only then in the rest of $PATH. With that relationship in mind, the lines can instead be rewritten as below, which makes the "re-creating the Anaconda symlinks" step later unnecessary:

```shell
# added by Anaconda3 5.3.1 installer
export PATH="$PATH:/home/andy/anaconda3/bin"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/anaconda3/lib
export CPLUS_INCLUDE_PATH=~/anaconda3/include/python3.6m
```
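To see which interpreter actually wins after changing the ordering, it helps to list every match on the PATH in lookup order; a quick sketch:

```shell
which -a python             # every python executable on the PATH, in search order
echo $PATH | tr ':' '\n'    # the PATH entries themselves, one per line
```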

For recent Anaconda versions, comment out the conda init block instead:

```shell
# added by Anaconda3 5.3.1 installer
# >>> conda init >>>
# !! Contents within this block are managed by 'conda init' !!
#__conda_setup="$(CONDA_REPORT_ERRORS=false '/home/andy/anaconda3/bin/conda' shell.bash hook 2> /dev/null)"
#if [ $? -eq 0 ]; then
# \eval "$__conda_setup"
#else
# if [ -f "/home/andy/anaconda3/etc/profile.d/conda.sh" ]; then
# . "/home/andy/anaconda3/etc/profile.d/conda.sh"
# CONDA_CHANGEPS1=false conda activate base
# else
# \export PATH="/home/andy/anaconda3/bin:$PATH"
# fi
#fi
#unset __conda_setup
# <<< conda init <<<
```

Finally run:

```shell
source ~/.bashrc
```

Then reboot the machine (this is required).

Re-creating the Anaconda symlinks

How it works:
/usr/local/bin is on the default executable search path and is reserved for locally installed programs, whereas /bin and /usr/bin hold the system's own executables. It is therefore enough to symlink the executables from ~/anaconda3/bin/ into /usr/local/bin.

When you want to use Anaconda again, simply symlink its executables into /usr/local/bin. Note that absolute paths must be used here, otherwise the links will not work, e.g.:

```shell
sudo ln -s /home/andy/anaconda3/bin/conda /usr/local/bin/conda
sudo ln -s /home/andy/anaconda3/bin/activate /usr/local/bin/activate
sudo ln -s /home/andy/anaconda3/bin/deactivate /usr/local/bin/deactivate
```
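Conversely, when you want to hide Anaconda again, it is enough to delete those links; a sketch:

```shell
sudo rm /usr/local/bin/conda /usr/local/bin/activate /usr/local/bin/deactivate
```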

Note, first of all, that usr stands for Unix System Resources, not User:

  • /usr/bin holds executables preinstalled by the system, and its contents change with system upgrades
  • /usr/local/bin is where users place their own executables; putting them here is recommended, because system upgrades will not overwrite files of the same name

Using conda once the symlinks are in place:
First list the conda environments (the built-in one is base):

```shell
conda env list
```

(screenshot: conda env list output)

Activate an environment with:

```shell
conda activate [env name]
# or
source activate [env name]
```

Note: replace [env name] with an actual environment name, e.g. conda activate base.
(screenshot: activated environment)

Deactivate an environment with:

```shell
conda deactivate
# or
source deactivate
```

(screenshot: deactivated environment)

Anaconda virtual environments

Create a new virtual environment:

```shell
conda create -n venv python=3.6 # select the Python version
```

Activate the environment:

```shell
source activate venv
```

Delete the environment:

```shell
conda env remove -n venv
```
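To reproduce an environment on another machine, conda can also export and re-create it from a file; a minimal sketch (the file name environment.yml is just a convention):

```shell
conda env export -n venv > environment.yml   # dump the packages of the environment
conda env create -f environment.yml          # re-create it elsewhere from that file
```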

Installing OpenCV

Downloading OpenCV

Go to the official site, http://opencv.org/releases.html, or https://github.com/opencv/opencv/releases, pick the x.x.x.zip version you need, and download opencv-x.x.x.zip:

```shell
cd
wget https://github.com/opencv/opencv/archive/x.x.x.zip
chmod 777 x.x.x.zip
unzip x.x.x.zip
```

Building OpenCV

Extract the archive where you want to install it, change into the opencv-x.x.x directory, and run:

```shell
cd opencv-x.x.x
mkdir build # create the build directory
cd build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -DBUILD_JPEG=ON -DBUILD_TIFF=ON -DBUILD_PNG=ON ..
make -j8 # build
```

If you hit the error below, there are two likely causes:
(screenshot: build error)

  • While building the opencv 3.4.0 sources, archives such as ippicv_2017u3_lnx_intel64_20170822.tgz are downloaded. If the download fails, fetch the offline package, extract it to obtain a .cache folder, use it to overwrite the .cache folder in the OpenCV source tree, and rebuild. The .cache folder is hidden; press Ctrl+H to show it.

  • If Anaconda is installed on the machine, disabling it as described above is recommended; otherwise you need to add the following to ~/.bashrc or ~/.zshrc:

    ```shell
    # added by Anaconda3 installer
    export PATH="/home/andy/anaconda3/bin:$PATH"
    export LD_LIBRARY_PATH=~/anaconda3/lib:$LD_LIBRARY_PATH
    export CPLUS_INCLUDE_PATH=~/anaconda3/include/python3.6m
    export PATH="$PATH:$HOME/bin"
    ```

The build may appear to hang at 98% for a very long time; this is normal.

Installing OpenCV

After the build succeeds, install it:

```shell
sudo make install # install
```

Refresh the dynamic linker cache:

```shell
sudo ldconfig
```

Verify the installation by checking the OpenCV version:

```shell
pkg-config --modversion opencv
```

If that command reports the following error:
(screenshot: pkg-config error)
Temporary workaround:

```shell
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
```

**Permanent fix**
Add the OpenCV library path to the system configuration.
Open `/etc/ld.so.conf` with `gedit`; use sudo, otherwise you will not be able to save the change:

```shell
sudo gedit /etc/ld.so.conf
```

Add one line to the file:

```shell
/usr/local/lib
```

Here /usr/local is the OpenCV installation prefix, i.e. the install path specified when configuring the build.

Then run:

```shell
sudo ldconfig
```

Finally, make PKG_CONFIG_PATH persistent for your shell (a quick compile check is sketched after the list below):
  • bash

    • All users
      Edit /etc/bash.bashrc:

      ```shell
      sudo vim /etc/bash.bashrc
      ```

      Append at the end of the file:

      ```shell
      PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
      export PKG_CONFIG_PATH
      ```

      Run source /etc/bash.bashrc to apply it.

    • Current user only
      Edit ~/.bashrc:

      ```shell
      vim ~/.bashrc
      ```

      Append at the end of the file:

      ```shell
      PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
      export PKG_CONFIG_PATH
      ```

      Run source ~/.bashrc to apply it.

  • zsh

    • All users

      ```shell
      vim /etc/zsh/zprofile
      ```

      Then add:

      ```shell
      PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
      export PKG_CONFIG_PATH
      ```

      Run source /etc/zsh/zprofile to apply it.

    • Current user only

      ```shell
      vim ~/.zshrc
      ```

      Then add:

      ```shell
      PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
      export PKG_CONFIG_PATH
      ```

      Run source ~/.zshrc to apply it.
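Once pkg-config can see OpenCV, a quick end-to-end check is to compile a one-line program against it. A minimal sketch (the file name test_opencv.cpp is arbitrary):

```shell
cat > test_opencv.cpp << 'EOF'
#include <opencv2/opencv.hpp>
#include <iostream>
int main() { std::cout << CV_VERSION << std::endl; return 0; }
EOF
g++ test_opencv.cpp -o test_opencv $(pkg-config --cflags --libs opencv)
./test_opencv   # prints the OpenCV version if headers and libraries are found
```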

Uninstalling OpenCV

Change into the build folder inside the extracted OpenCV directory:

```shell
cd $HOME/opencv-x.x.x/build
```

Run:

```shell
sudo make uninstall
```

Then delete the whole opencv-x.x.x folder. Afterwards run:

```shell
sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv \
/usr/include/opencv /usr/include/opencv2 /usr/local/share/opencv \
/usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV \
/usr/local/bin/opencv* /usr/local/lib/libopencv
```

to remove leftover shared libraries and empty directories. Some of these paths will already have been removed, so "No such file or directory" messages can be ignored.


TensorRT

Installing TensorRT

Official TensorRT installation guide

Setting the TensorRT environment variables

First download the tar package (an NVIDIA account is required to log in).
CUDA and cuDNN must be installed before TensorRT; see the steps above.
Go to the directory containing the downloaded TensorRT archive and extract it:

```shell
chmod 777 TensorRT-XXX.tar.gz
tar -xzvf TensorRT-XXX.tar.gz
```

Move the extracted TensorRT-XXX folder into your home directory and create a symlink. This way several TensorRT-XXX versions can coexist, and switching only requires re-pointing the TensorRT symlink at the version you want:

```shell
mv TensorRT-XXX ~/TensorRT-XXX
cd

# Create Symbol Link
ln -s ~/TensorRT-XXX TensorRT

# TensorRT 3
sudo ln -s ~/TensorRT/bin/giexec /usr/local/bin/

# TensorRT >= 4
sudo ln -s ~/TensorRT/bin/trtexec /usr/local/bin/
```

Then set the environment variables:

```shell
# bash
vim ~/.bashrc # open the environment variable file

# zsh
vim ~/.zshrc # open the environment variable file
```

```shell
# write the following three environment variables into the file and save it
export LD_LIBRARY_PATH=~/TensorRT/lib:$LD_LIBRARY_PATH
export CUDA_INSTALL_DIR=/usr/local/cuda
export CUDNN_INSTALL_DIR=/usr/local/cuda
```

```shell
# bash
source ~/.bashrc # apply the changes just made

# zsh
source ~/.zshrc
```
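A quick sanity check that the variables point at the right place; a sketch:

```shell
echo $LD_LIBRARY_PATH | tr ':' '\n' | grep TensorRT   # the TensorRT lib directory should be listed
ls ~/TensorRT/lib/libnvinfer*                         # the core TensorRT libraries should be here
```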

Installing the Python TensorRT package

Go into the python directory of the extracted TensorRT package.

Outside a virtual environment:

```shell
cd ~/TensorRT/python/

# for Python 2
sudo pip2 install tensorrt-XXX-cp27-cp27mu-linux_x86_64.whl

# for Python 3
sudo pip3 install tensorrt-XXX-cp35-cp35m-linux_x86_64.whl
```

or:

```shell
cd TensorRT/python/

# for Python 2
pip2 install tensorrt-XXX-cp27-cp27mu-linux_x86_64.whl --user

# for Python 3
pip3 install tensorrt-XXX-cp35-cp35m-linux_x86_64.whl --user
```

Inside a virtual environment:

```shell
source activate venv
cd TensorRT/python/

# for Python 2
pip install tensorrt-XXX-cp27-cp27mu-linux_x86_64.whl

# for Python 3
pip install tensorrt-XXX-cp35-cp35m-linux_x86_64.whl
```

If the installation fails, see "Problems encountered while installing TensorRT and their solutions" below.

Installing uff

Change into the uff directory and install the wheel.

Outside a virtual environment:

```shell
cd ~/TensorRT/uff/

# for Python 2
sudo pip2 install uff-XXX-py2.py3-none-any.whl

# for Python 3
sudo pip3 install uff-XXX-py2.py3-none-any.whl
```

or:

```shell
cd TensorRT/uff/

# for Python 2
pip2 install uff-XXX-py2.py3-none-any.whl --user

# for Python 3
pip3 install uff-XXX-py2.py3-none-any.whl --user
```

Inside a virtual environment:

```shell
source activate venv
cd TensorRT/uff/

# for Python 2
pip install uff-XXX-py2.py3-none-any.whl

# for Python 3
pip install uff-XXX-py2.py3-none-any.whl
```

Verifying the TensorRT installation

Check that TensorRT is installed:

```shell
which tensorrt
```

This should print the TensorRT installation path:

```shell
/usr/local/bin/tensorrt
```

Check that uff is installed:

```shell
which convert-to-uff
```

This should print the uff converter path:

```shell
/usr/local/bin/convert-to-uff
```

Copy lenet5.uff into the python data directory and run the MNIST sample to verify everything works:

```shell
sudo cp TensorRT/data/mnist/lenet5.uff TensorRT/python/data/mnist/lenet5.uff
cd TensorRT/samples/sampleMNIST
make clean
make
cd ~/TensorRT/bin # the executable produced by make is placed in this bin directory
./sample_mnist
```

If these commands run without errors, the installation is working.

Problems encountered while installing TensorRT and their solutions

A possible error when installing the Python TensorRT package:

```
In file included from src/cpp/cuda.cpp:1:0:
src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
```

(screenshot: TensorRT installation error)
Cause
cuda.h cannot be found; according to discussions online, this is because sudo runs with root's environment variables rather than yours.

Solution
Add the CUDA path to root's environment and install the Python TensorRT package as root:

```shell
sudo gedit /etc/profile.d/cuda.sh
```

Add:

```shell
export PATH=/usr/local/cuda/bin:$PATH
```

Then:

```shell
sudo su -
# for Python 2
pip2 install tensorrt-XXX-cp27-cp27mu-linux_x86_64.whl

# for Python 3
pip3 install tensorrt-XXX-cp35-cp35m-linux_x86_64.whl
exit
```

If importing tensorrt or tensorflow in Python raises ImportError: numpy.core.multiarray failed to import, the fix is:

```shell
pip install -U numpy
```

Generating Engines with TensorRT

Generating an engine from a Caffe model

```shell
~/TensorRT/bin/giexec \
--deploy=path_to_prototxt/intputdeploy.prototxt \
--output=prob \
--model=path_to_caffemodel/caffeModelName.caffemodel \
--engine=path_to_output_engine/outputEngineName.engine
```
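For TensorRT 4 and later, the same conversion can be done with trtexec (symlinked into /usr/local/bin earlier). This is only a sketch: option names have changed between TensorRT releases (for example, the flag that writes the serialized engine has been --engine or --saveEngine depending on the version), so run `trtexec --help` for your release before relying on it:

```shell
~/TensorRT/bin/trtexec \
    --deploy=path_to_prototxt/intputdeploy.prototxt \
    --model=path_to_caffemodel/caffeModelName.caffemodel \
    --output=prob \
    --batch=1
```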

Generating an engine from a TensorFlow model

The source files are under src/tensorrt/tools in this repository.

First convert the TensorFlow model to a uff file, then convert the uff file into an engine:

Converting the TensorFlow model to a UFF file

```python
# -*- coding: utf-8 -*-
# Author : Andy Liu
# Last modified: 2019-03-15

# This script is used to convert tensorflow model file to uff file
# Using:
#       python tf_to_uff.py

import uff
import tensorflow as tf
import tensorrt as trt
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

ckpt_path = "model/model.ckpt"
forzen_model_path = "model/frozen_graphs/frozen_graph.pb"
uff_path = "model/uff/model.uff"

frozen_input_name = "input"
net_input_shape = (3, 32, 32)
frozen_output_names = ["fc_3/frozen"]

def getChatBotModel(ckpt_path):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.import_meta_graph(ckpt_path + '.meta')
        saver.restore(sess, ckpt_path)
        graph = tf.get_default_graph().as_graph_def()
        # graph = tf.get_default_graph()
        # print('graph list:', graph.get_operations())
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graph, frozen_output_names)
        return tf.graph_util.remove_training_nodes(frozen_graph)

tf_model = getChatBotModel(ckpt_path)
with tf.gfile.FastGFile(forzen_model_path, mode='wb') as f:
    f.write(tf_model.SerializeToString())
# uff_model = uff.from_tensorflow(tf_model, output_nodes=frozen_output_names, output_filename=uff_path, text=True)
uff_model = uff.from_tensorflow_frozen_model(forzen_model_path, output_nodes=frozen_output_names, output_filename=uff_path, text=True)
print('Success! UFF file is in ', os.path.abspath(uff_path))
```

Converting the UFF file to an engine

```python
# -*- coding: utf-8 -*-
# Author : Andy Liu
# Last modified: 2019-03-15

# This script is used to convert .uff file to .engine for TX2/PX2 or other NVIDIA Platform
# Using:
#       python uff_to_engine.py

import os
# import tensorflow as tf
import tensorrt as trt
from tensorrt.parsers import uffparser
import uff

print("TensorRT version = ", trt.__version__)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

frozen_input_name = "input"
net_input_shape = (3, 32, 32)
frozen_output_name = "fc_3/frozen"
uff_path = 'model.uff'
engine_path = "model.engine"

def uff2engine(frozen_input_name, net_input_shape, frozen_output_name, uff_path, engine_path):
    with open(uff_path, 'rb') as f:
        uff_model = f.read()
    G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
    parser = uffparser.create_uff_parser()
    parser.register_input(frozen_input_name, net_input_shape, 0)
    # parser.register_input("input", (3, 128, 128), 0)
    parser.register_output(frozen_output_name)
    engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)
    parser.destroy()
    trt.utils.write_engine_to_file(engine_path, engine.serialize())

if __name__ == '__main__':
    engine_dir = os.path.dirname(engine_path)
    if not os.path.exists(engine_dir) and not engine_dir == '.' and not engine_dir == '':
        print("Warning !!! %s does not exist, creating it now" % engine_dir)
        os.makedirs(engine_dir)

    uff2engine(frozen_input_name, net_input_shape, frozen_output_name, uff_path, engine_path)
    print("Success! Engine file has been saved to ", os.path.abspath(engine_path))
```

Running inference with the engine

```python
import os
# import tensorflow as tf
import tensorrt as trt
from tensorrt.parsers import uffparser
import pycuda.driver as cuda
import pycuda.autoinit  # added: creates the CUDA context required by mem_alloc / Stream below
# import uff
from PIL import Image
import numpy as np

IMG_PATH = "./img/1.png"
LABEL = 1
ENGINE_PATH = "./model/engine/model.engine"
NET_INPUT_SHAPE = (32, 32)
NET_OUTPUT_SHAPE = 5

def normalize_img(img):
    """
    Normalize image data to [-1, +1]
    Arguments:
        img: source image
    """
    return (img - 128.) / 128.

# Load Image
def load_image(img_path, net_input_shape):
    img = Image.open(img_path)
    img = img.resize(net_input_shape)
    return np.asarray(img, dtype=np.float32)


img = load_image(IMG_PATH, NET_INPUT_SHAPE)
img = normalize_img(img)

# Load Engine file
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
engine = trt.utils.load_engine(G_LOGGER, ENGINE_PATH)
context = engine.create_execution_context()
runtime = trt.infer.create_infer_runtime(G_LOGGER)

output = np.empty(NET_OUTPUT_SHAPE, dtype=np.float32)

# Allocate device memory
d_input = cuda.mem_alloc(1 * img.nbytes)
d_output = cuda.mem_alloc(1 * output.nbytes)

bindings = [int(d_input), int(d_output)]

stream = cuda.Stream()

# Transfer input data to device
cuda.memcpy_htod_async(d_input, img, stream)
# Execute model
context.enqueue(1, bindings, stream.handle, None)
# Transfer predictions back
cuda.memcpy_dtoh_async(output, d_output, stream)
# Synchronize threads
stream.synchronize()


print("Test Case: " + str(LABEL))
print("Prediction: " + str(np.argmax(output)))
```

Official TensorRT samples

The material is in the src/tensorrt directory of this repository.

References

Official TensorRT installation guide


Installing Caffe

Installing Caffe with Python 2

Install the dependencies

```shell
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential cmake git pkg-config
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev protobuf-compiler

sudo apt-get install -y libatlas-base-dev
sudo apt-get install -y --no-install-recommends libboost-all-dev

sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get -y install build-essential cmake git libgtk2.0-dev pkg-config python-dev python-numpy libdc1394-22 libdc1394-22-dev libjpeg-dev libpng12-dev libtiff5-dev libjasper-dev libavcodec-dev libavformat-dev libswscale-dev libxine2-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev libv4l-dev libtbb-dev libqt4-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev x264 v4l-utils unzip
```

Configure CUDA and cuDNN

Add the CUDA environment variables:

```shell
vim ~/.bashrc

# CUDA
export PATH=/usr/local/cuda/bin:$PATH # cuda -> /usr/local/cuda-9.0
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Install OpenCV as described in the Installing OpenCV section above.

Then disable Anaconda as described earlier.

Configure Caffe

First cd to the directory where you want to install it and run:

```shell
git clone https://github.com/BVLC/caffe.git
```

This creates a caffe folder. cd into it and run:

```shell
cp Makefile.config.example Makefile.config

# if the copy fails, run the following first
# chmod 777 Makefile.config.example
# cp Makefile.config.example Makefile.config
```

This copies `Makefile.config.example` to a new file named `Makefile.config`. The copy is needed because building `caffe` uses `Makefile.config`, while Makefile.config.example is only the example configuration shipped with `caffe` and cannot be used for the build directly.

**Then edit the Makefile.config file**; open it from the `caffe` directory:

```shell
vim Makefile.config

# or open the file with gedit/vscode from the file manager
```

Changes to make in Makefile.config

  • Enable cuDNN
    Change #USE_CUDNN := 1 to USE_CUDNN := 1

  • Use OpenCV 3
    Change #OPENCV_VERSION := 3 to OPENCV_VERSION := 3

  • Enable the Python layer interface
    Change #WITH_PYTHON_LAYER := 1 to WITH_PYTHON_LAYER := 1

  • Adjust the Python paths
    Change:

    ```makefile
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
    ```

    to:

    ```makefile
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
    ```

    These are the paths of the system Python; if you want to build against Anaconda's Python instead, different entries have to be changed.

  • Remove compute_20
    Find:

    ```makefile
    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_61,code=compute_61
    ```

    and change it to:

    ```makefile
    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_61,code=compute_61
    ```

    CUDA 9.x and later no longer support compute_20; without this change the Caffe build fails with:

    ```
    nvcc fatal  : Unsupported gpu architecture 'compute_20'
    ```
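Which -gencode lines you actually need depends on your GPU's compute capability; the deviceQuery sample built earlier reports it. A quick sketch:

```shell
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery | grep "CUDA Capability"
```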

The complete, configured Makefile.config

The full Makefile.config in the caffe source directory after these changes looks like this:

```makefile
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
# $(ANACONDA_HOME)/include/python2.7 \
# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.5m
# PYTHON_INCLUDE := /usr/include/python3.5m \
# /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
```

Edit the Makefile in the caffe directory

If the lines are hard to find by eye, copy the file into an editor and search for them.
Replace:

```makefile
NVCCFLAGS +=-ccbin=$(CXX) -Xcompiler-fPIC $(COMMON_FLAGS)
```

with:

```makefile
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
```

and replace:

```makefile
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
```

with:

```makefile
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
```

This completes the Caffe configuration and you can start building. If your GPU is not of the Fermi architecture, the following cmake invocation also avoids the Unsupported gpu architecture 'compute_20' problem:

```shell
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CUDA_GENERATION=Kepler ..
```

Build and install Caffe

From the caffe directory run:

```shell
cd caffe
make all -j $(($(nproc) + 1))
make test -j $(($(nproc) + 1))
make runtest -j $(($(nproc) + 1))
make pycaffe -j $(($(nproc) + 1))
```

A successful runtest ends with output like this:
(screenshot: runtest passed)

Add the Caffe environment variable

```shell
vim ~/.bashrc
export PYTHONPATH=~/caffe/python:$PYTHONPATH
```
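After sourcing ~/.bashrc, a one-liner confirms that the pycaffe module can be found and imported; a sketch:

```shell
source ~/.bashrc
python -c "import caffe; print('caffe imported OK')"
```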

Common problems

Problem 1
In the caffe source directory, the following line of the Makefile must be changed to:

```makefile
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
```

Both Makefile.config and the Makefile need the hdf5-related changes above; otherwise the build fails with:
(screenshot: hdf5 link error)

Problem 2
When importing the caffe module in Python you may see:

```
/usr/local/lib/python2.7/dist-packages/scipy/sparse/lil.py:19: RuntimeWarning: numpy.dtype
size changed, may indicate binary incompatibility. Expected 96, got 88
```

Fix
Downgrade numpy:

```shell
pip uninstall numpy
pip install numpy==1.14.5
```

Problem 3
Another error when importing caffe:
(screenshot: import error)
The cause here was that a linuxbrew-installed Python 2 had been made the default Python, while the caffe build configuration pointed at the system Python 2, so the system Python and the linuxbrew Python ended up mixed together.
The fix is to disable the linuxbrew environment and use only the system Python: comment out the line eval $(/home/linuxbrew/.linuxbrew/bin/brew shellenv) in ~/.profile:

```shell
# linuxbrew
#eval $(/home/linuxbrew/.linuxbrew/bin/brew shellenv)
```

Then reboot.

Problem 4
(screenshot: import error)
The cause is that pip2 exists in two places, /usr/bin/pip2 and /usr/local/bin/pip2:

```shell
# find out where pip2 lives
$ where pip2
/usr/local/bin/pip2
/usr/bin/pip2

# check which pip is currently being used
$ which pip
/usr/local/bin/pip
```

The fix is to install protobuf with /usr/local/bin/pip2:

```shell
/usr/local/bin/pip2 install protobuf
```

A related issue: importing caffe results in ImportError: "No module named google.protobuf.internal".
This is probably because there are two Python environments on the machine: the one provided by the Linux distribution (pip) and the Anaconda one (/home/username/anaconda2/bin/pip).
Try installing protobuf into both environments to be sure:

```shell
pip install protobuf
/home/username/anaconda2/bin/pip install protobuf
```

Installing Caffe with Python 3

Switch the system Python to Python 3

Point the python command at Python 3:

```shell
which python3
which python
sudo rm /usr/bin/python # remove the existing python symlink
sudo ln -s /usr/bin/python3 /usr/bin/python # symlink python3 to python
```

Install the dependencies

Same as for the Python 2.7 installation above.

Configure CUDA and cuDNN

Same as for Python 2.7.

Install the Python dependencies with pip

```shell
pip install opencv-python==3.4.0.12 # the opencv-python version should match the OpenCV version built from source
pip install protobuf
```

Install OpenCV as described in the Installing OpenCV section above.

Then disable Anaconda as described earlier.

Configure Caffe

First cd to the directory where you want to install it and run:

```shell
git clone https://github.com/BVLC/caffe.git
```

This creates a caffe folder. cd into it and run:

```shell
cp Makefile.config.example Makefile.config

# if the copy fails, run the following first
# chmod 777 Makefile.config.example
# cp Makefile.config.example Makefile.config
```

This copies `Makefile.config.example` to `Makefile.config`, because building `caffe` uses `Makefile.config`, while Makefile.config.example is only the example configuration shipped with `caffe`.

Then edit Makefile.config; open it from the `caffe` directory:

```shell
vim Makefile.config

# or open the file with gedit/vscode from the file manager
```

In the caffe source directory, modify Makefile.config so that it reads:

```makefile
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7 \
# /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
# $(ANACONDA_HOME)/include/python2.7 \
# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
PYTHON_LIBRARIES := boost_python3 python3.5m
PYTHON_INCLUDE := /usr/include/python3.5m \
/usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# NCCL acceleration switch (uncomment to build with NCCL)
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
```

Edit the Makefile in the caffe directory

*If the lines are hard to find by eye, copy the file into an editor and search for them.*
Replace:

```makefile
NVCCFLAGS +=-ccbin=$(CXX) -Xcompiler-fPIC $(COMMON_FLAGS)
```

with:

```makefile
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
```

and replace:

```makefile
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
```

with:

```makefile
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
```

This completes the Caffe configuration and you can start building. If your GPU is not of the Fermi architecture, the following cmake invocation also avoids the Unsupported gpu architecture 'compute_20' problem:

```shell
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CUDA_GENERATION=Kepler ..
```

Build and install Caffe

```shell
cd caffe
make all -j $(($(nproc) + 1))
make test -j $(($(nproc) + 1))
make runtest -j $(($(nproc) + 1))
make pycaffe -j $(($(nproc) + 1))
```

Add the Caffe environment variable

```shell
vim ~/.bashrc
export PYTHONPATH=~/caffe/python:$PYTHONPATH
```

Common problems

Problem 1
(screenshot: Caffe build error)
Fix

```shell
git clone https://github.com/madler/zlib
cd path/to/zlib
./configure
make
make install # you may need 'sudo'
```

Problem 2
protoc: error while loading shared libraries: libprotoc.so.10: cannot open shared object file: No such file or directory

Fix:

```shell
export LD_LIBRARY_PATH=/usr/local/lib
```

Problem 3

/sbin/ldconfig.real: /usr/local/cuda-9.0/lib64/libcudnn.so.5 is not a symbolic link

Fix:
When sudo ldconfig complains that /usr/local/cuda-9.0/lib64/libcudnn.so.5 is not a symbolic link, the remedy is simply to recreate the link, which replaces the offending copy.

Look in /usr/local/cuda-9.0/lib64/ and search for libcudnn: you will find two regular files, libcudnn.so.5 and libcudnn.so.5.0.5, whereas in theory only libcudnn.so.5.0.5 should be a regular file.

Run:

```shell
ln -sf /usr/local/cuda-9.0/lib64/libcudnn.so.5.0.5 /usr/local/cuda-9.0/lib64/libcudnn.so.5
```

Now sudo ldconfig succeeds; in /usr/local/cuda-9.0/lib64/ libcudnn.so.5 is no longer a separate file but just a link to libcudnn.so.5.0.5.

Problem 4

.build_release/tools/caffe: error while loading shared libraries: libhdf5.so.10: cannot open shared object file: No such file or directory

Fix:

```shell
echo "export LD_LIBRARY_PATH=/home/abc/anaconda2/lib:$LD_LIBRARY_PATH" >>~/.bashrc
```

Problem 5

Error: python/caffe/_caffe.cpp:1:52: fatal error: Python.h: No such file or directory. Compilation terminated. make: *** [python/caffe/_caffe.so] Error 1

Fix:
Run sudo find / -name 'Python.h' to locate the header, then add its directory (for example /home/abc/anaconda2/include/python2.7; use your own path) to PYTHON_INCLUDE in Makefile.config.

Problem 6

Error: import caffe raises ImportError: No module named skimage.io

Fix:
The skimage.io module is probably simply not installed; install it with:

```shell
pip install scikit-image # you may need sudo
```

Problem 7

import caffe Traceback (most recent call last): File "", line 1, in ImportError: No module named caffe

Fix (note that the caffe/python directory has to be on PYTHONPATH, not PATH):

```shell
echo 'export PYTHONPATH="/home/andy/caffe/python:$PYTHONPATH"' >> ~/.bashrc
source ~/.bashrc
```

Close the terminal and open a new one.


Installing protobuf

What is protobuf?

protobuf (Protocol Buffers) is an open-source library from Google: a language-neutral, platform-neutral, extensible mechanism for serializing structured data, used for communication protocols, data storage and so on. It is similar to XML, but smaller, faster and simpler. You define your own data structures and then use generated code to read and write them.

What is protobuf-c?

Protocol Buffers has no native support for C, so the third-party protobuf-c library is used instead; it provides a C API.

Below we first install protobuf, then protobuf-c.

Installing protobuf

Download the source package

https://developers.google.com/protocol-buffers/
(screenshot: download page)

(screenshot: GitHub releases page)
All versions can be found under releases; version 2.4.1 is used here. Copy the link to protobuf-2.4.1.tar.gz and download it with wget:

```shell
wget https://github.com/google/protobuf/releases/download/v2.4.1/protobuf-2.4.1.tar.gz
```

Extract it:

```shell
tar -zxvf protobuf-2.4.1.tar.gz
```

Build and install

```shell
cd protobuf-2.4.1
```

(You can follow the README for the general approach.)

```shell
./configure
make
make check # (make check may report some failures, which can be ignored; those features are not needed here)
make install
```

(This places an executable called protoc in /usr/local/bin.)

Check the installation

```shell
protoc --version
```

If everything went well, this prints the version number; otherwise it prints an error.

Errors and fixes

```
protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared
```

Cause
protobuf installs into /usr/local/lib by default, and /usr/local/lib is not on Ubuntu's default LD_LIBRARY_PATH, so the library cannot be found.
Fix:
1) Create the file with sudo gedit /etc/ld.so.conf.d/libprotobuf.conf and put the following line in it:

```shell
/usr/local/lib
```

2) Run:

```shell
sudo ldconfig
```

Now protoc --version prints the version number correctly.

Installing protobuf-c

(protobuf-c-0.15 is used here; newer versions install similarly.)

Open the following link:
https://code.google.com/p/protobuf-c/
and go to the Downloads page.
(screenshots: download pages)

For some reason, wget cannot download the protobuf-c-0.15.tar.gz file shown there.

The workaround is to click Export to GitHub on that page to export the code to GitHub (you need to be logged in to your own GitHub account); only the sources are exported, with no release tarballs. Download the sources with wget and unpack them. Because these are raw sources there is no configure file, but running autogen.sh generates one; after that the procedure is the same as for protobuf and is not repeated here.
After installation an executable called protoc-c appears in /usr/local/bin.

To verify that protobuf-c was installed correctly, go to protobuf-c-0.15/src/test and run:

```shell
protoc-c --c_out=. test.proto
```

(The c_out flag sets the output directory for the generated files; here it is the current directory.)
If test.pb-c.c and test.pb-c.h appear in that directory, the installation succeeded.

A protobuf usage example

```shell
touch person.proto
```

with the following content:

```
message Person {
    required string name = 1;
    required int32 id = 2;
}
```

Compile the .proto file:

```shell
protoc-c --c_out=. person.proto
```

Then create main.c:

```shell
touch main.c
```

with the following code:

```c
#include <stdio.h>
#include <stdlib.h>
#include "person.pb-c.h"

void main()
{
    // Define a Person message and fill in its fields
    Person person = PERSON__INIT;
    person.id = 1314;
    person.name = "lily"; // the string "lily" lives in the constant area

    printf("id = %d\n", person.id);
    printf("name = %s\n", person.name);

    // Pack
    int len = person__get_packed_size(&person);
    //printf("len = %d\n", len);
    void *sendpack = malloc(len);
    person__pack(&person, sendpack);
    // sendpack is the packed buffer; it could now be sent over a socket.
    // (This example is about protobuf only, so it is not sent anywhere.)

    // The receiver unpacks it
    Person *recvbuf = person__unpack(NULL, len, sendpack);
    printf("id = %d\n", recvbuf->id);
    printf("name = %s\n", recvbuf->name);
    // Free the unpacked message when done
    person__free_unpacked(recvbuf, NULL);
    free(sendpack);
}
```

Compile it:

```shell
gcc person.pb-c.c main.c -lprotobuf-c
```

Run ./a.out; the output is:

```
id = 1314
name = lily
id = 1314
name = lily
```
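As an aside, the standard protoc compiler installed earlier can generate bindings for other languages from the same .proto file; a small sketch (assuming the proto2 syntax used above):

```shell
protoc --cpp_out=. --python_out=. person.proto   # generates person.pb.cc/.h and person_pb2.py
```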

Installing MATLAB on Linux

Preparation

Download the MATLAB for Linux files; here the Baidu netdisk link shared by @晨曦月下 is used:

Link: https://pan.baidu.com/s/1W6jWkaXEMpMUEmIl8qmRwg
Password: igx6

Go into the download folder (assuming the files were saved to ~/Download/), and create a folder named Crack to hold the files extracted from Matlab2018aLinux64Crack.tar.gz:

```shell
cd ~/Download
sudo mkdir Crack
```

Extract the archive:

```shell
cd ~/Download
tar -xvf Matlab2018aLinux64Crack.tar.gz -C Crack
```

Create a folder under /mnt to mount R2018a_glnxa64_dvd1.iso and R2018a_glnxa64_dvd2.iso:

```shell
cd /mnt
sudo mkdir iso
```

Mount R2018a_glnxa64_dvd1.iso first:

```shell
cd ~
sudo mount -t auto -o loop R2018a_glnxa64_dvd1.iso /mnt/iso
```

If this prints /mnt/iso: WARNING: device write-protected, mounted read-only, adjust the permissions of /mnt:

```shell
cd /
sudo chmod 755 mnt
```

MATLAB installation

Start the installer from the mounted iso folder:

```shell
cd ~
sudo /mnt/iso/install
```

  1. Choose Use a File Installation Key:
    (screenshot: installer step 1)

  2. Choose Yes to accept the license agreement:
    (screenshot: installer step 2)

  3. Keep the default installation directory, /usr/local.

  4. Choose I have the File Installation Key for my license and enter:
    09806-07443-53955-64350-21751-41297

  5. Partway through, the installer asks for disc 2; at that point mount R2018a_glnxa64_dvd2.iso:

    ```shell
    cd ~
    sudo mount -t auto -o loop R2018a_glnxa64_dvd2.iso /mnt/iso
    ```

  6. When the installation completes, choose Finish.

Activation

  1. Copy the crack file license_standalone.lic from Crack into the installation directory:

    ```shell
    cd ~/Crack
    sudo cp license_standalone.lic /usr/local/MATLAB/R2018a/licenses
    ```

  2. Copy the R2018a folder from Crack into the installation directory:

    ```shell
    cd ~/Crack
    sudo cp -r R2018a /usr/local/MATLAB
    ```

Activation is now complete.

Clean up: unmount the iso and remove the mount point:

```shell
sudo umount /mnt/iso
cd /mnt
sudo rmdir iso
```

MATLAB setup

To make MATLAB launchable from any terminal, create a matlab command in /usr/local/bin via a symlink:

```shell
cd /usr/local/bin
sudo ln -s /usr/local/MATLAB/R2018a/bin/matlab matlab
```
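To confirm the link works from a fresh terminal, MATLAB can be started headless; a sketch:

```shell
matlab -nodesktop -nosplash -r "disp(version); exit"
```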

References

Steps to install MATLAB R2018a on Linux
