TensorRT links
Official API reference: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/
TensorRT tools
trtexec bundles the parsers TensorRT uses to import third-party model formats.
Working with TensorFlow models
- pb to uff
  - Environment
    cd python
    pip install tensorrt-xxxxx.whl
    cd ../uff
    pip install uff-xxxxx.whl
    cd ../graphsurgeon
    pip install graphsurgeon-xxxxx.whl
  - Command
    convert-to-uff xxxx.pb
- pb to onnx
  python -m tf2onnx.convert --graphdef xxxxx.pb --output xxxxx.onnx --inputs input1:0,input2:0 --outputs output1:0,output2:0
trtexec
trtexec --uff=xxxx.uff --output=xxxx,xxxx --uffInput=input1,C,H,W --uffInput=input2,C,H,W --batch=N
trtexec --onnx=xxxx.onnx --explicitBatch
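API pitfalls
nvinfer1::INetworkDefinition
The documentation for the various add* layer methods is, to put it mildly, hard to follow.
Networks come in two kinds:
- networks with an implicit batch dimension (e.g., HWC)
- networks with explicit dimensions, i.e. full dims (NHWC)
The difference shows up clearly when calling addInput.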
addInput
From the official documentation:
For networks with an implicit batch dimension, this volume includes the batch dimension with its length set to the maximum batch size. For networks with all explicit dimensions and with wildcard dimensions, the volume is based on the maxima specified by an IOptimizationProfile. Dimensions are normally non-negative integers. The exception is that in networks with all explicit dimensions, -1 can be used as a wildcard for a dimension to be specified at runtime. Input tensors with such a wildcard must have a corresponding entry in the IOptimizationProfiles indicating the permitted extrema, and the input dimensions must be set by IExecutionContext::setBindingDimensions. Different IExecutionContext instances can have different dimensions. Wildcard dimensions are only supported for EngineCapability::kSTANDARD. They are not supported in safety contexts. DLA does not support Wildcard dimensions.
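Taking an NCHW input as the example:
- Networks with an implicit batch dimension
Normally you pass only CHW to addInput and set the batch size at execute time.
If the input is declared as NCHW instead, N is treated as the maximum batch size. Does execute then use the batch size set at that point? And what about the memory allocated for the tensors that connect the layers: is it sized for the maximum? The documentation quoted above suggests the volume is computed with the maximum batch size.
- Networks with explicit dimensions
Normally you set a non-negative NCHW. An input dimension may also be unknown, written as -1. If the input dimensions contain -1, the network is built against an IOptimizationProfile that gives the allowed range of each -1 dimension, and the actual values are fixed before execute through IExecutionContext::setBindingDimensions.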
// HW is -1 wildcard
auto input = preprocessorNetwork->addInput("input", nvinfer1::DataType::kFLOAT, Dims4{1, 1, -1, -1});
// Create an optimization profile so that we can specify a range of input dimensions.
nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
// This profile will be valid for all images whose size falls in the range of [(1, 1, 1, 1), (1, 1, 56, 56)]
// but TensorRT will optimize for (1, 1, 28, 28)
// We do not need to check the return of setDimension and addOptimizationProfile here as all dims are explicitly set
profile->setDimensions(input->getName(), OptProfileSelector::kMIN, Dims4{1, 1, 1, 1});
profile->setDimensions(input->getName(), OptProfileSelector::kOPT, Dims4{1, 1, 28, 28});
profile->setDimensions(input->getName(), OptProfileSelector::kMAX, Dims4{1, 1, 56, 56});
preprocessorConfig->addOptimizationProfile(profile);
// Set the input size for the preprocessor
// setBindingDimensions returns false when the binding dimensions are invalid
bool ok = mPreprocessorContext->setBindingDimensions(0, inputDims);
// We can only run inference once all dynamic input shapes have been specified.
bool ret = mPreprocessorContext->allInputDimensionsSpecified();
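The relationship between the kMIN/kOPT/kMAX profile and the shape set at runtime can be sketched without TensorRT at all. The block below is only an illustration under stated assumptions: `Shape4`, `ProfileRange`, `isValidRange` and `fitsProfile` are hypothetical names, not TensorRT types or functions, and the checks are a simplified model of what the library validates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT TensorRT types.
using Shape4 = std::array<int32_t, 4>;

struct ProfileRange {
    Shape4 min;  // plays the role of OptProfileSelector::kMIN
    Shape4 opt;  // plays the role of OptProfileSelector::kOPT
    Shape4 max;  // plays the role of OptProfileSelector::kMAX
};

// A profile is only meaningful when min <= opt <= max holds per dimension.
inline bool isValidRange(const ProfileRange& p) {
    for (std::size_t i = 0; i < p.min.size(); ++i) {
        if (p.min[i] > p.opt[i] || p.opt[i] > p.max[i]) {
            return false;
        }
    }
    return true;
}

// A runtime shape is acceptable when every dimension is concrete
// (no -1 wildcard left) and lies inside [min, max].
inline bool fitsProfile(const Shape4& runtime, const ProfileRange& p) {
    for (std::size_t i = 0; i < runtime.size(); ++i) {
        if (runtime[i] < 0) {
            return false;  // wildcard must be resolved before execution
        }
        if (runtime[i] < p.min[i] || runtime[i] > p.max[i]) {
            return false;
        }
    }
    return true;
}
```

With the ranges from the snippet above, a 28x28 or 56x56 image passes this check while a 57x28 image does not, which mirrors why an out-of-range shape is rejected when binding dimensions are set.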
addReduce
Header and documentation comments:
//! \param input The input tensor to the layer.
//! \param operation The reduction operation to perform.
//! \param reduceAxes The reduction dimensions.
//! The bit in position i of bitmask reduceAxes corresponds to explicit dimension i if result.
//! E.g., the least significant bit corresponds to the first explicit dimension and the next to least significant bit corresponds to the second explicit dimension.
//!
//! \param keepDimensions The boolean that specifies whether or not to keep the reduced dimensions in the output of the layer.
IReduceLayer* addReduce(ITensor& input, ReduceOperation operation, uint32_t reduceAxes, bool keepDimensions);
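This is the reduction operator: it reduces over the axes selected by reduceAxes, and the reduction can be any ReduceOperation value: kSUM, kPROD, kMAX, kMIN or kAVG.
reduceAxes, translated literally: bit i of the bitmask corresponds to explicit dimension i "if result"? The wording is confusing and reads like a typo in the header. What matters is the second sentence: the least significant bit corresponds to the first dimension, and the next least significant bit to the second dimension.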
reduceAxis |= 1u << axis_data;
| axis index | binary | decimal |
|---|---|---|
| 3 | 1000 | 8 |
| 2 | 0100 | 4 |
| 1 | 0010 | 2 |
| 0 | 0001 | 1 |
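As a minimal sketch of the table above, the mask for several axes can be built in one helper. `makeReduceAxesMask` is a hypothetical name, not a TensorRT function, and accepting negative axes (counted from the end, as many frameworks do) is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of TensorRT): turn a list of axis indices
// into the bitmask expected by the reduceAxes parameter of addReduce.
// Bit i set means "reduce over dimension i".
inline uint32_t makeReduceAxesMask(const std::vector<int>& axes, int rank) {
    uint32_t mask = 0;
    for (int axis : axes) {
        if (axis < 0) {
            axis += rank;  // assumption: negative axes count from the end
        }
        if (axis < 0 || axis >= rank || axis >= 32) {
            throw std::out_of_range("reduce axis out of range");
        }
        mask |= 1u << axis;  // same operation as the one-liner above
    }
    return mask;
}
```

For a rank-4 tensor, axes {2, 3} give binary 1100 (decimal 12). In an implicit-batch network the batch dimension is presumably not counted as an explicit dimension, so an axis taken from an NCHW framework graph would likely need to be shifted down by one before calling such a helper.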
addShuffle
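Many shape-changing operators map onto this layer: reshape, flatten, squeeze, unsqueeze, transpose and so on.
When the target dimensions are fixed constants, set them directly with setReshapeDimensions.
A constant perm for transpose is set with setFirstTranspose.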
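When filling in the dimensions for setReshapeDimensions, two placeholder values are useful: 0 copies the corresponding input dimension and -1 asks for that dimension to be inferred from the remaining volume. The sketch below models that resolution in plain C++ so that flatten or squeeze targets can be checked by hand. `resolveReshape` is a hypothetical helper, not a TensorRT call, and it assumes all input dimensions are concrete and non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of TensorRT): resolve a reshape spec where
// 0 means "copy the input dimension at this position" and -1 means
// "infer this dimension from the remaining volume" (at most one -1).
inline std::vector<int64_t> resolveReshape(const std::vector<int64_t>& in,
                                           const std::vector<int64_t>& spec) {
    int64_t inVolume = 1;
    for (int64_t d : in) {
        inVolume *= d;
    }

    std::vector<int64_t> out(spec);
    int64_t known = 1;   // product of all resolved output dimensions
    int inferAt = -1;    // position of the -1 placeholder, if any
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] == 0) {
            if (i >= in.size()) {
                throw std::invalid_argument("0 placeholder has no input dimension");
            }
            out[i] = in[i];
        }
        if (out[i] == -1) {
            if (inferAt != -1) {
                throw std::invalid_argument("more than one -1 in reshape spec");
            }
            inferAt = static_cast<int>(i);
        } else {
            known *= out[i];
        }
    }

    if (inferAt >= 0) {
        if (known == 0 || inVolume % known != 0) {
            throw std::invalid_argument("cannot infer -1 dimension");
        }
        out[inferAt] = inVolume / known;
    } else if (known != inVolume) {
        throw std::invalid_argument("reshape changes the number of elements");
    }
    return out;
}
```

For example, flattening a {2, 3, 4, 5} tensor after its first axis is the spec {0, -1}, which resolves to {2, 60}; squeezing {1, 3, 1, 5} is simply the constant spec {3, 5}.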
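The dynamic reshape operator
nvinfer1::ITensor comes in two kinds: shape tensors and execution tensors. A shape tensor carries shape information; the output of a shape operator is a shape tensor. An execution tensor is one that takes part in the actual computation. In general, the input and output tensors of a network should be execution tensors.
If the shape given to reshape is a constant, set it directly with setReshapeDimensions.
If the shape is a variable, the tensor holding that shape is a shape tensor.
nvinfer1::IShuffleLayer is static by default; setInput(0, xxxx) replaces the tensor to be reshaped.
When setInput(1, xxxx) is given a shape tensor as its second argument, nvinfer1::IShuffleLayer becomes dynamic and computes the reshape at runtime.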
addPluginV2
Operators that TensorRT does not support can be implemented yourself as a plugin.
Header template:
class EqualPluginCreater : public nvinfer1::IPluginCreator {
public:
EqualPluginCreater();
const char *getPluginName() const noexcept override;
const char *getPluginVersion() const noexcept override;
const nvinfer1::PluginFieldCollection *getFieldNames() noexcept override;
nvinfer1::IPluginV2 *createPlugin(const char *name, const nvinfer1::PluginFieldCollection *fc) noexcept override;
nvinfer1::IPluginV2 *deserializePlugin(const char *name, const void *serialData,
size_t serialLength) noexcept override;
void setPluginNamespace(const char *pluginNamespace) noexcept override;
const char *getPluginNamespace() const noexcept override;
private:
static nvinfer1::PluginFieldCollection field_collection_;
static std::vector<nvinfer1::PluginField> fields_;
std::string name_space_;
};
class EqualPlugin : public nvinfer1::IPluginV2DynamicExt {
// Derive from IPluginV2DynamicExt to support dynamic input shapes
public:
explicit EqualPlugin(const std::string name) : layer_name_(name) {
}
// It doesn't make sense to make EqualPlugin without arguments, so we delete
// default constructor.
EqualPlugin() = delete;
// IPluginV2DynamicExt Methods
nvinfer1::IPluginV2DynamicExt *clone() const noexcept override;
// Called while the network is being built; returns the dimensions of the output tensor
nvinfer1::DimsExprs getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs *inputs, int nbInputs,
nvinf

