Contents

  1. The tf.nn.conv2d function
  2. The tf.contrib.slim.conv2d function
  3. Comparison
  4. Code usage comparison
  5. Code example
  6. API locations

The difference between tf.nn.conv2d and tf.contrib.slim.conv2d

1. The tf.nn.conv2d function

Its definition is as follows:

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)

Parameters:

  • input: the input to the convolution, a tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in the batch, height of each image, width of each image, number of channels]. Note that this is a 4-D Tensor whose dtype must be float32 or float64.

  • filter: the convolution kernel, with shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of input channels, number of output channels].
    The third dimension, in_channels, must match the fourth dimension of input; it is the dimension that must agree, not any particular value.
    out_channels can be puzzling at first: for an RGB input with three channels, setting the filter's out_channels to 1 sums the per-channel responses and produces a single output feature map.

  • strides: the convolution stride along each dimension of the input. It can be given as a single number, or as a 4-element list such as [1, 2, 1, 1], which in the default NHWC layout means a stride of 2 along the height (vertical) dimension and 1 along the width (horizontal) dimension; the first and fourth elements should be left at 1.

  • padding: the padding scheme; the only two values are SAME and VALID.
    SAME pads where VALID does not: convolving a 3×3 image with a 2×2 filter at stride 2 leaves one column over, so on the second step VALID sees that the remaining window is smaller than 2×2 and simply drops the third column, while SAME pads an extra column of zeros so it is still covered (see the sketch after this list).

  • use_cudnn_on_gpu: bool, whether to use cuDNN to accelerate the convolution on the GPU; defaults to true.

  • data_format: optional, "NHWC" or "NCHW", specifying the data layout of the input and output.
    The default is "NHWC", where data is stored in the order [batch, height, width, channels];
    with "NCHW" the storage order is [batch, channels, height, width].

  • name: a name for the returned tensor, i.e. a name for the output feature map.

  • Returns a Tensor: this output is what we usually call the feature map.
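To make the stride and padding rules concrete, here is a minimal sketch (assuming TF 1.x) that convolves the 3×3 case above with a 2×2 all-ones filter at stride 2 under both padding modes:

import tensorflow as tf

x = tf.ones(shape=[1, 3, 3, 3])   # [batch, in_height, in_width, in_channels]
f = tf.ones(shape=[2, 2, 3, 1])   # [filter_height, filter_width, in_channels, out_channels]

# strides follows the NHWC order [batch, height, width, channels]:
# both spatial strides are 2 here.
y_same = tf.nn.conv2d(x, f, strides=[1, 2, 2, 1], padding='SAME')
y_valid = tf.nn.conv2d(x, f, strides=[1, 2, 2, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(y_same).shape)   # (1, 2, 2, 1): SAME zero-pads the leftover row/column
    print(sess.run(y_valid).shape)  # (1, 1, 1, 1): VALID drops it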

2. The tf.contrib.slim.conv2d function

It is defined in tensorflow/contrib/layers/python/layers/layers.py; the underlying convolution function looks like this:

convolution(inputs,
          num_outputs,
          kernel_size,
          stride=1,
          padding='SAME',
          data_format=None,
          rate=1,
          activation_fn=nn.relu,
          normalizer_fn=None,
          normalizer_params=None,
          weights_initializer=initializers.xavier_initializer(),
          weights_regularizer=None,
          biases_initializer=init_ops.zeros_initializer(),
          biases_regularizer=None,
          reuse=None,
          variables_collections=None,
          outputs_collections=None,
          trainable=True,
          scope=None):

  • inputs: likewise, the input image on which the convolution is performed
  • num_outputs: the number of convolution kernels (i.e. the number of filters)
  • kernel_size: the spatial dimensions of the kernel, as [kernel height, kernel width]
  • stride: the convolution stride along each spatial dimension of the image
  • padding: the padding scheme, VALID or SAME
  • data_format: the layout of the input, as in tf.nn.conv2d
  • rate: the dilation rate for atrous (dilated) convolution, i.e. the spacing between kernel taps; tf.nn.conv2d has no such parameter (see the sketch after this list)
  • activation_fn: the activation function to apply; defaults to ReLU
  • normalizer_fn: a normalization function (such as batch normalization) to apply in place of biases
  • normalizer_params: parameters for the normalization function
  • weights_initializer: the initializer for the weights
  • weights_regularizer: an optional regularizer for the weights
  • biases_initializer: the initializer for the biases
  • biases_regularizer: an optional regularizer for the biases
  • reuse: whether the layer and its variables should be reused
  • variables_collections: an optional list of collections for all the variables, or a dictionary giving a different list of collections per variable
  • outputs_collections: the collections the outputs are added to
  • trainable: whether the convolution layer's parameters are trainable
  • scope: the variable_scope used for variable sharing
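As a sketch of the extra knobs slim.conv2d exposes (assuming TF 1.x with tf.contrib available; the scope names are arbitrary):

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.ones(shape=[1, 32, 32, 3])

# 64 filters of size 3x3, batch normalization in place of biases,
# L2 weight decay on the kernel; the activation defaults to ReLU.
net = slim.conv2d(x, 64, [3, 3],
                  normalizer_fn=slim.batch_norm,
                  weights_regularizer=slim.l2_regularizer(1e-4),
                  scope='conv_bn')

# rate=2 performs atrous (dilated) convolution: the 3x3 kernel taps are
# spaced two pixels apart, enlarging the receptive field without adding
# parameters. rate != 1 requires stride == 1.
net = slim.conv2d(net, 64, [3, 3], rate=2, scope='conv_dilated')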

3. Comparison

tf.contrib.slim.conv2d exposes many more configurable pieces (initializers, regularizers, normalization, activation), while tf.nn.conv2d makes you specify the filter yourself, which is comparatively cumbersome. Stripping away the rarely used configuration, the two APIs can be simplified as follows:

tf.contrib.slim.conv2d(inputs,
                num_outputs,   # number of kernels
                kernel_size,   # [kernel height, kernel width]
                stride=1,
                padding='SAME',
)
tf.nn.conv2d(
    input,    # same as inputs above
    filter,   # [kernel height, kernel width, input channels, number of kernels]
    strides,
    padding,
)

4. Code usage comparison

Plain TensorFlow code:

input = ...
with tf.name_scope('conv1_1') as scope:
  # Build the 3x3 kernel by hand: 64 input channels, 128 output channels.
  filter = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                           stddev=1e-1), name='weights')
  conv = tf.nn.conv2d(input, filter, [1, 1, 1, 1], padding='SAME')
  # Biases and the ReLU activation also have to be wired up manually.
  biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                       trainable=True, name='biases')
  bias = tf.nn.bias_add(conv, biases)
  conv1 = tf.nn.relu(bias, name=scope)

To cut down on this repetition, TF-Slim provides a large set of convenient operations at the more abstract level of neural-network layers.
For example, compare the code above with the corresponding TF-Slim call:

input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')

5. Code example

import tensorflow as tf
import tensorflow.contrib.slim as slim

x1 = tf.ones(shape=[1, 64, 64, 3])
# The fill value must be the float 1.0: filling with the integer 1 would make
# the filter int32, which tf.nn.conv2d rejects against the float32 input.
w = tf.fill([5, 5, 3, 64], 1.0)
y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
# weights_initializer=tf.ones_initializer gives slim the same all-ones kernel as w.
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer, padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value, y2_value = sess.run([y1, y2])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value == y2_value)
    print(y1_value)
    print(y2_value)
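With identical all-ones kernels and zero-initialized biases, y1 and y2 come out equal elementwise: slim's default ReLU changes nothing here because every output value is non-negative, so both prints show the same (1, 64, 64, 64) feature map.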

6. API locations

tf.nn.conv2d: https://tensorflow.google.cn/versions/master/api_docs/python/tf/nn/conv2d

tf.contrib.slim.conv2d: tensorflow/contrib/layers/python/layers/layers.py

The original English API docstring of tf.contrib.slim.conv2d:


def convolution(inputs,
                num_outputs,
                kernel_size,
                stride=1,
                padding='SAME',
                data_format=None,
                rate=1,
                activation_fn=nn.relu,
                normalizer_fn=None,
                normalizer_params=None,
                weights_initializer=initializers.xavier_initializer(),
                weights_regularizer=None,
                biases_initializer=init_ops.zeros_initializer(),
                biases_regularizer=None,
                reuse=None,
                variables_collections=None,
                outputs_collections=None,
                trainable=True,
                scope=None):
  """Adds an N-D convolution followed by an optional batch_norm layer.

  It is required that 1 <= N <= 3.

  `convolution` creates a variable called `weights`, representing the
  convolutional kernel, that is convolved (actually cross-correlated) with the
  `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
  provided (such as `batch_norm`), it is then applied. Otherwise, if
  `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
  variable would be created and added the activations. Finally, if
  `activation_fn` is not `None`, it is applied to the activations as well.

  Performs atrous convolution with input stride/dilation rate equal to `rate`
  if a value > 1 for any dimension of `rate` is specified.  In this case
  `stride` values != 1 are not supported.

  Args:
    inputs: A Tensor of rank N+2 of shape
      `[batch_size] + input_spatial_shape + [in_channels]` if data_format does
      not start with "NC" (default), or
      `[batch_size, in_channels] + input_spatial_shape` if data_format starts
      with "NC".
    num_outputs: Integer, the number of output filters.
    kernel_size: A sequence of N positive integers specifying the spatial
      dimensions of the filters.  Can be a single integer to specify the same
      value for all spatial dimensions.
    stride: A sequence of N positive integers specifying the stride at which to
      compute output.  Can be a single integer to specify the same value for all
      spatial dimensions.  Specifying any `stride` value != 1 is incompatible
      with specifying any `rate` value != 1.
    padding: One of `"VALID"` or `"SAME"`.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".
    rate: A sequence of N positive integers specifying the dilation rate to use
      for atrous convolution.  Can be a single integer to specify the same
      value for all spatial dimensions.  Specifying any `rate` value != 1 is
      incompatible with specifying any `stride` value != 1.
    activation_fn: Activation function. The default value is a ReLU function.
      Explicitly set it to None to skip it and maintain a linear activation.
    normalizer_fn: Normalization function to use instead of `biases`. If
      `normalizer_fn` is provided then `biases_initializer` and
      `biases_regularizer` are ignored and `biases` are not created nor added.
      default set to None for no normalizer function
    normalizer_params: Normalization function parameters.
    weights_initializer: An initializer for the weights.
    weights_regularizer: Optional regularizer for the weights.
    biases_initializer: An initializer for the biases. If None skip biases.
    biases_regularizer: Optional regularizer for the biases.
    reuse: Whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    variables_collections: Optional list of collections for all the variables or
      a dictionary containing a different list of collection per variable.
    outputs_collections: Collection to add the outputs.
    trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    scope: Optional scope for `variable_scope`.

  Returns:
    A tensor representing the output of the operation.

  Raises:
    ValueError: If `data_format` is invalid.
    ValueError: Both 'rate' and `stride` are not uniformly 1.
  """
  if data_format not in [None, 'NWC', 'NCW', 'NHWC', 'NCHW', 'NDHWC', 'NCDHW']:
    raise ValueError('Invalid data_format: %r' % (data_format,))

  layer_variable_getter = _build_variable_getter({
      'bias': 'biases',
      'kernel': 'weights'
  })
