1. Modifying the Object Detection API source to support any number of input channels

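The official source code supports only 3-channel input, in image formats such as jpg and png.
Goal: support an arbitrary number of channels by adding a channel-count setting to the configuration.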

https://github.com/tensorflow/models/tree/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection

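1.1. Preparing the input

1.1.1. Creating the tfrecord:

Change the format of the .tfrecord file so that it stores a multi-dimensional array instead of an encoded image as in the example script: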

https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/create_pascal_tf_record.py

The main change is to the following two lines of that file:

    with tf.gfile.GFile(full_path, 'rb') as fid:
      encoded_jpg = fid.read()

You need to prepare your input as a numpy array and encode it as a byte string, as in the following example:

    import cv2
    import numpy as np

    # Read the image and the extra input; cv2 loads BGR, so reverse the
    # last axis to get RGB.
    image = cv2.imread('path/to/image')[:, :, ::-1]
    background = cv2.imread('path/to/background')[:, :, ::-1]
    # Both are uint8 numpy arrays of shape H x W x 3.
    # Concatenate them along the depth axis to create an H x W x 6 input.
    inputs_stacked = np.concatenate([image, background], axis=-1)
    # Serialize the raw bytes (tobytes() replaces the deprecated tostring()).
    encoded_inputs = inputs_stacked.tobytes()

Next, when creating the tf.train.Example object, store the new input as a bytes_feature. You also need to store the number of channels alongside it.

    tf_example = tf.train.Example(features=tf.train.Features(feature={
    ...
        # Number of channels of the stacked input (6 in the example above).
        'image/channels': dataset_util.int64_feature(inputs_stacked.shape[2]),
        'image/encoded': dataset_util.bytes_feature(encoded_inputs),
    ...
      }))
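
Because the raw bytes carry no shape or dtype information, it may help to validate the inputs before stacking them. The sketch below is one possible way to do that. `stack_inputs` is a hypothetical helper, not part of the API, and it assumes every input is a uint8 array of shape H x W x C. The synthetic arrays stand in for the images read with cv2.

```python
import numpy as np


def stack_inputs(arrays):
    """Stack H x W x C uint8 arrays along depth; return raw bytes plus shape.

    Hypothetical helper for illustration; not part of the Object Detection API.
    """
    height, width = arrays[0].shape[:2]
    for arr in arrays:
        if arr.dtype != np.uint8:
            raise ValueError('all inputs must be uint8, got %s' % arr.dtype)
        if arr.ndim != 3 or arr.shape[:2] != (height, width):
            raise ValueError('all inputs must be H x W x C with the same H and W')
    stacked = np.concatenate(arrays, axis=-1)
    # tobytes() writes the values in C order; shape and dtype are NOT stored.
    return stacked.tobytes(), height, width, stacked.shape[2]


# Synthetic stand-ins for the image and background arrays.
image = np.zeros((4, 5, 3), dtype=np.uint8)
background = np.ones((4, 5, 3), dtype=np.uint8)
encoded_inputs, height, width, channels = stack_inputs([image, background])
```

The returned height, width and channels are the values that would go into the 'image/height', 'image/width' and 'image/channels' features.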

1.1.2. Changing the data decoder:

You must change the file object_detection/data_decoders/tf_example_decoder.py so that it can read your new input:

First, in the TFExampleDecoder class, add this function to read your input:

    def _read_image(self, keys_to_tensors):
      image_encoded = keys_to_tensors['image/encoded']
      height = keys_to_tensors['image/height']
      width = keys_to_tensors['image/width']
      channels = keys_to_tensors['image/channels']
      to_shape = tf.cast(tf.stack([height, width, channels]), tf.int32)
      image = tf.reshape(tf.decode_raw(image_encoded, tf.uint8), to_shape)
      return image

This function uses the channel count you stored earlier to reshape the decoded input into a 3-D array.
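
As a sanity check outside the graph, the same decode can be mimicked in NumPy. This is only an analogue of `tf.decode_raw` followed by `tf.reshape`, under the assumption that the data was written as uint8. The array sizes are made up for the example.

```python
import numpy as np

# Made-up 2 x 3 x 6 input standing in for the stacked image and background.
original = np.arange(2 * 3 * 6, dtype=np.uint8).reshape(2, 3, 6)
encoded = original.tobytes()

# NumPy counterpart of tf.decode_raw(image_encoded, tf.uint8): a flat uint8 vector.
flat = np.frombuffer(encoded, dtype=np.uint8)

# NumPy counterpart of tf.reshape(..., [height, width, channels]).
height, width, channels = 2, 3, 6
decoded = flat.reshape(height, width, channels)

# True when the decoded array matches what was written.
round_trip_ok = np.array_equal(decoded, original)
```

If the dtype used for decoding differs from the one used for writing (for example float32 data decoded as uint8), the element count no longer equals height * width * channels and the reshape fails.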

Now, in the init function of the same class, add the following line to the self.keys_to_features dictionary:

(https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/data_decoders/tf_example_decoder.py#L34)

    'image/channels': tf.FixedLenFeature((), tf.int64, 1),

Also, change the items_to_handlers entry for the encoded image at this line

https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/data_decoders/tf_example_decoder.py#L56

to this:

        fields.InputDataFields.image:
            slim_example_decoder.ItemHandlerCallback(
              keys=['image/encoded', 'image/height', 'image/width', 'image/channels'],
              func=self._read_image
            )

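1.2. Changing the architecture code:

You need to change a small part of the architecture code so that the API knows the input from the tfrecord file has more than 3 channels:

1.2.1. Changing the protobuf file:

In the file object_detection/protos/faster_rcnn.proto, add this line at the end of the message FasterRcnn: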

    optional uint32 num_input_channels = 28 [default=3];

Now compile your protobuf files again:

    protoc object_detection/protos/*.proto --python_out=.

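1.2.2. Changing the signature of the meta-architecture's feature extractor:

You need to open the file in object_detection/meta_architectures that contains your model's meta-architecture and change the signature of the FeatureExtractor class by adding num_input_channels, so that your feature extractor knows how many channels the input has.

For example, if you use FasterRCNNResnet101, open the file object_detection/meta_architectures/faster_rcnn_meta_arch.py

Next, find the __init__ function of your feature extractor and add num_input_channels to its signature. Also store the channel count as an attribute of the object. For FasterRCNN, look at line 88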

(https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/meta_architectures/faster_rcnn_meta_arch.py#L88)

and change the init function to:

      def __init__(self,
                   is_training,
                   first_stage_features_stride,
                   num_input_channels=3,
                   reuse_weights=None,
                   weight_decay=0.0):
        """Constructor.

        Args:
          is_training: A boolean indicating whether the training version of the
            computation graph should be constructed.
          first_stage_features_stride: Output stride of extracted RPN feature map.
          num_input_channels: Number of channels of the input tensor (default: 3).
          reuse_weights: Whether to reuse variables. Default is None.
          weight_decay: float weight decay for feature extractor (default: 0.0).
        """
        self._is_training = is_training
        self._first_stage_features_stride = first_stage_features_stride
        self._num_input_channels = num_input_channels
        self._reuse_weights = reuse_weights
        self._weight_decay = weight_decay

1.2.3. Changing the feature extractor code:

You need to change the feature extractor code in the directory object_detection/models/. For FasterRCNNResnet101, open the file faster_rcnn_resnet_v1_feature_extractor.py

Look for your feature extractor's class. For example, I am using ResnetV1, so at line 40

(https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py#L40)

I look for the class FasterRCNNResnetV1FeatureExtractor.
Again, add num_input_channels to the signature of the init function.
Next, add num_input_channels to the super call:

      def __init__(self,
                   architecture,
                   resnet_model,
                   is_training,
                   first_stage_features_stride,
                   num_input_channels=3,
                   reuse_weights=None,
                   weight_decay=0.0):
        """Constructor.

        Args:
          architecture: Architecture name of the Resnet V1 model.
          resnet_model: Definition of the Resnet V1 model.
          is_training: See base class.
          first_stage_features_stride: See base class.
          num_input_channels: See base class.
          reuse_weights: See base class.
          weight_decay: See base class.

        Raises:
          ValueError: If `first_stage_features_stride` is not 8 or 16.
        """
        if first_stage_features_stride != 8 and first_stage_features_stride != 16:
          raise ValueError('`first_stage_features_stride` must be 8 or 16.')
        self._architecture = architecture
        self._resnet_model = resnet_model
        super(FasterRCNNResnetV1FeatureExtractor, self).__init__(
            is_training, first_stage_features_stride, num_input_channels, reuse_weights, weight_decay)

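You also need to change the preprocess function in the same class (line 67), because this function subtracts the per-channel mean. Basically, you will probably want to add zeros for the new channels or reuse the values of the RGB channels. For ResnetV1, choose the mean values carefully, because they affect performance.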
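
The two options mentioned above (zeros for the new channels, or reusing the RGB values) could be sketched as follows. `extend_channel_means` is a hypothetical helper, and the three RGB means are assumed to be the ones used by the stock ResNet V1 preprocess function. Check them against your copy of the file.

```python
# Assumed RGB means of the stock ResNet V1 preprocess; verify against your source.
RGB_MEANS = [123.68, 116.779, 103.939]


def extend_channel_means(num_input_channels, strategy='zeros'):
    """Return one mean per input channel (hypothetical helper).

    'zeros'  : keep the RGB means and leave the extra channels unshifted.
    'repeat' : cycle through the RGB means for the extra channels.
    """
    if num_input_channels < 3:
        raise ValueError('expected at least 3 input channels')
    extra = num_input_channels - 3
    if strategy == 'zeros':
        return RGB_MEANS + [0.0] * extra
    if strategy == 'repeat':
        return [RGB_MEANS[i % 3] for i in range(num_input_channels)]
    raise ValueError('unknown strategy: %s' % strategy)


means_zeros = extend_channel_means(6, 'zeros')
means_repeat = extend_channel_means(6, 'repeat')
```

Inside preprocess, a list like this would take the place of the three-element mean list, so that the subtraction broadcasts over the last axis of the input. The 'repeat' strategy only makes sense when the extra channels are themselves RGB images, such as the background in the earlier example.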

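There is one more class to change in the same file. It is a subclass of the class you just changed above, and its name depends on the model you chose. For FasterRCNNResnet101, the class name is FasterRCNNResnet101FeatureExtractor. As above, add num_input_channels to the signature and to the super call: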

      def __init__(self,
                   is_training,
                   first_stage_features_stride,
                   num_input_channels=3,
                   reuse_weights=None,
                   weight_decay=0.0):
        """Constructor.

        Args:
          is_training: See base class.
          first_stage_features_stride: See base class.
          num_input_channels: See base class.
          reuse_weights: See base class.
          weight_decay: See base class.

        Raises:
          ValueError: If `first_stage_features_stride` is not 8 or 16,
            or if `architecture` is not supported.
        """
        super(FasterRCNNResnet101FeatureExtractor, self).__init__(
            'resnet_v1_101', resnet_v1.resnet_v1_101, is_training,
            first_stage_features_stride, num_input_channels, reuse_weights, weight_decay)

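1.2.4. Changing model_builder.py:

Change the file object_detection/builders/model_builder.py. How you change this file depends on which architecture you use. If you use SSD, you need to change different functions, but the approach is the same. At this line, change the signature of _build_faster_rcnn_feature_extractor (essentially just adding num_input_channels=3):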
https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/builders/model_builder.py#L164

    def _build_faster_rcnn_feature_extractor(feature_extractor_config, is_training, num_input_channels=3, reuse_weights=None):

Next, change the return line of the same function:
https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/builders/model_builder.py#L189

    return feature_extractor_class(is_training, first_stage_features_stride, num_input_channels, reuse_weights)

Next, find the function _build_faster_rcnn_model and add this line to its body:

    num_input_channels = frcnn_config.num_input_channels

In the same function, change the feature extractor build call at line 213
https://github.com/tensorflow/models/blob/2243d30cd9be6a57f8f23d398d702a3ff7aacf11/object_detection/builders/model_builder.py#L213
to this:

    feature_extractor = _build_faster_rcnn_feature_extractor(
        frcnn_config.feature_extractor, is_training, num_input_channels)
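
The way the new argument travels from the builder down to the base class can be sketched with toy stand-ins. None of the classes or functions below are the real API. They only mirror the argument order used in the snippets above, and the stride value is a placeholder for what the real builder reads from the config.

```python
class BaseExtractor(object):
    """Stand-in for the meta-architecture's feature extractor base class."""

    def __init__(self, is_training, first_stage_features_stride,
                 num_input_channels=3, reuse_weights=None, weight_decay=0.0):
        self._is_training = is_training
        self._first_stage_features_stride = first_stage_features_stride
        self._num_input_channels = num_input_channels
        self._reuse_weights = reuse_weights
        self._weight_decay = weight_decay


class Resnet101Extractor(BaseExtractor):
    """Stand-in for a concrete extractor that forwards the new argument."""

    def __init__(self, is_training, first_stage_features_stride,
                 num_input_channels=3, reuse_weights=None, weight_decay=0.0):
        super(Resnet101Extractor, self).__init__(
            is_training, first_stage_features_stride, num_input_channels,
            reuse_weights, weight_decay)


def build_extractor(is_training, num_input_channels=3, reuse_weights=None):
    """Stand-in for _build_faster_rcnn_feature_extractor."""
    first_stage_features_stride = 16  # placeholder; the real builder reads the config
    # Positional order matters: num_input_channels now precedes reuse_weights.
    return Resnet101Extractor(is_training, first_stage_features_stride,
                              num_input_channels, reuse_weights)


extractor = build_extractor(is_training=True, num_input_channels=6)
```

Because num_input_channels is inserted before reuse_weights, any call site that still passes reuse_weights as the third positional argument would silently end up setting the channel count instead. Passing the later arguments by keyword avoids that.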

1.3. Configuration file:

In the configuration file, make sure to add num_input_channels under the faster_rcnn message.
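
A fragment of such a configuration might look like the following. The value 6 is only illustrative (it matches the H x W x 6 input from section 1.1.1), and all other fields of your existing config stay as they are.

```
model {
  faster_rcnn {
    # Illustrative value; use the channel count of your own input.
    num_input_channels: 6
    feature_extractor {
      type: 'faster_rcnn_resnet101'
    }
    # ... remaining faster_rcnn fields unchanged ...
  }
}
```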

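1.4. Exporting the inference graph:

In exporter.py, find the function _image_tensor_input_placeholder(). Change the shape of the returned placeholder to (None, None, None, num_input_channels). For tf_example input, you do not need to specify this dimension.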

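1.5. Transfer learning

Because the shape of the first convolution's weights in the feature extractor has changed, the checkpoints provided in the original repository cannot be used for transfer learning as they are. You must modify them so that the shape of the first convolution weights matches your model.

This is fairly easy. Load all the weights of the checkpoint into memory as numpy arrays, find the first convolution weights, change them, create a tf.Variable for every weight using the new shapes, and save them as a new checkpoint. A good reference among the TensorFlow Python tools is: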
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py
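
The "change them" step for the first convolution could look like the NumPy sketch below. It assumes the usual TensorFlow kernel layout [height, width, in_channels, out_channels], keeps the pretrained RGB slices and zero-initializes the new channels. That is one possible choice, not the only one. `expand_first_conv` is a hypothetical helper, and the random array merely stands in for a kernel loaded from a checkpoint.

```python
import numpy as np


def expand_first_conv(kernel, num_input_channels):
    """Expand a [kh, kw, in_ch, out_ch] kernel to num_input_channels inputs.

    Hypothetical helper: pretrained slices are kept, new channels start at zero.
    """
    kh, kw, in_ch, out_ch = kernel.shape
    if num_input_channels < in_ch:
        raise ValueError('cannot shrink the input depth')
    expanded = np.zeros((kh, kw, num_input_channels, out_ch), dtype=kernel.dtype)
    # Copy the pretrained weights into the first in_ch input slices.
    expanded[:, :, :in_ch, :] = kernel
    return expanded


# Random stand-in for a pretrained 7 x 7 x 3 x 64 first-layer kernel.
pretrained = np.random.RandomState(0).randn(7, 7, 3, 64).astype(np.float32)
expanded = expand_first_conv(pretrained, 6)
```

With zero initialization the modified network initially produces the same first-layer response as the pretrained one on the RGB channels and ignores the extra channels until training updates their weights.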

1.6. References

https://github.com/minhnhat93/tf_object_detection_multi_channels