1. High-level training interface (Training Loop)

tensorflow/contrib/slim/python/slim/learning.py

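TF-Slim provides a simple but powerful set of training tools in learning.py. These include a Train function that repeatedly measures the loss, computes gradients, and saves the model to disk, as well as several convenience functions for manipulating gradients.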

1.1. slim.learning.train

slim.learning.train(
    train_op,
    logdir,
    number_of_steps=1000,
    save_summaries_secs=300,
    save_interval_secs=600)
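• slim.learning.train is given the train_op, which is used to compute the loss and apply the gradient step.
• logdir specifies the directory in which checkpoints and event files are stored.
• number_of_steps limits the number of gradient steps to any value we choose. In this case, we ask for 1000 steps.
• save_summaries_secs=300 means that summaries are computed every 5 minutes.
• save_interval_secs=600 means that a model checkpoint is saved every 10 minutes.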
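
To make the interplay between the step limit and the two time-based intervals concrete, here is a minimal plain-Python sketch. It is not how slim.learning.train is implemented; it assumes a simulated clock in which every step takes a fixed number of seconds, and the function name simulate_training and the step_secs parameter are hypothetical.

```python
def simulate_training(number_of_steps, step_secs,
                      save_summaries_secs, save_interval_secs):
    """Counts how often summaries and checkpoints would be written.

    Hypothetical model: each step advances a simulated clock by `step_secs`,
    and an action fires once its interval has elapsed since it last fired.
    """
    clock = 0
    last_summary = 0
    last_checkpoint = 0
    summaries = 0
    checkpoints = 0
    for _ in range(number_of_steps):
        clock += step_secs  # one gradient step
        if clock - last_summary >= save_summaries_secs:
            summaries += 1
            last_summary = clock
        if clock - last_checkpoint >= save_interval_secs:
            checkpoints += 1
            last_checkpoint = clock
    return summaries, checkpoints


# 1000 steps at an assumed 3 seconds per step is 3000 seconds of training.
summaries, checkpoints = simulate_training(
    number_of_steps=1000, step_secs=3,
    save_summaries_secs=300, save_interval_secs=600)
print(summaries, checkpoints)
```

The point of the sketch is that number_of_steps bounds the length of the run, while the two *_secs arguments depend on wall-clock time, so the number of summaries and checkpoints written varies with how fast each step runs.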

1.2. A simple working training script

import tensorflow as tf

slim = tf.contrib.slim  # TF-Slim ships under tf.contrib in TensorFlow 1.x

# Load data and create the model:
images, labels = LoadData()
predictions = MyModel(images)

# Define the loss:
slim.losses.log_loss(predictions, labels)
total_loss = slim.losses.get_total_loss()

# Define the optimizer:
optimizer = tf.train.MomentumOptimizer(FLAGS.learning_rate, FLAGS.momentum)

# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

# Run training.
slim.learning.train(train_op, my_log_dir)

1.3. Creating the train_op

# Create the train_op and clip the gradient norms:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    clip_gradient_norm=4)

# Create the train_op and scale the gradients by providing a map from variable
# name (or variable) to a scaling coefficient:
gradient_multipliers = {
    'conv0/weights': 1.2,
    'fc8/weights': 3.4,
}
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    gradient_multipliers=gradient_multipliers)
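
The two gradient manipulations named in the comments above, per-variable scaling and norm clipping, can be illustrated with a small plain-Python sketch. It is not the TF-Slim implementation: gradients are modeled as lists of floats keyed by variable name, the helper names scale_gradients and clip_by_norm are hypothetical, and applying scaling before clipping is an assumption made for the example.

```python
import math


def scale_gradients(grads, multipliers):
    """Multiplies each named gradient by its coefficient (1.0 if unlisted)."""
    return {name: [g * multipliers.get(name, 1.0) for g in grad]
            for name, grad in grads.items()}


def clip_by_norm(grads, max_norm):
    """Rescales every gradient whose L2 norm exceeds max_norm."""
    clipped = {}
    for name, grad in grads.items():
        norm = math.sqrt(sum(g * g for g in grad))
        scale = max_norm / norm if norm > max_norm else 1.0
        clipped[name] = [g * scale for g in grad]
    return clipped


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


# Toy gradients for the two variables named in the text.
grads = {'conv0/weights': [3.0, 4.0], 'fc8/weights': [0.3, 0.4]}
scaled = scale_gradients(grads, {'conv0/weights': 1.2, 'fc8/weights': 3.4})
clipped = clip_by_norm(scaled, max_norm=4.0)

for name in sorted(clipped):
    print(name, round(l2_norm(scaled[name]), 3), round(l2_norm(clipped[name]), 3))
```

In this toy case the conv0/weights gradient exceeds the limit after scaling and is shrunk back to the maximum norm, while the fc8/weights gradient stays below the limit and passes through unchanged.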

1.4. Performing additional (non-gradient) updates during training

Many networks utilize modules, like BatchNorm, that require performing a series
of non-gradient updates during training. slim.learning.create_train_op allows
a user to pass in a list of update_ops to call along with the gradient updates.

train_op = slim.learning.create_train_op(total_loss, optimizer, update_ops)

By default, slim.learning.create_train_op includes all update ops that are
part of the tf.GraphKeys.UPDATE_OPS collection. Additionally, TF-Slim’s
slim.batch_norm function adds the moving mean and moving variance updates to
this collection. Consequently, users who want to use slim.batch_norm will not
need to take any additional steps in order to have the moving mean and moving
variance updates be computed.
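
As an illustration of what a non-gradient update looks like, the sketch below keeps a moving mean in plain Python and runs it alongside a gradient step. It is only a conceptual stand-in: the update_ops list plays the role of the tf.GraphKeys.UPDATE_OPS collection, and the names state, train_step and the decay value 0.9 are assumptions chosen for the example.

```python
def moving_average_update(moving, batch_value, decay=0.9):
    """Exponential moving average, the kind of update BatchNorm maintains."""
    return decay * moving + (1.0 - decay) * batch_value


# Stand-in for the tf.GraphKeys.UPDATE_OPS collection: a list of callables.
update_ops = []
state = {'moving_mean': 0.0, 'weight': 1.0}


def update_moving_mean(batch_mean):
    state['moving_mean'] = moving_average_update(state['moving_mean'], batch_mean)


update_ops.append(update_moving_mean)


def train_step(batch_mean, gradient, learning_rate=0.1):
    """Runs every registered update op, then applies the gradient step."""
    for op in update_ops:
        op(batch_mean)  # not driven by any gradient
    state['weight'] -= learning_rate * gradient
    return state['weight']


train_step(batch_mean=2.0, gradient=0.5)
print(state)
```

The moving mean changes even though no gradient flows into it, which is why such updates have to be attached to the training step explicitly.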

However, users with additional, specialized updates can either override the
default update ops or simply add additional update ops to the
tf.GraphKeys.UPDATE_OPS collection:

# Force TF-Slim NOT to use ANY update_ops:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    update_ops=[])

# Use an alternative set of update ops:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    update_ops=my_other_update_ops)

# Use an alternative set of update ops in addition to the default updates:
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, my_other_update_ops0)
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, my_other_update_ops1)

train_op = slim.learning.create_train_op(
    total_loss,
    optimizer)

# Which is the same as:
train_op = slim.learning.create_train_op(
    total_loss,
    optimizer,
    update_ops=tf.get_collection(tf.GraphKeys.UPDATE_OPS))

1.5. Initializing a model from a checkpoint

# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

# Create the initial assignment op
checkpoint_path = '/path/to/old_model_checkpoint'
variables_to_restore = slim.get_model_variables()
init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
    checkpoint_path, variables_to_restore)

# Create an initial assignment function.
def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)

1.6. Initializing a model from a checkpoint whose variable names don’t match

At times, a user may want to initialize a new model with values from a
checkpoint whose variable names do not match those of the current model. In this
case, one needs to create a mapping from the checkpoint variable names to the
current model variables. This requires only a small modification of the code
above:

# Creates a model with two variables, var0 and var1
predictions = MyModel(images)

# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

checkpoint_path = '/path/to/old_model_checkpoint'

# Create the mapping:
variables_to_restore = {
    'name_var_0_in_checkpoint': slim.get_unique_variable('var0'),
    'name_var_1_in_checkpoint': slim.get_unique_variable('var1')
}
init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
    checkpoint_path, variables_to_restore)

# Create an initial assignment function.
def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)
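
The name-mapping idea can be shown without TensorFlow. The sketch below is a plain-Python analogy, not the behavior of slim.assign_from_checkpoint: the checkpoint is modeled as a dictionary, Variable and make_init_fn are hypothetical names, and the returned callback takes no session argument, unlike the init_fn used above.

```python
class Variable:
    """Hypothetical stand-in for a model variable: a name plus a value."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value


def make_init_fn(checkpoint_values, names_to_variables):
    """Returns a callback that copies checkpoint values into model variables.

    names_to_variables maps a checkpoint name to the Variable that receives
    the value stored under that name.
    """
    missing = sorted(n for n in names_to_variables if n not in checkpoint_values)
    if missing:
        raise KeyError('not found in checkpoint: %s' % ', '.join(missing))

    def init_fn():
        for checkpoint_name, variable in names_to_variables.items():
            variable.value = checkpoint_values[checkpoint_name]

    return init_fn


# A pretend checkpoint, keyed by the names used when it was saved.
checkpoint_values = {
    'name_var_0_in_checkpoint': [1.0, 2.0],
    'name_var_1_in_checkpoint': [3.0],
}
var0, var1 = Variable('var0'), Variable('var1')

init_fn = make_init_fn(checkpoint_values, {
    'name_var_0_in_checkpoint': var0,
    'name_var_1_in_checkpoint': var1,
})
init_fn()  # called once, before the first training step
print(var0.value, var1.value)
```

Checking for missing names when the mapping is built, rather than when the callback runs, is a design choice of this sketch: it surfaces a misspelled checkpoint name before training starts.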

1.7. Fine-tuning part of a model from a checkpoint

Rather than initializing all of the weights of a given model, we sometimes
only want to restore some of the weights from a checkpoint. To do this, one
need only filter those variables to initialize as follows:

# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

checkpoint_path = '/path/to/old_model_checkpoint'

# Specify the variables to restore via a list of inclusion or exclusion
# patterns:
variables_to_restore = slim.get_variables_to_restore(
    include=["conv"], exclude=["fc8", "fc9"])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["conv"])

init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
    checkpoint_path, variables_to_restore)

# Create an initial assignment function.
def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)
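
The effect of inclusion and exclusion patterns can be sketched in plain Python on a list of variable names. This is a simplified model, not the logic of slim.get_variables_to_restore: it assumes that a pattern selects every name starting with that prefix, and the function name filter_by_scope and the variable names are hypothetical.

```python
def filter_by_scope(names, include=None, exclude=None):
    """Keeps names matching any include prefix and no exclude prefix.

    Assumption of this sketch: a pattern matches a name when the name starts
    with it. With include=None every name is a candidate.
    """
    candidates = [n for n in names
                  if include is None or any(n.startswith(s) for s in include)]
    excluded = exclude or []
    return [n for n in candidates
            if not any(n.startswith(s) for s in excluded)]


# Hypothetical variable names for a small network.
model_variables = ['conv1/weights', 'conv1/biases', 'conv2/weights',
                   'fc8/weights', 'fc9/weights']

print(filter_by_scope(model_variables, include=['conv'], exclude=['fc8', 'fc9']))
print(filter_by_scope(model_variables, exclude=['conv']))
```

The first call keeps only the convolutional variables, and the second keeps everything except them, mirroring the two alternatives shown in the snippet above.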

1.8. Initializing model variables from values in memory

One may want to initialize the weights of a model with values from an arbitrary
source (a text document, MATLAB file, etc.). While this is technically feasible
using plain TensorFlow, it also results in the values of your weights being
stored in the graph. For large models, the graph then becomes prohibitively
large. TF-Slim allows you to perform this initial assignment without having to
store the values of the initial model in the graph itself by using placeholders
and a feed dictionary:

# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)

# Create the mapping from variable names to values:
var_names_to_values = {
    'var0': var0_initial_value,
    'var1': var1_initial_value,
}
init_assign_op, init_feed_dict = slim.assign_from_values(var_names_to_values)

# Create an initial assignment function.
def InitAssignFn(sess):
    sess.run(init_assign_op, init_feed_dict)

# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)

1.9. Example

g = tf.Graph()
with g.as_default():
    # Create the model and specify the losses...
    ...

    total_loss = slim.losses.get_total_loss()
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)

    # create_train_op ensures that each time we ask for the loss, the update_ops
    # are run and the gradients being computed are applied too.
    train_op = slim.learning.create_train_op(total_loss, optimizer)
    logdir = ...  # Where checkpoints are stored.

    slim.learning.train(
        train_op,
        logdir,
        number_of_steps=1000,
        save_summaries_secs=300,
        save_interval_secs=600)

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim