How to implement early stopping in TensorFlow


【Title】How to implement early stopping in TensorFlow 【Posted】2018-03-07 19:18:08 【Question】:
def train():
# Model
model = Model()

# Loss, Optimizer
global_step = tf.Variable(1, dtype=tf.int32, trainable=False, name='global_step')
loss_fn = model.loss()
optimizer = tf.train.AdamOptimizer(learning_rate=TrainConfig.LR).minimize(loss_fn, global_step=global_step)

# Summaries
summary_op = summaries(model, loss_fn)

with tf.Session(config=TrainConfig.session_conf) as sess:

    # Initialized, Load state
    sess.run(tf.global_variables_initializer())
    model.load_state(sess, TrainConfig.CKPT_PATH)

    writer = tf.summary.FileWriter(TrainConfig.GRAPH_PATH, sess.graph)

    # Input source
    data = Data(TrainConfig.DATA_PATH)

    loss = Diff()
    for step in xrange(global_step.eval(), TrainConfig.FINAL_STEP):

            mixed_wav, src1_wav, src2_wav, _ = data.next_wavs(TrainConfig.SECONDS, TrainConfig.NUM_WAVFILE, step)

            mixed_spec = to_spectrogram(mixed_wav)
            mixed_mag = get_magnitude(mixed_spec)

            src1_spec, src2_spec = to_spectrogram(src1_wav), to_spectrogram(src2_wav)
            src1_mag, src2_mag = get_magnitude(src1_spec), get_magnitude(src2_spec)

            src1_batch, _ = model.spec_to_batch(src1_mag)
            src2_batch, _ = model.spec_to_batch(src2_mag)
            mixed_batch, _ = model.spec_to_batch(mixed_mag)

            # Initialize our callback.
            #early_stopping_cb = EarlyStoppingCallback(val_acc_thresh=0.5)


            l, _, summary = sess.run([loss_fn, optimizer, summary_op],
                                     feed_dict={model.x_mixed: mixed_batch, model.y_src1: src1_batch,
                                                model.y_src2: src2_batch})

            loss.update(l)
            print('step-{}\td_loss={:2.2f}\tloss={}'.format(step, loss.diff * 100, loss.value))

            writer.add_summary(summary, global_step=step)

            # Save state
            if step % TrainConfig.CKPT_STEP == 0:
                tf.train.Saver().save(sess, TrainConfig.CKPT_PATH + '/checkpoint', global_step=step)

    writer.close()

I have this neural-network code that separates the music from the voice in .wav files. How can I introduce an early stopping algorithm to stop the training part? I have seen some projects that talk about a ValidationMonitor. Can someone help me?

【Comments】:

From the latest TensorFlow documentation (beta), early stopping can be implemented with a custom callback: tensorflow.org/beta/guide/keras/… The link points directly to an example callback class, EarlyStoppingAtMinLoss. An instance of that class can be passed to the model as a callback and used to stop training early once the loss stops decreasing. The example shows both the implementation of the class and how to use it during training. The docs also mention a further callback: "tf.keras.callbacks.EarlyStopping provides a more complete and general implementation". Here is that callback: tensorflow.org/versions/r2.0/api_docs/python/tf/keras/callbacks/…
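As a concrete illustration of that approach, here is a minimal sketch of such a custom callback for tf.keras, assuming a compiled Keras model; the class name mirrors the EarlyStoppingAtMinLoss example from the docs and the patience value is arbitrary:

import numpy as np
import tensorflow as tf

class EarlyStoppingAtMinLoss(tf.keras.callbacks.Callback):
    """Stop training when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        super().__init__()
        self.patience = patience

    def on_train_begin(self, logs=None):
        self.wait = 0
        self.best = np.inf

    def on_epoch_end(self, epoch, logs=None):
        current = logs.get('loss')
        if current < self.best:
            self.best = current
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Tell Keras to stop training at the end of this epoch.
                self.model.stop_training = True
                print('Early stopping at epoch {}'.format(epoch + 1))

# Usage (assuming `model`, `x_train` and `y_train` already exist):
# model.fit(x_train, y_train, epochs=100,
#           callbacks=[EarlyStoppingAtMinLoss(patience=3)])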

【Answer 1】:

Here is my implementation of early stopping; you can adapt it:

Early stopping can be applied at certain stages of the training process, for example at the end of each epoch. Specifically, in my case, I monitor the test (validation) loss at each epoch, and once the test loss has not improved for 20 epochs (self.require_improvement = 20), training is interrupted.

You can set the maximum number of epochs to 10000, 20000, or whatever you want (self.max_epochs = 10000).

  self.require_improvement= 20
  self.max_epochs = 10000

Here is my training function, which uses early stopping:

def train(self):

    # training data
    train_input = self.Normalize(self.x_train)
    train_output = self.y_train.copy()
    # ===============
    save_sess = self.sess  # used to compare the result of the previous sess with the current one
    # ===============
    # costs history:
    costs = []
    costs_inter = []
    # =================
    # for early stopping:
    best_cost = 1000000
    stop = False
    last_improvement = 0
    # ================
    n_samples = train_input.shape[0] # size of the training set
# ===============
   #train the mini_batches model using the early stopping criteria
    epoch = 0
    while epoch < self.max_epochs and stop == False:
        #train the model on the training set by mini batches
        #shuffle, then split the training set into mini-batches of size self.batch_size
        seq =list(range(n_samples))
        random.shuffle(seq)
        mini_batches = [
            seq[k:k+self.batch_size]
            for k in range(0,n_samples, self.batch_size)
        ]

        avg_cost = 0. # The average cost of mini_batches
        step= 0

        for sample in mini_batches:

            batch_x = train_input.iloc[sample, :]
            batch_y = train_output.iloc[sample, :]
            batch_y = np.array(batch_y).flatten()

            feed_dict = {self.X: batch_x, self.Y: batch_y, self.is_train: True}

            _, cost, acc = self.sess.run([self.train_step, self.loss_, self.accuracy_], feed_dict=feed_dict)
            avg_cost += cost * len(sample) / n_samples
            print('epoch[{}] step [{}] train -- loss : {}, accuracy : {}'.format(epoch, step, avg_cost, acc))
            step += 100

        #cost history since the last best cost
        costs_inter.append(avg_cost)

        #early stopping based on the validation set/ max_steps_without_decrease of the loss value : require_improvement
        if avg_cost < best_cost:
            save_sess= self.sess # save session
            best_cost = avg_cost
            costs += costs_inter # costs history of the validation set
            last_improvement = 0
            costs_inter= []
        else:
            last_improvement +=1
        if last_improvement > self.require_improvement:
            print("No improvement found during the ( self.require_improvement) last iterations, stopping optimization.")
            # Break out from the loop.
            stop = True
            self.sess=save_sess # restore session with the best cost

        ## Run validation after every epoch : 
        print('---------------------------------------------------------')
        self.y_validation = np.array(self.y_validation).flatten()
        loss_valid, acc_valid = self.sess.run([self.loss_, self.accuracy_],
                                              feed_dict={self.X: self.x_validation, self.Y: self.y_validation, self.is_train: True})
        print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".format(epoch + 1, loss_valid, acc_valid))
        print('---------------------------------------------------------')

        epoch +=1

The important part of the code can be summarized as follows:

def train(self):
  ...
      #costs history :
        costs = []
        costs_inter=[]
      #for early stopping :
        best_cost=1000000 
        stop = False
        last_improvement=0
       #train the mini_batches model using the early stopping criteria
        epoch = 0
        while epoch < self.max_epochs and stop == False:
            ...
            for sample in mini_batches:
            ...                   
            #cost history since the last best cost
            costs_inter.append(avg_cost)

            #early stopping based on the validation set/ max_steps_without_decrease of the loss value : require_improvement
            if avg_cost < best_cost:
                save_sess= self.sess # save session
                best_cost = avg_cost
                costs += costs_inter # costs history of the validation set
                last_improvement = 0
                costs_inter= []
            else:
                last_improvement +=1
            if last_improvement > self.require_improvement:
                print("No improvement found during the ( self.require_improvement) last iterations, stopping optimization.")
                # Break out from the loop.
                stop = True
                self.sess=save_sess # restore session with the best cost
            ...
            epoch +=1

Hope it helps someone :).

【Comments】:

Sorry for the silly question, but how do you call the train function? How do you change the self argument when calling it?

【Answer 2】:

ValidationMonitor is marked as deprecated and is not recommended, but you can still use it. Here is an example of how to create one:

    validation_monitor = monitors.ValidationMonitor(
        input_fn=functools.partial(input_fn, subset="evaluation"),
        eval_steps=128,
        every_n_steps=88,
        early_stopping_metric="accuracy",
        early_stopping_rounds = 1000
    )

You can also implement it yourself; here is my implementation:

          if (loss_value < self.best_loss):
            self.stopping_step = 0
            self.best_loss = loss_value
          else:
            self.stopping_step += 1
          if self.stopping_step >= FLAGS.early_stopping_step:
            self.should_stop = True
            print("Early stopping is trigger at step:  loss:".format(global_step,loss_value))
            run_context.request_stop()
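The snippet above is just the core comparison; it assumes it runs inside a tf.train.SessionRunHook, where run_context.request_stop() is available. A self-contained sketch of such a hook might look like the following; the hook class, the patience parameter, and the usage lines are assumptions for illustration, not part of the original answer:

import tensorflow as tf

class EarlyStoppingHook(tf.train.SessionRunHook):
    """Request a stop once the monitored loss has not improved for `patience` runs."""

    def __init__(self, loss_tensor, patience=1000):
        self.loss_tensor = loss_tensor
        self.patience = patience
        self.best_loss = float('inf')
        self.stopping_step = 0

    def before_run(self, run_context):
        # Fetch the loss value on every session run.
        return tf.train.SessionRunArgs(self.loss_tensor)

    def after_run(self, run_context, run_values):
        loss_value = run_values.results
        if loss_value < self.best_loss:
            self.best_loss = loss_value
            self.stopping_step = 0
        else:
            self.stopping_step += 1
        if self.stopping_step >= self.patience:
            print("Early stopping triggered, loss: {}".format(loss_value))
            run_context.request_stop()

# Usage sketch with a MonitoredTrainingSession:
# with tf.train.MonitoredTrainingSession(hooks=[EarlyStoppingHook(loss_fn)]) as sess:
#     while not sess.should_stop():
#         sess.run(train_op)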

【Comments】:

【Answer 3】:

As of TensorFlow r1.10, early stopping hooks are available for the Estimator API in early_stopping.py (see github).

For example tf.contrib.estimator.stop_if_no_decrease_hook (see docs).
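A rough usage sketch for that hook is shown below; model_fn and train_input_fn are placeholders for your own Estimator pieces, and the thresholds are arbitrary:

import tensorflow as tf

# model_fn and train_input_fn are assumed to be defined elsewhere.
estimator = tf.estimator.Estimator(model_fn=model_fn)

# Stop training if the 'loss' metric has not decreased for 1000 steps.
early_stopping = tf.contrib.estimator.stop_if_no_decrease_hook(
    estimator,
    metric_name='loss',
    max_steps_without_decrease=1000,
    min_steps=100)

estimator.train(input_fn=train_input_fn, hooks=[early_stopping])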

【Comments】:

【Answer 4】:

For a custom training loop with tf.keras, you can implement it like this:

def main(early_stopping, epochs=50):
    loss_history = deque(maxlen=early_stopping + 1)

    for epoch in range(epochs):
        fit(epoch)

        loss_history.append(test_loss.result().numpy())

        if len(loss_history) > early_stopping:
            if loss_history.popleft() < min(loss_history):
                print(f'\nEarly stopping. No validation loss '
                      f'improvement in {early_stopping} epochs.')
                break

At the end of every epoch, the validation loss is computed and appended to a collections.deque. Suppose early_stopping is set to 3. Every epoch, the loss from 4 epochs ago (the entry popped from the left of the deque) is compared with the 3 most recent losses. If none of those 3 losses improved on it, the loop breaks.

Here is the full code:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow_datasets as tfds
import tensorflow as tf
from collections import deque

data, info = tfds.load('iris', split='train',
                       as_supervised=True,
                       shuffle_files=True,
                       with_info=True)

dataset = data.shuffle(info.splits['train'].num_examples)

train_dataset = dataset.take(120).batch(4)
test_dataset = dataset.skip(120).take(30).batch(4)


model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(info.features['label'].num_classes, activation='softmax')
])


loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

train_loss = tf.keras.metrics.Mean()
test_loss = tf.keras.metrics.Mean()

train_acc = tf.keras.metrics.SparseCategoricalAccuracy()
test_acc = tf.keras.metrics.SparseCategoricalAccuracy()


opt = tf.keras.optimizers.Adam(learning_rate=1e-3)


@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        logits = model(inputs)
        loss = loss_object(labels, logits)

    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_acc(labels, logits)


@tf.function
def test_step(inputs, labels):
    logits = model(inputs)
    loss = loss_object(labels, logits)
    test_loss(loss)
    test_acc(labels, logits)


def fit(epoch):
    template = 'Epoch {:>2} Train Loss {:.4f} Test Loss {:.4f} ' \
               'Train Acc {:.2%} Test Acc {:.2%}'

    train_loss.reset_states()
    test_loss.reset_states()
    train_acc.reset_states()
    test_acc.reset_states()

    for X_train, y_train in train_dataset:
        train_step(X_train, y_train)

    for X_test, y_test in test_dataset:
        test_step(X_test, y_test)

    print(template.format(
        epoch + 1,
        train_loss.result(),
        test_loss.result(),
        train_acc.result(),
        test_acc.result()
    ))


def main(early_stopping, epochs=50):
    loss_history = deque(maxlen=early_stopping + 1)

    for epoch in range(epochs):
        fit(epoch)

        loss_history.append(test_loss.result().numpy())

        if len(loss_history) > early_stopping:
            if loss_history.popleft() < min(loss_history):
                print(f'\nEarly stopping. No validation loss '
                      f'improvement in {early_stopping} epochs.')
                break

if __name__ == '__main__':
    main(epochs=100, early_stopping=3)

Here is the output:

Epoch  1 Train Loss 1.0368 Test Loss 0.9507 Train Acc 66.67% Test Acc 76.67%
Epoch  2 Train Loss 1.0013 Test Loss 0.9673 Train Acc 65.83% Test Acc 70.00%
Epoch  3 Train Loss 0.9582 Test Loss 1.0055 Train Acc 64.17% Test Acc 56.67%
Epoch  4 Train Loss 0.9116 Test Loss 0.8510 Train Acc 63.33% Test Acc 70.00%
Epoch  5 Train Loss 0.8401 Test Loss 0.8632 Train Acc 67.50% Test Acc 76.67%
Epoch  6 Train Loss 0.8114 Test Loss 0.7535 Train Acc 72.50% Test Acc 80.00%
Epoch  7 Train Loss 0.8105 Test Loss 0.8240 Train Acc 68.33% Test Acc 80.00%
Epoch  8 Train Loss 0.7956 Test Loss 0.7855 Train Acc 81.67% Test Acc 93.33%
Epoch  9 Train Loss 0.7740 Test Loss 0.8094 Train Acc 89.17% Test Acc 73.33%

Early stopping. No validation loss improvement in 3 epochs.

As you can see, the last best validation loss was at epoch 6, followed by 3 epochs without any improvement, so the loop breaks.

【Comments】:
