
Chapter 6: Advanced PyTorch Training Techniques (code snippets)

Author: 沧夜2021 · 2023-03-12

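1. Custom Loss Functions

Defining the loss function is an essential step in any deep learning workflow. In PyTorch a loss can be defined in two ways: as a plain function or as a class.

1.1. Defining a loss as a function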

import torch

def my_loss(output, target):
    # Mean squared error over all elements
    loss = torch.mean((output - target)**2)
    return loss
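
As a quick sanity check (a sketch, not part of the original post), the function-style loss can be compared with PyTorch's built-in F.mse_loss, which computes the same mean squared error. The tensor values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def my_loss(output, target):
    # Mean of the squared differences over all elements
    return torch.mean((output - target) ** 2)

# Hypothetical predictions and targets
output = torch.tensor([0.5, 1.0, 2.0])
target = torch.tensor([1.0, 1.0, 0.0])

custom = my_loss(output, target)
builtin = F.mse_loss(output, target)  # default reduction is 'mean'
print(custom.item(), builtin.item())
```

Both calls should agree up to floating-point error, since the squared differences here are 0.25, 0 and 4.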

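1.2. Defining a loss as a class

A loss class must inherit from nn.Module.

1.2.1. DiceLoss

Dice Loss is a loss function commonly used in segmentation. The Dice coefficient is a set-similarity measure, usually used to compute how similar two samples are, with values in [0, 1]:
$Dice = \frac{2|X \cap Y|}{|X| + |Y|}$

Here |X⋂Y| is the size of the intersection of X and Y, and |X| and |Y| are the numbers of elements in X and Y. The factor 2 in the numerator compensates for the denominator counting the elements shared by X and Y twice.

The larger the Dice coefficient, the more similar the two sets and the smaller the loss, and vice versa:
$Loss = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$

Note: in practice |X⋂Y| is computed by multiplying the two tensors element by element and summing the products.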
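
To make the formula concrete, here is a small worked example on two hand-made binary masks (the values are hypothetical and chosen so the arithmetic is easy to follow). It omits the sigmoid and the smoothing term that a loss class would normally add.

```python
import torch

# Hypothetical binary masks: x plays the role of the prediction, y the ground truth
x = torch.tensor([1.0, 1.0, 0.0, 0.0])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# |X ∩ Y|: element-wise product, then sum -> 1
intersection = (x * y).sum()

# Dice = 2 * 1 / (2 + 2) = 0.5
dice = 2 * intersection / (x.sum() + y.sum())
loss = 1 - dice
print(dice.item(), loss.item())
```

The two masks share one of their two positive elements each, so the Dice coefficient and the loss are both 0.5.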

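import torch
import torch.nn as nn

class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        # weight and size_average are unused; kept for interface compatibility
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)   # logits -> probabilities (F.sigmoid is deprecated)
        inputs = inputs.reshape(-1)      # flatten predictions
        targets = targets.reshape(-1)    # flatten labels
        intersection = (inputs * targets).sum()
        # smooth avoids division by zero when both masks are empty
        dice = (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)
        return 1 - dice

# Usage: inputs are raw logits, targets a binary mask of the same shape
criterion = DiceLoss()
loss = criterion(inputs, targets)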

1.2.2. DiceBCELoss

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceBCELoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceBCELoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)            # logits -> probabilities
        inputs = inputs.reshape(-1)
        targets = targets.reshape(-1).float()     # BCE expects float targets
        intersection = (inputs * targets).sum()
        dice_loss = 1 - (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)
        BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
        Dice_BCE = BCE + dice_loss                # sum of the two terms

        return Dice_BCE

1.2.3. IoULoss

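The IoU loss comes from the 2016 ACM paper "UnitBox: An Advanced Object Detection Network" by Megvii (旷视): https://arxiv.org/pdf/1608.01471.pdf. The implementation below applies the idea to flattened segmentation outputs and returns 1 - IoU.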

import torch
import torch.nn as nn

class IoULoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(IoULoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)   # logits -> probabilities
        inputs = inputs.reshape(-1)
        targets = targets.reshape(-1)
        # Intersection: sum of element-wise products
        intersection = (inputs * targets).sum()
        total = (inputs + targets).sum()
        union = total - intersection     # |X| + |Y| - |X ∩ Y|

        IoU = (intersection + smooth)/(union + smooth)

        return 1 - IoU

1.2.4. FocalLoss

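import torch
import torch.nn as nn
import torch.nn.functional as F

ALPHA = 0.8
GAMMA = 2

class FocalLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(FocalLoss, self).__init__()

    def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
        inputs = inputs.reshape(-1)            # raw logits, flattened
        targets = targets.reshape(-1).float()
        # Per-element BCE on logits (applies the sigmoid internally); smooth is unused
        BCE = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        BCE_EXP = torch.exp(-BCE)              # p_t, probability of the true class
        # Modulate each element before averaging so easy examples are down-weighted
        focal_loss = (alpha * (1-BCE_EXP)**gamma * BCE).mean()

        return focal_loss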
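
The point of focal loss is the modulating factor (1 - p_t)^gamma, which shrinks the contribution of examples the model already classifies well. The sketch below uses two made-up logits, one confidently correct and one wrong, to show how different their focal weights are. The numbers are illustrative assumptions, not results from the original post.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for two positive samples:
# the first is confidently correct, the second is misclassified
logits = torch.tensor([4.0, -0.5])
targets = torch.tensor([1.0, 1.0])
gamma = 2.0

# Per-element binary cross-entropy computed directly on logits
bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")

# exp(-BCE) recovers p_t, the probability assigned to the true class
p_t = torch.exp(-bce)

# Modulating factor: close to 0 for easy samples, larger for hard ones
focal_weight = (1 - p_t) ** gamma
focal = focal_weight * bce
print(focal_weight.tolist(), focal.tolist())
```

The easy sample receives a weight several orders of magnitude smaller than the hard one, so the hard sample dominates the averaged loss.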

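When writing a custom loss, it is best to do all the math with PyTorch tensor operations. Autograd then handles differentiation for us, so we do not have to implement the backward pass ourselves, and the loss runs on CUDA without extra work.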
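
A minimal sketch of what this buys us: because the loss is built only from tensor operations, calling backward() yields the gradient automatically, and the same code runs on a GPU when one is available. The tensors below are hypothetical, and the expected gradient is derived by hand for comparison.

```python
import torch

def my_loss(output, target):
    # Built only from tensor ops, so autograd can differentiate it
    return torch.mean((output - target) ** 2)

# Use the GPU if present; the code is otherwise identical
device = "cuda" if torch.cuda.is_available() else "cpu"

output = torch.tensor([0.5, 1.0, 2.0], device=device, requires_grad=True)
target = torch.tensor([1.0, 1.0, 0.0], device=device)

loss = my_loss(output, target)
loss.backward()  # no hand-written backward pass needed

# Analytical gradient of the mean squared error: 2 * (output - target) / N
expected = 2 * (output.detach() - target) / output.numel()
print(output.grad, expected)
```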

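2. Dynamically Adjusting the Learning Rate

Choosing the learning rate is a long-standing problem in deep learning. A learning rate that is too small slows convergence considerably and lengthens training; one that is too large can make the parameters oscillate around the optimum. Even with a well-chosen learning rate, after many epochs the accuracy may start to oscillate or the loss may stop decreasing, which indicates that the current learning rate no longer suits the model. A suitable learning-rate decay strategy can then alleviate the problem and improve accuracy. In PyTorch this mechanism is called a scheduler.

There are two ways to use a scheduler:

2.1. Using an official scheduler

PyTorch already packages a number of methods for dynamically adjusting the learning rate in torch.optim.lr_scheduler.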

# Choose an optimizer
optimizer = torch.optim.Adam(...)
# Choose one or more of the learning-rate schedulers mentioned above
scheduler1 = torch.optim.lr_scheduler....
scheduler2 = torch.optim.lr_scheduler....
...
schedulern = torch.optim.lr_scheduler....
# Train
for epoch in range(100):
    train(...)
    validate(...)
    optimizer.step()
    # Adjust the learning rate only after the optimizer has updated the parameters
    scheduler1.step()
    ...
    schedulern.step()

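When using the official torch.optim.lr_scheduler classes, call scheduler.step() after optimizer.step(). Since PyTorch 1.1.0, calling them in the opposite order makes PyTorch skip the first value of the learning-rate schedule.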
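
As a concrete illustration of this call order, the sketch below uses StepLR, one of the schedulers in torch.optim.lr_scheduler, with a tiny stand-in model and random data. The model, data, and hyperparameters are assumptions made for the example only.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 2 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    # Record the learning rate in effect during this epoch
    lrs.append(optimizer.param_groups[0]["lr"])

    x, y = torch.randn(8, 4), torch.randn(8, 2)  # random dummy batch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()   # update parameters first
    scheduler.step()   # then advance the schedule

print(lrs)
```

The recorded values stay at 0.1 for the first two epochs, drop to 0.05 for the next two, and to 0.025 for the last two.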

2.2. Writing a custom scheduler

Define a function adjust_learning_rate that changes the lr value in each param_group.

def adjust_learning_rate(optimizer, epoch):
    # Decay the initial learning rate (args.lr) by a factor of 10 every 30 epochs
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

# With adjust_learning_rate defined as above, call it once per epoch
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
for epoch in range(10):
    train(...)
    validate(...)
    adjust_learning_rate(optimizer, epoch)
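
The following self-contained variant shows the same idea without relying on a global args object. Passing base_lr as an explicit parameter and returning the new value are changes made for this sketch, and the model is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

def adjust_learning_rate(optimizer, epoch, base_lr):
    # Step decay: multiply the base learning rate by 0.1 every 30 epochs
    lr = base_lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return lr

model = nn.Linear(4, 2)  # hypothetical stand-in model
base_lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

# Probe the schedule at a few epochs instead of running a full training loop
history = {}
for epoch in (0, 29, 30, 60):
    adjust_learning_rate(optimizer, epoch, base_lr)
    history[epoch] = optimizer.param_groups[0]["lr"]

print(history)
```

The learning rate stays at 0.1 through epoch 29, becomes roughly 0.01 at epoch 30, and roughly 0.001 at epoch 60.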

3. Fine-tuning Models with torchvision

3.1 Using existing pretrained models

Networks are instantiated as follows:

import torchvision.models as models
resnet18 = models.resnet18()
# resnet18 = models.resnet18(pretrained=False)  # equivalent to the line above
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet_v2 = models.mobilenet_v2()
mobilenet_v3_large = models.mobilenet_v3_large()
mobilenet_v3_small = models.mobilenet_v3_small()
resnext50_32x4d = models.resnext50_32x4d()
wide_resnet50_2 = models.wide_resnet50_2()
mnasnet = models.mnasnet1_0()

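We can also choose whether to pass the pretrained argument.

It takes True or False and decides whether pretrained weights are used. The default is pretrained = False, meaning no pretrained weights are loaded; with pretrained = True the model is initialized with weights pretrained on some dataset. Note that since torchvision 0.13 the pretrained argument is deprecated in favor of weights, for example models.resnet18(weights=models.ResNet18_Weights.DEFAULT).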

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)

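A few things to note:

PyTorch model files usually have the extension .pt or .pth. At run time the program first checks whether the weights already exist in the default path; once downloaded, they are not downloaded again on later loads.
Downloading pretrained weights is often slow. We can look up the model's URL in model_urls in the torchvision source for that model and download the file manually, with Xunlei (迅雷) or any other download tool. On Linux and Mac the default download path is the .cache folder under the user's home directory; on Windows it is C:\Users\<username>\.cache\torch\hub\checkpoints. The download location can be set through the model_dir argument of torch.utils.model_zoo.load_url().

If that is too much trouble, we can download the weights ourselves, place them in the project folder, and load them into the network.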

self.model = models.resnet50(pretrained=False)
self.model.load_state_dict(torch.load('./model/resnet50-19c8e357.pth'))

If a download is interrupted midway, be sure to delete the partial weight file from that path, otherwise loading may fail with an error.

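3.2 Training specific layers

By default every parameter has requires_grad = True, so nothing needs to change when training from scratch or fine-tuning the whole network. When we use the network as a feature extractor and only want gradients for the newly initialized layers, leaving all other parameters unchanged, we freeze those layers by setting requires_grad = False. The official PyTorch tutorial provides the following helper.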

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

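Below we again take resnet18 and change its output from 1000 classes to 4, updating only the parameters of the last layer and leaving the feature-extraction parameters untouched. Note the order: we first freeze the gradients of the existing parameters and then replace the fully connected output layer, so that the parameters of the new layer require gradients.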

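import torch.nn as nn
import torchvision.models as models
# Freeze the gradients of all existing parameters
feature_extract = True
model = models.resnet18(pretrained=True)
set_parameter_requires_grad(model, feature_extract)
# Replace the output layer; the new layer's parameters require gradients by default
num_ftrs = model.fc.in_features
model.fc = nn.Linear(in_features=num_ftrs, out_features=4, bias=True)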

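During training the backward pass still runs, but gradients are computed and parameters updated only for the fc layer. Setting requires_grad on individual parameters lets us train specific layers of a model, which is essential for fine-tuning.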

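4. Fine-tuning Models with timm

Besides torchvision.models, another widely used library of pretrained models is timm, created by Ross Wightman of Vancouver, Canada. It provides many state-of-the-art computer vision models and can be seen as an extended version of torchvision; its models also tend to reach higher accuracy.

More information is available at:

GitHub: https://github.com/rwightman/pytorch-image-models
Documentation: https://fastai.github.io/timmdocs/ https://rwightman.github.io/pytorch-image-models/

4.1. Installing timm

timm can be installed in either of the following two ways:

Install with pip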

pip install timm

Install from source with git and pip

git clone https://github.com/rwightman/pytorch-image-models
cd pytorch-image-models && pip install -e .

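4.2. Listing the available pretrained models

4.2.1. Listing the pretrained models provided by `timm`

As of 2022-03-27, timm provided 592 pretrained models. They can be listed with timm.list_models() (note: all test code in this chapter was run in Jupyter Notebook).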

import timm
avail_pretrained_models = timm.list_models(pretrained=True)
len(avail_pretrained_models)
592

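4.2.2. Listing all variants of a specific model family

Each model family may contain several variants.

For example, the ResNet family includes ResNet18, 50, 101 and others. Passing a name pattern to timm.list_models() performs a wildcard search; here we query all models in the densenet family.

all_densenet_models = timm.list_models("*densenet*")
all_densenet_models

The call returns every model in the densenet family as a list.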

['densenet121',
 'densenet121d',
 'densenet161',
 'densenet169',
 'densenet201',
 'densenet264',
 'densenet264d_iabn',
 'densenetblur121d',
 'tv_densenet121']

4.2.3. Inspecting a model's configuration

To inspect a model's configuration, access its default_cfg attribute as shown below.

model = timm.create_model('resnet34',num_classes=10,pretrained=True)
model.default_cfg
{'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet34-43635321.pth',
 'num_classes': 1000,
 'input_size': (3, 224, 224),
 'pool_size': (7, 7),
 'crop_pct': 0.875,
 'interpolation': 'bilinear',
 'mean': (0.485, 0.456, 0.406),
 'std': (0.229, 0.224, 0.225),
 'first_conv': 'conv1',
 'classifier': 'fc',
 'architecture': 'resnet34'}

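In addition, the accuracy and other metrics of the provided pretrained models are listed on the timm results page.

4.3. Using and modifying pretrained models

Once we have picked a pretrained model, we create it with timm.create_model(); passing pretrained=True loads the pretrained weights. As with torchvision models, we can then inspect the model's parameters and their types.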

import timm
import torch

model = timm.create_model('resnet34',pretrained=True)
x = torch.randn(1,3,224,224)
output = model(x)
output.shape
torch.Size([1, 1000])

Inspecting the parameters of one layer (the first convolution as an example)

model = timm.create_model('resnet34',pretrained=True)
list(dict(model.named_children())['conv1'].parameters())
[Parameter containing:
 tensor([[[[-2.9398e-02, -3.6421e-02, -2.8832e-02,  ..., -1.8349e-02,
            -6.9210e-03,  1.2127e-02],
           [-3.6199e-02, -6.0810e-02, -5.3891e-02,  ..., -4.2744e-02,
            -7.3169e-03, -1.1834e-02],
            ...
           [ 8.4563e-03, -1.7099e-02, -1.2176e-03,  ...,  7.0081e-02,
             2.9756e-02, -4.1400e-03]]]], requires_grad=True)]

Modifying the model (changing the output from 1000 classes to 10)

model = timm.create_model('resnet34',num_classes=10,pretrained=True)
x = torch.randn(1,3,224,224)
output = model(x)
output.shape
torch.Size([1, 10])
                