timm Usage Tutorial
Excellent open-source work on vision neural network models: how to use the timm library, with a code walkthrough
Usage Tutorial
Getting started with timm
Install the library (Python 3, PyTorch version 1.4+):
pip install timm
Load the pretrained model weights you need:
import timm
m = timm.create_model('mobilenetv3_large_100', pretrained=True)
m.eval()
List all the available pretrained models (pprint is the standard-library module for pretty-printing):
import timm
from pprint import pprint
model_names = timm.list_models(pretrained=True)
pprint(model_names)
>>> ['adv_inception_v3',
'cspdarknet53',
'cspresnext50',
'densenet121',
'densenet161',
'densenet169',
'densenet201',
'densenetblur121d',
'dla34',
'dla46_c',
...
]
List pretrained models whose names match a wildcard:
import timm
from pprint import pprint
model_names = timm.list_models('*resne*t*')
pprint(model_names)
>>> ['cspresnet50',
'cspresnet50d',
'cspresnet50w',
'cspresnext50',
...
]
How to use a specific model
Here we take the well-known MobileNet v3 as an example. MobileNetV3 is a convolutional neural network designed for mobile-phone CPUs. Its design uses the hard swish activation function and squeeze-and-excitation modules inside the MBConv blocks.
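For reference, a minimal sketch of the hard-swish activation (the formula is x * ReLU6(x + 3) / 6; newer PyTorch versions also ship it built in as nn.Hardswish):
import torch
import torch.nn.functional as F

def hard_swish(x: torch.Tensor) -> torch.Tensor:
    # hard-swish: x * ReLU6(x + 3) / 6, a cheap piecewise approximation of swish
    return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4, 4, 9)
print(hard_swish(x))  # matches torch.nn.Hardswish()(x) on PyTorch 1.5+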
- Load the pretrained MobileNet v3 model:
import timm
model = timm.create_model('mobilenetv3_large_100', pretrained=True)
model.eval()
- Load and preprocess an image:
import urllib.request
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
config = resolve_data_config({}, model=model)
transform = create_transform(**config)
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
img = Image.open(filename).convert('RGB')
tensor = transform(img).unsqueeze(0) # transform and add batch dimension
- Get the model's predictions:
import torch
with torch.no_grad():
    out = model(tensor)
probabilities = torch.nn.functional.softmax(out[0], dim=0)
print(probabilities.shape)
# prints: torch.Size([1000])
- Get the top-5 predicted class names:
# Get imagenet class mappings
url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
urllib.request.urlretrieve(url, filename)
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
# Print top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())
# prints class names and probabilities like:
# Samoyed 0.6425196528434753
# Pomeranian 0.04062102362513542
# keeshond 0.03186424449086189
# white wolf 0.01739676296710968
# Eskimo dog 0.011717947199940681
Training your model
For the training dataset folder, specify the base folder that contains the train and validation subfolders.
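For example, a typical ImageFolder-style layout (the class folder and file names below are illustrative):
/data/imagenet/
    train/
        class_a/
            img_001.jpeg
            ...
        class_b/
            ...
    validation/
        class_a/
            ...
        class_b/
            ...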
- To train SE-ResNet34 on ImageNet with 4 GPUs, distributed training, and a cosine learning-rate schedule:
./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4
Note: --amp defaults to native AMP; --apex-amp forces use of the Apex components.
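For context, native AMP means PyTorch's built-in torch.cuda.amp (PyTorch 1.6+). Roughly, the --amp code path does something like the following simplified sketch; model, loader, optimizer, and loss_fn here are placeholders, not train.py internals:
import torch

scaler = torch.cuda.amp.GradScaler()
for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # forward pass runs in mixed precision
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()      # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)             # unscale gradients, then take the optimizer step
    scaler.update()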
- To train EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5:
These params are for dual Titan RTX cards with NVIDIA Apex installed:
./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016
- To train MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5:
These params are for dual Titan RTX cards with NVIDIA Apex installed:
./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce
- To train SE-ResNeXt-26-D and SE-ResNeXt-26-T:
These hparams (or similar) work well for a wide range of ResNet architectures. It is generally a good idea to increase the epoch count as the model size increases, i.e. roughly 180-200 for ResNe(X)t50 and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled (see the scaling sketch after this list). These params were for 2 1080Ti cards:
./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112
- To train EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5:
The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren't looking amazing so I resumed the training several times with tweaks to a few params (increase RE prob, decrease rand-aug, increase ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts. The result is mediocre at default res/crop but oddly performs much better with a full image test crop of 1.0.
- To train EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5:
https://github.com/michaelklachko achieved these results with the command line for B2 adapted for a larger batch size, with the recommended B0 dropout rate of 0.2.
./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048
- To train ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5:
./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce
- To train EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5:
./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064
- To train MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5:
./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-connect 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9
- To train ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5:
./distributed_train.sh 8 /imagenet --model resnext50_32x4d --lr 0.6 --warmup-epochs 5 --epochs 240 --weight-decay 1e-4 --sched cosine --reprob 0.4 --recount 3 --remode pixel --aa rand-m7-mstd0.5-inc1 -b 192 -j 6 --amp --dist-bn reduce
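As a sketch of the "increase batch size and LR proportionally" advice in the SE-ResNeXt item above (the new batch size below is a hypothetical example, not a tested setting):
# linear scaling rule: keep lr / total_batch_size constant
ref_batch, ref_lr = 2 * 112, 0.1   # the 2x1080Ti SE-ResNeXt command above: 2 GPUs, batch 112, lr 0.1
new_batch = 2 * 224                # hypothetical: same 2 GPUs with AMP, double the per-GPU batch
new_lr = ref_lr * new_batch / ref_batch
print(f'scaled lr: {new_lr}')      # scaled lr: 0.2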
Validating / running inference with your model
For the validation set, specify the location of the validation folder.
- Validate a model with its pretrained weights:
python validate.py /imagenet/validation/ --model seresnext26_32x4d --pretrained
- Run inference from a given checkpoint:
python inference.py /imagenet/validation/ --model mobilenetv3_large_100 --checkpoint ./output/train/model_best.pth.tar
Feature extraction
All models in timm can produce various kinds of features, for tasks other than classification.
- Get penultimate-layer features:
The penultimate layer is the second-to-last layer of the model, i.e. the features just before the classifier. timm can obtain these features in several ways, without any model surgery.
import torch
import timm
m = timm.create_model('resnet50', pretrained=True, num_classes=0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')
Output:
Pooled shape: torch.Size([2, 2048])
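One of those other ways is to call forward_features() on an ordinary classification model, which returns the feature map before global pooling:
import torch
import timm
m = timm.create_model('resnet50', pretrained=True)
o = m.forward_features(torch.randn(2, 3, 224, 224))  # features before global pooling
print(f'Unpooled shape: {o.shape}')
# expected: torch.Size([2, 2048, 7, 7])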
- Reset the classifier after model creation:
import torch
import timm
m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
print(f'Original shape: {o.shape}')
m.reset_classifier(0)
o = m(torch.randn(2, 3, 224, 224))
print(f'Pooled shape: {o.shape}')
Output:
Pooled shape: torch.Size([2, 1024])
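Relatedly, pooling can also be removed at creation time by passing an empty global_pool, which leaves the spatial feature map intact:
import torch
import timm
m = timm.create_model('resnet50', pretrained=True, num_classes=0, global_pool='')
o = m(torch.randn(2, 3, 224, 224))  # classifier and global pooling both removed
print(f'Unpooled shape: {o.shape}')
# expected: torch.Size([2, 2048, 7, 7])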
- Output multi-scale features:
By default, most models output 5 feature strides (not all models have that many), with the first starting at stride 2 (some start at 1 or 4).
import torch
import timm
m = timm.create_model('resnest26d', features_only=True, pretrained=True)
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)
Output:
torch.Size([2, 64, 112, 112])
torch.Size([2, 256, 56, 56])
torch.Size([2, 512, 28, 28])
torch.Size([2, 1024, 14, 14])
torch.Size([2, 2048, 7, 7])
- The .feature_info attribute is a class that encapsulates feature-extraction metadata:
For example, to print the channel count of each feature level:
import torch
import timm
m = timm.create_model('regnety_032', features_only=True, pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
o = m(torch.randn(2, 3, 224, 224))
for x in o:
    print(x.shape)
Output:
Feature channels: [32, 72, 216, 576, 1512]
torch.Size([2, 32, 112, 112])
torch.Size([2, 72, 56, 56])
torch.Size([2, 216, 28, 28])
torch.Size([2, 576, 14, 14])
torch.Size([2, 1512, 7, 7])
- Select specific feature levels or limit the output stride:
out_indices: selects which feature levels to output (indices into the feature hierarchy; the channel count of each level follows from the model and is not what you specify here).
output_stride: caps the maximum stride of the output features; internally this is achieved with dilated convolutions.
import torch
import timm
m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
print(f'Feature channels: {m.feature_info.channels()}')
print(f'Feature reduction: {m.feature_info.reduction()}')
o = m(torch.randn(2, 3, 320, 320))
for x in o:
    print(x.shape)
Output:
Feature channels: [512, 2048]
Feature reduction: [8, 8]
torch.Size([2, 512, 40, 40])
torch.Size([2, 2048, 40, 40])
In this example, output_stride=8 caps the feature stride at 8, and out_indices=(2, 4) selects feature levels 2 and 4, whose channel counts are 512 and 2048 respectively.
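As a usage sketch, such a constrained backbone plugs naturally into dense-prediction heads; the 1x1-conv head below is a hypothetical minimal decoder, not part of timm:
import torch
import torch.nn as nn
import timm

# backbone limited to stride 8, exposing feature levels 2 and 4 (as in the example above)
backbone = timm.create_model('ecaresnet101d', features_only=True,
                             output_stride=8, out_indices=(2, 4), pretrained=True)
# hypothetical head: fuse the two same-stride levels into 21 per-pixel class logits
head = nn.Conv2d(sum(backbone.feature_info.channels()), 21, kernel_size=1)

feats = backbone(torch.randn(2, 3, 320, 320))
logits = head(torch.cat(feats, dim=1))  # both levels are stride 8, so they concatenate directly
print(logits.shape)
# expected: torch.Size([2, 21, 40, 40])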