Visualizing CNN Convolutional Layers

CNN Visualization

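Convolutional neural networks (CNNs) are one of the most important model architectures in deep learning. They are widely used in image processing, have substantially improved model performance, and have driven progress in computer vision. A CNN is nevertheless a "black box": it is not obvious how it arrives at its good results, which raises the question of interpretability in deep learning. Understanding how a CNN works makes it possible to explain its outputs, to improve its robustness, and to refine its architecture in a targeted way for further gains.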

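An important step toward understanding a CNN is visualization: of how features are extracted, of what the extracted features look like, and of where the model focuses on the input. This section covers these three aspects and shows how to visualize a CNN model in PyTorch.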

After working through this section, you will know how to:

Visualize CNN convolution kernels
Visualize CNN feature maps
Visualize CNN class activation maps (CAM)

Visualizing CNN convolution kernels

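Convolution kernels are what a CNN uses to extract features, so visualizing them helps show what kind of features each layer extracts and, in turn, how the model works. For example, Zeiler and Fergus studied how the kernels differ from layer to layer in their 2013 paper. They found that layers close to the input extract relatively simple structures, while layers close to the output extract features that resemble the shapes of the objects in the image, as shown in the figure below: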

[Figure: visualizations for layer2, layer3, and layer4]

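Visualizing kernels in PyTorch is straightforward. The key point is that the kernels of a given layer are that layer's weights, so visualizing the kernels amounts to visualizing the corresponding weight tensor. The implementation below uses the VGG11 model that ships with torchvision as an example.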

First load the model and inspect its layers:

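import torch
from torchvision.models import vgg11

# pretrained=True is deprecated in recent torchvision releases,
# which prefer weights=VGG11_Weights.DEFAULT (imported from torchvision.models)
model = vgg11(pretrained=True)
print(dict(model.features.named_children()))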

{'0': Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '1': ReLU(inplace=True),
 '2': MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
 '3': Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '4': ReLU(inplace=True),
 '5': MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
 '6': Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '7': ReLU(inplace=True),
 '8': Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '9': ReLU(inplace=True),
 '10': MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
 '11': Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '12': ReLU(inplace=True),
 '13': Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '14': ReLU(inplace=True),
 '15': MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),
 '16': Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '17': ReLU(inplace=True),
 '18': Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
 '19': ReLU(inplace=True),
 '20': MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)}
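
The convolutional layers in a listing like the one above can also be picked out programmatically instead of by eye. The following is a minimal sketch, assuming a small stand-in `nn.Sequential` that mirrors only the first six entries of the printout (random weights, so nothing is downloaded); the name `conv_layers` is introduced here for illustration.

```python
import torch.nn as nn

# Stand-in for the first six entries of model.features (random weights).
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Keep only the children that are convolutional layers, keyed by their name.
conv_layers = {
    name: module
    for name, module in features.named_children()
    if isinstance(module, nn.Conv2d)
}

for name, module in conv_layers.items():
    # weight shape is (out_channels, in_channels, kernel_height, kernel_width)
    print(name, tuple(module.weight.shape))
# prints:
# 0 (64, 3, 3, 3)
# 3 (128, 64, 3, 3)
```

The same filter applied to `model.features` of the real VGG11 would list every layer whose weights can be visualized as kernels.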

Kernels belong to convolutional layers (Conv2d). Taking layer "3" as an example, visualize its parameters:

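import matplotlib.pyplot as plt

conv1 = dict(model.features.named_children())['3']
# weight shape: (out_channels, in_channels, kernel_height, kernel_width)
kernel_set = conv1.weight.detach()
num = len(kernel_set)
print(kernel_set.shape)
# One figure per output channel (128 figures here); shrink the range to plot fewer
for i in range(num):
    i_kernel = kernel_set[i]  # the 64 input-channel kernels of output channel i
    plt.figure(figsize=(20, 17))
    if len(i_kernel) > 1:
        for idx, kernel in enumerate(i_kernel):
            plt.subplot(9, 9, idx + 1)
            plt.axis('off')
            plt.imshow(kernel, cmap='bwr')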

torch.Size([128, 64, 3, 3])

Layer "3" maps 64 input channels to 128 output channels, so it holds 128 × 64 kernels in total. Some of them are visualized in the figure below:

[Figure: kernel visualization]
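
The kernel count can be checked directly from the weight tensor. The sketch below assumes a randomly initialized `nn.Conv2d(64, 128, 3)` as a stand-in for layer "3" (so the values are not the pretrained ones), flattens the weights into one 3 × 3 kernel per (output, input) channel pair, and min-max scales each kernel to [0, 1], which is one common way to make kernels comparable when plotted. The helper `normalize_per_kernel` is hypothetical, not part of PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for model.features[3]; random weights, so no download is needed.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

weight = conv.weight.detach()            # (out_channels, in_channels, kH, kW)
out_ch, in_ch, kh, kw = weight.shape
# One 3x3 kernel per (output channel, input channel) pair: 128 * 64 = 8192.
kernels = weight.reshape(out_ch * in_ch, kh, kw)


def normalize_per_kernel(k, eps=1e-8):
    """Min-max scale every kernel to [0, 1] independently of the others."""
    flat = k.reshape(k.shape[0], -1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    return ((flat - lo) / (hi - lo + eps)).reshape(k.shape)


kernels_01 = normalize_per_kernel(kernels)
print(tuple(weight.shape), tuple(kernels.shape))
# prints: (128, 64, 3, 3) (8192, 3, 3)
```

Each entry of `kernels_01` can then be passed to `plt.imshow` in the same way as in the loop above.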

Visualizing CNN feature maps

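The counterpart of a kernel is the feature map: the data obtained each time the input image passes through a convolutional layer. Kernels are visualized to see which features the model extracts; feature maps are visualized to see what the extracted features look like.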

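There are several ways to obtain a feature map. One is to run the forward pass layer by layer from the input and return the result once the desired layer is reached. This works, but it is cumbersome. PyTorch provides a dedicated interface for capturing feature maps during the forward pass, aptly named a hook. Picture data flowing forward through the network: a hook placed in advance on some layer retains what the data looked like at that layer, and reading the hook back yields that layer's feature map. The implementation is as follows: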

import matplotlib.pyplot as plt


class Hook(object):
    """Forward hook that records the input and output of the layer it is attached to."""

    def __init__(self):
        self.module_name = []
        self.features_in_hook = []
        self.features_out_hook = []

    def __call__(self, module, fea_in, fea_out):
        print("hooker working", self)
        self.module_name.append(module.__class__)
        self.features_in_hook.append(fea_in)
        # Clone the output: VGG uses ReLU(inplace=True), which would otherwise
        # overwrite the stored tensor of a preceding Conv2d layer
        self.features_out_hook.append(fea_out.detach().clone())
        return None


def plot_feature(model, idx, inputs):
    hh = Hook()
    handle = model.features[idx].register_forward_hook(hh)

    model.eval()
    _ = model(inputs)
    handle.remove()  # detach the hook so later forward passes are unaffected

    print(hh.module_name)
    print(hh.features_in_hook[0][0].shape)
    print(hh.features_out_hook[0].shape)

    out1 = hh.features_out_hook[0]

    total_ft = out1.shape[1]            # number of channels (feature maps)
    first_item = out1[0].cpu().clone()  # feature maps of the first sample in the batch

    plt.figure(figsize=(20, 17))

    # Plot at most the first 100 feature maps on a 10 x 10 grid
    for ftidx in range(min(total_ft, 100)):
        ft = first_item[ftidx]
        plt.subplot(10, 10, ftidx + 1)
        plt.axis('off')
        plt.imshow(ft)

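Here we first implement a Hook class, then in plot_feature register an instance of it on the layer to be visualized. During the forward pass the model calls the hook's __call__ method, which is where the layer's input and output are stored. features_out_hook is a list: the hook is called once per forward pass, so the list grows by one element each time.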
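
The two ways of obtaining a feature map described in this section, running the layers one by one and using a forward hook, can be compared on a toy network. This is a minimal sketch under stated assumptions: `features` is a small randomly initialized stand-in for `model.features`, and `forward_until` and `save_output` are hypothetical helpers introduced here. It also shows the list growing by one entry per forward pass and the hook being removed afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Small stand-in for model.features (random weights, illustrative layer sizes).
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)
features.eval()
x = torch.randn(1, 3, 32, 32)
idx = 3  # layer whose feature map we want


def forward_until(layers, inputs, last_idx):
    """Option 1: run the layers one by one and stop after layer last_idx."""
    out = inputs
    for i, layer in enumerate(layers):
        out = layer(out)
        if i == last_idx:
            return out
    raise IndexError("last_idx is beyond the last layer")


with torch.no_grad():
    manual = forward_until(features, x, idx)

# Option 2: let a forward hook store the output of the chosen layer.
captured = []


def save_output(module, fea_in, fea_out):
    captured.append(fea_out.detach().clone())


handle = features[idx].register_forward_hook(save_output)
with torch.no_grad():
    features(x)
    features(x)          # second pass: the list grows by one more entry
print(len(captured), captured[0].shape)
# prints: 2 torch.Size([1, 16, 16, 16])

handle.remove()          # after removal, forward passes no longer append
with torch.no_grad():
    features(x)
same = torch.equal(manual, captured[0])
```

Removing the handle matters in practice: a hook that stays registered keeps appending tensors on every forward pass and holds on to their memory.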

Visualizing CNN class activation maps

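A class activation map (CAM) indicates which inputs matter to the model; for CNN visualization, that means which pixels of the image matter for the prediction. Beyond locating important pixels, the gradients flowing into the important regions are also informative, which led to Grad-CAM (and its many variants), an extension of CAM that weights the feature maps by the gradients of the class score. Examples of CAM and Grad-CAM are shown in the figure below: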

[Figure: CAM and Grad-CAM examples]
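
To make the idea concrete, the Grad-CAM computation can be written out by hand in a few lines. The sketch below is an illustration under stated assumptions, not the implementation of any particular library: `TinyCNN` is a hypothetical, untrained network whose `forward` also returns the last feature maps, and `grad_cam` is a helper introduced here. It averages the gradients of the class score over the spatial dimensions to get one weight per channel, forms the weighted sum of the feature maps, applies ReLU, and scales the result to [0, 1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinyCNN(nn.Module):
    """Hypothetical toy classifier; forward also returns the last feature maps."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        acts = self.features(x)                       # (N, 16, H, W)
        logits = self.fc(self.pool(acts).flatten(1))  # (N, num_classes)
        return logits, acts


def grad_cam(model, x, target_class):
    model.eval()
    logits, acts = model(x)
    score = logits[:, target_class].sum()
    # Gradient of the class score with respect to the feature maps.
    grads = torch.autograd.grad(score, acts)[0]       # (N, 16, H, W)
    # One weight per channel: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)    # (N, 16, 1, 1)
    cam = F.relu((weights * acts).sum(dim=1)).detach()  # (N, H, W)
    # Scale each map to [0, 1] so it can be shown as a heatmap.
    lo = cam.flatten(1).min(dim=1).values[:, None, None]
    hi = cam.flatten(1).max(dim=1).values[:, None, None]
    return (cam - lo) / (hi - lo + 1e-8)


x = torch.randn(1, 3, 32, 32)
cam = grad_cam(TinyCNN(), x, target_class=2)
print(tuple(cam.shape))
# prints: (1, 16, 16)
```

The map has the spatial size of the chosen feature maps (16 × 16 here) and would be upsampled to the input size before being overlaid on the image. Because this toy network ends in global average pooling followed by a linear layer, the channel weights are proportional to the classifier weights of the target class, which is the setting the original CAM formulation assumes.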

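Compared with kernel and feature-map visualization, the CAM family is more intuitive: the important regions are visible at a glance, which supports interpretability analysis and model improvement. CAM-style methods are available through the open-source package pytorch-grad-cam.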

Installation

pip install grad-cam

A simple example

import torch
from torchvision.models import vgg11
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

model = vgg11(pretrained=True)
img_path = './dog.png'
# Resize to the input size the network was trained on;
# convert to RGB in case the PNG carries an alpha channel
img = Image.open(img_path).convert('RGB').resize((224, 224))
# The image must be an np.float32 array with values in [0, 1]
rgb_img = np.float32(img) / 255
plt.imshow(img)

[Figure: input dog image]

from pytorch_grad_cam import GradCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM, FullGrad
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

# Convert the image to a tensor of shape (1, 3, H, W)
img_tensor = torch.from_numpy(rgb_img).permute(2, 0, 1).unsqueeze(0)

target_layers = [model.features[-1]]
# Pick a suitable CAM method; note that ScoreCAM and AblationCAM need a batch_size
cam = GradCAM(model=model, target_layers=target_layers)
# preds is the index of the target class and has to be set by hand;
# ImageNet has 1000 classes, and 200 is used here as an example
preds = 200
targets = [ClassifierOutputTarget(preds)]
grayscale_cam = cam(input_tensor=img_tensor, targets=targets)
grayscale_cam = grayscale_cam[0, :]
cam_img = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
print(type(cam_img))
Image.fromarray(cam_img)

[Figure: Grad-CAM result]
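
Instead of fixing the target class by hand, it can be taken from the model's own prediction. The sketch below is only an illustration: `classifier` is an untrained stand-in with 1000 outputs and `rgb` is a random stand-in for an image scaled to [0, 1], so the predicted index is meaningless here. It also applies the ImageNet mean and standard deviation that torchvision's pretrained models expect, a step the simple example above leaves out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Untrained stand-in for a 1000-class ImageNet classifier (illustration only).
classifier = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 1000),
)
classifier.eval()

# Per-channel ImageNet statistics used by torchvision's pretrained models.
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

rgb = torch.rand(1, 3, 224, 224)     # stand-in for an image with values in [0, 1]
img_tensor = (rgb - mean) / std      # normalized network input

with torch.no_grad():
    logits = classifier(img_tensor)  # (1, 1000)

preds = int(logits.argmax(dim=1).item())          # index of the top-scoring class
top5 = logits.topk(5, dim=1).indices[0].tolist()  # five best class indices
print(preds, top5)
```

With a real pretrained model, `preds` obtained this way could be passed to `ClassifierOutputTarget` so that the heatmap explains the class the model actually predicted.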

Quick CNN visualization with FlashTorch

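You may well ask: it is already 202x, do we really have to hand-write all this CNN visualization code? Of course not. Thanks to the PyTorch community, there are now quite a few open-source tools for quickly visualizing CNNs. Here we introduce one of them, FlashTorch.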

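(Note: this package turns out to be sensitive to the environment. If the code below fails, refer to the configuration given by the author or use a Colab runtime: https://github.com/MisaOgura/flashtorch/issues/39)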

Installation

pip install flashtorch

Visualizing gradients

# Download example images
# !mkdir -p images
# !wget -nv \
#    https://github.com/MisaOgura/flashtorch/raw/master/examples/images/great_grey_owl.jpg \
#    https://github.com/MisaOgura/flashtorch/raw/master/examples/images/peacock.jpg   \
#    https://github.com/MisaOgura/flashtorch/raw/master/examples/images/toucan.jpg    \
#    -P /content/images

import matplotlib.pyplot as plt
import torchvision.models as models
from flashtorch.utils import apply_transforms, load_image
from flashtorch.saliency import Backprop

model = models.alexnet(pretrained=True)
backprop = Backprop(model)

image = load_image('/content/images/great_grey_owl.jpg')
owl = apply_transforms(image)

target_class = 24
backprop.visualize(owl, target_class, guided=True, use_gpu=True)

[Figure: FlashTorch gradient visualization]
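
The basic idea behind gradient-based saliency can be sketched in plain PyTorch: backpropagate the score of the target class to the input and look at the magnitude of the resulting gradient per pixel. This is a minimal sketch of vanilla backpropagation on a hypothetical, untrained toy network `net`; it does not reproduce FlashTorch's internals, and in particular it does not implement the guided variant selected by `guided=True` above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy classifier with 10 classes (random weights).
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)
net.eval()

target_class = 3
# The input itself must track gradients, since we differentiate with respect to it.
image = torch.randn(1, 3, 32, 32, requires_grad=True)

logits = net(image)
logits[0, target_class].backward()   # d(score of target class) / d(image)

# Saliency: largest absolute gradient across the colour channels, per pixel.
saliency = image.grad.abs().max(dim=1).values[0]   # (32, 32)
print(tuple(saliency.shape))
# prints: (32, 32)
```

Pixels with a large value in `saliency` are those where a small change would move the target class score the most.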

Visualizing convolution kernels

import torchvision.models as models
from flashtorch.activmax import GradientAscent

model = models.vgg16(pretrained=True)
g_ascent = GradientAscent(model.features)

# specify layer and filter info
conv5_1 = model.features[24]
conv5_1_filters = [45, 271, 363, 489]

g_ascent.visualize(conv5_1, conv5_1_filters, title="VGG16: conv5_1")

[Figure: FlashTorch activation visualization]
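
Visualizing a filter by gradient ascent means starting from a noise image and repeatedly nudging it in the direction that increases the chosen filter's activation. The following is a rough sketch of that idea under stated assumptions: `features` is a small, randomly initialized stand-in network, `filter_activation` is a hypothetical helper, and the step size, step count, and gradient normalization are arbitrary choices made here, not FlashTorch's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Small stand-in for model.features (random weights).
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)
features.eval()
for p in features.parameters():
    p.requires_grad_(False)   # only the image is optimized, not the weights


def filter_activation(img, filter_idx):
    """Mean activation of one output filter of the last layer."""
    return features(img)[0, filter_idx].mean()


filter_idx = 5
img = torch.randn(1, 3, 32, 32) * 0.1   # start from low-amplitude noise
img.requires_grad_(True)
before = filter_activation(img, filter_idx).item()

lr = 0.1
for _ in range(30):
    act = filter_activation(img, filter_idx)
    (grad,) = torch.autograd.grad(act, img)
    with torch.no_grad():
        # Ascent step along the normalized gradient direction.
        img += lr * grad / (grad.norm() + 1e-8)

after = filter_activation(img, filter_idx).item()
print(before, after)
```

After the loop, `img` is an input pattern that excites the selected filter more strongly than the starting noise did; displaying it gives a picture of what that filter responds to.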

References

https://andrewhuman.github.io/cnn-hidden-layout_search
https://cloud.tencent.com/developer/article/1747222
https://github.com/jacobgil/pytorch-grad-cam
https://github.com/MisaOgura/flashtorch


Last updated: 2025/06/25, 11:25:50
