【代码解读】OpenCOOD框架之model模块(以PointPillarFCooper为例)

point_pillar_fcooper

  • PointPillarFCooper
  • PointPillars
    • PillarVFE
    • PFNLayer
    • PointPillarScatter
    • BaseBEVBackbone
    • DownsampleConv
      • DoubleConv
  • SpatialFusion
  • 检测头

(紧扣PointPillarFCooper的框架结构,一点一点看代码)

PointPillarFCooper

# -*- coding: utf-8 -*-
# Author: Runsheng Xu <rxx3386@ucla.edu>
# License: TDG-Attribution-NonCommercial-NoDistrib
import pprint

import torch.nn as nn

from opencood.models.sub_modules.pillar_vfe import PillarVFE
from opencood.models.sub_modules.point_pillar_scatter import PointPillarScatter
from opencood.models.sub_modules.base_bev_backbone import BaseBEVBackbone
from opencood.models.sub_modules.downsample_conv import DownsampleConv
from opencood.models.sub_modules.naive_compress import NaiveCompressor
from opencood.models.fuse_modules.f_cooper_fuse import SpatialFusion


class PointPillarFCooper(nn.Module):
    """
    F-Cooper implementation with point pillar backbone.
    """
    def __init__(self, args):
        super(PointPillarFCooper, self).__init__()
        print("args: ")
        pprint.pprint(args)
        self.max_cav = args['max_cav']
        # PIllar VFE Voxel Feature Encoding
        self.pillar_vfe = PillarVFE(args['pillar_vfe'],
                                    num_point_features=4,
                                    voxel_size=args['voxel_size'],
                                    point_cloud_range=args['lidar_range'])
        self.scatter = PointPillarScatter(args['point_pillar_scatter'])
        self.backbone = BaseBEVBackbone(args['base_bev_backbone'], 64)
        # used to downsample the feature map for efficient computation
        self.shrink_flag = False
        if 'shrink_header' in args:
            self.shrink_flag = True
            self.shrink_conv = DownsampleConv(args['shrink_header'])
        self.compression = False

        if args['compression'] > 0:
            self.compression = True
            self.naive_compressor = NaiveCompressor(256, args['compression'])

        self.fusion_net = SpatialFusion()

        self.cls_head = nn.Conv2d(128 * 2, args['anchor_number'],
                                  kernel_size=1)
        self.reg_head = nn.Conv2d(128 * 2, 7 * args['anchor_number'],
                                  kernel_size=1)

        if args['backbone_fix']:
            self.backbone_fix()

  • args: 其实就是从hypes_yaml配置文件里传来的参数
args:
{'anchor_number': 2,
 'backbone_fix': False,
 'base_bev_backbone': {'layer_nums': [3, 5, 8],
                       'layer_strides': [2, 2, 2],
                       'num_filters': [64, 128, 256],
                       'num_upsample_filter': [128, 128, 128],
                       'upsample_strides': [1, 2, 4]},
 'compression': 0,
 'lidar_range': [-140.8, -40, -3, 140.8, 40, 1],
 'max_cav': 5,
 'pillar_vfe': {'num_filters': [64],
                'use_absolute_xyz': True,
                'use_norm': True,
                'with_distance': False},
 'point_pillar_scatter': {'grid_size': array([704, 200,   1], dtype=int64),
                          'num_features': 64},
 'shrink_header': {'dim': [256],
                   'input_dim': 384,
                   'kernal_size': [1],
                   'padding': [0],
                   'stride': [1]},
 'voxel_size': [0.4, 0.4, 4]}

  • PointPillarsFcooper结构
PointPillarFCooper(
  (pillar_vfe): PillarVFE(
    (pfn_layers): ModuleList(
      (0): PFNLayer(
        (linear): Linear(in_features=10, out_features=64, bias=False)
        (norm): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
  )
  (scatter): PointPillarScatter()
  (backbone): BaseBEVBackbone(
    (blocks): ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (2): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
        (19): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (20): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (21): ReLU()
        (22): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (23): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (24): ReLU()
        (25): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (26): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (27): ReLU()
      )
    )
    (deblocks): ModuleList(
      (0): Sequential(
        (0): ConvTranspose2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (2): Sequential(
        (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
  )
  (shrink_conv): DownsampleConv(
    (layers): ModuleList(
      (0): DoubleConv(
        (double_conv): Sequential(
          (0): Conv2d(384, 256, kernel_size=(1, 1), stride=(1, 1))
          (1): ReLU(inplace=True)
          (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (3): ReLU(inplace=True)
        )
      )
    )
  )
  (fusion_net): SpatialFusion()
  (cls_head): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
  (reg_head): Conv2d(256, 14, kernel_size=(1, 1), stride=(1, 1))
)

PointPillars

在这里插入图片描述
网络overview:网络的主要组成部分是PFN、Backbone和 SSD 检测头。原始点云被转换为堆叠的柱子张量和柱子索引张量。编码器使用堆叠的柱子来学习一组特征,这些特征可以分散回卷积神经网络的 2D 伪图像。检测头使用来自主干的特征来预测对象的 3D 边界框。请注意:在这里,我们展示了汽车网络的骨干维度。

PillarVFE

就是 voxel feature encoder:先对点云进行特征提取
VFE由PFNLayer(Pillar Feature Net)组成

  • model_cfg
{'num_filters': [64],
 'use_absolute_xyz': True,
  'use_norm': True,
  'with_distance': False},
class PillarVFE(nn.Module):
    def __init__(self, model_cfg, num_point_features, voxel_size,
                 point_cloud_range):
        super().__init__()
        self.model_cfg = model_cfg

        self.use_norm = self.model_cfg['use_norm']
        self.with_distance = self.model_cfg['with_distance']

        self.use_absolute_xyz = self.model_cfg['use_absolute_xyz']
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

        self.num_filters = self.model_cfg['num_filters']
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm,
                         last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

PFNLayer

这里只是一个全连接+归一化(好像和原来的算法有出入)

class PFNLayer(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 use_norm=True,
                 last_layer=False):
        super().__init__()

        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2

        if self.use_norm:
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)

        self.part = 50000

PointPillarScatter

主要作用就是三维点云压缩成bev(鸟瞰图)

class PointPillarScatter(nn.Module):
    def __init__(self, model_cfg):
        super().__init__()

        self.model_cfg = model_cfg
        self.num_bev_features = self.model_cfg['num_features']
        self.nx, self.ny, self.nz = model_cfg['grid_size']
        assert self.nz == 1
  • model_cfg:
{'grid_size': array([704, 200,   1], dtype=int64),
 'num_features': 64}

BaseBEVBackbone

参考这个图
在这里插入图片描述
3 * Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)

5 * Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)

8 * Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)

3、5、8对应着layer_nums

  • model_cfg
{'layer_nums': [3, 5, 8],
 'layer_strides': [2, 2, 2],
 'num_filters': [64, 128, 256],
 'num_upsample_filter': [128, 128, 128],
 'upsample_strides': [1, 2, 4]},
class BaseBEVBackbone(nn.Module):
    def __init__(self, model_cfg, input_channels):
        super().__init__()
        self.model_cfg = model_cfg

        if 'layer_nums' in self.model_cfg:

            assert len(self.model_cfg['layer_nums']) == \
                   len(self.model_cfg['layer_strides']) == \
                   len(self.model_cfg['num_filters'])

            layer_nums = self.model_cfg['layer_nums']
            layer_strides = self.model_cfg['layer_strides']
            num_filters = self.model_cfg['num_filters']
        else:
            layer_nums = layer_strides = num_filters = []

        if 'upsample_strides' in self.model_cfg:
            assert len(self.model_cfg['upsample_strides']) \
                   == len(self.model_cfg['num_upsample_filter'])

            num_upsample_filters = self.model_cfg['num_upsample_filter']
            upsample_strides = self.model_cfg['upsample_strides']

        else:
            upsample_strides = num_upsample_filters = []

        num_levels = len(layer_nums)   # len(layer_nums)个Sequential
        c_in_list = [input_channels, *num_filters[:-1]]

        self.blocks = nn.ModuleList()
        self.deblocks = nn.ModuleList()

        for idx in range(num_levels):
            cur_layers = [
                nn.ZeroPad2d(1),
                nn.Conv2d(
                    c_in_list[idx], num_filters[idx], kernel_size=3,
                    stride=layer_strides[idx], padding=0, bias=False
                ),
                nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                nn.ReLU()
            ]
            for k in range(layer_nums[idx]):  # 每个Sequential里有多少个以下结构
                cur_layers.extend([
                    nn.Conv2d(num_filters[idx], num_filters[idx],
                              kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                    nn.ReLU()
                ])

            self.blocks.append(nn.Sequential(*cur_layers))
            # 以下是deblock模块
            if len(upsample_strides) > 0:
                stride = upsample_strides[idx]
                if stride >= 1:
                    self.deblocks.append(nn.Sequential(
                        nn.ConvTranspose2d(
                            num_filters[idx], num_upsample_filters[idx],
                            upsample_strides[idx],
                            stride=upsample_strides[idx], bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx],
                                       eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))
                else:
                    stride = np.round(1 / stride).astype(np.int)
                    self.deblocks.append(nn.Sequential(
                        nn.Conv2d(
                            num_filters[idx], num_upsample_filters[idx],
                            stride,
                            stride=stride, bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3,
                                       momentum=0.01),
                        nn.ReLU()
                    ))

        c_in = sum(num_upsample_filters)
        if len(upsample_strides) > num_levels:
            self.deblocks.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_in, upsample_strides[-1],
                                   stride=upsample_strides[-1], bias=False),
                nn.BatchNorm2d(c_in, eps=1e-3, momentum=0.01),
                nn.ReLU(),
            ))

        self.num_bev_features = c_in

DownsampleConv

其实就是下采样(用了几个DoubleConv)
主要作用就是

  • 降低计算成本: 在深度神经网络中,参数量和计算量通常会随着输入数据的尺寸增加而增加。通过下采样,可以降低每个层的输入数据的尺寸,从而降低网络的计算成本。
  • 减少过拟合: 下采样可以通过减少输入数据的维度和数量来减少模型的复杂性,从而有助于降低过拟合的风险。过拟合是指模型在训练数据上表现良好,但在测试数据上表现较差的现象。
  • 提高模型的泛化能力: 通过减少输入数据的空间分辨率,下采样有助于模型学习更加抽象和通用的特征,从而提高了模型对于不同数据的泛化能力。
  • 加速训练和推理过程: 由于下采样可以降低网络的计算成本,因此可以加快模型的训练和推理过程。这对于处理大规模数据和实时应用特别有用。
class DownsampleConv(nn.Module):
    def __init__(self, config):
        super(DownsampleConv, self).__init__()
        self.layers = nn.ModuleList([])
        input_dim = config['input_dim']

        for (ksize, dim, stride, padding) in zip(config['kernal_size'],
                                                 config['dim'],
                                                 config['stride'],
                                                 config['padding']):
            self.layers.append(DoubleConv(input_dim,
                                          dim,
                                          kernel_size=ksize,
                                          stride=stride,
                                          padding=padding))
            input_dim = dim

config参数

{'dim': [256],
 'input_dim': 384,
 'kernal_size': [1],
 'padding': [0],
 'stride': [1]},

DoubleConv

其实就是两层卷积

class DoubleConv(nn.Module):
    """
    Double convoltuion
    Args:
        in_channels: input channel num
        out_channels: output channel num
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride, padding):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                      stride=stride, padding=padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )

SpatialFusion

其实就是取最大来进行融合特征
在这里插入图片描述

class SpatialFusion(nn.Module):
    def __init__(self):
        super(SpatialFusion, self).__init__()

    def regroup(self, x, record_len):
        cum_sum_len = torch.cumsum(record_len, dim=0)
        split_x = torch.tensor_split(x, cum_sum_len[:-1].cpu())
        return split_x

    def forward(self, x, record_len):
        # x: B, C, H, W, split x:[(B1, C, W, H), (B2, C, W, H)]
        split_x = self.regroup(x, record_len)
        out = []

        for xx in split_x:
            xx = torch.max(xx, dim=0, keepdim=True)[0]
            out.append(xx)
        return torch.cat(out, dim=0)

检测头

(cls_head): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
(reg_head): Conv2d(256, 14, kernel_size=(1, 1), stride=(1, 1))

http://www.niftyadmin.cn/n/5396044.html

相关文章

uniapp小程序uView自定义tabbar

两年没接触小程序&#xff0c;又重新拾请来 前言 工具&#xff1a;HBuilder X 3.99版本 微信开发者工具 1.06 语言&#xff1a;vue2 uView 一、创建项目 先使用HBuilder X工具创建一个空白uni-app项目 uviewTest 二、安装和配置 HBuilder X找到工具-》插件安装-》插件市场 u…

express+mysql+vue,从零搭建一个商城管理系统5--用户注册

提示&#xff1a;学习express&#xff0c;搭建管理系统 文章目录 前言一、新建user表二、安装bcryptjs、MD5、body-parser三、修改config/db.js四、新建config/bcrypt.js五、新建models文件夹和models/user.js五、index.js引入使用body-parser六、修改routes/user.js七、启动项…

深度学习 精选笔记(3)线性神经网络-线性回归

学习参考&#xff1a; 动手学深度学习2.0Deep-Learning-with-TensorFlow-bookpytorchlightning ①如有冒犯、请联系侵删。 ②已写完的笔记文章会不定时一直修订修改(删、改、增)&#xff0c;以达到集多方教程的精华于一文的目的。 ③非常推荐上面&#xff08;学习参考&#x…

iconfont的组件化使用方法(SVG)

目录 一、需求描述二、操作步骤1.在iconfont中选择项目需要使用的图标2.在项目中创建iconfont.js3.创建svgIcon组件 一、需求描述 将iconfont图标库选择的图标以SVG的形式引入项目并通过组件化的形式在项目中引用可控制图标的大小和颜色 二、操作步骤 1.在iconfont中选择项目…

十二、计算机视觉-开运算与闭运算

提示&#xff1a;文章写完后&#xff0c;目录可以自动生成&#xff0c;如何生成可参考右边的帮助文档 文章目录 一、基础概念二、使用开运算闭运算 一、基础概念 学过前面2节在看开运算和闭运算就比较简单了 开运算&#xff1a;是先进行腐蚀操作&#xff0c;然后进行膨胀操作的…

kubectl 声明式资源管理方式

目录 介绍 YAML 语法格式 命令 应用yaml文件指定的资源 删除yaml文件指定的资源 查看资源的yaml格式信息 查看yaml文件字段说明 查看 api 资源版本标签 修改yaml文件指定的资源 离线修改 在线修改 编写yaml文件 创建资源对象 查看创建的pod资源 创建service服务对…

Docker 常用操作命令备忘

Docker 一旦设置好了环境&#xff0c;日常就只要使用简单命令就可以运行和停止。 于是&#xff0c;我每次用的时候&#xff0c;都想不起来一些关键性的命令到底怎么用&#xff0c;特此记录。 一、镜像管理 从公有仓库拉取镜像 &#xff08;对于使用苹果电脑 M1/M2/M3 芯片的 …

Clickhouse填坑记4:Too many parts问题分析

Clickhouse在进行大数据量同步时,感觉很爽,插入速度非常快,但是,在使用过程中却出现了几次“Too many parts”异常报错,搞得很痛苦,这里记录一下解决过程。 我这边采用的是Flink程序,实时将数据写入ClickHouse,在执行一段时间后,会提示“Too many parts”异常,如下异…