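Task | Model | Structure | #Params (M) | FLOPs (G) | Top-1 Acc (%) |
Image classification | ResNet-101 [3] | CNN | 45 | 7.9 | 79.8 |
Image classification | ViT-B [20] | Transformer | 86.6 | 17.6 | 77.9 |
Image classification | PVT-S [25] | Architectural reference | 24.5 | 3.8 | 79.8 |
Image classification | CSWin-S [29] | Architectural reference | 35 | 6.9 | 83.6 |
Image classification | HRT-B [30] | Architectural reference | 50.3 | 13.7 | 82.8 |
Image classification | DeiT-B [35] | Knowledge distillation | 86 | 17.5 | 81.8 |
Image classification | CoAtNet-0 [37] | Serial concatenation | 25 | 4.2 | 81.6 |
Image classification | ConTNet-B [40] | Serial concatenation | 39.6 | 6.4 | 81.8 |
Image classification | MobileViT-S [41] | Serial concatenation | 5.6 | - | 78.4 |
Image classification | MobileViTV2-2.0 [42] | Serial concatenation | 10.6 | 4 | 80.4 |
Image classification | ConFormer-S [43] | Parallel concatenation | 37.7 | 10.6 | 83.4 |
Image classification | Mobile-Former-508M [44] | Parallel concatenation | 14 | 0.508 | 79.3 |
Image classification | ViTc-1GF [47] | Embedding block replacement | 17.8 | 4 | 79.1 |
Image classification | CCT [48] | Embedding block replacement | 22.36 | 11.06 | 80.67 |
Image classification | LocalViT-S [49] | Feed-forward layer replacement | 22.4 | 4.6 | 80.8 |
Image classification | ConViT-S [50] | Self-attention layer replacement | 27 | 5.4 | 81.3 |
Image classification | BoTNet-S1-50 [51] | Self-attention layer replacement | 20.8 | 8.54 | 84.7 |
Image classification | LeViT-384 [52] | Architectural reference + local replacement | 39.1 | 2.353 | 82.6 |
Image classification | CvT-21 [53] | Architectural reference + local replacement | 32 | 7.1 | 82.5 |
Image classification | CeiT-S [54] | Embedding block replacement + feed-forward layer replacement | 24.2 | 4.5 | 82 |
Image classification | EdgeViTs-S [56] | Architectural reference + local replacement | 11.1 | 1.9 | 81 |
Image classification | CMT-S [57] | Architectural reference + local replacement | 25.1 | 4 | 83.5 |
Image classification | ParC-Net-S [58] | Architectural reference + local replacement | 5 | 3.5 | 78.6 |
Image classification | Next-ViT-S [60] | Local replacement + novel hybrid strategy | 31.7 | 5.8 | 82.5 |
Image classification | EdgeNeXt-S [61] | Architectural reference + local replacement | 5.6 | 1.93 | 78.8 |
Image classification | Swin-ACmix-S [62] | Novel hybrid module | 51 | 9 | 83.5 |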