Method | Backbone | Frames × Crops × Clips | FLOPs | Params | Pre-train | Top-1 val (%) | Top-5 val (%) |
TSN | ResNet50 | 8 × 1 × 1 | 33G × 1 × 1 | N/A | ImageNet | 19.70 | 46.60 |
TRN | BNInception | 8 × 10 × N/A | 16G × 10 × N/A | 18.3M | ImageNet | 34.40 | 63.20 |
TRN Two-Stream | BNInception | (8 + 8) × 10 × N/A | 32G × 10 × N/A | 36.6M | ImageNet | 42.00 | / |
I3D | 3D ResNet50 | 32 × 3 × 2 | 153G × 3 × 2 | 28.0M | ImageNet + Kinetics400 | 41.60 | 72.20 |
NL-I3D | 3D ResNet50 | 32 × 3 × 2 | 168G × 3 × 2 | 35.3M | ImageNet + Kinetics400 | 44.40 | 76.00 |
NL-I3D&GCN | 3D ResNet50 | 32 × 3 × 2 | 303G × 3 × 2 | 62.2M | ImageNet + Kinetics400 | 46.10 | 76.80 |