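| Source | Model | Accuracy | F1 score | Text modality | Visual modality | Speech modality |
|---|---|---|---|---|---|---|
| [51] | MHSAN | 78.70% | / | word2vec | 3D-CNN | openSMILE |
| [52] | Multilogue-Net | 81.19% | 80.10% | CNN | 3D-CNN | openSMILE |
| [53] | HFFN | 80.19% | 80.34% | GloVe | FACET | COVAREP |
| [54] | MMMU-BA | 82.31% | / | word2vec | 3D-CNN | openSMILE |
| [55] | MARNN | 84.31% | / | word2vec | 3D-CNN | openSMILE |
| [56] | MulT | 83% | 82.80% | GloVe | FACET | COVAREP |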