Parameter             Value
Learning rate         3e-5
Batch size            16
Training epochs       6
Max sequence length   128
Optimizer             Adam
Dropout rate          0.1
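As a minimal sketch, the hyperparameters listed above could be collected into a single configuration object so that a training script reads them from one place. The class name `TrainingConfig`, its field names, and the `steps_per_epoch` helper are hypothetical and chosen only for illustration; the source gives the values but does not say how they are stored or consumed. The helper also assumes that a final partial batch is kept rather than dropped, which the source does not specify.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameter values in the table."""
    learning_rate: float = 3e-5
    batch_size: int = 16
    num_epochs: int = 6
    max_seq_length: int = 128
    optimizer: str = "Adam"
    dropout_rate: float = 0.1

    def steps_per_epoch(self, num_examples: int) -> int:
        # Ceiling division: assumes a final partial batch still counts
        # as one optimizer step (an assumption, not stated in the source).
        return -(-num_examples // self.batch_size)


config = TrainingConfig()

# Print the settings, e.g. for logging at the start of a run.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

For a hypothetical dataset of 1,000 training examples, `config.steps_per_epoch(1000)` gives the number of optimizer steps per epoch under the partial-batch assumption; multiplying by `config.num_epochs` gives the total step count, which is the figure a learning-rate schedule would typically need.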