| Hyperparameter | Values |
| --- | --- |
| Learning rate | 5e-6, 1e-5, 2e-5, 3e-5 |
| Warmup rate | 0.06, 0.1 |
| Dropout rate | 0.1 |
| Batch size | 16, 32, 64, 128 |
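
The search space above can be enumerated programmatically. The following is a minimal sketch that assumes the listed values form a grid that is searched exhaustively, which the source does not state. The dictionary keys and the `iter_configs` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Values copied from the table above. The key names are illustrative
# and are not taken from the source.
SEARCH_SPACE = {
    "learning_rate": [5e-6, 1e-5, 2e-5, 3e-5],
    "warmup_rate": [0.06, 0.1],
    "dropout_rate": [0.1],
    "batch_size": [16, 32, 64, 128],
}


def iter_configs(space):
    """Yield one dict per combination of values (full Cartesian product)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 4 * 2 * 1 * 4 = 32 combinations
```

Under this assumption, a full sweep covers 32 configurations. If the values were instead tuned one hyperparameter at a time, the number of runs would be smaller.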