Hyperparameters

Learning rate: 5e-6, 1e-5, 2e-5, 3e-5

Warmup rate: 0.06, 0.1

Dropout rate: 0.1

Batch size: 16, 32, 64, 128
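
The candidate values listed above can be enumerated programmatically. The following is a minimal sketch that assumes the values form a full grid search; the source only lists the candidates and does not say how they were combined. The dictionary keys and the `iter_configs` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Candidate values copied from the list above.
# The key names are illustrative, not taken from the source.
search_space = {
    "learning_rate": [5e-6, 1e-5, 2e-5, 3e-5],
    "warmup_rate": [0.06, 0.1],
    "dropout_rate": [0.1],
    "batch_size": [16, 32, 64, 128],
}


def iter_configs(space):
    """Yield one dict per combination of candidate values (full grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 4 * 2 * 1 * 4 = 32 combinations
```

Under this assumption the grid has 32 configurations. If the original search tuned each hyperparameter separately or sampled at random, the number of runs would differ.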