Parameter             Value
Learning rate         3e−5
Batch size            16
Training epochs       6
Max sequence length   128
Optimizer             Adam
Dropout rate          0.1
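
For reference, the settings listed above could be gathered into a single configuration object so that a training script reads them from one place. The following is only a minimal sketch: the class name `TrainingConfig` and all field names are hypothetical and do not come from the source, and no particular training framework is assumed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed above."""

    learning_rate: float = 3e-5   # Learning rate
    batch_size: int = 16          # Batch size
    num_epochs: int = 6           # Training epochs
    max_seq_length: int = 128     # Max sequence length
    optimizer: str = "Adam"       # Optimizer name; how it is instantiated is left open
    dropout_rate: float = 0.1     # Dropout rate


config = TrainingConfig()

# Print each setting on its own line, e.g. for logging at the start of a run.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Marking the dataclass as frozen prevents a value from being changed by accident partway through a run; a variant of the configuration can still be created explicitly with `dataclasses.replace`.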