Parameter | Value | Parameter | Value |
Transformer layers | 12 | Max_seq_length | 128 |
Hidden_dim | 768 | Optimizer | Adam |
Learning_rate | $10^{-5}$ | LSTM_size | 128 |
Batch_size | 16 | Dropout_rate | 0.5 |
Gradient_clip | 0.5 | Epoch | 16 |
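
For reproducibility, the settings in the table could be gathered into a single configuration object. The following is a minimal Python sketch; the class name `TrainConfig` and its field names are illustrative assumptions, not identifiers from the original work. Only the values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameter values copied from the table; all names are hypothetical."""

    transformer_layers: int = 12   # number of Transformer layers
    hidden_dim: int = 768          # hidden size of the Transformer encoder
    learning_rate: float = 1e-5    # 10^-5
    batch_size: int = 16
    gradient_clip: float = 0.5     # the table does not say whether this clips by norm or by value
    max_seq_length: int = 128      # maximum input sequence length
    optimizer: str = "Adam"
    lstm_size: int = 128           # LSTM hidden size
    dropout_rate: float = 0.5
    epochs: int = 16


config = TrainConfig()
print(asdict(config))
```

Freezing the dataclass keeps the values from being changed by accident during a run, and `asdict` gives a plain dictionary that can be logged alongside the results.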