| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Transformer layers | 12 | Max_seq_length | 128 |
| Hidden_dim | 768 | Optimizer | Adam |
| Learning_rate | $10^{-5}$ | LSTM_size | 128 |
| Batch_size | 16 | Dropout_rate | 0.5 |
| Gradient_clip | 0.5 | Epoch | 16 |
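
As a minimal sketch, the settings in the table could be collected into a single configuration object so they can be passed around a training script. The class name `TrainConfig` and its field names are hypothetical, chosen only to mirror the table rows; the table does not say whether gradient clipping is applied by norm or by value, so the field records just the threshold.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; names are illustrative."""

    # Model dimensions
    num_transformer_layers: int = 12
    hidden_dim: int = 768
    max_seq_length: int = 128
    lstm_size: int = 128

    # Optimization settings
    optimizer: str = "Adam"
    learning_rate: float = 1e-5
    batch_size: int = 16
    gradient_clip: float = 0.5  # threshold only; clipping mode is not stated
    dropout_rate: float = 0.5
    epochs: int = 16


config = TrainConfig()

# Print each setting on its own line for a quick sanity check.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Marking the dataclass as frozen keeps the values from being changed by accident during a run; a different setting can be tried with `dataclasses.replace(config, batch_size=32)`.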