| Training setting | Value |
| --- | --- |
| Batch size | 20 |
| Hidden size | 1500 |
| Num steps | 40 |
| Init scale | 0.05 |
| Max grad norm | 10 |
| LR decay start epoch | 16 |
| Max epoch | 60 |
| LR decay | 0.65 |
| Dropout | 1.1 |
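The sketch below shows one way these settings could be wired into a training loop. Several parts are assumptions rather than facts from the table. The initial learning rate `base_lr = 1.0` is hypothetical because the table does not give one. The schedule assumes the rate stays constant through the decay-start epoch and is then multiplied by `lr_decay` once per epoch, a common pattern in PTB-style LSTM language-model setups. "Max grad norm" is interpreted as clipping by global L2 norm. The names `TrainConfig`, `learning_rate` and `clip_by_global_norm` are illustrative only, and dropout is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 20
    hidden_size: int = 1500
    num_steps: int = 40          # unrolled sequence length per batch
    init_scale: float = 0.05     # e.g. weights drawn from U(-0.05, 0.05)
    max_grad_norm: float = 10.0
    start_decay_epoch: int = 16
    max_epoch: int = 60
    lr_decay: float = 0.65
    base_lr: float = 1.0         # hypothetical: initial LR is not in the table


def learning_rate(cfg: TrainConfig, epoch: int) -> float:
    """LR for a 1-indexed epoch (assumed schedule).

    Constant up to and including start_decay_epoch, then multiplied
    by lr_decay once for every epoch after that.
    """
    if not 1 <= epoch <= cfg.max_epoch:
        raise ValueError(f"epoch must be in [1, {cfg.max_epoch}], got {epoch}")
    return cfg.base_lr * cfg.lr_decay ** max(epoch - cfg.start_decay_epoch, 0)


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradients together so their global L2 norm is at most max_norm."""
    global_norm = math.sqrt(sum(x * x for g in grads for x in g))
    if global_norm <= max_norm:
        return [list(g) for g in grads]  # already within bounds: return a copy
    scale = max_norm / global_norm
    return [[x * scale for x in g] for g in grads]


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (1, 16, 17, 18, 60):
        print(epoch, learning_rate(cfg, epoch))
```

Under these assumptions the rate is 1.0 for epochs 1 to 16, 0.65 at epoch 17 and 0.65² at epoch 18. With a decay of 0.65 applied for 44 epochs, the rate becomes very small well before epoch 60. Clipping by global norm, rather than clipping each tensor separately, keeps the direction of the overall update unchanged.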