Initial learning rate	Learning rate policy	Learning rate gamma	Weight decay	Momentum	Mini-batch size
0.002	Inverse	10^-4	10^-4	0.9	16
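
To illustrate how the learning-rate entries in the table interact, here is a minimal sketch of an inverse-decay schedule. It assumes that "Inverse" means the inverse-decay policy as commonly defined (for example Caffe's `inv` policy), lr = base_lr * (1 + gamma * iter)^(-power). The table gives no `power` exponent, so the value 0.75 below is a hypothetical placeholder, and the function name `inverse_lr` is illustrative only.

```python
def inverse_lr(iteration, base_lr=0.002, gamma=1e-4, power=0.75):
    """Learning rate at a given iteration under an inverse-decay policy.

    base_lr and gamma are taken from the table; power is NOT given in
    the table and 0.75 is an assumed placeholder value.
    """
    return base_lr * (1.0 + gamma * iteration) ** (-power)


if __name__ == "__main__":
    # Print the schedule at a few iterations to show the slow decay.
    for it in (0, 1_000, 10_000, 100_000):
        print(f"iteration {it:>7}: lr = {inverse_lr(it):.6f}")
```

Under these assumptions the rate starts at 0.002 and decays smoothly rather than in steps; a different `power` changes how quickly it falls but not the starting value.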