From the research papers I have read, the generally agreed-upon optimizer is SGD (with or without momentum).
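For reference, here is a minimal sketch of what "SGD with or without momentum" means as an update rule. It assumes the common formulation where a velocity accumulates gradients and the parameters move against it; setting `momentum=0.0` gives plain SGD. The names `sgd_step` and `minimize_quadratic` are made up for illustration, and the toy objective uses an exact gradient rather than a minibatch estimate, so this is not the setup from any particular paper.

```python
def sgd_step(params, grads, velocity, lr=0.1, momentum=0.0):
    """Apply one SGD update; momentum=0.0 reduces to plain SGD."""
    # Velocity is a decaying sum of past gradients.
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    # Parameters move against the velocity, scaled by the learning rate.
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimize_quadratic(momentum, steps=100, lr=0.1):
    """Minimize the toy objective f(w) = sum(w_i ** 2) from a fixed start."""
    params = [5.0, -3.0]
    velocity = [0.0, 0.0]
    for _ in range(steps):
        # Exact gradient of f is 2 * w; real training would use a
        # noisy minibatch gradient here instead.
        grads = [2.0 * p for p in params]
        params, velocity = sgd_step(params, grads, velocity, lr, momentum)
    return params


if __name__ == "__main__":
    print("plain SGD:    ", minimize_quadratic(momentum=0.0))
    print("with momentum:", minimize_quadratic(momentum=0.9))
```

Both variants drive the parameters toward zero on this toy problem; how they compare on a real model depends on the learning rate, the momentum coefficient, and the noise in the gradients.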