Adaptive energy-based gradient methods for large-scale optimization and data-driven discovery of dynamical systems via neural networks