Indeed. Except that you're not doing gradient descent to train the neural net itself, but rather to choose an optimal neural net architecture. Meta-training.
Most of my work with optimization has been through the scipy.optimize libraries, which offer quite a selection of tools - although more recently I've needed to do it in C++ to avoid having to maintain Python bindings. The scipy libraries are also annoyingly deficient in A) threading, and B) the ability to resume - although you can work around (A) by doing the threading inside your cost evaluation function, and (B) by caching and saving the results of every cost evaluation, so that if you have to start over, it can just look up the answers up to the point where you left off.
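The workaround for (B) can be sketched roughly like this - a minimal example where every cost evaluation is written to a JSON file, so a restarted run replays cached answers up to where it left off. The file name, the rounding precision for keys, and the quadratic stand-in cost are all illustrative assumptions, not anyone's actual setup:

```python
import json
import os

import numpy as np
from scipy.optimize import minimize

CACHE_FILE = "cost_cache.json"  # hypothetical cache path


def load_cache():
    # On a restart, reload everything evaluated so far.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}


cache = load_cache()


def expensive_cost(x):
    # Stand-in for an expensive evaluation (e.g. a CFD run);
    # here just a quadratic with its minimum at [1, 1].
    return float(np.sum((np.asarray(x) - 1.0) ** 2))


def cached_cost(x):
    # Round the coordinates so the same point hashes to the same key
    # across runs, then persist after every new evaluation.
    key = json.dumps([round(float(v), 12) for v in np.atleast_1d(x)])
    if key not in cache:
        cache[key] = expensive_cost(x)
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
    return cache[key]


result = minimize(cached_cost, x0=[0.0, 0.0], method="Nelder-Mead")
print(result.x)
```

Since a deterministic optimizer given the same starting point requests the same sequence of evaluation points, rerunning this after an interruption is nearly free until it reaches new territory. The one subtlety is the key rounding: without it, floating-point noise can make a replayed point miss the cache.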
Most of my previous optimization projects have been in CFD model optimization, although I've also used it for things like compression. My most recent project is an attempt to use differential evolution of fluids with an Arrhenius equation database to try to evolve a hypercycle.