Remus
Active Member
You don’t know the solution, but you do know which direction it’s in (with a small amount of noise from sampling). SGD works by finding that direction, moving a small amount in that direction, and then recalculating that direction. “Stochastic Gradient Descent” is literally exactly what it says.
On a multidimensional surface, SGD is essentially a greedy algorithm where all you can "see" is the tip of your nose. I'd argue the way FSD picks edge cases to work on likely follows the same principle: randomly identify many scenarios and gather data on all of them at the same time. The scenario that collects the most cases in a fixed time frame plays the role of the "gradient", pointing to the direction you should go next.
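For anyone who wants to see the "find the direction, take a small step, recalculate" loop concretely, here is a minimal sketch in plain Python. It is a toy only: it fits a single slope to noisy points, and the function name `sgd_fit_slope`, the learning rate, the batch size, and the synthetic data are all my own assumptions for illustration. It says nothing about how FSD actually selects edge cases.

```python
import random


def sgd_fit_slope(xs, ys, lr=0.1, steps=500, batch_size=8, seed=0):
    """Fit y ~ w * x by stochastic gradient descent on squared error.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = random.Random(seed)
    w = 0.0  # start with no idea where the solution is
    n = len(xs)
    for _ in range(steps):
        # Look only at a small random sample: the "tip of your nose".
        batch = [rng.randrange(n) for _ in range(batch_size)]
        # Noisy estimate of d/dw of the mean squared error.
        grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        # Move a small amount downhill, then recalculate on the next pass.
        w -= lr * grad
    return w


# Synthetic data (assumed for the demo): true slope 3.0 plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]

w = sgd_fit_slope(xs, ys)
print(f"estimated slope: {w:.3f}")
```

Each step only ever uses eight random points, so any single gradient is a bit off, but the errors average out over many small steps and the estimate ends up close to the true slope. That is the greedy, short-sighted behaviour described above.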