We develop and analyze a variant of Nesterov's accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function were convex, or we produce a certificate that the function is "guilty" of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function f with Lipschitz continuous gradient and Hessian, we compute a point x with ‖∇f(x)‖ ≤ ε in O(ε^{-7/4} log(1/ε)) gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only O(ε^{-5/3} log(1/ε)) evaluations.
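To illustrate the "convex until proven guilty" mechanism described above, here is a minimal sketch: a standard Nesterov AGD loop that, at each step, tests a first-order convexity inequality between consecutive iterates and returns the witnessing pair of points as a non-convexity certificate if the test fails. The function names (`guarded_agd_sketch`, `f`, `grad`), the step size 1/L, and the specific inequality tested are illustrative assumptions, not the paper's exact procedure or constants.

```python
import numpy as np


def guarded_agd_sketch(f, grad, x0, L, eps, max_iters=10_000):
    """Nesterov AGD with a per-step convexity test (illustrative sketch).

    If f were convex, f(u) >= f(v) + <grad(v), u - v> would hold for every
    pair of points; a violation along the iterate path certifies
    non-convexity, and the witnessing pair can then be used to exploit
    negative curvature.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(max_iters):
        g = grad(y)
        if np.linalg.norm(g) <= eps:
            return "stationary", y               # approximate stationary point found
        x_next = y - g / L                        # gradient step from the momentum point
        # Convexity test between y and x_next; the small tolerance guards
        # against harmless floating-point noise.
        if f(x_next) < f(y) + g @ (x_next - y) - 1e-12:
            return "certificate", (y, x_next)     # pair of points witnessing non-convexity
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + (t - 1.0) / t_next * (x_next - x)  # Nesterov extrapolation
        x, t = x_next, t_next
    return "budget_exhausted", x
```

For example, on the convex quadratic f(x) = x² the loop behaves like ordinary AGD and returns "stationary", while on f(x) = -x² the very first test fails and a certificate pair is returned.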