# Provable Nonconvex Methods/Algorithms

General nonconvex optimization is undoubtedly hard – in sharp contrast to convex optimization, of which there is good separation of problem structure, input data, and optimization algorithms. But many nonconvex problems of interest become amenable to simple and practical algorithms and rigorous analyses once the artificial separation is removed. This page collects recent research effort in this line. (**Update: Dec 14 2017**)

[**S**] indicates my contribution.

## Problems with Hidden Convexity or Analytic Solutions

- These slides summarize lots of them.

### Blind Deconvolution

### Separable Nonnegative Matrix Factorization (NMF)

- Intersecting Faces: Non-negative Matrix Factorization With New Guarantees (2015)
- The why and how of nonnegative matrix factorization (2014)
- Computing a Nonnegative Matrix Factorization – Provably (2011)

## Problems with Provable Global Results

### Matrix Completion/Sensing

(See also low-rank matrix/tensor recovery )

- Accelerated Alternating Projections for Robust Principal Component Analysis (2017)
- Provable quantum state tomography via non-convex methods (2017)
- Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima (2017)
- Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent (2017)
- Robust Principal Component Analysis by Manifold Optimization (2017)
- Spectral Compressed Sensing via Projected Gradient Descent (2017)
- Reexamining Low Rank Matrix Factorization for Trace Norm Regularization (2017)
- A Well-Tempered Landscape for Non-convex Robust Subspace Recovery (2017)
- Optimal Sample Complexity for Matrix Completion and Related Problems via \(\ell_2\)-Regularization (2017)
- Geometry of Factored Nuclear Norm Regularization (2017)
- No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis (2017)
- Painless Breakups – Efficient Demixing of Low Rank Matrices (2017)
- Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations (2017)
- Global Optimality in Low-rank Matrix Optimization (2017)
- A Nonconvex Free Lunch for Low-Rank plus Sparse Matrix Recovery (2017)
- Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization (2016)
- Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach (2016)
- Nearly-optimal Robust Matrix Completion (2016)
- Provable non-convex projected gradient descent for a class of constrained matrix optimization problems (2016)
- Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably (2016)
- Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent (2016)
- Fast Algorithms for Robust PCA via Gradient Descent (2016)
- Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent (2016)
- Matrix Completion has No Spurious Local Minimum (2016)
- Global Optimality of Local Search for Low Rank Matrix Recovery (2016)
- Guarantees of Riemannian Optimization for Low Rank Matrix Completion (2016)
- A Note on Alternating Minimization Algorithm for the Matrix Completion Problem (2016)
- Recovery guarantee of weighted low-rank approximation via alternating minimization (2016)
- Efficient Matrix Sensing Using Rank-1 Gaussian Measurements (2015)
- Guarantees of Riemannian Optimization for Low Rank Matrix Recovery (2015)
- Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees (2015)
- Low-rank Solutions of Linear Matrix Equations via Procrustes Flow (2015)

- A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements (2015)
- Guaranteed Matrix Completion via Non-convex Factorization (2014)
- Fast Exact Matrix Completion with Finite Samples (2014)

- Non-convex Robust PCA (2014)

- Fast Matrix Completion without the Condition Number (2014)

- Understanding Alternating Minimization for Matrix Completion (2013)

- Low-rank Matrix Completion using Alternating Minimization (2012)

- Matrix Completion from a Few Entries (2009)

- Guaranteed Rank Minimization via Singular Value Projection (2009)

### Tensor Recovery/Decomposition & Hidden Variable Models

- The landscape of the spiked tensor model (2017)
- Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries (2017)
- On the Optimization Landscape of Tensor Decompositions (2017)
- Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use (2017)
- On Polynomial Time Methods for Exact Low Rank Tensor Completion (2017)
- Homotopy Method for Tensor PCA (2016)
- Low-tubal-rank Tensor Completion using Alternating Minimization (2016)
- Speeding up sum-of-squares for tensor decomposition and planted sparse vectors (2015)
- Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations (2015)
- Analyzing Tensor Power Method Dynamics: Applications to Learning Overcomplete Latent Variable Models (2014)
- Tensor decompositions for learning latent variable models (2014)
- Provable Learning of Overcomplete Latent Variable Models: Semi-supervised and Unsupervised Settings (2014)
- Provable Tensor Factorization with Missing Data (2014)
- Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates (2014)

### Phase Retrieval

- Compressive Phase Retrieval via Reweighted Amplitude Flow (2017)
- A Local Analysis of Block Coordinate Descent for Gaussian Phase Retrieval ([
**S**], 2017) - Convolutional Phase Retrieval via Gradient Descent (2017)
- Linear Convergence of An Iterative Phase Retrieval Algorithm with Data Reuse (2017)
- The nonsmooth landscape of phase retrieval (2017)
- Phase Retrieval via Linear Programming: Fundamental Limits and Algorithmic Improvements (2017)
- Convergence of the randomized Kaczmarz method for phase retrieval (2017)
- Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees (2017)
- Phase retrieval using alternating minimization in a batch setting (2017)
- Solving Almost all Systems of Random Quadratic Equations (2017)
- Phase Retrieval Using Structured Sparsity: A Sample Efficient Algorithmic Framework (2017)
- Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval (2017)
- Robust Wirtinger Flow for Phase Retrieval with Arbitrary Corruption (2017)
- Phase Retrieval via Sparse Wirtinger Flow (2017)
- Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization (2017)
- Sparse Phase Retrieval via Truncated Amplitude Flow (2016)
- Solving Large-scale Systems of Random Quadratic Equations via Stochastic Truncated Amplitude Flow (2016)
- Low Rank Phase Retrieval (2016)
- Phase retrieval with random Gaussian sensing vectors by alternating projections (2016)
- Non-Convex Phase Retrieval from STFT Measurements (2016)
- Gauss-Newton Method for Phase Retrieval (2016)
- Phase Retrieval via Incremental Truncated Wirtinger Flow (2016)
- Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow (2016)
- Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations (2016)
- Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow (2016)
- A Geometric Analysis of Phase Retrieval ([
**S**], 2016) - Phase retrieval for wavelet transforms (2015)
- The Local Convexity of Solving Quadratic Equations (2015)
- Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study (2015)
- Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems (2015)

- Phase Retrieval via Wirtinger Flow: Theory and Algorithms (2014)
- Phase Retrieval using Alternating Minimization (2013)

### Dictionary Learning

(See also **Theory** part in Dictionary/Deep Learning)

- A Provable Approach for Double-Sparse Coding (2017)
- Alternating minimization for dictionary learning with random initialization (2017)
- Complete Dictionary Recovery over the Sphere ([
**S**], 2015)

- Simple, Efficient, and Neural Algorithms for Sparse Coding (2015)
- More Algorithms for Provable Dictionary Learning (2014)
- Exact Recovery of Sparsely Used Overcomplete Dictionaries (2013)
- New Algorithms for Learning Incoherent and Overcomplete Dictionaries (2013)
- Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization (2013)

### Deep Learning

- Deep linear neural networks with arbitrary loss: All local minima are global (2017)
- Global optimality conditions for deep neural networks (2017)
- Depth Creates No Bad Local Minima (2017)
- Electron-Proton Dynamics in Deep Learning (2017)
- Deep Learning without Poor Local Minima (2016)
- No bad local minima: Data independent training error guarantees for multilayer neural networks (2016)

### Sparse Vectors in Linear Subspaces

(See Structured Element Pursuit )

### Nonnegative/Sparse Principal Component Analysis

- Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates (2016)
- Non-negative Principal Component Analysis: Message Passing Algorithms and Sharp Asymptotics (2014)

### Mixed Linear Regression

- Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization (2016)
- Provable Tensor Methods for Learning Mixtures of Classifiers (2014)
- Alternating Minimization for Mixed Linear Regression (2013)

### Blind Deconvolution/Calibration

- Blind Gain and Phase Calibration via Sparse Spectral Methods (2017)
- Blind Deconvolution by a Steepest Descent Algorithm on a Quotient Manifold (2017)
- Regularized Gradient Descent: A Nonconvex Recipe for Fast Joint Blind Deconvolution and Demixing (2017)
- Self-Calibration via Linear Least Squares (2016)
- Through the Haze: A Non-Convex Approach to Blind Calibration for Linear Random Sensing Models (2016)
- Fast and guaranteed blind multichannel deconvolution under a bilinear system model (2016)
- Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization (2016)
- A Non-Convex Blind Calibration Method for Randomised Sensing Strategies (2016)
- RIP-like Properties in Subsampled Blind Deconvolution (2015)
- Blind Recovery of Sparse Signals from Subsampled Convolution (2015)
- Near Optimal Compressed Sensing of Sparse Rank-One Matrices via Sparse Power Factorization (2013)

### Super Resolution

### Synchronization Problems/Community Detection

- Near-optimal bounds for phase synchronization (2017)
- Message-passing algorithms for synchronization problems over compact groups (2016)
- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Nonconvex phase synchronization (2016)

### Joint Alignment

- The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences (2016)

### Numerical Linear Algebra

- Orbital minimization method with \(\ell_1\) regularization (2017)
- The Global Optimization Geometry of Nonsymmetric Matrix Factorization and Sensing (2017)
- On the matrix square root via geometric optimization (2015)
- Computing Matrix Squareroot via Non Convex Local Search (2015)

### Bayesian Inference

### Empirical Risk Minimization & Shallow Networks

- Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima (2017)
- Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels (2017)
- Learning One-hidden-layer Neural Networks with Landscape Design (2017)
- Theoretical properties of the global optimizer of two layer neural network (2017)
- Critical Points of Neural Networks: Analytical Forms and Landscape Properties (2017)
- SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data (2017)
- Porcupine Neural Networks: (Almost) All Local Optima are Global (2017)
- When is a Convolutional Filter Easy To Learn? (2017)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017)
- Recovery Guarantees for One-hidden-layer Neural Networks (2017)
- Convergence Analysis of Two-layer Neural Networks with ReLU Activation (2017)
- Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk (2017)
- Learning ReLUs via Gradient Descent (2017)
- Gradient Descent Learns Linear Dynamical Systems (2016)
- The Landscape of Empirical Risk for Non-convex Losses (2016)

### System Identification

### Burer-Monteiro Style Decomposition Algorithms

- Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality (2017)
- The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions (2016)
- The non-convex Burer-Monteiro approach works on smooth semidefinite programs (2016)

- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems (2014)

### Nonconvex Feasibility Problems

- A convergent relaxation of the Douglas-Rachford algorithm (2017)
- A Lyapunov-type approach to convergence of the Douglas-Rachford algorithm (2017)
- Feasibility Problems: Douglas-Rachford and Projection Methods (Project Page)
- Dynamics of the Douglas-Rachford Method for Ellipses and p-Spheres (2016)
- A remark on the convergence of the Douglas-Rachford iteration in a non-convex setting (2016)
- On the finite convergence of the Douglas-Rachford algorithm for solving (not necessarily convex) feasibility problems in Euclidean spaces (2016)
- Global Behavior of the Douglas-Rachford Method for a Nonconvex Feasibility Problem (2015)
- The Douglas–Rachford Algorithm in the Absence of Convexity (2011)

## Of Statistical Nature …

- Sparse Tensor Graphical Model: Non-convex Optimization and Statistical Inference (2016)
- Statistical and Computational Guarantees for the Baum-Welch Algorithm (2015)
- Provable Sparse Tensor Decomposition (2015)
- Statistical consistency and asymptotic normality for high-dimensional robust M-estimators (2015)
- Support recovery without incoherence: A case for nonconvex regularizations (2014)
- High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality (2014)
- Statistical guarantees for the EM algorithm: From population to sample-based analysis (2014)
- Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time (2014)
- Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima (2013)
- High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity (2011)

## Relevant Tools/Local Results

- Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently (2017)
- NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization (2017)
- Convex Optimization with Nonconvex Oracles (2017)
- Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent (2017)
- Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution (2017)
- Neon2: Finding Local Minima via First-Order Oracles (2017)
- Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers (2017)
- Revisiting Normalized Gradient Descent: Evasion of Saddle Points (2017)
- Stochastic Cubic Regularization for Fast Nonconvex Optimization (2017)
- Convex Optimization with Nonconvex Oracles (2017)
- First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time (2017)
- On local non-global minimizers of quadratic functions with cubic regularization (2017)
- Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean (2017)
- Lower Bounds for Higher-Order Convex Optimization (2017)
- Lower Bounds for Finding Stationary Points II: First-Order Methods (2017)

- Lower Bounds for Finding Stationary Points I (2017)
- Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence (2017)
- Block Coordinate Descent Only Converge to Minimizers (2017)
- First-order Methods Almost Always Avoid Saddle Points (2017)
- Accelerated Block Coordinate Proximal Gradients with Applications in High Dimensional Statistics (2017)
- The power of sum-of-squares for detecting hidden structures (2017)
- Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis (2017)
- On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization (2017)
- Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization (2017)
- Douglas-Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions (2017)
- Local Minimizers and Second-Order Conditions in Composite Piecewise Programming via Directional Derivatives (2017)
- Alternating minimization and alternating descent over nonconvex sets (2017)
- Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study (2017)
- Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information (2017)
- Non-convex Conditional Gradient Sliding (2017)
- Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2017)
- An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of \(O(\varepsilon^{-3/2})\) for Nonconvex Optimization (2017)
- A Second Order Method for Nonconvex Optimization (2017)
- Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems (2017)
- Mirror descent in non-convex stochastic programming (2017)
- Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization (2017)
- Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations (2017)
- An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization (2017)
- Using Negative Curvature in Solving Nonlinear Programs (2017)
- Gradient Descent Can Take Exponential Time to Escape Saddle Points (2017)
- Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization (2017)

- Sub-sampled Cubic Regularization for Non-convex Optimization (2017)
- Linearized ADMM for Non-convex Non-smooth Optimization with Convergence Analysis (2017)
- “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions (2017)
- Accelerating Stochastic Gradient Descent (2017)
- On the Gap Between Strict-Saddles and True Convexity: An \(\Omega(\log d)\) Lower Bound for Eigenvector Approximation (2017)
- Catalyst Acceleration for Gradient-Based Non-Convex Optimization (2017)
- Perspective: Energy Landscapes for Machine Learning (2017)
- Gradient descent with nonconvex constraints: local concavity determines convergence (2017)
- Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis (2017)
- How to Escape Saddle Points Efficiently (2017)
- Convergence rate of a simulated annealing algorithm with noisy observations (2017)
- Exploiting Negative Curvature in Deterministic and Stochastic Optimization (2017)
- Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (2017)
- Online Multiview Representation Learning: Dropping Convexity for Better Efficiency (2017)
- Natasha: Faster Stochastic Non-Convex Optimization via Strongly Non-Convex Parameter (2017)
- Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation (2017)
- Maximum likelihood estimation of determinantal point processes (2017)
- Fast Rates for Empirical Risk Minimization of Strict Saddle Problems (2017)
- Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step (2016)
- The Power of Normalization: Faster Evasion of Saddle Points (2016)
- Accelerated Methods for Non-Convex Optimization (2016)
- Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2016)
- Finding Local Minima for Nonconvex Optimization in Linear Time (2016)
- Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization (2016)
- Global analysis of Expectation Maximization for mixtures of two Gaussians (2016)
- Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences (2016)
- Local Convergence of the Heavy-ball Method and iPiano for Non-convex Optimization (2016)
- Global rates of convergence for nonconvex optimization on manifolds (2016)
- Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points (2016)
- On the Douglas-Rachford algorithm (2016)
- A Grothendieck-type inequality for local maxima (2016)
- First-order Methods for Geodesically Convex Optimization (2016)
- Efficient approaches for escaping higher order saddle points in non-convex optimization (2016)
- Gradient Descent Converges to Minimizers (2016)
- When Are Nonconvex Problems Not Scary? ([
**S**], 2015; see also my thesis) - On the Global Optimality for Linear Constrained Rank Minimization Problem (2015)
- Global Convergence of ADMM in Nonconvex Nonsmooth Optimization (2015)
- Dropping Convexity for Faster Semi-definite Optimization (2015)
- Local Linear Convergence of the ADMM/Douglas–Rachford Algorithms without Strong Convexity and Application to Statistical Imaging (2015)
- Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions (2015)
- Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition (2015)
- Peaceman-Rachford splitting for a class of nonconvex optimization problems (2015)
- Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems (2014)
- Global convergence of splitting methods for nonconvex composite optimization (2014)
- Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming (2013)
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems (2013)
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods (2013)
- Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality (2008)
- Cubic regularization of Newton method and its global performance (2006)
- Computing the Local-Nonglobal Minimizer of a Large Scale Trust-Region Subproblem (2005)
- On Some Properties of Quadratic Programs with a Convex Quadratic Constraint (1998)
- Local minimizers of quadratic functions on Euclidean balls and spheres (1994)

Disclaimer- This page is meant to serve a hub for references on this problem, and does not represent in any way personal endorsement of papers listed here. So I do not hold any responsibility for quality and technical correctness of each paper listed here. The reader is advised to use this resource with discretion.

If you’d like your paper to be listed here- Just drop me a few lines via email (which can be found on “Welcome” page). If you don’t bother to spend a word, just deposit your paper on arXiv. I get email alert about new animals there every morning, and will be happy to hunt one for this zoo if it seemsfit.