# Provable Nonconvex Methods/Algorithms

General nonconvex optimization is undoubtedly hard — in sharp contrast to convex optimization, of which there is good separation of problem structure, input data, and optimization algorithms. But many nonconvex problems of interest become amenable to simple and practical algorithms and rigorous analyses once the artificial separation is removed. This page collects recent research effort in this line. (**Update: Mar 14 2019**)

[**S**] indicates my contribution.

[**New**] A BibTex file for papers listed on the page can be downloaded HERE!

## Contents

- Review Articles
- Problems with Hidden Convexity or Analytic Solutions
- Problems with Provable Global Results
- Matrix Completion/Sensing
- Tensor Recovery/Decomposition & Hidden Variable Models
- Phase Retrieval
- Dictionary Learning
- Deep Learning
- Sparse Vectors in Linear Subspaces
- Nonnegative/Sparse Principal Component Analysis
- Mixed Linear Regression
- Blind Deconvolution/Calibration
- Super Resolution
- Synchronization Problems/Community Detection
- Joint Alignment
- Numerical Linear Algebra
- Bayesian Inference
- Empirical Risk Minimization & Shallow Networks
- System Identification
- Burer-Monteiro Style Decomposition Algorithms
- Generic Structured Problems
- Nonconvex Feasibility Problems

- Of Statistical Nature …
- Relevant Optimization Methods, Theory, Miscs

## Review Articles

- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview (2018)
- Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation (2018)
- Non-convex Optimization for Machine Learning (2017)

## Problems with Hidden Convexity or Analytic Solutions

- These slides summarize lots of them.

### Blind Deconvolution

### Separable Nonnegative Matrix Factorization (NMF)

- Intersecting Faces: Non-negative Matrix Factorization With New Guarantees (2015)
- The why and how of nonnegative matrix factorization (2014)
- Computing a Nonnegative Matrix Factorization – Provably (2011)

## Problems with Provable Global Results

### Matrix Completion/Sensing

(See also low-rank matrix/tensor recovery )

- Noisy Matrix Completion: Understanding Statistical Guarantees for Convex Relaxation via Nonconvex Optimization (2019)
- Nonconvex Rectangular Matrix Completion via Gradient Descent without $\ell_{2, \infty}$ Regularization (2019)
- Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery (2019)
- Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Robust Principal Component Analysis (2018)
- An equivalence between stationary points for rank constraints versus low-rank factorizations (2018)
- Iterative Hard Thresholding for Low-Rank Recovery from Rank-One Projections (2018)
- Nonconvex Robust Low-rank Matrix Recovery (2018)
- Run Procrustes, Run! On the convergence of accelerated Procrustes Flow (2018)
- Solving Systems of Quadratic Equations via Exponential-type Gradient Descent Algorithm (2018)
- How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery? (2018)
- The Leave-one-out Approach for Matrix Completion: Primal and Dual Analysis (2018)
- Nonconvex Matrix Factorization from Rank-One Measurements (2018)
- Algorithmic Regularization in Over-parameterized Matrix Recovery (2017)
- Accelerated Alternating Projections for Robust Principal Component Analysis (2017)
- Provable quantum state tomography via non-convex methods (2017)
- Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima (2017)
- Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent (2017)
- Robust Principal Component Analysis by Manifold Optimization (2017)
- Spectral Compressed Sensing via Projected Gradient Descent (2017)
- Reexamining Low Rank Matrix Factorization for Trace Norm Regularization (2017)
- A Well-Tempered Landscape for Non-convex Robust Subspace Recovery (2017)
- Optimal Sample Complexity for Matrix Completion and Related Problems via $\ell_2$-Regularization (2017)
- Geometry of Factored Nuclear Norm Regularization (2017)
- No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis (2017)
- Painless Breakups – Efficient Demixing of Low Rank Matrices (2017)
- Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations (2017)
- Global Optimality in Low-rank Matrix Optimization (2017)
- A Nonconvex Free Lunch for Low-Rank plus Sparse Matrix Recovery (2017)
- Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization (2016)
- Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach (2016)
- Nearly-optimal Robust Matrix Completion (2016)
- Provable non-convex projected gradient descent for a class of constrained matrix optimization problems (2016)
- Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably (2016)
- Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent (2016)
- Fast Algorithms for Robust PCA via Gradient Descent (2016)
- Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent (2016)
- Matrix Completion has No Spurious Local Minimum (2016)
- Global Optimality of Local Search for Low Rank Matrix Recovery (2016)
- Guarantees of Riemannian Optimization for Low Rank Matrix Completion (2016)
- A Note on Alternating Minimization Algorithm for the Matrix Completion Problem (2016)
- Recovery guarantee of weighted low-rank approximation via alternating minimization (2016)
- Efficient Matrix Sensing Using Rank-1 Gaussian Measurements (2015)
- Guarantees of Riemannian Optimization for Low Rank Matrix Recovery (2015)
- Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees (2015)
- Low-rank Solutions of Linear Matrix Equations via Procrustes Flow (2015)
- A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements (2015)
- Guaranteed Matrix Completion via Non-convex Factorization (2014)
- Fast Exact Matrix Completion with Finite Samples (2014)
- Non-convex Robust PCA (2014)
- Fast Matrix Completion without the Condition Number (2014)
- Understanding Alternating Minimization for Matrix Completion (2013)
- Low-rank Matrix Completion using Alternating Minimization (2012)
- Matrix Completion from a Few Entries (2009)
- Guaranteed Rank Minimization via Singular Value Projection (2009)

### Tensor Recovery/Decomposition & Hidden Variable Models

- Smoothed Analysis in Unsupervised Learning via Decoupling (2018)
- Recovery Guarantees for Quadratic Tensors with Limited Observations (2018)
- Algorithmic thresholds for tensor PCA (2018)
- Guaranteed Simultaneous Asymmetric Tensor Decomposition via Orthogonalized Alternating Least Squares (2018)
- A theory on the absence of spurious optimality (2018)
- Sparse and Low-rank Tensor Estimation via Cubic Sketchings (2018)
- The landscape of the spiked tensor model (2017)
- Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries (2017)
- On the Optimization Landscape of Tensor Decompositions (2017)
- Tensor SVD: Statistical and Computational Limits (2017)
- Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use (2017)
- On Polynomial Time Methods for Exact Low Rank Tensor Completion (2017)
- Homotopy Method for Tensor PCA (2016)
- Low-tubal-rank Tensor Completion using Alternating Minimization (2016)
- Speeding up sum-of-squares for tensor decomposition and planted sparse vectors (2015)
- Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations (2015)
- Analyzing Tensor Power Method Dynamics: Applications to Learning Overcomplete Latent Variable Models (2014)
- Tensor decompositions for learning latent variable models (2014)
- Provable Learning of Overcomplete Latent Variable Models: Semi-supervised and Unsupervised Settings (2014)
- Provable Tensor Factorization with Missing Data (2014)
- Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates (2014)

### Phase Retrieval

- Spectral Method for Phase Retrieval: an Expectation Propagation Perspective (2019)
- Rigorous Analysis of Spectral Methods for Random Orthogonal Matrices (2019)
- Solving Complex Quadratic Systems with Full-Rank Random Matrices (2019)
- A Generalization of Wirtinger Flow for Exact Interferometric Inversion (2019)
- Phase Retrieval by Alternating Minimization with Random Initialization (2018)
- Optimal Spectral Initialization for Signal Recovery with Applications to Phase Retrieval (2018)
- Towards the optimal construction of a loss function without spurious local minima for solving quadratic equations (2018)
- Solving systems of phaseless equations via Riemannian optimization with optimal sampling complexity (2018)
- Linear Spectral Estimators and an Application to Phase Retrieval (2018)
- Approximate Message Passing for Amplitude Based Optimization (2018)
- Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval (2018)
- Optimization-based AMP for Phase Retrieval: The Impact of Initialization and $\ell_2$-regularization (2018)
- Compressive Phase Retrieval of Structured Signal (2017)
- Misspecified Nonconvex Statistical Optimization for Phase Retrieval (2017)
- Compressive Phase Retrieval via Reweighted Amplitude Flow (2017)
- A Local Analysis of Block Coordinate Descent for Gaussian Phase Retrieval ([
**S**], 2017) - Convolutional Phase Retrieval via Gradient Descent (2017)
- Linear Convergence of An Iterative Phase Retrieval Algorithm with Data Reuse (2017)
- The nonsmooth landscape of phase retrieval (2017)
- Phase Retrieval via Linear Programming: Fundamental Limits and Algorithmic Improvements (2017)
- Convergence of the randomized Kaczmarz method for phase retrieval (2017)
- Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees (2017)
- Phase retrieval using alternating minimization in a batch setting (2017)
- Solving Almost all Systems of Random Quadratic Equations (2017)
- Phase Retrieval Using Structured Sparsity: A Sample Efficient Algorithmic Framework (2017)
- Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval (2017)
- Robust Wirtinger Flow for Phase Retrieval with Arbitrary Corruption (2017)
- Phase Retrieval via Sparse Wirtinger Flow (2017)
- Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization (2017)
- Sparse Phase Retrieval via Truncated Amplitude Flow (2016)
- Solving Large-scale Systems of Random Quadratic Equations via Stochastic Truncated Amplitude Flow (2016)
- Low Rank Phase Retrieval (2016)
- Phase retrieval with random Gaussian sensing vectors by alternating projections (2016)
- Non-Convex Phase Retrieval from STFT Measurements (2016)
- Gauss-Newton Method for Phase Retrieval (2016)
- Phase Retrieval via Incremental Truncated Wirtinger Flow (2016)
- Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow (2016)
- Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations (2016)
- Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow (2016)
- A Geometric Analysis of Phase Retrieval ([
**S**], 2016) - Phase retrieval for wavelet transforms (2015)
- The Local Convexity of Solving Quadratic Equations (2015)
- Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study (2015)
- Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems (2015)
- Phase Retrieval via Wirtinger Flow: Theory and Algorithms (2014)
- Phase Retrieval using Alternating Minimization (2013)

### Dictionary Learning

(See also **Theory** part in Dictionary/Deep Learning)

- Subgradient Descent Learns Orthogonal Dictionaries ([
**S**], 2018) - Efficient Dictionary Learning with Gradient Descent (2018)
- Towards Learning Sparsely Used Dictionaries with Arbitrary Supports (2018)
- A Provable Approach for Double-Sparse Coding (2017)
- Alternating minimization for dictionary learning with random initialization (2017)
- Complete Dictionary Recovery over the Sphere ([
**S**], 2015) - Simple, Efficient, and Neural Algorithms for Sparse Coding (2015)
- More Algorithms for Provable Dictionary Learning (2014)
- Exact Recovery of Sparsely Used Overcomplete Dictionaries (2013)
- New Algorithms for Learning Incoherent and Overcomplete Dictionaries (2013)
- Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization (2013)

### Deep Learning

- Mean Field Analysis of Deep Neural Networks (2019)
- Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions (2019)
- A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks (2019)
- Can SGD Learn Recurrent Neural Networks with Provable Generalization? (2019)
- Width Provably Matters in Optimization for Deep Linear Neural Networks (2019)
- Elimination of All Bad Local Minima in Deep Learning (2019)
- Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations (2018)
- Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks (2018)
- Effect of Depth and Width on Local Minima in Deep Learning (2018)
- A Convergence Theory for Deep Learning via Over-Parameterization (2018)
- Gradient Descent Finds Global Minima of Deep Neural Networks (2018)
- Depth with Nonlinearity Creates No Bad Local Minima in ResNets (2018)
- A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks (2018)
- Gradient descent aligns the layers of deep linear networks (2018)
- Adding One Neuron Can Eliminate All Bad Local Minima (2018)
- End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition (2018)
- Understanding the Loss Surface of Neural Networks for Binary Classification (2018)
- Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks (2018)
- Deep linear neural networks with arbitrary loss: All local minima are global (2017)
- Global optimality conditions for deep neural networks (2017)
- Depth Creates No Bad Local Minima (2017)
- Electron-Proton Dynamics in Deep Learning (2017)
- Deep Learning without Poor Local Minima (2016)
- No bad local minima: Data independent training error guarantees for multilayer neural networks (2016)

### Sparse Vectors in Linear Subspaces

(See Structured Element Pursuit )

### Nonnegative/Sparse Principal Component Analysis

- Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates (2016)
- Non-negative Principal Component Analysis: Message Passing Algorithms and Sharp Asymptotics (2014)

### Mixed Linear Regression

- Iteratively Learning from the Best (2018)
- Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression (2018)
- Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization (2016)
- Provable Tensor Methods for Learning Mixtures of Classifiers (2014)
- Alternating Minimization for Mixed Linear Regression (2013)

### Blind Deconvolution/Calibration

- On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution (2019)
- Composite optimization for robust blind deconvolution (2019)
- Geometry and Symmetry in Short-and-Sparse Deconvolution (2019)
- Nonconvex Demixing From Bilinear Measurements (2018)
- Structured Local Optima in Sparse Blind Deconvolution (2018)
- Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere (2018)
- Blind Gain and Phase Calibration via Sparse Spectral Methods (2017)
- Blind Deconvolution by a Steepest Descent Algorithm on a Quotient Manifold (2017)
- Regularized Gradient Descent: A Nonconvex Recipe for Fast Joint Blind Deconvolution and Demixing (2017)
- Self-Calibration via Linear Least Squares (2016)
- Through the Haze: A Non-Convex Approach to Blind Calibration for Linear Random Sensing Models (2016)
- Fast and guaranteed blind multichannel deconvolution under a bilinear system model (2016)
- Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization (2016)
- A Non-Convex Blind Calibration Method for Randomised Sensing Strategies (2016)
- RIP-like Properties in Subsampled Blind Deconvolution (2015)
- Blind Recovery of Sparse Signals from Subsampled Convolution (2015)
- Near Optimal Compressed Sensing of Sparse Rank-One Matrices via Sparse Power Factorization (2013)

### Super Resolution

- The basins of attraction of the global minimizers of the non-convex sparse spikes estimation problem (2018)
- Greed is Super: A Fast Algorithm for Super-Resolution (2015)

### Synchronization Problems/Community Detection

- Multi-Frequency Phase Synchronization (2019)
- On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization (2018)
- Near-optimal bounds for phase synchronization (2017)
- Message-passing algorithms for synchronization problems over compact groups (2016)
- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Nonconvex phase synchronization (2016)

### Joint Alignment

- The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences (2016)

### Numerical Linear Algebra

- On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization (2018)
- PCA by Optimisation of Symmetric Functions has no Spurious Local Optima (2018)
- PCA by Determinant Optimization has no Spurious Local Optima (2018)
- Orbital minimization method with $\ell_1$ regularization (2017)
- The Global Optimization Geometry of Nonsymmetric Matrix Factorization and Sensing (2017)
- On the matrix square root via geometric optimization (2015)
- Computing Matrix Squareroot via Non Convex Local Search (2015)

### Bayesian Inference

### Empirical Risk Minimization & Shallow Networks

- Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network (2019)
- Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit (2019)
- Towards moderate overparameterization: global convergence guarantees for training shallow neural networks (2019)
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks (2019)
- A Deterministic Approach to Avoid Saddle Points (2019)
- Fitting ReLUs via SGD and Quantized SGD (2019)
- Analysis of a Two-Layer Neural Network via Displacement Convexity (2019)
- Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? (2018)
- A Provably Convergent Scheme for Compressive Sensing under Random Generative Priors (2018)
- Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (2018)
- Learning Two Layer Rectified Neural Networks in Polynomial Time (2018)
- Uniform Convergence of Gradients for Non-Convex Learning and Optimization (2018)
- On the Convergence Rate of Training Recurrent Neural Networks (2018)
- Learning Two-layer Neural Networks with Symmetric Inputs (2018)
- Algorithmic Aspects of Inverse Problems Using Generative Models (2018)
- ReLU Regression: Complexity, Exact and Approximation Algorithms (2018)
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018)
- Stochastic Gradient Descent Learns State Equations with Nonlinear Activations (2018)
- Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization (2018)
- Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data (2018)
- Learning ReLU Networks via Alternating Minimization (2018)
- Learning One-hidden-layer ReLU Networks via Gradient Descent (2018)
- Deep Denoising: Rate-Optimal Recovery of Structured Signals with a Deep Prior (2018)
- Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps (2018)
- The Global Optimization Geometry of Shallow Linear Neural Networks (2018)
- Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks (2018)
- A Mean Field View of the Landscape of Two-Layers Neural Networks (2018)
- Representing smooth functions as compositions of near-identity functions with implications for deep network optimization (2018)
- SUNLayer: Stable denoising with generative networks (2018)
- Representation Learning and Recovery in the ReLU Model (2018)
- Learning Deep Models: Critical Points and Local Openness (2018)
- On the Power of Over-parametrization in Neural Networks with Quadratic Activation (2018)
- On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition (2018)
- Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression (2018)
- A Critical View of Global Optimality in Deep Learning (2018)
- The Multilinear Structure of ReLU Networks (2017)
- Spurious Local Minima are Common in Two-Layer ReLU Neural Networks (2017)
- Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima (2017)
- Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels (2017)
- Learning One-hidden-layer Neural Networks with Landscape Design (2017)
- Theoretical properties of the global optimizer of two layer neural network (2017)
- Critical Points of Neural Networks: Analytical Forms and Landscape Properties (2017)
- SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data (2017)
- Porcupine Neural Networks: (Almost) All Local Optima are Global (2017)
- When is a Convolutional Filter Easy To Learn? (2017)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017)
- Recovery Guarantees for One-hidden-layer Neural Networks (2017)
- Convergence Analysis of Two-layer Neural Networks with ReLU Activation (2017)
- Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk (2017)
- Learning ReLUs via Gradient Descent (2017)
- Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (2017)
- Gradient Descent Learns Linear Dynamical Systems (2016)
- The Landscape of Empirical Risk for Non-convex Losses (2016)

### System Identification

### Burer-Monteiro Style Decomposition Algorithms

- Rank optimality for the Burer-Monteiro factorization (2018)
- Smoothed analysis of the low-rank approach for smooth semidefinite programs (2018)
- Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs (2018)
- Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form (2018)
- Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality (2017)
- The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions (2016)
- The non-convex Burer-Monteiro approach works on smooth semidefinite programs (2016)
- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems (2014)

### Generic Structured Problems

- Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers (2018)
- Convex Optimization with Nonconvex Oracles (2017)

### Nonconvex Feasibility Problems

- Finding magic squares with the Douglas-Rachford algorithm (2019)
- A convergent relaxation of the Douglas-Rachford algorithm (2017)
- A Lyapunov-type approach to convergence of the Douglas-Rachford algorithm (2017)
- Feasibility Problems: Douglas-Rachford and Projection Methods (Project Page)
- Dynamics of the Douglas-Rachford Method for Ellipses and p-Spheres (2016)
- A remark on the convergence of the Douglas-Rachford iteration in a non-convex setting (2016)
- On the finite convergence of the Douglas-Rachford algorithm for solving (not necessarily convex) feasibility problems in Euclidean spaces (2016)
- Global Behavior of the Douglas-Rachford Method for a Nonconvex Feasibility Problem (2015)
- The Douglas–Rachford Algorithm in the Absence of Convexity (2011)

## Of Statistical Nature …

- Sparse Tensorraphical Model: Non-convex Optimization and Statistical Inference (2016)
- Statistical and Computational Guarantees for the Baum-Welch Algorithm (2015)
- Provable Sparse Tensor Decomposition (2015)
- Statistical consistency and asymptotic normality for high-dimensional robust M-estimators (2015)
- Support recovery without incoherence: A case for nonconvex regularizations (2014)
- High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality (2014)
- Statistical guarantees for the EM algorithm: From population to sample-based analysis (2014)
- Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time (2014)
- Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local ptima (2013)
- High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity (2011)

## Relevant Optimization Methods, Theory, Miscs

- Limited-Memory BFGS with Displacement Aggregation (2019)
- Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs (2019)
- SGD without Replacement: Sharper Rates for General Smooth Convex Functions (2019)
- A Stochastic Trust Region Method for Non-convex Minimization (2019)
- Escaping Saddle Points with the Successive Convex Approximation Algorithm (2019)
- Inertial Block Mirror Descent Method for Non-Convex Non-Smooth Optimization (2019)
- Minimization of nonsmooth nonconvex functions using inexact evaluations and its worst-case complexity (2019)
- High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms (2019)
- Analysis of the alternating direction method of multipliers for nonconvex problems (2019)
- Stochastic Proximal Gradient Methods for Non-smooth Non-Convex Regularized Problems (2019)
- Stochastic Gradient Descent Escapes Saddle Points Efficiently (2019)
- Adaptive stochastic gradient algorithms on Riemannian manifolds (2019)
- Inexact restoration with subsampled trust-region methods for finite-sum minimization (2019)
- Exponentiated Gradient Meets Gradient Descent (2019)
- A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning (2019)
- Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization (2019)
- Sharp Analysis for Nonconvex SGD Escaping from Saddle Points (2019)
- Passed \& Spurious: analysing descent algorithms and local minima in spiked matrix-tensor model (2019)
- Stochastic Recursive Variance-Reduced Cubic Regularization Methods (2019)
- Lower Bounds for Smooth Nonconvex Finite-Sum Optimization (2019)
- Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions (2019)
- Escaping Saddle Points with Adaptive Gradient Methods (2019)
- Simple algorithms for optimization on Riemannian manifolds with constraints (2019)
- On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Optimization (2019)
- Fast Gradient Methods for Symmetric Nonnegative Matrix Factorization (2019)
- An accelerated variant of simulated annealing that converges under fast cooling (2019)
- Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group (2019)
- A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach (2019)
- Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization (2019)
- DTN: A Learning Rate Scheme with Convergence Rate of $O(1/t)$ for SGD (2019)
- Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization (2019)
- Primal-dual proximal splitting and generalized conjugation in non-smooth non-convex optimization (2019)
- A Proximal Alternating Direction Method of Multiplier for Linearly Constrained Nonconvex Minimization (2018)
- Semi-Riemannian Manifold Optimization (2018)
- Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization (2018)
- Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition (2018)
- A Doubly Accelerated Inexact Proximal Point Method for Nonconvex Composite Optimization Problems (2018)
- Convergence Analysis of the Relaxed Douglas-Rachford Algorithm (2018)
- Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization (2018)
- Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence (2018)
- Markov Chain Block Coordinate Descent (2018)
- Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds (2018)
- Universal regularization methods - varying the power, the smoothness and the accuracy (2018)
- Sampling Can Be Faster Than Optimization (2018)
- Blind Over-the-Air Computation and Data Fusion via Provable Wirtinger Flow (2018)
- Deterministic and stochastic inexact regularization algorithms for nonconvex optimization with optimal complexity (2018)
- R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate (2018)
- Proximal Gradient Method for Manifold Optimization (2018)
- Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints (2018)
- Inexact alternating projections on nonconvex sets (2018)
- On exponential convergence of SGD in non-convex over-parametrized learning (2018)
- A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints (2018)
- Provably Correct Automatic Subdifferentiation for Qualified Programs (2018)
- Global Non-convex Optimization with Discretized Diffusions (2018)
- Benefits of over-parameterization with EM (2018)
- Newton method for finding a singularity of a special class of locally Lipschitz continuous vector fields on Riemannian manifolds (2018)
- Inexact Newton method with feasible inexact projections for solving constrained smooth and nonsmooth equations (2018)
- Solving Weakly-Convex-Weakly-Concave Saddle-Point Problems as Weakly-Monotone Variational Inequality (2018)
- SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization (2018)
- Uniform Graphical Convergence of Subgradients in Nonconvex Optimization and Learning (2018)
- A Subsampling Line-Search Method with Second-Order Results (2018)
- Optimization on Spheres: Models and Proximal Algorithms with Computational Performance Comparisons (2018)
- Analytical Convergence Regions of Accelerated First-Order Methods in Nonconvex Optimization under Regularity Condition (2018)
- Cubic Regularization with Momentum for Nonconvex Optimization (2018)
- Parallelizable Algorithms for Optimization Problems with Orthogonality Constraints (2018)
- Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018)
- Riemannian Adaptive Optimization Methods (2018)
- Newton-MR: Newton’s Method Without Smoothness or Convexity (2018)
- Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization (2018)
- Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning (2018)
- Hessian barrier algorithms for linearly constrained optimization problems (2018)
- Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient (2018)
- An Inexact First-order Method for Constrained Nonlinear Optimization (2018)
- Survey: Sixty Years of Douglas–Rachford (2018)
- On the Stability and Convergence of Stochastic Gradient Descent with Momentum (2018)
- Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates (2018)
- Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals (2018)
- Escaping Saddle Points in Constrained Optimization (2018)
- Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems (2018)
- Secondary gradient descent in higher codimension (2018)
- On Markov Chain Gradient Descent (2018)
- Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration (2018)
- Structured Quasi-Newton Methods for Optimization with Orthogonality Constraints (2018)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization (2018)
- Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information (2018)
- Iteration-Complexity of the Subgradient Method on Riemannian Manifolds with Lower Bounded Curvature (2018)
- Convergence of Cubic Regularization for Nonconvex Optimization under KL Property (2018)
- A Note on Inexact Condition for Cubic Regularized Newton’s Method (2018)
- Discrete gradient descent differs qualitatively from gradient flow (2018)
- Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems (2018)
- On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization (2018)
- A linear-time algorithm for generalized trust region problems (2018)
- A geometric integration approach to nonsmooth, nonconvex optimisation (2018)
- Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs (2018)
- SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator (2018)
- Geodesic Convex Optimization: Differentiation on Manifolds, Geodesics, and Convexity (2018)
- On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization (2018)
- AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization (2018)
- Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced (2018)
- Gradient Method for Optimization on Riemannian Manifolds with Lower Bounded Curvature (2018)
- Towards Riemannian Accelerated Gradient Methods (2018)
- Adaptive regularization with cubics on manifolds with a first-order analysis (2018)
- Minimizing Nonconvex Population Risk from Rough Empirical Risk (2018)
- On the Connection Between Sequential Quadratic Programming and Riemannian Gradient Methods (2018)
- Cutting plane methods can be extended into nonconvex optimization (2018)
- Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization (2018)
- Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time (2018)
- A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method (2018)
- A Cubic Regularized Newton’s Method over Riemannian Manifolds (2018)
- Accelerated Stochastic Algorithms for Nonconvex Finite-sum and Multi-block Optimization (2018)
- Local Saddle Point Optimization: A Curvature Exploitation Approach (2018)
- Gradient Sampling Methods for Nonsmooth Optimization (2018)
- Stochastic subgradient method converges on tame functions (2018)
- An Envelope for Davis-Yin Splitting and Strict Saddle Point Avoidance (2018)
- Convergence guarantees for a class of non-convex and non-smooth optimization problems (2018)
- On the spherical quasi-convexity of quadratic functions (2018)
- Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing (2018)
- Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates (2018)
- A Riemannian BFGS Method Without Differentiated Retraction for Nonconvex Optimization Problems (2018)
- Nonconvex weak sharp minima on Riemannian manifolds (2018)
- A Newton-CG Algorithm with Complexity Guarantees for Smooth Unconstrained Optimization (2018)
- Continuous Relaxation of MAP Inference: A Nonconvex Perspective (2018)
- On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions (2018)
- ADMM for Multiaffine Constrained Optimization (2018)
- Averaging Stochastic Gradient Descent on Riemannian Manifolds (2018)
- Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization (2018)
- Concise Complexity Analyses for Trust-Region Methods (2018)
- Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions (2018)
- On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization (2018)
- Characterizing Implicit Bias in Terms of Optimization Geometry (2018)
- Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability (2018)
- Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys (2018)
- Sample Complexity of Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization (2018)
- An Alternative View: When Does SGD Escape Local Minima? (2018)
- Toward Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations (2018)
- Convergence Analysis of Alternating Nonconvex Projections (2018)
- Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry (2018)
- Exact Semidefinite Formulations for a Class of (Random and Non-Random) Nonconvex Quadratic Programs (2018)
- How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization (2018)
- On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition (2018)
- The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates (2018)
- Nonconvex Lagrangian-Based Optimization: Monitoring Schemes and Global Convergence (2018)
- Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima (2017)
- Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently (2017)
- NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization (2017)
- Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent (2017)
- Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution (2017)
- Neon2: Finding Local Minima via First-Order Oracles (2017)
- Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers (2017)
- Revisiting Normalized Gradient Descent: Evasion of Saddle Points (2017)
- Stochastic Cubic Regularization for Fast Nonconvex Optimization (2017)
- Convex Optimization with Nonconvex Oracles (2017)
- First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time (2017)
- On local non-global minimizers of quadratic functions with cubic regularization (2017)
- Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean (2017)
- Lower Bounds for Higher-Order Convex Optimization (2017)
- Lower Bounds for Finding Stationary Points II: First-Order Methods (2017)
- Lower Bounds for Finding Stationary Points I (2017)
- Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence (2017)
- Block Coordinate Descent Only Converge to Minimizers (2017)
- First-order Methods Almost Always Avoid Saddle Points (2017)
- Accelerated Block Coordinate Proximal Gradients with Applications in High Dimensional Statistics (2017)
- The power of sum-of-squares for detecting hidden structures (2017)
- Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis (2017)
- On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization (2017)
- Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization (2017)
- Douglas-Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions (2017)
- Local Minimizers and Second-Order Conditions in Composite Piecewise Programming via Directional Derivatives (2017)
- Alternating minimization and alternating descent over nonconvex sets (2017)
- Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study (2017)
- Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information (2017)
- Non-convex Conditional Gradient Sliding (2017)
- Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2017)
- An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $O(\varepsilon^{-3/2})$ for Nonconvex Optimization (2017)
- A Second Order Method for Nonconvex Optimization (2017)
- Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems (2017)
- Mirror descent in non-convex stochastic programming (2017)
- Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization (2017)
- Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations (2017)
- An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization (2017)
- Using Negative Curvature in Solving Nonlinear Programs (2017)
- Gradient Descent Can Take Exponential Time to Escape Saddle Points (2017)
- Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization (2017)
- Sub-sampled Cubic Regularization for Non-convex Optimization (2017)
- Iteration-Complexity of Gradient, Subgradient and Proximal Point Methods on Riemannian Manifolds (2017)
- Linearized ADMM for Non-convex Non-smooth Optimization with Convergence Analysis (2017)
- “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions (2017)
- Accelerating Stochastic Gradient Descent (2017)
- On the Gap Between Strict-Saddles and True Convexity: An $\Omega(\log d)$ Lower Bound for Eigenvector Approximation (2017)
- Catalyst Acceleration for Gradient-Based Non-Convex Optimization (2017)
- Perspective: Energy Landscapes for Machine Learning (2017)
- Gradient descent with nonconvex constraints: local concavity determines convergence (2017)
- Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis (2017)
- How to Escape Saddle Points Efficiently (2017)
- Convergence rate of a simulated annealing algorithm with noisy observations (2017)
- Exploiting Negative Curvature in Deterministic and Stochastic Optimization (2017)
- Online Multiview Representation Learning: Dropping Convexity for Better Efficiency (2017)
- Natasha: Faster Stochastic Non-Convex Optimization via Strongly Non-Convex Parameter (2017)
- Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation (2017)
- Maximum likelihood estimation of determinantal point processes (2017)
- Fast Rates for Empirical Risk Minimization of Strict Saddle Problems (2017)
- Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step (2016)
- The Power of Normalization: Faster Evasion of Saddle Points (2016)
- Accelerated Methods for Non-Convex Optimization (2016)
- Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2016)
- Finding Local Minima for Nonconvex Optimization in Linear Time (2016)
- Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization (2016)
- Global analysis of Expectation Maximization for mixtures of two Gaussians (2016)
- Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences (2016)
- Local Convergence of the Heavy-ball Method and iPiano for Non-convex Optimization (2016)
- Global rates of convergence for nonconvex optimization on manifolds (2016)
- Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points (2016)
- On the Douglas-Rachford algorithm (2016)
- A Grothendieck-type inequality for local maxima (2016)
- First-order Methods for Geodesically Convex Optimization (2016)
- Efficient approaches for escaping higher order saddle points in non-convex optimization (2016)
- Gradient Descent Converges to Minimizers (2016)
- When Are Nonconvex Problems Not Scary? ([
**S**], 2015; see also my thesis) - On the Global Optimality for Linear Constrained Rank Minimization Problem (2015)
- Global Convergence of ADMM in Nonconvex Nonsmooth Optimization (2015)
- Dropping Convexity for Faster Semi-definite Optimization (2015)
- Local Linear Convergence of the ADMM/Douglas–Rachford Algorithms without Strong Convexity and Application to Statistical Imaging (2015)
- Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions (2015)
- Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition (2015)
- Peaceman-Rachford splitting for a class of nonconvex optimization problems (2015)
- Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems (2014)
- Global convergence of splitting methods for nonconvex composite optimization (2014)
- Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming (2013)
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems (2013)
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods (2013)
- Optimality conditions for the nonlinear programming problems on Riemannian manifolds (2012)
- Second-order negative-curvature methods for box-constrained and general constrained optimization (2010)
- Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality (2008)
- Cubic regularization of Newton method and its global performance (2006)
- Computing the Local-Nonglobal Minimizer of a Large Scale Trust-Region Subproblem (2005)
- On Some Properties of Quadratic Programs with a Convex Quadratic Constraint (1998)
- Local minimizers of quadratic functions on Euclidean balls and spheres (1994)

Disclaimer- This page is meant to serve a hub for references on this problem, and does not represent in any way personal endorsement of papers listed here. So I do not hold any responsibility for quality and technical correctness of each paper listed here. The reader is advised to use this resource with discretion.

If you’d like your paper to be listed here- Just drop me a few lines via email (which can be found on “Welcome” page). If you don’t bother to spend a word, just deposit your paper on arXiv. I get email alert about new animals there every morning, and will be happy to hunt one for this zoo if it seemsfit.

Special thanks to: Damek Davis, Wotao Yin, Vladislav Voroninski, David Martinez