# Provable Nonconvex Methods/Algorithms

General nonconvex optimization is undoubtedly hard, in sharp contrast to convex optimization, where problem structure, input data, and optimization algorithms can be cleanly separated. But many nonconvex problems of interest become amenable to simple, practical algorithms and rigorous analysis once this artificial separation is removed. This page collects recent research efforts along this line. (**Update: Mar 26 2020**)

[**S**] indicates my contribution.

[**New**] A BibTeX file for the papers listed on this page can be downloaded HERE!

## Contents

- Review Articles
- Problems with Hidden Convexity or Analytic Solutions
- Problems with Provable Global Results
- Matrix Completion/Sensing
- Tensor Recovery/Decomposition & Hidden Variable Models
- Phase Retrieval
- Dictionary Learning
- Deep Learning
- Sparse Vectors in Linear Subspaces
- Nonnegative/Sparse Principal Component Analysis
- Mixed Linear Regression
- Blind Deconvolution/Calibration
- Super Resolution
- Synchronization Problems/Community Detection
- Joint Alignment
- Numerical Linear Algebra
- Bayesian Inference
- Empirical Risk Minimization & Shallow Networks
- System Identification
- Burer-Monteiro Style Decomposition Algorithms
- Generic Structured Problems
- Nonconvex Feasibility Problems

- Of Statistical Nature …
- Relevant Optimization Methods, Theory, Miscs

## Review Articles

- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview (2018)
- Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation (2018)
- Non-convex Optimization for Machine Learning (2017)

## Problems with Hidden Convexity or Analytic Solutions

- These slides summarize lots of them.

### Blind Deconvolution

### Separable Nonnegative Matrix Factorization (NMF)

- Intersecting Faces: Non-negative Matrix Factorization With New Guarantees (2015)
- The why and how of nonnegative matrix factorization (2014)
- Computing a Nonnegative Matrix Factorization – Provably (2011)

## Problems with Provable Global Results

### Matrix Completion/Sensing

(See also low-rank matrix/tensor recovery)

- The Global Geometry of Centralized and Distributed Low-rank Matrix Recovery without Regularization (2020)
- The Landscape of Matrix Factorization Revisited (2020)
- Iterative algorithm with structured diagonal Hessian approximation for solving nonlinear least squares problems (2020)
- Rank $2r$ iterative least squares: efficient recovery of ill-conditioned low rank matrices from few entries (2020)
- Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data (2020)
- Adversarially Robust Low Dimensional Representations (2019)
- Fast and Robust Spectrally Sparse Signal Recovery: A Provable Non-Convex Approach via Robust Low-Rank Hankel Matrix Reconstruction (2019)
- Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence (2019)
- Robust Subspace Recovery with Adversarial Outliers (2019)
- Noisy Matrix Completion: Understanding Statistical Guarantees for Convex Relaxation via Nonconvex Optimization (2019)
- Nonconvex Rectangular Matrix Completion via Gradient Descent without $\ell_{2, \infty}$ Regularization (2019)
- Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery (2019)
- Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Robust Principal Component Analysis (2018)
- An equivalence between stationary points for rank constraints versus low-rank factorizations (2018)
- Iterative Hard Thresholding for Low-Rank Recovery from Rank-One Projections (2018)
- Nonconvex Robust Low-rank Matrix Recovery (2018)
- Run Procrustes, Run! On the convergence of accelerated Procrustes Flow (2018)
- Solving Systems of Quadratic Equations via Exponential-type Gradient Descent Algorithm (2018)
- How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery? (2018)
- The Leave-one-out Approach for Matrix Completion: Primal and Dual Analysis (2018)
- Nonconvex Matrix Factorization from Rank-One Measurements (2018)
- Algorithmic Regularization in Over-parameterized Matrix Recovery (2017)
- Accelerated Alternating Projections for Robust Principal Component Analysis (2017)
- Provable quantum state tomography via non-convex methods (2017)
- Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima (2017)
- Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent (2017)
- Robust Principal Component Analysis by Manifold Optimization (2017)
- Spectral Compressed Sensing via Projected Gradient Descent (2017)
- Reexamining Low Rank Matrix Factorization for Trace Norm Regularization (2017)
- A Well-Tempered Landscape for Non-convex Robust Subspace Recovery (2017)
- Optimal Sample Complexity for Matrix Completion and Related Problems via $\ell_2$-Regularization (2017)
- Geometry of Factored Nuclear Norm Regularization (2017)
- No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis (2017)
- Painless Breakups – Efficient Demixing of Low Rank Matrices (2017)
- Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations (2017)
- Global Optimality in Low-rank Matrix Optimization (2017)
- A Nonconvex Free Lunch for Low-Rank plus Sparse Matrix Recovery (2017)
- Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization (2016)
- Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach (2016)
- Nearly-optimal Robust Matrix Completion (2016)
- Provable non-convex projected gradient descent for a class of constrained matrix optimization problems (2016)
- Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably (2016)
- Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent (2016)
- Fast Algorithms for Robust PCA via Gradient Descent (2016)
- Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent (2016)
- Matrix Completion has No Spurious Local Minimum (2016)
- Global Optimality of Local Search for Low Rank Matrix Recovery (2016)
- Guarantees of Riemannian Optimization for Low Rank Matrix Completion (2016)
- A Note on Alternating Minimization Algorithm for the Matrix Completion Problem (2016)
- Recovery guarantee of weighted low-rank approximation via alternating minimization (2016)
- Efficient Matrix Sensing Using Rank-1 Gaussian Measurements (2015)
- Guarantees of Riemannian Optimization for Low Rank Matrix Recovery (2015)
- Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees (2015)
- Low-rank Solutions of Linear Matrix Equations via Procrustes Flow (2015)
- A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements (2015)
- Guaranteed Matrix Completion via Non-convex Factorization (2014)
- Fast Exact Matrix Completion with Finite Samples (2014)
- Non-convex Robust PCA (2014)
- Fast Matrix Completion without the Condition Number (2014)
- Understanding Alternating Minimization for Matrix Completion (2013)
- Low-rank Matrix Completion using Alternating Minimization (2012)
- Matrix Completion from a Few Entries (2009)
- Guaranteed Rank Minimization via Singular Value Projection (2009)

### Tensor Recovery/Decomposition & Hidden Variable Models

- When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima? (2019)
- Nonconvex Low-Rank Symmetric Tensor Completion from Noisy Data (2019)
- Smoothed Analysis in Unsupervised Learning via Decoupling (2018)
- Recovery Guarantees for Quadratic Tensors with Limited Observations (2018)
- Algorithmic thresholds for tensor PCA (2018)
- Guaranteed Simultaneous Asymmetric Tensor Decomposition via Orthogonalized Alternating Least Squares (2018)
- A theory on the absence of spurious optimality (2018)
- Sparse and Low-rank Tensor Estimation via Cubic Sketchings (2018)
- The landscape of the spiked tensor model (2017)
- Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries (2017)
- On the Optimization Landscape of Tensor Decompositions (2017)
- Tensor SVD: Statistical and Computational Limits (2017)
- Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use (2017)
- On Polynomial Time Methods for Exact Low Rank Tensor Completion (2017)
- Homotopy Method for Tensor PCA (2016)
- Low-tubal-rank Tensor Completion using Alternating Minimization (2016)
- Speeding up sum-of-squares for tensor decomposition and planted sparse vectors (2015)
- Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations (2015)
- Analyzing Tensor Power Method Dynamics: Applications to Learning Overcomplete Latent Variable Models (2014)
- Tensor decompositions for learning latent variable models (2014)
- Provable Learning of Overcomplete Latent Variable Models: Semi-supervised and Unsupervised Settings (2014)
- Provable Tensor Factorization with Missing Data (2014)
- Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates (2014)

### Phase Retrieval

- On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems (2020)
- A Deterministic Convergence Framework for Exact Non-Convex Phase Retrieval (2020)
- The recovery of complex sparse signals from few phaseless measurements (2019)
- Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval (2019)
- Solving Random Systems of Quadratic Equations with Tanh Wirtinger Flow (2019)
- On the Global Minimizers of Real Robust Phase Retrieval with Sparse Noise (2019)
- Solving a perturbed amplitude-based model for phase retrieval (2019)
- Spectral Method for Phase Retrieval: an Expectation Propagation Perspective (2019)
- Rigorous Analysis of Spectral Methods for Random Orthogonal Matrices (2019)
- Solving Complex Quadratic Systems with Full-Rank Random Matrices (2019)
- A Generalization of Wirtinger Flow for Exact Interferometric Inversion (2019)
- Phase Retrieval by Alternating Minimization with Random Initialization (2018)
- Optimal Spectral Initialization for Signal Recovery with Applications to Phase Retrieval (2018)
- Towards the optimal construction of a loss function without spurious local minima for solving quadratic equations (2018)
- Solving systems of phaseless equations via Riemannian optimization with optimal sampling complexity (2018)
- Linear Spectral Estimators and an Application to Phase Retrieval (2018)
- Approximate Message Passing for Amplitude Based Optimization (2018)
- Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval (2018)
- Optimization-based AMP for Phase Retrieval: The Impact of Initialization and $\ell_2$-regularization (2018)
- Compressive Phase Retrieval of Structured Signal (2017)
- Misspecified Nonconvex Statistical Optimization for Phase Retrieval (2017)
- Compressive Phase Retrieval via Reweighted Amplitude Flow (2017)
- A Local Analysis of Block Coordinate Descent for Gaussian Phase Retrieval ([**S**], 2017)
- Convolutional Phase Retrieval via Gradient Descent (2017)
- Linear Convergence of An Iterative Phase Retrieval Algorithm with Data Reuse (2017)
- The nonsmooth landscape of phase retrieval (2017)
- Phase Retrieval via Linear Programming: Fundamental Limits and Algorithmic Improvements (2017)
- Convergence of the randomized Kaczmarz method for phase retrieval (2017)
- Phase Retrieval via Randomized Kaczmarz: Theoretical Guarantees (2017)
- Phase retrieval using alternating minimization in a batch setting (2017)
- Solving Almost all Systems of Random Quadratic Equations (2017)
- Phase Retrieval Using Structured Sparsity: A Sample Efficient Algorithmic Framework (2017)
- Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval (2017)
- Robust Wirtinger Flow for Phase Retrieval with Arbitrary Corruption (2017)
- Phase Retrieval via Sparse Wirtinger Flow (2017)
- Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization (2017)
- Sparse Phase Retrieval via Truncated Amplitude Flow (2016)
- Solving Large-scale Systems of Random Quadratic Equations via Stochastic Truncated Amplitude Flow (2016)
- Low Rank Phase Retrieval (2016)
- Phase retrieval with random Gaussian sensing vectors by alternating projections (2016)
- Non-Convex Phase Retrieval from STFT Measurements (2016)
- Gauss-Newton Method for Phase Retrieval (2016)
- Phase Retrieval via Incremental Truncated Wirtinger Flow (2016)
- Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow (2016)
- Reshaped Wirtinger Flow for Solving Quadratic Systems of Equations (2016)
- Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow (2016)
- A Geometric Analysis of Phase Retrieval ([**S**], 2016)
- Phase retrieval for wavelet transforms (2015)
- The Local Convexity of Solving Quadratic Equations (2015)
- Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study (2015)
- Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems (2015)
- Phase Retrieval via Wirtinger Flow: Theory and Algorithms (2014)
- Phase Retrieval using Alternating Minimization (2013)

### Dictionary Learning

(See also **Theory** part in Dictionary/Deep Learning)

- Complete Dictionary Learning via $\ell_p$-norm Maximization (2020)
- Analysis of the Optimization Landscapes for Overcomplete Representation Learning (2019)
- Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently (2019)
- A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution (2019)
- Subgradient Descent Learns Orthogonal Dictionaries ([**S**], 2018)
- Efficient Dictionary Learning with Gradient Descent (2018)
- Towards Learning Sparsely Used Dictionaries with Arbitrary Supports (2018)
- A Provable Approach for Double-Sparse Coding (2017)
- Alternating minimization for dictionary learning with random initialization (2017)
- Complete Dictionary Recovery over the Sphere ([**S**], 2015)
- Simple, Efficient, and Neural Algorithms for Sparse Coding (2015)
- More Algorithms for Provable Dictionary Learning (2014)
- Exact Recovery of Sparsely Used Overcomplete Dictionaries (2013)
- New Algorithms for Learning Incoherent and Overcomplete Dictionaries (2013)
- Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization (2013)

### Deep Learning

- A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth (2020)
- On the Global Convergence of Training Deep Linear ResNets (2020)
- Stochastic Subspace Cubic Newton Method (2020)
- Training Linear Neural Networks: Non-Local Convergence and Complexity Results (2020)
- Optimization for deep learning: theory and algorithms (2019)
- Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size (2019)
- Sub-Optimal Local Minima Exist for Almost All Over-parameterized Neural Networks (2019)
- Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks (2019)
- Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers (2019)
- Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes (2019)
- A mean-field limit for certain deep neural networks (2019)
- On Exact Computation with an Infinitely Wide Neural Net (2019)
- Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections (2019)
- Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network (2019)
- Mean Field Analysis of Deep Neural Networks (2019)
- Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions (2019)
- A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks (2019)
- Can SGD Learn Recurrent Neural Networks with Provable Generalization? (2019)
- Width Provably Matters in Optimization for Deep Linear Neural Networks (2019)
- Elimination of All Bad Local Minima in Deep Learning (2019)
- Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations (2018)
- Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks (2018)
- Effect of Depth and Width on Local Minima in Deep Learning (2018)
- A Convergence Theory for Deep Learning via Over-Parameterization (2018)
- Gradient Descent Finds Global Minima of Deep Neural Networks (2018)
- Depth with Nonlinearity Creates No Bad Local Minima in ResNets (2018)
- A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks (2018)
- Gradient descent aligns the layers of deep linear networks (2018)
- Adding One Neuron Can Eliminate All Bad Local Minima (2018)
- End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition (2018)
- Understanding the Loss Surface of Neural Networks for Binary Classification (2018)
- Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks (2018)
- Deep linear neural networks with arbitrary loss: All local minima are global (2017)
- Global optimality conditions for deep neural networks (2017)
- Depth Creates No Bad Local Minima (2017)
- Electron-Proton Dynamics in Deep Learning (2017)
- Deep Learning without Poor Local Minima (2016)
- No bad local minima: Data independent training error guarantees for multilayer neural networks (2016)

### Sparse Vectors in Linear Subspaces

(See Structured Element Pursuit)

### Nonnegative/Sparse Principal Component Analysis

- Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates (2016)
- Non-negative Principal Component Analysis: Message Passing Algorithms and Sharp Asymptotics (2014)

### Mixed Linear Regression

- Iteratively Learning from the Best (2018)
- Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression (2018)
- Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization (2016)
- Provable Tensor Methods for Learning Mixtures of Classifiers (2014)
- Alternating Minimization for Mixed Linear Regression (2013)

### Blind Deconvolution/Calibration

- Global Guarantees for Blind Demodulation with Generative Priors (2019)
- On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution (2019)
- Composite optimization for robust blind deconvolution (2019)
- Geometry and Symmetry in Short-and-Sparse Deconvolution (2019)
- Nonconvex Demixing From Bilinear Measurements (2018)
- Structured Local Optima in Sparse Blind Deconvolution (2018)
- Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere (2018)
- Blind Gain and Phase Calibration via Sparse Spectral Methods (2017)
- Blind Deconvolution by a Steepest Descent Algorithm on a Quotient Manifold (2017)
- Regularized Gradient Descent: A Nonconvex Recipe for Fast Joint Blind Deconvolution and Demixing (2017)
- Self-Calibration via Linear Least Squares (2016)
- Through the Haze: A Non-Convex Approach to Blind Calibration for Linear Random Sensing Models (2016)
- Fast and guaranteed blind multichannel deconvolution under a bilinear system model (2016)
- Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization (2016)
- A Non-Convex Blind Calibration Method for Randomised Sensing Strategies (2016)
- RIP-like Properties in Subsampled Blind Deconvolution (2015)
- Blind Recovery of Sparse Signals from Subsampled Convolution (2015)
- Near Optimal Compressed Sensing of Sparse Rank-One Matrices via Sparse Power Factorization (2013)

### Super Resolution

- The basins of attraction of the global minimizers of the non-convex sparse spikes estimation problem (2018)
- Greed is Super: A Fast Algorithm for Super-Resolution (2015)

### Synchronization Problems/Community Detection

- A Provably Robust Multiple Rotation Averaging Scheme for SO(2) (2020)
- Multi-Frequency Phase Synchronization (2019)
- On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization (2018)
- Near-optimal bounds for phase synchronization (2017)
- Message-passing algorithms for synchronization problems over compact groups (2016)
- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Nonconvex phase synchronization (2016)

### Joint Alignment

- The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences (2016)

### Numerical Linear Algebra

- Binary component decomposition Part II: The asymmetric case (2019)
- Binary Component Decomposition Part I: The Positive-Semidefinite Case (2019)
- On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization (2018)
- PCA by Optimisation of Symmetric Functions has no Spurious Local Optima (2018)
- PCA by Determinant Optimization has no Spurious Local Optima (2018)
- Orbital minimization method with $\ell_1$ regularization (2017)
- The Global Optimization Geometry of Nonsymmetric Matrix Factorization and Sensing (2017)
- On the matrix square root via geometric optimization (2015)
- Computing Matrix Squareroot via Non Convex Local Search (2015)

### Bayesian Inference

### Empirical Risk Minimization & Shallow Networks

- Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss (2020)
- Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds (2020)
- Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training (2020)
- Near-Optimal Algorithms for Minimax Optimization (2020)
- Global Convergence of Frank Wolfe on One Hidden Layer Networks (2020)
- A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics (2020)
- No Spurious Local Minima in Deep Quadratic Networks (2020)
- Landscape Complexity for the Empirical Risk of Generalized Linear Models (2019)
- Stationary Points of Shallow Neural Networks with Quadratic Activation Function (2019)
- How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks? (2019)
- Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks (2019)
- Quadratic number of nodes is sufficient to learn a dataset via gradient descent (2019)
- Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators (2019)
- Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks (2019)
- Minimum “Norm” Neural Networks are Splines (2019)
- Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks (2019)
- Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks (2019)
- Finding the forward-Douglas-Rachford-forward method (2019)
- Towards Understanding the Importance of Noise in Training Neural Networks (2019)
- Effect of Activation Functions on the Training of Overparametrized Neural Nets (2019)
- Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rates and Global Landscape Analysis (2019)
- The generalization error of random features regression: Precise asymptotics and double descent curve (2019)
- The Landscape of Non-convex Empirical Risk with Degenerate Population Risk (2019)
- Limitations of Lazy Training of Two-layers Neural Networks (2019)
- A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics (2019)
- Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks (2019)
- Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network (2019)
- Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit (2019)
- Towards moderate overparameterization: global convergence guarantees for training shallow neural networks (2019)
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks (2019)
- A Deterministic Approach to Avoid Saddle Points (2019)
- Fitting ReLUs via SGD and Quantized SGD (2019)
- Analysis of a Two-Layer Neural Network via Displacement Convexity (2019)
- Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? (2018)
- A Provably Convergent Scheme for Compressive Sensing under Random Generative Priors (2018)
- Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (2018)
- Learning Two Layer Rectified Neural Networks in Polynomial Time (2018)
- Uniform Convergence of Gradients for Non-Convex Learning and Optimization (2018)
- On the Convergence Rate of Training Recurrent Neural Networks (2018)
- Learning Two-layer Neural Networks with Symmetric Inputs (2018)
- Algorithmic Aspects of Inverse Problems Using Generative Models (2018)
- ReLU Regression: Complexity, Exact and Approximation Algorithms (2018)
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018)
- Stochastic Gradient Descent Learns State Equations with Nonlinear Activations (2018)
- Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization (2018)
- Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data (2018)
- Learning ReLU Networks via Alternating Minimization (2018)
- Learning One-hidden-layer ReLU Networks via Gradient Descent (2018)
- Deep Denoising: Rate-Optimal Recovery of Structured Signals with a Deep Prior (2018)
- Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps (2018)
- The Global Optimization Geometry of Shallow Linear Neural Networks (2018)
- Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks (2018)
- A Mean Field View of the Landscape of Two-Layers Neural Networks (2018)
- Representing smooth functions as compositions of near-identity functions with implications for deep network optimization (2018)
- SUNLayer: Stable denoising with generative networks (2018)
- Representation Learning and Recovery in the ReLU Model (2018)
- Learning Deep Models: Critical Points and Local Openness (2018)
- On the Power of Over-parametrization in Neural Networks with Quadratic Activation (2018)
- On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition (2018)
- Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression (2018)
- A Critical View of Global Optimality in Deep Learning (2018)
- The Multilinear Structure of ReLU Networks (2017)
- Spurious Local Minima are Common in Two-Layer ReLU Neural Networks (2017)
- Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima (2017)
- Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels (2017)
- Learning One-hidden-layer Neural Networks with Landscape Design (2017)
- Theoretical properties of the global optimizer of two layer neural network (2017)
- Critical Points of Neural Networks: Analytical Forms and Landscape Properties (2017)
- SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data (2017)
- Porcupine Neural Networks: (Almost) All Local Optima are Global (2017)
- When is a Convolutional Filter Easy To Learn? (2017)
- Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2017)
- Recovery Guarantees for One-hidden-layer Neural Networks (2017)
- Convergence Analysis of Two-layer Neural Networks with ReLU Activation (2017)
- Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk (2017)
- Learning ReLUs via Gradient Descent (2017)
- Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs (2017)
- Gradient Descent Learns Linear Dynamical Systems (2016)
- The Landscape of Empirical Risk for Non-convex Losses (2016)

### System Identification

### Burer-Monteiro Style Decomposition Algorithms

- Rank optimality for the Burer-Monteiro factorization (2018)
- Smoothed analysis of the low-rank approach for smooth semidefinite programs (2018)
- Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs (2018)
- Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form (2018)
- Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality (2017)
- The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions (2016)
- The non-convex Burer-Monteiro approach works on smooth semidefinite programs (2016)
- On the low-rank approach for semidefinite programs arising in synchronization and community detection (2016)
- Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems (2014)

### Generic Structured Problems

- Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers (2018)
- Convex Optimization with Nonconvex Oracles (2017)

### Nonconvex Feasibility Problems

- The Douglas-Rachford Algorithm for Convex and Nonconvex Feasibility Problems (2019)
- Finding magic squares with the Douglas-Rachford algorithm (2019)
- A convergent relaxation of the Douglas-Rachford algorithm (2017)
- A Lyapunov-type approach to convergence of the Douglas-Rachford algorithm (2017)
- Feasibility Problems: Douglas-Rachford and Projection Methods (Project Page)
- Dynamics of the Douglas-Rachford Method for Ellipses and p-Spheres (2016)
- A remark on the convergence of the Douglas-Rachford iteration in a non-convex setting (2016)
- On the finite convergence of the Douglas-Rachford algorithm for solving (not necessarily convex) feasibility problems in Euclidean spaces (2016)
- Global Behavior of the Douglas-Rachford Method for a Nonconvex Feasibility Problem (2015)
- The Douglas–Rachford Algorithm in the Absence of Convexity (2011)

## Of Statistical Nature …

- Sparse Tensor Graphical Model: Non-convex Optimization and Statistical Inference (2016)
- Statistical and Computational Guarantees for the Baum-Welch Algorithm (2015)
- Provable Sparse Tensor Decomposition (2015)
- Statistical consistency and asymptotic normality for high-dimensional robust M-estimators (2015)
- Support recovery without incoherence: A case for nonconvex regularizations (2014)
- High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality (2014)
- Statistical guarantees for the EM algorithm: From population to sample-based analysis (2014)
- Nonconvex Statistical Optimization: Minimax-Optimal Sparse PCA in Polynomial Time (2014)
- Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima (2013)
- High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity (2011)

## Relevant Optimization Methods, Theory, Miscs

- Breaking the $O(1/\varepsilon)$ Optimal Rate for a Class of Minimax Problems (2020)
- Zeroth-order Optimization on Riemannian Manifolds (2020)
- Symmetry & critical points for a model shallow neural network (2020)
- Rates of Superlinear Convergence for Classical Quasi-Newton Methods (2020)
- Efficient Clustering for Stretched Mixtures: Landscape and Optimality (2020)
- Augmented Lagrangian based first-order methods for convex and nonconvex programs: nonergodic convergence and iteration complexity (2020)
- Solving Non-Convex Non-Differentiable Min-Max Games using Proximal Gradient Method (2020)
- Variable Smoothing for Weakly Convex Composite Functions (2020)
- A block inertial Bregman proximal algorithm for nonsmooth nonconvex problems (2020)
- First-Order Methods for Nonconvex Quadratic Minimization (2020)
- A Primal Dual Smoothing Framework for Max-Structured Nonconvex Optimization (2020)
- Geometry of First-Order Methods and Adaptive Acceleration (2020)
- A quadratically convergent proximal algorithm for nonnegative tensor decomposition (2020)
- Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate (2020)
- On the Convergence of Adam and Adagrad (2020)
- A Riemannian Newton Optimization Framework for the Symmetric Tensor Rank Approximation Problem (2020)
- Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions? (2020)
- Convergence to Second-Order Stationarity for Non-negative Matrix Factorization: Provably and Concurrently (2020)
- Intrinsic Construction of Lyapunov Functions on Riemannian Manifold (2020)
- Fast Linear Convergence of Randomized BFGS (2020)
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization (2020)
- Optimality and Stability in Non-Convex-Non-Concave Min-Max Optimization (2020)
- SPRING: A fast stochastic proximal alternating method for non-smooth non-convex optimization (2020)
- Second Order Optimization Made Practical (2020)
- A Trust-Region Method For Nonsmooth Nonconvex Optimization (2020)
- Efficient Search of First-Order Nash Equilibria in Nonconvex-Concave Smooth Min-Max Problems (2020)
- Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization (2020)
- SingCubic: Cyclic Incremental Newton-type Gradient Descent with Cubic Regularization for Non-Convex Optimization (2020)
- A mean-field analysis of two-player zero-sum games (2020)
- Structures of Spurious Local Minima in k-means (2020)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization (2020)
- Sharp Analysis of Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization (2020)
- Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization (2020)
- Practical Accelerated Optimization on Riemannian Manifolds (2020)
- On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions (2020)
- SPAN: A Stochastic Projected Approximate Newton Method (2020)
- Better Theory for SGD in the Nonconvex World (2020)
- Greedy Quasi-Newton Methods with Explicit Superlinear Convergence (2020)
- Hybrid Riemannian Conjugate Gradient Methods with Global Convergence Properties (2020)
- On the Optimal Combination of Tensor Optimization Methods (2020)
- Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems (2020)
- Consensus-Based Optimization on the Sphere I: Well-Posedness and Mean-Field Limit (2020)
- Consensus-based Optimization on the Sphere II: Convergence to Global Minimizers and Machine Learning (2020)
- Strong Evaluation Complexity Bounds for Arbitrary-Order Optimization of Nonconvex Nonsmooth Composite Functions (2020)
- A Stochastic Subgradient Method for Nonsmooth Nonconvex Multi-Level Composition Optimization (2020)
- Second-order Online Nonconvex Optimization (2020)
- From Nesterov’s Estimate Sequence to Riemannian Acceleration (2020)
- Introduction to Nonsmooth Analysis and Optimization (2020)
- The Douglas–Rachford Algorithm Converges Only Weakly (2019)
- Robust Group Synchronization via Cycle-Edge Message Passing (2019)
- A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization (2019)
- Semismooth Newton-type method for bilevel optimization: Global convergence and extensive numerical experiments (2019)
- Active strict saddles in nonsmooth optimization (2019)
- On Lower Iteration Complexity Bounds for the Saddle Point Problems (2019)
- Leveraging Two Reference Functions in Block Bregman Proximal Gradient Descent for Non-convex and Non-Lipschitz Problems (2019)
- Convergence of a Stochastic Subgradient Method with Averaging for Nonsmooth Nonconvex Constrained Optimization (2019)
- A Stochastic Quasi-Newton Method for Large-Scale Nonconvex Optimization with Applications (2019)
- Trust-Region Newton-CG with Strong Second-Order Complexity Guarantees for Nonconvex Optimization (2019)
- Second-Order Non-Convex Optimization for Constrained Fixed-Structure Static Output Feedback Controller Synthesis (2019)
- Lower Bounds for Non-Convex Stochastic Optimization (2019)
- Analysis of Asymptotic Escape of Strict Saddle Sets in Manifold Optimization (2019)
- Efficient Semidefinite Programming with approximate ADMM (2019)
- Local convergence of tensor methods (2019)
- Implementing a smooth exact penalty function for general constrained nonlinear optimization (2019)
- Stochastic proximal splitting algorithm for stochastic composite minimization (2019)
- Polynomial time guarantees for the Burer-Monteiro method (2019)
- Proximal Splitting Algorithms: Overrelax them all! (2019)
- An accelerated first-order method with complexity analysis for solving cubic regularization subproblems (2019)
- An inexact augmented Lagrangian method for nonsmooth optimization on Riemannian manifold (2019)
- A Stochastic Tensor Method for Non-convex Optimization (2019)
- Hölderian error bounds and Kurdyka-Łojasiewicz inequality for the trust region subproblem (2019)
- A Fully Stochastic Second-Order Trust Region Method (2019)
- Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization (2019)
- The nonsmooth landscape of blind deconvolution (2019)
- On the tightness of SDP relaxations of QCQPs with repeated eigenvalues (2019)
- Optimal Complexity and Certification of Bregman First-Order Methods (2019)
- Nonsmooth Optimization over Stiefel Manifold: Riemannian Subgradient Methods (2019)
- Bundle Method Sketching for Low Rank Semidefinite Programming (2019)
- Second-order optimality conditions for non-convex set-constrained optimization problems (2019)
- Nonconvex Stochastic Nested Optimization via Stochastic ADMM (2019)
- Gradientless Descent: High-Dimensional Zeroth-Order Optimization (2019)
- Stochastic Difference-of-Convex Algorithms for Solving nonconvex optimization problems (2019)
- Regularization of Limited Memory Quasi-Newton Methods for Large-Scale Nonconvex Minimization (2019)
- Primal-dual block-proximal splitting for a class of non-convex problems (2019)
- UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization (2019)
- Unifying mirror descent and dual averaging (2019)
- Pathological subgradient dynamics (2019)
- Linear Speedup in Saddle-Point Escape for Decentralized Non-Convex Optimization (2019)
- Towards a theory of non-commutative optimization: geodesic first and second order methods for moment maps and polytopes (2019)
- Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization (2019)
- Convergence Rates of Subgradient Methods for Quasi-convex Optimization Problems (2019)
- Relative Interior Rule in Block-Coordinate Minimization (2019)
- A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization (2019)
- Anderson Acceleration of Proximal Gradient Methods (2019)
- Zero Duality Gap in View of Abstract Convexity (2019)
- A nonsmooth nonconvex descent algorithm (2019)
- A generalized Douglas-Rachford splitting algorithm for nonconvex optimization (2019)
- On the Convergence of Perturbed Distributed Asynchronous Stochastic Gradient Descent to Second Order Stationary Points in Non-convex Optimization (2019)
- A Global Newton-Type Scheme Based on a Simplified Newton-Type Approach (2019)
- Circumcentering Reflection Methods for Nonconvex Feasibility Problems (2019)
- Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity (2019)
- Implementing a smooth exact penalty function for equality-constrained nonlinear optimization (2019)
- Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods (2019)
- Convergence Analysis of Gradient Algorithms on Riemannian Manifolds Without Curvature Constraints and Application to Riemannian Mass (2019)
- The Complexity of Finding Stationary Points with Stochastic Gradient Descent (2019)
- First-order primal-dual methods for nonsmooth non-convex optimisation (2019)
- Necessary and Sufficient Conditions for Adaptive, Mirror, and Standard Gradient Methods (2019)
- A Stochastic Proximal Point Algorithm for Saddle-Point Problems (2019)
- Riemannian Proximal Gradient Methods (2019)
- Extending FISTA to Riemannian Optimization for Sparse PCA (2019)
- The chain rule for VU-decompositions of nonsmooth functions (2019)
- An Average Curvature Accelerated Composite Gradient Method for Nonconvex Smooth Composite Optimization Problems (2019)
- Near-optimal Approximate Discrete and Continuous Submodular Function Minimization (2019)
- Inexact Proximal-Point Penalty Methods for Non-Convex Optimization with Non-Convex Constraints (2019)
- Anderson Accelerated Douglas-Rachford Splitting (2019)
- Accelerating ADMM for Efficient Simulation and Optimization (2019)
- Efficiency of Coordinate Descent Methods For Structured Nonconvex Optimization (2019)
- Stochastic Optimization for Non-convex Inf-Projection Problems (2019)
- Sparse solutions of optimal control via Newton method for under-determined systems (2019)
- Linear Convergence of Adaptive Stochastic Gradient Descent (2019)
- Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in $O(\sqrt{n})$ iterations (2019)
- Second-Order Guarantees of Stochastic Gradient Descent in Non-Convex Optimization (2019)
- Convergence Behaviour of Some Gradient-Based Methods on Bilinear Games (2019)
- On the behaviour of the Douglas-Rachford algorithm for minimizing a convex function subject to a linear constraint (2019)
- On The Geometric Analysis of A Quartic-quadratic Optimization Problem under A Spherical Constraint (2019)
- Gradient Flows and Accelerated Proximal Splitting Methods (2019)
- Path Length Bounds for Gradient Descent and Flow (2019)
- Fenchel Duality for Convex Optimization and a Primal Dual Algorithm on Riemannian Manifolds (2019)
- Proximally Constrained Methods for Weakly Convex Optimization with Weakly Convex Constraints (2019)
- Proximal Point Methods for Optimization with Nonconvex Functional Constraints (2019)
- Distributed Gradient Descent: Nonconvergence to Saddle Points and the Stable-Manifold Theorem (2019)
- An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints (2019)
- Incremental Methods for Weakly Convex Optimization (2019)
- Complexity of Proximal Augmented Lagrangian for nonconvex optimization with nonlinear equality constraints (2019)
- On Inexact Solution of Auxiliary Problems in Tensor Methods for Convex Optimization (2019)
- A simple Newton method for local nonsmooth optimization (2019)
- Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model (2019)
- The Generalized Trust Region Subproblem: solution complexity and convex hull results (2019)
- Sampling and Optimization on Convex Sets in Riemannian Manifolds of Non-Negative Curvature (2019)
- Stochastic algorithms with geometric step decay converge linearly on sharp functions (2019)
- Heavy-ball Algorithms Always Escape Saddle Points (2019)
- Bilevel Optimization and Variational Analysis (2019)
- SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems (2019)
- Provably Efficient Reinforcement Learning with Linear Function Approximation (2019)
- Distributed Learning in Non-Convex Environments – Part I: Agreement at a Linear Rate (2019)
- Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points (2019)
- Efficient Algorithms for Smooth Minimax Optimization (2019)
- Optimization on flag manifolds (2019)
- Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond (2019)
- Learning Markov models via low-rank optimization (2019)
- The generalized orthogonal Procrustes problem in the high noise regime (2019)
- Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities (2019)
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies (2019)
- Riemannian optimization on the simplex of positive definite matrices (2019)
- Block-coordinate and incremental aggregated nonconvex proximal gradient methods: a unified view (2019)
- First-order methods almost always avoid saddle points: the case of vanishing step-sizes (2019)
- Escaping from saddle points on Riemannian manifolds (2019)
- Iteration-complexity and asymptotic analysis of steepest descent method for multiobjective optimization on Riemannian manifolds (2019)
- Efficiently escaping saddle points on manifolds (2019)
- Accelerated Alternating Minimization (2019)
- Proximal Point Approximations Achieving a Convergence Rate of $O(1/k)$ for Smooth Convex-Concave Saddle Point Problems: Optimistic Gradient and Extra-gradient Methods (2019)
- On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems (2019)
- The algorithm by Ferson et al. is surprisingly fast: An NP-hard optimization problem solvable in almost linear time with high probability (2019)
- Bregman forward-backward splitting for nonconvex composite optimization: superlinear convergence to nonisolated critical points (2019)
- Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay (2019)
- Neural Temporal-Difference Learning Converges to Global Optima (2019)
- Momentum-Based Variance Reduction in Non-Convex SGD (2019)
- A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems (2019)
- Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization (2019)
- SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points (2019)
- Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz Problems (2019)
- Stochastic Primal-Dual Algorithms with Faster Convergence than $O(1/\sqrt{T})$ for Problems without Bilinear Structure (2019)
- Bregman Proximal Gradient Algorithm with Extrapolation for a class of Nonconvex Nonsmooth Minimization Problems (2019)
- Burer-Monteiro guarantees for general semidefinite programs (2019)
- A Trust Region Method for Finding Second-Order Stationarity in Linearly Constrained Non-Convex Optimization (2019)
- Convergence rates for the stochastic gradient descent method for non-convex objective functions (2019)
- An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA (2019)
- Aggressive Local Search for Constrained Optimal Control Problems with Many Local Minima (2019)
- Online Non-Convex Learning: Following the Perturbed Leader is Optimal (2019)
- The importance of better models in stochastic optimization (2019)
- Limited-Memory BFGS with Displacement Aggregation (2019)
- Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs (2019)
- SGD without Replacement: Sharper Rates for General Smooth Convex Functions (2019)
- A Stochastic Trust Region Method for Non-convex Minimization (2019)
- Escaping Saddle Points with the Successive Convex Approximation Algorithm (2019)
- Inertial Block Mirror Descent Method for Non-Convex Non-Smooth Optimization (2019)
- Minimization of nonsmooth nonconvex functions using inexact evaluations and its worst-case complexity (2019)
- High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms (2019)
- Analysis of the alternating direction method of multipliers for nonconvex problems (2019)
- Stochastic Proximal Gradient Methods for Non-smooth Non-Convex Regularized Problems (2019)
- Stochastic Gradient Descent Escapes Saddle Points Efficiently (2019)
- Adaptive stochastic gradient algorithms on Riemannian manifolds (2019)
- Inexact restoration with subsampled trust-region methods for finite-sum minimization (2019)
- Exponentiated Gradient Meets Gradient Descent (2019)
- A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning (2019)
- Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization (2019)
- Sharp Analysis for Nonconvex SGD Escaping from Saddle Points (2019)
- Passed & Spurious: analysing descent algorithms and local minima in spiked matrix-tensor model (2019)
- Stochastic Recursive Variance-Reduced Cubic Regularization Methods (2019)
- Lower Bounds for Smooth Nonconvex Finite-Sum Optimization (2019)
- Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions (2019)
- Escaping Saddle Points with Adaptive Gradient Methods (2019)
- Simple algorithms for optimization on Riemannian manifolds with constraints (2019)
- On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Optimization (2019)
- Fast Gradient Methods for Symmetric Nonnegative Matrix Factorization (2019)
- An accelerated variant of simulated annealing that converges under fast cooling (2019)
- Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group (2019)
- A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach (2019)
- Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization (2019)
- DTN: A Learning Rate Scheme with Convergence Rate of $O(1/t)$ for SGD (2019)
- Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization (2019)
- Primal-dual proximal splitting and generalized conjugation in non-smooth non-convex optimization (2019)
- A Proximal Alternating Direction Method of Multiplier for Linearly Constrained Nonconvex Minimization (2018)
- Semi-Riemannian Manifold Optimization (2018)
- Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization (2018)
- Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition (2018)
- A Doubly Accelerated Inexact Proximal Point Method for Nonconvex Composite Optimization Problems (2018)
- Convergence Analysis of the Relaxed Douglas-Rachford Algorithm (2018)
- Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization (2018)
- Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence (2018)
- Markov Chain Block Coordinate Descent (2018)
- Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds (2018)
- Universal regularization methods - varying the power, the smoothness and the accuracy (2018)
- Sampling Can Be Faster Than Optimization (2018)
- Blind Over-the-Air Computation and Data Fusion via Provable Wirtinger Flow (2018)
- Deterministic and stochastic inexact regularization algorithms for nonconvex optimization with optimal complexity (2018)
- R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate (2018)
- Proximal Gradient Method for Manifold Optimization (2018)
- Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints (2018)
- Inexact alternating projections on nonconvex sets (2018)
- On exponential convergence of SGD in non-convex over-parametrized learning (2018)
- A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints (2018)
- Provably Correct Automatic Subdifferentiation for Qualified Programs (2018)
- Global Non-convex Optimization with Discretized Diffusions (2018)
- Benefits of over-parameterization with EM (2018)
- Newton method for finding a singularity of a special class of locally Lipschitz continuous vector fields on Riemannian manifolds (2018)
- Inexact Newton method with feasible inexact projections for solving constrained smooth and nonsmooth equations (2018)
- Solving Weakly-Convex-Weakly-Concave Saddle-Point Problems as Weakly-Monotone Variational Inequality (2018)
- SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization (2018)
- Uniform Graphical Convergence of Subgradients in Nonconvex Optimization and Learning (2018)
- A Subsampling Line-Search Method with Second-Order Results (2018)
- Optimization on Spheres: Models and Proximal Algorithms with Computational Performance Comparisons (2018)
- Analytical Convergence Regions of Accelerated First-Order Methods in Nonconvex Optimization under Regularity Condition (2018)
- Cubic Regularization with Momentum for Nonconvex Optimization (2018)
- Parallelizable Algorithms for Optimization Problems with Orthogonality Constraints (2018)
- Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018)
- Riemannian Adaptive Optimization Methods (2018)
- Newton-MR: Newton’s Method Without Smoothness or Convexity (2018)
- Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization (2018)
- Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning (2018)
- Hessian barrier algorithms for linearly constrained optimization problems (2018)
- Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient (2018)
- An Inexact First-order Method for Constrained Nonlinear Optimization (2018)
- Survey: Sixty Years of Douglas–Rachford (2018)
- On the Stability and Convergence of Stochastic Gradient Descent with Momentum (2018)
- Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates (2018)
- Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals (2018)
- Escaping Saddle Points in Constrained Optimization (2018)
- Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems (2018)
- Secondary gradient descent in higher codimension (2018)
- On Markov Chain Gradient Descent (2018)
- Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration (2018)
- Structured Quasi-Newton Methods for Optimization with Orthogonality Constraints (2018)
- On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization (2018)
- Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information (2018)
- Iteration-Complexity of the Subgradient Method on Riemannian Manifolds with Lower Bounded Curvature (2018)
- Convergence of Cubic Regularization for Nonconvex Optimization under KL Property (2018)
- A Note on Inexact Condition for Cubic Regularized Newton’s Method (2018)
- Discrete gradient descent differs qualitatively from gradient flow (2018)
- Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems (2018)
- On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization (2018)
- A linear-time algorithm for generalized trust region problems (2018)
- A geometric integration approach to nonsmooth, nonconvex optimisation (2018)
- Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs (2018)
- SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator (2018)
- Geodesic Convex Optimization: Differentiation on Manifolds, Geodesics, and Convexity (2018)
- On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization (2018)
- AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization (2018)
- Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced (2018)
- Gradient Method for Optimization on Riemannian Manifolds with Lower Bounded Curvature (2018)
- Towards Riemannian Accelerated Gradient Methods (2018)
- Adaptive regularization with cubics on manifolds with a first-order analysis (2018)
- Minimizing Nonconvex Population Risk from Rough Empirical Risk (2018)
- On the Connection Between Sequential Quadratic Programming and Riemannian Gradient Methods (2018)
- Cutting plane methods can be extended into nonconvex optimization (2018)
- Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization (2018)
- Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time (2018)
- A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method (2018)
- A Cubic Regularized Newton’s Method over Riemannian Manifolds (2018)
- Accelerated Stochastic Algorithms for Nonconvex Finite-sum and Multi-block Optimization (2018)
- Local Saddle Point Optimization: A Curvature Exploitation Approach (2018)
- Gradient Sampling Methods for Nonsmooth Optimization (2018)
- Stochastic subgradient method converges on tame functions (2018)
- An Envelope for Davis-Yin Splitting and Strict Saddle Point Avoidance (2018)
- Convergence guarantees for a class of non-convex and non-smooth optimization problems (2018)
- On the spherical quasi-convexity of quadratic functions (2018)
- Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing (2018)
- Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates (2018)
- A Riemannian BFGS Method Without Differentiated Retraction for Nonconvex Optimization Problems (2018)
- Nonconvex weak sharp minima on Riemannian manifolds (2018)
- A Newton-CG Algorithm with Complexity Guarantees for Smooth Unconstrained Optimization (2018)
- Continuous Relaxation of MAP Inference: A Nonconvex Perspective (2018)
- On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions (2018)
- ADMM for Multiaffine Constrained Optimization (2018)
- Averaging Stochastic Gradient Descent on Riemannian Manifolds (2018)
- Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization (2018)
- Concise Complexity Analyses for Trust-Region Methods (2018)
- Stochastic subgradient method converges at the rate $O(k^{-1/4})$ on weakly convex functions (2018)
- On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization (2018)
- Characterizing Implicit Bias in Terms of Optimization Geometry (2018)
- Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability (2018)
- Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys (2018)
- Sample Complexity of Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization (2018)
- An Alternative View: When Does SGD Escape Local Minima? (2018)
- Toward Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations (2018)
- Convergence Analysis of Alternating Nonconvex Projections (2018)
- Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry (2018)
- Exact Semidefinite Formulations for a Class of (Random and Non-Random) Nonconvex Quadratic Programs (2018)
- How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization (2018)
- On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition (2018)
- The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates (2018)
- Nonconvex Lagrangian-Based Optimization: Monitoring Schemes and Global Convergence (2018)
- Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima (2017)
- Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently (2017)
- NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization (2017)
- Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent (2017)
- Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution (2017)
- Neon2: Finding Local Minima via First-Order Oracles (2017)
- Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers (2017)
- Revisiting Normalized Gradient Descent: Evasion of Saddle Points (2017)
- Stochastic Cubic Regularization for Fast Nonconvex Optimization (2017)
- Convex Optimization with Nonconvex Oracles (2017)
- First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time (2017)
- On local non-global minimizers of quadratic functions with cubic regularization (2017)
- Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean (2017)
- Lower Bounds for Higher-Order Convex Optimization (2017)
- Lower Bounds for Finding Stationary Points II: First-Order Methods (2017)
- Lower Bounds for Finding Stationary Points I (2017)
- Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence (2017)
- Block Coordinate Descent Only Converge to Minimizers (2017)
- First-order Methods Almost Always Avoid Saddle Points (2017)
- Accelerated Block Coordinate Proximal Gradients with Applications in High Dimensional Statistics (2017)
- The power of sum-of-squares for detecting hidden structures (2017)
- Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis (2017)
- On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization (2017)
- Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization (2017)
- Douglas-Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions (2017)
- Local Minimizers and Second-Order Conditions in Composite Piecewise Programming via Directional Derivatives (2017)
- Alternating minimization and alternating descent over nonconvex sets (2017)
- Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study (2017)
- Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information (2017)
- Non-convex Conditional Gradient Sliding (2017)
- Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2017)
- An Inexact Regularized Newton Framework with a Worst-Case Iteration Complexity of $O(\varepsilon^{-3/2})$ for Nonconvex Optimization (2017)
- A Second Order Method for Nonconvex Optimization (2017)
- Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems (2017)
- Mirror descent in non-convex stochastic programming (2017)
- Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization (2017)
- Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations (2017)
- An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization (2017)
- Using Negative Curvature in Solving Nonlinear Programs (2017)
- Gradient Descent Can Take Exponential Time to Escape Saddle Points (2017)
- Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization (2017)
- Sub-sampled Cubic Regularization for Non-convex Optimization (2017)
- Iteration-Complexity of Gradient, Subgradient and Proximal Point Methods on Riemannian Manifolds (2017)
- Linearized ADMM for Non-convex Non-smooth Optimization with Convergence Analysis (2017)
- “Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions (2017)
- Accelerating Stochastic Gradient Descent (2017)
- On the Gap Between Strict-Saddles and True Convexity: An $\Omega(\log d)$ Lower Bound for Eigenvector Approximation (2017)
- Catalyst Acceleration for Gradient-Based Non-Convex Optimization (2017)
- Perspective: Energy Landscapes for Machine Learning (2017)
- Gradient descent with nonconvex constraints: local concavity determines convergence (2017)
- Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis (2017)
- How to Escape Saddle Points Efficiently (2017)
- Convergence rate of a simulated annealing algorithm with noisy observations (2017)
- Exploiting Negative Curvature in Deterministic and Stochastic Optimization (2017)
- Online Multiview Representation Learning: Dropping Convexity for Better Efficiency (2017)
- Natasha: Faster Stochastic Non-Convex Optimization via Strongly Non-Convex Parameter (2017)
- Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation (2017)
- Maximum likelihood estimation of determinantal point processes (2017)
- Fast Rates for Empirical Risk Minimization of Strict Saddle Problems (2017)
- Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step (2016)
- The Power of Normalization: Faster Evasion of Saddle Points (2016)
- Accelerated Methods for Non-Convex Optimization (2016)
- Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2016)
- Finding Local Minima for Nonconvex Optimization in Linear Time (2016)
- Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization (2016)
- Global analysis of Expectation Maximization for mixtures of two Gaussians (2016)
- Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences (2016)
- Local Convergence of the Heavy-ball Method and iPiano for Non-convex Optimization (2016)
- Global rates of convergence for nonconvex optimization on manifolds (2016)
- Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points (2016)
- On the Douglas-Rachford algorithm (2016)
- A Grothendieck-type inequality for local maxima (2016)
- First-order Methods for Geodesically Convex Optimization (2016)
- Efficient approaches for escaping higher order saddle points in non-convex optimization (2016)
- Gradient Descent Converges to Minimizers (2016)
- When Are Nonconvex Problems Not Scary? ([**S**], 2015; see also my thesis)
- On the Global Optimality for Linear Constrained Rank Minimization Problem (2015)
- Global Convergence of ADMM in Nonconvex Nonsmooth Optimization (2015)
- Dropping Convexity for Faster Semi-definite Optimization (2015)
- Local Linear Convergence of the ADMM/Douglas–Rachford Algorithms without Strong Convexity and Application to Statistical Imaging (2015)
- Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions (2015)
- Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition (2015)
- Peaceman-Rachford splitting for a class of nonconvex optimization problems (2015)
- Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems (2014)
- Global convergence of splitting methods for nonconvex composite optimization (2014)
- Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming (2013)
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems (2013)
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods (2013)
- Optimality conditions for the nonlinear programming problems on Riemannian manifolds (2012)
- Second-order negative-curvature methods for box-constrained and general constrained optimization (2010)
- Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality (2008)
- Cubic regularization of Newton method and its global performance (2006)
- Computing the Local-Nonglobal Minimizer of a Large Scale Trust-Region Subproblem (2005)
- On Some Properties of Quadratic Programs with a Convex Quadratic Constraint (1998)
- Local minimizers of quadratic functions on Euclidean balls and spheres (1994)

Disclaimer- This page is meant to serve as a hub for references on this topic, and does not in any way represent a personal endorsement of the papers listed here. I do not take any responsibility for the quality or technical correctness of any paper listed here. The reader is advised to use this resource with discretion.

If you’d like your paper to be listed here- Just drop me a few lines via email (which can be found on the “Welcome” page). If you’d rather not write, just deposit your paper on arXiv. I get email alerts about new animals there every morning, and will be happy to hunt one for this zoo if it seems fit.

Special thanks to: Damek Davis, Wotao Yin, Vladislav Voroninski, David Martinez