# Geometry of Two-Layer Linear Neural Network

Two-layer linear neural network is arguably the simplest, nontrivial neural network. This post fleshes out the details of this seminal paper 1, in which the authors set out to understand the solution space of this optimization problem \begin{align} \label{eq:nn_main_global} \min \; f\left(\mathbf A, \mathbf B\right) = \frac{1}{2}\left\| \mathbf Y - \mathbf A \mathbf B \mathbf X \right\|_{F}^2, \mathbf A \in \mathbb{R}^{n \times p}, \mathbf B \in \mathbb{R}^{p \times n} \end{align} where $\mathbf Y \in \mathbb{R}^{n \times m}$ and $\mathbf X \in \mathbb{R}^{n \times m}$ are known with $m \geq n$. They are stacked (in columns) response-input vector pairs used to train the network. When $p \ll n$ and $\mathbf Y = \mathbf X$, it is easy to see any optimum $\mathbf A \mathbf B = \mathbf U \mathbf U^\top$, where $\mathbf U$ is an orthonormal basis for the subspace spanned by the first $p$ eigenvectors of $\mathbf X \mathbf X^\top$. Hence this formulation is closely related to the well-known principal component analysis. It turns out one interesting aspect about this optimization problem is that under some conditions the only local minimum point (up to invertible linear transformations, of course) is also global minimum. We will use some short-hand notation in the proof: $\mathbf \Sigma_{\mathbf X\mathbf X} \doteq \mathbf X \mathbf X^\top$, similarly for $\mathbf \Sigma_{\mathbf Y \mathbf Y^\top}$, $\mathbf \Sigma_{\mathbf X \mathbf Y}$ and $\mathbf \Sigma_{\mathbf Y \mathbf X}$.

Theorem 1 Suppose $\mathbf X$ is full rank and hence $\mathbf \Sigma_{\mathbf X \mathbf X}$ is invertible. Further assume $\mathbf \Sigma \doteq \mathbf \Sigma_{\mathbf Y \mathbf X} \mathbf \Sigma_{\mathbf X \mathbf X}^{-1} \mathbf \Sigma _{\mathbf X \mathbf Y}$ is full rank with $n$ distinct eigenvalues $\lambda_1 > \cdots > \lambda_n > 0$. Then $\eqref{eq:nn_main_global}$ has no spurious local minima, except for equivalent versions of global minimum due to invertible transformations.

# Vertex Coverage of Random Polytope - 2

Recall from last post that we are interested in this problem: Given $n$ i.i.d. uniformly random samples $\mathbf x_1, \cdots, \mathbf x_n$ from $\mathbb S^{d-1}$ ($n > d$), consider the induced convex polytope $\mathcal K_n = \operatorname{\mathbf{conv}}\left(\mathbf x_1, \cdots, \mathbf x_n\right)$. Now sample $m$ more i.i.d. uniformly random points $\mathbf z_1, \cdots, \mathbf z_m$ from $\mathbb S^{d-1}$ independent of first $n$ points we sampled. For each of the $m$ points, say $\mathbf z_i$, treat it as canonical vector and color the facet this vector is incident on $\mathcal K_n$ in red (with probability one, such point will only hit the interior of a facet rather than other degenerate faces). Determine how large $m$ needs to be in order to ensure 1) all $n$ vertices of $\mathcal K_n$ are colored; and 2) one can find a connected path within the colored region to connect any two vetices. In other words, if we only retain colored vertices and edges and ignore other higher-dimensional faces, the resulting graph is connected. We will call such an event coverage of vertices.

From simulation, it seems one needs at most $m \leq O\left(n \log n\right)$ to ensure the desired coverage with high probability. Here I present a heuristic argument showing $m$ only needs to be at most a low-order polynomial in $n$. Suppose we require something stronger: all edges get colored. By the celebrated Balinski’s theorem, this implies vertex coverage we desire.

Theorem 1 (Balinski’s Theorem) The undirected graph from the vertices and edges of a $d$-dimensional convex polytope is $d$-connected, i.e., the graph remains connected if one removes any $d-1$ or fewer vertices and the associated edges from the graph.

Now from the generation model, we expect $\mathcal K_n$ to be regular both in combinatorial and geometric properties. So on average, the spherical simplex cut by each facet from the sphere has normalized spherical measure $1/\#\mathrm{facets}$ and each edge is touched by $\#\mathrm{facets}/\#\mathrm{edges}$ facets. So \begin{align*} & \mathbb P\left[\exists\; \text{an edge not colored with m samples}\right] \newline \leq\; & \#\mathrm\; \times \; \mathbb P\left[\text{a fixed edge not colored with m samples}\right] \newline \approx \; & \#\mathrm\; \times \; \left(1-\frac{1}{\#\mathrm{facets}} \frac{\#\mathrm{facets}}{\#\mathrm{edges}}\right)^m \newline \leq \; & \#\mathrm\times \exp\left(-\frac{m}{\#\mathrm{edges}}\right) \newline =\; & \exp\left(-\frac{m}{\#\mathrm{edges}} + \log \#\mathrm{edges}\right). \end{align*} Note that $\#\mathrm{edges} \leq n^2/2$, and it is enough to take \begin{align} m = O\left(n^2 \log n\right) \end{align} to ensure all edges get colored with some nontrivial constant probability or $m = O\left(n^3\right)$ to ensure high probability. The order is likely to be loose, as in the random model $\#\mathrm{edges}$ could be order-of-magnitude smaller than $O\left(n^2\right)$, and also one may only need to cover a fraction of edges to ensure vertex coverage. Of course to make the argument rigorous, one needs in addition to also work on extreme quantities such as 1) minimum normalized spherical measure one spherical simplex subtends; and 2) minimum number of facets one edge will touch in $\mathcal K_n$.

# Simplicial Covering of Vertices of Random Polytopes

Sample $n$ points $\left\{\mathbf x_1, \cdots, \mathbf x_n\right\}$ i.i.d. uniform from the unit sphere $\mathbb S^{d-1}$ and consider the convex polytope formed as $\mathcal K = \operatorname{\mathbf{conv}}\left(\mathbf x_1, \cdots, \mathbf x_n\right)$. With probability one, the polytope is simplicial, which means each facet is exactly a $d-1$ simplex and there is no other points lying in it.

There are many questions of probabilistic and combinatorial nature that can be posed for this setting. Recent research on this culminates in proving some concentration results to the $\mathrm{volume}\left(\mathcal K\right)$ in this paper1. In fact, the result applies to points sampled i.i.d. uniform from the boundary of any convex body with smooth boundaries:

Theorem 1 For any compact convex body $\mathbf K$ with nonempty interior and $\mathcal C^2$ boundaries and everywhere positive Gauss-Kronecker curvature, there are positive constants $\alpha$ and $c$ such that the following holds: consider the convex polytope $\mathcal K_n = \operatorname{\mathbf{conv}}\left(\mathbf x_1, \cdots, \mathbf x_n\right)$, where $x_i$’s are sampled i.i.d. uniform from the boundary of $\mathbf K$. For any constant $0 < \eta < \frac{d-1}{3d+1}$ and $0 < \lambda \leq \frac{\alpha}{4} n^{\left(d-1\right)/\left(3d+1\right) + 2\left(d+1\right)\eta/\left(d-1\right)}$, we have \begin{align} \mathbb P\left[\left|\mathrm{Vol}\left(\mathcal K_n\right)- \mathbb E\mathrm{Vol}\left(\mathcal K_n\right) \right| > \sqrt{\lambda V_0}\right] \leq 2\exp\left(-\lambda/4\right) + \exp\left(-cn^{\left(d-1\right)/\left(3d+1\right) - \eta}\right) \end{align} where $V_0 = \alpha n^{-\left(d+3\right)/\left(d-1\right)}$.

This development is in line with the more developed results that achieves central limit theorems (CLT) of various quantities of interest for the model in which random points sampled directly from the mother convex body $\mathbf K$, instead of its boundary. See this paper2.

One problem from my research project recently is more leaned toward the combinatorial side. A simple version is like this: consider $\mathcal K$ as in the first paragraph, how many facets one needs to randomly color (say in red), such that ultimately 1) the colored regions are connected; and 2) any vertex lie in at least colored facet. Here by “randomly” we mean uniform as measured by the “volume of the spherical simplex” subtended by facets, normalized by the surface area of $\mathbb S^{d-1}$ of course. It is very easy to see the lower bound is \begin{align*} \Omega\left(n/d\right) \end{align*} as this is the least required if one were to cover all vertices. It is not immediately clear what the upper bound is. Intuitively, since the random polytopes tend to behave like very “regular objects”, and our sampling is “uniformly” random, one expects substantially fewer such colored facets to cover all vertices, rather than all facets, the number of which is often exponential in $d$. Some simulations (of course, computation quickly becomes prohibitive even one’s interested in computing the convex hull of many points in large dimensions) suggest the upper bound is at most \begin{align} O\left(n\log n\right), \end{align} which is very mild as compared to the crude estimate of the number of facets one’d have \begin{align} \binom{n}{d} \leq \frac{n^d}{d!}. \end{align}

1. An Inscribing Model for Random Polytopes, by Richardson, Vu, and Wu, http://link.springer.com/article/10.1007%2Fs00454-007-9012-3

2. Sharp Concentration of Random Polytopes, by Vu, http://link.springer.com/article/10.1007%2Fs00039-005-0541-8

# How to Fix “\$MFTMirr Does Not Match \$MFT (Record 0)” [Updated Version… ]

I hate Toshiba external disk driver’s fragile cable port … you have to keep your hands on while data are being transferred; otherwise you probably find the current article or this (a) or this (b).

This (a) has explained things pretty clearly, except for that you won’t find ntfsprogs in recent versions of Linux distributions, as the ntfsprogs project has been merged with the ntfs-3g project under Tuxera Inc. according to Wiki. The real attack works as described in Answer in this (b), except that one needs to check out the mount point by “sudo fdisk -l” and “sudo ntfsfix -b XXX” to get the program to start the fix, where XXX is to be replaced by the real mount point.

# Script for OpenCV Installation on Ubuntu 12.10

Installing OpenCV on Ubuntu was painful for me in the past. There are lots of dependencies to be sorted out first and there’s no comprehensive installation tutorial shipped with the OpenCV distribution. This script has done a great job assembling all necessary “apt-get-install’s” into an executable script. Some personal notes about this:

• It seems to me " sudo apt-get install libopencv-dev" for dependencies is not necessary - this will be superseded by the subsequent installation of opencv from source. At the end of the scripting installation, the package seems to be in some cache place and be not in effect:

• It seems " sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev" is also unnecessary. These libraries are to be provided by the next fresh installation of ffmpeg follows. Checking their location seems to also confirm my guess:

• About version of ffmpeg: ffmepg is under active development as always. The choice of its version is sometimes tricky, especially in running together with OpenCV. I remember the 1.x series was not compatible with OpenCV 2.3 in my previous trial (compilation errors were thrown due to some problem with ffmpeg libraries). The current 2.4.3 version seems to be at least fine with ffmpeg 0.11.2 in compilation.

[Update] From test by Mr. Alok Singh Mahor   (comments in the original post), installing Ubuntu package “libopencv-dev” would already get things work, though the linking order with g++ has to be taken care of as discussed (also comments in the original post and here) . Of course, it’s not bad idea to install from source that always guarantees to bring  in the newest feature of actively developed OpenCV. [Dec 28 2012 / Oct 30 2013]

# Photos Taken at Florence (Italy) and Beer Mountain (New York)

Follow the pictures and click to enter my google (or 百度) photo galleries. I hope you enjoy~

Florence Gallery ( Google photos or 百度相册 )

Bear Mountain Gallery (Oct 27 2012) ( Google photos or 百度相册 )