Things that are the same in high dimensions

I have a set of high-dimensional vectors. These vectors are distributed as a gaussian surrounding zero in all the dimensions. I choose two of those vectors, perform an operation on them, and search for the nearest neighbors among the set using Euclidean distance. The following operations all tend to return the same neighbors:

The sum
The average
n * the sum, where n is large enough (something like 0.25 to infinity)
The vector product (I think that's what it's called)
The element-wise maximum
The element-wise minimum
The sum of the element-wise sign function (e > 0 : +1, e < 0 : -1)

All of these basically give me something that is approximately a 50/50 mixture of the elements of the two vectors. Each of the elements of the result vector will resemble an element of one of the two input vectors. The multiplier n can be as large as you like because after a certain point, it's just saying "whatever is furthest out along that direction."

If you think of a simpler version of the same situation, it is easier to see what is happening.  Let all the coefficients be +1 or 0, still distributed as a half-gaussian. Then you can think of the values that happen to be +1 as "words" which "define" the vector. The result of any of the above operations will give you a definition that includes words from the definitions of both the inputs, and no others.

I think I could prove this mathematically, but maybe someone knows of a proof already I could use as a reference?

Comments

Kenny Easwaran said…
Is this related to the phenomenon of concentration of measure? Consider the unit sphere in n dimensions. Fix a small positive epsilon. Consider an "equator" n-1 sphere (or "great circle") on this n sphere. As n goes to infinity, the fraction of the surface area that is within epsilon of this equator goes to 1.

Another way to put this is that in high enough dimension, any two randomly chosen vectors are almost certain to be within epsilon of perpendicular.

I'm not quite sure how to get this to yield the results you mentioned, but it may be related.

Popular Posts