High dimensional spaces
People say that the properties of high-dimensional spaces are deeply unintuitive and talk about the "curse of dimensionality." Having worked with high-dimensional spaces (around a few hundred dimensions) for almost a decade now, I find it all much more comfortable than I used to.
One of the most important things to understand about high-dimensional hypercubes is that they have a lot of corners. An n-dimensional hypercube has 2^n corners. Each corner claims only a little bit of the hypervolume, but there are exponentially many of them, so almost all of the hypervolume of a hypercube ends up out near the corners and very little is near the center.
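A quick simulation makes this concrete. It's a minimal sketch in Python/NumPy; calling the region where every coordinate sits between 0.25 and 0.75 the "central band" is just an arbitrary choice to stand in for "near the center."

```python
import numpy as np

rng = np.random.default_rng(0)

# Fraction of a unit hypercube's volume where every coordinate lies in the
# central band [0.25, 0.75].  Analytically this is 0.5**n, so it collapses
# toward zero as the dimension n grows; for n = 100 the true value is about
# 8e-31, so the Monte Carlo estimate simply comes out as 0.
for n in (2, 10, 100):
    points = rng.uniform(size=(100_000, n))
    in_middle = np.all((points > 0.25) & (points < 0.75), axis=1).mean()
    print(f"n={n:3d}  corners=2^{n}  central band ~ {in_middle:.1e}  (exact {0.5**n:.1e})")
```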
So, for example, suppose you randomly choose a point within a 100-dimensional hypercube, choosing each coordinate independently from a uniform distribution between 0 and 1. How likely is it that at least one of those coordinates will be close to 1? Very likely: if "close" means within 0.05, the chance that all 100 coordinates miss is 0.95^100, which is less than 1%. And as long as at least one coordinate is close to 1, you are not near the center of the hypercube but out near one of its faces.
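That number falls right out of a couple of lines of NumPy; this is just a sanity-check sketch, and the 0.05 threshold for "close" is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, trials = 100, 0.05, 100_000

# Exact probability that at least one of n independent Uniform(0, 1)
# coordinates lands within eps of 1 (the complement of all n missing).
exact = 1 - (1 - eps) ** n

# Monte Carlo check: does the largest coordinate exceed 1 - eps?
points = rng.uniform(size=(trials, n))
estimate = (points.max(axis=1) > 1 - eps).mean()

print(f"P(some coordinate > {1 - eps}) = {exact:.4f}  (simulated: {estimate:.4f})")
```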
Now choose two random points in the hypercube. How far apart will they be? Suppose that for one particular dimension, the random coordinate happens to be low for the first point and high for the second. Even if that happens in only one of the 100 dimensions, the points will already be almost a full unit apart. The farthest two points could possibly be from each other is sqrt(100) = 10, but distances anywhere near that are vanishingly rare. What you actually get is a bell curve of distances, and in this many dimensions that bell curve is sharply peaked around its expected value, which is about 4 (roughly sqrt(100/6)). You can be pretty confident that two random points in the hypercube will be pretty close to this far away from each other. Everything is about the same distance from everything else.
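Here is a small simulation of those pairwise distances (again just an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, pairs = 100, 100_000

a = rng.uniform(size=(pairs, n))
b = rng.uniform(size=(pairs, n))
d = np.linalg.norm(a - b, axis=1)

# The mean distance comes out close to sqrt(n/6) ~ 4.08 for n = 100, and the
# spread around it is tiny compared to the maximum possible distance of 10.
print(f"mean {d.mean():.3f}  (sqrt(n/6) = {np.sqrt(n / 6):.3f})")
print(f"std  {d.std():.3f}  min {d.min():.3f}  max {d.max():.3f}")
```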
Now take these two random points and look at their midpoint. On about half the dimensions, the two endpoints happen to land on the same end of the scale, so the midpoint sits out there with them rather than in the middle. The midpoint ends up only about 2 units from each endpoint (half their separation), while an independently sampled point is typically about 3.5 units away from it. That means that even if you have millions of points sampled throughout the hypercube, the midpoint will still be closer to each of its endpoints than any of those random samples. That is, you can always find a point that is "close" to any two random points, and "far" from everything else.
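A simulation makes this vivid too (a sketch; the budget of 1,000,000 comparison points is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

a, b = rng.uniform(size=n), rng.uniform(size=n)
m = (a + b) / 2

# The midpoint sits at half the endpoint-to-endpoint distance from each
# endpoint, which is around 2 for n = 100.
print(f"midpoint to endpoints: {np.linalg.norm(m - a):.2f}, {np.linalg.norm(m - b):.2f}")

# Fresh random points never get that close to the midpoint: their distances
# concentrate around sqrt(n/8) ~ 3.5, so even the best of a million misses.
nearest = np.inf
for _ in range(100):  # 100 chunks of 10,000 samples = 1,000,000 points
    chunk = rng.uniform(size=(10_000, n))
    nearest = min(nearest, np.linalg.norm(chunk - m, axis=1).min())
print(f"nearest of 1,000,000 random samples: {nearest:.2f}")
```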
In the high-dimensional spaces I am working with, every one of these points is identified with a meaning. One point might be "ocean" and another "explorer". So we can find a point that is close to both of these, and that point will represent "ocean explorer". You can think of "ocean explorer" as being in the Venn diagram intersection of two circles: the circle of things that have to do with oceans and the circle of things that have to do with explorers. The high-dimensional space lets us make the Venn diagram of any combination of concepts literal.
This fact lets us calculate analogies. Consider the analogy
bear : hiker :: shark : ???
The bear vector is somewhere near both "forest" and "predator". The hiker vector is somewhere near "forest" and "explorer". The shark vector is somewhere near "ocean" and "predator". So if we take
hiker + shark - bear
we get
(forest + explorer) + (ocean + predator) - (forest + predator) = ocean + explorer
which is something like "diver", and that is the answer to the analogy. That is, "diver" is one of the few concepts that are fairly close to both hiker and shark but far away from bear. And this works for solving pretty much any analogy.
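You can watch this arithmetic work on toy vectors. These are not real embeddings: the "forest", "ocean", "predator", and "explorer" directions below are random vectors standing in for the decomposition above, and the four-word vocabulary is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

def unit(v):
    # Normalize to a unit vector.
    return v / np.linalg.norm(v)

# Hypothetical concept directions: random unit vectors standing in for real
# embedding features like "forest-ness" or "predator-ness".
forest, ocean, predator, explorer = (unit(rng.normal(size=n)) for _ in range(4))

# Toy word vectors built from the decomposition in the text.
vocab = {
    "bear":  forest + predator,
    "hiker": forest + explorer,
    "shark": ocean + predator,
    "diver": ocean + explorer,
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Solve bear : hiker :: shark : ??? by computing hiker + shark - bear and
# ranking the vocabulary by cosine similarity to it.
query = vocab["hiker"] + vocab["shark"] - vocab["bear"]
for word, vec in sorted(vocab.items(), key=lambda kv: -cosine(query, kv[1])):
    print(f"{word:6s} {cosine(query, vec):.3f}")
# "diver" comes out on top, since the query is essentially ocean + explorer.
```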
Large language models like ChatGPT represent concepts in this way in their weights, which explains how they are able to carry out analogical reasoning so well. (There's a lot more going on there, of course, but this explains one part of how they work.)