Monday, April 6, 2015

word2vec, Babel-17, Galileo, Adamic, and the displacer beast from Adventure Time

I came across (in a book of essays by Italo Calvino) the following passage from Galileo's Dialogue Concerning the Two Chief World Systems:

"I have a little book which is considerably shorter than Aristotle and Ovid, which contains all sciences, and which with just a little study can allow others to form a perfect idea of it. The book is the alphabet, and there is no doubt that the person who knows how to put together and juxtapose this or that vowel with those other consonants, will get the most accurate responses to all doubts and he will derive lessons pertaining to all the sciences and the arts. In exactly the same way the painter can choose from different primary colours set separately on his palette and by juxtaposing a little of one colour with a little of another can depict men, plants, buildings, birds, fishes; in short, he can represent all visible objects even though there are no eyes, feathers, scales, leaves or stones on the palette. In fact it is essential that none of the things to be represented, or even any part of them, should actually be there amongst the colours, if one wants to use them to depict all manner of things, because if there were on the palette, say, feathers, these could only be used to depict birds or plumage."

This immediately reminded me of word2vec (which is on my mind constantly these days). You can think of word2vec as a bilingual dictionary. The other language is a strange one: every word has exactly 300 letters. What sets that language apart is that words with similar spellings have similar meanings. The remarkable thing is what an enormous amount of knowledge about the world is wrapped up in the spelling of these words. Just as one can guess at the meaning of tele-phone from the meanings of its Greek roots, or at the meaning of a Chinese word from the meanings of its characters, so too in this language the meaning of a word can be learned from the characters it is written with. The consonance, assonance, and rhyme between parts of the name, and the relationships between the letters, encode "approximate knowledge of many things."
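To make "similar spellings" a bit more concrete: one common way to compare two of these 300-letter words is the cosine of the angle between them. Here is a minimal sketch of that idea. The four-letter "spellings" in TOY_VECTORS are numbers I made up for illustration, not output from a trained model, and the names TOY_VECTORS and cosine_similarity are my own.

```python
import math

# Hypothetical four-letter "spellings". Real word2vec vectors are much longer
# (300 letters in the picture above); these values are invented for illustration.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'spelled alike'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["dog"])
cat_car = cosine_similarity(TOY_VECTORS["cat"], TOY_VECTORS["car"])
print(cat_dog > cat_car)  # the toy "cat" is spelled more like "dog" than like "car"
```

With a real trained model the same comparison would run over the full 300 letters, and the vectors would come from the model rather than from a hand-written table.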

There are two important differences, however. The first is that each of the letters, taken by itself, has no comprehensible meaning. This is similar to what Galileo was saying about letters and colors of paint. If you have a pigment that is "feathers" you can only draw birds or fancy hats. But because these letters carry no meaning on their own, only in their juxtaposition, they let you express whatever you want.

The second is the sheer amount of knowledge contained in the structure of the spellings. The capital of every country or state, the parts of machines, the attributes of movie characters, the celebration of holidays: all are contained in the spelling of the words. It resembles Babel-17 in that once you learn the true names of things, you know all the facts about those things. It is the language the world is written in. And yet at the same time it is profoundly dependent on our culture.
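The capitals example is the easiest one to sketch. With word vectors, a fact like "Paris is the capital of France" tends to show up as a roughly constant offset between the two spellings, so you can guess a capital by arithmetic: take the spelling of "paris", subtract "france", add "italy", and look for the nearest word. Below is a toy version of that vector-offset trick. The three-letter spellings are invented so that the offset is exact (a trained model only gives approximate offsets), and TOY_VECTORS, cosine_similarity, and analogy are names I made up for this sketch.

```python
import math

# Hypothetical three-letter "spellings", constructed by hand so that
# capital = country + one shared offset. Not output from a trained model.
TOY_VECTORS = {
    "france": [0.6, 0.1, -0.4],
    "paris": [0.9, -0.4, -0.2],
    "italy": [-0.2, 0.7, 0.5],
    "rome": [0.1, 0.2, 0.7],
    "germany": [0.4, -0.6, 0.3],
    "berlin": [0.7, -1.1, 0.5],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three query words themselves, as is usual for this trick.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))


print(analogy("france", "paris", "italy", TOY_VECTORS))
```

Note that no single letter here means "capital" or "Italy"; the fact lives only in how the letters of one word sit relative to the letters of another, which is the Galileo point again.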
