This project is a way of translating vision into sound. It translates height into pitch and x-position into stereo, sweeping across the scene once per second. What follows is a proposal for another way to do it.
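The sweep idea above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: each image column becomes a slice of time, each row drives a sine oscillator (higher rows, higher pitch), pixel brightness sets loudness, and the column's x-position sets the stereo pan. The function name, sample rate, and frequency range are all assumptions for the sketch.

```python
import numpy as np

def sweep_sonify(image, duration=1.0, sr=8000, f_lo=200.0, f_hi=2000.0):
    """Left-to-right sweep of a 2D grayscale image (values in [0, 1],
    row 0 = top): row -> sine pitch, column -> time and stereo pan,
    brightness -> loudness. Returns stereo samples in [-1, 1]."""
    h, w = image.shape
    freqs = np.geomspace(f_hi, f_lo, h)            # top row gets the highest pitch
    n_col = int(duration * sr / w)                 # samples spent on each column
    t = np.arange(n_col) / sr
    bank = np.sin(2 * np.pi * freqs[:, None] * t)  # (h, n_col) oscillator bank
    out = np.zeros((w * n_col, 2))
    for c in range(w):
        mix = image[:, c] @ bank                   # brightness-weighted sum of sines
        pan = c / max(w - 1, 1)                    # 0 = hard left, 1 = hard right
        out[c * n_col:(c + 1) * n_col, 0] = (1 - pan) * mix
        out[c * n_col:(c + 1) * n_col, 1] = pan * mix
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out
```

A single bright pixel in the top-left corner, for example, produces a brief high-pitched tone in the left channel at the very start of the one-second sweep.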
Our brains are good at figuring out where in space a particular sound came from. We can build earphones that reproduce sounds as if they were coming from specific points in space.
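A crude version of that spatial placement can be sketched with the two cues the brain leans on most: the sound reaches the near ear slightly earlier (interaural time difference) and slightly louder (interaural level difference). This is a stand-in for full HRTF rendering, and the head radius and level-rolloff constants are assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, roughly an average adult head (assumption)

def spatialize(mono, azimuth, sr=44100):
    """Place a mono signal at an azimuth (radians, 0 = straight ahead,
    positive = to the right) using interaural time and level differences."""
    # Woodworth's approximation for the interaural time difference.
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (abs(azimuth) + np.sin(abs(azimuth)))
    delay = int(round(itd * sr))
    # Simple level difference: the far ear is attenuated as the source moves aside.
    near, far = 1.0, 1.0 - 0.6 * abs(np.sin(azimuth))
    delayed = np.concatenate([np.zeros(delay), mono])   # far-ear copy, late
    direct = np.concatenate([mono, np.zeros(delay)])    # near-ear copy, on time
    if azimuth >= 0:   # source on the right: right ear hears it first and louder
        left, right = far * delayed, near * direct
    else:
        left, right = near * direct, far * delayed
    return np.stack([left, right], axis=1)
```

For a click placed 45 degrees to the right, the right channel fires first and louder, which is exactly the cue pair the auditory system decodes back into a direction.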
There are too many things in the scene for all of them to emit sound; it would be a cacophony. We need to pick out the points that give us the most information, and those points are the corners. As image compression techniques suggest, much of the information in a scene is concentrated at its corners, and corners are stable: they don't change as you walk around an object or as the lighting changes. Imagine a tiny bell placed in each corner of the room. As you approached a bell or turned your ear toward it, it would grow louder, and the volume of each bell could be proportional to the corner-ness of its point. (The image shows how this might work. That's Peter Kovesi's phase congruency for corner detection, by the way.)
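The bell idea can be sketched end to end: score every pixel for corner-ness, keep the strongest few, and make each bell's volume proportional to its score. The detector here is the classic Harris corner response, used as a simple stand-in for the phase-congruency detector the image refers to; the function names and the top-n selection are assumptions of the sketch.

```python
import numpy as np

def box_blur(a, k=5):
    """Separable k x k box filter (pure numpy, good enough for a sketch)."""
    kern = np.ones(k) / k
    a = np.apply_along_axis(lambda m: np.convolve(m, kern, mode='same'), 0, a)
    return np.apply_along_axis(lambda m: np.convolve(m, kern, mode='same'), 1, a)

def corner_strength(image, k=0.04):
    """Harris corner response: large and positive only where the image
    gradient is strong in two directions at once, i.e. at a corner."""
    gy, gx = np.gradient(image.astype(float))
    ixx, iyy, ixy = box_blur(gx * gx), box_blur(gy * gy), box_blur(gx * gy)
    return ixx * iyy - ixy ** 2 - k * (ixx + iyy) ** 2

def loudest_corners(image, n=5):
    """The n strongest corners, each with a volume proportional to its
    corner-ness -- one (row, col, volume) triple per bell."""
    r = corner_strength(image)
    flat = np.argsort(r, axis=None)[::-1][:n]
    ys, xs = np.unravel_index(flat, r.shape)
    vols = r[ys, xs] / r.max()       # loudest bell = strongest corner
    return list(zip(ys, xs, vols))
```

On a synthetic image of a bright square, the response peaks at the square's four corners rather than along its edges, which is the property that makes corners good places to hang the bells.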
We can listen to different voices at a party by directing our attention to them. In the same way, if the sound coming from a corner is rich enough (not just a beep, but something more like a voice or a melody), it could carry additional information about the object it comes from, such as its color or texture. This part would need to be experimented with.
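One hypothetical mapping, purely as a starting point for that experimentation: let a property like hue tilt the tone's spectrum (a "brighter" color boosts upper harmonics) and let texture roughness mix in noise. Every choice here — the parameter names, the spectral-tilt rule, the noise mix — is a guess to be tested on listeners, not an established encoding:

```python
import numpy as np

def corner_voice(f0, hue, roughness, duration=0.5, sr=8000, n_harm=8, seed=0):
    """Hypothetical timbre for one corner's bell: `hue` in [0, 1] tilts
    energy toward upper harmonics (brighter color -> brighter sound);
    `roughness` in [0, 1] mixes in noise (rougher texture -> noisier tone)."""
    t = np.arange(int(duration * sr)) / sr
    k = np.arange(1, n_harm + 1)
    # Spectral tilt: hue=0 -> steep 1/k^2 rolloff, hue=1 -> flat spectrum.
    weights = k.astype(float) ** (-(1.0 - hue) * 2.0)
    tone = (weights[:, None] * np.sin(2 * np.pi * f0 * k[:, None] * t)).sum(axis=0)
    noise = np.random.default_rng(seed).standard_normal(t.size)
    sig = ((1 - roughness) * tone / np.abs(tone).max()
           + roughness * noise / np.abs(noise).max())
    return sig / np.abs(sig).max()
```

Two corners with the same pitch but different hues then differ in spectral centroid — a difference listeners can plausibly attend to selectively, the way they pick out one voice at a party.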
Any text detected in the room could be read aloud by the system from its own location in space.