I, Cyborg.

Daniele Dalledonne
Jun 10, 2024

And how, with the Vision Pro’s cameras, I can see better than with my own eyes.

I was born with some congenital vision conditions that have made life a bit complicated, but they have also given me the opportunity to adapt, developing superpowers. I spent my childhood unable to take part in any sport involving balls or spheres, so I played chess. Throughout my education, from elementary school to university, I never saw what the teachers wrote on the blackboard; even straining from the front row, I could never follow along. I became, however, very good at copying: from the person next to me, from books, from cheat sheets. I couldn’t ride a motorcycle or get a driver’s license, but I became a diver, jumped with a parachute, and flew a glider. The ophthalmologists told me I would have to learn Braille, but instead of a secure job through the protected employment lists, I chose to study Computer Engineering (with cheat sheets), to open a VAT number and go freelance building websites in 1995, and then to found a company that has been developing mobile apps for 25 years and employs 18 people.

All of this without ever telling anyone how I saw or what problems it caused me. At least until today.

Let’s begin.

What is 3D?

I was born seeing primarily from one eye, the right one, with a visual acuity (measured with the letter chart ophthalmologists use) of about 3 out of 10: the first three lines of the chart. Not nearsighted, farsighted, or astigmatic, nothing that could be corrected with glasses. The left eye is similar, but with a visual acuity of 0.1 out of 10, which corresponds roughly to reading, with great difficulty, the first two lines of the chart, and only at a very close distance.

The resulting disadvantage is the loss of three-dimensionality, or depth perception, and consequently the inability to judge the distance of objects, like a ball coming at you. No sports like football, terrible at ping-pong, a bit better at billiards when skipping school.

Tesla’s Artificial Intelligence

The first adaptation the brain develops to overcome this deficit is the same technique Tesla uses for its autonomous driving, which relies on a single forward view rather than the stereo pair one might imagine.

In stereoscopic vision, the human kind but also the kind Oculus uses in its virtual reality headsets to understand space and head movements, our brain (like an AI) performs trigonometric calculations that exploit the positional offset between two images of the same exact point. The greater the offset, the more precise the calculation, so closer objects are seen with more pronounced depth (so they tell me!), while landscapes are essentially flat images, like looking at a photograph.
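
For anyone curious about the trigonometry, here is a minimal sketch of that triangulation; Swift is used only as an illustration language, and the focal length, baseline, and disparity values are invented numbers, not taken from any real headset or car.

```swift
// A toy illustration of stereo triangulation: two viewpoints a known distance
// apart see the same point at slightly different horizontal positions, and
// that offset (the disparity) is enough to recover depth.
func depthFromDisparity(disparityPixels: Double,
                        focalLengthPixels: Double,
                        baselineMeters: Double) -> Double? {
    guard disparityPixels > 0 else { return nil }  // no offset: the point is "at infinity"
    // Similar triangles: depth = focal length * baseline / disparity
    return focalLengthPixels * baselineMeters / disparityPixels
}

// Roughly eye-like geometry: about 6.5 cm between the two viewpoints.
let near = depthFromDisparity(disparityPixels: 40, focalLengthPixels: 1200, baselineMeters: 0.065)
let far  = depthFromDisparity(disparityPixels: 2,  focalLengthPixels: 1200, baselineMeters: 0.065)
print(near ?? 0, far ?? 0)  // about 1.95 m and 39 m: a large offset pins the distance, a tiny one barely does
```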

For me, everything is a photograph, but my brain has developed a technique for estimating distance over time: by observing how a moving object changes position and apparent size, it can make a good estimate of speed and trajectory. That translates into the ability to estimate distance and, above all, to activate the nervous system in time to avoid a ball during a dodgeball game, or to catch it when attempting to play basketball, as I tried to do for years. All this calculation becomes much harder when the objects are small and very fast, like a tennis ball you can’t even see beyond the other half of the court.
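
That time-based trick can also be written down compactly. The sketch below, with made-up numbers, shows the idea sometimes called the looming cue: for an object approaching at a roughly constant speed, the time until it reaches you falls out of how fast its apparent size grows, with no stereo depth involved.

```swift
// Time-to-contact from apparent growth: tau ≈ apparent size / rate of growth.
// Purely illustrative values; angles are in radians.
func timeToContact(apparentSize: Double, growthRatePerSecond: Double) -> Double? {
    guard growthRatePerSecond > 0 else { return nil }  // not getting closer
    return apparentSize / growthRatePerSecond
}

// A ball that currently spans 0.05 rad of the visual field and grows by
// 0.025 rad/s will arrive in roughly 2 seconds: enough warning to dodge it.
print(timeToContact(apparentSize: 0.05, growthRatePerSecond: 0.025) ?? .infinity)
```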

Tesla’s AI works the same way to calculate the distances and dimensions of the vehicles and road in front of it, with the difference that our brain wasn’t born for this task, while Tesla’s algorithms and cameras are designed to work this way.

It’s all pixelated

Macular degeneration is a disease that, when it appears, is usually progressive; in my case it’s congenital and, fortunately, stable. What it does, very much simplified, is reduce the definition of our vision, a bit like comparing a photo taken with the latest iPhone to one we took 10 years ago. The image we see is still the same, but we lose sharpness, color richness (the color gamut), and brightness (dynamic range), especially when there are very wide swings between light and dark (high dynamic range, HDR for friends). The simplest comparison is that images look stylized: objects retain their shape and size, but appear simplified.

Probability

I discussed this with an ophthalmologist many years ago during a thorough examination, and in the end, we agreed that my visual acuity is the sum of what my eyes see and what my brain believes it sees.

I demonstrate this every time I strain to read between the third and fourth lines of the ophthalmologist’s chart: if I have read certain letters on the previous line, I exclude them from the possibilities for the next one, reducing the sample space and increasing the chances of guessing right by intuition.

For example, if I read the letter N and on the next line I perceive a similar shape with more weight on one side, or in the central area, then by exclusion I will think of the letter H, the one that comes closest, and not of an M, which is weighted toward the bottom, or an E, which is much bulkier.
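
Written as code, the strategy is nothing more than exclusion plus a similarity score. In the playful sketch below, both the set of chart letters and the “resemblance” numbers are invented for illustration, not taken from any real chart or model of perception.

```swift
// Guess the next letter on the chart: drop what was already read on the
// previous line, then pick the remaining candidate closest to the blur seen.
let chartLetters: Set<Character> = ["C", "D", "E", "F", "H", "K", "N", "P", "R", "Z"]

// How much each candidate resembles "a blurry N-like shape" (0...1, made up).
let resemblanceToBlur: [Character: Double] = [
    "H": 0.9, "N": 0.85, "K": 0.6, "R": 0.5, "F": 0.3, "E": 0.3,
    "P": 0.3, "D": 0.2, "Z": 0.2, "C": 0.1
]

func guessLetter(alreadySeen: Set<Character>) -> Character? {
    let candidates = chartLetters.subtracting(alreadySeen)  // the exclusion step
    return candidates.max { (resemblanceToBlur[$0] ?? 0) < (resemblanceToBlur[$1] ?? 0) }
}

// Having just read an N on the previous line, the best remaining guess is H.
print(guessLetter(alreadySeen: ["N"]) ?? "?")
```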

We don’t realize it, but our existence is based on statistical calculation, unconscious in most cases: where we place a foot when walking, and the error-mitigation strategies we develop in case the statistics fail. I notice this by observing people walking, because most people check with some consistency where they place their feet, either to avoid stepping on a surprise in the city or to avoid tripping on a mountain path. With healthy vision, the transition between looking at the ground and realizing what we are seeing is quite easy, but those with limited resources like me have to make decisions: lower the risk threshold devoted to walking in order to focus on other sources of risk while moving among people, or simply to enjoy the mountain panorama rather than its path. And indeed I often trip or bump into edges, but it’s a calculated risk I take when I don’t have to cross a ravine.

Having refined a few image-analysis methods, I have developed a series of bonus features (let’s call them that), some of them useful professionally. For example, I can easily recognize the difference between two HEX colors that vary by a single value, like #FFFFFF and #FFFFFE, and I can read text on monitors with fonts 5 pixels high, sometimes even 4.

Our CPU has limited capacity

This habit of the brain compensating through calculation for what it doesn’t see has made me understand how limited our cognitive capacity is, and how all of us, not just those with difficulties like mine, spend mental energy to perform certain actions. At home, our brain is trained to move smoothly, but parachuted into Times Square at rush hour, with endless visual and sound stimuli, orientation starts to weigh on it. In a healthy, young body, where all the sensors work at their best, the operation requires few resources; in an elderly person who sees and hears little, the brain has to work hard to develop strategies for defense, orientation, and understanding the environment, and the result is the “dazed” effect (scientific name).

The broken sensor

Retinopathy is a condition in which our retina, or parts of it, loses the ability to perceive light. It can arise for various reasons, including injuries; the classic (and terrifying) one is looking at the sun with the naked eye. I have been relatively lucky, because my retina is damaged in the periphery, which reduces the so-called visual field. The resulting effect is very easy to explain: it’s like looking through binoculars, seeing only what’s in front of you and nothing to the sides.

But peripheral vision is so important that even Tesla’s cameras include a fish-eye view capable of seeing at 180°. We humans are not so lucky, biologically speaking, also because, unlike Tesla, whose algorithms analyze every point of the image, we focus on one particular point of observation at a time, moving quickly from point to point along a path our brain asks our eyes to follow based on the priorities of the moment.

An image generator

How many people know that even our Natural Intelligence has an image-generating function? I’m not talking about dreams, but about a feature that deceives us into seeing an object as real when it isn’t. Think of our vision once again as a camera that captures everything in front of us: an enormous amount of information that our brain processes only partially, for example only around our main point of view. The rest of the image is not discarded but enters a buffer with a lower calculation priority. When that buffer comes into play, it can do two things: immediately send a stimulus to our nervous system to move the eyes where attention is needed, or anticipate our vision by generating the image it saw a moment earlier and placing it in the current timeline, so that our subsystems can intervene, for example to dodge an object. All in fractions of an instant.

There are examples of retinopathy developed at a very young age by looking at the sun, where physically the eye has a hole in the center of vision, but over time the brain gets used to filling in the image, effectively nullifying the problem. Or there are experiments demonstrating the ability to see objects appearing in our peripheral vision while looking in another direction.

I notice it through hallucinations (like in AI), because sometimes I perceive an approaching object and react with a start, only to realize it wasn’t real.

An artificial implant

Cataracts are perhaps one of the most well-known vision problems, affecting most of the elderly population, probably because we weren’t designed to last this long :)

Simplified to the maximum: the light entering our eyes passes through the iris, the diaphragm that decides how much light to let through, and reaches the crystalline lens, a very soft lens with variable geometry that focuses the image and projects it onto the retina, which transforms light into electrical signals and sends them to the brain. The lens often focuses imperfectly, and glasses or laser surgery can correct that; with age, though, it hardens, preventing us from focusing correctly both near and far, and eventually it becomes opaque, letting less light through and causing severe discomfort with lights, especially at night. In general, we say that vision is aging. In these cases, the most common intervention is to remove the natural lens and replace it with an artificial one: the ability to vary focus is lost completely, making glasses necessary (bifocals, for convenience), but clear vision is restored to those who could no longer see well.

I was born with cataracts and lived with them until I was 18 or 20, because at the time it wasn’t clear what would happen to me if my lenses were replaced. I decided to finish high school, but when I started university I chose to take the risk, starting with the sacrificial eye, the one that saw less. It went well, and a year later I had surgery on the better eye.

I remember very well the moment I left the hospital without bandages. The light overwhelmed me, the mountains that had been simply green appeared in thousands of colors, the asphalt that had been a gray slab showed all its shades of gray. My visual acuity increased from 3 to 4 tenths in one eye and from 0.1 to 0.2 in the other (doubled!), and the world seemed completely changed to me.

Of course, I could no longer change focus. This struck me particularly in reading, which became extremely difficult because, with my conditions, reading glasses are particularly tiring to use, and I rarely use the distance ones, only at the cinema or watching TV. I preferred to transfer all the skills I had learned over the years to my new vision, mainly because technology has helped me a lot.

Large Language Model

I struggle to read. You should see me right now, as I write this. Don’t imagine the use of accessible systems: I don’t have particularly large fonts, nor do I use zoom. I’m simply used to seeing things blurry and not very sharp, yet overall well defined. Over time, everyone develops the ability to read quickly when skimming a boring text or out of necessity, but I don’t know how many have ever thought about how these techniques work.

Through learning and reading, we acquire a very broad model of language (Hey ChatGPT, move aside!) that combines an extensive vocabulary with an understanding of grammar, or if we want, the sequence of words and rules that compose a sentence. Our vision does the same, learning the shape of these words, the bellies of the letters, punctuation, repetitions, how they alternate or follow one another.

In speed reading, importance is given to the beginning of each word; you don’t need to read all of it. For example, the word “revolution” can be understood by reading only “revolu…”, and if context is added, it reduces to “revol…”. There is an algorithm called “Bionic Reading” that promises to help with speed reading by acting on fonts, highlighting only the letters of a word that are essential to understanding it. Sometimes whole words can be ignored, not just articles or prepositions: you can skim a paragraph by reading only the words that seem most important on each line and, in extreme cases, for power users, read diagonally, picking statistically one or two words per line of a paragraph.
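
To make the idea tangible, here is a tiny sketch of that kind of highlighting, not the actual Bionic Reading algorithm: it simply bolds roughly the first half of each word with Markdown, so the eye can anchor on word beginnings and let the language model in our head complete the rest.

```swift
// Emphasize roughly the first half of every word longer than two letters,
// using Markdown bold as the "font treatment". A rough approximation only.
func highlightWordBeginnings(_ text: String) -> String {
    text.split(separator: " ").map { word -> String in
        let letters = String(word)
        guard letters.count > 2 else { return letters }  // leave short words alone
        let cut = (letters.count + 1) / 2                // about the first half
        return "**\(letters.prefix(cut))**\(letters.dropFirst(cut))"
    }.joined(separator: " ")
}

print(highlightWordBeginnings("The revolution will not be televised"))
// **Th**e **revol**ution **wi**ll **no**t be **telev**ised
```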

Vision Pro

At this point, I must confess that condensing all of this has taken quite a bit of work, but now let’s get to the most important issue.

When Apple introduced the Vision Pro last June, I sensed that something big was coming. After the keynote at Apple Park, I was fortunate enough to exchange a few words with Phil Schiller, the historic marketing chief from the Steve Jobs days. I asked him this question:

Do you believe that Vision Pro, in the future, will allow you to see better than your own eyes?

His answer was affirmatively vague; he wouldn’t be Apple’s VP of Marketing if he had committed to some future promise. But a year later, on the eve of the next WWDC, which will bring artificial intelligence to the new iPhones, I can testify that the answer is YES.

Bionic eyes, cybernetic vision

Yes. Vision Pro allows me to see better than my natural eyes do. I had to wait 5 months to get the ZEISS/Apple lenses that correct my vision, but I never thought I would have this kind of experience.

Technically, it is a very simple concept. My eyes do not focus on their own, but on the other hand, when something is in focus for them, they can see with sufficient clarity, especially if it is at a close distance. The Vision Pro has screens practically attached to our eyes, but its internal lenses have been calibrated for a focus of about 1.2 meters away. Without additional lenses I saw poorly, both near and far, and not finding a focal point, the experience was worse than with my natural vision. But with the additional lenses the result, oh, guys, is extraordinary. The Vision Pro’s cameras handle focusing both up close and far away, and above all the headset presents a high-resolution image just millimeters from the eyes, giving me a detail fidelity I had never perceived before.

No, I can’t yet compete with Tiger Woods and his astounding 15/10 visual acuity, but… I don’t know if I’ve managed to explain it well: this thing is absurd, it’s out of control, it’s destabilizing and crazy. If tomorrow I wanted to ride a bike (pretend I never wrote this) and wanted to reduce my risk threshold and increase my confidence moving through traffic, I would wear the Vision Pro, because it would make me safer than going without it.

I, cyborg, can’t wait to try the model with more megapixels, brighter screens, more of everything… to see even more.

[WWDC24 Update]

I couldn’t resist talking with Phil again, and he told me his experience with Vision Pro has been the same as mine. Exciting times ahead of us.


Daniele Dalledonne

Entrepreneur and Mobile Tech Enthusiast, Passionate about UX and Quality, Master Pizza Artisan. Contact @ https://ddalledo.com