Why Automatically Generated Image Descriptions Are the Future for the Blind

I have updated this article as new technologies have developed. A small disclaimer: I know that many other blind and visually impaired people do not share my assessment. Unfortunately, I have neither the time nor the knowledge to examine the whole subject methodically and systematically, and of course one has to remain critical of the technology. In my opinion, however, the claim that a description written by a human is always better than an automatically generated one is no longer tenable.

Me in a conversation with Google Bard, where you can now upload images and ask for a description. Here you can also see the limitations and errors of AI: the calculator is not gray, as Bard claims, but black. A completely blind person would not have noticed this. And colors can certainly matter; think of pregnancy tests. I also notice a certain chattiness: I ask about the color and get contextual information that I did not ask for. However, this could be relevant for a blind person who does not know colors firsthand. I have documented the conversation as text below.

In recent years, automatic image recognition technology has made great progress. Facial recognition in photos is now mainstream and only limited by the GDPR.

There are certainly blind people, or at least people with very poor eyesight, who take photographs. Others want to know what the hell is in the photos they have collected in their overflowing miscellaneous folder. Automatic object recognition would be extremely helpful for this.

The current iOS, the Chrome browser on the desktop and some social media services such as Facebook have integrated automatic object recognition on images. Google and Microsoft offer additional apps for image description.


Cloud, pattern recognition and machine learning

It is probably only a matter of time before object recognition in real time, i.e. at the moment you focus on the object, becomes possible. This would greatly expand the possibilities for blind people: if an app recognizes text in the environment, such as street signs or displays, it can automatically read it aloud. Apps could detect objects that are in the way and suggest an intelligent way around them. However, smart glasses are a prerequisite for this, because even the most patient person doesn't feel like constantly holding a smartphone in their hand. Such glasses are now available, for example in the form of the Envision AI glasses.

The technology could also be used for images on the Internet. Alternative texts are inadequately formulated, especially in Web 2.0. Many photos are shared on Facebook, Twitter or WhatsApp. People's willingness to describe photos, even in a rudimentary way, is sometimes low, and sometimes they simply forget.

As a blind person, you have neither the desire, the time nor the patience to constantly remind people of this.

Automatic image descriptions today

As mentioned above, there are now automatic image descriptions available in many systems.

Unfortunately, it is hard to say whether these descriptions will actually let you understand the message the sender intended with the image. Often enough I don't get the joke even when someone explains the content of the picture to me.

Development has become more nuanced, especially since the arrival of ChatGPT last year. iOS recognizes "under water" in a picture showing two hands, but says "adult" for a photo of people. Facebook recognizes text in an image, but does not automatically convert it into a form that blind people can read. Google Bard, Envision AI or BeMyAI, on the other hand, can describe graphics in detail, as can be seen in the example above. Chat-based AI expands the possibilities considerably: it allows nuanced image descriptions and follow-up questions.

With image descriptions we quickly find ourselves in difficult waters, because yes, they also have a political dimension. Can an algorithm say that someone is dark-skinned, bearded, older, attractive, overweight, has pimples, wears glasses, looks sad, and so on? Some of these things can be determined automatically with relative ease; others are a matter of interpretation. And some are politically sensitive, for example when gender or ethnic group is misidentified. We will probably have to discuss such questions soon. On the one hand this information is superficial, but on the other hand blind people can be superficial too. And it is not accessible to them if they do not want to ask another person and are not allowed to ask an AI.

Where WoManpower is in demand

Artificial intelligence, machine learning and pattern recognition reach their limits where complex descriptions are required. Although works of art can be described as a series of objects, no art lover would be satisfied with that. "Woman with long dark hair, smiling" is an accurate but inadequate description of the Mona Lisa. But even here, algorithms could help to formulate a decent description. It's not a big deal for a computer to recognize and describe lighting conditions or color combinations in an image.

This is also where WoManpower comes into play. It's great when a person sits down and writes a detailed description of a picture. But since art is always a question of personal perspective, it helps to have as many descriptions from different people as possible. Sighted people benefit from this too, because other people's descriptions often point out something they would otherwise have overlooked. Or don't you have the impression that we often look only superficially at landscapes or photos? Describing something to a blind person forces sighted people to engage with it more deeply than they otherwise would.

For a diagram, alternative text can at best describe its rough trend. The complex infographics that are common in science and business today are therefore inadequately described. And let's be honest: who would expect the chronically overworked and underpaid academic proletariat, on whom the task ultimately falls, to describe such a graphic comprehensively? We need tools that make this task easier for us.

In contrast to works of art, I think it is quite feasible for software to describe structured infographics reasonably well, especially if they are available as vector graphics, i.e. if their source code can be read. Gradients or colors can then easily be derived from the code. The strength of machine learning is that such descriptions can improve over time if appropriate training material is available. True, errors can always occur, and a completely blind person has no chance of validating the description independently. But I assume that these descriptions will keep getting better and more accurate. The big advantage, in my opinion, is that the completeness and level of detail of a description no longer depend on the goodwill of a third person.
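To illustrate the point about vector graphics, here is a minimal sketch in Python of how a rough description could be derived directly from the code of an SVG file. It is not how any of the products mentioned in this article work. The example chart, the function name describe_bar_chart and the simple rising/falling rule are all assumptions made up for this illustration, and a real tool would have to cope with far messier graphics.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree puts in front of every SVG tag name.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical bar chart, invented for this illustration.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <title>Visitors per quarter</title>
  <rect x="10" y="80" width="40" height="20" fill="blue"/>
  <rect x="60" y="50" width="40" height="50" fill="blue"/>
  <rect x="110" y="20" width="40" height="80" fill="red"/>
</svg>"""


def describe_bar_chart(svg_text):
    """Build a one-sentence description from the rectangles in an SVG.

    Assumption: every <rect> is one bar and its height is the value.
    """
    root = ET.fromstring(svg_text)
    title = root.findtext(f"{SVG_NS}title", default="Untitled chart")

    # Order the bars from left to right by their x coordinate.
    bars = sorted(root.iter(f"{SVG_NS}rect"),
                  key=lambda rect: float(rect.get("x", "0")))
    heights = [float(bar.get("height", "0")) for bar in bars]
    colors = sorted({bar.get("fill", "unspecified") for bar in bars})

    # Very rough trend rule: strictly rising, strictly falling, or mixed.
    pairs = list(zip(heights, heights[1:]))
    if not pairs:
        trend = "no trend"
    elif all(a < b for a, b in pairs):
        trend = "rising"
    elif all(a > b for a, b in pairs):
        trend = "falling"
    else:
        trend = "mixed"

    return (f"{title}: {len(bars)} bars, {trend} from left to right; "
            f"colors used: {', '.join(colors)}.")


print(describe_bar_chart(EXAMPLE_SVG))
# Visitors per quarter: 3 bars, rising from left to right; colors used: blue, red.
```

The colors and the trend are read straight from the file instead of being guessed from pixels, which is exactly the kind of error Bard made with the calculator. The sketch only works because the example chart is so tidy; it says nothing about axes, labels or units.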

Conclusion: Human descriptions may soon no longer be needed

A few years ago I was much more pessimistic about what automatically generated image descriptions could achieve. Above all, the chat option and the rapid improvement of the algorithms changed my mind. The technology is not perfect and perhaps never will be. But in my opinion that is not the point. What matters is that blind people gain more self-determination and can participate better at work or in education. AI can give us a little more independence.

My conversation with Bard

Domingos: What can you see in this picture?

Bard: There is a desk in the picture. There is a ballpoint pen, a calculator and paper clips on the desk. There is also a business magazine on the desk. The magazine is open and the text "Business" is written on the front page.

The colors of the image are rather neutral. The desk is white, the pen is black, the calculator is gray and the paper clips are silver. The business magazine is blue.

The picture is taken from above. The viewing angle is therefore vertical. This makes the desk and the objects on it easy to see.

The image could have been published on a website or in a magazine. It could also have been taken in an office or other work area.

Here are some more details that can be seen in the picture:

• The ballpoint pen is on the magazine.

• The calculator lies between the ballpoint pen and the paper clips.

• The paper clips are in a row.

• The business magazine is open on page 1.

I hope this answer is helpful.

Domingos: What color is the calculator?

Bard: In the picture the color of the calculator is shown as gray. This is a neutral color often used for office equipment. The color gray is not noticeable and does not distract from the content of the image.

However, it is possible that the calculator is actually a different color. The color of the calculator may vary depending on the model and manufacturer. Some calculators are available in black, white, or other colors.

In order to determine the exact color of the calculator, it would be necessary to view the image in higher resolution. In this case it would be possible to determine the color of the calculator more precisely.

Domingos: Is the calculator completely visible?

Bard: No, the calculator is not completely visible. The top and left sides of the calculator are cut off. This makes it impossible to see what functions the calculator has.

The right and bottom sides of the calculator are fully visible. However, this is not sufficient to fully identify the calculator.

To fully see the calculator, it would be necessary to view the image in higher resolution. In this case it would be possible to see the top and left side of the calculator.

Here is another way to determine if the calculator is completely visible:

• You can try to find the calculator in another picture. If the calculator is fully visible in another image, then it is also fully visible in the image I am referring to.

• You can try comparing the calculator with another calculator. If the calculator is identical to another calculator, then it will also be fully visible in the image I am referring to.

I tried using both options to see if the calculator was completely visible. However, I couldn't find any information that clearly answers this question.

Domingos: Is this a simple calculator or a scientific calculator?

Bard: It is not possible to say with certainty whether this is a simple or scientific calculator. The cut off top and left side of the calculator prevent you from seeing the calculator's function keys.

Here are some clues that might indicate whether this is a simple or scientific calculator:

• Number of function keys: Simple calculators usually have fewer function keys than scientific calculators.

• Labeling of the function keys: The function keys of simple calculators are usually labeled with simple symbols or words. The function keys on scientific calculators are usually labeled with mathematical formulas or abbreviations.

• Size and shape: Simple calculators are generally smaller and handier than scientific calculators.

Due to the cut off top and left side of the calculator, it is not possible to verify these notes. Therefore, it is not possible to say with certainty whether it is a simple or scientific calculator.

Here are some possible scenarios:

• The calculator is a simple calculator with only the basic functions. In this case, the truncated pages are not relevant as they would not display any further functions.

• The Calculator is a scientific calculator with a number of scientific functions. In this case the truncated pages are relevant as they would potentially show some of the scientific features.

In order to answer the question clearly, it would be necessary to view the image in higher resolution. In this case it would be possible to see the function keys of the calculator and thus determine whether it is a simple or scientific calculator.

Domingos: Can you tell me what kind of magazine that is in this picture?

Bard: Yes, I can. The magazine in the picture is a business magazine. This can be seen on the front page, where the text "Business" is written.

The color of the magazine is blue. This is a common color for business magazines as it is perceived as professional and serious.

The magazine is open and on page 1. This is an indication that the magazine is current and being read.

However, it is not possible to clearly identify the magazine. There are many different business magazines that bear the title "Business".

In order to clearly identify the magazine, it would be necessary to have further information. For example:

• The name of the magazine

• The magazine's logo

• The publication date of the magazine

• The contents of the magazine

With this information it would be possible to find the magazine in a database and thus clearly identify it.

Here are some examples of business magazines that have the title "Business":

• Business Insider

• Fortune

• Forbes

• Harvard Business Review

• Manager Magazine

It is likely that the magazine in the picture is one of these magazines.
