Seeing AI - Talking Camera App for the Blind Community
Microsoft's mission is to empower every person on the planet to achieve more. We recognize that the blind and low vision community (estimated size of 253 million according to WHO) is often underserved by technology and historically has reduced educational and employment opportunities. And we want to close that gap with the help of technology.
Seeing AI is a free app that narrates the visual world for the blind and low vision community. This ongoing research project combines the power of Artificial Intelligence and effective human-computer interaction to open up the visual world using audio, describing nearby people, text, currency, color and objects. Much like a Swiss Army knife, the app provides many tools in one, leveraging on-device deep learning to enable many "first time in life" scenarios. Key features include real-time text reading, document structure understanding, an audio-based barcode locator and product recognizer, face recognition with emotion/age/gender description, currency recognition, color recognition, an audible light detector and a handwriting reader. Additionally, the app can make other apps more accessible and inclusive by describing the photos shown in them.
Since its launch in July 2017, the app has been downloaded by 150,000 users and has helped them complete over 5 million tasks independently. These tasks previously required sighted assistance. See the 90-second product video here: http://aka.ms/seeingaiproductvideo
Seeing AI started as a grassroots innovation project during a companywide hackathon (a week-long programming event). The team conducted research on combining Artificial Intelligence and human-computer interaction to solve problems faced by the blind community in a manner that had never been done before. We have also built a uniquely intuitive and creative audio-based user experience. Given that blind users often have difficulty framing the camera correctly, the app uses real-time on-device AI to guide them toward a better photograph, which results in higher accuracy, e.g. audibly guiding them until all 4 corners of a document are visible. Similarly, to recognize barcodes, the app generates audio beeps that guide the user towards areas with barcodes. Previously, users had to buy a $1300 dedicated hardware barcode scanner to do this.
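The corner-based framing guidance described above can be illustrated with a minimal sketch. This is not the app's actual implementation: the function name `framing_hint`, the corner ordering, and the hint wording are all hypothetical, and the sketch assumes some separate detector already supplies the four document corners in pixel coordinates (or `None` for a corner it could not find).

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Assumed corner order (hypothetical convention for this sketch).
CORNER_NAMES = ["top left", "top right", "bottom right", "bottom left"]


def framing_hint(corners: List[Optional[Point]], width: int, height: int) -> str:
    """Return a short hint that could be spoken to the user.

    corners: four (x, y) pixel positions in CORNER_NAMES order,
             or None where the detector found no corner.
    width, height: size of the camera frame in pixels.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corners")

    # A corner counts as missing if it was not detected or lies outside the frame.
    missing = [
        name
        for name, corner in zip(CORNER_NAMES, corners)
        if corner is None
        or not (0 <= corner[0] < width and 0 <= corner[1] < height)
    ]

    if not missing:
        return "All corners visible, hold steady"
    return "Not visible: " + ", ".join(missing)
```

In a real app the returned string would be passed to a text-to-speech engine on every few frames, so the user hears the hint change as they move the phone.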
A detailed feature set includes:
(1) Real-time text reading: Speaks text as soon as it appears in front of the camera.
(2) Understanding document structure: Guides users with audible feedback to get the full document in frame, and then generates a digital replica with the original formatting, so blind users can understand the document's structure beyond what simple optical character recognition provides.
(3) Product reader: Guides users to a barcode by beeping when it is visible, and then recognizes the product.
(4) Person mode: Recognizes faces, estimates how far away people are standing, and describes faces, including estimated age, gender and emotion as well as hats, sunglasses and beards, to give a rich visual understanding. Additionally, users can teach the app faces on the device, so friends and colleagues are recognized instantly when they are visible.
(5) Scene description: Describes an entire scene along with its objects in natural spoken language.
(6) Handwriting: Can read personal notes in a greeting card, as well as printed stylized text not usually readable by optical character recognition.
(7) Currency: Identifies inaccessible paper currency
(8) Color: Identifies color of items, such as articles of clothing
(9) Light detector: An audible tone alerts users when they aim the phone's camera at light in the environment, so they no longer have to, for example, touch hot bulbs to check whether a light is on.
(10) Image recognition in other apps: Makes other apps on the phone accessible by describing the images in them, helping people feel included in social conversations.
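As a rough illustration of how an audible light detector such as item (9) could work, the sketch below maps the average brightness of a camera frame to a tone frequency. The mapping, the frequency range, and the function name `light_tone_hz` are assumptions made for this example and are not taken from the app itself.

```python
from typing import Iterable


def light_tone_hz(pixels: Iterable[int],
                  low_hz: float = 200.0,
                  high_hz: float = 2000.0) -> float:
    """Map mean 8-bit luminance (0-255) to a tone frequency in Hz.

    Darker frames give a tone near low_hz, brighter frames a tone near
    high_hz. The linear mapping is an assumption for illustration only.
    """
    values = list(pixels)
    if not values:
        raise ValueError("frame contains no pixels")

    mean = sum(values) / len(values)
    # Clamp to [0, 1] in case the input holds out-of-range values.
    level = min(max(mean / 255.0, 0.0), 1.0)
    return low_hz + level * (high_hz - low_hz)
```

Sweeping the phone across a room would then produce a rising pitch as the camera approaches a lamp or window, which is one way a tone can indicate where light is coming from.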
The app is rated 4.8 stars on the Apple App Store, and feedback often contains words such as 'new confidence', 'new opportunities', 'new experiences', 'game changer', 'independence' and 'empowerment'. Feedback often revolves around the following themes (more feedback at http://aka.ms/seeingaifeedback ):
* First time experiences in life – reading menus, selfies, finding items in a vending machine
* Education – Blind students in school can now read inaccessible paper text which is not in braille or does not have a digital equivalent. Similarly, blind parents are reading books, checking the handwritten homework of their kids and notes from teachers.
* Employment – People are using it to be more productive at their day jobs and to achieve much more during office hours.
* Feeling included in social conversations - Since the app can describe otherwise inaccessible images in other apps, blind users feel included in social conversations on Twitter, WhatsApp, etc., where images with overlaid text are often shared.
* Blind photographer - Blind users are now taking photos at social gatherings, because before posting on social media they can confirm that their friends are in the center of the picture and are smiling.
* Grocery shopping and self-reliance in the kitchen - Because of the product scanning feature, users are able to gather ingredients and cook meals on their own.
* Getting access to affordable technology: A charity raised $2.5 million to distribute hardware barcode scanners costing $1300 apiece to the blind community in Australia. Seeing AI is putting this technology in the hands of users for free, so future donations can be used for other purposes.