May 6, 2016
In February, Google moved its Cloud Vision API, a component of the Google Cloud Platform, to open beta, making a limited, unfinished version of the product available to the general public. Developers, businesses, and personal users can now apply this image recognition software to a wide array of uses.
Google Vision API is, at its core, an algorithm that can understand the shapes, symbols, and elements of an image and extract meaningful data from them. It operates on levels of likelihood and will never declare with absolute certainty what an object is. Instead, when data is returned to the user, it will look something like the image below:
The API is new and only in beta, but it presents some amazing opportunities for developers and businesses.
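The likelihood-based output described above can be sketched in a few lines of Python. This is a minimal illustration, not real API output: the sample response is hand-written, its values are invented, and the helper names (confident_labels, at_least) are hypothetical. The field names (labelAnnotations, score, joyLikelihood) and the likelihood buckets follow the v1 REST response format as we understand it.

```python
import json

# Hand-written sample shaped like a Vision API v1 response.
# The labels and scores are invented for illustration only.
SAMPLE_RESPONSE = json.loads("""
{
  "responses": [{
    "labelAnnotations": [
      {"description": "banana", "score": 0.97},
      {"description": "fruit", "score": 0.91},
      {"description": "yellow", "score": 0.62}
    ],
    "faceAnnotations": [
      {"joyLikelihood": "VERY_LIKELY", "headwearLikelihood": "UNLIKELY"}
    ]
  }]
}
""")

# Likelihood buckets, ordered from least to most likely.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
    "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def confident_labels(response, min_score=0.8):
    """Return (description, score) pairs scoring at least min_score."""
    labels = response["responses"][0].get("labelAnnotations", [])
    return [(l["description"], l["score"]) for l in labels
            if l["score"] >= min_score]


def at_least(likelihood, threshold="LIKELY"):
    """True if a likelihood bucket is at or above the threshold bucket."""
    return (LIKELIHOOD_ORDER.index(likelihood)
            >= LIKELIHOOD_ORDER.index(threshold))


if __name__ == "__main__":
    for description, score in confident_labels(SAMPLE_RESPONSE):
        print(f"{description}: {score:.0%}")
    face = SAMPLE_RESPONSE["responses"][0]["faceAnnotations"][0]
    print("Probably smiling:", at_least(face["joyLikelihood"]))
```

Labels carry a numeric score, while face attributes are reported as named buckets, so the application has to choose its own cutoff in both cases. The thresholds used here (0.8 and LIKELY) are arbitrary.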
Cloud Vision API Use Cases
A tool like Vision API takes the legwork out of applying metadata to images and could potentially save hundreds of man-hours of data entry work for a given project.
- Aerosense Inc: Formed through a partnership between Sony and the Japanese robotics company ZMP, Aerosense uses drones to provide aerial surveying services to businesses. Its commercial drones are equipped with cameras and gather photographs as they traverse the requested area. Google Vision API can take these photographs, analyze them, and turn thousands of images into usable data and insight.
- Raspberry Pi Robot: The announcement video Google released for the Vision API features a small Raspberry Pi robot using the API in real time. The robot announces aloud what objects are (“banana, automobile, money”) and uses the facial detection feature to follow the movement of faces and to move itself toward those with happy faces.
Cloud Vision API Features
Google Cloud Vision API can identify almost anything within an image, and it does so by utilizing a combination of six features. When uploading an image, the user simply selects what Cloud Vision API should be looking for.
- Label Detection: The Vision API analyzes the image and places labels on what it sees, such as “banana,” “dog,” or “mountain,” with varying degrees of certainty.
- Face Detection: It can also detect faces in images. Vision API will identify facial features, emotional state, and headwear.
- Not Facial Recognition: Google stresses in multiple areas of their website that facial detection software is different from facial recognition software, and Google does not support the latter. Vision API cannot match a face in an image to the same face elsewhere. Google does not record the analysis of faces for any form of facial recognition software.
- Explicit Content Detection: Images can be scanned for violent, sexual, or otherwise inappropriate content.
- Logo Detection: Popular brand logos are found and identified, whether they appear on a shirt, product, or sign.
- Landmark Detection: Vision API will recognize natural or man-made structures, monuments, or sites.
- Optical Character Recognition: Vision API can read the text in an image, though it is not flawless with handwritten text.
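Selecting features, as described above, amounts to listing them in the request. The Python sketch below builds the JSON body for the v1 images:annotate REST endpoint without sending it. The feature type strings and the endpoint URL reflect the public v1 API as we understand it. The short names in FEATURE_TYPES and the helper build_annotate_request are hypothetical and exist only for this example.

```python
import base64
import json

# v1 REST endpoint; an API key or OAuth token is needed to actually call it.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

# Short names (our own, for illustration) mapped to v1 feature type strings.
FEATURE_TYPES = {
    "labels": "LABEL_DETECTION",
    "faces": "FACE_DETECTION",
    "explicit": "SAFE_SEARCH_DETECTION",
    "logos": "LOGO_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "text": "TEXT_DETECTION",
}


def build_annotate_request(image_bytes, wanted, max_results=5):
    """Build a request body asking for the given features on one image."""
    features = [
        {"type": FEATURE_TYPES[name], "maxResults": max_results}
        for name in wanted
    ]
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"requests": [{"image": {"content": encoded},
                          "features": features}]}


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    body = build_annotate_request(b"not-a-real-image", ["labels", "logos"])
    print("POST", ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

Several features can be requested for the same image in one call, so an application only asks for the analyses it actually needs.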
Microsoft Project Oxford
Microsoft is also working on a similar product currently named Project Oxford. It can detect faces, emotions, image format, dimensions, and then tag the image with all the data it can extract. Despite their similarities, there is one major difference distinguishing them.
Google Cloud Vision API is a component of the Google Cloud Platform. The images analyzed using the API are often housed within the Google cloud, but they don’t have to be. Google Cloud Vision API is compatible with any cloud deployment. Microsoft Project Oxford, however, will only be able to interact with images in the Azure cloud.
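On the Google side, this flexibility shows up in how an image is referenced in a request. The Python sketch below assumes the v1 REST request format, in which an image is either sent inline as base64 content or referenced by a Google Cloud Storage URI. The helper image_payload and the bucket name are hypothetical.

```python
import base64


def image_payload(image):
    """Return the "image" part of a v1 annotate request.

    bytes      -> sent inline as base64, wherever the file was stored
    "gs://..." -> referenced in place in Google Cloud Storage
    """
    if isinstance(image, bytes):
        return {"content": base64.b64encode(image).decode("ascii")}
    if isinstance(image, str) and image.startswith("gs://"):
        return {"source": {"gcsImageUri": image}}
    raise ValueError("expected raw bytes or a gs:// URI")


if __name__ == "__main__":
    # Image already housed in Google Cloud Storage (hypothetical bucket).
    print(image_payload("gs://example-bucket/photos/drone-001.jpg"))
    # Image bytes read from any other location and sent inline.
    print(image_payload(b"bytes-read-from-anywhere"))
```

Because inline content is just bytes, an application can read the image from local disk or another provider's storage before sending it.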
Independent developers will decide which image analysis software survives. Both Cloud Vision and Project Oxford are APIs, designed to be integrated into future applications by third parties. Whichever offers the best combination of functionality and convenience will provide the most value. While both products are still in their relative infancy and may undergo significant changes before they are complete, compatibility with any and all cloud deployments gives Google the current edge.