In the American cartoon The Jetsons, the titular family lives in a futuristic world of technological convenience, with household chores attended to by Rosie, a robotic maid and housekeeper. Rosie is able to see and understand the world around her just as humans do, enabling her to carry out her tasks and respond to the Jetsons’ shenanigans.
While robots like Rosie are still far from becoming reality, giving machines the ability to “see” like humans — also known as computer vision — is a key branch of AI research and development today.
“The main purpose [of computer vision] is to try to mimic human vision,” says Liu Fang, associate professor at the department of computer science at DigiPen Institute of Technology Singapore, a specialized university focused on the digital economy. “It aims to recognize visual inputs and process them as fast as humans can.”
Vision is arguably the next frontier for AI companies. After all, the ability to see underpins almost every aspect of human life, and being able to develop machines that can see, process, and comprehend the world just like a human would can transform the way we live and work.
The evolution of computer vision
One of the earliest and most well-known uses of computer vision emerged in the 1970s with the development of optical character recognition (OCR).
While OCR transformed the way businesses processed data, it remained the field’s most significant application for many years. Little further progress was made until the 2000s, a decade that marked significant growth in the AI industry.
According to Liu, the advent of deep learning and neural networks transformed how researchers understand and work with computer vision, especially in the process of feature extraction. This refers to the transformation of image data into numerical features that can be processed by a machine, which is how a computer is able to “see” images.
“Instead of having to half-manually extract features from images, we could now train computers to automatically perform this feature extraction and identify objects,” she explains. “This has fundamentally changed the things we can do around computer vision.”
Previously, researchers had to code algorithms from scratch for feature extraction to occur. However, this meant that each feature was only suited for a specific use case and couldn’t be applied universally across different scenarios. With neural networks, researchers can now “teach” models what to look out for, making it easier to develop programs that can be applied to the real world.
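To make the contrast concrete, here is a minimal sketch of the kind of hand-coded feature extractor Liu describes: a Sobel edge filter, one of the classic algorithms researchers once wrote from scratch. The function name and toy image below are illustrative, not from any particular system.

```python
import numpy as np

def sobel_edges(image: np.ndarray) -> np.ndarray:
    """Hand-coded feature extraction: responds only to intensity edges."""
    # Fixed, human-designed kernels for horizontal and vertical gradients
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            gx = np.sum(kx * patch)  # horizontal gradient response
            gy = np.sum(ky * patch)  # vertical gradient response
            out[i, j] = np.hypot(gx, gy)  # gradient magnitude
    return out

# Toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
# Responses cluster at the brightness jump; flat regions stay zero
print(edges.max() > 0, edges[:, 0].max() == 0)
```

The filter detects edges and nothing else; recognizing a different feature meant designing a different kernel by hand. A neural network, by contrast, learns its kernels from labeled examples, which is the shift Liu describes.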
The power of computer vision
Thanks to the rise of deep learning, existing applications of computer vision today are things that would have been unfathomable just 50 years ago.
In healthcare, for example, computer vision technology plays a key role in the analysis of medical images, helping doctors identify abnormalities in ultrasounds, MRIs, and CT scans. The technology has also found use in surveillance and security, where it can pinpoint threats or unusual behavior.
For Liu, an interesting application lies in the realm of self-driving cars. She points to “Tesla Vision,” the electric vehicle firm’s autopilot system enabled by cameras, as an example of AI-based computer vision in action.
“Instead of just using radar, which is only good for gauging distance, Tesla is using computer vision to identify and recognize objects on the road,” Liu explains. “By understanding what exactly it is seeing, the AI behind the self-driving technology can better understand the situation and react accordingly.”
In the near future, Liu envisions that computer vision can be combined with other emerging technologies, such as virtual reality (VR) and augmented reality (AR), to enhance user experiences.
For example, existing AR and VR technologies only show 2D images to each of the viewer’s eyes instead of 3D projections. New AI techniques have improved computer vision so that it can more accurately capture and mimic what is seen in the real world, creating more realistic and accurate images.
These improvements could be used in gaming to create more immersive experiences for players. But the technology also has useful applications in areas such as medicine, where AR can help in training medical students and planning surgeries, as well as in engineering and manufacturing, where training and tests involving heavy machinery can be done in simulations.
The road ahead
While computer vision has come far, it still faces several limitations. Heavy computational demands and limited processing power remain a key hurdle for many computer vision models. Computer vision systems also fall short of human vision in a key respect: while they are capable of recognizing individual objects, they cannot understand the scenes they’re looking at.
Additionally, the sector struggles with talent. In Asia Pacific alone, it has been estimated that there will be a shortage of 47 million tech workers by 2030, with the AI talent gap being a barrier to the growth of the sector.
However, much is being done to address these hurdles. Liu shares that researchers are looking at combining computer vision with other technologies such as natural language processing to close the gaps in what computer vision can do. Some scientists have combined language processing models with computer vision technology to allow machines to understand context, creating AI models that can parse both language and visuals.
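The basic mechanism behind such combined language-and-vision models can be sketched as matching images and text in a shared embedding space. The sketch below is illustrative only: the vectors are hand-picked stand-ins for what trained image and text encoders would actually produce.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings: a real vision-language model would compute these
# with trained encoders; here they are chosen by hand for illustration.
image_embedding = np.array([0.9, 0.1, 0.2])  # e.g., a photo of a dog
text_embeddings = {
    "a photo of a dog": np.array([0.8, 0.2, 0.1]),
    "a photo of a car": np.array([0.1, 0.9, 0.3]),
}

# The caption whose embedding lies closest to the image's embedding "wins"
best_caption = max(
    text_embeddings,
    key=lambda t: cosine_similarity(image_embedding, text_embeddings[t]),
)
print(best_caption)  # "a photo of a dog"
```

Because image and text live in the same vector space, the model can relate what it sees to descriptions in natural language, which is one way such systems build the scene representation Liu mentions below.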
“At present, we don’t need it to understand what it is seeing exactly as humans do, but we want it to be able to build a representation of the content in the scene, which will help the AI make better decisions,” Liu explains.
On the talent front, governments and institutions are working to build up the pool of AI talent. The Singapore government, for one, has rolled out a comprehensive series of programs to develop talent in the AI space, while DigiPen Singapore has launched a master’s degree in computer vision to help address this talent gap.
“We need to have a broader base of AI talent in order to create that synergy for more development and ideas,” Liu says.
While truly self-driving cars and robotic housekeepers may still be a ways off, Liu is bullish on what the road ahead holds for computer vision.
“As computational resources grow and more talent comes into the space, we’ll be seeing even more exciting applications and developments come up,” she concludes. “We’re just limited by our own imagination.”
This article was originally published on Tech in Asia’s website on 6 May 2022.