June 5, 2024

Everyone Is Talking About LLMs; We Need to Talk About Computer Vision. Here’s Why.

In November 2022, the world of technology changed overnight. OpenAI, a San Francisco-based company then little known to the general public, released a transformer-based AI chatbot called ChatGPT. Built on the foundational technology of large language models (LLMs), it proved so intuitive that it changed the AI industry, and our lives with it.

Gone were the days of developing ideas and drafting text entirely on your own.

A tool as intuitive as ChatGPT put LLMs squarely in the spotlight, effectively setting a new benchmark for artificial intelligence. In parallel, however, an equally significant technology, also rooted in deep learning, has been quietly advancing away from the limelight for many years: computer vision.

Natural language processing (NLP) and computer vision are two distinct fields within artificial intelligence (AI). LLMs, which belong to NLP, primarily process textual data and are used for tasks such as translation, text generation, and sentiment analysis. In contrast, computer vision tools handle image and video data, performing functions such as image classification, object detection, and segmentation. The ultimate goal of computer vision is to enable machines to interpret, understand, and respond to visual information in a manner similar to humans.

While it has not garnered the same hype as LLMs, computer vision has been just as life-changing for our society. The field has been around for about 60 years, but its development has accelerated over the last ten, resulting in some profound applications. Without computer vision, we wouldn’t have automated breast cancer detection, retinal disease screening, self-driving cars, wardrobe personalisation, pollution tracking, or even facial recognition. All of these applications derive from training models that can interpret visual data with high accuracy.

To give you an idea of just how accurate these modern applications are, consider those based on emotion recognition, facial recognition, medical imaging, and quality control. These applications are now reported to achieve accuracy rates exceeding 90%, with some reaching as high as 99%. Just a decade ago, these rates were around 50%.

This technology empowers us to capture the context of an environment in real time, allowing us to identify risky hotspots or unsafe incidents as they unfold. Through video monitoring and analysis, it aids in accident prevention and ultimately saves lives. Its profound impact on our organisation cannot be overstated and should be duly recognised.

So, Why Is Computer Vision Not Attracting as Much (If Not More) Excitement as LLMs?

Whilst computer vision has been adopted in many use cases across many industries, such as healthcare, security and robotics, one reason it has lagged on the hype barometer is its lack of a widely available consumer application: it has no single landmark product like ChatGPT that has found its way into public discourse.

Putting that aside, it is worth taking a moment to understand why computer vision is so revolutionary in its approach and why it will reshape industries for decades to come.

Computer vision essentially bridges the gap between human perception and artificial intelligence. Its algorithms excel at machine perception: recognising visual patterns and extracting meaningful insights in a way that mimics the cognitive processes of the human visual system. The potential for enhancing industries across the world is enormous.

LLMs, in contrast, lack the sensory capabilities needed for advanced tasks such as machine perception. They are great at language tasks but may lack a nuanced sensory understanding of the world. As Yann LeCun, Chief AI Scientist at Meta, recently told the Financial Times, “LLMs have a very limited understanding of logic . . . and do not understand the physical world.”

For computer vision, several challenges remain on its path to ubiquity. As Sara Rarís Miralles, computer vision lead at Buddywise, states,

“Computer vision tools are more advanced in their applications and potential uses compared to LLMs. However, there is a vast amount of data that is not being effectively utilised. Both companies and governments are sitting on troves of data — from internet sources, CCTV cameras, and private company data — that remain untapped. Numerous industries could reap significant benefits from this technology in the coming years if they can successfully incorporate it.”

Nonetheless, it’s important to note one significant caveat: both computer vision models and LLMs are limited by the datasets on which they are trained. This underscores the need for high-quality, diverse data to maximise their effectiveness.

The Future of Computer Vision

Given society’s growing demand to improve industries through visual data applications in the coming years, the significance and scope of computer vision technology cannot be overstated. As the data engines powering these systems continue to be refined, we anticipate a host of immensely beneficial applications for society, the economy, and human well-being.

These range from assisting the visually impaired, to enhancing industrial automation and surveillance systems for public safety, to improving emergency preparedness and environmental conservation. Take occupational safety, for example. Each year, an estimated 2.78 million workers die from occupational accidents and work-related diseases worldwide. This figure is devastatingly high, and for Buddywise, computer vision supports our goal of preventing these life-threatening workplace accidents with 24/7 safety monitoring throughout the workplace.

Buddywise utilises advanced contextual vision analytics to understand and connect events across the factory floor, helping to prevent accidents and injuries arising from risks such as improper lifting by employees, pedestrian traffic and environmental hazards.
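To make the pedestrian traffic example more concrete, here is a minimal Python sketch of one way such a check could work in principle. It is purely illustrative and is not a description of Buddywise's system: the `Box`, `Detection` and `pedestrians_in_zone` names, the pixel coordinates, and the 50% overlap threshold are all hypothetical. It assumes an object detector has already produced labelled bounding boxes for a video frame, and simply flags any detected person whose box lies mostly inside a marked hazard zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersection(self, other: "Box") -> float:
        """Overlapping area of two boxes (0.0 if they do not overlap)."""
        width = min(self.x2, other.x2) - max(self.x1, other.x1)
        height = min(self.y2, other.y2) - max(self.y1, other.y1)
        return max(0.0, width) * max(0.0, height)


@dataclass(frozen=True)
class Detection:
    """One object found in a frame (hypothetical detector output)."""
    label: str
    box: Box


def pedestrians_in_zone(detections, zone, min_overlap=0.5):
    """Return 'person' detections with at least min_overlap of their box inside zone."""
    flagged = []
    for det in detections:
        if det.label != "person" or det.box.area() == 0:
            continue
        if det.box.intersection(zone) / det.box.area() >= min_overlap:
            flagged.append(det)
    return flagged


# Hypothetical frame: a forklift lane and three detections.
forklift_lane = Box(100, 100, 300, 300)
frame = [
    Detection("person", Box(120, 150, 160, 250)),    # fully inside the lane
    Detection("person", Box(400, 150, 440, 250)),    # well outside the lane
    Detection("forklift", Box(150, 120, 280, 280)),  # inside, but not a person
]

alerts = pedestrians_in_zone(frame, forklift_lane)
print(f"{len(alerts)} pedestrian(s) in the forklift lane")
```

A production system would need far more than this, such as tracking people across frames and accounting for camera perspective, but the sketch shows how raw detections can be turned into a safety-relevant event.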

In the grand scheme of things, the debate over which technology is superior, computer vision or LLMs, and over their respective trajectories is unimportant. What truly matters is how they are integrated into the marketplace and how companies adopt them into their existing workflows to derive insights and prompt action.

Customised adoptions in robotics, healthcare, and AI safety are already showcasing the significant influence of these integrations. According to Gartner, 70% of enterprises are projected to have integrated some form of computer vision into their operations by 2025, highlighting its increasing importance. Closer to home, according to Verdantix, a US-based research and advisory firm, 54% of companies are expected to utilise vision analytics to tackle safety concerns by 2024.

The computer vision revolution is here to stay and is poised to significantly enhance the intelligence and safety of industries worldwide in the coming decades.
