Computer vision is a field in which computers and complex algorithms are used to understand digital images and videos.
As a subfield of AI and deep learning, computer vision trains convolutional neural networks (CNNs) to develop human-like vision capabilities for applications. Its primary goal is first to understand the content of videos and still images, and then to derive useful information from them to solve an ever-widening array of problems. Computer vision can include training CNNs specifically for segmentation, classification, and detection, using images and videos as data.
Computer vision has countless applications, including sports, automotive, agriculture, retail, banking, construction, insurance, and beyond. AI-driven machines of all types are gaining eyes like ours, thanks to CNNs, the image crunchers machines now use to identify objects. CNNs are today’s eyes of autonomous vehicles, oil exploration, and fusion energy research. They can also help spot diseases faster in medical imaging, saving lives.
Convolutional neural networks can perform segmentation, classification, and detection for a myriad of applications:
Segmentation: good at delineating objects; used in self-driving vehicles.
Classification: answers “is it a cat or a dog?”; classifies with precision.
Detection: answers “where does it exist in space?”; recognizes things for safety.
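To make the three tasks concrete, here is a minimal Python sketch of the kind of output each one produces for a single image. The dictionary fields, labels, and the tiny 4x4 mask are illustrative conventions for this sketch, not the API of any particular library.

```python
# Classification: one label (optionally with a score) for the whole image.
classification = {"label": "cat", "score": 0.94}

# Detection: a label plus a bounding box (x, y, width, height) per object,
# answering both "what is it?" and "where does it exist in space?".
detection = [
    {"label": "cat", "box": (34, 20, 120, 96), "score": 0.91},
    {"label": "dog", "box": (200, 48, 140, 110), "score": 0.88},
]

# Segmentation: a class index for every pixel (here a tiny 4x4 mask),
# delineating the exact region each object occupies.
# 0 = background, 1 = cat, 2 = dog
segmentation = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

print(classification["label"])                     # whole-image answer
print(len(detection))                              # number of objects located
print(sum(row.count(1) for row in segmentation))   # pixels labeled "cat"
```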
Computer vision systems can be far better than humans at classifying images and videos into fine-grained categories and classes, such as spotting minute changes over time in medical computerized axial tomography (CAT) scans. In this sense, computer vision automates tasks that humans could potentially do, but with far greater accuracy and speed.
With the wide range of current and potential applications, it isn’t surprising that growth projections for computer vision technologies and solutions are prodigious. One market research survey projects that this market will grow a stunning 47% annually through 2023, when it will reach $25 billion globally. Computer vision stands among the hottest and most active areas of research and development in all of computer science.
Many of these AI applications are made possible by decades of advances in deep neural networks and by strides in high-performance computing with GPUs, which can process massive amounts of data.
Computer vision analyzes images and then creates numerical representations of what it ‘sees’ using a convolutional neural network (CNN). A CNN is a class of artificial neural network that uses convolutional layers to filter inputs for useful information. The convolution operation combines the input data (feature map) with a convolution kernel (filter) to form a transformed feature map. The filters in the convolutional layers (conv layers) are modified based on learned parameters to extract the most useful information for a specific task. Convolutional networks adjust automatically to find the best features for the task at hand. For example, a CNN might filter information about the shape of an object in a general object-recognition task but extract information about color in a bird-recognition task, reflecting the fact that different classes of objects tend to differ in shape, while different types of birds are more likely to differ in color than in shape.
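The convolution operation described above can be sketched in a few lines of plain Python: slide the kernel over the input feature map and sum the elementwise products at each position. This is a minimal illustration with “valid” padding and stride 1; in a real CNN the kernel values are learned during training rather than hand-picked as they are here.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and sum elementwise products
    ("valid" padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            output[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return output

# A hand-picked vertical-edge kernel applied to an image whose right half
# is bright: the filter responds strongly wherever the edge falls in its window.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # → [[-3, -3], [-3, -3]]
```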
Prominent use cases for computer vision include image recognition, image classification, video labeling, robotics, virtual assistants, and self-driving cars.
Data scientists and computer vision

Python is the most popular programming language for machine learning (ML), and most data scientists are familiar with its ease of use and its large store of libraries, most of them free and open source. Data scientists use Python in ML systems for data mining and data analysis, as Python supports a broad range of ML models and algorithms. Given the relationship between ML and computer vision, data scientists can apply the expanding universe of computer vision applications to businesses of all types, extracting vital information from stores of images and videos and augmenting data-driven decision-making.
Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.
Because neural nets are created from large numbers of identical neurons, they’re highly parallel by nature. This parallelism maps naturally to GPUs, which provide a data-parallel arithmetic architecture and a significant computation speed-up over CPU-only training. This type of architecture carries out a similar set of calculations on an array of image data. The single-instruction, multiple-data (SIMD) capability of the GPU makes it suitable for running computer vision tasks, which often involve similar calculations operating on an entire image. Specifically, NVIDIA GPUs significantly accelerate computer vision operations, freeing up CPUs for other jobs. Furthermore, multiple GPUs can be used on the same machine, creating an architecture capable of running multiple computer vision algorithms in parallel.
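The data-parallel pattern described above can be illustrated with a simple per-pixel kernel: the same arithmetic (here, an RGB-to-grayscale conversion using the common ITU-R BT.601 luminance weights) is applied independently to every pixel. The sequential Python loop below is only a sketch of the idea; a GPU would launch one thread per pixel and run them simultaneously.

```python
def to_gray(pixel):
    """Identical arithmetic for every pixel: the "single instruction"
    in single-instruction, multiple-data (SIMD) execution."""
    r, g, b = pixel
    # ITU-R BT.601 luminance weights, a standard RGB-to-gray conversion
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# A tiny "image" as a flat list of RGB pixels (values are illustrative)
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]

# Applied sequentially here; on a GPU, each pixel maps to its own thread
gray = [to_gray(p) for p in pixels]
print(gray)  # → [76, 150, 29, 255]
```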
GPU-accelerated deep learning frameworks provide interfaces to commonly used programming languages such as Python. They also provide the flexibility to easily create and explore custom CNNs and DNNs while delivering the high speed needed for both experimentation and industrial deployment. The NVIDIA Deep Learning SDK accelerates widely used deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch, as well as many other machine learning applications. These frameworks run faster on GPUs and scale across multiple GPUs within a single node. To use the frameworks with GPUs, NVIDIA provides cuDNN for convolutional neural network training and TensorRT™ for inference. cuDNN and TensorRT provide highly tuned implementations of standard routines such as convolution, pooling, normalization, and activation layers.
A step-by-step NVCaffe installation and usage guide is available, along with a fast C++/CUDA implementation of convolutional neural networks.
To help vision AI developers build and deploy vision models quickly, NVIDIA offers the DeepStream SDK. NVIDIA also offers the TAO Toolkit for creating accurate and efficient AI models for the computer vision domain.
The NVIDIA RAPIDS™ suite of open-source software libraries, built on CUDA, gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs, while still using familiar interfaces like Pandas and Scikit-Learn APIs.
For a more technical deep dive on CNNs, check out our developer site.