Introduction
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. [Source]
On this page, I will talk about the basics of computer vision and the current industry solutions that solve this AI problem.
Image structure
An image is a grid of pixels, each stored as an integer between 0 and 255 that represents its intensity. Images can be either black and white (BW) or colour (RGB).
For instance, examine the BW image below, which shows what one small portion of the larger image looks like. Notice that in the matrix of integers, 0 represents a completely dark pixel and 255 represents a white pixel. All pixel values between 0 and 255 are shown as different shades of gray.
Image Source: https://seis.bristol.ac.uk/~ggjlb/teaching/ccrs_tutorial/tutorial/chap1/c1p7_i2e.html
Colour images are very similar to the BW example shown above, with one key difference. Instead of a single channel representing the intensity between black and white, colour images have 3 channels: red, green and blue. These channels are superimposed to create images that accurately depict objects, people, etc. A 4th channel, called alpha, which represents transparency, may also be available.
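To make the channel idea concrete, here is a minimal sketch that stores a tiny image as nested Python lists. The pixel values and the helper names `split_channels` and `add_alpha` are made up for illustration; in practice a library such as OpenCV would hold the same data in an array.

```python
# A 3x3 grayscale image: one integer per pixel, 0 = black, 255 = white.
gray = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

# A colour image of the same size: each pixel holds (red, green, blue).
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 0), (0, 255, 255), (255, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def split_channels(image):
    """Separate an RGB image into three single-channel matrices."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]


def add_alpha(image, alpha=255):
    """Append a constant alpha (transparency) value to every pixel."""
    return [[pixel + (alpha,) for pixel in row] for row in image]


red, green, blue = split_channels(rgb)
rgba = add_alpha(rgb, alpha=128)
```

Each of `red`, `green` and `blue` has the same shape as `gray`, which is why a colour image can be thought of as three grayscale-like matrices stacked together.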
Transformations
We can make several different transformations on an image file to alter the image's appearance or layout. Some of those are:
- Crop
- Resize
- Rotate
- Adjust Brightness, Contrast, Hue, Saturation
- Blur
Several libraries are commonly used for these transformations:
- OpenCV, TensorFlow, etc.
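Because an image is just a matrix of integers, several of these transformations reduce to simple index arithmetic. The following is a minimal sketch on a grayscale image stored as nested lists. The function names are illustrative and are not the API of OpenCV or TensorFlow, which offer optimised equivalents.

```python
def crop(image, top, left, height, width):
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*image[::-1])]


def adjust_brightness(image, delta):
    """Add delta to every pixel, clamping to the valid 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]


def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbour sampling, the simplest resizing scheme."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

print(crop(image, 0, 1, 2, 2))        # [[20, 30], [50, 60]]
print(rotate_90_clockwise(image))     # [[70, 40, 10], [80, 50, 20], [90, 60, 30]]
print(adjust_brightness(image, 200))  # values above 255 are clamped
print(resize_nearest(image, 2, 2))    # [[10, 20], [40, 50]]
```

Clamping in `adjust_brightness` matters because pixel values outside 0 to 255 are not valid intensities.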
Deep Learning with Images
Intro to Convolutional Neural Networks
A convolutional neural network (CNN) is a type of feed-forward neural network that is very effective in computer vision tasks. In addition to the dense and dropout layers found in a normal feed-forward neural network, a standard CNN has 2 further types of layers:
- Convolution layer
- Pooling layer
We can see how a convolution layer works in the video snippet below. In each step, we take a matrix of weights (the kernel) and multiply it element-wise with a sub-matrix of the image, then sum the products. The result is saved to one cell of a new output matrix, and the process is repeated after shifting the kernel by one cell. Without padding, the output matrix is smaller than the input, and it is passed on to subsequent layers.
Image Source: Google developer site
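The sliding-window operation can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a stride of 1 and no padding. The name `convolve2d` and the example kernel are chosen here for illustration, and a real CNN learns its kernel weights during training rather than having them fixed by hand.

```python
def convolve2d(image, kernel):
    """Slide kernel over image with stride 1 and no padding.

    At each position, multiply the kernel element-wise with the image
    patch beneath it and sum the products into one output cell.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image whose left half is dark and right half is bright.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# A hand-picked 2x2 kernel that responds to vertical edges.
kernel = [
    [1, -1],
    [1, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, -510, 0], [0, -510, 0], [0, -510, 0]]
```

The 4x4 input shrinks to a 3x3 output, and the only non-zero column sits where the dark and bright halves meet, which is the sense in which a convolution layer detects features.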
The pooling layer in a CNN downsamples the features detected by the previous convolution layer. Typically, max pooling or average pooling is used for this.
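As a rough sketch of both pooling variants, the function below splits a feature map into non-overlapping windows and keeps either the maximum or the mean of each one. The name `pool2d` and its parameters are illustrative, and the window size of 2 is an assumption rather than something the article specifies.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample with non-overlapping size x size windows (stride == size)."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

print(pool2d(feature_map, mode="max"))      # [[7, 8], [9, 6]]
print(pool2d(feature_map, mode="average"))  # [[4.0, 5.0], [4.5, 3.0]]
```

In both cases a 4x4 feature map is reduced to 2x2, so later layers process a quarter of the values.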
More information on convolution and pooling layers is available here.
Image Source: Convolutional Neural Network Feb 24, 2019
Earlier layers of a neural network tend to capture low-level features of an image, for example edges or simple parts such as the eyes in an image of a person's face. As we train a deeper CNN, the later layers combine these low-level features to recognise high-level features like a complete face.
Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
Current Challenges in Computer Vision
The accuracy of the computer vision models we can train is highly dependent on the quality of our data. The common challenges we encounter with image data currently are:
- Occlusion: Some object is in the way of the object we would like to detect in the image
- Viewpoint variation: Pictures in the same class were shot from different angles leading to difficulty in capturing patterns
- Illumination: Pictures in the same class have varying levels of brightness
- Deformation: The object we are trying to detect is deformed in some of the pictures
- Background objects: There are background objects that look similar to the object we are trying to detect in some of the pictures
- Intraclass variation: Objects in pictures of the same class look different in different pictures. For example, if we are trying to detect a dog, the breed of dog might lead to variations in shape, size, and color.
Applications of Computer Vision
Image recognition tasks are being used in a variety of enterprise and consumer-focused technology products. Some examples of image recognition tasks are:
- Image Classification: Classifying images based on their content. For example, identifying faulty versus non-faulty chips in a manufacturing use case.
Image Source: Quality inspection in manufacturing using deep learning-based computer vision
- Object Detection: Identifying the exact location of objects or people in an image. A well-known example of this task is the Tesla Autopilot driving system, which identifies objects and people and uses them to make decisions.
Image Source: Tesla's Self Driving Algorithm Explained
- Image Captioning: Identifying what is happening in a picture and then describing it in natural language. For example, a convolutional neural network would be able to look at the image shown below and translate it into human language.
Image Source: A Guide to Image Captioning
Computer vision services in the industry
Here is a list of some of the popular options available for undertaking computer vision tasks:
- Open source tools: OpenCV, TensorFlow, PyTorch, etc.
These tools provide the fundamental building blocks for computer vision. We can read, transform and manipulate image data. We can also use tools like TensorFlow & PyTorch to train CV models.
- Pre-trained AI models, e.g. Inception, VGG
These are open-source CV models that can be used for techniques such as transfer learning, which help new users achieve greater accuracy in a shorter period of time.
- Models-as-a-Service, e.g. AWS Rekognition, Azure Cognitive Services, Google AI platform
These CV models are hosted by large cloud providers and made available through web APIs.
- AI development options, e.g. AWS SageMaker, Azure ML, GCP AI platform, etc.
These services allow us to use the open-source tools mentioned above to train new computer vision models and run inference with them.
Spotfire Solutions for Computer vision
Dr Data Science - A picture is worth a million bytes
The video walks through all the content mentioned above and includes a demo using the standard MNIST handwritten digits dataset.
Accessing pre-trained models from cloud services
With the main cloud providers all offering image recognition services, we wanted to utilize their existing functionality in Spotfire® if possible. This would mean Spotfire could be used not only to run image recognition models but also to connect to cloud services.
Summary of cloud providers' image services
To build this image recognition Spotfire tool we would need to be able to do the following:
- Read images into Spotfire and extract metadata, e.g. image name, dimensions, etc.
- Submit images to a cloud service and handle the returned results
- Visualize the results in Spotfire
To achieve this we would utilize Spotfire's extensive access to APIs and languages such as IronPython/C#, Python, and JavaScript.
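As a rough illustration of the first two steps, the sketch below reads the dimensions from a PNG file header and packages the image into a JSON request body. It uses only the Python standard library and is not the Spotfire implementation. The function names and JSON field names are hypothetical, and each cloud provider defines its own request schema and authentication.

```python
import base64
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read width and height from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def build_request(name, data):
    """Bundle metadata and base64-encoded image bytes into a JSON body.

    The field names are hypothetical, chosen only for this example.
    """
    width, height = png_dimensions(data)
    return json.dumps({
        "name": name,
        "width": width,
        "height": height,
        "image_base64": base64.b64encode(data).decode("ascii"),
    })


# Build just the start of a PNG file in memory so the example needs no file.
header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0))
request_body = build_request("example.png", header)
```

Sending `request_body` to a provider and parsing the returned labels would complete the second step. That part is omitted here because it depends on the chosen service.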
Detailed information on this demo has been made available here, and a video presentation of the demo is available here.