Computer Vision

1. What is Computer Vision?

Computer Vision is a field of Artificial Intelligence that enables machines to interpret, analyze, and understand visual data from images and videos. It is loosely modeled on human vision, and on well-defined tasks it can process visual information faster, and sometimes more accurately, than a person can. Modern systems typically rely on deep learning and neural networks.

2. How does Computer Vision work?

2.1. Input stage

Sources: Cameras, image databases, videos, medical scans,…
Formats: JPG, PNG, MP4,…
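
As a small illustration of the input stage, a pipeline often needs to know which format it received before decoding it. The sketch below guesses the format from the leading bytes of a file. The function name sniff_format is hypothetical, and the check covers only the three formats listed above; a real pipeline would normally leave this to a decoding library.

```python
def sniff_format(header: bytes) -> str:
    """Guess a media format from the first bytes of a file (illustrative only)."""
    # JPEG files start with the bytes FF D8 FF.
    if header.startswith(b"\xff\xd8\xff"):
        return "JPG"
    # PNG files start with a fixed 8-byte signature.
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    # MP4 is an ISO base media file: bytes 4..7 hold the box type "ftyp".
    # Related formats (for example MOV) share this marker, so this is a
    # coarse guess rather than a strict MP4 check.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "MP4"
    return "unknown"


print(sniff_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

Only the first few bytes are needed, so the check can run before the whole file is read.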

2.2. Data preprocessing

Noise reduction: Removing unwanted distortions (Gaussian blur, Median filtering).
Image scaling & resizing: Standardizing image dimensions.
Color correction: Adjusting contrast, brightness, and grayscale conversion.
Data augmentation: Rotations, flipping, cropping, and brightness adjustments that add variety to the training data and help the model generalize.
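
The preprocessing steps above can be sketched in plain Python on a tiny image stored as nested lists. This is a minimal sketch and not how a library such as OpenCV implements them. The function names are hypothetical, the grayscale weights are the common ITU-R BT.601 luma coefficients, and the filter window is assumed to be 3x3.

```python
from statistics import median


def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 intensities (BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def median_filter_3x3(img):
    """Noise reduction: replace each pixel with the median of its 3x3 window.

    Border pixels use only the neighbors that exist. For those even-sized
    windows the median is the mean of the two middle values, truncated to int.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def flip_horizontal(img):
    """Data augmentation: mirror the image left to right."""
    return [row[::-1] for row in img]


# A flat gray patch with one "salt" noise pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter_3x3(noisy))
```

The median filter removes the isolated bright pixel without blurring the rest of the patch, which is why it is a common choice for salt-and-pepper noise.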

2.3. Feature extraction

Feature extraction helps the AI model identify important patterns in images.

Edge detection: Identifies the borders between different objects. Used in shape detection, object detection, and image segmentation.
Keypoint detection: Finds distinctive points in an image. Used in face recognition, motion tracking, and object recognition.
Histogram of Oriented Gradients (HOG): Describes the shape of an object by analyzing gradient directions.
Color & texture analysis: Extracts dominant colors and patterns from an image. Used in image classification, product recognition, and content-based search.
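
To make edge detection and gradient-direction features more concrete, here is a minimal sketch that computes Sobel gradients and then builds a magnitude-weighted histogram of gradient angles. The histogram is a simplified, single-cell version of the idea behind HOG: it assumes 9 bins over 0 to 180 degrees and omits the cell grid and block normalization of a full HOG descriptor. All function names are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradients(img):
    """Return (gx, gy) for a grayscale image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            for j in range(3):
                for i in range(3):
                    pixel = img[y + j - 1][x + i - 1]
                    gx[y][x] += SOBEL_X[j][i] * pixel
                    gy[y][x] += SOBEL_Y[j][i] * pixel
    return gx, gy


def orientation_histogram(gx, gy, bins=9):
    """Magnitude-weighted histogram of unsigned gradient angles (0-180 degrees)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue  # flat region, no edge evidence
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            # The extra modulo guards against an angle that rounds to 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A 4x4 image with a vertical edge between the dark and bright halves.
image = [[0, 0, 255, 255] for _ in range(4)]
gx, gy = sobel_gradients(image)
print(orientation_histogram(gx, gy))
```

For this image all gradient energy falls into the first bin, because the intensity changes only in the horizontal direction. An edge detector would threshold the gradient magnitude, while a HOG-style descriptor keeps the histogram as the feature.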

2.4. Model selection & training

Model choice: Selecting the best AI model for the task (classification, object detection, segmentation, OCR).
Training data: Using labeled datasets.
Hyperparameter tuning: Optimizing learning rate, batch size, and epochs.
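
Training on labeled data and tuning hyperparameters can be sketched with a deliberately tiny model. The example below assumes a single hypothetical feature per image (for instance, centered mean brightness) and a binary label. It fits a one-dimensional logistic regression with full-batch gradient descent and runs a grid search over learning rate and epoch count. Batch size is left out to keep the sketch short. Real vision models are far larger and would be trained with a framework such as those listed under "Popular tools"; the data and names here are made up for illustration.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, lr, epochs):
    """Full-batch gradient descent for 1-D logistic regression; returns (w, b)."""
    w = b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def evaluate(data, w, b):
    """Return (mean log loss, accuracy) on a labeled set."""
    loss = 0.0
    correct = 0
    for x, y in data:
        p = sigmoid(w * x + b)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        correct += int((p >= 0.5) == bool(y))
    return loss / len(data), correct / len(data)


def grid_search(train_set, val_set, lrs, epoch_options):
    """Try every (lr, epochs) pair and keep the one with the lowest validation loss."""
    best = None
    for lr, epochs in product(lrs, epoch_options):
        w, b = train(train_set, lr, epochs)
        loss, accuracy = evaluate(val_set, w, b)
        if best is None or loss < best["loss"]:
            best = {"lr": lr, "epochs": epochs, "loss": loss, "accuracy": accuracy}
    return best


# Hypothetical labeled data: (feature, label), label 1 = "bright scene".
TRAIN = [(-0.4, 0), (-0.3, 0), (-0.2, 0), (0.2, 1), (0.3, 1), (0.4, 1)]
VAL = [(-0.25, 0), (0.25, 1)]

best = grid_search(TRAIN, VAL, lrs=[0.01, 0.1, 1.0], epoch_options=[10, 50, 200])
print(best)
```

The selection is made on a separate validation set rather than on the training data, so the chosen hyperparameters reflect how well the model generalizes.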

2.5. Prediction & Analysis

In this stage the trained model is run on new images and videos and makes predictions based on what it learned during training. This is when the AI model actually performs its task in real-world applications.
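
For a classification model, the prediction step usually ends by turning the raw output scores into probabilities and picking the most likely label. The sketch below assumes the model has already produced one score (logit) per class; the scores and label names are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical raw outputs of a trained classifier for one image.
label, confidence = predict([2.0, 0.5, -1.0], ["cat", "dog", "bird"])
print(label, round(confidence, 3))
```

The returned probability is what later stages typically use as a confidence score when deciding whether to accept a prediction.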

2.6. Refinement & Enhancements

This is the final step, in which the raw output from the model is refined, enhanced, and filtered before it is presented to the user or used for automated decisions, for example by discarding low-confidence detections or merging overlapping bounding boxes. This makes the predictions more accurate and useful.
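
One common form of refinement for object detectors is to drop low-confidence detections and then apply non-maximum suppression (NMS), which keeps only the highest-scoring box among heavily overlapping ones. The sketch below assumes boxes in (x1, y1, x2, y2) format and default thresholds of 0.5; both are illustrative choices, not values taken from any particular model.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine(detections, min_score=0.5, iou_threshold=0.5):
    """Filter (box, score) pairs by confidence, then apply greedy NMS."""
    candidates = sorted(
        (d for d in detections if d[1] >= min_score),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


raw = [
    ((0, 0, 10, 10), 0.9),    # strong detection
    ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
    ((50, 50, 60, 60), 0.7),  # separate object
    ((0, 0, 5, 5), 0.3),      # below the confidence threshold
]
print(refine(raw))
```

Here the near-duplicate and the low-confidence box are removed, leaving one detection per object.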

3. Popular tools

Data acquisition: OpenCV, FFmpeg, GStreamer, Keras, Roboflow,…
Cleaning & Enhancement: OpenCV, Scikit-Image, Albumentations,…
Feature Extraction: OpenCV, Scikit-Image, DLib, SimpleCV,…
Model selection & Training: TensorFlow, PyTorch, Detectron2, FastAI, MMDetection,…
Model inference: TensorFlow, ONNX Runtime, YOLO,…
Refinement & Enhancements: OpenCV, Scikit-Image, Albumentations,…

4. Popular API services

Google Cloud Vision API: AI-powered image analysis, object detection, and OCR.
Amazon Rekognition (AWS): Face detection, object tracking, and image moderation.
Microsoft Azure Computer Vision: Optical Character Recognition (OCR), image tagging, and object detection.
Clarifai: Pre-trained models for face recognition, OCR, and scene understanding.
