


Original image by Gerd Altmann from Pixabay, edited by author

We've already discussed one neural network architecture: the Multilayer Perceptron (MLP). An MLP is not well suited to image data because it needs a large number of parameters, even for small images.

Convolutional Neural Networks (CNNs) are specially designed to work with images and are widely used in computer vision. Here are the two main reasons for using CNNs instead of MLPs when working with image data. These reasons will motivate you to learn more about CNNs.

1. To use an MLP with images, we need to flatten each image. If we do so, spatial information (the relationships between nearby pixels) is lost, and accuracy drops significantly. CNNs retain spatial information because they take images in their original format.
2. CNNs reduce the number of parameters in the network significantly. So, CNNs are parameter efficient.

CNNs work with both grayscale and RGB images. Before we move on, you need to understand the difference between grayscale and RGB images.

An image consists of pixels, and in deep learning an image is represented as an array of pixel values. There is only one color channel in a grayscale image, so a grayscale image is often represented as a 2D array (tensor): we can ignore the third dimension because it is one. In other words, a grayscale image is represented as (height, width, 1) or simply (height, width). There are three color channels (Red, Green and Blue) in an RGB image, so an RGB image is represented as (height, width, 3).
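To make these shapes concrete, here is a minimal NumPy sketch. The 28 x 28 image size and the 128-unit dense layer are illustrative assumptions, not values from this article; the point is only to show the grayscale vs. RGB shapes and how flattening an image inflates an MLP's parameter count.

```python
import numpy as np

# A hypothetical 28 x 28 grayscale image: a 2D array of pixel values
gray = np.zeros((28, 28))
print(gray.shape)   # (28, 28) -> (height, width)

# A hypothetical 28 x 28 RGB image: one channel each for Red, Green and Blue
rgb = np.zeros((28, 28, 3))
print(rgb.shape)    # (28, 28, 3) -> (height, width, channels)

# Flattening the RGB image for an MLP destroys the 2D neighbourhood structure
flat = rgb.reshape(-1)
print(flat.shape)   # (2352,) -> pixels that were neighbours are now far apart

# A single fully connected layer with 128 units on the flattened image
# already needs 2352 * 128 weights + 128 biases = 301,184 parameters
print(flat.shape[0] * 128 + 128)
```

A convolutional layer, by contrast, reuses a small set of filter weights across the whole image, which is why CNNs get away with far fewer parameters while keeping the (height, width) structure intact.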
