Last Updated on 02/21/2023 by Sophia
The advancement in computing has been receiving many thumbs up in almost all
communities. Computing has become so versatile that it has been adopted in most groups in the
world. Narrowing down to the hybrid cloud technology which we have seen take the world by
storm. This technology has helped in pioneering and growing machine learning and the Internet
of Things (IoT). These technologies have proved beneficial in various fields especially the
health, security and scientific circles. Computing power has grown rapidly which is evident from
the specifications found in present day systems.
Quantum computing, even though still at infancy has helped spearhead machine learning
in the right direction. Machine learning is not limited to the use of dedicated central processing
units (CPUs), but it has been made versatile such that even graphics processing units (GPUs) are
being used to implement this technology. This is evident since companies like Nvidia have a
dedicated division to machine learning, deep learning and artificial intelligence (AI) (Litt, 1).
Dedicated hybrid cloud service such as Amazon AWS (Amazon Web Services) have also
contributed to strengthening of the machine-learning framework. This paper is going to delve
into the use of machine and deep learning in upscaling images using a convolutional neural
network. This increased computing power has allowed mapping and processing large neural
networks possible than before. Deep learning is a branch of machine learning which deals with
more realistic brain structure and more convolutions (Dong & Chao et al, 3). This technique of
trying to explain visual processing in the brain was known as sparse coding which inspired the
development of the neural networks.
Works into convolution neural networks can be traced as early as 1980 back in
Fukushima where a basic structure existed, but it did not train via back-propagation. Later on, the
back-propagation training was optimized for ConvNets and was deployed for optical character
recognition (OCR) and other applications in the late 80s (Cui & Zhen et al, 4). The structure
remained the same though learning was restricted to the top layer only. At this point, machine
learning was usually supervised until in 2006, LeCun developed the unsupervised learning
feature, and large-scale experiments came on board (Du & Xiaofeng et al, 8). This has seen
neural networks grow bigger; being able to be scaled across multiple graphics processing units
(GPUs) and handle more data for deep learning (Du & Xiaofeng et al, 6). From then on, various
companies have adopted this technique such as Google with their deep parallel learning,
Facebook and Amazon (Dong & Chao et al, 9) among others.
Some tasks are quite complex and no matter how good the code is, they still do not
suffice. In such a scenario, machine learning specifically deep learning comes to the rescue. Such
tasks can include recognizing and upscaling images. It also replaces repetitive tasks, which need
human-like proficiency hence increasing efficiency. Personalization is another perk which comes
with machine learning as personal preference is easily implemented and the system can adapt to
that. In as much as these machines are smart, they are still not a match for human intelligence.
On this note, they require supervised learning where human intervention and validation is
needed. Even though, some systems are autonomous, where the learning is unsupervised. The
output is usually dependent on the input, which is then processed by the machine-learning
algorithm to give the desired output as per the input parameters (Kim et al, 7).
Upscaling an image’s resolution is a complex process, which even the most trusted image
processing and editing Software, Adobe Photoshop cannot handle well. It uses a simple
algorithm known as bi-cubic interpolation (Litt, 1), which adds more pixels between the already
existing ones. This does not improve the image much since it is still grainy. Despite having more
pixels, this does not translate to the clarity of the image. In addition, using the software
mentioned above requires a high level of expertise and is time intensive. In such a scenario, a
neural network at a high level can come to the rescue. A neural network refers to an
approximation of a function that received input data and transforms the data via a complex
algorithm to produce output data. In this case, explicit programming is not necessary since we
need the neural network to learn the data based on various input and output pairs. What we want
the neural network to do in this case, is to take an input of a low-resolution photo and give a
To upscale a low-resolution photo, the internal similarities of the photo have to be
exploited. This involves the use of high-resolution and low-resolution image datasets, which help
the system, learn a mapping between the pictures. This method requires a dictionary, which will
allow mapping of low-resolution images into an immediate, sparse illustration. In such a
pipeline, various steps are involved, and not all can be optimized. A neural network achieved this
by combining all these processes into one big step, which is easily optimizable. Similar to how a
child studies to identify objects, an algorithm needs to be shown millions of images before it can
discern the input and make predictions for pictures it has never seen before.
Machine learning employs the use of super-resolution (SR) which reconstructs an image with a
higher resolution from a low-resolution image (Cui & Zhen et al, 8). The core of the SR
technique is the application of a convolutional neural network (ConvNet). They are used in
machine learning for image analysis. Their design favours minimal pre-processing when
compared to other algorithms for image classification. The network usually learns the manually
developed filters, which were developed in traditional algorithms. The convolutional neural
networks are modeled and inspired by the structure of human neurons (Du & Xiaofeng et al, 10).
This means that the more examples they receive, the more data is analyzed hence improving the
system’s intelligence. For instance, it learns that a line in an image cannot have jagged edges but
should be smooth, and if given more options for smoothing, the better the clarity and quality of
the output file.
There have been various breakthroughs in the speed and accuracy of single SR using
deeper and faster convolutional neural networks (Kim et al, 12) but recovering the finer texture
details when large upscaling factors is a challenge (Katarzyna, and Goliński, 1). Recent studies
have focused mainly on reducing the mean squared reconstruction area (Doctorow, 1). The
resultant outputs have a high signal to noise ratios, frequency details are low do not match the
expected clarity at the higher resolution.
The super-resolution generative adversarial network (SRGAN) is the first framework
with the ability of deducing photo-realistic photos up to four times the upscaling factor. This is
made possible since a perceptual loss is suggested which encompasses both content and
adversarial loss (Litt, 1). The adversarial loss drives the output to the natural picture’s manifold
via a discriminator system (Litt, 1) trained to differentiate between super-resolved images and
original ones. Besides, perceptual similarity motivates the use of a content loss function instead
of pixel space similarity. On this note, the deep ConvNet is capable of recovering photo-realistic
textures from images that have been heavily downscaled on public benchmarks (Dong & Chao et
A comprehensive (MOS) mean opinion score test has shown that substantial gains in
perceptual fidelity have been achieved using SRGAN (Doctorow, 1). The opinion scores
obtained with the framework mentioned earlier were closer to those of the original high-
resolution images than those obtained using other modern art techniques (Doctorow, 1). The use
of deep ConvNets, the machine can learn end-to-end mapping of high and low-resolution photos.
This is unlike traditional methods as it jointly optimizes all the image’s layers. A lightweight
ConvNet framework is employed and is simple to implement (Katarzyna, and Goliński, 1) while
providing formidable trade-off from the existential techniques.
With the success of various deep learning frameworks for image upscaling and
restoration, the sparse coding technique seems to have been left out, but it is still valuable. A
sparse coding model designed for SR can be incarnated as a convolutional neural network (Wang
& Zhaowen et al, 7). This system can learn froms an end-to-end cascaded structure (Cui & Zhen
et al, 10). The network’s interpretation based on sparse coding can lead to much more effective
learning while reducing the model size (Wang & Zhaowen et al, 8). This combination of deep
system and sparse coding strengths can be employed to develop a new model for image
upscaling and super-resolution both quantitatively and qualitatively. In future the various models
existing can be integrated which will help in addressing the high and low-level vision and
Cui, Zhen, et al. "Deep network cascade for image super-resolution." European Conference on
Computer Vision. Springer, Cham, 2014.
Doctorow, Cory. "Enhance Enhance: Using Machine Learning to Recover Lost Detail from
Upscaled Photos." Boing Boing, 10 May 2018, boingboing.net/2018/05/10/generative-
adversarial-network.html. Accessed 6 Dec. 2018.
Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE
transactions on pattern analysis and machine intelligence 38.2 (2016): 295-307.
Du, Xiaofeng, et al. "Single Image Super-Resolution Based on Multi-Scale Competitive
Convolutional Neural Network." Sensors 18.3 (2018): 789.
Kańska, Katarzyna, and Paweł Goliński. "Using Deep Learning for Single Image Super
Resolution." Deepsense.ai, 25 June 2018, deepsense.ai/using-deep-learning-for-single-
image-super-resolution/. Accessed 6 Dec. 2018.
Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. "Accurate image super-resolution using very
deep convolutional networks." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016.
Litt, Geoffrey. "ENHANCE!: Upscaling Images CSI-style with Generative Adversarial Neural
Networks." Geoffrey Litt, 4 June 2017, geoffreylitt.com/2017/06/04/enhance-upscaling-
images-with-generative-adversarial-neural-networks.html. Accessed 6 Dec. 2018.
Wang, Zhaowen, et al. "Deep networks for image super-resolution with sparse
prior." Proceedings of the IEEE International Conference on Computer Vision. 2015.