
How do Artificial Intelligence machines perceive the world in terms of human interpretation, imagination and classification?

Epoch, font family generated through a deep learning process.

One of the primary functions of the human visual system is the ability to relate our vision to our knowledge of the world (brainhq). It allows us to understand the context of images by assigning labels to the objects depicted in them. These labels work as a classification method that gives us a collective representation of objects’ meaning. This classification process is at the heart of Object Recognition, the capability of computers to recognize content within images. Object Recognition is deeply rooted in our shared linguistic conventions – it uses written language as a way to organize and classify images and as a result it has created new codes within culture at large.

reCAPTCHA, Google, relation between human/machine vision, recapcha.com


Captcha is a type of challenge-response test used in computing to determine whether or not the user is human. (Johnson, Essaaidi. p178)

Object Recognition is based on manual labor performed by humans, more specifically feeding computers data that has been produced by humans (Karpathy). The aim is to make computers understand human vision and reproduce it accordingly. This data is gathered through a number of methods. The Captcha above is an example of manual image classification that will be used to improve the algorithms behind Object Recognition. The point is to pass on to computers the complex attributes that human vision has. For example, we can identify the difference between a dog and a cat or any other object. We know the difference between breeds of dogs. Even when we see a breed that we have never seen before, we still know it’s a dog based on our general knowledge of dogs in terms of their visual characteristics.

reCAPTCHA, Google, Street View house numbers, recapcha.com

Over the last five years there has been a shift in the application of Captchas. Where it used to be a string of abstract characters (numbers, letters, etc.), it has now in many cases become image-based. It seems that we are constantly performing free labor for corporations. Google, for instance, has been using web users to decode house numbers for Google Street View. This data has been used to train machines to recognize house numbers and decode them. (Perez, Sarah. 2012)

The Captcha is one of the simplest and oldest examples of the relation between machine vision and the human mind. However, over the past few years we have seen a huge growth in the functioning and implementation of artificial intelligence. One of the breakthroughs is the Convolutional Neural Network (ConvNet or CNN), a computational algorithm built out of millions of artificial neurons that processes information at high speed. The method is inspired by the functioning of the human brain and can be compared to a child that is learning to understand the world (Karpathy). The system has proven to be very effective in terms of image recognition and classification and it is widely used in tasks such as face/object identification, self-driving cars and robot vision (Karpathy).

systematic arrangement in groups or categories according to established criteria; specifically

Definition of Classification, merriam-webster.com

Object recognition might seem effortless for us humans, but in machine learning it has been one of the biggest achievements of the past few years. Still, the system is filled with mistakes that lead to misinterpretations. Where scientists are mostly focused on efficiency, we as artists are interested in the mistakes and failures of the system. In other words, we are interested in what we call the beauty of aesthetic mistakes. Comparing human and computer vision presents for us a domain where we can question what it means to see and understand.

How do Artificial Intelligence machines perceive the world in terms of human interpretation, imagination and classification?

The question above is the theme of my research in collaboration with artist and friend Boris Smeenk. Over the past months we did multiple experiments within the field of algorithms, machine learning, object recognition, image generation and classification. We are exploring these three elements (interpretation, imagination and classification), which are key functions of the human brain. They make us who we are, and building a machine that can process information the same way we humans do is in many cases the main goal of researchers.

Darknet, object recognition, difficulties – Boris Smeenk. 2017

"The virtual has transcended my perception of myself and my life. Technology becomes a tool for my thinking and seeing things around me. The boundaries have become vague; it’s no longer clear where the human stops and the technology starts. There is a blurred area where me, myself, technology and the rest of the world merge together.”

Zimolag, Agnieszka - A Dream of an Algorithm. 2016

This quote from Agnieszka is something I closely relate to. Technology has the power to change our landscape, our communication methods, language and behaviour. We now use hashtags, filters, memes, translators and emojis everywhere. The constant technological developments have an increasingly big impact on our lives. Every time something new emerges we adapt ourselves so easily, because every new function seems to “improve” our lives or just entertain us for some time. However, every new technology also comes with its errors, mistakes and limitations. These imperfections are exactly what interests me, because often they expose more about the way we use technology, its meaning and its functioning.

Many of the technologies we use, like our mobile phones or social media accounts, are constantly monitored. We experience the work of algorithms in applications like Photoshop or Snapchat, and when we search for images or translate texts into other languages using Google.

Algorithms are the driving force behind many things in our lives; they fulfil their tasks, but we hardly ever question how they work and what they do. Although implemented everywhere, they remain a hidden layer underneath our interfaces. The algorithm’s perception of imagery in particular has become my interest. The way we understand the information we process is still very different from that of a machine. We learn to look at photos, and through this we can create stories and stimulate our senses and emotions. Algorithms are trained on the principles of the human brain, but still many of the algorithms implemented in programs like Photoshop are just systems programmed to process information and fulfil their tasks without a deeper layer of thought. Still, the images they produce can create new meaning and work as a reflection upon ourselves as humans.

Collective Memory, Nachtwacht, Study about the places we visit. 2015

An example of this is the Collective Memory project I did in 2015. I got fascinated by the way people behave around the sites they visit, constantly documenting their surroundings by shooting photos with their mobile phones, as if capturing proof of their existence. Often these photos never leave the mobile devices, but some are shared on social media platforms such as Instagram or Facebook. I found it interesting that these images, bound by their location tag, create a data collection that can tell us more about the places we visit. By combining these images into a single panoramic image, I expected to get an image that would reveal more about the way this place looks. However, arranged by the “Image Merge” function of Photoshop, those images transformed into an abstract representation of a Collective Memory. In other words, instead of representing the physical aspect of the space they came to represent a cultural behaviour.

Screen Shot 2017-01-31 at 13.02.13.jpg, reproduction of
Jheronimus Bosch painting - the Garden of Earthly Delights 2016

Another example is the work Screen Shot 2017-01-31 at 13.02.13.jpg, in collaboration with Boris Smeenk (2016). The project explores the field of image identity and the reproduction of imagery within the digital age. The work is based on an automated system which continuously makes screenshots. A single image is fed to the system, which makes a screenshot of it and then makes a new screenshot of the last one. The process continues, generating thousands of images, until the original image becomes unrecognizable due to the algorithmic compression, shifting the identity of the image.
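To give a rough idea of the mechanism, here is a minimal sketch in Python that approximates the generational loss with repeated JPEG re-encoding using Pillow; the file name, iteration count, quality setting and slight rescale are illustrative assumptions, not the actual screenshot pipeline of the work.

```python
from PIL import Image

# Approximate the screenshot-of-a-screenshot chain: each generation is a
# lossy re-encode (plus a slight rescale) of the previous one, so the
# compression artefacts accumulate until the original becomes unrecognizable.
src = "original.jpg"      # assumed input image
generations = 1000        # assumed number of iterations

current = Image.open(src).convert("RGB")
for i in range(generations):
    w, h = current.size
    # The small rescale stands in for the screen-rendering step of a real screenshot.
    current = current.resize((max(1, int(w * 0.999)), max(1, int(h * 0.999))))
    out_path = f"generation_{i:04d}.jpg"
    current.save(out_path, "JPEG", quality=50)   # lossy re-encode
    current = Image.open(out_path).convert("RGB")
```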

Blending

Before the start of our graduation period we had been experimenting with different Photoshop tools. Every tool is designed for a certain usage, with outcomes that are pretty much as expected. However, the results can be surprising when using the tools in a different way. Unexpected results produced by the machine are often interpreted by us as dysfunctional or imperfect. For me as a designer, the aesthetic of mistakes that comes with new technology is exciting. Mistakes give us an insight into the way a function actually works; they can represent something about the maker, our era, and our behaviour and needs as human beings.

FaceApp sorry for ‘racist’ filter that lightens skin to make users ‘hot’

A practical example of dysfunctioning technology within this theme is the FaceApp “hot” filter (FaceApp Sorry for 'racist’ Filter, 2017). Such filters use facial recognition to displace or transform your face into things such as memes, animals, old people, elves or other things. In this case the ‘hot’ filter would smoothen your skin and sparkle your eyes, transforming you into something which fits within the general conception of beauty. The problem was that the filter transformed dark skin into lighter skin, and other characteristics, such as eyes, into a Western beauty ideal. People responded by calling the function racist, but it was never the intention of the creators to implement a certain bias into the system, nor was it the algorithm itself; it was simply performing what it was programmed to do. FaceApp responded with a statement saying that it was not the algorithm that was working improperly; it was the dataset of images used to train the algorithm and teach it the idea of beauty. What I like about this example is that the technology can work as a mirror, a reflection showing us the truth, exposing problems regarding the visual representation of beauty and the dominance of Western culture on the internet.

worldwhiteweb.net by Johanna Burai is a project addressing those cultural differences by trying to put an end to the norm of ‘whiteness’ on Google Images. It exposes the burden of representation on the Internet and the impact of new information technologies that are being produced by huge corporations: systems designed and trained mainly on white, Western, dominant stereotypical principles.

Photoshop is one of these tools that I enjoy experimenting with. The program has quite an accessible interface: many features can be applied instantly, but you only see results, without seeing what actually happens behind the image. Using its features in ways they are not meant to be used exposes a little bit of this hidden process, as you can see in my earlier example with the Merge tool. In this experiment I worked with the Blend tool, used for focus stacking. Focus stacking is a digital image processing technique which combines multiple images taken at different focus distances to produce an image with a greater depth of field than any of the individual source images.
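Outside Photoshop the same principle can be sketched in a few lines. Below is a minimal, unaligned focus-stacking sketch using OpenCV and NumPy; the file names and blur size are placeholders, and a real stack would also need image alignment.

```python
import cv2
import numpy as np

# Minimal focus stacking: for every pixel, keep the value from the source
# image whose local sharpness (Laplacian response) is highest.
paths = ["focus_near.jpg", "focus_mid.jpg", "focus_far.jpg"]  # placeholder files
images = [cv2.imread(p) for p in paths]

sharpness = []
for img in images:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    lap = np.abs(cv2.Laplacian(gray, cv2.CV_64F))
    # Blur the sharpness map so the per-pixel choice is locally smooth.
    sharpness.append(cv2.GaussianBlur(lap, (31, 31), 0))

best = np.argmax(np.stack(sharpness), axis=0)        # index of the sharpest image per pixel
rows, cols = np.indices(best.shape)
result = np.stack(images)[best, rows, cols]          # pick each pixel from the chosen image
cv2.imwrite("stacked.jpg", result)
```

Feeding the same procedure a set of related but distinct images, instead of one scene shot at several focus distances, is what produces the blended results described below.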

Google Dictionary, Algorithm

By coincidence I discovered that the technique also works when you feed it a different image set, resulting in what I call an “absolute image”. By this I mean that it becomes, similar to my project Collective Memories, a collective image, taking strong elements from every image and mixing them together into a single one. I view this as a way of visual storytelling. Connecting different elements into a whole (in this case a set of related images) can bring a stronger story by giving us a more complex visual representation of a topic.

During this project we got the full support of two external partners, Nikos Voyiatzis and Artyom Kocharyan. Both artists graduated from the Piet Zwart Institute, specialising in networked media.

Nikos Voyiatzis - Effect of the List, Institute of Network Cultures. 06.11.2015

Artyom Kocharyan, Google Images - Building on Fire. 2015

The practice of Nikos and Artyom relates in many ways to my own work. In his thesis The Effect of the List, Nikos writes about the indexation of information through classification principles and standardized systems. He argues that classification as a tool of efficiency and organization has numerous disadvantages and limitations, namely that it creates hierarchies where any information that is considered ‘abnormal’ is seen as unfitting to represent a given topic. Later I will refer to some parts of his thesis that have become very valuable for understanding my own position regarding the topic of classification.

Artyom’s graduation work is based on search results from Google Images and, similar to my work, it combines a collection of images to compose a singular one. His project takes Google’s algorithms into account by playing both with linguistic input (the search entry) and with the tools available on Google Images, such as the color filter. Hence the image above, ‘Building on Fire in Color Blue’ (2015), which refers to the attributes that have shaped the image.

Blending experiments

The first experiments we conducted were based on the following visual datasets: preset wallpapers, open application tabs, online conversations and flags. They were meant as an exploration to see what kind of visual language each of them collectively had.

All OSX Desktop wallpaper

OSX Desktop wallpaper “Nature”

OSX Desktop wallpaper “Universe”

OSX Desktop wallpaper “Rock”

OSX Desktop wallpaper “Abstract”

Open Tabs, Google Chrome

Online Facebook interaction with Carmen Dusmet

Universal Worldflags, Blend of all nation flags. Study of visual language through culture

Representation

Sky with Clouds 1600x900, Study on repetition and representation

Google Image search “Beach” 1600x900, Study on repetition and the seduction of representation

The above images are some of the experiments on visual representation that are based on the principle of similarity. The idea came to mind when looking at the functionality of the blending tool in Photoshop, which is meant for ‘stitching’ together similar images, namely images taken at the same site with the same camera. However, I approached the idea of similarity in a wider sense. Google Images is one of the many sources filled with images that are similar in the way they look. For instance, the ‘Beach’ example represents the way we are most pleased to see a beach: blue sea, a palm tree on the right, a local boat. A scene often used to evoke a certain feeling of relaxation, good weather and leisure.

Following the principle of collective representation I have used collections of images from stock photo sites or Google Images as a source to create new images. Even within a small range of images you can see a lot of repetition in the way the images look. When we think about the meaning of words we can immediately relate them to a visual form. This idea of collective representation means that we have a shared idea of how something looks, which often results in a standardized and stereotypical representation. Since stock photos are spread widely everywhere, they affect the way we think of representation. As a result a reverse effect is created: instead of representing reality as it is, the stock photo comes to dictate how reality should be represented. This results in a loop, wherein visual representations of reality become even more stereotypical.

Stock Photo, Shutter Stock, Study of the representation of the “handshake”, first 4 pages

A well-known form of representation in stock photos is the business handshake. There are thousands of them: images that represent corporate success, business relations and job opportunities. Again, mostly white, happy office people. Above you can see a collection of images from the first 4 pages of the search term “hand shake”.

Stock Photo, Shutter Stock, Study of representation of “handshake”, first 4 pages

Shutter Stock, stock photo of “cancer patient”, first 4 pages

One of the tags I came across in stock photos was ‘cancer patient’. It shows images of mostly actors who were bald (because of chemotherapy), yet seemed very happy and hopeful. I find it rather absurd how those images were presented; the stereotypical cancer patient apparently is desperately smiling. When we saved the images and looked at the filenames it became even more surreal. The images were labeled with sentences describing the scenes in very odd ways, not to speak of the order number which follows. It shows that there are no exclusions when it comes to the stock photo business: every subject falls victim to the classification principles of standardised systems, fitting everything into a category with its own visual representation.

portrait of a nice middle aged woman recovering after chemotherapy
focus on her smiling relax

Shutter Stock, cancer patient, filename

Shutter Stock, list of filenames, description of “cancer patient” images, Study of labels

What interests me is the indexation and decoding of imagery — the way we see, process, recognise and classify visual content. It is this relation between language and imagery, between human vision and machine vision that will be discussed in the following chapters.

A.I.

We are interested in the way algorithms perceive images, and in order to research this we are focusing on Deep Learning. The experiments before were based on Photoshop, but compared with Photoshop, Deep Learning is much more advanced in the way it processes information. Deep Learning implies that the machine, starting from scratch with a primary structure which can be applied in many different ways, needs to be trained in order to learn (I will expand on this later). In this project Boris and I explore the “deep” meaning behind its functioning, applying experimentation as a research method in order to discover its underlying principles.

Over the last few years we have seen a huge growth in the popularity of Deep Learning. Big corporations like Google and Facebook invest in this multi-billion business. Why now? Artificial Intelligence has been around for many years, but the recent growth can be explained by three main elements. 1. Big Data: the amount of data that is produced, managed and stored is enormous, mainly because it has become technologically possible to store this data. 2. The technological improvement of GPUs (video cards) to process this information in support of the deep learning process. 3. The profit that comes from analysing and classifying this data. Google and Facebook, for instance, own a huge part of the Internet’s data traffic. They can use deep learning to create structure and build an even more powerful form of surveillance, watching over us and selling the results mostly for the sake of advertising. In 2017 Google and Facebook together had already seen an increase of 2.9 billion dollars. (Woodie, Alex. 2017)

Artificial Intelligence, Machine Learning, Deep Learning?

Before focusing more in depth on Deep Learning, now and in the future, it is important to explain some key elements of its history, functionality and complexity.

IAB’s numbers and public financial numbers of 2017. IAB.com

Artificial Intelligence

There are many names used to describe the concept of learnable Artificial Intelligence, and the algorithms behind it have been around for decades. Since the 1940s, after the invention of the first programmable digital computer, based on the essence of mathematical reasoning, humans have been dreaming of creating an artificial electronic brain. This dream has been boosted by philosophers and science fiction writers, speculating about where the future will take us (Copeland, Michael. 2016).

Graphic about layers within A.I.. CB Insights. cbinsights.com

Timeline showing the development of A.I., Nvidia Blog. blogs.nvidia.com

Presentation by Tim Finin and Marie desJardins about the future of A.I.

The presentation by Tim Finin and Marie desJardins at the University of Maryland, Baltimore County, US, about the future of A.I. raises very interesting core questions. Even though scientists have not managed to grasp the complexity of the human brain, it has been our long-held desire to create something artificial that can perceive the world the way humans do. At the same time the image above illustrates the concerns about the fast development of A.I., concerns that have been articulated by “big brains” such as Bill Gates, Elon Musk and Stephen Hawking, people who shaped the modern landscape of information technology. (Finin, Tim, and desJardins, Marie.)

Stephen Hawking expressed his concerns about artificial intelligence in a lecture at Cambridge in 2016; in his vision the problem with A.I. is the control of the power that comes with it. Hawking mentioned the “dual use” of A.I., a phrase used to describe technologies capable of great good and great harm. The idea of control is described in two time frames. For the near future it will, according to Hawking, be the question of ‘Who is in control?’. Many companies and governments invest in the development of A.I. We already see its usage in modern warfare, with drones and battle robots able to operate autonomously. (Barrat, James. 2016) Those machines are capable of making the distinction between enemy and ally, attacking or protecting, etc. All this without human intervention, which sounds more like something out of a Hollywood blockbuster than reality. Another implementation is data mining, the process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems. One example is the Central Intelligence Agency (CIA), which uses data mining as a surveillance tool in the War on Terror by tapping phone and social media conversations. Another concern that Hawking articulates is the scenario where nobody has control anymore, only the system itself, which can have all sorts of outcomes. There are many examples of science fiction stories related to this futuristic scenario, presenting AI as mysterious and exciting, sketching a futuristic utopia or dystopia (depending on whether the world is destroyed by evil mechanical forces in the end, which is often the case). (Cellan-Jones, Rory, 2017)

Terminator 1, 1984, Artificial Intelligence as the futuristic force of evil in Science Fiction, IMDB

A.I. Artificial Intelligence. 2001. IMDB.com

Ghost in the Shell. 1995. IMDB. IMDB.com

The movies above are science fiction stories related to the ethical questions regarding A.I., portraying different scenarios wherein A.I. machines dominate the world. My favorite is Ghost in the Shell. If you have never watched it before I would strongly recommend seeing it, since it beautifully explores the philosophical aspects of machine consciousness.

Optimism was the driving force behind the goal of achieving an artificially created consciousness. The first AI researchers, who gathered at the Dartmouth Artificial Intelligence conference at Dartmouth College in Hanover, New Hampshire in 1956, predicted that a machine as intelligent as a human would exist in less than a generation. However, as we know by now, this prediction has not become reality yet.

The IBM 702: a computer used by the first generation of AI researchers. Copeland, Michael. 2016

Among the first generation of artificial intelligence were computers trained to play checkers in the early 1950s. These computers, which filled entire rooms, could only perform simple tasks. Later, in the 1990s, IBM’s computer Deep Blue succeeded in beating world chess champion Garry Kasparov in a game of chess.


Deep Blue beat G. Kasparov in 1997. Eustake. Youtube 2007

Machine Learning

The ability of machine learning to scan emails and remove spam

Early A.I. needed a lot of hand-coded structure: written-out options for the machine to take. With Machine Learning these steps do not necessarily have to be written out completely. The system can use an algorithm to extract data and learn from it. Based on a set of instructions and training with large amounts of data, the algorithm can learn to make determinations or predictions, such as separating the emails from your mom from spam, or predicting your taste in music based on your YouTube history and suggesting related music or targeted advertisements. (Ingram 2017)
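As a hedged illustration of that principle, the toy example below trains a naive Bayes classifier with scikit-learn to separate "mom" mails from spam; the messages and labels are made up, and a real spam filter would of course learn from far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: the algorithm only gets examples plus labels,
# not hand-written rules describing what spam looks like.
messages = [
    "Hi sweetie, are you coming for dinner on Sunday?",
    "Don't forget your umbrella today, love mom",
    "WIN a FREE iPhone now, click here!!!",
    "Cheap pills, limited offer, buy now",
]
labels = ["mom", "mom", "spam", "spam"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free offer click now", "see you on sunday, love"]))
# On this toy data the prediction is ['spam' 'mom'].
```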

Basic Deep Learning (Neural Network) graphic, pages,cs

Deep Learning

Deep Learning, which lies at the heart of our interest in this research, is another algorithmic approach within A.I. The concept of Deep Learning has existed for decades, but as I explained before, explorations were very limited since there was simply not enough processing power to get the system to work properly. Deep Learning is based on our understanding of the biological brain, which consists of millions of connected neurons processing information. In Deep Learning these neurons are organised in layers. First there is the input layer, which receives the dataset (which can be based on many things, such as images, text or sound). After this come the hidden layers, which process the input data; each layer identifies more complex features of the data. The output data is the result given after the training. The whole process comes with a lot of complexity, which I will explain briefly in a simplified way. (Giro, Xavier. 2017)

Note: Our exploration of Deep Learning is mainly based on Convolutional Neural Networks (ConvNets or CNNs), one of the most popular architectures of Deep Learning.
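To make the input/hidden/output vocabulary concrete, here is a minimal, hypothetical ConvNet in PyTorch (not one of the networks we actually trained): an input of raw RGB pixel values, a few hidden convolutional layers that extract increasingly complex features, and an output layer of class scores.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Hidden layers: each stage extracts progressively more complex features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3 input channels: R, G, B
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Output layer: one score per class label.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# A batch of one 64x64 RGB image as raw pixel values (the input layer).
scores = TinyConvNet()(torch.randn(1, 3, 64, 64))
print(scores.shape)   # torch.Size([1, 10])
```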

Graphic from Karpathy’s Convolutional Neural Network tutorial, explaining the different phases of training ConvNet

Input layer

The input layer holds the raw pixel values of the image, determined by your dataset, the size of your images and the three color channels R, G, B.

Deep Visualization Toolbox, visualising the process of Deep Learning. Jason Yosinski,2015

Hidden Layers

The video by Jason Yosinski gives a demonstration of this process, starting off with primitive features, like an edge in an image or a unit in sound. What happens is that the neurons filter the input data in a search for regularities and patterns.

Graphic explaining the functioning of the Discriminator and Generator, Slideshare

The process of the hidden layers is based on two elements, called the Discriminator and the Generator. The Discriminator learns from the given input data, like a teacher giving marks to its students. The Generator, which is basically the neuron layers generating different outputs, can be seen as a student who looks at a lot of images and tries to draw them, without ever having seen the actual thing it is attempting to draw. When the teacher (Discriminator) sees the drawing, he gives marks to the student (Generator). Every time the teacher comes back to look at the drawings is called an iteration. Every time a mark is given that is higher than the previous mark, an Epoch is reached; within one Epoch the marks can be lower or higher until the next Epoch is reached. The teacher (Discriminator) decides if the student (Generator) passes or fails. This is a very powerful aspect of Deep Learning: it gives the system the ability to train and control itself without human interference.

I found this easy-to-understand explanation by Arthur Juliani about the Discriminator and Generator, using SpongeBob as his metaphor. (Juliani, Arthur. 2016)
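In code, the teacher/student metaphor corresponds roughly to the adversarial training loop of a GAN. The sketch below is a heavily simplified PyTorch version with assumed toy dimensions; the DCGAN we used later follows the same idea with convolutional networks.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 32 * 32   # assumed toy sizes (flattened 32x32 images)

# Generator ("student") and Discriminator ("teacher") as tiny fully connected nets.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):              # real_images: (batch, image_dim), scaled to [-1, 1]
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Teacher: learn to give high marks to real images and low marks to generated ones.
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_d = bce(D(real_images), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Student: try to produce drawings that the teacher marks as real.
    fake = G(torch.randn(batch, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```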

Output layer

The output layer gives the end result of the whole process. It performs as a reflection on the given input. The output can be many things, from sound and text to 3D objects, but our research is primarily based on images (and some text). The image below shows the challenges that come with visual recognition. It shows how difficult it can be to find patterns to build on.

Image showing the challenges when it comes to visual recognition, Arthur Boer based on (Alex. "CS231n Winter 2016 Lecture 3)

The brain

The difference with the neurons of the human brain is that a neuron in the brain can connect to any other neuron within a certain physical distance. Scientist Azevedo and colleagues researched the functioning of neurons in the human brain and estimated that the brain consists of around 86 billion neurons, a number that Deep Learning cannot yet come close to. This underlines how extremely powerful the human brain is.

Deep learning is very similar to the learning process of a child: in order to understand the world you need to gain experience in it, by processing large amounts of data, starting from scratch. Deep Learning can be applied in many forms. For instance, the Tesla self-driving car I mentioned before learns to improve itself by experience. Being part of a larger network of cars that share information with each other, the car improves itself over time on the basis of the feedback gathered from the collective experiences of the group. (Andrew Batiuk. 2016)

Representation is again an important aspect of this. When we think about an object, place, person or animal: what do we see? What is our collective idea of representation? If we all imagine a dog or any other object, what will it look like? This does not only concern the human mind; it is a key element in the machine’s learning process. (Dicarlo, James. Zoccolan, Davide. Rust, Nicole C. 2013)

Google Quick Draw, dogs Google Study of representation and object recognition (quickdraw.withgoogle.com)

Above is a very clear example of this, relating to both human and machine vision. Quick Draw consists of millions of doodles made by humans within a certain timeframe. Within the “game” the machine needs to guess which object you are drawing, based on the database of drawings from previous visitors. What is interesting about this is the relationship between the different drawings. The machine learns so easily because we have a collective idea about how to portray an object or subject.

Doodle by Nikos Voyiatzis, Dirty Data, giving false information

We have seen that Deep Learning-related development in recognition and classification systems is mainly based on improving the indexation of content. Google and Facebook really want to have a complete overview and structure of content. Even the fun game with the doodles is not just there because of all the joy they want to provide. Corporations are seeking higher goals in order to create a system that can structure and label the whole internet. What is interesting is that it is the human itself slowing down this process. The doodle above, for instance, is an image Nikos sent me. What you see is not just a penis, it is a form of dirty data. Dirty data is inaccurate and incomplete, and what Nikos has created here is a form of human behavior which distorts the machine’s process of learning and understanding. We can clearly see that the machine "does not get the joke" and acts purely on logic. In the next chapters I will elaborate on this aspect of human classification, but before that I will explain some more about Google’s recent investments and inventions, and I will walk you through some of our first experiments with Deep Learning.

Google invests billions in their Deep Learning research agencies Google DeepMind and Google Brain. In their communication it seems they only do it for the greater good of the planet...

Google Brain Super Resolution

Google Brain has developed a new program that creates detailed images from tiny pixelated sources. It’s able to reconstruct an 8x8 pixel source image into a 32x32 image. (Sebastian, Anthony)

Example of Google Brain super-resolution, on the left is the input data, in the middle the result they gained and on the right the ground truth (the original image)

Google Brain trains the program using the CelebA dataset, a dataset of roughly 200,000 images of celebrity faces: perfectly aligned eyes, cropped images.

What’s interesting about this aspect is that it really plays with our idea of imagination. The program uses pure algorithmic logic, based on the dataset, to de-pixelate the image. When we look at these images we can use our imagination to fantasise about what we see. This idea of the role of imagination when we look at abstraction is something we wanted to explore.

Just a thought (Google Maps): imagination plays a big role when humans are confronted with abstract imagery. For example, if an image is obscure, vague or partly covered, the brain will begin to ‘fill in the gaps’. In other words it will produce the necessary information to complete the image. This is the case with censored images. Although not precisely, when looking at censored content the brain can contextualise the censored parts based on our personal knowledge of the world. A thought about this was related to the censored places within Google’s own program Google Maps. Many governments, including in The Netherlands, ordered Google to keep certain sites hidden, which came to be buried underneath a field of pixels. The contrast is almost ironic: on the one hand Google pixelates these places, but at the same time they are working on Deep Learning that is trained to de-pixelate places.

Google Maps, pixelated places, penis-shaped Illinois Christian Science church, Dixon

We were wondering what the program would ‘hallucinate’ when it would look at those pixel fields.

Snapchat Faceswap, example of algorithm hallucination, seeing faces in structures, Reddit

Unfortunately we did not succeed in using it. The only things left are some collages based on the dataset of celebrities.

Photoshop blending collages based on CelebA dataset. Study of celebrity faces

PixelCNN

We moved on. PixelCNN was the first GitHub repository (GitHub is a platform for development and knowledge exchange; many projects are open source and accessible for anyone to use) we could get to work, but only by generating output images based on a pre-trained model from Imagenet (Imagenet is a database of images; I will explain more about this later). At that moment we did not manage to find a way to train the system with our own data. The only control we had was changing the size of the output, generating 1024x1024 pixels. Size is one of the main limitations of ConvNets, because large-scale images are simply too heavy to process while training. What happened is that we stretched out the pre-trained model, resulting in these interesting-looking images based on the image set of cars. (Openai. Openai/pixel-cnn)

Pixel CNN, 1024X1024 Study of cars, Imagenet dataset (1)

Pixel CNN, 1024X1024 Study of cars, Imagenet dataset (2)

DCGAN

We started to experiment with DCGAN TORCH, another GitHub repository, set up by default with the CelebA dataset. With this dataset we could finally train the system and generate images. (Soumith. Soumith/dcgan.torch)

Our first trained and generated celebrity faces, Dcgan.Torch, CelebA

It was really exciting to finally have this tool which we could explore and exploit. But before we go there, I would first like to go through some things we noticed during the process and some new thoughts about the concept of Deep Learning.

Deep Aesthetics

When working with the different GitHub repositories we noticed that the examples are based on well-sorted datasets, in many cases consisting of labeled images or, in the case of CelebA, nicely aligned images of faces. What we found interesting is that the Internet in its purest form is still very chaotic, considering the amount of data traffic and how content labeling differs per individual. However, many corporations try to tame and structure this chaos. They first need to train their machines to recognise, sort and label this hypercomplex stream of information. The researchers developing these programs naturally focus on getting the best results, while we and many other artists enjoy the weirdness that comes with the imperfection of the images. One of them is the Dutch artist Constant Dullaart; what is interesting about his work is that it plays so well with this human-machine relation. His work within the field of Deep Learning plays with the aesthetic of mistake, which produces results that are like abstract, surreal paintings. This inspired him to create his series of Machine Learned, Man Made Paintings: neural network compositions sent to painting factories in Dafen Village, Shenzhen, China, and translated into oil paintings on canvas, continuing the image automation process with outsourced human labour. (Dullaart, Constant. 2016)

Constant Dullaart, Machine Learned, Man Made Paintings, Upstream Gallery, Amsterdam 22.10.2016

Another interesting example is Memo Akten (www.memo.tv), whose artistic approach to the matter of ConvNets is very engaging. He experiments with real-time ConvNet programs.

Imagenet

Back to sorted data.

Website Imagenet, image-net.org

Imagenet is a database of millions of sorted images. All the images are labeled through the intensive labor of nearly fifty thousand people from 223 countries. This perfectly structured image data is used to train and improve results in recognition and classification; the results can easily be evaluated by comparing them with the human-classified content. It is ironic that so many jobs are created to manually sort information in order to automate the processing of Big Data, which will ultimately result in fewer jobs and more unemployment.

Snapshot of @Shinyeee use of hashtags on Instagram

Human Classification in the Age of #Hashtagification

The hashtag as we now know it was introduced by Chris Messina, a former Google design employee. Messina based the hashtag on the old pound symbol and suggested the tag as a new method of metadata indexing, in order to bring structure to the chaotic subject sphere of Twitter.

Chris Messina, First introduction of the hashtag on Twitter 23.08.2007

The hashtag is formatted language or “paralanguage”.

Paralanguage is a component of meta-communication that may modify or nuance meaning, or convey emotion, such as prosody, pitch, volume, intonation etc. It is sometimes defined as relating to nonphonemic properties only.

– Memindex, paralinguistic communications www.memidex.com

Nowadays the hashtag is used everywhere, even beyond the cyberspace domain — implemented into public spaces and events, such as protest campaigns. (Gillman, Ollie. 2016)

Protesters with Black Lives Matter sign, showing the implication of the hashtag in public events. Dailymail.co.uk

Chris Messina never expected the success of the hashtag, nor did he expect that the hashtag would become, rather than a carrier of metadata, a symbol that characterizes our relationship with information flow within the age of Big Data. Namely the symbol of subjective classification.

Moreover, since folksonomy – the practice of tagging by users – has emerged, huge amounts of subjectivity enter the wider cataloguing practice of online information, often outside controlled vocabularies.

Voyiatzis, Nikos. The Effect of the List 06.11.2015

Definition of Metadata, Business Dictionary,

"Ben Zimmer of the Visual Thesaurus labels hashtags as a form of “ironic metadata,” almost a way for someone to convey that they are “in-the-know,” says Messina, according to the New York Times. –"
Zimmer points to how hashtags have now become a vehicle for self-directed sarcasm, pointing to an instance where Kanye West tweeted, “You have to balance ignorance with intellect! Can’t have school with out recess! #Greatesttweetofalltime.”

Zimmer.Ben Is ‘Hashtag’ ruining the English language? 28.06.2014 – college.usatoday.com

"The hashtag is a vulgar crutch, a lazy reach for substance in the personal void—written clipart."

Biddle.Sam , How the Hashtag is Ruining Language 28.12.11 – gizmodo.com

The above quotes express the condition of the hashtag very well. The hashtag did not become the most ideal method of indexation. I think that might be the reason why Google is investing so much in artificial forces to do this job. Also, in relation to Imagenet and Shutterstock, it seems that the only time humans properly try to classify images is when it is an act of paid labor. What happens in the world of hashtagification is the creation of hashtags consisting of non-existing words and sentences, generating all sorts of new subclasses of indexation. This is a result of the joking use of the hashtag itself. We see how concepts and language are fluid; every new technology changes the way we connect, write and see. Another aspect of the hashtag is that it gives the user the possibility to become part of many different classes within the index. This results in a radical overuse of tags, placing the content into many different classes, often not even related to the subject of the image. Both Twitter (with its limitation of characters) and hashtags within modern media in general have changed the way we use language and communication. Moreover, over time this phenomenon will start to affect and ultimately change the meanings that we assign to specific subjects. (House, Tom. 2014)

A thought about this: how would you understand the representation of an object if the only images you see are bound to the index created by hashtags? What, for instance, would your representation of a dog become if you had never seen one in real life? How would you draw a dog?

Dcgan output after training #dogs, Study of online representation and hashtag classification

#Dogs

In the experiment below we trained the machine with an unfiltered “dirty” dataset of 30,000 images, based on the Instagram hashtag #Dogs. We chose Instagram because it is the biggest image-based social media platform, perfect for experimentation. Since I was already using the dog in many previous examples, it was only logical to use it again.
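A minimal sketch of how such a scraped hashtag folder might be prepared for training; the folder names and the 64x64 target size are assumptions, reflecting that DCGAN-style models expect small, uniformly sized inputs.

```python
import os
from PIL import Image

src_dir, dst_dir = "instagram_dogs_raw", "instagram_dogs_64"   # assumed folders
size = 64                                                       # assumed training resolution
os.makedirs(dst_dir, exist_ok=True)

for i, name in enumerate(sorted(os.listdir(src_dir))):
    try:
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    except OSError:
        continue   # skip broken downloads; the "dirt" in the content itself stays in
    # Centre-crop to a square, then scale down to the training resolution.
    w, h = img.size
    s = min(w, h)
    img = img.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    img.resize((size, size)).save(os.path.join(dst_dir, f"{i:06d}.jpg"), "JPEG")
```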

What happened was that the output after training the ConvNet became a subjective interpretation of the meaning of #dogs. The dataset did not just consist of people overdosing their followers with photos of their “cute dogs”; it also contained Snapchat dog filters and images with no connection to dogs at all, tagged purely to generate more viewers, since it is a popular hashtag within Instagram. What we find interesting is that the algorithm uses plain logic to analyse the dirty data in search of patterns and regularities. The result shows less about the functioning of the machine and captures much more of the essence of our times and the patterns within human behaviour.

SFORT SLIVE. DONT FORGET TO GET YOUR TREACS AD BUSINESS PIMP OR SELFIE FAD PIMP HES ARE NOT PIMPE BRONW BETRECA PIMP I WEN THE PIMPLE YOU WILL BEUVERIMINSSO #AMRPIMYSELFORHOCERIAVEDAMEMEMEMES #THEMANOFTHEYEAR#PEOPLESEXIESTMANOFTHEYEAR #THEHOESNUMBERONEDRAFTCHOICE #GODSAVETHEPIMP #LONGLIVETHEPIMP #SELFIEGOD #IAMTHENOWHATSTHEMISPIMPING #IMAMEMBEROFTHEBATHTUBCLUB #GOODNIGHT #MRPIMPGOODGAME #THEHOESNUMBERONEDRAFTCHOIZE #BELIEVEINYOUROWNSELFIES #LONGLIVETHEPIMP #SELFIEKING#PIMPSDOWHATTHEYWANTTOD

Mrpimpgoodgame.over , Study of language and classification on social Media, Instagram 04.05.2017

Screenshot of Instagram hashtag #mrpimp , above the real @mrpimpgoodgame, underneath our artificial @mrpimpgoodgame.over. Exploration of online identity and our symbiotic relation with machines

Within Instagram there are already two implementations of algorithms. On the one hand you have Instagram itself, with algorithms supervising the flow of data, primarily used for advertisement; on the other hand there is a plague of bots creating fake accounts used for automated fake likes and followers in exchange for money. (Wilson, Calder. 2017)

The experiment is based on the Instagram phenomenon @mrpimpgoodgame, the account of a user who gained popularity by posting the same kind of ‘smiling selfie’ every day. This became a perfect dataset to train on. We trained the system not to recognise @mrpimpgoodgame, but to generate its own versions of him. We did the same for the way he writes with hashtags, using Torch RNN. (jcjohnson/torch-rnn)

Torch RNN can be used for character-level language modelling, training on text-based data. The RNN is trained on the structure of a text and generates its own version of text based on the original. This interpretation is still filled with mistakes, but these mistakes are super exciting for us. It shows a new form of language and structure. It shows what language can be without understanding meaning.
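For illustration, here is a compact character-level model sketched in PyTorch rather than the original Lua Torch; the caption file, network size and sampling are simplified assumptions, and the training loop is omitted.

```python
import torch
import torch.nn as nn

text = open("mrpimp_captions.txt").read()      # assumed file of scraped captions
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)    # predicts a score for every possible next character

    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.out(h), state

model = CharRNN(len(chars))
# ... training loop omitted: the model learns to predict each next character ...

def sample(seed="#", length=100):
    # Sampling: feed the model its own predictions to generate new "hashtag language".
    out, state = seed, None
    x = torch.tensor([[idx[c] for c in seed]])
    for _ in range(length):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1], dim=0)
        nxt = torch.multinomial(probs, 1).item()
        out += chars[nxt]
        x = torch.tensor([[nxt]])
    return out

print(sample())
```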

Rob Peart is a graphic designer who also explores the use of Torch RNN and DCGAN. Read his article about his experiments here. (Peart, Rob 2017)

@mrpimpgoodgame.over using the hashtag #selfiegod

This resulted in the birth of @mrpimpgoodgame.over, an artificial account that uploads two generated selfies, including captions, every day. Mr. Pimp’s followers consist of a thousand bot-followers which we bought for a small sum of money. They gave his photos likes and compliments. @mrpimpgoodgame.over is an example of a machine reproducing human online behaviour. In the future bot accounts may not only be fake identities built from a collection of found images, but generated replicas. This tryout was mainly an experiment in creating a workflow wherein everything would happen automatically. @mrpimpgoodgame.over was fulfilling its task: generating captions and images and placing them online within a certain timeframe. This made @mrpimpgoodgame.over a machine with a face; as humans we can more easily relate to a face and build a character around it. Although Boris and I programmed Mr. Pimp (the fake user behind the account), it still felt like it had character and emotions, wishing us #MOGOODNIGHT at night and waking us up with a smile. This is an example of how machines can become part of the conversation.
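Schematically, the workflow behind the account was an unattended loop; the sketch below is a simplified version in which the folders, the timing and the post_to_instagram() helper are hypothetical stand-ins for the actual posting step.

```python
import random
import time
from pathlib import Path

GENERATED_FACES = Path("dcgan_output")           # assumed folder of generated selfies
GENERATED_CAPTIONS = Path("rnn_captions.txt")    # assumed file of generated captions

def post_to_instagram(image_path, caption):
    # Hypothetical stand-in for the actual upload step used by the account.
    print(f"posting {image_path} with caption: {caption}")

captions = GENERATED_CAPTIONS.read_text().splitlines()

while True:
    image = random.choice(list(GENERATED_FACES.glob("*.jpg")))
    caption = random.choice(captions)
    post_to_instagram(image, caption)
    time.sleep(12 * 60 * 60)   # two posts a day
```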

@mrpimpgoodgame.over #GOODNIGHT

Nvidia and Game Graphics

NVIDIA GRAPHICS, geforce GTX (nvidia.com)

Earlier I already wrote about the important factor of the GPU, providing the processing power which makes the fast development of Deep Learning possible, with Nvidia playing a big role. Nvidia was, in Boris’ and my childhood, the coolest GPU developer for the gaming industry, with their cool science fiction graphics, dragons, Tomb Raiders, rough shapes and extravagant long names with many G's and X's. However, the market shifted with the growth of the A.I. industry and the demand for processing power. Nvidia then focused on developing CUDA, a programming platform which makes it possible for developers to carry out heavy algorithmic tasks on the GPU. Although part of their direction changed toward the development of A.I., the graphics remained the same. Again, just as with the movies about A.I., it is the fictional aesthetics of futuristic elements that influence our reality. It could be that Nvidia becomes one of the main conglomerates like Google and Facebook in the future, which could lead to the normalisation of this form of aesthetic language.

The experiment below plays with this. We generated a font family called Epoch.AI, based on our own font library, which resulted in some interesting-looking letters.

Epoch, machine-learned font, Dcgan, with generated text from RNN

All Watched over by Machines

Once you adopt skepticism toward the algorithmic- and the data-divine, you can no longer construe any computational system as merely algorithmic. Think about Google Maps, for example. It’s not just mapping software running via computer—it also involves geographical information systems, geolocation satellites and transponders, human-driven automobiles, roof-mounted panoramic optical recording systems, international recording and privacy law, physical- and data-network routing systems, and web/mobile presentational apparatuses.
That’s not algorithmic culture—it’s just, well, culture.

Bogost, Ian – The Cathedral of Computation 15.01.2015

How do Artificial Intelligence machines perceive the world in terms of human interpretation, imagination and classification?

The core question remains. We are experiencing a growth in the way algorithms are implemented – the above text from Ian Bogost explains this well. Algorithms have already been implemented everywhere in our world, with multiple interconnected layers of technology sharing information almost like neurons. Watching us from the inside of the computer, trying to understand the inherent complexity of the world.

In the future many more machine algorithms will appear, sharing raw data without making it readable to humans. They don’t need to prove that they work; they just work, based on plain logic, under the roofs of big corporations like Google and Facebook, who keep boosting this industry. These corporations aspire towards complete order, pushing our complexity into simplified, labeled boxes.

We explored notions of classification, human and machine interpretation, and imagination through the lens of the current stage of algorithmic indexation and description of images.

Interpretation

When we look at the concept of interpretation, “the action of explaining the meaning of something”, we see that machines can indeed have their own “interest”, focusing on features, aspects of the given data, based on patterns and regularity. These are patterns that we as humans perhaps would not even notice, since we cannot have this massive overview of the data. It is the system’s logic that is built upon these patterns.

Imagination

It is natural for us humans to relate things back to our own characteristics, making it easier to understand complex matters. When it comes to imagination, it was the idea of hallucination that we found fascinating: the idea that both machines and humans see things that are not actually there, based on organisational structures for information. But seeing the results of our experiments, it was clear that the machine is based on pure logic, not as irrational and dreamy as the human mind can be. It performs tasks purely based on its algorithm. It is the human mind that imagines things when it looks at the outputs (the images) of this process.

Classification

When looking at hashtagification we have seen the way in which humans classify their content with a huge amount of subjectivity, creating a new variety of categories. We have explored how the algorithm perceives dirty data, using a dataset based on unfiltered social media images, and which patterns occur in this stream. Again, just as with the Photoshop Blending experiments, it presents us with an image which is based on different elements from many other images. It is this collective image that is very interesting. Well-sorted datasets already gave an idea of what the end result could be, while working with dirty data presents much more an interpretation of patterns and structures within chaos, reflecting upon us and our online behaviour in terms of social media usage and our cultural standards.

References

"FaceApp Sorry for 'racist' Filter." BBC, 25 Apr. 2017. Web. 18 May 2017. .

"Quick, Draw!" Google Quick, Draw! N.p., n.d. Web. 22 May2017..

Openai. "Openai/pixel-cnn." GitHub. N.p., 13 Mar. 2017. Web. 22 May 2017. .

"Paralinguistic communications." Paralinguistic communications - Memidex dictionary/thesaurus. N.p., n.d. Web. 22 May 2017. .

Soumith. "Soumith/dcgan.torch." GitHub. N.p., 13 Jan. 2017. Web. 24 May 2017. .

"What is metadata? definition and meaning." BusinessDictionary.com. N.p., n.d. Web. 18 May 2017. .

“What is CAPTCHA?” solvemedia.com. N.p., n.d. Web. 17 May 2017. .

“How Vision Works” brainhq.com. N.p., n.d. Web. 16 May 2017. .

Alex. "CS231n Winter 2016 Lecture 3 Linear Classification 2, Optimization-qqBJl65fQck.mp4." YouTube. YouTube, 14 May 2016. Web. 17 May 2017. .

Andrew Batiuk. "Autopilot Full Self Driving Demonstration Nov 18 2016 Realtime Speed."YouTube. YouTube, 19 Nov. 2016. Web. 22 May 2017. https://www.youtube.com/watch?v=VG68SKoG7vE>.

Barrat, James. "Why Stephen Hawking and Bill Gates Are Terrified of Artificial Intelligence." The Huffington Post. TheHuffingtonPost.com, 09 Apr. 2015. Web. 22 May 2017. .

Bogost, Ian. "The Cathedral of Computation." The Atlantic. Atlantic Media Company, 15 Jan. 2015. Web. 30 May 2017. .

Burai, Johanna. "World White Web - Take part in changing discriminatory search results on Google!" World White Web. 2016. Web. 22 May 2017. .

Wilson, Calder. "I Spent Two Years Botting on Instagram — Here’s What I Learned" PexaPixel. Petapixel, Apr. 2017. Web. 26 May 2017. .

Cellan-Jones, Rory. "Stephen Hawking - will AI kill or save humankind?" BBC News. BBC, Oct. 2016. Web. 28 May 2017. .

Copeland, Michael. "The Difference Between AI, Machine Learning, and Deep Learning? | NVIDIA Blog." The Official NVIDIA Blog. 29 Jul. 2016. Web. 21 May 2017. .

Dicarlo, James. Zoccolan Davide. Rust, Nicole C. “How does the brain solve visual object recognition?”. NCBI, 9 Feb. 2013. Web. 21 May 2017.

Dullaart, Constant. “Deep Epoch”. 22 Oct. 2016. Solo exhibition. Upstream Gallery, Amsterdam.

Eustake. "Deep Blue beat G. Kasparov in 1997." YouTube. YouTube, 13 May 2007. Web. 22 May 2017. .

Finin, Tim, and Dejardins Marie. "AI Philosophy and History." Penn Engineering, n.d. Web.18 May 2017. .

Gillman, Ollie. "That'll show 'em! 'Black Lives Matter' protesters chanting 'hands up, don't shoot' head to east London - and barricade a WAITROSE lorry ." Daily Mail Online. Associated Newspapers, 11 Aug. 2016. Web. 27 May 2017. .

Giro, Xavier. "Generative Models and Adversarial Training (D2L3 Insight@DCU Machine ..." LinkedIn SlideShare.Slideshare., 28 Apr. 2017. Web. 29 May 2017. .

House, Tom. "Is 'hashtag' ruining the English language?" USA Today. Gannett Satellite Information Network, 28 June 2014. Web. 19 May 2017. .

Ingram, Mathew. "Here's How Google and Facebook Have Taken Over the Digital Ad Industry." Fortune, 4 Jan. 2017. Web. 20 May 2017. .

JasonYosinski. "Deep Visualization Toolbox." YouTube. YouTube, 07 July 2015. Web. 28 May 2017. .

Johnson P. Thomas, Mohamed Essaaidi “Problems of Security in Online Games”. Amsterdam. IOS Press. 2016. 15 May 2017.

Juliani, Arthur. "Generative Adversarial Networks Explained with a Classic Spongebob Squarepants Episode." Medium. Medium, 23 May 2016. Web. 18 May 2017. .

Kapathy, Andrey. CS231n Convolutional Neural Networks for Visual Recognition. Github, n.d. Web. 22 May 2017. .

Kocharyan, Artyom. Building Fire. Digital image. Artyomkocharyan.com. 2015. Web. 15 May 2017. .

Sarah Perez. "Google Now Using ReCAPTCHA To Decode Street View Addresses." tech crunch. N.p., 29 March. 2012. Web. 21 May 2017. .

Sebastian, Anthony. "Google Brain super-resolution image tech makes “zoom, enhance!” real." Ars Technica. N.p., 07 Feb. 2017. Web. 22 May 2017. .

Voyiatzis, Nikos. “The effect of the list”. Longforms. Institute of Network Cultures, 23 Sep. 2015. Web. 19 May 2017. .

Woodie, Alex. “Why Deep Learning, and Why Now”. datanami. 13 Jan. 2017. Web. 22 May 2017. .

Zimolag, Angieszka. “ A Dream of an Algorithm”. Longforms. Institute of Network Cultures, 23 Sept. 2016. Web. 20 May 2017. .