Automating Vision - Video Resources

Anthony McCosker
Jul 20, 2020

Sometimes the best way to understand new digital technologies is to see them in action, especially for students. I have compiled videos and links to useful introductions, exposés and examples of seeing machines in action, covering their design, impact and implications.

One of the aims of our Automating Vision book, and the program of research surrounding it, is to consider how we can socialise machine vision. By this I mean we have to build new visual literacies, together with machines. This is the camera consciousness project.

I was reminded recently about Walter Benjamin's notion of the 'optical unconscious', which was itself a call to undo the fantasy of mastery that follows new developments in visual and image technologies (photography, cinema, telescopes, microscopes...). He famously drew attention to the politics and power in the desire to see and know that visual technologies represent. What are the cultural, political, institutional forces and narratives that lie behind each of the examples of fascination with machine vision in the collection below?

The videos and links below are organised thematically.

Origin stories of AI and machine vision
Bias in AI and machine learning
Facial recognition
China's use of facial recognition technology
Drone vision
Mobile mapping and Project Tango
Environmental mapping
Autonomous vehicles
Art and the world of generative vision
Deepfakes
Visual literacy
Camera consciousness (according to William Shatner)

The collection is incomplete, offering just some exemplars and hopefully useful resources. It may be updated, and we're keen for input and feedback. What else is out there to add to the list?

Origin stories of AI and machine vision

John McCarthy, who is credited with coining the term artificial intelligence, talks about developing AI as a matter of understanding thought processes in order to replicate them through computers.

A useful, and fairly typical video presentation of the origins of AI as artificial thinking, or augmented cognition and reasoning. These are the precursors for machine vision - how can a machine be designed to see and think about the world around it?

But don't take for granted the 'founding fathers' narratives about AI. This video about the many diverse contributors to AI tech barely scratches the surface, but helps to avoid the heroism in the foundational narratives. Women in AI is a necessary movement aimed at correcting these narratives, but also points to the importance of diversity in leadership for the corporations designing automation systems.

Bias in AI and machine learning

When it comes to understanding how machines see, it's important to deal with the issue of bias up front. Take nothing for granted. Both humans and machines are biased. We both have terrible track records when it comes to the harmful effects of discrimination.

Google presents the story of 'human bias' in its AI configurations. There's a very deliberate framing going on here, but it is nonetheless a good account of some of the forms of bias that become embedded in machine learning modelling.

Joy Buolamwini (MIT Media Lab) talks about inclusive coding, or InCoding, as being a mindset. Who's missing from our coding teams, and from the datasets we use to inform and train AI systems? She insists that 'who codes matters'. We need full spectrum teams. Create a world with a culture of inclusion, where coding includes all and centres on social change. Address the 'coded gaze' that makes black faces disappear from machine vision systems, for example.

Facial recognition

Machine vision is as widely applied as it is highly contentious when it comes to facial recognition technologies. Let's look at facial recognition basics - how it works:

Why have many cities and governments in the US and elsewhere started to ban the use of facial recognition?

One platform that has gained a lot of attention is Clearview. Its use by law enforcement agencies around the world has been controversial - not least because it feels like it oversteps boundaries of expected privacy, our right to be left alone in public and private spaces. It has also run into trouble for its biases and its reported inaccuracies.

China's use of facial recognition technology

This story is complicated. Read the first part of our book for more considered detail about China's expansive use of facial recognition technology as part of its move toward automating governance. There are a lot of fear-mongering reports and videos. Here are a couple of useful resources to get a sense of what's happening in what is arguably the nation leading the world in machine vision tech.

When your face becomes your ticket. Problem solving for social management and governance - including fighting coronavirus - or simply excessive state control?

Hong Kong protesters seek relief from the automated gaze of newly installed surveillance systems. A now iconic image of resistance to machine vision as governance:

Evasion of machine vision and facial recognition by many means:

Drone vision

Drones have come to represent a new fantasy of mastery over aerial visibility and control. This early use of a drone to scope police positions during protests in Poland in 2011 captures the scale and dynamics of the conflict in new ways.

DIY enthusiasts offer an interesting perspective on first person view (FPV) technology in recreational and racing drone systems. 'Hobbyist' communities embody a kind of experimental learning with drone tech in ways that help to capture its capabilities and potential.

In 2016, DJI introduced adaptive machine learning in its new drone. This automation technology enabled the DJI Phantom 4 to follow determined flight paths, track objects or people, and use a 'return to home' feature that automates object avoidance. As they say, quoting Kant, all of our knowledge begins with our senses / sensors.

Mobile mapping and Project Tango

Indoor mapping and positioning is the prized goal for locative media companies and services. Google has been experimenting for some time with using mobile phones to track and map indoor spaces to add to its increasingly detailed mapping of public environments.

Project Tango is a Google-led development of both software and device or sensor technologies. The work undertaken here is a reminder of how difficult and how big a job 'augmented reality' tech really is. We are a long way from fully realising the potential of AR tech.

Environmental mapping

The use of machine vision for mapping environments, and the production of environmental visual data, are tech applications that don't receive enough attention in digital media studies. Machine vision has been used in forestry to monitor and understand forest density and health, and to detect illegal logging or environmental degradation. In agriculture it is guiding large scale production and land management, as well as helping activists monitor animal mistreatment.

Machine vision is also foundational for the transformation of industry, toward what is referred to as 'industry 4.0'. At the very least, how does automation in this space affect the conditions of human labour? What does machine vision enable for advanced manufacturing?

Autonomous vehicles

The application of machine vision in guiding autonomous vehicles has a deep fascination value. We can thank Elon Musk for driving the hype curve straight up. But how close are we to 'level 5' fully autonomous vehicles? Most experts in the field (Musk aside) say we are a very long way off. Nonetheless, autonomous vehicles offer the kind of real-world machine vision and AI problems that tech companies love. The potential for disruption and transformation of the urban environment (read: wealth creation) is a huge motivating factor.

A primer on what and how a car can see:

Pay attention to Andrej Karpathy. He leads the AI and machine vision work at Tesla, and works hard to convince the world that Tesla cars can see and think about their environment as well as you and I can.

Why cameras are key to autonomous vehicles:

Art and the world of generative vision

We can learn with machine vision through the many experimental and interactive media art works that have appeared in recent times. For example, "Learning to See is an ongoing series of works that use state-of-the-art machine learning algorithms to reflect on ourselves and how we make sense of the world." (Memo Akten, 2017, Turkish artist, computer scientist and AI researcher).

Akten's work uses artificial neural networks to show us how machines learn to see. As he points out, "It can see only what it already knows, just like us", so when the network is trained only on specific image datasets, its interpretation of everyday objects becomes fantastic. (Consider this in relation to the issues of bias discussed above).

Some early footage of David Rokeby's The Giver of Names project. As we discuss in the final chapter of Automating Vision, this interactive art piece invites audiences to imagine machine vision and the sensemaking processing of objects as visual data or information. Some of the 'non-sensical' results are as informative as anything that more recent, more accurate systems can produce.

Deepfakes

The same machine learning systems used to "read" and interpret images, objects, faces etc. can be used to generate new or artificial images and composites. Generative adversarial networks (GANs) pit two neural networks against each other. One attempts to generate new material from a given dataset (for example, images of Vladimir Putin's face), and the other seeks to detect inaccuracies, against which the generative network adjusts until it has produced an optimum and undetectable likeness. The implications for information and disinformation are substantial. Deepfake images used as artificial profiles for harmful misinformation campaigns (for example...) are leading us to rethink the nature of seeing and interpretation, or how we understand visual literacy.
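
For readers who want to see the adversarial loop in miniature, here is a rough Python sketch. It is not how any real deepfake tool is built, and everything in it is an assumption made for illustration: the 'real' data are just numbers drawn from a bell curve centred on 0, the generator is a single offset `b` (starting at 3) added to random noise, and the discriminator is a one-line logistic model with parameters `w` and `c`. The function name `train_toy_gan` and the settings are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=3000, batch=64, lr=0.05, seed=0):
    """Toy 1-D adversarial training loop (illustrative assumptions only)."""
    rng = random.Random(seed)
    b = 3.0          # generator: a fake sample is b + noise
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [b + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Discriminator step: raise log D(real) + log(1 - D(fake)),
        # i.e. get better at telling real from fake.
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += (1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: raise log D(fake), i.e. adjust b so that
        # the discriminator is more likely to accept the fakes.
        fake = [b + rng.gauss(0.0, 1.0) for _ in range(batch)]
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake)
        b += lr * grad_b / batch

    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print("generator offset:", round(b, 3))
    print("discriminator weights:", round(w, 3), round(c, 3))
```

With these settings the generator's offset should drift from 3 toward 0, at which point its fakes look much like the real samples and the discriminator has little left to go on. Real GANs swap each of these one-number models for deep neural networks trained on images, but the back-and-forth is the same.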

Here's a Bloomberg quicktake on deepfakes to set the scene:

In this TED Talk, Supasorn Suwajanakorn talks about deepfakes, how they are generated, as well as their risks and implications.

Avatarify is an open source, GitHub-hosted program that allows anyone to replace their videoed face in real time with the face of someone like Albert Einstein or Elon Musk. Perfect, apparently, for joining Zoom as someone else. It's not difficult to learn how to apply these tools, given the many YouTube tutorials and the simple open source packages.

But should we all be learning to deepfake?

Visual literacy

Here's a good primer on the idea of visual literacy.

In our effort to learn with machines, and socialise machine vision, it's important to rethink visual literacy as no longer a purely human enterprise.

Camera consciousness (according to William Shatner)

There is something fun, but also deeply philosophical about the way actor William Shatner describes the weird agency of the cameras he encountered on television sets. We'll leave the last word to him.

Anthony McCosker

Swinburne Social Innovation Research Institute,

Centre of Excellence for Automated Decision Making + Society