Inside Automating Vision
Updated: Jul 2
Don't have time to read a whole book right now? Interested in what it covers before diving in? Here's a short overview of Automating Vision: The Social Impact of the New Camera Consciousness by Anthony McCosker and Rowan Wilken (2020).
Automating Vision explores seeing machines through four main case studies: face recognition, drone vision, mobile and locative media, and autonomous vehicles. Behind the smart camera devices that we explore lies a set of increasingly integrated and automated technologies underpinned by AI, pushing the boundaries of machine learning and image processing.
Seeing machines are implicated in growing visual data markets, supported by the emerging layers of infrastructure that they co-produce as images are poured into machine learning databases to improve their accuracy. Our focus lies in understanding the social impact of these new systems of seeing and machinic visuality, and the disruptions and reconfigurations they bring to existing digital media ecosystems, to urban environments, and to mobility and social relations.
Chapter two details the concept of camera consciousness and positions it as a means of accounting for the social changes entailed by the increasing autonomy and intelligence of camera technologies and machine vision. We begin with the visual anthropology of Gregory Bateson and Margaret Mead, who pioneered the use of cameras to discover the otherwise unseeable cultural and personal traits of their Balinese subjects. The conceptual framework for camera consciousness as a tool for interrogating machine vision borrows from the theoretical work of William James, Gilles Deleuze, Jane Bennett and other contemporary theorists – each of whom helps us to rethink the materiality of the image, and the pragmatics of the relationships that images affect.
In chapter three, we address the value of the face as data in the application and social context of face recognition technology. Drawing on the example of China’s camera-driven social credit governance system, this chapter offers an expanded account of the camera’s surveillant functionality. Built on technologies of recognition and matching, identification and sentiment analysis, the long-coveted visual data of the face offers a wide range of possible applications.
As social media selfies have become relatively ubiquitous and domesticated as everyday communicative practice, the value in the visual data of faces is unfurling. In fact, face recognition technology makes most sense, and has the greatest impact, in a time of social media and in relation to the depth of digital traces associated with the media accounting of personal internet and mobile phone use. Selfies, for instance, far from being merely the negative trope of the narcissistic mobile and social media age, present a powerful base layer of personal data to be mined, combined and operationalized for any number of intelligent and automated services based on computing individuals and populations.
Chapter four considers the automating and augmenting of mobile vision capture. It opens with an historical account of key steps and stages in the development of camera automation and the incorporation of cameras into mobile phones. The second part of the chapter then looks beyond the predominant focus on camera phone practices in mobile media research to explore the autonomous production of information and metadata, including geo-locative metadata, that accompany and are left behind by camera phone and smartphone use. We shift the emphasis from acts of photography to the relative autonomy of mobile imaging and the increasing value of geo-located visual data.
Augmented reality has been a slow-burn application for smartphones, finding a range of uses for the mobile camera as interface or overlay. In this chapter we explore the transition toward a more embedded, everyday activation of augmented reality. Mobile imaging and visual processing technologies play an interesting role in reshaping everyday visibilities. We can see this in developments in indoor mapping as it was imagined and tested through Google’s Project Tango mobile devices. This project is emblematic of the trajectory of personal mobile media involving sophisticated sensing and visual processing power.
As technologies of the new camera consciousness, smartphones have for some time now exemplified the expanded capacity to see through distributed points of view; added to this capacity are intelligent cameras, AI image processing and associated sensors that are networked and coupled with machine vision and cloud computing. In this sense, mobiles provoke questions about how individual local environments become visible and socially available. They also extend possibilities of seeing and acting digitally beyond human senses.
Drone vision, the focus of chapter five, presents a literal lift-off point for considering the motile camera – the camera in self-sustained and computer-controlled aerial movement. The chapter asks: how does the aerial drone affect and depend on new forms of distributed, mobile, wireless visuality? While a lot of attention has been paid to the trajectory of drone technology as it has arisen out of military contexts, we focus on the essential role played by drones’ camera technology, visual controls and wireless relations.
To understand the implications of drones in specific functional contexts – urban environments, manufacture and maintenance, agriculture, mining and forestry for instance – we look at how they reconfigure personal, public and environmental visibility, or what it means to see and be seen. Among other things, drone vision pushes the boundary of, and tests new experiences of, what is considered “public.”
The chapter examines the semi-autonomous vision technology that allows drones to act in the world. It explores the “creepy agency” of drone cameras as a factor of the new visual knowledge they are able to produce, and probes the meaning of “autonomy” for these machines that act like insects – in their unpredictable movements, their ability to swarm or to take the position of the “fly on the wall,” but also in their occasional waywardness. We argue that drone vision presents us with a key case of altered sociality triggered by a highly contested camera consciousness.
What can an autonomous vehicle see? Chapter six examines a little-understood aspect of autonomous vehicle technology: vision capture and image processing. While most attention relating to driverless cars has revolved around legal and safety concerns associated with removing a driver’s control, fundamental to their long-term success is the development of integrated visual capture and processing – that is, the processes involved in mapping, sensing, and real-time visual data analysis within dynamic urban environments.
To become properly integrated into urban, suburban or country transport systems, autonomous vehicles have to adapt dynamically in response to the way those spaces, and all their component objects, obstacles and subjects, can be seen or rendered visible, mapped, and understood computationally as real-time traversable data.
Examining the vision capture and processing used by autonomous vehicles, we ask how a driverless car learns to see, and we explore the human and social challenges that visual processing technologies place in the path of autonomous vehicle developers. The chapter also considers the concept of affordances, as set out by James Gibson, in relation to this complex technological ecosystem. Driverless cars stretch the sense in which a machine can be defined by its affordances for seeing and acting in its environment.
Chapter seven, the final chapter, points toward what a digital, visual, media data literacy might look like in the age of automation and AI. We adapt the concept of “camera consciousness” as a tool to canvass the wide range of technologies and contexts that are the battlegrounds for AI, and to find a space to build new digital, visual and data literacies in the age of seeing machines.
Digital literacy has been used as a broad term for addressing both the technical skills and the “cognitive and socio-emotional aspects of working in a digital environment,” as Eshet-Alkalai put it in 2004. If we take the case of deepfake videos – synthetic videos that use deep learning techniques to depict people acting and speaking in ways that look like the real thing – the time has come to extend our understanding of literacy beyond read-write skills to encompass the capabilities of agentic seeing machines.
A key avenue for shaping the ethical and beneficial development of machine vision technologies lies at the intersection of politics, education and responsible design. To this end, machine vision needs to be accountable and responsive to its own semiotic processes, its meaning-making and sense-making actions and interventions in the world.