AI and Automating Knowledge Inequity
In our third season, we continue our goal of interrogating the politics of knowledge production, exchange and circulation – but with a special focus on exploring the implications of the widespread and often uncritical use of Artificial Intelligence and Machine Learning technologies. In particular we will examine how the use of these technologies by corporate publishers and data analytics companies can replicate and exacerbate existing structural and other forms of inequities in societies and in academia.
In this first episode, we are joined by colleagues from the Distributed AI Research Institute – Dr. Alex Hanna, Dylan Baker, and Dr. Milagros Miceli.
DAIR is an interdisciplinary and globally distributed organization rooted in the belief that AI is not inevitable, its harms are preventable, and when its production and deployment include more diverse perspectives and more deliberate processes, it can be beneficial.
Safa: Hello and welcome back to our third season! My name is Safa and I’m your host.
This season, we continue our goal of interrogating the politics of knowledge production, exchange and circulation – but with a special focus on exploring the implications of the widespread and often uncritical use of Artificial Intelligence and Machine Learning technologies. In particular we will examine how the use of these technologies by corporate publishers and data analytics companies can replicate and exacerbate existing structural and other forms of inequities in societies and in academia.
We will be in conversation with folks committed to bringing awareness to and challenging the potential impacts of these technologies, particularly on the equitable production, discovery and circulation of knowledge in academia and beyond.
In our first episode, we are joined by colleagues from the Distributed AI Research Institute, also referred to as DAIR. DAIR is an interdisciplinary and globally distributed organization rooted in the belief that AI is not inevitable, its harms are preventable, and when its production and deployment include more diverse perspectives and more deliberate processes, it can be beneficial.
Dylan: I’m Dylan Baker. I am an Engineer and Researcher at DAIR, and I’m based in Seattle, Washington.
Mila: And I am Milagros Miceli. I am based in Berlin, Germany, but I originally come from Argentina and I am a Research Fellow at DAIR and a Researcher at the Weizenbaum Institute in Berlin.
Alex: My name is Alex Hanna. I am the Director of Research at the Distributed AI Research Institute. I am actually trained as a sociologist, that’s what my PhD is in, and my undergrad degrees are in computer science, math, and sociology.
So I was working with computer science methods. And then, as I finished my computer science degree, I was learning more about social media tools, especially as they were being used in activism. And so I was interested in understanding how we could make sense of a lot of this social media content online.
In my sociology program, I started working with machine learning, particularly to do things like classify texts on social media, and I was interested in how activists were using social media. So I have some work classifying Facebook posts that activists in Egypt were posting to organize. And then for my dissertation I developed a tool called the Machine Learning Protest Event Data System, which generated data from news articles to determine whether people were protesting and whether the newspaper article mentioned it.
And so, after I graduated, I was first at the University of Toronto and then went to Google, and I was getting more interested in the ways in which these things were exacerbating existing inequalities – the ways they were taking existing inequalities, be those racial, or gender, or between the global North and global South, and making them worse – and how AI was supercharging that.
I mean, AI is a term that’s quite old, and it’s a term that’s been thrown onto a number of technologies that have been around for a while – AI, or Machine Learning. So I was pretty interested in that and found the Ethical AI team at Google.
Dylan and I used to be part of this team at Google. This is a team that was co-founded by Meg Mitchell and Timnit Gebru, but after both of them wrote a paper about large language models, this particular type of technology, there was a call internally for them to retract their names from the paper – they refused, and Timnit was fired because of it. And then Meg was fired, effectively for defending Timnit.
But the team itself was, and still is, oriented towards understanding the harms that come out of AI technologies and methods of mitigating those harms. And so Dylan, you can say more, I’m sure.
Dylan: Oh yeah. I mean, that covers it in my experience. Yeah. A really, really wonderful group of people and definitely one of the most accepting and supportive and collaborative work environments that I’ve been in, especially at Google. I think the culture of the team was super unique and really, really special and something that I think Timnit is really building back up again at DAIR.
So as an undergrad, I focused my degree around data science and machine learning. And I read “Weapons of Math Destruction” in an AI class, which I think really helped to crystallize a lot of the thoughts I was having around ethics in machine learning and in the sciences.
And actually, before that – my dad is a conspiracy theorist who was sort of radicalized on YouTube and Wikipedia. So I was raised from a really young age extremely online, and extremely online in these very radical spaces. It was a really weird way to grow up with a relationship to information and misinformation.
But I think that having that relationship to internet content, from as early as I can remember interacting with computers, fueled a lot of how I see ethics in AI and ML and technology in general. But yeah, after undergrad, I ended up at Google, doing experimental machine learning projects and a lot of the data infrastructure.
So I was primed to expect the model-training aspect of ML, and didn’t really have a complete picture of the 90% of work that goes on behind the scenes to make model training possible. And I think getting firsthand experience doing that, and being in conversations with ML practitioners about how these experimental products came to be, and the kinds of motivations and voices in those conversations, was really eye-opening to me.
And it felt like there were so many disconnects between all the different sides of the ethical concerns that were going on. I think it pushed me even more to want to work not only in ethical AI, but also on finding ways to communicate across disciplines and between different communities of practice – different ways of thinking about these problems and about the parts of this massive system that creates machine learning models, products, and production material.
And so at DAIR, part of my work is in helping with engineering things on research projects. And part of it is trying to move in a direction of finding ways to communicate our research to the public and communicate between a bunch of different stakeholders and disciplines and finding ways to make this accessible – and find the people that it actually needs to reach and reach them in a meaningful way. And in between those things, I was on the Ethical AI team.
Mila: Yeah. And how did I get here? So I am a sociologist, like Alex. I am trained as a sociologist, and as a sociologist I was doing a lot of projects at university that had to do with inequality, with power, also with knowledge production – but nothing really related to technology. I had never worked in technology, and besides these projects that were part of my degree in sociology, I had never really worked in research. But this is what I wanted to do; I wanted to work in research.
So, while I was still writing my master’s thesis, I found a job as a Research Assistant at the Weizenbaum Institute, for a group that was totally about AI – a group made up only of computer scientists. So there were no other social scientists in the group.
And I applied, I thought, okay, I will be brave, and I will apply. And for some reason, I will never know why, they hired me. And I started working with them. At first I was really lost and I had to really catch up because I had never thought of artificial intelligence in any capacity. So I had to read a lot, I had to catch up.
But then at one point I realized that some of the problems that are caused or enhanced by AI are actually problems that we sociologists have been discussing in other forms and in other places and contexts. So I thought, okay, this is not that different from what I have been working on before. It’s just that the tech element is challenging to me because I don’t have a technical background.
So yeah, I started working on that, and I realized that there is a space, and also a need, to apply this knowledge about power – this framing of power, and of how knowledge is produced through and with AI, and with data as well. And this is what I started doing.
I started thinking also about classification, which is something that sociologists have thought about for a long time, and about how this is done in AI production and through AI later. And this is how I started working with data annotators, because I thought, okay, this is really the moment where the classification happens – later I realized that there are many moments where classification happens.
But, to me, it was like the first instance where I thought, okay, this is a very obvious case where I could think through the framing of classification. And I started working with data laborers, did field work with them and yeah, this is how I got here.
Dylan: Yeah. So the way that I think about it, in the simplest possible terms, is that classification is never a neutral thing to be doing. Creating ways to break down our world into simpler pieces, into these concepts and classificatory categories, is a very human thing, and it involves making lots of simplifications and assumptions, because that’s how we boil things down into smaller numbers of properties.
And in all of those simplifications and assumptions, we’re making a lot of decisions about how we’re going to frame the world, and that in and of itself propagates out these worldviews and then creates the world that we’re in, and propagates out more simplifications and assumptions that people make. And so it has a way of reinforcing itself – and if we’re not thoughtful and careful about how we choose to classify things and design these classification systems, we can end up just reproducing the same kinds of systems of power that exist and make things bad now.
And so by default, that’s generally what happens. And so we see the same kinds of issues that you see in the larger world play out in different ways, on smaller scales within machine learning systems.
Like a lot of the questions about data labor that we have, or the problems that we see in data labor, are similar problems that we see in all different types of labor. And I think that’s like one of the points that we’re trying to get across in general, is that a lot of these problems are not new – like all of these flashy and exciting things that people are building, can’t get us away from the fundamental issues of power that we’re facing just societally.
Alex: And Dylan, I love this example that you’ve done – I talked about this in a class the other day – but you took this image of Seattle, and I’d love for you to describe it for the podcast, because I think it was a really interesting way of talking about it.
Dylan: Yeah. So an image that I used in this Explorable article was just a picture of the Seattle skyline. And you can think about how you might classify or like separate out all the objects in that image. And in one version of this, you see like the Space Needle, and that’s annotated as a landmark. And you see a bunch of different buildings, and one of them is labeled like a non-residential building and another one is a residential building. And then you see a box around a tree that just says tree, and a box around Mount Rainier that says mountain. And that seems like a really objective way to just separate out objects from the scene.
But you can also think about a totally different worldview that would be communicated if the objects that you pulled out were different and labeled differently. So in another example, instead of classifying buildings as, for example, residential or non-residential, you could classify structures as religious, non-religious, or of unknown spiritual value.
So in this picture, I have the Space Needle box and it says structure of unknown religion, and a box around an office building that says non-religious structure. There’s also, instead of just a box around a tree, a box around the same tree labeled non-medicinal plant, and a box around some shrubs elsewhere – shrubs that weren’t even highlighted in the first image – labeled a plant that’s edible when cooked. Mount Rainier, in this case, is labeled Tahoma, which was the original name of the mountain from the people who were indigenous to the area.
There’s also a picture of a few tents, like a small campsite, in front of some of the shrubs that was not labeled in the original picture, that’s just sort of part of the backdrop. And in the second image it’s labeled campsite and community. So the way that we decide to break down something as simple as just a picture of the Seattle skyline, can communicate really fundamentally different worldviews about what it is we see and how we classify and categorize them.
And then, what’s important to surface – it opens up different questions about what we think about a space, and what we think the space is for.
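The two labeling schemes Dylan describes can be sketched as data. This is a hypothetical illustration – the boxes, coordinates, and labels below are invented for this example, not drawn from any actual annotation tool or dataset:

```python
# Two annotation schemes over the same (hypothetical) Seattle photo.
# Each annotation pairs a bounding box with a label from some taxonomy.
from dataclasses import dataclass

@dataclass
class Annotation:
    box: tuple   # (x, y, width, height) in pixels – invented coordinates
    label: str   # drawn from whichever taxonomy the requester chose

# Scheme A: the "objective"-looking default taxonomy.
scheme_a = [
    Annotation((120, 40, 60, 200), "landmark"),                 # Space Needle
    Annotation((300, 90, 80, 150), "non-residential building"),
    Annotation((520, 300, 40, 60), "tree"),
    Annotation((0, 10, 640, 120), "mountain"),                  # Mount Rainier
]

# Scheme B: a different worldview applied to the same pixels.
scheme_b = [
    Annotation((120, 40, 60, 200), "structure of unknown religion"),
    Annotation((300, 90, 80, 150), "non-religious structure"),
    Annotation((520, 300, 40, 60), "non-medicinal plant"),
    Annotation((0, 10, 640, 120), "Tahoma"),
    Annotation((400, 360, 90, 50), "campsite and community"),   # absent from A
]

# Mostly the same boxes – but the label sets encode entirely different
# assumptions about what is worth seeing and naming.
labels_a = {a.label for a in scheme_a}
labels_b = {b.label for b in scheme_b}
print(labels_a & labels_b)  # → set(): the two worldviews share no categories
```

The point the sketch makes concrete: nothing in the image forces either taxonomy; the choice of label set is where the worldview enters.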
Safa: Part of understanding how artificial intelligence and machine learning algorithms and models produce knowledge is tracing how they have been trained and who has the power or epistemic authority to decide what gets labeled AND classified, in what way, based on what value system.
Dylan: At the very, very low level – as a very junior engineer, I was in spaces where machine learning researchers were making low-level decisions about how to classify things in practice. And just in my personal experience, a lot of the time it seemed like people were very uncomfortable making their underlying assumptions explicit when thinking about how to make classifications.
So for example, there were a lot of projects where we wanted to classify – it was automatic photography, and we wanted to classify images by really subjective things, like: is sentimental, is scenic, is a heartwarming moment or image of people.
Sometimes it would get down to: okay, well, we can classify hugging, we can classify high-fiving, we can classify kissing, we can classify jumping. We have classifiers that we, quote unquote, believe to work with some amount of accuracy.
So let’s say that sentimental is somebody hugging or kissing, or two people jumping together. These things arise out of practicality, but I witnessed zero discussions about – not even the subjective nature of, you know, what makes something scenic – these models were just built off of: we decided to create them because, it’s unclear, someone’s PhD was in this space. And you see all the threads of why we built this model to detect this thing. Well, there were some low-level technical reasons why it was easy to put these two types of models together, and this architecture happened to work really well for finding two people, positioning their heads, and finding them close together. So now we can classify kissing, and now we have this project where we want to find sentimental moments. And so all these things are sort of glued together.
And then when we get to the part that I was working on, which was data labeling – where I’m trying to find panels of 10 or 20 or 100 people to label hundreds of thousands of these video clips as sentimental or scenic – it feels almost meaningless to have a conversation when I try to bring up: this is a really subjective thing. Do we know anything about the data labelers or their worldviews? Or how they might feel about these kinds of things? Because that was never a reason any of this was built. That was never part of the conversation – it didn’t fit into how people were thinking about these things, or the motivations behind building them.
And so at that point, it almost didn’t make sense to address those questions at the very lowest-level stage, which was the data labeling – which is going to build the ground truth, which is going to tell us, quote unquote, how well these models are performing, which is going to inform the gold standard for the product that people actually get.
And it was really frustrating and interesting to witness firsthand. I think it was a big part of the reason I wanted to move into ethics in the first place: it was fundamentally incompatible with the way things were built to have the conversations that needed to happen for any of these products to make any sense.
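One common way labeling panels like the ones Dylan describes get collapsed into a single “ground truth” is simple majority voting. The sketch below illustrates that general industry practice – it is not Google’s specific pipeline, and the labels and panel are invented:

```python
# Collapsing a panel of subjective labels into one "ground truth" label
# via majority vote. Note how disagreement simply disappears.
from collections import Counter

def majority_vote(labels):
    """Return the most common label and the fraction of annotators who chose it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Ten hypothetical annotators asked whether a clip is "sentimental" –
# a judgment that plausibly varies with culture, age, and worldview.
panel = ["sentimental", "sentimental", "not", "sentimental", "not",
         "sentimental", "not", "sentimental", "sentimental", "not"]

label, agreement = majority_vote(panel)
print(label, agreement)  # → sentimental 0.6
```

The dataset records only “sentimental”; the 40% who disagreed – and anything about who they were – is discarded from the “gold standard.”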
Mila: Yeah, maybe I can add – I found what Dylan was just saying really interesting, because it shows the side of how requesters come to put together the idea of how things should be labeled, and this point about how requesters sometimes don’t even ask themselves: where are these laborers? What’s their cultural background? Is this going to make sense to them? And so on.
And I can maybe talk about the other side of things – namely, the labelers that are outsourced somewhere. I work with groups in Argentina and Latin America; with my collaborator, Julian Posada, we also did a study on data annotators in Venezuela. And I also did field work in Bulgaria with refugees from the Middle East who were doing this kind of work.
And first, to the question of who decides, who has the power to decide, right? This is very clearly a very hierarchical structure. The one who decides is the one who has the money to pay for these annotations to be done in a certain way, in a very specific way – to what Dylan was commenting on before. And then there is the question of epistemic authority, right? There is this belief, this naturalization of the authority of managers – this idea that those who are a little bit higher in this hierarchy might know better or might have the right answer when questions come up.
So the data laborers would typically trust the judgment of the managers. And if the managers don’t know what to do, they would go to the client and ask. Now, the interesting thing is that in all this process and structure, I have never witnessed labelers discussing among themselves, to actually decide how to label, let’s say an image in a better way.
What would make sense to them? They would be confronted with satellite imagery, and they would not discuss among themselves: okay, what do you think, is this a tree? Can we use other types of labels? Can we do this differently?
But the discussion would always center around the client. What would the client expect? What do they want? So I think that shows how these things are done, and how this myth of consensus – you know, the wisdom of the crowd – is actually a myth. Because in the end, these people are just worried about keeping their jobs. They are subject to this very top-down form of imposition, and the structure in which they work doesn’t leave them much room or much agency to really use their subjectivity, use their logic, to interpret data – even though they are, of course, totally capable of doing this. And probably it would be a better way; we would probably be producing way better data if we did this.
But this doesn’t happen. And maybe one more thing to add: we were discussing before the question of how classification creates truth, and how classification imposes a certain, specific worldview. I think this is a very important part of my work.
There is the question of naturalization – how, on many occasions, we just classify something and think that this is obvious, that this is the way things naturally are. And it might be that for a certain group of people who think alike, this looks obvious. But this doesn’t mean that it is obvious for everyone, or that that classification works for everyone.
Alex: Absolutely. So if we think about this point that Mila just made about the way these things get naturalized, we can think about the ways in which the data sets that are used are often classifying things in a particular way.
So for instance, there was some research that came out of Facebook, where they took a data set called the Dollar Street data set. This data set has images of people and their households all around the world, along with the monthly income of those households. And then they took several different computer vision systems trained to do what’s called image classification, or object classification – basically, they say what’s in the image.
And so what they ended up doing was, once they had these systems trained on three major data sets – one’s called ImageNet, one’s called OpenImages, the other’s called MS COCO, and these data sets were gathered and developed by labs or universities primarily in the West – they went ahead and analyzed, with a kind of top-line metric, how well the systems did at classifying images from the West versus the global South. And, as you’d expect, they did quite poorly on images from the global South.
As an example, one I like to use: there’s an image of soap from Nepal, where the household income was something like 100 or 200 US dollars a month, and it was classified as food or cheese. And when they did the same thing with an image from the UK, where the income was 2,000 US dollars a month, it was classified correctly, or more closely.
And so, you know, this has some pretty clear ramifications for that kind of exacerbation. You’re taking something that has been naturalized – expected to look a certain way in one context – when in fact it looks like, or is recognized as, a completely different thing in another.
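The disaggregated evaluation Alex describes can be sketched as follows. The numbers below are invented stand-ins, not the actual study’s results – the point is only the shape of the analysis: group predictions by household income bracket and compare accuracy:

```python
# Toy sketch of per-income-bracket accuracy for an object classifier
# evaluated on household images (illustrative data only).

def accuracy_by_bracket(records):
    """records: list of (income_bracket, correct: bool). Returns bracket -> accuracy."""
    totals, hits = {}, {}
    for bracket, correct in records:
        totals[bracket] = totals.get(bracket, 0) + 1
        hits[bracket] = hits.get(bracket, 0) + int(correct)
    return {b: hits[b] / totals[b] for b in totals}

# Hypothetical results: (monthly-income bracket, was the top-1 label correct?).
# E.g. soap misread as "food" in a low-income household.
records = [
    ("<$200/mo", False), ("<$200/mo", False), ("<$200/mo", True),
    (">$2000/mo", True), (">$2000/mo", True), (">$2000/mo", False),
]

print(accuracy_by_bracket(records))
# → {'<$200/mo': 0.3333333333333333, '>$2000/mo': 0.6666666666666666}
```

A single top-line accuracy number would average this gap away; splitting by bracket is what makes the inequity visible.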
And in some other research that we did with two sociologists of science at the University of California, Los Angeles, we were assessing how much these particular data sets were actually being used in machine learning research. We found that within the subfield of computer vision – and actually all across machine learning – there’s an intense amount of concentration in which data sets get used, both for training and evaluation.
And so, only 12 data sets – only 12 – are used in 50% or more of the papers in machine learning. All of those data sets, except for two, were developed in the US. One was developed in Germany, and the other in China, in Hong Kong.
And so this has pretty big ramifications in terms of what becomes naturalized. I think the analogy is pretty common when we talk about this in computer vision: what gets seen, what gets recognized as natural. But we can also see very clear ways in which this happens in natural language processing, where the problem gets exacerbated because now we have to introduce the problem of language.
And there’s a great rule coined by a friend of DAIR and our team – it’s called the Bender rule, named after Emily Bender – in which, if there’s a technology being developed, she asks: okay, does it work just for English? And if you violate the Bender rule, you need to go back to the drawing board.
And so we see the Bender rule violated – and that’s just English. That’s not even considering other primarily Western languages, including Spanish, French, Portuguese, et cetera. And then we see this get exacerbated as we go further: if you’re focused on, let’s say, Arabic – is it just Modern Standard Arabic? Is it just Egyptian Arabic, which is the most spoken dialect, and the one I recognize the most? Or are we thinking about things like Sudanese Arabic or Moroccan Arabic – things that could be more useful to a certain community of people? And as we proceed with other projects at DAIR, we’re actually focusing on other types of languages: the languages spoken by people who are being targeted in the Tigray genocide, the languages spoken in the Horn of Africa.
There’s an immense amount of linguistic diversity in the world, but most NLP tools are focused on English, unfortunately.
And I mean, we already have an existing problem in that the lingua franca of academia is English, right? This is already a huge problem. And so that already introduces a particular bias in terms of what can actually be published and who gets published.
I would also say, more generally, from the perspective of knowledge finding and creation: the existing knowledge repositories get compounded by the models that are then built on top of them.
So for instance, the largest repository in the Wikipedia project is English Wikipedia – I think that’s right; the last time I checked, English was the largest. It’s followed by, I think, another Romance language, but then it drops off very quickly.
There’s a paper I’m thinking of that I’ll share – the main title is “Do you speak my language?” – which was focusing on repositories of different linguistic corpora. Now you can think about what tools get built on top of those.
So for instance, the types of techniques that Timnit and Meg were writing about at Google: large language models. You can think of large language models in a very simplified way – as rewriting one text into another, or from a human language into a kind of internally represented language, and then doing some kind of prediction back into human language, because they’re generative models. But prior to doing that, there are steps in which they have to represent this in their neural network architecture. And so if most of that is being trained on English-language text, they’re not going to be very successful in other languages – they’re not going to do rewrites in a language that’s going to be very helpful for people. And we already know that large language models have these huge problems in which they rewrite things in a very negative and violent way for certain groups, just in English too.
We have people who have done audits of the purported training data of these large language models, like the Common Crawl database, and found all kinds of negative associations in there – like associating Muslims with negative sentiment, or similar associations for gender-minority groups, et cetera. So there are these dual problems: the data that’s available, and the kinds of methods that are built on it.
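The kind of corpus audit Alex mentions can be sketched, at its simplest, as counting how often an identity term co-occurs with negatively valenced words in the same sentence. The corpus, lexicon, and function below are toy illustrations invented for this example; real audits of Common Crawl-scale data use far larger corpora and lexicons and more careful methods:

```python
# Minimal co-occurrence audit: of the sentences mentioning a given term,
# what fraction also contain a word from a negative-sentiment lexicon?
import re

NEGATIVE_WORDS = {"violent", "dangerous", "threat"}  # toy lexicon (assumption)

def cooccurrence_rate(sentences, term):
    """Fraction of sentences containing `term` that also contain a negative word."""
    tokenized = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    containing = [words for words in tokenized if term in words]
    if not containing:
        return 0.0
    flagged = sum(1 for words in containing if NEGATIVE_WORDS & words)
    return flagged / len(containing)

corpus = [  # invented three-sentence "corpus"
    "The film portrayed the Muslim character as violent.",
    "A Muslim family moved in next door.",
    "Commentators called the group a threat.",
]

print(cooccurrence_rate(corpus, "muslim"))  # → 0.5
```

Even this crude count makes the transcript’s point visible: whatever associations sit in the crawl become the statistical regularities the model learns.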
Safa: DAIR’s pioneering work on these issues is guided by a set of values and a research philosophy with a strong focus on collaborative and non-extractive approaches to knowledge production that prioritizes accountability to communities and centers the expertise of collaborators from diverse backgrounds.
Mila: Part of what I’m doing at DAIR is working on what will be the research philosophy – I’m doing this with Alex – thinking through the research philosophy that we actually want to hold ourselves accountable to, and thinking also in terms of whom else we want to be accountable to.
So, several of the things that we are thinking through have to do with how to approach communities. What does it mean to do work with communities? Who are the communities we want to work with?
And in terms of how to avoid extractivist practices, of course: how not to be, you know, the researcher who parachutes themselves into the field, takes what they are going to take, and then forgets about the community – but rather how to fund projects that are community-led, that emerge from certain communities.
So for example, one of the things we are starting now is some sort of workers’ inquiry, where we want to fund workers to produce accounts of their own workplaces, with the researcher just being a facilitator in that setting – not being the one who tells the story of the worker.
So that’s one of the things that we are thinking of, maybe Alex wants to add about the other projects.
Alex: Yeah, for sure – thinking about projects in which workers are telling their own stories. One person we’ve been working with is named Adrienne; she’s a former Amazon delivery driver, a former charter school teacher, and a labor organizer, and has a lot of experience in understanding the connections – how labor connects to these systems of AI, data extraction, and algorithmic management.
You know, we’re also working with a group on a project focusing on hate speech and disinformation in the Horn of Africa, and working closely with a refugee advocate named Meron, who’s done an immense amount of work advocating for refugees from Eritrea – working with many people who have endured horrific torture after being taken hostage, and who, as refugees, encounter the sort of misinformation that comes from governments and their allied networks. So she has a deep knowledge of that.
We’re also working with many journalists and activists who have been targeted, to understand that – approaching this from a computational angle, but also knowing that the computational angle is really nothing unless we have people who really know this material and have been tracking this stuff for years on their own, doing it in their spare time because they just have no other recourse.
And so this is something that we’ve been doing. And Dylan, I know you’ve been working very closely with those folks, if you wanted to say a bit more about that too.
Dylan: Yeah. I think something that has been really unique, or super powerful, about working at DAIR is the shift in how expertise is thought of and valued.
Because yeah, I have been working really closely with the team of folks that Alex was describing, and a lot of my work is engineering support work. I have lots of experience doing large-scale data pipelines, processing, and management – and I think in my former settings, at school, at internships, and at Google, that’s sort of where it ends.
The model is very much parachuting: somebody with the technical skills jumps into a new space, with the savior complex of, you know, I have these incredible magical skills that nobody else could possibly have, and I’m going to apply them to your subject area, come up with some miraculous solution, give it to you, and save the day – when it is extremely clear that, from where I’m standing, I lack the context and knowledge to have any insight into how to do my work effectively.
And so I’m very much guided by the folks on the ground who have been doing this kind of archiving work for years, unpaid, and the folks who have been living their entire lives informed by the context of what’s going on in Ethiopia and Eritrea and in the larger diaspora communities. I really value being able to be led by their expertise, because it makes my job so much more effective and allows me to actually feel like I can do meaningful work, because I have that expertise available to me, I have the resources, and we’re able to work closely together. And so my skill set is useful.
I think there are lots of other people in positions like mine without the structural and institutional support to be able to defer to this kind of expertise in guiding their work. And I know a lot of other people who feel frustrated and sort of helpless, and reasonably so, because it’s really encouraged to take this savior’s parachuting model, this exceptionalist approach to this particular skill set, which I think is viewed that way for a variety of reasons, but through an enormous amount of gatekeeping, you know, all of the things built into the institutions that gatekeep this kind of knowledge and skill acquisition. Which is particularly ludicrous, because it’s one of the most achievable skill sets to learn on your own with the internet right now, and people still choose to gatekeep it. But anyway, that’s sort of a tangent.
Alex: It’s not a tangent at all, because it’s about how you work with communities, right? There’s definitely the parachute model, which both of you referred to, but there’s also the model of, we’ve written up this whole thing, do you like what we did? It’s sort of an approval at the end, you know, and this is often what happens at companies: they’re building AI and they’re like, oh, maybe we should make sure that it’s not doing anything too racist or something. But then you sort of have to reject the premises on which it’s built, you know?
So you know, if you came to me as a trans woman and you were like, we built a gender classifier and we wanna make sure it works on people who are dark skinned, do you approve it? I’d go, well, no, I reject this. I reject the whole theory of this thing. I don’t think this is a problem that should be approached at all.
And I mean, you could think of many similar things, like facial recognition systems. Are you gonna get community approval of a facial recognition system? No, it’s a carceral technology that should be rejected on its premises.
And so, you know, building things and working with community means starting from the stage of problem formulation: do we even need this thing? Should we even frame this in this way? How is this gonna be of any benefit? And that means really starting with people, having people on staff who have those knowledges, and then having input at every level of the pipeline in building machine learning technologies, or, you know, in approaching critiques of them.
Safa: Having listened to our conversation you might be asking yourself what are ways in which the companies building and using these AI and ML technologies can be held more accountable for the harms and inequities that they exacerbate?
Alex: Unfortunately there are not a lot of accountability mechanisms in the world right now, especially for big corporations that have immense amounts of capital and market capitalization. Outside of any legislative or regulatory domains, the sort of thing we have is the public relations shaming that one can do.
But different jurisdictions have different sorts of regulations that could be leveraged. In Europe, there’s the GDPR, the General Data Protection Regulation, and there’s going to be the AI Act, which is being discussed in the European Parliament now. I’m less familiar with the Canadian context since I left Canada, but in the US there are a few kinds of efforts, at least some privacy accountability regulation, but there’s really not too much that is actionable.
And so one of the things I think we can start thinking about, one of the roles for scholars and scholar activists, is to ask: what are the solidarities that can be built to start calling out some of these things and organizing against them? And what are community-based strategies for doing that?
And so there are efforts that exist in coalitions. Often these fights are happening at the municipal government level, you know, stopping certain municipalities from adopting technologies like ShotSpotter or PredPol.
And there’s an example that Lilly Irani and Khalid Alexander talk about, in which they organized against the smart streetlights in San Diego, California. They had some scholar activists who went and read the terms of service of these streetlights, and found out that, you know, GE, who I think produced them, was entitled to all the data that they collected.
And so even if the city wanted to stop using them, you know, GE still had all this data. And these smart streetlights had video and audio capture in them. Actually, I think they had audio capture; I’m not sure if they had video. And so these are the sort of battles that are happening at these municipal, city-based levels, but they are immensely consequential.
Yeah. And I’d love to hear Mila and Dylan, what you all think about this too.
Mila: In general, I think the thing that we can do is empower ourselves and others, and how we do that is through knowledge. Through, as Alex said, this role of researchers and journalists: to inform others about how these systems are built, and to take action.
But I think that action is not only about the people or the communities impacted by these systems. Of course, that’s super important, and that is what Alex was just naming. But I also think it is important to help foster action within the communities of workers who are building these systems. I think that can be incredibly powerful.
So, the Tech Workers Coalition, and other organizations like Turkopticon, where workers organize and even unionize, and they fight not only for better working conditions, but also for better data, for better systems, for systems that are more just and less harmful.
And I think these two poles, in a way, the organization of the people who are impacted by the systems and the organization of the workers who are creating these systems, and also the dialogue between them, should be fostered, or enabled, because I think that could be incredibly powerful.
Dylan: Speaking very generally, I feel like any meaningful accountability to me has to involve redistribution of wealth and resources. Because I think for any of this to be possible, people need money to exist and to organize. And if we wanna be able to set the terms of these discussions and set the terms of the kind of technologies that we actually want to build, people need money and resources and institutional power.
And so I think when sort of dreaming about the kinds of accountability systems that we could put in place for corporations or people who do harm with technology, I feel like thinking about resource allocation is really the crux of it for me. But yeah, also everything that Alex and Mila said, I think is spot on.
Safa: The tech and academic sectors share many similar inequitable labour and knowledge production practices, but both have pockets of organized resistance, often led by those in the most marginalized positions.
Alex: I cut my teeth in the academic labor movement, in the graduate student worker movement, you know, in the early 2010s. And that’s where I found a lot of parallels with workers in the tech industry, the same challenges. Because I think a lot of tech workers think of themselves as, you know, maybe toiling for a bit, but then they’re gonna get a huge term sheet from a VC and get rich.
And then, you know, many academics think of themselves the same way: I’m gonna study for a little bit, and then I’m gonna get tenure, and then I’m gonna strike it rich. And that’s just becoming less and less realistic.
Here in California, for instance, I’m part of the lecturers’ unit in the University of California system. And the union there, the lecturers, pushed back and organized, approved a strike vote at 96%, threatened to strike, and actually were able to extract some pretty good concessions from the University of California system.
We can think about the Maple Spring in Quebec, in which the state threatened to raise fees on students and 300,000 students went on strike over a nine-month period. Unfortunately they were not able to stop that fee raise entirely, but I think they decreased it a little bit.
And so these movements do exist, probably not at the highest levels, at the levels of tenured faculty, because they are the most secure, but they’re definitely happening when it comes to contingent faculty and graduate students and undergrad students, who are paying more and more fees and getting into more and more debt.
So I think any kind of challenge to the gatekeeping of knowledge production, any of these challenges are definitely gonna come from below. You know, they’re gonna come from people who are already marginalized in this system and this hierarchy.
Mila: Maybe I can add, first of all, that yeah, these efforts should exist. And I do agree with Alex that where they are to exist, or where they already exist, this probably comes from below and not from the tenured white professor who is secure in their position.
What I can say is that the institutional context does a lot: it shapes how these efforts are carried out and what kinds of efforts are possible. And there’s also the question of who is in a position of authority, how we produce knowledge, and who is going to be recognized as a legitimate knowledge producer. I think that matters a lot. For example, to me it’s really very much a study in opposites, seeing my institution in Germany versus working for DAIR. And I’m happy this is going to be heard mostly in the US, not in Germany, and not by my bosses at the Weizenbaum Institute in Germany.
But really, what one can do is very limited when the institution doesn’t go along. And this is, I think, the novelty of what DAIR is and what we are building together here. We are required to hold ourselves accountable. This is nothing I have seen anywhere else, and I’m really happy it exists.
Safa: To learn more about the work of DAIR, please visit their website at https://www.dair-institute.org.
Thank you so much for tuning in.
If you are provoked by what you heard today, we invite you to join us at the Knowledge Equity Lab. Together we can fundamentally reimagine knowledge systems and build healthier relationships and communities of care that promote and enact equity at multiple levels.
Please visit our website, sign up for our mailing list, follow us on social media and send us a message to get involved!