The High Cost of Knowledge Monopoly
You are listening to the Unsettling Knowledge Inequities podcast, presented by the Knowledge Equity Lab and SPARC – the Scholarly Publishing and Academic Resources Coalition.
In our last two episodes we have learnt a lot about the data extraction and user surveillance practices of corporate publishers and data analytics companies, who use artificial intelligence and machine learning technologies in ways that exacerbate inequities in academia.
It’s clear that over the past 20 years, the changes that the academic publishing market has undergone have led us to a juncture where power is concentrated in the hands of a handful of big companies.
To help us understand how this came to be and its implications, we are joined today by Claudio Aspesi, a leading market analyst for the academic publishing market, who has been tracking and analyzing these changes for many years.
Claudio is a consultant at SPARC, and has authored several reports about the market power and consolidation of the largest commercial players in this space. These reports give valuable insights to libraries and academic institutions about the workings of these companies.
Claudio: I’m Claudio Aspesi. I am a consultant to SPARC, and I’ve been working with SPARC for the past four years on a number of issues related to the introduction of data analytics and artificial intelligence into the academic community. I’m currently based in Zurich in Switzerland, after living in several countries throughout my entire adult life.
I first started to look into scholarly and academic publishing about 12 or 13 years ago. I was a financial analyst on the sell side at the time, covering European media for Bernstein, and some of the stocks I covered – Reed Elsevier, as it was then called, Wolters Kluwer, Thomson Reuters, Pearson – were quite active in different areas of this landscape.
And that’s how I started to become familiar with the interaction between the academic community at large and some of its corporate vendors. If you look at scholarly communications, the leaders are the same leaders that existed 20 years ago – there’s been one notable element of consolidation when the Springer Nature group was formed from the merger of Springer and Nature Publishing Group, but otherwise the two other important companies are the STM business of RELX, which is commonly called Elsevier, and Wiley, which is also a company that has been present for a long time.
All of these companies have been growing in the past 20 years, but they’ve grown in somewhat different directions. I mentioned Springer merged with Nature Publishing Group. Wiley has been more active in acquiring technology companies that strengthen its offering. And we are seeing Elsevier acquire a number of businesses that have put it squarely in an apparent position of leadership in the provision of data analytics services related to research – to academic institutions, governments, funding bodies, et cetera.
But by and large, the next tier of companies hasn’t changed much. It’s the relative shares that are likely to have moved. There is no hard data that is shared across the industry and agreed upon, in the way consumer media companies tend to share their data and allow market shares to be assessed. So we don’t have certainty about this, but there’s a clear sense that these companies have been able, overall, to grow faster on average than the next tiers of publishers, and therefore they have gained share.
Safa: To better understand the consolidation of power of these publishers, it’s necessary to trace the close relationship between the traditional publishing business and the research analytics business.
Claudio: The overlap between data analytics and research and the traditional publishing business is very visible. A lot of the raw data that is used to provide analytic services to academic institutions and funding bodies does come from the publishing activity itself.
And so the citation count, the identification of authors, the count of articles published, and the impact factor of the journals where they publish – this is all data that emerges from the traditional publishing activity.
So a lot of this is simply an extension of what publishers already did in the past.
This obviously is being compounded with additional data that is being collected through the work of these companies with their customers. So when they get a sense of the spend for labs and research projects, when they track grants and the beneficiaries of the grants, they can then match that information with the publishing information and start to count the relative success, the productivity, and the return on the spend that different activities have accomplished.
Of course, all of this is still subject to a lot of interpretation. And that is, in fact, one of the significant issues – trying to measure the importance, efficacy and effectiveness of research on the basis of this data carries with it all kinds of problems.
We know that there’s a corpus of academic research that has shown, for example, the issues related to the impact factor and the gaming of metrics that the impact factor has led to over time. Obviously, to the extent that there are problems with some of this data and some of these metrics – either in how they’re counted or how they’re generated – that reverberates into the analytics themselves.
So there is a massive amount of decisions that are made with the support and or with the recommendations coming from some of these tools.
They’re used to mine and collect data, and to provide feedback to the administration or the funding bodies. Just to be clear, these products are not sold and acquired just for the fun of it, and not even just to get a general sense of what’s happening. They are used to inform a number of decisions.
They are used to screen faculty candidates, they’re used in the reward, promotion and tenure processes, they’re used to allocate funding of grants. They are used to think about possible joint activities between research teams, across different universities.
Safa: One example of a company that has both academic publishing and research analytics activities is RELX – which is the parent company of the largest academic publisher Elsevier. RELX has a large and growing “risk” business that collects vast amounts of information on individuals and sells access to that information to governments and other clients. The academic community should carefully consider the implications of the largest publisher also being a data broker and leading provider of research analytics and “risk” products.
Claudio: RELX acquired a risk business in 2008, and it started building on it. It started as a business that was mostly focused on insurance, collecting data on driver and vehicle behavior and performance in the US. From that beginning the company expanded into a significant number of other areas.
It now collects vast amounts of data on everybody, fundamentally, particularly in North America.
And it sells that data back to corporations, financial institutions and to the government. And that data is used in a large variety of settings. There is a question: whether the data that’s collected by the academic business is then being used also in the context of risk.
The company claims that it doesn’t transfer that data over. But people who have looked into this have published results showing vast amounts of collection of data related to research activity and the online activity of individual researchers. There was an article that was published not long ago on this. It also uses tools that are shared with the risk analytics business. And so the question of how far this goes is not transparent.
The company has tried to tell everybody for a number of years now that no such transfer happens, but we only have to take their word for it. We have no independent way to verify that information. And in fact, I will argue that the only way for the company to reassure the academic community that that data is not being transferred and used in other contexts, would be to stop collecting it altogether – or to separate the academic business from the risk analytics business.
And so the academic community may have not yet realized that research analytics is very different from scholarly publishing. In scholarly publishing every title, every journal is fundamentally a business of its own – in a way, it’s a monopoly of its own. If you want to read a specific article that was published in Nature or Science or Cell, you have to access that article. You have no alternative. That means that most academic libraries have to at least consider subscribing to the output of all the leading publishers, because that’s the way for them to support their patrons and their users.
On the other hand, data analytics is a business where the products of one company are not different from the products of a different company. And so the handful of companies that operate are fundamentally competing directly with each other for market share. It doesn’t make a whole lot of sense to spend a lot of money replicating the numbers you get from the research analytics business of RELX by also buying those of Clarivate, or those of Digital Science.
In fact, arguably, it may even create complications – because if you get two sets of data that are not exactly comparable, you then have to decide which one you trust better and that can become messy.
You also have a second problem that’s typical of all these businesses. Not only do most customers tend to buy only one product, but they tend to buy that product repeatedly over time. Because once they create a time series that’s analyzed with a certain algorithm, it is difficult to move to a different one. You effectively have a discontinuity, and that discontinuity creates the need to readjust all the data, understand the changes, and adapt your internal processes accordingly. And so you’re much better off – or at least it’s a natural tendency for many people and many organizations – to stick with what you already bought in the past, simply because it minimizes disruption.
Safa: As you can imagine, there are many serious implications of some of the largest publishers also being leading providers of research analytics – including implications for academic freedom and various conflicts of interest.
Claudio: Let’s start perhaps with the most visible one. I just described a vision in which only a handful of companies – and perhaps only one or two companies emerge over time as the providers of this information. That creates a huge threat to diversity in the academic community.
Let’s imagine that only one company emerges and becomes a monopoly. Then every academic institution in the world, which wishes to analyze where to invest their research spend, will start off from the same algorithm. Answers might still be different because they’ll have local conditions, relative strength and weaknesses, et cetera, but they will all start from an algorithm that fundamentally believes that you should investigate this gene and not that gene, you should investigate this approach to viruses, not that approach to viruses.
That’s not what we want. We don’t need the academic community and the research community to have their agenda set by a handful of companies that may end up influencing vast amounts of spend, vast amounts of time and investment, to pursue the same objectives simply because the underlying advice and recommendations come from the same source.
There is another significant issue that has not yet come to the forefront, but I believe will become crucial going forward. And that is the fact that if you are both operating a research analytics business and a publishing business, you are effectively operating in a massive conflict of interest. And the conflict of interest is really damaging the researchers.
If you’re a researcher and you have to think about where you want to submit your paper, and you happen to know that publisher X also operates the faculty management system used by your university – the one that will be used in evaluating your tenure – you may decide, at the margin, that you’re better off submitting your paper to that publisher and not to another. You may in fact think that another publication would be more suitable, but you don’t want to cross them, and you don’t want to take a chance with the research analytics, particularly if they’re not transparent.
And so there is value in maintaining as clear and tight a separation between publishing and research analytics as possible. That is clearly not understood by some of the publishers that offer both activities, and it doesn’t seem to have yet been fully metabolized by the academic community.
Personally, I think the academic community should simply demand that their academic institutions not buy research analytics products from companies that also publish research.
I think there is hope. I think there is a possibility for the academic community to reverse the situation – but it needs to set its mind to it.
Let’s think about what are the strengths and the weaknesses of the commercial enterprises. And then let’s talk about how the academic community can exploit the weaknesses and build on its own strengths.
So as we pointed out earlier, concentration in a handful of companies within the publishing community means that there are vast resources – financial and managerial – to pursue the growth of research analytics. And so there is no question that in the near term, all of these companies can mobilize vast amounts of data and, first of all, vast amounts of money – to acquire data, to acquire smaller companies, as they’re doing, to introduce these businesses, sell them aggressively, and market them as appropriate.
And all of that leads to the possibility that this business will simply evolve as a very narrow oligopoly or quasi monopoly, controlled by one or two players at the most.
On the other hand, there are also weaknesses, right? The corporate world needs to see adequate returns from these investments. And ultimately it needs to see that shareholders reward this strategy. So far, the publishers have managed to convince the investment community that operating in the research environment the way they operate is a positive for society. But that’s only because they’ve been narrating their side of the story. The academic community has not yet put forward the other side of the story. It hasn’t discussed the implications of a monopoly of algorithms. It hasn’t discussed the implications of a conflict of interest. And so as those voices are raised going forward, it’s entirely possible that the investment community will start to ask questions.
Why is the profit motive driving so much of what should really be a societal activity – one that has vast implications for the lives of all of us, as we have seen over the past couple of years? And there is no denying that there will be even more implications going forward, as we think about challenges from loss of biodiversity to climate change to the continued threat of new diseases. We need a robust system that is not dependent on the economic interest of one or two players.
And so it’s entirely possible that the narrative in the financial community over the next few years will start to shift as it becomes more apparent what some of the implications of the current trajectory are. And that’s what I really hope will happen.
Safa: While this may be the current state of the academic publishing market, there are important strategies that the academic community can use to push back on this concerning degree of consolidation and create viable alternative systems.
Claudio: The academic community starts with a weakness. It starts disjointed, operating bottom up. It needs to create coalitions of the willing, so to speak – to use a term that was in vogue 15 or 20 years ago in a very different context. But it also has a capacity to innovate. It has the passion of a lot of individuals I’ve met over the years who really want to change the environment of the research community for the better.
There are lots of people in every country who are starting small companies or small initiatives with the goal of changing elements of this landscape and building alternatives to the commercial offerings. There are advocates – SPARC is one, but there are others – that are fighting every day with great passion, great determination, and great skill to highlight these issues in front of society at large.
And so I don’t take for granted that the monolithic approach of the commercial entities will prevail over time.
And so I think that everyone can do their bit. I don’t think that it’s fair or productive to simply sit back and complain or think that the world is over because we see the current situation in the marketplace and believe that the game is over. Every member of the academic community can go to their administration and say that they do not want to be subject to the conflict of interest inherent in a publisher operating faculty management systems and research analytic systems. Every person who works in a funding body can similarly challenge the use of research analytics as a way to think about the allocation of funding and grants to researchers.
There are ways to talk about how the publishing model operates today that are also very meaningful. If you think about the alternative that the commercial publishers have tried to convince the academic community to live with, it’s fundamentally a choice between excluding lots of people from reading or excluding lots of people from publishing.
The system is very expensive. It has high barriers to access in one way or the other, either through very high subscription prices to journals or through very high fees for publishing articles. And fundamentally the system is designed to exclude people.
The system works because it excludes people. It’s not a casual feature. It’s at the core of how these systems operate.
And so there are alternatives to this. There are other publishing models that are much more suitable for the academic community because they’re fair, because they are not exclusionary by design. And because they do not reward simply wealthy academic institutions or leaders in wealthy academic institutions, as the current models do. And so there is an opportunity to also support many of these new initiatives that are designed to change how the research itself is being made available.
We should always remember one thing that we often forget. It took close to 200 years to go from Gutenberg’s invention of movable type to the first scholarly journal – so the assumption today that we have an immutable model based on digital distribution, and that the only way to distribute research is the version of record of established academic journals, is a very myopic view based on the short span of our own lifetimes. It’s by far not the end of the story.
There is a lot of experimentation and the real important question is not whether there will be more experimentation going forward. There will be – the real question is what should be the principles and the values that drive the choice of the new models that will emerge over time in such a way that those models are more equitable than the ones that exist today. So, I will not for a moment stop at: we have journals, we have publishers, we have research analytics, as we see them today, we’re done. In 200 years this conversation will sound silly, because somebody will invent better models.
The question is how can we make sure that new models will be both more functional to the research community and more equitable?
And so – we know that there are biases. There are biases everywhere. There are biases in favor of English language journals. There are biases in favor of being able to subscribe to expensive journals. There are biases towards being able to pay expensive APCs. There are journals that charge an APC equivalent to the entire subscription budget of some universities – sometimes larger than the subscription budgets of universities in the global south.
We cannot continue to believe that the only solution to this problem is to hand out some waivers here and there in order to equalize the situation. Waivers do not work. They do not work properly – even publishers will admit that, and they’ve written so. But most importantly, they create a situation of dependency. They automatically create authors in the top tier and authors waiting for handouts. That is not what this should be. Science should be about researching, and it should be about publishing the result of your research in the venue that’s most suitable to the quality of your research, regardless of what you researched and regardless of where you researched it.
The current model is designed to support a handful of fashionable topics being researched at leading universities in the global north. That is not conducive to long term stability for society at large.
Safa: Some would argue that the decisions of the largest companies in the research space are driven by their perceptions of what the market will view favorably. Therefore, there might be ways to better align the market with values of knowledge equity – or put another way, there might be ways to create a real cost for companies that undermine knowledge equity.
Claudio: This is up to the entire academic community. We literally found, in the blog of a leading publisher, the words of an editor who defined their job as getting more citations – that’s the definition. If you define your job as getting more citations, because that drives a higher impact factor and that drives higher subscription rates, that’s your choice. It’s aligned with the values of your company, and it’s understandable. But that doesn’t mean that those journals are going to publish research that’s important. That doesn’t mean that they’re going to look for research that’s innovative and coming from sources that are not conventional, recognized sources of leading research.
And we’ve seen plenty of examples. We know that there are entire areas of medical research that are abandoned or quasi abandoned, simply because they’re not fashionable.
We almost did not have mRNA vaccines because messenger RNA was not a fashionable area of research. You don’t need to go far – we have a perfect example today of what could have gone wrong if it weren’t for the stubbornness, the passion and the dedication of a really small group of individuals who went against the common wisdom in their field. Journals did not help them get their research out – journals went along with conventional wisdom.
Now, if I were a publisher, I would make a different argument. I would say, OK, our editorial boards are populated by academics. Our biases are their biases. We are publishing what they tell us we should publish – don’t blame us, blame the academic community. Perhaps. But participating in a poor set of decisions does not absolve you – the fact that others share those poor decisions doesn’t make you any more excusable.
Plus, you’re charging a lot of money – you should actually be providing the best quality publishing: not just what the academic community is telling you, but also what your editorial leaders – the ones in a position to take risks and to think about what is really important in the research agenda of your field – should be able to do.
If they don’t do so, perhaps you have a structural problem. Perhaps you’re very invested in what I tend to call the academic industrial system. And this academic industrial system tends to perpetuate some strengths but also some weaknesses. And it’s perhaps time to rethink it.
Safa: Another challenging implication of the increased use of artificial intelligence and machine learning technologies and algorithms in the academic publishing and research analytics space is something we touched on in the first episode of Season 3 – the fact that these algorithms are built on data sets that are themselves inherently biased and limited. While this is true, the commercial interests of publishers in this space can motivate them to obfuscate these built-in biases.
Claudio: I tend to worry a great deal about any service that tries to use machines and algorithms in place of human judgment. Let’s be clear. Humans are biased. Humans make errors. Humans are contradictory. All of those things are true. But in human interactions, at least we know, or we have an opportunity over time to develop a point of view on the biases that each of us carries.
My friends and colleagues know exactly where I come from. They’ve learned which things I do well and where I get things wrong – and they know how to handicap my work and read it with particular care, because they know that’s where my own biases are most likely to seep through.
If you get a result out of a machine, the odds are very high that you will be unable to spot those biases and those weaknesses, and therefore that you will make decisions that are inherently unfair.
And so my concern is not so much that there are algorithms being deployed. I am concerned about algorithms being deployed – but I’m more concerned about how they are used after they’re deployed. And I think the real risk we are running at the moment is that the commercial interest of publishers will lead them to downplay the impact these algorithms have, in the name of facilitating management decisions. And they can find a very receptive audience in heads of departments, senior administrators of academic institutions, and senior administrators of funding bodies, because they’re fundamentally being told: here is a solution to all your problems.
We all feel stressed. We all feel that we should do better. And if someone comes to me and says, oh, here I have a great product for you, this will make you look great, this will solve all your problems – I’m more likely than not to want to try it, use it, buy it again, and try to get a leg up over my next door rival.
So there is a place for data. There is a place for data analytics – and to be clear, a world without data is not a world without bias. But at the same time, it is vital that academic institutions maintain control over what they do with the data, and scrutinize the data in a way that doesn’t allow it to hijack their agenda.
We’ve already seen it, right? Think about the impact factor. Think about university rankings – those are algorithms, fundamentally – and none of those algorithms were designed to do what they’re being used for today. And yet they’re being used, and they’re affecting the life of the academic community. If you ask most people within the academic community, including senior administrators, they will tell you that they don’t really like using that data – because, in many cases at least, they understand its weaknesses, or because they think it’s unfair. But they’re unable to get out of the mechanism that pushes them to use it, because they have no alternative today.
So what I want to see, ideally, is a lot of competition: lots of different ways to collect the data, lots of different ways to analyze the data, and very clear principles for how the data is made transparent and addressable – so that people have recourse if the data is wrong and can correct it, and people have a right to opt out of being evaluated through those algorithms. If you have the principles in place, and you have strong protection for individuals and for academic institutions, then it’s fine.
Do I expect commercial publishers to go down that route? Not really, unless they’re pressed by the academic community themselves or unless the academic community comes up with its own set of products and services that are designed to be transparent. That are designed to be addressable. That are designed to be customized to the needs of individual institutions. And that are designed to provide information, not to pick winners and losers in a way that’s not transparent to the users.
But at the same time, it’s too early to tell. The pressure is huge. The pressure is huge because there are societal pressures, there are pressures from donors, pressures from the states for publicly funded universities, and there are pressures from the boards of the universities, which want to see clear indications of rising success, in whatever way they choose to measure it.
And so it’s very difficult, if you’re a senior administrator, to say: no, I refuse to get into the rat race. I refuse to use these tools because I think they’re poisonous. And so I think that what’s needed is a couple of things. One is a set of alternatives that are more friendly to the values of the academic institutions. And there is a need for a set of academic leaders to use the influence they have over their peers for the good of the community, by starting to say: look, we’re choosing to use these tools only to this extent, only under these conditions, only adhering to these principles, and never for certain uses. If they take this very strong public position, then their peers will pay attention.
Let me touch on a couple of issues that I’ve run into. By no means do I think I’ve been able to think through every possible dystopian scenario, but I’ll give you a couple that really worry and concern me.
One is the risk that the same model of university rankings spreads to funding institutions, funding bodies, and governments. Because that will create the pressure to go even in a more focused and decisive way towards a handful of what’s perceived as high impact, high return, often short term oriented, funding of research – that is not what we want or need quite frankly.
And so that’s one area of concern.
The other area of concern I have is simply that the choices of where to publish articles will be automated. Why do you need editorial boards when you can have artificial intelligence decide the likelihood that this article will have a lot of citations and that article will not?
That’s a way to perpetuate popular research themes, popular authors, and popular or successful institutions that have the funding to drive a lot of research into those areas. And so I really worry that we could get into a loop that exacerbates the problems we already have. We may have an academic industrial complex today, but we still have a lot of very well meaning individuals who do step up and say: no, sorry, I’m not going to publish this; or, I do want to move this journal in this direction, because I think it’s important, or ethically relevant, or because I believe this is the right approach scientifically going forward.
If you leave that to machines, the risk is very high that we will have a significant bias towards what has been successful in the past – a bias that feeds on itself, to the exclusion of innovative and revolutionary approaches.
And so I recognize that not every journal can publish, in every issue and every article, something that revolutionizes science. But if there is one sure way to diminish the chances that revolutions in science will occur, it is to make sure they don’t get published because those approaches, those authors, those ideas were not espoused in the past by successful authors and successful articles.
Safa: Despite the current state of affairs and the trends that Claudio has been tracking over the past years, he continues to believe in the ability of the academic community – the students, researchers, academics, scholars, and staff – to challenge this system and come up with a better alternative.
Claudio: I have great faith in the academic community. So many of the individuals I have met over the years really want to do the right thing. Yes, they are harassed by the impact factor and they worry about where they’ll get their next article published. Yes, if they’re senior administrators, they do wait for the rankings to come out and then get badgered by their president because some departments haven’t risen enough. But they mean well. They want to do things the right way, and they do what they do because they love what they do and they care about what they do.
And so, I really hope that going forward, there will be a set of people who will simply step up to lead, in a very forceful way, the next revolution, the next change.
We have had very courageous people step up with some of the open access declarations 20 or 30 years ago, when that went against the grain and against everything people believed. We need the next step, which is for people in the senior ranks of academic administrations and the senior ranks of funding bodies to step up and speak as loudly and as courageously as those academics did 30 years ago.
Safa: Thank you so much for tuning in.
If you are provoked by what you heard today, we invite you to join us at the Knowledge Equity Lab. Together we can fundamentally reimagine knowledge systems and build healthier relationships and communities of care that promote and enact equity at multiple levels.
Please visit our website, sign up for our mailing list, follow us on social media and send us a message to get involved!
SPARC Landscape Analysis: The Changing Academic Publishing Industry – Implications for Academic Institutions