JSTOR, the electronic archive of academic journal articles, has been in the news this week. A programmer charged with massive theft turns out to be a 24-year-old Harvard researcher named Aaron Swartz, who used a script to download 4.8 million articles from JSTOR to hard disk. His identity was known, and JSTOR involved the police:
Swartz was charged with computer intrusion, fraud, and data theft. If convicted, he faces a maximum of 35 years in prison, restitution and forfeiture, and a fine of $1 million. A PDF of the indictment is here. …
Members of Demand Progress, a nonprofit political action group Swartz founded, criticized the indictment.
“This makes no sense,” the group’s executive director, David Segal, said in a statement. “It’s like trying to put someone in jail for allegedly checking too many books out of the library.”
Today a new twist: 19,000 articles have been leaked to protest the ‘war on knowledge’.
A critic of academic publishers has uploaded 19,000 scientific papers to the internet to protest the prosecution of a prominent programmer and activist accused of hacking into a college computer system and downloading almost 5 million scholarly documents from an archive service.
The 18,592 documents made available Wednesday through Bittorrent were pulled from the Philosophical Transactions of the Royal Society, a prestigious scientific journal that was founded in the 1600s, the protester said. Even though the vast majority of the documents are hundreds of years old, the London-based Royal Society charges from $8 to $19 for each one, and restricts viewing to one person on one computer for only a single month.
“If I can remove even one dollar of ill-gained income from a poisonous industry which acts to suppress scientific and historic understanding, then whatever personal cost I suffer will be justified – it will be one less dollar spent in the war against knowledge,” Gregory Maxwell, self-described hobbyist scientist from Northern Virginia, wrote in a manifesto accompanying the upload. “One less dollar spent lobbying for laws that make downloading too many scientific papers a crime.”
Academics and copyright critics immediately criticized the charges as excessive, likening them to trying to put someone in jail for checking out too many library books. They argue that many of the documents in JSTOR’s collection are probably kept behind its paywall against the authors’ will and that there are no valid copyright claims restricting their distribution.
Indeed, court documents charging Swartz contain no claims of copyright violations. Instead, they cite Swartz for intrusion of MIT’s computer network and for impairing JSTOR’s systems by using an automated script that systematically scraped its archive.
In an email to The Reg, Maxwell said he decided against uploading the documents anonymously to prevent anyone from falsely claiming Swartz was behind the move. All of the documents were published prior to 1923 to ensure they are all in the public domain.
The case is an extremely interesting one from many points of view. The charges are frivolous, since the details of how he accessed the data are, frankly, not the point at issue. These, clearly, are the best charges that the lawyers could find.
It is interesting — and probably telling — that JSTOR don’t want to put their claim of copyright to the court. I suspect their lawyers have advised them that there is nothing to gain, that at present almost everyone is respecting their exaggerated but untested claims, and that the only possible consequence of a judge looking over the matter will be to create case law which — since they currently get everything they want — would most likely restrict them in some way.
Maxwell has done precisely the right thing here, in my opinion, and I hope others will follow him. Let us all, by all means, protest legally in this way. The Royal Society’s greed (futile greed, for who would ever pay such a sum?) is indeed utterly poisonous. Nor is the Royal Society alone. A lot of British tax-funded institutions treat the web as a mechanism to extort money, rather than a means to contribute to society.
At the same time, we need to recognise that JSTOR do have a problem here. They are not altogether the bad guys. The problem, succinctly, is bad law. JSTOR are uploading material created, in the main, by scholars paid by the taxpayer. But JSTOR can’t pay their bills unless they charge. They can’t charge unless they restrict access to institutions. One infuriating aspect: while charging you and me to use it (we have, of course, already paid for it once in taxes), they give free access to the inhabitants of third-world despotisms.
The answer, surely, is for the government to take over JSTOR and fund it from taxes. It makes no sense for us to pay scholars to create material, with all the facilities involved, and then pay again to access it via a different mechanism, which restricts access to a few. Treat it as what it is — a library funded by the public — and remove all the layers of public money going here and there. It will undoubtedly be cheaper, involve less administration, and benefit the world.
Some might say that academic publishers only allow material on JSTOR because it is subscription-based, and they get a cut of the cash. This is probably true. But this in turn points up how academic publishing is no longer the benefactor of the world that it was in the days of print. When the only technology for articles was paper journals, these presses performed a service. But now? Technology has rendered that distribution mechanism obsolete, and the funding structure that once supported it harmlessly is now a barrier to access. This too, I think, will change.
The outcome of the case must be of great interest to all of us. I do hope that the issues are confronted squarely.
UPDATE: There is a thoughtful article at the New Yorker here. This adds the important detail that JSTOR says that, after calling the cops, it “considered its dealings with Swartz complete” once Swartz had deleted his copies of the download.