Open Access and Linked Data
I traveled to the Midwest late last month and made a few stops, including PyCon and a brief visit with my parents. In between those two bookends I spoke at the University of Michigan’s Open Access Week and had a few meetings with various parties. My topic was pretty broad (CC and Open Access), but I was [personally] pleased with how the talk came together. I’d like to re-create it as a slidecast; maybe sometime soon.
In putting together the content I realized that while I had this gut-level, assumed knowledge about what Open Access is, I hadn’t ever read a definition or really delved into it. When I read the Budapest Open Access Initiative, one part stood out to me:
By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.
Well, of course it stood out to me; it’s a core descriptive sentence. But in particular, “availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, [or] pass them as data to software.”
Interestingly, this sentence ties right into the other meetings I was having that week, which all seemed to come back to linked data (in particular RDFa). If you think about it, this sentence has implications that make OA materials perfect for linked data integration. It implies:
- you have a stable, unique URL for the work
- there isn’t a paywall or login requirement in front of the actual work
- there isn’t any user agent discrimination — text in a Flash viewer need not apply (I’m looking at you, Scribd)
- the work is in a format that’s useful as data; maybe [X]HTML?
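To make the list above a little more concrete, here’s a minimal sketch of what “pass them as data to software” could look like once a page carries RDFa. Everything in it is an assumption for illustration: the example.org URL, the sample markup, and the `RDFaSketchParser` and `extract_triples` names are made up, and it only understands a tiny subset of RDFa (`about`, `property`, and `rel` with `href`), with no prefix resolution and no nested elements inside a `property`. Real code would use a proper RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical sample page: an open access article marked up with RDFa.
SAMPLE_PAGE = """
<div about="http://example.org/articles/42">
  <h1 property="dc:title">An Example Open Access Article</h1>
  <span property="dc:creator">A. Author</span>
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>
</div>
"""


class RDFaSketchParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None   # most recent @about value seen
        self._pending = None   # @property waiting for its text content
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        # rel + href: the object of the triple is another URL (a link).
        if "rel" in attrs and "href" in attrs:
            self.triples.append((self._subject, attrs["rel"], attrs["href"]))
        # property: the object is the element's text, gathered until it closes.
        if "property" in attrs:
            self._pending = attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the pending property.
        if self._pending is not None:
            value = "".join(self._text).strip()
            self.triples.append((self._subject, self._pending, value))
            self._pending = None


def extract_triples(html):
    """Parse an HTML string and return the triples found in it."""
    parser = RDFaSketchParser()
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_PAGE):
        print(triple)
```

Note that this only works because of the properties in the list: the work has a URL to hang statements on, nothing stands between the software and the markup, and the markup is plain [X]HTML rather than text locked in a plugin.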
So we have a growing corpus of information that’s ripe for markup with structured data. We’re doing a lot with embedded, structured [,linked] data right now at CC (things we need to do a better job talking about). I find it reassuring that the principles other efforts value mesh so well with what we’re doing.