Open Source, Open Culture

Nathan R. Yergler
Software Engineer,
Creative Commons

Agenda

  • What is Creative Commons?
  • Metadata and CC
  • Off the desktop, onto the net

Welcome, and thanks for having me here this evening. My name is Nathan Yergler. I'm a software engineer for Creative Commons, and tonight I'd like to talk to you about three areas: what CC is and why it's a necessary institution, how we're using metadata and open source software to advance CC, and how you can share your creative works on the internet under a CC license. Along the way I hope to share some of the history of copyright and how it has changed, and to help you understand why you might want to license your works under a CC license.

I should point out before getting started, however, that I'm not a lawyer. I'm a geek, a hacker, a software engineer. Other staff members are far more qualified to talk on certain topics, so if someone has a question I can't answer, or don't feel like I can answer authoritatively, we'll make a note of it, and I'll post the answer on my blog after checking with people with bigger brains.

What is Creative Commons?

"Creative Commons is a nonprofit organization that offers flexible copyright licenses for creative works."

"some rights reserved"

What Do We Do?

  • Build within the current copyright system
  • Provide licenses for creative works
  • Develop technology for tagging those works with metadata
  • Work to lower the transaction costs for using copyright-protected material

Our Licenses

  • Build on the current legal system
Attribution: images/by.gif
Reuse: images/nd.gif, images/sa.gif
Commercial Use: images/nc.gif

Behind the Scenes

In addition to our own licenses, we also provide deeds and machine readable information to supplement the GNU GPL and LGPL.

License Deed

images/by-nc-sa.png

The human readable deed for a license displays the freedoms, as well as the conditions placed on use of the licensed work. Embedded in the HTML of the deed, as well as the HTML snippet you get for embedding in your web page, is a chunk of metadata which describes the license terms and conditions in machine readable form.

Meta-what?

metadata(n): data about data; "a library catalog is metadata because it describes publications"

Why do we care?

Among other things, RDF helps different programs talk to each other. Imagine a world where everything had embedded RDF: when buying a plane ticket, for example, you could drag your flight itinerary onto your calendar program to add it to your calendar. You could drag a friend's top-ten songs list onto your music player, and it could try to obtain the songs for you automatically.

RDF can also be used to create more powerful search engines. Right now the only type of question you can ask a search engine is "What pages have these words in them?" When pages include RDF metadata, you will be able to ask more advanced questions like "What's the current temperature in California?" Programs can also use this information, like an alarm clock program that also displayed the current weather or a collage-making program that only used photos with permission.

Finally, metadata can be aggregated across the whole Web. A program could download all the top-ten song lists and, with the help of a pricing guide in RDF, calculate the cost of buying the most popular albums.

Metadata holds a lot of promise, but it won't be useful until people start adding it to their pages. Creative Commons hopes to help promote metadata by making it very easy for people to add metadata to their pages.

RDF

  • RDF is a framework for metadata
  • Defines statements that have:
    • a subject ("Nathan")
    • a predicate ("lives in")
    • an object ("Fort Wayne")

So in this example we're talking about a statement which makes an assertion about where I live. "Fort Wayne" is the actual value of the metadata. The subject, "Nathan", describes what this metadata applies to. The predicate describes the relationship between the two.
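
To make the shape of a statement concrete, here is a minimal Python sketch. The `Statement` type and the `describe` helper are names made up for illustration; they are not part of RDF or of any Creative Commons library.

```python
from collections import namedtuple

# A statement (triple) has a subject, a predicate, and an object.
# "Statement" is a hypothetical name used only for this illustration.
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

stmt = Statement(subject="Nathan", predicate="lives in", object="Fort Wayne")


def describe(statement):
    """Render a statement as a plain-English sentence."""
    return "{0} {1} {2}.".format(
        statement.subject, statement.predicate, statement.object
    )


print(describe(stmt))
```

In real RDF the subject and predicate would be URIs rather than bare strings, which is what the next slide shows.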

Expressing Metadata

RDF in XML

<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://creativecommons.org/">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
    <permits rdf:resource="http://web.resource.org/cc/Distribution"/>
    <requires rdf:resource="http://web.resource.org/cc/Notice"/>
    <requires rdf:resource="http://web.resource.org/cc/Attribution"/>
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks"/>
  </License>
</rdf:RDF>

As you can see from this example, every subject, predicate and object has a fully qualified URI which describes that item. These URIs allow applications to match information about a single item from two sources in order to extend their "knowledge" about an object.
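
As a rough sketch of how an application might read those URIs back out, the following uses only Python's standard-library XML parser on a trimmed copy of the block above. It assumes the flat layout shown here (described resources at the top level, each property carrying an `rdf:resource`), so it is not a general RDF parser; the `triples` function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the RDF block from the slide above.
RDF_BLOCK = """<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://creativecommons.org/">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
    <requires rdf:resource="http://web.resource.org/cc/Attribution"/>
  </License>
</rdf:RDF>"""


def triples(rdf_xml):
    """Yield (subject, predicate, object) URI triples from a flat RDF/XML block."""
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        for prop in node:
            # ElementTree reports tags as "{namespace}local"; joining the two
            # parts gives the predicate's fully qualified URI.
            predicate = prop.tag[1:].replace("}", "")
            yield (subject, predicate, prop.get("{%s}resource" % RDF_NS))


for triple in triples(RDF_BLOCK):
    print(triple)
```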

For example, the RDF above describes licensing information for the root page of the Creative Commons website. A separate RDF block could conceivably describe authorship information or subject information about the same resource. An application which knew how to read this information could very easily aggregate this information into a single display.
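
A minimal sketch of that kind of aggregation, assuming each source has already been reduced to (subject, predicate, object) tuples. The authorship value and the `aggregate` helper are hypothetical, invented for this example; only the namespaces come from the RDF block above.

```python
from collections import defaultdict

CC = "http://web.resource.org/cc/"
DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://creativecommons.org/"

# Statements from two independent sources about the same resource.
license_source = [
    (PAGE, CC + "license", "http://creativecommons.org/licenses/by/2.5/"),
]
author_source = [
    (PAGE, DC + "creator", "Creative Commons"),  # hypothetical value
]


def aggregate(*sources):
    """Merge statements from several sources, keyed by subject URI."""
    merged = defaultdict(dict)
    for source in sources:
        for subject, predicate, obj in source:
            merged[subject].setdefault(predicate, []).append(obj)
    return dict(merged)


combined = aggregate(license_source, author_source)
for predicate, values in sorted(combined[PAGE].items()):
    print(predicate, values)
```

Because both sources use the same URI for the page, the merge needs no guessing about whether they describe the same thing.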

Embedding RDF & Metadata

Metadata, in any format, does us no good unless it is available and "discoverable". Lots of file formats have specifications for embedding some set of metadata; for example, EXIF defines a specification for embedding photo information in a JPEG file. Since RDF is a format- and application-independent way of describing metadata, it is desirable to embed it in, or link it to, our digital files.

We'll look at HTML files first, for which there are several competing formats, including our own braindead way.

Linking to the metadata in a <link> tag is attractive because it validates and doesn't break existing client implementations. However, it requires users to maintain a separate file containing the metadata, and it isn't supported by all readers.

Including the metadata in the head or body portion of the document has the advantage that it's a very simple approach, and should pass seamlessly through parsers. But it often doesn't, and causes document validation to fail.

Inclusion as element attributes using a rel attribute is incredibly simple, and still validates. It also isn't technically RDF, and is limited to making statements about the current page. This is often what you want to do, but isn't as useful as a general-case solution.
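
Here is a small sketch of how a reader might pick up such an attribute, using Python 3's standard-library HTML parser. It assumes the attribute value is `license` and that it appears on `<a>` or `<link>` elements; the class name and sample page are made up for illustration.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of elements marked rel="license" (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rels = (attrs.get("rel") or "").split()
        if tag in ("a", "link") and "license" in rels and attrs.get("href"):
            self.licenses.append(attrs["href"])


# Hypothetical page fragment used only for this example.
PAGE = (
    '<p>Licensed under <a rel="license" '
    'href="http://creativecommons.org/licenses/by/2.5/">CC BY 2.5</a>.</p>'
)

parser = LicenseLinkParser()
parser.feed(PAGE)
print(parser.licenses)
```

Note that the only subject such a statement can have is the page itself, which is the limitation mentioned above.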

Which leads us to our own braindead solution: placing the RDF in an HTML comment. This approach is simple, doesn't break anything, and can be included in any HTML. It has the additional advantage that, since the information is in the HTML document itself, it gets indexed by search engines. Of course, it also makes purists somewhat nauseous (as it should), and isn't supported by all readers.
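
To give a feel for what a reader has to do with the comment approach, here is a minimal sketch that pulls RDF blocks out of HTML comments with a regular expression. The sample page and the `extract_rdf` helper are made up for illustration, and a real tool would likely want something sturdier than a regex.

```python
import re

# Match an HTML comment whose content is a complete <rdf:RDF> element.
RDF_IN_COMMENT = re.compile(
    r"<!--\s*(<rdf:RDF.*?</rdf:RDF>)\s*-->", re.DOTALL
)


def extract_rdf(html):
    """Return every RDF block found inside an HTML comment."""
    return [match.group(1) for match in RDF_IN_COMMENT.finditer(html)]


# Hypothetical page used only for this example.
PAGE = """<html><body>
<p>Some licensed page.</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""

blocks = extract_rdf(PAGE)
print(len(blocks))
```

Each extracted block can then be handed to whatever RDF parser the application already uses.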

Other Formats

We have defined a standard linkage string for file formats that don't directly support embedding, such as MP3 and OGG files. The example statement above contains several pieces of information: the copyright year, the copyright holder, the license URL and the verification URL. The verification URL is optional, but provides a way of verifying that the file has truly been licensed.

The file at the verification URL contains the license information for the file, and makes assertions about a file with a specific SHA-1 fingerprint.
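
The fingerprint check itself can be sketched in a few lines of Python. This assumes the fingerprint is compared as a hexadecimal digest; the encoding actually used in a verification page may differ, and both function names are made up for illustration.

```python
import hashlib


def sha1_fingerprint(path, chunk_size=65536):
    """Return the SHA-1 digest of a file as a lowercase hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large audio files don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_claim(path, claimed_fingerprint):
    """Check a local file against the fingerprint asserted at the verification URL.

    Assumes the claim is a hex digest; that encoding is an assumption here.
    """
    return sha1_fingerprint(path) == claimed_fingerprint.lower()
```

A player or file-sharing client could call `matches_claim` with the fingerprint it found at the verification URL before displaying the file as licensed.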

License Wrappers

Language-specific wrappers give us a vocabulary for addressing license-specific information from code.

>>> import ccrdf
>>> cc = ccrdf.ccRdf()
>>> cc.parse(rdf_block)
>>> cc.works()
[<ccrdf.ccrdf.ccWork instance at 0xb78a50ac>]
>>> cc.works()[0].licenses()
[u'http://creativecommons.org/licenses/by/2.0/de/']
>>> cc.works()[0].licenses().permits()
(u'Reproduction', u'Distribution', u'DerivativeWorks')

Publishing

Internet Archive

http://archive.org

  • Building a "digital library of cultural artifacts"
  • Maintain several collections:
    • Audio
    • Live music
    • Movies, including the Prelinger Archive
    • Texts
    • ...anything else
  • ccPublisher provides a clean front end for licensing and uploading works

ccMixter

images/ccmixter.png

ccMixter - Remix Tracking

images/remix.png

Jamendo

images/jamendo.png

Ourmedia

images/ourmedia.png

Flickr

images/flickr.png

The photo sharing site Flickr (http://flickr.com) allows users to select a default license which is applied to all photos they upload, as well as to license individual photos. They also publish a feed which consists only of licensed photos.

ccPublisher

  • Shipped first version in 2004
  • Preparing beta of version 2 for release this month
  • Emphasis on interoperability, customization, reconfiguration

Future Plans

Thanks

Questions?