Learning from the Web for Learning on the Web

Earlier this year Steven Stapleton from the University of Nottingham emailed me and asked if I’d like to be a keynote speaker at OpenNottingham. I accepted, and was very excited to be part of the day. More recently, an opportunity unexpectedly presented itself, and I decided that after seven years, it was time to move on from Creative Commons. As a result of the timing of my departure, I was unable to travel to the UK this past week. What follows are the remarks I delivered via Skype for the event.

Update (16 April 2011): Video of my presentation via Skype is up on YouTube.

This is actually my last presentation as CTO of Creative Commons, and as I was preparing for it this week, I spent some time thinking about what questions are on my mind about open education, and where I look for answers. CC is a little different from a lot of organizations working in this space: we develop legal and technical infrastructure as much as anything, and as such we wind up with visibility into many different domains. I hope this perspective can help us think about the future of open education, and what’s next.

Let me begin by stating what I believe to be true, and what I hope you agree with going into this. First, there has been an amazing explosion in activity surrounding education and learning on the web. In less than ten years we’ve seen words and acronyms like OER, OCW, metadata, and repository enter our collective consciousness, and seen myriad exciting projects launch to support open education.

Second, there is a feeling that the web, the internet, can help us deliver educational materials to audiences that are exponentially larger, with only incremental increases in cost. This broadening of delivery puts us in a position to reach and empower people in ways that they have not been before: lifelong learners, remedial learners, and others who may be underserved by traditional models.

And third, we aren’t there yet. There are still challenging questions that we haven’t quite figured out how to answer, or that we’re just beginning to explore. For example: How do users discover open educational resources on the web? How do we determine what our impact and reach is? And what do we call success? I want to spend the next 10 to 15 minutes talking about some possible answers and things that I’ve been thinking about over the past year. What can a very selective history of the web tell us about where we are and what the future holds for online education?

The first two statements I made about learning on the web — that there has been a massive surge in interest and activity, and that there is a potential to reach vast audiences with only incremental additional cost — could very well have been made about the web itself in its early days. People were fascinated by the potential of this new technology, and rushed to stake their claim by publishing their own documents, sharing their knowledge. Now if I were you I’d be thinking, “Right, but I think we’re doing something a little more important than uploading scans of our favorite unicorn photos to GeoCities.” True enough, but the point is this: people were publishing, and they weren’t sure what came next.

As people began uploading and creating content and we saw this rise in the creative output of people on the web, there was an increasing need to capture this and organize it in an approachable form. You might have been able to draw a diagram of the early web on a sheet of A4 paper, but that rapidly became inadequate. The question of how you approach and understand this network of content became critical to answer. And the first answers were decidedly hands-on processes. Yahoo did not start as an index of text on the web: it started as a way to search a hand-curated set of resources, classified by human beings into categories and topics. DMoz, another directory of the web, took a similar approach: organizing resources into a hierarchy, bringing order where there was none. In both cases this was fundamentally a task of curation: what belongs in the list, and what does not. These were the web’s librarians, trying to provide an ontology that was flexible enough to handle the growing amount of content, and rigid enough that people could understand it.

Now there were definitely issues with this approach when applied to the early web, not the least of which was that these directories did a poor job of coping with different languages and cultures. Additionally, they didn’t leverage the fundamental relational nature of the web. Cross-referencing the list of resources categorized under different facets, such as language and subject, wasn’t an easy task.

As the web continued to grow, people began to realize that they could exploit the natural structure of the web (documents and links) to build a better index. Instead of searching the terms that a human used to label a resource, we could write software that followed links and created an index of the resources. So instead of hand-curating a list of documents, we could trust that things linking to a document probably had a similar topic, or described the topic of what they were linking to. And eventually, we could trust that “good” resources would attract more links than poor ones.

It’s interesting to note that even as this transition from searching a curated list to searching a text index was taking place, the curated list still served an important purpose. Both the Yahoo index and DMoz were useful as the seeds for initial crawls. By starting with those pages, and following the links on them to other pages, software was able to begin building a graph of content on the web. Curation was an important activity on its own, but it also enabled bigger and better innovations that weren’t obvious at the beginning.
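To make the idea of a seeded crawl concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any real search engine worked: the toy "web" is a hypothetical in-memory dictionary, the page names are invented, and counting inbound links is just the simplest possible stand-in for link-based ranking.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
# All page names are invented for illustration.
TOY_WEB = {
    "directory": ["course-a", "course-b"],  # the hand-curated seed list
    "course-a": ["textbook", "course-b"],
    "course-b": ["textbook"],
    "textbook": [],
    "unlinked-page": ["textbook"],  # nothing reachable from the seed links here
}


def crawl(seeds, links_from):
    """Breadth-first crawl from the seed pages.

    Returns a dict mapping every discovered page to the number of
    inbound links seen from crawled pages.
    """
    inbound = {seed: 0 for seed in seeds}
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links_from.get(page, []):
            inbound[target] = inbound.get(target, 0) + 1
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return inbound


counts = crawl(["directory"], TOY_WEB)
```

Starting from the single curated page, the crawl discovers the two courses and the textbook they both rely on, while a page that nothing links to stays invisible. That is the sense in which the curated directory seeded the larger index.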

So we look at the evolution of the web and see the move from curation as the primary means of discovery to curation and links as the seeds for larger and more complex discovery. Learning on the web has done a lot of this basic curation, on both a de facto and an explicit basis. OCWC members publishing lists of open courseware, Connexions publishing modules and composite works, and OER Commons aggregating lists of resources from multiple sources are all acting as curators.

It’s this evolutionary question that we’re starting to face now: what is a link in online learning, and how do we compose larger works out of component pieces while giving credit and identifying what’s new or changed? Creative Commons licenses and our supporting technology provide a framework for marking what license a work is offered under, and how the creator wishes to be attributed. There seems to be widespread acknowledgement that linking as attribution is reasonable, but what about linking to create a larger work, or linking to cite a source work? Too often it is not obvious what components went into a work, or how to find them in a useful format for deriving your own work.

Last week I came across a website developing a free college curriculum for math, computer science, business, and liberal arts. At first I was really excited: the footer of the pages contained a link to the Creative Commons Attribution license, and a full curriculum for things like computer science under a very liberal license is the sort of thing that gets me excited. But as I dug deeper, I found that the curriculum was actually more like a reading list: links to PDFs and web pages with instructions to read specific sections, pages, or chapters. Now it’s really exciting that the web and educational publishing on the web has progressed to the point where someone can act as a curator and assemble such a reading list, where all the resources are accessible.

What’s frustrating, and I think illustrative of this question of what a link means in education and how we create larger, composite documents, is the information that’s missing. Links to PDF files and other sites are a start, but they don’t capture the actual relationship that exists between the component pieces. By exploring what the graph of educational works looks like, we can enable applications and tools that help answer the question of discovery, like the search engines that grew out of exploring the graph of documents on the web at large.

So with a multitude of ways to discover content on the web, publishers began asking the question: who is finding me, how, and what are they actually looking for? Am I reaching the individuals that I think I am, and how are they interacting with my site? In other words, what is success, and how do I measure it? The web as a whole answered this question through the development of tools like Google Analytics, Piwik, and others. For many publishers, success is defined as more visitors who spend more time on the site. I’m not actually sure that’s true for education on the web, or at least I don’t think it’s the entire story.

When I think about success for open education and education on the web, I think about both web metrics and education metrics. Web metrics look a lot like everyday web publishing metrics: visitors, time on site, bounce rate, etc. If you’re trying to drive visitors to a particular site or service, you might also measure conversions as part of your success metrics. Education metrics, however, are a lot tougher to work with on the open web. We may want to determine whether a particular resource helps people pass an assessment, but where does that assessment come from? And how do I even find alternative resources to compare results with?

As we continue to curate a pool of educational resources online, one of the facets that I’ve encountered frequently of late is how OER align to curricular standards or quality metrics. This is an example of curating for something other than the subject. That is, while early curation systems classified web pages based on their topic, there’s no reason they couldn’t classify based on what curricular standard they address instead. Embracing curation has the potential to enable new assessments and metrics that build on the nature of the web and are more broadly applicable. For example, if online education embraces a culture of linking and composition using links, it’s possible to imagine a measure of reach and impact based on links and referrers, instead of just visitors.

As we begin to explore these questions, there’s also the opportunity for this community to lead developments on the web instead of just following past trends. As this community of practice continues to develop, we can learn from the past and iterate to increase our impact and reach. While search engines initially just leveraged links to determine where a resource fits in the web, there is increasing recognition that structured data can help us develop tools that provide better results and user experiences. Web scale search providers are beginning to leverage this information to improve search results to include information like the star rating a restaurant has received, or the cost of a product you searched for. Creative Commons uses structured data to indicate that the link to our license isn’t just another link; it actually has some meaning. By annotating links with information about their meaning, we can enable tools which give weight to different relationships based on context.
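As a rough sketch of what an annotated link buys a tool, the Python below reads a fragment of HTML and separates links by their stated relationship. The `rel="license"` annotation is the one Creative Commons uses for license links; everything else here is an assumption made for illustration, including the `TypedLinkParser` class name, the sample markup, and the `example.org` URL.

```python
from html.parser import HTMLParser


class TypedLinkParser(HTMLParser):
    """Collects (rel, href) pairs so links can be grouped by stated meaning."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values; a link with no rel
        # is recorded with None, meaning "just another link".
        rels = (attributes.get("rel") or "").split() or [None]
        for rel in rels:
            self.links.append((rel, href))


# Hypothetical page fragment; the second URL is a placeholder.
SAMPLE = """
<p>This work is licensed under a
<a rel="license" href="https://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a> license.
It builds on <a href="https://example.org/source-module">an earlier module</a>.</p>
"""

parser = TypedLinkParser()
parser.feed(SAMPLE)
license_links = [href for rel, href in parser.links if rel == "license"]
```

The second link carries no annotation, so software cannot tell whether it is a citation, a source, or a passing mention. That missing information is exactly what a shared practice of annotating educational links could supply.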

There is a great opportunity to develop rough consensus and working code around how structured data can be used to indicate the relationship between parts of a curriculum, alignment of resources to a curricular standard, or the sources a work uses. This is the next reasonable step for the use of structured data and curation on the web, and the open education community has a real opportunity to lead. As we publish resources online, we can develop a practice of linking, annotation, and curation.

This high-level, incredibly vague, and very, very selective history of the web shows that there are many lessons education on the web can learn, and at least an equal number of areas where it can lead. There is excitement and passion, but we need to ask ourselves some hard questions as we move forward. What does success look like, how do we measure our impact and reach, and what can we learn from those who have gone before?

Thank you.