[Online 09] Wendy Hall and Nigel Shadbolt, The Semantic Web Revolution – Unleashing the World’s most valuable information

Besides finding out who’s sponsoring the hugely important conference drinks, the opening keynote is also about remembering why we decided to attend in the first place. This year at London’s Online Information the main attraction, for me at least, was the track on the Semantic Web. In his introduction, conference chairman Adrian Dale phrased the question like this: How can we make the most out of the shift from the document-centric to the information-centric world?

The opening keynote was delivered by Dame Wendy Hall, professor in Computer Science, and Nigel Shadbolt, professor in AI, both at the University of Southampton.

First, to get an idea of where we’re going with semantic web, Wendy looked back on the evolution of the Web from read only, via read/write, and on to today’s social web. But what were the theoretical foundations of the current Web?

Well, pioneers of the Web, people like Vannevar Bush, Ted Nelson, and Doug Engelbart, envisioned more intuitive computer systems, systems that would (sort of) mimic the way we think. That is to say, we don’t think in hierarchies, but in a more associative (read: “link-like”) manner. Engelbart thought such systems could augment, not replace, human intellect.

Now, on the threshold of what could be a new era in the history of the Web, we might well take a look at the lessons learned since the Web grew big in the 1990s. Those are:

  • Big is beautiful; there has to be a certain critical mass of material/web pages, before things get going
  • The network is everything, and it doesn’t matter if parts of it are scruffy and have broken links
  • Democracy rules. If the web wasn’t open and free, it (probably) wouldn’t have taken off the way it did

But what’s missing from the web as we know it? Wendy suggests that we’ve lost the idea of conceptual linking (where targets are referenced not by their location, but by the semantics of the document). Instead, where links are missing we use search engines to fill the gap. Nevertheless, we’re hungry to share data, and in doing so we may also, by means of RDF, structure and add meaning to it. When this is done, machines can begin making inferences.

With RDF we’re seeing a web of linked data starting to emerge. This new Web, which Nigel Shadbolt calls the Pragmatic Semantic Web, is yet another layer of abstraction on top of the Web, which was itself an abstraction on top of the physical network that existed prior to it.

The technical principles of this kind of semantic web are:

  1. the URI that enables you to refer unambiguously to resources
  2. the fact that resources can be dereferenced
  3. that it’s got RDF at the backend (this makes it flexible)
  4. linked data, which can be subjected to search (Sigma is a search engine for RDF annotated material on the internet)

To get information out of RDF triples, a dedicated query language called SPARQL (SPARQL Protocol And RDF Query Language) has been developed. With SPARQL, which became a W3C recommendation in January 2008, it’s possible to answer complicated questions, such as “Give me all people born in London before 1827”. But are there any data to query?
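To make the idea of querying triples a little more concrete, here is a minimal sketch in Python. It is not an RDF library and it does not run SPARQL; it only mimics the pattern matching that a query like the London example performs. The `ex:` names and the `http://example.org/` prefix are hypothetical placeholders, the query string is shown for comparison only, and the handful of birth facts is just sample data.

```python
# Toy illustration only: a tiny in-memory triple store, not a real RDF library.
# The "ex:" names are hypothetical; real datasets use full URIs.

# Roughly what the question could look like in SPARQL (not executed here).
SPARQL_SKETCH = """
PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
  ?person ex:birthPlace ex:London ;
          ex:birthYear  ?year .
  FILTER (?year < 1827)
}
"""

# Each fact is a (subject, predicate, object) triple.
TRIPLES = [
    ("ex:William_Blake", "ex:birthPlace", "ex:London"),
    ("ex:William_Blake", "ex:birthYear", 1757),
    ("ex:John_Keats", "ex:birthPlace", "ex:London"),
    ("ex:John_Keats", "ex:birthYear", 1795),
    ("ex:Charles_Dickens", "ex:birthPlace", "ex:Portsmouth"),
    ("ex:Charles_Dickens", "ex:birthYear", 1812),
    ("ex:Christina_Rossetti", "ex:birthPlace", "ex:London"),
    ("ex:Christina_Rossetti", "ex:birthYear", 1830),
]


def objects(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def born_in_before(triples, place, year):
    """Find subjects born in `place` whose birth year is earlier than `year`."""
    # First pattern: ?person ex:birthPlace <place>
    people = sorted({s for s, p, o in triples if p == "ex:birthPlace" and o == place})
    # Second pattern plus filter: ?person ex:birthYear ?y, with ?y < year
    return [
        person
        for person in people
        if any(y < year for y in objects(triples, person, "ex:birthYear"))
    ]


if __name__ == "__main__":
    print(born_in_before(TRIPLES, "ex:London", 1827))
```

A real SPARQL endpoint does the same kind of matching over millions of triples, and, unlike this sketch, it can follow links between datasets.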

As a matter of fact there are. Besides the BBC, the UK government is publishing (data.gov.uk) large volumes of public data which are now being described with RDF and thus being prepared to be repurposed/mashed-up by whoever’s interested. This enables users to type in a postal code and get all the public data (crime statistics, local transportation, etc.) available for that area. This way, public data have social and economic value, but on a larger scale linked data matters because it supports interoperability.

Related: Richard Wallis interviews Wendy Hall on the Semantic Web Revolution.

Watching empires decline

Here’s a little visualization of the decline of four great colonial powers: France, Great Britain, Portugal and Spain. Although I miss the Netherlands, and a delineation of the rise of colonialism, it’s great work and interesting to watch.

Visualizing empires decline from Pedro M Cruz on Vimeo.

Mutations of evolution

150 years after Charles Darwin published On the Origin of Species, the theory of evolution has proven such a fruitful concept that the terms ‘evolution’ and ‘Darwinian’ have become commonplace. Today it’s simply a model for explaining change, and judging from its many different mutations, it’s tempting to suggest some kind of evolution of the theory itself.

Darwin’s theory of evolution assumed that certain heritable traits, namely those that make the survival and successful reproduction of an organism easier, become more common in a population over the generations. It was this mechanism that Darwin referred to as natural selection, and which he described at length.

Back in school I learned about Darwinian evolution as something which sort of adapts itself to changing needs, without conscious intervention, and that evolution certainly wasn’t the same as progress – which was what the followers of Herbert Spencer thought. But it seems to me that it’s the Spencerian, or ‘Social Darwinian’, form of evolution (“survival of the fittest”) most people today think of when they hear the word. A couple of examples from the last few months; you have the evolution of:
  • Architectural Ideas – According to Danish architect Bjarke Ingels, some architectural ideas prove more sustainable than others. Why? Because people select the best ideas. (Here, by the way, you’ll also find the interesting contrast: evolution vs. revolution, which is also a theme here and here)
  • Blogging – Om Malik says blogs need to evolve and be more social. Why? Otherwise they will not survive the competition with social networking services like Facebook and Twitter.
This was evolution as progress. But then there’s Niall Ferguson. His take on the evolution of financial theory is more refined, and this clip is definitely worth watching. Worth a read, on the other hand, is his book The Ascent of Money. Here he offers six features shared by the financial world and evolutionary systems. I quote:
  1. ‘Genes’, in the sense that certain business practices perform the same role as genes in biology, allowing information to be stored in the ‘organizational memory’ and passed on from individual to individual or from firm to firm when a new firm is created.
  2. The potential for spontaneous mutation, usually referred to in the economic world as innovation and primarily, though by no means always, technical.
  3. Competition between individuals within a species for resources, with the outcomes in terms of longevity and proliferation determining which business practices persist.
  4. A mechanism for natural selection through the market allocation of capital and human resources and possibility of death in cases of under-performance, i.e. ‘differential survival’.
  5. Scope for speciation, sustaining biodiversity through the creation of wholly new species of financial institutions.
  6. Scope for extinction, with species dying out altogether.
Language can also be seen as an evolutionary system. No matter how much people intervene and try to restrict it, it still evolves in mysterious ways. I wonder what evolutionary features one could find there? Eh bien! Vive le sport!

EPUB now available on Google Books

I’m happy to learn that Google Books has made its public domain books available for download in the EPUB format. This is a nice supplement to the existing image-based PDF version, because you’re no longer tied to large displays, which, obviously, is where PDF works best.

In a previous post I outlined the advantages of EPUB, but they’re well worth restating: EPUB is a free open standard designed to make text adapt (“reflow”) even to the smallest displays, and it’s supported by a growing ecosystem of digital reading devices.

All you need to get started on classics like Treasure Island is a reader. For instance, O’Reilly’s Bookworm is free online, and available in a growing number of languages. If you’re an iPhone user, you can install Stanza. Perhaps I should add that these two readers have been reviewed in Wired.

However, Google Books is not the only place where you can download EPUBs; ManyBooks, Feedbooks and Project Gutenberg offer them too.

This is not transparency

A key factor in establishing authority on the internet is, as David Weinberger convincingly argued, transparency:

What we used to believe because we thought the author was objective we now believe because we can see through the author’s writings to the sources and values that brought her to that position. Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I agree with most of it, and perhaps the point can be further illustrated by a quick example. If you take a look at the Wikipedia article on the epistemological sense of, well, Transparency, the contrast between then and now will be clear:

[Screenshot: the Wikipedia article on Transparency]

As you can see, there’s an explanation and a reference to an article by professor Paul Boghossian. The reference is the interesting part, because in academia this is perfectly sufficient for convincing readers that the material can be trusted. At least, it leaves you with an idea of what to do when you get to the library.

But the internet isn’t like the research library at all. Here, anybody could have made the claim that a certain Paul Boghossian said so and so about transparency, but, since links to resources supporting it (e.g. Wikipedia’s article on Paul Boghossian, for one) are extremely few, the article isn’t transparent and doesn’t meet Wikipedia’s requirements for verifiability, let alone follow the conventions of the internet medium.

Transparency is not the new objectivity, but comprehensiveness just might be

In a terrific post, Transparency is the new objectivity, David Weinberger argues that the hyperlink nature of the internet is reshaping our notions of authority. With everybody suddenly a potential author, the old claim to objectivity seems more and more trite and outworn:

Objectivity used to be presented as a stopping point for belief: If the source is objective and well-informed, you have sufficient reason to believe. The objectivity of the reporter is a stopping point for reader’s inquiry. That was part of high-end newspapers’ claimed value: You can’t believe what you read in a slanted tabloid, but our news is objective, so your inquiry can come to rest here. Credentialing systems had the same basic rhythm: You can stop your quest once you come to a credentialed authority who says, “I got this. You can believe it.” End of story.

Instead we demand transparency; to be able to “see through the author’s writings to the sources and values that brought her to that position.”

Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I think that this kind of “hyper-transparency” (where citing a book isn’t enough, but where a link has to point to the actual resource) may be an essential feature of the internet medium; but whereas it certainly is a necessary condition for establishing reliability, it’s hardly sufficient. After all, what leads to reliability is not the number of hyperlinks to the author’s sources, but trust in the fact that the relevant aspects of the matter have been adequately dealt with.

So, instead of objectivity, I’d suggest ‘comprehensiveness’ as a condition for reliability. And it’s a sufficient one too, because on the internet comprehensiveness seems more than ever to subsume transparency.

From Topic Maps to MediaWiki – Quick and Dirty

Recently, I needed to make some fairly large bodies of XML available for editing by a group of people. In this case the data was stored in the Topic Maps format (XTM), and, as long as I was the only one editing the files, this had been working just fine.

But with more people about to join in, it was clear that editing the files in a simple text editor wasn’t such a good idea. So, to avoid the risk of ending up with different versions (and people endlessly complaining about editing XML), I decided to turn the whole thing into a wiki.

Now, MediaWiki has the Special:Export tool for migrating wikis (‘transwikiing’). It exports pages in a simple XML format, so that you can import them into another wiki. This means you can populate a wiki simply by emulating the MediaWiki XML export format.

How to

If you want to try it, the MediaWiki output has to look a little something like this:

<?xml version="1.0" encoding="utf-8"?>
<mediawiki xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd"
  version="0.3" xml:lang="da">
<page>
 <title>Google</title>
 <id>1</id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>1</id>
   </contributor>
   <text xml:space="preserve">
   <!-- Wikitext goes here -->
   ==Link==
   [http://www.google.com]

   </text>
  </revision>
</page>
<page>
 <title>Microsoft</title>
 <id>2</id>
 ...
</page>
</mediawiki>

If your data is XTM, your starting point might be something like this made-up Topic Map with names and links of three companies:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topicMap SYSTEM "xtm1.dtd">
<topicMap id="companies-tm.xtm"
  xmlns="http://www.topicmaps.org/xtm/1.0/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
 <topic id="001">
  <baseName>
   <baseNameString>Google</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.google.com"/>
  </occurrence>
 </topic>
 <topic id="002">
  <baseName>
   <baseNameString>Microsoft</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.microsoft.com"/>
  </occurrence>
 </topic>
 <topic id="003">
  <baseName>
   <baseNameString>Oracle</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.oracle.com"/>
  </occurrence>
</topic>
</topicMap>

In this case the following XSLT stylesheet will do the job:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tm="http://www.topicmaps.org/xtm/1.0/"
   xmlns:tmlink="http://www.w3.org/1999/xlink"
   exclude-result-prefixes="tm tmlink" version="2.0">
 <xsl:output method="xml" encoding="utf-8" indent="yes"/>
 <xsl:template match="/">
  <xsl:apply-templates select="tm:topicMap"/>
 </xsl:template>
 <xsl:template match="tm:topicMap">
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="da">
   <xsl:apply-templates select="tm:topic"/>
 </mediawiki>
</xsl:template>
<xsl:template match="tm:topic">
<page>
 <title>
  <xsl:apply-templates select="tm:baseName/tm:baseNameString"/>
 </title>
 <id><!--To give each page a unique number, use the xsl:number instruction--><xsl:number/></id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>2</id>
  </contributor>
  <!--Since whitespace is crucial to the layout of your wikipage,
you should add the xml:space attribute and set the value to 'preserve'-->
  <text xml:space="preserve">
  <!--Now start building your wikipage -->

==Links==
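<!--The link lives in the xlink:href attribute of resourceRef, not in the text of occurrence-->
[<xsl:value-of select="tm:occurrence/tm:resourceRef/@tmlink:href"/>]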

</text>
</revision>
</page>
</xsl:template>
</xsl:stylesheet>

In short:

  • Make sure that your wiki is installed, AND that you have admin rights
  • Create a stylesheet, somewhat like the one provided above
  • Run the stylesheet on your XML file, for instance from your command line with saxon:
    $ saxon topics.xtm topicMaps2Mediawiki.xsl > mediawikiTopics.xml
  • Go to the Special:Import page on your wiki
  • Browse for the file, and
  • Upload! Do remember, however, that the maximum file size defaults to around 1.4 MB. To change it, you need to go to php.ini and raise the upload limits there (the relevant directives are upload_max_filesize and, if necessary, post_max_size).

After uploading the file, you’ll receive a list of links to the pages you just made.
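If you don’t have an XSLT processor at hand, the same mapping can be sketched with Python’s standard library. This is only a rough equivalent of the stylesheet above, under a few assumptions: the function name `xtm_to_mediawiki` and the helper `q` are made up for this example, the sample is a cut-down version of the made-up company Topic Map, and I haven’t checked the result against every MediaWiki version’s importer.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
MW_NS = "http://www.mediawiki.org/xml/export-0.3/"

# Cut-down version of the made-up company Topic Map (no DOCTYPE, two topics).
SAMPLE_XTM = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
 <topic id="001">
  <baseName><baseNameString>Google</baseNameString></baseName>
  <occurrence><resourceRef xlink:href="http://www.google.com"/></occurrence>
 </topic>
 <topic id="002">
  <baseName><baseNameString>Microsoft</baseNameString></baseName>
  <occurrence><resourceRef xlink:href="http://www.microsoft.com"/></occurrence>
 </topic>
</topicMap>"""


def q(namespace, tag):
    """Build ElementTree's '{namespace}tag' qualified-name form."""
    return "{%s}%s" % (namespace, tag)


def xtm_to_mediawiki(xtm_text, username="yourUserName"):
    """Turn every XTM topic into one MediaWiki <page>; return the XML as text."""
    # Serialise the MediaWiki namespace as the default namespace (no prefix).
    ET.register_namespace("", MW_NS)
    topic_map = ET.fromstring(xtm_text)
    wiki = ET.Element(q(MW_NS, "mediawiki"), {"version": "0.3"})

    for number, topic in enumerate(topic_map.findall(q(XTM_NS, "topic")), start=1):
        name = topic.findtext(q(XTM_NS, "baseName") + "/" + q(XTM_NS, "baseNameString"))
        refs = topic.findall(q(XTM_NS, "occurrence") + "/" + q(XTM_NS, "resourceRef"))
        # The URL sits in the xlink:href attribute, not in element text.
        links = [ref.get(q(XLINK_NS, "href")) for ref in refs]

        page = ET.SubElement(wiki, q(MW_NS, "page"))
        ET.SubElement(page, q(MW_NS, "title")).text = name
        ET.SubElement(page, q(MW_NS, "id")).text = str(number)  # unique page number
        revision = ET.SubElement(page, q(MW_NS, "revision"))
        ET.SubElement(revision, q(MW_NS, "id")).text = "1"
        ET.SubElement(revision, q(MW_NS, "timestamp"))  # left empty, as in the post
        contributor = ET.SubElement(revision, q(MW_NS, "contributor"))
        ET.SubElement(contributor, q(MW_NS, "username")).text = username
        ET.SubElement(contributor, q(MW_NS, "id")).text = "1"
        # xml:space="preserve" because whitespace matters in wikitext.
        text = ET.SubElement(revision, q(MW_NS, "text"), {q(XML_NS, "space"): "preserve"})
        text.text = "\n==Links==\n" + "\n".join("[%s]" % link for link in links) + "\n"

    return ET.tostring(wiki, encoding="unicode")


if __name__ == "__main__":
    print(xtm_to_mediawiki(SAMPLE_XTM))
```

Save the printed XML to a file and upload it through Special:Import exactly as described above. For anything beyond names and links, the XSLT route is probably the more comfortable one.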

The Case for Content Strategy

Over the last couple of years I’ve come to appreciate the term content strategy. It began in 2007 with Rachel Lovinger’s article Content Strategy: The Philosophy of Data. Here she urged readers to take a closer look at content itself, and then find out exactly who’s responsible for making it relevant, comprehensive, and efficient to produce.

I liked that, because it touches upon the very basics of communication, something which, I think, is somewhat neglected in favour of design issues (keeping sentences short, using chunked text, putting action in verbs, etc.). Way too often, content is taken for granted. It’s what the customer brings to the agency, or something to be filled in later instead of the “lorem ipsum” gibberish designers use.

Basically, content strategy addresses the issues of anyone trying to communicate anything, i.e. how to make your website function as:

  • a truthful representation of the sender’s intentions
  • a message relevant to the user
  • a correct use of language and imagery
  • an open channel between reader and author

And, of course, if you’re any good at writing, your text might even have an aesthetic value on its own.

Producing useful and useable web content on a daily basis isn’t a matter of being touched by the hand of god, or endowed with the perfect content from your client; it’s a matter of planning, and you need to be a part of it. Since internet communication involves quite a few disciplines, there’s a lot to plan for. A few things to consider:

  • Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.
  • Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.
  • Metadata strategy identifies the type and structure of metadata, also known as “data about data” (or content). Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.
  • Search engine optimization is the process of editing and organizing the content on a page or across a website (including metadata) to increase its potential relevance to specific search engine keywords.
  • Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.
  • Content channel distribution strategy defines how and where content will be made available to users. (Side note: please consider e-mail marketing in the context of this practice; it’s a way to distribute content and drive people to find information on your website, not a standalone marketing tactic.)

I didn’t make that list (it comes from Kristina Halvorson, and it’s part of the article The Discipline of Content Strategy), but I agree. All of these branches are tools that help us create meaningful user experiences.

While there are obvious overlaps between content strategy and information architecture, I think that the first two disciplines on the list add something genuinely new. It’s not enough to structure and make the things on your website findable; you also need to make sure that the very content you’re providing is right for the occasion.

So, ultimately, it’s all about efficiency, and planning supports efficiency. Since creating content is both difficult and expensive (and always seems to be somebody else’s job), you want to make sure that every aspect of it performs at its best, and therefore there’s good reason to take the concept of content strategy (CS) seriously.

See also Jeffrey MacIntyre’s eloquent Content-tious Strategy.

Introducing EPUB

With digital books finding their way to more and more people, reading happens everywhere and on a variety of different devices. A lot of these have small displays, and this is a problem if the text you’re reading is in PDF.

EPUB is an XML publishing format for reflowable digital books and publications standardized by the International Digital Publishing Forum (IDPF), a trade and standards association for the digital publishing industry. For the record, this organization was formerly known as Open eBook Forum. “Reflowable” means that it scales to fit different screen sizes.

Since its official adoption by IDPF in 2007, EPUB has become popular among major publishers such as Hachette, O’Reilly and Penguin. The format allows publishers to produce and send a single digital publication file through distribution, and it can be read using a variety of open source and commercial software. You can use O’Reilly’s Bookworm online for free, or you can download Adobe’s Digital Editions (ADE). EPUB works on all major operating systems, on e-book devices (like the Sony PRS), and on other small devices such as the Apple iPhone.

Collectively referred to as EPUB, the format is made up of three open standards:

  • Open eBook Publication Structure Container Format (OCF): Describes the directory tree structure and file format (zip) of an EPUB archive
  • Open Publication Structure (OPS): Specifies the common vocabularies for the eBook, especially the formats allowed to be used for book content (for example XHTML and CSS)
  • Open Packaging Format (OPF): Defines the required and optional metadata, reading order, and table of contents in an EPUB

If you want to learn more, Liza Daly of Threepress has written a nice tutorial called Build a digital book with EPUB, available at IBM developerWorks. To really get to know EPUB, you’ll need to read the specifications: OCF, OPS, and OPF.
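To get a feel for how the three parts fit together, here is a minimal sketch in Python that zips up a bare-bones EPUB skeleton. Treat it as an illustration under assumptions rather than a finished book: the file names under OEBPS/, the title and the identifier are made up, and the NCX table of contents that a complete EPUB 2 publication needs is left out, so don’t expect the result to pass a validator.

```python
import io
import zipfile

# OCF: META-INF/container.xml points the reader to the OPF package file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# OPF: metadata, manifest (all files) and spine (reading order).
# Title and identifier are hypothetical placeholders.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:example:sample-book</dc:identifier>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# OPS: the book content itself, here a single XHTML page.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, EPUB.</p></body>
</html>
"""


def build_epub_skeleton():
    """Return the bytes of a zip archive laid out like a (partial) EPUB."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/content.opf", CONTENT_OPF,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("sample.epub", "wb") as handle:
        handle.write(build_epub_skeleton())
```

Unzipping any EPUB you have downloaded shows the same layout: a mimetype file, a META-INF folder, and a package file that lists everything else.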

Civilisation works great on TV

On smashing telly! I saw a Channel 4 program: The 50 Greatest Documentaries, and among the ones featured was the BBC’s 1969 (colour!) venture Civilisation with Kenneth Clark.

One of the things that makes Civilisation great TV is that it’s such a personal account. This isn’t anonymous lecturing under the guise of scientific objectivity, but a passionate plea for culture in a society threatened by a cold war suddenly turning hot.

Here’s a little sample. Take it away, Kenneth!

The entire series is for sale here
