The Power of Linked Data

I just looked up two people online, in two different ways, with two completely different experiences.

First, I looked up two staff members in our intranet’s Staff Directory. The information was structured into named fields, and the structure was completely consistent between the two: first name, last name, title, manager, upstream organization chart, photograph, etc. If I wanted to, I could ask a developer to present subsets of this information, e.g. an alphabetical list of full names with a picture, or a list of names of everyone who had the same manager as me. But I knew there was other information about these colleagues, stovepiped out of range: their skills, interests, and office locations, for example. Overall, high on structure, low on linkage.

Second, I looked up Albert Einstein and Niels Bohr on Wikipedia. They were contemporaries, both theoretical physicists, and both highly influential. The information was textual and unstructured, although each article started with a summary before going into details, and there were dozens of links to other full Wikipedia pages. Overall, low on structure, high on linkage. Hard to get a developer to do anything with this.

Looking into the Wikipedia pages, I tried to see what structure we might be able to find and exploit. They both had a contents box, which was a list of links to the various sections of the article. The section lists bore no relationship to each other from page to page. Obviously, it would be difficult to come up with a common structure for pages describing people, places, and works of art, but there might be commonalities we could exploit for subsets such as people.

Both pages also had an “at-a-glance” summary box. There seemed to be more opportunities at this level for defining common structures. Some elements in this summary were identical, for example their name, photograph, place and date of birth, place and date of death, institutions attended, and what the scientist was known for. In other cases, there were different element names for what seemed to be the same content, namely Nationality for Bohr and both Citizenship and Residence for Einstein.

If the summary information were captured in a database, we could get answers to questions such as who were Einstein’s contemporaries in physics.

In fact, there have been attempts to structure Wikipedia, and one initiative is called DBpedia. The DBpedia page for Albert Einstein can be found at http://dbpedia.org/page/Albert_Einstein. Play around for a while and you will figure out that:

  • Albert Einstein is an entity of type Scientist
  • Scientist is a subclass of the type Person
  • Albert Einstein is described by a set of pairs (property, value); Niels Bohr is described by a very similar set
  • On the page, the first column is the name of a property, and the second column is the value of the property
  • Sometimes the value of the property is an entity in its own right. For example, his birth place of Ulm is an entity of type Town.
  • Town is a subclass of Settlement, which is a subclass of Populated Place, which is a subclass of Place.
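
To make the (property, value) idea concrete, here is a minimal Python sketch. It is not DBpedia's actual data model or API: the identifiers such as `birthPlace` and `subClassOf` are simplified names assumed for illustration, and the triples only restate the facts listed above.

```python
# Illustrative (subject, property, value) triples. The names are simplified
# stand-ins, not DBpedia's real identifiers.
TRIPLES = [
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Niels_Bohr", "type", "Scientist"),
    ("Ulm", "type", "Town"),
    ("Scientist", "subClassOf", "Person"),
    ("Town", "subClassOf", "Settlement"),
    ("Settlement", "subClassOf", "PopulatedPlace"),
    ("PopulatedPlace", "subClassOf", "Place"),
]


def values(subject, prop):
    """Return every value recorded for the given subject and property."""
    return [v for s, p, v in TRIPLES if s == subject and p == prop]


def all_types(entity):
    """Return the entity's direct types followed by all of their superclasses."""
    result = []
    pending = values(entity, "type")
    while pending:
        current = pending.pop(0)
        if current not in result:
            result.append(current)
            pending.extend(values(current, "subClassOf"))
    return result


# The value of Einstein's birthPlace is itself an entity we can ask about.
birth_place = values("Albert_Einstein", "birthPlace")[0]
print(all_types(birth_place))  # ['Town', 'Settlement', 'PopulatedPlace', 'Place']
```

Because Ulm is an entity rather than a plain string, a question about Einstein can continue into a question about places.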

Given this we could answer some complex queries:

  • Find contemporaries of Einstein who were known for quantum theory
  • What is the place that Einstein lived in that has the longest river?
  • What other scientists of any discipline were born in the same birthplace as Einstein?
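
As a rough sketch of the first query, the Python below treats "contemporaries" as people whose lifespans overlap, which is my assumption rather than anything DBpedia defines. The `Scientist` class, the `contemporaries` function, and the `known_for` tags are hypothetical simplifications; a real query would run against DBpedia's own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scientist:
    name: str
    born: int             # year of birth
    died: int             # year of death
    known_for: frozenset  # simplified topic tags, chosen for illustration


PEOPLE = [
    Scientist("Albert Einstein", 1879, 1955, frozenset({"relativity"})),
    Scientist("Niels Bohr", 1885, 1962, frozenset({"quantum theory"})),
    Scientist("Max Planck", 1858, 1947, frozenset({"quantum theory"})),
    Scientist("Marie Curie", 1867, 1934, frozenset({"radioactivity"})),
    Scientist("Galileo Galilei", 1564, 1642, frozenset({"astronomy"})),
]


def contemporaries(person, people, topic=None):
    """Names of people whose lifespans overlap with `person`,
    optionally restricted to those tagged with `topic`."""
    return [
        other.name
        for other in people
        if other.name != person.name
        and other.born <= person.died
        and person.born <= other.died
        and (topic is None or topic in other.known_for)
    ]


einstein = PEOPLE[0]
print(contemporaries(einstein, PEOPLE, topic="quantum theory"))
# ['Niels Bohr', 'Max Planck']
```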

DBpedia is a bold initiative, dealing as it does with a huge set of crowd-sourced content and attempting to structure it after the fact. We don’t have to be this bold on a daily basis, but we can take some lessons from this type of Linked Data initiative. Let’s go back to our Staff Directory and take the same approach, asking which entities we want in our universe, what their attributes are, and how they are linked as data.

We might start with:

  • EMPLOYEE (name, contact information, OFFICE, skills, …)
  • OFFICE (name, location, map, equipment, MEETING ROOMs, …)
  • PROJECT (name, dates, participating EMPLOYEEs, sector, …)
  • MEETING ROOM (name, office, equipment)

Entity names are capitalized wherever they appear as attributes of another entity, to show the relationships between them. This structure allows us to make powerful queries, and also display useful aggregations of data.
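
One way to picture the entity list is as a handful of Python classes whose attributes point at other entities rather than at plain text. This is only a sketch: the class and field names follow the list above, but the sample data ("Office A", "Employee 1", and so on) is hypothetical.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids trouble with the
# Office <-> MeetingRoom cycle.
@dataclass(eq=False)
class Office:
    name: str
    location: str
    meeting_rooms: "list[MeetingRoom]" = field(default_factory=list)


@dataclass(eq=False)
class MeetingRoom:
    name: str
    office: Office
    equipment: "list[str]" = field(default_factory=list)


@dataclass(eq=False)
class Employee:
    name: str
    office: Office
    skills: "list[str]" = field(default_factory=list)


@dataclass(eq=False)
class Project:
    name: str
    sector: str
    participants: "list[Employee]" = field(default_factory=list)


# Hypothetical sample data.
office_a = Office("Office A", "City A")
office_a.meeting_rooms.append(
    MeetingRoom("Room A1", office=office_a, equipment=["projector"])
)
employee_1 = Employee("Employee 1", office=office_a, skills=["usability testing"])
project_x = Project("Project X", sector="public", participants=[employee_1])

# Follow the links: project -> employees -> office -> meeting rooms.
rooms_near_team = [
    room.name
    for employee in project_x.participants
    for room in employee.office.meeting_rooms
]
print(rooms_near_team)  # ['Room A1']
```

The last few lines are the point: once an employee's office is a link to an OFFICE rather than a text field, we can hop from a project to the meeting rooms its team can use.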

Examples of powerful queries might be:

  • Which offices have meeting rooms that hold more than 20 people, have projectors, and are available in March for an event?
  • Which projects in the last couple of years utilized a usability tester? Any in Europe?
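
The first of these queries could look roughly like this against a relational store, sketched with Python's built-in `sqlite3` module. The table and column names, including `capacity` and `has_projector`, are assumptions made for the example, as is the sample data; the availability check is left out because it would need a bookings table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE meeting_room (
        id            INTEGER PRIMARY KEY,
        office_id     INTEGER REFERENCES office(id),  -- the link to OFFICE
        name          TEXT,
        capacity      INTEGER,
        has_projector INTEGER                         -- 1 = yes, 0 = no
    );
    -- Hypothetical sample data.
    INSERT INTO office VALUES (1, 'Office A'), (2, 'Office B');
    INSERT INTO meeting_room VALUES
        (1, 1, 'Room A1', 30, 1),
        (2, 1, 'Room A2',  8, 0),
        (3, 2, 'Room B1', 25, 0);
""")

# Join rooms to their offices and keep offices with a large, equipped room.
rows = conn.execute("""
    SELECT DISTINCT o.name
    FROM office AS o
    JOIN meeting_room AS r ON r.office_id = o.id
    WHERE r.capacity > 20 AND r.has_projector = 1
    ORDER BY o.name
""").fetchall()

big_room_offices = [name for (name,) in rows]
print(big_room_offices)  # ['Office A']
```

Office B has a large enough room, but without a projector, so only Office A qualifies. The query is only possible because each meeting room carries a link back to its office.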

Examples of useful aggregations might be:

  • When we show an employee profile, also show the projects they have worked on, and out of which office
  • When we look at a regional office, list the employee breakdown by discipline.
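
Both aggregations fall out of the links with very little code. The sketch below uses plain dictionaries and hypothetical records; the `discipline` field is an assumption, since the entity list above does not name it.

```python
from collections import Counter

# Hypothetical records; names refer to each other to act as links.
EMPLOYEES = [
    {"name": "Employee 1", "office": "Office A", "discipline": "design"},
    {"name": "Employee 2", "office": "Office A", "discipline": "development"},
    {"name": "Employee 3", "office": "Office A", "discipline": "design"},
    {"name": "Employee 4", "office": "Office B", "discipline": "development"},
]
PROJECTS = [
    {"name": "Project X", "office": "Office A",
     "participants": ["Employee 1", "Employee 2"]},
    {"name": "Project Y", "office": "Office B",
     "participants": ["Employee 1", "Employee 4"]},
]


def discipline_breakdown(office):
    """Count the employees of one office by discipline."""
    return Counter(e["discipline"] for e in EMPLOYEES if e["office"] == office)


def profile(name):
    """Employee record plus the projects they worked on and the office each ran from."""
    employee = next(e for e in EMPLOYEES if e["name"] == name)
    projects = [
        (p["name"], p["office"]) for p in PROJECTS if name in p["participants"]
    ]
    return {**employee, "projects": projects}


print(discipline_breakdown("Office A"))
print(profile("Employee 1")["projects"])
```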

Those of us who have spent a lot of time designing custom data-driven applications understand this approach very well. But people who come from web design or web content disciplines may not automatically think about the potential of a linked data approach.

And platforms such as SharePoint make it easy to build the individual entities, but don’t make it easy to link them together or to display them flexibly without custom development, biasing us towards simpler information presentations.

In either case, it might be worthwhile to evangelize the linked data approach as a way to achieve combinatorially larger benefits than stovepiped solutions.
