Linked Data and automated interfaces
Mar 3, 2014
Linked Data is an idea that has been around since 2006, but in my opinion it is only recently starting to be appreciated. Its goal is ambitious: we’ve been putting documents on the Web for decades, and now we should put our data on the Web as well, using exactly the same practices.
If you think about it, an HTML document is a machine-readable format that computers understand well enough to render it nicely in a Web browser. These documents have hyperlinks to other documents, effectively creating a cloud of interconnected documents that search engines such as Google can harvest and analyze to provide very interesting results.
Now imagine that, instead of mainly plain-text documents, we could publish and interlink datasets on the Web. This is exactly what Linked Data is doing: creating an interconnected cloud of datasets. Now imagine what search engines could do with this sort of information.
This is effectively like having a MySQL database of all the data in the world that you can query. In essence, with Linked Data we can ask search engines much more fine-grained questions. We can ask them to return all the stores that sell shoes of a specific brand, color and size. The really cool thing is that this query is not resolved by a single server or entity (such as eBay), but by all the various servers that provide that kind of data.
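To make the shoe query concrete, here is a minimal sketch in Python. It assumes two hypothetical stores that each publish their items as (subject, predicate, object) triples using the same property names; the URLs, property names and values are all invented for illustration, and a real setup would fetch the data over HTTP rather than read it from a dictionary.

```python
from collections import defaultdict

# Hypothetical providers: each publishes (subject, predicate, object) triples
# and shares the same property names. All URLs and values are invented.
PROVIDERS = {
    "http://shop-a.example": [
        ("http://shop-a.example/item/1", "brand", "Acme"),
        ("http://shop-a.example/item/1", "color", "red"),
        ("http://shop-a.example/item/1", "size", "42"),
        ("http://shop-a.example/item/2", "brand", "Acme"),
        ("http://shop-a.example/item/2", "color", "blue"),
        ("http://shop-a.example/item/2", "size", "42"),
    ],
    "http://shop-b.example": [
        ("http://shop-b.example/p/9", "brand", "Acme"),
        ("http://shop-b.example/p/9", "color", "red"),
        ("http://shop-b.example/p/9", "size", "42"),
    ],
}


def find_items(providers, **wanted):
    """Return subjects, from any provider, whose properties match `wanted`."""
    matches = []
    for triples in providers.values():
        # Group this provider's triples by the resource they describe.
        by_subject = defaultdict(dict)
        for subject, predicate, obj in triples:
            by_subject[subject][predicate] = obj
        # Keep the resources that satisfy every requested property.
        for subject, props in by_subject.items():
            if all(props.get(key) == value for key, value in wanted.items()):
                matches.append(subject)
    return sorted(matches)


print(find_items(PROVIDERS, brand="Acme", color="red", size="42"))
```

No single provider answers the question here: the result is assembled from whatever each server publishes, which is the point of the paragraph above.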
Data vs documents
The problem, however, is that publishing data on the Web is much harder than publishing documents. Why? Because with documents you simply have the content of the document, some keywords, the title and some other metadata. If you think about it, these are consistent across all types of documents. With data the story is vastly different. You can have two datasets from two different sources that talk about the same thing but use completely different vocabularies and formats, which makes them hard for computers to analyze together.
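The vocabulary mismatch can be shown with a small Python sketch. The two records, their field names and the shared vocabulary below are all hypothetical; the sketch only illustrates that someone has to write a mapping per source before the data can be compared.

```python
# Two hypothetical sources describing the same kind of thing with
# different field names. All names and values are invented.
SOURCE_A = {"id": "http://shop-a.example/item/1", "maker": "Acme", "colour": "red"}
SOURCE_B = {"uri": "http://shop-b.example/p/9", "brandName": "Acme", "color_name": "red"}

# Hand-written mappings from each source's field names to a shared vocabulary.
MAPPINGS = {
    "a": {"id": "id", "maker": "brand", "colour": "color"},
    "b": {"uri": "id", "brandName": "brand", "color_name": "color"},
}


def normalize(record, mapping):
    """Rename a record's fields into the shared vocabulary, dropping unknown ones."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


item_a = normalize(SOURCE_A, MAPPINGS["a"])
item_b = normalize(SOURCE_B, MAPPINGS["b"])

# Only after normalization can the two records be compared field by field.
print(item_a["brand"] == item_b["brand"], item_a["color"] == item_b["color"])
```

If both sources had published with a shared vocabulary in the first place, the mapping tables would not be needed.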
In essence, data simply carries much more information than documents. If we ever want to harness the power of semantically aware search engines, we need a new generation of Web publishers who are aware of this and who start publishing data using Linked Data principles.
To annotate our data in a Web-friendly way and minimize errors when typing our resources and links, we need something radically different.
Web APIs are very successful because they provide a great incentive to data providers: users can build new applications on top of the provider’s data, including apps that the provider could never have imagined. This is a huge incentive, and we need to come up with a similar one for taking the next step and actually creating Web-friendly APIs: those that we can use to create the interconnected cloud of data.
The incentive cannot be “because it is good to do so” or “it will pay off in 10 years”. We need a better incentive mechanism, one that is based on direct and measurable feedback, just as apps are direct feedback on Web APIs.
I, for one, believe in interfaces. Having automated interfaces built on your data is a great incentive. Things like LodLive or Trilby are just some examples of automated interfaces built on datasets that use Linked Data principles. This is an important difference between Linked Data and regular Web APIs: such automated interfaces could not be built on Web APIs, because each API is tailored to a specific dataset. With Linked Data, the same principles are shared across all resources, which makes it a much more powerful mechanism for sharing data.
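As a rough illustration of why a shared data model allows automated interfaces, here is a minimal Python sketch of a renderer that knows nothing about any particular dataset. It is not how LodLive or Trilby work; it only assumes that every resource is described by (subject, predicate, object) triples and that objects starting with http:// or https:// are links to other resources. The sample data is invented.

```python
def render(subject, triples):
    """Render one resource as text lines, without knowing its schema."""
    lines = [subject]
    for s, predicate, obj in triples:
        if s != subject:
            continue
        # Objects that look like URLs are shown as links to other resources.
        is_link = obj.startswith(("http://", "https://"))
        lines.append(f"  {predicate}: {'-> ' + obj if is_link else obj}")
    return "\n".join(lines)


# Hypothetical triples from two unrelated datasets; the same function
# renders both because they share the same basic shape.
TRIPLES = [
    ("http://shop-a.example/item/1", "brand", "Acme"),
    ("http://shop-a.example/item/1", "soldBy", "http://shop-a.example/store"),
    ("http://library.example/book/7", "title", "Some Book"),
    ("http://library.example/book/7", "author", "http://library.example/person/3"),
]

print(render("http://shop-a.example/item/1", TRIPLES))
print(render("http://library.example/book/7", TRIPLES))
```

A client for a conventional Web API would need separate code for the shop and the library, since each API defines its own response shape.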
By building the next generation of automated interfaces, we give data providers the incentive to finally establish the interconnected cloud of data.