[OTDev] Silk Link Discovery Framework

Thu Jun 2 07:44:45 CEST 2011

FYI on the Silk Link Discovery Framework and Workbench, which includes 3
applications for managing RDF and the integrated linked datasets.
Barry

We are happy to announce version 2.4 of the Silk - Link Discovery Framework for the Web of Data.

The central idea of the Web of Data is to interlink data items using RDF links. However, in practice most data sources are not sufficiently interlinked with related data sources. The Silk Link Discovery Framework addresses this problem by providing tools to generate links between data items based on user-provided link specifications. It can be used by data publishers to generate links between datasets as well as by Linked Data consumers to augment Web data with additional RDF links.
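
As a rough illustration of what generating links from a specification amounts to, the sketch below compares the labels of two small entity sets and emits an owl:sameAs link whenever the similarity reaches a threshold. This is not the Silk Link Specification Language or the Silk API: the function names, the 0.9 threshold, the example.org URIs and the choice of a plain string-similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for a real metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_links(source, target, threshold=0.9):
    """Compare every source label with every target label.

    `source` and `target` map entity URIs to labels. A link is emitted
    for each pair whose label similarity is at least `threshold`
    (a hypothetical value, not one taken from Silk).
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


# Made-up sample data for the sketch.
source = {
    "http://example.org/a/berlin": "Berlin",
    "http://example.org/a/paris": "Paris",
}
target = {
    "http://example.org/b/Berlin": "berlin",
    "http://example.org/b/London": "London",
}

# Print the generated links as N-Triples-style statements.
for s, p, o in generate_links(source, target):
    print(f"<{s}> <{p}> <{o}> .")
```

With the sample data only the two Berlin entities are linked. A real link specification would typically combine several properties and comparison measures rather than a single label.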

Link specifications can either be written manually or developed using the new Silk Workbench. The Silk Workbench is a web application which guides the user through the process of interlinking different data sources. It is shipped with version 2.4 of Silk.
The Silk Workbench offers the following features:
- It enables the user to manage different sets of data sources and linking tasks.
- It offers a graphical editor which enables the user to easily create and edit link specifications.
- As finding a good linking heuristic is usually an iterative process, the Silk Workbench makes it possible for the user to quickly evaluate the links which are generated by the current link specification.
- It allows the user to create and edit a set of reference links used to evaluate the current link specification.
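
To make the idea of evaluating a link specification against reference links concrete, here is a minimal sketch that scores a set of generated links with precision, recall and F-measure. The announcement does not say which measures the Workbench reports, so the choice of measures, the function name and the sample links are assumptions made for this example.

```python
def evaluate(generated, positive_reference_links):
    """Score generated links against known-correct reference links.

    Both arguments are collections of (source_uri, target_uri) pairs.
    Returns (precision, recall, f_measure), each in [0, 1].
    """
    generated = set(generated)
    reference = set(positive_reference_links)
    correct = generated & reference

    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up sample data: three generated links, four reference links.
generated = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}
positive_reference_links = {
    ("a:1", "b:1"),
    ("a:2", "b:2"),
    ("a:4", "b:4"),
    ("a:5", "b:5"),
}

precision, recall, f_measure = evaluate(generated, positive_reference_links)
print(f"precision={precision:.2f} recall={recall:.2f} f={f_measure:.2f}")
```

In an iterative workflow, these numbers would be recomputed after each change to the link specification, so that a change which adds wrong links (lower precision) or loses correct ones (lower recall) is noticed immediately.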

The Silk Link Discovery Framework includes three applications to execute the link specifications which address different use cases:
1. Silk Single Machine is used to generate RDF links on a single machine. The datasets that should be interlinked can either reside on the same machine or on remote machines which are accessed via the SPARQL protocol. Silk Single Machine provides multithreading and caching. In addition, the performance can be further enhanced using an optional blocking feature.
2. Silk Server can be used as an identity resolution component within applications that consume Linked Data from the Web. Silk Server provides an HTTP API for matching instances from an incoming stream of RDF data while keeping track of known entities. It can be used, for instance, together with a Linked Data crawler to populate a local duplicate-free cache with data from the Web.
3. Silk MapReduce is used to generate RDF links between datasets using a cluster of multiple machines. Silk MapReduce is based on Hadoop and can, for instance, be run on Amazon Elastic MapReduce. Silk MapReduce enables Silk to scale out to very large datasets by distributing the link generation across multiple machines.
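
The announcement does not describe how the optional blocking feature of Silk Single Machine works internally. As a generic sketch of the blocking technique, the example below groups entities by a cheap key and compares only entities that share a key, instead of comparing every source entity with every target entity. The blocking key (first letter of the label), the function names and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product


def blocking_key(label):
    """Hypothetical blocking key: the lower-cased first character of the label."""
    return label.strip().lower()[:1]


def build_blocks(entities):
    """Group entity URIs by the blocking key of their label."""
    blocks = defaultdict(list)
    for uri, label in entities.items():
        blocks[blocking_key(label)].append(uri)
    return blocks


def candidate_pairs(source, target):
    """Yield only those (source_uri, target_uri) pairs that share a block."""
    source_blocks = build_blocks(source)
    target_blocks = build_blocks(target)
    # Sort the shared keys so the output order is deterministic.
    for key in sorted(source_blocks.keys() & target_blocks.keys()):
        yield from product(source_blocks[key], target_blocks[key])


# Made-up sample data for the sketch.
source = {"a:berlin": "Berlin", "a:bonn": "Bonn", "a:paris": "Paris", "a:rome": "Rome"}
target = {"b:berlin": "berlin", "b:boston": "Boston", "b:prague": "Prague", "b:madrid": "Madrid"}

pairs = list(candidate_pairs(source, target))
print(f"full comparison: {len(source) * len(target)} pairs, with blocking: {len(pairs)} pairs")
```

The trade-off is that two entities which really match but receive different keys are never compared, so the key has to be chosen with the data in mind.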

More information about the Silk framework, the Silk Link Specification Language, as well as several examples that demonstrate how Silk is used to set links between different data sources in the LOD cloud can be found at:

http://www4.wiwiss.fu-berlin.de/bizer/silk/

The Silk framework is provided under the terms of the Apache License, Version 2.0 and can be downloaded from:

http://www4.wiwiss.fu-berlin.de/bizer/silk/releases/

The development of Silk was supported by Vulcan Inc. as part of its Project Halo (www.projecthalo.com) and by the EU FP7 project LOD2 - Creating Knowledge out of Interlinked Data (http://lod2.eu/, Ref. No. 257943).

Thanks to Christian Becker, Michal Murawicki and Andrea Matteini for contributing to the Silk Workbench.

Happy linking,

Robert Isele, Anja Jentzsch and Chris Bizer