Friday, May 29, 2009

Continuing on the topic of the Web of Data (aka Linked Data)

A lot is being published about Linked Data these days. I just saw that the Spring 2009 PricewaterhouseCoopers technology forecast is full of data Web and Semantic Web coolness. But, before I jump into the forecast, I would like to give some background on the Linked Data work that is happening in the industry today.

Linking Open Data (LOD) is a W3C project. According to their web site, "The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources. RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications. ... Collectively, the data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009)."
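To make the idea of an RDF link concrete, here is a minimal sketch in Turtle. It asserts that a DBpedia resource and a Geonames resource identify the same real-world thing; the specific URIs are illustrative, and owl:sameAs is just one common linking property (rdfs:seeAlso is another).

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# An RDF link: the DBpedia resource for Berlin is declared to denote
# the same thing as a resource in the Geonames dataset (URIs illustrative).
<http://dbpedia.org/resource/Berlin>
    owl:sameAs <http://sws.geonames.org/2950159/> .
```

A Semantic Web browser or crawler that dereferences the first URI can follow the owl:sameAs link into the Geonames dataset, which is exactly the cross-source navigation the quote above describes.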

Here is the LOD figure showing what is linked today (actually March 2009):

Just to get a feel for what is included ... let me note that DBpedia (the larger circle at the left center of the image) provides structured access to Wikipedia's human-oriented data (specifically, it provides a SPARQL interface). According to DBpedia's web site, "The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolves as Wikipedia changes, and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management and Web search to revolutionizing Wikipedia search."
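As a sketch of what that "cities in New Jersey" question might look like against DBpedia's SPARQL interface (the prefixes and property names, dbo:isPartOf and dbo:populationTotal, are my assumptions here; the vocabulary DBpedia actually uses may differ):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

# Cities in New Jersey with more than 10,000 inhabitants
# (property names are illustrative, not verified against the live ontology).
SELECT ?city ?population WHERE {
  ?city dbo:isPartOf dbr:New_Jersey ;
        dbo:populationTotal ?population .
  FILTER (?population > 10000)
}
```

The point is that the query runs over structured triples, so the results come back as rows of data rather than as links to HTML pages.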

Going back to Tim Berners-Lee's request for us to imagine what it would be like to have people load and connect knowledge, let's imagine what all this data can do for a business and its decision-making processes ...
