Sunday, May 31, 2009

The Semantic Web in 3 Words

My husband asked me to explain the Semantic Web in three words (because I was going on about the web and my ideas) ... So, here they are:
  • Data
  • Linkages
  • Infrastructure
And now, I get to use more than 3 words :-).

Data is usually meta-data (data about data) - what a document is about, additional information like who the author is, etc. But, it can also be the raw information - like a business vocabulary.

Linkages are the relationships between the data: the information that ties the data together and lets you infer and extrapolate.

Infrastructure is the formal languages (RDF, RDF Schema, OWL, SPARQL, ...) and the services and tools that already exist (W3C's Linked Data, Protege, Pellet, ...). Without backing services and formalisms, you have to create everything yourself, and you miss the exponential building of knowledge that comes from sharing the data.
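To make the three words a little more concrete, here is a minimal Python sketch using plain tuples (no RDF library). It is an illustration under stated assumptions: the `ex:` names are hypothetical, and the single rule shown is the RDF Schema subclass rule, standing in for the much richer reasoning that tools like Pellet provide. The data is a few facts about a document, the linkage is the `rdfs:subClassOf` chain, and the "infrastructure" is the shared rule that lets anyone derive the same new facts.

```python
# Hypothetical sketch: data as triples, linkages as properties, and one
# RDFS-style rule standing in for the shared formalisms of the infrastructure.
# The ex: names are invented for illustration, not a real vocabulary.

Triple = tuple[str, str, str]


def infer_types(triples: set[Triple]) -> set[Triple]:
    """Apply the rdfs:subClassOf rule until nothing new is derived:
    if (x rdf:type C) and (C rdfs:subClassOf D), then (x rdf:type D)."""
    closure = set(triples)
    while True:
        subclass = {(c, d) for (c, p, d) in closure if p == "rdfs:subClassOf"}
        derived = {
            (x, "rdf:type", d)
            for (x, p, c) in closure
            if p == "rdf:type"
            for (c2, d) in subclass
            if c2 == c
        }
        if derived <= closure:  # fixpoint reached: no new facts
            return closure
        closure |= derived


facts = {
    ("ex:report42", "ex:author", "ex:alice"),          # data (meta-data)
    ("ex:report42", "rdf:type", "ex:QuarterlyReport"),
    ("ex:QuarterlyReport", "rdfs:subClassOf", "ex:FinancialReport"),  # linkage
    ("ex:FinancialReport", "rdfs:subClassOf", "ex:Document"),
}

inferred = infer_types(facts)
print(sorted(t for t in inferred if t[1] == "rdf:type"))
# [('ex:report42', 'rdf:type', 'ex:Document'),
#  ('ex:report42', 'rdf:type', 'ex:FinancialReport'),
#  ('ex:report42', 'rdf:type', 'ex:QuarterlyReport')]
```

Nothing in the input says the report is a `Document`; that fact comes from following the linkages under a rule everyone agrees on.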

That's it. Let me know if you agree with my 3 words or have different ones.

Friday, May 29, 2009

Continuing on the topic of the Web of Data (aka Linked Data)

A lot is being published about Linked Data. I just saw that the Spring 2009 PricewaterhouseCoopers technology forecast is full of data Web and Semantic web coolness. But before I jump into the forecast, I would like to give some background on the Linked Data work that is happening in the industry today.

Linking Open Data (LOD) is a W3C project. According to their web site, "The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources. RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications. ... Collectively, the data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009)."
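The idea of navigating from a data item in one source to related items in another can be sketched in a few lines of Python. This is a toy, not how a real Semantic Web browser or crawler works: the two "sources" are in-memory sets of triples, every URI under the `example` domains is made up, and `owl:sameAs` is assumed here as the kind of RDF link that says two URIs name the same thing.

```python
from collections import deque

# Two hypothetical data sources; every URI and value here is invented.
SOURCES = {
    "source_a": {
        ("http://a.example/city/townA", "http://a.example/prop/mayor", "Jane Doe"),
        ("http://a.example/city/townA", "owl:sameAs", "http://b.example/place/42"),
    },
    "source_b": {
        ("http://b.example/place/42", "http://b.example/prop/population", "30000"),
    },
}


def follow_links(start, sources):
    """Collect triples about `start` from all sources, following owl:sameAs
    links in either direction so that data published elsewhere is reached."""
    all_triples = set().union(*sources.values())
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        uri = queue.popleft()
        for s, p, o in all_triples:
            if s == uri:
                found.add((s, p, o))
            if p == "owl:sameAs" and uri in (s, o):
                other = o if s == uri else s
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return found


for triple in sorted(follow_links("http://a.example/city/townA", SOURCES)):
    print(triple)
```

Starting from either URI yields the same three triples, because the link is followed in both directions.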

Here is the LOD figure showing what is linked today (actually, as of March 2009):

[Figure: Linking Open Data data sets and the links between them, March 2009]

Just to get a feel for what is included ... let me note that DBpedia (the bigger circle in the left center of the image) provides structured access to Wikipedia's human-oriented data (specifically, through a SPARQL interface). According to DBpedia's web site, "The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolve[s] as Wikipedia changes, and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management, over Web search to revolutionizing Wikipedia search."
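To show what a query like "Give me all cities in New Jersey with more than 10,000 inhabitants" means over RDF triples, here is a hedged Python sketch of SPARQL-style basic graph pattern matching followed by a filter. It does not talk to DBpedia: the `ex:` predicates and the three towns and their populations are hypothetical stand-ins, not DBpedia's actual vocabulary or data.

```python
# Hypothetical toy data standing in for a knowledge base like DBpedia.
DATA = {
    ("ex:TownA", "rdf:type", "ex:City"),
    ("ex:TownA", "ex:locatedIn", "ex:NewJersey"),
    ("ex:TownA", "ex:population", 25000),
    ("ex:TownB", "rdf:type", "ex:City"),
    ("ex:TownB", "ex:locatedIn", "ex:NewJersey"),
    ("ex:TownB", "ex:population", 8000),
    ("ex:TownC", "rdf:type", "ex:City"),
    ("ex:TownC", "ex:locatedIn", "ex:Elsewhere"),
    ("ex:TownC", "ex:population", 50000),
}


def _unify(term, value, binding):
    """Bind or check one pattern term against one triple value (mutates binding)."""
    if isinstance(term, str) and term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying every (s, p, o) pattern.
    Pattern terms starting with '?' are variables; others must match exactly."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(_unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)


# Roughly equivalent SPARQL (prefix declarations omitted):
#   SELECT ?city WHERE {
#     ?city rdf:type ex:City ; ex:locatedIn ex:NewJersey ; ex:population ?pop .
#     FILTER (?pop > 10000)
#   }
QUERY = [
    ("?city", "rdf:type", "ex:City"),
    ("?city", "ex:locatedIn", "ex:NewJersey"),
    ("?city", "ex:population", "?pop"),
]
cities = sorted(b["?city"] for b in match(QUERY, DATA) if b["?pop"] > 10_000)
print(cities)  # ['ex:TownA']
```

The point is that the question is answered by joining facts through shared variables, not by a query someone pre-programmed into a web page.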

Going back to Tim Berners-Lee's request that we imagine what it would be like to have people load and connect knowledge, let's imagine what all this data can do for a business and its decision-making processes ...

Tuesday, May 26, 2009

Web 3.0 and the Web of Data

Web 3.0 is coming up (a lot) in posts on Read-Write Web and elsewhere. One Read-Write Web post (The Web of Data, written by Alexander Korth in April of this year) discussed three aspects of the next web (Web 3.0): "In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable ...".

Tim Berners-Lee focused on the Web of Data in his TED talk on the next Web (recorded in February 2009). The talk is only a little over 15 minutes long, and I highly recommend it. The key points are that we are now moving from a document-centric approach to storing information to making raw data available and processable. That raw data is "linked data": data about things (identified by URIs), including other interesting information (as RDF triples), that highlights the relationships between the things. It is important to note that this is not about making data available through specific APIs or anticipated/pre-programmed queries on a "pretty" web site, but about making the "unadulterated data" available for machine understanding and new uses. It is about sharing and adding to data, making connections and relationships in novel ways, and bridging disciplines.
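As a small illustration of "data about things (identified by URIs)" with relationships between them, here is a hedged Python sketch that writes triples in the W3C N-Triples format. The `example.org` people are hypothetical; the FOAF `name` and `knows` properties come from a real, widely used vocabulary. Treating any object that starts with http:// or https:// as a link, and everything else as a string literal, is a simplifying assumption of this sketch.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical URIs
BOB = "http://example.org/people/bob"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.
    Simplification: objects starting with http:// or https:// are written as
    <URI> references (links to other things); all others as string literals."""
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            escaped = (o.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples([
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),  # a link to another thing, not a string
    (BOB, FOAF + "name", "Bob"),
]))
```

Because Bob is a URI rather than a string, anyone else can publish more triples about him and the two data sets connect.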

If you think about business and an enterprise, think about how powerful this would be - to capture knowledge, share it via social networking technologies, allow update and addition to the knowledge within the enterprise (again using the social networking tools of today), and to bridge disciplines and knowledge using the Semantic web mining and matching technologies. Overall, we improve the ability of the enterprise to capture and access its knowledge, and increase the captured knowledge. In the talk, Tim Berners-Lee asks people to imagine the "incredible resource" of "people doing their bit to produce a little bit, and it all connecting."

Just imagine ....

Monday, May 18, 2009

Lots of Interest in Wolfram|Alpha, and Some Discussion of Microsoft's EDM

Wolfram|Alpha is cool and uses great, new technology to provide question-answer query capabilities. But it still has a way to go. As Read-Write Web pointed out in their post, "the areas where Alpha exceeds are in Mathematics, Engineering, Chemistry, Physics, and the Life Sciences." What is needed is to take this technology and use it with business vocabularies and their backing databases.

To do this, you first need to capture the vocabularies (yes, I will get back to this in my postings :-) and then map them to the physical stores. Microsoft's EDM (Entity Data Model) and Entity Framework are a start in enabling the mappings. They allow you to define a conceptual model and a physical model, and then map between the two. However, they don't help you create the conceptual or physical models, are not focused on conceptual modeling, and are too focused on the physical structure of the data store. Specifically, some of the ideal mappings are not possible (at least the last time that I tried), and not all of the data and meta-data that I would like to capture about the conceptual model can be expressed (without extensions). But they exist, are usable today, and will definitely be improved.

Another cool thing is that EDM and the framework allow you to write queries against the conceptual model that are then translated to the physical one and run against the store. Pretty neat. Now, let's put a better query capability up front (like Wolfram|Alpha) ...
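The general idea of a conceptual-to-physical mapping, with a conceptual query translated into a query against the store, can be sketched in plain Python. This is emphatically not the Entity Framework API (which is .NET); the entity, table, and column names are hypothetical, and the "conceptual query" is just a list of properties plus equality filters.

```python
from dataclasses import dataclass


@dataclass
class EntityMapping:
    """Maps one conceptual entity to one physical table (hypothetical names)."""
    table: str
    columns: dict[str, str]  # conceptual property -> physical column


MAPPINGS = {
    "Customer": EntityMapping(
        table="TBL_CUST_01",
        columns={"Name": "CUST_NM", "City": "ADDR_CITY_TXT"},
    ),
}


def translate(entity, select, where):
    """Translate a conceptual query (entity, properties, equality filters)
    into parameterized SQL against the mapped physical table."""
    m = MAPPINGS[entity]
    sql = f"SELECT {', '.join(m.columns[p] for p in select)} FROM {m.table}"
    params = []
    if where:
        conditions = [f"{m.columns[p]} = ?" for p in where]
        sql += " WHERE " + " AND ".join(conditions)
        params = list(where.values())
    return sql, params


sql, params = translate("Customer", ["Name"], {"City": "Seattle"})
print(sql)     # SELECT CUST_NM FROM TBL_CUST_01 WHERE ADDR_CITY_TXT = ?
print(params)  # ['Seattle']
```

The person asking the question only sees `Customer`, `Name`, and `City`; the cryptic physical names live in the mapping. Values are passed as `?` parameters rather than pasted into the SQL text.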

Monday, May 11, 2009

Going to School - Knowledge Management Style

In May 2001, Michael Earl wrote about three main categories and seven schools of knowledge management. His article was published in the Journal of Management Information Systems (Vol 18, Issue 1).

The three categories for capturing and sharing knowledge are:
  • Technocratic - involved with tooling and the use of technology for knowledge management
  • Economic - relating knowledge and income
  • Behavioral - dealing with how to organize to facilitate knowledge capture and exchange
Earl pointed out that, because these categories address such different aspects, they are not mutually exclusive and can be used in conjunction. In fact, doing so should better enable overall knowledge capture and use.

Within each of the categories, Earl posited that there are "schools" or focuses for knowledge management. Earl's seven schools are listed below (with some short descriptions):
  • Systems - Part of the technocratic category, focusing on the use of technology and the storing of explicit knowledge in databases and various systems and repositories. The knowledge is typically organized by domain.
  • Cartographic - Part of the technocratic category, focusing on who the "experts" in a company are and how to find and contact them. So, instead of explicitly captured knowledge, the tacit knowledge held by individuals is paramount.
  • Engineering - Part of the technocratic category, focusing on capturing and sharing knowledge for process improvement. In addition, the details and outputs of various processes and knowledge flows are captured. The knowledge in this school is organized by activities with the goal of business process improvement.
  • Commercial - This is the only "economic" school and focuses on knowledge as a commercial asset. The emphasis is on income, which can be achieved in various ways ... such as limiting access to knowledge, based on payments or other exchanges, or rigorously managing a company's intellectual portfolio (individual know-how, patents, trademarks, etc.).
  • Organizational - Part of the behavioral category, focusing on building and enabling knowledge-sharing networks and communities of practice, for some business purpose. Earl defines it as a behavioral school "because the essential feature of communities is that they exchange and share knowledge interactively, often in nonroutine, personal, and unstructured ways". For those not familiar with the term "community of practice", it is defined by Etienne Wenger as “groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.”
  • Spatial - Part of the behavioral category, focusing on how space is used to facilitate socialization and the exchange of knowledge. This can be achieved by how office buildings are arranged, co-locating individuals working on the same project, etc.
  • Strategic - Part of the behavioral category, focusing on knowledge (according to Earl) as "the essence of a firm's strategy ... The aim is to build, nurture, and fully exploit knowledge assets through systems, processes, and people and convert them into value as knowledge-based products and services." It may seem that the strategic school rolls all the others into it, and it does. But what distinguishes it, again according to Earl, "is that knowledge or intellectual capital are viewed as the key resource."
My personal focus is the strategic school, but with less interest in the spatial component and more in the systems aspects ... I believe that good collaboration needs to be (and can be) enabled, regardless of the physical environment or physical distances separating teams.

And how do you do this? By capturing, publishing, and mapping each business group's/community's vocabularies (ontologies) and processes, and by understanding that community's organizational structure.
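Mapping one community's vocabulary onto another's can be sketched with the SKOS mapping properties `skos:exactMatch` (symmetric and transitive) and `skos:closeMatch` (symmetric but not transitive). The `sales:`, `finance:`, and `support:` terms below are hypothetical, and this is just one possible representation, not a method Earl describes.

```python
from collections import defaultdict, deque

# Hypothetical mappings between three business communities' vocabularies.
MAPPINGS = [
    ("sales:Client", "skos:exactMatch", "finance:Customer"),
    ("sales:Deal", "skos:closeMatch", "finance:Contract"),
    ("finance:Customer", "skos:exactMatch", "support:AccountHolder"),
]


def equivalents(term, mappings):
    """Return terms reachable from `term` via exactMatch (followed
    transitively, both directions) and direct closeMatch neighbors only."""
    exact, close = defaultdict(set), defaultdict(set)
    for s, p, o in mappings:
        if p == "skos:exactMatch":
            exact[s].add(o)
            exact[o].add(s)
        elif p == "skos:closeMatch":
            close[s].add(o)
            close[o].add(s)
    seen, queue = {term}, deque([term])
    while queue:
        for nxt in exact[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"exact": seen - {term}, "close": set(close[term])}


print(equivalents("sales:Client", MAPPINGS))
```

A sales "Client" can thus be traced to the support group's "AccountHolder" even though nobody mapped those two terms directly, while a merely close match is deliberately not chained further.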

Tuesday, May 5, 2009

Organizing Knowledge

I thought that I would examine what other writers think of "organizing knowledge", since I chose this for the title of my blog.

My passion for this title comes from the need to meld business knowledge with IT infrastructure - organizing the business' inherent and (usually) implicit knowledge by first capturing it and then making it usable, accessible and actionable (within the IT infrastructure). There is another aspect to this also - taking lots of information (already in the IT infrastructure) and organizing it to turn it into knowledge (not just bits of data).

Given these two goals, you will find lots of postings about ontologies, business processes, the semantic web, and similar topics in this blog. (You will also occasionally find some riffs on digital natives and education, since these are of particular interest to me.) I will not repeat postings from my earlier blog (written while I was at Microsoft). You can read these yourself at http://blogs.msdn.com/policy_based_business.

Well, back to what others think about "organizing knowledge". Most of the work in this space is related to organizing and cataloging library materials, since libraries were the main repository of knowledge, and books the main format, up until this digital age. This has now all changed. The need to catalog and classify books, using a single scheme, in order to find a particular book on a particular shelf in a physical library building is no longer a primary driver. One could argue that it is not even an appropriate driver in a fast-paced, online business environment. (However, I must confess to a passion for reading real, physical books, away from the electronic distractions of today's environments.)

Libraries had a need for a single, driving organizational scheme since they often only had a few copies of a book and could not have them scattered across many shelves, classified in different ways. Now, multiple classifications/organization schemes can exist and cross-reference each other.
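The contrast between one shelf order and many cross-referencing schemes can be sketched as one inverted index per classification scheme. In this hypothetical Python example (the item ids, schemes, and labels are invented), the same item is filed under several labels at once, and a lookup intersects results across schemes, something a single physical shelf order cannot do.

```python
from collections import defaultdict

# Hypothetical catalog: each item carries labels from several schemes at once.
CATALOG = {
    "item:A": {"subject": ["Taxonomies"], "audience": ["Managers"]},
    "item:B": {"subject": ["Taxonomies", "Cataloging"], "audience": ["Librarians"]},
    "item:C": {"subject": ["Cataloging"], "audience": ["Managers", "Librarians"]},
}


def build_indexes(catalog):
    """Build one inverted index per scheme: scheme -> label -> item ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for item, schemes in catalog.items():
        for scheme, labels in schemes.items():
            for label in labels:
                indexes[scheme][label].add(item)
    return indexes


def find(indexes, **criteria):
    """Items matching every scheme=label criterion (intersection across schemes)."""
    sets = [indexes[scheme][label] for scheme, label in criteria.items()]
    return set.intersection(*sets) if sets else set()


INDEXES = build_indexes(CATALOG)
print(find(INDEXES, subject="Taxonomies", audience="Managers"))  # {'item:A'}
```

Adding another scheme later (say, by project or by business unit) only means adding labels to the items; nothing has to be "re-shelved".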

Where knowledge extraction used to be all manual (someone had to read the books, examine the world, organize and build on the knowledge, and draw new insights and conclusions), we now have tons of data stored on our computers and on the Web, plus the help of the semantic web and description logic reasoners. Notice that I said "the help of the semantic web": it still requires a person to classify, organize and query knowledge in valid ways, and to interpret the results. I am not close to advocating for (or finding) the HAL computer from 2001: A Space Odyssey. :-)

So, back to what others think about "organizing knowledge". I did a search on "organizing knowledge" on Amazon. Here is what I found (all quotations are from the editorial reviews on Amazon):
  • Organizing Knowledge (Jennifer Rowley and Richard Hartley) - "Incorporates extensive revisions reflecting the increasing shift towards a networked and digital information environment, and its impact on documents, information, knowledge, users and managers ... [offers] a broad-based overview of the approaches and tools used in the structuring and dissemination of knowledge".
  • Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness (Patrick Lambe) - Defines and discusses various taxonomic forms and how these "can help organizations to leverage and articulate their knowledge"
  • The Organization of Information (Arlene Taylor) - "Provides a detailed and insightful discussion of such basic retrieval tools as bibliographies, catalogs, indexes, finding aids, registers, databases, major bibliographic utilities, and other organizing entities"
  • The Intellectual Foundation of Information Organization (Elaine Svenonius) - Analyzes the foundations of information organization, and then presents three bibliographic languages: work languages, document languages, and subject languages. From the review, "The effectiveness of a system for accessing information is a direct function of the intelligence put into organizing it."
  • Organizing Business Knowledge (Thomas Malone) - "Proposes a set of fundamental concepts to guide analysis and a classification framework for organizing knowledge, and describes the publicly available online knowledge base developed by the project, which includes a set of representative templates and specific case examples as well as a set of software tools for organizing and sharing knowledge"

As you can see, there is some interesting material out there, and some mundane stuff. I have ordered several of the books listed above and will report on them in future posts on this blog. Hopefully, the information will be of help to all of us.