Sunday, October 19, 2014

Breaking Down the "Documents and Policies" Project - Competency Questions

Our previous post defined a project for which a set of ontologies is needed ... "What access and handling policies are in effect for a document?" So, let's just jump into it!

The first step is always to understand the full scope of work while still being able to focus your development activities. Define what is needed both initially (to establish your work and ontologies) and ultimately (at the end of the project). Determine how to develop the ontologies, in increments, to reach the "ultimate" solution. Each increment should improve or expand your design, taking care never to go too far in one step (one development cycle). This is really an agile approach and translates to developing, testing, iterating until things are correct, and then expanding. Assume that your initial solutions will need to be improved and reworked as your development activities progress. Don't be afraid to find and correct design errors. But ... your development should always be driven by detailed use cases and (corresponding) competency questions.

Competency questions were discussed in an earlier post, "General, Reusable Metadata Ontology - V0.2". (They are the questions that your ontology should be able to answer.) Let's assume that you and your customer define the following top-level questions:
  • What documents are in my repositories?
  • What documents are protected or affected by policies?
  • What documents are not protected or affected by policies? (I.e., what are the holes?)
  • What policies are defined?
  • What are the types of those policies (e.g., access or handling/digital rights)?
  • What are the details of a specific policy?
  • Who was the author of a specific policy?
  • List all documents that are protected by multiple access control policies. And, list the policies by document.
  • List all documents that are affected by multiple handling/digital rights policies. And, list the policies by document.
These questions should lead you to ask other questions, trying to determine the boundaries of the complete problem. Remember that it is unlikely that the customer's needs will be addressed in a single set of development activities. (And, work will hopefully expand with your successes!) Often, a customer has deeper (or maybe different) questions that they have not yet begun to define. Asking questions and working with your customer can begin to tease these apart. Even if the customer does not want to go further at this time, it is valuable to understand where and how the ontologies may need to be expanded. Always take care to leave room to expand your ontologies to address new use cases and semantics.

This brings us back to "General Systems Thinking". It is important to understand a system, its parts and its boundaries.

Here are some follow-on questions (and their answers) that the competency questions could generate:
  • Q: Given that you have document repositories, how are the documents identified and tagged?
    • A: A subset of the Dublin Core information is collected for each document: author, modified-by, title, creation date, date last modified, keywords, proprietary/non-proprietary flag, and description.
  • Q: How are the documents related to policies?
    • A: Policies apply to documents based on a combination of their metadata.
  • Q: Will we ever care about parts of documents, or do we only care about the documents as a whole?
    • A: We may ultimately want to apply policies to parts of documents, or subset a document based on its contents and provide access to its parts. But, this is a future enhancement.
  • Q: Do policies change over time (for example, becoming obsolete)?
    • A: Yes, we will have to worry about policy evolution and track that.
  • Q: What policy repositories do you have?
    • A: Policies are defined in code and in some specific content management systems. The goal is to collect the details related to all the documents and all the policies in order to guarantee consistency and remove/reduce conflicts.
  • Q: Given the last 2 competency questions, and your goal of removing/reducing conflicts, would you ultimately like the system to find inconsistencies and conflicts? How about making recommendations to correct these?
    • A: Yes! (We will need to dig into this further at a later time in order to define conflicts and remediation schemes.)
Well, we now know more about the ontologies that we will be creating. Initially, we are concerned with document identification/location/metadata and related access and digital rights policies. We can then move on to the provenance and evolution of documents and policies, and to understanding conflicts and their remediation.
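To make the competency questions a bit more concrete, here is a minimal sketch in plain Python. It is not an ontology or a triple store, just a toy in-memory model showing that each question can be phrased as a query that has a checkable answer. Everything in it is an assumption made for illustration: the class names, the field names (a small subset of the metadata listed above), the sample documents and policies, and the idea of expressing "policies apply based on metadata" as a simple predicate function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class Document:
    # Hypothetical subset of the Dublin Core-style metadata from the answers above.
    doc_id: str
    title: str
    author: str
    keywords: FrozenSet[str] = frozenset()
    proprietary: bool = False


@dataclass
class Policy:
    policy_id: str
    kind: str                               # "access" or "handling"
    author: str
    applies_to: Callable[[Document], bool]  # assumption: a predicate over metadata


def policies_for(doc: Document, policies: List[Policy],
                 kind: Optional[str] = None) -> List[Policy]:
    """What policies protect or affect this document (optionally of one kind)?"""
    return [p for p in policies
            if (kind is None or p.kind == kind) and p.applies_to(doc)]


def unprotected(documents: List[Document], policies: List[Policy]) -> List[Document]:
    """What documents are not covered by any policy (the holes)?"""
    return [d for d in documents if not policies_for(d, policies)]


def multiply_covered(documents: List[Document], policies: List[Policy],
                     kind: str) -> Dict[str, List[str]]:
    """Which documents fall under more than one policy of a kind, listed by document?"""
    result = {}
    for d in documents:
        matched = policies_for(d, policies, kind)
        if len(matched) > 1:
            result[d.doc_id] = [p.policy_id for p in matched]
    return result


# Made-up sample data, purely for illustration.
DOCUMENTS = [
    Document("d1", "Design Spec", "alice", frozenset({"design"}), proprietary=True),
    Document("d2", "Press Release", "bob", frozenset({"marketing"})),
    Document("d3", "Lunch Menu", "carol"),
]
POLICIES = [
    Policy("p1", "access", "dana", lambda d: d.proprietary),
    Policy("p2", "access", "dana", lambda d: "design" in d.keywords),
    Policy("p3", "handling", "erin", lambda d: "marketing" in d.keywords),
]

if __name__ == "__main__":
    print([d.doc_id for d in unprotected(DOCUMENTS, POLICIES)])
    print(multiply_covered(DOCUMENTS, POLICIES, "access"))
```

In a real solution these answers would come from queries against the ontologies and a triple store, but writing the questions down in a testable form early is one way to check each development increment against them.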

So, the next step is to flesh out the details for documents and policies. We will begin to do that in the next post.

Andrea

Monday, October 13, 2014

Understanding semantics and Pinker's "Curse of Knowledge"

I recently read an interesting editorial in the Wall Street Journal from Steven Pinker. It was titled, "The Source of Bad Writing", and discussed something that Pinker called the "Curse of Knowledge".
Curse of Knowledge: a difficulty in imagining what it is like for someone else not to know something that you know
After reading that article, looking at the various posts asking where to find good online courses on semantic technologies and linked data, discussing problems related to finding qualified job candidates, and listening to people (like my husband) who say that I make their heads explode, I decided to talk about semantics differently. Instead of explaining specific aspects of ontologies or semantics, or writing about disconnected aspects of the technologies, I want to go back to basics and explore how and what I do in creating ontologies, what to worry about, how to create, evolve and use an ontology and triple store, ...

Then, I need some feedback from my readers. As Steven Pinker says,
A ... way to exorcise the curse of knowledge is to close the loop, ... and get a feedback signal from the world of readers—that is, show a draft to some people who are similar to your intended audience and find out whether they can follow it. ... The other way to escape the curse of knowledge is to show a draft to yourself, ideally after enough time has passed that the text is no longer familiar. If you are like me you will find yourself thinking, "What did I mean by that?" or "How does this follow?" or, all too often, "Who wrote this crap?"
There are many good papers, books and blog posts on the languages, technologies and standards behind the Semantic Web. (Hopefully, some of my work is there.) I don't want to create yet another tutorial on these, but I do want to talk about creating and using ontologies. So, for the next 6 months or so, my goal is to design and create a set of ontologies through these blog posts - delving into existing ontologies, and semantic languages/standards and tools. In addition, as the ontologies are created, I will discuss using them - which moves us into triple stores and queries.

As we go along, I will reference specs from the W3C, other blog posts and information and tools on the web. My goal is that you can get all of the related specs, tools and details for free. I hope that you will be interested enough to scan or download them (or you might know and use them already), and ask more questions. What is important is to understand the basics, and then we can build from there.

The first question is "What is the subject of the ontology that we will be building and using?" Since I am interested in policy-based management, I would like to develop an ontology and infrastructure to answer the question: "What access and handling policies are in effect for a document?"

At first blush, you might think that the process is relatively easy. Find the document, get its details, find what policies apply, and then follow those policies. But, the policies that apply are possibly dictated by the subject or author of the document, or when it was written (since regulations and company policies change over time). Worse, the access policies are likely defined (and stored) separately from the handling/digital rights policies, but need to be considered together. Lastly, how do we even begin to understand what the policies are saying?
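As a small illustration of why this is harder than it looks, here is a hedged sketch in plain Python. It assumes (hypothetically) that access policies and handling policies live in two separate stores, that each policy covers one document subject, and that each policy has a date range during which it is in force. The store names, policy identifiers and dates are all made up. It says nothing about the hardest part, which is understanding what a policy actually means.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Policy:
    policy_id: str
    subject: str                         # document subject the policy covers
    effective_from: date
    effective_to: Optional[date] = None  # None means the policy is still in force

    def in_effect(self, subject: str, on: date) -> bool:
        """True if this policy covers the subject on the given date."""
        if subject != self.subject or on < self.effective_from:
            return False
        return self.effective_to is None or on <= self.effective_to


# Two separate (hypothetical) stores, as they might exist in different systems.
ACCESS_STORE = [
    Policy("access-2012", "finance", date(2012, 1, 1), date(2013, 12, 31)),
    Policy("access-2014", "finance", date(2014, 1, 1)),
]
HANDLING_STORE = [
    Policy("handling-1", "finance", date(2013, 6, 1)),
]


def policies_in_effect(subject: str, written_on: date) -> Dict[str, List[str]]:
    """Consult both stores together for a document's subject and writing date."""
    return {
        "access": [p.policy_id for p in ACCESS_STORE
                   if p.in_effect(subject, written_on)],
        "handling": [p.policy_id for p in HANDLING_STORE
                     if p.in_effect(subject, written_on)],
    }


if __name__ == "__main__":
    print(policies_in_effect("finance", date(2013, 3, 1)))
    print(policies_in_effect("finance", date(2014, 10, 13)))
```

Even in this toy form, the answer changes with the date the document was written, and a correct answer requires looking in both stores.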

I hope that you see that I did not choose an easy subject at all, but one that will take some time to think through and develop. I am looking forward to doing this and would like your feedback, questions, comments and advice, along the way.

Andrea