Friday, November 28, 2014

More links to document-related ontologies

Over the course of the last few weeks, a few people have emailed me additional references for document-related ontologies. Both are valuable links that I want to use in my solution, so I need to expand the references from my previous post to add them:
  • SALT (Semantically Annotated LaTeX for Scientific Publications) Document Ontology
    • SALT is described in this paper, but unfortunately none of the associated/referenced ontologies are still available
  • FRBRoo
    • A harmonization of the FRBR model (an entity-relationship model from the International Federation of Library Associations and Institutions, published in 1998) and the CIDOC CRM model (ISO Standard 21127, developed by ICOM-CIDOC, the International Council for Museums – International Committee on Documentation)
    • Per the documentation for Version 2 of the model, FRBRoo is "a formal ontology that captures and represents the underlying semantics of bibliographic information and therefore facilitates the integration, mediation, and interchange of bibliographic and museum information"
    • Also, there is an "owlified" version available
Given these new insights, I have a bit more work to do on my solution.


Monday, November 3, 2014

Document-related ontologies

In my previous post, I defined the competency questions and initial focus for an ontology development effort. The goal of that effort is to answer the question, "What access and handling policies are in effect for a document?"

A relatively (and judging by the length of this post, I do mean "relatively"!) easy place to start is by creating the document-related ontology(ies). (Remember that I am explicitly walking through all the steps in my development process and not just throwing out an answer. At this time, I don't know what the complete answer is!)

Unless your background is content management, or you are a metadata expert, the first step is to learn the basic principles and concepts of the domain being modeled. This helps to define the ontologies and also establishes a base set of knowledge so that you can talk with your customer and their subject-matter experts (SMEs). Never assume that you are the first person to model a domain, or that you inherently know the concepts because they are "obvious". (Unless you invented the domain, you are not the first person to model it. Unless you work in the domain, you don't really know it!)

Beyond just learning about the domain, there are additional advantages to understanding the basic principles and concepts, and looking at previous work ... First, you don't want to waste the experts' time. That time is valuable and often limited, and the more of it you waste, the less the experts want to talk to you. Second, you need to understand the basics since these are sometimes so obvious to the experts that they consider them "implicit"; i.e., they don't say anything about the basics and their assumptions, and eventually you get confused or lost (because you don't have the necessary background, or make your own assumptions in real-time, in the conversations). Third, it is valuable to know where mistakes might have been made, or where models were created that seem "wrong" to you. It is also valuable to know where there are differences of opinion in a domain - and on which side of a debate your experts land. Understanding boundary cases, and maybe accounting for multiple solutions, may make the difference between your ontology succeeding or failing.

Background knowledge can come from many places. But, I usually start with Google, Bing, Yahoo, etc. (given your personal preference). I type in various phrases and then follow the links. Here are some of the phrases that I started with, for the "documents" space:
  • Dublin Core (since that was specifically mentioned in the competency questions)
  • Document metadata
  • Document management system
  • Document ontology (since there may be a complete ontology ready to adapt or directly reuse)
Clearly this is just a starting list, since each link leads to others. It is valuable to review any Wikipedia links that come up (as they usually provide a level-set). Especially, pay attention to standards. Then dig a bit deeper, looking at academic and business articles, papers and whitepapers. You can do this with a search engine and by checking your company's, or an organization's (such as IEEE and ACM), digital library.

Here is where my initial investigations took me. You can also take a look at the metadata ontology that I developed from Dublin Core and SKOS, and discussed in earlier posts.

As for the RDF and ontologies, I don't want to take them "as-is" and just put them together. I first want to quickly review them, as well as the ideas from relevant other references (such as OASIS's ODF). Then, we can begin to define a base ontology. It is important to always keep our immediate goals in focus (which are mostly related to document metadata), but also have an idea of probable (or possible) extensions to the ontologies.

When creating my ontologies, I usually (always?) end up taking piece-parts and reusing concepts from multiple sources. The parts can be imported and rationalized via an integrating ontology, or are cut and pasted from the different sources into a new ontology. There are advantages and disadvantages to each approach.

When importing the original ontologies and integrating them (especially when using a tool like Protege), you end up with a large number of classes and properties, with (hopefully) many duplicates or (worst case) many subtle differences in semantics. This can be difficult to manage and sort through, and it takes time to get a good understanding of the individual model/ontology semantics. Another problem with this approach is that the ontologies sometimes evolve. If this happens, URLs may change and your imports could break. Or, you may end up referencing a concept that was renamed or no longer exists. Ideally, when an ontology is published, a link is maintained to the individual versions, but this does not always happen. I usually take the latest version of an ontology or model, and save it to a local directory, maintaining the link to the source and also noting the version (for provenance).
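One lightweight way to capture that provenance is to annotate the local copy in its ontology header. Here is a minimal sketch in Turtle; all the IRIs are illustrative placeholders, not real published locations:

```turtle
# Ontology header for a locally saved copy of an external ontology.
# The IRIs below are invented for illustration.
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/local/frbroo> a owl:Ontology ;
    owl:versionInfo "Version 2.0, saved 2014-11-28" ;   # version of the saved copy
    rdfs:seeAlso <http://example.org/original/frbroo> . # link back to the source
```

Keeping these annotations in the file itself means the provenance travels with the copy, even if the local directory is later moved or shared.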

Cutting and pasting the various piece parts of different ontologies makes it easier to initially create and control your ontology. The downside is that you sometimes lose the origins and provenance of the piece parts, and/or lose the ability to expand into new areas of the original ontologies. The latter may happen because those ontologies are not "in front" of you ("out of sight, out of mind") or because you have deviated too far from the original semantics and structure.

In my next posts, I will continue to discuss a design for the document-related ontologies (focusing on the immediate needs to reflect the Dublin Core metadata and the existence/location of the documents). In the meantime, let me know if I missed any valuable references, or if you have other ideas for the ontologies.


Sunday, October 19, 2014

Breaking Down the "Documents and Policies" Project - Competency Questions

Our previous post defined a project for which a set of ontologies is needed ... "What access and handling policies are in effect for a document?" So, let's just jump into it!

The first step is always to understand the full scope of work and yet to be able to focus your development activities. Define what is needed both initially (to establish your work and ontologies) and ultimately (at the end of the project). Determine how to develop the ontologies, in increments, to reach the "ultimate" solution. Each increment should improve or expand your design, taking care to never go too far in one step (one development cycle). This is really an agile approach and translates to developing, testing, iterating until things are correct, and then expanding. Assume that your initial solutions will need to be improved and reworked as your development activities progress. Don't be afraid to find and correct design errors. But ... Your development should always be driven by detailed use cases and (corresponding) competency questions.

Competency questions were discussed in an earlier post, "General, Reusable Metadata Ontology - V0.2". (They are the questions that your ontology should be able to answer.) Let's assume that you and your customer define the following top-level questions:
  • What documents are in my repositories?
  • What documents are protected or affected by policies?
  • What documents are not protected or affected by policies? (I.e., where are the holes?)
  • What policies are defined?
  • What are the types of those policies (e.g., access or handling/digital rights)?
  • What are the details of a specific policy?
  • Who was the author of a specific policy?
  • List all documents that are protected by multiple access control policies. And, list the policies by document.
  • List all documents that are affected by multiple handling/digital rights policies. And, list the policies by document.
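To make one of these concrete, a competency question like "what documents are not protected by any policy?" might eventually be validated with a SPARQL query along these lines. The class and property names here are illustrative placeholders, not part of a finished ontology:

```sparql
# Competency question: What documents are not protected or affected by policies?
PREFIX ex: <http://example.org/docs-and-policies#>

SELECT ?doc
WHERE {
  ?doc a ex:Document .
  FILTER NOT EXISTS { ?policy ex:appliesTo ?doc }
}
```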
These questions should lead you to ask other questions, trying to determine the boundaries of the complete problem. Remember that it is unlikely that the customer's needs will be addressed in a single set of development activities. (And, work will hopefully expand with your successes!) Often, a customer has deeper (or maybe different) questions that they have not yet begun to define. Asking questions and working with your customer can begin to tease this apart. Even if the customer does not want to go further at this time, it is valuable to understand where and how the ontologies may need to be expanded. Always take care to leave room to expand your ontologies to address new use cases and semantics.

This brings us back to "General Systems Thinking". It is important to understand a system, its parts and its boundaries.

Here are some follow-on questions (and their answers) that the competency questions could generate:
  • Q: Given that you have document repositories, how are the documents identified and tagged?
    • A: A subset of the Dublin Core information is collected for each document: Author, modified-by, title, creation date, date last modified, keywords, proprietary/non-proprietary flag, and description.
  • Q: How are the documents related to policies?
    • A: Policies apply to documents based on a combination of their metadata.
  • Q: Will we ever care about parts of documents, or do we only care about the documents as a whole?
    • A: We may ultimately want to apply policies to parts of documents, or subset a document based on its contents and provide access to its parts. But, this is a future enhancement.
  • Q: Do policies change over time (for example, becoming obsolete)?
    • A: Yes, we will have to worry about policy evolution and track that.
  • Q: What policy repositories do you have?
    • A: Policies are defined in code and in some specific content management systems. The goal is to collect the details related to all the documents and all the policies in order to guarantee consistency and remove/reduce conflicts.
  • Q: Given the last 2 competency questions, and your goal of removing/reducing conflicts, would you ultimately like the system to find inconsistencies and conflicts? How about making recommendations to correct these?
    • A: Yes! (We will need to dig into this further at a later time in order to define conflicts and remediation schemes.)
Well, we now know more about the ontologies that we will be creating. Initially, we are concerned with document identification/location/metadata and related access and digital rights policies. We can then move on to the provenance and evolution of documents and policies, and understanding conflicts and their remediation.

So, the next step is to flesh out the details for documents and policies. We will begin to do that in the next post.


Monday, October 13, 2014

Understanding semantics and Pinker's "Curse of Knowledge"

I recently read an interesting editorial in the Wall Street Journal from Steven Pinker. It was titled, "The Source of Bad Writing", and discussed something that Pinker called the "Curse of Knowledge".
Curse of Knowledge: a difficulty in imagining what it is like for someone else not to know something that you know
After reading that article, looking at the various posts asking where to find good online courses on semantic technologies and linked data, discussing problems related to finding qualified job candidates, and listening to people (like my husband) who say that I make their heads explode, I decided to talk about semantics differently. Instead of explaining specific aspects of ontologies or semantics, or writing about disconnected aspects of the technologies, I want to go back to basics and explore how and what I do in creating ontologies, what to worry about, how to create, evolve and use an ontology and triple store, ...

Then, I need some feedback from my readers. As Steven Pinker says,
A ... way to exorcise the curse of knowledge is to close the loop, ... and get a feedback signal from the world of readers—that is, show a draft to some people who are similar to your intended audience and find out whether they can follow it. ... The other way to escape the curse of knowledge is to show a draft to yourself, ideally after enough time has passed that the text is no longer familiar. If you are like me you will find yourself thinking, "What did I mean by that?" or "How does this follow?" or, all too often, "Who wrote this crap?"
There are many good papers, books and blog posts on the languages, technologies and standards behind the Semantic Web. (Hopefully, some of my work is there.) I don't want to create yet another tutorial on these, but I do want to talk about creating and using ontologies. So, for the next 6 months or so, my goal is to design and create a set of ontologies through these blog posts - delving into existing ontologies, and semantic languages/standards and tools. In addition, as the ontologies are created, I will discuss using them - which moves us into triple stores and queries.

As we go along, I will reference specs from the W3C, other blog posts and information and tools on the web. My goal is that you can get all of the related specs, tools and details for free. I hope that you will be interested enough to scan or download them (or you might know and use them already), and ask more questions. What is important is to understand the basics, and then we can build from there.

The first question is "What is the subject of the ontology that we will be building and using?" Since I am interested in policy-based management, I would like to develop an ontology and infrastructure to answer the question: "What access and handling policies are in effect for a document?"

At first blush, you might think that the process is relatively easy. Find the document, get its details, find what policies apply, and then follow those policies. But, the policies that apply are possibly dictated by the subject or author of the document, or when it was written (since regulations and company policies change over time). Worse, the access policies are likely defined (and stored) separately from the handling/digital rights policies, but need to be considered together. Lastly, how do we even begin to understand what the policies are saying?

I hope that you see that I did not choose an easy subject at all, but one that will take some time to think through and develop. I am looking forward to doing this and would like your feedback, questions, comments and advice, along the way.


Saturday, July 26, 2014

Another OWL diagramming transform and some more thoughts on writing

With summer in full-tilt and lots going on, I seem to have lost track of time and been delinquent in publishing new posts. I want to get back into writing slowly ... with a small post that builds on two of my previous ones.

First, I wrote a new XSL transform that outputs all NamedIndividuals specified in an ontology file. The purpose was to help with diagramming enumerations. (I made a simplifying assumption that you added individuals into a .owl file in order to create enumerated or exemplary individuals.) The transform is available in my GitHub repository, and details on how to use it (for example, with the graphical editor, yEd) are described in my post, Diagramming an RDF/XML ontology.

If you don't want some individuals included, feel free to refine the transform, or just delete individuals after an initial layout with yEd.

Second, here are some more writing tips that build on the post, Words and writing .... Most of these I learned in high school (a very long time ago), as editor of the school paper. (And, yes, I still use them today.)
  • My teacher taught us to vary the first letter of each paragraph, and start the paragraphs with interesting words (e.g., not "the", "this", "a", ...). Her point was that people got an impression of the article from glancing at the page, and the first words of the paragraphs made the most impression. If the words were boring, then the article was boring. I don't know if this is true, but it seems like a reasonable thing.
  • Another good practice is to make sure your paragraphs are relatively short, so as not to seem overwhelming. (I try to keep my paragraphs under 5-6 sentences.) Also, each paragraph should have a clear focus and stick to it. It is difficult to read when the main subject of a paragraph wanders.
  • Lastly, use a good opening sentence for each paragraph. It should establish the contents of the paragraph - setting it up for more details to come in the following sentences.
You can check out more writing tips at "Hot 100 News Writing Tips".


Tuesday, May 20, 2014

Diagramming an RDF/XML OWL ontology

Over the course of time (many times, in fact), I have been asked to "graph" my ontologies to help visualize the concepts. Anyone who has worked with Protege (or the Neon Toolkit or other tools) knows that none of the tools give you all the images that you really need to document your work. I have often resorted to hand-drawing the ontologies using UML diagrams. This is both painful and a huge time sink.

Recently, I was reading some emails on the Linked Data community distribution list about how they generate the LOD cloud diagram. Omnigraffle is used in the "official" workflow to create this diagram, but that tool costs money to buy. One of the email replies discussed a different approach.

A gentleman needed to draw a similar diagram for the data cloud of the Open Linguistics Working Group. His team could not use the same code and processing flow as the LOD cloud folks, since they didn't have many Mac users. So, they developed an alternative based on GraphML. To create the basic graph, they developed a Python script. And, ...
Using yed's "organic" layout, a reasonable representation can be achieved which is then manually brought in shape with yed (positioning) and XML (font adjustment). In yed, we augment it with a legend and text and then export it into the graphic format of choice.
Given my propensity to "reuse" good ideas, I decided to investigate GraphML and yEd. And, since GraphML is XML, ontologies can be defined in RDF/XML, and XSLT can be used to transform XML definitions, I used XSLT to generate various GraphML outputs of an ontology file. Once the GraphML outputs were in place, I used yEd to do the layout, as the team did. (It is important to note that the basic yEd tool is free. And, layout is the most difficult piece of doing a graphic.)

So, what did I find? You can be the judge. The XSLTs are found in my GitHub repository. There are four files in the graphing directory:
  • AnnotationProperties.xsl - A transform of any annotation property definitions in an RDF/XML file, drawing them as rectangles connected to a central entity named "Annotation Properties".
  • ClassHierarchies.xsl - A transform of any class definitions in an RDF/XML file, drawing them in a class-superclass hierarchy.
  • ClassProperties.xsl - A transform of any data type and object property definitions in an RDF/XML file, drawing them as rectangles with their types (functional, transitive, etc.) and domains and ranges.
  • PropertyHierarchies.xsl - A transform of any data type and object property definitions in an RDF/XML file, drawing their property-super property relationships.
I executed the transforms using xsltproc. An example invocation is:
xsltproc -o result.graphml ../graphing/ClassProperties.xsl metadata-properties.owl
I then took the result.graphml and opened it in the yEd Graph Editor. (If you do the same, you will find that all the classes or properties lie on top of each other. I made no attempt to do any kind of layout since I planned to use yEd for that purpose.) For the class properties graph (from the above invocation), I used the Layout->Radial formatting, with the default settings. Here is the result:

I was impressed with how easy this was!

The really great thing is that if you don't like a layout, you can choose another format and even tweak the results. I did some tweaking for the "Property Hierarchies" diagram. In this case, I ran the PropertyHierarchies.xsl against the metadata-properties.owl file and used the Hierarchical Layout on the resulting GraphML file. Then, I selected all the data properties and moved them underneath the object properties. Here is the result:

Admittedly, the diagrams can get quite complex for a large ontology. But, you can easily change/combine/separate the XSLT transforms to include more or less content.

With about a day and half's worth of work (and using standards and free tooling), I think that I saved myself many frustrating and boring hours of diagramming. Let me know if you find this useful, or you have other suggestions for diagramming ontologies.


Words and writing ...

I came across an amazing blog entry today. I love reading and writing. When writing, I try to communicate my thoughts in a (hopefully) clear and entertaining manner. I often use dictionaries and thesauri to get ideas for new and different words, to spice up my paragraphs when they seem dull.

But, after reading the blog entry from James Somers, "You're probably using the wrong dictionary", I know that I have been fooled ("deceived; imposed upon") by my current tools.

Mr. Somers' blog discusses how Webster came to create the first dictionary, how John McPhee uses Webster's dictionary when creating his fourth draft of a work, and how dictionaries could come to inspire thought and writing. I know that you don't believe me on that last point ... so go check out The ARTFL Project (Webster's Dictionary, 1913 and 1828 editions). Enter any word that comes to mind and see what you find.

Here is my example: I entered the word "car" (trying for a word that was mundane). Here is the text from the 1828 edition ...
1. A small vehicle moved on wheels; usually, one having but two wheels and drawn by one horse; a cart.
2. A vehicle adapted to the rails of a railroad. [U. S.] In England a railroad passenger car is called a railway carriage; a freight car a goods wagon; a platform car a goods truck; a baggage car a van. But styles of car introduced into England from America are called cars; as, tram car. Pullman car. See Train.
3. A chariot of war or of triumph; a vehicle of splendor, dignity, or solemnity. [Poetic].
   The gilded car of day. Milton.
   The towering car, the sable steeds. Tennyson.
4. (Astron.) The stars also called Charles's Wain, the Great Bear, or the Dipper.
   The Pleiads, Hyads, and the Northern Car. Dryden.
5. The cage of a lift or elevator.
6. The basket, box, or cage suspended from a balloon to contain passengers, ballast, etc.
7. A floating perforated box for living fish.

[U. S.] Car coupling, or Car coupler, a shackle or other device for connecting the cars in a railway train. [U. S.] -- Dummy car (Railroad), a car containing its own steam power or locomotive. -- Freight car (Railroad), a car for the transportation of merchandise or other goods. [U. S.] -- Hand car (Railroad), a small car propelled by hand, used by railroad laborers, etc. [U. S.] -- Horse car, or Street car, an omnibus car, drawn by horses or other power upon rails laid in the streets. [U. S.] -- Palace car, Drawing-room car, Sleeping car, Parlor car, etc. (Railroad), cars especially designed and furnished for the comfort of travelers.
I was blown away! Webster's 1828 and 1913 dictionaries will become my new source of words (admittedly, not modern, but definitely poetic). Mr. Somers explains how you can download and install the 1913 edition, and use it in conjunction with your other dictionaries on your Mac, Kindle and iPad. That upgrade of my dictionaries is underway as I type.


Monday, May 12, 2014

Ontology Summit 2014 and the communique

Ontology Summit 2014 officially concluded with the symposium on April 28-29. There were some great keynotes, summary presentations and discussions. You can see most of the slides on the Day 1 and Day 2 links, and can also check out the online, unedited Day 1 chat and Day 2 chat.

The main "output" of each Ontology Summit is a communique. This year's communique is titled Semantic Web and Big Data Meets Applied Ontology, consistent with the Summit theme. Follow the previous link to get the full document, and consider endorsing it (if you are so inclined). To endorse the communique, send an email with the subject line: "I hereby confirm my endorsement of the OntologySummit2014 Communique" and include (at least) your name in the body of the email. Other remarks or feedback can also be included. And, I would encourage you to add your thoughts.

I want to provide a quick list of the high points of the communique (for me):
  • In the world of big data, ontologies can help with semantic integration and mapping, reduction of semantic mismatches, normalization of terms, and inference and insertion of metadata and other annotations.
  • Development approaches that involve a heavy-weight, complete analysis of "the world" are evolving to lighter weight approaches. This can be seen in the development of ontology design patterns, the use of ontologies in Watson, and the bottom-up annotation and interlinking approaches of web/RESTful services (as "Linked Services").
  • There are some best practices that can be applied for sharing and reuse to succeed (and since I drafted most of these best practices, I am just copying them directly below :-)):
    • Wise reuse possibilities follow from knowing your project requirements. Competency questions should be used to formulate and structure the ontology requirements, as part of an agile approach. The questions help contextualize and frame areas of potential content reuse.
    • Be tactical in your formalization. Reuse content based on your needs, represent it in a way that meets your objectives, and then consider how it might be improved and reused. Clearly document your objectives so that others understand why you made the choices that you did.
    • Small ontology design patterns provide more possibilities for reuse because they have low barriers for creation and potential applicability, and offer greater focus and cohesiveness. They are likely less dependent on the original context in which they were developed.
    • Use "integrating" modules to merge the semantics of reused, individual content and design patterns.
    • Separately consider the reuse of classes/concepts, from properties, from individuals and from axioms. By separating these semantics (whether for linked data or ontologies) and allowing their specific reuse, it is easier to target specific content and reduce the amount of transformation and cleaning that is necessary.
    • RDF provides a basis for semantic extension (for example, by OWL and RIF). But, RDF triples without these extensions may be underspecified bits of knowledge. They can help with the vocabulary aspects of work, but formalization with languages like OWL can more formally define and constrain meaning. This allows intended queries to be answerable and supports reasoning.
    • Provide metadata (providing definitions, history and any available mapping documentation) for your ontologies and schemas. Also, it is valuable to distinguish constraints or concepts that are definitive (mandatory to capture the semantics of the content) versus ones that are specific to a domain. Domain-specific usage, and "how-to" details for use in reasoning applications or data analytics are also valuable. Some work in this area, such as Linked Open Vocabularies and several efforts in the Summit's Hackathon, is underway and should be supported.
    • Use a governance process for your ontologies (and it would be even better if enforced by your tooling). The process should include open consideration, comment, revision and acceptance of revisions by a community.
  • Lastly, what are some of the interesting areas of investigation? One area, certainly, is the need for tooling to better support modular ontology development, integration, and reuse. Another is support for hybrid reasoning capabilities - supporting both description logic and first-order logic reasoning, and both logical and probabilistic reasoning. Third, tooling that combines data analytic and ontological processing would be valuable to make sense of "big data", and aid in the dissemination of the resulting knowledge to users and for decision support. To truly address this last area, it may be necessary to create specialized hardware and processing algorithms to combine and process data using the graph-structured representations of ontologies.
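The point above about RDF triples being underspecified without OWL can be shown with a small sketch. All the names here are invented for illustration:

```turtle
# A bare RDF triple carries vocabulary but little constrained meaning ...
#   ex:doc1 ex:author ex:jane .
# ... while a few OWL/RDFS axioms pin the semantics down for reasoners.
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/docs#> .

ex:author a owl:ObjectProperty ;
    rdfs:domain ex:Document ;   # the subject of ex:author must be a Document
    rdfs:range  ex:Person .     # the object must be a Person
```

With the axioms in place, a reasoner can infer that ex:doc1 is a Document and ex:jane is a Person, and intended queries over those classes become answerable.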
That's it for me, but please take a look at the communique, draw your own conclusions, and determine your own highlights.


Wednesday, May 7, 2014

Updated metadata ontology file (V0.6.0) and new metadata-properties ontology (V0.2.0) on GitHub

I've spent some time doing more work on the general metadata ontologies (metadata-annotations and metadata-properties). Metadata-annotations is now at version 0.6.0. In this release, I mainly corrected the SPARQL queries that were defined as the competency queries. SPARQL is straightforward, but it is easy to make mistakes. I made a few in my previous version (because I just wrote the queries by hand, without testing them - my bad). Anyway, that is all fixed now and the queries are correct. My apologies on the errors.

You can also see that there is a new addition to the metadata directory with the metadata-properties ontology. Metadata-properties takes some of the concepts from metadata-annotations, and redefines them as data and object properties. In addition, a few supporting classes are defined (specifically, Actor and Modification), where required to fully specify the semantics.

Actor is used as the subject of the object properties, contributedTo and created. Modification is designed to collect all the information related to a change or update to an individual. This is important when one wants to track the specifics of each change as a set of related data. This may not be important - for example, if one only wants to track the date of last modification or only track a description of each change. In these cases, the data property, dateLastModified, or the annotation property, changeNote, can be the predicate of a triple involving the updated individual directly.

It is important to understand that only a minimum amount of information is provided for Actor and Modification. They are defined, but are purposefully underspecified to allow application- or domain-specific details to be provided in another ontology. (In which case, the IRIs of the corresponding classes in the other ontology would be related to Actor and Modification using an owl:equivalentClass axiom. This was discussed in the post on modular ontologies, and tying together the pieces.)
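As a sketch of that tie-in (the namespaces and the domain class below are hypothetical, invented for illustration), the axiom could look like:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix md:  <http://example.org/metadata-properties#> .  # stand-in for the metadata namespace
@prefix app: <http://example.org/my-domain#> .            # hypothetical domain ontology

# The domain ontology's richer Employee class is declared equivalent to
# the deliberately underspecified Actor class from metadata-properties.
app:Employee owl:equivalentClass md:Actor .
```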

Also in the metadata-properties ontology, an identifier property is defined. It is similar to the identifier property from Dublin Core, but is not equivalent since the metadata-properties' identifier is defined as a functional data property. (The Dublin Core property is "officially" defined as an annotation property.)

To download the files, there is information in the blog post from Apr 17th.

Please let me know if you have any feedback or issues.


Monday, April 28, 2014

General, Reusable Metadata Ontology - V0.2

This is just a short post that a newer version of the general metadata ontology is available. The ontology was originally discussed in a blog post on April 16th. And, if you have trouble downloading the files, there is help in the blog post from Apr 17th.

I have taken all the feedback, and reworked and simplified the ontology (I hope). All the changes are documented in the ontology's changeNote.

Important sidebar: I strongly recommend using something like a changeNote to track the evolution of every ontology and model.

As noted in the Apr 16th post, most of the concepts in the ontology are taken from the Dublin Core Elements vocabulary and the SKOS data model. In this version, the well-established properties from Dublin Core and SKOS use the namespaces/IRIs from those sources. Some examples are dc:contributor, dc:description and skos:prefLabel. Where the semantics are different, or more obvious names are defined (for example, creating names that provide "directions" for the skos:narrower and skos:broader relations), then the ontology's own namespace is used.
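A small sketch of the mixed namespaces in use (the default namespace stands in for the ontology's own, and the concept names are invented):

```turtle
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <urn:example:metadata-annotations#> .

:SomeConcept dc:description "Reused as-is, under the Dublin Core IRI." ;
             skos:prefLabel "Some concept" ;
             :moreGeneralThan :ANarrowerConcept .  # directional rename, so own namespace
```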

This release is getting much closer to a "finished" ontology. All of the properties have descriptions and examples, and most have scope/usage notes. The ontology's scope note describes what is not mapped from Dublin Core and SKOS, and why.

In addition, I have added two unique properties for the ontology. One is competencyQuestions and the other is competencyQuery. The concept of competency questions was originally defined in a 1995 paper by Gruninger and Fox as "requirements that are in the form of questions that [the] ontology must be able to answer." The questions help to define the scope of the ontology, and are [should be] translated to queries to validate the ontology. These queries are captured in the metadata ontology as SPARQL queries (and the corresponding competency question is included as a comment in the query, so that it can be tracked). This is a start at test-driven development for ontologies. :-)
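For illustration, a competencyQuery value might hold something like the following (the question and graph pattern here are invented, not ones from the ontology):

```sparql
# Competency question: What resources did a given person contribute to, and when?
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?resource ?date
WHERE {
  ?resource dc:contributor "Jane Doe" ;
            dc:date ?date .
}
```

Running each such query against a small set of test individuals, and checking the results, is what makes the competency questions a validation tool rather than just documentation.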

Please take a look at the ontology (even if you did before since it has evolved), and feel free to comment or (even better) contribute.


Thursday, April 17, 2014

Downloading the Metadata Ontology Files from GitHub

Since I posted my ontology files to GitHub, and got some emails that the downloads were corrupted, I thought that I should clarify the download process.

You are certainly free to fork the repository and get a local copy. Or, you can just download the file(s) by following these instructions:
  • LEFT click on the file in the directory on GitHub
  • The file is displayed with several tabs across the top. Select the Raw tab.
  • The file is now displayed in your browser window as text. Save the file to your local disk using the "Save Page As ..." drop-down option, under File.
After you download the file(s), you can then load one of them into something like Protege. (It is only necessary to load one since they are all the same.) Note that there are NO classes, data or object properties defined in the ontology. There are only annotation properties that can be used on classes, data and object properties. Since I need this all to be usable in reasoning applications, I started with defining and documenting annotation properties.

I try to note this in a short comment on the ontology (but given the confusion, I should probably expand the comment). I am also working on a metadata-properties ontology which defines some of the annotation properties as data and object properties. This will allow (for example) validating dateTime values and referencing objects/individuals in relations (as opposed to using literal values). It is important to note, however, that you can only use data and object properties with individuals (and not with class or property declarations, or you end up with OWL Full with no computational guarantees/no reasoning).
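A sketch of the distinction (placeholder namespace): annotating a class declaration is fine in OWL DL, but asserting a data property on the class itself is what pushes an ontology into OWL Full.

```turtle
@prefix :    <urn:example:metadata#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:changeNote a owl:AnnotationProperty .
:dateLastModified a owl:DatatypeProperty .

:Person a owl:Class ;
    :changeNote "Added 2014-04-16." .            # annotation on a class: OWL DL

:jane a :Person ;
    :dateLastModified "2014-05-01"^^xsd:date .   # data property on an individual: OWL DL

# :Person :dateLastModified "2014-05-01"^^xsd:date .
# ^ a data property asserted on the class declaration would be OWL Full
```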

Lastly, for anyone who objects to using annotation properties for mappings (for example, where I map SKOS' exactMatch in the metadata-annotations ontology), no worries ... more is coming. As a place to start, I defined exactMatch, moreGeneralThan, moreSpecificThan, ... annotation properties for documentation and human-consumption. (I have to start somewhere. :-) And, I tried to be more precise in my naming than SKOS, which names the latter two relations "broader" and "narrower", with no indication of whether the subject or the object is more broad or more narrow. (I always get this mixed up if I am away from the spec for more than a week. :-)
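To make the directionality concrete (the zoo namespace is invented; skos:broader links a concept to its broader concept, per the SKOS spec):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix meta: <urn:example:metadata-annotations#> .
@prefix ex:   <urn:example:zoo#> .

# SKOS: the object of skos:broader is the broader concept -
# hard to remember from the property name alone
ex:Dog skos:broader ex:Animal .

# The directional name says which side is more general
ex:Animal meta:moreGeneralThan ex:Dog .
```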

I want to unequivocally state that annotation properties are totally inadequate for doing anything significant. But, they are a start, and something that another tool could query and use. Separately, I am working on a more formal approach to mapping, but documentation is where I am starting.

Obviously, there is a lot more work in the pipeline. I just wish I had more time (like everyone).

In the meantime, please let me know if you have more questions about the ontologies or any of my blog entries.


Wednesday, April 16, 2014

General, Reusable, Metadata Ontology

I recently created a new ontology, following the principles discussed in Ontology Summit 2014's Track A. (If you are not familiar with the Summit, please check out some of my earlier posts.) My goal was to create a small, focused, general, reusable ontology (with usage and scope information, examples of each concept, and more). I must admit that it was a lot more time-consuming than I anticipated. It definitely takes time to create the documentation, validate and spell-check it, make sure that all the possible information is present, etc., etc.

I started with something relatively easy (I thought), which was a consolidation of basic Dublin Core and SKOS concepts into an OWL 2 ontology. The work is not yet finished (I have only been playing with the definition over the last few days). The "finished" pieces are the ontology metadata/documentation (including what I didn't map and why), and several of the properties (contributor, coverage, creator, date, language, mimeType, rights and their sub-properties). The rest is all still a work-in-progress.

It has been interesting creating and dog-fooding the ontology. I can definitely say that it was updated based on my experiences in using it!

You can check out the ontology definition on GitHub. My "master" definition is in the .ofn file (OWL functional syntax), and I used Protege to generate a Turtle encoding from it. My goals are to maintain the master definition in a version-control-friendly format (ofn), and also to provide a somewhat human-readable format (ttl). I also want to experiment with different natural language renderings that are more readable than Turtle (but I am getting ahead of myself).
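For anyone comparing the two encodings, here is one (illustrative, not taken from the ontology) assertion in each syntax - first OWL functional syntax, then the equivalent Turtle:

```
Prefix( := <urn:example:metadata#> )
Ontology( <urn:example:metadata>
  AnnotationAssertion( :changeNote :myOntology "Simplified the property hierarchy." )
)
```

```turtle
@prefix : <urn:example:metadata#> .

:myOntology :changeNote "Simplified the property hierarchy." .
```

The functional syntax is line-oriented with one axiom per statement, which is why it diffs so much more cleanly under version control.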

I would appreciate feedback on this metadata work, and suggestions for other reusable ontologies (that would help to support industry and refine the development methodology). Some of the ontologies that I am contemplating are ontologies for collections, events (evaluating and bringing together concepts from several, existing event ontologies), actors, actions, policies, and a few others.

Please let me know what you think.


Saturday, April 5, 2014

Ontology Reuse and Ontology Summit 2014

I've been doing a lot of thinking about ontology and vocabulary reuse (given my role as co-champion of Track A in Ontology Summit 2014). We are finally in our "synthesis" phase of the Summit, and I just updated our track's synthesis draft yesterday.

So, while this is all fresh in my mind, I want to highlight a few key take-aways ... For an ontology to be reused, it must provide something "that is commonly needed"; and then, the ontology must be found by someone looking to reuse it, understood by that person, and trusted as regards its quality. (Sam Adams made all these points in 1993 in a panel discussion on software reuse.) To be understood and trusted, it must be documented far more completely than is (usually) currently done.

Here are some of the suggestions for documentation:
  • Fully describe and define each of the concepts, relationships, axioms and rules that make up the ontology (or fragment)
  • Explain why the ontology was developed
  • Explain how the ontology is to be used (and perhaps how the uses may vary with different triple stores or tools)
  • Explain how the ontology was/is being used (history) and how it was tested in those environment(s)
    • Explain differences, if it is possible to use the ontology in different ways in different domains and/or for different purposes
  • Provide valid encoding(s) of the ontology
    • The documentation should describe how each encoding has evolved over time
    • "Valid" means that there are no consistency errors when a reasoner is run against the ontology
    • It is also valuable to create a few individuals, run a reasoner, and make sure that the individual's subsumption hierarchy is correct (e.g., an individual that is supposed to only be of type "ABC", is not also of type "DEF" and "XYZ")
    • Multiple encodings may exist due to the use of different syntaxes (Turtle and OWL Functional Syntax, for example, to provide better readability, and better version control, respectively) and to specifically separate the content to provide:
      • A "basic" version of the ontology with only the definitive concepts, axioms and properties
      • Other ontologies that add properties and axioms, perhaps to address particular domains
      • Rules that apply to the ontology, in general or for particular domains
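As a small sketch of that individual-level check (class and individual names are invented): declare the disjointness you believe holds, assert a test individual, classify, and confirm that no unintended types are inferred.

```turtle
@prefix ex:  <urn:example:check#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ex:ABC a owl:Class .
ex:DEF a owl:Class ;
    owl:disjointWith ex:ABC .

ex:test1 a ex:ABC .

# After running a reasoner, ex:test1 should be classified only under ex:ABC.
# If ex:DEF is also inferred, the ontology is inconsistent (given the
# disjointness), which points at an over-general axiom somewhere.
```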
Defining much of this information is a goal of the VOCREF (Vocabulary and Ontology Characteristics Related to Evaluation of Fitness) Ontology, which was a Hackathon event in this year's Ontology Summit. I participated in that event on March 29th and learned a lot. A summary of our experiences and learnings is posted on the Summit wiki.

VOCREF is a good start at specifying characteristics for an ontology. I will certainly continue to contribute to it. But, I also feel that too much content is contained in the vocref-top ontology (I did create an issue to address this). That makes it too top-heavy and not as reusable as I would like. Some of the content needs to be split into separate ontologies that can be reused independently of characterizing an ontology. Also, the VOCREF ontology needs to "dog-food" its own concepts, relationships, ... VOCREF itself needs to be more fully documented.

To try to help with ontology development and reuse, I decided to start a small catalog of content (I won't go so far as to call it a "repository"). The content in the catalog will vary from annotation properties that can provide basic documentation, to general concepts applicable to many domains (for example, a small event ontology), to content specific to a domain. The catalog may directly reference, document and (possibly) extend ontologies like VOCREF (with correct attribution), or may include content that is newly developed. For example, right now, I am working on some general patterns and a high level network management ontology. I will post my current work, and then drill-down to specific semantics.

All of the content will be posted on the Nine Points github page. The content will be fully documented, and licensed under the MIT License (unless prohibited by the author and the licensing of the original content). In addition, for much of the content, I will also try to discuss the ontology here, on my blog.

Let me know if you have feedback on this approach and if there is some specific content that you would like to see!


Friday, March 28, 2014

Ontology Summit 2014 Hackathon, Mar 29 - Ontological Catalog Project

Ontology Summit 2014 Hackathon is tomorrow (Saturday, Mar 29). There are six proposals, all of which look really interesting! You can see a summary of each of them on the Hackathon wiki page.

As for me, I will be participating in the one that is listed last :-), with Amanda Vizedom. But, I have to warn you that the name is a bit overwhelming ... "An ontological catalogue of ontology and metadata vocabulary characteristics relevant to suitability for semantic web and big data applications". Whew!

Really, it is about creating simple sets of concepts (i.e., a "catalog") to characterize vocabularies, models and ontologies. The catalog will be created in GitHub as a publicly available, open-source ontology (collaboratively developed and extensible over time). This Hackathon project is all about the problem of reuse ... You can't reuse something unless you understand not only what its contents are, but the intent behind creating the thing, how it was designed to be used, etc. This is summarized in the following quote from the Summit's Track A synthesis:
Documentation must include the basic details of the semantics, but also the range of conditions, contexts and intended purposes for which the content was developed. It was recommended that standard metadata for reuse be defined and complete exemplars provided.
If you are interested in participating, the kickoff is 10am EDT tomorrow using Google Hangout. Working materials are available on the project's GitHub site: have a look, "watch" the repository, and become a contributor by adding yourself to the team roster on the project's wiki page.

Looking forward to working with you tomorrow!


Wednesday, March 12, 2014

Secure Collaboration and Intel's Reliance Point

Intel posted a blog article in mid February, about a research project that they call "Reliance Point". Based on the article, it appears that they are working on ways to selectively share data (addressing privacy and IP rights concerns), and provide integrity and isolation for that data. Intel refers to Reliance Point as a "trustworthy execution environment".

The environment is interesting in that it will bring together data from multiple providers, and allow the providers to perform calculations over the complete set. The providers have to agree on the algorithm that will be used to do the calculations and trust that the infrastructure will protect their data (and not allow other uses or algorithms to be executed, or the data to be revealed).

"Letting Data Breathe" is the name of the blog post. That title seems a bit exaggerated to me. For data to "breathe" (i.e., be integrated from multiple providers), there must be some standard set of semantics and structure that is supported by the providers, or there must be a way to map between the syntax and (more importantly) the semantics of the different providers. Otherwise, what do calculations mean when run against data with unknown structure, and/or unknown and disparate semantics?

There is no mention of data integration in the article, just trustworthy data availability and negotiated algorithms. But, it seems to me that the project will not work if the problem of semantics is left to the data providers to solve out-of-band. In particular, how does one provider obtain the semantics of another's available data? How is this revealed while still protecting the IP rights of the provider? If proprietary data is shared, then it is likely proprietary all the way down to the layout and syntax of the data (perhaps defined by SQL). But, I have known companies that are reluctant to share even partial db structures since that information may reveal data or IP details.

To make Reliance Point work, something along the lines of OWL and RDF is needed - a way to specify semantics (OWL + SWRL/RIF, which I will discuss in another post) along with a way to handle multiple schemas (RDF). RDF defines a subject-predicate-object structure for data, which is very flexible; any database can be translated into it. OWL and SWRL/RIF let you define equivalences, logical statements, disjointness and more, which are necessary to actually (semantically) integrate the data.
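A sketch of what that enables (the provider namespaces, table values and property names are all invented): each provider's relational rows become triples, and OWL equivalence axioms supply the cross-provider mapping that makes joint calculations meaningful.

```turtle
@prefix a1:  <urn:example:providerA#> .
@prefix b2:  <urn:example:providerB#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Provider A's row (customer_id = 17, surname = 'Smith') as triples
a1:customer17 a a1:Customer ;
    a1:surname "Smith" .

# Provider B models the same notion under different names;
# the axioms let a reasoner treat both providers' data uniformly
b2:Client   owl:equivalentClass    a1:Customer .
b2:lastName owl:equivalentProperty a1:surname .
```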

In theory, Reliance Point seems good, but Intel is working on the easier part of the problem (the infrastructure) and not the deeper problems that will prevent usage (integrating the data).


Wednesday, February 26, 2014

Design Considerations from Weinberg's "Introduction to General Systems Thinking"

For those of you bored with this topic, you will be happy to know that I am on my last post. For those of you that find this interesting, thanks for sticking with me. :-) (Also, I will suggest that you get a copy of the book and find your own insights and wisdom in it. I certainly did not cover all the details!)

Let me jump into design considerations that Weinberg highlights. These are questions that I ask myself when modeling a system/designing an ontology:
  • What notions are within and what are external to our systems? In other words, what are the boundaries of our system?
  • How might we partition or decompose our system? When we decompose, how do the parts interact?
  • In talking about "interactions", we start to talk about behavior... What is the behavior of the system and its parts? What are the system's and parts' possible states and how are they constructed or achieved?
    • What is constant (survival and stable) about the system? What changes? Are there observable causes and effects?
    • White box or black box techniques can be used to define and understand behavior. If we use black box, then we are observing and suffer the inadequacies of our observations (as pointed out in my second post on this subject). If we use white box, then we may miss behaviors that are inherent and not of our own construction. A combination of both techniques can provide good insights.
    • In understanding behavior, you have to assume that you have incomplete knowledge. This is Weinberg's Diachronic Principle: "If a line of behavior crosses itself, then either: the system is not state determined or we are viewing a projection - an incomplete view."
    • And, another, similar principle is the Synchronic Principle: "If two systems occur in the same position in the state space, at the same time, then the space is under dimensioned, that is, the view is incomplete."
  • What are the properties or qualities of our systems? How do they change as the system evolves? Are there invariant properties? In particular:
    • What properties are used for identity?
    • What is "typical" about the system?
    • What is "exceptional" (but important) about the system?
    • What properties are used to define and control the system?
  • What are we missing? (Sometimes we miss important properties or behaviors because we look at things using our "standard" approaches, notations and tools.) Can we look at a system or problem differently in order to gain new insights?
    • Weinberg highlights the need to change our methods and approaches when new problems present themselves (especially if our current ways of thinking are not working). He captures his insights in his Used Car Law, and asks "[W]hy do we continue pumping gas into certain antique ways of looking at the world, why do we sometimes expend mammoth efforts to repair them, and why do we sometimes trade them in?"
    • Another insight is Weinberg's Principle of Indeterminability: "We cannot with certainty attribute observed constraint either to system or environment." In other words, does our system have only small things in it because that is true or because we only have a net that lets in small things?
Hopefully, some of these questions will help when modeling or observing new systems. Let me know if you have other insights from Systems Thinking or just have other questions that you ask yourself when trying to understand a system or address a new problem.


Thursday, February 20, 2014

Part 2 on What I Learned from "General Systems Thinking"

Welcome back. I want to continue my overview of Gerald Weinberg's "Introduction to General Systems Thinking". I thought that I could shorten the discussion and fit the rest of the book into one more post. But, the more that I re-read the book, the more that I found to talk about! So, I am going to continue my ramblings and discuss Chapters 3 and 5, building on my previous post.

Chapter 3 is all about "System and Illusion" (that is its title!). It starts by asserting that a "system is a way of looking at the world". Obviously, how we look at something influences what we see. And, after we have looked at something, we can try to write it down/produce a model that describes the thing (describes that piece of the world).

We have lots of ways of looking at the world, and lots of models that reflect our different perspectives. But, these models won't all agree. It is important to remember that even when models disagree, each one may be totally valid (since each is defined from a unique perspective). Wow ... how do we do anything with that?

Well, we start by acknowledging that the problem exists and not get into religious wars about why "my model is better than your model" (or substitute the word, "ontology" for "model"). We need to understand the systems/models/ontologies and look at the perspectives behind why and how they were designed. When I have done this in the past, I saw things that I missed because I lacked certain perspectives. I also saw things that I did not need to worry about (at a particular point in time), but felt that I might need later. And, if I did need them later, how would I incorporate them? The last thing that I wanted was to paint myself into a corner.

So, what do we need to consider? Systems in the world of "general systems thinking" are sets of things. The entities in the sets vary by perspective. For an example, consider a candy bar. In a system that is concerned with eating, the candy bar may be broken into chunks of certain size and weight. And, you may be worried about the size of your chunk. But, in a system that is concerned with the chemical components of the candy bar, the size of the pieces doesn't matter at all. The constituent molecules and their combination are everything.

Hall and Fagen in their paper, "Definition of Systems" (1968) state that:
A system is a set of objects together with relationships between the objects and between their attributes.
That sounds a lot like an ontology. But, in defining the objects and their relationships, be careful. Don't be limited by how you write your model. Often, we get tangled up in our notations and limit our thinking to what we can express. For example, we say "I can express xyz with XML, or abc with OWL, or mno with UML. So, that is all that I can do." Indeed, we will always be limited by our notation, but our thinking should not be. This is one of the general cautions or principles that Weinberg highlights:
The Principle of Indifference: Laws should not depend on a particular choice of notation.
Unfortunately, Weinberg later turns around (in Chapter 5) and states that reality is quite different than what we want or what should happen (remember that there will always be exceptions to our laws and principles :-). Reality is never fully perceived, and our abstractions may be flawed since "the limited mental powers of the observers influence the observations that they make". That leads us to another principle:
The Principle of Difference: Laws should not depend on a particular set of symbols, but they usually do.
We get involved in our own symbols and notation. In fact, we may be agreeing on perspective and modeling the same system with the same components as someone else. But, we won't realize it, because we can't see past our notation.

So, let's take a step back and try to look at a system from many perspectives, and understand the semantics independent of the notation. We need to look at the system broadly enough to understand its scope, and then try to "get a minimal view" ("one that lumps together [information that is] ... unnecessarily discriminated"). But, when we simplify (or minimalize), we need to feel that our model is "satisfying", that it conforms to what we know of the world, its past behaviors and our past experiences. Weinberg calls this his "Axiom of Experience".

Buckminster Fuller put it more eloquently, I think ...
When I am working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.

Wednesday, February 19, 2014

General Systems Thinking, a (classic) book by Gerald Weinberg

Early in my career, I was told to read the book, "An Introduction to General Systems Thinking", by Gerald Weinberg. It was supposed to teach me how to model. So, I dutifully read the book and ended up learning a great deal. Mostly, I learned how to look at problems, solutions and models.

For everyone that does not have time to read the book, or just wants a quick overview .... I would like to summarize the main concepts (what Weinberg calls "laws" and "principles") - especially the ones that have stuck with me through the years. So, here goes.

Let's start by observing the obvious. Systems are complex, and all of our science and technology fall down when faced with the fullness of that complexity. So, we try to simplify. We try to find the least amount of information (smallest number of assumptions or details) that help us define, predict or understand a "system". This works well when we have lots of individuals (or parts) in a system (leading to the "law of large numbers" - that statistical predictions work well when calculated over large populations) or when we have a very small number of individuals (when we can observe them all). We have problems, however, in the world of medium numbers.
Weinberg's Law of Medium Numbers: For medium number systems, we can expect that large fluctuations, irregularities, and discrepancy with any theory will occur more or less frequently ... Translated into our daily experience ... the Law of Medium Numbers becomes Murphy's Law ... Anything that can happen, will happen.
Many systems are typically made up of a medium number of individuals/parts. This means that we can't just average across the behavior of the parts, or look at all the parts independently. Taking the latter approach, we would have to consider all the parts and all their interactions within the system. But, we quickly find an exponential number of things that we have to study! So, we apply reasoning and simplifying assumptions.

Weinberg builds on the work of Kenneth Boulding ("General Systems as a Point of View") and highlights the need to discover "general laws". Typically, we find these laws by generalizing from specific, different disciplines. He examines the roles of analogy, categorization and the concepts of generalization and induction. The key is to find similarities in the individual "laws" of specific disciplines and extrapolate to the general laws. Then we can apply these general laws in new situations to understand them and draw conclusions. Weinberg says:
To be a successful generalist, one must study the art of ignoring data and of seeing only the "mere outlines" of things.
In defining his general laws, Weinberg makes a point of discussing that laws evolve as new information is discovered. But, he also observes that the laws themselves are usually the last things to change (the "Law of the Conservation of Laws" :-). Instead, conditions are added to the laws to account for negative cases. So, a law becomes "x=y unless ..., or if ...". Like the rules of parlay (Code of the Order of the Brethren) from Pirates of the Caribbean, "the code is more what you'd call 'guidelines' than actual rules". Weinberg's laws are meant to be memorable, generally true and 'guidelines'. They can be understood as bits of insight and "stimulants" to thought (i.e., not really "laws").

What are a few of Weinberg's general "laws"?
  • Law of Happy Peculiarities - Any general law must have at least two specific applications
  • Law of Unhappy Peculiarities - Any general law is bound to have at least two exceptions (or, if you never say anything wrong, then you never say anything)
  • Composition Law - The whole is more than the sum of its parts
  • Decomposition Law - The part is more than a fraction of the whole
I love Weinberg's "laws" and sense of humor. I noticed that one of my development "rules of thumb" simply follows from Weinberg's Law of Happy Peculiarities ... "You must always have at least 2 use cases. A use case of one always has a solution."

So with that observation, we come to the end of Chapter 2 in Weinberg's book. There are more great discussion points and principles that I want to cover. But, I will leave them to my next post. In the meantime, I will close with some thoughts from the mathematician, Mark Kac (which are mostly also at the end of Chapter 2):
Models are for the most part caricatures of reality, but if they are good, then, like good caricatures, they portray, though perhaps in distorted manner, some of the features of the real world ... The main role of models is not so much to explain and to predict - though ultimately these are the main features of science - as to polarize thinking and to pose sharp questions ... They should not, however, be allowed to multiply indiscriminately without real necessity or real purpose.
Kac implies that there are three activities that involve models - "improving thought processes" (as described in the quote above), "studying special systems" (understanding and translating to the specific from the general) and "creating new laws and refining old" (systems research!).



Monday, February 10, 2014

Design Thoughts

I was going through some older books that I had stored away, and came across the book, Change By Design, by Tim Brown. The book was published in 2009 and is not particularly earth shattering or deeply enlightening. But, it does contain some nuggets that resonated with me (especially watching less-experienced software engineers try to design a complex system, or (even harder) a good, domain ontology that is somewhat future proof).

There is a valuable quote from the book ...
Design thinking … [matches] human needs with available technical resources within the practical constraints of business.
Seems obvious, right? But there is really much more to this statement than is understood with a cursory reading.

When designing a solution, you have to start with the needs and requirements of your customers (and they are not always realistic). Then, you have your available technologies ... maybe OWL/RDF and triple stores, maybe natural language parsers and libraries, maybe good insights from design patterns that can be reused. Lastly, you have constraints - be they time, money, lack of experienced personnel or any number of other things.

Too often, I see a team focus on what they know, and design some amazing aspect of a project ... but NOT what the customer needs. So, you end up with a partial solution that disappoints your customer, but you are proud of what you accomplished. Then, you decide that the customer or user is unreasonable. I have seen this happen a lot.

So, here are some words of advice ...
  • Design the best solution that you can, taking everything into account and being honest where you are stretching beyond your comfort zones.
  • Always strive to do incremental development and prototyping.
  • Don't be afraid to fail and don't sacrifice the overall design to 1) make everything perfect or 2) focus on what you know. If you do the former, you will never deliver anything. If you do the latter, you will never be able to implement the big picture, because you are ignoring it. Also, you will surprise the customer because you are thinking that you have everything covered, when really, you only have a small aspect of the complete solution covered.
Also, remember that incremental development is just that ... incremental ... nothing in software is ever really perfect or complete. It really comes down to whether it works and whether the customer is happy.

P.S. I have only touched on a few of the points in the Change by Design book. There are lots of good summaries out on the web. One example is the page from the Idea Code Team.


Wednesday, February 5, 2014

More on modular ontologies and tying them together

There is a short email dialog on this topic on the Ontology Summit 2014 mail list. I thought that I would add the reply from Amanda Vizedom as a blog post (to keep everything in one place).

Amanda added:

The style of modularity you mention, with what another summit poster (forgive me for forgetting who at the moment) referred to as 'placeholder' concepts within modules, can be very effective. The most effective technique I've found to date, for some cases.

Two additional points are worth making about how to execute this for maximum effectiveness. (They may match what you've done, in fact, but are sometimes missed & so worth calling out for others.)

Point 1: lots of annotation on the placeholders. The location & connection of the well-defined concepts to link them to is often being saved for later, and possibly for someone else. In order to make sure the right external concept is connected, whatever is known or desired of the underspecified concept should be captured. In the location case, for example, it may be that the concept needs to support enough granularity to be used as the location at which a person can be contacted at the current time, or must be the kind of location that has a shipping address, or is only intended to be the place of business of the enterprise to which the Person is assigned & out of which they operate (e.g., embassy, business office, base, campus). That's often known or easily elicitable without leaving the focus of a specialized module, and can be captured in an annotation for use in finding existing, well-defined ontology content and mapping.

Point 2: the advantages of modules, as you described, are best maintained when the import and mapping are done *not* in the specialized module, but in a "lower" mapping module that inherits the specialized module and the mapping-target ontologies. Spindles of ontologies, which can be more or less intricate, allow for independent development and reuse of specialized modules, with lower mapping and integration modules, and with a spindle-bottom that imports all in the spindle and effectively acts as the integrated query, testing, and application module for all the modules contained in that spindle, providing a simplified and integrated interface to a more complex and highly modular system of ontologies. Meanwhile, specialized modules can be developed with SMEs who don't know, care, or have time to think about the stuff they aren't experts about, like distinguishing kinds of location or temporal relations or the weather. Using placeholders and doing your mapping elsewhere may sound like extra work, but considering what it can enable, it can be an incredibly effective approach.

Indeed, the second point is exactly my "integrating" ontology, which imports the target ontologies and does the mapping. As to the first point, that is very much worth highlighting. I err on the side of over-documenting and use various kinds of notes and annotations. For a good example, take a look at the annotation properties in the FIBO Foundations ontology. It includes comment, description, directSource, keyword, definition, various kinds of notes, and much more.
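As a hypothetical sketch of Point 1, here is how a Person module might record the mapping requirements directly on its placeholder class in Turtle. (All of the namespaces, class names, and property names below are illustrative, not taken from any real ontology.)

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/ontology/person#> .

:Person a owl:Class .

# Placeholder: deliberately under-specified in this module
:Location a owl:Class ;
    rdfs:comment "Placeholder for a well-defined Location concept. The mapping target must support a shipping address, and enough granularity to contact a Person at the current time."@en ;
    skos:editorialNote "Map to an external Location ontology in a lower/integrating module; do not add axioms here."@en .

:hasWorkLocation a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range  :Location .
```

Someone doing the later mapping work can then search for an external Location concept that satisfies exactly what the annotations demand, without ever consulting the original SMEs.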

Another set of annotation properties that I use (which I have not seen documented before, but that I think is valuable for future mapping exercises) are WordNet synset references - as direct references or designating them as hyponyms or hypernyms. (For those not familiar with WordNet, check out this page and a previous blog post.)
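Since I have not seen these WordNet annotation properties standardized anywhere, the names below are my own invention; the synset identifiers follow the familiar word.pos.nn convention (as used, for example, by NLTK's WordNet interface). A sketch in Turtle:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/ontology/annotations#> .

# Hypothetical annotation properties for WordNet references
:synsetReference a owl:AnnotationProperty .
:synsetHypernym  a owl:AnnotationProperty .
:synsetHyponym   a owl:AnnotationProperty .

<http://example.org/ontology/person#Location>
    :synsetReference "location.n.01" ;   # direct synset match
    :synsetHypernym  "object.n.01" .     # broader WordNet concept
```

In a later mapping exercise, two concepts annotated with the same (or related) synsets become strong candidates for an equivalence or subclass bridge axiom.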


Sunday, February 2, 2014

Creating a modular ontology and then tying the pieces together

In my previous post, I talked about creating small, focused "modules" of cohesive semantic content.  And, since these modules have to be small, they can't (and shouldn't) completely define everything that might be referenced.  Some concepts will be under-specified.  

So, how do we tie the modules together in an application?

In a recent project, I used the equivalentClass OWL semantic to do this. For example, in a Person ontology, I defined the Person concept with its relevant properties.  When it came to the Person's Location - that was just an under-specified (i.e., empty) Location class.  I then found a Location ontology, developed by another group, and opted to use that.  Lastly, I defined an "integrating" ontology that imported the Person and Location ontologies, and specified an equivalence between the relevant concepts.  So, PersonNamespace:Location was defined as an equivalentClass to LocationNamespace:Location. Obviously, the application covered up all this for the users, and my triple store (with reasoner) handled the rest.
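A minimal sketch of such an "integrating" ontology in Turtle (the namespaces here are illustrative placeholders, not the actual project ontologies):

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix person: <http://example.org/ontology/person#> .
@prefix loc:    <http://example.org/ontology/location#> .

<http://example.org/ontology/integration> a owl:Ontology ;
    owl:imports <http://example.org/ontology/person> ,
                <http://example.org/ontology/location> .

# Bridge axiom: the placeholder is the reused, well-defined concept
person:Location owl:equivalentClass loc:Location .
```

With that one axiom loaded, the reasoner treats every person:Location as a loc:Location (and vice versa), so queries written against either namespace return the same individuals.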

This approach left me with a lot of flexibility for reuse and ontology evolution, and didn't force imports except in my "integrating" ontology.  And, a different application could bring in its own definition of Location and create its own "integrating" ontology.

But, what happens if you can't find a Location ontology that does everything that you need?  You can still integrate/reuse other work, perhaps defined in your integrating ontology as subclasses of the (under-specified) PersonNamespace:Location concept.
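Sketching that variant in Turtle (again with illustrative namespaces), the integrating ontology declares each reused concept as a specialization of the placeholder instead of an equivalent:

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix person: <http://example.org/ontology/person#> .
@prefix geo:    <http://example.org/ontology/geo#> .
@prefix postal: <http://example.org/ontology/postal#> .

# Each reused concept covers part of what a Person's Location can be
geo:GeoPosition rdfs:subClassOf person:Location .
postal:Address  rdfs:subClassOf person:Location .
```

Instances from either reused ontology then show up wherever the application expects a person:Location, even though neither external ontology covers the whole concept on its own.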

This approach also works well when developing and reusing ontologies across groups.  Different groups may use different names for the same semantic, may need to expand on some concept, or want to incorporate different semantics.  If you have a monolithic ontology, these differences will be impossible to overcome.  But, if you can say things like  "my concept X is equivalent to your concept Y" or "my concept X is a kind of your Y with some additional restrictions" - that is very valuable.  Now you get reuse instead of redefinition.


Wednesday, January 29, 2014

Reuse of ontology and model concepts

Reuse is a big topic in this year's Ontology Summit.  In a Summit session last week, I discussed some experiences related to my recent work on a network management ontology.  The complete presentation is available from the Summit wiki.  And, I would encourage you to look at all the talks given that day since they were all very interesting! (The agenda, slides, chat transcript, etc. are accessible from the conference call page.)

But ... I know that you are busy.  So, here are some take-aways from my talk:

  • What were the candidates for reuse?  There were actually several ontologies and models that were looked at (and I will talk about them in later posts), but this talk was about two specific standards:  ISO 15926 for the process industry, and FIBO for the financial industry.
  • Why did we reuse, given that there was not perfect overlap between the chosen domain models/ontologies and network management?  Because there was good thought and insight put into the standards, and there also was tooling developed that we wanted to reuse.  Besides that, we have limited time and money - so jump-starting the development was "a good thing".
  • Did we find valuable concepts to reuse?  Definitely.  Details are in the talk but two examples are:
    • Defining individuals as possible versus actual.  For anyone that worries about network and capacity planning, inventory management, or staging of new equipment, the distinction between what you have now, what you will have, and what you might have is really important.
    • Ontology annotation properties.  Documentation of definitions, sources of information, keywords, notes, etc. are extremely valuable to understand semantics.  I have rarely seen good documentation in an ontology itself (it might be done in a specification that goes with the ontology).  The properties defined and used in FIBO were impressive.
  • Was reuse easy?  Not really.  It was difficult to pull apart sets of distinct concepts in ISO 15926, although we should have (and will do) more with templates in the future.  Also, use of OWL was a mapping from the original definition, which made it far less "natural"/native.  FIBO was much more modular and defined in OWL.  But due to ontology imports, we pretty much ended up loading and working through the complete foundational ontology.  

Given all this, what are some suggestions for getting more reuse?

  1. Create and publish more discrete, easily understood "modules" that:
    • Define a maximum of 12-15 core entities with their relationships (12-15 items is about the limit of what people can visually retain)
    • Document the assumptions made in the development (where perhaps short cuts were made, or could be made)
    • Capture the axioms (rules) that apply separately from the core entities (this could allow adjustments to the axioms or assumptions for different domains or problem spaces, without invalidating the core concepts and their semantics)
    • Encourage evolution and different renderings of the entities and relationships (for example, with and without short cuts)
  2. Focus on "necessary and sufficient" semantics when defining the core entities in a module and leave some things under-specified  
    • Don't completely define everything just because it touches your semantics (admittedly, you have to bring all the necessary semantics together to create a complete model or ontology, but more on that in the next post)
    • A contrived example is that physical hardware is located somewhere in time and space, but it is unlikely that everyone's requirements for spatial and temporal information will be consistent.  So, relate your Hardware entity to a Location and leave it at that.  Let another module (or set of modules) handle the idiosyncrasies of Location.
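That contrived Hardware example might look like this as a self-contained module in Turtle (the namespace is illustrative), with Location deliberately left as an empty class:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/ontology/hardware#> .

:Hardware a owl:Class .

# Under-specified on purpose - spatial/temporal detail belongs elsewhere
:Location a owl:Class ;
    rdfs:comment "Placeholder; another module handles the idiosyncrasies of Location."@en .

:locatedAt a owl:ObjectProperty ;
    rdfs:domain :Hardware ;
    rdfs:range  :Location .
```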
In my next post, I promise to talk more about how to combine discrete "modules" with under-specified concepts to create a complete solution.


Tuesday, January 21, 2014

Semantic Technologies and Ontologies Overview Presentation

Last year, I did a short talk on semantic technologies and ontologies, and thought that I would share it. The audience was mostly people who were new to the technologies, and needed to understand their basics and how/where they are used.

[Disclaimer] The presentation is pretty basic ...

But it seemed to work. It overviews key terms (like the "o-word", ontology :-) and standards (based on the ever popular, semantic "layer cake" image). In looking over the deck, I see that I should have talked about RIF (Rule Interchange Format). But, I was using SWRL at the time, and so gravitated to that. (My apologies for not being complete.)

Since the talk was meant to show that semantic technologies are not just an academic exercise, I spent most of the time highlighting how and where the technologies are used. IMHO, the major uses are:
  • Semantic search and query expansion
  • Mapping and merging of data
  • Knowledge management
The last bullet might be a bit ambiguous. For me, it means organizing knowledge (my blog's namesake) and inferring new knowledge (via reasoning and logic).

There are also quite a few examples of real companies using ontologies and semantic technologies. It is kind of amazing when you look at what is being done.

So, take a look and let me know what you think.

And, as a teaser, I want to highlight that I will be presenting at the next Ontology Summit 2014 session on Thursday, January 23rd, on "Reuse of Content from ISO 15926 and FIBO". If you want to listen in, the details for the conference call are here.  Hopefully, you can join in.


Wednesday, January 15, 2014

Ontology Summit 2014 Kicks Off Tomorrow, Jan 16

The topic for this year's Summit is "Big Data and Semantic Web Meet Applied Ontology". The Summit kicks off with a conference call on Thursday, January 16th, at 9:30am PST/12:30pm EST/5:30pm GMT (call details). Here is a short excerpt from the Summit's "Goals and Objectives":
Since the beginnings of the Semantic Web, ontologies have played key roles in the design and deployment of new semantic technologies. Yet over the years, the level of collaboration between the Semantic Web and Applied Ontology communities has been much less than expected. Within Big Data applications, ontologies appear to have had little impact. This year's Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities.
For those of you not familiar with the Summit, it was started in 2006, and there is a different theme each year. It is sponsored by a set of organizations including NIST, Ontolog, NCOR, NCBO, IAOA & NITRD. The Summit works through a series of conference calls (every Thursday) and lots of email discussion, and culminates in a face-to-face meeting in late April.

I highly recommend participating, or even just lurking. It is not necessary (or even possible :-) to attend the call every week and to read every email on the ontolog-forum. (Also, it is not mandatory to come to the face-to-face.) If you have to miss something, slides and transcripts of the conference chats, as well as an email archive, are available online. In addition, at the end of the Summit, a "communique" is prepared that summarizes the discussions and work.

I have lurked on the edges of the Summit since the late 2000s, but finally have the time to actively participate. This year, I am co-championing Track A on "Common, Reusable Semantic Content". This topic is near and dear to my heart since I am a strong proponent of reuse. IMHO, it is always valuable (when it is possible) to build on someone else's good work, and benefit from their learnings, rather than starting from scratch. So, when modeling or designing, I look to find something similar and then extrapolate from, or build on it.

Needless to say, I usually don't take other ontologies "en masse", but pick and choose the semantics, patterns or ideas that make sense. How to do this is one aspect of Track A, and there is more. Here are some excerpts from our "Mission" statement:

Semantic technologies such as ontologies and reasoning play a major role in the Semantic Web and are increasingly being applied to help process and understand information expressed in digital formats. Indeed, the derivation of assured knowledge from the connection of diverse (and linked) data is one of the main themes of Big Data ... One challenge in these efforts is to build and leverage common semantic content thus reducing the burden of new ontology creation while avoiding silos of different ontologies. Examples of such content are whole or partial ontologies, ontology modules, ontological patterns and archetypes, and common theories related to ontologies and their fit to the real world ... Achieving commonality and reuse in a timely manner and with manageable resources remain key ingredients for practical development of quality and interoperable ontologies ... This track will discuss the reuse problem and explore possible solutions. Among these are practical issues like the use of ontology repositories and tools, and the possibility of using basic and common semantic content in smaller, more accessible pieces. The goal is to identify exemplary content and also define the related information to enable use/reuse in semantic applications and services. A secondary goal is to highlight where more work is needed and enable the applied ontology community to further develop semantic content and its related information.
I hope that you can join me!


Thursday, January 9, 2014

Restarting my blog and research

Thanks to everyone who has stuck with me over the past few years as I got consumed in projects and neglected my blog. That phase of my life was officially over in mid-December (2013). I just joined a small consulting firm, Nine Points Solutions. I now have the opportunity to both consult and to continue my research into extracting and managing policies in support of security, configuration and operations. I am very excited to have the opportunity to do this, and will share what I am doing and what I learn (when I can).

As part of restarting the blog, I did go through and clean up my overviews and list of blogs that I read. Please take a look and tell me about any blogs that I missed and that you find valuable. I didn't change the overall design of the site, because it works for me. But, let me know if that is not true for you, my readers.

Lastly, I want to bring everyone up-to-speed (at a very high level) on my work over the last two+ years ... I had the opportunity to architect and create a policy-based management system based on OWL ontologies and using semantic technologies. The system was written in Java and used the Stardog triple store from Clark and Parsia. Needless to say, I made mistakes, got a lot of help from the Stardog developers, and learned a lot. I will be sharing some of that learning in future posts.

Thanks again! I am looking forward to writing more in the coming weeks!