Tuesday, May 20, 2014

Diagramming an RDF/XML OWL ontology

Over the course of time (many times, in fact), I have been asked to "graph" my ontologies to help visualize the concepts. Anyone who has worked with Protege (or the Neon Toolkit or other tools) knows that none of the tools give you all the images that you really need to document your work. I have often resorted to hand-drawing the ontologies using UML diagrams. This is both painful and a huge time sink.

Recently, I was reading some emails on the Linked Data community distribution list about how they generate the LOD cloud diagram. Omnigraffle is used in the "official" workflow to create the diagram, but it is a commercial tool. One of the email replies discussed a different approach.

A gentleman from Freenet.de needed to draw a similar diagram for the data cloud for the Open Linguistics Working Group. His team could not use the same code and processing flow as the LOD cloud folks, since they didn't have many Mac users. So, they developed an alternative based on GraphML. To create the basic graph, they developed a Python script. And, ...
Using yed's "organic" layout, a reasonable representation can be achieved which is then manually brought in shape with yed (positioning) and XML (font adjustment). In yed, we augment it with a legend and text and then export it into the graphic format of choice.
Given my propensity to "reuse" good ideas, I decided to investigate GraphML and yEd. And, since GraphML is XML, ontologies can be defined in RDF/XML, and XSLT can be used to transform XML definitions, I used XSLT to generate various GraphML outputs of an ontology file. Once the GraphML outputs were in place, I used yEd to do the layout, as the Freenet.de team did. (It is important to note that the basic yEd tool is free. And, layout is the most difficult part of producing a diagram.)

So, what did I find? You can be the judge. The XSLTs are found on GitHub (check out http://purl.org/NinePts/graphing). There are four files in the graphing directory:
  • AnnotationProperties.xsl - A transform of any annotation property definitions in an RDF/XML file, drawing them as rectangles connected to a central entity named "Annotation Properties".
  • ClassHierarchies.xsl - A transform of any class definitions in an RDF/XML file, drawing them in a class-superclass hierarchy.
  • ClassProperties.xsl - A transform of any datatype and object property definitions in an RDF/XML file, drawing them as rectangles with their types (functional, transitive, etc.) and domains and ranges.
  • PropertyHierarchies.xsl - A transform of any datatype and object property definitions in an RDF/XML file, drawing their property-superproperty relationships.
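To give a flavor of what these transforms produce, here is a minimal sketch of the same idea written in Python rather than XSLT. (The sample ontology content and the script are mine, for illustration only - they are not part of the GitHub transforms, and the real XSLTs also emit yEd-specific label elements that are omitted here.)

```python
# Sketch only: extract owl:Class definitions from an RDF/XML string and
# emit bare GraphML nodes plus subclass edges. The sample ontology content
# is invented for illustration.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
                     xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Vehicle"/>
  <owl:Class rdf:about="#Car">
    <rdfs:subClassOf rdf:resource="#Vehicle"/>
  </owl:Class>
</rdf:RDF>"""

def rdfxml_to_graphml(rdf_xml: str) -> str:
    """Turn top-level owl:Class definitions into GraphML nodes and edges."""
    root = ET.fromstring(rdf_xml)
    graphml = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(graphml, "graph", id="G", edgedefault="directed")
    for cls in root.findall(f"{OWL}Class"):
        name = cls.get(f"{RDF}about")
        ET.SubElement(graph, "node", id=name)
        # One edge per rdfs:subClassOf reference (named superclasses only)
        for sup in cls.findall(f"{RDFS}subClassOf"):
            target = sup.get(f"{RDF}resource")
            if target:
                ET.SubElement(graph, "edge", source=name, target=target)
    return ET.tostring(graphml, encoding="unicode")

print(rdfxml_to_graphml(SAMPLE))
```

Opening the resulting GraphML in yEd and applying a layout gives the same kind of starting point as the ClassHierarchies.xsl output.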
I executed the transforms using xsltproc. An example invocation is:
xsltproc -o result.graphml ../graphing/ClassProperties.xsl metadata-properties.owl
I then took the result.graphml and opened it in the yEd Graph Editor. (If you do the same, you will find that all the classes or properties lie on top of each other. I made no attempt at any kind of layout since I planned to use yEd for that purpose.) For the class properties graph (from the above invocation), I used the Layout->Radial formatting, with the default settings. Here is the result:

I was impressed with how easy this was!

The really great thing is that if you don't like a layout, you can choose another and even tweak the results. I did some tweaking for the "Property Hierarchies" diagram. In this case, I ran the PropertyHierarchies.xsl against the metadata-properties.owl file and used the Hierarchical Layout on the resulting GraphML file. Then, I selected all the data properties and moved them underneath the object properties. Here is the result:

Admittedly, the diagrams can get quite complex for a large ontology. But, you can easily change/combine/separate the XSLT transforms to include more or less content.

With about a day and a half's worth of work (and using standards and free tooling), I think that I saved myself many frustrating and boring hours of diagramming. Let me know if you find this useful, or you have other suggestions for diagramming ontologies.


Words and writing ...

I came across an amazing blog entry today. I love reading and writing. When writing, I try to communicate my thoughts in a (hopefully) clear and entertaining manner. I often use dictionaries and thesauri to get ideas for new and different words, to spice up my paragraphs when they seem dull.

But, after reading the blog entry from James Somers, "You're probably using the wrong dictionary", I know that I have been fooled ("deceived; imposed upon") by my current tools.

Mr. Somers' blog discusses how Webster came to create his dictionary, how John McPhee uses Webster's dictionary when creating his fourth draft of a work, and how dictionaries could come to inspire thought and writing. I know that you don't believe me on that last point ... so go check out The ARTFL Project (Webster's Dictionary, 1913 and 1828 editions). Enter any word that comes to mind and see what you find.

Here is my example: I entered the word "car" (trying for a word that was mundane). Here is the text from the 1913 edition ...
1. A small vehicle moved on wheels; usually, one having but two wheels and drawn by one horse; a cart.
2. A vehicle adapted to the rails of a railroad. [U. S.] ☞ In England a railroad passenger car is called a railway carriage; a freight car a goods wagon; a platform car a goods truck; a baggage car a van. But styles of car introduced into England from America are called cars; as, tram car. Pullman car. See Train.
3. A chariot of war or of triumph; a vehicle of splendor, dignity, or solemnity. [Poetic].
   The gilded car of day. Milton.
   The towering car, the sable steeds. Tennyson.
4. (Astron.) The stars also called Charles's Wain, the Great Bear, or the Dipper.
   The Pleiads, Hyads, and the Northern Car. Dryden.
5. The cage of a lift or elevator.
6. The basket, box, or cage suspended from a balloon to contain passengers, ballast, etc.
7. A floating perforated box for living fish.

[U. S.] Car coupling, or Car coupler, a shackle or other device for connecting the cars in a railway train. [U. S.] -- Dummy car (Railroad), a car containing its own steam power or locomotive. -- Freight car (Railroad), a car for the transportation of merchandise or other goods. [U. S.] -- Hand car (Railroad), a small car propelled by hand, used by railroad laborers, etc. [U. S.] -- Horse car, or Street car, an omnibus car, drawn by horses or other power upon rails laid in the streets. [U. S.] -- Palace car, Drawing-room car, Sleeping car, Parlor car, etc. (Railroad), cars especially designed and furnished for the comfort of travelers.
I was blown away! Webster's 1828 and 1913 dictionaries will become my new source of words (admittedly, not modern, but definitely poetic). Mr. Somers explains how you can download and install the 1913 edition, and use it in conjunction with your other dictionaries on your Mac, Kindle and iPad. That upgrade of my dictionaries is underway as I type.


Monday, May 12, 2014

Ontology Summit 2014 and the communique

Ontology Summit 2014 officially concluded with the symposium on April 28-29. There were some great keynotes, summary presentations and discussions. You can see most of the slides on the Day 1 and Day 2 links, and can also check out the online, unedited Day 1 chat and Day 2 chat.

The main "output" of each Ontology Summit is a communique. This year's communique is titled Semantic Web and Big Data Meets Applied Ontology, consistent with the Summit theme. Follow the previous link to get the full document, and consider endorsing it (if you are so inclined). To endorse the communique, send an email to with the subject line: "I hereby confirm my endorsement of the OntologySummit2014 Communique" and include (at least) your name in the body of the email. Other remarks or feedback can also be included. And, I would encourage you to add your thoughts.

I want to provide a quick list of the high points of the communique (for me):
  • In the world of big data, ontologies can help with semantic integration and mapping, reduction of semantic mismatches, normalization of terms, and inference and insertion of metadata and other annotations.
  • Development approaches that involve a heavy-weight, complete analysis of "the world" are evolving to lighter weight approaches. This can be seen in the development of ontology design patterns, the use of ontologies in Watson, and the bottom-up annotation and interlinking approaches of web/RESTful services (as "Linked Services").
  • There are some best practices that can be applied for sharing and reuse to succeed (and since I drafted most of these best practices, I am just copying them directly below :-)):
    • Wise reuse possibilities follow from knowing your project requirements. Competency questions should be used to formulate and structure the ontology requirements, as part of an agile approach. The questions help contextualize and frame areas of potential content reuse.
    • Be tactical in your formalization. Reuse content based on your needs, represent it in a way that meets your objectives, and then consider how it might be improved and reused. Clearly document your objectives so that others understand why you made the choices that you did.
    • Small ontology design patterns provide more possibilities for reuse because they have low barriers for creation and potential applicability, and offer greater focus and cohesiveness. They are likely less dependent on the original context in which they were developed.
    • Use "integrating" modules to merge the semantics of reused, individual content and design patterns.
    • Separately consider the reuse of classes/concepts, properties, individuals and axioms. By separating these semantics (whether for linked data or ontologies) and allowing their specific reuse, it is easier to target specific content and reduce the amount of transformation and cleaning that is necessary.
    • RDF provides a basis for semantic extension (for example, by OWL and RIF). But, RDF triples without these extensions may be underspecified bits of knowledge. They can help with the vocabulary aspects of work, but formalization with languages like OWL can more formally define and constrain meaning. This allows intended queries to be answerable and supports reasoning.
    • Provide metadata (providing definitions, history and any available mapping documentation) for your ontologies and schemas. Also, it is valuable to distinguish constraints or concepts that are definitive (mandatory to capture the semantics of the content) versus ones that are specific to a domain. Domain-specific usage, and "how-to" details for use in reasoning applications or data analytics are also valuable. Some work in this area, such as Linked Open Vocabularies and several efforts in the Summit's Hackathon, is underway and should be supported.
    • Use a governance process for your ontologies (and it would be even better if enforced by your tooling). The process should include open consideration, comment, revision and acceptance of revisions by a community.
  • Lastly, what are some of the interesting areas of investigation? One area, certainly, is the need for tooling to better support modular ontology development, integration, and reuse. Another is support for hybrid reasoning capabilities - supporting both description logic and first-order logic reasoning, and both logical and probabilistic reasoning. Third, tooling that combines data analytic and ontological processing would be valuable to make sense of "big data", and aid in the dissemination of the resulting knowledge to users and for decision support. To truly address this last area, it may be necessary to create specialized hardware and processing algorithms to combine and process data using the graph-structured representations of ontologies.
That's it for me, but please take a look at the communique, draw your own conclusions, and determine your own highlights.


Wednesday, May 7, 2014

Updated metadata ontology file (V0.6.0) and new metadata-properties ontology (V0.2.0) on GitHub

I've spent some time doing more work on the general metadata ontologies (metadata-annotations and metadata-properties). Metadata-annotations is now at version 0.6.0. In this release, I mainly corrected the SPARQL queries that were defined as the competency queries. SPARQL is straightforward, but it is easy to make mistakes. I made a few in my previous version (because I just wrote the queries by hand, without testing them - my bad). Anyway, that is all fixed now and the queries are correct. My apologies on the errors.

You can also see that there is a new addition to the metadata directory with the metadata-properties ontology. Metadata-properties takes some of the concepts from metadata-annotations, and redefines them as data and object properties. In addition, a few supporting classes are defined (specifically, Actor and Modification), where required to fully specify the semantics.

Actor is used as the subject of the object properties, contributedTo and created. Modification is designed to collect all the information related to a change or update to an individual. This is important when one wants to track the specifics of each change as a set of related data. However, that level of detail may be unnecessary - for example, if one only wants to track the date of last modification or a description of each change. In these cases, the data property, dateLastModified, or the annotation property, changeNote, can be the predicate of a triple involving the updated individual directly.
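As a sketch, the two approaches to instance data might look like the following RDF/XML. (The individual IRIs and the modification and changeDate names are invented for illustration; only dateLastModified comes from the ontology.)

```xml
<!-- Sketch only: IRIs, "modification" and "changeDate" are hypothetical -->

<!-- Tracking each change as a set of related data -->
<owl:NamedIndividual rdf:about="#SomeEntity">
  <modification rdf:resource="#Change1"/>
</owl:NamedIndividual>
<owl:NamedIndividual rdf:about="#Change1">
  <rdf:type rdf:resource="#Modification"/>
  <changeDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2014-05-07</changeDate>
</owl:NamedIndividual>

<!-- Lighter-weight alternative: record only the date of last modification -->
<owl:NamedIndividual rdf:about="#SomeEntity">
  <dateLastModified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2014-05-07</dateLastModified>
</owl:NamedIndividual>
```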

It is important to understand that only a minimum amount of information is provided for Actor and Modification. They are defined, but are purposefully underspecified to allow application- or domain-specific details to be provided in another ontology. (In which case, the IRIs of the corresponding classes in the other ontology would be related to Actor and Modification using an owl:equivalentClass axiom. This was discussed in the post on modular ontologies, and tying together the pieces.)
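As a sketch, the tie-in to a domain-specific ontology is a single axiom along these lines (both IRIs below are placeholders; the actual namespaces depend on the ontologies involved):

```xml
<!-- Both IRIs are placeholders for illustration -->
<owl:Class rdf:about="http://example.com/myDomain#Person">
  <owl:equivalentClass rdf:resource="http://example.com/metadata#Actor"/>
</owl:Class>
```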

Also in the metadata-properties ontology, an identifier property is defined. It is similar to the identifier property from Dublin Core, but is not equivalent since the metadata-properties' identifier is defined as a functional data property. (The Dublin Core property is "officially" defined as an annotation property.)
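The distinction shows up in the declaration itself. As a sketch (with an illustrative relative IRI), a functional data property is declared along these lines - and an annotation property declaration could not carry the owl:FunctionalProperty type:

```xml
<!-- Sketch with an illustrative IRI -->
<owl:DatatypeProperty rdf:about="#identifier">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>
```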

To download the files, there is information in the blog post from Apr 17th.

Please let me know if you have any feedback or issues.