Friday, December 15, 2017

Referencing Reused Classes and Properties When Working with Other Ontologies

While I am still toiling away on re-working OntoGraph to support diagramming RDF/RDFS (yes, it seems to be a major undertaking!), I thought that I would post a question that I received. Here it is ... "When reusing a bunch of different ontologies in a new ontology, how should reused classes and properties be referenced?" Should each of the reused ontologies be included "en masse", should individual entities be reused directly, should entities be redefined in the new ontology but using their original namespace, or should the entities be recreated? Unfortunately, this is a question that has no right answer, but I have some preferences.

First, let me explain the alternatives:
  • Included "en masse" means using import statements for each re-used ontology, and then referencing the specific entities (classes and properties) that are actually needed. Everything is referenced in the namespace where it was defined, and nothing is redefined or recreated.
  • Reusing a class or property directly means referencing that class or property without importing the entire ontology. Everything is referenced using the namespace where it was defined, and nothing is redefined or recreated. But, you might end up with a triple that looks like this: myNamespace:someKindOfDate a owl:DatatypeProperty ; rdfs:subPropertyOf dcterms:date . And, it is up to the infrastructure to resolve the "dcterms" (Dublin Core) namespace to get the details of the date property.
  • Redefining entities means that you take the classes or properties that should be reused and include their definitions in your ontology. So, if you are using the Dublin Core "creator" concept, you would include a definition for dcterms:creator. You might even add more information (new predicates/objects defined for the entity), or maybe just copy over the existing predicates. Why might you do this? One reason is to have all the necessary details in one place. But, just as having multiple copies of the same code is considered bad practice in programming, I believe that copying and pasting another ontology's definition (using the same IRI/URI) is also wrong. You could end up with duplicated or, worse, divergent or out-of-date declarations.
  • Recreating entities is similar to redefining them, but different in some important ways. In this case, you create a semantically equivalent entity in your own namespace. Using the example above, a myNamespace:author entity might be created and the relevant details defined for it. In addition, you define an equivalentClass/equivalentProperty declaration, linking it to its source (in this case, dcterms:creator). Taking this approach, if dcterms:creator means something different in a future version, the equivalentProperty statement can simply be removed. Or, if a new metadata standard is dictated by your company or customer, you simply add another newMetadataNamespace:author equivalentProperty declaration. (A sketch of this approach is shown below.)
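To make that last alternative concrete, here is a minimal Turtle sketch of the "recreate and link" approach. The myNamespace URI and the details of myNamespace:author are made up for illustration (and it is declared as an object property just for the sketch); only dcterms:creator comes from an existing vocabulary.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix myNamespace: <http://example.org/myOntology#> .

# Recreate the concept in the new ontology's namespace ...
myNamespace:author rdf:type owl:ObjectProperty ;
    rdfs:label "author" ;
    # ... and link it to its source. If dcterms:creator changes meaning, or a new
    # metadata standard is adopted, only this equivalence needs to be updated.
    owl:equivalentProperty dcterms:creator .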
Next, I will try to give the pros and cons of using the same versus different namespaces, and of recreating entities from one namespace in another.

A namespace exists to establish the provenance of the entities defined within it, and to indicate that those entities are related. Ontologies should have loose coupling and tight cohesion, just like code - and the namespace can (should?) indicate the purpose/domain-space of the ontology. You can certainly group everything under the umbrella of a namespace that represents "my overall application space" - but that seems a bit too broad. Also, you might have another application in the future where you re-use one or more of your own ontologies - and then, one might question the "my overall application space" namespace, or question which entities in that namespace are relevant to the new application.

Also, a namespace helps to disambiguate entities that might have the same name - but not necessarily the same semantics (or level of semantic detail) - across different ontologies. For example, a Location entity in an Event ontology (or, more correctly, ontology design pattern, ODP) should not go into detail about Locations (that is not the purpose of the ontology). Defining locations is better served by other ontologies that specifically deal with network, spatio-temporal, latitude-longitude-altitude and/or other kinds of locations. So, an under-defined Location in an Event ODP can then link - as an equivalent class - to the more detailed location declarations in other "Location"-specific ODPs. In this way, you get loose coupling and tight cohesion. You can pull out one network location ODP and replace it with a better one - without affecting the Event ODP. In this case, you would only change the equivalentClass definition. :-)
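As a rough Turtle sketch of this idea (all of the prefixes and class names here are hypothetical - substitute whichever Event and Location ODPs you are actually using):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix event: <http://example.org/eventODP#> .
@prefix netloc: <http://example.org/networkLocationODP#> .

# The Event ODP only needs an under-defined Location class ...
event:Location rdf:type owl:Class ;
    # ... which is linked to the detailed class in a location-specific ODP.
    # Swapping in a different Location ODP only changes this one triple.
    owl:equivalentClass netloc:NetworkLocation .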

As for re-creating entities in the ODP namespace, that is really done for convenience. I can actually argue both sides of this issue (keeping the entities with their namespaces/provenance versus recreating them). But, erring on the side of simplicity, I recommend recreating entities in the new ontology's namespace (the last bullet above). This is especially relevant if only a portion of several existing ontologies/namespaces will be re-used. Why import large ontologies when you only need a handful of classes and properties? This can confuse your users and developers as to what is really relevant. Plus, you will have new entities/properties/axioms being defined in your new ontology. If you do not recreate entities, you end up with lots of different namespaces in the ontology, and this translates to lots of different namespaces in your instance data. Your users and developers can become overwhelmed keeping track of which concept comes from which namespace.

For example, you may take document details from the SPAR DoCO ontology (http://www.sparontologies.net/ontologies/doco/source.ttl) and augment them with data from the Dublin Core (http://dublincore.org/2012/06/14/dcterms.rdf) and PRISM (http://prismstandard.org/namespaces/basic/2.0/) vocabularies, and then add details from the PROV-O ontology (http://www.w3.org/ns/prov-o). All of these classes and properties use different namespaces, and it gets hard to remember which is which. E.g., "foo" is an instance of the doco:document class and uses the dcterms:publisher and prism:doi properties, but is linked to a revision using a prov:wasDerivedFrom property. This could lead to errors in creating and querying the instances. It seems easier to say that "foo" is an instance of the myData:document class, and uses the predicates myData:author, myData:publisher, myData:doi and myData:derivedFrom (where "myData" is the namespace of the ODP for tracking document details).
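To make this concrete, here is roughly what the two styles look like in Turtle. The "foo" instance, the literal values, the ex: and myData: URIs, and the local names simply follow the paragraph above and are illustrative; the external prefixes are the usual ones for DoCO, Dublin Core, PRISM and PROV-O.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix doco: <http://purl.org/spar/doco/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix myData: <http://example.org/myDocumentODP#> .
@prefix ex: <http://example.org/data#> .

# Style 1: every class and predicate keeps its original namespace
ex:foo rdf:type doco:document ;
    dcterms:publisher ex:somePublisher ;
    prism:doi "10.1000/xyz123" ;
    prov:wasDerivedFrom ex:fooEarlierRevision .

# Style 2: the document ODP recreates what it needs in one namespace
ex:foo rdf:type myData:document ;
    myData:publisher ex:somePublisher ;
    myData:doi "10.1000/xyz123" ;
    myData:derivedFrom ex:fooEarlierRevision .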

I know that some might disagree (or might agree!). If so, let me know.

Andrea

Monday, October 23, 2017

OntoGraph Server

In order to make everyone's life easier, we (Nine Points Solutions) are sponsoring a hosted server running the latest release of OntoGraph and Stardog.

Feel free to give it a try by uploading your ontology, and generating a graph.

Anything that is uploaded is used only in generating the graph and then all details are deleted.

Andrea

P.S. Maintenance will be done (if needed) on Saturdays, noon-4pm Eastern time.

Graphing with OWL Reasoning

Another version of OntoGraph (V1.0.2) was released today. The main goal was to add OWL reasoning to determine individuals' types. Why might this be important? Well, an individual might be referenced in an ontology, but not defined with an rdf:type. Or, the individual might be defined with a type, and then also used as the subject or object in a triple. If the predicate of the triple (the relating property) is defined with domains and/or ranges, then a reasoner can infer the type(s) of the individual. This is also useful for finding errors in the ontology, its logic and its semantics (more on that later).

Here is a simple example:
@prefix ninepts: <http://purl.org/ninepts/test#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://purl.org/ninepts/test> rdf:type owl:Ontology .

<http://purl.org/ninepts/test#class1> rdf:type owl:Class .

<http://purl.org/ninepts/test#class2> rdf:type owl:Class .

<http://purl.org/ninepts/test#class3> rdf:type owl:Class .

<http://purl.org/ninepts/test#class4> rdf:type owl:Class .

<http://purl.org/ninepts/test#objProp1> rdf:type owl:ObjectProperty ;
   rdfs:domain ninepts:class3, ninepts:class4 .

<http://purl.org/ninepts/test#objProp2> rdf:type owl:ObjectProperty ;
   rdfs:range ninepts:class1, ninepts:class2 .

<http://purl.org/ninepts/test#individual1> ninepts:objProp1 ninepts:individual2 .

<http://purl.org/ninepts/test#individual3> ninepts:objProp2 ninepts:individual4 .
This example is written using the Turtle syntax, and basically defines 4 classes (class1 - class4), 2 properties (objProp1 and objProp2), and 4 individuals (individual1 - individual4). The property, objProp1, is defined with 2 classes as its domain (and no range), while objProp2 is defined with 2 classes as its range (and no domain). (No domain or no range for an object property means that no semantic is intended - anything, any "owl:Thing", can be the domain or range.) The individuals are defined in 2 triples indicating that individual1 is related to individual2 (via objProp1), and individual3 is related to individual4 (via objProp2).

Without OWL reasoning, the individuals have no types. In fact, OntoGraph does not even find any individuals since it "locates" individuals by querying for any entity that has an explicit type of owl:NamedIndividual, or that has a type that begins with a prefix other than OWL or RDF/RDFS. (The query allows us to avoid returning classes (type owl:Class) and properties (type owl:ObjectProperty or owl:DatatypeProperty) when searching for individuals and their types.)

But, if we run OntoGraph with reasoning turned on, then we find that there are indeed 4 individuals, that individual1 has the types defined for the domain of objProp1, and that individual4 has the types defined for the range of objProp2. This is shown in the figure below, which was generated by OntoGraph.



If this seems odd, think about how the reasoner works ... The ontology defined individual1 as the subject of a triple with the predicate, objProp1. We know that any subject (the domain of the property) of objProp1 is defined to be of types, class3 and class4. So, individual1 is "reasoned" to be of those 2 types. Similarly, individual4 is the object of a triple with the predicate, objProp2. And, we know from the ontology that any object (the range of the property) of objProp2 is defined to be of types, class1 and class2. There you have it ...
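Using the prefixes declared in the example above, the inferred type statements would look something like this (these triples are produced by the reasoner - they are not in the original file):

ninepts:individual1 rdf:type ninepts:class3, ninepts:class4 .
ninepts:individual4 rdf:type ninepts:class1, ninepts:class2 .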

The reasoner can't determine anything about individual2 or individual3, except that they are themselves individuals. The reasoner figures this out since they are an object and subject (respectively) in triples whose predicates are object properties. By the way, the reasoner also determined that they are of type, owl:Thing, which doesn't tell you much (everything is of type, owl:Thing, unless there is a logical inconsistency in the ontology). OntoGraph does not bother to show that detail since it adds no information to the graph (but does clutter it up).

Now, why did I talk earlier about illustrating errors in the ontology? If you look at the ontology definition above, you see that the domain of objProp1 is "ninepts:class3, ninepts:class4". Many people writing ontologies mistakenly think that this definition means that the domain is EITHER class3 OR class4. But, that is incorrect. The definition actually means that the domain is BOTH class3 AND class4. Therefore, an individual must be of BOTH types (multiple inheritance) or, stated another way, is defined as the intersection of both types. There are some ways to get around this, as discussed in these two posts from StackOverflow (using one property with multiple domains and how to define multiple domains and ranges). I am not going to repeat the answers (which are both very good), but a common workaround is sketched below, and I will talk more about reasoning and errors in my next post.
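For reference, here is a rough Turtle sketch of one commonly used workaround, applied to the example above (and using its prefixes): declare the domain as a single, anonymous class that is the union of class3 and class4, so that "either class3 or class4" is what is actually stated.

<http://purl.org/ninepts/test#objProp1> rdf:type owl:ObjectProperty ;
   # The domain is now one (anonymous) class - the union of class3 and class4.
   rdfs:domain [ rdf:type owl:Class ;
                 owl:unionOf ( ninepts:class3 ninepts:class4 ) ] .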

As always, let me know if you have any questions.

Andrea

Friday, October 6, 2017

OntoGraph V1.0.1 and a Discussion of VOWL

Continuing the evolution of OntoGraph, we published fixes for three minor issues, and updated the README text to address user questions that we received. The README, the code, and the jar and zip files (in the /ontograph-<major.minor.release#> directory) have all been updated. The changes are all described in the commit history. In addition, a few new issues were added to our backlog based on feedback and questions. Take a look at the current set of issues and let me know if you have issues to add, or want to highlight which ones are important to you. Or, you can just add a comment to the issues directly.

Right now, we are planning on another update (V1.1.0) at the end of October. We will be addressing all the known bugs and adding support for diagramming straight RDF - i.e., to support Linked Data.

Enough of that ... Let's move on to discussing the graph output for a VOWL visualization from OntoGraph versus what is defined in the official specification. First off, per the specification ...
OWL elements such as owl:allValuesFrom, owl:someValuesFrom, owl:hasValue, rdfs:comment, rdfs:seeAlso, rdfs:isDefinedBy, and owl:DataRange (rdfs:datatype in OWL 2 which has a representation in the current specification) are not part of the VOWL visualization but could be displayed in another way (e.g. as text information in a tooltip or sidebar). This is also the case for the OWL elements owl:Ontology, owl:differentFrom, owl:AllDifferent, owl:distinctMembers, owl:Restriction, owl:onProperty, owl:AnnotationProperty, and owl:OntologyProperty that serve as containers of other elements, link individuals, or define ontology metadata.
OntoGraph diverges from the specification for annotation properties. These are displayed in a graph, similar to datatype properties. Ignoring these properties can omit valid information (usable constructs) from a graph. For example, the Friend-Of-A-Friend ontology (FOAF) defines annotation properties for information mapped from the Web of Trust (WOT) and Dublin Core schemas. These properties (especially ones from Dublin Core, such as "description") are often used on class, property and element declarations.

For many of the restriction-related elements listed above (such as owl:Restriction, owl:all/someValuesFrom, ...), OntoGraph outputs labeled edges and text in "UML boxes" that define the details. In my experience, when restrictions are used, understanding them is essential to understanding the ontology.
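As a reminder of what these constructs look like in an ontology, here is a small, hypothetical restriction in Turtle (the ex: prefix and the class/property names are made up for illustration):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/demo#> .

# Every ex:Publication has at least one creator that is an ex:Person.
ex:Publication rdfs:subClassOf [ rdf:type owl:Restriction ;
    owl:onProperty ex:hasCreator ;
    owl:someValuesFrom ex:Person ] .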

As regards OWL connectives, VOWL easily shows equivalencies, unions, intersections, complements or disjoint definitions between named classes. But, the defined graphing approach fails when one or more of the related classes are blank (anonymous/un-named) nodes. Consider how equivalent classes are shown - as a circle "with a double border... One of the class labels is the main label, while the rest is listed in square brackets (abbreviated if they do not all fit)." Next, consider how connectives (unions, intersections, etc.) are shown - as two or more classes connected via dashed lines (without arrowheads), to an image of a Venn diagram labeled with a union, intersection or complement logical symbol. The Venn diagram image "represents the anonymous class of the owl:unionOf [owl:intersectionOf, ...] statement".

There are two problems with these conventions when dealing with nested blank nodes. For example, consider a class, foo, that is equivalent to the union of two other blank nodes - the complement of a class, bar, and an intersection of the classes, classA and classB. Since the union node (the equivalency) is anonymous/un-named, there is nothing to display on the second line of foo's node label. As for the second problem, although the union, intersection and complement images can be diagrammed and connected to the relevant classes via dashed lines, there is no way to understand that the complement and intersection definitions are the entities being unioned, unless arrowheads are used. (Without arrowheads, there could be many interpretations - such as classB being the intersection of classA and a union declaration.) In standard VOWL, there are simply dashed lines running between all the images. This is shown in the image below.
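For reference, here is that example written out in Turtle (the foo, bar, classA and classB names are just illustrative, and the ex: prefix is made up):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/demo#> .

ex:bar rdf:type owl:Class .
ex:classA rdf:type owl:Class .
ex:classB rdf:type owl:Class .

# foo is equivalent to the union of two anonymous classes:
# (1) the complement of bar, and (2) the intersection of classA and classB.
ex:foo rdf:type owl:Class ;
    owl:equivalentClass [ rdf:type owl:Class ;
        owl:unionOf ( [ rdf:type owl:Class ; owl:complementOf ex:bar ]
                      [ rdf:type owl:Class ; owl:intersectionOf ( ex:classA ex:classB ) ] ) ] .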



This same ontology is shown as output by OntoGraph:



OntoGraph addresses VOWL's issues with connectives by drawing equivalencies similarly to "Subclass of" declarations, and by using arrowheads to indicate exactly what is unioned, intersected or complemented. As another example, here is a snippet of a graph of the W3C Turtle Primer, a complex ontology based on union, intersection, complement and disjoint declarations, as well as restrictions. The majority of this detail would be missing in an "official" VOWL diagram.



Another thing that is missing from the VOWL specification is the display of individuals. Whereas many ontologies do indeed focus on the TBox (the concepts and relationships of a domain), the Linked Data and application worlds have to deal with individuals/instances (the ABox). Being able to diagram your instances is important. But, even if you want to restrict yourself to the TBox world, when you have "one-of" definitions (for enumerations and restrictions), graphing these is important. OntoGraph accepts that ABox individuals are not graphed in VOWL, but does support individual diagrams in the custom, Graffoo and UML visualizations. In addition, OntoGraph displays "one-of" declarations using a UML Note format. An example can be seen at the bottom of the image above. (A small Turtle example of a "one-of" declaration is also shown below.)
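For anyone unfamiliar with "one-of" declarations, here is a small, hypothetical enumeration in Turtle (the ex: prefix and the individuals are made up):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/demo#> .

# ex:Weekday is exactly the enumeration of these five individuals.
ex:Weekday rdf:type owl:Class ;
    owl:oneOf ( ex:Monday ex:Tuesday ex:Wednesday ex:Thursday ex:Friday ) .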

There are two more major (but related) issues to discuss regarding VOWL ... The first issue involves how node and property names are displayed in a graph. VOWL recommends that an implementation display any rdfs:label that may be defined for a class or property. But, "if elements do not have an rdfs:label, it is recommended to take the last part of the URI as label, i.e. the part that follows the last slash (/) or hash (#) character. Labels may be abbreviated if they do not fit in the available space (e.g. "Winnie-the-Pooh" → "Winnie…"). The full label should be shown on demand in these cases (e.g. in a tooltip or sidebar)." Unfortunately, this last aspect is not possible to support in yEd or any static copy of a graph. And, even the Example in the VOWL Specification does not show the full label when an abbreviated name is displayed!

The second, related issue is that because either a label or local name is displayed, VOWL does not include prefixes/full URIs in its graph. Instead, colors are used to distinguish what is "external" to an ontology (i.e., when a declared element uses a different base URI than the ontology URI/IRI). "External" classes and properties are shown in a darker color (darker blue for OWL classes and properties). In addition, the class nodes also carry the word, "external", in brackets, on the second line. There are several problems with this approach:
  • It will not be possible to distinguish the source of "external" references, and the problem is compounded if there are multiple imported/referenced vocabularies or ontologies. For example, the FOAF diagram includes a node (Spatial Thing) from the WGS84 Geo Positioning RDF vocabulary (WGS84) and another node (Concept) from the SKOS vocabulary (skos). Both of these are displayed in dark blue in the VOWL graph, with the text, "[external]", under their labels. OntoGraph follows this convention.
  • If there are equivalencies to multiple class declarations (from different, external ontologies) but those declarations have the same local name, then the local name will be repeated. For example, FOAF defines equivalent classes for the FOAF Person concept - linking it to the Schema.org Person class and the Person class from Tim Berners-Lee's Contact ontology. The result is a node whose label is "Person [Person, Pe...]". For domain experts reviewing a graph, this would be confusing at best. As above, OntoGraph follows this convention. (The relevant FOAF declarations are sketched below.)
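Roughly, the FOAF declarations behind that last example look like the following in Turtle (paraphrased - check the FOAF spec for the exact, current triples):

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix schema: <http://schema.org/> .
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .

# Three classes with the same local name ("Person"), from three different namespaces
foaf:Person owl:equivalentClass schema:Person, con:Person .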
That's it for VOWL! Let me know if this information is helpful. Thanks for reading!

Andrea

Tuesday, September 19, 2017

What can be learned from the OntoGraph project?

OntoGraph was introduced in my last post, OWL Ontology Graphing Program Available as Open Source. And there are a lot of interesting things in the code! Over the next weeks, I want to take time to relay my learnings, as well as to provide insights into OWL ontologies, SPARQL queries, Bootstrap, Backbone and RESTful interfaces, the Model-View-Controller and other patterns, Spring Boot, Lombok, programming Stardog, testing, Gradle builds, and much more. Some of this will be basic stuff (but hopefully useful to some of my readers) and some will be more advanced. Feel free to pick and choose, or let me know what you want to hear about!

But, first, I want to talk about our development environment ...

The precursor to OntoGraph was originally created in about 2 days to provide some basic diagrams of a customer's ontology. Hand-drawing all the classes, properties, axioms, etc. of the ontologies was too painful and error-prone. Using a tool like OntoViz with Protege was just not flexible enough, and the images were not what the customer wanted to see. The ProtegeVOWL plug-in was also not sufficient since VOWL does not diagram all the necessary constructs (I will talk more about this in a future post). In addition, the customer did not want to be tied to using Protege since they weren't ontologists. They just wanted a diagram and to be able to play around with the layout.

Well, the 2 day "quick and dirty" version worked and the customer had their diagrams. That could have been the end of the story. But, we hired an intern who needed to learn about ontologies, the Stardog triple store, SPARQL queries and lots of other things. So, we decided to use the graphing program as a learning experience. We took the initial work and decided first to just address some bugs. Then, we decided to add the ability to customize the output, which required a front-end. Then, we added support for different kinds of visualization (Graffoo, VOWL, UML). And, the program grew. We changed directions, rewrote whole sections of the program, updated our approach to the front-end at least three times, updated our approach to testing at least twice, and upgraded our infrastructure at least twice (updating the Gradle, Stardog, Javascript libraries, etc.). We put months of work into the program, definitely taking an agile approach and learning to "fail fast".

There are lessons here ... Good software takes time. There is always more to learn. Don't be afraid to take what you learn and rewrite what is problematic (as long as you have time and there are no other programming fires burning). There is always something that you can do better. And, always remember that Stack Overflow is your friend!

Well, ok then ... back to agile. For our agile environment, we used Atlassian's products - JIRA for issue tracking and managing our process (Kanban, actually), integrated with a Bitbucket Git repository for version control, and Bamboo as our continuous integration environment. Since we are a small company, this was an easy and cheap solution ($10 for each product). In addition, when we decided to get serious about releasing the code as open source, we also decided to incorporate SonarQube into our continuous integration environment.

As someone who has either spent too much or too little time on code reviews, I found SonarQube to be great! Per Wikipedia, it provides "continuous inspection of code quality to perform automatic reviews with static analysis of code to detect bugs, code smells and security vulnerabilities" (http://en.wikipedia.org/wiki/SonarQube). And, it does this for 20+ programming languages (although we only needed Java, JavaScript and CSS). This took a lot of the pain out of code reviews. I focused on whether the method and property names were understandable, whether the code seemed reasonable and was somewhat efficient, and things like that. SonarQube took care of finding problems related to bad practice, lack of efficiency, and errors (such as not initializing a variable). In addition, SonarQube would complain if you nested if/while/for/switch/try statements too deeply, or implemented methods with too many parameters or that were too complex. In reality, SonarQube was tougher on my code than any team review that I had experienced in the past.

Now, you can make things easier on yourself and change the defaults in the SonarQube rules. For example, you can allow a complexity of 30 instead of 15, or allow nesting of if/while/... past 3 levels. But, we didn't do that for OntoGraph. We figured that we would keep the defaults and fix most of the problems (or, we would eventually fix them). There are some "issues" that are just false positives, and others that we have not yet addressed. If you want to find them in the OntoGraph code, just look for "//NOSONAR" and the explanation that follows. The "//NOSONAR" comment tells SonarQube to ignore the issue - either it is a false positive, or we acknowledge that there is a problem and are willing to accept it for now. I think that this is a valuable approach. Most of the existing issues in OntoGraph are complexity-related, and we will fix those!

Another important aspect is test coverage. When we decided to release OntoGraph as open source, we set a testing threshold of at least 80% on the back-end processing classes (so this would be GraphController.java, GraphDAO.java and all the classes in the graphmloutputs folder). All of these classes have coverage between 93.2% and 98.3%, except one. TitleAndPrefixCreation.java has a test coverage of 77.8%, with 2 (yes, 2) uncovered lines. Those lines throw an IllegalAccessError exception if something tries to instantiate the class (which should not be done since the class contains only 1 static method). Oh well, we decided that this was definitely good enough!

You can see SonarQube in action by downloading OntoGraph and following SonarQube's instructions for Getting Started in Two Minutes. After starting and logging into SonarQube according to the instructions, go to where you downloaded OntoGraph. Type "./gradlew sonar" or "gradlew.bat sonar" for Windows (making sure that you have installed Gradle :-). After that completes, you can see all the rules/issues, statistics and more.

P.S. Sorry for the riff on SonarQube, but I wanted to hit on some cool details. And, I will talk about how Gradle supports SonarQube in a future post. This post just got way too long!

Andrea

Wednesday, September 13, 2017

OWL Ontology Graphing Program Available as Open Source

It has been forever since I last blogged on this site (more than a year, for which I feel terrible). I have been wrapped up in work for a customer whose details are proprietary, and I was also slowly working to create (what I hope will be valuable) ontology graphing software. I wish that the graphing software could have been available sooner, but better late than never ... The graphing software is called OntoGraph, it is finally at a point where it is acceptable to publish, and I can freely discuss it on the blog! So, here we go ...

You can check out the work at Nine Points Solutions' GitHub repository.

OntoGraph is a Spring Boot application for graphing OWL ontologies (yes, the title says this). It lets you go from RDF/XML, Turtle and several other OWL syntaxes to a custom, Graffoo, VOWL or UML-like diagram. For example, you can go from something like this (this excerpt comes from the Friend of a Friend ontology, FOAF.rdf - you can see the complete FOAF ontology at http://xmlns.com/foaf/spec/index.rdf) ...



To ...



The above image is a VOWL rendering of FOAF.

OntoGraph is designed with a Bootstrap- and Backbone-based GUI (written in JavaScript), interfacing with a RESTful API. The main program is written in Java. It operates by creating various GraphML outputs of a user-provided OWL ontology file. (It also accepts a zip file of a set of ontology files.) The program stores the ontologies in the Stardog triple store, then runs a series of queries to return the necessary information on the classes, properties, individuals, ... to be diagrammed. Layout of the resulting GraphML is handled by another program. (We recommend yEd.)

Four visualizations of ontology data can be generated:
  • Custom format (defined to fit existing business or personal preferences)
  • Graffoo
  • UML-like
  • VOWL
And, information can be segmented to display:
  • Class-related information (subclassing, equivalent and disjoint classes, class restrictions, ...)
  • Individual instances, their types, and their datatype and object property information
  • Property information (datatype and object properties, functional/symmetric/... properties, domain and range definitions, ...)
  • Both class and property information
Complete information about OntoGraph, how to run it, and issues and upcoming features are available at the GitHub repository. Also, there is a pre-publication version of a paper there, which explains OntoGraph and why it was created. (The paper will be available in the next issue of the Journal of Applied Ontology, from IOS Press.)

So, now that OntoGraph is finally published, I can start to blog about its components, design and design decisions, testing, and lots of other details. I just needed something concrete!

I hope that you find the program useful!

Andrea