Friday, December 15, 2017

Referencing Reused Classes and Properties When Working with Other Ontologies

While I am still toiling away on reworking OntoGraph to support diagramming RDF/RDFS (yes, it seems to be a major undertaking!), I thought that I would post a question that I received. Here it is: "When reusing a bunch of different ontologies in a new ontology, how should reused classes and properties be referenced?" Should each of the reused ontologies be included "en masse", should individual entities be reused directly, should entities be redefined in the new ontology but using their original namespace, or should the entities be recreated? Unfortunately, this question has no single right answer, but I do have some preferences.

First, let me explain the alternatives:
  • Including "en masse" means using import statements (owl:imports) for each reused ontology, and then referencing the specific entities (classes and properties) that are actually needed. Everything is referenced in the namespace where it was defined, and nothing is redefined or recreated.
  • Reusing a class or property directly means referencing that class or property without importing the entire ontology. Everything is still referenced using the namespace where it was defined, and nothing is redefined or recreated. But you might end up with triples like these: myNamespace:someKindOfDate a owl:DatatypeProperty ; rdfs:subPropertyOf dcterms:date . And it is up to the infrastructure to resolve the "dcterms" (Dublin Core) namespace to get the details of the date property.
  • Redefining entities means that you take the classes or properties that should be reused and include their definitions in your ontology. So, if you are using the Dublin Core "creator" concept, you would include a definition for dcterms:creator. You might even add more information (new predicates/objects for the entity), or just copy over the existing predicates. Why might you do this? One reason is to have all the necessary details in one place. But, just as having multiple copies of the same code is considered bad practice in programming, I believe that copying and pasting another ontology's definition (using the same IRI/URI) is also wrong. You could end up with duplicated or, worse, divergent or out-of-date declarations.
  • Recreating entities is similar to redefining them, but different in some important ways. In this case, you create a semantically equivalent entity in your own namespace. Using the example above, a myNamespace:author entity might be created and the relevant details defined for it. In addition, you add an owl:equivalentClass or owl:equivalentProperty declaration linking it to its source (in this case, myNamespace:author owl:equivalentProperty dcterms:creator). Taking this approach, if dcterms:creator comes to mean something different in a future version, the equivalentProperty statement can simply be removed. Or, if your company or customer mandates a new metadata standard, you just add another equivalentProperty declaration pointing to newMetadataNamespace:author.
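To make the difference between direct reuse and recreation concrete, here is a minimal Python sketch of the entailment an infrastructure would apply. It uses no RDF library: triples are plain tuples of prefixed names, rdfs:subPropertyOf propagates a value up to the broader property, and owl:equivalentProperty is treated as subPropertyOf in both directions. The myNamespace:report42 individual and its values are hypothetical; a real system would use full IRIs and an actual reasoner.

```python
# Prefixed names stand in for full IRIs; the example data is hypothetical.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"
EQUIVALENT_PROPERTY = "owl:equivalentProperty"


def super_properties(schema):
    """Map each property to itself plus all of its (transitive) super-properties.

    owl:equivalentProperty is handled as subPropertyOf in both directions.
    """
    direct = {}
    for s, p, o in schema:
        if p == SUB_PROPERTY_OF:
            direct.setdefault(s, set()).add(o)
        elif p == EQUIVALENT_PROPERTY:
            direct.setdefault(s, set()).add(o)
            direct.setdefault(o, set()).add(s)
    closure = {}
    for start in direct:
        seen = {start}
        stack = [start]
        while stack:
            prop = stack.pop()
            for parent in direct.get(prop, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        closure[start] = seen
    return closure


def infer(schema, data):
    """Return the data plus (s, super, o) for every super-property of each predicate."""
    closure = super_properties(schema)
    inferred = set(data)
    for s, p, o in data:
        for sup in closure.get(p, {p}):
            inferred.add((s, sup, o))
    return inferred


schema = {
    # Direct reuse: a local property declared as a sub-property of dcterms:date.
    ("myNamespace:someKindOfDate", SUB_PROPERTY_OF, "dcterms:date"),
    # Recreation: a local property linked to its source by equivalence.
    ("myNamespace:author", EQUIVALENT_PROPERTY, "dcterms:creator"),
}
data = {
    ("myNamespace:report42", "myNamespace:someKindOfDate", "2017-12-15"),
    ("myNamespace:report42", "myNamespace:author", "Jane Doe"),
}

facts = infer(schema, data)
```

Dropping the equivalentProperty triple from `schema` stops the dcterms:creator inference without touching any instance data, which is the decoupling the last bullet describes.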
Next, I will try to give all the pros and cons of using the same vs different namespaces, and recreating entities from one namespace in another.

A namespace exists to establish the provenance of the entities defined within it, and to indicate that those entities are related. Ontologies should have loose coupling and tight cohesion, just like code, and the namespace can (should?) indicate the purpose/domain of the ontology. You can certainly group everything under the umbrella of a namespace that represents "my overall application space", but that seems a bit too broad. Also, you might have another application in the future where you reuse one or more of your own ontologies; then one might question the "my overall application space" namespace, or question which entities in that namespace are relevant to the new application.

Also, a namespace helps to disambiguate entities that might have the same name, but not necessarily the same semantics (or level of semantic detail), across different ontologies. For example, a Location entity in an Event ontology (or, more correctly, an ontology design pattern, or ODP) should not go into detail about locations; that is not the purpose of the ontology. Defining locations is better served by other ontologies that specifically deal with network, spatio-temporal, latitude-longitude-altitude and/or other kinds of locations. So, an under-defined Location in an Event ODP can then link, as an equivalent class, to the more detailed location declarations in other location-specific ODPs. In this way, you get loose coupling and tight cohesion. You can pull out one network location ODP and replace it with a better one without affecting the Event ODP. In this case, you would only change the equivalentClass definition. :-)
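The swap described above can be sketched in the same style: the Event ODP's own triples stay fixed, and only the owl:equivalentClass link changes. This is a minimal Python sketch that follows equivalence links, not a full reasoner; the event:, netLocA: and netLocB: prefixes and the room12 individual are hypothetical.

```python
# Prefixed names stand in for full IRIs; all prefixes here are hypothetical.
EQUIVALENT_CLASS = "owl:equivalentClass"
RDF_TYPE = "rdf:type"


def types_of(individual, data, mappings):
    """Return every class the individual belongs to, following
    owl:equivalentClass links in both directions (transitively)."""
    links = {}
    for s, p, o in mappings:
        if p == EQUIVALENT_CLASS:
            links.setdefault(s, set()).add(o)
            links.setdefault(o, set()).add(s)
    found = set()
    stack = [o for s, p, o in data if s == individual and p == RDF_TYPE]
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(links.get(cls, ()))
    return found


# The Event ODP's own triples: an under-defined Location, never edited below.
event_data = {
    ("event:meeting1", "event:atLocation", "event:room12"),
    ("event:room12", RDF_TYPE, "event:Location"),
}

# Version 1: Location is linked to one network-location ODP.
mappings_v1 = {("event:Location", EQUIVALENT_CLASS, "netLocA:NetworkLocation")}
# Version 2: that ODP is swapped for a better one; only this link changes.
mappings_v2 = {("event:Location", EQUIVALENT_CLASS, "netLocB:NetworkLocation")}

types_v1 = types_of("event:room12", event_data, mappings_v1)
types_v2 = types_of("event:room12", event_data, mappings_v2)
```

Because `event_data` is identical in both versions, nothing in the Event ODP (or its instances) needs to change; only the single mapping triple does.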

As for recreating entities in the ODP namespace, that is really done for convenience. I can actually argue both sides of this issue (keeping the entities with their namespaces/provenance versus recreating them). But, erring on the side of simplicity, I recommend recreating entities in the new ontology's namespace (the last bullet above). This is especially relevant if only a portion of several existing ontologies/namespaces will be reused. Why import large ontologies when you only need a handful of classes and properties? Doing so can confuse your users and developers as to what is really relevant. Plus, you will have new entities/properties/axioms being defined in your new ontology. If you do not recreate entities, you end up with lots of different namespaces in the ontology, and therefore lots of different namespaces in your individuals. Your users and developers can become overwhelmed keeping track of which concept comes from which namespace.

For example, you may take document details from the SPAR DoCO ontology (http://www.sparontologies.net/ontologies/doco/source.ttl) and augment them with data from the Dublin Core (http://dublincore.org/2012/06/14/dcterms.rdf) and PRISM (http://prismstandard.org/namespaces/basic/2.0/) vocabularies, and then add details from the PROV-O ontology (http://www.w3.org/ns/prov-o). All these classes and properties use different namespaces, and it gets hard to remember which is which. For instance, "foo" is an instance of the doco:document class and uses the dcterms:publisher and prism:doi properties, but is linked to a revision using the prov:wasDerivedFrom property. This could lead to errors when creating and querying the instances. It seems easier to say that "foo" is an instance of the myData:document class and uses the predicates myData:author, myData:publisher, myData:doi and myData:derivedFrom (where "myData" is the namespace of the ODP for tracking document details).
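One practical consequence of recreating terms is that data written against myData can still be exchanged in the original vocabularies by following the equivalence links back to their sources. The sketch below does this with a plain lookup table that stands in for the owl:equivalentClass/owl:equivalentProperty axioms. The target terms are copied from the example above (check them against the published DoCO, Dublin Core, PRISM and PROV-O vocabularies before relying on them), and ex:foo, ex:fooDraft and all literal values, including the DOI, are hypothetical placeholders.

```python
# Hypothetical mapping from local myData terms to the terms they were
# recreated from; in the ontology these would be equivalence axioms.
TERM_MAP = {
    "myData:document": "doco:document",
    "myData:author": "dcterms:creator",
    "myData:publisher": "dcterms:publisher",
    "myData:doi": "prism:doi",
    "myData:derivedFrom": "prov:wasDerivedFrom",
}


def to_source_vocabularies(triples, term_map=TERM_MAP):
    """Rewrite predicates, and the class in rdf:type triples, into their
    source terms; anything without a mapping is left unchanged."""
    out = []
    for s, p, o in triples:
        if p == "rdf:type":
            o = term_map.get(o, o)
        out.append((s, term_map.get(p, p), o))
    return out


# Instance data authored entirely against the local myData ODP.
foo = [
    ("ex:foo", "rdf:type", "myData:document"),
    ("ex:foo", "myData:author", "Jane Doe"),
    ("ex:foo", "myData:publisher", "Example Press"),
    ("ex:foo", "myData:doi", "10.1234/example"),
    ("ex:foo", "myData:derivedFrom", "ex:fooDraft"),
]

exported = to_source_vocabularies(foo)
for triple in exported:
    print(triple)
```

Authors and queries only ever see the myData terms; the multi-namespace view is generated at the boundary when it is actually needed.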

I know that some might disagree (or might agree!). If so, let me know.

Andrea