Wednesday, September 23, 2009

What Modelers Can Learn from WordNet

In WordNet, there are several significant types of relationships, with very specific semantics. These semantics apply in modeling as well. However, some modeling approaches focus on (or only allow) a subset of these semantics - for example, inheritance and property definition, with much else excluded.

So, what are the types of relationships in WordNet? They are listed below along with their linguistic terminology (holonym, meronym, troponym, etc.).
  • Inheritance information - the definition of hypernyms/superclasses and hyponyms/subclasses. Most modeling approaches handle this very well.
  • Coordinate terms - Related information, usually the sibling entities under a single superclass. Again, this is well covered.
  • Aggregation - the definition of holonyms/aggregates and meronyms/aggregated items. However, WordNet further refines aggregation as:

    • Whole/part information - For example, fingers are part of a hand, but can be treated as separate entities. Their lifetimes are influenced by the lifetime of the "whole". Obviously, if a hand is cut off, the fingers are cut off with the hand.
    • Substance/composition of an entity - For example, cement and sand are substances in concrete, but once mixed, they are not separate entities.
    • Membership information - For example, certain employees are members of a security group, but the entities are separate, with separate lifetimes. So, removing the security group does not remove the employees, and removing the employees from the group does not delete the group.

  • Attribute information - HAS-A data, well addressed by all modeling infrastructures.
  • Synonym information - Alias information and equivalent terms. Lack of this information (or meta-information) is usually what causes arguments over the single "name" of a modeled entity.
  • Antonym/opposite information - There is usually no need to reflect this in a model. Where it is needed, my preference is OWL's disjointWith distinction, which states that two classes have no common individuals.
  • Refinement information - Defining troponyms for verbs (relationships). This involves refining a verb by the manner in which it is performed. For example, to mumble is "to talk indistinctly by lowering the voice or partially closing the mouth". This could be modeled as a typing hierarchy involving associations. But, typically, typing hierarchies involving associations are defined based on a restriction of the referenced elements, versus a refinement of the semantics of the association. Often, we make too much of the restriction scenarios and too little of the refinement of semantics.
  • Entailment information (in WordNet, entailment of verbs) - Entailment is the implication of one fact from another. For verbs, it is based on temporal inclusion. For example, the act of snoring implies sleeping. OCL is one example of how this is supported in today's modeling infrastructure - across nouns and verbs/associations.
  • Cause data for transitive and intransitive verbs - This is best described by example ... the wind storm breaking the window (an action) is the CAUSE of the window being broken (a resulting state). Having this level of information as data or meta-data in a model could assist immeasurably with root cause analysis.
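
To make the three kinds of aggregation concrete, here is a minimal Python sketch of how their lifetime semantics differ. The Entity class, the Aggregation enum and the cascade rule are hypothetical names of my own, not anything defined by WordNet, and treating both whole/part and substance relationships as cascading on removal is a simplifying assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Aggregation(Enum):
    PART = "part"            # whole/part: a finger is part of a hand
    SUBSTANCE = "substance"  # substance/composition: sand in concrete
    MEMBER = "member"        # membership: an employee in a security group


@dataclass
class Entity:
    name: str
    alive: bool = True
    # (kind, child) pairs for everything this entity aggregates
    children: list = field(default_factory=list)

    def add(self, kind: Aggregation, child: "Entity") -> None:
        self.children.append((kind, child))

    def remove(self) -> None:
        """Remove this entity, cascading only where the semantics call for it."""
        self.alive = False
        for kind, child in self.children:
            # Assumption: parts and substances go with the whole;
            # members have independent lifetimes and are left alone.
            if kind in (Aggregation.PART, Aggregation.SUBSTANCE):
                child.remove()


hand = Entity("hand")
finger = Entity("finger")
hand.add(Aggregation.PART, finger)

group = Entity("security group")
employee = Entity("employee")
group.add(Aggregation.MEMBER, employee)

hand.remove()   # the finger goes with the hand
group.remove()  # the employee is unaffected
```

A model that only offers a generic "aggregation" association cannot express this difference, and the cascade rule would have to live in documentation or application code instead.
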
I don't know about you, but I am very impressed with the knowledge in WordNet.

Thursday, September 17, 2009

Apologies on being MIA and a quote from Abraham Lincoln

I started a new job (at CA, Inc.) and had to take a bit of time to learn the company and my role. Now that I am settled in (as much as one can ever settle in :-)), I can get back to devoting some time to blogging. With the disclaimer that ...

The opinions and statements on this site are my own and do not reflect the opinions or policies of CA, Inc.

So, onto my return post, I simply want to publish a quotation from Abraham Lincoln that is very relevant to semantics and models ...

It is unbelievable how often I hear things like "just change the name/representation of foo to be bar and we can get alignment". However, the real issue is not the representative name (which should indeed be clear), but the semantics intended by the people who defined the word "foo" in the first place. If we could move modeling beyond words on a UML diagram, to analyzing semantics ... we would be much better off.

Another thing that I sometimes hear is the question "what makes an instance a foo?" For example, "if I magically change the color and size of an elephant, is it still an elephant?" Or, riffing on Nassim Nicholas Taleb's theory, "is a black swan a swan?" Semantics will help us understand what being a swan or an elephant or an IT service means - beyond the name. And, we have an added benefit ... if we make the essential definitions clear, then we have a better way to correct invalid definitions by removing problematic clauses (such as "all swans are white").
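
As a rough illustration of correcting a definition by removing a problematic clause, the Python sketch below treats a definition as a set of named clauses. The clause names and the properties of the individual are invented for the example; a real model would more likely express this in OWL or OCL.

```python
# A definition is a set of named clauses, each a predicate over an
# individual's observed properties. All names here are hypothetical.
swan_definition = {
    "is_bird": lambda x: x["is_bird"],
    "has_long_neck": lambda x: x["has_long_neck"],
    "is_white": lambda x: x["color"] == "white",  # the problematic clause
}


def satisfies(individual, definition):
    """An individual fits a definition only if every clause holds."""
    return all(clause(individual) for clause in definition.values())


black_swan = {"is_bird": True, "has_long_neck": True, "color": "black"}

# With "all swans are white" in the definition, a black swan is rejected.
before = satisfies(black_swan, swan_definition)

# Remove the one invalid clause and leave the essential ones untouched.
corrected = {name: c for name, c in swan_definition.items() if name != "is_white"}
after = satisfies(black_swan, corrected)
```

Because each clause is explicit and named, the invalid one can be dropped without renaming anything or redefining "swan" from scratch.
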