Monday, October 12, 2009
Too Many Standards?
Wednesday, September 23, 2009
What Modelers Can Learn from WordNet
- Inheritance information - the definition of hypernyms/superclasses and hyponyms/subclasses. Most modeling approaches handle this very well.
- Coordinate terms - Related information, usually the sibling entities under a single superclass. Again, this is well covered.
- Aggregation - the definition of holonyms/aggregates and meronyms/aggregated items. However, WordNet further refines aggregation as:
- Whole/part information - For example, fingers are part of a hand, but can be treated as separate entities. Their lifetimes are influenced by the lifetime of the "whole". Obviously, if a hand is cut off, the fingers are cut off with the hand.
- Substance/composition of an entity - For example, cement and sand are substances in concrete, but once mixed, they are not separate entities.
- Membership information - For example, certain employees are members of a security group, but the entities are separate, with separate lifetimes. So, removing the security group does not remove the employees, or removing the employees from the group does not delete the group.
- Attribute information - HAS-A data, well addressed by all modeling infrastructures.
- Synonym information - Alias information and equivalent terms. Lack of this information (or meta-information) usually causes the arguments when defining the single "name" of a modeled entity.
- Antonym/opposite information - There is usually no need to reflect this in a model. My preference is OWL's disjointWith distinction, that 2 classes have no common individuals.
- Refinement information - Defining troponyms for verbs (relationships). This involves refining a verb by the manner in which it is performed. For example, to mumble is "to talk indistinctly by lowering the voice or partially closing the mouth". This could be modeled as a typing hierarchy involving associations. But, typically, typing hierarchies involving associations are defined based on a restriction of the referenced elements, versus a refinement of the semantics of the association. Often, we make too much of the restriction scenarios and too little of the refinement of semantics.
- Entailment of information, in WordNet it is entailment of verbs - Entailment is the implication of one fact from another. For verbs, it is based on temporal inclusion. For example, the act of snoring implies sleeping. OCL is one example of how this is supported in today's modeling infrastructure - across nouns and verbs/associations.
- Cause data for transitive, intransitive verbs - This is best described by example ... knowing that the wind storm broke the window is the CAUSE of the window being broken (a resulting state). Having this level of information as data or meta-data in a model could assist immeasurably with root cause analysis.
Thursday, September 17, 2009
Apologies on being MIA and a quote from Abraham Lincoln
Wednesday, June 10, 2009
Types of Models - For Business Versus For IT
In the paper, "generalized" models are those used to define database/storage structures and to find the general themes and fundamental aspects of the data (and its values). In short, they are the data models defined by IT to effectively and efficiently use the technologies that are in place (like SQL databases). Maybe "reduced" is a better word than "generalized" ...
On the other hand, "detailed" models are those that are useful to business people. They define and describe the information requirements of the business, and its vocabularies, rules and processes. They hold the details from the business perspective. Again, maybe another word like "conceptual" is better (since even the "generalized" models hold "details") ...
What is valuable is not the titles used for these models but their semantics. :-) The key message is that a business needs both types of models and they need to stay in sync. This is really important. The conceptual/detailed models hold the real business requirements and language. They haven't been reduced to basic data values whose semantics are lost in the technology used to define and declare them.
IMHO, a business loses information and knowledge when it only retains and works from the IT models. There is much to be gleaned from the business input and much value in keeping the business people engaged in the work. This is almost impossible once you reduce the business requirements to technology-speak.
As the report says, "do not allow generalized models to compromise your understanding of the business."
Monday, June 8, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 3)
The article includes a great quote on the information problem, why today's approaches (even metadata) are not enough, and the uses of Semantic Web technologies ... "Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. ... Many organizations already recognize the importance of standards for metadata. What many don’t understand is that working to standardize metadata without an ontology is like teaching children to read without a dictionary. Using ontologies to organize the semantic rationalization of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalization because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system. The ontological approach also keeps the CIO’s office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner’s ontology exposes the context semantics that data definitions lack." PwC suggests taking 2 (non-exclusive) approaches to "explore" the Semantic Web and Linked Data:
- Add the dimension of semantics and ontologies to existing, internal data warehouses and data stores
- Provide tools to help users get at both internal and external Linked Data
Wednesday, June 3, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 2)
The second featured article is Making Semantic Web connections. It discusses the business value of using Linked Data, and includes interesting information from a CEO survey about information gaps (and how the Semantic Web can address these gaps). The article argues that to get adequate information, the business must better utilize its own internal data, as well as data from external sources (such as information from members of the business' ecosystem or the Web). This is depicted in the following two figures from the article ...
I also want to include some quotes from the article - especially since they support what I said in an earlier blog from my days at Microsoft, Question on what "policy-based business" means ... :-)
- Data aren’t created in a vacuum. Data are created or acquired as part of the business processes that define an enterprise. And business processes are driven by the enterprise business model and business strategy, goals, and objectives. These are expressed in natural language, which can be descriptive and persuasive but also can create ambiguities. The nomenclature comprising
- ... the natural language used to describe the business, to design and execute business processes, and to define data elements is often left out of enterprise discussions of performance management and performance improvement.
- ... ontologies can become a vehicle for the deeper collaboration that needs to occur between business units and IT departments. In fact, the success of Linked Data within a business context will depend on the involvement of the business units. The people in the business units are the best people to describe the domain ontology they’re responsible for.
- Traditional integration methods manage the data problem one piece at a time. It is expensive, prone to error, and doesn’t scale. Metadata management gets companies partway there by exploring the definitions, but it still doesn’t reach the level of shared semantics defined in the context of the extended virtual enterprise. Linked Data offers the most value. It creates a context that allows companies to compare their semantics, to decide where to agree on semantics, and to select where to retain distinctive semantics because it creates competitive advantage.
And, yes, I did say something similar to this in an earlier post on Semantic Web and Business. (Thumbs up :-)
Tuesday, June 2, 2009
PriceWaterhouseCoopers Spring Technology Forecast (Part 1)
Spinning a data Web overviewed the technologies of the Semantic Web, and discussed how businesses can benefit from developing domain ontologies and then mediating/integrating/querying them across both internal and external data. The value of mediation is summarized in the following figure ...
I like this, since I said something similar in my post on the Semantic Web and Business.
Backing up this thesis, Tom Scott of BBC Earth provided a supporting quote in his interview, Traversing the Giant Global Graph. "... when you start getting either very large volumes or very heterogeneous data sets, then for all intents and purposes, it is impossible for any one person to try to structure that information. It just becomes too big a problem. For one, you don’t have the domain knowledge to do that job. It’s intellectually too difficult. But you can say to each domain expert, model your domain of knowledge— the ontology—and publish the model in the way that both users and machine can interface with it. Once you do that, then you need a way to manage the shared vocabulary by which you describe things, so that when I say “chair,” you know what I mean. When you do that, then you have a way in which enterprises can join this information, without any one person being responsible for the entire model. After this is in place, anyone else can come across that information and follow the graph to extract the data they’re interested in. And that seems to me to be a sane, sensible, central way of handling it."