Monday, October 12, 2009
Wednesday, September 23, 2009
- Inheritance information - the definition of hypernyms/superclasses and hyponyms/subclasses. Most modeling approaches handle this very well.
- Coordinate terms - Related information, usually the sibling entities under a single superclass. Again, this is well covered.
- Aggregation - the definition of holonyms/aggregates and meronyms/aggregated items. However, WordNet further refines aggregation as:
- Whole/part information - For example, fingers are part of a hand, but can be treated as separate entities. Their lifetimes are influenced by the lifetime of the "whole". Obviously, if a hand is cut off, the fingers are cut off with the hand.
- Substance/composition of an entity - For example, cement and sand are substances in concrete, but once mixed, they are not separate entities.
- Membership information - For example, certain employees are members of a security group, but the entities are separate, with separate lifetimes. So, removing the security group does not remove the employees, or removing the employees from the group does not delete the group.
- Attribute information - HAS-A data, well addressed by all modeling infrastructures.
- Synonym information - Alias information and equivalent terms. Lack of this information (or meta-information) usually causes arguments when defining the single "name" of a modeled entity.
- Antonym/opposite information - There is usually no need to reflect this in a model. My preference is OWL's disjointWith distinction, which declares that two classes have no common individuals.
- Refinement information - Defining troponyms for verbs (relationships). This involves refining a verb by the manner in which it is performed. For example, to mumble is "to talk indistinctly by lowering the voice or partially closing the mouth". This could be modeled as a typing hierarchy involving associations. But, typically, typing hierarchies involving associations are defined based on a restriction of the referenced elements, versus a refinement of the semantics of the association. Often, we make too much of the restriction scenarios and too little of the refinement of semantics.
- Entailment information (in WordNet, entailment of verbs) - Entailment is the implication of one fact from another. For verbs, it is based on temporal inclusion. For example, the act of snoring implies sleeping. OCL is one example of how this is supported in today's modeling infrastructure - across nouns and verbs/associations.
- Cause data for transitive, intransitive verbs - This is best described by example ... knowing that the wind storm broke the window is the CAUSE of the window being broken (a resulting state). Having this level of information as data or meta-data in a model could assist immeasurably with root cause analysis.
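These lexical relations are easy to picture as a tiny semantic network. Here is a minimal sketch in Python (the words and relation tables are my own toy examples, not WordNet data or its API):

```python
# A toy semantic network illustrating a few WordNet-style relations.
# The vocabulary and relation tables here are invented examples.

HYPERNYM = {            # IS-A: child -> parent
    "finger": "digit",
    "digit": "body_part",
    "mumble": "talk",   # troponym: mumbling is a manner of talking
}

PART_MERONYM = {        # whole/part: part -> whole
    "finger": "hand",
}

ENTAILS = {             # verb entailment via temporal inclusion
    "snore": "sleep",
}

def hypernym_closure(word):
    """Walk the IS-A chain to collect all superclasses."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

print(hypernym_closure("finger"))   # ['digit', 'body_part']
print(ENTAILS["snore"])             # sleep
```

Even this trivial structure supports inference - following the IS-A chain from "finger" recovers its superclasses, and the entailment table lets a program conclude "sleep" from "snore".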
Thursday, September 17, 2009
Wednesday, June 10, 2009
In the paper, "generalized" models are those used to define database/storage structures and to find the general themes and fundamental aspects of the data (and its values). In short, they are the data models defined by IT to effectively and efficiently use the technologies that are in place (like SQL databases). Maybe "reduced" is a better word than "generalized" ...
On the other hand, "detailed" models are those that are useful to business people. They define and describe the information requirements of the business, and its vocabularies, rules and processes. They hold the details from the business perspective. Again, maybe another word like "conceptual" is better (since even the "generalized" models hold "details") ...
What is valuable is not the titles used for these models but their semantics. :-) The key message is that a business needs both types of models and they need to stay in sync. This is really important. The conceptual/detailed models hold the real business requirements and language. They haven't been reduced to basic data values whose semantics are lost in the technology used to define and declare them.
IMHO, a business loses information and knowledge when it only retains and works from the IT models. There is much to be gleaned from the business input and much value in keeping the business people engaged in the work. This is almost impossible once you reduce the business requirements to technology-speak.
As the report says, "do not allow generalized models to compromise your understanding of the business."
Monday, June 8, 2009
The article includes a great quote on the information problem, why today's approaches (even metadata) are not enough, and the uses of Semantic Web technologies ... "Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. ... Many organizations already recognize the importance of standards for metadata. What many don’t understand is that working to standardize metadata without an ontology is like teaching children to read without a dictionary. Using ontologies to organize the semantic rationalization of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalization because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system. The ontological approach also keeps the CIO’s office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner’s ontology exposes the context semantics that data definitions lack." PwC suggests taking 2 (non-exclusive) approaches to "explore" the Semantic Web and Linked Data:
- Add the dimension of semantics and ontologies to existing, internal data warehouses and data stores
- Provide tools to help users get at both internal and external Linked Data
Wednesday, June 3, 2009
The second featured article is Making Semantic Web connections. It discusses the business value of using Linked Data, and includes interesting information from a CEO survey about information gaps (and how the Semantic Web can address these gaps). The article argues that to get adequate information, the business must better utilize its own internal data, as well as data from external sources (such as information from members of the business' ecosystem or the Web). This is depicted in the following two figures from the article ...
I also want to include some quotes from the article - especially since they support what I said in an earlier blog from my days at Microsoft, Question on what "policy-based business" means ... :-)
- Data aren’t created in a vacuum. Data are created or acquired as part of the business processes that define an enterprise. And business processes are driven by the enterprise business model and business strategy, goals, and objectives. These are expressed in natural language, which can be descriptive and persuasive but also can create ambiguities.
- The nomenclature comprising the natural language used to describe the business, to design and execute business processes, and to define data elements is often left out of enterprise discussions of performance management and performance improvement.
- ... ontologies can become a vehicle for the deeper collaboration that needs to occur between business units and IT departments. In fact, the success of Linked Data within a business context will depend on the involvement of the business units. The people in the business units are the best people to describe the domain ontology they’re responsible for.
- Traditional integration methods manage the data problem one piece at a time. It is expensive, prone to error, and doesn’t scale. Metadata management gets companies partway there by exploring the definitions, but it still doesn’t reach the level of shared semantics defined in the context of the extended virtual enterprise. Linked Data offers the most value. It creates a context that allows companies to compare their semantics, to decide where to agree on semantics, and to select where to retain distinctive semantics because it creates competitive advantage.
And, yes, I did say something similar to this in an earlier post on Semantic Web and Business. (Thumbs up :-)
Tuesday, June 2, 2009
Spinning a data Web overviewed the technologies of the Semantic Web, and discussed how businesses can benefit from developing domain ontologies and then mediating/integrating/querying them across both internal and external data. The value of mediation is summarized in the following figure ...
I like this, since I said something similar in my post on the Semantic Web and Business.
Backing up this thesis, Tom Scott of BBC Earth provided a supporting quote in his interview, Traversing the Giant Global Graph. "... when you start getting either very large volumes or very heterogeneous data sets, then for all intents and purposes, it is impossible for any one person to try to structure that information. It just becomes too big a problem. For one, you don’t have the domain knowledge to do that job. It’s intellectually too difficult. But you can say to each domain expert, model your domain of knowledge— the ontology—and publish the model in the way that both users and machine can interface with it. Once you do that, then you need a way to manage the shared vocabulary by which you describe things, so that when I say “chair,” you know what I mean. When you do that, then you have a way in which enterprises can join this information, without any one person being responsible for the entire model. After this is in place, anyone else can come across that information and follow the graph to extract the data they’re interested in. And that seems to me to be a sane, sensible, central way of handling it."
Sunday, May 31, 2009
Data is usually meta-data (data about data) - what a document is about, additional information like who the author is, etc. But, it can also be the raw information - like a business vocabulary.
Linkages are the relationships between the data. The information that ties the data together and lets you infer and extrapolate.
Infrastructure is the formalisms of the languages (RDF, RDF Schema, OWL, SPARQL, ...) and the services that are already provided (W3C's Linked Data, Protege, Pellet, ...). Data without backing services and formalisms means that you have to create everything yourself and there is no exponential building of knowledge that comes from sharing the data.
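To make the first two words concrete, here is a toy sketch in Python (invented vocabulary, not a real RDF library): the data are subject-predicate-object triples, the linkages are the predicates tying them together, and the query function is a stand-in for infrastructure like SPARQL.

```python
# Toy triple store: "data" plus "linkages" as subject-predicate-object tuples.
# The vocabulary here is invented for illustration.

triples = [
    ("Acme",    "hasEmployee", "Mary"),
    ("Mary",    "hasRole",     "Analyst"),
    ("Analyst", "memberOf",    "FinanceGroup"),
]

def objects_of(subject, predicate):
    """A stand-in for query infrastructure (think SPARQL)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("Acme", "hasEmployee"))   # ['Mary']
print(objects_of("Analyst", "memberOf"))   # ['FinanceGroup']
```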
That's it. Let me know if you agree with my 3 words or have different ones.
Friday, May 29, 2009
Linking Open Data (LOD) is a W3C project. According to their web site, "The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources. RDF links enable you to navigate from a data item within one data source to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines, which may provide sophisticated search and query capabilities over crawled data. As query results are structured data and not just links to HTML pages, they can be used within other applications. ... Collectively, the data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009)."
Here is the LOD figure showing what is linked today (actually March 2009):
Going back to Tim Berners-Lee's request for us to imagine what it would be like to have people load and connect knowledge, let's imagine what all this data can do for a business and its decision making processes ....
Tuesday, May 26, 2009
Tim Berners-Lee focused on the Web of Data in his TED talk on the next Web (recorded in Feb 2009). The talk is only a little longer than 15 minutes in length, and I highly recommend it. The key points are that we are now moving from a document-centric approach to storing information, to making raw data available and processable. That raw data is "linked data" - data about things (identified by URIs), including other interesting information (as RDF triples) and highlighting the relationships between the things. It is important to note that this is not about making data available through specific APIs or anticipated/pre-programmed queries on a "pretty" web site - but about making the "unadulterated data" available for machine understanding and new uses. It is about sharing and adding to data, making connections and relationships in novel ways, and bridging disciplines.
If you think about business and an enterprise, think about how powerful this would be - to capture knowledge, share it via social networking technologies, allow update and addition to the knowledge within the enterprise (again using the social networking tools of today), and to bridge disciplines and knowledge using the Semantic web mining and matching technologies. Overall, we improve the ability of the enterprise to capture and access its knowledge, and increase the captured knowledge. In the talk, Tim Berners-Lee asks people to imagine the "incredible resource" of "people doing their bit to produce a little bit, and it all connecting."
Just imagine ....
Monday, May 18, 2009
To do this, you first need the capture of the vocabularies (yes, I will get back to this in my postings :-) - and then mappings to the physical stores. Microsoft's EDM (Entity Data Model) and Entity Framework are a start in enabling the mappings. They allow you to define a conceptual model and a physical model, and then map between the two - although they don't help you create the conceptual or physical models, are not focused on conceptual modeling, and are too tied to the physical structure of the data store. Specifically, some of the ideal mappings are not possible (at least the last time that I tried), and it is not possible (without extensions) to capture all the data and meta-data that I would like about the conceptual model. But, they exist, are usable today, and will definitely be improved.
Another cool thing is that EDM and the framework allow you to write queries in the conceptual model, that are then translated to the physical one and run against the store. Pretty neat. Now, let's put a better query capability up front (like Wolfram|Alpha) ....
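As a hypothetical sketch of the mapping idea (these entity and table names are invented, and EDM's real mapping language is far richer), a conceptual-to-physical translation can be pictured as a rename table that a query rewriter consults:

```python
# Hypothetical sketch of conceptual-to-physical query mapping.
# The entity/table/column names are invented; EDM's real mapping is far richer.

MAPPING = {
    # conceptual entity -> (physical table, {conceptual property: column})
    "Customer": ("tbl_cust", {"Name": "cust_nm", "City": "cust_city"}),
}

def rewrite(entity, prop):
    """Translate a conceptual (entity, property) pair into physical SQL."""
    table, columns = MAPPING[entity]
    return f"SELECT {columns[prop]} FROM {table}"

print(rewrite("Customer", "City"))   # SELECT cust_city FROM tbl_cust
```

The point is that the person writing the query thinks in terms of "Customer" and "City" - the business vocabulary - and never needs to know about "tbl_cust" or "cust_city".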
Monday, May 11, 2009
The three categories for capturing and sharing knowledge are:
- Technocratic - involved with tooling and the use of technology for knowledge management
- Economic - relating knowledge and income
- Behavioral - dealing with how to organize to facilitate knowledge capture and exchange
Within each of the categories, Earl posited that there are "schools" or focuses for knowledge management. Earl's seven schools are listed below (with some short descriptions):
- Systems - Part of the technocratic category, focusing on the use of technology and the storing of explicit knowledge in databases and various systems and repositories. The knowledge is typically organized by domain.
- Cartographic - Part of the technocratic category, focusing on who the "experts" are, in a company, and how to find and contact them. So, instead of explicit captured knowledge, the tacit knowledge held by individuals is paramount.
- Engineering - Part of the technocratic category, focusing on capturing and sharing knowledge for process improvement. In addition, the details and outputs of various processes and knowledge flows are captured. The knowledge in this school is organized by activities with the goal of business process improvement.
- Commercial - This is the only "economic" school and focuses on knowledge as a commercial asset. The emphasis is on income, which can be achieved in various ways ... such as limiting access to knowledge, based on payments or other exchanges, or rigorously managing a company's intellectual portfolio (individual know-how, patents, trademarks, etc.).
- Organizational - Part of the behavioral category, focusing on building and enabling knowledge-sharing networks and communities of practice, for some business purpose. Earl defines it as a behavioral school "because the essential feature of communities is that they exchange and share knowledge interactively, often in nonroutine, personal, and unstructured ways". For those not familiar with the term "community of practice", it is defined by Etienne Wenger as “groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.”
- Spatial - Part of the behavioral category, focusing on how space is used to facilitate socialization and the exchange of knowledge. This can be achieved by how office buildings are arranged, co-locating individuals working on the same project, etc.
- Strategic - Part of the behavioral category, focusing on knowledge (according to Earl) as "the essence of a firm's strategy ... The aim is to build, nurture, and fully exploit knowledge assets through systems, processes, and people and convert them into value as knowledge-based products and services." This may seem like the strategic school rolls all the others into it, and it does. But, what distinguishes it, again according to Earl, "is that knowledge or intellectual capital are viewed as the key resource."
And, how do you do this? Via capturing, publishing and mapping each business group's/community's vocabularies (ontologies) and processes, and understanding that community's organizational structure.
Tuesday, May 5, 2009
My passion for this title comes from the need to meld business knowledge with IT infrastructure - organizing the business' inherent and (usually) implicit knowledge by first capturing it and then making it usable, accessible and actionable (within the IT infrastructure). There is another aspect to this also - taking lots of information (already in the IT infrastructure) and organizing it to turn it into knowledge (not just bits of data).
Given these two goals, you find (or will find) lots of postings about ontologies, business processes, semantic web and similar topics in this blog. (Also, you will occasionally find some riffs on digital natives and education - since these are of particular interest to me.) I will not repeat postings from my earlier blog (while I was at Microsoft). You can read these yourself at http://blogs.msdn.com/policy_based_business.
Well, back to what others think about "organizing knowledge". Most of the work in this space is related to organizing and cataloging library materials, since libraries were the main repository of knowledge, and books the main format up until this digital age. This has now all changed. The need to catalog and classify books, using a single scheme, in order to find a particular book on a particular shelf in a physical library building is no longer a primary driver. One would argue that it is not even an appropriate driver, in a fast-paced, online business environment. (However, I must confess to a passion for reading real, physical books, away from the electronic distractions of today's environments.)
Libraries had a need for a single, driving organizational scheme since they often only had a few copies of a book and could not have them scattered across many shelves, classified in different ways. Now, multiple classifications/organization schemes can exist and cross-reference each other.
Where before knowledge extraction was all manual ... someone had to read the books, examine the world, organize and build on the knowledge, draw new insights and conclusions ... we now have tons of data stored on our computers and on the Web, and the help of the Semantic Web and description logic reasoners. Notice that I said "the help of the Semantic Web" - it still requires a person to classify, organize and query knowledge in valid ways, and to interpret the results. I am not close to advocating for or finding the HAL computer from 2001. :-)
So, back to what others think about "organizing knowledge". I did a search on "organizing knowledge" on Amazon. Here is what I found (all quotations are from the editorial reviews on Amazon):
- Organizing Knowledge (Jennifer Rowley and Richard Hartley) - "Incorporates extensive revisions reflecting the increasing shift towards a networked and digital information environment, and its impact on documents, information, knowledge, users and managers ... [offers] a broad-based overview of the approaches and tools used in the structuring and dissemination of knowledge".
- Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness (Patrick Lambe) - Defines and discusses various taxonomic forms and how these "can help organizations to leverage and articulate their knowledge"
- The Organization of Information (Arlene Taylor) - "Provides a detailed and insightful discussion of such basic retrieval tools as bibliographies, catalogs, indexes, finding aids, registers, databases, major bibliographic utilities, and other organizing entities"
- The Intellectual Foundation of Information Organization (Elaine Svenonius) - Analyzes the foundations of information organization, and then presents three bibliographic languages: work languages, document languages, and subject languages. From the review, "The effectiveness of a system for accessing information is a direct function of the intelligence put into organizing it."
- Organizing Business Knowledge (Thomas Malone) - "Proposes a set of fundamental concepts to guide analysis and a classification framework for organizing knowledge, and describes the publicly available online knowledge base developed by the project, which includes a set of representative templates and specific case examples as well as a set of software tools for organizing and sharing knowledge"
As you can see, there is some interesting material out there, and some mundane stuff. I have ordered several of the books listed above and will report on them in future posts on this blog. Hopefully, the information will be of help to all of us.
Thursday, April 30, 2009
The premise is that experts include everything and the kitchen sink in an ontology (because they know so much about it) and use technology-specific language (which means little to people outside the domain of expertise). So, there ends up being a mapping problem between experts and lay people - and, therefore, between the computers programmed (by those people) to search for certain information.
On the importance (or curse) of mapping, I totally agree. However, the issue that piques my interest is not the mapping between lay people and domain experts, as much as the mapping between perspectives of different groups in a business. These perspectives are what define the groups' vocabularies and ontologies. There is no single, "right" perspective - and there is a huge need to map and align the perspectives - to allow the unimpeded flow of information between groups, and to correct inconsistencies.
That is why I advocate mapping to an upper ontology. Upper ontologies capture general and reusable terms and definitions (for more information, see my earlier post). They should not restrict a mapping to a certain perspective, but allow all the perspectives to be aligned. (That is also why you may need more than one.) There will certainly be subsets and supersets of information, as well as information in only one perspective. That is to be expected. However, the relationships should be known, mappable and should NOT conflict.
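As a toy illustration (all the names are invented), mapping two groups' local terms "up" to shared upper-ontology concepts lets the upper concept act as the pivot for translating between perspectives:

```python
# Invented example: two departments' vocabularies aligned via an upper ontology.

SALES_TO_UPPER = {"Client": "LegalAgent", "Deal": "Agreement"}
LEGAL_TO_UPPER = {"Party": "LegalAgent", "Contract": "Agreement"}

def translate(term, source_map, target_map):
    """Map a local term up to the shared upper concept, then down again."""
    upper = source_map[term]
    inverse = {v: k for k, v in target_map.items()}
    return inverse[upper]

print(translate("Client", SALES_TO_UPPER, LEGAL_TO_UPPER))   # Party
```

Neither group has to adopt the other's vocabulary - each maps only to the upper ontology, and the alignment between perspectives falls out of those mappings.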
Getting back to the article, it does highlight a few things to help with the "curse of knowledge":
- Focus on the intent of the ontology, instead of the details (However, I think that you need both.)
- Define small, focused ontologies, each with a single intent and extensions for details
- Determine the core concept(s) and label them
Tuesday, April 28, 2009
These articles with their grabbing titles first hook me, and then make me think. None of them are as dismissive of modern technology as their titles suggest. However, they are clearly not entirely positive on some of the impacts of technology on us as humans, especially on our children.
Some technologists dismiss the articles as fear-mongering. Read Write Web (RWW) had a post that did just that ("Twitter Leads to Immorality? C'mon"). One thing that jumps out at me is the difference of a single word: immorality in the RWW title, versus amorality in the ScienceDaily title. Amorality is actually outside the sphere of morality (it is neither moral nor immoral). However, immorality is a lack of morals. There is a big difference.
Let me quote some of the RWW article, which itself includes quotes from the original work reported by ScienceDaily...
"According to first author Mary Helen Immordino-Yang, "for some kinds of thought, especially moral decision-making about other people's social and psychological situations, we need to allow for adequate time and reflection." Unfortunately, in our "real-time" web of information flow, some things happen too fast for us to process. This leads to us never being able to fully experience emotions about other people's psychological states. "That would have implications for your morality," said Immordino-Yang. ...
Media scholar Manuel Castells, holder of the Wallis Annenberg Chair of Communication Technology and Society at USC, went on to further interpret the findings, saying, "in a media culture in which violence and suffering becomes an endless show, be it in fiction or in infotainment, indifference to the vision of human suffering gradually sets in."
We can't help but feel we've heard similar strains of this same argument before. Doesn't it remind you of that old saying "TV will rot your brain?" Or maybe it's a throwback to the worrisome findings from the past decade about how violent video games supposedly lead to actual violence. ...
But is digital media really that bad? We think not. Maybe we can't properly feel the correct amount of compassion or pain when watching the Twitter stream update in TweetDeck, but is the Twitter stream really the place to go to experience these emotions anyway?"
The last sentence is indeed the question. At issue is the vast amount of time and attention that is paid to technological access of information, versus what is learned from person-to-person communication, self-reflection, deep reading, etc.
I would describe the current digital environment as one full of “distractions”. You can (simultaneously) carry on 5 different IMs, listen to music with an earbud in one ear, watch/listen to TV (and change the channel incessantly), and be online on your computer. I have seen it done! So, how do people today learn empathy, and to deal with quiet, with frustration, with maintaining focus while doing boring, mundane work? More and more, I see people who cannot read and write English - they instead read and write IM TXT (shortened words, no capitalization, no punctuation, ...). That is hardly the best for conveying deep thoughts!
Many children today (including mine) do not want to be in a quiet space because it is boring. Many want external influences to soothe them. Reading is a last resort activity, when there is nothing “more interesting” to do. It takes too much time!
We have created a world of constant stimulation and immediate reward – which does not equip us to live in human time (versus computer time), to learn to understand ourselves and others, and to deal with other people as well as life’s boredom and frustrations.
As technologists, I argue that we have a responsibility to at least understand the impacts of technology, if not work to correct them!
Monday, April 27, 2009
OWL builds on RDF and RDF-Schema (I talked about these briefly in an earlier post). From RDF and RDF-S, you get the ability to define classes (types of things), subclasses (more specific types of things), properties (and tie the applicability of the properties to different classes), and individuals (of a class). You can also label everything, and say where the concepts were defined using standard annotations. Interesting, but not enough, IMHO.
OWL then adds equality/inequality information for classes and properties, various property characteristics and restrictions (including cardinality), union/intersection and complement of classes, versioning information (like what an ontology is incompatibleWith), and other semantic details such as defining a class by an enumeration of its members or by specific values of its properties!
Sounds interesting - but what does this really mean? How about some examples?
- equivalentClass, equivalentProperty - You can say that a class "Dog" and a class "Canine" are equivalent. Therefore any instances of the class, Canine, are reasoned to also be instances of the class, Dog. This is very necessary when aligning databases and different representation schemes!
- disjointWith (class-level) - You can say that a class "Man" is disjoint from the class "Woman". Therefore, an instance cannot simultaneously belong to both classes - and a reasoner can infer that an instance that IS a Man, IS NOT a Woman.
- sameAs and differentFrom (individual-level) - You can say that the Man named "Frank" is different from the Man named "George". In the Open World Assumption (see my definitions post if this doesn't mean anything to you), the Man Frank and the Man George may be the same person. Sometimes this is useful, and sometimes it is dangerous. Both Open and Closed World Assumptions come in handy.
- inverseOf (property-level) - You can say that the property HasHusband is the inverse of HasWife. And, knowing that John's HasWife property is set to Mary, means that Mary's HasHusband property should/could be set to John.
- TransitiveProperty, SymmetricProperty and FunctionalProperty - These concepts take you back to your math days in high school. A transitive property means that if a is related to b, and b is related to c, then a is related to c. The most often cited example of this is ancestry. If Mary is the ancestor of Bob, and Bob is the ancestor of Sue, then Mary is the ancestor of Sue. A symmetric property is one that holds in any order. An example of this is friendship - if Mary is friends with Julie, then Julie is friends with Mary (or so we hope). Lastly, a functional property means that there is zero or one unique value for an individual (remember a function has one output for each input). Cool!
- allValuesFrom, someValuesFrom (property restrictions) - You can say that all values of the property, HasWife, must come from the class, Woman. Or, using the someValuesFrom restriction, you can indicate that at least one of the values of a property must range over the instances of a particular class. For example, a medical doctor can have a property HasCollegeDegrees, that must include at least one instance of the M.D. degree.
- oneOf - A great example of this is the property DaysOfTheWeek, which must be oneOf Sunday, Monday, Tuesday, ... (at least in English). This is your basic enumeration!
- hasValue - You can define a class by the value of its properties. For example, a Student is defined as someone with a value in its property, CurrentSchool.
- unionOf, complementOf, intersectionOf (class-level) - These allow the combination of classes and restrictions. A great example is the membership of the European Union - which can be defined as the union of all citizens of all the member countries.
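A few of these semantics can be mimicked in a handful of lines of plain Python. This is just a sketch of the inference rules with invented facts - not an OWL reasoner:

```python
# Toy inference over invented facts, mimicking two OWL property semantics.

ancestor_of = {("Mary", "Bob"), ("Bob", "Sue")}   # TransitiveProperty
friend_of   = {("Mary", "Julie")}                  # SymmetricProperty

def transitive_closure(pairs):
    """Keep chaining (a,b) and (b,c) into (a,c) until nothing new appears."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def symmetric_closure(pairs):
    """For every asserted (a,b), infer (b,a) as well."""
    return set(pairs) | {(b, a) for a, b in pairs}

print(("Mary", "Sue") in transitive_closure(ancestor_of))   # True
print(("Julie", "Mary") in symmetric_closure(friend_of))    # True
```

A real reasoner (like Pellet) does this and far more - but the example shows why declaring a property TransitiveProperty or SymmetricProperty buys you inferred facts for free.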
Wednesday, April 15, 2009
What is a possible answer? Take the local, private and community ontologies of your business and map them "up" to an existing "standardized ontology" - such as those that exist in medicine or even construction - see, for example, ISO 15926. (I already discussed the possibilities of ontology alignment provided by the Semantic Web in earlier posts, and will provide more details over the next few weeks.)
Or, if a standard ontology does not exist, create one from the local ontologies by mapping the local ones to one or more "upper" ontologies. At this point, some people will say "ughhh" another term - "upper" ontology - what the heck is that? Upper ontologies capture very general and reusable terms and definitions. Two examples that are both interesting and useful are:
- SUMO (http://www.ontologyportal.org), the Suggested Upper Merged Ontology - SUMO incorporates much knowledge and broad content from a variety of sources. Its downside is that it is not directly importable into the Semantic Web infrastructure, as it is written in a different syntax (something called KIF). Its upsides are its vast, general coverage, its public domain IEEE licensing, and the many domain ontologies defined to extend it.
- Proton (http://proton.semanticweb.org/D1_8_1.pdf), PROTo ONtology - PROTON takes a totally different approach to its ontology definition. Instead of theoretical analysis and hand-creation of the ontology, PROTON was derived from a corpus of general news sources, and hence addresses modern-day political, financial and sports concepts. It is encoded in OWL (OWL-Lite to be precise) for Semantic Web use, and was defined as part of the European Union's SEKT (Semantically Enabled Knowledge Technologies) project, http://www.sekt-project.com. (I will definitely be blogging more about SEKT in future posts. There is much interesting work there!)
Monday, April 13, 2009
However, to understand, support and protect a business, its work products and its organizational units, you need all the following knowledge:
- Who - the social/organizational aspects to understand responsibilities, privileges, networks, interdependencies and more
- How - the processes and tasks of the business, describing the states of the business and interactions among its agents, hopefully tying together and allowing a progression from high-level descriptions to the necessary level of detail (and this information should address both the human and automated aspects of the processes and tasks)
- Why - the intentions, goals and beliefs of the business (they really need to be written down and used in decision-making)
- What - the basic terminologies and domain concepts, along with their meanings, details/attributes and relationships
Any of this data defined and used in isolation is incomplete and therefore, subject to interpretation and erroneous assumptions. At worst, the data disagrees from one usage or silo to the next, and then your business is just waiting for the next fire to put out!
It is necessary to find a way to tie all the usages and silos of information (written of course in different syntaxes, from different vendors, using different standards) together!
Thursday, April 9, 2009
- Concept = class = noun = vocabulary word
- Triple = subject-predicate-object (such as "John went to the library" - where "John" is the subject, "went-to" is the predicate, and "library" is the object)
- Role = relation = association = the predicate in the triple = verb
- Instance = a specific occurrence of a concept or relationship (can be manually defined or inferred)
- Axiom = a statement of fact/truth that is taken for granted (i.e., is not proved)
- Inference = deriving a logical conclusion from definitions and axioms
- T-Box = a set of concepts and relationships (i.e., the definitions)
- A-Box = a set of instances of the concepts and relationships
- Hierarchy = arrangement of concepts or instances by some kind of classification/relationship mechanism - typical classification hierarchies are by type ("is-a" relationships - for example, "a tiger is a mammal") or by composition ("has-a" relationships - for example, "a person's name has the structure: personal or first name, zero or more middle names, and surname or last name")
- Subsumption = is-a classification (determining the ordering of more general to more specific categories/concepts)
- Consistency analysis = check to see that all specific instances make sense given the definitions, rules and axioms of an ontology
- Satisfiability analysis = check to see that an instance of a concept can be created (i.e., that creating an instance will not produce an inconsistency/error)
- Key = one or more properties that uniquely identify an individual instance of a concept/class
- Monothetic classification = identifying a particular instance with a single key
- Polythetic classification = identifying a particular instance by several possible keys which may not all exist for that instance
- Surrogate key = an artificial key
- Natural key = a key that has semantic meaning
- CWA = Closed World Assumption (in databases) = anything not explicitly known to be true is assumed to be false (for example, if you know that John is the son of Mary but have a total of 3 children defined - John, Sue and Albert - and you ask who all the children of Mary are ... you get the answer "John" - 1 child)
- OWA = Open World Assumption (in semantic computing) = anything not explicitly known is simply unknown - it is not assumed to be false (using the same scenario above, asking the same question ... you still get the answer "John" as a known child, but the reasoner cannot conclude that Sue and Albert are not Mary's children - nothing in the data rules them out)
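The CWA/OWA distinction is easy to see in a few lines of code. Here is a toy sketch over triples like the ones defined above - the individuals and the childOf predicate are illustrative only, and "unknown" stands in for the third truth status that an open-world reasoner must track.

```python
# Tiny sketch of closed-world vs open-world query answering over triples.

triples = {("John", "childOf", "Mary")}      # the A-Box: asserted instances
people = {"John", "Sue", "Albert"}           # all known individuals

def is_child_cwa(person):
    """Closed world (databases): anything not asserted is false."""
    return (person, "childOf", "Mary") in triples

def is_child_owa(person):
    """Open world (semantic computing): anything not asserted is unknown."""
    if (person, "childOf", "Mary") in triples:
        return True
    return "unknown"   # cannot conclude false without a negative assertion

print([p for p in sorted(people) if is_child_cwa(p)])     # ['John']
print({p: is_child_owa(p) for p in sorted(people)})
```

Under CWA the question "who are Mary's children?" has the complete answer "John"; under OWA, John is the only *known* child, while Sue and Albert remain possibilities.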
A description-logic reasoner (DL reasoner) takes concepts, individual instances of those concepts, roles (relationships between concepts and individuals) and sometimes constraints and rules - and then "reasons" over them to find inconsistencies (errors), infer new information, and determine classifications and hierarchies. Some basic relationships that are always present come from first-order logic - like intersections, unions, negations, etc. These are explicitly formalized in languages like OWL.
The reasoner that I am now using is Pellet from Clark and Parsia (http://clarkparsia.com/pellet/). It is integrated with Protege (which I mentioned in an earlier post), but also operates standalone. The nice thing is that Pellet has both open-source and commercial licenses to accommodate any business model - and is doing some very cool research on data validation and probabilistic reasoning (which you can read about on their blog, http://clarkparsia.com/weblog/).
How cool is it when you can get a program to tell you when your vocabulary is inconsistent or incomplete? Or, when a program can infer new knowledge for you, when you align two different vocabularies and then reason over the whole? No more relying on humans and test cases to spot all the errors!
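To give a feel for the "program tells you your vocabulary is inconsistent" idea, here is a toy consistency check reduced to a single rule: two classes declared disjointWith may share no individuals. The class and individual names are made up for illustration, and a real DL reasoner like Pellet handles vastly more (inferred class memberships, restrictions, rules).

```python
# Sketch of the kind of consistency check a DL reasoner performs, reduced
# to one axiom type: disjointWith classes must have no common individuals.

disjoint = {("Dog", "Cat")}                        # disjointWith axioms (T-Box)
types = {"Rex": {"Dog"},                           # individual -> asserted classes
         "Felix": {"Cat", "Dog"}}                  # (A-Box; Felix is suspicious)

def inconsistencies(types, disjoint):
    """Report every individual asserted to belong to two disjoint classes."""
    errors = []
    for individual, classes in types.items():
        for a, b in disjoint:
            if a in classes and b in classes:
                errors.append((individual, a, b))
    return errors

print(inconsistencies(types, disjoint))   # Felix cannot be both a Dog and a Cat
```

The payoff scales with vocabulary size: a human will not spot the one contradictory assertion among tens of thousands, but this kind of check will.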
Wednesday, April 8, 2009
Typically, you hear about the Semantic Web as a way for computers to understand and operate over the data on the web, and not just exchange it via (mostly XML-based) syntaxes. However, to "understand" something, you must speak a common language and then have insight into the vocabulary and concepts used in that language. Well, the semantic web languages exist - they are standards like RDF (Resource Description Framework), RDF-S (RDF Schema), and OWL (Web Ontology Language). These syntaxes carry the details of the concepts, terms and relationships of the vocabulary. (Note that I provided only basic links to the specifications here. There is much more detail available!)
One problem is defining the syntax - and we are getting there via the work of the W3C. The next problem is getting agreement about the vocabulary. That is much harder - since every group has their own ideas about what the vocabulary should be. So, here again, the Semantic Web steps in. Semantic Web proponents are not just researching how to define and analyze vocabularies (you could also use the word, "ontology", here) - but how to merge and align them!
So, where does this intersect with business? Businesses have lots of implicit vocabularies/ontologies (for example, belonging to procurement, accounts payable, specific domain technologies integral to the organization, IT and other groups). And, business processes and data flows cross groups and therefore, cross vocabularies - and this leads to errors! Typically, lots of them!
Does this mean that everyone should adopt a single vocabulary? Usually that is not even possible ... People who have learned a vocabulary, and use it to mean very specific things, cannot easily switch to a new, different word. Another problem is agreeing on what a term means - like "customer" (is that the entity that pays for something, an end-user, or some other variant on this theme?).
Changing words will cause a slowdown in the operations of the business, due to the need to argue over terminology and representation. Then, if a standard vocabulary is ever put in place, there will be further slowdowns and errors as people try to work the new vocabulary into their practices and processes. (BTW, I think that this is one reason that "standard" common models or a single enterprise information model are so difficult to achieve.)
How do we get around this? Enter the Semantic Web to help with the alignment of vocabularies/ontologies. But, first the vocabularies have to be captured. Certainly, no one expects people to write RDF, RDF-S or OWL. But, we all can write our natural languages - and that takes us back to "controlled languages" as I discussed in my previous post. I have a lot of ideas on how to achieve this ... but, this will come in later posts.
So, more on this in later weeks, but hopefully this post provides some reasons to be interested in the semantic web (more than just its benefits to search) ...
Tuesday, April 7, 2009
Why do systems engineers think that these are cool? Because they are fun to create, and simpler to write and use than a general-purpose programming language or raw XML.
But, when is enough enough? Every application could end up with its own DSL - because each application expects specific data in some specific format. So, just as today there are tons of user interfaces (though the industry is trying to standardize on a few due to customer pushback), we have (or, if you don't think we are there yet, will likely have) tons of DSLs. Talk about a business/IT nightmare! Think about everything that you have to remember!
What are some alternatives? How about natural language? Not the messy grammar and slang of everyday natural language ... but a "controlled" version of this. What does this mean? Well, I like using Wikipedia (when its definitions are solid) - so, let's take their definition (http://en.wikipedia.org/wiki/Controlled_natural_language). Controlled NLs are "subsets of natural languages, obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language."
Two excellent examples of the latter category are Attempto Controlled English (http://attempto.ifi.uzh.ch/site/) and a translation of OMG's Semantics of Business Vocabulary and Business Rules into English (in the appendices of the spec at http://www.omg.org/spec/SBVR/1.0/). (Note: the "official" SBVR definition is XML-based, and not very human readable/writable or anything close to controlled English.) :-) An interesting fact is that Attempto Controlled English has an add-in for Protege which allows the translation of English-defined vocabulary and rules into OWL and SWRL! (If you don't know what these are - no worries - I plan on spending some time defining them ... they are semantic computing standards.)
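To illustrate the "reliable automatic semantic analysis" half of that definition, here is a toy controlled-language processor - nothing like real Attempto Controlled English, just two hypothetical sentence patterns mapped deterministically to subject-predicate-object triples. The point is that by restricting the grammar, every accepted sentence has exactly one machine-readable meaning, and everything else is rejected rather than guessed at.

```python
import re

# Toy controlled-language parser: a restricted grammar of two sentence
# patterns, each mapped unambiguously to a triple. Patterns and predicate
# names are invented for illustration.

PATTERNS = [
    (re.compile(r"^(\w+) is a (\w+)\.$"), "is-a"),
    (re.compile(r"^(\w+) knows (\w+)\.$"), "knows"),
]

def parse(sentence):
    """Return a (subject, predicate, object) triple for a sentence in the
    controlled subset; return None for anything outside the grammar."""
    for pattern, predicate in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return (m.group(1), predicate, m.group(2))
    return None   # rejected: ambiguity is not allowed into the model

print(parse("John is a Student."))   # ('John', 'is-a', 'Student')
print(parse("Mary knows John."))     # ('Mary', 'knows', 'John')
print(parse("Whatever, dude."))      # None - not in the controlled subset
```

Real controlled languages like ACE use full grammars and produce logic (e.g., OWL and SWRL) rather than bare triples, but the reject-or-disambiguate behavior is the same.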
At the end of the day, I think that we need more natural interfaces and input mechanisms for business and IT people, and less techy specific languages (insert here: DSLs).
Monday, April 6, 2009
I am now consulting and working on software for capturing, analyzing and using the "implicit" ontologies and domain knowledge that exists in business people's heads. The work is based on several technologies addressing:
- ontology development and alignment
- controlled natural language processing
- semantic web
- knowledge engineering
- business process development and modeling
- and more!
What I will try to do in my posts is to explain the technology and how it can be useful. My goal is to communicate with both business and IT people - providing summaries, additional thoughts, and technical details. I also will provide pointers to the basic information and research, as well as pointers to my own work. There is a lot out there, if you have the time to investigate it!
So, please stay tuned. And, I hope that this blog will be of great value to you.