2009-12-30

booking robotics

11.30: todo.adde/lib/pdf's on knowledge representation:

web.adde/wordnet:

WordNet is an online lexical reference system.
Word forms in WordNet are represented in their familiar orthography;
word meanings are represented by synonym sets (synsets)
- lists of synonymous word forms that are interchangeable in some context.
Two kinds of relations are recognized: lexical and semantic.
Lexical relations hold between word forms;
semantic relations hold between word meanings.
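A toy in-memory model makes the split between word forms and word meanings concrete. It is a sketch only, not the WordNet database format and not the NLTK API. The synset identifiers, the shortened glosses, and the choice of relations are illustrative assumptions: hypernymy stands in for a semantic relation and antonymy for a lexical one.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A word meaning, represented as a set of synonymous word forms."""
    name: str                     # illustrative identifier, not a real WordNet ID
    lemmas: list[str]             # word forms interchangeable in some context
    gloss: str                    # shortened, illustrative gloss
    hypernyms: list["Synset"] = field(default_factory=list)  # semantic relation


class ToyWordNet:
    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        # Lexical relations hold between word forms within particular synsets,
        # so they are keyed by (word form, synset name) pairs.
        self.antonyms: dict[tuple[str, str], tuple[str, str]] = {}

    def add(self, synset: Synset) -> Synset:
        self.synsets[synset.name] = synset
        return synset

    def senses(self, word: str) -> list[Synset]:
        """All meanings whose synset contains the given word form."""
        return [s for s in self.synsets.values() if word in s.lemmas]

    def add_antonym(self, a: tuple[str, str], b: tuple[str, str]) -> None:
        self.antonyms[a] = b
        self.antonyms[b] = a


wn = ToyWordNet()
vehicle = wn.add(Synset("vehicle.n", ["vehicle"], "a conveyance that transports people or objects"))
car = wn.add(Synset("car.n", ["car", "auto", "automobile"], "a motor vehicle with four wheels", [vehicle]))
railcar = wn.add(Synset("railcar.n", ["car", "railcar"], "a wheeled vehicle that runs on rails", [vehicle]))
hot = wn.add(Synset("hot.a", ["hot"], "used of physical heat"))
cold = wn.add(Synset("cold.a", ["cold"], "used of physical coldness"))
wn.add_antonym(("hot", "hot.a"), ("cold", "cold.a"))

print([s.name for s in wn.senses("car")])  # ['car.n', 'railcar.n']
print(car.hypernyms[0].name)                # vehicle.n
print(wn.antonyms[("hot", "hot.a")])        # ('cold', 'cold.a')
```

The polysemous form "car" belongs to two synsets, hypernymy links whole meanings, and antonymy links specific word forms.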
To learn more about WordNet, see the book
containing an updated version of "Five Papers on WordNet"
and additional papers by WordNet users.
Several "standoff" files provide further semantic information:

* The Morphosemantic Database
(Semantic relations between morphologically related nouns and verbs)
* The Teleological Database
(an encoding of the typical activity for which an artifact was intended)
* "Core" WordNet
A semi-automatically compiled list of 5000 "core" word senses in WordNet
(approximately the 5000 most frequently used word senses,
followed by some manual filtering and adjustment).
* Logical Forms for Glosses (Core WordNet Nouns)
Logical forms for the glosses of the ~2800 noun senses in core WordNet, in plain text format, using eventuality notation.
* Logical Forms for Glosses (All WordNet)
Logical forms for most of the glosses in WordNet 3.0
(except where generation failed), in XML format, using eventuality notation.

Texai is a chatbot that intelligently seeks to
acquire knowledge and friendly behaviors.
Important components include the RDF Entity Manager, the Texai Lexicon,
and Incremental Fluid Construction Grammar.

Cyc is an artificial intelligence project (unix)
that attempts to assemble a comprehensive ontology
and knowledge base of everyday common sense knowledge,
with the goal of enabling AI applications to perform human-like reasoning.
Now that Wikipedia and OpenCyc are linked,[11]
a version of Wikipedia is being developed that enables
browsing the encyclopedia by cyc concepts.[12]

OpenCyc makes available the Cyc ontology, whose domain is all of human consensus reality. It includes:
* Links between Cyc concepts and WordNet synsets.
* 100,000+ "broaderTerm" assertions, in addition to the previous generalization (subclass) and instance (member) assertions, to capture additional relations among concepts.
* Links between Cyc concepts (including predicates) and the FOAF ontology.
* Links between Cyc concepts and Wikipedia articles.
* The entire Cyc ontology containing hundreds of thousands of terms,
along with millions of assertions relating the terms to each other,
forming an ontology whose domain is all of human consensus reality.
* English strings (a canonical one and alternatives)
corresponding to each concept term, to assist with search and display.
* The Cyc Inference Engine and the Cyc Knowledge Base Browser
are now Java-based for improved performance and increased platform portability.
* Documentation and self-paced learning materials
to help users achieve a basic- to intermediate-level understanding
of the issues of knowledge representation
and application development using Cyc.
* A specification of CycL,
the language in which Cyc (and hence OpenCyc) is written.
* A specification of the Cyc API
for application development.
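The generalization (subclass) and instance (member) assertions listed above are what let Cyc-style reasoning inherit facts down a hierarchy; in CycL they are written with the #$genls and #$isa predicates. The sketch below is only a toy closure over hand-written triples. The constant names are made up for illustration, and it is not the Cyc Inference Engine or the Cyc API; the looser "broaderTerm" relation is not modeled.

```python
from collections import defaultdict

# Toy knowledge base of (predicate, arg1, arg2) triples, loosely modeled on
# CycL's genls (collection -> more general collection) and isa (term -> collection).
# All constant names here are illustrative assumptions.
assertions = [
    ("genls", "Dog", "Mammal"),
    ("genls", "Mammal", "Animal"),
    ("isa", "Fido", "Dog"),
]

genls: dict[str, set[str]] = defaultdict(set)
isa: dict[str, set[str]] = defaultdict(set)
for pred, a, b in assertions:
    (genls if pred == "genls" else isa)[a].add(b)


def all_genls(collection: str) -> set[str]:
    """Transitive closure of genls: every strictly more general collection."""
    seen: set[str] = set()
    stack = [collection]
    while stack:
        for parent in genls[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_isa(term: str) -> set[str]:
    """A term is an instance of its direct collections and all their generalizations."""
    result: set[str] = set()
    for c in isa[term]:
        result |= {c} | all_genls(c)
    return result


print(sorted(all_isa("Fido")))  # ['Animal', 'Dog', 'Mammal']
```

Only one isa assertion is stated for Fido, yet membership in Mammal and Animal follows from the genls chain. That inheritance is why subclass and instance links carry so much of an ontology's inferential weight.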

I've been following the Texai discussion,
but I don't see how it overcomes the main shortcoming of Cyc,
namely that it requires a very expensive process of
encoding knowledge explicitly.
This may have seemed like a sensible approach in the 1980's
when we lacked the computing power and training data to
implement statistical approaches.
But in hindsight the programming effort was grossly underestimated,
and we still don't know.
IMHO Cyc failed because it is based on models of artificial language
unrelated to the way children learn natural language.
In artificial languages,
you have to parse a sentence before you can understand it.
In natural language,
you have to understand a sentence before you can parse it.

I entirely agree with [that] comment above.
The notion of bootstrapping in the Texai English dialog system
is to learn the meanings of the most frequently occurring words
in the definitions of its yet-to-be-learned vocabulary,
and then by reading their definitions,
learn the meanings of the remaining words
with help from a multitude of volunteer mentors.
In particular, Matt said:
"you have to understand a sentence before you can parse it."
An analysis of the word usage frequency in the Texai vocabulary definitions
reveals that knowing perhaps only 10,000 frequently occurring words
should be enough to understand half
of the whole lexicon of 85,000 English words.
I acknowledge that there must be a very expensive process of
encoding knowledge explicitly.
Like Cycorp's initial approach for DARPA's
Rapid Knowledge Formation project,
for which I was the first project manager,
Texai will use English dialog to rapidly acquire knowledge.
I hypothesize that such dialog greatly reduces the expense
of teaching new facts to the system,
and also permits a vast multitude of volunteer mentors to divide the effort:
many hands make light work.
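The bootstrapping idea in this reply can be sketched as a fixed-point computation over a dictionary: learn the most frequent defining words first, then understand the rest by reading their definitions. This is a toy illustration, not Texai's code. The miniature dictionary, the seed size, and the rule that a headword counts as "understood" once every word in its definition is understood are all assumptions.

```python
from collections import Counter

# Hypothetical miniature dictionary: headword -> words used in its definition.
definitions = {
    "puppy": ["young", "dog"],
    "dog": ["animal", "that", "barks"],
    "kitten": ["young", "cat"],
    "cat": ["small", "animal", "that", "meows"],
    "barks": ["makes", "sound"],
    "meows": ["makes", "sound"],
}


def bootstrap(definitions: dict[str, list[str]], seed_size: int) -> set[str]:
    # 1. Count how often each word occurs across all definitions.
    freq = Counter(w for words in definitions.values() for w in words)
    # 2. Seed the vocabulary with the most frequent defining words
    #    (in the Texai scenario these would be taught by mentors; here they are given).
    known = {w for w, _ in freq.most_common(seed_size)}
    # 3. Repeatedly "read" definitions: a headword becomes known once every
    #    word in its definition is known. Stop when a full pass learns nothing.
    changed = True
    while changed:
        changed = False
        for head, words in definitions.items():
            if head not in known and all(w in known for w in words):
                known.add(head)
                changed = True
    return known


learned = sorted(bootstrap(definitions, 5).intersection(definitions))
print(learned)                                               # ['barks', 'dog', 'meows', 'puppy']
print(f"{len(learned)}/{len(definitions)} headwords understood")  # 4/6 headwords understood
```

With five seed words, four of the six headwords become understandable by reading alone; "cat" and "kitten" stay blocked on the single missing word "small", which is where a human mentor would step in.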

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies.
OWL Characteristics
OWL provides the capability of creating classes and properties, defining instances, and applying operations to them.
* Classes:
User-defined classes are subclasses of the root class owl:Thing. A class may contain individuals, which are instances of the class, and other subclasses. For example, Employee could be a subclass of owl:Thing, while Dealer, Manager, and Labourer are all subclasses of Employee.
* Properties:
A property is a binary relation that specifies class characteristics. Properties are attributes of instances: they either hold data values or link to other instances.
There are two types of simple properties:
datatype and object properties.
Datatype properties are relations between instances of classes
and RDF literals or XML Schema datatypes.
Object properties are relations between instances of two classes.
* Instances:
Instances are individuals that belong to the classes defined. A class may have any number of instances. Instances are used to define the relationship among different classes.
* Operations:
OWL supports various operations on classes such as union, intersection and complement. It also allows class enumeration, cardinality, and disjointness.
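A plain-Python sketch of the characteristics above, reusing the Employee example. It mimics OWL semantics only loosely and assumes no rdflib or OWL reasoner. Classes form a subclass hierarchy rooted at owl:Thing, individuals are typed by a class, datatype properties hold literals, object properties link individuals, and class operations are computed extensionally as set operations. The individuals, the Company class, and the `worksFor`/`salary` properties are hypothetical. The complement is taken over known individuals (a closed-world simplification), whereas OWL itself uses open-world semantics.

```python
# Toy model of OWL-style classes, properties, instances and class operations.
# Closed-world simplification: real OWL reasoning is open-world.

subclass_of = {            # class -> direct superclass
    "Employee": "owl:Thing",
    "Dealer": "Employee",
    "Manager": "Employee",
    "Labourer": "Employee",
    "Company": "owl:Thing",  # hypothetical extra class
}
instance_of = {            # individual -> asserted class (hypothetical individuals)
    "alice": "Manager",
    "bob": "Dealer",
    "carol": "Labourer",
    "acme": "Company",
}
# Object property: links two individuals. Datatype property: individual -> literal.
object_props = {("alice", "worksFor", "acme"), ("bob", "worksFor", "acme")}
datatype_props = {("alice", "salary", 90000), ("bob", "salary", 40000)}


def superclasses(cls: str) -> set[str]:
    """The class itself plus every ancestor up to owl:Thing."""
    result = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.add(cls)
    return result


def extension(cls: str) -> set[str]:
    """All individuals belonging to cls, including members of its subclasses."""
    return {i for i, c in instance_of.items() if cls in superclasses(c)}


everything = extension("owl:Thing")
union = extension("Dealer") | extension("Manager")           # like owl:unionOf
intersection = extension("Employee") & extension("Manager")  # like owl:intersectionOf
complement = everything - extension("Employee")              # like owl:complementOf (closed world)
acme_staff = sorted(s for s, p, o in object_props if p == "worksFor" and o == "acme")

print(sorted(extension("Employee")))                         # ['alice', 'bob', 'carol']
print(sorted(union), sorted(intersection), sorted(complement))  # ['alice', 'bob'] ['alice'] ['acme']
print(acme_staff)                                            # ['alice', 'bob']
print(extension("Dealer").isdisjoint(extension("Manager")))  # True (disjointness holds here)
```

Typing alice only as a Manager is enough for her to appear in the Employee and owl:Thing extensions, mirroring how subclass axioms propagate membership.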
TDWG have an ontology for taxonomy[21].