Friday, November 20, 2009

Hello Worlds!

We’re not yet where we’d like to be, but we’ve been working hard behind the scenes to give developers access to the power of semantic computing with Cyc. You can get OpenCyc 2.0, which is cleaner and more complete, on SourceForge. Every term in Cyc is available as a Semantic Web URI (for example, here is the concept for Malmö, Sweden, where this post is being written), and there are open web services for concept lookup and taxonomic reasoning. If you’re a researcher or developer and want access to richer capabilities, you can license ResearchCyc, and you may want to attend Cycorp’s Semantic Training in May 2010.

But maybe you just want to get an idea of why semantic computing with Cyc is different. Here (from Tony Brusseau, one of our senior developers, and Larry Lefkowitz, who had an idea) is our version of “Hello World”. Make that “Hello Worlds”:

import org.opencyc.api.CycAccess;
import org.opencyc.cycobject.CycConstant;
import org.opencyc.cycobject.CycList;
import org.opencyc.cycobject.CycObject;
import org.opencyc.cycobject.DefaultCycObject;

public class HelloWorlds {

  public static final void helloWorlds() {
    CycAccess access = null;
    try {
      access = new CycAccess("localhost", 3600); // @note: use actual server and port
      // Look up the collection of planets in the solar system by its compact external ID
      CycConstant planetInSolarSystem = (CycConstant)
          DefaultCycObject.fromCompactExternalId("Mx4rWIie-jN6EduAAADggVbxzQ", access);
      CycList planets = access.getAllInstances(planetInSolarSystem);
      for (Object planet : planets) {
        // Ask Cyc to generate an English phrase naming each planet
        System.out.println("Hello '" +
            access.getImpreciseSingularGeneratedPhrase((CycObject) planet) + "'.");
      }
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      if (access != null) {
        access.close();
      }
    }
  }
}

Hello 'Jupiter'.
Hello 'Mercury'.
Hello 'Venus'.
Hello 'Earth'.
Hello 'Mars'.
Hello 'Saturn'.
Hello 'Uranus'.
Hello 'Neptune'.

- Witbrock


Tuesday, March 10, 2009

Wolfram Alpha

Stephen Wolfram generously gave me a two-hour demo of Wolfram Alpha last evening, and I was quite positively impressed. As he said, it's not AI, and not aiming to be, so it shouldn't be measured by contrasting it with HAL or Cyc but with Google or Yahoo. At its heart is a formal Mathematica representation. Its inference engine is basically a large number of individually hand-engineered scripts for tapping into data which he and his team have spent the last several years gathering and "curating". For example, he has assembled tables of historical financial information about countries' GDPs and about companies' stock prices. In a small number of cases, he also connects via API to third-party information, but mostly for real-time data such as a current stock price or current temperature. Rather than connecting to and relying on the current or future Semantic Web, Alpha computes its answers primarily from his own curated data, to the extent possible; he sees Alpha as the home for almost all the information it needs and will use to answer users' queries.

In an important sense, Alpha is a logical extension of Mathematica: it extends the range of types of information for which significant power can be gained by manually, and exhaustively, enumerating a large set of cases: airplane designs, cities, currencies, etc. I.e., Alpha extends what Mathematica has done previously for things like chemical compounds, geometric surfaces, topological configurations, arithmetic series, trigonometric ratios, and equations. In the new cases, as Mathematica did in those abstract math cases, Alpha excels at not just retrieving the stored data but performing various appropriate numeric calculations on the data, and displaying the results in beautiful graphs and easily comprehended tables for the user.

The resulting mosaic covers a large portion of the space of queries that the average person might genuinely want to ask in the course of their day. The interface is not exactly natural language, but the user can treat it as though it were, just as users of browsers can treat them as though they parsed sentences even though they don't. A better way to think of it is as DWIMM ("do what I might mean"): if you type in something like "gdp France / Germany", it calculates and returns a graph of the ratio of France's annual GDP to Germany's over the last 30 years or so. If you just type in "gdp", it looks up your local host and (in my case) displays the GDP of the USA over the last 30 years, plus various pieces of information about what gross domestic product is, from a mathematical-formula perspective but not from a semantic one. It does not have an ontology, so what it knows about, say, GDP, or population, or stock price, is no more and no less than the equations that involve that term. One vulnerability this engenders in Alpha is that errors in the data may go unnoticed for a long time; a positive way of saying this is that one could align Alpha's terms to an ontology and knowledge base, and use it to catch some fraction of errors as outright implausible violations of basic knowledge (e.g., Miami's population dropping by exactly a factor of ten during October 2006).
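The error-catching idea at the end of that paragraph can be sketched in a few lines. This is a minimal, hypothetical Java sketch, not Cyc's or Alpha's code: it hard-codes one piece of background knowledge ("a city's population does not change by exactly a factor of ten in one month") as a check over a monthly series. The class name, method names, and sample figures are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check, not Cyc's or Alpha's code: flag a step in a
// monthly population series where the value changes by exactly a factor of ten,
// a typical signature of a dropped or duplicated digit in curated data.
final class PopulationSanityCheck {

    /** True if `after` is exactly one tenth, or exactly ten times, `before`. */
    static boolean looksLikeFactorOfTenError(long before, long after) {
        if (before <= 0 || after <= 0) {
            return false;
        }
        // Exact integer comparison; no floating-point rounding involved.
        return before == after * 10 || after == before * 10;
    }

    /** Indices i such that the step series[i-1] -> series[i] looks like a factor-of-ten slip. */
    static List<Integer> suspiciousSteps(long[] series) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 1; i < series.length; i++) {
            if (looksLikeFactorOfTenError(series[i - 1], series[i])) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up monthly figures for illustration; not real Miami data.
        long[] monthly = {400_000, 401_000, 40_100};
        System.out.println(suspiciousSteps(monthly)); // prints [2]
    }
}
```

A knowledge-based version would draw such constraints from the ontology (what kind of thing a city is, how populations behave) rather than hard-coding a single rule; the sketch only shows the shape of the test.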

Another example of DWIMM occurs if you type in a complicated mathematical formula, sloppily, with run-on variables, parenthesis errors, typos, etc. In those cases, Alpha does a great job of guessing what you could possibly have meant by that, something close to what you typed in which would be a non-trivial graph, and displays that graph. If you type in a string of letters that's parsable only as a chemical compound, it assumes that you want information about that compound. If you type in IL where it expects a state, it will interpret that as Illinois; where it expects a country, it will interpret that as Israel.
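Context-dependent readings such as IL meaning Illinois or Israel amount to choosing a lookup table by the type of entity the query expects at that position. The following is a minimal Java sketch under that assumption; the enum, the tables, and the method name are hypothetical, and the entries are just ordinary US state and ISO country codes, not Alpha's actual data.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "do what I might mean" abbreviation handling:
// the same token resolves differently depending on the type of entity the
// query expects. The tables are illustrative, not Wolfram Alpha's.
final class AbbreviationResolver {

    enum ExpectedType { US_STATE, COUNTRY }

    private static final Map<ExpectedType, Map<String, String>> TABLES = Map.of(
        ExpectedType.US_STATE, Map.of("IL", "Illinois", "CA", "California"),
        ExpectedType.COUNTRY,  Map.of("IL", "Israel",   "FR", "France"));

    /** Interprets `token` as an entity of the expected type, if the table knows it. */
    static Optional<String> resolve(String token, ExpectedType expected) {
        String key = token.toUpperCase(Locale.ROOT);
        return Optional.ofNullable(TABLES.get(expected).get(key));
    }

    public static void main(String[] args) {
        System.out.println(resolve("IL", ExpectedType.US_STATE).orElse("?")); // Illinois
        System.out.println(resolve("IL", ExpectedType.COUNTRY).orElse("?"));  // Israel
    }
}
```

The hard part in a real system is deciding which type is expected; here it is simply passed in by the caller.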

For those who are familiar with and enamored by Mathematica's powerful theorem prover, it should be mentioned that that is, for the moment, turned off, for reasons having to do with computational cost -- i.e., response time -- and also to prevent "explosions" of less and less relevant answers from being produced. Cautiously, conditionally, at some time in the future, expect to see that theorem prover come into play.

There are two important dimensions of Wolfram Alpha I want to discuss, besides the remarks I've already made here: (1) what sorts of queries does it not handle, and (2) when it returns information, how much does it actually "understand" of what it's displaying to you? There are two sorts of queries not (yet) handled. The first are those where the data falls outside the mosaic I sketched above, such as: When is the first day of Summer in Sydney this year? Do Muslims believe that Mohammed was divine? Who did Hezbollah take prisoner on April 18, 1987? Which animals have fingers? The second are those where the query requires reasoning out a way to combine, logically or arithmetically, two or more pieces of information that the system can individually fetch for you. One example of this is: "How old was Obama when Mitterrand was elected president of France?" It can tell you demographic information about Obama, if you ask, and it can tell you information about Mitterrand (including his ruleStartDate), but it doesn't make or execute the plan to calculate a person's age on a certain date given his birth date, which is what this query asks for. If it knows that exactly 17 people were killed in a certain attack, and it also knows that 17 American soldiers were killed in that attack, it doesn't return that attack if you ask for ones in which there were no civilian casualties, or only American casualties. It doesn't perform that sort of deduction. If you ask "How fast does hair grow?", it can't parse or answer that query. But if you type in a speed, say "10cm/year", it gives you a long and quite interesting list of things that happen at about that speed, involving glaciers melting, tectonic shift, and... hair growing.
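Once the plan is known, both missing combination steps are small; the hard part is constructing the plan. Here is a hypothetical Java sketch of the two combinations described above: computing an age from a birth date and an event date, and concluding "no civilian casualties" when every person killed was an American soldier. The method names are invented for illustration, and the deduction assumes soldiers are never counted as civilians.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical sketch of two "combine fetched facts" steps; not Alpha's or
// Cyc's code. Method names are made up for illustration.
final class CombinedQueries {

    /** Age in whole years of someone born on `birth`, as of `date`. */
    static int ageOn(LocalDate birth, LocalDate date) {
        return Period.between(birth, date).getYears();
    }

    /**
     * If everyone killed was an American soldier (and soldiers are not
     * civilians), there were no civilian casualties and only American ones.
     * Returns true only when that conclusion follows from the two counts;
     * false means "not provable from these counts", not "there were civilians".
     */
    static boolean provablyNoCivilianCasualties(int totalKilled, int americanSoldiersKilled) {
        return totalKilled >= 0 && totalKilled == americanSoldiersKilled;
    }

    public static void main(String[] args) {
        LocalDate obamaBirth = LocalDate.of(1961, 8, 4);
        LocalDate mitterrandElected = LocalDate.of(1981, 5, 10);
        System.out.println(ageOn(obamaBirth, mitterrandElected));  // 19
        System.out.println(provablyNoCivilianCasualties(17, 17));  // true
    }
}
```

With Obama's birth date (August 4, 1961) and Mitterrand's election (May 10, 1981), `ageOn` returns 19, the answer the query above is asking for.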

This brings up the final issue I wanted to discuss: how much of what it returns does it understand? At one extreme is, say, Google, which responds to almost anything like a faithful puppy bringing in the morning newspaper, without understanding much of anything it's fetching (it recognizes words in what it returns, which often leads to amusing or hair-raisingly inappropriate "ads" being displayed, and to tons of false positives and false negatives). At the other extreme is, say, Cyc, which can answer only a small fraction of user queries, but can answer ones that require common sense (not just common sense queries like "Do surgeons often operate on themselves?", but ones where the logical application of such knowledge is required to correctly disambiguate and parse a user's query containing pronouns, elisions, ambiguous words, ellipsis, and so on), and where every piece of the query and every piece of the answer is as deeply understood as, say, arithmetic. Wolfram Alpha is somewhere around the geometric mean of those two extremes. It handles a much wider range of queries than Cyc, but a much narrower range than Google; it understands some of what it is displaying as an answer, but only some of it. Recall the example above: it displays the fact that hair grows 10cm/year if you ask for things that happen at 10cm/year, but not if you ask how fast hair grows. Similarly, it can report the number of cattle in Chicago but not (even a lower bound on) the number of mammals, because it doesn't know taxonomy or reason that way. If the connection between turbulent air and plane travel isn't represented via an equation, it isn't represented at all. As with many of these sentences, I want to add "...yet", because Dr. Wolfram is very much aware of the limitations of his system, and has plans for addressing many of them as Alpha continues to develop.
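The cattle/mammals example is the kind of inference a taxonomy buys you: counts known for specific collections give a lower bound for the more general collections above them. A minimal Java sketch, assuming an acyclic taxonomy whose sibling subcollections are disjoint (so their counts can be summed); the collection names and counts are hypothetical, and this is not Cyc's inference engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each collection lists its direct subcollections, and
// counts are known only for some of them. Assumes the taxonomy is acyclic and
// sibling subcollections are disjoint, so their counts may be summed.
final class TaxonomyLowerBound {

    private final Map<String, List<String>> subcollections = new HashMap<>();
    private final Map<String, Long> knownCounts = new HashMap<>();

    void addSubcollection(String parent, String child) {
        subcollections.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    void setCount(String collection, long count) {
        knownCounts.put(collection, count);
    }

    /** Lower bound on the size of `collection`, using known counts at or below it. */
    long lowerBound(String collection) {
        long fromChildren = 0;
        for (String child : subcollections.getOrDefault(collection, List.of())) {
            fromChildren += lowerBound(child);
        }
        Long direct = knownCounts.get(collection);
        // A directly known count is also a lower bound; keep the larger one.
        return direct == null ? fromChildren : Math.max(direct, fromChildren);
    }

    public static void main(String[] args) {
        TaxonomyLowerBound t = new TaxonomyLowerBound();
        t.addSubcollection("Mammal", "Bovine");
        t.addSubcollection("Bovine", "Cattle");
        t.addSubcollection("Mammal", "Horse");
        t.setCount("Cattle", 1_000); // made-up figures
        t.setCount("Horse", 250);
        System.out.println(t.lowerBound("Mammal")); // 1250
    }
}
```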

The bottom line is that there is a large range of queries it can't parse, and a large range of parsable queries it can't answer even when it can answer the constituents out of which they should be answerable, but it handles a huge range of numeric and scientific queries correctly even in its current state. And Dr. Wolfram and his team are chipping away at the natural language blocks, at the holes in the curated data repository, and at increasing the type and depth of logical combination of constituents, one by one, in priority order, just as they should. I went into the demo concerned that this might be a competitor to Cyc, given its "hand-curate knowledge and engineer it, versus let anyone add anything" philosophy, but came out of last night's demo and discussion seeing Alpha as a complementary technology. I would invest in this, literally and figuratively. If it is not gobbled up by one of the existing industry superpowers, his company may well grow to become one of them in a small number of years, with most of us setting our default browser to be Wolfram Alpha.

Doug Lenat


Friday, February 6, 2009

The Kindness of Recognition

Cycorp was proud and grateful to hear that the AAAS has elected our founder, Dr. Doug Lenat, a fellow of the AAAS. The AAAS was founded 161 years ago and is the publisher of "Science", one of the world's best-known and most widely respected scientific journals.


Monday, February 2, 2009

Cyc 101: Introduction to Cyc & Ontological Engineering

Next Session: May 11th - May 13th, 2009 -- Austin, Texas.

A key to semantic technologies is the ability to efficiently and accurately model knowledge. Ontological Engineering (OE), a term coined at Cycorp, is a methodology for representing knowledge about the world in a way that computers can reason about it. This introductory course will familiarize you with Cyc's powerful knowledge representation capabilities and tools and will provide ample opportunity to use those tools to represent semantic information in Cyc.
Cyc 101 is a three-day workshop, balancing focused lectures with hands-on practice to reinforce the concepts and techniques being presented. The course aims to have you spend as much time as possible interacting with Cyc. Upon completing the class, you should feel comfortable navigating Cyc's huge knowledge base (KB) and have a strong basic understanding of how to find and make use of relevant KB content. You will also learn how to extend Cyc's KB by entering simple facts and rules using the Cyc KB browser interface.

Even if you have read all of our online documentation, you will still be able to derive the following benefits from attending this class:

Hands-on, guided practice under the watchful eye of experienced Cyc developers;
A deeper understanding of the advantages and pitfalls of certain representational choices or design decisions. The intent behind such decisions is not always apparent in the online material, and sometimes is best conveyed through anecdotal examples and live interactions;
Answers to specific questions you may have about your envisioned use(s) of Cyc and new ideas of how Cyc might benefit your organization;
An opportunity to better understand Cycorp, the Cyc Foundation, and the ResearchCyc community, and how each of these can support (and benefit from) your ongoing use of Cyc.

More details here.