Saturday, September 13, 2008

Musing

If you watch the OpenCyc Semantic Web main page as obsessively as we do, you may have noticed a new feature: a purplish box that displays random "thoughts" from Cyc, with the concepts linked to the corresponding OpenCyc endpoints. It's a quick hack, but it illustrates the sort of thing that you can do if you combine the power of the free OpenCyc vocabulary with some of the sophisticated AI that makes up the whole Cyc system. In this case, what we're taking advantage of is Natural Language Generation. Those sentences you see aren't just strings; they're parts of the Cyc Knowledge Base that have been converted into English, automatically. As well as NL Generation, Cyc has some capabilities for Natural Language Understanding; we'll be demonstrating those in the coming months.

In the meantime, play around with the Random Cyc Thoughts - get an idea of just some of the sorts of things you can say with the OpenCyc vocabulary. And keep coming back here to check for updates.

Oh. A couple of final words: not everything that appears in the Cyc Thoughts is actually believed, as it were, by Cyc. Some of the thoughts are things the KB thinks might be true. Others are things that are known to be false, in the real world, and only true in a particular context (MicroTheory). It's also worth observing that some of the thoughts are obvious: but that's part of the point - computers need to know the obvious things that we all know, too.

Thursday, September 4, 2008

Cyclpedia

By popular demand, and also because we've always liked the idea of a Wikipedia with articles computers can understand, we've just added a large number of links from OpenCyc to Wikipedia. In the past, this sort of thing has been a tedious, manual process, or has relied only on concept names, making a high error rate unavoidable. Fortunately, Olena Medelyan and Catherine Legg, of the University of Waikato in New Zealand, have been working on automating the process, making use of the background knowledge in Cyc to improve accuracy. They produced three sets of mappings with different coverage vs. accuracy trade-offs. We chose the moderately accurate mapping, which includes more than 42,000 concepts. Here's a link to the full paper that describes what they did, and what we used.

We especially like the fact that they called Cyc the "Rolls Royce of formal ontologies"!


Wednesday, September 3, 2008

OpenCyc Brings Meaning to the Web

OpenCyc, a vast open-source knowledge base of concepts for the Web, was released publicly today by Cycorp, Inc. Using OpenCyc terms to represent Web content enables true semantic interoperability. While semantic web standards such as RDF/OWL provide a unifying framework for meaningful information exchange among applications, without a substantial shared vocabulary these exchanges will be quite restricted. The OpenCyc concept ontology removes this barrier by providing an extensive network of terms, in forms that can be understood both by computers and humans, which ensures the applications will have something to talk about. Taken together, these make it possible to develop web applications and mash-ups that understand, and can reason about, web content as well as enterprise and personal data and meta-data.

“You just can't put forward concepts and knowledge relationships like flinging hash,” said Michael Bergman of Zitgist LLC, which creates semantically enabled software for data integration. “Real information integration needs both context and coherence,” he said. “No other structure is in the same league as the common sense basis of Cyc; we've found its knowledge framework to be flexible enough for any customer context.”

“In the Cyc project, we've been working to develop the knowledge representations and reasoning capabilities for intelligent software that collaborates with its users. Although researchers have been using some of the results for a few years now, the recent growth of the Semantic Web presented both a need and opportunity to have a dramatically greater impact. OpenCyc is the language that can tie the disparate parts of the Semantic Web together,” said Douglas Lenat, founder of the Cyc Project, and CEO of Cycorp Inc. “By representing and sharing knowledge in the same language, Semantic Web applications can be enormously more powerful.”

OpenCyc is a wide-ranging and increasingly comprehensive ontology that describes things and events in the world in logical terms that computers can reason about. Its purpose is to provide a shared vocabulary for Web applications, allowing them to automatically reason about, and integrate, the content of web sites and web services. The OpenCyc ontology and knowledge base go beyond tag-sets, taxonomies, and other reference vocabularies, because they have been designed and extensively tested for use in automated reasoning. As Andraž Tori, CTO of Zemanta Ltd., sees it, “Common semantic vocabularies are the missing link for the semantic web. Blogs cover an incredible range of subjects, so meaning-based content integration using the huge OpenCyc ontology can provide an amazing user experience for bloggers and other content authors.”

On the Web, OpenCyc is available as a set of stable Web addresses (URIs) that are readable both by machines, using the Semantic-Web standard OWL language, and by human beings, using a standard web browser such as Firefox or Internet Explorer. OpenCyc concepts can be accessed at http://sw.opencyc.org.

THE POWER OF A SHARED ONTOLOGY


Imagine if a blogger writing about the dropping prices of iPhone clones were automatically alerted to a news release from GE on a new OLED manufacturing technology. This becomes possible as on-line content like business directories and product listings adopt the shared OpenCyc vocabulary.

The OpenCyc ontology provides relevant concepts:
“Ultra Thin Flat Panel Display”, “OLED Display”, “iPhone”, “GE”, ...
relations among these concepts:
“makesProductType”, “partTypes”, “createdBy”, “competitor” ...
and relevant background knowledge, about OLEDs being a kind of thin display screen, for example.

Business rules from on-line content producers or aggregators can also use OpenCyc terms, adding information like: “If someone is interested in a product, information about components of that product may be relevant.” Or “If someone is interested in a product component, they may be interested in a company’s competitors who also rely on that component.” In this way, Web software can link previously disparate information and rules in powerful new ways.
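As a rough illustration of the first rule above, here is a minimal Python sketch. It is not how Cyc or OpenCyc performs inference. The relation names "partTypes" and "makesProductType" come from the list above, but the "kindOf" relation, the helper functions, and the toy facts themselves are assumptions made up for this example.

```python
# Toy facts as (subject, relation, object) triples.
# "partTypes" and "makesProductType" are relation names mentioned in the text;
# "kindOf" and every fact below are illustrative assumptions, not OpenCyc data.
FACTS = {
    ("iPhone clone", "partTypes", "Ultra Thin Flat Panel Display"),
    ("OLED Display", "kindOf", "Ultra Thin Flat Panel Display"),
    ("GE", "makesProductType", "OLED Display"),
}


def objects(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return {o for s, r, o in FACTS if s == subject and r == relation}


def subjects(relation, obj):
    """Return every subject linked to `obj` by `relation`."""
    return {s for s, r, o in FACTS if r == relation and o == obj}


def relevant_topics(product):
    """Rule: interest in a product implies interest in its component types,
    and in more specific kinds of those components."""
    topics = set()
    frontier = list(objects(product, "partTypes"))
    while frontier:
        topic = frontier.pop()
        if topic not in topics:
            topics.add(topic)
            # Follow "kindOf" links downward to more specific concepts.
            frontier.extend(subjects("kindOf", topic))
    return topics


def relevant_companies(product):
    """Companies that make any component type relevant to `product`."""
    return {
        company
        for topic in relevant_topics(product)
        for company in subjects("makesProductType", topic)
    }


if __name__ == "__main__":
    print(sorted(relevant_topics("iPhone clone")))
    print(sorted(relevant_companies("iPhone clone")))
```

With these toy facts, a blogger interested in iPhone clones is connected to OLED displays through the display component, and from there to GE as a maker of that component type. The point is only that shared terms let separately authored facts and rules chain together.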

BASED ON THE CYC ARTIFICIAL INTELLIGENCE PROJECT

The OpenCyc concepts and relationships are derived from, and form the backbone of, the Knowledge Base in the Cyc System. Over the past 24 years, the Cyc project has been capturing and representing “common sense” knowledge – real-world concepts and the relationships among them – in a way that allows computers to reason about them. The OpenCyc ontology contains machine- and human-readable descriptions of around 150,000 concepts, ranging from the very general (“Idea”, “Physical object”, “Time”) to the very specific (“Lee Harvey Oswald”, “Kern Primrose Sphinx Moth”, “Valentine's Day”), from the sublime (“Romantic Love”, “the Mona Lisa”, “Chocolate”) to the ridiculous (“Clown”, “The Three Stooges”, “Monty Python's Flying Circus”). In addition, unlike other ontologies that provide only a handful of ways of expressing relations among concepts (such as subclass, name, knows, etc.), the OpenCyc ontology includes many thousands of types of relations, such as “biological grandmother”, “antidote”, “longitude”, “author of literary work”, and so on. The extensive scope of these terms and relations has led Péter Vaskó, CEO of iGlue, to observe: “We think that a rich ontology like OpenCyc can enable us to extend iGlue with new information or to validate the existing data using logical inference, and it has the potential to provide a base for a common semantic infrastructure as a sort of entity Yellow Pages.”

As the underlying Cyc knowledge base continues to grow, Cycorp anticipates ongoing updates to the OpenCyc ontology, ensuring that it is ever more comprehensive and up-to-date. Subsequent releases will include further integration with other ontologies and semantic web frameworks as well as mechanisms to allow users to comment on and extend the OpenCyc ontology. Today's release of the OpenCyc semantic web endpoints also serves as the foundation for a planned roll-out of related semantic web services and applications that will leverage both OpenCyc concepts as well as the knowledge and inference capabilities of the full Cyc system.

OpenCyc is provided as open source under the Creative Commons 3.0 Attribution license, allowing it to be easily used, at no cost, by both industry and individual developers and web designers; the complete ontology, with all concepts, definitions, terms, relationships, and correspondences to natural-language terms, can be freely downloaded.

SAMPLE OPENCYC CONCEPTS

The sample OpenCyc concepts described above can be reached at:
To find any of the other 150,000 currently published OpenCyc concepts, check out the search tool at http://sw.opencyc.org.

