Open Calais Tags

Open Calais Tags is a PHP class for extracting entities from text using Open Calais. Calais performs semantic analysis of the text, using natural language processing to identify entities such as the people, companies and technologies it discusses. These entities are especially useful for suggesting tags for content such as website articles or blog posts. You could even automatically tag an archive of content that would take days to go through manually.

You can download the class and an example of its usage here:
dg_open_calais.zip (updated 3/23/2009)

Calais is free for both personal and commercial use, but this class requires a Calais API key. Getting one is easy: click the “Register” link at the top of the Open Calais website, then request a key through its automated system.

The Open Calais Tags class takes a content string and a number of options as input, and returns a multidimensional array as output. The array’s keys are the entity types detected in the text, and each value is an array of the entities of that type that were found.

Example input:

April 7 (Bloomberg) — Yahoo! Inc., the Internet company that snubbed a $44.6 billion takeover bid from Microsoft Corp., may drop in Nasdaq trading after the software maker threatened to cut its bid if directors fail to give in soon.

If Yahoo’s directors refuse to negotiate a deal within three weeks, Microsoft plans to nominate a board slate and take its case to investors, Chief Executive Officer Steve Ballmer said April 5 in a statement. He suggested the deal’s value might decline if Microsoft has to take those steps.

The ultimatum may send Yahoo Chief Executive Officer Jerry Yang scrambling to find an appealing alternative for investors to avoid succumbing to Microsoft, whose bid was a 62 percent premium to Yahoo’s stock price at the time. The deadline shows Microsoft is in a hurry to take on Google Inc., which dominates in Internet search, said analysts including Canaccord Adams’s Colin Gillis.

Example output:

Array
(
    [Industry Term] => Array
        (
            [0] => Internet
            [1] => software maker
            [2] => Internet search
        )
    [Person] => Array
        (
            [0] => Steve Ballmer
            [1] => Jerry Yang
            [2] => Colin Gillis
        )
    [Company] => Array
        (
            [0] => Google Inc.
            [1] => Canaccord Adams
            [2] => Yahoo!
            [3] => Microsoft Corp.
        )
    [Currency] => Array
        (
            [0] => USD
        )
)

Basic usage is simple: create an instance of the class with your API key, then call the getEntities method on your content string.

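require('calais.php');

$content = 'Microsoft Corp. made a $44.6 billion takeover bid for Yahoo! Inc.'; // text to analyze
$oc = new OpenCalais('your-api-key');    // replace with your Calais API key
$entities = $oc->getEntities($content);  // array keyed by entity type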
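Because getEntities returns an array keyed by entity type, turning its result into a flat list of tag suggestions takes only a few lines. The sketch below is not part of the class: entitiesToTags is a hypothetical helper, and the sample data is simply the example output shown earlier. It optionally keeps only the entity types you ask for and removes duplicates.

```php
<?php
// Hypothetical helper, not part of the OpenCalais class: flattens the
// type => entities array returned by getEntities() into a unique tag list.
// If $types is non-empty, only those entity types are kept.
function entitiesToTags(array $entities, array $types = []): array
{
    $tags = [];
    foreach ($entities as $type => $names) {
        if ($types && !in_array($type, $types, true)) {
            continue; // skip entity types the caller did not ask for
        }
        foreach ($names as $name) {
            $tags[] = $name;
        }
    }
    // Drop duplicate tags and reindex so the result is a plain list.
    return array_values(array_unique($tags));
}

// Sample data copied from the example output (pretty types enabled).
$entities = [
    'Industry Term' => ['Internet', 'software maker', 'Internet search'],
    'Person'        => ['Steve Ballmer', 'Jerry Yang', 'Colin Gillis'],
    'Company'       => ['Google Inc.', 'Canaccord Adams', 'Yahoo!', 'Microsoft Corp.'],
    'Currency'      => ['USD'],
];

// Suggest only people and companies as tags.
print_r(entitiesToTags($entities, ['Person', 'Company']));
```

Restricting by type is optional, but it is often useful: types such as Currency rarely make good tags, while Person and Company usually do.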

Several settings can be changed through setter methods on the OpenCalais object:

  • setAllowDistribution: true or false. Indicates whether the extracted metadata can be distributed by Calais. Defaults to false.
  • setAllowSearch: true or false. Indicates whether future searches can be performed on metadata through the Calais API. Defaults to false.
  • setExternalID: Allows you to set an ID for the content to pass on to Calais when it’s submitted for analysis. Defaults to empty string.
  • setSubmitter: Allows you to set an identifier for the content submitter. Defaults to ‘Open Calais Tags’.
  • setContentType: Allows you to specify the type of content you’re submitting. Can be text/xml, text/txt, or text/html. Defaults to text/html.
  • setOutputFormat: Allows you to specify the format of the returned results. The API currently only supports xml/rdf.
  • setPrettyTypes: Determines if the keys of the return array will be prettified or in the raw format returned by the Calais API. For example, Calais returns the entity type “IndustryTerm”. If set to true, the array key will instead be “Industry Term”. Defaults to true.
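The setPrettyTypes option turns raw Calais type names such as “IndustryTerm” into “Industry Term”. The class’s own conversion code is not shown here, so the following is only a hypothetical sketch of one way to do it: a regular expression that inserts a space wherever a lowercase letter is immediately followed by an uppercase one. The function name prettifyType is an assumption for illustration.

```php
<?php
// Hypothetical illustration of a "pretty types" conversion; the actual
// OpenCalais class may implement setPrettyTypes differently.
// The zero-width pattern matches the boundary between a lowercase and an
// uppercase letter, so preg_replace inserts a single space there.
function prettifyType(string $rawType): string
{
    return preg_replace('/(?<=[a-z])(?=[A-Z])/', ' ', $rawType);
}

foreach (['IndustryTerm', 'Person', 'Company', 'Currency'] as $raw) {
    echo $raw, ' => ', prettifyType($raw), PHP_EOL;
}
```

This handles simple CamelCase names only; an all-caps name passes through unchanged.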

This class is distributed under an open source BSD license. The license terms can be found in license.txt of the code archive.

60 Responses

  1. Great PHP library! Have you considered abstracting the code to support other semantic tagging services?

    We’d love to see AlchemyAPI support in your library. Similar to the service you’re supporting now, but supports more languages (8), disambiguates more entity types (24+), etc.

    Elliot 15 September 2009 at 12:43 pm
