Open Calais Tags

Open Calais Tags is a PHP class for extracting entities from text using Open Calais. Calais performs semantic analysis of the text, using natural language processing to identify concepts like people, companies and technologies discussed in the text. These are especially useful for suggesting tags for your content such as website articles or blog posts. You could even automatically tag archived content that would take days to go through manually.

You can download the class and example usage here:
dg_open_calais.zip (updated 7/9/2008)

Calais is free for both personal and commercial use, but using this class requires a Calais API key. Getting a key is quick: click the “Register” link at the top of the Open Calais site, then request a key through their automated system.

The Open Calais Tags class takes a content string as input, as well as a number of options, and returns a multidimensional array as output. The array’s keys are the entity types detected in the text, and the values are the entities found.

Example input:

April 7 (Bloomberg) — Yahoo! Inc., the Internet company that snubbed a $44.6 billion takeover bid from Microsoft Corp., may drop in Nasdaq trading after the software maker threatened to cut its bid if directors fail to give in soon.

If Yahoo’s directors refuse to negotiate a deal within three weeks, Microsoft plans to nominate a board slate and take its case to investors, Chief Executive Officer Steve Ballmer said April 5 in a statement. He suggested the deal’s value might decline if Microsoft has to take those steps.

The ultimatum may send Yahoo Chief Executive Officer Jerry Yang scrambling to find an appealing alternative for investors to avoid succumbing to Microsoft, whose bid was a 62 percent premium to Yahoo’s stock price at the time. The deadline shows Microsoft is in a hurry to take on Google Inc., which dominates in Internet search, said analysts including Canaccord Adams’s Colin Gillis.

Example output:

Array
(
    [Industry Term] => Array
        (
            [0] => Internet
            [1] => software maker
            [2] => Internet search
        )
    [Person] => Array
        (
            [0] => Steve Ballmer
            [1] => Jerry Yang
            [2] => Colin Gillis
        )
    [Company] => Array
        (
            [0] => Google Inc.
            [1] => Canaccord Adams
            [2] => Yahoo!
            [3] => Microsoft Corp.
        )
    [Currency] => Array
        (
            [0] => USD
        )
)
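
For tag suggestion you usually want one flat list rather than the nested array. A minimal sketch, assuming the output shape shown above; flattenEntities is a hypothetical helper and not part of the class:

```php
<?php
// Hypothetical helper, not part of the Open Calais Tags class.
// Collapses the [type => [entity, ...]] array into one de-duplicated tag list.
function flattenEntities(array $entities, array $skipTypes = array())
{
    $tags = array();
    foreach ($entities as $type => $names) {
        // Ignore unwanted entity types and anything that is not a list.
        if (in_array($type, $skipTypes, true) || !is_array($names)) {
            continue;
        }
        foreach ($names as $name) {
            $tags[] = $name;
        }
    }
    return array_values(array_unique($tags));
}

// Sample data copied from the example output above.
$entities = array(
    'Industry Term' => array('Internet', 'software maker', 'Internet search'),
    'Person'        => array('Steve Ballmer', 'Jerry Yang', 'Colin Gillis'),
    'Company'       => array('Google Inc.', 'Canaccord Adams', 'Yahoo!', 'Microsoft Corp.'),
    'Currency'      => array('USD'),
);

// Skip currency codes, which rarely make useful tags.
$tags = flattenEntities($entities, array('Currency'));
echo implode(', ', $tags), "\n";
```

Which types to skip is a judgment call; the second argument is there so you can tune it for your own content.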

Basic usage is simple. Create an instance of the class with your API key, and call the getEntities method using your content string.

require('calais.php');
$oc = new OpenCalais('your-api-key');
$entities = $oc->getEntities($content);
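
If the request fails or nothing is recognized, the result may not be something you can loop over. Here is a small guard, sketched under the assumption that getEntities can return a non-array value (such as an empty string) in that case. entitiesOrEmpty is a hypothetical helper, and the FakeCalais stub only stands in for OpenCalais so the snippet runs on its own:

```php
<?php
// Hypothetical helper: always hands back an array so foreach is safe.
function entitiesOrEmpty($oc, $content)
{
    $result = $oc->getEntities($content);
    return is_array($result) ? $result : array();
}

// Stand-in for the real OpenCalais object, used only to make this runnable.
class FakeCalais
{
    private $reply;

    public function __construct($reply)
    {
        $this->reply = $reply;
    }

    public function getEntities($content)
    {
        return $this->reply;
    }
}

// Simulated failure (empty string) and simulated success (nested array).
$failed = entitiesOrEmpty(new FakeCalais(''), 'some text');
$ok     = entitiesOrEmpty(new FakeCalais(array('Person' => array('Jerry Yang'))), 'some text');

foreach ($ok as $type => $names) {
    echo $type, ': ', implode(', ', $names), "\n";
}
```

With the real class, pass your OpenCalais instance in place of the stub.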

A number of settings exist which can be changed through setters on the OpenCalais object:

  • setAllowDistribution: true or false. Indicates whether the extracted metadata can be distributed by Calais. Defaults to false.
  • setAllowSearch: true or false. Indicates whether future searches can be performed on metadata through the Calais API. Defaults to false.
  • setExternalID: Allows you to set an ID for the content to pass on to Calais when it’s submitted for analysis. Defaults to empty string.
  • setSubmitter: Allows you to set an identifier for the content submitter. Defaults to ‘Open Calais Tags’.
  • setContentType: Allows you to specify the type of content you’re submitting. Can be text/xml, text/txt, or text/html. Defaults to text/html.
  • setOutputFormat: Allows you to specify the format of the returned results. The API currently only supports xml/rdf.
  • setPrettyTypes: Determines if the keys of the return array will be prettified or in the raw format returned by the Calais API. For example, Calais returns the entity type “IndustryTerm”. If set to true, the array key will instead be “Industry Term”. Defaults to true.
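
The setters can be applied in one pass from a settings array. This is a sketch only: it assumes each setter takes a single argument, which the list above suggests but does not spell out. applySettings and RecordingStub are hypothetical names, and the stub stands in for a real OpenCalais object so the snippet runs without calais.php:

```php
<?php
// Hypothetical helper: calls setX($value) for each 'X' => $value pair.
function applySettings($oc, array $settings)
{
    foreach ($settings as $name => $value) {
        $setter = 'set' . $name;
        // is_callable also accepts objects that route calls through __call.
        if (!is_callable(array($oc, $setter))) {
            throw new InvalidArgumentException("Unknown setting: $name");
        }
        $oc->$setter($value);
    }
    return $oc;
}

// Stub that records setter calls; swap in new OpenCalais('your-api-key').
class RecordingStub
{
    public $calls = array();

    public function __call($method, $args)
    {
        $this->calls[$method] = $args[0];
    }
}

$oc = applySettings(new RecordingStub(), array(
    'ContentType' => 'text/txt',  // plain text instead of the text/html default
    'ExternalID'  => 'post-123',  // made-up ID for illustration
    'PrettyTypes' => false,       // keep raw keys such as "IndustryTerm"
));
```

Keeping the settings in an array makes it easy to load them from your own configuration before calling getEntities.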

This class is distributed under an open source BSD license. The license terms can be found in license.txt of the code archive.


5 Trackbacks to “Open Calais Tags”

  1. Trackback from Dan Grossman » WP Calais Auto Tagger: Automatic Tag Suggestion For Your Posts on April 10th, 2008 at 4:40 am:

    […] just completed the WP Calais Auto Tagger plugin, the obvious first use of my Open Calais Tags class. It adds a tag suggestion box to your WordPress post writing screen which suggests tags based on […]

  2. Trackback from Dan Grossman » Tagging Large Post Archives Automatically on April 11th, 2008 at 4:55 pm:

    […] a PHP class for passing content to Open Calais and getting back tags. Then, a WordPress plugin for tagging […]

  3. Trackback from PHP Weekly Reader - April 13th 2008 : phpaddiction on April 15th, 2008 at 3:42 am:

    […] never in a million years related to it, until of course I saw the tag. The class in the article Open Calais Tags might be what I need, I’m sure it will make its way into Zend Framework by next week. Oh YAY it is […]

  4. Trackback from the eXternal mind » links for 2008-05-13 on May 12th, 2008 at 11:50 pm:

    […] Dan Grossman » Open Calais Tags (tags: php library tagging calais opencalais api) […]

  5. Trackback from Dan Grossman » Open Calais PHP Class Updated on July 9th, 2008 at 10:43 pm:

    […] updated my Open Calais PHP Class with the entity types added in Calais’ last update. It now matches a bunch of new […]

17 Responses to “Open Calais Tags”

  1. Kev
    April 8th, 2008

    Anyone willing to create and maintain a WordPress plugin for tag auto-suggestion? :)

  2. David Peterson
    April 8th, 2008

    Nice!

  3. Dan
    April 8th, 2008

    Kev: I’m hoping to work on that some time this week when I get the chance.

  4. Neha
    April 9th, 2008

    hey Dan…
    I am new to this. I just downloaded your source files and am trying to run them.
    Can you please tell me how to use your Calais class?
    $entities = $oc->getEntities($content);
    The getEntities function returns an empty string.
    I have placed the proper key in the source code.
    Are there any prerequisites?
    I downloaded calais-client, but I am not able to execute submissio-tool.bat…
    Can you please help?
    Thanks

  5. Dan
    April 10th, 2008

    Neha: Does $content contain a string of English content with entities Calais will recognize?

    Kev: The initial WordPress plugin for tag suggestion’s now available here:
    http://www.dangrossman.info/wp-calais-auto-tagger/

  6. Neha
    April 10th, 2008

    The content is the same as your example input.
    I was just trying to run your source code. I downloaded the zip, put it in my web folder, and added my license ID. Do I need to do anything else?

  7. Neha
    April 16th, 2008

    hey Dan, can you tell me some place where I can test your PHP class,
    or give me the sample input?

  8. Dan
    April 16th, 2008

    Neha: The sample output above came directly from the example input above.

  9. nico.
    April 16th, 2008

    Hey Dan,

    Thanks a lot for the file!

    I just think you forgot ‘Country’ on line 75 ;)

    cheers

  10. Dan
    April 16th, 2008

    Thanks for mentioning that nico, I’ve added it here and in the copy bundled with the plugins.

  11. Neha
    May 9th, 2008

    Hi Dan,

    I know that, Dan. I downloaded the zip with the PHP class, hosted it on my web server, and added my API key in octest.php. When I try to run it, I get this error:
    Warning: Invalid argument supplied for foreach() in C:\wamp\www\opencalais\octest.php on line 27

    $response = html_entity_decode(curl_exec($ch));
    This line returns nothing.

    Can you please tell me what's wrong?

    Do I need anything else besides the API key to use your class?

  12. Tom
    June 9th, 2008

    Hey Dan,

    Was using your OC class on my site, and it was such a breeze getting it working that I forgot all about it. Now it seems that something might have changed with the OC API, as it always returns no suggestions. Have you released a new version in line with the new API, if it has indeed changed?

    Cheers,
    Tom
