WP Calais Archive Tagger

The Calais Archive Tagger plugin automatically goes through your archives and tags every post you’ve written. The plugin uses the Open Calais API to perform semantic analysis of your post text and suggest tags. If a post already contains a suggested tag, that tag isn’t added, but other new tags found are. It takes about 5 minutes to tag 200 posts.
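The plugin's own PHP source isn't shown on this page. The Python below is a minimal sketch of the tag-merging rule just described: existing tags are kept, suggestions a post already has are skipped, and other suggestions are added. The function name `merge_tags`, the case-insensitive comparison, and the e-mail pattern are illustrative assumptions, not the plugin's code. The e-mail filter reflects the version 1.2 change that stops e-mail addresses from being added as tags.

```python
import re

# Assumed pattern for spotting e-mail addresses among suggested tags
# (version 1.2 stops adding those); not taken from the plugin source.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def merge_tags(existing, suggested):
    """Hypothetical helper: return a post's tags after adding suggestions.

    Every existing tag is kept, suggestions the post already has
    (compared case-insensitively, an assumption) are skipped, and
    suggestions that look like e-mail addresses are dropped.
    """
    merged = list(existing)                     # old tags are never lost
    seen = {tag.casefold() for tag in existing}
    for tag in suggested:
        key = tag.casefold()
        if key in seen or EMAIL_RE.match(tag):
            continue                            # duplicate or e-mail address
        merged.append(tag)
        seen.add(key)                           # also skips repeated suggestions
    return merged
```

For example, `merge_tags(["WordPress", "PHP"], ["php", "Open Calais", "dan@example.com"])` returns `["WordPress", "PHP", "Open Calais"]`.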

Also see the Calais Auto Tagger plugin, which adds tag suggestion to your post writing screen. These two plugins work together to make tagging both new and past content simple, but can be used separately as well.

The Calais Archive Tagger requires an Open Calais API key. Getting a key is as easy as filling out two forms; it’s an instant, automated process. First, go to the Open Calais site and use the “Register” link at the top of the page to create an account. Then, request an API key by filling out this form. Enter your API key on the Calais Configuration tab of your plugins page.

Calais Archive Tagger is compatible with WordPress 2.3 and later, including WordPress 2.5. It is free for personal and commercial use, but may not be redistributed without permission. Please e-mail me if you want to do that.

Current Version

Version: 1.2
Release Date: 4/12/2008
Download: WP Calais Archive Tagger at the WordPress Codex

Version 1.1 adds a rate limiter (2 posts processed per second) to ensure you don’t exceed the Calais API rate limit (2 requests per second and 40,000 requests per day). I’ve also wrapped the API call in a try/catch block so that an exception can’t leave the tagger stuck in a loop. Version 1.2 adds a check to make sure old tags are never lost when new ones are added, and it no longer adds e-mail addresses as tags.
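As a rough illustration of the two safeguards described above (the plugin itself is PHP and its code isn't reproduced here), the Python sketch below spaces requests at least half a second apart, stops once a daily quota is used up, and catches exceptions per post so one failing request can't stall the loop. The names `RateLimiter`, `tag_all` and `fetch_tags` (a stand-in for the real Open Calais request), the fixed 24-hour window, and the injectable clock are all assumptions made to keep the sketch self-contained and testable.

```python
import time


class RateLimiter:
    """Hypothetical limiter: calls at least 1/per_second apart, capped per day."""

    def __init__(self, per_second=2, per_day=40_000,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.last_call = None
        self.day_start = None
        self.calls_today = 0

    def wait(self):
        """Block until a call is allowed; return False once today's quota is used."""
        now = self.clock()
        # Assumed fixed 24-hour window starting at the first call.
        if self.day_start is None or now - self.day_start >= 86_400:
            self.day_start = now
            self.calls_today = 0
        if self.calls_today >= self.per_day:
            return False
        if self.last_call is not None:
            delay = self.last_call + self.interval - now
            if delay > 0:
                self.sleep(delay)
                now = self.clock()
        self.last_call = now
        self.calls_today += 1
        return True


def tag_all(posts, fetch_tags, limiter):
    """Tag (post_id, text) pairs in order; a failing post is recorded and skipped."""
    results, failures = {}, []
    for post_id, text in posts:
        if not limiter.wait():
            break  # daily quota exhausted; resume from this post later
        try:
            results[post_id] = fetch_tags(text)
        except Exception as exc:  # one bad request must not stall the loop
            failures.append((post_id, exc))
    return results, failures
```

Returning `False` when the quota runs out, rather than sleeping until the next day, leaves it to the caller to resume later from the post where tagging stopped.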

Notes

I recommend backing up your WordPress database before using this for the first time. There is no risk of damaging the database, as this plugin uses WordPress API functions to add the tags (no direct database access), but if you’re not happy with the tags it adds, you may want the ability to undo the additions easily.

This plugin relies on the Open Calais Tags PHP class, which requires PHP 5 hosting with PHP’s cURL extension enabled (as most web hosts provide). Also see my blog stats plugin for W3Counter.

Installation

Unzip the archive and upload the files to your wp-content/plugins directory. Then activate the plugin from the plugins tab of your WordPress administration area. You’ll now have a “Calais Archive Tagger” link on your plugins menu where you can enter your API key and start the tagging process.

2 Trackbacks to “WP Calais Archive Tagger”

  1. Trackback from Dan Grossman » Tagging Large Post Archives Automatically on April 11th, 2008 at 6:24 pm:

    […] First, I wrote a PHP class for passing content to Open Calais and getting back tags. Then, a WordPress plugin for tagging posts as you write them. Now, taking it one step further again, here’s a plugin for automatic tagging of your post archives. […]

  2. Trackback from The Best Blogging Software (WordPress) + The Top 60 WordPress Plugins | Midas Oracle .ORG on April 13th, 2008 at 9:37 am:

    […] WP Calais Archive Tagger 1.2 » Dan Grossman (url) Tags your entire post archive by performing semantic analysis on the post text. […]

31 Responses to “WP Calais Archive Tagger”

  1. SHaiTaaN
    April 12th, 2008

    The problem I’m facing: the plugin is activated, I entered my API key and clicked to start tagging, and now the status I’m getting is…

    Status: Tagging in progress…

    It’s been 30 minutes now and I have 177 posts, so how long will it take to tag them all? I suppose it’s not working.

  2. Dan
    April 12th, 2008

    You already said in your other comment that you don’t have PHP 5. This plugin requires PHP 5; without it, it won’t tag any posts for you. WordPress has also ended support for PHP 4, which has been obsolete for several years. Get your host to upgrade.

  3. Ahni
    April 12th, 2008

    Great plugin Dan, this can be extremely useful. Though I wish there was some way to limit the number of tags it grabs… After processing the first 90 posts, it made 1500 tags (including over 2 dozen phone numbers and obscure sentences)… I’m afraid to see what it’ll look like in the end… :0

    Anyway, I’m curious whether it’s possible to organize the tags like we see here. That would probably be a lot of work, but I thought I’d ask.

    Cheers.

    Looks like it’s going to take several days to make all the tags, though. I’ve got 900 posts on my site, and with the limited number of queries per day it seems to have stopped making tags at #95. (I’m testing it on a local version of my blog.)

  4. Chris Masse
    April 12th, 2008

    Hi,
    Have you seen this comment?
    wordpress.org/support/topic/168436
    -
    When we still see “Status: Tagging in progress…” and the page does not refresh with a new line of tags, what should we do? Should we abort and start again? Or should we wait a long time?
    -
    This plugin worked great on a little blog of mine, and stalled on another small blog. No idea why.
    -
    Thanks a lot,
    Chris Masse

  5. Chris Masse
    April 12th, 2008

    Hi,
    This plugin deletes old plugins… which is not a good thing. :(
    Thanks for listening.
    Chris Masse

  6. Dan
    April 12th, 2008

    @Chris: You should see lines showing up immediately after that. If you don’t, you’re probably running PHP 4 or don’t have cURL, which are required. The plugin definitely does not delete other plugins.

    @Ahni: It has that metadata on what type of entity each tag is, but tags aren’t exactly hierarchical in WordPress. You’d have to leave the tag system to keep that metadata on the tags, right?

    The Calais rate limit is 2 queries per second and 40,000 queries per day. That should be plenty to handle a 900 post blog. Perhaps your connection is actually so fast that it sends and receives the request in less than half a second despite no parallel processing?

  7. Chris Masse
    April 12th, 2008

    I meant “This plugin deletes old TAGS”… Sorry for the typo. :-D
    -
    I am on DreamHost. They run PHP 5.2.3.
    -
    I think they have “cURL”…
    -
    Thanks.
    Chris Masse

  8. Chris Masse
    April 12th, 2008

    So to recap:
    1. Your plugin seems to freeze after a while.
    2. Your plugin deletes old tags, instead of adding new tags and leaving intact the old tags.
    -
    If these 2 problems could be solved, then this plugin would be great.
    Thanks a lot,
    Chris Masse

  9. Dan
    April 12th, 2008

    Chris: Please wait 15 minutes and re-download the plugin from the WP plugin site. It’s updated so that it ensures no old tags are deleted in the tagging process.

  10. Ahni
    April 12th, 2008

    You’d have to leave the tag system to keep that metadata on the tags, right?

    Ah, I see. Well, it would still be a great feature to have (tag categories!) Perhaps this is something WP will add in the future.

    your connection is actually so fast that it sends…
    Yeah, that must have been it. It kept going this time, but I seem to have run into another problem. It stopped creating new tags and started mirroring the ones I’ve added in the past. Is there any chance the rate limiter you added skips adding tags if the server’s too fast? (btw I’m testing it on a local install of my blog)

  11. Dan
    April 12th, 2008

    It should be showing both old and new tags for each post listed. I had it add the existing tags to the list before the save_tags call to deal with what Chris reported. Perhaps a change to how they’re displayed will clear that up.

    I’ve updated the plugin again so that it displays only the tags from Calais, even though it still preserves any tags already on the post. WordPress updates the .zip archive on their site every 15 minutes, so within 15 minutes of this comment you can get the update.

  12. Chris Masse
    April 13th, 2008

    The plugin worked well on my 2 small blogs. However, on the big blog, the process stopped after post #194. (I have over 4,000 posts.) If you or Calais could solve this problem, then that would be great.

    One important feature to add to your plugin would be to have a range of posts to tag… instead of tagging all…
    Like: tag only posts from May 2007… or tag only posts ID#34 to ID#230. That way, next time we re-run this plugin, we wouldn’t have to re-tag posts already tagged by this plugin in a previous session… :-D

    The location of this plugin should be under “Manage”, for its tagging functions… and under “Plugins” or “Options” for the API keys.
    Just my 2 cents,
    Chris

  13. Chris Merriman
    April 13th, 2008

    Note for any other Bluehost customers who find they are still running on PHP4 boxes: you do NOT need to contact tech support to be switched over any more. Go to your cPanel, click PHP Config, then change to PHP5 or PHP5 FastCGI. Users of other hosting companies might find they can do the same, but I only use BH, so I can’t test it, sorry.
    About to take the plunge and auto tag some 1800 posts now :)
    Thanks for all your work Dan.

  14. Alex
    April 13th, 2008

    It seems the API cannot handle languages different than English…
    When I use the plugin on my Italian-written blog I got a long queue of errors starting with:

    Fatal error: Uncaught exception ‘OpenCalaisException’ with message ‘Unsupported document language’ ……

    I consider that a not-so-little limitation…. :-(

  15. Chris Masse
    April 13th, 2008

    The plugin seems to appear 2 times at wordpress:
    wordpress.org/extend/plugins/wp-calais-archive-tagger/
    wordpress.org/extend/plugins/calais-auto-tagger/

  16. Mao-B
    April 13th, 2008

    I’m sorry I have to say this, but this plugin is rubbish, at least for my blog. None of the created tags have anything to do with the post content. Whatever it does, it is not the semantic analysis I first thought it was.

  17. Chris Merriman
    April 13th, 2008

    Second note - this time for people not using their own PC when running auto-tag archives…

    Make sure that Firefox is set to run new searches in a NEW tab. It is a little depressing to get 70% of the posts tagged, search for something, then realise that you’ve just wasted all that time :(

    If future versions of the plugin could support either breaking down the process by categories/months, or if an internal counter could be set, so work would not be repeated after you stop for whatever reason, that would be great.

    To Mao-B above, sorry to hear that Calais’ service didn’t hit the nail on the head for you; it seems to be doing fairly well for me so far. A small minority of posts have no tag at all created, but I’ll look into that later and see if there is some sort of obvious pattern. Just a thought: does Calais’ semantic search service definitely work on languages other than English?

  18. Chris Masse
    April 13th, 2008

    @ MAo-B
    Surely this plugin is not perfect and needs review by a human being after the automatic tagging. But it does a lot of good: it adds many good tags. We can later delete the bad tags, or use other tools (like Search and Replace Tags) to refine and finish the tagging process.
    -
    This plugin is a good start to tag old posts that have no tags.

  19. Ahni
    April 13th, 2008

    Hey, thanks for your efforts Dan. I must have missed the latest updates (because all my old tags are gone now) but I’m happy to say it finished without a hitch :D

    As a general comment about Calais, I think it does need a bit more work. There are a number of keywords it didn’t pick up on that I would think it should have. For instance, I have many topics about gold, titanium, and uranium but it made no tags for these words (above all, that’s what I was hoping it would do.)

    In any case, thanks again Dan.

  20. Dan
    April 13th, 2008

    @Chris Masse: Please slow down on the commenting! Those two plugins are different. One tags your archives, the other adds tag suggestion to your post writing screen. I’ll keep your suggestion about incremental processing in mind.

    @Chris Merriman and Alex: Calais only supports English-language text. It’s still a beta product, and I believe additional languages are part of the third milestone on their roadmap. As they create more ontologies, it’ll recognize more entities within the text.

  21. Heffo
    April 13th, 2008

    Hi Dan, I get the following error when trying to activate the ‘WP Calais Archive Tagger’ plugin:

    Plugin could not be activated because it triggered a fatal error.
    Parse error: parse error in ….\wp-content\plugins\calais_archive_tagger.php on line 132

    Are you aware of this issue? Is it a problem with the file or my WP?
    I’m using a mac and tried Safari 3.1 and FF 3.0b5.

    Thanks, Heffo

  22. Dan
    April 13th, 2008

    @Heffo: This plugin requires PHP 5. You only have PHP 4.

  23. Heffo
    April 13th, 2008

    Ah ok. Thanks Dan.

  24. David
    April 16th, 2008

    I assume you know of this prize: http://www.semantic-web.at/index.php?id=1&subid=57&action=resource&item=1646

  25. Dan
    April 16th, 2008

    Yeah David, unfortunately the deadline to send a proposal for the bounty was in March, and I didn’t see it until this month.

  26. Matt Ellsworth
    April 21st, 2008

    I just tried to use this on several sites. It worked GREAT on the small sites. However, I tried it on one with about 4000 posts. I let it run for a while, but it started to make Firefox use up 100% CPU (on a quad-core box, so it just grabs one core).

    It would be great to have a pause/resume button.
    It would also be cool if it didn’t list every post, but rather just show a revolving list of 10 or so.

  27. Matt Ellsworth
    April 24th, 2008

    I thought I would post back an update… I let it run on the site with 3500 posts. It ran fine; it took about 10 hours or so, and Firefox would periodically go from using 0% to 100% of the CPU (that one is a 2 GHz machine). But I just let it run.

    I’m now running this on another blog with about 10,000 posts, and I’m just going to let it go and see how it does. So far so good.

  28. Matt Ellsworth
    April 24th, 2008

    Me again… sorry about all the comments… I figured out that if you want to stop it part way through - just make note of the post number.

    1. open up the file calais_archive_tagger.php
    2. Go to line 80 (at least in my file)
    3. Look for this

    Status: Click here to start tagging your posts.

    See where it says calais_archive_run(0) - replace 0 with the post id that you want to start with.

    This worked for me. Hope it helps.

    And Dan, thanks again for this great plugin!!!

  29. indi
    April 28th, 2008

    Just wondering before I bork my blog. The plugin has stopped generating tags after post 635. Should I start it again? Will there be double tags?
