WP Calais Archive Tagger

The Calais Archive Tagger plugin automatically goes through your archives and tags every post you’ve written. The plugin uses the Open Calais API to perform semantic analysis of your post text and suggest tags. If a post already contains a suggested tag, that tag isn’t added, but other new tags found are. It takes about 5 minutes to tag 200 posts.
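
The merge rule described above (keep every existing tag, add only suggestions that are not already present) can be sketched as follows. This is an illustrative Python sketch, not the plugin's PHP code; the function name `merge_tags` and the case-insensitive comparison are assumptions made for the example.

```python
def merge_tags(existing, suggested):
    """Return the existing tags plus any suggested tags not already present.

    Assumption: tags are compared case-insensitively after trimming
    whitespace. Existing tags are never removed and keep their order.
    """
    seen = {tag.strip().lower() for tag in existing}
    merged = list(existing)
    for tag in suggested:
        cleaned = tag.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


if __name__ == "__main__":
    # "wordpress" is already on the post, so only the two new tags are added.
    print(merge_tags(["WordPress", "plugins"], ["wordpress", "Open Calais", "PHP"]))
```

Run as a script, this prints `['WordPress', 'plugins', 'Open Calais', 'PHP']`: the duplicate suggestion is dropped and the original tags are left untouched.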

Also see the Calais Auto Tagger plugin, which adds tag suggestion to your post writing screen. These two plugins work together to make tagging both new and past content simple, but can be used separately as well.

The Calais Archive Tagger requires an Open Calais API key. Getting a key is as easy as filling out two forms; it’s an instant, automated process. First, go to the Open Calais site and use the “Register” link at the top of the page to create an account. Then, request an API key by filling out this form. Enter your API key on the Calais Configuration tab of your plugins page.

Calais Archive Tagger is compatible with WordPress 2.3 and later, including WordPress 2.5+ blogs. It is free for personal and commercial use, but may not be redistributed without permission. Please e-mail me if you want to do that.

Current Version

Version: 1.4
Release Date: 3/23/2009
Download: WP Calais Archive Tagger at the WordPress Codex

Version 1.1 adds a rate limiter (2 posts processed per second) to ensure you don’t exceed the Calais API rate limit (2 requests per second and 40,000 requests per day). I’ve also wrapped the API call in a try/catch block so any exceptions won’t result in a loop condition. Version 1.2 adds a check to make sure old tags are never lost when adding new ones, and no longer adds e-mail addresses found as tags.
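
As a rough illustration of the two safeguards described above (spacing requests to stay under 2 per second, and catching exceptions so one failed request cannot stall the run), here is a minimal Python sketch. It is not the plugin's PHP code: `RateLimiter`, `tag_posts`, and the `fetch_tags` callback are hypothetical names, and the 40,000-requests-per-day limit is not handled here.

```python
import time


class RateLimiter:
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def tag_posts(post_ids, fetch_tags, limiter):
    """Call fetch_tags(post_id) for each post, spaced out by the limiter.

    A failed request is recorded and skipped, so a single bad post
    cannot stop or loop the whole run.
    """
    results, failures = {}, {}
    for post_id in post_ids:
        limiter.wait()
        try:
            results[post_id] = fetch_tags(post_id)
        except Exception as exc:  # hypothetical: a real client raises its own type
            failures[post_id] = str(exc)
    return results, failures
```

With `RateLimiter(2)`, consecutive calls are spaced at least half a second apart. The `clock` and `sleep` parameters exist only so the timing can be replaced in tests.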

Notes

I recommend backing up your WordPress database before using this for the first time. There is no risk of damaging the database, as this plugin uses WordPress API functions to add the tags (no direct database access), but if you’re not happy with the tags it adds, you may want the ability to undo the additions easily.

This plugin relies on the Open Calais Tags PHP class, which requires PHP 5 web hosting with PHP’s cURL extension enabled (the majority of web hosts). Also see my blog stats plugin for W3Counter.

Installation

Unzip the archive and upload the files to your wp-content/plugins directory. Then activate the plugin from the plugins tab of your WordPress administration area. You’ll now have a “Calais Archive Tagger” link on your plugins menu where you can enter your API key and start the tagging process.

79 Responses

  1. The problem I’m facing: the plugin got activated, I entered the API key … and clicked tagging. Now the status I’m getting is …

    Status Tagging in progress…

    It’s been 30 minutes now and I have 177 posts, so how long will it take to tag them all? I suppose it’s not working.

    SHaiTaaN 12 April 2008 at 7:29 am Permalink
  2. You already said in your other comment that you don’t have PHP 5. This plugin requires PHP 5. It won’t tag any posts for you. WordPress has also ended support for PHP 4, which is obsolete by several years. Get your host to upgrade.

    Dan 12 April 2008 at 12:10 pm Permalink
  3. Great plugin Dan, this can be extremely useful. Though I wish there was some way to limit the number of tags it grabs… After processing the first 90 posts, it made 1500 tags (including over 2 dozen phone numbers and obscure sentences)… I’m afraid to see what it’ll look like in the end… :0

    Anyways, I’m a little bit curious to know if it’s possible to organize the tags like we see here? That would probably be a lot of work, but I thought I’d ask anyways.

    Cheers.

    looks like it’s going to take several days to make all the tags though. I

    I wish there was a way to somehow limit the number of tags it picks up. I got 900 posts on my site, and with the limited number of queries per day it seems to have stopped making tags at #95. (I’m testing it on a local version of my blog.)

    One thing

    Ahni 12 April 2008 at 6:52 pm Permalink
  4. Hi,
    Have you seen this comment?
    wordpress.org/support/topic/168436
    -
    When we still see “Status: Tagging in progress…” and the page does not refresh with new line of tags, what should we do? Should we abort it and redo it again? Or should we wait a long time?
    -
    This plugin worked great on a little blog of mine, and stalled on another small blog. No idea why.
    -
    Thanks a lot,
    Chris Masse

    Chris Masse 12 April 2008 at 7:31 pm Permalink
  5. Hi,
    This plugin deletes old plugins… which is not a good thing. :(
    Thanks for listening.
    Chris Masse

    Chris Masse 12 April 2008 at 7:55 pm Permalink
  6. @Chris: You should see lines showing up immediately after that. If you don’t, you’re probably running PHP 4 or don’t have cURL, which are required. The plugin definitely does not delete other plugins.

    @Ahni: It has that metadata on what type of entity each tag is, but tags aren’t exactly hierarchical in WordPress. You’d have to leave the tag system to keep that metadata on the tags, right?

    The Calais rate limit is 2 queries per second and 40,000 queries per day. That should be plenty to handle a 900 post blog. Perhaps your connection is actually so fast that it sends and receives the request in less than half a second despite no parallel processing?

    Dan 12 April 2008 at 8:26 pm Permalink
  7. I meant “This plugin deletes old TAGS”… Sorry for the typo. :-D
    -
    I am on DreamHost. they run PHP 5.2.3.
    -
    I think they have “cURL”…
    -
    Thanks.
    Chris Masse

    Chris Masse 12 April 2008 at 9:03 pm Permalink
  8. So to recap:
    1. Your plugin seems to freeze after a while.
    2. Your plugin deletes old tags, instead of adding new tags and leaving intact the old tags.
    -
    If these 2 problems could be solved, then this plugin would be great.
    Thanks a lot,
    Chris Masse

    Chris Masse 12 April 2008 at 9:12 pm Permalink
  9. Chris: Please wait 15 minutes and re-download the plugin from the WP plugin site. It’s updated so that it ensures no old tags are deleted in the tagging process.

    Dan 12 April 2008 at 9:35 pm Permalink
  10. You’d have to leave the tag system to keep that metadata on the tags, right?

    Ah, I see. Well, it would still be a great feature to have (tag categories!) Perhaps this is something WP will add in the future.

    your connection is actually so fast that it sends…
    Yeah, that must have been it. It kept going this time, but I seem to have run into another problem. It stopped creating new tags and started mirroring the ones I’ve added in the past. Is there any chance the rate limiter you added skips adding tags if the server’s too fast? (btw I’m testing it on a local install of my blog)

    Ahni 12 April 2008 at 11:26 pm Permalink
  11. It should be showing both old and new tags for each post listed. I had it add the existing tags to the list before the save_tags call to deal with what Chris reported. Perhaps a change to how they’re displayed will clear that up.

    I’ve updated the plugin again so that it displays only the tags from Calais, even though it still preserves any tags already on the post. WordPress updates the .zip archive on their site every 15 minutes, so within 15 minutes of this comment you can get the update.

    Dan 12 April 2008 at 11:33 pm Permalink
  12. The plugin worked well on my 2 small blogs. However, on the big blog, the process stopped after post #194. (I have over 4,000 posts.) If you or Calais could solve this problem, then that would be great.

    One important feature to add to your plugin would be to have a range of posts to tag… instead of tagging all…
    Like: Do tag only posts from May 2007… or do tag only posts ID#34 to post ID#230. That way, next time we re-run this plugin, we wouldn’t have to re-tag the old tags already tagged by this plugin in a previous session…. :-D

    The location of this plugin should be under “Manage”, for its tagging functions… and under “Plugins” or “Options” for the API keys.
    Just my 2 cents,
    Chris

    Chris Masse 13 April 2008 at 4:30 am Permalink
  13. Note for any other Bluehost customers, who find they are still running on PHP4 boxes – You do NOT need to contact tech support to be swapped over any more. Go to your CPanel, click PHP Config, then change to PHP5 or PHP5 FastCGI . Users of other hosting companies might find they can do the same, but I only use BH, so can’t test, sorry.
    About to take the plunge and auto tag some 1800 posts now :)
    Thanks for all your work Dan.

    Chris Merriman 13 April 2008 at 7:13 am Permalink
  14. It seems the API cannot handle languages different than English…
    When I use the plugin on my Italian-written blog I got a long queue of errors starting with:

    Fatal error: Uncaught exception ‘OpenCalaisException’ with message ‘Unsupported document language’ ……

    I consider that a not-so-little limitation…. :-(

    Alex 13 April 2008 at 7:32 am Permalink
  15. The plugin seems to appear 2 times at wordpress:
    wordpress.org/extend/plugins/wp-calais-archive-tagger/
    wordpress.org/extend/plugins/calais-auto-tagger/

    Chris Masse 13 April 2008 at 8:10 am Permalink
  16. I’m sorry I have to say this, but this plugin is rubbish, at least for my blog. None of the created tags have anything to do with the post content. Whatever it does, it is not the semantic analysis I first thought of.

    Mao-B 13 April 2008 at 8:58 am Permalink
  17. Second note – this time for people not using their own PC when running auto-tag archives…

    Make sure that FireFox is set to run new searches in a NEW tab. It is a little depressing to get 70% of the posts tagged, search for something, then realise that you’ve just wasted all that time :(

    If future versions of the plugin could support either breaking down the process by categories/months, or if an internal counter could be set, so work would not be repeated after you stop for whatever reason, that would be great.

    To Mao-B above, sorry to hear that Calais’ service didn’t hit the nail on the head for you, it seems to be doing fairly well for me so far. A small minority of posts have no tag at all created, but I’ll look into that later, and see if there is some sort of obvious pattern. Just a thought, does Calais’ semantic search service definitely work on languages other than English?

    Chris Merriman 13 April 2008 at 9:08 am Permalink
  18. @Mao-B
    Surely this plugin is not perfect and needs review by a human being after the automatic tagging. But it does a lot of good. It adds many good tags. We can later delete the bad tags. Or we can use other tools (like search and replace on tags) to refine and finish the tagging process.
    -
    This plugin is a good start to tag old posts that have no tags.
    -

    Chris Masse 13 April 2008 at 10:07 am Permalink
  19. Hey, thanks for your efforts Dan. I must have missed the latest updates (because all my old tags are gone now) but I’m happy to say it finished without a hitch :D

    As a general comment about Calais, I think it does need a bit more work. There are a number of keywords it didn’t pick up on that I would think it should have. For instance, I have many topics about gold, titanium, and uranium but it made no tags for these words (above all, that’s what I was hoping it would do.)

    In any case, thanks again Dan.

    Ahni 13 April 2008 at 11:43 am Permalink
  20. @Chris Masse: Please slow down on the commenting! Those two plugins are different. One tags your archives, the other adds tag suggestion to your post writing screen. I’ll keep your suggestion about incremental processing in mind.

    @Chris Merriman and Alex: Calais only supports English language text. It’s still a beta product, and I believe additional languages are part of the third milestone on their roadmap. As they create more ontologies, it’ll recognize more entities within the text.

    Dan 13 April 2008 at 2:32 pm Permalink
  21. Hi Dan, I get the following error when trying to activate the ‘WP Calais Archive Tagger’ plugin:

    Plugin could not be activated because it triggered a fatal error.
    Parse error: parse error in ….\wp-content\plugins\calais_archive_tagger.php on line 132

    Are you aware of this issue? Is it a problem with the file or my WP?
    I’m using a mac and tried Safari 3.1 and FF 3.0b5.

    Thanks, Heffo

    Heffo 13 April 2008 at 4:01 pm Permalink
  22. @Heffo: This plugin requires PHP 5. You only have PHP 4.

    Dan 13 April 2008 at 4:15 pm Permalink
  23. Ah ok. Thanks Dan.

    Heffo 13 April 2008 at 5:43 pm Permalink
  24. David 16 April 2008 at 6:36 pm Permalink
  25. Yeah David, unfortunately the deadline to send a proposal for the bounty was in March, and I didn’t see it until this month.

    Dan 16 April 2008 at 10:15 pm Permalink
  26. I just tried to use this on several sites. It worked GREAT on the sites that were small. However I tried it on one with about 4000 posts. I let it run for a while – but it started to make firefox use up 100% cpu (on a quad core box – so it just grabs one core).

    It would be great to have a pause/resume button.
    It would also be cool if it didn’t list every post, but rather just show a revolving list of 10 or so.

    Matt Ellsworth 21 April 2008 at 5:16 pm Permalink
  27. I thought I would post back an update… I let it run on the site with 3500 posts. It ran fine – it took about 10 hours or so and firefox would periodically go from using 0% to 100% of the cpu (that one is a 2ghz machine). But I just let it run.

    I’m now running this on another blog with about 10,000 posts, and I’m just going to let it go and see how it does. So far so good.

    Matt Ellsworth 24 April 2008 at 2:49 pm Permalink
  28. Me again… sorry about all the comments… I figured out that if you want to stop it part way through – just make note of the post number.

    1. open up the file calais_archive_tagger.php
    2. Go to line 80 (at least in my file)
    3. Look for this

    Status: Click here to start tagging your posts.

    See where it says calais_archive_run(0) – replace 0 with the post id that you want to start with.

    This worked for me. Hope it helps.

    and dan- thanks again for this great plugin!!!

    Matt Ellsworth 24 April 2008 at 3:57 pm Permalink
  29. Just wondering before I bork my blog. The plugin has stopped generating tags after post 635. Should I start it again? Will there be double tags?

    indi 28 April 2008 at 3:51 am Permalink
  30. Hi Dan,

    Great work: thanks for your contribution. Everything works as advertised on my admittedly very small blog.

    A feature suggestion, if you don’t mind (but I don’t know if it is possible): Could the archive-tagger either mark posts that have been processed so one can re-run the plugin without re-processing posts that have already been processed, or could the user choose and limit which posts to tag, perhaps starting from a given date, category, or page(s).

    Thank you again for your efforts.

    Jim 25 May 2008 at 3:28 pm Permalink
  31. This plug in is awesome. Is there any way to tag pages and not just posts?

    Shaun Robinson 25 June 2008 at 10:53 am Permalink
  32. Dear Dan,
    I would like to ask a question about this plugin. From the description it looks like a good plugin, but when I upload it and try to activate it, a fatal error message appears after a few minutes.
    Example:
    Plugin could not be activated because it triggered a fatal error.
    Parse error: syntax error, unexpected ‘{‘ in /home/archi/public_html/wp-content/plugins/wp-calais-archive-tagger/calais_archive_tagger.php on line 132

    So I can’t activate the plugin. Could you help me, please? Thanks in advance.

    irulbyzan 15 July 2008 at 10:53 pm Permalink
  33. @irulbyzan: This plugin requires PHP 5, while you’re trying to run it on PHP 4.

    Dan 15 July 2008 at 10:59 pm Permalink
  34. Thanks, this is exactly what i was looking for.

    Philix 23 July 2008 at 10:27 am Permalink
  35. Hey Dan,

    Very cool little plugin, perfect for a feed aggregator. The only thing that was missing was the ability to automatically add tags periodically, without having to do it manually. See with feed aggregation, you don’t actually create the posts yourself, nor even ever look at the Writing screen, so a different solution was needed.

    I took the liberty to hack together a file, calais_cron_tagger.php, which basically is your file slightly modified and trimmed down and cron readable. So if needed just plop that file into your wp-calais-archive-tagger folder, set up a cron job to point to the file in question, and you’re ready to rock and roll. Posts being tagged while you’re sleeping, kinda cool.

    You can download the file here:
    http://www.leeclemmer.com/calais-cron-tagger.rar

    It’s a pretty dirty hack but works.

    Enjoy, and thanks again!
    Greetz from Philly, 215 w00t!
    - Lee

    Lee 23 July 2008 at 8:58 pm Permalink
  36. PS: reading through the above comments, it seems that some users were having Firefox problems: as this little “cron_tagger” actually works in the background (and sends output as email), this may be useful for people with a lot of posts… just a thought.

    Lee 23 July 2008 at 9:18 pm Permalink
  37. Is there any chance this would work with PHP4? My hosting service has PHP4 and no PHP5.

    Pratik Sinha 7 August 2008 at 12:44 am Permalink
  38. Ok, hey Dan, just like ‘Chris Masse’ (no reply for his comment) said:

    “The plugin worked well on my 2 small blogs. However, on the big blog, the process stopped after post #194. (I have over 4,000 posts.) If you or Calais could solve this problem, then that would be great.”

    I have one blog with more than 4,000 posts too, and it stops at #72 or #96 or #196… it stops at random times.

    Any fix for that ?

    Thanks.

    Saulo Benigno 7 August 2008 at 4:07 am Permalink
  39. You’re probably running into the maximum execution time limit set in your PHP configuration. Unfortunately the plugin doesn’t keep track of where it left off to resume processing, so there’s no easy fix from my end (until I have time to write a new version). You can set_time_limit(0) to see if your host allows you to override the setting.

    Dan 7 August 2008 at 5:20 am Permalink
  40. Well, i’m using the fix posted by “Matt Ellsworth”, it’s working.

    Thanks.

    Saulo Benigno 7 August 2008 at 11:11 am Permalink
  41. Hey – great plug! Would you mind posting a snippet so that I can make the plugin skip posts that have any tags at all. A simple checkbox in the next release would be grand!

    Thanks.

    Matt 11 September 2008 at 3:08 pm Permalink
  42. Hello Dan,

    This is a nice plugin, thank you for your great work. Actually I have an issue with the output of the tags. There is a ( ; ) added next to the anchor text of the tag links (e.g. tags: google;, usa;, people; ). Is there any part of the code I can delete to remove the ( ; )?

    Thank you

    Thy 3 November 2008 at 8:14 pm Permalink
  43. I can echo what Thy is experiencing. That is about the only thing I can’t get to work. All links to tag pages work fine and there are no ;’s in the tag slugs.

    Thank you for a very nice plugin. I took the liberty of creating an adapted version for FeedWordPress users, available here : http://www.kaplak.com/wiki/index.php?title=FWP_Calais_Autotagger

    This adaptation tags each individual item as it comes in via the FeedWordPress plugin.

    Morten Blaabjerg 11 November 2008 at 9:25 am Permalink
  44. Using this on WP2.6.3 it worked like a charm. However, on WP2.7 Beta 2 and Beta 3 the semi-colons that Thy and Morten are reporting magically appeared.

    It may be that Calais has altered their results in the week between using it on 2.6.3 and when I used it again on 2.7 Beta. It doesn’t appear to be coming from the plugin.

    Since I don’t intend to run the tagger over old posts again (now that they are nicely tagged) I simply used the WP Search & Replace plugin to search the terms table eg. wp_terms and delete all instances of ; in the names. This can also be easily done with a SQL query in the database.
    Hope that helps someone.

    Lynne 17 November 2008 at 12:31 am Permalink
  45. Experiencing the same problem with the semi-colons (e.g. “tag;”). Reported the issue on the OpenCalais forum to see if it’s coming from their side:

    http://opencalais.com/node/11332#comment-578

    Let me know if any of you have found a resolution to this!

    Thanks,
    - Lee

    Lee 15 December 2008 at 11:48 am Permalink
  46. Great plugin, works great. I had to edit it to remove the trailing ; but all in all it’s pretty nice. Got it set up to run through cron, good job. :)

    Jamison Fitzgerald 21 December 2008 at 3:26 am Permalink
  47. love the plugin – any plans on updating it to still work after march 15th? The api.opencalais.com will be shut down then – they have the new R4 format.

    http://opencalais.com/news/calais-40-update-test-now-full-40-release-coming-march-15th

    thanks

    matt 23 February 2009 at 1:19 pm Permalink
  48. I just wanted to add to the questions about whether you plan to update the plugins for WP 2.7 and Calais R4 compatibility?

    Thanks for everything you’ve done with the plugin so far.

    Joss Winn 19 March 2009 at 9:53 am Permalink
  49. Hi there mate, great plugin!
    I am testing the plugin and while it looks perfect on short posts, it looks like it builds too many tags on long posts.

    I have some 2,600 posts to tag and after a while the plugin becomes slow because the page keeps scrolling up and down. Could you disable the output, or find a way using JavaScript to show on a single line what the current record is and how many records are left?

    Also it would be perfect to enable a feature to resume where the last interruption occurred, without reprocessing the whole db, still allowing rebuild from the beginning (two options)

    Regards

    Hermann 12 April 2009 at 12:40 pm Permalink
  50. Ditto! The plugin has hung twice, due to connection errors… now it starts again from the beginning!
    Is there a simple way to modify your plugin to skip tagged posts?

    Hermann 12 April 2009 at 5:17 pm Permalink
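
Several commenters above ask for a way to resume an interrupted run instead of starting over, and one describes restarting from a noted post ID by hand. The idea can be sketched as a small checkpoint file. This is a hypothetical Python illustration, not a feature of the plugin: the function names, the JSON file, and the `process` callback are all assumptions made for the example.

```python
import json
import os


def load_checkpoint(path):
    """Return the last processed post ID, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as handle:
        return json.load(handle)["last_post_id"]


def save_checkpoint(path, post_id):
    """Record the most recently completed post ID."""
    with open(path, "w") as handle:
        json.dump({"last_post_id": post_id}, handle)


def run_from_checkpoint(post_ids, process, path):
    """Process posts in ascending ID order, skipping IDs already done.

    The checkpoint is written only after process(post_id) returns, so a
    post that fails partway through is retried on the next run.
    """
    last_done = load_checkpoint(path)
    for post_id in sorted(post_ids):
        if post_id <= last_done:
            continue
        process(post_id)
        save_checkpoint(path, post_id)
```

If a run dies at post 3 of 4, the checkpoint holds 2, and calling `run_from_checkpoint` again with the same arguments continues at post 3. Deleting the checkpoint file forces a full rebuild from the beginning.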
