Open Calais is a free web service from ClearForest, a Reuters company, that performs semantic analysis on English text. It uses natural language processing to extract concepts, and the relationships between them, from whatever text you send it. It’s been available for a few months, but there’s been very little developer activity so far, and even fewer finished applications built on the technology.
Not finding any existing work to build on, I wrote my own PHP class for extracting tags from content with the Open Calais API. You can get the source and read more here. The class takes a block of text or HTML, sends it to Open Calais for parsing, and extracts all of the entities it finds (people’s names, companies, technologies, and so on). It returns a multidimensional array organized by entity type.
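To make the shape of that return value concrete, here is a minimal PHP sketch of the final grouping step. It does not call Open Calais at all. The function names `prepare_content()` and `group_entities_by_type()`, and the flat list of `type`/`name` pairs used as input, are hypothetical stand-ins for whatever the parsed API response provides; they are not the class’s actual interface.

```php
<?php
// Hypothetical sketch, not the class's real API: strip markup from content
// and group already-extracted entities into an array keyed by entity type.
// The input shape (a list of ['type' => ..., 'name' => ...]) is an assumption.

// Reduce HTML to plain text with collapsed whitespace before analysis.
function prepare_content(string $html): string
{
    return trim(preg_replace('/\s+/', ' ', strip_tags($html)));
}

/**
 * @param array<int, array{type: string, name: string}> $entities
 * @return array<string, string[]> entity names keyed by entity type
 */
function group_entities_by_type(array $entities): array
{
    $grouped = [];
    foreach ($entities as $entity) {
        $type = $entity['type'];
        $name = trim($entity['name']);
        if ($name === '') {
            continue; // skip empty names
        }
        // List each entity only once under its type.
        if (!isset($grouped[$type]) || !in_array($name, $grouped[$type], true)) {
            $grouped[$type][] = $name;
        }
    }
    ksort($grouped); // sort entity types alphabetically
    return $grouped;
}

$entities = [
    ['type' => 'Company', 'name' => 'Reuters'],
    ['type' => 'Person',  'name' => 'Jane Doe'],
    ['type' => 'Company', 'name' => 'ClearForest'],
    ['type' => 'Company', 'name' => 'Reuters'],
];
print_r(group_entities_by_type($entities));
```

Keying the outer array by type means a caller, such as a tagging plugin, can pull out just the companies or just the people with a single lookup like `$grouped['Company']`.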
There’s more Open Calais can do, but I hope this class helps PHP developers who’d like to start using it but, given the sparse documentation and eerily quiet official forums, had nowhere to begin. I plan on putting this class to work as an auto-tagging plugin for WordPress posts. I still need some time to figure out how to integrate it into the new authoring interface of WP 2.5, which this blog is now running on.