NOTE: Please start a new email thread for a new topic (See 
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)

Your strategy could work. You might want to look into dedicated entity 
extraction frameworks such as:
http://opennlp.sourceforge.net/
http://nlp.stanford.edu/software/CRF-NER.shtml
http://incubator.apache.org/uima/index.html

Or, if that is too much work, look at 
http://issues.apache.org/jira/browse/SOLR-1725 for a way to plug your entity 
extraction code into Solr itself using a scripting language.
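To make the regexp approach concrete, here is a minimal Python sketch. It assumes a multi-valued string field called "tags" in your schema. That field name, the patterns and the extension lists are illustrative assumptions, not anything Solr prescribes. The sketch tags content on the client side before indexing. It then builds request parameters in which facet.field=tags returns a count per tag and fq=tags:email restricts the results to documents tagged "email".

```python
import re
from urllib.parse import urlencode

# Deliberately simple patterns (assumption): real-world e-mail/URL matching is messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# Illustrative extension lists used to sub-classify URLs; extend as needed.
PICTURE_EXT = (".jpg", ".jpeg", ".png", ".gif")
VIDEO_EXT = (".mp4", ".avi", ".mov", ".flv")


def tag_content(text):
    """Return the sorted tags to store in the hypothetical multi-valued 'tags' field."""
    tags = set()
    if EMAIL_RE.search(text):
        tags.add("email")
    for url in URL_RE.findall(text):
        tags.add("link")
        # Drop any query string before checking the file extension.
        path = url.lower().split("?", 1)[0]
        if path.endswith(PICTURE_EXT):
            tags.add("picture")
        elif path.endswith(VIDEO_EXT):
            tags.add("video")
    return sorted(tags)


def facet_params(query, only_tag=None):
    """Build select parameters that facet on 'tags', optionally filtering to one tag."""
    params = [("q", query), ("facet", "true"), ("facet.field", "tags")]
    if only_tag is not None:
        # fq narrows the result set without affecting relevance scoring.
        params.append(("fq", "tags:" + only_tag))
    return urlencode(params)


print(tag_content("someone@example.com http://www.site.com"))  # ['email', 'link']
print(tag_content("see http://www.site.com/img/cat.JPG?size=big"))  # ['link', 'picture']
print(facet_params("*:*", only_tag="email"))
```

For "email and/or link", you can add one fq per required tag to get AND. For OR, use a single fq=tags:(email OR link). The facet counts on "tags" then show how the matching documents break down per tag.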

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 5. feb. 2010, at 20.10, José Moreira wrote:

> Hello,
> 
> I'm planning to index a 'content' field for search, and from that
> field's text I would like to facet (probably) according to whether
> the content contains e-mails and URLs, and, within URLs, links to
> pictures, videos and others.
> 
> As I'm a relatively new user of Solr, my plan was to regexp the
> content in my application and add tags to a Solr field according to
> the content. For example, the content "m...@email.com
> http://www.site.com" would get the tags "email, link".
> 
> If I follow this path, can I then facet on "email" and/or "link"? For
> example, by combining the facet field and facet value params?
> 
> Best
> 
> -- 
> http://pt.linkedin.com/in/josemoreira
> josemore...@irc.freenode.net
> http://djangopeople.net/josemoreira/
