NOTE: Please start a new email thread for a new topic (See http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)

Your strategy could work. You might want to look into dedicated entity extraction frameworks like
http://opennlp.sourceforge.net/
http://nlp.stanford.edu/software/CRF-NER.shtml
http://incubator.apache.org/uima/index.html

Or, if that is too much work, look at http://issues.apache.org/jira/browse/SOLR-1725 for a way to plug your entity extraction code into Solr itself using a scripting language.

--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 5. feb. 2010, at 20.10, José Moreira wrote:

> Hello,
>
> I'm planning to index a 'content' field for search, and from that
> field's text content I would like to facet (probably) according to
> whether the content has e-mails, URLs and, within URLs, URLs to
> pictures, videos and others.
>
> As I'm a relatively new user to Solr, my plan was to regexp the
> content in my application and add tags to a Solr field according to
> the content, so for example the content "m...@email.com
> http://www.site.com" would have the tags "email, link".
>
> If I follow this path, can I then facet on "email" and/or "link"? For
> example, by combining facet field with facet value params?
>
> Best
>
> --
> http://pt.linkedin.com/in/josemoreira
> josemore...@irc.freenode.net
> http://djangopeople.net/josemoreira/
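
For reference, the application-side regexp tagging described in the quoted message could look roughly like the sketch below. It is only an illustration: the patterns are deliberately simplistic, the extension lists and the function name extract_tags are made up for this example, and the "picture" and "video" tags are an assumed extension of the "email, link" tags mentioned in the thread.

```python
import re
from urllib.parse import urlparse

# Deliberately loose patterns; real-world e-mail/URL matching needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Assumed extension lists, only for illustration.
PICTURE_EXT = (".jpg", ".jpeg", ".png", ".gif")
VIDEO_EXT = (".mp4", ".avi", ".mov", ".flv")


def extract_tags(content):
    """Return a sorted list of tags describing what the text contains."""
    tags = set()
    if EMAIL_RE.search(content):
        tags.add("email")
    for url in URL_RE.findall(content):
        tags.add("link")
        # Classify the link by the file extension of its path, if any.
        path = urlparse(url).path.lower()
        if path.endswith(PICTURE_EXT):
            tags.add("picture")
        elif path.endswith(VIDEO_EXT):
            tags.add("video")
    return sorted(tags)


if __name__ == "__main__":
    print(extract_tags("me@email.com http://www.site.com"))
```

If the resulting tags are indexed into a multi-valued field (assumed here to be named tags), a request with facet=true&facet.field=tags should return a count per tag, and adding fq=tags:email should restrict the results to documents tagged "email".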