Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson
Why not count them on the way in and just store that number along with the original e-mail? Best Erick On Sun, Apr 3, 2011 at 10:10 PM, Kaushik Chakraborty wrote: > Ok. My expectation was since "comment_post_id" is a MultiValued field hence > it would appear multiple times (i.e. for each comment
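A minimal sketch of the "count them on the way in" idea, assuming each post is assembled as a plain dictionary before it is sent to the indexer. The field name `comment_count` and the helper `add_comment_count` are hypothetical and are not taken from the thread or from any Solr API.

```python
def add_comment_count(post, comments):
    """Return a copy of `post` with the number of comments stored alongside it.

    `comment_count` is a hypothetical field name chosen for illustration.
    """
    doc = dict(post)  # copy so the caller's dictionary is not modified
    doc["comment_count"] = len(comments)
    return doc


# Hypothetical sample data.
post = {"post_id": 42, "title": "Faceting question"}
comments = [
    {"comment_id": 1, "comment_post_id": 42},
    {"comment_id": 2, "comment_post_id": 42},
    {"comment_id": 3, "comment_post_id": 42},
]

doc = add_comment_count(post, comments)
print(doc["comment_count"])
```

With the count stored as an ordinary field, it comes back with the post itself and no faceting is needed to obtain it.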

Re: Faceting on multivalued field

2011-04-03 Thread Chris Fauerbach
Wouldn't you want to extract your original data format from the index and then 'count' the comments for each post? I don't think facets are appropriate. On Apr 3, 2011, at 22:10, Kaushik Chakraborty wrote: > Ok. My expectation was since "comment_post_id" is a MultiValued field hence > it wou

Re: Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Ok. My expectation was that, since "comment_post_id" is a multivalued field, it would appear multiple times (i.e. once for each comment), and that faceting on that field would therefore also give me the count of the documents in which comment_post_id appears. My requirement is getting total fo

Re: Multiple Words in String

2011-04-03 Thread Erick Erickson
Short form: I think you're going down a rabbit-hole and should just use synonyms and forget about it. I'm particularly thinking that a general-purpose solution that somehow breaks up or combines adjacent tokens will have consequences that pop out other places that you don't want and you'll have to

Re: Multiple Words in String

2011-04-03 Thread Chris Fauerbach
It's not only a specific case (e.g. microsoft.com); it's really a multi-word issue: carwash, bookkeeper, etc. I'm ultimately looking for a schema for search and retrieval that's heavily focused on 'names'. These are people's names, business names, etc., not content like large text fields,

Re: admin/index.jsp double submit on IE

2011-04-03 Thread Erick Erickson
Jeffery: It's perfectly appropriate to raise a JIRA for something like this. If you could add the steps to make this happen, that'd be great. see: http://wiki.apache.org/solr/HowToContribute#Contributing_your_work. If you can add a patch, that'd be even better (instructions on that page too). Y

Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson
OK, you're still not quite on the right track. You can't just index XML documents without transforming them into valid Solr XML documents. Ditto for HTML. Take a look at the ExtractingRequestHandler documentation at: http://wiki.apache.org/solr/ExtractingRequestHandler Here's some more documentat

Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson
Hmmm, I think you're misunderstanding faceting. It counts the number of documents that have a particular value. So if you're faceting on "comment_post_id", there is one and only one document with that value (assuming that the comment_post_ids are unique). Which is what's being reported. This
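The point about faceting counting documents rather than occurrences can be illustrated with a small sketch in plain Python. This is a toy model and not Solr's implementation; the sample data and the helper `facet_counts` are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents contain each value of `field`.

    A document contributes at most 1 to a value's count, no matter how
    many times that value is repeated inside a multivalued field.
    """
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, (list, tuple, set)):
            values = [values]
        counts.update(set(values))  # de-duplicate within one document
    return dict(counts)


# Hypothetical index: one document per post, with the post id repeated
# once per comment in the multivalued field.
docs = [
    {"post_id": 1, "comment_post_id": [1, 1, 1]},  # post with 3 comments
    {"post_id": 2, "comment_post_id": [2, 2]},     # post with 2 comments
]

print(facet_counts(docs, "comment_post_id"))
```

Each value is reported with a count of 1 here, because only one document carries it, even though the post has several comments.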

Re: Multiple Words in String

2011-04-03 Thread Erick Erickson
Is this a general question or a specific one? You can handle specific cases by using synonyms. But the general case, that is, treating any pair of adjacent tokens as a single token, seems fraught with unintended consequences, but you know your problem space better than I do. Best Erick On Sat, Apr 2, 2011 at
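A toy sketch of handling specific cases with an explicit synonym table, using the words mentioned elsewhere in this thread (carwash, bookkeeper). It is plain Python for illustration only; the table `SYNONYMS` and the helper `expand_tokens` are hypothetical and do not reproduce Solr's synonym filter.

```python
# Hypothetical, hand-maintained mapping from a compound word to its parts.
SYNONYMS = {
    "carwash": ["car", "wash"],
    "bookkeeper": ["book", "keeper"],
}


def expand_tokens(tokens):
    """Lower-case each token and replace listed compounds with their parts."""
    expanded = []
    for token in tokens:
        key = token.lower()
        expanded.extend(SYNONYMS.get(key, [key]))
    return expanded


print(expand_tokens(["Acme", "Carwash"]))
```

Only the listed words are rewritten; every other token passes through unchanged, which avoids the side effects of combining or splitting arbitrary adjacent tokens.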

Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread michael.i
Hi Erick, thanks for getting back to me. "Well, what is "a document on the filesystem"? Solr deals with well-formed XML documents of a specific format." I would like to index all kinds of documents. For a start I'll be happy to be able to work with XML and HTML documents.

Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson
Well, what is "a document on the filesystem"? Solr deals with well-formed XML documents of a specific format. You can't just stream a random file to Solr. Specifically documents look like: value for field . . . perhaps with an . There are ways for structured documents to be added using the T
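For illustration, a minimal sketch that builds an update message in Solr's XML `<add><doc><field name="...">` format using only the Python standard library. The field names and values are hypothetical, and the helper `to_solr_add_xml` is not part of any Solr client; the markup in the original message was lost from this archive, so this is not a reconstruction of it.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc):
    """Serialize a dict as a Solr XML <add> message containing one <doc>.

    List or tuple values become repeated <field> elements, which is how
    multivalued fields are written in this format.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document.
print(to_solr_add_xml({"id": "doc1", "title": "Hello"}))
```

A raw XML or HTML file would first have to be transformed into this shape, or be passed through an extraction handler, before Solr could index it.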

Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Hi, My index contains a root entity "Post" and a child entity "Comments". Each post can have multiple comments. data-config.xml:

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Ken Krugler
On Apr 3, 2011, at 6:56am, yehosef wrote: > How can they require payment for something that was developed under the > apache license? It's the difference between free speech and free beer :) See http://en.wikipedia.org/wiki/Gratis_versus_libre -- Ken -- Ken Krugler +1

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Wolfram Bartussek
Take "Lucidworks for Solr", it's free. Regards, Wolfram -Original Message- From: yehosef [mailto:yeho...@gmail.com] Sent: Sunday, April 3, 2011 15:57 To: solr-user@lucene.apache.org Subject: Re: Difference between Solr and Lucidworks distribution How can they require payment

does overwrite=false work with json

2011-04-03 Thread David Murphy
I'm doing some performance benchmarking of Solr and I started with a single big JSON file containing all the docs that I'm sending via curl. The results are fantastic - I'm achieving an indexing rate of about 44,000 docs/sec using this method (these are really small test docs). In the past I hav

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread yehosef
How can they require payment for something that was developed under the Apache License?

Re: Multiple Words in String

2011-04-03 Thread lboutros
I managed to find both documents with your two input queries. Add this filter to the query part of your analyzer: => The main problem is that your query "microsoft" is transfor