date:20110403

Re: Multiple Words in String

2011-04-03 Thread lboutros

I managed to find both documents with your two input queries.

Add this filter to the query-time part of your analyzer:

<filter class="solr.PositionFilterFactory"/>

The main problem is that your query "microsoft" is transformed into a
single PhraseQuery, which cannot match the document containing "micro soft".
The PositionFilterFactory will transform the query into multiple queries.
You can add debugQuery=on to the request to see how the query is parsed
in each case.

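To make the PhraseQuery point concrete, here is a toy Java model (not Lucene's code) of the rule a Lucene 3.x-era query parser roughly follows when phrase queries are auto-generated: one token gives a term query, several tokens stacked at a single position give an OR of the terms, and tokens spread over several positions give a phrase that must match in order. `positionFilter` mimics PositionFilter's default of setting every token after the first to a position increment of 0. The token stream micro/soft/microsoft is hypothetical; check your real one on the admin analysis page.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene source) of how a 3.x-era query parser shapes a query
// from the analyzed tokens of one query-string term.
final class QueryShapeSketch {

    /** One analyzed token: its text and its position increment. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Mimics PositionFilter's default: every token after the first gets posInc 0. */
    static List<Token> positionFilter(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Token t = in.get(i);
            out.add(new Token(t.term, i == 0 ? t.posInc : 0));
        }
        return out;
    }

    /** Returns a readable description of the query the parser would build. */
    static String buildQuery(String field, List<Token> tokens) {
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0).term; // plain term query
        }
        // Group tokens by position: a posInc of 0 stacks a token on the previous one.
        List<List<String>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc > 0 || positions.isEmpty()) {
                positions.add(new ArrayList<>());
            }
            positions.get(positions.size() - 1).add(t.term);
        }
        if (positions.size() == 1) {
            // All tokens share one position: OR them (BooleanQuery of SHOULD clauses).
            List<String> clauses = new ArrayList<>();
            for (String term : positions.get(0)) {
                clauses.add(field + ":" + term);
            }
            return "(" + String.join(" OR ", clauses) + ")";
        }
        // Several positions: a phrase; stacked tokens become alternatives at that slot.
        List<String> slots = new ArrayList<>();
        for (List<String> p : positions) {
            slots.add(p.size() == 1 ? p.get(0) : "(" + String.join(" ", p) + ")");
        }
        return field + ":\"" + String.join(" ", slots) + "\"";
    }
}
```

With the hypothetical tokens micro(+1), soft(+1), microsoft(+0), `buildQuery` gives `text:"micro (soft microsoft)"`, a phrase that "micro soft" alone cannot satisfy; after `positionFilter` it gives `(text:micro OR text:soft OR text:microsoft)`.
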
You can find more information here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Words-in-String-tp2767964p2770713.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread yehosef

How can they require payment for something that was developed under the
Apache license?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Difference-between-Solr-and-Lucidworks-distribution-tp2474792p2771191.html
Sent from the Solr - User mailing list archive at Nabble.com.

does overwrite=false work with json

2011-04-03 Thread David Murphy

I'm doing some performance benchmarking of Solr and I started with a single big 
JSON file containing all the docs that I'm sending via curl. The results are 
fantastic - I'm achieving an indexing rate of about 44,000 docs/sec using this 
method (these are really small test docs). In the past I have used CSV and 
adding overwrite=false to the URL increased performance when doing a fresh 
reindex when I know all the document ids are unique. I tried this with the JSON 
upload, and nothing seemed to change.  Is this supposed to work with the JSON 
update handler?

Anyway, Solr is doing spectacularly against the competition so far. Keep up the
great work!

--Dave

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Wolfram Bartussek

Take "Lucidworks for Solr", it's free.

Regards, Wolfram

-Original Message-
From: yehosef [mailto:yeho...@gmail.com] 
Sent: Sunday, April 3, 2011 15:57
To: solr-user@lucene.apache.org
Subject: Re: Difference between Solr and Lucidworks distribution

How can they require payment for something that was developed under the
Apache license?

--
View this message in context:
http://lucene.472066.n3.nabble.com/Difference-between-Solr-and-Lucidworks-distribution-tp2474792p2771191.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Difference between Solr and Lucidworks distribution

2011-04-03 Thread Ken Krugler


On Apr 3, 2011, at 6:56am, yehosef wrote:

> How can they require payment for something that was developed under the
> apache license?

It's the difference between free speech and free beer :)

See http://en.wikipedia.org/wiki/Gratis_versus_libre

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g

Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty

Hi,

My index contains a root entity "Post" and a child entity "Comments". Each
post can have multiple comments. data-config.xml:

The schema has all columns of the "comment" entity as multiValued fields, and
all fields are indexed and stored. My requirement is to count the number of
comments for each post. The approach I'm taking is to query "*:*" and
facet the result on "comment_post_id" so that it gives the number of
comments for each post.

But I'm getting an incorrect result: e.g. if a post has 2 comments, the
multivalued fields are populated correctly but the facet count comes out as 1
(for that post_id). What else do I need to do?


Thanks,
Kaushik

Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson

Well, what is "a document on the filesystem"? Solr deals
with well-formed XML documents of a specific format. You
can't just stream a random file to Solr. Specifically
documents look like:

<doc>
  <field name="...">value for field</field>
.
.
.
</doc>

perhaps with an .

There are ways for structured documents to be added using the
Tika libraries etc.

But before we go there, what is it you want to do? What is the
nature of your document?

Best
Erick

On Sat, Apr 2, 2011 at 12:35 PM, michael.i  wrote:

> Hi,
> I am new to solr so please excuse me if my question sounds basic.
>
> I would like to use the EmbeddedSolrServer.
> It happens that all examples I've found on the web use documents that have
> been generated dynamically such as:
>
>
> SolrServer solrServer = new EmbeddedSolrServer(container, "core");
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("docText", "This is a sample file");
> solrServer.add(doc);
> solrServer.commit();
>
>
> I would like to be able to load a document that is stored on the
> filesystem.
> Ideally, I would have liked to do something such as:
> SolrInputDocument doc = new SolrInputDocument("path/myDoc.txt");
> solrServer.add(doc);
> solrServer.commit();
>
> It does not seem possible to do such thing. Am I missing something? Are
> there some best practices with regards to referring to a document on the
> filesystem?
>
> Thanx!
> Michael.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2767614.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread michael.i

Hi Erick,
thanx for getting back to me.

"Well, what is "a document on the filesystem"? Solr deals
with well-formed XML documents of a specific format."

I would like to index all kinds of documents. For a start I'll be happy to
be able to work with xml and html documents.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-EmbeddedSolrServer-with-static-documents-tp2767614p2773012.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multiple Words in String

2011-04-03 Thread Erick Erickson

Is this a general question or a specific one? You can handle specific ones by
using synonyms.

But the general case, that is, treating any pair of adjacent tokens as
a single token, seems fraught with unintended consequences. Still,
you know your problem space better than I do.

Best
Erick

On Sat, Apr 2, 2011 at 2:21 PM, Chris Fauerbach wrote:

> Good afternoon everyone!
> I am stumped, and I would love some help. I'm new to solr/lucene,
> but I have thrown myself into it, so I think I have a solid
> understanding.   Using the analysis tool in the admin interface, I see
> these words stemmed and processed as I assume they would be, so I'm
> stuck.
>
> In my index, I have two documents, each with a text field, and here
> are example values
>
> 1) microsoft.com
> 2) micro soft
>
> I want to do a search using microsoft or "micro soft" and find both.
> I'm using the dismax interface, the fields are properly listed in the
> config, and I can find both records, but never at the same time.
> Here's my schema.xml for my text field, any thoughts on what I can do
> to find these together?
>
>
> positionIncrementGap="100">
>  
>
>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
> synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
> maxGramSize="15" side="front"/>
> maxGramSize="15" side="back"/>
> language="English" protected="protwords.txt"/>
>  
>  
>
>
> maxGramSize="15" side="front"/>
> maxGramSize="15" side="back"/>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
> language="English" protected="protwords.txt"/>
>
>  
>
>

Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson

Hmmm, I think you're misunderstanding faceting. It counts the
number of documents that have a particular value. So if you're
faceting on "comment_post_id", there is one and only one document
with that value (assuming that the comment_post_ids are unique),
which is what's being reported. Faceting on a field with a unique value
per document will be quite expensive on a large corpus, BTW.

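A toy Java model (not Solr's implementation) of the counting rule described above: a facet count is the number of matching documents that contain a value, so a post document whose multiValued comment_post_id holds 46 twice still contributes 1. Each inner list stands for one document's values for the faceted field; the values are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not Solr's implementation) of field faceting over matching documents.
final class FacetCountSketch {

    /**
     * Each inner list holds one document's values for the faceted field.
     * Returns value -> number of documents containing that value.
     */
    static Map<String, Integer> facetCounts(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : docs) {
            // A repeated value inside one document still counts that document once.
            for (String v : new HashSet<>(values)) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For one document with [46, 46] and another with [47], `facetCounts` returns {46=1, 47=1}: the 1 is a document count, not a count of values.
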
Is your task to show the totals for *every* document in your corpus or
just the ones in a display page? Because if the latter, your app could
just count up the number of elements in the XML returned for the
multiValued comments field.

If that's not relevant, could you explain a bit more why you need this
count?

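If only the posts on a display page matter, the count can be taken on the client. Below is a minimal stdlib-only Java sketch, assuming the standard Solr XML response layout (single values as elements like <str name="...">, multiValued fields as <arr name="..."> lists) and hypothetical field names post_id and comment_post_id. With SolrJ you would read the same values via SolrDocument.getFieldValues(...) instead of parsing XML yourself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Counts the values of one multiValued field per <doc> in a Solr XML response.
final class ResponseCountSketch {

    /** Returns id -> number of values of multiField (0 when the field is absent). */
    static Map<String, Integer> countValues(String responseXml, String idField, String multiField) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable XML response", e);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            String id = null;
            int values = 0;
            for (Node f = docs.item(i).getFirstChild(); f != null; f = f.getNextSibling()) {
                if (f.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text between fields
                }
                Element field = (Element) f;
                String name = field.getAttribute("name");
                if (name.equals(idField)) {
                    id = field.getTextContent().trim();
                } else if (name.equals(multiField) && field.getTagName().equals("arr")) {
                    values = countChildElements(field);
                }
            }
            counts.put(id, values);
        }
        return counts;
    }

    private static int countChildElements(Element parent) {
        int n = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                n++;
            }
        }
        return n;
    }
}
```

This only sees the rows actually returned, which is why it suits a display page rather than totals over the whole corpus.
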
Best
Erick


Re: Using EmbeddedSolrServer with static documents

2011-04-03 Thread Erick Erickson

OK, you're still not quite on the right track. You can't just
index XML documents without transforming them into
valid Solr XML documents. Ditto for HTML.

Take a look at the ExtractingRequestHandler documentation at:
http://wiki.apache.org/solr/ExtractingRequestHandler

Here's some more documentation that might help.
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

But at root, you have to extract the relevant info from the file in question
and form your own valid Solr document and send *that* to Solr if you want to
do it by hand.

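For the by-hand route, here is a minimal stdlib-only Java sketch that builds a Solr XML update message from a file's text. It assumes a schema with an "id" uniqueKey and the "docText" field from Michael's example (both names are assumptions). It does no extraction: HTML or XML markup would be indexed as literal text, which is exactly the work the ExtractingRequestHandler (or your own parsing) would otherwise do. It also does not strip characters that are illegal in XML 1.0.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: wrap raw text in a Solr <add><doc> message (field names are assumptions).
final class SolrXmlSketch {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an update message with a hypothetical "id" field and the "docText" field. */
    static String toAddXml(String id, String text) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"docText\">" + escapeXml(text) + "</field>"
                + "</doc></add>";
    }

    /** Reads a UTF-8 file as-is and uses its path as the document id. */
    static String fileToAddXml(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return toAddXml(file.toString(), text);
    }
}
```

The resulting string can be POSTed to the update handler; alternatively, the same two values can go into a SolrInputDocument via addField and be sent with SolrJ.
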
Or you can use the ExtractingRequestHandler to do it for you, but then you
need to be aware that it'll do the best it can at putting meta-data information
into the appropriate fields in your schema; you don't have total control
over that.

Oh, and why are you using embedded Solr? The normal HTTP request process
is recommended, and you can connect to it easily with SolrJ.

FWIW
Erick


Re: admin/index.jsp double submit on IE

2011-04-03 Thread Erick Erickson

Jeffrey:

It's perfectly appropriate to raise a JIRA for something like this.

If you could add the steps to reproduce this, that'd be great.

see: http://wiki.apache.org/solr/HowToContribute#Contributing_your_work.

If you can add a patch, that'd be even better (instructions on that page
too). You'll find the Solr committers are quite willing to work with you on
the patch.

Thanks for finding this and digging into the underlying reason!

Best
Erick

On Sat, Apr 2, 2011 at 12:39 PM, Jeffrey Chang  wrote:

> Hi,
>
> I noticed /admin/index.jsp could issue a double submit on IE causing Jetty
> to error out.
>
> Fixed by modifying index.jsp's javascript submit to return false.
>
> ... queryForm.submit(); return false; ...
>
> Not sure if I should log a defect for this or not.
>
> - Jeff
>

Re: Multiple Words in String

2011-04-03 Thread Chris Fauerbach

It's not only a specific case (e.g. microsoft.com); it's really a
multi-word issue.

carwash, bookkeeper, etc.

I'm ultimately looking for a schema for search and retrieval that's heavily
focused on names: people's names, business names, etc. Not
content like large text fields, web sites or anything like that, but
business data that I'm very successfully receiving using dataimport
handlers. It's these special cases that are really tripping me up; my
business folks keep coming up with them!


Chris Fauerbach
chrisfauerb...@gmail.com



Re: Multiple Words in String

2011-04-03 Thread Erick Erickson

Short form:
I think you're going down a rabbit-hole and should just
use synonyms and forget about it.

I'm particularly thinking that a general-purpose solution
that somehow breaks up or combines adjacent tokens
will have consequences that pop out other places that
you don't want and you'll have to fix *that*. I can't think
of a way to do this that wouldn't run that danger.

Long form, think of it as a sermon, it's Sunday after all.

This is the point, in my experience, where you have to ask your
business people "what's it worth to you?" You can handle
any case that comes up similar to the examples you've shown
by adding it to your synonyms file (mapping each pair
to its joined form as a synonym) and be done with it. This is
a very straightforward approach that has predictable consequences.

Or you can mess around, possibly for quite some time, trying
to find a general purpose solution that will almost inevitably
lead to unanticipated behavior that you'll then spend lots of time
trying to chase down, time you could have spent putting in
features that your users will actually notice.

Here's a test. Ask your business people to create a list of all the
pairs they want to see treated like this. If their response is any
variant of "we don't have time to do that", then even *they* must
not think it's very important. And if they do, put
it in your synonyms file and be a hero.

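Turning such a list into synonym entries is mechanical. A minimal Java sketch, assuming the list arrives as the spaced forms ("micro soft", "car wash") and that each should be equivalent to its joined form. It emits comma-separated lines in the Solr synonyms file format, suitable for an index-time file like the syn/index_synonyms.txt used with expand="true" in Chris's schema; the lowercasing assumes ignoreCase="true", as in that schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: build synonyms.txt lines that make "a b" and "ab" equivalent.
final class SynonymLinesSketch {

    /** For each multi-word entry, returns a line like "micro soft, microsoft". */
    static List<String> synonymLines(List<String> spacedForms) {
        List<String> lines = new ArrayList<>();
        for (String spaced : spacedForms) {
            // Normalize case and collapse runs of whitespace to single spaces.
            String normalized = spaced.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            String joined = normalized.replace(" ", "");
            if (!joined.equals(normalized)) { // single words need no entry
                lines.add(normalized + ", " + joined);
            }
        }
        return lines;
    }
}
```

Given "micro soft", "car wash", "Book  Keeper" and "bookkeeper", it returns "micro soft, microsoft", "car wash, carwash" and "book keeper, bookkeeper"; the single-word entry produces nothing.
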
Evil thoughts aside, I'm dead serious. This is the kind of rabbit-hole
that development efforts go down that, in all probability, add almost
zero *value* to the product. There's a way to handle 95% of the cases
that's very easy to implement. It's already there in Solr.

Historically, we in the programming field have done a very poor job
of making it clear to the business folks that every such request has
not only an implementation cost (and we all too often don't include
debugging/maintenance in that cost) but an opportunity cost. We owe it
to the business folks *and ourselves* to clearly explain to them the
cost and let them make the decision whether it's worth it. A decision
based on information. And understand that I'm not knocking the
business folks here. We haven't given them the consequences to weigh,
so how can we fault their decisions?

OK, sermon over. I've just too often said "yes, we can do that"
without thinking to add "and it'll cost 3 weeks of development effort".
Eventually I figured out that adding the estimate, and letting the business
folks know what I wouldn't be able to get to because of the time
spent, led to "Oh, never mind".

Best
Erick

P.S. OK, it's late Sunday night and I feel like writing long, involved
responses that aren't entirely on-topic.


Re: Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty

OK. My expectation was that since "comment_post_id" is a multiValued field,
it would appear multiple times (i.e. once for each comment), and hence
faceting on that field would give me the number of times comment_post_id
appears.

My requirement is getting the total for every document, i.e. finding the number of
comments per post in the whole corpus. To explain it more clearly, I'm
getting a result XML something like this:

46
Hello World
20

9
10


   19
   2


  46
  46


   Hello - from World
   Hi



  
 *1*

I need the count to be 2, as post 46 has 2 comments.

What other approach can I take?

Thanks,
Kaushik



Re: Faceting on multivalued field

2011-04-03 Thread Chris Fauerbach

Wouldn't you want to extract your original data format from the index and then
'count' the comments for each post?
I don't think facets are appropriate.


Re: Faceting on multivalued field

2011-04-03 Thread Erick Erickson

Why not count them on the way in and just store that number along
with the original post?

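A minimal Java sketch of counting on the way in, assuming the indexing code sees (post id, comment id) rows before building documents. The per-post total would go into a hypothetical single-valued integer field such as comment_count on each post, which can then be returned, sorted or filtered on directly.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: per-post comment totals computed before indexing.
final class CommentCountSketch {

    /**
     * rows: {postId, commentId} pairs, one per comment.
     * Returns postId -> number of comments. Posts with no comment rows are
     * absent from the map and should be indexed with a count of 0.
     */
    static Map<String, Integer> commentCounts(List<String[]> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.merge(row[0], 1, Integer::sum); // one row per comment
        }
        return counts;
    }
}
```

For rows (46, 9), (46, 10) and (47, 11) it returns {46=2, 47=1}, which is the 2 Kaushik expects for post 46.
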
Best
Erick

