Query results vs. facets results

2012-07-15 Thread tudor
Hello,

I am new to Solr and I am running some tests with our data in Solr. I am using
version 3.6 and the data is imported from a DB2 database using Solr's DIH.
We have defined a single entity in the db-data-config.xml, which is an
equivalent of the following query:



The ID in NAME_CONNECTIONS is not unique, so it might appear multiple times.

For the unique ID in the schema, we are using a solr.UUIDField:


http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true

yields 

134

as a result, which is exactly what we expect.

On the other hand, running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

yields

103

I would expect to have the same number (134) in this facet result as the
previous filter result. Could you please let me know why these two results
are different?

Thank you,
Tudor 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
Do you have the following hard autoCommit in your config (as the stock
server does)?

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

This is now fairly important since Solr now tracks information on
every uncommitted document added.
At some point we should probably hardcode some mechanism based on
number of documents or time.

-Yonik
http://lucidimagination.com
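
Yonik's suggestion of a safeguard keyed to number of documents or time can be sketched as a small policy object. This is only an illustration of the idea; the class name and thresholds are invented here, not taken from Solr's code:

```python
class CommitPolicy:
    """Toy auto-commit trigger that fires on either a document-count or a
    time threshold. Illustration only: the class and thresholds are
    invented here, not taken from Solr's implementation."""

    def __init__(self, max_docs=10000, max_ms=15000):
        self.max_docs = max_docs
        self.max_ms = max_ms
        self.pending = 0            # uncommitted documents
        self.first_pending_at = None

    def on_add(self, now_ms):
        """Record one uncommitted add; return True when a commit is due."""
        if self.first_pending_at is None:
            self.first_pending_at = now_ms
        self.pending += 1
        return (self.pending >= self.max_docs
                or now_ms - self.first_pending_at >= self.max_ms)

    def on_commit(self):
        """Reset the uncommitted-document bookkeeping after a commit."""
        self.pending = 0
        self.first_pending_at = None

policy = CommitPolicy(max_docs=3, max_ms=15000)
print([policy.on_add(t) for t in (0, 1, 2)])  # [False, False, True]
```

Either threshold alone would let the other resource grow without bound, which is why both a document count and a time limit are checked.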


Re: Groups count in distributed grouping is wrong in some case

2012-07-15 Thread Agnieszka Kukałowicz
Hi,

I'm using SOLR 4.x from trunk. This was the version from 2012-07-10. So
this is one of the latest versions.

I searched mailing list and jira but found only this
https://issues.apache.org/jira/browse/SOLR-3436

It was committed in May to trunk so my version of SOLR has this fix. But
the problem still exists.

Cheers
Agnieszka

2012/7/15 Erick Erickson 

> what version of Solr are you using? There's been quite a bit of work
> on this lately,
> I'm not even sure how much has made it into 3.6. You might try searching
> the
> JIRA list, Martijn van Groningen has done a bunch of work lately, look for
> his name. Fortunately, it's not likely to get a bunch of false hits ..
>
> Best
> Erick
>
> On Fri, Jul 13, 2012 at 7:50 AM, Agnieszka Kukałowicz
>  wrote:
> > Hi,
> >
> > I have a problem with faceting counts in distributed grouping. It appears
> > only when I make a query that returns almost all of the documents.
> >
> > My SOLR implementation has 4 shards and my queries look like:
> >
> > http://host:port
> >
> /select/?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > With query like above I get strange counts for field category1.
> > The counts for values are very big:
> > 9659
> > 7015
> > 5676
> > 1180
> > 1105
> > 979
> > 770
> > 701
> > 612
> > 422
> > 358
> >
> > When I make a query to narrow the results, adding fq=category1:"val1",
> > etc., I get different counts than the category1 facet shows
> > for the first few values:
> >
> > fq=category1:"val1" - counts: 22
> > fq=category1:"val2" - counts: 22
> > fq=category1:"val3" - counts: 21
> > fq=category1:"val4" - counts: 19
> > fq=category1:"val5" - counts: 19
> > fq=category1:"val6" - counts: 20
> > fq=category1:"val7" - counts: 20
> > fq=category1:"val8" - counts: 25
> > fq=category1:"val9" - counts: 422
> > fq=category1:"val10" - counts: 358
> >
> > From val9 on, the count is ok.
> >
> > First I thought that for some values in the "category1" facet the group
> > count does not work and it returns counts of all documents, not grouped
> > by field id.
> > But the number of all documents matching query fq=category1:"val1" is
> > 45468. So the numbers are not the same.
> >
> > I checked the queries on each shard for val1 and the results are:
> >
> > shard1:
> > query:
> >
> http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > 
> > 11
> >
> > query:
> >
> http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1
> > :"val1"
> >
> > shard 2:
> > query:
> >
> http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > there is no value "val1" in category1 facet.
> >
> > query:
> >
> http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1
> > :"val1"
> >
> > 7
> >
> > shard3:
> > query:
> >
> http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > there is no value val1 in category1 facet
> >
> > query:
> >
> http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1
> > :"val1"
> >
> > 4
> >
> > So it looks like the detailed query with fq=category1:"val1" returns the
> > relevant results. But Solr has a problem with faceting counts when one of
> > the shards does not return the faceting value (in this scenario "val1")
> > that exists on other shards.
> >
> > I checked shards for "val10" and I got:
> >
> > shard1: count for val10 - 142
> > shard2: count for val10 - 131
> > shard3: count for val10 -  149
> > sum of counts 422 - ok.
> >
> > I'm not sure how to resolve this situation. The counts of val1 to
> > val9 should certainly be different, and they should not be at the top
> > of the category1 facet, because this is very confusing. Do you have
> > any idea how to fix this problem?
> >
> > Best regards
> > Agnieszka
>

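The shard-by-shard numbers above can be cross-checked with trivial arithmetic: the cluster-wide group count for one facet value should equal the sum of the per-shard ngroups, assuming documents that share a group id live on a single shard (which distributed grouping requires). A toy check, not Solr's distributed refinement code:

```python
def merged_count(per_shard_ngroups):
    """Expected cluster-wide group count for one facet value: the sum of
    the ngroups each shard reports (assumes groups never span shards)."""
    return sum(per_shard_ngroups)

# ngroups reported by shard1/shard2/shard3 when queried with the fq directly:
print(merged_count([11, 7, 4]))       # val1  -> 22, matching fq=category1:"val1"
print(merged_count([142, 131, 149]))  # val10 -> 422, matching the facet count
```

The merged facet counts reported for val1 through val8 (9659, 7015, and so on) fail this check, which is what makes them suspect; val9 and val10 pass it.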

Lost answers?

2012-07-15 Thread Bruno Mannina

Dear Solr Users,

I have a Solr 3.6 + Tomcat setup and a program that sends 4 HTTP
requests at the same time.

I must do 1902 requests.

I do several tests but each time it loses some requests:
- sometimes I get 1856 docs, 1895 docs, 1900 docs but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev environment, I'm the only one testing it.

Is it a problem for Tomcat 6 to handle 4 simultaneous requests?

thanks for your info,

Bruno
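
One way to narrow down where the documents disappear is a harness that issues the requests with a bounded number of concurrent workers and records exactly which ids failed. A sketch under stated assumptions: fetch_doc is a placeholder for the real HTTP call against Solr, and the 30-second timeout is arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(ids, fetch_doc, workers=4):
    """Run fetch_doc(id) for every id with at most `workers` requests in
    flight, and return the ids whose request raised.  fetch_doc is a
    stand-in for the actual HTTP request to Solr/Tomcat."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_doc, i): i for i in ids}
        for future, doc_id in futures.items():
            try:
                future.result(timeout=30)
            except Exception:
                failed.append(doc_id)
    return failed

# With a fetch function that always succeeds, no ids go missing:
print(len(fetch_all(range(1902), lambda i: {"id": i})))  # 0
```

Running this against both Jetty and Tomcat with the real fetch function would show whether the missing documents correspond to failed HTTP requests or to requests that returned empty results.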


Re: Facet on all the dynamic fields with *_s feature

2012-07-15 Thread Jack Krupansky
The answer appears to be "No", but it's good to hear people express an 
interest in proposed features.


-- Jack Krupansky

-Original Message- 
From: Rajani Maski

Sent: Sunday, July 15, 2012 12:02 AM
To: solr-user@lucene.apache.org
Subject: Facet on all the dynamic fields with *_s feature

Hi All,

  Is this issue fixed in Solr 3.6 or 4.0: faceting on all dynamic fields
with facet.field=*_s

  Link  :  https://issues.apache.org/jira/browse/SOLR-247



 If it is not fixed, any suggestions on how I can achieve this?


My requirement is just the same as this one:
http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none


Regards
Rajani 



Solr - Spatial Search for Specific Areas on Map

2012-07-15 Thread samabhiK
Hi,

I am new to Solr spatial search and would like to understand whether Solr can be
used successfully for very large data sets in the range of 4 billion records.
I need to search filtered data based on a region - maybe a set of
lat/lons or a polygon area. Is that possible in Solr? How fast is it with such a
data size? Will it be able to handle the load at 1 req/sec? If so, how?
Do you think Solr can beat the performance of PostGIS? As I am about to
choose the right technology for my new project, I need some expert comments
from the community.

Regards
Sam

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Nick Koton
> Do you have the following hard autoCommit in your config (as the stock
> server does)?
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>

I have tried with and without that setting.  When I described running with
auto commit, that setting is what I mean.  I have varied the time in the
range 10,000-60,000 msec.  I have tried this setting with and without soft
commit in the server config file.

I have tried without this setting, but specifying the commit within time in
the solrj client in the add method.

In both these cases, the client seems to overrun the server, and the server
runs out of memory.  One clarification I should make is that after
the server runs out of memory, the solrj client does NOT receive an error.
However, the documents indexed do not reliably appear in queries.

Approach #3 is to remove the autocommit in the server config, issue the add
method without commit within, but issue commits in the solrj client with
wait for sync and searcher set to true.  In case #3, I do not see the out of
memory in the server.  However, document index rates are restricted to about
1,000 per second.

 Nick

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock
server does)?

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

This is now fairly important since Solr now tracks information on every
uncommitted document added.
At some point we should probably hardcode some mechanism based on number of
documents or time.

-Yonik
http://lucidimagination.com



Re: Query results vs. facets results

2012-07-15 Thread Erick Erickson
q and fq queries don't necessarily run through the same query parser, see:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

So try adding &debugQuery=on to both queries you submitted. My guess
is that if you look at the parsed queries, you'll see something that explains
your differences. If not, paste the results back and we can take a look.

BTW, ignore all the "explain" bits for now, the important bit is the parsed form
of q and fq in your queries.

Best
Erick

On Sat, Jul 14, 2012 at 5:11 AM, tudor  wrote:
> Hello,
>
> I am new to Solr and I am running some tests with our data in Solr. We are
> using version 3.6 and the data is imported from a DB2 database using Solr's
> DIH. We have defined a single entity in the db-data-config.xml, which is an
> equivalent of the following query:
>  query="
> SELECT C.NAME,
>F.CITY
> FROM
> NAME_CONNECTIONS AS C
> JOIN NAME_DETAILS AS F
> ON C.DETAILS_NAME = F.NAME"
>>
> 
>
> This might lead to some names appearing multiple times in the result set.
> This is OK.
>
> For the unique ID in the schema, we are using a solr.UUIDField:
>
> 
>  stored="true" default="NEW"/
>
> All the searchable fields are declared as indexed and stored.
>
> I am aware of the fact that this is a very crude configuration, but for the
> tests that I am running it is fine.
>
> The problem that I have is the different result counts that I receive when I
> do equivalent queries for searching and faceting. For example, running the
> following query
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=100&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.ngroups=true&group.truncate=true
>
> yields
>
> 134
>
> as a result, which is exactly what we expect.
>
> On the other hand, running
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true
>
> yields
>
> 
>
>  
>   
> 103
>
> I would expect to have the same number (134) in this facet result as well.
> Could you please let me know why these two results are different?
>
> Thank you,
> Tudor
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3994988.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Erick Erickson
Anything currently in the trunk will most probably be in the BETA and
in the eventual release. So I'd go with the trunk code. It'll always
be closer to the actual release than ALPHA or BETA.

I know there've been some changes recently around exactly
the "collection" name. In fact there's a discussion about
rearranging the whole example directory.

Best
Erick

On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla  wrote:
> Hi,
>
> Is it intentional that the ALPHA release has a different folder structure
> as opposed to the trunk?
>
> eg. collection1 folder is missing in the ALPHA, but present in branch_4x
> and trunk
>
> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>
>
> This has consequences for development - e.g. solr testcases do not expect
> that the collection1 is there for ALPHA.
>
> In general, what is your advice for developers who are upgrading from solr
> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
> porting to the next BETA and stable releases?
>
> Thanks!
>
>   roman


Re: Metadata and FullText, indexed at different times - looking for best approach

2012-07-15 Thread Erick Erickson
You've got a couple of choices. There's a new patch in town
https://issues.apache.org/jira/browse/SOLR-139
that allows you to update individual fields in a doc if (and only if)
all the fields in the original document were stored (actually, all the
non-copy fields).

So if you're storing (stored="true") all your metadata information, you can
just update the document when the text becomes available, assuming you
know the uniqueKey when you update.

Under the covers, this will find the old document, get all the fields, add the
new fields to it, and re-index the whole thing.

Otherwise, your fallback idea is a good one.

Best
Erick

On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
 wrote:
> Hello,
>
> I have a database of metadata and I can inject it into SOLR with DIH
> just fine. But then, I also have the documents to extract full text
> from that I want to add to the same records as additional fields. I
> think DIH allows to run Tika at the ingestion time, but I may not have
> the full-text files at that point (they could arrive days later). I
> can match the file to the metadata by a file name matching a field
> name.
>
> What is the best approach to do that staggered indexing with minimum
> custom code? I guess my fallback position is a custom full-text
> indexer agent that re-adds the metadata fields when the file is being
> indexed. Is there anything better?
>
> I am a newbie using v4.0alpha of SOLR (and loving it).
>
> Thank you,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
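
With the SOLR-139 atomic updates Erick describes, the later full-text pass can send a partial document that carries only the new field, using the "set" modifier of Solr 4's JSON update format. A sketch; the field names are invented for illustration:

```python
import json

def atomic_update(doc_id, **set_fields):
    """Build a Solr 4 atomic-update payload: the stored fields of the
    existing document are kept, and each listed field is replaced via
    the "set" modifier.  Field names here are hypothetical."""
    doc = {"id": doc_id}
    doc.update({name: {"set": value} for name, value in set_fields.items()})
    return json.dumps([doc])

# When the full text arrives days after the metadata was indexed:
print(atomic_update("report-42", fulltext_t="extracted body..."))
# [{"id": "report-42", "fulltext_t": {"set": "extracted body..."}}]
```

The payload would be POSTed to /update with Content-Type application/json; as Erick notes, this only works if all the non-copy metadata fields are stored.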


Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Jack Krupansky

"Anything currently in the trunk ..."

I think you mean "Anything in the 4x branch", since "trunk" is 5x by 
definition.


But I'd agree that taking a nightly build or building from the 4x branch is 
likely to be a better bet than the "old" Alpha.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Sunday, July 15, 2012 11:02 AM
To: solr-user@lucene.apache.org
Subject: Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

Anything currently in the trunk will most probably be in the BETA and
in the eventual release. So I'd go with the trunk code. It'll always
be closer to the actual release than ALPHA or BETA.

I know there've been some changes recently around exactly
the "collection" name. In fact there's a discussion about
rearranging the whole example directory.

Best
Erick

On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla  wrote:

Hi,

Is it intentional that the ALPHA release has a different folder structure
as opposed to the trunk?

eg. collection1 folder is missing in the ALPHA, but present in branch_4x
and trunk

lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl


This has consequences for development - e.g. solr testcases do not expect
that the collection1 is there for ALPHA.

In general, what is your advice for developers who are upgrading from solr
3.x to solr 4.x? What codebase should we follow to minimize the pain of
porting to the next BETA and stable releases?

Thanks!

  roman 




Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Jack Krupansky
Maybe your rate of update is so high that the commit never gets a chance to 
run. So, maybe all these uncommitted updates are buffered up and using 
excess memory.


Try explicit commits from SolrJ, but less frequently. Or maybe if you just 
pause your updates periodically (every 30 seconds or so) the auto-commit 
would get a chance to occur. Although I have no idea how long a pause might 
be needed.


-- Jack Krupansky

-Original Message- 
From: Nick Koton

Sent: Sunday, July 15, 2012 10:52 AM
To: solr-user@lucene.apache.org ; yo...@lucidimagination.com
Subject: RE: SOLR 4 Alpha Out Of Mem Err


Do you have the following hard autoCommit in your config (as the stock

server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


I have tried with and without that setting.  When I described running with
auto commit, that setting is what I mean.  I have varied the time in the
range 10,000-60,000 msec.  I have tried this setting with and without soft
commit in the server config file.

I have tried without this setting, but specifying the commit within time in
the solrj client in the add method.

In both these cases, the client seems to overrun the server, and the server
runs out of memory.  One clarification I should make is that after
the server runs out of memory, the solrj client does NOT receive an error.
However, the documents indexed do not reliably appear in queries.

Approach #3 is to remove the autocommit in the server config, issue the add
method without commit within, but issue commits in the solrj client with
wait for sync and searcher set to true.  In case #3, I do not see the out of
memory in the server.  However, document index rates are restricted to about
1,000 per second.

Nick

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock
server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every
uncommitted document added.
At some point we should probably hardcode some mechanism based on number of
documents or time.

-Yonik
http://lucidimagination.com 



Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton  wrote:
>> Do you have the following hard autoCommit in your config (as the stock
>> server does)?
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>
> I have tried with and without that setting.  When I described running with
> auto commit, that setting is what I mean.

OK cool.  You should be able to run the stock server (i.e. with this
autocommit) and blast in updates all day long - it looks like you have
more than enough memory.  If you can't, we need to fix something.  You
shouldn't need explicit commits unless you want the docs to be
searchable at that point.

> Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to
solr?  Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky
 wrote:
> Maybe your rate of update is so high that the commit never gets a chance to
> run.

I don't believe that is possible.  If it is, it should be fixed.

-Yonik
http://lucidimagination.com


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Jack Krupansky

Agreed. That's why I say "maybe". Clearly something sounds amiss here.

-- Jack Krupansky

-Original Message- 
From: Yonik Seeley

Sent: Sunday, July 15, 2012 12:06 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky
 wrote:
Maybe your rate of update is so high that the commit never gets a chance to
run.


I don't believe that is possible.  If it is, it should be fixed.

-Yonik
http://lucidimagination.com 



Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Mark Miller
The beta will have files that were in solr/conf and solr/data in
solr/collection1/conf|data instead.

What Solr test cases are you referring to? The only ones that should care about 
this would have to be looking at the file system. If that is the case, simply 
update the path. The built in tests had to be adjusted for this as well.

The problem with having the default core use /solr as a conf dir is that if you 
create another core, where does it logically go? The default collection is 
called collection1, so now its conf and data live in a folder called 
collection1. A new SolrCore called newsarticles would have its conf and data 
in /solr/newsarticles.

There are still going to be some bumps as you move from alpha to beta to 
release if you are depending on very specific file system locations - however, 
they should be small bumps that are easily handled.

Just send an email to the user list if you'd like some help with anything in 
particular.

In this case, I'd update what you have to look at /solr/collection1 rather than 
simply /solr. It's still the default core, so simple URLs without the core name 
will still work. It won't affect HTTP communication. Just file system location.

On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:

> Hi,
> 
> Is it intentional that the ALPHA release has a different folder structure
> as opposed to the trunk?
> 
> eg. collection1 folder is missing in the ALPHA, but present in branch_4x
> and trunk
> 
> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
> 
> 
> This has consequences for development - e.g. solr testcases do not expect
> that the collection1 is there for ALPHA.
> 
> In general, what is your advice for developers who are upgrading from solr
> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
> porting to the next BETA and stable releases?
> 
> Thanks!
> 
>  roman

- Mark Miller
lucidimagination.com

Re: Index version on slave incrementing to higher than master

2012-07-15 Thread Andrew Davidoff
Erick,

Thank you. I think originally my thought was that if I had my slave
configuration really close to my master config, it would be very easy to
promote a slave to a master (and vice versa) if necessary. But I think you
are correct that ripping out from the slave config anything that would
modify an index in any way makes sense. I will give this a try very soon.

Thanks again.
Andy


On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson wrote:

> Gotta admit it's a bit puzzling, and surely you want to move to the 3x
> versions ..
>
> But at a guess, things might be getting confused on the slaves given
> you have a merge policy on them. There's no reason to have any
> policies on the slaves; slaves should just be about copying the files
> from the master; all the policies, commits, and optimizes should be done on
> the master. About all the slave does is copy the current state of the index
> from the master.
>
> So I'd try removing everything but the replication from the slaves,
> including
> any autocommit stuff and just let replication do it's thing.
>
> And I'd replicate after the optimize if you keep the optimize going. You
> should
> end up with one segment in the index after that, on both the master and
> slave.
> You can't get any more merged than that.
>
> Of course you'll also copy the _entire_ index every time after you've
> optimized...
>
> Best
> Erick
>
> On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff 
> wrote:
> > Hi,
> >
> > I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a
> > number of solr instances running on it (150 or so), and nightly most of
> > them have documents written to them. The script that does these writes
> > (adds) does a commit and an optimize on the indexes when it's entirely
> > finished updating them, then initiates replication on the slave per
> > instance. In this configuration, the index versions between master and
> > slave remain in synch.
> >
> > The optimize portion, which, again, happens nightly, is taking a lot of
> > time and I think it's unnecessary. I was hoping to stop doing this
> explicit
> > optimize, and to let my merge policy handle that. However, if I don't do
> an
> > optimize, and only do a commit before initiating slave replication, some
> > hours later the slave is, for reasons that are unclear to me,
> incrementing
> > its index version to 1 higher than the master.
> >
> > I am not really sure I understand the logs, but it looks like the
> > incremented index version is the result of an optimize on the slave, but
> I
> > am never issuing any commands against the slave aside from initiating
> > replication, and I don't think there's anything in my solr configuration
> > that would be initiating this. I do have autoCommit on with maxDocs of
> > 1000, but since I am initiating slave replication after doing a commit on
> > the master, I don't think there would ever be any uncommitted documents
> on
> > the slave. I do have a merge policy configured, but it's not clear to me
> > that it has anything to do with this. And if it did, I'd expect to see
> > similar behavior on the master (right?).
> >
> > I have included a snipped from my slave logs that shows this issue. In
> this
> > snipped index version 1286065171264 is what the master has,
> > and 1286065171265 is what the slave increments itself to, which is then
> out
> > of synch with the master in terms of version numbers. Nothing that I know
> > of is issuing any commands to the slave at this time. If I understand
> these
> > logs (I might not), it looks like something issued an optimize that took
> > 1023720ms? Any ideas?
> >
> > Thanks in advance.
> >
> > Andy
> >
> >
> >
> > Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
> > FINE: Closing Writer DirectUpdateHandler2
> > Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
> > INFO: SolrDeletionPolicy.onCommit: commits:num=2
> >
> >
> commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm,
> > _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4
> > .tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt,
> > _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm,
> > _h5.prx, _h6.prx, _h6.fdt, _h6
> > .fdx]
> >
> >
> commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis,
> > _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, segment
> > s_h9, _h7.nrm, _h7.tii, _h7.frq]
> > Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy
> > updateCommits
> > INFO: newest commit = 1286065171265
> > Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher <init>
> > INFO: Opening Searcher@4ac62082 main
> > Jul 12, 2012 12:21:14 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> > INFO: end_commit_flush
> > Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher warm
> > INFO: autowarming Searcher@4ac62082 main from Searcher@48d901f7 mai

Re: Lost answers?

2012-07-15 Thread Bruno Mannina

I forgot:

I do the request on the uniqueKey field, so each request gets one document

On 15/07/2012 14:11, Bruno Mannina wrote:

Dear Solr Users,

I have a Solr 3.6 + Tomcat setup and a program that sends 4 HTTP
requests at the same time.

I must do 1902 requests.

I do several tests but each time it loses some requests:
- sometimes I get 1856 docs, 1895 docs, 1900 docs but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev environment, I'm the only one testing it.

Is it a problem for Tomcat 6 to handle 4 simultaneous requests?

thanks for your info,

Bruno







RE: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Nick Koton
>> Solrj multi-threaded client sends several 1,000 docs/sec

> Can you expand on that?  How many threads at once are sending docs to solr?
> Is each request a single doc or multiple?
I realize, after the fact, that my solrj client is much like
org.apache.solr.client.solrj.LargeVolumeTestBase.  The number of threads is
configurable at run time, as are the various commit parameters.  Most of the
tests have been in the 4-16 threads range.  Most of my testing has been with
the single-document SolrServer::add(SolrInputDocument doc) method.  When I
realized what LargeVolumeTestBase is doing, I converted my program to use
the SolrServer::add(Collection<SolrInputDocument> docs) method with 100
documents in each add batch.  Unfortunately, the out of memory errors still
occur without client side commits.

If you agree my three approaches to committing are logical, would it be
useful for me to try to reproduce this with "example" schema in a small
cloud configuration using LargeVolumeTestBase or the like?  It will take me
a couple days to work it in.  Or perhaps this sort of test is already run?

Best 
Nick

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 11:05 AM
To: Nick Koton
Cc: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton  wrote:
>> Do you have the following hard autoCommit in your config (as the stock
>> server does)?
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>
> I have tried with and without that setting.  When I described running 
> with auto commit, that setting is what I mean.

OK cool.  You should be able to run the stock server (i.e. with this
autocommit) and blast in updates all day long - it looks like you have more
than enough memory.  If you can't, we need to fix something.  You shouldn't
need explicit commits unless you want the docs to be searchable at that
point.

> Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to solr?
Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com
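
The change Nick describes, from one add() call per document to add(Collection<SolrInputDocument>) with 100 documents per request, is plain client-side batching. A language-neutral sketch of the chunking logic (Python here for brevity):

```python
def batches(docs, size=100):
    """Yield documents in fixed-size batches: the client-side logic behind
    switching from one add(doc) call per document to one
    add(collection-of-100) call per request."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

print([len(b) for b in batches(range(250), size=100)])  # [100, 100, 50]
```

Batching amortizes per-request overhead, but as the thread shows it does not by itself bound the server's uncommitted-document memory.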



JRockit with SOLR3.4/3.5

2012-07-15 Thread Salman Akram
We used JRockit with SOLR 1.4 because the default JVM had memory issues (not only 
was it consuming more memory, it also didn't stay within the max memory allocated 
to Tomcat; JRockit did restrict itself to the max). However, JRockit gives an 
error while using it with SOLR 3.4/3.5. Any ideas why?

*** This Message Has Been Sent Using BlackBerry Internet Service from Mobilink 
***


Re: Query results vs. facets results

2012-07-15 Thread tudor
Hi Erick,

Thanks for the reply.

The query:
 
http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

<str name="rawquerystring">CITY:MILTON</str>
  <str name="querystring">CITY:MILTON</str>
  <str name="parsedquery">CITY:MILTON</str>
  <str name="parsedquery_toString">CITY:MILTON</str>
  <str name="QParser">LuceneQParser</str>

There is no information about grouping in the explain section.

Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:


  <str name="rawquerystring">*</str>
  <str name="querystring">*</str>
  <str name="parsedquery">ID:*</str>
  <str name="parsedquery_toString">ID:*</str>
  <str name="QParser">LuceneQParser</str>

To be honest, these do not tell me too much. I would like to see some
information about the grouping, since I believe this is where I am missing
something.

In the mean time, I have combined the two queries above, hoping to make some
sense out of the results. The following query filters all the entries with
the city name "MILTON" and groups together the ones with the same ID. Also,
the query facets the entries on city, grouping the ones with the same ID. So
the results numbers refer to the number of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:


<lst name="grouped">
  <lst name="ID">
    <int name="matches">284</int>
    <int name="ngroups">134</int>

(i.e.: fq says: 134 groups with CITY:MILTON)
...

<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="CITY">
      ...
      <int name="MILTON">103</int>

(i.e.: faceted search says: 103 groups with CITY:MILTON)

I really believe that these different results have something to do with the
grouping that Solr makes, but I do not know how to dig into this.

Thank you and best regards,
Tudor

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995150.html
Sent from the Solr - User mailing list archive at Nabble.com.
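For reference, here is a toy illustration (plain Python, not Solr) of one mechanism that can produce this kind of gap: with group.truncate=true, facet counts are computed from only the head document of each group, so a group whose head document is not from MILTON counts toward ngroups under the fq but not toward the MILTON facet bucket. Whether that is the cause in this particular index is an assumption.

```python
# Toy model: documents share an ID (the grouping field) and have a CITY field.
docs = [
    {"id": "A", "city": "MILTON"},   # head doc of group A
    {"id": "A", "city": "BOSTON"},
    {"id": "B", "city": "BOSTON"},   # head doc of group B is not MILTON...
    {"id": "B", "city": "MILTON"},   # ...but group B still matches fq=CITY:MILTON
    {"id": "C", "city": "MILTON"},
]

# fq=CITY:MILTON with group.field=ID: ngroups = distinct IDs among matching docs
ngroups_fq = len({d["id"] for d in docs if d["city"] == "MILTON"})

# facet.field=CITY with group.truncate=true: counts use only each group's head doc
heads = {}
for d in docs:
    heads.setdefault(d["id"], d)
facet_milton = sum(1 for d in heads.values() if d["city"] == "MILTON")

print(ngroups_fq, facet_milton)  # 3 2 -- the facet count is the smaller number
```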


Re: JRockit with SOLR3.4/3.5

2012-07-15 Thread Michael Della Bitta
Hello, Salman,

It would probably be helpful if you included the text/stack trace of
the error you're encountering, plus any other pertinent system
information you can think of.

One thing to remember is the memory usage you tune with Xmx is only
the maximum size of the heap, and there are other types of memory
usage by the JVM that don't fall under that (Permgen space, memory
mapped files, etc).

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 3:19 PM, Salman Akram
 wrote:
> We used JRockit with SOLR1.4 because the default JVM had memory issues (not 
> only did it consume more memory, it also didn't stay within the maximum memory 
> allocated to Tomcat, whereas JRockit did). However, JRockit gives an error when 
> used with SOLR3.4/3.5. Any ideas why?
>
> *** This Message Has Been Sent Using BlackBerry Internet Service from 
> Mobilink ***


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Michael Della Bitta
"unable to create new native thread"

That suggests you're running out of threads, not RAM. Possibly you're
using a multithreaded collector, and it's pushing you over the top of
how many threads your OS lets a single process allocate? Or somehow
the thread stack size is set too high?

More here: 
http://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-support
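As a rough back-of-envelope model of the stack-size effect (the numbers below are assumptions; OS-level process/thread limits apply on top of this):

```python
# Approximate thread ceiling: address space available for stacks / per-thread stack
free_for_stacks = 2 * 1024**3   # assume ~2 GB left outside heap, permgen, etc.
stack_size = 512 * 1024         # e.g. -Xss512k

max_threads = free_for_stacks // stack_size
print(max_threads)  # 4096 -- halving -Xss roughly doubles this ceiling
```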

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:45 PM, Nick Koton  wrote:
>>> Solrj multi-threaded client sends several 1,000 docs/sec
>
>>Can you expand on that?  How many threads at once are sending docs to solr?
> Is each request a single doc or multiple?
> I realize, after the fact, that my solrj client is much like
> org.apache.solr.client.solrj.LargeVolumeTestBase.  The number of threads is
> configurable at run time as are the various commit parameters.  Most of the
> test have been in the 4-16 threads range.  Most of my testing has been with
> the single document SolrServer::add(SolrInputDocument doc )method.  When I
> realized what LargeVolumeTestBase is doing, I converted my program to use
> the SolrServer::add(Collection docs) method with 100
> documents in each add batch.  Unfortunately, the out of memory errors still
> occur without client side commits.
>
> If you agree my three approaches to committing are logical, would it be
> useful for me to try to reproduce this with "example" schema in a small
> cloud configuration using LargeVolumeTestBase or the like?  It will take me
> a couple days to work it in.  Or perhaps this sort of test is already run?
>
> Best
> Nick
>
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Sunday, July 15, 2012 11:05 AM
> To: Nick Koton
> Cc: solr-user@lucene.apache.org
> Subject: Re: SOLR 4 Alpha Out Of Mem Err
>
> On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton  wrote:
>>> Do you have the following hard autoCommit in your config (as the
>>> stock
>> server does)?
>>> <autoCommit>
>>>   <maxTime>15000</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>
>> I have tried with and without that setting.  When I described running
>> with auto commit, that setting is what I mean.
>
> OK cool.  You should be able to run the stock server (i.e. with this
> autocommit) and blast in updates all day long - it looks like you have more
> than enough memory.  If you can't, we need to fix something.  You shouldn't
> need explicit commits unless you want the docs to be searchable at that
> point.
>
>> Solrj multi-threaded client sends several 1,000 docs/sec
>
> Can you expand on that?  How many threads at once are sending docs to solr?
> Is each request a single doc or multiple?
>
> -Yonik
> http://lucidimagination.com
>
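The client pattern being discussed -- several threads, each calling add() with batches of around 100 documents -- can be sketched like this (plain Python with a stub standing in for SolrServer.add(Collection<SolrInputDocument>); the thread count and batch size are just the figures mentioned in the thread):

```python
import queue
import threading

BATCH_SIZE = 100
NUM_THREADS = 8          # the thread counts discussed were in the 4-16 range
added = []
lock = threading.Lock()

def add_batch(docs):     # stand-in for SolrServer.add(Collection<SolrInputDocument>)
    with lock:
        added.extend(docs)

def worker(q):
    batch = []
    while True:
        doc = q.get()
        if doc is None:  # sentinel: shut this worker down
            break
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            add_batch(batch)
            batch = []
    if batch:            # flush the final partial batch
        add_batch(batch)

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for i in range(1000):
    q.put({"id": i})
for _ in threads:
    q.put(None)
for t in threads:
    t.join()
print(len(added))        # 1000
```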


Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Roman Chyla
I am using AbstractSolrTestCase (which in turn uses
solr.util.TestHarness) as a basis for unittests, but the solr
installation is outside of my source tree and I don't want to
duplicate it just to change a few lines (and with the new solr4.0 I
hope I can get the test-framework in a jar file, previously that
wasn't possible). So in essence, I have to deal with the expected
folder structure for all my unittests.

The way I make the configuration visible outside the solr standard
paths is to get the classloader and add folders to it; this way I can test
extensions for solr without duplicating the configuration. But I
should mimic the folder structure to stay compatible.

Thanks all for your help, it is much appreciated.

roman

On Sun, Jul 15, 2012 at 1:46 PM, Mark Miller  wrote:
> The beta will have files that were in solr/conf and solr/data in 
> solr/collection1/conf|data instead.
>
> What Solr test cases are you referring to? The only ones that should care 
> about this would have to be looking at the file system. If that is the case, 
> simply update the path. The built in tests had to be adjusted for this as 
> well.
>
> The problem with having the default core use /solr as a conf dir is that if 
> you create another core, where does it logically go? The default collection 
> is called collection1, so now its conf and data live in a folder called 
> collection1. A new SolrCore called newsarticles would have its conf and data 
> in /solr/newsarticles.
>
> There are still going to be some bumps as you move from alpha to beta to 
> release if you are depending on very specific file system locations - 
> however, they should be small bumps that are easily handled.
>
> Just send an email to the user list if you'd like some help with anything in 
> particular.
>
> In this case, I'd update what you have to look at /solr/collection1 rather 
> than simply /solr. It's still the default core, so simple URLs without the 
> core name will still work. It won't affect HTTP communication. Just file 
> system location.
>
> On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:
>
>> Hi,
>>
>> Is it intentional that the ALPHA release has a different folder structure
>> as opposed to the trunk?
>>
>> eg. collection1 folder is missing in the ALPHA, but present in branch_4x
>> and trunk
>>
>> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
>> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>>
>>
>> This has consequences for development - e.g. solr testcases do not expect
>> that the collection1 is there for ALPHA.
>>
>> In general, what is your advice for developers who are upgrading from solr
>> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
>> porting to the next BETA and stable releases?
>>
>> Thanks!
>>
>>  roman
>
> - Mark Miller
> lucidimagination.com


are stopwords indexed?

2012-07-15 Thread Giovanni Gherdovich
Hi all,

are stopwords from the stopwords.txt config file
supposed to be indexed?

I would say no, but this is the situation I am
observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType "text" from the example schema.xml,
  i.e. I have

-- -- >8 -- -- >8 -- -- >8 -- -- >8
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    [...]
  </analyzer>
  <analyzer type="query">
    [...]
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
-- -- >8 -- -- >8 -- -- >8 -- -- >8

* searching for a stopword through solr always gives zero results
* inspecting the index with LuCLI
http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
  show that all stopwords are in my index. Note that I query
  LuCLI specifying the field, i.e. with "myFieldName:and"
  and not just with the stopword "and".

Is this normal?

Are stopwords indexed?

Cheers,
Giovanni
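One configuration that would produce exactly these symptoms -- stopwords visible in the index via LuCLI, yet zero hits when searching for them -- is an analysis chain whose StopFilter runs only at query time. A sketch in plain Python; whether this matches the actual schema is an assumption worth checking:

```python
stopwords = {"and", "the", "of"}

def index_analyzer(text):   # suppose the index-time chain lacks a StopFilter
    return text.lower().split()

def query_analyzer(text):   # ...while the query-time chain removes stopwords
    return [w for w in text.lower().split() if w not in stopwords]

indexed_terms = set(index_analyzer("war and peace"))
query_terms = query_analyzer("and")
print("and" in indexed_terms)   # True -> stopword visible in the index (as in LuCLI)
print(query_terms)              # []   -> the query reduces to nothing: zero results
```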


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-15 Thread David Smiley (@MITRE.org)
Sam,

These are big numbers you are throwing around, especially the query volume. 
How big are these records that you have 4 billion of -- or put another way,
how much space would it take up in a pure form like in CSV?  And should I
assume the searches you are doing are more than geospatial?  In any case, a
Solr solution here is going to involve many machines.  The biggest number
you propose is 10k queries per second which is hard to imagine.

I've seen some say Solr 4 might have 100M records per shard, although there
is a good deal of variability -- as usual, YMMV.  But let's go with that for
this paper-napkin calculation.  You would need 40 shards of 100M documents
each to get to 4000M (4B) documents.  That is a lot of shards, but people
have done it, I believe.  This scales out to your document collection but
not up to your query volume which is extremely high.  I have some old
benchmarks suggesting ~10ms geo queries on spatial queries for SOLR-2155
which was rolled into the spatial code in Lucene 4 (Solr adapters are on the
way).  But for a full query overhead and for a safer estimate, lets say
50ms.  So perhaps you might get 20 concurrent queries per second (which
seems high but we'll go with it).  But you require 10k/sec(!) so this means
you need 500 times the 20qps which means 500 *times* the base hardware to
support the 40 shards I mentioned before.  In other words, the 4B documents
need to be replicated 500 times to support 10k/second queries.  So
theoretically, we're talking 500 clusters, each cluster having 40 shards --
at ~4 shards/machine this is 10 machines per cluster: 5,000 machines in
total.  Wow.  Doesn't seem realistic.  If you have a reference to some
system or person's experience with any system that can, Solr or not, then
please share.

If you or anyone were to attempt to see if Solr scales for their needs, a
good approach is to consider just one shard non-replicated, or even better a
handful that would all exist on one machine.  Optimize it as much as you
can.  Then see how much data you can put on this machine and with what
query-volume.  From this point, it's basic math to see how many more such
machines are required to scale out to your data size and up to your query
volume.

Care to explain why so much data needs to be searched at such a volume? 
Maybe you work for Google ;-)

To your question on scalability vs PostGIS, I think Solr shines in its
ability to scale out if you have the resources to do it.

~ David Smiley

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995197.html
Sent from the Solr - User mailing list archive at Nabble.com.
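Spelling out the paper-napkin arithmetic above (same assumptions: 100M docs per shard, ~50 ms per query, 4 shards per machine):

```python
total_docs = 4_000_000_000
docs_per_shard = 100_000_000
query_latency_s = 0.050              # assumed 50 ms full-query overhead
target_qps = 10_000
shards_per_machine = 4

shards = total_docs // docs_per_shard                # 40 shards for 4B docs
qps_per_cluster = int(1 / query_latency_s)           # ~20 qps per full replica
clusters = target_qps // qps_per_cluster             # 500 replicas of the index
machines_per_cluster = shards // shards_per_machine  # 10 machines per replica
print(shards, clusters, machines_per_cluster * clusters)  # 40 500 5000
```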


Wildcard query vs facet.prefix for autocomplete?

2012-07-15 Thread santamaria2
I'm about to implement an autocomplete mechanism for my search box. I've read
about some of the common approaches, but I have a question about wildcard
query vs facet.prefix.

Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to
appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that
it won't appear if I type 'hadows'. 

While indexing, I'd use a whitespace tokenizer and a lowercase filter to
store that title in the index.
Now I'm thinking two approaches for 'dam' typed in the search box:

1) q=title:dam*

2) q=*:*&facet=on&facet.field=title&facet.prefix=dam


So, any reason I should favour one over the other? Is speed a factor? The
index has around 200,000 items.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html
Sent from the Solr - User mailing list archive at Nabble.com.
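A toy comparison of the two approaches (plain Python mimicking a whitespace-tokenized, lowercased field; the titles are illustrative). The key structural difference: the wildcard query returns matching documents, while facet.prefix returns matching index terms:

```python
titles = ["Shadows of the Damned", "Dam Busters"]
# index-time analysis: whitespace tokenizer + lowercase filter
index = {t: t.lower().split() for t in titles}

def wildcard(prefix):      # approach 1: q=title:dam* -> documents with a matching token
    return [t for t, toks in index.items() if any(w.startswith(prefix) for w in toks)]

def facet_prefix(prefix):  # approach 2: facet.prefix=dam -> terms from the field
    terms = sorted({w for toks in index.values() for w in toks})
    return [w for w in terms if w.startswith(prefix)]

print(wildcard("dam"))      # ['Shadows of the Damned', 'Dam Busters']
print(facet_prefix("dam"))  # ['dam', 'damned']
```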


Re: DIH - incorrect datasource being picked up by XPathEntityProcessor

2012-07-15 Thread girishyes

Thanks Gora, I tried that but it didn't help.

Regards.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995211.html
Sent from the Solr - User mailing list archive at Nabble.com.