Re: Boosting Basic

2014-04-04 Thread Alvaro Cabrerizo
Hi,

If I were you, I would start by reading the edismax documentation.
Apart from the wiki, every distribution ships a full example of the edismax
query parser configuration (check the requestHandler node with name="/browse"
in the following file:
$YOUR_SOLR_DISTRIBUTION_DIRECTORY/solr/example/solr/collection1/conf/solrconfig.xml).
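For illustration, here is a minimal SolrJ sketch of the kind of edismax query this enables (the field names Name, Description and ProductType come from the question below; the boost values and URL are made-up examples):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class BoostExample {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

          SolrQuery q = new SolrQuery("user search terms");
          q.set("defType", "edismax");                     // use the edismax query parser
          // Search all three fields, but weight matches in Name much higher
          // so those documents come first.
          q.set("qf", "Name^10 Description ProductType");
          QueryResponse rsp = solr.query(q);
          System.out.println(rsp.getResults().getNumFound() + " hits");
      }
  }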

Regards.


On Thu, Apr 3, 2014 at 6:55 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)  wrote:

> Hello,
>
> I am trying to implement boosting but I am not able to find a good
> example. Some places say to add ^10 to boost the score, and other places
> say to use bf. I have a query with the condition (Name OR Description OR
> ProductType), but I would like to show the Name matches first, so I need to
> boost that condition.
>
> Thanks
>
> Ravi
>


Re: Solr join and lucene scoring

2014-04-04 Thread Alvaro Cabrerizo
Hi,

The defect you are referencing is closed with a resolution of *Invalid*, so
it seems the scoring is working fine with the join. I've run the following two
tests on my own data and it seems to be working:

*TestA*

   - fl=id,score
   - q=notebook
   - fq={!join from=product_list to=id fromIndex=product}id:*
   - rows=2

Gives me the following result with the scores calculated:

id: 4ADCBA5F-B532-4154-8E12-47311DC0FD50, score: 2.6598556
id: C861CC4A-6481-4754-946F-EA3903371C80, score: 2.6598551



*TestB*

   - fl=id,score
   - q=notebook AND _query_:{!join from=product_list to=id
   fromIndex=product}id:*
   - rows=2

Gives me the following result with the scores calculated:


id: 5C449525-8A69-409B-829C-671E147BF6BB, score: 0.1573925
id: D1A719E8-F843-4E8D-AD82-64AA88D78BBB, score: 0.1571764
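
For completeness, a rough SolrJ equivalent of TestA (the URL and core name are placeholders; the join parameters are the ones listed above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class JoinScoreTest {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

          SolrQuery q = new SolrQuery("notebook");
          q.setFields("id", "score");                      // fl=id,score
          q.addFilterQuery("{!join from=product_list to=id fromIndex=product}id:*");
          q.setRows(2);

          QueryResponse rsp = solr.query(q);
          for (SolrDocument doc : rsp.getResults()) {
              // score is returned because it was requested in fl
              System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("score"));
          }
      }
  }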


 Regards.


On Thu, Apr 3, 2014 at 11:42 AM,  wrote:

> Hello,
>
> referencing to this issue:
> https://issues.apache.org/jira/browse/SOLR-4307
>
> Is it still not possible with the solr query time join to use scoring?
> Do I still have to write my own plugin or is there a plugin somewhere I
> could use?
>
> I never wrote a plugin for solr before, so I would prefer if I don't have
> to start from scratch.
>
> THX,
> Moritz
>
>


How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi All,

Hi All, I am new to Solr, and I don't know how to increase the search speed
of SolrCloud. I have indexed nearly 4 GB of data. When I search for a
document using Java with SolrJ, Solr takes more than 6 seconds to return a
query result. Can anyone please help me reduce the search query time to less
than 500 ms? I have allocated 4 GB of RAM for Solr. Please let me know if you
need further details about my SolrCloud config.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
Show a sample query string that does that (takes 6 seconds to return),
including any defaults you may have put in solrconfig.xml (if any).
That might give us a hint about which features you are using and what
possible direction you could go in next. For bonus points, enable the
debug flag and set the rows=1 parameter to see how big your documents
themselves are.

You may have issues with a particular non-cloud-friendly feature, with
caches, with not reusing parts of your queries as 'fq', returning too
many fields or a bunch of other things.
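
As a concrete sketch of the debug suggestion above (the URL, core name and query string are placeholders), the two parameters can be set like this from SolrJ:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class DebugOneDoc {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

          SolrQuery q = new SolrQuery("your slow query here");
          q.setRows(1);                      // fetch a single document to see how big one is
          q.set("debugQuery", "true");       // ask Solr for parsing and timing details
          QueryResponse rsp = solr.query(q);

          System.out.println("QTime: " + rsp.getQTime() + " ms");
          System.out.println(rsp.getDebugMap());   // parsed query plus per-component timings
      }
  }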

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 2:31 PM, Sathya  wrote:
> Hi All,
>
> Hi All, I am new to Solr. And i dont know how to increase the search speed
> of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
> document using java with solrj, solr takes more 6 seconds to return a query
> result. Any one please help me to reduce the search query time to less than
> 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
> further details about solrcloud config.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi Alex,

id: 33026985, subject: Component Audio\:A Shopping List, date: 2012-01-11 09:02:42.96

This is an example of what I have indexed in Solr. I have only 3 fields in the
index: I am just indexing the id, subject and date of the news articles, nearly
5 crore (50 million) documents. I have also attached my solrconfig and solr.xml
files. If you need more information, please let me know.

On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
 wrote:
>
> Show a sample query string that does that (takes 6 seconds to return).
> Including all defaults you may have put in solrconfig.xml (if any).
> That might give us a hint which features you are using and what
> possible direction you could go in next. For the bonus points, enable
> debug flag and rows=1 parameter to see how big your documents
> themselves are.
>
> You may have issues with a particular non-cloud-friendly feature, with
> caches, with not reusing parts of your queries as 'fq', returning too
> many fields or a bunch of other things.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:
>
> > Hi All,
> >
> > Hi All, I am new to Solr. And i dont know how to increase the search speed
> > of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
> > document using java with solrj, solr takes more 6 seconds to return a query
> > result. Any one please help me to reduce the search query time to less than
> > 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
> > further details about solrcloud config.
> >
> >
> >
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
> 
> If you reply to this email, your message will be added to the discussion 
> below:
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
> To unsubscribe from How to reduce the search speed of solrcloud, click here.
> NAML


solrconfig.xml (101K) 

solr.xml (1K) 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
What does your Solr query look like (check the Solr backend log if
you don't know)?

And how many documents is that? 50 million? That does not sound like much
for 3 fields. And what are the field definitions (schema.xml rather than
solr.xml)?

And what happens if you issue the query directly to Solr rather than
through the client? Is the speed much different?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:12 PM, Sathya  wrote:
> Hi Alex,
>
> 33026985 Component Audio\:A
> Shopping List 2012-01-11
> 09:02:42.96
>
> This is what i am  indexed in solr. I have only 3 fields in index. And
> i am just indexing id, subject and date of the news articles. Nearly 5
> crore documents. Also i have attached my solrconfig and solr.xml file.
> If u need more information, pls let me know.
>
> On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
>  wrote:
>>
>> Show a sample query string that does that (takes 6 seconds to return).
>> Including all defaults you may have put in solrconfig.xml (if any).
>> That might give us a hint which features you are using and what
>> possible direction you could go in next. For the bonus points, enable
>> debug flag and rows=1 parameter to see how big your documents
>> themselves are.
>>
>> You may have issues with a particular non-cloud-friendly feature, with
>> caches, with not reusing parts of your queries as 'fq', returning too
>> many fields or a bunch of other things.
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr 
>> proficiency
>>
>>
>> On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:
>>
>> > Hi All,
>> >
>> > Hi All, I am new to Solr. And i dont know how to increase the search speed
>> > of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
>> > document using java with solrj, solr takes more 6 seconds to return a query
>> > result. Any one please help me to reduce the search query time to less than
>> > 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
>> > further details about solrcloud config.
>> >
>> >
>> >
>> > --
>> > View this message in context: 
>> > http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>> 
>> If you reply to this email, your message will be added to the discussion 
>> below:
>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
>> To unsubscribe from How to reduce the search speed of solrcloud, click here.
>> NAML
>
>
> solrconfig.xml (101K) 
> 
> solr.xml (1K) 
> 
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi,

I have attached my schema.xml file too.

And you are right, I have 50 million documents. When I use the Solr admin UI
in the browser to search for a document, it returns within 1000 to 2000 ms.
My query looks like this:
http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subject&indent=true

On 4/4/14, Alexandre Rafalovitch [via Lucene]
 wrote:
>
>
> What does your Solr query looks like (check the Solr backend log if
> you don't know)?
>
> And how many document is that? 50 million? Does not sound like much
> for 3 fields. And what's the definitions (schema.xml rather than
> solr.xml).
>
> And what happens if you issue the query directly to Solr rather than
> through the client? Is the speed much different?
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Fri, Apr 4, 2014 at 3:12 PM, Sathya  wrote:
>> Hi Alex,
>>
>> 33026985 Component Audio\:A
>> Shopping List 2012-01-11
>> 09:02:42.96
>>
>> This is what i am  indexed in solr. I have only 3 fields in index. And
>> i am just indexing id, subject and date of the news articles. Nearly 5
>> crore documents. Also i have attached my solrconfig and solr.xml file.
>> If u need more information, pls let me know.
>>
>> On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
>>  wrote:
>>>
>>> Show a sample query string that does that (takes 6 seconds to return).
>>> Including all defaults you may have put in solrconfig.xml (if any).
>>> That might give us a hint which features you are using and what
>>> possible direction you could go in next. For the bonus points, enable
>>> debug flag and rows=1 parameter to see how big your documents
>>> themselves are.
>>>
>>> You may have issues with a particular non-cloud-friendly feature, with
>>> caches, with not reusing parts of your queries as 'fq', returning too
>>> many fields or a bunch of other things.
>>>
>>> Regards,
>>>Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>>> proficiency
>>>
>>>
>>> On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:
>>>
>>> > Hi All,
>>> >
>>> > Hi All, I am new to Solr. And i dont know how to increase the search
>>> > speed
>>> > of solrcloud. I have indexed nearly 4 GB of data. When i am searching
>>> > a
>>> > document using java with solrj, solr takes more 6 seconds to return a
>>> > query
>>> > result. Any one please help me to reduce the search query time to less
>>> > than
>>> > 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
>>> > further details about solrcloud config.
>>> >
>>> >
>>> >
>>> > --
>>> > View this message in context:
>>> > http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
>>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>> 
>>> If you reply to this email, your message will be added to the discussion
>>> below:
>>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
>>> To unsubscribe from How to reduce the search speed of solrcloud, click
>>> here.
>>> NAML
>>
>>
>> solrconfig.xml (101K)
>> 
>> solr.xml (1K)
>> 
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
> ___
> If you reply to this email, your message will be added to the discussion
> below:
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html
>
> To unsubscribe from How to reduce the search speed of solrcloud, visit
> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4129067&code=c2F0aGlhLmJsYWNrc3RhckBnbWFpbC5jb218NDEyOTA2N3wtMjEyNDcwMTI5OA==


schema.xml (81K) 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129075.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
Well, if the direct browser query is 1000 ms and your client query is
6 seconds, then it is not Solr itself you need to worry about first.
Something must be wrong at the client. Try timing that bit. Maybe it is
the writing from the client to your ultimate consumer that is the
problem.
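
A rough way to time that bit (a sketch, reusing the collection URL from the earlier mail; the interesting part is the gap between Solr's own QTime and the total round trip seen by the client):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ClientTiming {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer(
                  "http://10.10.1.14:5050/solr/set_recent_shard1_replica5");
          SolrQuery q = new SolrQuery("subject");

          long start = System.nanoTime();
          QueryResponse rsp = solr.query(q);
          long wallMs = (System.nanoTime() - start) / 1000000;

          // QTime is the time Solr itself spent; the rest is network, (de)serialization
          // and whatever else the client does with the response.
          System.out.println("Solr QTime: " + rsp.getQTime() + " ms, round trip: " + wallMs + " ms");
      }
  }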

Regards,
   Alex.
P.s. You should probably trim your schema to get rid of all the
example fields. Keep _version_ and _root_ but delete all the rest you
don't actually use. Same with dynamic fields and all fieldType
definitions you do not actually use. You can always reintroduce them
later from the example schemas if something is missing.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:41 PM, Sathya  wrote:
> Hi,
>
> I have attached my schema.xml file too.
>
> And you are right. I have 50 million documents. When i use solr
> browser to search a document, it will return within 1000 to 2000 ms.
>
> My query looks like this:
> http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subject&indent=true
>
> On 4/4/14, Alexandre Rafalovitch [via Lucene]
>  wrote:
>>
>>
>> What does your Solr query looks like (check the Solr backend log if
>> you don't know)?
>>
>> And how many document is that? 50 million? Does not sound like much
>> for 3 fields. And what's the definitions (schema.xml rather than
>> solr.xml).
>>
>> And what happens if you issue the query directly to Solr rather than
>> through the client? Is the speed much different?
>>
>> Regards,
>>Alex.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Fri, Apr 4, 2014 at 3:12 PM, Sathya  wrote:
>>> Hi Alex,
>>>
>>> 33026985 Component Audio\:A
>>> Shopping List 2012-01-11
>>> 09:02:42.96
>>>
>>> This is what i am  indexed in solr. I have only 3 fields in index. And
>>> i am just indexing id, subject and date of the news articles. Nearly 5
>>> crore documents. Also i have attached my solrconfig and solr.xml file.
>>> If u need more information, pls let me know.
>>>
>>> On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
>>>  wrote:

 Show a sample query string that does that (takes 6 seconds to return).
 Including all defaults you may have put in solrconfig.xml (if any).
 That might give us a hint which features you are using and what
 possible direction you could go in next. For the bonus points, enable
 debug flag and rows=1 parameter to see how big your documents
 themselves are.

 You may have issues with a particular non-cloud-friendly feature, with
 caches, with not reusing parts of your queries as 'fq', returning too
 many fields or a bunch of other things.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:

 > Hi All,
 >
 > Hi All, I am new to Solr. And i dont know how to increase the search
 > speed
 > of solrcloud. I have indexed nearly 4 GB of data. When i am searching
 > a
 > document using java with solrj, solr takes more 6 seconds to return a
 > query
 > result. Any one please help me to reduce the search query time to less
 > than
 > 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
 > further details about solrcloud config.
 >
 >
 >
 > --
 > View this message in context:
 > http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
 > Sent from the Solr - User mailing list archive at Nabble.com.


 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
 To unsubscribe from How to reduce the search speed of solrcloud, click
 here.
 NAML
>>>
>>>
>>> solrconfig.xml (101K)
>>> 
>>> solr.xml (1K)
>>> 
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>>
>> ___
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html
>>
>> To unsubscribe from How to reduce the search speed of solrcloud, visit
>> http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_

Query and field name with wildcard

2014-04-04 Thread Croci Francesco Luigi (ID SWS)
In my index I have some fields which share the same prefix (rmDocumentTitle,
rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not
possible to specify a query like this:

q = rm* : some_word

Is there a way to do this without having to write a long list of ORs?

Another question is if it is really not possible to search a word over the 
entire index. Something like this: q = * : some_word

Thank you
Francesco


Re: Query and field name with wildcard

2014-04-04 Thread Alexandre Rafalovitch
Are you using eDisMax? It gives a lot of options, including field
aliasing, which lets a single name map to multiple fields:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming
(with example on p77 of my book
http://www.packtpub.com/apache-solr-for-indexing-data/book :-)
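
As a rough sketch of that aliasing trick with the rm* fields from the question (the alias name rmAll is made up; any name works as long as it is not a real field):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FieldAliasExample {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

          SolrQuery q = new SolrQuery("rmAll:some_word");   // instead of q = rm*:some_word
          q.set("defType", "edismax");
          // edismax expands the alias into a disjunction over the real fields.
          q.set("f.rmAll.qf",
                "rmDocumentTitle rmDocumentClass rmDocumentSubclass rmDocumentArt");
          QueryResponse rsp = solr.query(q);
          System.out.println(rsp.getResults().getNumFound() + " hits");
      }
  }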

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 3:52 PM, Croci  Francesco Luigi (ID SWS)
 wrote:
> In my index I have some fields which have the same prefix(rmDocumentTitle, 
> rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
> possible to specify a query like this:
>
> q = rm* : some_word
>
> Is there a way to do this without having to write a long list of ORs?
>
> Another question is if it is really not possible to search a word over the 
> entire index. Something like this: q = * : some_word
>
> Thank you
> Francesco


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi,

Sorry, I can't follow you, Alex. Can you please explain (if you can)? I have
only just started with Solr.


On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] <
ml-node+s472066n4129077...@n3.nabble.com> wrote:

> Well, if the direct browser query is 1000ms and your client query is
> 6seconds, then it is not Solr itself you need to worry about first.
> Something must be wrong at the client. Trying timing that bit. Maybe
> it is writing from the client to your ultimate consumer that's the
> problem.
>
> Regards,
>Alex.
> P.s. You should probably trim your schema to get rid of all the
> example fields. Keep _version_ and _root_ but delete all the rest you
> don't actually use. Same with dynamic fields and all fieldType
> definitions you do not actually use. You can always reintroduce them
> later from the example schemas if something is missing.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Fri, Apr 4, 2014 at 3:41 PM, Sathya <[hidden 
> email]>
> wrote:
>
> > Hi,
> >
> > I have attached my schema.xml file too.
> >
> > And you are right. I have 50 million documents. When i use solr
> > browser to search a document, it will return within 1000 to 2000 ms.
> >
> > My query looks like this:
> >
> http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subject&indent=true
> >
> > On 4/4/14, Alexandre Rafalovitch [via Lucene]
> > <[hidden email] >
> wrote:
> >>
> >>
> >> What does your Solr query looks like (check the Solr backend log if
> >> you don't know)?
> >>
> >> And how many document is that? 50 million? Does not sound like much
> >> for 3 fields. And what's the definitions (schema.xml rather than
> >> solr.xml).
> >>
> >> And what happens if you issue the query directly to Solr rather than
> >> through the client? Is the speed much different?
> >>
> >> Regards,
> >>Alex.
> >>
> >> Personal website: http://www.outerthoughts.com/
> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
> >> proficiency
> >>
> >>
> >> On Fri, Apr 4, 2014 at 3:12 PM, Sathya <[hidden 
> >> email]>
> wrote:
> >>> Hi Alex,
> >>>
> >>> 33026985 Component Audio\:A
> >>> Shopping List 2012-01-11
> >>> 09:02:42.96
> >>>
> >>> This is what i am  indexed in solr. I have only 3 fields in index. And
> >>> i am just indexing id, subject and date of the news articles. Nearly 5
> >>> crore documents. Also i have attached my solrconfig and solr.xml file.
> >>> If u need more information, pls let me know.
> >>>
> >>> On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
> >>> <[hidden email] >
> wrote:
> 
>  Show a sample query string that does that (takes 6 seconds to
> return).
>  Including all defaults you may have put in solrconfig.xml (if any).
>  That might give us a hint which features you are using and what
>  possible direction you could go in next. For the bonus points, enable
>  debug flag and rows=1 parameter to see how big your documents
>  themselves are.
> 
>  You may have issues with a particular non-cloud-friendly feature,
> with
>  caches, with not reusing parts of your queries as 'fq', returning too
>  many fields or a bunch of other things.
> 
>  Regards,
> Alex.
>  Personal website: http://www.outerthoughts.com/
>  Current project: http://www.solr-start.com/ - Accelerating your Solr
>  proficiency
> 
> 
>  On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:
> 
>  > Hi All,
>  >
>  > Hi All, I am new to Solr. And i dont know how to increase the
> search
>  > speed
>  > of solrcloud. I have indexed nearly 4 GB of data. When i am
> searching
>  > a
>  > document using java with solrj, solr takes more 6 seconds to return
> a
>  > query
>  > result. Any one please help me to reduce the search query time to
> less
>  > than
>  > 500 ms. i have allocate the 4 GB ram for solr. Please let me know
> for
>  > further details about solrcloud config.
>  >
>  >
>  >
>  > --
>  > View this message in context:
>  >
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html
>  > Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
>  
>  If you reply to this email, your message will be added to the
> discussion
>  below:
> 
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html
>  To unsubscribe from How to reduce the search speed of solrcloud,
> click
>  here.
>  NAML
> >>>
> >>>
> >>> solrconfig.xml (101K)
> >>> <
> http://luc

Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
You said your request takes 6 seconds when going through the SolrJ
client, but it is 1 second (1000 ms) when going directly to Solr,
bypassing SolrJ. So the other 5 seconds must be added outside of
Solr. Concentrate on that.

Regarding your schema: you used the example schema, which has a lot of stuff
you do not need. Here is what a very small schema looks like:
https://github.com/arafalov/solr-indexing-book/blob/master/published/collection1/conf/schema.xml
, so you can compare. That's an example from my book. You may find the
book a fast way to get from your current state to early intermediate
(no cloud examples, though).

Contact me directly if you need a discount.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 4:11 PM, Sathya  wrote:
> Hi,
>
> Sorry, i cant get u alex. Can you please explain me(if you can).  Because
> now only i entered into solr.
>
>
> On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] <
> ml-node+s472066n4129077...@n3.nabble.com> wrote:
>
>> Well, if the direct browser query is 1000ms and your client query is
>> 6seconds, then it is not Solr itself you need to worry about first.
>> Something must be wrong at the client. Trying timing that bit. Maybe
>> it is writing from the client to your ultimate consumer that's the
>> problem.
>>
>> Regards,
>>Alex.
>> P.s. You should probably trim your schema to get rid of all the
>> example fields. Keep _version_ and _root_ but delete all the rest you
>> don't actually use. Same with dynamic fields and all fieldType
>> definitions you do not actually use. You can always reintroduce them
>> later from the example schemas if something is missing.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency
>>
>>
>> On Fri, Apr 4, 2014 at 3:41 PM, Sathya <[hidden 
>> email]>
>> wrote:
>>
>> > Hi,
>> >
>> > I have attached my schema.xml file too.
>> >
>> > And you are right. I have 50 million documents. When i use solr
>> > browser to search a document, it will return within 1000 to 2000 ms.
>> >
>> > My query looks like this:
>> >
>> http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subject&indent=true
>> >
>> > On 4/4/14, Alexandre Rafalovitch [via Lucene]
>> > <[hidden email] >
>> wrote:
>> >>
>> >>
>> >> What does your Solr query looks like (check the Solr backend log if
>> >> you don't know)?
>> >>
>> >> And how many document is that? 50 million? Does not sound like much
>> >> for 3 fields. And what's the definitions (schema.xml rather than
>> >> solr.xml).
>> >>
>> >> And what happens if you issue the query directly to Solr rather than
>> >> through the client? Is the speed much different?
>> >>
>> >> Regards,
>> >>Alex.
>> >>
>> >> Personal website: http://www.outerthoughts.com/
>> >> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> >> proficiency
>> >>
>> >>
>> >> On Fri, Apr 4, 2014 at 3:12 PM, Sathya <[hidden 
>> >> email]>
>> wrote:
>> >>> Hi Alex,
>> >>>
>> >>> 33026985 Component Audio\:A
>> >>> Shopping List 2012-01-11
>> >>> 09:02:42.96
>> >>>
>> >>> This is what i am  indexed in solr. I have only 3 fields in index. And
>> >>> i am just indexing id, subject and date of the news articles. Nearly 5
>> >>> crore documents. Also i have attached my solrconfig and solr.xml file.
>> >>> If u need more information, pls let me know.
>> >>>
>> >>> On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene]
>> >>> <[hidden email] >
>> wrote:
>> 
>>  Show a sample query string that does that (takes 6 seconds to
>> return).
>>  Including all defaults you may have put in solrconfig.xml (if any).
>>  That might give us a hint which features you are using and what
>>  possible direction you could go in next. For the bonus points, enable
>>  debug flag and rows=1 parameter to see how big your documents
>>  themselves are.
>> 
>>  You may have issues with a particular non-cloud-friendly feature,
>> with
>>  caches, with not reusing parts of your queries as 'fq', returning too
>>  many fields or a bunch of other things.
>> 
>>  Regards,
>> Alex.
>>  Personal website: http://www.outerthoughts.com/
>>  Current project: http://www.solr-start.com/ - Accelerating your Solr
>>  proficiency
>> 
>> 
>>  On Fri, Apr 4, 2014 at 2:31 PM, Sathya <[hidden email]> wrote:
>> 
>>  > Hi All,
>>  >
>>  > Hi All, I am new to Solr. And i dont know how to increase the
>> search
>>  > speed
>>  > of solrcloud. I have indexed nearly 4 GB of data. When i am
>> searching
>>  > a
>>  > docum

RE: tf and very short text fields

2014-04-04 Thread Markus Jelsma
Hi - In this case Walter, iirc, was looking for two things: no normalization
and a flat TF (1f for tf(float freq) > 0). We know that k1 controls TF
saturation, but in BM25Similarity you can see that k1 is multiplied by the
encoded norm value, taking b into account as well. So setting k1 to zero
effectively disables length normalization and results in a flat, or binary, TF.

Here's an example output for k1 = 0 and k1 = 0.2. Norms are enabled on the
field, and the term occurs three times in the field:

28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
  6.4 = boost
  4.406719 = idf(docFreq=1, docCount=122)
  1.0 = tfNorm, computed from:
1.5 = phraseFreq=1.5
0.0 = parameter k1
0.75 = parameter b
8.721312 = avgFieldLength
16.0 = fieldLength




27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
  6.4 = boost
  4.406719 = idf(docFreq=1, docCount=122)
  0.98619986 = tfNorm, computed from:
1.5 = phraseFreq=1.5
0.2 = parameter k1
0.75 = parameter b
8.721312 = avgFieldLength
16.0 = fieldLength
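
For reference, the tfNorm printed above follows the BM25 term-frequency formula (as in Lucene's BM25Similarity, with freq here being the phrase frequency):

  tfNorm = \frac{freq \cdot (k_1 + 1)}{freq + k_1 \cdot \left(1 - b + b \cdot \frac{fieldLength}{avgFieldLength}\right)}

With k1 = 0, both the numerator and the denominator reduce to freq, so tfNorm is exactly 1 whatever the frequency and field length; plugging in k1 = 0.2, freq = 1.5, b = 0.75, fieldLength = 16 and avgFieldLength = 8.721312 gives the 0.98619986 shown above.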


You can clearly see the final TF norm being 1, despite the term frequency and 
length. Please correct my wrongs :)
Markus

 
 
-Original message-
> From:Tom Burton-West 
> Sent: Thursday 3rd April 2014 20:18
> To: solr-user@lucene.apache.org
> Subject: Re: tf and very short text fields
> 
> Hi Markus and Wunder,
> 
> I'm  missing the original context, but I don't think BM25 will solve this
> particular problem.
> 
> The k1 parameter sets how quickly the contribution of tf to the score falls
> off with increasing tf.   It would be helpful for making sure really long
> documents don't get too high a score, but I don't think it would help for
> very short documents without messing up its original design purpose.
> 
> For BM25, if you want to turn off length normalization, you set "b" to 0.
>  However, I don't think that will do what you want, since turning off
> normalization will mean that the score for "new york, new york"  will be
> twice that of the score for "new york" since without normalization the tf
> in "new york new york" is twice that of "new york".
> 
> I think the earlier suggestion to "override tfidfsimilarity and emit 1f in
> tf() is probably the best way to switch to eliminate using tf counts,
> assumming that is really what you want.
> 
> Tom
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wrote:
> 
> > Thanks! We'll try that out and report back. I keep forgetting that I want
> > to try BM25, so this is a good excuse.
> >
> > wunder
> >
> > On Apr 1, 2014, at 12:30 PM, Markus Jelsma 
> > wrote:
> >
> > > Also, if i remember correctly, k1 set to zero for bm25 automatically
> > omits norms in the calculation. So thats easy to play with without
> > reindexing.
> > >
> > >
> > > Markus Jelsma  schreef:Yes, override
> > tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to
> > zero in your schema.
> > >
> > >
> > > Walter Underwood  schreef:And here is another
> > peculiarity of short text fields.
> > >
> > > The movie "New York, New York" should not be twice as relevant for the
> > query "new york". Is there a way to use a binary term frequency rather than
> > a count?
> > >
> > > wunder
> > > --
> > > Walter Underwood
> > > wun...@wunderwood.org
> > >
> > >
> > >
> >
> > --
> > Walter Underwood
> > wun...@wunderwood.org
> >
> >
> >
> >
> 


Cannot run program "svnversion" when building lucene 4.7.1

2014-04-04 Thread Puneet Pawaia
Hi all.

I am trying to build lucene 4.7.1 from the sources. I can compile without
any issues but when I try to build the dist, lucene gives me
Cannot run program "svnversion" ... The system cannot find the specified
file.

I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

Where can I get this svnversion ?

Thanks
Puneet


Solr Search on Fields name

2014-04-04 Thread anuragwalia
Hi,

Thanks for giving your time.

Problem:
I am unable to find a way to search keys with an "OR" operator, e.g. to
search for items having "RuleA" OR "RuleE".

Format of Indexed Data:



1.0
.
4
2
2
2


Can anyone help me out with how to prepare a search query for this kind of key search?


Regards
Anurag 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-on-Fields-name-tp4129119.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot run program "svnversion" when building lucene 4.7.1

2014-04-04 Thread Ahmet Arslan
Hi,

When you install Subversion, the svnversion executable comes with it.
Did you install any svn client for Windows?



On Friday, April 4, 2014 3:38 PM, Puneet Pawaia  wrote:
Hi all.

I am trying to build lucene 4.7.1 from the sources. I can compile without
any issues but when I try to build the dist, lucene gives me
Cannot run program "svnversion" ... The system cannot find the specified
file.

I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

Where can I get this svnversion ?

Thanks
Puneet



Re: Solr Search on Fields name

2014-04-04 Thread Ahmet Arslan


Hi Anurag,

It seems that RuleA and RuleB are field names?

In that case, try this query:

q=RuleA:[* TO *] OR RuleB:[* TO *]

Ahmet


On Friday, April 4, 2014 4:15 PM, anuragwalia  wrote:
Hi,

Thank for giving your important time.

Problem :
I am unable to find a way how can I search Key with "OR" operator like if I
search Items having  "RuleA" OR "RuleE".

Format of Indexed Data:



1.0
.
4
2
2
2


Can any one help me out how can prepare SearchQuery for key search.


Regards
Anurag 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-on-Fields-name-tp4129119.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
The ramBufferSizeMB was set to 6MB only on the test system, to make the
system crash sooner. In production that tag is commented out, which I
believe forces the default value to be used.


On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:

> Hi,
>
> out of curiosity, why did you set ramBufferSizeMB to 6?
>
> Ahmet
>
>
>
>
> On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
> candygram.for.mo...@gmail.com> wrote:
> *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>
> *SOLR/Lucene version: *4.2.1*
>
> *JVM version:
>
> Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>
> Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>
>
>
> *Indexer startup command:
>
> set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>
>
>
> java " %JVMARGS% ^
>
> -Dcom.sun.management.jmxremote.port=1092 ^
>
> -Dcom.sun.management.jmxremote.ssl=false ^
>
> -Dcom.sun.management.jmxremote.authenticate=false ^
>
> -jar start.jar
>
>
>
> *SOLR indexing HTTP parameters request:
>
> webapp=/solr path=/dataimport
> params={clean=false&command=full-import&wt=javabin&version=2}
>
>
>
> We are getting a Java heap OOM exception when indexing (updating) 27
> million records.  If we increase the Java heap memory settings the problem
> goes away but we believe the problem has not been fixed and that we will
> eventually get the same OOM exception.  We have other processes on the
> server that also require resources so we cannot continually increase the
> memory settings to resolve the OOM issue.  We are trying to find a way to
> configure the SOLR instance to reduce or preferably eliminate the
> possibility of an OOM exception.
>
>
>
> We can reproduce the problem on a test machine.  We set the Java heap
> memory size to 64MB to accelerate the exception.  If we increase this
> setting the same problems occurs, just hours later.  In the test
> environment, we are using the following parameters:
>
>
>
> JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>
>
>
> Normally we use the default solrconfig.xml file with only the following jar
> file references added:
>
>
>
> 
>
> 
>
> 
>
>
>
> Using these values and trying to index 6 million records from the database,
> the Java Heap Out of Memory exception is thrown very quickly.
>
>
>
> We were able to complete a successful indexing by further modifying the
> solrconfig.xml and removing all or all but one  tags from the
> schema.xml file.
>
>
>
> The following solrconfig.xml values were modified:
>
>
>
> 6
>
>
>
> 
>
> 2
>
> 2
>
> 10
>
> 150
>
> 
>
>
>
> 
>
> 15000  
>
> false
>
> 
>
>
>
> Using our customized schema.xml file with two or more  tags, the
> OOM exception is always thrown.  Based on the errors, the problem occurs
> when the process was trying to do the merge.  The error is provided below:
>
>
>
> Exception in thread "Lucene Merge Thread #156"
> org.apache.lucene.index.MergePolicy$MergeException:
> java.lang.OutOfMemoryError: Java heap space
>
> at
>
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>
> at
>
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
> at
>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>
> at
>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
>
> at
>
> org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>
> at
> org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
>
> at
> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
>
> at
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
>
> at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
>
> at
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
>
> at
>
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
>
> at
>
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
>
> Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
>
> SEVERE: auto commit error...:java.lang.IllegalStateException: this writer
> hit an OutOfMemoryError; cannot commit
>
> at
> org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
>
> at
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
>
> at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
>
> at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
>
> a

Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey
On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote:
> By default Solr sorts by the "score" field. So if
> you override it with another sort field, yes, Solr will use the parameter
> you've provided. Remember, you can use multiple fields for sorting, so
> you can do something like: sort=score desc, your_field1 asc, your_field2
> desc
> 
> The score of documents is calculated on every query (it does not depend on
> the sort parameter or the debugQuery parameter), and debugQuery is only a
> mechanism for showing (or hiding) how the score was calculated. If you want
> to see a document's score for a particular query (apart from the debugQuery)
> you can ask for it in the Solr response by adding the parameter *fl=*,score*
> to your request.

These are things that I already know.

What I want to know is whether Solr has code in place that will avoid
wasting CPU cycles calculating the score that will never be displayed or
used, *especially* the complex boost parameter that's in the request
handler definition (solrconfig.xml).

min(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)
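
(For reference, with Solr's recip(x,m,a,b) = a/(m*x + b), that boost works out to

  \min\left( \frac{1.5}{1.92901 \times 10^{-10} \cdot \lvert ms(NOW/HOUR,\ registered\_date) \rvert + 1.5},\ 0.85 \right)

i.e. a value capped at 0.85 that decays as a document's registered_date gets older.)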

Do I need to send 'boost=' as a parameter (along with my sort) to get it
to avoid that calculation?

Thanks,
Shawn



Re: Query and field name with wildcard

2014-04-04 Thread Ahmet Arslan
Hi,

bq. possible to search a word over the entire index.

You can get a list of all searchable fields (indexed=true) programmatically via
https://wiki.apache.org/solr/LukeRequestHandler
and then feed this list to the qf parameter of (e)dismax.

This could also be implemented as a custom query parser plugin that searches a
word over the entire index.
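
A rough SolrJ sketch of the Luke-plus-qf idea (simplified: it throws every field name returned by Luke into qf without filtering on indexed=true or field type):

  import java.util.Map;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.LukeRequest;
  import org.apache.solr.client.solrj.response.LukeResponse;

  public class SearchEverywhere {
      public static void main(String[] args) throws Exception {
          HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

          // Ask the LukeRequestHandler which fields exist in the index.
          LukeResponse luke = new LukeRequest().process(solr);
          StringBuilder qf = new StringBuilder();
          for (Map.Entry<String, LukeResponse.FieldInfo> e : luke.getFieldInfo().entrySet()) {
              qf.append(e.getKey()).append(' ');
          }

          // Feed the whole field list to edismax as qf.
          SolrQuery q = new SolrQuery("some_word");
          q.set("defType", "edismax");
          q.set("qf", qf.toString().trim());
          System.out.println(solr.query(q).getResults().getNumFound() + " hits");
      }
  }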


Ahmet


On Friday, April 4, 2014 12:08 PM, Alexandre Rafalovitch  
wrote:
Are you using eDisMax. That gives a lot of options, including field
aliasing, including a single name to multiple fields:
http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming
(with example on p77 of my book
http://www.packtpub.com/apache-solr-for-indexing-data/book :-)

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency



On Fri, Apr 4, 2014 at 3:52 PM, Croci  Francesco Luigi (ID SWS)
 wrote:
> In my index I have some fields which have the same prefix(rmDocumentTitle, 
> rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not 
> possible to specify a query like this:
>
> q = rm* : some_word
>
> Is there a way to do this without having to write a long list of ORs?
>
> Another question is if it is really not possible to search a word over the 
> entire index. Something like this: q = * : some_word
>
> Thank you
> Francesco



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

Which database are you using? Can you send us data-config.xml? 

What happens when you use default merge policy settings?

What happens when you dump your table to Comma Separated File and fed that file 
to solr?

Ahmet

On Friday, April 4, 2014 5:10 PM, Candygram For Mongo 
 wrote:

The ramBufferSizeMB was set to 6MB only on the test system to make the system 
crash sooner.  In production that tag is commented out which I believe forces 
the default value to be used.




On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:

Hi,
>
>out of curiosity, why did you set ramBufferSizeMB to 6? 
>
>Ahmet
>
>
>
>
>
>On Friday, April 4, 2014 3:27 AM, Candygram For Mongo 
> wrote:
>*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>
>*SOLR/Lucene version: *4.2.1*
>
>
>*JVM version:
>
>Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>
>Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>
>
>
>*Indexer startup command:
>
>set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>
>
>
>java " %JVMARGS% ^
>
>-Dcom.sun.management.jmxremote.port=1092 ^
>
>-Dcom.sun.management.jmxremote.ssl=false ^
>
>-Dcom.sun.management.jmxremote.authenticate=false ^
>
>-jar start.jar
>
>
>
>*SOLR indexing HTTP parameters request:
>
>webapp=/solr path=/dataimport
>params={clean=false&command=full-import&wt=javabin&version=2}
>
>
>
>We are getting a Java heap OOM exception when indexing (updating) 27
>million records.  If we increase the Java heap memory settings the problem
>goes away but we believe the problem has not been fixed and that we will
>eventually get the same OOM exception.  We have other processes on the
>server that also require resources so we cannot continually increase the
>memory settings to resolve the OOM issue.  We are trying to find a way to
>configure the SOLR instance to reduce or preferably eliminate the
>possibility of an OOM exception.
>
>
>
>We can reproduce the problem on a test machine.  We set the Java heap
>memory size to 64MB to accelerate the exception.  If we increase this
>setting the same problems occurs, just hours later.  In the test
>environment, we are using the following parameters:
>
>
>
>JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>
>
>
>Normally we use the default solrconfig.xml file with only the following jar
>file references added:
>
>
>
>
>
>
>
>
>
>
>
>Using these values and trying to index 6 million records from the database,
>the Java Heap Out of Memory exception is thrown very quickly.
>
>
>
>We were able to complete a successful indexing by further modifying the
>solrconfig.xml and removing all or all but one  tags from the
>schema.xml file.
>
>
>
>The following solrconfig.xml values were modified:
>
>
>
>6
>
>
>
>
>
>2
>
>2
>
>10
>
>150
>
>
>
>
>
>
>
>15000  
>
>false
>
>
>
>
>
>Using our customized schema.xml file with two or more  tags, the
>OOM exception is always thrown.  Based on the errors, the problem occurs
>when the process was trying to do the merge.  The error is provided below:
>
>
>
>Exception in thread "Lucene Merge Thread #156"
>org.apache.lucene.index.MergePolicy$MergeException:
>java.lang.OutOfMemoryError: Java heap space
>
>                at
>org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>
>                at
>org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>
>Caused by: java.lang.OutOfMemoryError: Java heap space
>
>                at
>org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>
>                at
>org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
>
>                at
>org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>
>                at
>org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
>
>                at
>org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
>
>                at
>org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
>
>                at
>org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
>
>                at
>org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
>
>                at
>org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
>
>                at
>org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
>
>Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
>
>SEVERE: auto commit error...:java.lang.IllegalStateException: this writer
>hit an OutOfMemoryError; cannot commit
>
>                at
>org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
>
>                at
>org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
>
>                at
>org.apache.lucene.index.IndexW

Re: tf and very short text fields

2014-04-04 Thread Ahmet Arslan
Hi,

Another simple approach: if you don't use phrase queries or phrase boosting,
you can set omitTermFreqAndPositions=true on the field.

Ahmet


On Friday, April 4, 2014 2:38 PM, Markus Jelsma  
wrote:
Hi - In this case Walter, iirc, was looking for two things: no normalization 
and no flat TF (1f for tf(float freq) > 0). We know that k1 controls TF 
saturation but in BM25Similarity you can see that k1 is multiplied by the 
encoded norm value, taking b also into account. So setting k1 to zero 
effectively disabled length normalization and results in flat or binary TF. 

Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the field, 
term occurs three times in the field:

        28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
          6.4 = boost
          4.406719 = idf(docFreq=1, docCount=122)
          1.0 = tfNorm, computed from:
            1.5 = phraseFreq=1.5
            0.0 = parameter k1
            0.75 = parameter b
            8.721312 = avgFieldLength
            16.0 = fieldLength




        27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
), product of:
          6.4 = boost
          4.406719 = idf(docFreq=1, docCount=122)
          0.98619986 = tfNorm, computed from:
            1.5 = phraseFreq=1.5
            0.2 = parameter k1
            0.75 = parameter b
            8.721312 = avgFieldLength
            16.0 = fieldLength


You can clearly see the final TF norm being 1, despite the term frequency and 
length. Please correct my wrongs :)
Markus




-Original message-
> From:Tom Burton-West 
> Sent: Thursday 3rd April 2014 20:18
> To: solr-user@lucene.apache.org
> Subject: Re: tf and very short text fields
> 
> Hi Markus and Wunder,
> 
> I'm  missing the original context, but I don't think BM25 will solve this
> particular problem.
> 
> The k1 parameter sets how quickly the contribution of tf to the score falls
> off with increasing tf.   It would be helpful for making sure really long
> documents don't get too high a score, but I don't think it would help for
> very short documents without messing up its original design purpose.
> 
> For BM25, if you want to turn off length normalization, you set "b" to 0.
>  However, I don't think that will do what you want, since turning off
> normalization will mean that the score for "new york, new york"  will be
> twice that of the score for "new york" since without normalization the tf
> in "new york new york" is twice that of "new york".
> 
> I think the earlier suggestion to "override tfidfsimilarity and emit 1f in
> tf() is probably the best way to switch to eliminate using tf counts,
> assumming that is really what you want.
> 
> Tom
> 
> 
> 
> 
> 
> 
> 
> 
> On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wrote:
> 
> > Thanks! We'll try that out and report back. I keep forgetting that I want
> > to try BM25, so this is a good excuse.
> >
> > wunder
> >
> > On Apr 1, 2014, at 12:30 PM, Markus Jelsma 
> > wrote:
> >
> > > Also, if i remember correctly, k1 set to zero for bm25 automatically
> > omits norms in the calculation. So thats easy to play with without
> > reindexing.
> > >
> > >
> > > Markus Jelsma  schreef:Yes, override
> > tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to
> > zero in your schema.
> > >
> > >
> > > Walter Underwood  schreef:And here is another
> > peculiarity of short text fields.
> > >
> > > The movie "New York, New York" should not be twice as relevant for the
> > query "new york". Is there a way to use a binary term frequency rather than
> > a count?
> > >
> > > wunder
> > > --
> > > Walter Underwood
> > > wun...@wunderwood.org
> > >
> > >
> > >
> >
> > --
> > Walter Underwood
> > wun...@wunderwood.org
> >
> >
> >
> >
>


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Shawn Heisey
On 4/4/2014 1:31 AM, Sathya wrote:
> Hi All,
> 
> Hi All, I am new to Solr. And i dont know how to increase the search speed
> of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
> document using java with solrj, solr takes more 6 seconds to return a query
> result. Any one please help me to reduce the search query time to less than
> 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
> further details about solrcloud config.

How much total RAM do you have on the system, and how much total index
data is on that system (adding up all the Solr cores)?  You've already
said that you have allocated 4GB of RAM for Solr.

Later you said you had 50 million documents, and then you showed us a
URL that looks like SolrCloud.

I suspect that you don't have enough RAM left over to cache your index
effectively -- the OS Disk Cache is too small.

http://wiki.apache.org/solr/SolrPerformanceProblems

Another possible problem, also discussed on that page, is that your Java
heap is too small.

Thanks,
Shawn



Difference between ["" TO *] and [* TO *] at Solr?

2014-04-04 Thread Furkan KAMACI
Hi;

What is the difference between ["" TO *] and [* TO *] in Solr? (I tested it
on 4.5.1 and the numFound values are different.)

Thanks;
Furkan KAMACI


Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Furkan KAMACI
Hi;

How can I find the documents that have empty content for a given field? I
don't mean something like:

-field:[* TO *]

because that returns the documents that do not have the given field at all. I
have documents like:

"field1":"some text",
"field2":"some text",
"field" : "" // this is the field whose empty value I want to find documents by

Thanks;
Furkan KAMACI


Re: tf and very short text fields

2014-04-04 Thread Tom Burton-West
Thanks Markus,

I was thinking about normalization and was absolutely wrong about setting
k1 to zero. I should have taken a look at the algorithm and walked
through setting k1 = 0. (This is easier to do looking at the formula on
Wikipedia, http://en.wikipedia.org/wiki/Okapi_BM25, than walking through the
code.)
When you set k1 to 0 it does just what you said, i.e. it provides binary tf:
that part of the formula returns 1 if the term is present and 0 if not,
which is, I think, what Wunder was trying to accomplish.

Sorry about jumping in without double-checking things first.

Tom


On Fri, Apr 4, 2014 at 7:38 AM, Markus Jelsma wrote:

> Hi - In this case Walter, iirc, was looking for two things: no
> normalization and no flat TF (1f for tf(float freq) > 0). We know that k1
> controls TF saturation but in BM25Similarity you can see that k1 is
> multiplied by the encoded norm value, taking b also into account. So
> setting k1 to zero effectively disabled length normalization and results in
> flat or binary TF.
>
> Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the
> field, term occurs three times in the field:
>
> 28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5
> ), product of:
>   6.4 = boost
>   4.406719 = idf(docFreq=1, docCount=122)
>   1.0 = tfNorm, computed from:
> 1.5 = phraseFreq=1.5
> 0.0 = parameter k1
> 0.75 = parameter b
> 8.721312 = avgFieldLength
> 16.0 = fieldLength
>
>
>
>
> 27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5
> ), product of:
>   6.4 = boost
>   4.406719 = idf(docFreq=1, docCount=122)
>   0.98619986 = tfNorm, computed from:
> 1.5 = phraseFreq=1.5
> 0.2 = parameter k1
> 0.75 = parameter b
> 8.721312 = avgFieldLength
> 16.0 = fieldLength
>
>
> You can clearly see the final TF norm being 1, despite the term frequency
> and length. Please correct my wrongs :)
> Markus
>
>
>
> -Original message-
> > From:Tom Burton-West 
> > Sent: Thursday 3rd April 2014 20:18
> > To: solr-user@lucene.apache.org
> > Subject: Re: tf and very short text fields
> >
> > Hi Markus and Wunder,
> >
> > I'm  missing the original context, but I don't think BM25 will solve this
> > particular problem.
> >
> > The k1 parameter sets how quickly the contribution of tf to the score
> falls
> > off with increasing tf.   It would be helpful for making sure really long
> > documents don't get too high a score, but I don't think it would help for
> > very short documents without messing up its original design purpose.
> >
> > For BM25, if you want to turn off length normalization, you set "b" to 0.
> >  However, I don't think that will do what you want, since turning off
> > normalization will mean that the score for "new york, new york"  will be
> > twice that of the score for "new york" since without normalization the tf
> > in "new york new york" is twice that of "new york".
> >
> > I think the earlier suggestion to "override tfidfsimilarity and emit 1f
> in
> > tf() is probably the best way to switch to eliminate using tf counts,
> > assumming that is really what you want.
> >
> > Tom
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood  >wrote:
> >
> > > Thanks! We'll try that out and report back. I keep forgetting that I
> want
> > > to try BM25, so this is a good excuse.
> > >
> > > wunder
> > >
> > > On Apr 1, 2014, at 12:30 PM, Markus Jelsma  >
> > > wrote:
> > >
> > > > Also, if i remember correctly, k1 set to zero for bm25 automatically
> > > omits norms in the calculation. So thats easy to play with without
> > > reindexing.
> > > >
> > > >
> > > > Markus Jelsma  schreef:Yes, override
> > > tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set
> to
> > > zero in your schema.
> > > >
> > > >
> > > > Walter Underwood  schreef:And here is another
> > > peculiarity of short text fields.
> > > >
> > > > The movie "New York, New York" should not be twice as relevant for
> the
> > > query "new york". Is there a way to use a binary term frequency rather
> than
> > > a count?
> > > >
> > > > wunder
> > > > --
> > > > Walter Underwood
> > > > wun...@wunderwood.org
> > > >
> > > >
> > > >
> > >
> > > --
> > > Walter Underwood
> > > wun...@wunderwood.org
> > >
> > >
> > >
> > >
> >
>


Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Ahmet Arslan
Hi Furkan,

q=field:""&fl=field works for me (4.7.0). 

Ahmet


On Friday, April 4, 2014 5:50 PM, Furkan KAMACI  wrote:
Hi;

How can I find the documents that has empty content for a given field. I
don't mean something like:

-field:[* TO *]

because it returns the documents that has not given particular field. I
have documents something like:

"field1":"some text",
"field2":"some text",
"field" : "" // this is the field that I want to learn which document has
it.

Thanks;
Furkan KAMACI



Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Furkan KAMACI
Hi;

I tried it before but it does not work


2014-04-04 18:08 GMT+03:00 Ahmet Arslan :

> Hi Furkan,
>
> q=fiel:""&fl=field works for me (4.7.0).
>
> Ahmet
>
>
> On Friday, April 4, 2014 5:50 PM, Furkan KAMACI 
> wrote:
> Hi;
>
> How can I find the documents that has empty content for a given field. I
> don't mean something like:
>
> -field:[* TO *]
>
> because it returns the documents that has not given particular field. I
> have documents something like:
>
> "field1":"some text",
> "field2":"some text",
> "field" : "" // this is the field that I want to learn which document has
> it.
>
> Thanks;
> Furkan KAMACI
>
>


Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Ahmet Arslan
Hi,

Weird, for type="string" it works for me. What is the field type you are using? 


On Friday, April 4, 2014 6:25 PM, Furkan KAMACI  wrote:

Hi;
I tried it before but it does not work



2014-04-04 18:08 GMT+03:00 Ahmet Arslan :

Hi Furkan,
>
>q=fiel:""&fl=field works for me (4.7.0). 
>
>Ahmet
>
>
>
>On Friday, April 4, 2014 5:50 PM, Furkan KAMACI  wrote:
>Hi;
>
>How can I find the documents that has empty content for a given field. I
>don't mean something like:
>
>-field:[* TO *]
>
>because it returns the documents that has not given particular field. I
>have documents something like:
>
>"field1":"some text",
>"field2":"some text",
>"field" : "" // this is the field that I want to learn which document has
>it.
>
>Thanks;
>Furkan KAMACI
>
>


Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-04 Thread Nils Kaiser
Hey,

I am currently using solr to recognize songs and people from a list of user
comments. My index stores the titles of the songs. At the moment my
application builds word ngrams and fires a search with that query, which
works well but is quite inefficient.

So my thought was to simply use the collated comments as query. So it is a
case where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well
for my usecase.

However when using longer queries, I get a strange behavior which can be
seen in debugQuery.

Here is an example:

Collated Comments (used as query)

"I love Henry so much. It is hard to tear your eyes away from Maria, but
watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put
them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best
routines I've ever seen. Period. And it's a competitionl! How is that
possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!"

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
(text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
(text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
(text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
(text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
(text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
(text:win...) (text:put) (text:them) (text:both) +(text:together)
+(text:there) (text:is) (text:no) (text:competition) (text:This)
(text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
(text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
(text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
(text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
(text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
+(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
(text:possible?) (text:They're) (text:so) (text:good) (text:it)
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
(text:piece?) (text:I) (text:believe) (text:it's) (text:called)
(text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
(text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
(text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
(text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)

This query generates 0 results. The reason is that it requires the terms "together",
"there", "Period." and "it's" to be present in the document (see parsedquery above:
all other terms are optional, but those terms are marked as must).

Is there any reason for this behavior? If I use shorter queries it works
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils




  0
  11




  I love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

  I love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

  (+((DisjunctionMaxQuery((text:I)) DisjunctionMaxQuery((text:love)) DisjunctionMaxQuery((text:Henry)) DisjunctionMaxQuery((text:so)) DisjunctionMaxQuery((text:much.)) DisjunctionMaxQuery((text:It)) DisjunctionMax

Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Chris Hostetter

: "field" : "" // this is the field that I want to learn which document has
: it.

How you (can) query for a field value like that is going to depend 
entirely on the FieldType/Analyzer ... if it's a string field, or uses 
KeywordTokenizer, then q=field:"" should find it -- if you use a more 
traditional analyzer then it probably didn't produce any terms for the 
input "" and from Solr's perspective a document that was indexed using 
an empty string value is exactly the same as a document that had no value 
when indexed.

In essence, your question is equivalent to asking "How can I search for 
doc1, but not doc2, even though I'm using LowerCaseAnalyzer which produces 
exactly the same indexed terms for both...

   doc1: Quick Fox
   doc2: quick fox
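
If the field really is a string (or KeywordTokenizer) type, a minimal SolrJ
sketch of that query would look like the following (untested; the core URL and
field names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EmptyValueQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // matches docs indexed with an empty string only for a non-analyzed field;
    // a tokenizing analyzer produces no terms for "" so there is nothing to match
    SolrQuery q = new SolrQuery("field:\"\"");
    q.setFields("id", "field");
    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());
  }
}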



-Hoss
http://www.lucidworks.com/


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Sathya
Hi shawn,

I have indexed 50 million documents across 5 servers. 3 servers have 8GB RAM, one
has 24GB and another one has 64GB RAM. I allocated 4GB RAM to Solr on
each machine. I am using SolrCloud. My total index size is 50GB across the 5
servers. Each server has 3 ZooKeepers. I still haven't checked the OS disk
cache and heap memory. I will check and let you know, Shawn. If there is anything
else, please let me know.

Thank you Shawn.

On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] <
ml-node+s472066n4129150...@n3.nabble.com> wrote:
> On 4/4/2014 1:31 AM, Sathya wrote:
>> Hi All,
>>
>> Hi All, I am new to Solr. And i dont know how to increase the search
speed
>> of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
>> document using java with solrj, solr takes more 6 seconds to return a
query
>> result. Any one please help me to reduce the search query time to less
than
>> 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
>> further details about solrcloud config.
>
> How much total RAM do you have on the system, and how much total index
> data is on that system (adding up all the Solr cores)?  You've already
> said that you have allocated 4GB of RAM for Solr.
>
> Later you said you had 50 million documents, and then you showed us a
> URL that looks like SolrCloud.
>
> I suspect that you don't have enough RAM left over to cache your index
> effectively -- the OS Disk Cache is too small.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Another possible problem, also discussed on that page, is that your Java
> heap is too small.
>
> Thanks,
> Shawn
>
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html
Sent from the Solr - User mailing list archive at Nabble.com.

AUTO: Saravanan Chinnadurai is out of the office (returning 08/04/2014)

2014-04-04 Thread Saravanan . Chinnadurai
I am out of the office until 08/04/2014.

 Please email itsta...@actionimages.com for any urgent queries.


Note: This is an automated response to your message  "Cannot run program
"svnversion" when building lucene 4.7.1" sent on 4/4/2014 13:38:22.

This is the only notification you will receive while this person is away.


Action Images is a division of Reuters Limited and your data will therefore be 
protected
in accordance with the Reuters Group Privacy / Data Protection notice which is 
available
in the privacy footer at www.reuters.com
Registered in England No. 145516   VAT REG: 397000555


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Anshum Gupta
I am not sure if you set up your SolrCloud right. Can you also provide me
with the version of Solr that you're running?
Also, could you tell me how you set up your SolrCloud cluster?
Are the times consistent? Is this the only collection on the cluster?

Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm
wrong, but if I'm not, you don't need that kind of a ZK setup.


On Fri, Apr 4, 2014 at 9:39 AM, Sathya  wrote:

> Hi shawn,
>
> I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One
> have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in
> each machine. I am using solrcloud. My total index size is 50gb including 5
> servers. Each server have 3 zookeepers. Still I didnt check about OS disk
> cache and heap memory. I will check and let u know shawn. If anything, pls
> let me know.
>
> Thank u shawn.
>
> On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] <
> ml-node+s472066n4129150...@n3.nabble.com> wrote:
> > On 4/4/2014 1:31 AM, Sathya wrote:
> >> Hi All,
> >>
> >> Hi All, I am new to Solr. And i dont know how to increase the search
> speed
> >> of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
> >> document using java with solrj, solr takes more 6 seconds to return a
> query
> >> result. Any one please help me to reduce the search query time to less
> than
> >> 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
> >> further details about solrcloud config.
> >
> > How much total RAM do you have on the system, and how much total index
> > data is on that system (adding up all the Solr cores)?  You've already
> > said that you have allocated 4GB of RAM for Solr.
> >
> > Later you said you had 50 million documents, and then you showed us a
> > URL that looks like SolrCloud.
> >
> > I suspect that you don't have enough RAM left over to cache your index
> > effectively -- the OS Disk Cache is too small.
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Another possible problem, also discussed on that page, is that your Java
> > heap is too small.
> >
> > Thanks,
> > Shawn
> >
> >
> >
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

Anshum Gupta
http://www.anshumgupta.net


SOLR Jetty Server on Windows 2003

2014-04-04 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I am trying to install Solr on Windows 2003 with the Jetty server. From the
browser everything works, but when I try to access it from JavaScript code on
another machine I am not getting a response. I am using XMLHttpRequest to get
the response from the server using JavaScript.

Any Help...?


--Ravi


RE: SOLR Jetty Server on Windows 2003

2014-04-04 Thread Doug Turnbull
Are the requests cross domain? Is your browser giving errors about
cross domain scripting restrictions in the browser? If you're doing
cross domain browser stuff, Solr gives you the ability to do requests
over JSONP which is a sneaky hack that gets around these issues. Check
out my blog post for an example that uses angular:

http://www.opensourceconnections.com/2013/08/25/instant-search-with-solr-and-angular/



Sent from my Windows Phone From: EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)
Sent: 4/4/2014 1:51 PM
To: solr-user@lucene.apache.org
Subject: SOLR Jetty Server on Windows 2003
Hi , I am trying to install solr on the Windows 2003 with Jetty
server. Form browser everything works , but when I try to acesss from
another javascript Code in other machine I am not getting reponse. I
am using Xmlhttprequest to get the response from server using
javascript.

Any Help...?


--Ravi


Re: Cannot run program "svnversion" when building lucene 4.7.1

2014-04-04 Thread Puneet Pawaia
Hi. Yes I installed Tortoise svn.
Regards
Puneet
On 4 Apr 2014 19:35, "Ahmet Arslan"  wrote:

> Hi,
>
> When you install subversion, svnversion executable comes with that too.
> Did you install any svn client for Windows?
>
>
>
> On Friday, April 4, 2014 3:38 PM, Puneet Pawaia 
> wrote:
> Hi all.
>
> I am trying to build lucene 4.7.1 from the sources. I can compile without
> any issues but when I try to build the dist, lucene gives me
> Cannot run program "svnversion" ... The system cannot find the specified
> file.
>
> I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.
>
> Where can I get this svnversion ?
>
> Thanks
> Puneet
>
>


Re: Cannot run program "svnversion" when building lucene 4.7.1

2014-04-04 Thread Ahmet Arslan
Hi,

I am not a windows user but if you installed that svnversion should be 
somewhere on disk.
Probably right next to svn. Find/locate it by file search, and add its folder 
to your path.
Once you do that you can invoke svnversion in command line.

For example, here are the executables on my computer under /opt/subversion/bin: 
svn  svnadmin  svnlook  svnserve  svnversion
svn-tools  svndumpfilter  svnrdump  svnsync



On Friday, April 4, 2014 9:18 PM, Puneet Pawaia  wrote:
Hi. Yes I installed Tortoise svn.
Regards
Puneet
On 4 Apr 2014 19:35, "Ahmet Arslan"  wrote:

> Hi,
>
> When you install subversion, svnversion executable comes with that too.
> Did you install any svn client for Windows?
>
>
>
> On Friday, April 4, 2014 3:38 PM, Puneet Pawaia 
> wrote:
> Hi all.
>
> I am trying to build lucene 4.7.1 from the sources. I can compile without
> any issues but when I try to build the dist, lucene gives me
> Cannot run program "svnversion" ... The system cannot find the specified
> file.
>
> I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.
>
> Where can I get this svnversion ?
>
> Thanks
> Puneet
>
>



How to see the value of "long" type (solr) ?

2014-04-04 Thread Lisheng Zhang
Hi,

We use solr 3.6 to index a field of "long" type:



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution.  We store the
database information in the database.xml file.  I have attached the
database.xml file.

When we use the default merge policy settings, we get the same results.



We have not tried to dump the table to a comma separated file.  We think
that dumping this size table to disk will introduce other memory problems
with big file management. We have not tested that case.


On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:

> Hi,
>
> Which database are you using? Can you send us data-config.xml?
>
> What happens when you use default merge policy settings?
>
> What happens when you dump your table to Comma Separated File and fed that
> file to solr?
>
> Ahmet
>
> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
> candygram.for.mo...@gmail.com> wrote:
>
> The ramBufferSizeMB was set to 6MB only on the test system to make the
> system crash sooner.  In production that tag is commented out which
> I believe forces the default value to be used.
>
>
>
>
> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:
>
> Hi,
> >
> >out of curiosity, why did you set ramBufferSizeMB to 6?
> >
> >Ahmet
> >
> >
> >
> >
> >
> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
> candygram.for.mo...@gmail.com> wrote:
> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
> >
> >*SOLR/Lucene version: *4.2.1*
> >
> >
> >*JVM version:
> >
> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
> >
> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
> >
> >
> >
> >*Indexer startup command:
> >
> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
> >
> >
> >
> >java " %JVMARGS% ^
> >
> >-Dcom.sun.management.jmxremote.port=1092 ^
> >
> >-Dcom.sun.management.jmxremote.ssl=false ^
> >
> >-Dcom.sun.management.jmxremote.authenticate=false ^
> >
> >-jar start.jar
> >
> >
> >
> >*SOLR indexing HTTP parameters request:
> >
> >webapp=/solr path=/dataimport
> >params={clean=false&command=full-import&wt=javabin&version=2}
> >
> >
> >
> >We are getting a Java heap OOM exception when indexing (updating) 27
> >million records.  If we increase the Java heap memory settings the problem
> >goes away but we believe the problem has not been fixed and that we will
> >eventually get the same OOM exception.  We have other processes on the
> >server that also require resources so we cannot continually increase the
> >memory settings to resolve the OOM issue.  We are trying to find a way to
> >configure the SOLR instance to reduce or preferably eliminate the
> >possibility of an OOM exception.
> >
> >
> >
> >We can reproduce the problem on a test machine.  We set the Java heap
> >memory size to 64MB to accelerate the exception.  If we increase this
> >setting the same problems occurs, just hours later.  In the test
> >environment, we are using the following parameters:
> >
> >
> >
> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
> >
> >
> >
> >Normally we use the default solrconfig.xml file with only the following
> jar
> >file references added:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >Using these values and trying to index 6 million records from the
> database,
> >the Java Heap Out of Memory exception is thrown very quickly.
> >
> >
> >
> >We were able to complete a successful indexing by further modifying the
> >solrconfig.xml and removing all or all but one  tags from the
> >schema.xml file.
> >
> >
> >
> >The following solrconfig.xml values were modified:
> >
> >
> >
> >6
> >
> >
> >
> >
> >
> >2
> >
> >2
> >
> >10
> >
> >150
> >
> >
> >
> >
> >
> >
> >
> >15000  
> >
> >false
> >
> >
> >
> >
> >
> >Using our customized schema.xml file with two or more  tags,
> the
> >OOM exception is always thrown.  Based on the errors, the problem occurs
> >when the process was trying to do the merge.  The error is provided below:
> >
> >
> >
> >Exception in thread "Lucene Merge Thread #156"
> >org.apache.lucene.index.MergePolicy$MergeException:
> >java.lang.OutOfMemoryError: Java heap space
> >
> >at
>
> >org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
> >
> >at
>
> >org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
> >
> >Caused by: java.lang.OutOfMemoryError: Java heap space
> >
> >at
>
> >org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
> >
> >at
>
> >org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
> >
> >at
>
> >org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
> >
> >at
>
> >org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
> >
> >at
> >org.apache.lucene.index.SegmentMerger.mergeNorm

[ JOB ] - Search Specialist, Bloomberg LP [ NY and London ]

2014-04-04 Thread Anirudha Jadhav
http://jobs.bloomberg.com/job/New-York-Search-Technology-Specialist-Job-NY/45497500/

http://jobs.bloomberg.com/job/London-R&D-News-Search-Backend-Developer-Job/50463600/

keeping it short here , feel free to talk to me with more questions

-- 
Anirudha P. Jadhav


Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Alvaro Cabrerizo
Hi,

If you don't want to waste your CPU time, then comment out the boost parameter
in the query parser defined in your solrconfig.xml. If you can't do that,
then you can override it by sending the boost parameter, for example using the
constant function (e.g. http:///...&boost=1&sort=your_sort). The
boost parameter will be overridden if it is not defined as an invariant.

Regards.


On Fri, Apr 4, 2014 at 4:12 PM, Shawn Heisey  wrote:

> On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote:
> > By default solr is using the sort parameter over the "score field". So if
> > you overwrite it using other sort field, yes solr will use the parameter
> > you've provided. Remember, you can use multiple fields for
> > sorting so
> > you can make something like: sort score desc, your_field1 asc,
> your_field2
> > desc
> >
> > The score of documents is calculated on every query (it does not depend
> on
> > the sort parameter or the debugQueryParameter) and the debubQuery is
> only a
> > mechanism for showing (or hidding) how score was calculated. If you want
> to
> > see a document score for a particular query (apart from the debugQuery)
> you
> > can ask for it in the solr response adding the parameter *fl=*,score* to
> > your request.
>
> These are things that I already know.
>
> What I want to know is whether Solr has code in place that will avoid
> wasting CPU cycles calculating the score that will never be displayed or
> used, *especially* the complex boost parameter that's in the request
> handler definition (solrconfig.xml).
>
> 
> name="boost">min(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)
>
> Do I need to send 'boost=' as a parameter (along with my sort) to get it
> to avoid that calculation?
>
> Thanks,
> Shawn
>
>


Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey

On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote:

If you dont want to waste your cpu time, then comment the boost parameter
in the query parser defined in your solrconfig.xml. If you cant do that,
then you can overwrite it sending the boost parameter for example using the
constant function  (e.g.  http:///...&boost=1&sort=your_sort). The
parameter boost will be overwritten if it is not defined as an invariant.


Thank you for responding.  I know how I can override the behavior, what 
I want to find out is whether or not it's necessary to do so -- if it's 
not necessary because Solr skips it, then everything is good.  If it is 
necessary, I can open an issue in Jira asking for Solr to get smarter.  
That way everyone benefits and they don't have to do anything except 
upgrade Solr.


Thanks,
Shawn



Distributed tracing for Solr via adding HTTP headers?

2014-04-04 Thread Gregg Donovan
We have some metadata -- e.g. a request UUID -- that we log to every log
line using Log4J's MDC [1]. The UUID logging allows us to connect any log
lines we have for a given request across servers. Sort of like Zipkin [2].

Currently we're using EmbeddedSolrServer without sharding, so adding the
UUID is fairly simple, since everything is in one process and one thread.
But, we're testing a sharded HTTP implementation and running into some
difficulties getting this data passed around in a way that lets us trace
all log lines generated by a request to its UUID.

The first thing I tried was to add the UUID by adding it to the SolrParams.
This achieves the goal of getting those values logged on the shards if a
request is successful, but we miss having those values in the MDC if there
are other log lines before the final log line. E.g. an Exception in a
custom component.

My current thought is that sending HTTP headers with diagnostic information
would be very useful. Those could be placed in the MDC even before handing
the work off to SolrDispatchFilter, so that any Solr problem will have the
proper logging.

I.e. every additional header added to a Solr request gets a "Solr-" prefix.
On the server, we look for those headers and add them to the SLF4J MDC[3].

Here's a patch [4] that does this that we're testing out. Is this a good
idea? Would anyone else find this useful? If so, I'll open a ticket.

--Gregg

[1] http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/MDC.html
[2] http://twitter.github.io/zipkin/
[3] http://www.slf4j.org/api/org/slf4j/MDC.html
[4] https://gist.github.com/greggdonovan/9982327
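
For anyone curious what the receiving side could look like, here is a rough,
untested sketch (separate from the patch above; all names are made up) of a
servlet filter that copies "Solr-" prefixed request headers into the SLF4J MDC
before SolrDispatchFilter runs, and clears them afterwards:

import java.io.IOException;
import java.util.Enumeration;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class HeaderToMdcFilter implements Filter {
  private static final String PREFIX = "Solr-";

  public void init(FilterConfig cfg) {}

  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    if (req instanceof HttpServletRequest) {
      HttpServletRequest http = (HttpServletRequest) req;
      Enumeration<String> names = http.getHeaderNames();
      while (names.hasMoreElements()) {
        String name = names.nextElement();
        if (name.startsWith(PREFIX)) {
          // e.g. a hypothetical "Solr-Request-UUID: abc" becomes MDC key "Request-UUID"
          MDC.put(name.substring(PREFIX.length()), http.getHeader(name));
        }
      }
    }
    try {
      chain.doFilter(req, resp);
    } finally {
      MDC.clear(); // do not leak values onto the next request handled by this thread
    }
  }

  public void destroy() {}
}

The filter would have to be mapped in web.xml ahead of SolrDispatchFilter.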


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
In case the attached database.xml file didn't show up, I have pasted in the
contents below:
































On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> In this case we are indexing an Oracle database.
>
> We do not include the data-config.xml in our distribution.  We store the
> database information in the database.xml file.  I have attached the
> database.xml file.
>
> When we use the default merge policy settings, we get the same results.
>
>
>
> We have not tried to dump the table to a comma separated file.  We think
> that dumping this size table to disk will introduce other memory problems
> with big file management. We have not tested that case.
>
>
> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> Which database are you using? Can you send us data-config.xml?
>>
>> What happens when you use default merge policy settings?
>>
>> What happens when you dump your table to Comma Separated File and fed
>> that file to solr?
>>
>> Ahmet
>>
>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> system crash sooner.  In production that tag is commented out which
>> I believe forces the default value to be used.
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:
>>
>> Hi,
>> >
>> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >
>> >Ahmet
>> >
>> >
>> >
>> >
>> >
>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>> >
>> >*SOLR/Lucene version: *4.2.1*
>> >
>> >
>> >*JVM version:
>> >
>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >
>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >
>> >
>> >
>> >*Indexer startup command:
>> >
>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >
>> >
>> >
>> >java " %JVMARGS% ^
>> >
>> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >
>> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >
>> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >
>> >-jar start.jar
>> >
>> >
>> >
>> >*SOLR indexing HTTP parameters request:
>> >
>> >webapp=/solr path=/dataimport
>> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >
>> >
>> >
>> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >million records.  If we increase the Java heap memory settings the
>> problem
>> >goes away but we believe the problem has not been fixed and that we will
>> >eventually get the same OOM exception.  We have other processes on the
>> >server that also require resources so we cannot continually increase the
>> >memory settings to resolve the OOM issue.  We are trying to find a way to
>> >configure the SOLR instance to reduce or preferably eliminate the
>> >possibility of an OOM exception.
>> >
>> >
>> >
>> >We can reproduce the problem on a test machine.  We set the Java heap
>> >memory size to 64MB to accelerate the exception.  If we increase this
>> >setting the same problems occurs, just hours later.  In the test
>> >environment, we are using the following parameters:
>> >
>> >
>> >
>> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >
>> >
>> >
>> >Normally we use the default solrconfig.xml file with only the following
>> jar
>> >file references added:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >Using these values and trying to index 6 million records from the
>> database,
>> >the Java Heap Out of Memory exception is thrown very quickly.
>> >
>> >
>> >
>> >We were able to complete a successful indexing by further modifying the
>> >solrconfig.xml and removing all or all but one  tags from the
>> >schema.xml file.
>> >
>> >
>> >
>> >The following solrconfig.xml values were modified:
>> >
>> >
>> >
>> >6
>> >
>> >
>> >
>> >
>> >
>> >2
>> >
>> >2
>> >
>> >10
>> >
>> >150
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >15000  
>> >
>> >false
>> >
>> >
>> >
>> >
>> >
>> >Using our customized schema.xml file with two or more  tags,
>> the
>> >OOM exception is always thrown.  Based on the errors, the problem occurs
>> >when the process was trying to do the merge.  The error is provided
>> below:
>> >
>> >
>> >
>> >Exception in thread "Lucene Merge Thread #156"
>> >org.apache.lucene.index.MergePolicy$MergeException:
>> >java.lang.OutOfMemoryError: Java heap space
>> >
>> >at
>>
>> >org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>> >
>> >at
>>
>> >org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>> >
>> >Caused by: java.lang.OutOfMemoryError: Java heap space
>> >
>> >at
>>
>> >org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
>> >
>> >  

Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Mikhail Khludnev
Hello Shawn,

I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a
Collector which calls score() in this case. Hence, it shouldn't waste CPU.
Just my impression.
Have you tried checking it by supplying some weird formula that throws an
exception?


On Sat, Apr 5, 2014 at 12:02 AM, Shawn Heisey  wrote:

> On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote:
>
>> If you dont want to waste your cpu time, then comment the boost parameter
>> in the query parser defined in your solrconfig.xml. If you cant do that,
>> then you can overwrite it sending the boost parameter for example using
>> the
>> constant function  (e.g.  http:///...&boost=1&sort=your_sort). The
>> parameter boost will be overwritten if it is not defined as an invariant.
>>
>
> Thank you for responding.  I know how I can override the behavior, what I
> want to find out is whether or not it's necessary to do so -- if it's not
> necessary because Solr skips it, then everything is good.  If it is
> necessary, I can open an issue in Jira asking for Solr to get smarter.
>  That way everyone benefits and they don't have to do anything except
> upgrade Solr.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr join and lucene scoring

2014-04-04 Thread Mikhail Khludnev
On Thu, Apr 3, 2014 at 1:42 PM,  wrote:

> Hello,
>
> referencing to this issue:
> https://issues.apache.org/jira/browse/SOLR-4307
>
> Is it still not possible with the solr query time join to use scoring?
>
It's not implemented still.
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L549


> Do I still have to write my own plugin or is there a plugin somewhere I
> could use?
>
> I never wrote a plugin for solr before, so I would prefer if I don't have
> to start from scratch.
>
The right approach from my POV is to use Lucene's join
https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java
in a new QParser, but solving the impedance between Lucene and Solr might be
tricky.
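
For reference, a rough (untested) sketch of the Lucene-level call; the field
names and the from-side query are placeholders, and the missing piece is exactly
that QParser plumbing plus getting hold of the right searchers:

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

public class ScoringJoinSketch {
  public static TopDocs join(IndexSearcher fromSearcher, IndexSearcher toSearcher)
      throws IOException {
    Query fromQuery = new TermQuery(new Term("text", "notebook"));
    // build a query over the "to" side whose scores are derived from the matching
    // "from" side documents; ScoreMode.Max keeps the best matching score
    Query joinQuery = JoinUtil.createJoinQuery(
        "from_field", true /* from field may be multi-valued */, "to_field",
        fromQuery, fromSearcher, ScoreMode.Max);
    return toSearcher.search(joinQuery, 10);
  }
}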



>
> THX,
> Moritz
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Filter query with multiple raw/literal ORs

2014-04-04 Thread Mikhail Khludnev
On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley  wrote:

> Try adding a space before the first term, so the
> default lucene query parser will be used:
>

Yonik, I'm curious, whether it a feature?


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

Can you remove auto commit for bulk import. Commit at the very end?

Ahmet



On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
 wrote:
In case the attached database.xml file didn't show up, I have pasted in the
contents below:

































On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> In this case we are indexing an Oracle database.
>
> We do not include the data-config.xml in our distribution.  We store the
> database information in the database.xml file.  I have attached the
> database.xml file.
>
> When we use the default merge policy settings, we get the same results.
>
>
>
> We have not tried to dump the table to a comma separated file.  We think
> that dumping this size table to disk will introduce other memory problems
> with big file management. We have not tested that case.
>
>
> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> Which database are you using? Can you send us data-config.xml?
>>
>> What happens when you use default merge policy settings?
>>
>> What happens when you dump your table to Comma Separated File and fed
>> that file to solr?
>>
>> Ahmet
>>
>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> system crash sooner.  In production that tag is commented out which
>> I believe forces the default value to be used.
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:
>>
>> Hi,
>> >
>> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >
>> >Ahmet
>> >
>> >
>> >
>> >
>> >
>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>> >
>> >*SOLR/Lucene version: *4.2.1*
>> >
>> >
>> >*JVM version:
>> >
>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >
>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >
>> >
>> >
>> >*Indexer startup command:
>> >
>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >
>> >
>> >
>> >java " %JVMARGS% ^
>> >
>> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >
>> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >
>> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >
>> >-jar start.jar
>> >
>> >
>> >
>> >*SOLR indexing HTTP parameters request:
>> >
>> >webapp=/solr path=/dataimport
>> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >
>> >
>> >
>> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >million records.  If we increase the Java heap memory settings the
>> problem
>> >goes away but we believe the problem has not been fixed and that we will
>> >eventually get the same OOM exception.  We have other processes on the
>> >server that also require resources so we cannot continually increase the
>> >memory settings to resolve the OOM issue.  We are trying to find a way to
>> >configure the SOLR instance to reduce or preferably eliminate the
>> >possibility of an OOM exception.
>> >
>> >
>> >
>> >We can reproduce the problem on a test machine.  We set the Java heap
>> >memory size to 64MB to accelerate the exception.  If we increase this
>> >setting the same problems occurs, just hours later.  In the test
>> >environment, we are using the following parameters:
>> >
>> >
>> >
>> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >
>> >
>> >
>> >Normally we use the default solrconfig.xml file with only the following
>> jar
>> >file references added:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >Using these values and trying to index 6 million records from the
>> database,
>> >the Java Heap Out of Memory exception is thrown very quickly.
>> >
>> >
>> >
>> >We were able to complete a successful indexing by further modifying the
>> >solrconfig.xml and removing all or all but one  tags from the
>> >schema.xml file.
>> >
>> >
>> >
>> >The following solrconfig.xml values were modified:
>> >
>> >
>> >
>> >6
>> >
>> >
>> >
>> >
>> >
>> >2
>> >
>> >2
>> >
>> >10
>> >
>> >150
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >15000  
>> >
>> >false
>> >
>> >
>> >
>> >
>> >
>> >Using our customized schema.xml file with two or more  tags,
>> the
>> >OOM exception is always thrown.  Based on the errors, the problem occurs
>> >when the process was trying to do the merge.  The error is provided
>> below:
>> >
>> >
>> >
>> >Exception in thread "Lucene Merge Thread #156"
>> >org.apache.lucene.index.MergePolicy$MergeException:
>> >java.lang.OutOfMemoryError: Java heap space
>> >
>> >                at
>>
>> >org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
>> >
>> >                at
>>
>> >org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
>> >
>> >Caused by: java.lang.OutOfMemoryError: Java heap space
>> >

Re: Filter query with multiple raw/literal ORs

2014-04-04 Thread Yonik Seeley
On Fri, Apr 4, 2014 at 5:28 PM, Mikhail Khludnev
 wrote:
> On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley  wrote:
>
>> Try adding a space before the first term, so the
>> default lucene query parser will be used:
>>
>
> Yonik, I'm curious, whether it a feature?

Yep, it was completely on purpose that I required local parameters to
be left-justified.  It left an easy way to "escape" the normal local
params processing when looking for the query type.

For example, if you want to ensure that your custom parser is used,
and you have defType=myCustomQParser
then all you have to do is add a space before the query parameter
(which shouldn't mess up any sort of natural language query parser).

-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache


Re: Does sorting skip everything having to do with relevancy?

2014-04-04 Thread Shawn Heisey

On 4/4/2014 3:13 PM, Mikhail Khludnev wrote:

I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a
Collector which calls score() in this case. Hence, it shouldn't waste CPU.
Just my impression.
Haven't you tried to check it supplying some weird formula, which throws
exception?


I didn't think of that.  That's a good idea -- as long as there's not 
independent code that checks the function in addition to the code that 
actually runs it.


With the following parameters added to an edismax query that otherwise 
works, I get an exception.  It works if I change the "e" to 5.


&sort=registered_date asc&boost=sum(5,"e")

I will take Alvaro's suggestion and add "boost=1" to queries that use a 
sort parameter.  It's probably a good idea to file that Jira.


Thanks,
Shawn



Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
We would be happy to try that.  That sounds counterintuitive for the high
volume of records we have.  Can you help me understand how that might solve
our problem?



On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan  wrote:

> Hi,
>
> Can you remove auto commit for bulk import. Commit at the very end?
>
> Ahmet
>
>
>
> On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
> candygram.for.mo...@gmail.com> wrote:
> In case the attached database.xml file didn't show up, I have pasted in the
> contents below:
>
> 
>  name="org_only"
> type="JdbcDataSource"
> driver="oracle.jdbc.OracleDriver"
> url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
> user="admin"
> password="admin"
> readOnly="false"
> batchSize="100"
> />
> 
>
>
> 
>
> 
> 
> 
>  name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>  />
> 
> 
> 
> 
>  />
>
> 
>
>
>
> 
> 
> 
> 
>
>
>
>
>
>
> On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
> candygram.for.mo...@gmail.com> wrote:
>
> > In this case we are indexing an Oracle database.
> >
> > We do not include the data-config.xml in our distribution.  We store the
> > database information in the database.xml file.  I have attached the
> > database.xml file.
> >
> > When we use the default merge policy settings, we get the same results.
> >
> >
> >
> > We have not tried to dump the table to a comma separated file.  We think
> > that dumping this size table to disk will introduce other memory problems
> > with big file management. We have not tested that case.
> >
> >
> > On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
> >
> >> Hi,
> >>
> >> Which database are you using? Can you send us data-config.xml?
> >>
> >> What happens when you use default merge policy settings?
> >>
> >> What happens when you dump your table to Comma Separated File and fed
> >> that file to solr?
> >>
> >> Ahmet
> >>
> >> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
> >> candygram.for.mo...@gmail.com> wrote:
> >>
> >> The ramBufferSizeMB was set to 6MB only on the test system to make the
> >> system crash sooner.  In production that tag is commented out which
> >> I believe forces the default value to be used.
> >>
> >>
> >>
> >>
> >> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:
> >>
> >> Hi,
> >> >
> >> >out of curiosity, why did you set ramBufferSizeMB to 6?
> >> >
> >> >Ahmet
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
> >> candygram.for.mo...@gmail.com> wrote:
> >> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
> Exception
> >> >
> >> >*SOLR/Lucene version: *4.2.1*
> >> >
> >> >
> >> >*JVM version:
> >> >
> >> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
> >> >
> >> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
> >> >
> >> >
> >> >
> >> >*Indexer startup command:
> >> >
> >> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
> >> >
> >> >
> >> >
> >> >java " %JVMARGS% ^
> >> >
> >> >-Dcom.sun.management.jmxremote.port=1092 ^
> >> >
> >> >-Dcom.sun.management.jmxremote.ssl=false ^
> >> >
> >> >-Dcom.sun.management.jmxremote.authenticate=false ^
> >> >
> >> >-jar start.jar
> >> >
> >> >
> >> >
> >> >*SOLR indexing HTTP parameters request:
> >> >
> >> >webapp=/solr path=/dataimport
> >> >params={clean=false&command=full-import&wt=javabin&version=2}
> >> >
> >> >
> >> >
> >> >We are getting a Java heap OOM exception when indexing (updating) 27
> >> >million records.  If we increase the Java heap memory settings the
> >> problem
> >> >goes away but we believe the problem has not been fixed and that we
> will
> >> >eventually get the same OOM exception.  We have other processes on the
> >> >server that also require resources so we cannot continually increase
> the
> >> >memory settings to resolve the OOM issue.  We are trying to find a way
> to
> >> >configure the SOLR instance to reduce or preferably eliminate the
> >> >possibility of an OOM exception.
> >> >
> >> >
> >> >
> >> >We can reproduce the problem on a test machine.  We set the Java heap
> >> >memory size to 64MB to accelerate the exception.  If we increase this
> >> >setting the same problems occurs, just hours later.  In the test
> >> >environment, we are using the following parameters:
> >> >
> >> >
> >> >
> >> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
> >> >
> >> >
> >> >
> >> >Normally we use the default solrconfig.xml file with only the following
> >> jar
> >> >file references added:
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >Using these values and trying to index 6 million records from the
> >> database,
> >> >the Java Heap Out of Memory exception is thrown very quickly.
> >> >
> >> >
> >> >
> >> >We were able to complete a successful indexing by further modifying the
> >> >solrconfig.xml and removing all or all but one  tags from
> the
> >> >schema.xml file.
> >> >
> >> >
> >> >
> >> >The following solrconfig.xml values were modified:
> >> >
> >> >
> >> >
> >> >6
> >> >
> >> >
> >> >

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
I might have forgotten to mention that we are using the DataImportHandler.  I
think we know how to remove auto commit.  How would we force a commit at
the end?


On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> We would be happy to try that.  That sounds counter intuitive for the high
> volume of records we have.  Can you help me understand how that might solve
> our problem?
>
>
>
> On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> Can you remove auto commit for bulk import. Commit at the very end?
>>
>> Ahmet
>>
>>
>>
>> On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> In case the attached database.xml file didn't show up, I have pasted in
>> the
>> contents below:
>>
>> 
>> > name="org_only"
>> type="JdbcDataSource"
>> driver="oracle.jdbc.OracleDriver"
>> url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>> user="admin"
>> password="admin"
>> readOnly="false"
>> batchSize="100"
>> />
>> 
>>
>>
>> 
>>
>> 
>> 
>> 
>> > name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>> > />
>> 
>> > />
>> 
>> 
>> > />
>>
>> 
>>
>>
>>
>> 
>> 
>> 
>> 
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> > In this case we are indexing an Oracle database.
>> >
>> > We do not include the data-config.xml in our distribution.  We store the
>> > database information in the database.xml file.  I have attached the
>> > database.xml file.
>> >
>> > When we use the default merge policy settings, we get the same results.
>> >
>> >
>> >
>> > We have not tried to dump the table to a comma separated file.  We think
>> > that dumping this size table to disk will introduce other memory
>> problems
>> > with big file management. We have not tested that case.
>> >
>> >
>> > On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
>> >
>> >> Hi,
>> >>
>> >> Which database are you using? Can you send us data-config.xml?
>> >>
>> >> What happens when you use default merge policy settings?
>> >>
>> >> What happens when you dump your table to Comma Separated File and fed
>> >> that file to solr?
>> >>
>> >> Ahmet
>> >>
>> >> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >>
>> >> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> >> system crash sooner.  In production that tag is commented out which
>> >> I believe forces the default value to be used.
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan 
>> wrote:
>> >>
>> >> Hi,
>> >> >
>> >> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >> >
>> >> >Ahmet
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
>> Exception
>> >> >
>> >> >*SOLR/Lucene version: *4.2.1*
>> >> >
>> >> >
>> >> >*JVM version:
>> >> >
>> >> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >> >
>> >> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >> >
>> >> >
>> >> >
>> >> >*Indexer startup command:
>> >> >
>> >> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >> >
>> >> >
>> >> >
>> >> >java " %JVMARGS% ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >> >
>> >> >-jar start.jar
>> >> >
>> >> >
>> >> >
>> >> >*SOLR indexing HTTP parameters request:
>> >> >
>> >> >webapp=/solr path=/dataimport
>> >> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >> >
>> >> >
>> >> >
>> >> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >> >million records.  If we increase the Java heap memory settings the
>> >> problem
>> >> >goes away but we believe the problem has not been fixed and that we
>> will
>> >> >eventually get the same OOM exception.  We have other processes on the
>> >> >server that also require resources so we cannot continually increase
>> the
>> >> >memory settings to resolve the OOM issue.  We are trying to find a
>> way to
>> >> >configure the SOLR instance to reduce or preferably eliminate the
>> >> >possibility of an OOM exception.
>> >> >
>> >> >
>> >> >
>> >> >We can reproduce the problem on a test machine.  We set the Java heap
>> >> >memory size to 64MB to accelerate the exception.  If we increase this
>> >> >setting the same problems occurs, just hours later.  In the test
>> >> >environment, we are using the following parameters:
>> >> >
>> >> >
>> >> >
>> >> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>> >> >
>> >> >
>> >> >
>> >> >Normally we use the default solrconfig.xml file with only the
>> following
>> >> jar
>> >> >file references added:
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

This may not solve your problem, but it is generally recommended to disable auto
commit and transaction logs for bulk indexing, and to issue one commit at the very
end. Do you have tlogs enabled? I see "commit failed" in the error message, which
is why I am suggesting this.

Regarding comma separated values: with this approach you focus on just the Solr
importing process and separate out the data acquisition phase. Loading even big
CSV files is very fast: http://wiki.apache.org/solr/UpdateCSV
I have never experienced OOM during indexing, so I suspect data acquisition plays
a role in it.
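
If you do try the CSV route, a minimal SolrJ sketch for feeding the dump through
the /update/csv handler could look like this (untested; file name, URL and core
name are placeholders):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvBulkLoad {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
    req.addFile(new File("table_dump.csv"), "text/csv"); // the exported table
    req.setParam("commit", "true");                      // one commit at the end
    solr.request(req);
  }
}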

Ahmet

On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo 
 wrote:

We would be happy to try that.  That sounds counter intuitive for the high 
volume of records we have.  Can you help me understand how that might solve our 
problem?




On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan  wrote:

Hi,
>
>Can you remove auto commit for bulk import. Commit at the very end?
>
>Ahmet
>
>
>
>
>On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo 
> wrote:
>In case the attached database.xml file didn't show up, I have pasted in the
>contents below:
>
>
>name="org_only"
>type="JdbcDataSource"
>driver="oracle.jdbc.OracleDriver"
>url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>user="admin"
>password="admin"
>readOnly="false"
>batchSize="100"
>/>
>
>
>
>
>
>
>
>
>name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>
>
>
>
>
>/>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>candygram.for.mo...@gmail.com> wrote:
>
>> In this case we are indexing an Oracle database.
>>
>> We do not include the data-config.xml in our distribution.  We store the
>> database information in the database.xml file.  I have attached the
>> database.xml file.
>>
>> When we use the default merge policy settings, we get the same results.
>>
>>
>>
>> We have not tried to dump the table to a comma separated file.  We think
>> that dumping this size table to disk will introduce other memory problems
>> with big file management. We have not tested that case.
>>
>>
>> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
>>
>>> Hi,
>>>
>>> Which database are you using? Can you send us data-config.xml?
>>>
>>> What happens when you use default merge policy settings?
>>>
>>> What happens when you dump your table to Comma Separated File and fed
>>> that file to solr?
>>>
>>> Ahmet
>>>
>>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>>> candygram.for.mo...@gmail.com> wrote:
>>>
>>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>>> system crash sooner.  In production that tag is commented out which
>>> I believe forces the default value to be used.
>>>
>>>
>>>
>>>
>>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan  wrote:
>>>
>>> Hi,
>>> >
>>> >out of curiosity, why did you set ramBufferSizeMB to 6?
>>> >
>>> >Ahmet
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>>> candygram.for.mo...@gmail.com> wrote:
>>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception
>>> >
>>> >*SOLR/Lucene version: *4.2.1*
>>> >
>>> >
>>> >*JVM version:
>>> >
>>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>>> >
>>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>>> >
>>> >
>>> >
>>> >*Indexer startup command:
>>> >
>>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>>> >
>>> >
>>> >
>>> >java " %JVMARGS% ^
>>> >
>>> >-Dcom.sun.management.jmxremote.port=1092 ^
>>> >
>>> >-Dcom.sun.management.jmxremote.ssl=false ^
>>> >
>>> >-Dcom.sun.management.jmxremote.authenticate=false ^
>>> >
>>> >-jar start.jar
>>> >
>>> >
>>> >
>>> >*SOLR indexing HTTP parameters request:
>>> >
>>> >webapp=/solr path=/dataimport
>>> >params={clean=false&command=full-import&wt=javabin&version=2}
>>> >
>>> >
>>> >
>>> >We are getting a Java heap OOM exception when indexing (updating) 27
>>> >million records.  If we increase the Java heap memory settings the
>>> problem
>>> >goes away but we believe the problem has not been fixed and that we will
>>> >eventually get the same OOM exception.  We have other processes on the
>>> >server that also require resources so we cannot continually increase the
>>> >memory settings to resolve the OOM issue.  We are trying to find a way to
>>> >configure the SOLR instance to reduce or preferably eliminate the
>>> >possibility of an OOM exception.
>>> >
>>> >
>>> >
>>> >We can reproduce the problem on a test machine.  We set the Java heap
>>> >memory size to 64MB to accelerate the exception.  If we increase this
>>> >setting the same problems occurs, just hours later.  In the test
>>> >environment, we are using the following parameters:
>>> >
>>> >
>>> >
>>> >JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
>>> >
>>> >
>>> >
>>> >Normally we use the default solrconfig.xml file with only the following
>>> jar
>>> >file references added:
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >Using these values 

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Ahmet Arslan
Hi,

To disable auto commit, remove both the <autoCommit> and <autoSoftCommit> 
definitions from solrconfig.xml.

To disable the tlog, remove

   <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
   </updateLog>

from solrconfig.xml.

To commit at the end, use the commit=true parameter: ?commit=true&command=full-import
There is a checkbox for this in the data import admin page.
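
With the DataImportHandler that works out to a request along these lines (host, 
port and core name are assumptions):

http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false&commit=true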



On Saturday, April 5, 2014 1:27 AM, Candygram For Mongo 
 wrote:
I might have forgotten to mention that we are using the DataImportHandler.  I
think we know how to remove auto commit.  How would we force a commit at
the end?


On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> We would be happy to try that.  That sounds counter intuitive for the high
> volume of records we have.  Can you help me understand how that might solve
> our problem?
>
>
>
> On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> Can you remove auto commit for bulk import. Commit at the very end?
>>
>> Ahmet
>>
>>
>>
>> On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> In case the attached database.xml file didn't show up, I have pasted in
>> the
>> contents below:
>>
>> 
>> > name="org_only"
>> type="JdbcDataSource"
>> driver="oracle.jdbc.OracleDriver"
>> url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>> user="admin"
>> password="admin"
>> readOnly="false"
>> batchSize="100"
>> />
>> 
>>
>>
>> 
>>
>> 
>> 
>> 
>> > name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>> > />
>> 
>> > />
>> 
>> 
>> > />
>>
>> 
>>
>>
>>
>> 
>> 
>> 
>> 
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> > In this case we are indexing an Oracle database.
>> >
>> > We do not include the data-config.xml in our distribution.  We store the
>> > database information in the database.xml file.  I have attached the
>> > database.xml file.
>> >
>> > When we use the default merge policy settings, we get the same results.
>> >
>> >
>> >
>> > We have not tried to dump the table to a comma separated file.  We think
>> > that dumping this size table to disk will introduce other memory
>> problems
>> > with big file management. We have not tested that case.
>> >
>> >
>> > On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan  wrote:
>> >
>> >> Hi,
>> >>
>> >> Which database are you using? Can you send us data-config.xml?
>> >>
>> >> What happens when you use default merge policy settings?
>> >>
>> >> What happens when you dump your table to Comma Separated File and fed
>> >> that file to solr?
>> >>
>> >> Ahmet
>> >>
>> >> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >>
>> >> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> >> system crash sooner.  In production that tag is commented out which
>> >> I believe forces the default value to be used.
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan 
>> wrote:
>> >>
>> >> Hi,
>> >> >
>> >> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >> >
>> >> >Ahmet
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> >> candygram.for.mo...@gmail.com> wrote:
>> >> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
>> Exception
>> >> >
>> >> >*SOLR/Lucene version: *4.2.1*
>> >> >
>> >> >
>> >> >*JVM version:
>> >> >
>> >> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >> >
>> >> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >> >
>> >> >
>> >> >
>> >> >*Indexer startup command:
>> >> >
>> >> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >> >
>> >> >
>> >> >
>> >> >java " %JVMARGS% ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.ssl=false ^
>> >> >
>> >> >-Dcom.sun.management.jmxremote.authenticate=false ^
>> >> >
>> >> >-jar start.jar
>> >> >
>> >> >
>> >> >
>> >> >*SOLR indexing HTTP parameters request:
>> >> >
>> >> >webapp=/solr path=/dataimport
>> >> >params={clean=false&command=full-import&wt=javabin&version=2}
>> >> >
>> >> >
>> >> >
>> >> >We are getting a Java heap OOM exception when indexing (updating) 27
>> >> >million records.  If we increase the Java heap memory settings the
>> >> problem
>> >> >goes away but we believe the problem has not been fixed and that we
>> will
>> >> >eventually get the same OOM exception.  We have other processes on the
>> >> >server that also require resources so we cannot continually increase
>> the
>> >> >memory settings to resolve the OOM issue.  We are trying to find a
>> way to
>> >> >configure the SOLR instance to reduce or preferably eliminate the
>> >> >possibility of an OOM exception.
>> >> >
>> >> >
>> >> >
>> >> >We can reproduce the problem on a test machine.  We set the Java heap
>> >> >memory size to 64MB to accelerate the exception.  If we increase this
>> >> >setting the same problems occurs, just hours later.  In the test
>> >> >en

Re: Searching multivalue fields.

2014-04-04 Thread Vijay Kokatnur
I had already tested with omitTermFreqAndPositions="false".  I still got
the same error.

Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan  wrote:

> Hi Vijay,
>
> Add omitTermFreqAndPositions="false"  attribute to fieldType definitions.
>
> <fieldType ... omitTermFreqAndPositions="false" sortMissingLast="true" />
>
> <fieldType ... omitTermFreqAndPositions="false" precisionStep="0"
> positionIncrementGap="0"/>
>
> You don't need termVectors  for this.
>
>1.2: omitTermFreqAndPositions attribute introduced, true by default
> except for text fields.
>
> And please reply to the solr-user mail, so others can use the thread later on.
>
> Ahmet
>   On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur <
> kokatnur.vi...@gmail.com> wrote:
>   Hey Ahmet,
>
> Sorry it took some time to test this.  But the schema definition seems to
> conflict with SpanQuery.  I get the following error when I use Spans
>
>  field "OrderLineType" was indexed without position data; cannot run
> SpanTermQuery (term=11)
>
> I changed field definition in the schema but can't find the right
> attribute to set this.  My last attempt was with following definition
>
> <field ... multiValued="true" *termVectors="true" termPositions="true"
> termOffsets="true"*/>
>
>  Any ideas what I am doing wrong?
>
> Thanks,
> -Vijay
>
> On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan  wrote:
>
> Hi Vijay,
>
> After reading the documentation it seems that following query is what you
> are after. It will return OrderId:345 without matching OrderId:123
>
> SpanQuery q1  = new SpanTermQuery(new Term("BookingRecordId", "234"));
> SpanQuery q2  = new SpanTermQuery(new Term("OrderLineType", "11"));
> SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
> Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
>
> Ahmet
>
>
>
> On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan 
> wrote:
> Hi Vijay,
>
> I personally don't understand joins very well. Just a guess may
> be FieldMaskingSpanQuery could be used?
>
>
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
>
>
> Ahmet
>
>
>
>
> On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur <
> kokatnur.vi...@gmail.com> wrote:
> Hi,
>
> I am bumping this thread again one last time to see if anyone has a
> solution.
>
> In it's current state, our application is storing child items as multivalue
> fields.  Consider some orders, for example -
>
>
> {
> OrderId:123
> BookingRecordId : ["145", "987", "*234*"]
> OrderLineType : ["11", "12", "*13*"]
> .
> }
> {
> OrderId:345
> BookingRecordId : ["945", "882", "*234*"]
> OrderLineType : ["1", "12", "*11*"]
> .
> }
> {
> OrderId:678
> BookingRecordId : ["444"]
> OrderLineType : ["11"]
> .
> }
>
>
> Here, If you look up for an Order with BookingRecordId: 234 And
> OrderLineType:11.  You will get two orders with orderId : 123 and 345,
> which is correct.  You have two arrays in both the orders that satisfy this
> condition.
>
> However, for OrderId:123, the value at 3rd index of OrderLineType array is
> 13 and not 11( this is for OrderId:345).  So orderId 123 should be
> excluded. This is what I am trying to achieve.
>
> I got some suggestions from a solr-user to use FieldsCollapsing, Join,
> Block-join or string concatenation.  None of these approaches can be used
> without re-indexing schema.
>
> Has anyone found a non-invasive solution for this?
>
> Thanks,
>
> -Vijay
>
>
>
>
>


Re: Difference between ["" TO *] and [* TO *] at Solr?

2014-04-04 Thread Erick Erickson
What kind of field are you using? Not quite sure what would happen
with a date or numeric field for instance.


On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI  wrote:
> Hİ;
>
> What is the difference between ["" TO *] and [* TO *] at Solr? (I tested it
> at 4.5.1 and the numFound values are different.)
>
> Thanks;
> Furkan KAMACI


Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
Guessing that the attachments won't work, I am pasting one file in each of
four separate emails.

database.xml



On Fri, Apr 4, 2014 at 4:57 PM, Candygram For Mongo <
candygram.for.mo...@gmail.com> wrote:

> Does this user list allow attachments?  I have four files attached
> (database.xml, error.txt, schema.xml, solrconfig.xml).  We just ran the
> process again using the parameters you suggested, but not to a csv file.
>  It errored out quickly.  We are working on the csv file run.
>
> Removed both <autoCommit> and <autoSoftCommit> parts/definitions from
> solrconfig.xml
>
> Disabled tlog by removing
>
> <updateLog>
>   <str name="dir">${solr.ulog.dir:}</str>
> </updateLog>
>
> from solrconfig.xml
>
> Used commit=true parameter. ?commit=true&command=full-import
>
>
> On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan  wrote:
>
>> Hi,
>>
>> This may not solve your problem but generally it is recommended to
>> disable auto commit and transaction logs for bulk indexing.
>> And issue one commit at the very end. Do you tlogs enabled? I see "commit
>> failed" in the error message thats why I am offering this.
>>
>> And regarding comma separated values, with this approach you focus on
>> just solr importing process. You separate data acquisition phrase. And it
>> is very fast load even big csv files
>> http://wiki.apache.org/solr/UpdateCSV
>> I have never experienced OOM during indexing, I suspect data acquisition
>> has role in it.
>>
>> Ahmet
>>
>> On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>>
>> We would be happy to try that.  That sounds counter intuitive for the
>> high volume of records we have.  Can you help me understand how that might
>> solve our problem?
>>
>>
>>
>>
>> On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan  wrote:
>>
>> Hi,
>> >
>> >Can you remove auto commit for bulk import. Commit at the very end?
>> >
>> >Ahmet
>> >
>> >
>> >
>> >
>> >On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <
>> candygram.for.mo...@gmail.com> wrote:
>> >In case the attached database.xml file didn't show up, I have pasted in
>> the
>> >contents below:
>> >
>> >
>> >> >name="org_only"
>> >type="JdbcDataSource"
>> >driver="oracle.jdbc.OracleDriver"
>> >url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
>> >user="admin"
>> >password="admin"
>> >readOnly="false"
>> >batchSize="100"
>> >/>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >> >name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
>> >> name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
>> >> />
>> >> />
>> >
>> >
>> >> name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc"
>> >/>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <
>> >candygram.for.mo...@gmail.com> wrote:
>> >
>> >> In this case we are indexing an Oracle database.
>> >>
>> >> We do not include the data-config.xml in our distribution.  We store
>> the
>> >> database information in the database.xml file.  I have attached the
>> >> database.xml file.
>> >>
>> >> When we use the default merge policy settings, we get the same results.
>> >>
>> >>
>> >>
>> >> We have not tried to dump the table to a comma separated file.  We
>> think
>> >> that dumping this size table to disk will introduce other memory
>> problems
>> >> with big file management. We have not tested that case.
>> >>
>> >>
>> >> On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan 
>> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Which database are you using? Can you send us data-config.xml?
>> >>>
>> >>> What happens when you use default merge policy settings?
>> >>>
>> >>> What happens when you dump your table to Comma Separated File and fed
>> >>> that file to solr?
>> >>>
>> >>> Ahmet
>> >>>
>> >>> On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <
>> >>> candygram.for.mo...@gmail.com> wrote:
>> >>>
>> >>> The ramBufferSizeMB was set to 6MB only on the test system to make the
>> >>> system crash sooner.  In production that tag is commented out which
>> >>> I believe forces the default value to be used.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan 
>> wrote:
>> >>>
>> >>> Hi,
>> >>> >
>> >>> >out of curiosity, why did you set ramBufferSizeMB to 6?
>> >>> >
>> >>> >Ahmet
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <
>> >>> candygram.for.mo...@gmail.com> wrote:
>> >>> >*Main issue: Full Indexing is Causing a Java Heap Out of Memory
>> Exception
>> >>> >
>> >>> >*SOLR/Lucene version: *4.2.1*
>> >>> >
>> >>> >
>> >>> >*JVM version:
>> >>> >
>> >>> >Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
>> >>> >
>> >>> >Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>> >>> >
>> >>> >
>> >>> >
>> >>> >*Indexer startup command:
>> >>> >
>> >>> >set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m
>> >>> >
>> >>> >
>> >>> >
>> >>> >java " %JVMARGS% ^
>> >>> >
>> >>> >-Dcom.sun.management.jmxremote.port=1092 ^
>> >>> >
>> >>> >-Dcom.sun.management.jmxremote.ssl=false ^

Re: Full Indexing is Causing a Java Heap Out of Memory Exception

2014-04-04 Thread Candygram For Mongo
error.txt below


Java Platform Detected x64
Java Platform Detected -XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m
 -XX:+HeapDumpOnOutOfMemoryError -XX:+CreateMinidumpOnCrash
2014-04-04 15:49:43.341:INFO:oejs.Server:jetty-8.1.8.v20121106
2014-04-04 15:49:43.353:INFO:oejdp.ScanningAppProvider:Deployment monitor
D:\AbcData\V12\application server\server\indexer\example\contexts at
interval 0
2014-04-04 15:49:43.358:INFO:oejd.DeploymentManager:Deployable added:
D:\AbcData\V12\application
server\server\indexer\example\contexts\solr-jetty-context.xml
2014-04-04 15:49:43.989:INFO:oejw.StandardDescriptorProcessor:NO JSP
Support for /solr, did not find org.apache.jasper.servlet.JspServlet
Null identity service, trying login service: null
Finding identity service: null
2014-04-04 15:49:44.011:INFO:oejsh.ContextHandler:started
o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application
server\server\indexer\example/webapps/solr.war
2014-04-04 15:49:44.012:INFO:oejsh.ContextHandler:started
o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application
server\server\indexer\example/webapps/solr.war
Apr 04, 2014 3:49:44 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or
JNDI)
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: D:\AbcData\V12\application
server\server\indexer\example\solr\solr.xml
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer 
INFO: New CoreContainer 1879341237
Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer load
INFO: Loading CoreContainer using Solr Home: 'solr/'
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader 
INFO: new SolrResourceLoader for directory: 'solr/'
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/apache-log4j-extras-1.2.17.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/jtds-1.2.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/log4j-1.2.17.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msbase.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/mssqlserver.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msutil.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/ojdbc6.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-api-1.7.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-nop-1.7.5.jar'
to classloader
Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding
'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/sqljdbc4.jar'
to classloader
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting urlScheme to: http://
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting connTimeout to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxConnectionsPerHost to: 20
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting corePoolSize to: 0
Apr 04, 2014 3:49:44 PM
org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maximumPoolSize to: 2147483647
Apr 04, 2014 3:49:44 PM
org.apache.solr.h

Re: Distributed tracing for Solr via adding HTTP headers?

2014-04-04 Thread Alexandre Rafalovitch
I like the idea. No comments about implementation, leave it to others.

But if it is done, maybe somebody very familiar with logging can also
review Solr's current logging config. I suspect it is not optimized
for troubleshooting at this point.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 3:16 AM, Gregg Donovan  wrote:
> We have some metadata -- e.g. a request UUID -- that we log to every log
> line using Log4J's MDC [1]. The UUID logging allows us to connect any log
> lines we have for a given request across servers. Sort of like Zipkin [2].
>
> Currently we're using EmbeddedSolrServer without sharding, so adding the
> UUID is fairly simple, since everything is in one process and one thread.
> But, we're testing a sharded HTTP implementation and running into some
> difficulties getting this data passed around in a way that lets us trace
> all log lines generated by a request to its UUID.
>


Re: SOLR Jetty Server on Windows 2003

2014-04-04 Thread Alexandre Rafalovitch
You might be hitting
http://en.wikipedia.org/wiki/Cross-origin_resource_sharing .

Something like http://www.telerik.com/fiddler or Wireshark may allow
you to see network traffic if you don't have other means.
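
If it does turn out to be CORS, one quick workaround on the Solr side is JSONP via 
the JSON response writer's json.wrf parameter (a sketch; host, port, core and 
callback name are assumptions):

http://yourserver:8983/solr/collection1/select?q=*:*&wt=json&json.wrf=handleResponse

The browser can then pull that URL in with a script tag instead of XMLHttpRequest, 
which sidesteps the same-origin restriction.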

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 12:49 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) 
wrote:
> Hi, I am trying to install Solr on Windows 2003 with the Jetty server. From the
> browser everything works, but when I try to access it from JavaScript code on
> another machine I get no response. I am using XMLHttpRequest to get the
> response from the server in JavaScript.
>
> Any Help...?
>
>
> --Ravi


Re: How to reduce the search speed of solrcloud

2014-04-04 Thread Alexandre Rafalovitch
And 50 million records of 3 fields each should not become 50 GB of
data. Something smells wrong there. Do you have unique IDs set up?
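
By unique IDs I mean a uniqueKey in schema.xml, so re-imports overwrite existing 
documents instead of piling up duplicates. A minimal sketch (the field name is 
just an example):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<uniqueKey>id</uniqueKey>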

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, Apr 5, 2014 at 12:48 AM, Anshum Gupta  wrote:
> I am not sure if you setup your SolrCloud right. Can you also provide me
> with the version of Solr that you're running.
> Also, if you could tell me about how did you setup your SolrCloud cluster.
> Are the times consistent? Is this the only collection on the cluster?
>
> Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm
> wrong, but if I'm not, you don't need that kind of a zk setup.
>
>
> On Fri, Apr 4, 2014 at 9:39 AM, Sathya  wrote:
>
>> Hi shawn,
>>
>> I have indexed 50 million documents across 5 servers. 3 servers have 8 GB of RAM,
>> one has 24 GB and another one has 64 GB. I allocated 4 GB of RAM to Solr on each
>> machine. I am using SolrCloud. My total index size is 50 GB including all 5
>> servers. Each server has 3 ZooKeepers. I still haven't checked the OS disk
>> cache and heap memory; I will check and let you know, Shawn. If anything else is
>> needed, please let me know.
>>
>> Thank you, Shawn.
>>
>> On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] <
>> ml-node+s472066n4129150...@n3.nabble.com> wrote:
>> > On 4/4/2014 1:31 AM, Sathya wrote:
>> >> Hi All,
>> >>
>> >> Hi All, I am new to Solr. And i dont know how to increase the search
>> speed
>> >> of solrcloud. I have indexed nearly 4 GB of data. When i am searching a
>> >> document using java with solrj, solr takes more 6 seconds to return a
>> query
>> >> result. Any one please help me to reduce the search query time to less
>> than
>> >> 500 ms. i have allocate the 4 GB ram for solr. Please let me know for
>> >> further details about solrcloud config.
>> >
>> > How much total RAM do you have on the system, and how much total index
>> > data is on that system (adding up all the Solr cores)?  You've already
>> > said that you have allocated 4GB of RAM for Solr.
>> >
>> > Later you said you had 50 million documents, and then you showed us a
>> > URL that looks like SolrCloud.
>> >
>> > I suspect that you don't have enough RAM left over to cache your index
>> > effectively -- the OS Disk Cache is too small.
>> >
>> > http://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Another possible problem, also discussed on that page, is that your Java
>> > heap is too small.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>> >
>> > 
>> > If you reply to this email, your message will be added to the discussion
>> below:
>> >
>>
>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129150.html
>> > To unsubscribe from How to reduce the search speed of solrcloud, click
>> here.
>> > NAML
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
>
> Anshum Gupta
> http://www.anshumgupta.net


Re: Solr Search For Documents That Has Empty Content For a Given Particular Field

2014-04-04 Thread Alexandre Rafalovitch
And one solution is to use an UpdateRequestProcessor that creates a separate
boolean field for presence/absence at index time and to query on that instead;
a sketch follows below.
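
A rough sketch of that idea, assuming Solr 4.x's StatelessScriptUpdateProcessorFactory
and a boolean has_field field declared in the schema (the chain, script and field
names are all made up for illustration). In solrconfig.xml:

<updateRequestProcessorChain name="flag-empty">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">flag-empty.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

And flag-empty.js next to it in conf/:

function processAdd(cmd) {
  var doc = cmd.solrDoc;                      // the incoming SolrInputDocument
  var v = doc.getFieldValue("field");         // the field being checked for emptiness
  doc.setField("has_field", v != null && String(v).length > 0);
}
function processDelete(cmd) { }
function processMergeIndexes(cmd) { }
function processCommit(cmd) { }
function processRollback(cmd) { }
function finish() { }

Register the chain on the update handler (update.chain=flag-empty), and
q=has_field:false then returns the documents that were indexed with an empty value.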

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 4, 2014 at 11:13 PM, Chris Hostetter
 wrote:
>
> : "field" : "" // this is the field that I want to learn which document has
> : it.
>
> How you (can) query for a field value like that is going to depend
> entirely on the FieldType/Analyzer ... if it's a string field, or uses
> KeywordTokenizer, then q=field:"" should find it -- if you use a more
> traditional analyzer then it probably didn't produce any terms for the
> input "" and from Solr's perspective a document that was indexed using
> an empty string value is exactly the same as a document that had no value
> when indexed.
>
> In essence, your question is equivalent to asking "How can I search for
> doc1, but not doc2, even though I'm using LowerCaseAnalyzer which produces
> exactly the same indexed terms for both..."
>
>doc1: Quick Fox
>doc2: quick fox
>
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Cannot run program "svnversion" when building lucene 4.7.1

2014-04-04 Thread Chris Hostetter

: > I am trying to build lucene 4.7.1 from the sources. I can compile without
: > any issues but when I try to build the dist, lucene gives me
: > Cannot run program "svnversion" ... The system cannot find the specified
: > file.
: >
: > I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit.

That's ... strange.

the build system *attempts* to include the svnversion info in the build 
artifacts, but it is explicitly designed to not fail if svnversion can't 
be run.

Can you please file a bug, note in the description your specific OS 
setup, and include as an attachment the full build logs you get from ant 
that give you this error?  Ideally run ant using the "-v" option.


Worst-case scenario: you should be able to override the "svnversion.exe" 
build property to some simple command that doesn't output much (not 
sure what a good command to use on Windows might be - I would use 
something like "whoami" on Linux if I didn't have svn installed). 

the command would be something like this...

  ant -Dsvnversion.exe=whoami dist
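
(On Windows 7 both "whoami" and "hostname" ship as executables, so the same kind 
of override should work there too, e.g. ant -Dsvnversion.exe=hostname dist.)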



-Hoss
http://www.lucidworks.com/


Re: Difference between ["" TO *] and [* TO *] at Solr?

2014-04-04 Thread Jack Krupansky
And we can debate what it should or shouldn't be (and just check the 
code!) - and a clear contract is quite desirable - but this is starting to 
smell like an XY Problem: what is the user really trying to query, stated 
simply in English?


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Friday, April 4, 2014 5:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Difference between ["" TO *] and [* TO *] at Solr?

What kind of field are you using? Not quite sure what would happen
with a date or numeric field for instance.


On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI  
wrote:

Hİ;

What is the difference between ["" TO *] and [* TO *] at Solr? (I tested it
at 4.5.1 and the numFound values are different.)

Thanks;
Furkan KAMACI