Re: Solr maximum Optimal Index Size per Shard

2014-06-04 Thread Shawn Heisey
On 6/4/2014 12:45 AM, Vineet Mishra wrote:
> Thanks all for your response.
> I presume this conversation concludes that indexing around 1 billion
> documents per shard won't be a problem. As I have 10 billion docs to index,
> approx. 10 shards with 1 billion each should be fine. And how about memory:
> what size of RAM would be appropriate for this amount of data?

Figure out the memory requirements of the operating system and every
program on the machine (Solr's Java heap especially).  Then add that
number to the total size of the index data on the machine.  That is the
ideal minimum RAM.
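
For example, with made-up numbers: a machine running a 6GB Solr heap plus
roughly 1GB for the OS and other programs, holding 100GB of index data,
would ideally have around 107GB of RAM.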

http://wiki.apache.org/solr/SolrPerformanceProblems

Unfortunately, if you are dealing with a huge index with billions of
documents, it is likely to be prohibitively expensive to buy that much
RAM.  If you are running Solr on Amazon's cloud, the cost for that much
RAM would be astronomical.

Exactly how much RAM would actually be required is very difficult to
predict.  If you had only 25% of the ideal, your index might have
perfectly acceptable performance, or it might not.  It might do fine
under a light query load, but if you increase to 50 queries per second,
performance may drop significantly ... or it might be good.  It's
generally not possible to know how your hardware will perform until you
actually build and use your index.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

A general rule of thumb for RAM that I have found to be useful is that
if you've got less than half of the ideal memory size, you might have
performance problems.

> Moreover, what should be the indexing technique for this huge data set?
> Currently I am indexing with EmbeddedSolrServer but it's going pathetically
> slow after some 20 GB of indexing. Comparatively, SolrHttpPost was slow due
> to network delays and response, but after this long-running indexing
> with EmbeddedSolrServer I am getting a different notion.
> Any good indexing technique for this huge dataset would be highly
> appreciated.

EmbeddedSolrServer is not recommended.  Run Solr in the traditional way
with HTTP connectivity.  HTTP overhead on a LAN is usually quite small.
Solr is fully thread-safe, so you can have several indexing threads all
going at the same time.

Indexes at this scale should normally be built with SolrCloud, with
enough servers so that each machine is only handling one shard replica.
The ideal indexing program would be written in Java, using CloudSolrServer.
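
A minimal sketch of such an indexing program (SolrJ 4.x; the ZooKeeper
address, collection name, field names, and thread/document counts below are
placeholders, not part of the original advice):

import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // CloudSolrServer is thread-safe, so one instance is shared by all threads
    final CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    ExecutorService pool = Executors.newFixedThreadPool(8); // several indexing threads
    for (int t = 0; t < 8; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            for (int i = 0; i < 100000; i++) { // each thread sends its own documents
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", UUID.randomUUID().toString());
              doc.addField("title_t", "example document " + i);
              server.add(doc);
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.commit();   // one commit at the end instead of per document
    server.shutdown();
  }
}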

Thanks,
Shawn



unexpected result with custom filter

2014-06-04 Thread Aman Tandon
Hi,

I am new to Solr and I am trying to create a custom filter. To create
the filter I just copied the LowerCaseFilter and made my changes in
incrementToken(). To make sure that my changes are being applied
properly, I am also printing some debugging info in the log.

public final boolean incrementToken() throws IOException {
  if (this.input.incrementToken()) {
    // buffer() returns the attribute's raw backing char array
    char[] inputChar = this.termAtt.buffer();
    System.err.println("Token is " + new String(inputChar));
    return true;
  }
  return false;
}

e.g. field value: spellcheck for bags

received: spellcheck for bags, as expected.

But when I convert it into a string, I get unexpected results in the
logs:

Token is spellcheck
Token is forllcheck
Token is bagslcheck
Token is spellcheck
Token is forllcheck
Token is bagslcheck

Please help me here.


With Regards
Aman Tandon


Re: Integrate solr with openNLP

2014-06-04 Thread Tommaso Teofili
Hi all,

Ahmet was suggesting to eventually use the UIMA integration because OpenNLP
already has an integration with Apache UIMA, so you would just have to
use that [1].
And that's one of the main reasons the UIMA integration was done: it's a
framework that you can easily hook into in order to plug in your NLP
algorithms.

If you want to use just OpenNLP, then it's up to you whether to write your
own UpdateRequestProcessor plugin [2] that adds metadata extracted by OpenNLP
to your documents, or to write a dedicated analyzer / tokenizer / token
filter.
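
A rough sketch of the UpdateRequestProcessor route (assuming the OpenNLP 1.5
NameFinderME API; the field names "content" and "person_name" are made up
for illustration):

import java.io.IOException;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.util.Span;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class NerUpdateProcessor extends UpdateRequestProcessor {
  private final NameFinderME personFinder; // built from e.g. en-ner-person.bin

  public NerUpdateProcessor(NameFinderME personFinder, UpdateRequestProcessor next) {
    super(next);
    this.personFinder = personFinder; // note: NameFinderME is not thread-safe
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String content = (String) doc.getFieldValue("content");
    if (content != null) {
      String[] tokens = content.split("\\s+"); // naive whitespace tokenization
      for (Span span : personFinder.find(tokens)) {
        StringBuilder name = new StringBuilder();
        for (int i = span.getStart(); i < span.getEnd(); i++) {
          if (name.length() > 0) name.append(' ');
          name.append(tokens[i]);
        }
        doc.addField("person_name", name.toString()); // the 'enrichment' field
      }
    }
    super.processAdd(cmd); // pass the enriched document down the chain
  }
}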

For the OpenNLP integration (LUCENE-2899), the patch is not up to date with
the latest APIs in trunk; however, you should be able to apply it (if I
recall correctly) to the 4.4 version or so, and adapting it to the latest
API shouldn't be too hard.

Regards,
Tommaso

[1] :
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor



2014-06-03 15:34 GMT+02:00 Ahmet Arslan :

> Can you extract names, locations, etc. using OpenNLP in a plain/straight
> Java program?
>
> If yes, here are two separate options:
>
> 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example
> to integrate your NER code into it and write your own indexing code. You
> have the full power here. No solr-plugins are involved.
>
> 2) Use 'Implementing a conditional copyField' given here :
> http://wiki.apache.org/solr/UpdateRequestProcessor
> as an example and integrate your NER code into it.
>
>
> Please note that these are separate ways to enrich your incoming
> documents, choose either (1) or (2).
>
>
>
> On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
> wrote:
> Okay, but I didn't understand what you said. Can you please elaborate?
>
> Thanks,
> Vivek
>
>
>
>
>
> On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan  wrote:
>
> > Hi Vivekanand,
> >
> > I have never used UIMA+Solr before.
> >
> > Personally I think it takes more time to learn how to configure/use all
> > this UIMA stuff.
> >
> >
> > If you are familiar with Java, write a class that extends
> > UpdateRequestProcessor(Factory). Use OpenNLP for NER, and add the new
> > fields (organisation, city, person name, etc.) to your document. This
> > phase is usually called 'enrichment'.
> >
> > Does that make sense?
> >
> >
> >
> > On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <
> vi...@biginfolabs.com>
> > wrote:
> > Hi Ahmet,
> >
> > I followed what you said:
> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But
> > how can I achieve my goal? I mean extracting only the name of the
> > organization or person from the content field.
> >
> > I guess I'm almost there but something is missing? Please guide me.
> >
> > Thanks,
> > Vivek
> >
> >
> >
> >
> >
> > On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi  >
> > wrote:
> >
> > > The entire goal can't be stated, but one of the tasks is like this: we
> > > have a big document (can be a website or PDF etc.) indexed into Solr.
> > > Let's say  will store the contents of the document.
> > > All I want to do is pick names of persons and places from it using
> > > OpenNLP or some other means.
> > >
> > > Those names should be reflected in Solr itself.
> > >
> > > Thanks,
> > > Vivek
> > >
> > >
> > > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan 
> wrote:
> > >
> > >> Hi,
> > >>
> > >> Please tell us what you are trying to do, in a new thread. Your high-level
> > >> goal. There may be some other ways/tools such as (
> > >> https://stanbol.apache.org ) other than OpenNLP.
> > >>
> > >>
> > >>
> > >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
> > >> vi...@biginfolabs.com> wrote:
> > >>
> > >>
> > >>
> > >> We'll surely look into UIMA integration.
> > >>
> > >> But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP )
> > >> the only link we've got for the integration? Isn't there any other
> > >> article or link which may help us to fix this problem?
> > >>
> > >> Thanks,
> > >> Vivek
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan 
> wrote:
> > >>
> > >> Hi,
> > >> >
> > >> >I believe I answered it. Let me re-try,
> > >> >
> > >> >There is no committed code for OpenNLP. There is an open ticket with
> > >> patches. They may not work with current trunk.
> > >> >
> > >> >Confluence is the official documentation. Wiki is maintained by
> > >> community. Meaning wiki can talk about some uncommitted
> features/stuff.
> > >> Like this one : https://wiki.apache.org/solr/OpenNLP
> > >> >
> > >> >What I am suggesting is, have a look at
> > >> https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > >> >
> > >> >
> > >> >And search how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is
> > already
> > >> doable with solr-uima. I am adding Tommaso (sorry for this but we need
> > an
> > >> authoritative answer here) to clarify this.
> > >> >
> > >> >
> > >> >Also consider indexing with SolrJ and use OpenNLP enrichment outside
> > the
> > 

Re: unexpected result with custom filter

2014-06-04 Thread Ahmet Arslan
Hi Aman,

What you see is normal. If you want to convert it to a string use 

this.termAttribute.toString();

Please see source code of org.apache.lucene.analysis.br.BrazilianStemFilter for 
an example.
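
For reference, a minimal corrected version of the incrementToken() quoted
below; buffer() returns the attribute's raw backing array, which can be
longer than the current token, while toString() copies exactly
termAtt.length() characters:

public final boolean incrementToken() throws IOException {
  if (this.input.incrementToken()) {
    System.err.println("Token is " + this.termAtt.toString());
    return true;
  }
  return false;
}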


Ahmet



On Wednesday, June 4, 2014 10:21 AM, Aman Tandon  
wrote:
Hi,

I am new to Solr and I am trying to create a custom filter. To create
the filter I just copied the LowerCaseFilter and made my changes in
incrementToken(). To make sure that my changes are being applied
properly, I am also printing some debugging info in the log.

public final boolean incrementToken() throws IOException {
  if (this.input.incrementToken()) {
    // buffer() returns the attribute's raw backing char array
    char[] inputChar = this.termAtt.buffer();
    System.err.println("Token is " + new String(inputChar));
    return true;
  }
  return false;
}

e.g. field value: spellcheck for bags

received: spellcheck for bags, as expected.

But when I convert it into a string, I get unexpected results in the
logs:

Token is spellcheck
Token is forllcheck
Token is bagslcheck
Token is spellcheck
Token is forllcheck
Token is bagslcheck

Please help me here.


With Regards
Aman Tandon



Use a field with space in qf

2014-06-04 Thread devraj.jaiman
Hi,

A long time ago I defined a field in the schema with a space in its name
(e.g. 'Movie Name'). Things were going very well until I needed to use the
edismax query parser and wanted to give 'Movie Name' in qf. But as we all
know, qf treats space as a field delimiter. I tried 'Movie\ Name' and
'Movie\+Name'; nothing is working.

So what should I do? Is there any way to use a field with a space, or do I
have to re-index all the data into a new field without a space?

Thanks in advance.

Regards,
Devraj



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-a-field-with-space-in-qf-tp4139768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler while Replication

2014-06-04 Thread rulinma
good.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-while-Replication-tp4138763p4139774.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Integrate solr with openNLP

2014-06-04 Thread Vivekanand Ittigi
Hi Tommaso,

Yes, you are right, the 4.4 version works. I'm able to compile now. I'm
trying to apply the named-entity recognition (person name) token filter, but
I'm not seeing any change. My schema.xml looks like this:




  



  



Please guide me?

Thanks,
Vivek


On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili 
wrote:

> Hi all,
>
> Ahmet was suggesting to eventually use the UIMA integration because OpenNLP
> already has an integration with Apache UIMA, so you would just have to
> use that [1].
> And that's one of the main reasons the UIMA integration was done: it's a
> framework that you can easily hook into in order to plug in your NLP
> algorithms.
>
> If you want to use just OpenNLP, then it's up to you whether to write your
> own UpdateRequestProcessor plugin [2] that adds metadata extracted by OpenNLP
> to your documents, or to write a dedicated analyzer / tokenizer / token
> filter.
>
> For the OpenNLP integration (LUCENE-2899), the patch is not up to date
> with the latest APIs in trunk; however, you should be able to apply it
> (if I recall correctly) to the 4.4 version or so, and adapting it to the
> latest API shouldn't be too hard.
>
> Regards,
> Tommaso
>
> [1] :
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
> [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
>
>
>
> 2014-06-03 15:34 GMT+02:00 Ahmet Arslan :
>
> Can you extract names, locations, etc. using OpenNLP in a plain/straight
>> Java program?
>>
>> If yes, here are two separate options:
>>
>> 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
>> example to integrate your NER code into it and write your own indexing
>> code. You have the full power here. No solr-plugins are involved.
>>
>> 2) Use 'Implementing a conditional copyField' given here :
>> http://wiki.apache.org/solr/UpdateRequestProcessor
>> as an example and integrate your NER code into it.
>>
>>
>> Please note that these are separate ways to enrich your incoming
>> documents, choose either (1) or (2).
>>
>>
>>
>> On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi <
>> vi...@biginfolabs.com> wrote:
>> Okay, but I didn't understand what you said. Can you please elaborate?
>>
>> Thanks,
>> Vivek
>>
>>
>>
>>
>>
>> On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan  wrote:
>>
>> > Hi Vivekanand,
>> >
>> > I have never used UIMA+Solr before.
>> >
>> > Personally I think it takes more time to learn how to configure/use all
>> > this UIMA stuff.
>> >
>> >
>> > If you are familiar with Java, write a class that extends
>> > UpdateRequestProcessor(Factory). Use OpenNLP for NER, and add the new
>> > fields (organisation, city, person name, etc.) to your document. This
>> > phase is usually called 'enrichment'.
>> >
>> > Does that make sense?
>> >
>> >
>> >
>> > On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <
>> vi...@biginfolabs.com>
>> > wrote:
>> > Hi Ahmet,
>> >
>> > I followed what you said:
>> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But
>> > how can I achieve my goal? I mean extracting only the name of the
>> > organization or person from the content field.
>> >
>> > I guess I'm almost there but something is missing? Please guide me.
>> >
>> > Thanks,
>> > Vivek
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi <
>> vi...@biginfolabs.com>
>> > wrote:
>> >
>> > > The entire goal can't be stated, but one of the tasks is like this: we
>> > > have a big document (can be a website or PDF etc.) indexed into Solr.
>> > > Let's say  will store the contents of the document.
>> > > All I want to do is pick names of persons and places from it using
>> > > OpenNLP or some other means.
>> > >
>> > > Those names should be reflected in Solr itself.
>> > >
>> > > Thanks,
>> > > Vivek
>> > >
>> > >
>> > > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan 
>> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> Please tell us what you are trying to do, in a new thread. Your high-level
>> > >> goal. There may be some other ways/tools such as (
>> > >> https://stanbol.apache.org ) other than OpenNLP.
>> > >>
>> > >>
>> > >>
>> > >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
>> > >> vi...@biginfolabs.com> wrote:
>> > >>
>> > >>
>> > >>
>> > >> We'll surely look into UIMA integration.
>> > >>
>> > >> But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP )
>> > >> the only link we've got for the integration? Isn't there any other
>> > >> article or link which may help us to fix this problem?
>> > >>
>> > >> Thanks,
>> > >> Vivek
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan 
>> wrote:
>> > >>
>> > >> Hi,
>> > >> >
>> > >> >I believe I answered it. Let me re-try,
>> > >> >
>> > >> >There is no committed code for OpenNLP. There is an open ticket with
>> > >> patches. They may not work with current trunk.
>> > >> >
>> > >> >Confluence is the official documentation. Wiki is maintained by
>> > >> community. Meaning wiki can tal

Re: unexpected result with custom filter

2014-06-04 Thread Aman Tandon
Thanks Ahmet, that worked.

Can anybody advise me on how I should start developing and learning the
Solr internals, so that I can make these custom Solr developments
efficiently, with a proper understanding of all these classes?




With Regards
Aman Tandon


On Wed, Jun 4, 2014 at 1:30 PM, Ahmet Arslan  wrote:

> Hi Aman,
>
> What you see is normal. If you want to convert it to a string use
>
> this.termAttribute.toString();
>
> Please see source code of
> org.apache.lucene.analysis.br.BrazilianStemFilter for an example.
>
>
> Ahmet
>
>
>
> On Wednesday, June 4, 2014 10:21 AM, Aman Tandon 
> wrote:
> Hi,
>
> I am new to Solr and I am trying to create a custom filter. To create
> the filter I just copied the LowerCaseFilter and made my changes in
> incrementToken(). To make sure that my changes are being applied
> properly, I am also printing some debugging info in the log.
>
> public final boolean incrementToken() throws IOException {
>   if (this.input.incrementToken()) {
>     // buffer() returns the attribute's raw backing char array
>     char[] inputChar = this.termAtt.buffer();
>     System.err.println("Token is " + new String(inputChar));
>     return true;
>   }
>   return false;
> }
>
> e.g. field value: spellcheck for bags
>
> received: spellcheck for bags, as expected.
>
> But when I convert it into a string, I get unexpected results in the
> logs:
>
> Token is spellcheck
> Token is forllcheck
> Token is bagslcheck
> Token is spellcheck
> Token is forllcheck
> Token is bagslcheck
>
> Please help me here.
>
>
> With Regards
> Aman Tandon
>
>


Re: sort by spatial distance in faceting

2014-06-04 Thread Aman Tandon
Thanks David. Yeah, I want to contribute. Can you please suggest how I
should start to learn deeply about Solr spatial? I am new to Solr and I
really want to contribute here :)

Any help will be really appreciated.

@David Sorry for the late reply.

With Regards
Aman Tandon


On Tue, May 27, 2014 at 9:36 AM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> Hi Aman,
>
> That’s an interesting feature request that I haven’t heard before.
>
> First reaction:  Heliosearch (a fork of Solr that is kept up to date with
> changes from Solr) is extremely close to supporting such a thing because it
> supports sorting facets by Heliosearch-specific aggregation functions.
> http://heliosearch.org/solr-facet-functions/   However, none of its
> aggregation functions are spatial oriented.  If this feature is important
> enough to you, you could very well add it.  It would likely involve
> encoding the coordinate into the place name to avoid unnecessary redundant
> calculations that would be needed if another field were used.
>
> Second reaction: You could do a secondary search just for these facet
> values that works using Result Grouping (AKA Field Collapsing). Add to each
> document the coordinates of the city indexed using a LatLonType field.  On
> this request, sort the documents using geodist(), and group on the city
> name.  Perhaps you can even get away with returning no documents per group
> if Solr lets you — you don’t need the doc data after all.  The main thing I
> don’t like about this approach is that it’s going to internally calculate
> the distance very redundantly since all documents for a city are going to
> have the coordinate.  Well, see if it’s fast enough and give it a try.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, May 26, 2014 at 8:31 AM, Aman Tandon  >wrote:
>
> > Hi,
> >
> > Is it possible to sort the results returned by faceting by geo-spatial
> > distance instead of result count?
> >
> > Currently I am faceting on city, which returns me the top facets based
> > on the docs matched for that particular city.
> >
> > e.g.:
> > Delhi,400
> > Noida, 380
> > .
> > .
> > .
> > etc.
> >
> > If the user selects the city, then the facets should be ordered according
> > to the geo-spatial distance instead of result counts. Is it possible with
> > Solr 4.7.x?
> >
> > With Regards
> > Aman Tandon
> >
>
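
An illustrative request for David's second suggestion (a sketch only: the
field names city_name and city_coords are hypothetical, with city_coords a
LatLonType field):

/select?q=*:*
       &fq=<your other filters>
       &sfield=city_coords&pt=28.61,77.20
       &sort=geodist() asc
       &group=true&group.field=city_name
       &group.limit=0

group.limit=0 is the "no documents per group" idea David mentions, if Solr
accepts it.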


Tika: url issue

2014-06-04 Thread harshrossi
Hi,

I am working on Solr using the DataImportHandler for indexing rich documents
like PDF, Word, images, etc.
I am using the TikaEntityProcessor for extracting content from the files.

I have one small issue regarding setting the value of the 'url' entry.

My data-config.xml file is like so:






 








 
   

   



The thing is, the file path is stored in a different pattern in the
database:
"doc_url" is the field in the db which stores the url or file path. The file
path is stored in this way:
 *D:\Games\CS2\setup.doc#D:\Games\CS2\setup.doc#*
i.e. the path is stored twice, separated by a '#'. I am not sure why it is
done; it has been done by our client.

All I need is the one file path, i.e. D:\Games\CS2\setup.doc
I am passing the url value to Tika as * url="${db_link.LINK}" *
But *${db_link.LINK}* contains the path coming from the database directly.
I have tried using a script transformer, splitting the path string into
parts by '#' and taking the first part with a method *getFilePath(row)*, but
no luck.

I am still getting the path as stored in the db. This gives a *FileNotFound*
exception while trying to index it, which is obvious because the path is
incorrect.

What can be done to get only the first path, leaving out the rest having the
'#' and all?

Help would be much appreciated :)
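
For reference, a ScriptTransformer sketch of the splitting approach described
above (untested; it assumes the column is LINK on the db_link entity):

<dataConfig>
  <script><![CDATA[
    function getFilePath(row) {
      var link = row.get('LINK');             // the '#'-doubled path from the db
      if (link != null) {
        row.put('LINK', link.split('#')[0]);  // keep only the first path
      }
      return row;
    }
  ]]></script>
  <!-- ... dataSource and document elements ... -->
  <entity name="db_link" transformer="script:getFilePath" query="...">
    <!-- ... field mappings ... -->
  </entity>
</dataConfig>

The transformer must be declared on the entity that actually produces the
LINK column, or the function never sees the row.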







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tika-url-issue-tp4139781.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlighting on Parent document

2014-06-04 Thread StrW_dev
Hi,

I am using block joins in my index structure, as I have many variations of
documents which share the same content.
This means my parent document has the content I am searching in, and I am
filtering on and returning the child documents:

  

  


  



This works great and fast. The only issue I have is with the highlighting
component. As I am returning the children, the snippets also apply to the
children, but I am actually searching in the parent.  (Search query example:
{!child of=type:parent}q )

So is it possible to return the snippets based on the query in the parent?
Do I need to tweak the code for this? Or is there an open issue and thus
work in progress?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-on-Parent-document-tp4139784.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr High GC issue

2014-06-04 Thread rulinma
mark.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-High-GC-issue-tp4138570p4139785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use a field with space in qf

2014-06-04 Thread Jack Krupansky
Unfortunately, field aliasing works above the level of the qf parameter 
values.


Maybe the Lucene guys could suggest a way to forcibly rename a field on 
disk.


-- Jack Krupansky

-Original Message- 
From: devraj.jaiman

Sent: Wednesday, June 4, 2014 6:27 AM
To: solr-user@lucene.apache.org
Subject: Use a field with space in qf

Hi,

A long time ago I defined a field in the schema with a space in its name
(e.g. 'Movie Name'). Things were going very well until I needed to use the
edismax query parser and wanted to give 'Movie Name' in qf. But as we all
know, qf treats space as a field delimiter. I tried 'Movie\ Name' and
'Movie\+Name'; nothing is working.

So what should I do? Is there any way to use a field with a space, or do I
have to re-index all the data into a new field without a space?

Thanks in advance.

Regards,
Devraj



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-a-field-with-space-in-qf-tp4139768.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: sort by spatial distance in faceting

2014-06-04 Thread david.w.smi...@gmail.com
Did my suggestion work out?

RE contributing — most people start out with making improvements needed for
their application.  Alternatively you could look at some of the open issues
in JIRA that have the “spatial” or “modules/spatial” component (for Solr or
Lucene, respectively).  Most of the real spatial stuff is in Lucene-spatial
& Spatial4j, but some stuff is at the Solr level.  Speaking of Spatial4j:
it’s an independent project on GitHub, used by Lucene/Solr spatial.  If you
really like computational geometry and geodesic formulas, go there and get
on the dev list for it. Its issues are tracked separately.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Jun 4, 2014 at 7:45 AM, Aman Tandon  wrote:

> Thanks David. Yeah, I want to contribute. Can you please suggest how I
> should start to learn deeply about Solr spatial? I am new to Solr and I
> really want to contribute here :)
>
> Any help will be really appreciated.
>
> @David Sorry for the late reply.
>
> With Regards
> Aman Tandon
>
>
> On Tue, May 27, 2014 at 9:36 AM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
> > Hi Aman,
> >
> > That’s an interesting feature request that I haven’t heard before.
> >
> > First reaction:  Heliosearch (a fork of Solr that is kept up to date
> > with changes from Solr) is extremely close to supporting such a thing
> > because it supports sorting facets by Heliosearch-specific aggregation
> > functions.
> > http://heliosearch.org/solr-facet-functions/   However, none of its
> > aggregation functions are spatial oriented.  If this feature is important
> > enough to you, you could very well add it.  It would likely involve
> > encoding the coordinate into the place name to avoid unnecessary
> redundant
> > calculations that would be needed if another field were used.
> >
> > Second reaction: You could do a secondary search just for these facet
> > values that works using Result Grouping (AKA Field Collapsing). Add to
> each
> > document the coordinates of the city indexed using a LatLonType field.
>  On
> > this request, sort the documents using geodist(), and group on the city
> > name.  Perhaps you can even get away with returning no documents per
> group
> > if Solr lets you — you don’t need the doc data after all.  The main
> thing I
> > don’t like about this approach is that it’s going to internally calculate
> > the distance very redundantly since all documents for a city are going to
> > have the coordinate.  Well, see if it’s fast enough and give it a try.
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Mon, May 26, 2014 at 8:31 AM, Aman Tandon  > >wrote:
> >
> > > Hi,
> > >
> > > Is it possible to sort the results returned by faceting by geo-spatial
> > > distance instead of result count?
> > >
> > > Currently I am faceting on city, which returns me the top facets based
> > > on the docs matched for that particular city.
> > >
> > > e.g.:
> > > Delhi,400
> > > Noida, 380
> > > .
> > > .
> > > .
> > > etc.
> > >
> > > If the user selects the city, then the facets should be ordered
> > > according to the geo-spatial distance instead of result counts. Is it
> > > possible with Solr 4.7.x?
> > >
> > > With Regards
> > > Aman Tandon
> > >
> >
>


Re: Solr maximum Optimal Index Size per Shard

2014-06-04 Thread Jack Krupansky

How many documents were in that 20GB index?

I'm skeptical that a 1 billion document shard "won't be a problem." I mean 
technically it is possible, but as you are already experiencing, it may take 
a long time and a very powerful machine to do so. 100 million (or 250 
million max) would be a more realistic goal. Even then, it depends on your 
doc size and machine size.


The main point from the previous discussion is that although the technical 
hard limit for a Solr shard is 2G docs, from a practical perspective it is 
very difficult to get to that limit, not that indexing 1 billion docs on a 
single shard is "just fine"!


As a general rule, if you want fast queries for high volume, strive to 
assure that your per-shard index fits entirely into the system memory 
available for OS caching of file system pages.


In any case, a proof of concept implementation will tell you everything you 
need to know.


-- Jack Krupansky

-Original Message- 
From: Vineet Mishra

Sent: Wednesday, June 4, 2014 2:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr maximum Optimal Index Size per Shard

Thanks all for your response.
I presume this conversation concludes that indexing around 1 billion
documents per shard won't be a problem. As I have 10 billion docs to index,
approx. 10 shards with 1 billion each should be fine. And how about memory:
what size of RAM would be appropriate for this amount of data?
Moreover, what should be the indexing technique for this huge data set?
Currently I am indexing with EmbeddedSolrServer but it's going pathetically
slow after some 20 GB of indexing. Comparatively, SolrHttpPost was slow due
to network delays and response, but after this long-running indexing
with EmbeddedSolrServer I am getting a different notion.
Any good indexing technique for this huge dataset would be highly
appreciated.

Thanks again!


On Wed, Jun 4, 2014 at 6:40 AM, rulinma  wrote:


mark.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-maximum-Optimal-Index-Size-per-Shard-tp4139565p4139698.html
Sent from the Solr - User mailing list archive at Nabble.com.





Highlighting priority

2014-06-04 Thread Erwin Gunadi
Hi,

 

We are currently using Solr 4.3 and have highlighting activated on three
different fields using FVH.

 

Is it possible with Solr to prioritize highlighting across these fields?

I mean, how do I configure Solr so that, when possible, it highlights the
keywords from the first field, then highlights the rest of the keywords on
the second field, and so on.

Is there any "Way Of Solr" for doing this?

 

Thank you and Best Regards

Erwin



Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi Folks,

I recently started using the spellchecker in my solrconfig.xml. I am able to 
build up an index in Solr.

But, if I ever shut down Tomcat, I am not able to restart it. The server never
prints the startup time in seconds in the logs, nor does it print any
error messages in the catalina.out file.

The only way for me to get around this is by deleting the data directory of the
index and then starting the server; obviously this makes me lose my index.

Just wondering if anyone faced a similar issue and if they were able to solve 
this.

Thanks.



null pointer on FSTCompletionLookup

2014-06-04 Thread Will Milspec
Hi all,

Someone posted this problem over a year ago but I did not see a clear
resolution in the thread.

Intermittently--i.e. for some searches, not others--the
'suggest/spellcheck' component throws a NullPointerException (NPE) when a
user executes a search. It fails in FSTCompletionLookup (line 244).

I'm using Solr 4.4 (to match what's in production); I could upgrade if
necessary.

Any hints on why it occurs and how to fix? The earlier post alluded to
"changing the field type solved the problem", but did not provide details.

Thanks

will

/select request handler:


   on
  suggestDictionary
  false
  5
  2
  5
  true
  true
  5
  3

spellcheck component:




suggestDictionary
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FSTLookupFactory
title
 
0.
true



field type definition:



  





  
  




  


field definition:




It fails here:

@Override
public List<Completion> lookup(CharSequence key, boolean higherWeightsFirst,
    int num) {
  final List<Completion> completions;
  if (higherWeightsFirst) {
    completions = higherWeightsCompletion.lookup(key, num);
  } else {
    completions = normalCompletion.lookup(key, num); // <-- fails on this line
  }
}




Re: DirectSpellChecker not returning expected suggestions.

2014-06-04 Thread Erick Erickson
If you have access to the solr admin screen you have access to how it was
analyzed through the "analysis" page. You have to hover over the little
abbreviations to see the class in the analysis chain.

Likewise, the admin screen should have access to the raw schema.xml file
which _also_ has the analysis chain definition.

And if you really don't have access to either of those, and you can't find
out how the fields were analyzed, then there's not much that you can do...

Best
Erick


On Mon, Jun 2, 2014 at 12:25 PM, S.L  wrote:

> James,
>
> I get no results back and no suggestions for "wrangle", however I get
> suggestions for "wranglr", and "wrangle" is not present in my index.
>
> I am just searching for "wrangle" in a field that is created by copying
> other fields; as to how it is analyzed, I don't have access to it now.
>
> Thanks.
>
>
> On Mon, Jun 2, 2014 at 2:48 PM, Dyer, James 
> wrote:
>
> > If "wrangle" is not in your index, and if it is within the max # of
> edits,
> > then it should suggest it.
> >
> > Are you getting anything back from spellcheck at all?  What is the exact
> > query you are using?  How is the spellcheck field analyzed?  If you're
> > using stemming, then "wrangle" and "wrangler" might be stemmed to the
> same
> > word. (by the way, you shouldn't spellcheck against a stemmed or
> otherwise
> > heavily-analyzed field).
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: S.L [mailto:simpleliving...@gmail.com]
> > Sent: Monday, June 02, 2014 1:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: DirectSpellChecker not returning expected suggestions.
> >
> > OK, I just realized that "wrangle" is a proper English word; probably
> > that's why I don't get a suggestion for "wrangler" in this case. However,
> > in my test index there is no "wrangle" present, so even though this is a
> > proper English word, since there is no occurrence of it in the index,
> > shouldn't Solr suggest "wrangler"?
> >
> >
> > On Mon, Jun 2, 2014 at 2:00 PM, S.L  wrote:
> >
> > > I do not get any suggestion (when I search for "wrangle") , however I
> > > correctly get the suggestion wrangler when I search for wranglr , I am
> > > using the Direct and WordBreak spellcheckers in combination, I have not
> > > tried using anything else.
> > >
> > > Is the distance calculation of Solr different from the Levenshtein
> > > distance calculation? I have set maxEdits to 1, assuming that this
> > > corresponds to the maxDistance.
> > >
> > > Thanks for your help!
> > >
> > >
> > > On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com <
> > > david.w.smi...@gmail.com> wrote:
> > >
> > >> What do you get then?  Suggestions, but not the one you’re looking
> for,
> > or
> > >> is it deemed correctly spelled?
> > >>
> > >> Have you tried another spellChecker impl, for troubleshooting
> purposes?
> > >>
> > >> ~ David Smiley
> > >> Freelance Apache Lucene/Solr Search Consultant/Developer
> > >> http://www.linkedin.com/in/davidwsmiley
> > >>
> > >>
> > >> On Sat, May 31, 2014 at 12:33 AM, S.L 
> > wrote:
> > >>
> > >> > Hi All,
> > >> >
> > >> > I have a small test index of 400 documents , it happens to have an
> > entry
> > >> > for  "wrangler", When I search for "wranglr", I correctly get the
> > >> collation
> > >> > suggestion as "wrangler", however when I search for "wrangle" , I do
> > not
> > >> > get a suggestion for "wrangler".
> > >> >
> > >> > The Levenshtein distance between wrangle --> wrangler is the same
> > >> > as the Levenshtein distance between wranglr --> wrangler, so I am
> > >> > just wondering why I do not get a suggestion for wrangle.
> > >> >
> > >> > Below is my Direct spell checker configuration.
> > >> >
> > >> > 
> > >> >   direct
> > >> >   suggestAggregate
> > >> >   solr.DirectSolrSpellChecker
> > >> >   
> > >> >   internal
> > >> >   score
> > >> >
> > >> >   
> > >> >   0.7
> > >> >   
> > >> >   1
> > >> >   
> > >> >   3
> > >> >   
> > >> >   5
> > >> >   
> > >> >   4
> > >> >   
> > >> >   0.01
> > >> >   
> > >> >   
> > >> > 
> > >> >
> > >>
> > >
> > >
> >
>


Re: Does CloudSolrServer hit zookeeper for every request?

2014-06-04 Thread Erick Erickson
There's some pinging going on between ZK and registered nodes; when the
timeout is exceeded, ZK marks the node as down and broadcasts messages
to all the _other_ nodes that the node is down. Then each Solr node knows
not to use the downed node until a message is received indicating it's
healthy again.
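
To Jim's original question: the client reads the cluster state from ZooKeeper
once and then relies on ZK watches for updates, so it does not hit ZK on
every request. A small SolrJ 4.x sketch (the zkHost and collection name are
placeholders):

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setZkConnectTimeout(15000); // ms allowed for the initial ZK connection
server.setZkClientTimeout(15000);  // ZK session timeout; a lapsed session marks the node down
server.setDefaultCollection("collection1");
server.connect(); // loads cluster state once, then watches for changes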

Best,
Erick


On Mon, Jun 2, 2014 at 2:17 PM, Jim.Musil  wrote:

> I’m curious how CloudSolrServer works in practice.
>
> I understand that it gets the active solr nodes from zookeeper, but does
> it do this for every request?
>
> If it does hit zk for every request, that seems to put a lot of pressure
> on the zk ensemble.
>
> If it does NOT hit zk for every request, then how does it detect changes
> in the number of nodes and the status of the nodes?
>
> Thanks!
> Jim M.
>


Re: Automatic syncing of data on a node that was down for a while:

2014-06-04 Thread Erick Erickson
You shouldn't have to do anything, assuming that instance3 is a replica of
instance1 or instance2, it should be automatic. You do have to wait for the
synchronization to happen, and you should be seeing messages in the various
Solr logs (particularly instance3 and the leader of the shard). What do the
logs say?

Now, if these are different _shards_ it's a different story, but that
doesn't appear to be the case from your description.

Best,
Erick


On Mon, Jun 2, 2014 at 4:59 PM, keertisurapaneni  wrote:

> We have 3 SOLR instances on 3 different hosts and we have an external
> zookeeper configured for each SOLR instance.
>
> Suppose, instance1 and instance2 are up and running and instance3 is down.
> A
> few records are added to both the running instances.
>
> I am able to see the records that were added to instance1 on both instance1
> and instance2.
>
> I am able to see the records that were added to instance2 on both instance1
> and instance2.
>
> Then when I bring up instance3, I don't see the records that were added to
> instance1 and instance2.
>
> What needs to be done so the instance that is down can remain in sync with
> the running instances when it comes up?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Automatic-syncing-of-data-on-a-node-that-was-down-for-a-while-tp4139425.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread Aman Tandon
I guess if you copy the index and then kill the Tomcat process, it might
help. If the index still needs to be deleted, you would have the backup.
Next time, always make a backup.
On Jun 4, 2014 7:55 PM, "S.L"  wrote:

> Hi Folks,
>
> I recently started using the spellchecker in my solrconfig.xml. I am able
> to build up an index in Solr.
>
> But, if I ever shut down Tomcat, I am not able to restart it. The server
> never prints the startup time in seconds in the logs, nor does it print
> any error messages in the catalina.out file.
>
> The only way for me to get around this is by deleting the data directory of
> the index and then starting the server; obviously this makes me lose my index.
>
> Just wondering if anyone faced a similar issue and if they were able to
> solve this.
>
> Thanks.
>
>


Re: Strict mode at searching and indexing

2014-06-04 Thread Erick Erickson
right, if that line is uncommented, then _anything_ you throw at Solr will
be processed just fine. You've essentially told Solr "there's no input
that's wrong".

Perhaps confusingly, the "ignored" field type has stored="false" and
indexed="false" so the effect at indexing time is for the input to be,
well, ignored. At query time similarly I believe.

So try un-commenting, restarting Solr and see if the problem goes away.

Best,
Erick


On Tue, Jun 3, 2014 at 12:08 PM, T. Kuro Kurosaka 
wrote:

>
> On 05/30/2014 08:29 AM, Erick Erickson wrote:
>
>> I see errors in both cases. Do you
>> 1> have schemaless configured
>> or
>> 2> have a dynamic field pattern that matches your "non_exist_field"?
>>
> Maybe
>  
> is un-commented-out in schema.xml?
>
> Kuro
>
>


Re: Strange behaviour when tuning the caches

2014-06-04 Thread Joel Bernstein
The CollapsingQParserPlugin can be resource intensive, so you'll want to be
careful about how it's used, particularly with autowarming in the
queryResultCache. If you autowarm lots of queries while using the
CollapsingQParserPlugin, you'll be running lots of CPU- and memory-intensive
queries after opening a new searcher.

Also, you'll want to understand the memory profile of the
CollapsingQParserPlugin on your index. It uses more memory as the number of
unique values in the collapse field grows. This is regardless of the number
of unique values in the search results.

So, be aware of the cardinality of the collapse field, and use
nullPolicy=expand if you have nulls in the collapsed field. This null
policy is designed to lessen the memory impact if there are nulls in the
collapsed field.

Also, it's a good idea to have one static warming query that exercises the
CollapsingQParserPlugin, as it can take time to warm. Autowarming the query
result cache might cover this in your case.

In general, the CollapsingQParserPlugin should be faster than grouping when
you have a high number of distinct groups in the result set. But the
tradeoff is that it's more memory-intensive than grouping when there is
a low number of distinct groups in the result set. Both the
CollapsingQParserPlugin and grouping (with ngroups) have a high memory
footprint when there is a large number of distinct groups in the result
set. If you're not using ngroups, grouping will always outperform the
CollapsingQParserPlugin.
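
For illustration (the field name group_s is hypothetical), a collapse filter
with the null policy Joel describes, plus a static warming query in
solrconfig.xml:

fq={!collapse field=group_s nullPolicy=expand}

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">{!collapse field=group_s nullPolicy=expand}</str>
    </lst>
  </arr>
</listener>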







Joel Bernstein
Search Engineer at Heliosearch


On Tue, Jun 3, 2014 at 12:38 PM, Jean-Sebastien Vachon <
jean-sebastien.vac...@wantedanalytics.com> wrote:

> Yes we are already using it.
>
> > -Original Message-
> > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> > Sent: June-03-14 11:41 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Strange behaviour when tuning the caches
> >
> > Hi,
> >
> > Have you seen https://wiki.apache.org/solr/CollapsingQParserPlugin ?
>  May
> > help with the field collapsing queries.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics Solr &
> > Elasticsearch Support * http://sematext.com/
> >
> >
> > On Tue, Jun 3, 2014 at 8:41 AM, Jean-Sebastien Vachon < jean-
> > sebastien.vac...@wantedanalytics.com> wrote:
> >
> > > Hi Otis,
> > >
> > > We saw some improvement when increasing the size of the caches. Since
> > > then, we followed Shawn advice on the filterCache and gave some
> > > additional RAM to the JVM in order to reduce GC. The performance is
> > > very good right now but we are still experiencing some instability but
> > > not at the same level as before.
> > > With our current settings the number of evictions is actually very low
> > > so we might be able to reduce some caches to free up some additional
> > > memory for the JVM to use.
> > >
> > > As for the queries, it is a set of 5 million queries taken from our
> > > logs so they vary a lot. All I can say is that all queries involve
> > > either grouping/field collapsing and/or radius search around a point.
> > > Our largest customer is using a set of 8-10 filters that are
> > > translated as fq parameters. The collection contains around 13 million
> > > documents distributed on 5 shards with 2 replicas. The second
> > > collection has the same configuration and is used for indexing or as a
> > > fail-over index in case the first one falls.
> > >
> > > We'll keep making adjustments today, but we are pretty close to having
> > > something that performs while being stable.
> > >
> > > Thanks all for your help.
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> > > > Sent: June-03-14 12:17 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Strange behaviour when tuning the caches
> > > >
> > > > Hi Jean-Sebastien,
> > > >
> > > > One thing you didn't mention is whether as you are increasing(I
> > > > assume) cache sizes you actually see performance improve?  If not,
> > > > then maybe
> > > there
> > > > is no value increasing cache sizes.
> > > >
> > > > I assume you changed only one cache at a time? Were you able to get
> > > > any one of them to the point where there were no evictions without
> > > > things breaking?
> > > >
> > > > What are your queries like, can you share a few examples?
> > > >
> > > > Otis
> > > > --
> > > > Performance Monitoring * Log Analytics * Search Analytics Solr &
> > > > Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > > > On Mon, Jun 2, 2014 at 11:09 AM, Jean-Sebastien Vachon < jean-
> > > > sebastien.vac...@wantedanalytics.com> wrote:
> > > >
> > > > > Thanks for your quick response.
> > > > >
> > > > > Our JVM is configured with a heap of 8GB. So we are pretty close
> > > > > of the "optimal" configuration you are mentioning. The only other
> > > > > programs running is Zookeeper (which has its own storage device)
> > > > > and a proprietary A

Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi,

This is not a case of accidental deletion. The only way I can restart
Tomcat is by deleting the data directory for the index that was created
earlier; this started happening after I started using spellcheckers in my
solrconfig.xml. As long as Tomcat is running, it's fine.

Any help from anyone who faced a similar issues would be appreciated.

Thanks.



On Wed, Jun 4, 2014 at 11:08 AM, Aman Tandon  wrote:

> I guess if you copy the index and then kill the Tomcat process, it might
> help. If the index still needs to be deleted, you would have the backup.
> Next time, always make a backup.
> On Jun 4, 2014 7:55 PM, "S.L"  wrote:
>
> > Hi Folks,
> >
> > I recently started using the spellchecker in my solrconfig.xml. I am able
> > to build up an index in Solr.
> >
> > But, if I ever shut down Tomcat, I am not able to restart it. The server
> > never prints the startup time in seconds in the logs, nor does it print
> > any error messages in the catalina.out file.
> >
> > The only way for me to get around this is by deleting the data directory
> > of the index and then starting the server; obviously this makes me lose
> > my index.
> >
> > Just wondering if anyone faced a similar issue and if they were able to
> > solve this.
> >
> > Thanks.
> >
> >
>


Tomcat restart removes the Core.

2014-06-04 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
All, can anyone help me with what is going wrong in my Tomcat? When I restart
Tomcat after a schema update, the cores are removed.

I need to add the cores manually to get them back working.

Has anyone else experienced this?

Thanks

Ravi


Re: Solr cloud nodes falling

2014-06-04 Thread Kashish
Any updates on this? Any help will be greatly appreciated. :)



-
Thanks,
Kashish
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-cloud-nodes-falling-tp4139390p4139856.html
Sent from the Solr - User mailing list archive at Nabble.com.


"Fake" cached join query much faster than cached fq?

2014-06-04 Thread Brett Hoerner
The following two queries are doing the same thing, one using a "normal" fq
range query and another using a parent query. The cache is warm (these are
both hits), but the "normal" one takes ~6 to 7.5 sec while the parent query
hack takes ~1.2sec.

Is this expected? Is there anything "wrong" with my "normal fq" query? My
plan is to increase the size of my perSegFilter cache so I can use the hack
for faster queries... any thoughts here?

"responseHeader": { "status": 0, "QTime": 7657, "params": { "q": "*:*", "
facet.field": "terms_smnd", "debug": "true", "indent": "true", "fq": [
"created_at_tdid:[1392768001
TO 1393954400]", "text:coffee" ], "rows": "0", "wt": "json", "facet": "true",
"_": "1401906435914" } }, "response": { "numFound": 2432754, "start": 0, "
maxScore": 1, "docs": [] }

Full response example:
https://gist.githubusercontent.com/bretthoerner/60418f08a88093c30220/raw/0a61f013f763e68985c15c5ed6cad6fa253182b9/gistfile1.txt

 "responseHeader": { "status": 0, "QTime": 1210, "params": { "q": "*:*", "
facet.field": "terms_smnd", "debug": "true", "indent": "true", "fq": [
"{!cache=false}{!parent
which='created_at_tdid:[1392768001 TO 1393954400]'}", "text:coffee" ], "rows":
"0", "wt": "json", "facet": "true", "_": "1401906444521" } }, "response": {
"numFound": 2432754, "start": 0, "maxScore": 1, "docs": [] }

Full response example:
https://gist.githubusercontent.com/bretthoerner/9d82aa8fe59ffc7ff6ab/raw/560a395a0933870a5d2ac736b58805d8fab7f758/gistfile1.txt


Re: Tomcat restart removes the Core.

2014-06-04 Thread Michael Della Bitta
Any chance you don't have a persistent="true" attribute in your solr.xml?
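
For reference, a legacy-style (pre-Solr 4.4) solr.xml with that attribute;
the core names and paths are placeholders:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
  </cores>
</solr>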

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions

w: appinions.com 


On Wed, Jun 4, 2014 at 1:06 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions)  wrote:

> All, Can anyone help me on what is going wrong in my tomcat. When I
> restart the tomcat after schema update, the Cores are removed.
>
> I need to add the cores manually to get back them on work.
>
> Is there anything someone experience..
>
> Thanks
>
> Ravi
>


Cache response time

2014-06-04 Thread Branham, Jeremy [HR]
Is there a JMX metric for measuring the cache request time?

I can see the avg request times, but I'm assuming this includes the cache and 
non-cache values.

http://wiki.apache.org/solr/SolrPerformanceFactors






This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


Re: Automatic syncing of data on a node that was down for a while:

2014-06-04 Thread keertisurapaneni

(screenshots omitted)

PS: I am using the same default solrconfig.xml file without any changes for
all 3 instances. It is showing all 3 instances as masters (like in the above
pics).

I brought down instance3, and added a record each on instance1 and
instance2. Below are the results after bringing up instance3:

(screenshots omitted)

The sync doesn't happen in instance3.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Automatic-syncing-not-happening-on-a-node-that-was-down-for-a-while-tp4139425p4139908.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Cache response time

2014-06-04 Thread Matt Kuiper
I have not come across one.  Is your question directed to the queryResultCache? 
 

My understanding is that the queryResultCache is the only cache that contains 
full query results that could be used to compare against non-cached results 
times.  I believe the other caches can participate in speeding up a request for 
different parts of the query (i.e. filterCache can help with the filter query 
portions of a request, and documentCache for the stored fields).  I am learning 
myself, so if someone wants to correct, or clarify, please do.

A possible manual approach to answering your question could be to use a JMX
monitoring tool to retrieve a timestamp for when the Solr JMX metric "hits"
increases for the queryResultCache.  Then you could use the timestamp with the
logs to find the request time for the request associated with the cache "hits"
metric increasing.

Matt 

-Original Message-
From: Branham, Jeremy [HR] [mailto:jeremy.d.bran...@sprint.com] 
Sent: Wednesday, June 04, 2014 1:33 PM
To: solr-user@lucene.apache.org
Subject: Cache response time

Is there a JMX metric for measuring the cache request time?

I can see the avg request times, but I'm assuming this includes the cache and 
non-cache values.

http://wiki.apache.org/solr/SolrPerformanceFactors






This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


Multivalue wild card search

2014-06-04 Thread Ethan
I can't seem to find a solution to do wild card search on a multiValued
field.

For example, consider a multiValued field called "Name" with 3 values:

"Name" : [
"[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]]",
"[]",
"[[\"hifte\", \"Grop\", \"\"]]"
]

For a multiValued field like the above, I want a search like:

q="***[\"Steve\", \"Wonder\", \"\"]"

But I do not get any results back. Any ideas on how to create such a query?


Re: Multivalue wild card search

2014-06-04 Thread Jack Krupansky
Wildcard, fuzzy, and regex query operate on a single term of a single 
tokenized field value or a single string field value.
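
For example (with a hypothetical string field):

q=Name:Wonder*            (a wildcard applied to a single indexed term/value)
q=Name:"Steve Wonder*"    (does not work as a wildcard: wildcards are not
                           interpreted inside a quoted phrase)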


-- Jack Krupansky

-Original Message- 
From: Ethan

Sent: Wednesday, June 4, 2014 6:59 PM
To: solr-user
Subject: Multivalue wild card search

I can't seem to find a solution to do wild card search on a multiValued
field.

For example, consider a multiValued field called "Name" with 3 values:

"Name" : [
"[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]]",
"[]",
"[[\"hifte\", \"Grop\", \"\"]]"
]

For a multiValued field like the above, I want a search like:

q="***[\"Steve\", \"Wonder\", \"\"]"

But I do not get any results back. Any ideas on how to create such a query?



RE: suspect SOLR query from D029 (SOLR master)

2014-06-04 Thread Branham, Jeremy [HR]
Thanks Jack -

The following keyword search, based on the previous synonym definition,
actually runs in SOLR and produces an HTTP 500 error (an "attempted to create
too many clauses" error):

"asurion device protection has tep, tep plus, erp, esrp programs"

HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
Last-Modified: Tue, 03 Jun 2014 17:36:49 GMT
ETag: "14662ce0c90"
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: Sat, 01 Jan 2000 01:00:00 GMT
Content-Type: application/xml;charset=UTF-8
Date: Tue, 03 Jun 2014 17:36:48 GMT
Connection: close
Content-Length: 3449



maxClauseCount is set to 1024
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:142)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:133)
at 
org.apache.solr.search.SynonymExpandingExtendedDismaxQParser.applySynonymQueries(SynonymExpandingExtendedDismaxQParserPlugin.java:416)
at 
org.apache.solr.search.SynonymExpandingExtendedDismaxQParser.applySynonymQueries(SynonymExpandingExtendedDismaxQParserPlugin.java:396)
at 
org.apache.solr.search.SynonymExpandingExtendedDismaxQParser.attemptToApplySynonymsToQuery(SynonymExpandingExtendedDismaxQParserPlugin.java:379)
at 
org.apache.solr.search.SynonymExpandingExtendedDismaxQParser.parse(SynonymExpandingExtendedDismaxQParserPlugin.java:351)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:142)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:183)
at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:95)
at 
org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
at 
org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:598)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:451)
at java.lang.Thread.run(Thread.java:662)
500
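
For reference, the clause ceiling in that trace is controlled by the
maxBooleanClauses setting in solrconfig.xml. A minimal sketch of raising it
(4096 is an illustrative value; raising the ceiling only trades the 500 for
more memory pressure if the synonym expansion itself is the real problem):

<!-- solrconfig.xml: the BooleanQuery clause limit behind the
     TooManyClauses error above. 1024 is the default. -->
<query>
  <maxBooleanClauses>4096</maxBooleanClauses>
</query>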


---
The same keyword search (minus the commas) produces the OOM issue, again
based on the previous synonym definition list:

"asurion device protection has tep tep plus erp esrp programs"

-
HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1
Last-Modified: Tue, 03 Jun 2014 17:35:57 GMT
ETag: "N2UwMDAwMDAwMDAwMDAwMFNvbHI="
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 03 Jun 2014 17:42:07 GMT
Connection: close

1563


java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:717)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.
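
A note on the trace above: "GC overhead limit exceeded" means the JVM spent
nearly all of its time collecting garbage, i.e. the heap is too small for the
expanded query. One sketch of a mitigation, assuming the stock Solr 4.x
example start (the 4g figure is illustrative, not a sizing recommendation):

java -Xms4g -Xmx4g -jar start.jar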

Re: Cache response time

2014-06-04 Thread Otis Gospodnetic
Hi Jeremy,

Nothing in Solr tracks that time.  Caches are pluggable.  If you really
want this info you could write your own cache that is just a proxy for the
real cache and then you can time it.
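
A minimal sketch of that proxy idea, assuming the Solr 4.x cache API; the
class, package, and stat names below (TimingLRUCache, com.example.solr,
avgGetTimeNanos) are made up for illustration:

package com.example.solr; // hypothetical package

import java.util.concurrent.atomic.AtomicLong;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.search.LRUCache;

/**
 * Sketch of a timing proxy: extends the stock LRUCache and times get()
 * so the average lookup cost shows up next to the normal cache stats.
 */
public class TimingLRUCache<K, V> extends LRUCache<K, V> {

  private final AtomicLong gets = new AtomicLong();
  private final AtomicLong getNanos = new AtomicLong();

  @Override
  public V get(K key) {
    final long start = System.nanoTime();
    try {
      return super.get(key);
    } finally {
      getNanos.addAndGet(System.nanoTime() - start);
      gets.incrementAndGet();
    }
  }

  @Override
  public NamedList getStatistics() {
    NamedList stats = super.getStatistics();
    final long n = gets.get();
    // Exposed via JMX and the admin UI alongside hits, lookups, etc.
    stats.add("avgGetTimeNanos", n == 0 ? 0L : getNanos.get() / n);
    return stats;
  }
}

Swapping it in for, say, the filterCache is then just
class="com.example.solr.TimingLRUCache" on the cache element in
solrconfig.xml.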

But why do you need this info? Do you suspect that it is slow?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Jun 4, 2014 at 3:33 PM, Branham, Jeremy [HR] <
jeremy.d.bran...@sprint.com> wrote:

> Is there a JMX metric for measuring the cache request time?
>
> I can see the avg request times, but I'm assuming this includes the cache
> and non-cache values.
>
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
>
>
>


Re: null pointer on FSTCompletionLookup

2014-06-04 Thread Will Milspec
Hi all,

I know this probably seems like an uninteresting problem and smells, even
to me, like a stupid/newbie misconfiguration [yes, I am reading the
excellent Solr in Action and trying my hand at applying the suggestion
examples], but I looked into this a bit tonight, fired up the debugger,
stepped through the code, etc., to try to find where I erred: to no avail.

Some questions:

First, does the SpellCheck component's "FSTLookupFactory" require any
special configuration, e.g. term vectors for the field ("suggest" below),
etc.:

<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
<str name="field">suggest</str>

Second, why does FSTCompletionLookup not check for nulls here on these
variables: higherWeightsCompletion and normalCompletion?

if (higherWeightsFirst) {
  completions = higherWeightsCompletion.lookup(key, num);
} else {
  completions = normalCompletion.lookup(key, num);
}
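
For reference, the guard being asked about would only be a few lines; a
sketch, assuming Lucene 4.4 where (as noted below) the two fields are only
assigned inside build(...):

// Hypothetical guard: return no suggestions when build(...) has not yet
// initialized the FSTs, instead of NPE-ing on normalCompletion.lookup(...).
if (higherWeightsCompletion == null || normalCompletion == null) {
  return Collections.emptyList(); // java.util.Collections
}
if (higherWeightsFirst) {
  completions = higherWeightsCompletion.lookup(key, num);
} else {
  completions = normalCompletion.lookup(key, num);
}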

[Stepping through the code, I saw it execute this constructor:

  /**
   * This constructor prepares for creating a suggested FST using the
   * {@link #build(TermFreqIterator)} method.
   *
   * @param buckets
   *  The number of weight discretization buckets (see
   *  {@link FSTCompletion} for details).
   *
   * @param exactMatchFirst
   *  If true exact matches are promoted to the top of
the
   *  suggestions list. Otherwise they appear in the order of
   *  discretized weight and alphabetical within the bucket.
   */
  public FSTCompletionLookup(int buckets, boolean exactMatchFirst) {

This constructor never initializes the two *Completion variables.]


Third: I got inconsistent results. If I started Solr afresh, this error
appeared. If I reindexed my test site and then executed my 'problematic
searches', the problem went away. Why would this happen?

Thanks in advance





On Wed, Jun 4, 2014 at 9:32 AM, Will Milspec  wrote:

> Hi all,
>
> Someone posted this problem over a year ago but I did not see a clear
> resolution in the thread.
>
> Intermittently--i.e. for some searches, not others--the
> 'suggest/spellcheck' component throws a NullPointerException (NPE) when a
> user executes a search. It fails on FSTCompletionLookup (line 244).
>
> I'm using Solr 4.4 (to match what's in production; I could upgrade if
> necessary).
>
> Any hints on why it occurs and how to fix it? The earlier post alluded to
> "changing the field type solved the problem", but did not provide details.
>
> Thanks
>
> will
>
> /select request handler:
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck">on</str>
>     <str name="spellcheck.dictionary">suggestDictionary</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.alternativeTermCount">2</str>
>     <str name="spellcheck.maxResultsForSuggest">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>     <str name="spellcheck.maxCollationTries">5</str>
>     <str name="spellcheck.maxCollations">3</str>
>   </lst>
> </requestHandler>
>
> spellcheck component:
> 
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggestDictionary</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
>     <str name="field">title</str>
>     <float name="threshold">0.</float>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> field type definition:
> 
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> field definition:
> 
> <field name="title" type="..." indexed="true" stored="true" multiValued="false" omitNorms="false"/>
>
> It fails here:
> ===
> Here's the line that fails.
>
> @Override
>   public List<LookupResult> lookup(CharSequence key, boolean
> higherWeightsFirst, int num) {
>     final List<Completion> completions;
> if (higherWeightsFirst) {
>   completions = higherWeightsCompletion.lookup(key, num);
> } else {
>   completions = normalCompletion.lookup(key, num); <-- fails on this
> line
>
> }
>
>


Re: Cache response time

2014-06-04 Thread Shalin Shekhar Mangar
Can you please open a Jira issue? It'd be nice to have this.


On Thu, Jun 5, 2014 at 1:03 AM, Branham, Jeremy [HR] <
jeremy.d.bran...@sprint.com> wrote:

> Is there a JMX metric for measuring the cache request time?
>
> I can see the avg request times, but I'm assuming this includes the cache
> and non-cache values.
>
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
>
>
>



-- 
Regards,
Shalin Shekhar Mangar.