When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal
Hi all.

Sorry about the title, but I don't know how to be more explicit than
that. I am updating a Solr 1.4 install to Solr 5.1. I went through all
the changes, updated my schema.xml, etc. Everything works (I
re-indexed instead of migrating the existing one). I can search for
documents, no problem there.

Where I do have a problem is with dismax. It doesn't behave like
before. It must be a configuration issue, or maybe I never really
understood how it is supposed to work.

I have 2 documents, which can be summarized as follows:

{
  "label": "Food Inc",
  "keywords": ["Food", "Nutrition"]
}

{
  "label": "Food check online",
  "keywords": ["Internet", "Health"]
}

If I disable dismax and search for "Food" (?q=Food), I find both
documents. So far, so good.

If I turn dismax on and add a boost to the label, I get 0 results
(?q=Food&defType=dismax&qf=label^3.0).

If I turn dismax on and add a boost to the keywords, I get 1 result
("Food Inc", which has a keyword "Food";
?q=Food&defType=dismax&qf=keywords^2.0).

So, from what I understand, it tries to match the search term
*exactly* when dismax is enabled, but uses a "contains keyword" logic
when dismax is disabled (same for edismax). Which means "Food" !== "Food
Inc" with dismax on.

When I turn on debug, I get the following:

"debug": {
  "rawquerystring": "Food",
  "querystring": "Food",
  "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
  "parsedquery_toString": "+(label:Food^3.0) ()",
  "explain": {},
  "QParser": "DisMaxQParser",
  "altquerystring": null,
  "boostfuncs": null,
  ...
}

I don't understand how/why this doesn't use a "contains" operator.
This was the behavior on the old 1.4 instance. I went through the
changelog for 1.4 to 5.1, but I don't find any explicit information
about dismax behaving differently, except the "mm" parameter needs a
default. I tried many values for mm (including 0, 100%, 100, etc) but
to no avail.
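
In case it helps, the behavior I expect is what a tokenized field type
gives. A minimal sketch of such a type (type name assumed here; my actual
schema may differ):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="label" type="text_general" indexed="true" stored="true"/>

If "label" were an untokenized solr.StrField instead, label:Food could
only match the exact value "Food", which would explain the 0 results.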

Thanks for your help.

Best regards,

Wouter Admiraal


Re: Derive suggestions across multiple fields

2015-06-04 Thread Dhanesh Radhakrishnan
Try this

http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=suggest&suggest.build=true&wt=xml&suggest.q=mater

On Thu, Jun 4, 2015 at 11:53 AM, Zheng Lin Edwin Yeo wrote:

> I've tried to use the solr.SuggestComponent as stated in the website, but
> it couldn't work.
>
> When I change to using the suggest with the configuration below and go a
> query like http://localhost:8983/solr/collection1/suggest?q=mater, it says
> "The Webpage cannot be found"
>
>   <searchComponent class="solr.SpellCheckComponent" name="suggest">
>     <lst name="spellchecker">
>       <str name="name">suggest</str>
>       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>       <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
>       <str name="field">text</str>
>       <str name="buildOnCommit">true</str>
>     </lst>
>   </searchComponent>
>
>   <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>     <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <str name="defType">edismax</str>
>       <int name="rows">10</int>
>       <str name="wt">json</str>
>       <str name="indent">true</str>
>       <str name="spellcheck">true</str>
>       <str name="spellcheck.dictionary">suggest</str>
>       <str name="spellcheck.count">5</str>
>       <str name="spellcheck.collate">true</str>
>     </lst>
>     <arr name="components">
>       <str>suggest</str>
>     </arr>
>   </requestHandler>
>
>
> Regards,
> Edwin
>
>
> On 4 June 2015 at 13:21, Erick Erickson  wrote:
>
> > This may be helpful: http://lucidworks.com/blog/solr-suggester/
> >
> > Note that there are a series of fixes in various versions of Solr,
> > particularly buildOnStartup=false and working on multivalued fields.
> >
> > Best,
> > Erick
> >
> > On Wed, Jun 3, 2015 at 8:04 PM, Zheng Lin Edwin Yeo
> >  wrote:
> > > My previous suggester configuration is derived from this page:
> > > https://wiki.apache.org/solr/Suggester
> > >
> > > Does it mean that what is written there is outdated?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > >
> > > On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo 
> > wrote:
> > >
> > >> Thank you for your suggestions.
> > >> Will try that out and update on the results again.
> > >>
> > >> Regards,
> > >> Edwin
> > >>
> > >>
> > >> On 3 June 2015 at 21:13, Alessandro Benedetti <
> > benedetti.ale...@gmail.com>
> > >> wrote:
> > >>
> > >>> I can see a lot of confusion in the configuration!
> > >>>
> > >>> Few suggestions :
> > >>> - read carefully the document and try to apply the suggesting
> guidance
> > >>> - currently there is no need to use spellcheck for suggestions, now
> > they
> > >>> are separated things
> > >>> - i see text used to derive suggestions, I would prefer there to see
> > the
> > >>> copy field specifically used to contain the interesting fields
> > >>> - Yes you need to build the suggester the first time to see
> suggestions
> > >>> - Yes , if you add a copy field yo need to re-index to see it filled
> !
> > >>>
> > >>> Cheers
> > >>>
> > >>> 2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo  >:
> > >>>
> > >>> > This is my suggester configuration:
> > >>> >
> > >>> >   <searchComponent class="solr.SpellCheckComponent" name="suggest">
> > >>> >     <lst name="spellchecker">
> > >>> >       <str name="name">suggest</str>
> > >>> >       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> > >>> >       <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
> > >>> >       <str name="field">text</str>
> > >>> >       <float name="threshold">0.005</float>
> > >>> >       <str name="buildOnCommit">true</str>
> > >>> >     </lst>
> > >>> >   </searchComponent>
> > >>> >
> > >>> >   <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
> > >>> >     <lst name="defaults">
> > >>> >       <str name="echoParams">explicit</str>
> > >>> >       <str name="defType">edismax</str>
> > >>> >       <int name="rows">10</int>
> > >>> >       <str name="wt">json</str>
> > >>> >       <str name="indent">true</str>
> > >>> >       <str name="df">text</str>
> > >>> >       <str name="spellcheck">true</str>
> > >>> >       <str name="spellcheck.dictionary">suggest</str>
> > >>> >       <str name="spellcheck.onlyMorePopular">true</str>
> > >>> >       <str name="spellcheck.count">5</str>
> > >>> >       <str name="spellcheck.collate">true</str>
> > >>> >     </lst>
> > >>> >     <arr name="components">
> > >>> >       <str>suggest</str>
> > >>> >     </arr>
> > >>> >   </requestHandler>
> > >>> >
> > >>> >
> > >>> > Yes, I've read the guide. I've found out that there is a need to do
> > >>> > re-indexing if I'm creating a new copyField. It works when I used
> the
> > >>> > copyField that's created before the indexing is done.
> > >>> >
> > >>> > As I'm using the spellcheck dictionary as my suggester, so does
> that
> > >>> mean I
> > >>> > just need to build the spellcheck dictionary?
> > >>> >
> > >>> >
> > >>> > Regards,
> > >>> > Edwin
> > >>> >
> > >>> >
> > >>> > On 3 June 2015 at 17:36, Alessandro Benedetti <
> > >>> benedetti.ale...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Can you share you suggester configurations ?
> > >>> > > Have you read the guide I linked ?
> > >>> > > Has the suggestion index/fst has been built ? ( you need to build
> > the
> > >>> > > suggester)
> > >>> > >
> > >>> > > Cheers
> > >>> > >
> > >>> > > 2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>:
> > >>> > >
> > >>> > > > Thank you for your explanation.
> > >>> > > >
> > >>> > > > I'll not need to care where the suggestions are coming from.
> All
> > the
> > >>> > > > suggestions from different fields can be consolidate and
> display
> > >>> > > together.
> > >>> > > >
> > >>> > > > I've tried to put those field into a new Suggestion copy field,
> > but
> > >>> no
> > >>> > > > suggestion is shown when I set:
> > >>> > > > Suggestion  
> > >>> > > >
> > >>> > > > Is there a need to re-index the documents in order for this to
> > work?
> > >>> > > >
> > >>> > > > Regards,
> > >>> > > > Edwin
> > >>> > > >
> > >>> > > >
> > >>> > > >
> > >>> > > > 

Re: Derive suggestions across multiple fields

2015-06-04 Thread Zheng Lin Edwin Yeo
This is the result that I get from the query URL you mentioned. Still not
able to get any output.

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="suggest">true</str>
      <str name="suggest.q">mater</str>
      <str name="suggest.build">true</str>
      <str name="suggest.dictionary">suggest</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <lst name="suggest"/>
</response>

Regards,
Edwin



On 4 June 2015 at 15:26, Dhanesh Radhakrishnan wrote:

> Try this
>
>
> http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=suggest&suggest.build=true&wt=xml&suggest.q=mater
>

Re: Derive suggestions across multiple fields

2015-06-04 Thread Alessandro Benedetti
Let me try to clarify things.
Because you are using Solr 5.1, I can see no reason to use the old
spellcheck approach.
If you take a look at the page Erick and I quoted, there is a simple config
example:


> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">FuzzyLookupFactory</str>
>     <str name="storeDir">suggester_fuzzy_dir</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">title</str>
>     <str name="suggestAnalyzerFieldType">suggestType</str>
>     <str name="buildOnCommit">false</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">10</str>
>     <str name="suggest.dictionary">mySuggester</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>


You should use this approach.
After you build the suggestion dictionary (after your first commit, or
manually), you will be able to see the suggestions.
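
For example (assuming the /suggest handler and a core named collection1,
as in the config sketch above), a manual build plus a first request would
be:

http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&suggest.q=mater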

Your config appears to be very confused (why an edismax query parser for a
suggestion request handler?).

To answer Dhanesh: there is no benefit in explicitly passing the query
parameters again; they are already appended if you take a look at Edwin's
config, so this will not solve anything.

I would suggest using the latest approach and then verifying that the
suggester build went fine.

Cheers

2015-06-04 9:13 GMT+01:00 Zheng Lin Edwin Yeo:

> This is the result that I get from the query URL you mentioned. Still not
> able to get any output.
>
>
>
> Regards,
> Edwin
>

Re: retrieving large number of docs

2015-06-04 Thread Alessandro Benedetti
Hi Rob,
Reading your use case, I cannot understand why the query-time join is not a
fit for you!
The documents returned by the query-time join will be from core1, so
faceting and filter querying on that core would definitely be possible!
I cannot see your problem, honestly!
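
As a sketch (assuming both cores really share the "id" field, as in your
summary), running this against core1 should return the core1 documents
whose ids match the core0 text query, with facets computed on core1's tag
field:

http://localhost:8983/solr/core1/select?q={!join from=id to=id fromIndex=core0}text:foo&facet=true&facet.field=tag

The join query goes in q (not fq), and the request is made to core1, the
"to" side, which is why faceting on tag works.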

Cheers

2015-06-04 1:47 GMT+01:00 Robust Links:

> that doesnt work either, and even if it did, joining is not going to be a
> solution since i cant query 1 core and facet on the result of the other. To
> sum up, my problem is
>
> core0
> 
> field:id
> field: text
>
> core1
> 
> field:id
> field tag
>
>
> I want to
>
> 1) query text field of core0,
> 2) use the {id} of matches (which can be >>10K) to retrieve the docs in
> core 1 with same id and
> 3) facet on tags in core1
>
> Is this possible without denormalizing (which is not an option)?
>
> thank you
>
> On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky 
> wrote:
>
> > Specify the join query parser for the main query. See:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> >
> >
> > -- Jack Krupansky
> >
> > On Wed, Jun 3, 2015 at 3:32 PM, Robust Links 
> > wrote:
> >
> > > Hi Erick
> > >
> > > they are on the same JVM. I had already tried the core join strategy
> but
> > > that doesnt solve the faceting problem... i.e if i have 2 cores, core0
> > and
> > > core1, and I run this query on core0
> > >
> > > /select?&q=fq={!join from=id1 to=id2
> > > fromIndex=core1}&facet=true&facet.field=tag
> > >
> > > has 2 problems
> > > 1) i need to specify the docIDs with the fq (so back to the same
> > > fq={!terms} problem), and
> > > 2) faceting doesnt work
> > >
> > >
> > > Flattening the data is not possible due to security reasons.
> > >
> > > Am I using join correctly?
> > >
> > > thank you Erick
> > >
> > > Peyman
> > >
> > > On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > Are these indexes on different machines? Because if they're in the
> > > > same JVM, you might be able to use cross-core joins. Be aware,
> though,
> > > > that joining on high-cardinality fields (which, by definition, docID
> > > > probably is) is where pseudo joins perform worst.
> > > >
> > > > Have you considered flattening the data and including whatever
> > > > information you have in your "from" index in your main index? Because
> > > > < 100ms response is probably not going to be tough if you have to
> have
> > > > two indexes/cores.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein 
> > > > wrote:
> > > > > You may have to do something custom to meet your needs.
> > > > >
> > > > > 10,000 DocID's is not huge but you're latency requirement are
> pretty
> > > low.
> > > > >
> > > > > Are your DocID's by any chance integers? This can make custom
> > > PostFilters
> > > > > run much faster.
> > > > >
> > > > > You should also be aware of the Streaming API in Solr 5.1 which
> will
> > > give
> > > > > you fast Map/Reduce approaches (
> > > > >
> > > >
> > >
> >
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
> > > > ).
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <
> pey...@robustlinks.com
> > >
> > > > wrote:
> > > > >
> > > > >> Hey Joel
> > > > >>
> > > > >> see below
> > > > >>
> > > > >> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <
> joels...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> > A few questions for you:
> > > > >> >
> > > > >> > How large can the list of filtering ID's be?
> > > > >> >
> > > > >>
> > > > >> >> 10k
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > What's your expectation on latency?
> > > > >> >
> > > > >>
> > > > >> 10> latency <100
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > What version of Solr are you using?
> > > > >> >
> > > > >>
> > > > >> 5.0.0
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > SolrCloud or not?
> > > > >> >
> > > > >>
> > > > >> not
> > > > >>
> > > > >>
> > > > >>
> > > > >> >
> > > > >> > Joel Bernstein
> > > > >> > http://joelsolr.blogspot.com/
> > > > >> >
> > > > >> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <
> > > pey...@robustlinks.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi
> > > > >> > >
> > > > >> > > I have a set of document IDs from one core and i want to query
> > > > another
> > > > >> > core
> > > > >> > > using the ids retrieved from the first core...the constraint
> is
> > > that
> > > > >> the
> > > > >> > > size of doc ID set can be very large. I want to:
> > > > >> > >
> > > > >> > > 1) retrieve these docs from the 2nd index
> > > > >> > > 2) facet on the results
> > > > >> > >
> > > > >> > > I can think of 3 solutions:
> > > > >> > >
> > > > >> > > 1) boolean query
> > > > >> > > 2) terms fq
> > > > >> > > 3) use a DB rather than Solr
> > > > >> > >
> > > > >> > > I am trying to keep latencies down so prefer to not use (3)

Solr Atomic Updates by Query

2015-06-04 Thread Ксения Баталова
Hi!

I have one more question about atomic updates in Solr (Solr 4.4.0).
Is it possible to generate an atomic update by query?
I mean I want to update those documents whose IDs contain some string.
For example, index has:
Doc1, id="123|a,b"
Doc2, id="123|a,c"
Doc3, id="345|a,b"
Doc4, id="345|a,c,d".

I don't want to enumerate all the IDs to update, but I know that the
necessary IDs start with "123".
I tried to generate a query something like this (using *):

{"id":"123|*",
 "price":{"set":99}
}

But as a result, a new document with id="123|*" was added instead.
Can I do this somehow?
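
The only workaround I can think of (two requests, which I would prefer to
avoid) is to first collect the matching IDs and then send one atomic
update per ID, something like this (core name assumed):

http://localhost:8983/solr/collection1/select?q=id:123|*&fl=id&rows=1000&wt=json

followed by a POST to /update with one entry per returned id:

[{"id":"123|a,b","price":{"set":99}},
 {"id":"123|a,c","price":{"set":99}}]

The wildcard works in the query because it is matched against the indexed
id values; the update handler instead treats "123|*" literally as a new
document key, which is why my attempt above added a document.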

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Best regards,
Batalova Kseniya


Re: Derive suggestions across multiple fields

2015-06-04 Thread Zheng Lin Edwin Yeo
I think I was confused by the old spellcheck approach, which came up more
frequently during my research.

Just to confirm, do I need to re-index the data in order for this new
approach to work if I'm using an existing field?


Regards,
Edwin


On 4 June 2015 at 16:58, Alessandro Benedetti wrote:

> Let me try to clarify the things…
> Because you are using solr 5.1 I can not see any reason to try to use the
> old spellcheck approach.
> If you take a look to the page me and Erick quoted there is a simple config
> example :
>
>
>
> You should use this approach.
> After you build the Suggestion Dictionary ( after your first commit or
> manually) you are going to be able to see the suggestions.
>
> Your config appears to be very confused ( why an edismax query parser for a
> suggestion request handler ? )
>
> To answer do Dalnesh, there is no benefit in explicitly expressing again
> the query parameters, they are already appended if you take a look to Edwin
> config, so this will not solve anything.
>
> I would suggest you to use the latest approach and then verify the
> suggester building went fine.
>
> Cheers
>

Running Solr 5.1.0 as a Service on Windows

2015-06-04 Thread Guillaume Belrose

Hi, 

I've successfully used procrun (see
http://commons.apache.org/proper/commons-daemon/procrun.html) to wrap the Solr 5.1
solr.cmd script as a Windows service (I've only tested on Windows 2008 R2).
Previously, I was using procrun to manage Jetty services running the solr.war
from older versions, but with a bit of tweaking I was able to wrap the new Solr
5.1.0 scripts.

I roughly did the following:
-download and unzip the Solr 5.1.0 distribution to a local folder (i.e. c:\opt )
-download and unzip the Apache Commons Daemon .zip file (from 
http://commons.apache.org/proper/commons-daemon/download_daemon.cgi) in my solr 
local folder (i.e. c:\opt\solr-5.1.0)
-run the batch file [1].

All of this was done through Ansible Playbooks which is the tool I use for 
configuration management on Windows and Linux.

Cheers, 

Guillaume.

[1] 
@echo off
set SERVICE_NAME=solr
set SERVICE_HOME=c:\opt\solr-5.1.0
set PR_INSTALL=%SERVICE_HOME%\amd64\prunsrv.exe

@REM Service Log Configuration
set PR_LOGPREFIX=%SERVICE_NAME%
set PR_LOGPATH=%SERVICE_HOME%\logs
set PR_STDOUTPUT=auto
set PR_STDERROR=auto
set PR_LOGLEVEL=Debug

set PR_STARTUP=auto
set PR_STARTMODE=exe
set PR_STARTIMAGE=%SERVICE_HOME%\bin\solr.cmd
set PR_STARTPARAMS=start

@REM Shutdown Configuration
set PR_STOPMODE=exe
set PR_STOPIMAGE=%SERVICE_HOME%\bin\solr.cmd
set PR_STOPPARAMS=stop -p 8983

%PR_INSTALL% //IS/%SERVICE_NAME% ^
  --Description="Solr-5.1.0" ^
  --DisplayName="%SERVICE_NAME%" ^
  --Install="%PR_INSTALL%" ^
  --Startup="%PR_STARTUP%" ^
  --LogPath="%PR_LOGPATH%" ^
  --LogPrefix="%PR_LOGPREFIX%" ^
  --LogLevel="%PR_LOGLEVEL%" ^
  --StdOutput="%PR_STDOUTPUT%" ^
  --StdError="%PR_STDERROR%" ^
  --StartMode="%PR_STARTMODE%" ^
  --StartImage="%PR_STARTIMAGE%" ^
  --StartParams="%PR_STARTPARAMS%" ^
  --StopMode="%PR_STOPMODE%" ^
  --StopImage="%PR_STOPIMAGE%" ^
  --StopParams="%PR_STOPPARAMS%"

if not errorlevel 1 goto installed
echo Failed to install "%SERVICE_NAME%" service.  Refer to log in %PR_LOGPATH%
exit /B 1

:installed
echo The Service "%SERVICE_NAME%" has been installed
exit /B 0
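
If you ever need to remove the service, prunsrv's delete switch works the
same way (standard procrun usage, not Solr-specific):

@REM Uninstall the service
%PR_INSTALL% //DS/%SERVICE_NAME%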



Re: Index optimize runs in background.

2015-06-04 Thread Modassar Ather
Hi,

Please provide your inputs on optimize and commit running in the background.
Your suggestion will be really helpful.

Thanks,
Modassar

On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
wrote:

> Erick! I could not find any underlying setting of 10 minutes.
> It is not only optimize but commit is also behaving in the same fashion
> and is taking lesser time than usually had taken.
> As per my observation both are running in background.
>
> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson 
> wrote:
>
>> I'm not talking about you setting a timeout, but the underlying
>> connection timing out...
>>
>> The "10 minutes then the indexer exits" comment points in that direction.
>>
>> Best,
>> Erick
>>
>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather 
>> wrote:
>> > I have not added any timeout in the indexer except zk client time out
>> which
>> > is 30 seconds. I am simply calling client.close() at the end of
>> indexing.
>> > The same code was not running in background for optimize with
>> solr-4.10.3
>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
>> >
>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Are you timing out on the client request? The theory here is that it's
>> >> still a synchronous call, but you're just timing out at the client
>> >> level. At that point, the optimize is still running it's just the
>> >> connection has been dropped
>> >>
>> >> Shot in the dark.
>> >> Erick
>> >>
>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
>> modather1...@gmail.com>
>> >> wrote:
>> >> > I could not notice it but with my past experience of commit which
>> used to
>> >> > take around 2 minutes is now taking around 8 seconds. I think this is
>> >> also
>> >> > running as background.
>> >> >
>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
>> modather1...@gmail.com
>> >> >
>> >> > wrote:
>> >> >
>> >> >> The indexer takes almost 2 hours to optimize. It has a
>> multi-threaded
>> >> add
>> >> >> of batches of documents to
>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
>> >> >> Once all the documents are indexed it invokes commit and optimize. I
>> >> have
>> >> >> seen that the optimize goes into background after 10 minutes and
>> indexer
>> >> >> exits.
>> >> >> I am not sure why this 10 minutes it hangs on indexer. This
>> behavior I
>> >> >> have seen in multiple iteration of the indexing of same data.
>> >> >>
>> >> >> There is nothing significant I found in log which I can share. I
>> can see
>> >> >> following in log.
>> >> >> org.apache.solr.update.DirectUpdateHandler2; start
>> >> >>
>> >>
>> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> >> >>
>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <
>> >> erickerick...@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> All strange of course. What do your Solr logs show when this
>> happens?
>> >> >>> And how reproducible is this?
>> >> >>>
>> >> >>> Best,
>> >> >>> Erick
>> >> >>>
>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira  wrote:
>> >> >>> > In this case, optimising makes sense, once the index is
>> generated,
>> >> you
>> >> >>> > are not updating It.
>> >> >>> >
>> >> >>> > Upayavira
>> >> >>> >
>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5
>> shards
>> >> >>> and
>> >> >>> >> each shard has an index size of about 170+GB (for the record,
>> we are
>> >> >>> not
>> >> >>> >> using stored fields - our documents are pretty large). We
>> perform a
>> >> >>> full
>> >> >>> >> indexing every weekend and during the week there are no updates
>> >> made to
>> >> >>> >> the
>> >> >>> >> index. Most of the queries that we run are pretty complex with
>> >> hundreds
>> >> >>> >> of
>> >> >>> >> terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
>> boosts
>> >> >>> etc.
>> >> >>> >> and take many minutes to execute. A difference of 10-20% is
>> also a
>> >> big
>> >> >>> >> advantage for us.
>> >> >>> >>
>> >> >>> >> We have been optimizing the index after indexing for years and
>> it
>> >> has
>> >> >>> >> worked well for us. Every once in a while, we upgrade Solr to
>> the
>> >> >>> latest
>> >> >>> >> version and try without optimizing so that we can save the many
>> >> hours
>> >> >>> it
>> >> >>> >> take to optimize such a huge index, but find optimized index
>> work
>> >> well
>> >> >>> >> for
>> >> >>> >> us.
>> >> >>> >>
>> >> >>> >> Erick I was indexing today the documents and saw the optimize
>> >> happening
>> >> >>> >> in
>> >> >>> >> background.
>> >> >>> >>
>> >> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <
>> >> >>> erickerick...@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >>
>> >> >>> >> > No results yet. I finished the test harness last night (not
>> >> really a
>> >> >>> >> > unit test, a stand-alone program that endlessly adds stuff and
>> >> tests

Re: Derive suggestions across multiple fields

2015-06-04 Thread Alessandro Benedetti
If you are using an existing indexed field to provide suggestions, you
simply need to build the suggester and start using it!
No re-indexing needed.

Cheers

2015-06-04 11:01 GMT+01:00 Zheng Lin Edwin Yeo:

> I think I'm confused with the old spellcheck approach that came out more
> frequently during my research.
>
> Just to confirm, do I need to re-index the data in order for this new
> approach to work if I'm using an existing field?
>
>
> Regards,
> Edwin
>

indexing issue

2015-06-04 Thread Midas A
I have an indexing issue. While indexing, IOwait is high on the Solr server,
and so is the load.


Re: indexing issue

2015-06-04 Thread Toke Eskildsen
On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
> I have some indexing issue . While indexing IOwait is high in solr server
> and load also.

Might be because you commit too frequently. How often do you do that?
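
For reference, a commonly recommended baseline (generic advice, not tuned
to your setup) is an infrequent hard commit that does not open a searcher,
plus a soft commit for visibility:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>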

- Toke Eskildsen, State and University Library, Denmark




Re: indexing issue

2015-06-04 Thread Alessandro Benedetti
I think this mail is really poor in terms of detail.
Which version of Solr are you using?
Architecture?
Load expected?
Indexing approach?
When does your problem happen?

The more detail you give, the easier it will be to provide help.

Cheers

2015-06-04 12:19 GMT+01:00 Toke Eskildsen:

> On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
> > I have some indexing issue . While indexing IOwait is high in solr server
> > and load also.
>
> Might be because you commit too frequently. How often do you do that?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: indexing issue

2015-06-04 Thread Midas A
Thanks for replying. Below is the commit frequency:

 6  false  
60  


On Thu, Jun 4, 2015 at 4:49 PM, Toke Eskildsen wrote:

> On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
> > I have some indexing issue . While indexing IOwait is high in solr server
> > and load also.
>
> Might be because you commit too frequently. How often do you do that?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: indexing issue

2015-06-04 Thread Midas A
Thanks Alessandro,

Please find the info inline.

Which version of Solr are you using: 4.2.1
Architecture: master-slave
Load expected: currently it is 7-15, should be below 1
Indexing approach: using DIH
When does your problem happen: we run a delta import every 10 mins and a
full index once a day; at times the load goes to 7-15


On Thu, Jun 4, 2015 at 4:52 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> I think this mail is really poor in term of details.
> Which version of Solr are you using ?
> Architecture ?
> Load expected ?
> Indexing approach ?
> When does your problem happens ?
>
> More detail we give, easier will be to provide help.
>
> Cheers
>
> 2015-06-04 12:19 GMT+01:00 Toke Eskildsen :
>
> > On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
> > > I have some indexing issue . While indexing IOwait is high in solr
> server
> > > and load also.
> >
> > Might be because you commit too frequently. How often do you do that?
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: indexing issue

2015-06-04 Thread Alessandro Benedetti
Honestly, your auto-commit configuration seems not alarming at all!
Can you give me more details regarding:

Load expected: currently it is 7-15, should be below 1

What does this mean? Without a unit of measure I find it hard to
understand plain numbers :)
I was expecting the number of documents per unit of time you index, and an
average size of these docs.
Which kind of DIH processor? Where is your data coming from? A database?

Let's try to improve the understanding of the situation and then evaluate
an approach.

Cheers



Re: How to identify field names from the suggested values in multiple fields

2015-06-04 Thread Dhanesh Radhakrishnan
Dear Erick,
That document helped me to build multiple suggesters.
But still there is one problem that I faced.
When I used both suggesters with the same lookupImpl:

<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>

Solr throws an error:

Caused by: java.lang.RuntimeException at
org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory.create(AnalyzingInfixLookupFactory.java:138)
at
org.apache.solr.spelling.suggest.SolrSuggester.init(SolrSuggester.java:107)
at
org.apache.solr.handler.component.SuggestComponent.inform(SuggestComponent.java:119)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:620)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
... 8 more

So I changed the lookup to FuzzyLookupFactory, and the suggester worked for
both fields.

Is there any limitation on using AnalyzingInfixLookupFactory for multiple
suggesters?

I'm using SOLR 5.1
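
One thing I have not tried yet (just a guess from reading about
AnalyzingInfixLookupFactory, so treat the parameter below as an
assumption): giving each suggester its own indexPath. The factory builds a
sidecar index on disk, and two suggesters may be colliding on the same
default directory:

<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="indexPath">suggester_infix_category</str>

<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="indexPath">suggester_infix_subcategory</str>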

Regards
dhanesh s.r


On Thu, Jun 4, 2015 at 11:33 AM, Erick Erickson wrote:

> Yes, this might help: http://lucidworks.com/blog/solr-suggester/
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 10:32 PM, Dhanesh Radhakrishnan
>  wrote:
> > Thank you for the quick response.
> > If I use 2 suggesters, can I get the result in a single request?
> >
> http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=school
> > Is there any helping document to build multiple suggesters??
> >
> >
> > On Thu, Jun 4, 2015 at 10:40 AM, Walter Underwood wrote:
> >
> >> Configure two suggesters, one based on each field. Use both of them and
> >> you’ll get separate suggestions from each.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >> On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan 
> >> wrote:
> >>
> >> > Hi
> >> > Anyone help me to  build a suggester auto complete based on multiple
> >> fields?
> >> > There are two fields in my schema. Category and Subcategory and I'm
> >> trying
> >> > to build  suggester based on these 2 fields. When the suggestions
> result,
> >> > how can I distinguish from which filed it come from?
> >> >
> >> > I used a copyfields to combine multiple fields into single field and
> use
> >> > that field in suggester
> >> > But this will  return the combined result of category and
> subcategory. I
> >> > can't differentiate the results that fetch from which field
> >> >
> >> > These are the copyfields for autocomplete
> >> > 
> >> > 
> >> >
> >> > Suggestions should know from which field its from.
> >> > For Eg my suggester returns 5 results for the keyword "schools". In
> that
> >> > result  2 from the category field and 3 from the subcategory field.
> >> >
> >> > Schools (Category)
> >> > Primary Schools (Subcategory)
> >> > Driving Schools (Subcategory)
> >> > Day care and play school (Subcategory)
> >> > Day Care/Play School (Category)
> >> >
> >> >
> >> > Is there any way to build like this ??
> >> >
> >> >

Apache Solr Stack Trace Admin

2015-06-04 Thread Adam Hall
Hi there,

We have installed Apache Solr 3.6.2 for our Magento Enterprise sales
platform (unfortunately it's the only version Enterprise supports).
However, when navigating the admin interface we keep stumbling across
stack traces:

 PWC6033: Unable to compile class for JSP

PWC6197: An error occurred at line: 181 in the jsp file: /admin/analysis.jsp
PWC6199: Generated servlet error:
The type java.lang.CharSequence cannot be resolved. It is indirectly referenced 
from required .class files

org.apache.jasper.JasperException: PWC6033: Unable to compile class for JSP

PWC6197: An error occurred at line: 181 in the jsp file: /admin/analysis.jsp
PWC6199: Generated servlet error:
The type java.lang.CharSequence cannot be resolved. It is indirectly referenced 
from required .class files


at 
org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:123)
at 
org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:296)
at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:376)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:437)
at 
org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:608)
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:360)
at 
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Could you kindly advise an appropriate solution to this issue?

Kind Regards,

Adam
Adam Hall
TECHNICAL DEVELOPMENT MANAGER
DOBELL
(formerly known as MyTuxedo)

Telephone: +44 (0)1323 745932 | W. www.dobell.co.uk  | E. adamh...@dobell.co.uk
Company Number. 4964527 | Vat No. 860 6313 37



Re: retrieving large number of docs

2015-06-04 Thread Robust Links
Try it for yourself and see if it works, Alessandro. Not only can't I get
facets, but I even get field errors when I run such join queries:

select?fl=title&q={!join from=id to=id fromIndex=Tags}titleNormalized:pdf


<lst name="error">
  <str name="msg">undefined field titleNormalized</str>
  <int name="code">400</int>
</lst>





On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Hi Rob,
> Reading your use case I can not understand why the Query Time join is not a
> fit for you !
> The documents returned by the Query Time Join will be from core1, so
> faceting and filter querying that core, would definitely be possible !
> I can not see your problem honestly !
>
> Cheers
>

Re: BoolField fieldType

2015-06-04 Thread Steven White
Thanks Erick.

What about at query time?  If I index my Boolean and it has one of the
variations of "t", "T" or "1", what should my query be to get a hit on
"true"?  q=MyBoolField: ?  What should the value of  be when I
want to check if the field has a "true" and when I need to check if it has
a "false"?

Steve

On Wed, Jun 3, 2015 at 6:41 PM, Erick Erickson 
wrote:

> I took a quick look at the code and it _looks_ like any string
> starting with "t", "T" or "1" is evaluated as true and everything else
> as false.
>
> sortMissingLast determines sort order if you're sorting on this field
> and the document doesn't have a value. Should the be sorted after or
> before docs that have a value for the field?
>
> Hmm, could use some better docs
>
> Erick
>
> On Wed, Jun 3, 2015 at 2:38 PM, Steven White  wrote:
> > Hi everyone,
> >
> > This is a two part question:
> >
> > 1) I see the following:  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
> >
> > a) what does sortMissingLast do?
> > b) what kind of data is considered Boolean?  "TRUE", "True", "true", "1",
> > "yes,", "Yes, "FALSE", etc.
> >
> > 2) When searching, what do I search on: q=MyBoolField:<value>?  That is
> > what should "<value>" be?
> >
> > Thanks
> >
> > Steve
>
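
For reference, a minimal sketch of how such queries usually look (generic
examples based on the parsing behavior described above, not taken from the
thread -- verify against your own schema):

q=MyBoolField:true          matches docs whose value was indexed as true ("t", "T", "1", ...)
q=MyBoolField:false         matches docs whose value was indexed as false (anything else)
q=-MyBoolField:[* TO *]     matches docs with no value in the field at all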


Re: indexing issue

2015-06-04 Thread Midas A
Hi Alessandro,



On Thu, Jun 4, 2015 at 5:19 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Honestly your auto-commit configuration seems not alarming at all!
> Can you give me more details regarding :
>
> Load expected : currently it is 7- 15 should be below 1
> *[Abhishek] :  solr server load average.*
> What does this mean ? Without a unit of measure i find hard to understand
> plain numbers :)
>


> I was expecting the number of documents per unit of time you index, and an
> average size of these docs.
>
*   [Abhishek] :  avg size of doc : 250 kb *
<autoCommit> <maxTime>6</maxTime> <openSearcher>false</openSearcher> </autoCommit>
We have not specified a max-docs limit.

Which kind of DIH processor ? Where is your data coming from ? A database ?
> *  [Abhishek] :  Using a MySQL database and the built-in Solr DIH (Data Import
> Handler)*
>


> Let's try to improve the understanding of the situation and then evaluate
> an approach.
>
> Cheers
>
>
>
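
For context, a hard-commit setup of the kind being discussed normally looks
like this in solrconfig.xml (a generic illustration with a made-up interval,
not the poster's exact values):

<autoCommit>
  <!-- hard commit at most once per minute, without opening a new searcher -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>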


Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Shawn Heisey
On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
> When I turn on debug, I get the following:
> 
> "debug": {
>   "rawquerystring": "Food",
>   "querystring": "Food",
>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>   "parsedquery_toString": "+(label:Food^3.0) ()",
>   "explain": {},
>   "QParser": "DisMaxQParser",
>   "altquerystring": null,
>   "boostfuncs": null,
>   ...
> }
> 
> I don't understand how/why this doesn't use a "contains" operator.
> This was the behavior on the old 1.4 instance. I went through the
> changelog for 1.4 to 5.1, but I don't find any explicit information
> about dismax behaving differently, except the "mm" parameter needs a
> default. I tried many values for mm (including 0, 100%, 100, etc) but
> to no avail.

In your schema.xml, what is the definition of the label field, and the
fieldType definition of the type used in the label field?  That will
determine exactly how the query is parsed and whether individual words
will match.  I wasn't using dismax or edismax back when I was running
1.4, so I can't say anything about how it used to work, only how it
works now.

Thanks,
Shawn



Re: Apache Solr Stack Trace Admin

2015-06-04 Thread Shawn Heisey
On 6/4/2015 5:32 AM, Adam Hall wrote:
> We have installed apache solr 3.6.2 for our Magento Enterprise sales
> platform (unfortunately its the only version Enterprise supports),
> however, when navigating the admin interface we keep stumbling across
> stack traces:
> 
>  PWC6033: Unable to compile class for JSP
> 
> PWC6197: An error occurred at line: 181 in the jsp file: /admin/analysis.jsp
> PWC6199: Generated servlet error:
> The type java.lang.CharSequence cannot be resolved. It is indirectly 
> referenced from required .class files
> 
> org.apache.jasper.JasperException: PWC6033: Unable to compile class for JSP

Solr 1.x and 3.x uses JSP for its admin UI, so you must have JSP support
in the servlet container.  With many servlet containers, JSP support
requires the Java JDK, not just the JRE -- the container may need to
compile the JSP code on the fly.  This is the usual solution for people
that get this JasperException.

It is *strongly* recommended that you use the jetty included with Solr,
not a third-party container.  That jetty has been tuned for Solr's needs.

The admin UI in 4.0 and later doesn't use JSP.

Thanks,
Shawn



Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 5:15 AM, Midas A wrote:
> I have some indexing issue . While indexing IOwait is high in solr server
> and load also.

My first suspect here is that you don't have enough RAM for your index size.

* How many total docs is Solr handling (all cores)?
* What is the total size on disk of all your cores?
* How much RAM does the machine have?
* What is the java max heap?

Here is some additional information on memory requirements for Solr:

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

When Alessandro asked about the load on Solr, the hope was to find out
your *rate* of indexing and querying, not the load average from the
operating system.  Indexing requires a fair amount of heap memory and
CPU resources.  If your heap is too small, then Java might have to work
extremely hard to free up memory for normal operation.

Thanks,
Shawn



Re: retrieving large number of docs

2015-06-04 Thread Alessandro Benedetti
Let's try to make some points clear:

Index TO: the one you are using to call the select request handler.
Index FROM: Tags.
Is titleNormalized present in the "Tags" index? Because that is where the
query will run.

The documents in Tags satisfying the query will be joined with the TO index.
The resulting documents can be filtered and faceted.
I have used this approach a lot of times, and I can tell you it works this
way. Maybe you misunderstood the Join feature, or I misunderstood your
requirement.

Cheers

2015-06-04 13:27 GMT+01:00 Robust Links :

> Try it for yourself and see if it works, Alessandro. Not only can't I get
> facets, but I even get field errors when I run such join queries:
>
> select?fl=title&q={!join from=id to=id fromIndex=Tags}titleNormalized:pdf
>
> <lst name="error">
>   <str name="msg">undefined field titleNormalized</str>
>   <int name="code">400</int>
> </lst>
>
>
>
>
> On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Hi Rob,
> > Reading your use case I can not understand why the Query Time join is
> not a
> > fit for you !
> > The documents returned by the Query Time Join will be from core1, so
> > faceting and filter querying that core, would definitely be possible !
> > I can not see your problem honestly !
> >
> > Cheers
> >
> > 2015-06-04 1:47 GMT+01:00 Robust Links :
> >
> > > that doesnt work either, and even if it did, joining is not going to
> be a
> > > solution since i cant query 1 core and facet on the result of the
> other.
> > To
> > > sum up, my problem is
> > >
> > > core0
> > > 
> > > field:id
> > > field: text
> > >
> > > core1
> > > 
> > > field:id
> > > field tag
> > >
> > >
> > > I want to
> > >
> > > 1) query text field of core0,
> > > 2) use the {id} of matches (which can be >>10K) to retrieve the docs in
> > > core 1 with same id and
> > > 3) facet on tags in core1
> > >
> > > Is this possible without denormalizing (which is not an option)?
> > >
> > > thank you
> > >
> > > On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky <
> jack.krupan...@gmail.com
> > >
> > > wrote:
> > >
> > > > Specify the join query parser for the main query. See:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Wed, Jun 3, 2015 at 3:32 PM, Robust Links  >
> > > > wrote:
> > > >
> > > > > Hi Erick
> > > > >
> > > > > they are on the same JVM. I had already tried the core join
> strategy
> > > but
> > > > > that doesnt solve the faceting problem... i.e if i have 2 cores,
> > core0
> > > > and
> > > > > core1, and I run this query on core0
> > > > >
> > > > > /select?&q=fq={!join from=id1 to=id2
> > > > > fromIndex=core1}&facet=true&facet.field=tag
> > > > >
> > > > > has 2 problems
> > > > > 1) i need to specify the docIDs with the fq (so back to the same
> > > > > fq={!terms} problem), and
> > > > > 2) faceting doesnt work
> > > > >
> > > > >
> > > > > Flattening the data is not possible due to security reasons.
> > > > >
> > > > > Am I using join correctly?
> > > > >
> > > > > thank you Erick
> > > > >
> > > > > Peyman
> > > > >
> > > > > On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <
> > > erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Are these indexes on different machines? Because if they're in
> the
> > > > > > same JVM, you might be able to use cross-core joins. Be aware,
> > > though,
> > > > > > that joining on high-cardinality fields (which, by definition,
> > docID
> > > > > > probably is) is where pseudo joins perform worst.
> > > > > >
> > > > > > Have you considered flattening the data and including whatever
> > > > > > information you have in your "from" index in your main index?
> > Because
> > > > > > < 100ms response is probably not going to be tough if you have to
> > > have
> > > > > > two indexes/cores.
> > > > > >
> > > > > > Best,
> > > > > > Erick
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <
> > joels...@gmail.com>
> > > > > > wrote:
> > > > > > > You may have to do something custom to meet your needs.
> > > > > > >
> > > > > > > 10,000 DocID's is not huge but you're latency requirement are
> > > pretty
> > > > > low.
> > > > > > >
> > > > > > > Are your DocID's by any chance integers? This can make custom
> > > > > PostFilters
> > > > > > > run much faster.
> > > > > > >
> > > > > > > You should also be aware of the Streaming API in Solr 5.1 which
> > > will
> > > > > give
> > > > > > > you fast Map/Reduce approaches (
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
> > > > > > ).
> > > > > > >
> > > > > > > Joel Bernstein
> > > > > > > http://joelsolr.blogspot.com/
> > > > > > >
> > > > > > > On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <
> > > pey...@robustlinks.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> Hey Joel
> > > > > > >>
> > > > > > >> see below
> > > > > > >>
> > > > > > >> On

Re: indexing issue

2015-06-04 Thread Midas A
Hi shawn,

Please find comment in line.

On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey  wrote:

> On 6/4/2015 5:15 AM, Midas A wrote:
> > I have some indexing issue . While indexing IOwait is high in solr server
> > and load also.
>
> My first suspect here is that you don't have enough RAM for your index
> size.
>
> * How many total docs is Solr handling (all cores)?
>
 --30,0 docs

> * What is the total size on disk of all your cores?
>
 --  600 GB

> * How much RAM does the machine have?
>
 --48 GB

> * What is the java max heap?
>
 --30 GB (jvm)
> Here is some additional information on memory requirements for Solr:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>
> When Alessandro asked about the load on Solr, the hope was to find out
> your *rate* of indexing and querying, not the load average from the
> operating system.  Indexing requires a fair amount of heap memory and
> CPU resources.  If your heap is too small, then Java might have to work
> extremely hard to free up memory for normal operation.
>
> Thanks,
> Shawn
>
>


Re: retrieving large number of docs

2015-06-04 Thread Robust Links
My requirement is to join core1 onto core0. Restating the requirements
again, I have 2 cores:

core0

field:id
field: text

core1

field:id
field tag


I want to

1) query text field of core0, together with filters
2) use the {id} of matches (which can be >>10K) to retrieve the docs in
core 1 with same id and
3) facet on tags in core1

so my /select is to run on core0 and facet on tag field of core1

thank you Alessandro


On Thu, Jun 4, 2015 at 9:28 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Lets try to make clear some point :
>
> Index TO : is the one you are using to call the select request handler
> Index From : Tags
> Is titleNormalized present in the "Tags" index ? Because there is where the
> query will run.
>
> The documents in tags satisfying the query will be joined with the index TO
> .
> The resulting documents can be filtered and faceted.
> I did use this approach a lot of times.
> And I can tell you it is working in this way.
> Maybe you misunderstood the Join feature, or I misunderstood your
> requirement.
>
> Cheers
>
> 2015-06-04 13:27 GMT+01:00 Robust Links :
>
> > try it for yourself and see if it works Alessandro. Not only cant i get
> > facets but i even get field errors when i run such join queries
> >
> > select?fl=title&q={!join from=id to=id fromIndex=Tags}titleNormalized:pdf
> >
> > 
> > undefined field titleNormalized
> > 400
> > 
> >
> >
> >
> >
> > On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Hi Rob,
> > > Reading your use case I can not understand why the Query Time join is
> > not a
> > > fit for you !
> > > The documents returned by the Query Time Join will be from core1, so
> > > faceting and filter querying that core, would definitely be possible !
> > > I can not see your problem honestly !
> > >
> > > Cheers
> > >
> > > 2015-06-04 1:47 GMT+01:00 Robust Links :
> > >
> > > > that doesnt work either, and even if it did, joining is not going to
> > be a
> > > > solution since i cant query 1 core and facet on the result of the
> > other.
> > > To
> > > > sum up, my problem is
> > > >
> > > > core0
> > > > 
> > > > field:id
> > > > field: text
> > > >
> > > > core1
> > > > 
> > > > field:id
> > > > field tag
> > > >
> > > >
> > > > I want to
> > > >
> > > > 1) query text field of core0,
> > > > 2) use the {id} of matches (which can be >>10K) to retrieve the docs
> in
> > > > core 1 with same id and
> > > > 3) facet on tags in core1
> > > >
> > > > Is this possible without denormalizing (which is not an option)?
> > > >
> > > > thank you
> > > >
> > > > On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky <
> > jack.krupan...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Specify the join query parser for the main query. See:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> > > > >
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Wed, Jun 3, 2015 at 3:32 PM, Robust Links <
> pey...@robustlinks.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Erick
> > > > > >
> > > > > > they are on the same JVM. I had already tried the core join
> > strategy
> > > > but
> > > > > > that doesnt solve the faceting problem... i.e if i have 2 cores,
> > > core0
> > > > > and
> > > > > > core1, and I run this query on core0
> > > > > >
> > > > > > /select?&q=fq={!join from=id1 to=id2
> > > > > > fromIndex=core1}&facet=true&facet.field=tag
> > > > > >
> > > > > > has 2 problems
> > > > > > 1) i need to specify the docIDs with the fq (so back to the same
> > > > > > fq={!terms} problem), and
> > > > > > 2) faceting doesnt work
> > > > > >
> > > > > >
> > > > > > Flattening the data is not possible due to security reasons.
> > > > > >
> > > > > > Am I using join correctly?
> > > > > >
> > > > > > thank you Erick
> > > > > >
> > > > > > Peyman
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <
> > > > erickerick...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Are these indexes on different machines? Because if they're in
> > the
> > > > > > > same JVM, you might be able to use cross-core joins. Be aware,
> > > > though,
> > > > > > > that joining on high-cardinality fields (which, by definition,
> > > docID
> > > > > > > probably is) is where pseudo joins perform worst.
> > > > > > >
> > > > > > > Have you considered flattening the data and including whatever
> > > > > > > information you have in your "from" index in your main index?
> > > Because
> > > > > > > < 100ms response is probably not going to be tough if you have
> to
> > > > have
> > > > > > > two indexes/cores.
> > > > > > >
> > > > > > > Best,
> > > > > > > Erick
> > > > > > >
> > > > > > > On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <
> > > joels...@gmail.com>
> > > > > > > wrote:
> > > > > > > > You may have to do something custom to meet your needs.
> > > > > > > >
> > > > > > > > 10,000 DocID

Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal
Hi, thanks for the response.

Label field:

<field name="label" type="..." indexed="true" stored="true"
 termVectors="true" omitNorms="true"/>

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
      preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
      synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
      preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I can surely optimize the above config a bit, maybe only use one
<analyzer> for both query and index. But for now, this is what it
does.

Just as a side-question: is dismax *supposed* to match fields exactly
with the search query? Or is my expectation correct, meaning it should
"tokenize" the field, just as with regular searches? It just doesn't
seem intuitive to me.

Thank you again for your help.

Kind regards,
Wouter Admiraal


2015-06-04 14:52 GMT+02:00 Shawn Heisey :
> On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> When I turn on debug, I get the following:
>>
>> "debug": {
>>   "rawquerystring": "Food",
>>   "querystring": "Food",
>>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>>   "parsedquery_toString": "+(label:Food^3.0) ()",
>>   "explain": {},
>>   "QParser": "DisMaxQParser",
>>   "altquerystring": null,
>>   "boostfuncs": null,
>>   ...
>> }
>>
>> I don't understand how/why this doesn't use a "contains" operator.
>> This was the behavior on the old 1.4 instance. I went through the
>> changelog for 1.4 to 5.1, but I don't find any explicit information
>> about dismax behaving differently, except the "mm" parameter needs a
>> default. I tried many values for mm (including 0, 100%, 100, etc) but
>> to no avail.
>
> In your schema.xml, what is the definition of the label field, and the
> fieldType definition of the type used in the label field?  That will
> determine exactly how the query is parsed and whether individual words
> will match.  I wasn't using dismax or edismax back when I was running
> 1.4, so I can't say anything about how it used to work, only how it
> works now.
>
> Thanks,
> Shawn
>


Re: retrieving large number of docs

2015-06-04 Thread Alessandro Benedetti
Hi Rob,
according to your use case you have to :

Call the /select from *core1 *in this way* :*

*core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}
titleNormalized:pdf&facet=true&facet.field=tags

Hope this clarifies your problem.

Cheers
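
Spelled out: {!join from=id to=id fromIndex=core0} runs the enclosed query
(titleNormalized:pdf) against core0, collects the from values (id), and
returns the documents of the core receiving the request (core1) whose to
field matches; facet.field=tags then facets over those core1 documents.
Filters on core1 fields can be added as ordinary fq parameters, for example
(assuming tags is a field of core1):

core1/select?q={!join from=id to=id fromIndex=core0}titleNormalized:pdf&fq=tags:science&facet=true&facet.field=tags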

2015-06-04 15:00 GMT+01:00 Robust Links :

> my requirement is to join core1 onto core0. restating the requirements
> again. I have 2 cores
>
> core0
> 
> field:id
> field: text
>
> core1
> 
> field:id
> field tag
>
>
> I want to
>
> 1) query text field of core0, together with filters
> 2) use the {id} of matches (which can be >>10K) to retrieve the docs in
> core 1 with same id and
> 3) facet on tags in core1
>
> so my /select is to run on core0 and facet on tag field of core1
>
> thank you Alessandro
>
>
> On Thu, Jun 4, 2015 at 9:28 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Lets try to make clear some point :
> >
> > Index TO : is the one you are using to call the select request handler
> > Index From : Tags
> > Is titleNormalized present in the "Tags" index ? Because there is where
> the
> > query will run.
> >
> > The documents in tags satisfying the query will be joined with the index
> TO
> > .
> > The resulting documents can be filtered and faceted.
> > I did use this approach a lot of times.
> > And I can tell you it is working in this way.
> > Maybe you misunderstood the Join feature, or I misunderstood your
> > requirement.
> >
> > Cheers
> >
> > 2015-06-04 13:27 GMT+01:00 Robust Links :
> >
> > > try it for yourself and see if it works Alessandro. Not only cant i get
> > > facets but i even get field errors when i run such join queries
> > >
> > > select?fl=title&q={!join from=id to=id
> fromIndex=Tags}titleNormalized:pdf
> > >
> > > 
> > > undefined field titleNormalized
> > > 400
> > > 
> > >
> > >
> > >
> > >
> > > On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
> > > benedetti.ale...@gmail.com> wrote:
> > >
> > > > Hi Rob,
> > > > Reading your use case I can not understand why the Query Time join is
> > > not a
> > > > fit for you !
> > > > The documents returned by the Query Time Join will be from core1, so
> > > > faceting and filter querying that core, would definitely be possible
> !
> > > > I can not see your problem honestly !
> > > >
> > > > Cheers
> > > >
> > > > 2015-06-04 1:47 GMT+01:00 Robust Links :
> > > >
> > > > > that doesnt work either, and even if it did, joining is not going
> to
> > > be a
> > > > > solution since i cant query 1 core and facet on the result of the
> > > other.
> > > > To
> > > > > sum up, my problem is
> > > > >
> > > > > core0
> > > > > 
> > > > > field:id
> > > > > field: text
> > > > >
> > > > > core1
> > > > > 
> > > > > field:id
> > > > > field tag
> > > > >
> > > > >
> > > > > I want to
> > > > >
> > > > > 1) query text field of core0,
> > > > > 2) use the {id} of matches (which can be >>10K) to retrieve the
> docs
> > in
> > > > > core 1 with same id and
> > > > > 3) facet on tags in core1
> > > > >
> > > > > Is this possible without denormalizing (which is not an option)?
> > > > >
> > > > > thank you
> > > > >
> > > > > On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky <
> > > jack.krupan...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Specify the join query parser for the main query. See:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> > > > > >
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 3:32 PM, Robust Links <
> > pey...@robustlinks.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Erick
> > > > > > >
> > > > > > > they are on the same JVM. I had already tried the core join
> > > strategy
> > > > > but
> > > > > > > that doesnt solve the faceting problem... i.e if i have 2
> cores,
> > > > core0
> > > > > > and
> > > > > > > core1, and I run this query on core0
> > > > > > >
> > > > > > > /select?&q=fq={!join from=id1 to=id2
> > > > > > > fromIndex=core1}&facet=true&facet.field=tag
> > > > > > >
> > > > > > > has 2 problems
> > > > > > > 1) i need to specify the docIDs with the fq (so back to the
> same
> > > > > > > fq={!terms} problem), and
> > > > > > > 2) faceting doesnt work
> > > > > > >
> > > > > > >
> > > > > > > Flattening the data is not possible due to security reasons.
> > > > > > >
> > > > > > > Am I using join correctly?
> > > > > > >
> > > > > > > thank you Erick
> > > > > > >
> > > > > > > Peyman
> > > > > > >
> > > > > > > On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <
> > > > > erickerick...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Are these indexes on different machines? Because if they're
> in
> > > the
> > > > > > > > same JVM, you might be able to use cross-core joins. Be
> aware,
> > > > > though,
> > > > > > > > that joining on high-cardinality fields (which, by
> definition,
> > > > docID
> > > > > > > > probably is) i

Re: Derive suggestions across multiple fields

2015-06-04 Thread Alessandro Benedetti
Please remember this :

"to be used as the basis for a suggestion, the field must be stored"

From the official guide.

Cheers
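
As a concrete illustration: once a suggester like the one quoted below is
built, the request from earlier in the thread becomes (with the dictionary
name adjusted to match the config):

http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true&wt=xml&suggest.q=mater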

2015-06-04 11:19 GMT+01:00 Alessandro Benedetti 
:

> If you are using an existing indexed field to provide suggestions, you
> simply need to build the suggester and start using it !
> No re-indexing needed .
>
> Cheers
>
> 2015-06-04 11:01 GMT+01:00 Zheng Lin Edwin Yeo :
>
>> I think I'm confused with the old spellcheck approach that came out more
>> frequently during my research.
>>
>> Just to confirm, do I need to re-index the data in order for this new
>> approach to work if I'm using an existing field?
>>
>>
>> Regards,
>> Edwin
>>
>>
>> On 4 June 2015 at 16:58, Alessandro Benedetti > >
>> wrote:
>>
>> > Let me try to clarify the things…
>> > Because you are using solr 5.1 I can not see any reason to try to use
>> the
>> > old spellcheck approach.
>> > If you take a look to the page me and Erick quoted there is a simple
>> config
>> > example :
>> >
>> > > <searchComponent name="suggest" class="solr.SuggestComponent">
>> > >   <lst name="suggester">
>> > >     <str name="name">mySuggester</str>
>> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
>> > >     <str name="storeDir">suggester_fuzzy_dir</str>
>> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>> > >     <str name="field">title</str>
>> > >     <str name="suggestAnalyzerFieldType">suggestType</str>
>> > >     <str name="buildOnStartup">false</str>
>> > >     <str name="buildOnCommit">false</str>
>> > >   </lst>
>> > > </searchComponent>
>> > >
>> > > <requestHandler name="/suggest" class="solr.SearchHandler"
>> > > startup="lazy" >
>> > >   <lst name="defaults">
>> > >     <str name="suggest">true</str>
>> > >     <str name="suggest.count">10</str>
>> > >     <str name="suggest.dictionary">mySuggester</str>
>> > >   </lst>
>> > >   <arr name="components">
>> > >     <str>suggest</str>
>> > >   </arr>
>> > > </requestHandler>
>> >
>> >
>> > You should use this approach.
>> > After you build the Suggestion Dictionary ( after your first commit or
>> > manually) you are going to be able to see the suggestions.
>> >
>> > Your config appears to be very confused ( why an edismax query parser
>> for a
>> > suggestion request handler ? )
>> >
>> > To answer Dhanesh, there is no benefit in explicitly expressing again
>> > the query parameters, they are already appended if you take a look to
>> Edwin
>> > config, so this will not solve anything.
>> >
>> > I would suggest you to use the latest approach and then verify the
>> > suggester building went fine.
>> >
>> > Cheers
>> >
>> > 2015-06-04 9:13 GMT+01:00 Zheng Lin Edwin Yeo :
>> >
>> > > This is the result that I get from the query URL you mentioned. Still
>> not
>> > > able to get any output.
>> > >
>> > > 
>> > > 
>> > >   
>> > > 0
>> > > 0
>> > >   
>> > > true
>> > > mater
>> > > true
>> > > suggest
>> > > xml
>> > >   
>> > > 
>> > > 
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >
>> > >
>> > > On 4 June 2015 at 15:26, Dhanesh Radhakrishnan 
>> > wrote:
>> > >
>> > > > Try this
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=suggest&suggest.build=true&wt=xml&suggest.q=mater
>> > > >
>> > > > On Thu, Jun 4, 2015 at 11:53 AM, Zheng Lin Edwin Yeo <
>> > > edwinye...@gmail.com
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > I've tried to use the solr.SuggestComponent as stated in the
>> website,
>> > > but
>> > > > > it couldn't work.
>> > > > >
>> > > > > When I change to using the suggest with the configuration below
>> and
>> > go
>> > > a
>> > > > > query like http://localhost:8983/solr/collection1/suggest?q=mater
>> ,
>> > it
>> > > > says
>> > > > > "The Webpage cannot be found"
>> > > > >
>> > > > > <searchComponent class="solr.SpellCheckComponent" name="suggest">
>> > > > >   <lst name="spellchecker">
>> > > > >     <str name="name">suggest</str>
>> > > > >     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>> > > > >     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
>> > > > >     <str name="field">text</str>
>> > > > >     <str name="buildOnCommit">true</str>
>> > > > >   </lst>
>> > > > > </searchComponent>
>> > > > >
>> > > > > <requestHandler class="org.apache.solr.handler.component.SearchHandler"
>> > > > > name="/suggest">
>> > > > >   <lst name="defaults">
>> > > > >     <str name="echoParams">explicit</str>
>> > > > >     <str name="defType">edismax</str>
>> > > > >     <int name="rows">10</int>
>> > > > >     <str name="wt">json</str>
>> > > > >     <str name="indent">true</str>
>> > > > >
>> > > > >     <str name="spellcheck">true</str>
>> > > > >     <str name="spellcheck.dictionary">suggest</str>
>> > > > >     <int name="spellcheck.count">5</int>
>> > > > >     <str name="spellcheck.collate">true</str>
>> > > > >   </lst>
>> > > > >   <arr name="components">
>> > > > >     <str>suggest</str>
>> > > > >   </arr>
>> > > > > </requestHandler>
>> > > > >
>> > > > >
>> > > > > Regards,
>> > > > > Edwin
>> > > > >
>> > > > >
>> > > > > On 4 June 2015 at 13:21, Erick Erickson 
>> > > wrote:
>> > > > >
>> > > > > > This may be helpful: http://lucidworks.com/blog/solr-suggester/
>> > > > > >
>> > > > > > Note that there are a series of fixes in various versions of
>> Solr,
>> > > > > > particularly buildOnStartup=false and working on multivalued
>> > fields.
>> > > > > >
>> > > > > > Best,
>> > > > > > Erick
>> > > > > >
>> > > > > > On Wed, Jun 3, 2015 at 8:04 PM, Zheng Lin Edwin Yeo
>> > > > > >  wrote:
>> > > > > > > My previous suggester configuration is derived from this page:
>> > > > > > > https://wiki.apache.org/solr/Suggester
>> > > > > > >
>> > > > > > > Does it mean that what is written there is outdated?
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Edwin
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo <
>> > edwinye...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > 

Re: Derive suggestions across multiple fields

2015-06-04 Thread Zheng Lin Edwin Yeo
Thank you so much for your advice.

Regards,
Edwin

On 4 June 2015 at 22:30, Alessandro Benedetti 
wrote:

> Please remember this :
>
> "to be used as the basis for a suggestion, the field must be stored"
>
> From the official guide.
>
> Cheers
>
> 2015-06-04 11:19 GMT+01:00 Alessandro Benedetti <
> benedetti.ale...@gmail.com>
> :
>
> > If you are using an existing indexed field to provide suggestions, you
> > simply need to build the suggester and start using it !
> > No re-indexing needed .
> >
> > Cheers
> >
> > 2015-06-04 11:01 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> >> I think I'm confused with the old spellcheck approach that came out more
> >> frequently during my research.
> >>
> >> Just to confirm, do I need to re-index the data in order for this new
> >> approach to work if I'm using an existing field?
> >>
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 4 June 2015 at 16:58, Alessandro Benedetti <
> benedetti.ale...@gmail.com
> >> >
> >> wrote:
> >>
> >> > Let me try to clarify the things…
> >> > Because you are using solr 5.1 I can not see any reason to try to use
> >> the
> >> > old spellcheck approach.
> >> > If you take a look to the page me and Erick quoted there is a simple
> >> config
> >> > example :
> >> >
> >> > 
> >> > > 
> >> > > mySuggester
> >> > > FuzzyLookupFactory
> >> > > suggester_fuzzy_dir
> >> > > 
> >> > > DocumentDictionaryFactory
> >> > > title
> >> > > suggestType
> >> > > false
> >> > > false
> >> > > 
> >> > > 
> >> > >
> >> >
> >> >
> >> > >  >> > > startup="lazy" >
> >> > > 
> >> > > true
> >> > > 10
> >> > > mySuggester
> >> > > 
> >> > > 
> >> > > suggest
> >> > > 
> >> > > 
> >> >
> >> >
> >> > You should use this approach.
> >> > After you build the Suggestion Dictionary ( after your first commit or
> >> > manually) you are going to be able to see the suggestions.
> >> >
> >> > Your config appears to be very confused ( why an edismax query parser
> >> for a
> >> > suggestion request handler ? )
> >> >
> >> > To answer do Dalnesh, there is no benefit in explicitly expressing
> again
> >> > the query parameters, they are already appended if you take a look to
> >> Edwin
> >> > config, so this will not solve anything.
> >> >
> >> > I would suggest you to use the latest approach and then verify the
> >> > suggester building went fine.
> >> >
> >> > Cheers
> >> >
> >> > 2015-06-04 9:13 GMT+01:00 Zheng Lin Edwin Yeo :
> >> >
> >> > > This is the result that I get from the query URL you mentioned.
> Still
> >> not
> >> > > able to get any output.
> >> > >
> >> > > 
> >> > > 
> >> > >   
> >> > > 0
> >> > > 0
> >> > >   
> >> > > true
> >> > > mater
> >> > > true
> >> > > suggest
> >> > > xml
> >> > >   
> >> > > 
> >> > > 
> >> > >
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> > >
> >> > >
> >> > > On 4 June 2015 at 15:26, Dhanesh Radhakrishnan 
> >> > wrote:
> >> > >
> >> > > > Try this
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> http://localhost:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=suggest&suggest.build=true&wt=xml&suggest.q=mater
> >> > > >
> >> > > > On Thu, Jun 4, 2015 at 11:53 AM, Zheng Lin Edwin Yeo <
> >> > > edwinye...@gmail.com
> >> > > > >
> >> > > > wrote:
> >> > > >
> >> > > > > I've tried to use the solr.SuggestComponent as stated in the
> >> website,
> >> > > but
> >> > > > > it couldn't work.
> >> > > > >
> >> > > > > When I change to using the suggest with the configuration below
> >> and
> >> > go
> >> > > a
> >> > > > > query like
> http://localhost:8983/solr/collection1/suggest?q=mater
> >> ,
> >> > it
> >> > > > says
> >> > > > > "The Webpage cannot be found"
> >> > > > >
> >> > > > >   
> >> > > > > 
> >> > > > >   suggest
> >> > > > >>> > > > >
> name="classname">org.apache.solr.spelling.suggest.Suggester
> >> > > > >>> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory
> >> > > > >   text  
> >> > > > >   true
> >> > > > > 
> >> > > > >   
> >> > > > >>> > > class="org.apache.solr.handler.component.SearchHandler"
> >> > > > > name="/suggest">
> >> > > > > 
> >> > > > >explicit
> >> > > > >   edismax
> >> > > > >10
> >> > > > >json
> >> > > > >true
> >> > > > >
> >> > > > >   true
> >> > > > >   suggest
> >> > > > >   5
> >> > > > >   true
> >> > > > > 
> >> > > > > 
> >> > > > >   suggest
> >> > > > > 
> >> > > > >   
> >> > > > >
> >> > > > >
> >> > > > > Regards,
> >> > > > > Edwin
> >> > > > >
> >> > > > >
> >> > > > > On 4 June 2015 at 13:21, Erick Erickson <
> erickerick...@gmail.com>
> >> > > wrote:
> >> > > > >
> >> > > > > > This may be helpful:
> http://lucidworks.com/blog/solr-suggester/
> >> > > > > >
> >> > > > > > Note that there are a series of fixes in various versions of
> >> Solr,
> >> > > > > > particularly buildOnStartup=false and working on multivalued
> >> > fields.
> >> > > > 

List all Collections together with number of records

2015-06-04 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, are we able to use the Collection API or any other
method to list all the collections in the cluster together with the number
of records in each of the collections in one output?

Currently, I only know of the List Collections
/admin/collections?action=LIST. However, this only list the names of the
collections that are in the cluster, but not the number of records.

Is there a way to show the number of records in each of the collections as
well?

Regards,
Edwin
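
There is no single call that returns both in one output. One workaround (a
SolrJ sketch; the zkHost and the collection names -- which would come from
the action=LIST response -- are placeholders) is a rows=0 query per
collection, reading numFound for the count:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class CollectionCounts {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
      // names as returned by /admin/collections?action=LIST
      for (String name : new String[] {"collection1", "collection2"}) {
        client.setDefaultCollection(name);
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0); // only numFound is needed, not the documents
        long count = client.query(q).getResults().getNumFound();
        System.out.println(name + ": " + count + " docs");
      }
    }
  }
}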


Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Jack Krupansky
The empty parentheses in the parsed query say something odd is going on
with query-time analysis: it is essentially generating an empty term.
That may not be the cause of your specific issue, but at least it says
that something is unexplained here.

Generally, there is an asymmetry between the index and query analyzers when
the word delimiter filter is used - at index time you typically generate
extra terms to aid in recall, while at query time the extra terms are not
generated to aid in precision. In particular, you would just generate the
word and number parts, and not preserve the original token. But... that
should not matter if there is only a single query term. So, something else
is going on here.

-- Jack Krupansky

On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal  wrote:

> Hi, thanks for the response.
>
> Label field:
> <field name="label" type="..." indexed="true" stored="true"
>  termVectors="true" omitNorms="true"/>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory"
>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>       preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>       synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory"
>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>       preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I can surely optimize the above config a bit, maybe only use one
> <analyzer> for both query and index. But for now, this is what it
> does.
>
> Just as a side-question: is dismax *supposed* to match fields exactly
> with the search query? Or is my expectation correct, meaning it should
> "tokenize" the field, just as with regular searches? It just doesn't
> seem intuitive to me.
>
> Thank you again for your help.
>
> Kind regards,
> Wouter Admiraal
>
>
> 2015-06-04 14:52 GMT+02:00 Shawn Heisey :
> > On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
> >> When I turn on debug, I get the following:
> >>
> >> "debug": {
> >>   "rawquerystring": "Food",
> >>   "querystring": "Food",
> >>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
> >>   "parsedquery_toString": "+(label:Food^3.0) ()",
> >>   "explain": {},
> >>   "QParser": "DisMaxQParser",
> >>   "altquerystring": null,
> >>   "boostfuncs": null,
> >>   ...
> >> }
> >>
> >> I don't understand how/why this doesn't use a "contains" operator.
> >> This was the behavior on the old 1.4 instance. I went through the
> >> changelog for 1.4 to 5.1, but I don't find any explicit information
> >> about dismax behaving differently, except the "mm" parameter needs a
> >> default. I tried many values for mm (including 0, 100%, 100, etc) but
> >> to no avail.
> >
> > In your schema.xml, what is the definition of the label field, and the
> > fieldType definition of the type used in the label field?  That will
> > determine exactly how the query is parsed and whether individual words
> > will match.  I wasn't using dismax or edismax back when I was running
> > 1.4, so I can't say anything about how it used to work, only how it
> > works now.
> >
> > Thanks,
> > Shawn
> >
>
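
To illustrate the asymmetry described above: a typical query-side chain
drops the catenate options and the original token, roughly like this (a
sketch of the common pattern, not the poster's actual schema):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          preserveOriginal="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>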


Re: retrieving large number of docs

2015-06-04 Thread Robust Links
That worked, but I seem unable to run:

1) phrase queries: i.e.

*core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}
titleNormalized:"*text pdf*"&facet=true&facet.field=tags

or 2) run filters on core0

*core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}
titleNormalized:"*text pdf*"&fq=user:76&facet=true&facet.field=tags

I am thinking a better design is to build a custom SearchComponent on core0
and add it as a last-component to the default search handler on core0
(both cores are on the same JVM). The custom core-aware component will
access core1 as follows:

// inform(): grab a handle on core1 from the shared container //

public void inform(SolrCore core) {

  SolrCore core1 = core.getCoreDescriptor().getCoreContainer().getCore(
"core1");

  // getNewestSearcher() returns a RefCounted<SolrIndexSearcher>; the
  // reference should be kept and decref()'d later, or the searcher leaks
  SolrIndexSearcher core1Searcher = core1.getNewestSearcher(false).get();

}

Then I intercept the default search handler:

public void process(ResponseBuilder rb) throws IOException {

   SolrIndexSearcher core0Searcher = rb.req.getSearcher();

   SolrParams params = rb.req.getParams();

   DocIterator docIt = rb.getResults().docList.iterator();

   String tagname;

   String id;

   while (docIt.hasNext())

   {

     int docID = docIt.nextDoc();

     id = core0Searcher.doc(docID).get("id");

     // hypothetical helper: look the id up in core1 and fetch its tag
     tagname = searchCore1(core1Searcher, id);

     // ... accumulate facet counts over the matching core1 docs ...

   }
}



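For completeness, wiring such a component in would look roughly like this in
solrconfig.xml (the component and class names here are hypothetical):

<searchComponent name="tagFacet" class="com.example.TagFacetComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>tagFacet</str>
  </arr>
</requestHandler>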


On Thu, Jun 4, 2015 at 10:29 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Hi Rob,
> according to your use case you have to :
>
> Call the /select from *core1 *in this way* :*
>
> *core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}
> titleNormalized:pdf&facet=true&facet.field=tags
>
> Hope this clarify your problem.
>
> Cheers
>
> 2015-06-04 15:00 GMT+01:00 Robust Links :
>
> > my requirement is to join core1 onto core0. restating the requirements
> > again. I have 2 cores
> >
> > core0
> > 
> > field:id
> > field: text
> >
> > core1
> > 
> > field:id
> > field tag
> >
> >
> > I want to
> >
> > 1) query text field of core0, together with filters
> > 2) use the {id} of matches (which can be >>10K) to retrieve the docs in
> > core 1 with same id and
> > 3) facet on tags in core1
> >
> > so my /select is to run on core0 and facet on tag field of core1
> >
> > thank you Alessandro
> >
> >
> > On Thu, Jun 4, 2015 at 9:28 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Lets try to make clear some point :
> > >
> > > Index TO : is the one you are using to call the select request handler
> > > Index From : Tags
> > > Is titleNormalized present in the "Tags" index ? Because there is where
> > the
> > > query will run.
> > >
> > > The documents in tags satisfying the query will be joined with the
> index
> > TO
> > > .
> > > The resulting documents can be filtered and faceted.
> > > I did use this approach a lot of times.
> > > And I can tell you it is working in this way.
> > > Maybe you misunderstood the Join feature, or I misunderstood your
> > > requirement.
> > >
> > > Cheers
> > >
> > > 2015-06-04 13:27 GMT+01:00 Robust Links :
> > >
> > > > try it for yourself and see if it works Alessandro. Not only cant i
> get
> > > > facets but i even get field errors when i run such join queries
> > > >
> > > > select?fl=title&q={!join from=id to=id
> > fromIndex=Tags}titleNormalized:pdf
> > > >
> > > > 
> > > > undefined field titleNormalized
> > > > 400
> > > > 
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Hi Rob,
> > > > > Reading your use case I can not understand why the Query Time join
> is
> > > > not a
> > > > > fit for you !
> > > > > The documents returned by the Query Time Join will be from core1,
> so
> > > > > faceting and filter querying that core, would definitely be
> possible
> > !
> > > > > I can not see your problem honestly !
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-06-04 1:47 GMT+01:00 Robust Links :
> > > > >
> > > > > > that doesnt work either, and even if it did, joining is not going
> > to
> > > > be a
> > > > > > solution since i cant query 1 core and facet on the result of the
> > > > other.
> > > > > To
> > > > > > sum up, my problem is
> > > > > >
> > > > > > core0
> > > > > > 
> > > > > > field:id
> > > > > > field: text
> > > > > >
> > > > > > core1
> > > > > > 
> > > > > > field:id
> > > > > > field tag
> > > > > >
> > > > > >
> > > > > > I want to
> > > > > >
> > > > > > 1) query text field of core0,
> > > > > > 2) use the {id} of matches (which can be >>10K) to retrieve the
> > docs
> > > in
> > > > > > core 1 with same id and
> > > > > > 3) facet on tags in core1
> > > > > >
> > > > > > Is this possible without denormalizing (which is not an option)?
> > > > > >
> > > > > > thank you
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky <
> > > > jack.krupan...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Specify the join query parser for the main query. See:
> > > > > > >
> > > > 

Re: How can I parse the TermVectorComponent response in SolrJ

2015-06-04 Thread Majid Laali
Hi, 

Based on a few hours of googling, I concluded that there is no class in Solr 5.1
that can parse the response of the Term Vector Component.
I am not sure if it is fine to create an issue in the Solr JIRA and
make a patch to address it.

I would be grateful for any advice on that.

Thanks, 
Majid



> On May 22, 2015, at 5:49 PM, Majid Laali  wrote:
> 
> Hi, 
> 
> I have a java program that sends a query to solr and get the term vector of a 
> document. Something like this:
> 
> SolrQuery solrQuery = new SolrQuery();
> solrQuery.setRequestHandler("/tvrh");
> solrQuery.setQuery("id:" + id);
> solrQuery.setParam("fl", textField);
> solrQuery.setParam("tv.tf", "true");
> solrQuery.setParam("tv.df", "true");
> solrQuery.setParam("tv.tf_idf", "true");
> solrQuery.setRows(1);
> QueryResponse docTVResponse = solrClient.query(solrQuery);
> Object termVectors = docTVResponse.getResponse().get("termVectors");
> 
> I am wondering if there is a class that can wrap the termVectors object so 
> that I can access to tf, idf of terms or I have to manually parse the json 
> response of TermVectoreComponent. 
> 
> Thanks in advance, 
> Majid
> 
> 
> /***
>  *   Majid Laali, Ph.D. Candidate, 
>  *   Computer Science & Software Engineering Department
>  *   Concordia University
> ***/
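
Until such a wrapper class exists, the NamedList that SolrJ returns for
"termVectors" (SolrJ delivers a NamedList rather than raw JSON) can be
walked by hand. A sketch -- the nesting and the "tf", "df" and "tf-idf" key
names reflect the TermVectorComponent response layout as I understand it,
so verify them against an actual response:

import java.util.Map;
import org.apache.solr.common.util.NamedList;

@SuppressWarnings("unchecked")
static void printTermVectors(NamedList<Object> termVectors) {
  for (Map.Entry<String, Object> docEntry : termVectors) {
    if (!(docEntry.getValue() instanceof NamedList)) {
      continue; // skips entries such as "uniqueKeyFieldName"
    }
    NamedList<Object> fields = (NamedList<Object>) docEntry.getValue();
    for (Map.Entry<String, Object> fieldEntry : fields) {
      if (!(fieldEntry.getValue() instanceof NamedList)) {
        continue; // skips the "uniqueKey" string entry
      }
      NamedList<Object> terms = (NamedList<Object>) fieldEntry.getValue();
      for (Map.Entry<String, Object> termEntry : terms) {
        NamedList<Object> stats = (NamedList<Object>) termEntry.getValue();
        System.out.println(fieldEntry.getKey() + "/" + termEntry.getKey()
            + " tf=" + stats.get("tf")
            + " df=" + stats.get("df")
            + " tf-idf=" + stats.get("tf-idf"));
      }
    }
  }
}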



Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal
Thanks for the reply.

So, as an aside, should I remove the solr.WhitespaceTokenizerFactory
and solr.WordDelimiterFilterFactory from the query analyzer part?

Any idea in which direction I should poke around? I deactivated dismax
for now, but would really like to use it.


Wouter Admiraal


2015-06-04 16:54 GMT+02:00 Jack Krupansky :
> The empty parentheses in the parsed query says something odd is going on
> with query-time analysis, that is essentially generating an empty term.
> That may not be the cause of your specific issue, but at least its says
> that something is unexplained here.
>
> Generally, there is an asymmetry between the index and query analyzers when
> the word delimiter filter is used - at index time you typically generate
> extra terms to aid in recall, while at query time the extra terms are not
> generated to aid in precision. In particular, you would just generate the
> word and number parts, and not preserve the original token. But... that
> should not matter if there is only a single query term. So, something else
> is going on here.
>
> -- Jack Krupansky
>
> On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal  wrote:
>
>> Hi, thanks for the response.
>>
>> Label field:
>> <field name="label" type="..." indexed="true" stored="true"
>>  termVectors="true" omitNorms="true"/>
>>
>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>       preserveOriginal="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory"
>>       synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>       preserveOriginal="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> I can surely optimize the above config a bit, maybe only use one
>> <analyzer> for both query and index. But for now, this is what it
>> does.
>>
>> Just as a side-question: is dismax *supposed* to match fields exactly
>> with the search query? Or is my expectation correct, meaning it should
>> "tokenize" the field, just as with regular searches? It just doesn't
>> seem intuitive to me.
>>
>> Thank you again for your help.
>>
>> Kind regards,
>> Wouter Admiraal
>>
>>
>> 2015-06-04 14:52 GMT+02:00 Shawn Heisey :
>> > On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> >> When I turn on debug, I get the following:
>> >>
>> >> "debug": {
>> >>   "rawquerystring": "Food",
>> >>   "querystring": "Food",
>> >>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>> >>   "parsedquery_toString": "+(label:Food^3.0) ()",
>> >>   "explain": {},
>> >>   "QParser": "DisMaxQParser",
>> >>   "altquerystring": null,
>> >>   "boostfuncs": null,
>> >>   ...
>> >> }
>> >>
>> >> I don't understand how/why this doesn't use a "contains" operator.
>> >> This was the behavior on the old 1.4 instance. I went through the
>> >> changelog for 1.4 to 5.1, but I don't find any explicit information
>> >> about dismax behaving differently, except the "mm" parameter needs a
>> >> default. I tried many values for mm (including 0, 100%, 100, etc) but
>> >> to no avail.
>> >
>> > In your schema.xml, what is the definition of the label field, and the
>> > fieldType definition of the type used in the label field?  That will
>> > determine exactly how the query is parsed and whether individual words
>> > will match.  I wasn't using dismax or edismax back when I was running
>> > 1.4, so I can't say anything about how it used to work, only how it
>> > works now.
>> >
>> > Thanks,
>> > Shawn
>> >
>>


Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Erik Hatcher
The debug parsed queries for the various ways you've tried it would be helpful.
Dismax uses the query analysis of each of the fields, and the fact that "label"
does not appear lowercased indicates something fishy, like the definition
changing after indexing. Try the admin analysis UI for that field to see how it
works at both index and query times.

   Erik

> On Jun 4, 2015, at 11:39, Wouter Admiraal  wrote:
> 
> Thanks for the reply.
> 
> So, as an aside, should I remove the solr.WhitespaceTokenizerFactory
> and solr.WordDelimiterFilterFactory from the query analyzer part?
> 
> Any idea in which direction I should poke around? I deactivated dismax
> for now, but would really like to use it.
> 
> 
> Wouter Admiraal
> 
> 
> 2015-06-04 16:54 GMT+02:00 Jack Krupansky :
>> The empty parentheses in the parsed query says something odd is going on
>> with query-time analysis, that is essentially generating an empty term.
>> That may not be the cause of your specific issue, but at least its says
>> that something is unexplained here.
>> 
>> Generally, there is an asymmetry between the index and query analyzers when
>> the word delimiter filter is used - at index time you typically generate
>> extra terms to aid in recall, while at query time the extra terms are not
>> generated to aid in precision. In particular, you would just generate the
>> word and number parts, and not preserve the original token. But... that
>> should not matter if there is only a single query term. So, something else
>> is going on here.
>> 
>> -- Jack Krupansky
>> 
>>> On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal  wrote:
>>> 
>>> Hi, thanks for the response.
>>> 
>>> Label field:
>>> <field name="label" type="..." indexed="true" stored="true"
>>>  termVectors="true" omitNorms="true"/>
>>>
>>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>>       preserveOriginal="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory"
>>>       synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>>>     <filter class="solr.StopFilterFactory" words="txt/stopwords.txt" />
>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>>       preserveOriginal="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>> 
>>> 
>>> I can surely optimize the above config a bit, maybe only use one
>>> <analyzer> for both query and index. But for now, this is what it
>>> does.
>>> 
>>> Just as a side-question: is dismax *supposed* to match fields exactly
>>> with the search query? Or is my expectation correct, meaning it should
>>> "tokenize" the field, just as with regular searches? It just doesn't
>>> seem intuitive to me.
>>> 
>>> Thank you again for your help.
>>> 
>>> Kind regards,
>>> Wouter Admiraal
>>> 
>>> 
>>> 2015-06-04 14:52 GMT+02:00 Shawn Heisey :
> On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
> When I turn on debug, I get the following:
> 
> "debug": {
>  "rawquerystring": "Food",
>  "querystring": "Food",
>  "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>  "parsedquery_toString": "+(label:Food^3.0) ()",
>  "explain": {},
>  "QParser": "DisMaxQParser",
>  "altquerystring": null,
>  "boostfuncs": null,
>  ...
> }
> 
> I don't understand how/why this doesn't use a "contains" operator.
> This was the behavior on the old 1.4 instance. I went through the
> changelog for 1.4 to 5.1, but I don't find any explicit information
> about dismax behaving differently, except the "mm" parameter needs a
> default. I tried many values for mm (including 0, 100%, 100, etc) but
> to no avail.
 
 In your schema.xml, what is the definition of the label field, and the
 fieldType definition of the type used in the label field?  That will
 determine exactly how the query is parsed and whether individual words
 will match.  I wasn't using dismax or edismax back when I was running
 1.4, so I can't say anything about how it used to work, only how it
 works now.
 
 Thanks,
 Shawn
>>> 


Re: Solr Atomic Updates by Query

2015-06-04 Thread Erick Erickson
There is no equivalent of, say, a SQL "update ... where ...", so no, atomic
updates by query are not supported...

Best,
Erick
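
A common workaround (a SolrJ sketch, assuming Solr 4.4 and the example from
the question; the wildcard query and the fixed rows value are illustrative
and should be adapted -- page through larger result sets):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class UpdateByQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");

    // 1) collect the ids matching the "where" part of the update
    SolrQuery q = new SolrQuery("id:123\\|*"); // escape the | for the query parser
    q.setFields("id");
    q.setRows(10000);

    // 2) send an atomic "set" update for each matching document
    for (SolrDocument hit : server.query(q).getResults()) {
      SolrInputDocument update = new SolrInputDocument();
      update.addField("id", hit.getFieldValue("id"));
      update.addField("price", Collections.singletonMap("set", 99));
      server.add(update);
    }
    server.commit();
    server.shutdown();
  }
}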

On Thu, Jun 4, 2015 at 2:49 AM, Ксения Баталова  wrote:
> Hi!
>
> I have one more question about atomic updates in Solr (Solr 4.4.0).
> Is it posible to generate atomic update by query?
> I mean I want to update those documents in which IDs contain some string.
> For example, index has:
> Doc1, id="123|a,b"
> Doc2, id="123|a,c"
> Doc3, id="345|a,b"
> Doc4, id="345|a,c,d".
>
> And if I don't want to generate all IDs to update, but I know that
> necessary IDs start with "123".
> I tried to generate query something like that (using *):
>
> {"id":"123|*",
>  "price":{"set":99}
> }
>
> But in result the document with id="123|*" was added.
> Can I do this somehow?
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Best regards,
> Batalova Kseniya


Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 7:38 AM, Midas A wrote:
> On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey  wrote:
>
>> On 6/4/2015 5:15 AM, Midas A wrote:
>>> I have some indexing issue . While indexing IOwait is high in solr server
>>> and load also.
>> My first suspect here is that you don't have enough RAM for your index
>> size.
>>
>> * How many total docs is Solr handling (all cores)?
>>
>  --30,0 docs
>
>> * What is the total size on disk of all your cores?
>>
>  --  600 GB
>
>> * How much RAM does the machine have?
>>
>  --48 GB
>
>> * What is the java max heap?
>> --30 GB(jvm)

Is that 3 million docs or 30 million docs?  The actual numbers are 3
million, but you put a single comma in the number after the 30, so I am
not sure which you meant.  Either way, those documents must be quite
large, to make a 600GB index.  30 million docs in my index would only be
about 30GB.

With 48 GB of RAM, 30 GB allocated to Solr, and a 600GB index, you don't
have anywhere even close to enough RAM to cache your index effectively. 
There's only 18GB of RAM left over for the OS disk cache.  That's only 3
percent of the index data that can fit in the OS disk cache.  I would
imagine that you're going to need to be able to fit somewhere between 25
and 50 percent of the index into RAM, which would mean that you're going
to want around 256GB of RAM for that index. 128GB *might* be enough. 
Alternatively, you could work on making your index smaller -- but be
aware that to improve performance with low memory, you need to reduce
the *indexed* part, the *stored* part makes little difference.

Another potential problem with a 30GB heap is related to garbage
collection tuning.  If you haven't tuned your GC at all, then
performance will be terrible on a heap that large, especially when you
are indexing.  The wiki page I linked on my previous reply contains a
link to my personal page, which covers GC tuning:

https://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn
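
For reference, a CMS-based set of JVM flags of the kind commonly used as a
starting point for large Solr heaps at the time (illustrative only, not
Shawn's exact settings; tune against your own GC logs):

-Xms8g -Xmx8g
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log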



Re: Index optimize runs in background.

2015-06-04 Thread Erick Erickson
Can't get any failures to happen on my end so I really haven't a clue.

Best,
Erick

On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather  wrote:
> Hi,
>
> Please provide your inputs on optimize and commit running as background.
> Your suggestion will be really helpful.
>
> Thanks,
> Modassar
>
> On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather 
> wrote:
>
>> Erick! I could not find any underlying setting of 10 minutes.
>> It is not only optimize but commit is also behaving in the same fashion
>> and is taking lesser time than usually had taken.
>> As per my observation both are running in background.
>>
>> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson 
>> wrote:
>>
>>> I'm not talking about you setting a timeout, but the underlying
>>> connection timing out...
>>>
>>> The "10 minutes then the indexer exits" comment points in that direction.
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather 
>>> wrote:
>>> > I have not added any timeout in the indexer except zk client time out
>>> which
>>> > is 30 seconds. I am simply calling client.close() at the end of
>>> indexing.
>>> > The same code was not running in background for optimize with
>>> solr-4.10.3
>>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
>>> >
>>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <
>>> erickerick...@gmail.com>
>>> > wrote:
>>> >
>>> >> Are you timing out on the client request? The theory here is that it's
>>> >> still a synchronous call, but you're just timing out at the client
>>> >> level. At that point, the optimize is still running it's just the
>>> >> connection has been dropped
>>> >>
>>> >> Shot in the dark.
>>> >> Erick
>>> >>
>>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <
>>> modather1...@gmail.com>
>>> >> wrote:
>>> >> > I could not notice it but with my past experience of commit which
>>> used to
>>> >> > take around 2 minutes is now taking around 8 seconds. I think this is
>>> >> also
>>> >> > running as background.
>>> >> >
>>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <
>>> modather1...@gmail.com
>>> >> >
>>> >> > wrote:
>>> >> >
>>> >> >> The indexer takes almost 2 hours to optimize. It has a
>>> multi-threaded
>>> >> add
>>> >> >> of batches of documents to
>>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
>>> >> >> Once all the documents are indexed it invokes commit and optimize. I
>>> >> have
>>> >> >> seen that the optimize goes into background after 10 minutes and
>>> indexer
>>> >> >> exits.
>>> >> >> I am not sure why it hangs on the indexer for these 10 minutes. This
>>> >> >> behavior I
>>> >> >> have seen in multiple iterations of indexing the same data.
>>> >> >>
>>> >> >> There is nothing significant I found in log which I can share. I
>>> can see
>>> >> >> following in log.
>>> >> >> org.apache.solr.update.DirectUpdateHandler2; start
>>> >> >>
>>> >>
>>> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>> >> >>
>>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <
>>> >> erickerick...@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> All strange of course. What do your Solr logs show when this
>>> happens?
>>> >> >>> And how reproducible is this?
>>> >> >>>
>>> >> >>> Best,
>>> >> >>> Erick
>>> >> >>>
>>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira  wrote:
>>> >> >>> > In this case, optimising makes sense, once the index is
>>> generated,
>>> >> you
>>> >> >>> > are not updating it.
>>> >> >>> >
>>> >> >>> > Upayavira
>>> >> >>> >
>>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
>>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5
>>> shards
>>> >> >>> and
>>> >> >>> >> each shard has an index size of about 170+GB (for the record,
>>> we are
>>> >> >>> not
>>> >> >>> >> using stored fields - our documents are pretty large). We
>>> perform a
>>> >> >>> full
>>> >> >>> >> indexing every weekend and during the week there are no updates
>>> >> made to
>>> >> >>> >> the
>>> >> >>> >> index. Most of the queries that we run are pretty complex with
>>> >> hundreds
>>> >> >>> >> of
>>> >> >>> >> terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
>>> boosts
>>> >> >>> etc.
>>> >> >>> >> and take many minutes to execute. A difference of 10-20% is
>>> also a
>>> >> big
>>> >> >>> >> advantage for us.
>>> >> >>> >>
>>> >> >>> >> We have been optimizing the index after indexing for years and
>>> it
>>> >> has
>>> >> >>> >> worked well for us. Every once in a while, we upgrade Solr to
>>> the
>>> >> >>> latest
>>> >> >>> >> version and try without optimizing so that we can save the many
>>> >> hours
>>> >> >>> it
>>> >> >>> >> take to optimize such a huge index, but find optimized index
>>> work
>>> >> well
>>> >> >>> >> for
>>> >> >>> >> us.
>>> >> >>> >>
>>> >> >>> >> Erick I was indexing today the documents and saw the optimize
>>> >> happening
>>> >> >>> >> in
>>> >> >>> >> background.
>>> >> >>> >>
>>> >> >>> >> On

Re: How to identify field names from the suggested values in multiple fields

2015-06-04 Thread Erick Erickson
There shouldn't be any limitation. You haven't provided the full stack trace,
so there's not a lot to say.

Do be a little careful, though, since the parameters are slightly different
for analyzingInfix, i.e. indexPath rather than storeDir.

Best,
Erick

On Thu, Jun 4, 2015 at 4:55 AM, Dhanesh Radhakrishnan
 wrote:
> Dear Erick,
> That document help me to build multiple suggesters
> But still there is one problem that I faced.
> When I used both suggesters with the same lookupImpl, i.e.
> <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
> <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>
> solr throws an error
>
> Caused by: java.lang.RuntimeException at
> org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory.create(AnalyzingInfixLookupFactory.java:138)
> at
> org.apache.solr.spelling.suggest.SolrSuggester.init(SolrSuggester.java:107)
> at
> org.apache.solr.handler.component.SuggestComponent.inform(SuggestComponent.java:119)
> at
> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:620)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
> ... 8 more
>
> So I changed the lookup to FuzzyLookupFactory and the suggester worked for
> both fields.
>
> Is there any limitation on using AnalyzingInfixLookupFactory for multiple
> suggesters?
>
> I'm using SOLR 5.1
>
> Regards
> dhanesh s.r
>
>
> On Thu, Jun 4, 2015 at 11:33 AM, Erick Erickson 
> wrote:
>
>> Yes, this might help: http://lucidworks.com/blog/solr-suggester/
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 3, 2015 at 10:32 PM, Dhanesh Radhakrishnan
>>  wrote:
>> > Thank you for the quick response.
>> > If I use 2 suggesters, can I get the result in a single request?
>> >
>> http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=school
>> > Is there any helping document to build multiple suggesters??
>> >
>> >
>> > On Thu, Jun 4, 2015 at 10:40 AM, Walter Underwood > >
>> > wrote:
>> >
>> >> Configure two suggesters, one based on each field. Use both of them and
>> >> you’ll get separate suggestions from each.
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/  (my blog)
>> >>
>> >>
>> >> On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan 
>> >> wrote:
>> >>
>> >> > Hi
>> >> > Can anyone help me build an autocomplete suggester based on multiple
>> >> fields?
>> >> > There are two fields in my schema, Category and Subcategory, and I'm
>> >> trying
>> >> > to build a suggester based on these 2 fields. When the suggestion
>> >> results come back,
>> >> > how can I distinguish which field each one comes from?
>> >> >
>> >> > I used a copyfields to combine multiple fields into single field and
>> use
>> >> > that field in suggester
>> >> > But this will return the combined result of category and
>> >> > subcategory. I
>> >> > can't tell which field each result was fetched from.
>> >> >
>> >> > These are the copyfields for autocomplete
>> >> > 
>> >> > 
>> >> >
>> >> > Suggestions should indicate which field they come from.
>> >> > For example, my suggester returns 5 results for the keyword "schools".
>> >> > In that result, 2 are from the category field and 3 are from the
>> >> > subcategory field.
>> >> >
>> >> > Schools (Category)
>> >> > Primary Schools (Subcategory)
>> >> > Driving Schools (Subcategory)
>> >> > Day care and play school (Subcategory)
>> >> > Day Care/Play School (Category)
>> >> >
>> >> >
>> >> > Is there any way to build like this ??
>> >> >
>> >> >
>> >> > --
>> >> > *dhanesh s.R *
>> >> > Team Lead
>> >> > t: (+91) 484 4011750 (ext. 712) | m: (+91) 99 4  703
>> >> > e: dhan...@hifx.in | w: www.hifx.in
>> >>
>> >>
>> >
>> >
>> > --
>> > *dhanesh s.R *
>> > Tea

Re: BoolField fieldType

2015-06-04 Thread Erick Erickson
Have you tried it? Really, it should take you 2 minutes to add a doc and see.

I'd guess it follows the same rules.

Best,
Erick

On Thu, Jun 4, 2015 at 5:29 AM, Steven White  wrote:
> Thanks Erick.
>
> What about at query time?  If I index my Boolean and it has one of the
> variations of "t", "T" or "1", what should my query be to get a hit on
> "true"?  q=MyBoolField: ?  What should the value of  be when I
> want to check if the field has a "true" and when I need to check if it has
> a "false"?
>
> Steve
>
> On Wed, Jun 3, 2015 at 6:41 PM, Erick Erickson 
> wrote:
>
>> I took a quick look at the code and it _looks_ like any string
>> starting with "t", "T" or "1" is evaluated as true and everything else
>> as false.
>>
>> sortMissingLast determines sort order if you're sorting on this field
>> and the document doesn't have a value. Should the be sorted after or
>> before docs that have a value for the field?
>>
>> Hmm, could use some better docs
>>
>> Erick
>>
>> On Wed, Jun 3, 2015 at 2:38 PM, Steven White  wrote:
>> > Hi everyone,
>> >
>> > This is a two part question:
>> >
>> > 1) I see the following: <fieldType name="..." class="solr.BoolField"
>> > sortMissingLast="true"/>
>> >
>> > a) what does sortMissingLast do?
>> > b) what kind of data is considered Boolean?  "TRUE", "True", "true", "1",
>> > "yes,", "Yes, "FALSE", etc.
>> >
>> > 2) When searching, what do I search on: q=MyBoolField:<value>?  That is,
>> > what should "<value>" be?
>> >
>> > Thanks
>> >
>> > Steve
>>


Re: List all Collections together with number of records

2015-06-04 Thread Erick Erickson
Not in a single call that I know of. These are really orthogonal
concepts. Getting the cluster status merely involves reading the
Zookeeper clusterstate whereas getting the total number of docs for
each would involve querying each collection, i.e. going to the Solr
nodes themselves. I'd guess it's unlikely to be combined.
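
If you need both, the usual approach is two calls from the client (a sketch,
assuming the default port; the collection name is a placeholder):

# list the collections:
curl 'http://localhost:8983/solr/admin/collections?action=LIST&wt=json'
# then, for each collection returned, read numFound from a rows=0 query:
curl 'http://localhost:8983/solr/<collection>/select?q=*:*&rows=0&wt=json'

rows=0 keeps each response down to just the header and the count.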

Best,
Erick

On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> Would like to check, are we able to use the Collection API or any other
> method to list all the collections in the cluster together with the number
> of records in each of the collections in one output?
>
> Currently, I only know of the List Collections
> /admin/collections?action=LIST. However, this only list the names of the
> collections that are in the cluster, but not the number of records.
>
> Is there a way to show the number of records in each of the collections as
> well?
>
> Regards,
> Edwin


SOLR & Windows

2015-06-04 Thread Doug Ford
Hi folks -

Quick question:

Is TomCat needed on Windows Server 2012 before I install SOLR 5.1?

Thanks

Doug


Re: Solr Atomic Updates

2015-06-04 Thread Ксения Баталова
Erick,

Thank you so much. It became a bit clearer.

It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.

I think I'll write here about it again :)

_ _

Batalova Kseniya


I have to ask then why you're not using SolrCloud with multiple shards? It
seems to me that that gives you the indexing throughput you need (be sure to
use CloudSolrServer from your client). At 300M complex documents, you
pretty much certainly will need to shard anyway so in some sense you're
re-inventing the wheel here.

You can host multiple shards on the same machine, and these _are_ separate
Solr cores under the covers so your problem with atomic updates disappears.
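
For example (a sketch, reusing the TestDoc1 example from earlier in this
thread; the collection name 'collection1' is an assumption), the update is
simply sent to the collection and SolrCloud routes it to whichever shard
holds the document:

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"id":"TestDoc1","title":{"set":"test1"}}]'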

Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
voted on even now and should be out in a week or so barring problems).

Best,
Erick

On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова 
wrote:
> Jack,
>
> The decision to use several cores was made to increase indexing and
> searching performance (determined experimentally).
>
> In my project the index is about 300-500 million documents (each document
> has a rather complex structure) and it may grow larger.
>
> So, while indexing, the documents are added to different cores by
> some number of threads.
>
> In other words, each thread collects the necessary information for a list of
> documents and generates a create-documents query to a specific core.
>
> At that moment it doesn't matter (and it can't be determined) which
> document will end up in which core.
>
> And now it is necessary to update (atomic update) this index.
>
> Something like this..
>
> _ _
>
> Batalova Kseniya
>
>
> Explain a little about why you have separate cores, and how you decide
> which core a new document should reside in. Your scenario still seems a bit
> odd, so help us understand.
>
>
> -- Jack Krupansky
>
> On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова 
> wrote:
>
>> Hi!
>>
>> Thanks for your quick reply.
>>
>> The problem is that all my index consists of several parts (several cores)
>>
>> and while updating I don't know in advance in which part the updated id is
>> lying (in which core the document with the specified id is lying).
>>
>> For example, I have two cores (*Core1 *and *Core2*) and I want to
>> update the document with id *Id1 *and I don't know where this document
>> is lying.
>>
>> So, I have to do two select-queries to my cores to know where it is.
>>
>> And then generate update-query to necessary core.
>>
>> What am I doing wrong?
>>
>> I remind that I'm using SOLR 4.4.0.
>>
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> Best regards,
>> Batalova Kseniya
>>
>>
>> What exactly is the problem? And why do you care about cores, per se -
>> other than to send the update to the core/collection you are trying to
>> update? You should specify the core/collection name in the URL.
>>
>> You should also be using the Solr reference guide rather than the (old)
>> wiki:
>>
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова 
>> wrote:
>>
>> > Hi!
>> >
>> > I'm using *SOLR 4.4.0* for searching in my project.
>> > Now I am facing a problem of atomic updates in multiple cores.
>> > From wiki:
>> >
>> > curl *http://localhost:8983/solr/update
>> >  *-H
>> > 'Content-type:application/json' -d '
>> > [
>> >  {
>> >   "*id*": "*TestDoc1*",
>> >   "title" : {"set":"test1"},
>> >   "revision"  : {"inc":3},
>> >   "publisher" : {"add":"TestPublisher"}
>> >  },
>> >  {
>> >   "id": "TestDoc2",
>> >   "publisher" : {"add":"TestPublisher"}
>> >  }
>> > ]'
>> >
>> > As well as I understand, this means that the document, for example, with
>> id
>> > *TestDoc1*, will be searched for updating *only in one core*.
>> > And if there is no any document with id *TestDoc1*, the document will be
>> > created.
>> > Can I somehow to specify the* list of cores* for searching and then
>> > updating necessary document with specific id?
>> >
>> > It's something like *shards *parameter in *select* query.
>> > From wiki:
>> >
>> > #now do a distributed search across both servers with your browser or
>> curl
>> > curl '
>> >
>> http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr
>> > '
>> >
>> > Or is it planned in the future?
>> >
>> > Thanks in advance.
>> >
>> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> >
>> > Best regards,
>> > Batalova Kseniya
>> >
>>


Re: indexing issue

2015-06-04 Thread Midas A
sorry Shawn ,

a) Total docs solr is handling is 3 million .
b) index size is only 5 GB



On Thu, Jun 4, 2015 at 9:35 PM, Shawn Heisey  wrote:

> On 6/4/2015 7:38 AM, Midas A wrote:
> > On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey 
> wrote:
> >
> >> On 6/4/2015 5:15 AM, Midas A wrote:
> >>> I have some indexing issue . While indexing IOwait is high in solr
> server
> >>> and load also.
> >> My first suspect here is that you don't have enough RAM for your index
> >> size.
> >>
> >> * How many total docs is Solr handling (all cores)?
> >>
> >  --30,0 docs
> >
> >> * What is the total size on disk of all your cores?
> >>
> >  --  600 GB
> >
> >> * How much RAM does the machine have?
> >>
> >  --48 GB
> >
> >> * What is the java max heap?
> >> --30 GB(jvm)
>
> Is that 3 million docs or 30 million docs?  The digits suggest 3
> million, but you put a single comma in the number after the 30, so I am
> not sure which you meant.  Either way, those documents must be quite
> large, to make a 600GB index.  30 million docs in my index would only be
> about 30GB.
>
> With 48 GB of RAM, 30 GB allocated to Solr, and a 600GB index, you don't
> have anywhere even close to enough RAM to cache your index effectively.
> There's only 18GB of RAM left over for the OS disk cache.  That's only 3
> percent of the index data that can fit in the OS disk cache.  I would
> imagine that you're going to need to be able to fit somewhere between 25
> and 50 percent of the index into RAM, which would mean that you're going
> to want around 256GB of RAM for that index. 128GB *might* be enough.
> Alternatively, you could work on making your index smaller -- but be
> aware that to improve performance with low memory, you need to reduce
> the *indexed* part, the *stored* part makes little difference.
>
> Another potential problem with a 30GB heap is related to garbage
> collection tuning.  If you haven't tuned your GC at all, then
> performance will be terrible on a heap that large, especially when you
> are indexing.  The wiki page I linked on my previous reply contains a
> link to my personal page, which covers GC tuning:
>
> https://wiki.apache.org/solr/ShawnHeisey
>
> Thanks,
> Shawn
>
>


Re: Solr Atomic Updates by Query

2015-06-04 Thread Ксения Баталова
Is it planned soon?

Or may be not soon..

_ _ _

Batalova Kseniya


There is no equivalent of, say a SQL update...where... so no, atomic
updates by query...

Best,
Erick

On Thu, Jun 4, 2015 at 2:49 AM, Ксения Баталова 
wrote:
> Hi!
>
> I have one more question about atomic updates in Solr (Solr 4.4.0).
> Is it possible to generate an atomic update by query?
> I mean I want to update those documents in which IDs contain some string.
> For example, index has:
> Doc1, id="123|a,b"
> Doc2, id="123|a,c"
> Doc3, id="345|a,b"
> Doc4, id="345|a,c,d".
>
> And if I don't want to generate all IDs to update, but I know that
> necessary IDs start with "123".
> I tried to generate query something like that (using *):
>
> {"id":"123|*",
>  "price":{"set":99}
> }
>
> But in result the document with id="123|*" was added.
> Can I do this somehow?
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> Best regards,
> Batalova Kseniya


Re: Solr Atomic Updates

2015-06-04 Thread Erick Erickson
NP. It's something of a step when moving to SolrCloud to "let go" of the
details you've had to (painfully) pay attention to, but worth it. The price is,
of course, learning to do things a new way ;)...

Best,
Erick

On Thu, Jun 4, 2015 at 10:04 AM, Ксения Баталова  wrote:
> Erick,
>
> Thank you so much. It became a bit clearer.
>
> It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.
>
> I think I'll write here about it again :)
>
> _ _
>
> Batalova Kseniya
>
>
> I have to ask then why you're not using SolrCloud with multiple shards? It
> seems to me that that gives you the indexing throughput you need (be sure to
> use CloudSolrServer from your client). At 300M complex documents, you
> pretty much certainly will need to shard anyway so in some sense you're
> re-inventing the wheel here.
>
> You can host multiple shards on the same machine, and these _are_ separate
> Solr cores under the covers so your problem with atomic updates disappears.
>
> Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
> voted on even now and should be out in a week or so barring problems).
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова 
> wrote:
>> Jack,
>>
>> The decision to use several cores was made to increase indexing and
>> searching performance (determined experimentally).
>>
>> In my project the index is about 300-500 million documents (each document
>> has a rather complex structure) and it may grow larger.
>>
>> So, while indexing, the documents are added to different cores by
>> some number of threads.
>>
>> In other words, each thread collects the necessary information for a list of
>> documents and generates a create-documents query to a specific core.
>>
>> At that moment it doesn't matter (and it can't be determined) which
>> document will end up in which core.
>>
>> And now it is necessary to update (atomic update) this index.
>>
>> Something like this..
>>
>> _ _
>>
>> Batalova Kseniya
>>
>>
>> Explain a little about why you have separate cores, and how you decide
>> which core a new document should reside in. Your scenario still seems a bit
>> odd, so help us understand.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова 
>> wrote:
>>
>>> Hi!
>>>
>>> Thanks for your quick reply.
>>>
>>> The problem is that all my index consists of several parts (several cores)
>>>
>>> and while updating I don't know in advance in which part the updated id is
>>> lying (in which core the document with the specified id is lying).
>>>
>>> For example, I have two cores (*Core1 *and *Core2*) and I want to
>>> update the document with id *Id1 *and I don't know where this document
>>> is lying.
>>>
>>> So, I have to do two select-queries to my cores to know where it is.
>>>
>>> And then generate update-query to necessary core.
>>>
>>> What am I doing wrong?
>>>
>>> I remind that I'm using SOLR 4.4.0.
>>>
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> Best regards,
>>> Batalova Kseniya
>>>
>>>
>>> What exactly is the problem? And why do you care about cores, per se -
>>> other than to send the update to the core/collection you are trying to
>>> update? You should specify the core/collection name in the URL.
>>>
>>> You should also be using the Solr reference guide rather than the (old)
>>> wiki:
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова 
>>> wrote:
>>>
>>> > Hi!
>>> >
>>> > I'm using *SOLR 4.4.0* for searching in my project.
>>> > Now I am facing a problem of atomic updates in multiple cores.
>>> > From wiki:
>>> >
>>> > curl *http://localhost:8983/solr/update
>>> >  *-H
>>> > 'Content-type:application/json' -d '
>>> > [
>>> >  {
>>> >   "*id*": "*TestDoc1*",
>>> >   "title" : {"set":"test1"},
>>> >   "revision"  : {"inc":3},
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  },
>>> >  {
>>> >   "id": "TestDoc2",
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  }
>>> > ]'
>>> >
>>> > As well as I understand, this means that the document, for example, with
>>> id
>>> > *TestDoc1*, will be searched for updating *only in one core*.
>>> > And if there is no any document with id *TestDoc1*, the document will be
>>> > created.
>>> > Can I somehow to specify the* list of cores* for searching and then
>>> > updating necessary document with specific id?
>>> >
>>> > It's something like *shards *parameter in *select* query.
>>> > From wiki:
>>> >
>>> > #now do a distributed search across both servers with your browser or
>>> curl
>>> > curl '
>>> >
>>> http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr
>>> > '
>>> >
>>> > Or is it planned in the future?
>>> >
>>> > Thanks in advance.
>>> >
>>> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> >
>>> > Best regards,
>>> > Batalova Kseniya
>>> >
>>>


Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 11:12 AM, Midas A wrote:
> sorry Shawn ,
>
> a) Total docs solr is handling is 3 million .
> b) index size is only 5 GB

If your total index size is only 5GB, then there should be no need for a
30GB heap.  For that much index, I'd start with 4GB, and implement GC
tuning.

A high iowait doesn't make any sense for that situation, but it WOULD
make sense with 600 GB of total index.

Thanks,
Shawn



Re: indexing issue

2015-06-04 Thread Midas A
Shawn,

Please find the log. Give me some sense of what is happening.

On Thu, Jun 4, 2015 at 10:56 PM, Shawn Heisey  wrote:

> On 6/4/2015 11:12 AM, Midas A wrote:
> > sorry Shawn ,
> >
> > a) Total docs solr is handling is 3 million .
> > b) index size is only 5 GB
>
> If your total index size is only 5GB, then there should be no need for a
> 30GB heap.  For that much index, I'd start with 4GB, and implement GC
> tuning.
>
> A high iowait doesn't make any sense for that situation, but it WOULD
> make sense with 600 GB of total index.
>
> Thanks,
> Shawn
>
>
2015-06-04 18:44:56
Full thread dump OpenJDK 64-Bit Server VM (24.45-b08 mixed mode):

"qtp1122335225-81" prio=10 tid=0x2ab280f92800 nid=0x44e4 waiting on condition [0x40293000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x2aaab8aa0c00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
	at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:744)

"qtp1122335225-80" prio=10 tid=0x2ab280f8e800 nid=0x44e3 runnable [0x43151000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)

"Attach Listener" daemon prio=10 tid=0x139c7800 nid=0x44e2 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"qtp1122335225-77" prio=10 tid=0x2ab280224000 nid=0x3196 runnable [0x41eac000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)

"qtp1122335225-76" prio=10 tid=0x2ab280f7f000 nid=0x3195 runnable [0x40691000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.ecli

Re: indexing issue

2015-06-04 Thread Midas A
we are indexing around 5 docs per 10 min.

On Thu, Jun 4, 2015 at 11:02 PM, Midas A  wrote:

> Shawn,
>
> Please find the log. Give me some sense of what is happening.
>
> On Thu, Jun 4, 2015 at 10:56 PM, Shawn Heisey  wrote:
>
>> On 6/4/2015 11:12 AM, Midas A wrote:
>> > sorry Shawn ,
>> >
>> > a) Total docs solr is handling is 3 million .
>> > b) index size is only 5 GB
>>
>> If your total index size is only 5GB, then there should be no need for a
>> 30GB heap.  For that much index, I'd start with 4GB, and implement GC
>> tuning.
>>
>> A high iowait doesn't make any sense for that situation, but it WOULD
>> make sense with 600 GB of total index.
>>
>> Thanks,
>> Shawn
>>
>>
>


Shard still around after calling splitshard

2015-06-04 Thread Mike Thomsen
I thought splitshard was supposed to get rid of the original shard, shard1,
in this case. Am I missing something? I was expecting the only two
remaining shards to be shard1_0 and shard1_1.

The REST call I used was
/admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
if that helps.

Attached is a screenshot of the Cloud view in the admin console after
running splitshard.

Should it look like that? Do I need to delete shard1 now?

Thanks,

Mike


Re: SOLR & Windows

2015-06-04 Thread Erick Erickson
No servlet container is needed at all. We're moving away from
distributing a war file in the future; Solr 5x still distributes a war
for back-compat reasons. The preferred method is now to use the
bin/solr start script.

Under the covers this still uses Jetty, but that is now an
"implementation detail" that may change in the future and you don't
have to do _anything_ to accommodate those changes.

See: https://wiki.apache.org/solr/WhyNoWar
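
So on Windows Server 2012 you just run the start script from the extracted
distribution (a sketch; 8983 is the default port):

bin\solr.cmd start -p 8983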

Best,
Erick

On Thu, Jun 4, 2015 at 9:58 AM, Doug Ford  wrote:
> Hi folks -
>
> Quick question:
>
> Is TomCat needed on Windows Server 2012 before I install SOLR 5.1?
>
> Thanks
>
> Doug


Re: Shard still around after calling splitshard

2015-06-04 Thread Anshum Gupta
Hi Mike,

Once the SPLITSHARD call completes, it just marks the original shard as
Inactive i.e. it no longer accepts requests. So yes, you would have to use
DELETESHARD (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7)
to clean it up.
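
For example, using the names from your SPLITSHARD call (host and port are
assumptions):

curl 'http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=default-collection&shard=shard1'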

As far as what you see on the admin UI, that information is wrong i.e. the
UI does not respect the state of the shard while displaying them. So,
though the parent shard might be inactive, you still would end up seeing it
as just another active shard. There's an open issue for this one.

One way to confirm the shard state is by looking at the shard state in
clusterstate.json (or state.json, depending upon the version of Solr you're
using).


On Thu, Jun 4, 2015 at 10:35 AM, Mike Thomsen 
wrote:

> I thought splitshard was supposed to get rid of the original shard,
> shard1, in this case. Am I missing something? I was expecting the only two
> remaining shards to be shard1_0 and shard1_1.
>
> The REST call I used was
> /admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
> if that helps.
>
> Attached is a screenshot of the Cloud view in the admin console after
> running splitshard.
>
> Should it look like that? Do I need to delete shard1 now?
>
> Thanks,
>
> Mike
>



-- 
Anshum Gupta


Re: BoolField fieldType

2015-06-04 Thread Chris Hostetter

: I took a quick look at the code and it _looks_ like any string
: starting with "t", "T" or "1" is evaluated as true and everything else
: as false.

correct and documented...
https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

: sortMissingLast determines sort order if you're sorting on this field
: and the document doesn't have a value. Should the be sorted after or
: before docs that have a value for the field?

also correct and documented...
https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties


-Hoss
http://www.lucidworks.com/


Re: Shard still around after calling splitshard

2015-06-04 Thread Mike Thomsen
Thanks. I thought it worked like that, but didn't want to jump to
conclusions.

On Thu, Jun 4, 2015 at 1:42 PM, Anshum Gupta  wrote:

> Hi Mike,
>
> Once the SPLITSHARD call completes, it just marks the original shard as
> Inactive i.e. it no longer accepts requests. So yes, you would have to use
> DELETESHARD (
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api7
> )
> to clean it up.
>
> As far as what you see on the admin UI, that information is wrong i.e. the
> UI does not respect the state of the shard while displaying them. So,
> though the parent shard might be inactive, you still would end up seeing it
> as just another active shard. There's an open issue for this one.
>
> One way to confirm the shard state is by looking at the shard state in
> clusterstate.json (or state.json, depending upon the version of Solr you're
> using).
>
>
> On Thu, Jun 4, 2015 at 10:35 AM, Mike Thomsen 
> wrote:
>
> > I thought splitshard was supposed to get rid of the original shard,
> > shard1, in this case. Am I missing something? I was expecting the only
> two
> > remaining shards to be shard1_0 and shard1_1.
> >
> > The REST call I used was
> >
> /admin/collections?collection=default-collection&shard=shard1&action=SPLITSHARD
> > if that helps.
> >
> > Attached is a screenshot of the Cloud view in the admin console after
> > running splitshard.
> >
> > Should it look like that? Do I need to delete shard1 now?
> >
> > Thanks,
> >
> > Mike
> >
>
>
>
> --
> Anshum Gupta
>


Re: Solr Atomic Updates by Query

2015-06-04 Thread Erick Erickson
Not to my knowledge. In Solr terms this would be a _very_ heavyweight
operation, potentially re-indexing millions and millions of documents.
Imagine if your q were id:* for instance. Plus routing that to all
shards and dealing with other updates coming in would be a nightmare.
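
The usual client-side workaround (a sketch, using the ids, field, and Core1
core name from your earlier mails) is to select the matching ids first and
then send an ordinary atomic update for each one:

# fetch the ids that start with "123":
curl 'http://localhost:8983/solr/Core1/select?q=id:123*&fl=id&rows=1000&wt=json'
# then, for each id returned, e.g. "123|a,b":
curl 'http://localhost:8983/solr/Core1/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"id":"123|a,b","price":{"set":99}}]'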

Best,
Erick

On Thu, Jun 4, 2015 at 10:13 AM, Ксения Баталова  wrote:
> Is it planned soon?
>
> Or may be not soon..
>
> _ _ _
>
> Batalova Kseniya
>
>
> There is no equivalent of, say a SQL update...where... so no, atomic
> updates by query...
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 2:49 AM, Ксения Баталова 
> wrote:
>> Hi!
>>
>> I have one more question about atomic updates in Solr (Solr 4.4.0).
>> Is it possible to generate an atomic update by query?
>> I mean I want to update those documents in which IDs contain some string.
>> For example, index has:
>> Doc1, id="123|a,b"
>> Doc2, id="123|a,c"
>> Doc3, id="345|a,b"
>> Doc4, id="345|a,c,d".
>>
>> And if I don't want to generate all IDs to update, but I know that
>> necessary IDs start with "123".
>> I tried to generate query something like that (using *):
>>
>> {"id":"123|*",
>>  "price":{"set":99}
>> }
>>
>> But in result the document with id="123|*" was added.
>> Can I do this somehow?
>>
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>
>> Best regards,
>> Batalova Kseniya


Re: BoolField fieldType

2015-06-04 Thread Chris Hostetter

: What about at query time?  If I index my Boolean and it has one of the
: variations of "t", "T" or "1", what should my query be to get a hit on
: "true"?  q=MyBoolField: ?  What should the value of  be when I
: want to check if the field has a "true" and when I need to check if it has
: a "false"?

the string representations are parsed into the logical concepts -- it 
doesn't matter if one doc was added with a value of "true" and another doc 
was indexed with a value of "T"; in the index they both have a true value.

likewise at query time it doesn't matter if you query for "true" or "T" - 
they are both going to find all docs that have a true value, and querying 
for "F" or "false" or "BOGUS" are going to find you all docs with a false 
value.

where things get interesting is when you deal with documents that do
not have a value in the field at all -- searching for something like 
this...

q=-MyBoolField:true

...won't just return all the docs with a false value, it will also return 
all docs w/o any value.

if you want to find all documents that don't have any value, you can 
search for this...

q=-MyBoolField:*

note that the "*" is a query parser feature, so it causes the query parser 
to do a "docs with value in this field" query w/o ever asking the 
BoolField to "parse" the input string as a true/false value.



-Hoss
http://www.lucidworks.com/


Problems while setting classpath (while upgrading to Solr 5.1 from Solr 4.X)

2015-06-04 Thread Vaibhav Bhandari
Hi,

I am trying to upgrade Solr 4.X to Solr 5.1.0. I am using Maven to compile
solr.

Previously, I was packing my custom-plugin jars with solr.war and
everything worked fine.

With Solr 5.X, however, I am finding it hard to include my custom-plugin
jars in the classpath. I am using bin/solr script to start up Solr.

I have tried a few things:

   1. Passing the path of custom jars as -Djava.class.path=
   2. Passing the path as -a "-cp "
   3. Copying the jars inside server/lib/ext folder.
   4. Copying the jars inside server/solr-webapp/webapp/WEB-INF/lib folder
   (because after reading the script, I realize that this folder is included
   in the classpath by default).

Is there a recommended way to load the plugin jars in the classpath? Note
that I am willing to avoid the blob-store api.

Also, there is no documentation on the OPTIONS parameter in start.jar and
how to set it from the bin/solr script.

Any guidance is highly appreciated.

Thanks,
Vaibhav


Re: Solr Atomic Updates

2015-06-04 Thread Ксения Баталова
Hope I'll succeed)

Anyway, the solr-user community surprised me in a good way.

Thanks again.

_ _

Batalova Kseniya


NP. It's something of a step when moving to SolrCloud to "let go" of the
details you've had to (painfully) pay attention to, but worth it. The price is,
of course, learning to do things a new way ;)...

Best,
Erick

On Thu, Jun 4, 2015 at 10:04 AM, Ксения Баталова 
wrote:
> Erick,
>
> Thank you so much. It became a bit clearer.
>
> It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.
>
> I think I'll write here about it again :)
>
> _ _
>
> Batalova Kseniya
>
>
> I have to ask then why you're not using SolrCloud with multiple shards? It
> seems to me that that gives you the indexing throughput you need (be sure to
> use CloudSolrServer from your client). At 300M complex documents, you
> pretty much certainly will need to shard anyway so in some sense you're
> re-inventing the wheel here.
>
> You can host multiple shards on the same machine, and these _are_ separate
> Solr cores under the covers so your problem with atomic updates disappears.
>
> Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
> voted on even now and should be out in a week or so barring problems).
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова 
> wrote:
>> Jack,
>>
>> The decision to use several cores was made to increase indexing and
>> searching performance (determined experimentally).
>>
>> In my project the index is about 300-500 million documents (each document
>> has a rather complex structure) and it may grow larger.
>>
>> So, while indexing, the documents are added to different cores by
>> some number of threads.
>>
>> In other words, each thread collects the necessary information for a list of
>> documents and generates a create-documents query to a specific core.
>>
>> At that moment it doesn't matter (and it can't be determined) which
>> document will end up in which core.
>>
>> And now it is necessary to update (atomic update) this index.
>>
>> Something like this..
>>
>> _ _
>>
>> Batalova Kseniya
>>
>>
>> Explain a little about why you have separate cores, and how you decide
>> which core a new document should reside in. Your scenario still seems a bit
>> odd, so help us understand.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова 
>> wrote:
>>
>>> Hi!
>>>
>>> Thanks for your quick reply.
>>>
>>> The problem is that all my index consists of several parts (several cores)
>>>
>>> and while updating I don't know in advance in which part the updated id is
>>> lying (in which core the document with the specified id is lying).
>>>
>>> For example, I have two cores (*Core1 *and *Core2*) and I want to
>>> update the document with id *Id1 *and I don't know where this document
>>> is lying.
>>>
>>> So, I have to do two select-queries to my cores to know where it is.
>>>
>>> And then generate update-query to necessary core.
>>>
>>> What am I doing wrong?
>>>
>>> I remind that I'm using SOLR 4.4.0.
>>>
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> Best regards,
>>> Batalova Kseniya
>>>
>>>
>>> What exactly is the problem? And why do you care about cores, per se -
>>> other than to send the update to the core/collection you are trying to
>>> update? You should specify the core/collection name in the URL.
>>>
>>> You should also be using the Solr reference guide rather than the (old)
>>> wiki:
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова 
>>> wrote:
>>>
>>> > Hi!
>>> >
>>> > I'm using *SOLR 4.4.0* for searching in my project.
>>> > Now I am facing a problem of atomic updates in multiple cores.
>>> > From wiki:
>>> >
>>> > curl *http://localhost:8983/solr/update
>>> >  *-H
>>> > 'Content-type:application/json' -d '
>>> > [
>>> >  {
>>> >   "*id*": "*TestDoc1*",
>>> >   "title" : {"set":"test1"},
>>> >   "revision"  : {"inc":3},
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  },
>>> >  {
>>> >   "id": "TestDoc2",
>>> >   "publisher" : {"add":"TestPublisher"}
>>> >  }
>>> > ]'
>>> >
>>> > As well as I understand, this means that the document, for example, with
>>> id
>>> > *TestDoc1*, will be searched for updating *only in one core*.
>>> > And if there is no any document with id *TestDoc1*, the document will be
>>> > created.
>>> > Can I somehow to specify the* list of cores* for searching and then
>>> > updating necessary document with specific id?
>>> >
>>> > It's something like *shards *parameter in *select* query.
>>> > From wiki:
>>> >
>>> > #now do a distributed search across both servers with your browser or
>>> curl
>>> > curl '
>>> >
>>> http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr
>>> > '
>>> >
>>> > Or is it planned in the future?
>>> >
>>> > Thanks in advance.
>>> >
>>> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Re: Problems while setting classpath (while upgrading to Solr 5.1 from Solr 4.X)

2015-06-04 Thread Shawn Heisey
On 6/4/2015 12:01 PM, Vaibhav Bhandari wrote:
> With Solr 5.X, however, I am finding it hard to include my custom-plugin
> jars in the classpath. I am using bin/solr script to start up Solr.
>
> I have tried a few things:
>
>1. Passing the path of custom jars as -Djava.class.path=
>2. Passing the path as -a "-cp "
>3. Copying the jars inside server/lib/ext folder.
>4. Copying the jars inside server/solr-webapp/webapp/WEB-INF/lib folder
>    (because after reading the script, I realize that this folder is included
>in the classpath by default).
>
> Is there a recommended way to load the plugin jars in the classpath? Note
> that I am willing to avoid the blob-store api.
>
> Also, there is no documentation on the OPTIONS parameter in start.jar and
> how to set it from the bin/solr script.
>
> Any guidance is highly appreciated.

Find your solr home.  This is the directory where solr.xml lives, and
normally the directory where the core directories are.  If you aren't
specifying the solr home, it will default to server/solr in the root of
the extracted download.

In the solr home, create a "lib" directory.  Put all extra or contrib
jars required by Solr there.  Start Solr.
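
A minimal sketch, assuming the default solr home and a hypothetical jar name:

mkdir server/solr/lib
cp my-custom-plugin.jar server/solr/lib/
bin/solr start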

That's it.  All jars included in the lib directory that you created are
automatically loaded by Solr.  You do not need any <lib> directives in
your solrconfig.xml file.

Thanks,
Shawn



Solr Tomcat or CURL

2015-06-04 Thread Ксения Баталова
Hi!

Need help with Solr 4.4.0 + Tomcat 7 + CURL.
I send many elementary select-queries to Solr core:
http://localhost/solr/Core1/select?q=id:("TestDoc1")&wt=xml&indent=true
Maybe this is a Tomcat or CURL problem: after a couple of seconds of regular
queries it returns an empty response.
Requests are sent via CURL.
And if I execute the first query that returned an empty response in a browser,
it returns an ERR_CONNECTION_FAILED error
with the details:
the web page is currently unavailable at this address or has permanently moved
to a new address.
After a few seconds the same query executes successfully.
What could be the problem?
Maybe some settings can solve this.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Best regards,
Batalova Kseniya


Re: Solr Atomic Updates by Query

2015-06-04 Thread Ксения Баталова
Oh, I see.

May be it's not such a good idea.)

Thanks.

_ _

Batalova Kseniya


Not to my knowledge. In Solr terms this would be a _very_ heavyweight
operation, potentially re-indexing millions and millions of documents.
Imagine if your q were id:* for instance. Plus routing that to all
shards and dealing with other updates coming in would be a nightmare.

Best,
Erick

On Thu, Jun 4, 2015 at 10:13 AM, Ксения Баталова 
wrote:
> Is it planned soon?
>
> Or may be not soon..
>
> _ _ _
>
> Batalova Kseniya
>
>
> There is no equivalent of, say a SQL update...where... so no, atomic
> updates by query...
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 2:49 AM, Ксения Баталова 
> wrote:
>> Hi!
>>
>> I have one more question about atomic updates in Solr (Solr 4.4.0).
>> Is it posible to generate atomic update by query?
>> I mean I want to update those documents in which IDs contain some string.
>> For example, index has:
>> Doc1, id="123|a,b"
>> Doc2, id="123|a,c"
>> Doc3, id="345|a,b"
>> Doc4, id="345|a,c,d".
>>
>> And if I don't want to generate all IDs to update, but I know that
>> necessary IDs start with "123".
>> I tried to generate query something like that (using *):
>>
>> {"id":"123|*",
>>  "price":{"set":99}
>> }
>>
>> But in result the document with id="123|*" was added.
>> Can I do this somehow?
>>
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>
>> Best regards,
>> Batalova Kseniya


Re: Disable or limit the size of Lucene field cache

2015-06-04 Thread pras.venkatesh
A follow-up question: I see docValues has been there since Lucene 4.0, so can
I use docValues with my current SolrCloud version, 4.8.x?

The reason I am asking is that I have my deployment mechanism and index
security (using Tomcat Valves) all built out on Tomcat, which I would need to
figure out again with Jetty.

So I am wondering if I could use docValues with Solr/Lucene 4.8.x.





Re: Ability to load solrcore.properties from zookeeper

2015-06-04 Thread Chris Hostetter

: passed in as a Properties object to the CD constructor.  At the moment, 
: you can't refer to a property defined in solrcore.properties within your 
: core.properties file.

but if you look at it from a historical context, that doesn't really
matter for the purpose that solrcore.properties was intended for -- it
predates core discovery, and was only intended as a way to specify "user"
level properties that could then be substituted in the solrconfig.xml or 
dih.xml or schema.xml

ie: making it possible to use a solrcore.prop value to set a core.prop 
value might be a nice-to-have, but it's definitely not what it was
intended for, so it shouldn't really be a blocker to getting the same 
(original) basic functionality working in SolrCloud.



-Hoss
http://www.lucidworks.com/


Re: Problems while setting classpath (while upgrading to Solr 5.1 from Solr 4.X)

2015-06-04 Thread Vaibhav Bhandari
Thanks for the quick response Shawn. That worked!

I wish this was documented though.

-Vaibhav

On Thu, Jun 4, 2015 at 11:22 AM, Shawn Heisey  wrote:

> On 6/4/2015 12:01 PM, Vaibhav Bhandari wrote:
> > With Solr 5.X, however, I am finding it hard to include my custom-plugin
> > jars in the classpath. I am using bin/solr script to start up Solr.
> >
> > I have tried a few things:
> >
> >1. Passing the path of custom jars as -Djava.class.path=
> >2. Passing the path as -a "-cp "
> >3. Copying the jars inside server/lib/ext folder.
> >4. Copying the jars inside server/solr-webapp/webapp/WEB-INF/lib
> folder
> >    (because after reading the script, I realize that this folder is
> included
> >in the classpath by default).
> >
> > Is there a recommended way to load the plugin jars in the classpath? Note
> > that I am willing to avoid the blob-store api.
> >
> > Also, there is no documentation on the OPTIONS parameter in start.jar and
> > how to set it from the bin/solr script.
> >
> > Any guidance is highly appreciated.
>
> Find your solr home.  This is the directory where solr.xml lives, and
> normally the directory where the core directories are.  If you aren't
> specifying the solr home, it will default to server/solr in the root of
> the extracted download.
>
> In the solr home, create a "lib" directory.  Put all extra or contrib
> jars required by Solr there.  Start Solr.
>
> That's it.  All jars included in the lib directory that you created are
> automatically loaded by Solr.  You do not need any  directives in
> your solrconfig.xml file.
>
> Thanks,
> Shawn
>
>


Peer Sync fails when newly added node is elected leader.

2015-06-04 Thread Michael Roberts
Hi,

I am seeing some unexpected behavior when adding a new machine to my cluster. I 
am running 4.10.3.

My setup has multiple collections, each collection has a single shard. I am 
using core auto discovery on the hosts (my deployment mechanism ensures that 
the directory structure is created and the core.properties file is in the right 
place).

To add a new machine I have to stop the cluster.

If I add a new machine, and start the cluster, if this new machine is elected 
leader for the shard, peer recovery fails. So, now I have a leader with no 
content, and replicas with content. Depending on where the read request is 
sent, I may or may not get the response I am expecting.

2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader process 
for shard shard1
2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see more 
replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to 
continue.
2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - 
try and sync
2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=domain 
url=http://10.36.9.70:11000/solr START 
replicas=[http://mlim:11000/solr/domain/] nUpdates=100
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.update.PeerSync - PeerSync: core=domain 
url=http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have 
no versions - we can't sync in that case - we were active before, so become 
leader anyway
2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: 
http://10.36.9.70:11000/solr/domain/ shard1
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain 
baseURL=http://10.36.9.70:11000/solr
2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO  
org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary

This seems like a fairly common scenario. So I suspect, either I am doing 
something incorrectly, or I have an incorrect assumption about how this is 
supposed to work.

Does anyone have any suggestions?

Thanks

Mike.


Re: List all Collections together with number of records

2015-06-04 Thread Zheng Lin Edwin Yeo
The reason we wanted to do a single call is to improve performance,
as our application needs to list the total number of records in each of
the collections, and the number of records that match the query in each of
the collections.

Currently we are querying each collection one by one to retrieve the
numFound value and display them, but this can slow down the system
significantly when the number of collections grows. So we are thinking of
ways to improve the speed in this area.

Are there any other methods you can suggest to overcome this
speed problem?

Regards,
Edwin
On 5 Jun 2015 00:16, "Erick Erickson"  wrote:

> Not in a single call that I know of. These are really orthogonal
> concepts. Getting the cluster status merely involves reading the
> Zookeeper clusterstate whereas getting the total number of docs for
> each would involve querying each collection, i.e. going to the Solr
> nodes themselves. I'd guess it's unlikely to be combined.
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > Would like to check, are we able to use the Collection API or any other
> > method to list all the collections in the cluster together with the
> number
> > of records in each of the collections in one output?
> >
> > Currently, I only know of the List Collections
> > /admin/collections?action=LIST. However, this only list the names of the
> > collections that are in the cluster, but not the number of records.
> >
> > Is there a way to show the number of records in each of the collections
> as
> > well?
> >
> > Regards,
> > Edwin
>


Re: Peer Sync fails when newly added node is elected leader.

2015-06-04 Thread Shalin Shekhar Mangar
Why do you stop the cluster while adding a node? This is the reason why
this is happening. When the first node of a solr cluster starts up, it
waits for some time to see other nodes but if it finds none then it goes
ahead and becomes the leader. If other nodes were up and running then peer
sync and replication recovery will make sure that the node with data
becomes the leader. So just keep the cluster running while adding a new
node.

Also, stop relying on core discovery for setting up a node. At some point
we will stop supporting this feature. Use the collection API to add new
replicas.
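
For example (a sketch; the collection name and node name are placeholders
for your own values):

curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=10.36.9.71:11000_solr'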

On Fri, Jun 5, 2015 at 5:01 AM, Michael Roberts 
wrote:

> Hi,
>
> I am seeing some unexpected behavior when adding a new machine to my
> cluster. I am running 4.10.3.
>
> My setup has multiple collections, each collection has a single shard. I
> am using core auto discovery on the hosts (my deployment mechanism ensures
> that the directory structure is created and the core.properties file is in
> the right place).
>
> To add a new machine I have to stop the cluster.
>
> If I add a new machine, and start the cluster, if this new machine is
> elected leader for the shard, peer recovery fails. So, now I have a leader
> with no content, and replicas with content. Depending on where the read
> request is sent, I may or may not get the response I am expecting.
>
> 2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader
> process for shard shard1
> 2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see
> more replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to
> continue.
> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader
> - try and sync
> 2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
> http://10.36.9.70:11000/solr START replicas=[
> http://mlim:11000/solr/domain/] nUpdates=100
> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
> http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we
> have no versions - we can't sync in that case - we were active before, so
> become leader anyway
> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader:
> http://10.36.9.70:11000/solr/domain/ shard1
> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain
> baseURL=http://10.36.9.70:11000/solr
> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
> org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary
>
> This seems like a fairly common scenario. So I suspect either I am doing
> something incorrectly, or I have an incorrect assumption about how this is
> supposed to work.
>
> Does anyone have any suggestions?
>
> Thanks
>
> Mike.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Problems while setting classpath (while upgrading to Solr 5.1 from Solr 4.X)

2015-06-04 Thread Shalin Shekhar Mangar
Since you are moving to Solr 5.x, have you seen
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
?

On Fri, Jun 5, 2015 at 4:03 AM, Vaibhav Bhandari <
vaibhav.bhandar...@gmail.com> wrote:

> Thanks for the quick response Shawn. That worked!
>
> I wish this was documented though.
>
> -Vaibhav
>
> On Thu, Jun 4, 2015 at 11:22 AM, Shawn Heisey  wrote:
>
> > On 6/4/2015 12:01 PM, Vaibhav Bhandari wrote:
> > > With Solr 5.X, however, I am finding it hard to include my
> custom-plugin
> > > jars in the classpath. I am using the bin/solr script to start up Solr.
> > >
> > > I have tried a few things:
> > >
> > >1. Passing the path of custom jars as -Djava.class.path=
> > >2. Passing the path as -a "-cp "
> > >3. Copying the jars inside server/lib/ext folder.
> > >4. Copying the jars inside server/solr-webapp/webapp/WEB-INF/lib
> > folder
> > >(because after reading the script, i realize that this folder is
> > included
> > >in the classpath by default).
> > >
> > > Is there a recommended way to load the plugin jars in the classpath?
> Note
> > > that I am willing to avoid the blob-store API.
> > >
> > > Also, there is no documentation on the OPTIONS parameter in start.jar
> and
> > > how do I set that from the bin/solr script.
> > >
> > > Any guidance is highly appreciated.
> >
> > Find your solr home.  This is the directory where solr.xml lives, and
> > normally the directory where the core directories are.  If you aren't
> > specifying the solr home, it will default to server/solr in the root of
> > the extracted download.
> >
> > In the solr home, create a "lib" directory.  Put all extra or contrib
> > jars required by Solr there.  Start Solr.
> >
> > That's it.  All jars included in the lib directory that you created are
> > automatically loaded by Solr.  You do not need any <lib> directives in
> > your solrconfig.xml file.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Collection ID in Solr?

2015-06-04 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, do all the collections in Solr have an ID that is
stored internally which we can reference?

Currently I believe we are using the name of the collection when we are
querying the collection, and this can be modified as and when
required. Whenever this is changed, it causes problems with my program,
as the old collection name doesn't exist anymore.

So I would like to know: does Solr create a unique ID when a collection is
created, one that shouldn't change even if the collection name is
changed?

I'm using Solr 5.1


Regards,
Edwin


Re: List all Collections together with number of records

2015-06-04 Thread Erick Erickson
Have you considered spawning a bunch of threads, one per collection
and having them all run in parallel?

Best,
Erick

On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
 wrote:
> The reason we wanted to do a single call is to improve performance,
> as our application needs to list the total number of records in each of
> the collections, and the number of records that match the query in each of
> the collections.
>
> Currently we are querying each collection one by one to retrieve the
> numFound value and display it, but this can slow down the system
> significantly as the number of collections grows. So we are thinking of
> ways to improve the speed in this area.
>
> Are there any other methods you can suggest to overcome this
> speed problem?
>
> Regards,
> Edwin
> On 5 Jun 2015 00:16, "Erick Erickson"  wrote:
>
>> Not in a single call that I know of. These are really orthogonal
>> concepts. Getting the cluster status merely involves reading the
>> Zookeeper clusterstate whereas getting the total number of docs for
>> each would involve querying each collection, i.e. going to the Solr
>> nodes themselves. I'd guess it's unlikely to be combined.
>>
>> Best,
>> Erick
>>
>> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
>>  wrote:
>> > Hi,
>> >
>> > Would like to check, are we able to use the Collection API or any other
>> > method to list all the collections in the cluster together with the
>> number
>> > of records in each of the collections in one output?
>> >
>> > Currently, I only know of the List Collections
>> > /admin/collections?action=LIST. However, this only lists the names of the
>> > collections that are in the cluster, but not the number of records.
>> >
>> > Is there a way to show the number of records in each of the collections
>> as
>> > well?
>> >
>> > Regards,
>> > Edwin
>>


Re: Peer Sync fails when newly added node is elected leader.

2015-06-04 Thread Erick Erickson
And to pile on Shalin's comments, there is absolutely no reason
to try to pre-configure the replica on the new node, and quite
a bit of downside as you are finding. Just add the new node
without any cores and use the ADDREPLICA command to
create the replicas.

Best,
Erick

On Thu, Jun 4, 2015 at 8:31 PM, Shalin Shekhar Mangar
 wrote:
> Why do you stop the cluster while adding a node? This is the reason why
> this is happening. When the first node of a Solr cluster starts up, it
> waits for some time to see other nodes, but if it finds none then it goes
> ahead and becomes the leader. If the other nodes are up and running, then peer
> sync and replication recovery will make sure that the node with data
> becomes the leader. So just keep the cluster running while adding a new
> node.
>
> Also, stop relying on core discovery for setting up a node. At some point
> we will stop supporting this feature. Use the collection API to add new
> replicas.
>
> On Fri, Jun 5, 2015 at 5:01 AM, Michael Roberts 
> wrote:
>
>> Hi,
>>
>> I am seeing some unexpected behavior when adding a new machine to my
>> cluster. I am running 4.10.3.
>>
>> My setup has multiple collections, each with a single shard. I
>> am using core auto discovery on the hosts (my deployment mechanism ensures
>> that the directory structure is created and the core.properties file is in
>> the right place).
>>
>> To add a new machine I have to stop the cluster.
>>
>> If I add a new machine and start the cluster, and this new machine is
>> elected leader for the shard, peer recovery fails. So now I have a leader
>> with no content, and replicas with content. Depending on where the read
>> request is sent, I may or may not get the response I am expecting.
>>
>> 2015-06-04 14:26:09.595 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Running the leader
>> process for shard shard1
>> 2015-06-04 14:26:09.607 -0700 (,,,) coreZkRegister-1-thread-9 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Waiting until we see
>> more replicas up for shard shard1: total=2 found=1 timeoutin=1.14707356E15ms
>> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to
>> continue.
>> 2015-06-04 14:26:10.108 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader
>> - try and sync
>> 2015-06-04 14:26:10.115 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
>> http://10.36.9.70:11000/solr START replicas=[
>> http://mlim:11000/solr/domain/] nUpdates=100
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.update.PeerSync - PeerSync: core=domain url=
>> http://10.36.9.70:11000/solr DONE.  We have no versions.  sync failed.
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we
>> have no versions - we can't sync in that case - we were active before, so
>> become leader anyway
>> 2015-06-04 14:26:10.121 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader:
>> http://10.36.9.70:11000/solr/domain/ shard1
>> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ZkController - No LogReplay needed for core=domain
>> baseURL=http://10.36.9.70:11000/solr
>> 2015-06-04 14:26:11.153 -0700 (,,,) coreZkRegister-1-thread-3 : INFO
>> org.apache.solr.cloud.ZkController - I am the leader, no recovery necessary
>>
>> This seems like a fairly common scenario. So I suspect either I am doing
>> something incorrectly, or I have an incorrect assumption about how this is
>> supposed to work.
>>
>> Does anyone have any suggestions?
>>
>> Thanks
>>
>> Mike.
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.


Re: Collection ID in Solr?

2015-06-04 Thread Erick Erickson
In a word, no. Why do you change the collection name? If you're doing
some sort of collection switching, consider collection aliasing.

Best,
Erick

On Thu, Jun 4, 2015 at 8:53 PM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> Would like to check, do all the collections in Solr have an ID that is
> stored internally which we can reference?
>
> Currently I believe we are using the name of the collection when we are
> querying the collection, and this can be modified as and when
> required. Whenever this is changed, it causes problems with my program,
> as the old collection name doesn't exist anymore.
>
> So I would like to know: does Solr create a unique ID when a collection is
> created, one that shouldn't change even if the collection name is
> changed?
>
> I'm using Solr 5.1
>
>
> Regards,
> Edwin


Re: List all Collections together with number of records

2015-06-04 Thread Zheng Lin Edwin Yeo
I'm trying to write a SolrJ program in Java to read and consolidate all the
information into a JSON file. The client will just need to call this SolrJ
program and read this JSON file to get the details. But the problem is we
are still querying Solr once for each collection, just that this time
it is done in the SolrJ program in a for-loop, while previously it was done
on the client side. I'm not sure whether this will lead to a performance improvement.

As for your suggestion of spawning a bunch of threads, does that mean the
same thing as what I did?

Regards,
Edwin


On 5 June 2015 at 12:03, Erick Erickson  wrote:

> Have you considered spawning a bunch of threads, one per collection
> and having them all run in parallel?
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
>  wrote:
> > The reason we wanted to do a single call is to improve
> performance,
> > as our application needs to list the total number of records in each
> of
> > the collections, and the number of records that match the query in each of
> > the collections.
> >
> > Currently we are querying each collection one by one to retrieve the
> > numFound value and display it, but this can slow down the system
> > significantly as the number of collections grows. So we are thinking of
> > ways to improve the speed in this area.
> >
> > Are there any other methods you can suggest to overcome this
> > speed problem?
> >
> > Regards,
> > Edwin
> > On 5 Jun 2015 00:16, "Erick Erickson"  wrote:
> >
> >> Not in a single call that I know of. These are really orthogonal
> >> concepts. Getting the cluster status merely involves reading the
> >> Zookeeper clusterstate whereas getting the total number of docs for
> >> each would involve querying each collection, i.e. going to the Solr
> >> nodes themselves. I'd guess it's unlikely to be combined.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
> >>  wrote:
> >> > Hi,
> >> >
> >> > Would like to check, are we able to use the Collection API or any
> other
> >> > method to list all the collections in the cluster together with the
> >> number
> >> > of records in each of the collections in one output?
> >> >
> >> > Currently, I only know of the List Collections
> >> > /admin/collections?action=LIST. However, this only lists the names of
> the
> >> > collections that are in the cluster, but not the number of records.
> >> >
> >> > Is there a way to show the number of records in each of the
> collections
> >> as
> >> > well?
> >> >
> >> > Regards,
> >> > Edwin
> >>
>


Re: Collection ID in Solr?

2015-06-04 Thread Zheng Lin Edwin Yeo
Hi Erick,

The reason is that we want to allow the flexibility to change the collection
name based on the needs of the users.

For the collection aliasing, does this mean that the user will reference
the collection by the alias name instead of the collection name, but at the
backend we will still reference by the collection name? Whenever the user
wants to change the name, they'll just change the alias name, and leave the
collection name intact?

Regards,
Edwin


On 5 June 2015 at 12:08, Erick Erickson  wrote:

> In a word, no. Why do you change the collection name? If you're doing
> some sort of collection switching, consider collection aliasing.
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 8:53 PM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > Would like to check, do all the collections in Solr have an ID that is
> > stored internally which we can reference?
> >
> > Currently I believe we are using the name of the collection when we are
> > querying the collection, and this can be modified as and when
> > required. Whenever this is changed, it causes problems with my program,
> > as the old collection name doesn't exist anymore.
> >
> > So I would like to know: does Solr create a unique ID when a collection
> is
> > created, one that shouldn't change even if the collection name is
> > changed?
> >
> > I'm using Solr 5.1
> >
> >
> > Regards,
> > Edwin
>


docValues in solr/lucene 4.8.x

2015-06-04 Thread pras.venkatesh
I see docValues has been there since Lucene 4.0, so can I use docValues with
my current SolrCloud version, 4.8.x?

The reason I am asking is that I have my deployment mechanism and index
security (using a Tomcat valve) all built out on Tomcat, which I would need
to figure out all over again with Jetty.

So I am wondering if I could use docValues with Solr/Lucene 4.8.x in order to
perform sort/facet queries efficiently (consuming less heap memory).





Re: Collection ID in Solr?

2015-06-04 Thread Shawn Heisey
On 6/4/2015 11:39 PM, Zheng Lin Edwin Yeo wrote:
> The reason is that we want to allow the flexibility to change the collection
> name based on the needs of the users.
> 
> For the collection aliasing, does this mean that the user will reference
> the collection by the alias name instead of the collection name, but at the
> backend we will still reference by the collection name? Whenever the user
> wants to change the name, they'll just change the alias name, and leave the
> collection name intact?

The users can change the collection name? That sounds like a recipe for
disaster.

Collection aliases are a way for an admin to do index swapping in
SolrCloud so that new indexes can be built and then swapped with the old
index.  You build a collection named something like foo20150605 and then
set up an alias named "foo" that points to that collection.  Your
program doesn't need to know about the real collection name; the alias
redirects all activity from the well-known name to the actual
collection, and that mapping can change anytime it needs to.

Thanks,
Shawn



Re: docValues in solr/lucene 4.8.x

2015-06-04 Thread Shawn Heisey
On 6/4/2015 11:42 PM, pras.venkatesh wrote:
> I see docValues has been there since Lucene 4.0, so can I use docValues with
> my current SolrCloud version, 4.8.x?
> 
> The reason I am asking is that I have my deployment mechanism and index
> security (using a Tomcat valve) all built out on Tomcat, which I would need
> to figure out all over again with Jetty.
> 
> So I am wondering if I could use docValues with Solr/Lucene 4.8.x in order to
> perform sort/facet queries efficiently (consuming less heap memory).

Solr 4.8 can do docValues.  To enable the feature on a field, you just
need to change the field definition in schema.xml to include docValues="true".

Note that you need to completely reindex.  After you make the change and
restart or reload, sorting and facets will NOT work until the reindex is
done, because when docValues is present in the schema, Solr will try to
use docValues, and that data will not be present unless you reindex.

https://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Integrating Solr with Tomcat

2015-06-04 Thread Chandima Dileepa
Hi,

According to the wiki, I got to know that integrating Solr (starting
with release 5.0.0) with Tomcat cannot be done. Should I run Solr as a
standalone server?

Thanks,
Chandima


Re: Integrating Solr with Tomcat

2015-06-04 Thread Shawn Heisey
On 6/5/2015 12:32 AM, Chandima Dileepa wrote:
> According to the wiki, I got to know that integrating Solr (starting
> with release 5.0.0) with Tomcat cannot be done. Should I run Solr as a
> standalone server?

Yes.

There's a lot more detail, but read this first:

https://wiki.apache.org/solr/WhyNoWar

If you still have questions after reading that, feel free to ask them.

Thanks,
Shawn