Re: Can't get spelling suggestions to work properly

2017-01-12 Thread Matt Pearce

Hi Jimi,

It looks like the suggest mode defaults to only returning results when 
the query term is not in your index.


I think setting spellcheck.onlyMorePopular=true, or a value for 
spellcheck.alternativeTermCount, should change the suggest mode to one 
that doesn't enforce that restriction. Apologies if you've already tried 
those, but it doesn't look like they're set in your query params below.
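
For example, something like this on top of your existing params should do it 
(an untested sketch; tune the count to taste):

&spellcheck=true&spellcheck.alternativeTermCount=5

That tells the checker to consider alternatives even for terms that do exist 
in the index.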


I know from recent experience that setting up spell checkers (and 
suggesters) can be an irritating process, often involving a lot of 
trial-and-error testing!


All the best,

Matt


On 10/01/17 15:41, jimi.hulleg...@svensktnaringsliv.se wrote:

No one has any input on my post below about the spelling suggestions? I just find it a 
bit frustrating not being able to understand this feature better, and why it doesn't give 
the expected results. A built-in "explain" feature really would have helped.

/Jimi

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se]
Sent: Friday, December 16, 2016 9:58 PM
To: solr-user@lucene.apache.org
Subject: Can't get spelling suggestions to work properly

Hi,

I'm trying to add the spelling suggestion feature to our search, but I'm having 
problems getting suggestions on some misspellings.

For example, the Swedish word 'mycket' exists in ~14,000 of a total of ~40,000 
documents in our index.

A search for the incorrect spelling 'myket' (a missing 'c') gives several 
spelling suggestions, and the top one is 'mycket'. This is the wanted/expected 
behavior.

But a search for the incorrect spelling 'mycet' (a missing 'k') gives no 
spelling suggestions.

The only difference between these two searches is that the one that results in 
spelling suggestions had zero results, while the other one had two (2) results. 
These two documents contain the incorrect spelling ('mycet'). Can this be the 
cause of no spelling suggestions? But I have set 'maxQueryFrequency' to 0.001, 
and with 40,000 documents in the index that should mean that the word can exist 
in up to 40 documents, and since 2 is less than 40 I argue that this word 
should be considered a spelling mistake. But for some reason the Solr 
spellchecker considers 'myket' an incorrect spelling, while 'mycet' is 
incorrectly considered a correct spelling.

Also, I tried with spellcheck.accuracy=0 just to rule out that I have too 
high an accuracy setting, but that didn't help.

Can someone see what I'm doing wrong, or give some tips on configuration 
changes and/or how I can troubleshoot this? For example, is there any way to 
debug the spellchecker function?


Here are the searches:

Search for 'myket':

http://localhost:8080/solr/s2/select/?q=myket&rows=100&sort=score+desc&fl=*%2Cscore%2C%5Bexplain+style%3Dtext%5D&defType=edismax&qf=title%5E2&qf=swedishText1%5E1&spellcheck=true&spellcheck.accuracy=0&spellcheck.maxCollationTries=200&fq=%2Bactivatedate%3A%5B*+TO+NOW%5D+%2Bexpiredate%3A%5BNOW+TO+*%5D+%2B%28state%3Apublished+OR+state%3Adraft-published+OR+state%3Asubmitted-published+OR+state%3Aapproved-published%29&wt=xml&indent=true

Spellcheck output for 'myket':

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="myket">
      <int name="numFound">16</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">mycket</str>
          <int name="freq">14039</int>
        </lst>
[...]
      </arr>
    </lst>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <lst name="collation">
    <str name="collationQuery">mycket</str>
    <int name="hits">14005</int>
  </lst>
</lst>

Re: Can't get spelling suggestions to work properly

2017-01-12 Thread alessandro.benedetti
Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) We don't provide misspelled suggestions if we set the param to 1 and the
term has a minimum doc freq of 1.

2) We don't provide misspelled suggestions if the doc frequency of the term
is greater than the max limit set.

Let us explore the code:

if (suggestMode == SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
  return new SuggestWord[0];
}
// If we are working in "not in index" mode and the term has a document
// frequency > 0, we get no misspelled corrections.

int maxDoc = ir.maxDoc();

if (maxQueryFrequency >= 1f && docfreq > maxQueryFrequency) {
  return new SuggestWord[0];
} else if (docfreq > (int) Math.ceil(maxQueryFrequency * (float) maxDoc)) {
  return new SuggestWord[0];
}
// Then maxQueryFrequency, as you correctly stated, enters the game.

...

Let's explore how you can end up in the first scenario:

if (maxResultsForSuggest == null || hits <= maxResultsForSuggest) {
  SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
  if (onlyMorePopular) {
suggestMode = SuggestMode.SUGGEST_MORE_POPULAR;
  } else if (alternativeTermCount > 0) {
suggestMode = SuggestMode.SUGGEST_ALWAYS;
  }

You did not set maxResultsForSuggest (and neither onlyMorePopular nor
alternativeTermCount), so you ended up with:

SuggestMode suggestMode = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;

From the Solr Javadoc:

 * If left unspecified, the default behavior will prevail.  That is,
 * "correctlySpelled" will be false and suggestions will be returned only if
 * one or more of the query terms are absent from the dictionary and/or
 * index.  If set to zero, the "correctlySpelled" flag will be false only if
 * the response returns zero hits.  If set to a value greater than zero,
 * suggestions will be returned even if hits are returned (up to the
 * specified number).  This number also will serve as the threshold in
 * determining the value of "correctlySpelled".  Specifying a value greater
 * than zero is useful for creating "did-you-mean" suggestions for queries
 * that return a low number of hits.
 */
public static final String SPELLCHECK_MAX_RESULTS_FOR_SUGGEST =
    SPELLCHECK_PREFIX + "maxResultsForSuggest";

You probably want to bypass the other parameters and just set a proper
maxResultsForSuggest param for your spellchecker.
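
For example (a rough sketch; pick a threshold that matches what a "low
number of hits" means for your application, and note it usually needs to be
combined with spellcheck.alternativeTermCount so that in-index terms get
checked too):

spellcheck=true&spellcheck.maxResultsForSuggest=100&spellcheck.alternativeTermCount=5
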
Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-t-get-spelling-suggestions-to-work-properly-tp4310079p4313685.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-01-12 Thread alessandro.benedetti
Hi Jeffery,
I just noticed your comment on my blog; I will try to respond asap.
Regarding your doubt, I second Diego's readme.

If you have other user signals as well (apart from clicks), it may be
interesting to use them too.
User signals such as "Add to Favorites", "Add to the basket", "Share", and
"Buy" could be indicators of better relevancy.
If you are able to collect "shown but not clicked" events, those should help
you as well (in this case as an indicator of low relevancy).
Of course this will have implications for the volume of signals collected.

I may be stating the obvious, but if you can, try to collect as many signal
types as possible.

Cheers




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-train-the-model-using-user-clicks-when-use-ltr-learning-to-rank-module-tp4312462p4313688.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit required after delete ?

2017-01-12 Thread alessandro.benedetti
Interesting Michael, can you pass me the code reference?

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Commit-required-after-delete-tp4312697p4313692.html
Sent from the Solr - User mailing list archive at Nabble.com.


[HelpWanted] Improve the PublicServers wiki page

2017-01-12 Thread Jan Høydahl
Hi,

We have a Wiki page at https://wiki.apache.org/solr/PublicServers where users 
can maintain a list of known Solr installations.
Now, that list contains a staggering 248 public sites and 27 other users. 
The problem is that the list grows, but no one ever takes anything away when a 
site goes out of business, or fixes dead links :-(

Having a maintained list will give a better first impression for people 
intending to check out sites using Solr.
So I kicked off a dead-link crawl for the site, and found 383 link URLs in 
total, of which 63 were dead.
Some of those dead links represent companies that are out of business, some are 
404s due to new websites, and yet others have merged or changed names and are 
still using Solr.

I’d like to call out a small gardening project to clean up the page. This is a 
task that does not require Java skills, so you are welcome to contribute even 
if you are new.

Here are suggestions for tasks you can take on...

Fix broken links
----------------
Test every broken link (list at the end of this email) and see if there’s an 
easy fix.
If you cannot fix the link, move the entire entry to the new section “Broken 
links graveyard”.


QA of the other sites
-
For all public websites listed, try to validate the claims and correct obvious 
errors.
If it is obvious that the site does not use Solr anymore, or it is clearly 
broken or outdated,
remove it or send an email to the company asking them to comment.
Where it says "Company X uses Solr version x.y…”, remove the version info since 
it is irrelevant.
To keep track of which entries have been validated, please move them above 
the horizontal line as you go.


Add new entries you know about
--
Add your own site or your customer’s site




The following dead links were found by my script:

https://wiki.apache.org/solr/PublicServers
http://jobuzu.co.uk/
http://jobblu.co.uk/
http://www.attinteractive.com/
http://ca.buy.com
http://fr.buy.com
http://www.stubhub.com/
http://siris-collections.si.edu/search/
http://www.golfvex.com/
http://www.wickedin.co.uk/
http://jobs.wickedin.co.uk/
http://homes.wickedin.co.uk/
http://cars.wickedin.co.uk/
http://motorcycles.wickedin.co.uk/
http://pets.wickedin.co.uk/
http://campers.wickedin.co.uk/
http://classifieds.wickedin.co.uk/
http://jetwick.com/
http://www.pannous.info/products/jetwick-twitter-search/
http://www.allplumbingrepair.com/
http://www.thebigjobs.com
http://www.manta.com/mb
http://www.discogs.com/
http://codeconsult.ch/bertrand/archives/000760.html
http://www.zvents.com/
http://www.pricejunkie.com/
http://reddit.com
http://mamereviews.hubmed.org/
http://peel.hubmed.org/
http://www.bazaaria.com/
http://www.autoo.ro
http://www.rez.ro
http://www.imoo.ro
http://www.rallyformusic.com
http://lab.cheaptickets.com/shop/reviews
http://www.webcity.fr/concert/recherche-evenement
http://www.adidastrainers.co.uk
http://firm-ua.com/
http://www.findthatzipfile.com
http://www.findthatexe.com
http://www.findthataudio.com,http://www.findthatvideo.com
http://www.saveur.com/solrSearchResults.jsp
http://www.voucher-code-discount.co.uk

http://blog.gkudos.com/2010/03/05/observatory-of-presidential-elections-colombia-2010/
http://www.tvtv.co.uk
http://www.jounce.com/
http://katalog.finn.no/
http://jobs.trovit.com/
http://www.shipsmoorings.com
http://www.talenttube.co/CampusSearch.do
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://dazzlepod.com/gifiles/
http://www.searchbyquote.com/
http://www.benipaltechnologies.com
http://phx.corporate-ir.net/phoenix.zhtml?c=98566&p=irol-aboutIMOverview
http://www.webcity.fr/hotels/recherche-lieu
http://www.talenttube.co/CorporateSearch.do
http://dazzlepod.com/cable/
http://www.webcity.fr/restaurants/recherche-lieu
http://www.talenttube.co/JobSearch.do
http://www.webcity.fr/
http://www.talenttube.co/CandidateSearch.do
http://www.talenttube.co/StudentSearch.do
http://www.talenttube.co/


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Commit required after delete ?

2017-01-12 Thread Mikhail Khludnev
Alessandro,
I'm not sure which code reference you are asking about, but here they are:
http://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged-org.apache.lucene.index.DirectoryReader-org.apache.lucene.index.IndexWriter-boolean-
http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html
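
In short, deletes become visible once the reader is reopened against the
writer. A minimal sketch:

// applyAllDeletes=true hides deleted documents in the reopened view
DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
if (newReader != null) {
  reader.close();   // release the stale reader
  reader = newReader;
}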


On Thu, Jan 12, 2017 at 2:09 PM, alessandro.benedetti  wrote:

> Interesting Michael, can you pass me the code reference?
>
> Cheers
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Commit-required-after-delete-tp4312697p4313692.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: [HelpWanted] Improve the PublicServers wiki page

2017-01-12 Thread Erick Erickson
+1.

Also, you don't have to be a committer to do this! If you don't
already have access to edit the Wiki we can cure that ;)

On Thu, Jan 12, 2017 at 3:31 AM, Jan Høydahl  wrote:
> Hi,
>
> We have a Wiki page at https://wiki.apache.org/solr/PublicServers where users 
> can maintain a list of known Solr installations.
> Now, that list contains a staggering 248 public sites and 27 other users.
> Problem is that the list grows but noone ever takes something away when they 
> go out of business, or fix dead links :-(
>
> Having a maintained list will give a better first impression for people 
> intending to check out sites using Solr.
> So I kicked off a dead-link crawl for the site, and found 383 link URLs in 
> total, of which 63 were dead.
> Some of those dead links represent companies that are out of business, some 
> are 404’s due to new website and yet others have merged or changed name and 
> still using Solr.
>
> I’d like to call out a small gardening project to clean up the page. This is 
> a task that does not require Java skills, so welcome to contribute even if 
> you are new.
>
> Here are suggestions for tasks you can take on...
>
> Fix broken links
> 
> Test every broken link (list at the end of this email) and see if there’s an 
> easy fix.
> If you cannot fix the link, move the entire entry to the new section “Broken 
> links graveyard”
>
>
> QA of the other sites
> -
> For all public websites listed, try to validate the claims and correct 
> obvious errors.
> If it is obvious that the site does not use Solr anymore, or it is clearly 
> broken or outdated,
> remove it or send an email to the company asking them to comment.
> Where it says "Company X uses Solr version x.y…”, remove the version info 
> since it is irrelevant
> To keep track of which entries that have been validated, please move them 
> above the horizontal line as you go.
>
>
> Add new entries you know about
> --
> Add your own site or your customer’s site
>
>
>
>
> The following deadlinks were found by my script:
>
> https://wiki.apache.org/solr/PublicServers
> http://jobuzu.co.uk/
> http://jobblu.co.uk/
> http://www.attinteractive.com/
> http://ca.buy.com
> http://fr.buy.com
> http://www.stubhub.com/
> http://siris-collections.si.edu/search/
> http://www.golfvex.com/
> http://www.wickedin.co.uk/
> http://jobs.wickedin.co.uk/
> http://homes.wickedin.co.uk/
> http://cars.wickedin.co.uk/
> http://motorcycles.wickedin.co.uk/
> http://pets.wickedin.co.uk/
> http://campers.wickedin.co.uk/
> http://classifieds.wickedin.co.uk/
> http://jetwick.com/
> http://www.pannous.info/products/jetwick-twitter-search/
> http://www.allplumbingrepair.com/
> http://www.thebigjobs.com
> http://www.manta.com/mb
> http://www.discogs.com/
> http://codeconsult.ch/bertrand/archives/000760.html
> http://www.zvents.com/
> http://www.pricejunkie.com/
> http://reddit.com
> http://mamereviews.hubmed.org/
> http://peel.hubmed.org/
> http://www.bazaaria.com/
> http://www.autoo.ro
> http://www.rez.ro
> http://www.imoo.ro
> http://www.rallyformusic.com
> http://lab.cheaptickets.com/shop/reviews
> http://www.webcity.fr/concert/recherche-evenement
> http://www.adidastrainers.co.uk
> http://firm-ua.com/
> http://www.findthatzipfile.com
> http://www.findthatexe.com
> http://www.findthataudio.com,http://www.findthatvideo.com
> http://www.saveur.com/solrSearchResults.jsp
> http://www.voucher-code-discount.co.uk
> 
> http://blog.gkudos.com/2010/03/05/observatory-of-presidential-elections-colombia-2010/
> http://www.tvtv.co.uk
> http://www.jounce.com/
> http://katalog.finn.no/
> http://jobs.trovit.com/
> http://www.shipsmoorings.com
> http://www.talenttube.co/CampusSearch.do
> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
> http://dazzlepod.com/gifiles/
> http://www.searchbyquote.com/
> http://www.benipaltechnologies.com
> 
> http://phx.corporate-ir.net/phoenix.zhtml?c=98566&p=irol-aboutIMOverview
> http://www.webcity.fr/hotels/recherche-lieu
> http://www.talenttube.co/CorporateSearch.do
> http://dazzlepod.com/cable/
> http://www.webcity.fr/restaurants/recherche-lieu
> http://www.talenttube.co/JobSearch.do
> http://www.webcity.fr/
> http://www.talenttube.co/CandidateSearch.do
> http://www.talenttube.co/StudentSearch.do
> http://www.talenttube.co/
>
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>


Max length of solr query

2017-01-12 Thread 武井宜行
Hi all,

My application sends a very large query to the Solr server with the SolrJ
client (the HTTP method is POST).

I have two questions.

First, I would like to know the limit on the number of clauses in a Boolean
query. I know the number is restricted to 1024 by default, and I can increase
the limit by setting setMaxClauseCount, but is there a limit to how far the
clause count can be increased?

Next, if there is no limit on increasing clauses, is there a limit on query
length? My application sends a large query like this with the SolrJ client:

item_id: OR item_id: OR item_id: ...
(The number of item_id clauses is maybe more than one million.)


Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-12 Thread Shawn Heisey
On 1/11/2017 7:14 PM, Chetas Joshi wrote:
> This is what I understand about how Solr works on HDFS. Please correct me
> if I am wrong.
>
> Although solr shard replication Factor = 1, HDFS default replication = 3.
> When the node goes down, the solr server running on that node goes down and
> hence the instance (core) representing the replica goes down. The data in
> on HDFS (distributed across all the datanodes of the hadoop cluster with 3X
> replication).  This is the reason why I have kept replicationFactor=1.
>
> As per the link:
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
> One benefit to running Solr in HDFS is the ability to automatically add new
> replicas when the Overseer notices that a shard has gone down. Because the
> "gone" index shards are stored in HDFS, a new core will be created and the
> new core will point to the existing indexes in HDFS.
>
> This is the expected behavior of Solr overseer which I am not able to see.
> After a couple of hours a node was assigned to host the shard but the
> status of the shard is still "down" and the instance dir is missing on that
> node for that particular shard_replica.

As I said before, I know very little about HDFS, so the following could
be wrong, but it makes sense so I'll say it:

I would imagine that Solr doesn't know or care what your HDFS
replication is ... the only replicas it knows about are the ones that it
is managing itself.  The autoAddReplicas feature manages *SolrCloud*
replicas, not HDFS replicas.

I have seen people say that multiple SolrCloud replicas will take up
additional space in HDFS -- they do not point at the same index files. 
This is because proper Lucene operation requires that it lock an index
and prevent any other thread/process from writing to the index at the
same time.  When you index, SolrCloud updates all replicas independently
-- the only time indexes are replicated is when you add a new replica or
a serious problem has occurred and an index needs to be recovered.

Thanks,
Shawn



Re: Max length of solr query

2017-01-12 Thread Scott Stults
That doesn't seem like an efficient use of a search engine. Maybe what you
want to do is use streaming expressions to process some data:

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
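
For example, a streaming expression along these lines (just a sketch -- the
collection and field names are made up) can walk every matching id without
building a million-clause query:

search(items,
       q="category:widgets",
       fl="item_id",
       sort="item_id asc",
       qt="/export")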


k/r,
Scott

On Thu, Jan 12, 2017 at 11:36 AM, 武井宜行  wrote:

> Hi,all
>
> My Application throws too large query to solr server with solrj
> client.(Http Method is Post)
>
> I have two questions.
>
> At first,I would like to know the limit of  clauses of Boolean Query.I Know
> the number is restricted to 1024 by default, and I can increase the limit
> by setting setMaxClauseCount,but what is the limit of increasing clauses?
>
> Next,if there is no limit of increasing clauses,is there the limit of query
> length?My Application throws to large query like this with solrj client.
>
> item_id: OR item_id: OR item_id: ...
> (The number of item_id is maybe over than one million)
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Max length of solr query

2017-01-12 Thread Shawn Heisey
On 1/12/2017 9:36 AM, 武井宜行 wrote:
> My Application throws too large query to solr server with solrj
> client.(Http Method is Post)
>
> I have two questions.
>
> At first,I would like to know the limit of  clauses of Boolean Query.I Know
> the number is restricted to 1024 by default, and I can increase the limit
> by setting setMaxClauseCount,but what is the limit of increasing clauses?

The maximum possible value for maxBooleanClauses is Java's
Integer.MAX_VALUE -- about 2.1 billion.  Note that if you want to
increase this setting, you must do it in EVERY configuration.  The
setting is global, which means that the last core that loads is the one
that sets it for everything running in that JVM.  If the last core that
loads happens to be missing the config, it will be set back to 1024. 
Some of us have been trying to get this limit lifted, or at least
arranged so that it doesn't have to be changed on every core, but we've
been meeting with some resistance.
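
For reference, the setting itself is a single line in solrconfig.xml:

<maxBooleanClauses>1024</maxBooleanClauses>

and, as above, it must carry the same value in every loaded core's config.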

> Next,if there is no limit of increasing clauses,is there the limit of query
> length?My Application throws to large query like this with solrj client.

The default size limit on a POST request is 2MB, since about version
4.1.  Before that version, it was controlled by the container config,
not Solr.  This can be adjusted with the formdataUploadLimitInKB setting
in solrconfig.xml.  The default value for this is 2048, resulting in the
2MB I already mentioned.  This page contains the documentation for that
setting:

https://cwiki.apache.org/confluence/display/solr/RequestDispatcher+in+SolrConfig
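
For reference, it lives on the requestParsers element (a sketch -- raise the
value to whatever your largest query needs):

<requestDispatcher>
  <requestParsers formdataUploadLimitInKB="20480" />
</requestDispatcher>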

Thanks,
Shawn



Re: regarding extending classes in org.apache.solr.client.solrj.io.stream.metrics package

2017-01-12 Thread Scott Stults
Radhakrishnan,

That would be an appropriate Jira ticket. You can submit it here:

https://issues.apache.org/jira/browse/solr

Also, if you want to submit a patch, check out the guidelines (it's pretty
easy):

https://wiki.apache.org/solr/HowToContribute


k/r,
Scott


On Tue, Jan 10, 2017 at 7:12 PM, radha krishnan 
wrote:

>  Hi,
>
> i want to extend the update(Tuple tuple) method in MaxMetric,. MinMetric,
> SumMetric, MeanMetric classes.
>
> can you please make the below metioned variables and methods in the above
> mentioned classes as protected so that it will be easy to extend
>
> variables
> ---
>
> longMax
>
> doubleMax
>
> columnName
>
>
> and
>
> methods
>
> ---
>
> init
>
>
>
> Thanks,
>
> Radhakrishnan D
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Facet date Range without start and and date

2017-01-12 Thread Scott Stults
No, it's not. Use something like facet.date.start=-00-00T00:00:00Z
and facet.date.end=3000-00-00T00:00:00Z.
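
If your Solr version supports range faceting, date math avoids hardcoding the
bounds entirely. A sketch, with a made-up field name (the '+' must be
URL-encoded as %2B in a real request):

facet.range=pubDate&facet.range.start=NOW-100YEARS&facet.range.end=NOW%2B100YEARS&facet.range.gap=%2B1YEAR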


k/r,
Scott

On Mon, Jan 9, 2017 at 10:46 AM, nabil Kouici 
wrote:

> Hi All,
> Is it possible to have facet date range without specifying start and and
> of the range.
> Otherwise, is it possible to put in the same request start to min value
> and end to max value.
> Thank you.
> Regards,NKI.
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Max length of solr query

2017-01-12 Thread Noriyuki TAKEI
Hi all. I got it. Thanks a lot!!

On Fri, Jan 13, 2017 at 1:56, Shawn Heisey-2 [via Lucene]
<ml-node+s472066n4313737...@n3.nabble.com> wrote:

> On 1/12/2017 9:36 AM, 武井宜行 wrote:
> > My Application throws too large query to solr server with solrj
> > client.(Http Method is Post)
> >
> > I have two questions.
> >
> > At first,I would like to know the limit of clauses of Boolean Query.I Know
> > the number is restricted to 1024 by default, and I can increase the limit
> > by setting setMaxClauseCount,but what is the limit of increasing clauses?
>
> The maximum possible value for maxBooleanClauses is Java's
> Integer.MAX_VALUE -- about 2.1 billion.  Note that if you want to
> increase this setting, you must do it in EVERY configuration.  The
> setting is global, which means that the last core that loads is the one
> that sets it for everything running in that JVM.  If the last core that
> loads happens to be missing the config, it will be set back to 1024.
> Some of us have been trying to get this limit lifted, or at least
> arranged so that it doesn't have to be changed on every core, but we've
> been meeting with some resistance.
>
> > Next,if there is no limit of increasing clauses,is there the limit of query
> > length?My Application throws to large query like this with solrj client.
>
> The default size limit on a POST request is 2MB, since about version
> 4.1.  Before that version, it was controlled by the container config,
> not Solr.  This can be adjusted with the formdataUploadLimitInKB setting
> in solrconfig.xml.  The default value for this is 2048, resulting in the
> 2MB I already mentioned.  This page contains the documentation for that
> setting:
>
> https://cwiki.apache.org/confluence/display/solr/RequestDispatcher+in+SolrConfig
>
> Thanks,
> Shawn



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Max-length-of-solr-query-tp4313734p4313740.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Max length of solr query

2017-01-12 Thread Erick Erickson
Also, consider using the TermsQueryParser for your very large OR clauses.
That avoids the "too many boolean clauses" error and is more efficient
than a large OR clause. It'll also reduce the size of your query somewhat,
since it substitutes a space or comma for " OR ".

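For example (a quick sketch, with made-up ids):

q={!terms f=item_id}1001,1002,1003
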
Best,
Erick

On Thu, Jan 12, 2017 at 9:18 AM, Noriyuki TAKEI  wrote:
> Hi,all.I got it.Thanks a lot!!
>
> On Fri, Jan 13, 2017 at 1:56, Shawn Heisey-2 [via Lucene]
> <ml-node+s472066n4313737...@n3.nabble.com> wrote:
>
>> On 1/12/2017 9:36 AM, 武井宜行 wrote:
>> > My Application throws too large query to solr server with solrj
>> > client.(Http Method is Post)
>> >
>> > I have two questions.
>> >
>> > At first,I would like to know the limit of clauses of Boolean Query.I Know
>> > the number is restricted to 1024 by default, and I can increase the limit
>> > by setting setMaxClauseCount,but what is the limit of increasing clauses?
>>
>> The maximum possible value for maxBooleanClauses is Java's
>> Integer.MAX_VALUE -- about 2.1 billion.  Note that if you want to
>> increase this setting, you must do it in EVERY configuration.  The
>> setting is global, which means that the last core that loads is the one
>> that sets it for everything running in that JVM.  If the last core that
>> loads happens to be missing the config, it will be set back to 1024.
>> Some of us have been trying to get this limit lifted, or at least
>> arranged so that it doesn't have to be changed on every core, but we've
>> been meeting with some resistance.
>>
>> > Next,if there is no limit of increasing clauses,is there the limit of query
>> > length?My Application throws to large query like this with solrj client.
>>
>> The default size limit on a POST request is 2MB, since about version
>> 4.1.  Before that version, it was controlled by the container config,
>> not Solr.  This can be adjusted with the formdataUploadLimitInKB setting
>> in solrconfig.xml.  The default value for this is 2048, resulting in the
>> 2MB I already mentioned.  This page contains the documentation for that
>> setting:
>>
>> https://cwiki.apache.org/confluence/display/solr/RequestDispatcher+in+SolrConfig
>>
>> Thanks,
>> Shawn
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Max-length-of-solr-query-tp4313734p4313740.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-12 Thread Erick Erickson
Hmmm, have you changed any of the settings for autoAddReplicas? There
are several parameters that govern how long it takes before a replica is
added.

But I suggest you use the Cloudera resources for this question, not
only did they write this functionality, but Cloudera support is deeply
embedded in HDFS and I suspect has _by far_ the most experience with
it.

And that said, anything you find out that would suggest good ways to
clarify the docs would be most welcome!

Best,
Erick

On Thu, Jan 12, 2017 at 8:42 AM, Shawn Heisey  wrote:
> On 1/11/2017 7:14 PM, Chetas Joshi wrote:
>> This is what I understand about how Solr works on HDFS. Please correct me
>> if I am wrong.
>>
>> Although solr shard replication Factor = 1, HDFS default replication = 3.
>> When the node goes down, the solr server running on that node goes down and
>> hence the instance (core) representing the replica goes down. The data in
>> on HDFS (distributed across all the datanodes of the hadoop cluster with 3X
>> replication).  This is the reason why I have kept replicationFactor=1.
>>
>> As per the link:
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>> One benefit to running Solr in HDFS is the ability to automatically add new
>> replicas when the Overseer notices that a shard has gone down. Because the
>> "gone" index shards are stored in HDFS, a new core will be created and the
>> new core will point to the existing indexes in HDFS.
>>
>> This is the expected behavior of Solr overseer which I am not able to see.
>> After a couple of hours a node was assigned to host the shard but the
>> status of the shard is still "down" and the instance dir is missing on that
>> node for that particular shard_replica.
>
> As I said before, I know very little about HDFS, so the following could
> be wrong, but it makes sense so I'll say it:
>
> I would imagine that Solr doesn't know or care what your HDFS
> replication is ... the only replicas it knows about are the ones that it
> is managing itself.  The autoAddReplicas feature manages *SolrCloud*
> replicas, not HDFS replicas.
>
> I have seen people say that multiple SolrCloud replicas will take up
> additional space in HDFS -- they do not point at the same index files.
> This is because proper Lucene operation requires that it lock an index
> and prevent any other thread/process from writing to the index at the
> same time.  When you index, SolrCloud updates all replicas independently
> -- the only time indexes are replicated is when you add a new replica or
> a serious problem has occurred and an index needs to be recovered.
>
> Thanks,
> Shawn
>


what will happen for incoming request when we are reloading a collection?

2017-01-12 Thread Jeffery Yuan
I am wondering what will happen to incoming requests when we are reloading a
collection. Will the incoming requests fail, just be a little slow, or see no
impact at all?

Thanks
Jeffery Yuan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-will-happen-for-incoming-request-when-we-are-reloading-a-collection-tp4313743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Referencing a !key and !stat in facet.pivot

2017-01-12 Thread John Blythe
hi all

i'm having an issue with an attempt to assign a key to a facet.pivot while
simultaneously referencing one of my stat fields.

i've got something like this:

stats.field={!tag=pivot_stats}lastPrice&
> ...
> facet.pivot={!key=pivot} {!stats=pivot_stats}buyer,vendor& ...


i've attempted it without a space, wrapping the entire pivot in the !key's
{ } braces, and anything else i could think of. some return errors, others
return the query results but with an empty

"facet_counts":{
> 
> "facet_pivot":{
>   "pivot":[]}},


it will work if I totally remove the {!key=pivot} portion, however.

is there any way to have both present?

thanks!


Retrieve one field from collection

2017-01-12 Thread Daisy Khaing TM
Hi,

 

I would like to get all the productIds from the collection, which consists of 7 
million plus records. (production environment)

Is there any efficient way to do this? 

 

curl 
"http://localhost:8983/solr/product/select?q=*&fl=P_ProductId&wt=csv&start=7950001&rows=15000";
 -o productIds54.csv

 

Above is one of the methods I could think of, but it could impact our 
current performance and is tedious to do.

Thank you.

 

Regards,

Daisy

 


--
CONFIDENTIALITY NOTICE 

This e-mail (including any attachments) may contain confidential and/or 
privileged information. If you are not the intended recipient or have received 
this e-mail in error, please inform the sender immediately and delete this 
e-mail (including any attachments) from your computer, and you must not use, 
disclose to anyone else or copy this e-mail (including any attachments), 
whether in whole or in part. 

This e-mail and any reply to it may be monitored for security, legal, 
regulatory compliance and/or other appropriate reasons.



Re: Retrieve one field from collection

2017-01-12 Thread Alexandre Rafalovitch
Have you looked at the export handler?
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
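
For example, something like this (a sketch -- note that /export requires the
fl/sort fields to have docValues enabled, and it streams the full result set
as JSON):

curl "http://localhost:8983/solr/product/export?q=*:*&fl=P_ProductId&sort=P_ProductId+asc" -o productIds.json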

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 January 2017 at 22:04, Daisy Khaing TM  wrote:
> Hi,
>
>
>
> I would like to get all the productIds from the collection which consist of 7 
> million plus records. (production environment)
>
> Is there any efficient way to do this?
>
>
>
> curl 
> "http://localhost:8983/solr/product/select?q=*&fl=P_ProductId&wt=csv&start=7950001&rows=15000";
>  -o productIds54.csv
>
>
>
> Above is one of the method I could think of currently but it could impact our 
> current performance and tedious to do it.
>
> Thank you.
>
>
>
> Regards,
>
> Daisy
>
>
>
>
> --
> CONFIDENTIALITY NOTICE
>
> This e-mail (including any attachments) may contain confidential and/or 
> privileged information. If you are not the intended recipient or have 
> received this e-mail in error, please inform the sender immediately and 
> delete this e-mail (including any attachments) from your computer, and you 
> must not use, disclose to anyone else or copy this e-mail (including any 
> attachments), whether in whole or in part.
>
> This e-mail and any reply to it may be monitored for security, legal, 
> regulatory compliance and/or other appropriate reasons.
>


Re: what will happen for incoming request when we are reloading a collection?

2017-01-12 Thread Mikhail Khludnev
Hello Jeffery,
My bet is on "becoming a little slower".

On Thu, Jan 12, 2017 at 8:52 PM, Jeffery Yuan  wrote:

> I am wondering what will happen for incoming requests when we are
> reloading a
> collection. Whether the incoming requests may fail or just maybe a little
> slow, or no impact for incoming requests at all?
>
> Thanks
> Jeffery Yuan
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/what-will-happen-for-incoming-request-when-we-
> are-reloading-a-collection-tp4313743.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev