Phonetic matching relevance

2018-01-29 Thread LOPEZ-CORTES Mariano-ext
Hello.

We work on a search application whose main goal is to find persons by name
(surname and lastname).

Query text comes from a user-entered text field. The ordering of the words is
not defined (lastname-surname or surname-lastname), but some orderings are
more important than others. The ranking is:

1 Exact match
2 Inexact match (contains entered words)
3 Inexact phonetic match (contains with Beider-Morse filter French version)

In addition, Lastname+surname is prioritized over Surname+lastname.

All words entered by the user have to match (in an exact or inexact way).

We have the following fields:

lastNameE : WordTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
lastName : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
lastNameP : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory and 
BMF
surnameE : WordTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
surname : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
surnameP : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory and BMF
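
For illustration, a phonetic field type along these lines (a sketch only; the
attribute values are illustrative, not necessarily the exact ones in the
schema):

  <fieldType name="name_phonetic_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC"
              ruleType="APPROX" concat="true" languageSet="french"/>
    </analyzer>
  </fieldType>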

We use the Edismax query parser and assign higher weights to the exact fields
and lower weights to the inexact fields.

However, among the phonetic matches, some are closer to the query text than
others. How can we boost those results?

Thanks in advance !


Re: Phonetic matching relevance

2018-01-29 Thread alessandro.benedetti
When you say: "However, for the phonetic matches, there are some matches
closer to the query text than others. How can we boost these results?"

Do you mean closer in string edit distance?
If that is the case you could use the string distance metrics implemented in
Solr with a function query.
From the wiki [1]:

*strdist*
Calculate the distance between two strings. Uses the Lucene spell checker
StringDistance interface and supports all of the implementations available
in that package, plus allows applications to plug in their own via Solr’s
resource loading capabilities. strdist takes (string1, string2, distance
measure).

Possible values for distance measure are:

jw: Jaro-Winkler

edit: Levenshtein or edit distance

ngram: The NGramDistance, if specified, can optionally pass in the ngram
size too. Default is 2.

FQN: Fully Qualified class Name for an implementation of the StringDistance
interface. Must have a no-arg constructor.
e.g.
strdist("SOLR",id,edit)

You can add this to the edismax query using a boost function (the boost
parameter) [2].

[1] https://lucene.apache.org/solr/guide/6_6/function-queries.html
[2] https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
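
For example, something along these lines (a sketch only; the query text and
weights are illustrative, and lastName_str is a hypothetical single-valued
string copy of the name, since strdist needs one string value per document to
compare against):

  q=dupon marie
  defType=edismax
  qf=lastNameE^10 surnameE^10 lastName^5 surname^5 lastNameP^2 surnameP^2
  boost=strdist("dupon marie",lastName_str,jw)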



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 4.8.1 multiple client updates the same collection

2018-01-29 Thread alessandro.benedetti
Generally speaking, if a full re-index is happening every day, wouldn't it be
better to use a technique such as a collection alias?

You could point your search clients to the "Alias" which points to the
online collection "collection1".
When you re-index you build "collection2"; when it is finished you point
"Alias" to "collection2".
The following day you do the same thing but you use "collection1" to index.
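
The swap itself is a single Collections API call, e.g. (host and names are
placeholders following the example above):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=Alias&collections=collection2'

Calling CREATEALIAS again with the same alias name simply re-points it, so
searches never see a half-built index.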

Client 2, which does the atomic updates, will point to "Alias".

I am assuming here that during the re-indexing, the prices we get in the fresh
index are the most up to date, so as soon as the re-index finishes the
collection is perfectly up to date.

In case you want to update the prices during re-indexing, the price updater
should point to the temporary collection.
Also in this case I assume that if a document has not been indexed yet, the
price update will fail, but the document will get the correct price when it is
indexed.
Please correct any wrong assumption,

Cheers





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


query response time is too high

2018-01-29 Thread Aashish Agarwal
Hi,

Solr query time for a request comes in at around 10-12 ms. But when I hit the
queries in parallel, the QTime rises to 900 ms even though there is no
significant increase in CPU load. I am using Solr with default memory
settings. How can I optimize it to get lower query times?

Thanks in advance.


Aashish Agarwal
Computer Science
Birla Institute of Technology and Science,Pilani






Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Hi folks,

Please suggest a solution for importing and indexing PDF files
*incrementally*. My requirement is to pull the PDF files remotely from a
network folder path. This network folder will receive new sets of PDF files
at certain intervals (say every 20 seconds). The folder is forced to be
emptied every time a new set of PDF files is copied into it. I do not want to
lose the earlier saved index of the old files while doing the next
incremental import.

Currently, I am using Solr 6.6 for this research.

The dataimport handler config is currently like this :-
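
The XML tags were stripped by the mailing list archive; below is a hedged
reconstruction of the config. The tag layout and the column/field names,
apart from the visible K2FileEntity attributes, are guesses:

  <dataConfig>
    <dataSource type="BinFileDataSource"/>
    <document>
      <entity name="K2FileEntity" processor="FileListEntityProcessor"
              dataSource="null" recursive="true"
              baseDir="\\CLDSINGH02\RemoteFileDepot"
              fileName=".*pdf" rootEntity="false">
        <field column="fileLastModified" name="lastmodified"/>
        <entity name="pdf" processor="TikaEntityProcessor" onError="skip"
                url="${K2FileEntity.fileAbsolutePath}" format="text">
          <field column="title" name="title" meta="true"/>
          <field column="text" name="content"/>
        </entity>
      </entity>
    </document>
  </dataConfig>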



Kind regards,
Karan Singh


Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Muhammad Zahid Iqbal
Thanks Erick.

This is fine, but I do not want to update my indexes, as this configuration
would get applied at indexing time as well. I have a requirement where one
field (XYZ) of type text requires two types of searches.

One is simple: the search query will look at the whole content indexed in the
XYZ field.
The other is: the search query will have to look at only the first 100
characters indexed in the same XYZ field.

So I just want to do this at query time only.

Any idea? Would be much appreciated!


On Sat, Jan 27, 2018 at 10:27 PM, Erick Erickson 
wrote:

> Sure, use TruncateFieldUpdateProcessorFactory in your update chain,
> here's the base definition:
>
>   <updateRequestProcessorChain name="truncate">
>     <processor class="solr.TruncateFieldUpdateProcessorFactory">
>       <str name="fieldName">trunc</str>
>       <int name="maxLength">5</int>
>     </processor>
>   </updateRequestProcessorChain>
>
> This _can_ be configured to operate on "all StrField", or "all
> TextFields" as well, see the Javadocs.
>
> This is static, that is the field is truncated at index time so you
> can't change the values per-request.
>
> Best,
> Erick
>
>
>
> On Sat, Jan 27, 2018 at 6:32 AM, Muhammad Zahid Iqbal
>  wrote:
> > Thanks.
> >
> > I do not want to search if the query is shorter than a certain number of
> > terms/characters.
> >
> > For example, I have a 10MB document indexed in Solr what I want is to
> > search query in first 1MB content of that indexed document.
> >
> > Any workaround e.g .can I send query to Solr to look for only 1MB from
> > start of document.?
> >
> >
> >
> > On Fri, Jan 26, 2018 at 10:46 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarel...@bloomberg.net> wrote:
> >
> >> Hi Zahid, if you want to allow searching only if the query is shorter
> than
> >> a certain number of terms / characters, I would do it before calling
> solr
> >> probably, otherwise you could write a QueryParserPlugin (see [1]) and
> check
> >> that the query is sound before processing it.
> >> See also: http://coding-art.blogspot.co.uk/2016/05/writing-custom-
> >> solr-query-parser-for.html
> >>
> >> Cheers,
> >> Diego
> >>
> >> [1] https://wiki.apache.org/solr/SolrPlugins
> >>
> >>
> >> From: solr-user@lucene.apache.org At: 01/26/18 13:24:36To:
> >> solr-user@lucene.apache.org
> >> Cc:  apa...@elyograg.org
> >> Subject: ***UNCHECKED*** Limit Solr search to number of character/words
> >> (without changing index)
> >>
> >> Hi All,
> >>
> >> Is there any way I can restrict Solr search query to look for specified
> >> number of characters/words (for only searching purposes not for
> >> highlighting)
> >>
> >> *For example:*
> >>
> >> *Indexed content:*
> >> *I am a man of my words I am a lazy man...*
> >>
> >> Search to consider only below mentioned (words=7 or characters=16)
> >> *I am a man of my words*
> >>
> >> If I search for *lazy *no record should find.
> >> If I search for *a *1 record should find.
> >>
> >>
> >> Thanks
> >> Zahid Iqbal
> >>
> >>
> >>
>


RE: 7.2.1 cluster dies within minutes after restart

2018-01-29 Thread Markus Jelsma
Ok, I applied the patch and it is clear the timeout is 15000. solr.xml says
30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is still
15000, not 30000.

But, back to my topic. I see we explicitly set it in solr.in.sh to 30000. To
be sure, I applied your patch to a production machine; all our collections run
with 30000. So how would that explain this log line?

o.a.z.ClientCnxn Client session timed out, have not heard from server in 22130ms

We also see these with smaller values, e.g. seven seconds. And is this
actually an indicator of the problems we have?

Any ideas?

Many thanks,
Markus
 
 
-Original message-
> From:Markus Jelsma 
> Sent: Saturday 27th January 2018 10:03
> To: solr-user@lucene.apache.org
> Subject: RE: 7.2.1 cluster dies within minutes after restart
> 
> Hello,
> 
> I grepped for it yesterday and found nothing but 3 in the settings, but 
> judging from the weird time out value, you may be right. Let me apply your 
> patch early next week and check for spurious warnings.
> 
> Another note worthy observation for those working on cloud stability and 
> recovery, whenever this happens, some nodes are also absolutely sure to run 
> OOM. The leaders usually live longest, the replica's don't, their heap usage 
> peaks every time, consistently. 
> 
> Thanks,
> Markus
>  
> -Original message-
> > From:Shawn Heisey 
> > Sent: Saturday 27th January 2018 0:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > 
> > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > o.a.z.ClientCnxn Client session timed out, have not heard from server in 
> > > 22130ms (although zkClientTimeout is 30000).
> > 
> > Are you absolutely certain that there is a setting for zkClientTimeout
> > that is actually getting applied?  The default value in Solr's example
> > configs is 30 seconds, but the internal default in the code (when no
> > configuration is found) is still 15.  I have confirmed this in the code.
> > 
> > Looks like SolrCloud doesn't log the values it's using for things like
> > zkClientTimeout.  I think it should.
> > 
> > https://issues.apache.org/jira/browse/SOLR-11915
> > 
> > Thanks,
> > Shawn
> > 
> > 
> 


Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Emir Arnautović
Hi Muhammad,
If the limit(s) are static, you can still do it at index time: assuming you
send a "content" field, you index it fully (and store it if needed), and you
use a copy field to copy it to a "content_limited" field, where you use the
limit token count filter to index only the first X tokens:
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
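
A minimal schema sketch of that idea (the type and field names are made up):

  <fieldType name="text_first100" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="content_limited" type="text_first100" indexed="true" stored="false"/>
  <copyField source="content" dest="content_limited"/>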
 


You can use CloneFieldUpdateProcessorFactory in combination with 
TruncateFieldUpdateProcessorFactory to do a similar thing in an update request
processor chain.
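
That chain could look roughly like this (field names are illustrative;
TruncateFieldUpdateProcessorFactory cuts the cloned value to a character
length, which fits the "first 100 characters" case):

  <updateRequestProcessorChain name="clone-and-truncate">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">content</str>
      <str name="dest">content_limited</str>
    </processor>
    <processor class="solr.TruncateFieldUpdateProcessorFactory">
      <str name="fieldName">content_limited</str>
      <int name="maxLength">100</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>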

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 11:51, Muhammad Zahid Iqbal 
>  wrote:
> 
> Thanks Erick.
> 
> This is fine but I do not want to update my indexes as this configuration
> will get applied to indexing as well. I have a requirement where one field
> (XYZ) of type (text) requires two types of searches.
> 
> One is simple, search query will look for whole content indexed in XYZ field
> Other one is, search query will have to look for first 100 characters
> indexed in same XYZ field.
> 
> So I just want to do this at query time only.
> 
> Any idea? Would be much appreciated!
> 
> 
> On Sat, Jan 27, 2018 at 10:27 PM, Erick Erickson 
> wrote:
> 
>> Sure, use TruncateFieldUpdateProcessorFactory in your update chain,
>> here's the base definition:
>> 
>>  <updateRequestProcessorChain name="truncate">
>>    <processor class="solr.TruncateFieldUpdateProcessorFactory">
>>      <str name="fieldName">trunc</str>
>>      <int name="maxLength">5</int>
>>    </processor>
>>  </updateRequestProcessorChain>
>> 
>> This _can_ be configured to operate on "all StrField", or "all
>> TextFields" as well, see the Javadocs.
>> 
>> This is static, that is the field is truncated at index time so you
>> can't change the values per-request.
>> 
>> Best,
>> Erick
>> 
>> 
>> 
>> On Sat, Jan 27, 2018 at 6:32 AM, Muhammad Zahid Iqbal
>>  wrote:
>>> Thanks.
>>> 
>>> I do not want to search if the query is shorter than a certain number of
>>> terms/characters.
>>> 
>>> For example, I have a 10MB document indexed in Solr what I want is to
>>> search query in first 1MB content of that indexed document.
>>> 
>>> Any workaround e.g .can I send query to Solr to look for only 1MB from
>>> start of document.?
>>> 
>>> 
>>> 
>>> On Fri, Jan 26, 2018 at 10:46 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
>>> dceccarel...@bloomberg.net> wrote:
>>> 
 Hi Zahid, if you want to allow searching only if the query is shorter
>> than
 a certain number of terms / characters, I would do it before calling
>> solr
 probably, otherwise you could write a QueryParserPlugin (see [1]) and
>> check
 that the query is sound before processing it.
 See also: http://coding-art.blogspot.co.uk/2016/05/writing-custom-
 solr-query-parser-for.html
 
 Cheers,
 Diego
 
 [1] https://wiki.apache.org/solr/SolrPlugins
 
 
 From: solr-user@lucene.apache.org At: 01/26/18 13:24:36To:
 solr-user@lucene.apache.org
 Cc:  apa...@elyograg.org
 Subject: ***UNCHECKED*** Limit Solr search to number of character/words
 (without changing index)
 
 Hi All,
 
 Is there any way I can restrict Solr search query to look for specified
 number of characters/words (for only searching purposes not for
 highlighting)
 
 *For example:*
 
 *Indexed content:*
 *I am a man of my words I am a lazy man...*
 
 Search to consider only below mentioned (words=7 or characters=16)
 *I am a man of my words*
 
 If I search for *lazy *no record should find.
 If I search for *a *1 record should find.
 
 
 Thanks
 Zahid Iqbal
 
 
 
>> 



Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread alessandro.benedetti
This seems different from what you initially asked (and what Diego responded to):
"One is simple, search query will look for whole content indexed in XYZ
field 
Other one is, search query will have to look for first 100 characters 
indexed in same XYZ field. "

This is still doable at indexing time using a copy field.
You can have your "originalField" and your "truncatedField" with no problem
at all.
Just use a combination of copyFields[1] and what Erick suggested.
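
For the "first 100 characters" case specifically, the copyField maxChars
attribute can do the truncation on its own, e.g. (field and type names are
illustrative):

  <field name="XYZ_first100" type="text_general" indexed="true" stored="false"/>
  <copyField source="XYZ" dest="XYZ_first100" maxChars="100"/>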

Cheers

[1] https://lucene.apache.org/solr/guide/6_6/copying-fields.html



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Perform incremental import with PDF Files

2018-01-29 Thread Emir Arnautović
Hi Karan,
Did you try running full import with clean=false?
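i.e. something along these lines (host and core name are placeholders):

  http://localhost:8983/solr/<core>/dataimport?command=full-import&clean=false&commit=true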

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 11:18, Karan Saini  wrote:
> 
> Hi folks,
> 
> Please suggest the solution for importing and indexing PDF files
> *incrementally*. My requirements is to pull the PDF files remotely from the
> network folder path. This network folder will be having new sets of PDF
> files after certain intervals (for say 20 secs). The folder will be forced
> to get empty, every time the new sets of PDF files are copied into it. I do
> not want to loose the earlier saved index of the old files, while doing the
> next incremental import.
> 
> Currently, i am using Solr 6.6 version for the research.
> 
> The dataimport handler config is currently like this :-
> 
> [data-config.xml stripped by the list archive: a FileListEntityProcessor
> entity (dataSource="null", recursive="true",
> baseDir="\\CLDSINGH02\RemoteFileDepot", fileName=".*pdf",
> rootEntity="false") feeding a TikaEntityProcessor (onError="skip",
> url="${K2FileEntity.fileAbsolutePath}", format="text") with meta="true"
> fields]
> 
> Kind regards,
> Karan Singh



Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Muhammad Zahid Iqbal
Hi Alessandro,

Thanks for making it clearer. As I mentioned, I do not want to change my
index (as stated in the subject) for the feature I requested:

"... the search query will have to look for the first 100 characters indexed
in the same XYZ field."

How can I achieve this without changing the index? I want it at search time.


On Mon, Jan 29, 2018 at 4:13 PM, alessandro.benedetti 
wrote:

> This seems different from what you initially asked ( and Diego responded)
> "One is simple, search query will look for whole content indexed in XYZ
> field
> Other one is, search query will have to look for first 100 characters
> indexed in same XYZ field. "
>
> This is still doable at Indexing time using a copy field.
> You can have your "originalField" and your "truncatedField" with no problem
> at all.
> Just use a combination of copyFields[1] and what Erick suggested.
>
> Cheers
>
> [1] https://lucene.apache.org/solr/guide/6_6/copying-fields.html
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: query response time is too high

2018-01-29 Thread Emir Arnautović
Hi Aashish,
Can you tell us a bit more about the size of your index, whether you are
running updates at the same time, the types of queries, the tests (is it some
randomized query or a predefined set), and how many test threads you use?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 11:17, Aashish Agarwal  wrote:
> 
> Hi,
> 
> Solr query time for a request comes aroung 10-12ms. But when I am hitting
> the queries parallely the qtime rises to 900 ms but there is no significant
> increase in cpu load. I am using solr with default memory settings. How can
> I optimize to give less query time.
> 
> Thanks in advance.
> 
> 
> Aashish Agarwal
> Computer Science
> Birla Institute of Technology and Science,Pilani
> 
> 
> 
> 



Re: query response time is too high

2018-01-29 Thread Deepak Goel
FYI. I recently did a study on 'Performance of Solr'

https://www.linkedin.com/pulse/performance-comparison-solr-elasticsearch-deepak-goel/?trackingId=N2j9xWvVEQQaZYa%2BoEsy%2Bw%3D%3D



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 29, 2018 at 4:56 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Aashish,
> Can you tell us a bit more about the size of your index and if you are
> running updates at the same time, types of queries, tests (is it some
> randomized query or some predefined), how many test threads do you use?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 11:17, Aashish Agarwal  wrote:
> >
> > Hi,
> >
> > Solr query time for a request comes aroung 10-12ms. But when I am hitting
> > the queries parallely the qtime rises to 900 ms but there is no
> significant
> > increase in cpu load. I am using solr with default memory settings. How
> can
> > I optimize to give less query time.
> >
> > Thanks in advance.
> >
> >
> > Aashish Agarwal
> > Computer Science
> > Birla Institute of Technology and Science,Pilani
> >
> >
> >
>
>


Merging fields of two streaming expressions

2018-01-29 Thread Gintautas Sulskus
Hi,

Is it possible to merge fields of two stream sources in a specific way?
Take for example two search result sets:

search_1(... fl="score")
search_2(... fl="score")

I would like to merge these two into one result set. Its score would be
computed using a custom function f(x,y) that takes the scores of search_1
and search_2 as its parameters.

The function could be, for example, the average of the two inputs, e.g.
f = (x + y) / 2


Using Solr 6.5.1


Thanks,
Gintas


Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
In theory it should be possible if you are indexing the positions of the
tokens in your field, but I am not aware of any Solr query that allows you to
weight the matches based on position. Does anyone know if that is possible?

From: solr-user@lucene.apache.org At: 01/29/18 11:25:36To:  
solr-user@lucene.apache.org
Subject: Re: ***UNCHECKED*** Limit Solr search to number of character/words 
(without changing index)

Hi Alessandro,

Thanks for making it more clear. As I mentioned I do not want to change my
index (mentioned in subject) for the feature I requested.


search query will have to look for first 100 characters indexed in same XYZ
field. "
How can I achieve this without changing index? I want at searching side.


On Mon, Jan 29, 2018 at 4:13 PM, alessandro.benedetti 
wrote:

> This seems different from what you initially asked ( and Diego responded)
> "One is simple, search query will look for whole content indexed in XYZ
> field
> Other one is, search query will have to look for first 100 characters
> indexed in same XYZ field. "
>
> This is still doable at Indexing time using a copy field.
> You can have your "originalField" and your "truncatedField" with no problem
> at all.
> Just use a combination of copyFields[1] and what Erick suggested.
>
> Cheers
>
> [1] https://lucene.apache.org/solr/guide/6_6/copying-fields.html
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>




Re: Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Thanks Emir :-) Setting the property *clean=false* worked for me.

Is there a way I can selectively clean a particular index from the
C#.NET code using the SolrNet API?
Please suggest.

Kind regards,
Karan


On 29 January 2018 at 16:49, Emir Arnautović 
wrote:

> Hi Karan,
> Did you try running full import with clean=false?
>
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 11:18, Karan Saini  wrote:
> >
> > Hi folks,
> >
> > Please suggest the solution for importing and indexing PDF files
> > *incrementally*. My requirements is to pull the PDF files remotely from
> the
> > network folder path. This network folder will be having new sets of PDF
> > files after certain intervals (for say 20 secs). The folder will be
> forced
> > to get empty, every time the new sets of PDF files are copied into it. I
> do
> > not want to loose the earlier saved index of the old files, while doing
> the
> > next incremental import.
> >
> > Currently, i am using Solr 6.6 version for the research.
> >
> > The dataimport handler config is currently like this :-
> >
> > 
> >  
> >  
> > > dataSource="null"
> >   recursive = "true"
> >   baseDir="\\CLDSINGH02\*RemoteFileDepot*"
> >   fileName=".*pdf" rootEntity="false">
> >
> >   
> >-->
> > name="lastmodified" />
> >
> >  onError="skip"
> > 
> > url="${K2FileEntity.fileAbsolutePath}"
> format="text">
> >
> >meta="true"/>
> >meta="true"/>
> >   
> > 
> >
> >  
> >
> >
> > Kind regards,
> > Karan Singh
>
>


Re: LTR original score feature

2018-01-29 Thread Michael Alcorn
>It seems to me that the original score feature is not useful because it is
not normalized across all queries and therefore cannot be used to compare
relevance in different queries.

I don't agree with this statement and it's not what Alessandro was
suggesting ("When you put the original score together with the rest of
features, it may
be of potential usage."). The magnitude of the score could very well
contain useful information in certain contexts. The simplest way to
determine whether or not the score is useful is to just train and test the
model with and without the feature included and see which one performs
better.
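
For reference, a rough sketch of how the original score can sit next to other
features in the LTR feature store so the training step can decide its weight
(host, collection, store, feature names and the second feature are made up):

  curl -X PUT 'http://localhost:8983/solr/mycollection/schema/feature-store' \
    -H 'Content-type:application/json' --data-binary '[
    { "store": "myStore", "name": "originalScore",
      "class": "org.apache.solr.ltr.feature.OriginalScoreFeature", "params": {} },
    { "store": "myStore", "name": "userTextTitleMatch",
      "class": "org.apache.solr.ltr.feature.SolrFeature",
      "params": { "q": "{!field f=title}${user_text}" } }
  ]'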

On Thu, Jan 25, 2018 at 3:41 PM, Brian Yee  wrote:

> Thanks for the reply Alessandro. I'm starting to agree with you but I
> wanted to see if others agree. It seems to me that the original score
> feature is not useful because it is not normalized across all queries and
> therefore cannot be used to compare relevance in different queries.
>
> -Original Message-
> From: alessandro.benedetti [mailto:a.benede...@sease.io]
> Sent: Wednesday, January 24, 2018 10:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: LTR original score feature
>
> This is actually an interesting point.
> The original Solr score alone will mean nothing, the ranking position of
> the document would be a more relevant feature at that stage.
>
> When you put the original score together with the rest of features, it may
> be of potential usage ( number of query terms, tf for a specific field, idf
> for another field ...).
> Also because some training algorithms will group the training samples by
> query.
>
> personally I start to believe it would be better to decompose the original
> score into finer grain features and then rely on LTR to weight them ( as
> the original score is effectively already mixing up finer grain features
> following a standard formula).
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director Sease Ltd. -
> www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Perform incremental import with PDF Files

2018-01-29 Thread Emir Arnautović
Hi Karan,
Glad it worked for you.

I am not sure how to do it in the C# client, but adding the clean=false
parameter to the URL should do the trick.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 29 Jan 2018, at 14:48, Karan Saini  wrote:
> 
> Thanks Emir :-) . Setting the property *clean=false* worked for me.
> 
> Is there a way, i can selectively clean the particular index from the
> C#.NET code using the SolrNet API ?
> Please suggest.
> 
> Kind regards,
> Karan
> 
> 
> On 29 January 2018 at 16:49, Emir Arnautović 
> wrote:
> 
>> Hi Karan,
>> Did you try running full import with clean=false?
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 29 Jan 2018, at 11:18, Karan Saini  wrote:
>>> 
>>> Hi folks,
>>> 
>>> Please suggest the solution for importing and indexing PDF files
>>> *incrementally*. My requirements is to pull the PDF files remotely from
>> the
>>> network folder path. This network folder will be having new sets of PDF
>>> files after certain intervals (for say 20 secs). The folder will be
>> forced
>>> to get empty, every time the new sets of PDF files are copied into it. I
>> do
>>> not want to loose the earlier saved index of the old files, while doing
>> the
>>> next incremental import.
>>> 
>>> Currently, i am using Solr 6.6 version for the research.
>>> 
>>> The dataimport handler config is currently like this :-
>>> 
>>> [data-config.xml snippet stripped by the list archive]
>>> 
>>> 
>>> Kind regards,
>>> Karan Singh
>> 
>> 



parallel - cartesianProduct

2018-01-29 Thread Kojo
Hi solr-users!
I have a streaming expression which joins two search streams; one of them is
evaluated in a cartesianProduct.
I'm trying to run it in parallel mode, but it does not work.


Trying a very simple parallel I can see that it works:

parallel(
  search(



But this one, which I'm trying to run, doesn't work:

parallel(
rollup(
sort(
hashJoin(
  search(
  hashed=cartesianProduct(
search(



The simplified version of the above doesn't work either:

parallel(
   cartesianProduct(
search(


The error is below; do you have any hint on how I can fix the expression?

Thank you.



java.io.IOException: java.lang.NullPointerException
at
org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(ParallelStream.java:277)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:305)
at
org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
at
org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:535)
at
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:83)
at
org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at
org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:154)
at
org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:134)
at
org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:44)
at
org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(ParallelStream.java:255)


Re: parallel - cartesianProduct

2018-01-29 Thread Joel Bernstein
This looks like a bug in the CartesianProductStream. It's going to have be
fixed before parallel cartesian products can be run. Feel free to create a
jira for this.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 29, 2018 at 9:58 AM, Kojo  wrote:

> Hi solr-users!
> I have a Streaming Expression which joins two search SE, one of them is
> evaluated on a cartesianProduct SE.
> I´am trying to run that in parallel mode but it does not work.
>
>
> Trying a very simple parallel I can see that it works:
>
> parallel(
>   search(
>
>
>
> But this one I´m trying to run, doesn´t works:
>
> parallel(
> rollup(
> sort(
> hashJoin(
>   search(
>   hashed=cartesianProduct(
> search(
>
>
>
> The simplified version of the above, doesn´t works too:
>
> parallel(
>cartesianProduct(
> search(
>
>
> The error is bellow, do you have any hint on how can I fix the expression?
>
> Thank you.
>
>
>
> java.io.IOException: java.lang.NullPointerException
> at
> org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(
> ParallelStream.java:277)
> at
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> open(CloudSolrStream.java:305)
> at
> org.apache.solr.client.solrj.io.stream.ExceptionStream.
> open(ExceptionStream.java:51)
> at
> org.apache.solr.handler.StreamHandler$TimerStream.
> open(StreamHandler.java:535)
> at
> org.apache.solr.client.solrj.io.stream.TupleStream.
> writeMap(TupleStream.java:83)
> at
> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> at
> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:193)
> at
> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> JSONResponseWriter.java:209)
> at
> org.apache.solr.response.JSONWriter.writeNamedList(
> JSONResponseWriter.java:325)
> at
> org.apache.solr.response.JSONWriter.writeResponse(
> JSONResponseWriter.java:120)
> at
> org.apache.solr.response.JSONResponseWriter.write(
> JSONResponseWriter.java:71)
> at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:65)
> at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:361)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:305)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceConsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProduceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)
> at java.lang.Thre

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread alessandro.benedetti
Taking a look at the Lucene code, this seems to be the closest query to your
requirement:

org.apache.lucene.search.spans.SpanPositionRangeQuery

But it is not used in Solr out of the box, as far as I know.
You could potentially develop a query parser and use it to reach your goal.

Given that, I think the index-time strategy will be much easier; it will just
require a re-index and a few small changes to the query-time configuration.
Another possibility may be to use payloads and the related query parser, but
also in this case you would need to re-index, so it is unlikely that this
option would be your favorite.
I appreciate that you cannot re-index, so in that case you will need to
follow the other approaches (developing components).

Regards





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: LTR original score feature

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I think it really depends on the particular use case. Sometimes the absolute
score is a good feature, sometimes not.

If you are using the default BM25, I think that increasing the number of
terms in the query will increase the average document score in the results.
So maybe I would normalize the score, at least by considering the number of
terms in the query.

Using the rank has been proposed in academia [1] and it seems to improve the
quality of the results.

[1] http://hpc.isti.cnr.it/~claudio/web/archives/20150416b/index.html 

From: solr-user@lucene.apache.org At: 01/29/18 14:19:41To:  
solr-user@lucene.apache.org
Subject: Re: LTR original score feature

>It seems to me that the original score feature is not useful because it is
not normalized across all queries and therefore cannot be used to compare
relevance in different queries.

I don't agree with this statement and it's not what Alessandro was
suggesting ("When you put the original score together with the rest of
features, it may
be of potential usage."). The magnitude of the score could very well
contain useful information in certain contexts. The simplest way to
determine whether or not the score is useful is to just train and test the
model with and without the feature included and see which one performs
better.

On Thu, Jan 25, 2018 at 3:41 PM, Brian Yee  wrote:

> Thanks for the reply Alessandro. I'm starting to agree with you but I
> wanted to see if others agree. It seems to me that the original score
> feature is not useful because it is not normalized across all queries and
> therefore cannot be used to compare relevance in different queries.
>
> -Original Message-
> From: alessandro.benedetti [mailto:a.benede...@sease.io]
> Sent: Wednesday, January 24, 2018 10:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: LTR original score feature
>
> This is actually an interesting point.
> The original Solr score alone will mean nothing, the ranking position of
> the document would be a more relevant feature at that stage.
>
> When you put the original score together with the rest of features, it may
> be of potential usage ( number of query terms, tf for a specific field, idf
> for another field ...).
> Also because some training algorithms will group the training samples by
> query.
>
> personally I start to believe it would be better to decompose the original
> score into finer grain features and then rely on LTR to weight them ( as
> the original score is effectively already mixing up finer grain features
> following a standard formula).
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director Sease Ltd. -
> www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>




Can't find resource 'syns.txt'

2018-01-29 Thread beji dhia
Hello,
I'm a beginner with Solr and I have some difficulties manipulating resources.
I am using Solr 7.2 in cloud mode with ZooKeeper.
1/ I created a collection named films
2/ I wanted to add a synonyms file called "syns.txt", so I added it into
"server/solr/configsets/_default/conf/"
3/ I executed this command line:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field-type":{
 "name":"_text_",
 "class":"solr.TextField",
 "positionIncrementGap":"100",
 "analyzer":{
"charFilters":[{
   "class":"solr.PatternReplaceCharFilterFactory",
   "replacement":"$1$1",
   "pattern":"([a-zA-Z])1+" }],
"tokenizer":{
   "class":"solr.UAX29URLEmailTokenizerFactory" },
"filters":[{
   "class":"solr.StopFilterFactory",
   "ignoreCase":true,
"words":"stopwords.txt"
   },
   { "class":"solr.LowerCaseFilterFactory" },
   { "class":"solr.ASCIIFoldingFilterFactory" },
   { "class":"solr.EnglishPossessiveFilterFactory" },
   { "class":"solr.SynonymFilterFactory","synonyms":"syns.txt"}
   ]}},
   "replace-field" : {
  "name":"_text_",
  "type":"_text_",
  "stored":true,
  "multiValued":true,
  "indexed":true
  }
}' http://localhost:8983/solr/films/schema

The problem is that I get: "Can't find resource 'syns.txt' in classpath or
'/configs/films' cwd=/home/solr-7.2.1/server\nCan't find resource
'syns.txt' in classpath or '/configs/films'".

Can anyone help me please?


Re: parallel - cartesianProduct

2018-01-29 Thread Kojo
Joel,
The Jira is created:
https://issues.apache.org/jira/browse/SOLR-11922

I hope it helps.

Thank you very much.




2018-01-29 13:03 GMT-02:00 Joel Bernstein :

> This looks like a bug in the CartesianProductStream. It's going to have be
> fixed before parallel cartesian products can be run. Feel free to create a
> jira for this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jan 29, 2018 at 9:58 AM, Kojo  wrote:
>
> > Hi solr-users!
> > I have a Streaming Expression which joins two search SE, one of them is
> > evaluated on a cartesianProduct SE.
> > I´am trying to run that in parallel mode but it does not work.
> >
> >
> > Trying a very simple parallel I can see that it works:
> >
> > parallel(
> >   search(
> >
> >
> >
> > But this one I´m trying to run, doesn´t works:
> >
> > parallel(
> > rollup(
> > sort(
> > hashJoin(
> >   search(
> >   hashed=cartesianProduct(
> > search(
> >
> >
> >
> > The simplified version of the above, doesn´t works too:
> >
> > parallel(
> >cartesianProduct(
> > search(
> >
> >
> > The error is bellow, do you have any hint on how can I fix the
> expression?
> >
> > Thank you.
> >
> >
> >
> > java.io.IOException: java.lang.NullPointerException
> > at
> > org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(
> > ParallelStream.java:277)
> > at
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > open(CloudSolrStream.java:305)
> > at
> > org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > open(ExceptionStream.java:51)
> > at
> > org.apache.solr.handler.StreamHandler$TimerStream.
> > open(StreamHandler.java:535)
> > at
> > org.apache.solr.client.solrj.io.stream.TupleStream.
> > writeMap(TupleStream.java:83)
> > at
> > org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:547)
> > at
> > org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:193)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > JSONResponseWriter.java:209)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedList(
> > JSONResponseWriter.java:325)
> > at
> > org.apache.solr.response.JSONWriter.writeResponse(
> > JSONResponseWriter.java:120)
> > at
> > org.apache.solr.response.JSONResponseWriter.write(
> > JSONResponseWriter.java:71)
> > at
> > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:65)
> > at
> > org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:809)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:361)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:305)
> > at
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1691)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at
> > org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:548)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:226)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1180)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1112)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:213)
> > at
> > org.eclipse.jetty.server.handler.HandlerCollection.
> > handle(HandlerCollection.java:119)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at
> > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> > RewriteHandler.java:335)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > at
> > org.eclipse.jetty.server.HttpConnection.onFillable(
> > HttpConnection.java:251)
> > at
> > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> > AbstractConnection.java:273)
> > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > at
> > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> > SelectChannelEndPoint.java:93)
> > at
> > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > executeProduceConsume(E

Query parser problem, using fuzzy search

2018-01-29 Thread David Frese

Hello everybody,

how can I formulate a fuzzy query that works for an arbitrary string, and is
there a formal syntax definition somewhere?


I already found out by hand that

field:"val"~2

is read by the parser, but the fuzziness seems to get lost. So I write

field:val~2

Now if val contains spaces and other special characters, I can escape them:

field:my\ val~2

But now I'm stuck with the term AND:

field:AND~2

Note that I do not want a boolean expression here; I want to match the string
AND! But the parser complains:


"org.apache.solr.search.SyntaxError: Cannot parse 'field:AND~2': 
Encountered \"  \"AND \"\" at line 1, column 4.\nWas expecting one 
of:\n ...\n\"(\" ...\n\"*\" ...\n 
...\n ...\n ...\n ...\n 
 ...\n\"[\" ...\n\"{\" ...\n ...\n 
\"filter(\" ...\n ...\n",



Thanks for any hints and help.

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber


Re: Query parser problem, using fuzzy search

2018-01-29 Thread Erick Erickson
Try searching with the word "and" in lowercase. Somehow you have to allow
the parser to distinguish the two.

You _might_ be able to try "AND~2" (with quotes) to see if you can get
that through the parser. Kind of a hack, but...

There's also a parameter (depending on the parser) for lowercasing
operators, so if and~2 doesn't work, check that.
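
With eDismax that parameter is lowercaseOperators, so something like this
(an untested sketch) should let a lowercase and~2 be treated as a term rather
than as an operator:

  q=field:and~2&defType=edismax&lowercaseOperators=false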

On Mon, Jan 29, 2018 at 8:32 AM, David Frese
 wrote:
> Hello everybody,
>
> how can I formulate a fuzzy query that works for an arbitrary string, resp.
> is there a formal syntax definition somewhere?
>
> I already found by by hand, that
>
> field:"val"~2
>
> Is read by the parser, but the fuzzyness seems to get lost. So I write
>
> field:val~2
>
> Now if val contain spaces and other special characters, I can escape them:
>
> field:my\ val~2
>
> But now I'm stuck with the term AND:
>
> field:AND~2
>
> Note that I do not want a boolean expression here, but I want to match the
> string AND! But the parser complains:
>
> "org.apache.solr.search.SyntaxError: Cannot parse 'field:AND~2': Encountered
> \"  \"AND \"\" at line 1, column 4.\nWas expecting one of:\n
>  ...\n\"(\" ...\n\"*\" ...\n ...\n
> ...\n ...\n ...\n  ...\n\"[\"
> ...\n\"{\" ...\n ...\n \"filter(\" ...\n ...\n
> ",
>
>
> Thanks for any hints and help.
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber


Re: Can't find resource 'syns.txt'

2018-01-29 Thread Erick Erickson
did you push 'syns.txt' to ZooKeeper into the same place your schema is?
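
If not, something like this should upload it next to the rest of the films
config (the local path and ZooKeeper address are placeholders), followed by a
collection RELOAD so the new resource is picked up:

  bin/solr zk cp file:/path/to/syns.txt zk:/configs/films/syns.txt -z localhost:9983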

Best,
Erick

On Mon, Jan 29, 2018 at 7:34 AM, beji dhia  wrote:
>  hello,
> I'm beginner with solrand I have some difficulties to manipulate resources.
> In fact, I m using solr 7.2. in cloud mode using zookeper.
> 1/ I create collection named films
> 2/ I wanted to add synonyms file  called "syns.txt" so I added it into
> "server/solr/configsets/_default/conf/"
> 3/ i execute command line
>
> curl -X POST -H 'Content-type:application/json' --data-binary '{
>   "replace-field-type":{
>  "name":"_text_",
>  "class":"solr.TextField",
>  "positionIncrementGap":"100",
>  "analyzer":{
> "charFilters":[{
>"class":"solr.PatternReplaceCharFilterFactory",
>"replacement":"$1$1",
>"pattern":"([a-zA-Z])1+" }],
> "tokenizer":{
>"class":"solr.UAX29URLEmailTokenizerFactory" },
> "filters":[{
>"class":"solr.StopFilterFactory",
>"ignoreCase":true,
> "words":"stopwords.txt"
>},
>{ "class":"solr.LowerCaseFilterFactory" },
>{ "class":"solr.ASCIIFoldingFilterFactory" },
>{ "class":"solr.EnglishPossessiveFilterFactory" },
>{ "class":"solr.SynonymFilterFactory","synonyms":"syns.txt"}
>]}},
>"replace-field" : {
>   "name":"_text_",
>   "type":"_text_",
>   "stored":true,
>   "multiValued":true,
>   "indexed":true
>   }
> }' http://localhost:8983/solr/films/schema
>
> the problem is that I get Can't find resource 'syns.txt' in classpath or
> '/configs/films' cwd=/home/solr-7.2.1/server\nCan't find resource
> 'syns.txt' in classpath or '/configs/films'.
>
> Can anyone help me please?


Re: parallel - cartesianProduct

2018-01-29 Thread Joel Bernstein
Thanks!

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 29, 2018 at 11:14 AM, Kojo  wrote:

> Joel,
> The Jira is created:
> https://issues.apache.org/jira/browse/SOLR-11922
>
> I hope it helps.
>
> Thank you very much.
>
>
>
>
> 2018-01-29 13:03 GMT-02:00 Joel Bernstein :
>
> > This looks like a bug in the CartesianProductStream. It's going to have
> be
> > fixed before parallel cartesian products can be run. Feel free to create
> a
> > jira for this.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, Jan 29, 2018 at 9:58 AM, Kojo  wrote:
> >
> > > Hi solr-users!
> > > I have a Streaming Expression which joins two search SE, one of them is
> > > evaluated on a cartesianProduct SE.
> > > I´am trying to run that in parallel mode but it does not work.
> > >
> > >
> > > Trying a very simple parallel I can see that it works:
> > >
> > > parallel(
> > >   search(
> > >
> > >
> > >
> > > But this one I´m trying to run, doesn´t works:
> > >
> > > parallel(
> > > rollup(
> > > sort(
> > > hashJoin(
> > >   search(
> > >   hashed=cartesianProduct(
> > > search(
> > >
> > >
> > >
> > > The simplified version of the above, doesn´t works too:
> > >
> > > parallel(
> > >cartesianProduct(
> > > search(
> > >
> > >
> > > The error is bellow, do you have any hint on how can I fix the
> > expression?
> > >
> > > Thank you.
> > >
> > >
> > >
> > > java.io.IOException: java.lang.NullPointerException
> > > at
> > > org.apache.solr.client.solrj.io.stream.ParallelStream.
> constructStreams(
> > > ParallelStream.java:277)
> > > at
> > > org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > > open(CloudSolrStream.java:305)
> > > at
> > > org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > open(ExceptionStream.java:51)
> > > at
> > > org.apache.solr.handler.StreamHandler$TimerStream.
> > > open(StreamHandler.java:535)
> > > at
> > > org.apache.solr.client.solrj.io.stream.TupleStream.
> > > writeMap(TupleStream.java:83)
> > > at
> > > org.apache.solr.response.JSONWriter.writeMap(
> > JSONResponseWriter.java:547)
> > > at
> > > org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:193)
> > > at
> > > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > > JSONResponseWriter.java:209)
> > > at
> > > org.apache.solr.response.JSONWriter.writeNamedList(
> > > JSONResponseWriter.java:325)
> > > at
> > > org.apache.solr.response.JSONWriter.writeResponse(
> > > JSONResponseWriter.java:120)
> > > at
> > > org.apache.solr.response.JSONResponseWriter.write(
> > > JSONResponseWriter.java:71)
> > > at
> > > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > > QueryResponseWriterUtil.java:65)
> > > at
> > > org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > HttpSolrCall.java:809)
> > > at org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:538)
> > > at
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:361)
> > > at
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:305)
> > > at
> > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > > doFilter(ServletHandler.java:1691)
> > > at
> > > org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > ServletHandler.java:582)
> > > at
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:143)
> > > at
> > > org.eclipse.jetty.security.SecurityHandler.handle(
> > > SecurityHandler.java:548)
> > > at
> > > org.eclipse.jetty.server.session.SessionHandler.
> > > doHandle(SessionHandler.java:226)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandler.
> > > doHandle(ContextHandler.java:1180)
> > > at
> > > org.eclipse.jetty.servlet.ServletHandler.doScope(
> > ServletHandler.java:512)
> > > at
> > > org.eclipse.jetty.server.session.SessionHandler.
> > > doScope(SessionHandler.java:185)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandler.
> > > doScope(ContextHandler.java:1112)
> > > at
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:141)
> > > at
> > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > > ContextHandlerCollection.java:213)
> > > at
> > > org.eclipse.jetty.server.handler.HandlerCollection.
> > > handle(HandlerCollection.java:119)
> > > at
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > > HandlerWrapper.java:134)
> > > at
> > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> > > RewriteHandler.java:335)
> > > at
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > > HandlerWrapper.java:134)
> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > > at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:320)
> > > at
> > > org.eclipse.jetty.server.HttpConnection.onFillable

Re: Perform incremental import with PDF Files

2018-01-29 Thread Alexandre Rafalovitch
If you need to make a request to Solr that has a lot of custom
parameters and values, you can create an additional definition for a
request handler and add all those parameters in there, instead of
hardcoding them on the client side. See solrconfig.xml; there are lots
of examples there.
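
As a hedged sketch of that idea (handler name, config file, and parameter
values below are only examples, not something from this thread), a DIH
handler can carry the extra parameters in its defaults:

    <requestHandler name="/dataimport-pdf"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">tika-data-config.xml</str>
        <str name="clean">false</str>
        <str name="commit">true</str>
      </lst>
    </requestHandler>

The client then only needs to call /dataimport-pdf?command=full-import and
inherits clean=false and commit=true from the handler.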

Regards,
   Alex.

On 29 January 2018 at 20:48, Karan Saini  wrote:
> Thanks Emir :-) . Setting the property *clean=false* worked for me.
>
> Is there a way, i can selectively clean the particular index from the
> C#.NET code using the SolrNet API ?
> Please suggest.
>
> Kind regards,
> Karan
>
>
> On 29 January 2018 at 16:49, Emir Arnautović 
> wrote:
>
>> Hi Karan,
>> Did you try running full import with clean=false?
>>
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>> > On 29 Jan 2018, at 11:18, Karan Saini  wrote:
>> >
>> > Hi folks,
>> >
>> > Please suggest the solution for importing and indexing PDF files
>> > *incrementally*. My requirements is to pull the PDF files remotely from
>> the
>> > network folder path. This network folder will be having new sets of PDF
>> > files after certain intervals (for say 20 secs). The folder will be
>> forced
>> > to get empty, every time the new sets of PDF files are copied into it. I
>> do
>> > not want to loose the earlier saved index of the old files, while doing
>> the
>> > next incremental import.
>> >
>> > Currently, i am using Solr 6.6 version for the research.
>> >
>> > The dataimport handler config is currently like this :-
>> >
>> > 
>> >  
>> >  
>> >> > dataSource="null"
>> >   recursive = "true"
>> >   baseDir="\\CLDSINGH02\*RemoteFileDepot*"
>> >   fileName=".*pdf" rootEntity="false">
>> >
>> >   
>> >-->
>> >> name="lastmodified" />
>> >
>> > > onError="skip"
>> > 
>> > url="${K2FileEntity.fileAbsolutePath}"
>> format="text">
>> >
>> >   > meta="true"/>
>> >   > meta="true"/>
>> >   
>> > 
>> >
>> >  
>> >
>> >
>> > Kind regards,
>> > Karan Singh
>>
>>


Re: 7.2.1 cluster dies within minutes after restart

2018-01-29 Thread S G
Hi Markus,

We are in the process of upgrading our clusters to 7.2.1 and I am not sure
I quite follow the conversation here.
Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher value
in the config (and it's just a default value being wrong/overridden
somewhere)?
Or is it more severe in the sense that any config set for ZK_CLIENT_TIMEOUT
by the user is just ignored completely by Solr in 7.2.1 ?

Thanks
SG


On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma 
wrote:

> Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> still 15000, not 30000.
>
> But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> To be sure, i applied your patch to a production machine, all our
> collections run with 30000. So how would that explain this log line?
>
> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> 22130ms
>
> We also see these with smaller values, seven seconds. And, is this
> actually an indicator of the problems we have?
>
> Any ideas?
>
> Many thanks,
> Markus
>
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Saturday 27th January 2018 10:03
> > To: solr-user@lucene.apache.org
> > Subject: RE: 7.2.1 cluster dies within minutes after restart
> >
> > Hello,
> >
> > I grepped for it yesterday and found nothing but 30000 in the settings,
> but judging from the weird time out value, you may be right. Let me apply
> your patch early next week and check for spurious warnings.
> >
> > Another note worthy observation for those working on cloud stability and
> recovery, whenever this happens, some nodes are also absolutely sure to run
> OOM. The leaders usually live longest, the replica's don't, their heap
> usage peaks every time, consistently.
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Shawn Heisey 
> > > Sent: Saturday 27th January 2018 0:49
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >
> > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> server in 22130ms (although zkClientTimeOut is 30000).
> > >
> > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > that is actually getting applied?  The default value in Solr's example
> > > configs is 30 seconds, but the internal default in the code (when no
> > > configuration is found) is still 15.  I have confirmed this in the
> code.
> > >
> > > Looks like SolrCloud doesn't log the values it's using for things like
> > > zkClientTimeout.  I think it should.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-11915
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


RE: 7.2.1 cluster dies within minutes after restart

2018-01-29 Thread Markus Jelsma
Hello SG,

The default in solr.in.sh is commented out, so Solr falls back to the value set in
bin/solr, which is fifteen seconds. Just uncomment the setting in solr.in.sh
and your timeout will be thirty seconds.

For Solr itself to really default to thirty seconds, Solr's bin/solr needs to 
be patched to use the correct value.
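
For anyone following along, a minimal sketch of the relevant solr.in.sh line
(the surrounding comment text varies by version):

    # solr.in.sh -- once uncommented, this overrides the 15-second default from bin/solr
    ZK_CLIENT_TIMEOUT="30000"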

Regards,
Markus
 
-Original message-
> From:S G 
> Sent: Monday 29th January 2018 20:15
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Hi Markus,
> 
> We are in the process of upgrading our clusters to 7.2.1 and I am not sure
> I quite follow the conversation here.
> Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher value
> in the config (and it's just a default value being wrong/overridden
> somewhere)?
> Or is it more severe in the sense that any config set for ZK_CLIENT_TIMEOUT
> by the user is just ignored completely by Solr in 7.2.1 ?
> 
> Thanks
> SG
> 
> 
> On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma 
> wrote:
> 
> > Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > still 15000, not 30000.
> >
> > But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> > To be sure, i applied your patch to a production machine, all our
> > collections run with 30000. So how would that explain this log line?
> >
> > o.a.z.ClientCnxn Client session timed out, have not heard from server in
> > 22130ms
> >
> > We also see these with smaller values, seven seconds. And, is this
> > actually an indicator of the problems we have?
> >
> > Any ideas?
> >
> > Many thanks,
> > Markus
> >
> >
> > -Original message-
> > > From:Markus Jelsma 
> > > Sent: Saturday 27th January 2018 10:03
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > >
> > > Hello,
> > >
> > > I grepped for it yesterday and found nothing but 30000 in the settings,
> > but judging from the weird time out value, you may be right. Let me apply
> > your patch early next week and check for spurious warnings.
> > >
> > > Another note worthy observation for those working on cloud stability and
> > recovery, whenever this happens, some nodes are also absolutely sure to run
> > OOM. The leaders usually live longest, the replica's don't, their heap
> > usage peaks every time, consistently.
> > >
> > > Thanks,
> > > Markus
> > >
> > > -Original message-
> > > > From:Shawn Heisey 
> > > > Sent: Saturday 27th January 2018 0:49
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> > server in 22130ms (although zkClientTimeOut is 30000).
> > > >
> > > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > > that is actually getting applied?  The default value in Solr's example
> > > > configs is 30 seconds, but the internal default in the code (when no
> > > > configuration is found) is still 15.  I have confirmed this in the
> > code.
> > > >
> > > > Looks like SolrCloud doesn't log the values it's using for things like
> > > > zkClientTimeout.  I think it should.
> > > >
> > > > https://issues.apache.org/jira/browse/SOLR-11915
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
> 


Re: 7.2.1 cluster dies within minutes after restart

2018-01-29 Thread Michael Braun
Believe this is reported in https://issues.apache.org/jira/browse/SOLR-10471


On Mon, Jan 29, 2018 at 2:55 PM, Markus Jelsma 
wrote:

> Hello SG,
>
> The default in solr.in.sh is commented so it defaults to the value set in
> bin/solr, which is fifteen seconds. Just uncomment the setting in
> solr.in.sh and your timeout will be thirty seconds.
>
> For Solr itself to really default to thirty seconds, Solr's bin/solr needs
> to be patched to use the correct value.
>
> Regards,
> Markus
>
> -Original message-
> > From:S G 
> > Sent: Monday 29th January 2018 20:15
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Hi Markus,
> >
> > We are in the process of upgrading our clusters to 7.2.1 and I am not
> sure
> > I quite follow the conversation here.
> > Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher
> value
> > in the config (and it's just a default value being wrong/overridden
> > somewhere)?
> > Or is it more severe in the sense that any config set for
> ZK_CLIENT_TIMEOUT
> > by the user is just ignored completely by Solr in 7.2.1 ?
> >
> > Thanks
> > SG
> >
> >
> > On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Ok, i applied the patch and it is clear the timeout is 15000. Solr.xml
> > > says 30000 if ZK_CLIENT_TIMEOUT is not set, which is by default unset in
> > > solr.in.sh, but set in bin/solr to 15000. So it seems Solr's default is
> > > still 15000, not 30000.
> > >
> > > But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> > > To be sure, i applied your patch to a production machine, all our
> > > collections run with 30000. So how would that explain this log line?
> > >
> > > o.a.z.ClientCnxn Client session timed out, have not heard from server
> in
> > > 22130ms
> > >
> > > We also see these with smaller values, seven seconds. And, is this
> > > actually an indicator of the problems we have?
> > >
> > > Any ideas?
> > >
> > > Many thanks,
> > > Markus
> > >
> > >
> > > -Original message-
> > > > From:Markus Jelsma 
> > > > Sent: Saturday 27th January 2018 10:03
> > > > To: solr-user@lucene.apache.org
> > > > Subject: RE: 7.2.1 cluster dies within minutes after restart
> > > >
> > > > Hello,
> > > >
> > > > I grepped for it yesterday and found nothing but 30000 in the settings,
> > > but judging from the weird time out value, you may be right. Let me
> apply
> > > your patch early next week and check for spurious warnings.
> > > >
> > > > Another note worthy observation for those working on cloud stability
> and
> > > recovery, whenever this happens, some nodes are also absolutely sure
> to run
> > > OOM. The leaders usually live longest, the replica's don't, their heap
> > > usage peaks every time, consistently.
> > > >
> > > > Thanks,
> > > > Markus
> > > >
> > > > -Original message-
> > > > > From:Shawn Heisey 
> > > > > Sent: Saturday 27th January 2018 0:49
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > > >
> > > > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> > > server in 22130ms (although zkClientTimeOut is 30000).
> > > > >
> > > > > Are you absolutely certain that there is a setting for
> zkClientTimeout
> > > > > that is actually getting applied?  The default value in Solr's
> example
> > > > > configs is 30 seconds, but the internal default in the code (when
> no
> > > > > configuration is found) is still 15.  I have confirmed this in the
> > > code.
> > > > >
> > > > > Looks like SolrCloud doesn't log the values it's using for things
> like
> > > > > zkClientTimeout.  I think it should.
> > > > >
> > > > > https://issues.apache.org/jira/browse/SOLR-11915
> > > > >
> > > > > Thanks,
> > > > > Shawn
> > > > >
> > > > >
> > > >
> > >
> >
>


SolrCloud installation troubles...

2018-01-29 Thread Scott Prentice

Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a 
prototype system of 3 Solr servers and 3 Zookeeper servers. For now, 
this is all on one machine, but will eventually be 3 machines.


This works fine on a Ubuntu 5.4.0-6 VM on my local system, but when I do 
the same setup on the company's network machine (a Red Hat 4.8.5-16 VM), 
I'm unable to create a collection. To keep things simple, I'm not using 
our custom schema yet, but just creating a collection through the Solr 
Admin UI using Collections > Add Collection, using the "_default" config 
set. On the Ubuntu system, I can create various collections .. 1 shard 
w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4 
replications .. all seem alive and well.


But when I do the same thing on the Red Hat system it fails. Through the 
UI, it'll first time out with this message ..


    Connection to Solr lost

Then after a refresh, the collection appears to have been partially 
created, but it's in the "Gone" state, and after some time, is deleted 
by an apparent cleanup process. If I try to create one through the 
command line ..


    ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to: 
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8983/solr}


I've seen other reports of errors like this but no solutions that seem 
to apply to my situation. Any thoughts?


Thanks!
...scott




Re: SolrCloud installation troubles...

2018-01-29 Thread Shawn Heisey

On 1/29/2018 1:13 PM, Scott Prentice wrote:
But when I do the same thing on the Red Hat system it fails. Through 
the UI, it'll first time out with this message ..


    Connection to Solr lost

Then after a refresh, the collection appears to have been partially 
created, but it's in the "Gone" state, and after some time, is deleted 
by an apparent cleanup process. If I try to create one through the 
command line ..


    ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to: 
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8983/solr} 


This sounds like either network connectivity problems or possibly issues 
caused by extreme garbage collection pauses that result in timeouts.


Thanks,
Shawn



RE: SolrCloud installation troubles...

2018-01-29 Thread Davis, Daniel (NIH/NLM) [C]
To expand on that answer, you have to wonder what ports are open in the server 
system's port-based firewall. I have to ask my systems team to open ports
for everything I'm using, especially when I move from localhost to outside.

You should be able to "fake it out" if you set up your zookeeper configuration 
to use localhost ports.

-Original Message-
From: Scott Prentice [mailto:s...@leximation.com] 
Sent: Monday, January 29, 2018 3:13 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud installation troubles...

Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a 
prototype system of 3 Solr servers and 3 Zookeeper servers. For now, this is 
all on one machine, but will eventually be 3 machines.

This works fine on a Ubuntu 5.4.0-6 VM on my local system, but when I do the 
same setup on the company's network machine (a Red Hat 4.8.5-16 VM), I'm unable 
to create a collection. To keep things simple, I'm not using our custom schema 
yet, but just creating a collection through the Solr Admin UI using Collections 
> Add Collection, using the "_default" config set. On the Ubuntu system, I can 
create various collections .. 1 shard w/ 1 replication .. 2 shards w/ 3 
replications .. 3 shards w/ 4 replications .. all seem alive and well.

But when I do the same thing on the Red Hat system it fails. Through the UI, 
it'll first time out with this message ..

     Connection to Solr lost

Then after a refresh, the collection appears to have been partially created, 
but it's in the "Gone" state, and after some time, is deleted by an apparent 
cleanup process. If I try to create one through the command line ..

     ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to: 
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem to apply 
to my situation. Any thoughts?

Thanks!
...scott




Re: SolrCloud installation troubles...

2018-01-29 Thread Scott Prentice


On 1/29/18 12:44 PM, Shawn Heisey wrote:

On 1/29/2018 1:13 PM, Scott Prentice wrote:
But when I do the same thing on the Red Hat system it fails. Through 
the UI, it'll first time out with this message ..


    Connection to Solr lost

Then after a refresh, the collection appears to have been partially 
created, but it's in the "Gone" state, and after some time, is 
deleted by an apparent cleanup process. If I try to create one 
through the command line ..


    ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to: 
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8983/solr} 


This sounds like either network connectivity problems or possibly 
issues caused by extreme garbage collection pauses that result in 
timeouts.


Thanks,
Shawn

Thanks, Shawn. I was wondering if there was something going on with IP 
redirection that was causing confusion. Any thoughts on how to debug? 
And, what do you mean by "extreme garbage collection pauses"? Is that 
Solr garbage collection or the OS itself? There's really nothing 
happening on this machine, it's purely for testing so there shouldn't be 
any extra load from other processes.


Thanks!
...scott





Re: SolrCloud installation troubles...

2018-01-29 Thread Scott Prentice
Interesting. I am using "localhost" in the config files (using the IP 
caused things to break even worse). But perhaps I should check with IT 
to make sure the ports are all open.


Thanks,
...scott


On 1/29/18 12:57 PM, Davis, Daniel (NIH/NLM) [C] wrote:

To expand on that answer, you have to wonder what ports are open in the server 
system's port-based firewall.I have to ask my systems team to open ports 
for everything I'm using, especially when I move from localhost to outside.

You should be able to "fake it out" if you set up your zookeeper configuration 
to use localhost ports.

-Original Message-
From: Scott Prentice [mailto:s...@leximation.com]
Sent: Monday, January 29, 2018 3:13 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud installation troubles...

Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a 
prototype system of 3 Solr servers and 3 Zookeeper servers. For now, this is 
all on one machine, but will eventually be 3 machines.

This works fine on a Ubuntu 5.4.0-6 VM on my local system, but when I do the same setup on 
the company's network machine (a Red Hat 4.8.5-16 VM), I'm unable to create a collection. To 
keep things simple, I'm not using our custom schema yet, but just creating a collection 
through the Solr Admin UI using Collections > Add Collection, using the 
"_default" config set. On the Ubuntu system, I can create various collections .. 1 
shard w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4 replications .. all 
seem alive and well.

But when I do the same thing on the Red Hat system it fails. Through the UI, 
it'll first time out with this message ..

      Connection to Solr lost

Then after a refresh, the collection appears to have been partially created, but it's in 
the "Gone" state, and after some time, is deleted by an apparent cleanup 
process. If I try to create one through the command line ..

      ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to:
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem to apply 
to my situation. Any thoughts?

Thanks!
...scott






RE: SolrCloud installation troubles...

2018-01-29 Thread Davis, Daniel (NIH/NLM) [C]
Trying 127.0.0.1 could help.   We kind of tend to think localhost is always 
127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent 
of 127.0.0.1.

I guess some environments can be strict enough to restrict communication on 
localhost; seems hard to imagine, but it does happen.
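
A quick, hedged way to check what localhost actually resolves to on that box
(assuming a standard Linux install):

    getent ahosts localhost    # shows 127.0.0.1 and/or ::1, depending on /etc/hosts
    ping -c 1 localhost        # shows which address actually gets used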

-Original Message-
From: Scott Prentice [mailto:s...@leximation.com] 
Sent: Monday, January 29, 2018 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud installation troubles...


On 1/29/18 12:44 PM, Shawn Heisey wrote:
> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>> But when I do the same thing on the Red Hat system it fails. Through 
>> the UI, it'll first time out with this message ..
>>
>>     Connection to Solr lost
>>
>> Then after a refresh, the collection appears to have been partially 
>> created, but it's in the "Gone" state, and after some time, is 
>> deleted by an apparent cleanup process. If I try to create one 
>> through the command line ..
>>
>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>
>> I get this response ..
>>
>> ERROR: Failed to create collection 'test99' due to: 
>> {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerExcepti
>> on:IOException occured when talking to server at: 
>> http://10.6.208.31:8984/solr, 
>> 10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerExceptio
>> n:IOException occured when talking to server at: 
>> http://10.6.208.31:8985/solr, 
>> 10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerExceptio
>> n:IOException occured when talking to server at: 
>> http://10.6.208.31:8983/solr}
>
> This sounds like either network connectivity problems or possibly 
> issues caused by extreme garbage collection pauses that result in 
> timeouts.
>
> Thanks,
> Shawn
>
Thanks, Shawn. I was wondering if there was something going on with IP 
redirection that was causing confusion. Any thoughts on how to debug? 
And, what do you mean by "extreme garbage collection pauses"? Is that Solr 
garbage collection or the OS itself? There's really nothing happening on this 
machine, it's purely for testing so there shouldn't be any extra load from 
other processes.

Thanks!
...scott





Re: SolrCloud installation troubles...

2018-01-29 Thread Scott Prentice
Looks like 2888 and 2890 are not open. At least they are not reported 
with a netstat -plunt .. could be the problem.
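
If the firewall is the culprit, a sketch of opening that range on a Red Hat
box (assumes firewalld is in use; adjust the ports to your ZooKeeper config):

    sudo firewall-cmd --permanent --add-port=2888-2890/tcp
    sudo firewall-cmd --reload
    sudo netstat -plunt | grep -E ':(2888|2889|2890)'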


Thanks, all!

...scott


On 1/29/18 1:10 PM, Davis, Daniel (NIH/NLM) [C] wrote:

Trying 127.0.0.1 could help.   We kind of tend to think localhost is always 
127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent 
of 127.0.0.1.

I guess some environments can be strict enough to restrict communication on 
localhost; seems hard to imagine, but it does happen.

-Original Message-
From: Scott Prentice [mailto:s...@leximation.com]
Sent: Monday, January 29, 2018 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud installation troubles...


On 1/29/18 12:44 PM, Shawn Heisey wrote:

On 1/29/2018 1:13 PM, Scott Prentice wrote:

But when I do the same thing on the Red Hat system it fails. Through
the UI, it'll first time out with this message ..

     Connection to Solr lost

Then after a refresh, the collection appears to have been partially
created, but it's in the "Gone" state, and after some time, is
deleted by an apparent cleanup process. If I try to create one
through the command line ..

     ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to:
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerExcepti
on:IOException occured when talking to server at:
http://10.6.208.31:8984/solr,
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerExceptio
n:IOException occured when talking to server at:
http://10.6.208.31:8985/solr,
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerExceptio
n:IOException occured when talking to server at:
http://10.6.208.31:8983/solr}

This sounds like either network connectivity problems or possibly
issues caused by extreme garbage collection pauses that result in
timeouts.

Thanks,
Shawn


Thanks, Shawn. I was wondering if there was something going on with IP 
redirection that was causing confusion. Any thoughts on how to debug?
And, what do you mean by "extreme garbage collection pauses"? Is that Solr 
garbage collection or the OS itself? There's really nothing happening on this machine, 
it's purely for testing so there shouldn't be any extra load from other processes.

Thanks!
...scott







Re: SolrCloud installation troubles...

2018-01-29 Thread Shawn Heisey

On 1/29/2018 2:02 PM, Scott Prentice wrote:
Thanks, Shawn. I was wondering if there was something going on with IP 
redirection that was causing confusion. Any thoughts on how to debug? 
And, what do you mean by "extreme garbage collection pauses"? Is that 
Solr garbage collection or the OS itself? There's really nothing 
happening on this machine, it's purely for testing so there shouldn't 
be any extra load from other processes. 


Garbage collection is one of the primary features of Java's memory 
management.  It's not Solr or the OS.


If the java heap is really enormous, you can end up with long pauses, 
but I wouldn't expect them to be frequent unless the index is also 
really huge.


A very common issue that can cause even worse pause issues than a large 
heap is a heap that's too small, but not quite small enough to cause 
Java to completely run out of heap memory.  The default max heap size in 
recent Solr versions is 512MB, which is very small.  A Java program 
(which Solr is) can never use more heap memory than the maximum it is 
configured with, even if the machine has more memory available.


This paragraph is included because you mentioned IP redirection:  
Extreme care must be used when setting up SolrCloud on virtual machines 
where accessing the VM has to go through any kind of IP translation.  
SolrCloud keeps track of how to reach each server in the cloud and if it 
stores an untranslated address when you need the translated address (or 
vice-versa), things are not going to work.  Generally speaking 
translated addresses are going to be problematic for SolrCloud, and 
should not be used.


Thanks,
Shawn



Broken Feature in Solr 6.6

2018-01-29 Thread Antelmo Aguilar
Hi All,

I was using this feature in Solr 6.1:
https://issues.apache.org/jira/browse/SOLR-5244

It seems that this feature is broken in Solr 6.6.  If I do this query in
Solr 6.1, it works as expected.

q=*:*&fl=exp_id_s&rq={!xport}&wt=xsort&sort=exp_id_s+asc

However, doing the same query in Solr 6.6 does not return all the results.
It just returns 10 results.

Also, it seems that the wt=xsort parameter does not do anything since it
returns the results in xml format.  In 6.1 it returned the results in
JSON.  I asked the same question in the IRC channel and they told me that it is
supposed to still work the same way.  I had to leave, so hopefully someone can
help me out through e-mail.  I would really appreciate it.

Thank you,
Antelmo


Re: SolrCloud installation troubles...

2018-01-29 Thread Scott Prentice


On 1/29/18 1:31 PM, Shawn Heisey wrote:

On 1/29/2018 2:02 PM, Scott Prentice wrote:
Thanks, Shawn. I was wondering if there was something going on with 
IP redirection that was causing confusion. Any thoughts on how to 
debug? And, what do you mean by "extreme garbage collection pauses"? 
Is that Solr garbage collection or the OS itself? There's really 
nothing happening on this machine, it's purely for testing so there 
shouldn't be any extra load from other processes. 


Garbage collection is one of the primary features of Java's memory 
management.  It's not Solr or the OS.


If the java heap is really enormous, you can end up with long pauses, 
but I wouldn't expect them to be frequent unless the index is also 
really huge.


A very common issue that can cause even worse pause issues than a 
large heap is a heap that's too small, but not quite small enough to 
cause Java to completely run out of heap memory.  The default max heap 
size in recent Solr versions is 512MB, which is very small.  A Java 
program (which Solr is) can never use more heap memory than the 
maximum it is configured with, even if the machine has more memory 
available.


This paragraph is included because you mentioned IP redirection: 
Extreme care must be used when setting up SolrCloud on virtual 
machines where accessing the VM has to go through any kind of IP 
translation.  SolrCloud keeps track of how to reach each server in the 
cloud and if it stores an untranslated address when you need the 
translated address (or vice-versa), things are not going to work.  
Generally speaking translated addresses are going to be problematic 
for SolrCloud, and should not be used.


Thanks,
Shawn

Thanks for the clarification. Yes, we're just using the default heap 
size for Solr, but there's no index (yet) and nothing really going on, 
so I'd hope that garbage collection isn't the problem.


I'm putting my money on some IP translation issues (this is on a tightly 
controlled corporate network) or the fact that the 2888 and 2890 ports 
appear to not be open. I'll dig down the network issue path for now and 
see where that gets me.


Thanks,
...scott




Matching within list fields

2018-01-29 Thread John Davis
Hi there!

We have a use case where we'd like to search within a list field, however
the search should not match across different elements in the list field --
all terms should match a single element in the list.

For eg if the field is a list of comments on a product, search should be
able to find a comment that matches all the terms.

Short of creating separate documents for each element in the list, is there
any other efficient way of accomplishing this?

Thanks
John


Re: Solr 4.8.1 multiple client updates the same collection

2018-01-29 Thread Vincenzo D'Amore
Hey guys, thank you so much for this help.

Recalling all your suggestions I think I should:

1. Have two collections, with an alias that alternately points to one of
them (see the alias-swap sketch after this list). Use the configuration with
autoSoftCommit and openSearcher=true. I call this scenario an active/passive
configuration.
2. The delete-all and full-reindex task must always use the passive
collection.
3. The two clients can write concurrently using the alias, which always
points to the active collection, but they must stop once a day during the
full reindex.
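
A minimal sketch of the alias swap in point 1, using the Collections API
(host, alias, and collection names are only examples):

    # after the full reindex into collection2 finishes, repoint the alias
    curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=collection2"

CREATEALIAS overwrites an existing alias, so clients that address the alias
switch to the freshly built collection in one step.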

Right now the two clients are synchronized by a sort of semaphore based on a
ZooKeeper resource, so only one at a time can write into the SolrCloud
collection.
On the other hand, once I have the new configuration, I can let the clients
write concurrently, but even in this case I must always stop them once a day,
during the full reindex.

Lastly, please let me ask another question: is it true that after every
commit, even if I had only updated one document, the SolrCloud caches are
invalidated (i.e. Solr must open a new searcher)?
That is what the second client does: it updates one document at a time and
commits.
In other words, how good or bad is it to have multiple hard commits within a
short time (a few seconds)?

Best regards,
Vincenzo

On Mon, Jan 29, 2018 at 11:12 AM, alessandro.benedetti  wrote:

> Generally speaking, if a full re-index is happening everyday, wouldn't be
> better to use a technique such as collection alias ?
>
> You could point your search clients to the "Alias" which points to the
> online collection "collection1".
> When you re-index you build "collection2", when it is finished you point
> "Alias" to "collection2" .
> The following day you do the same thing but you use "collection1" to index.
>
> Client 2 for the atomic Updates will point to "Alias" .
>
> I am assuming here that during the re-indexing the price we get in the
> fresh
> index are the most up to date.
> So as soon as re-index finishes the collection is perfectly up to date.
>
> In the case you want to update the prices during re-indexing, the price
> updater should point to the temporary collection.
> Also in this case I assume that if a document was not indexed yet, the
> price
> update will fail, but the document will get the correct price when it is
> indexed.
> Please correct any wrong assumption,
>
> Cheers
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Vincenzo D'Amore


Change in behavior of CoreDescriptors

2018-01-29 Thread Shefali Dubey
Hello,

I observed the following change on switching from Solr version 6.4.2 to 6.6.2:

In 6.6.2, in the case of an init failure, SolrCores.getCoreDescriptor does not
return the core. The core is transient in nature but was not present in the
transient core cache.
This was not the case in 6.4.2: I was getting the CoreDescriptor even in the
case of an init failure.
As a result of the described behavior, unloading such a core (one with init
failures) does not work properly.

Thanks,
Shefali



Solr JOIN Performance 4.10.0 vs 7.x

2018-01-29 Thread Horatiu Lazu
Hello,

I'm wondering what performance improvements occurred in Solr JOIN from
4.10.0 to 7.x. I noticed that performance is quicker, but from looking
at the code in JoinQParserPlugin there isn't much change.

I saw that in Solr 5.x passing score=none would invoke Lucene's join
algorithm, which can be faster. Are there any other changes?
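
For reference, a sketch of the kind of query that refers to (field names are
made up):

    q={!join from=parent_id_s to=id score=none}color:red

With score=none the parser can take the non-scoring Lucene join path mentioned
above.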

Thank you in advance.

-- 
*Horatiu Lazu*


Help with Boolean search using Solr parser edismax

2018-01-29 Thread Wendy2
Hi Solr users,

I am having an issue with boolean search using the Solr edismax parser: the
search "OR" doesn't work. The image below shows the different results tested
on different Solr versions. There are two types of search request handlers,
/select vs /search. The /select handler uses the Lucene default parser, while
/search uses the Solr edismax parser. I also listed the /search request
handler below. I am expecting a result count of 997 (844 + 153), but I only
get the correct count via the default /select request handler on Solr v5.3.0
and 6.2.0. If I go back to using the old Lucene default parser via the
/select request handler, I lose all the nice custom ranking and sorting :-(
Does anyone know a workaround/solution to fix this type of search issue?
THANKS!

Part of the /search request handler in solrconfig.xml (the XML tags were lost
in this message): defType edismax, query field boosts pdb_id^5.0
struct.title^35.0 citation.title^25.0 title_fields_stem^3.0 ...
rest_fields_stem^0.3, default sort score desc,release_date desc,pdb_id desc,
default field text.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Change in behavior of CoreDescriptors

2018-01-29 Thread Erick Erickson
Lots of that was reworked between those two versions.

I'm not clear what you expect here. If a core fails to initialize,
then what's the purpose of unloading it? It isn't there in the first
place. The coreDescriptor should still be available if you need that,
and can be used to load the core later if the init issue is fixed.

Best,
Erick

On Mon, Jan 29, 2018 at 3:14 PM, Shefali Dubey  wrote:
> Hello,
>
> I observed the following change on switching from solr verion 6.4.2 to 6.6.2:
>
> In 6.6.2, in case of an init failure, SolrCores.getCoreDescriptor does not 
> return the core. The core is transient in nature but was not present in 
> transient core cache.
> This was not the case in 6.4.2. I was getting the CoreDescriptor even in case 
> of init failure.
> As a result of described behavior, unload of such a core (one with init 
> failures) does not work properly.
>
> Thanks,
> Shefali
>


Re: Broken Feature in Solr 6.6

2018-01-29 Thread Joel Bernstein
There was a change in the configs between 6.1 and 6.6. If you upgraded you
system and kept the old configs then the /export handler won't work
properly. Check solrconfig.xml and remove any reference to the /export
handler. You also don't need to specify the rq or wt when you access the
/export handler anymore. This should work fine:

http://host:port/solr/collection/export?q=*:*&fl=exp_id_s&sort=exp_id_s+asc
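
The kind of leftover definition to look for is roughly this (as it appeared in
older example configs; exact contents may differ). If present, remove it and
let the implicit /export handler take over:

    <requestHandler name="/export" class="solr.SearchHandler">
      <lst name="invariants">
        <str name="rq">{!xport}</str>
        <str name="wt">xsort</str>
        <str name="distrib">false</str>
      </lst>
      <arr name="components">
        <str>query</str>
      </arr>
    </requestHandler>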

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jan 29, 2018 at 4:59 PM, Antelmo Aguilar  wrote:

> Hi All,
>
> I was using this feature in Solr 6.1:
> https://issues.apache.org/jira/browse/SOLR-5244
>
> It seems that this feature is broken in Solr 6.6.  If I do this query in
> Solr 6.1, it works as expected.
>
> q=*:*&fl=exp_id_s&rq={!xport}&wt=xsort&sort=exp_id_s+asc
>
> However, doing the same query in Solr 6.6 does not return all the results.
> It just returns 10 results.
>
> Also, it seems that the wt=xsort parameter does not do anything since it
> returns the results in xml format.  In 6.1 it returned the results in
> JSON.  I asked same question in the IRC channel and they told me that it is
> supposed to still work the same way.  Had to leave so hopefully someone can
> help me out through e-mail.  I would really appreciate it.
>
> Thank you,
> Antelmo
>


Re: Matching within list fields

2018-01-29 Thread Erick Erickson
That's what "positionIncrementGap" is all about. It's the offset
between the last token of one element of your list and the first token
of the next. Let's say your doc looks like:

   some text
   other stuff


I'm presuming that's what you mean by "list".

Now the positions of these tokens are
some:0
text:   1
other: 101
stuff:   102

assuming your positionIncrementGap is 100.

Now you search for _phrases_ with a slop < 100 to prevent matches across
elements. So "some text"~90 would match, but "text other"~90 would not.

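A minimal schema sketch of what this relies on (type and field names here are
illustrative, not from the original question):

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="comments" type="text_general" indexed="true" stored="true" multiValued="true"/>

A phrase query such as comments:"broken after update"~50 then has to find all
its terms inside a single comment, because the 100-position gap puts terms from
different values too far apart for the slop to bridge.
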
Best,
Erick

On Mon, Jan 29, 2018 at 2:29 PM, John Davis  wrote:
> Hi there!
>
> We have a use case where we'd like to search within a list field, however
> the search should not match across different elements in the list field --
> all terms should match a single element in the list.
>
> For eg if the field is a list of comments on a product, search should be
> able to find a comment that matches all the terms.
>
> Short of creating separate documents for each element in the list, is there
> any other efficient way of accomplishing this?
>
> Thanks
> John


Re: SolrCloud installation troubles...

2018-01-29 Thread Rick Leir
SELinux? Number open File limits? Number of Process limits? 
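
Quick ways to check those on the Red Hat box, as a rough sketch (run as the
user that owns the Solr process):

    getenforce     # SELinux mode: Enforcing, Permissive, or Disabled
    ulimit -n      # max open files for this shell
    ulimit -u      # max user processes
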
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Change in behavior of CoreDescriptors

2018-01-29 Thread Shefali Dubey
Thanks for your response.
CoreDescriptor is not present in TransientSolrCoreCacheDefault for a core that 
has init failure. Is that expected?

On 1/29/18, 4:35 PM, "Erick Erickson"  wrote:

Lots of that was reworked between those two versions.

I'm not clear what you expect here. If a core fails to initialize,
then what's the purpose of unloading it? It isn't there in the first
place. The coreDescriptor should still be available if you need that,
and can be used to load the core later if the init issue is fixed.

Best,
Erick

On Mon, Jan 29, 2018 at 3:14 PM, Shefali Dubey  wrote:
> Hello,
>
> I observed the following change on switching from solr verion 6.4.2 to 
6.6.2:
>
> In 6.6.2, in case of an init failure, SolrCores.getCoreDescriptor does 
not return the core. The core is transient in nature but was not present in 
transient core cache.
> This was not the case in 6.4.2. I was getting the CoreDescriptor even in 
case of init failure.
> As a result of described behavior, unload of such a core (one with init 
failures) does not work properly.
>
> Thanks,
> Shefali
>




Re: Perform incremental import with PDF Files

2018-01-29 Thread Karan Saini
Hi Emir,

There is one behavior I noticed while performing the incremental import. I
added a new field to managed-schema.xml to test the incremental nature of
using clean=false:

[the "xtimestamp" field definition was lost in this message]

Now xtimestamp gets a new value on every DIH import, even with the
clean=false property, so I am confused: how will I know whether clean=false
is working or not?
Please suggest.
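
For reference, a sketch of the kind of full-import request being discussed
(core name is only an example):

    http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=false&commit=true

With clean=false the existing documents are not deleted before the import, so
the previously indexed PDFs survive later runs.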

Kind regards,
Karan



On 29 January 2018 at 20:12, Emir Arnautović 
wrote:

> Hi Karan,
> Glad it worked for you.
>
> I am not sure how to do it in C# client, but adding clean=false parameter
> in URL should do the trick.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 14:48, Karan Saini  wrote:
> >
> > Thanks Emir :-) . Setting the property *clean=false* worked for me.
> >
> > Is there a way, i can selectively clean the particular index from the
> > C#.NET code using the SolrNet API ?
> > Please suggest.
> >
> > Kind regards,
> > Karan
> >
> >
> > On 29 January 2018 at 16:49, Emir Arnautović <
> emir.arnauto...@sematext.com>
> > wrote:
> >
> >> Hi Karan,
> >> Did you try running full import with clean=false?
> >>
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 29 Jan 2018, at 11:18, Karan Saini  wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> Please suggest the solution for importing and indexing PDF files
> >>> *incrementally*. My requirements is to pull the PDF files remotely from
> >> the
> >>> network folder path. This network folder will be having new sets of PDF
> >>> files after certain intervals (for say 20 secs). The folder will be
> >> forced
> >>> to get empty, every time the new sets of PDF files are copied into it.
> I
> >> do
> >>> not want to loose the earlier saved index of the old files, while doing
> >> the
> >>> next incremental import.
> >>>
> >>> Currently, i am using Solr 6.6 version for the research.
> >>>
> >>> The dataimport handler config is currently like this :-
> >>>
> >>> 
> >>> 
> >>> 
> >>>>>> dataSource="null"
> >>>  recursive = "true"
> >>>  baseDir="\\CLDSINGH02\*RemoteFileDepot*"
> >>>  fileName=".*pdf" rootEntity="false">
> >>>
> >>>  
> >>>   -->
> >>>>> name="lastmodified" />
> >>>
> >>> >> onError="skip"
> >>>url="${K2FileEntity.
> fileAbsolutePath}"
> >> format="text">
> >>>
> >>>   >> meta="true"/>
> >>>   >> meta="true"/>
> >>>  
> >>>
> >>>   
> >>> 
> >>>
> >>>
> >>> Kind regards,
> >>> Karan Singh
> >>
> >>
>
>


Re: Querying on sum of child documents

2018-01-29 Thread Prath
Hi,
I used the following query:
{!parent which="isParent:1" exp=total}+description:JSON +exp:[4 TO 7]
{!func}exp.
It considers only the highest experience from the matched descriptions, not
the sum of the matched descriptions' experience.
Can you please explain this to me in detail?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Using Solr with SharePoint Online

2018-01-29 Thread Mohammed . Adnan2
Hello Team,

I am a beginner learning Apache Solr. I am trying to check the compatibility of
Solr with SharePoint Online, but I am not finding anything concrete about this
in the website documentation. Can you please help by providing some information
on this? How can I index my SharePoint content with Solr and then use Solr on
my SharePoint sites? I really appreciate your help on this.

Thanks,
Adnan