Re: How to persist the data in dataimport.properties

2020-09-09 Thread Bernd Fehling
It is kept in zookeeper within /configs/[collection_name], at least with my 
SolrCloud 6.6.6.

bin/solr zk ls /configs/[your_collection_name]

Regards
Bernd

On 08.09.20 at 21:40, yaswanth kumar wrote:
> Can someone help me persist the data that's updated in the
> dataimport.properties file? It holds the last index time, and my data
> import depends on it for catching up with the delta imports.
> 
> What I noticed is that every time I restart Solr this file is wiped
> and reverts to its default content instead of what I used to see before
> the Solr service restart. So I want to know if there is anything I can do to
> persist the last successful index timestamp.
> 
> Solr version: 8.2
> Zookeeper: 3.4
> 


Unexpected behaviour for bracket

2020-09-09 Thread Jan Nehring

Hi fellow Solr users,

I use Solr in an application for full text search in textual data and I 
spent a lot of time debugging a strange behaviour of Solr. When I search 
for (ABC) then I want results with (ABC) in brackets only. But I get 
results for ABC also, without brackets.


I tried several ways how to formulate the query:

q=rawText:\(ABC)\
q=rawText:"(ABC)"
q=rawText:"\(ABC\)"

But all of them find results for ABC also.

Can you give me a hint why my Solr instance seems to ignore brackets?

Thank you very much Jan



Re: Solr Schema API seems broken to me after 8.2.0

2020-09-09 Thread jeanc...@gmail.com
Thanks for the reply,

I didn't see anything in the Solr logs BUT I'm going to recheck it next
week and update you.
Will check this as well:
* It could be that after the upgrade some filesystem permissions do not
work anymore *

Thanks

Best Regards,

*Jean Silva*


https://github.com/jeancsil

https://linkedin.com/in/jeancsil



On Tue, Sep 8, 2020 at 9:39 AM Jörn Franke wrote:

> Can you check the logfiles of Solr?
>
> It could be that after the upgrade some filesystem permissions do not work
> anymore
>
> > On 08.09.2020 at 09:27, "jeanc...@gmail.com" wrote:
> >
> > Hey guys, good morning.
> >
> > As I didn't get any reply for this one, is it ok then that I create the
> > Jira ticket?
> >
> > Best Regards,
> >
> > *Jean Silva*
> >
> >
> > https://github.com/jeancsil
> >
> > https://linkedin.com/in/jeancsil
> >
> >
> >
> >> On Fri, Aug 28, 2020 at 11:10 AM jeanc...@gmail.com wrote:
> >>
> >> Hey everybody,
> >>
> >> First of all, I wanted to say that this is my first time writing here. I
> >> hope I don't do anything wrong.
> >> I went to create the "bug" ticket and saw it would be a good idea to first
> >> talk to some of you via IRC (didn't work for me or I did something wrong
> >> after 20 years of not using it..)
> >>
> >> I'm currently using Solr 8.1.1 in production and I use the Schema API to
> >> create the necessary fields before starting to index my new data. (Reason:
> >> the managed-schema would be big for me to take care of and I decided to
> >> automate this process by using the REST API.)
> >>
> >> I started trying to upgrade *from 8.1.1* directly to *8.6.1*, and the
> >> Python script I use to add some fields and analyzers started to *kill
> >> Solr after a few requests had finished* without issues.
> >>
> >> *Put simply, the fields whose names contain the word "blablabla" need to
> >> be deleted and then recreated. I have ~33 of them.*
> >>
> >> The script works as expected, but after some successful creations it kills
> >> Solr!
> >>
> >> This script was implemented in Python, and I thought that I might have done
> >> something that doesn't work with Solr 8.6.1 anymore, so I decided to test it
> >> with the *proper implementation of the library in Java*, SolrJ 8.6.1, as
> >> well. The same error occurred. I also didn't see any change in the
> >> documentation with regard to the request I was making.
> >>
> >> Unfortunately I don't have any stack trace from Solr, as there were no
> >> errors popping up in the console for me. The only thing I saw was the
> >> output of my script, saying that the *Remote closed connection without
> >> response*:
> >> ...
> >> Traceback (most recent call last):
> >>   File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 677, in urlopen
> >>     chunked=chunked,
> >>   File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 426, in _make_request
> >>     six.raise_from(e, None)
> >>   File "<string>", line 3, in raise_from
> >>   File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
> >>     httplib_response = conn.getresponse()
> >>   File "/usr/lib/python3.7/http/client.py", line 1336, in getresponse
> >>     response.begin()
> >>   File "/usr/lib/python3.7/http/client.py", line 306, in begin
> >>     version, status, reason = self._read_status()
> >>   File "/usr/lib/python3.7/http/client.py", line 275, in _read_status
> >>     raise RemoteDisconnected("Remote end closed connection without"
> >> http.client.RemoteDisconnected: Remote end closed connection without response
> >>
> >>
> >> With *Java and SolrJ* matching the Solr version I was using, I got this:
> >>
> >> Deleting field field_name_1
> >> {responseHeader={status=0,QTime=2187}}
> >>
> >> Deleting field field_name_2
> >> {responseHeader={status=0,QTime=1571}}
> >>
> >> Deleting field field_name_3
> >> {responseHeader={status=0,QTime=1587}}
> >>
> >> Deleting field field_name_4
> >> Exception while deleting the field field_name_4:* IOException occured
> >> when talking to server at: http://localhost:32783/solr/my_core_name
> >> *
> >>
> >> Deleting field field_name_5
> >> Exception while deleting the field field_name_5:* IOException occured
> >> when talking to server at: http://localhost:32783/solr/my_core_name
> >> *
> >> // THIS REPEATS 30+ TIMES AND THEN THE MESSAGE CHANGES A BIT
> >>
> >> Exception while deleting the field field_name_6:* Server refused
> >> connection at:
> >> http://localhost:32783/solr/my_core_name/schema?wt=javabin&version=2
> >> *
> >> Deleting field field_name_6
> >> // REPEATS ALSO MANY TIMES
> >>
> >> Maybe I need to run the same thing again with some different configuration
> >> to help give you guys a hint on what the problem is?
> >>
> >> To fina
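
For reference, the delete-then-recreate step described in the quoted message can be sketched as a series of Schema API command bodies. This is only an illustration, not the poster's actual script: the helper names, the `text_general` field type, and the `stored` flag are assumptions, while the `delete-field` and `add-field` commands and the `/solr/<core>/schema` endpoint are the standard Schema API ones. The sketch only builds the payloads and sends nothing.

```python
import json


def schema_commands(field_names, field_type="text_general"):
    """Build one delete-field and one add-field command per field name.

    field_type and the 'stored' flag are placeholder assumptions; a real
    script would use the definitions its schema actually needs.
    """
    commands = []
    for name in field_names:
        commands.append({"delete-field": {"name": name}})
        commands.append(
            {"add-field": {"name": name, "type": field_type, "stored": True}}
        )
    return commands


def schema_url(base_url, core):
    """Return the Schema API endpoint for a core, e.g. .../solr/my_core_name/schema."""
    return f"{base_url.rstrip('/')}/solr/{core}/schema"


# Print what would be POSTed (Content-Type: application/json) for two fields.
for command in schema_commands(["field_name_1", "field_name_2"]):
    print(schema_url("http://localhost:32783", "my_core_name"), json.dumps(command))
```

Logging each payload together with the response time before sending it is one way to see which request precedes the dropped connection.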

NullPointerException in IndexSearcher.explain() when using ComplexPhraseQueryParser

2020-09-09 Thread Michał Słomkowski
Hello,

I get an NPE when I use IndexSearcher.explain(). Checked with Lucene 8.6.0
and 8.6.2.

The query: (lorem AND NOT "dolor lorem") OR ipsum
The text: dolor lorem ipsum

Stack trace:
> java.lang.NullPointerException
>   at java.util.Objects.requireNonNull(Objects.java:203)
>   at org.apache.lucene.search.LeafSimScorer.<init>(LeafSimScorer.java:38)
>   at org.apache.lucene.search.spans.SpanWeight.explain(SpanWeight.java:160)
>   at org.apache.lucene.search.BooleanWeight.explain(BooleanWeight.java:87)
>   at org.apache.lucene.search.BooleanWeight.explain(BooleanWeight.java:87)
>   at org.apache.lucene.search.IndexSearcher.explain(IndexSearcher.java:716)
>   at org.apache.lucene.search.IndexSearcher.explain(IndexSearcher.java:693)

The sample code:

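// Inputs taken from the query and text given above.
final String queryString = "(lorem AND NOT \"dolor lorem\") OR ipsum";
final String text = "dolor lorem ipsum";

final Analyzer analyzer = new StandardAnalyzer();
final Query query = new ComplexPhraseQueryParser("", analyzer).parse(queryString);

// Index the single document in memory (true = store offsets).
final MemoryIndex memoryIndex = new MemoryIndex(true);
memoryIndex.addField("", text, analyzer);

final IndexSearcher searcher = memoryIndex.createSearcher();
final TopDocs topDocs = searcher.search(query, 1);

final ScoreDoc match = topDocs.scoreDocs[0];

// This call throws the NullPointerException shown in the stack trace.
searcher.explain(query, match.doc);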



-- 
Michał Słomkowski


Re: Unexpected behaviour for bracket

2020-09-09 Thread Erick Erickson
Places to look:

> add &debug=query to the query and look at the parsed result. Does the parsed 
> version match what you expect? Hint: un-check the “verbose” checkbox, at this 
> level the detailed information probably is just distracting.

> The admin UI>>select_your_core>>analysis page. Put your text in both the 
> “index” and “query” boxes and see what pops out. One thing here that’s often 
> confusing: this is what Solr does with a term _after_ it’s through the 
> parsing process, which can fool people. What you enter should be what you see 
> assigned to your field in the parsed query from adding &debug=query above.

> adminUI>>select_your_core>>schema_analysis. Select your field and press the 
> button that loads terms. That’ll show you exactly what’s in your index. The 
> Terms Component lets you do the same with curl or a browser with more 
> control, see: 
> https://lucene.apache.org/solr/guide/8_5/the-terms-component.html
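
As a rough illustration of the first suggestion, the sketch below builds a select URL that escapes the parentheses and adds debug=query. The host, port, and core name are assumptions (8983 is only the default Solr port), and the helper names are made up for this example; the escaped character set follows the query parser's documented special characters.

```python
from urllib.parse import urlencode

# Characters the standard query parser treats as special.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Backslash-escape query-parser special characters in a single term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def debug_query_url(base_url, core, field, term):
    """Build a /select URL that searches field:term and asks for the parsed query."""
    params = {"q": f"{field}:{escape_term(term)}", "debug": "query"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host and core name; rawText is the field from the question.
print(debug_query_url("http://localhost:8983", "my_core", "rawText", "(ABC)"))
```

If the parsed query in the debug section shows rawText:ABC (or a lowercased form) even though the parentheses were escaped, the analysis chain removed them, not the query syntax.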

My bet is that it has nothing to do with your query and everything to do with 
your analysis chain. For instance, WordDelimiter(Graph)FilterFactory will break 
the input up on non-alphanumerics. So what you actually have in your index 
is the raw ABC. Ditto at query time.
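
A toy model of that effect, in plain Python with no Solr involved: the first function is only a stand-in for an analysis chain that splits on non-alphanumerics (it is not the real filter), and the second mimics whitespace-only tokenization.

```python
import re


def split_on_non_alnum(text):
    """Stand-in for an analysis chain that breaks input on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def whitespace_only(text):
    """Stand-in for a chain that only splits on whitespace and keeps punctuation."""
    return text.split()


# Both inputs produce the same term, so a search for (ABC) also matches ABC.
print(split_on_non_alnum("(ABC)"))
print(split_on_non_alnum("ABC"))

# With whitespace-only splitting the brackets survive as part of the term.
print(whitespace_only("see (ABC) here"))
```

Under this model, distinguishing (ABC) from ABC requires a field type whose index-time and query-time analysis both keep the brackets.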

Best,
Erick



> On Sep 9, 2020, at 4:38 AM, Jan Nehring wrote:
> 
> Hi fellow Solr users,
> 
> I use Solr in an application for full text search in textual data and I spent 
> a lot of time debugging a strange behaviour of Solr. When I search for (ABC) 
> then I want results with (ABC) in brackets only. But I get results for ABC 
> also, without brackets.
> 
> I tried several ways how to formulate the query:
> 
> q=rawText:\(ABC)\
> q=rawText:"(ABC)"
> q=rawText:"\(ABC\)"
> 
> But all of them find results for ABC also.
> 
> Can you give me a hint why my Solr instance seems to ignore brackets?
> 
> Thank you very much Jan
> 



Lowercase-ing everything but acronyms

2020-09-09 Thread Dunham-Wilkie, Mike CITZ:EX
Hi SOLR list,

I'm currently using the White Space tokenizer and the Lower Case filter with 
SOLR 7.3.  I'd like to modify the logic to keep any tokens that are entirely 
upper case as upper case, and just apply the Lower Case filter (or something 
equivalent) to the remaining tokens.  Is there a way to do this using 
tokenizers and filters?
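
To pin down the intended behaviour, here is a small plain-Python sketch of the token logic described above. It is not a Solr filter, and the function name is made up for illustration; it only shows the expected output for a given input.

```python
def lowercase_except_acronyms(text):
    """Split on whitespace; keep all-uppercase tokens, lowercase everything else."""
    tokens = text.split()
    # str.isupper() is True only when every cased character is uppercase
    # and the token contains at least one cased character.
    return [tok if tok.isupper() else tok.lower() for tok in tokens]


print(lowercase_except_acronyms("The IT Department in BC"))
```

One caveat of this rule: single-letter capitals such as "A" or "I" also count as all-uppercase and would be kept as-is.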

Thanks
Mike


Mike Dunham-Wilkie | Senior Spatial Data Administration Analyst | PHONE... 
778-676-1791
Data Systems & Services - Digital Platforms and Data Division - Ministry of 
Citizens' Services

For faster response and/or future inquiries, the following email addresses are 
monitored continuously:
BC Geographic Warehouse (BCGW) and Replication/ETL | DataBC Data Architecture 
Services (databc...@gov.bc.ca)
BC Data Catalogue (BCDC) and Open Data | DataBC Catalogue Services 
(data...@gov.bc.ca)



Re: Lowercase-ing everything but acronyms

2020-09-09 Thread Stavros Macrakis
I can't help you on the implementation issues, but...

You may want to do something a little different than keep all-uppercase
tokens in upper case. You may want simply to special-case all-uppercase
stopwords, so that they are not ignored. The poster boy for that is IT,
which, in my last search application, was *extremely common* and important.
On the corpus side, [it] and [IT] are very distinct. But on the query side,
most users will write [it], so it's fine to have it in the index as [it]
and not [IT]. Similarly for ON (Ontario) and ME (Maine). A nasty one is OR:
if you are using all-uppercase OR for the Boolean operator, how do users
enter OR meaning Operations Research? We know that not many users will
write ["OR"]. So you may simply want to allow lowercase [or] in the query
to match uppercase [OR] in the corpus, and reserve uppercase OR for the
Boolean operator.  Other cases are much rarer (Dijkstra's THE operating
system is of historical interest only...). For non-stopwords, there doesn't
seem to be much of a problem.
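
A minimal plain-Python sketch of that idea, under stated assumptions: the stopword list is a tiny made-up sample, the function names are hypothetical, and punctuation handling is ignored. All-uppercase stopwords survive indexing (stored in lowercase), and on the query side only uppercase OR is treated as the Boolean operator.

```python
# Tiny sample list for illustration; a real stopword list would be longer.
STOPWORDS = {"it", "on", "me", "or", "the", "is", "in"}


def index_tokens(text):
    """Lowercase tokens and drop stopwords, except all-uppercase ones like IT."""
    kept = []
    for tok in text.split():
        lowered = tok.lower()
        if lowered in STOPWORDS and not tok.isupper():
            continue  # ordinary stopword: drop it
        kept.append(lowered)  # everything else, including IT/ON/ME/OR, is indexed lowercased
    return kept


def parse_query_token(tok):
    """Treat uppercase OR as the operator; any other token is a lowercased term."""
    if tok == "OR":
        return ("operator", "OR")
    return ("term", tok.lower())


print(index_tokens("it is in the IT department"))
print([parse_query_token(t) for t in "or OR it".split()])
```

With this split, a lowercase query term [or] can match a corpus token OR (Operations Research), while uppercase OR in the query keeps its Boolean meaning.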

  -s

On Wed, Sep 9, 2020 at 2:59 PM Dunham-Wilkie, Mike CITZ:EX <
mike.dunham-wil...@gov.bc.ca> wrote:

> Hi SOLR list,
>
> I'm currently using the White Space tokenizer and the Lower Case filter
> with SOLR 7.3.  I'd like to modify the logic to keep any tokens that are
> entirely upper case as upper case, and just apply the Lower Case filter (or
> something equivalent) to the remaining tokens.  Is there a way to do this
> using tokenizers and filters?
>
> Thanks
> Mike
>
>
> Mike Dunham-Wilkie | Senior Spatial Data Administration Analyst | PHONE...
> 778-676-1791
> Data Systems & Services - Digital Platforms and Data Division - Ministry
> of Citizens' Services
>
> For faster response and/or future inquiries, the following email addresses
> are monitored continuously:
> BC Geographic Warehouse (BCGW) and Replication/ETL | DataBC Data
> Architecture Services (databc...@gov.bc.ca)
> BC Data Catalogue (BCDC) and Open Data | DataBC Catalogue Services (
> data...@gov.bc.ca)
>
>


Re: Creating a phrase match feature in LTR

2020-09-09 Thread krishan goyal
Hi,

Can anyone help me with this? I have been stuck on it for days.

On Tue, Sep 8, 2020 at 3:02 PM krishan goyal wrote:

> Thanks Dmitry.
>
> Using
>  "q": "{!complexphrase inOrder=true}fieldName:${input}"
> works for single token queries but raises same exception when input is
> multi token
>
> Using
> "q": "{!complexphrase inOrder=true df=fieldName}${input}"
> works for all types of tokens but the scoring logic isn't the same as "pf"
> or as using the same query via query reranking -
> rqq: "{!complexphrase inOrder=true v=$v1}",
> v1: "query(fieldName:"some text"^1.0)",
>
> Eg:
> query: "nike red shoes"
> I expect the phrase score to be 0 if the tokens are not in order in the
> document or if any one token is absent in the document.
>
> This is the score returned based on document and the type of reranking
>
> Document           LTR score   Reranking score
> "nike red shoes"   3           3
> "nike caps"        1           0
> "nike shoes red"   3           0
>
> What is the cause for LTR score to not match query reranking score?
>
>
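
The expectation stated in the message above (a non-zero phrase score only when every query token is present and in order) can be written down as a small plain-Python check. This is only a statement of the expected behaviour, not Solr's scoring: it assumes whitespace tokenization, lowercasing, and adjacent tokens (slop 0), and the function name is made up.

```python
def phrase_matches(query, document):
    """Return True if the query tokens occur in the document, adjacent and in order."""
    q = query.lower().split()
    d = document.lower().split()
    if not q:
        return False
    # Slide a window of len(q) tokens across the document.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


for doc in ["nike red shoes", "nike caps", "nike shoes red"]:
    print(doc, phrase_matches("nike red shoes", doc))
```

Only the first document satisfies the check, which agrees with the non-zero entries in the reranking column of the table; the LTR column behaves more like a per-token match.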
> On Fri, Aug 28, 2020 at 11:17 PM Dmitry Kan wrote:
>
>> Hi Krishan,
>>
>> What if you remove the query() wrapping?
>>
>> {
>>   "name": "phraseMatch",
>>   "class": "org.apache.solr.ltr.feature.SolrFeature",
>>   "params": {
>>     "q": "{!complexphrase inOrder=true}fieldName:${input}"
>>   },
>>   "store": "_DEFAULT_"
>> }
>>
>> or even:
>>
>> {
>>   "name": "phraseMatch",
>>   "class": "org.apache.solr.ltr.feature.SolrFeature",
>>   "params": {
>>     "q": "{!complexphrase inOrder=true df=fieldName}${input}"
>>   },
>>   "store": "_DEFAULT_"
>> }
>>
>>
>> On Tue, Aug 25, 2020 at 9:59 AM krishan goyal wrote:
>>
>> > Hi,
>> >
>> > I am trying to create a phrase match feature (what "pf" does in
>> > dismax/edismax parsers)
>> >
>> > I've tried various ways to set it up
>> >
>> > {
>> >   "name": "phraseMatch",
>> >   "class": "org.apache.solr.ltr.feature.SolrFeature",
>> >   "params": {
>> >     "q": "{!complexphrase inOrder=true}query(fieldName:${input})"
>> >   },
>> >   "store": "_DEFAULT_"
>> > }
>> >
>> > This fails with the exception
>> >
>> > Exception from createWeight for SolrFeature [name=phraseMatch,
>> > params={q={!complexphrase inOrder=true}query(fieldName:${input})}] null
>> >
>> > But similar query works when used in the query reranking construct with
>> > these params
>> >
>> > rqq: "{!complexphrase inOrder=true v=$v1}",
>> > v1: "query(fieldName:"some text"~2^1.0,0)",
>> >
>> > What is the problem in the LTR configuration for the feature ?
>> >
>>
>>
>> --
>> Dmitry Kan
>> Luke Toolbox: http://github.com/DmitryKey/luke
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: http://twitter.com/dmitrykan
>> SemanticAnalyzer: https://semanticanalyzer.info
>>
>