Re: solr 8.3 indexing wrong values in some fields

2019-12-02 Thread Colvin Cowie
This sounds like https://issues.apache.org/jira/browse/SOLR-13963
Solr 8.3.1 is likely to be available soon - RC2 is at
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.3.1-RC2-reva3d456fba2cd1b9892defbcf46a0eb4d4bb4d01f/solr/
Re-index on it, and see if you still have issues.

On Sun, 1 Dec 2019 at 17:35, Odysci  wrote:

> Hi,
> I have a solr cloud setup using solr 8.3 and zookeeper, which I recently
> converted from solr 7.7. I converted the index using the index updater and
> it all went fine. My index has about 40 million docs.
> I used a separate program to check the values of all fields in the solr
> docs, for consistency (e.g., fields which are supposed to have
> only numbers, or only alpha chars, etc.). I ran this program immediately
> after the index updating and it did not detect any problems.
>
> Then I started regular use of the system, indexing new documents, and I
> noticed that some fields were getting the wrong values. For example, a
> field which was supposed to be a string with only digits had a string
> containing parts of another field name. It looked as if memory was getting
> corrupted. There were no error msgs in the solr logs.
> In other words, solr 8.3 seems to be indexing wrong values in some fields.
> This happens very few times, but it's happening.
> Has anyone seen this happening?
> Thanks!
>
> Reinaldo
>


hi question about solr

2019-12-02 Thread eli chen
Hi, I'm fairly new to Solr, so please be patient.

I'll try to explain what I need and what I'm trying to do.

We have a lot of book content, and we want to index it and allow searching
within the books.
When someone searches for a term,
I need to get back the position of the matched word in the book.
For example,
if the book content is "hello my name is jeff" and someone searches for "my",
I want to get back the position of "my" in the content field (which is 1 in
this case).
I tried to do that with payloads, but with no success. Another problem I
encountered:
let's say the content field is "hello my name is jeff what is your name".
If someone searches for "name", I want to get back the positions of all
occurrences, not just the first one.

Is there any way to do that with Solr without developing new plugins?

Thanks


Re: hi question about solr

2019-12-02 Thread Bernd Fehling
In short,

you are trying to use an indexer as a full-text search engine, right?

Regards
Bernd

Am 02.12.19 um 12:24 schrieb eli chen:
> hi im kind of new to solr so please be patient
> 
> i'll try to explain what do i need and what im trying to do.
> 
> we a have a lot of books content and we want to index them and allow search
> in the books.
> when someone search for a term
> i need to get back the position of matchen word in the book
> for example
> if the book content is "hello my name is jeff" and someone search for "my".
> i want to get back the position of my in the content field (which is 1 in
> this case)
> i tried to do that with payloads but no success. and another problem i
> encourage is .
> lets say the content field is "hello my name is jeff what is your name".
> now if someone search for "name" i want to get back the index of all
> occurrences not just the first one
> 
> is there any way to that with solr without develop new plugins
> 
> thx
> 


Re: hi question about solr

2019-12-02 Thread eli chen
yes

On Mon, 2 Dec 2019 at 13:29, Bernd Fehling 
wrote:

> In short,
>
> you are trying to use an indexer as a full-text search engine, right?
>
> Regards
> Bernd
>
> Am 02.12.19 um 12:24 schrieb eli chen:
> > hi im kind of new to solr so please be patient
> >
> > i'll try to explain what do i need and what im trying to do.
> >
> > we a have a lot of books content and we want to index them and allow
> search
> > in the books.
> > when someone search for a term
> > i need to get back the position of matchen word in the book
> > for example
> > if the book content is "hello my name is jeff" and someone search for
> "my".
> > i want to get back the position of my in the content field (which is 1 in
> > this case)
> > i tried to do that with payloads but no success. and another problem i
> > encourage is .
> > lets say the content field is "hello my name is jeff what is your name".
> > now if someone search for "name" i want to get back the index of all
> > occurrences not just the first one
> >
> > is there any way to that with solr without develop new plugins
> >
> > thx
> >
>


Re: hi question about solr

2019-12-02 Thread Charlie Hull

Hi,

https://livebook.manning.com/book/solr-in-action/chapter-3 may help (I'd 
suggest reading the whole book as well).


Basically what you're looking for is the 'term position'. The 
TermVectorComponent in Solr will allow you to return this for each result.
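
To make "term position" concrete, here is a minimal Java sketch of what is being counted. It is not Solr code: it assumes plain whitespace tokenization and lower-casing, which only approximates a real analyzer chain, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TermPositions {
    // Returns the zero-based token positions at which `term` occurs in `text`.
    // Assumption: tokens are split on whitespace and lower-cased; a real Solr
    // analyzer may tokenize differently and therefore report other positions.
    static List<Integer> positionsOf(String text, String term) {
        List<Integer> positions = new ArrayList<>();
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String needle = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(needle)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String content = "hello my name is jeff what is your name";
        System.out.println(positionsOf(content, "my"));   // one occurrence
        System.out.println(positionsOf(content, "name")); // every occurrence, not just the first
    }
}
```

With the example from the question, "my" sits at position 1 and "name" is found twice, which is the kind of per-term position list the component is meant to return.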


Cheers

Charlie

On 02/12/2019 11:24, eli chen wrote:

> hi im kind of new to solr so please be patient
>
> i'll try to explain what do i need and what im trying to do.
>
> we a have a lot of books content and we want to index them and allow search
> in the books.
> when someone search for a term
> i need to get back the position of matchen word in the book
> for example
> if the book content is "hello my name is jeff" and someone search for "my".
> i want to get back the position of my in the content field (which is 1 in
> this case)
> i tried to do that with payloads but no success. and another problem i
> encourage is .
> lets say the content field is "hello my name is jeff what is your name".
> now if someone search for "name" i want to get back the index of all
> occurrences not just the first one
>
> is there any way to that with solr without develop new plugins
>
> thx



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



Re: hi question about solr

2019-12-02 Thread eli chen
First of all, thank you very much. I was looking for a good resource to read
about Solr.

I actually already tried the term vector component, but for it to work I had
to set fl=content, which makes the response include the value of the content
field (which is really, really big).


How to control the number of grouped results [DRUPAL]

2019-12-02 Thread alee2
Hello everyone, I need some help with the number of results generated
per group. What I need is to control how many results appear for each group;
currently the results come out somewhat disordered, and I can't
get the end user to see all categories. I would like to know if there is
any way for me to control how many results appear per group on the results
screen.

Image -1: Selected group by category option.
 

Image -2: Results that are generated with grouping done
 

I need to show all 13 categories on the results screen, limited to 4
items per category. Does anyone have any idea how I can do this?




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread Saurabh Sharma
You can use the group.limit parameter to get the required number of results
per group. The default is 1 result per group.
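
As a rough sketch of what such a request could look like, the Java below only assembles the query string. The field name "category" and the match-all query are assumptions for illustration. With grouping enabled, rows limits the number of groups and group.limit the documents inside each group.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class GroupedQueryParams {
    // URL-encodes one key or value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds the query string for a grouped request:
    // `groupCount` groups, at most `docsPerGroup` documents in each.
    static String build(String groupField, int groupCount, int docsPerGroup) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");                                   // assumption: match everything
        params.put("group", "true");
        params.put("group.field", groupField);                    // hypothetical field name
        params.put("group.limit", String.valueOf(docsPerGroup));
        params.put("rows", String.valueOf(groupCount));
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 13 categories, 4 items each, as in the question
        System.out.println(build("category", 13, 4));
    }
}
```

How a Drupal module passes these parameters through to Solr is a separate question; the sketch only shows the Solr side.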

On Mon, Dec 2, 2019, 7:58 PM alee2  wrote:

> Hello everyone, I need some help with the number of results that is
> generated
> per group, what I need is to control how many results appear per group
> type,
> currently they are sorting out a little bit in a disorderly way, and I
> can't
> get that. end user see all categories, would you like to know if there is
> any way for me to control how many results appear per group in the results
> screen.
>
> Image -1: Selected group by category option.
> 
>
> Image -2: Results that are generated with grouping done
> 
>
> I would need to enter the 13 categories in the results screen controlling 4
> item by category, does anyone have any idea how I can do this?
>
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread alee2
Where would I put this parameter ?





Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
This parameter belongs to the Solr request, for example:

https://lucene.apache.org/solr/guide/7_0/result-grouping.html#grouping-by-query

Drupal should expose it in the API, I guess? 

Cheers,
diego

From: solr-user@lucene.apache.org At: 12/02/19 14:47:06
To: solr-user@lucene.apache.org
Subject: Re: How to control the number of grouped results [DRUPAL]

Where would I put this parameter ?


--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread Erick Erickson
This is a Solr parameter; how Drupal lets you pass it on would probably be
better asked of the Drupal community.

> On Dec 2, 2019, at 10:01 AM, Diego Ceccarelli (BLOOMBERG/ LONDON) 
>  wrote:
> 
> This parameter referers to the Solr request, for example: 
> 
> https://lucene.apache.org/solr/guide/7_0/result-grouping.html#grouping-by-query
> 
> Drupal should expose it in the API, I guess? 
> 
> Cheers,
> diego
> 
> From: solr-user@lucene.apache.org At: 12/02/19 14:47:06To:  
> solr-user@lucene.apache.org
> Subject: Re: How to control the number of grouped results [DRUPAL]
> 
> Where would I put this parameter ?
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 



Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Alexandre Rafalovitch
What about XMLQueryParser:
https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
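
A hedged sketch of how the nested "(q1 OR q2) with a minimum match per group" structure might be generated for the XML parser from Java. The element and attribute names (BooleanQuery, Clause, TermQuery, minimumNumberShouldMatch) are the ones I understand the Lucene XML query parser to use; please verify them against the guide above. Note that TermQuery is swapped in for FuzzyQuery here, because I am not sure a fuzzy element is available in the XML syntax. The helper class itself is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class XmlQuerySketch {
    // Escapes characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // One SHOULD clause wrapping a TermQuery on the given field.
    static String termClause(String field, String term) {
        return "<Clause occur=\"should\"><TermQuery fieldName=\"" + escape(field) + "\">"
                + escape(term) + "</TermQuery></Clause>";
    }

    // A BooleanQuery of SHOULD term clauses, of which at least `minShouldMatch` must match.
    static String group(String field, int minShouldMatch, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        sb.append("<BooleanQuery minimumNumberShouldMatch=\"").append(minShouldMatch).append("\">");
        for (String term : terms) {
            sb.append(termClause(field, term));
        }
        return sb.append("</BooleanQuery>").toString();
    }

    // (g1 OR g2 ...): each group becomes a SHOULD clause of an outer BooleanQuery.
    static String anyOf(List<String> groups) {
        StringBuilder sb = new StringBuilder("<BooleanQuery minimumNumberShouldMatch=\"1\">");
        for (String g : groups) {
            sb.append("<Clause occur=\"should\">").append(g).append("</Clause>");
        }
        return sb.append("</BooleanQuery>").toString();
    }

    public static void main(String[] args) {
        String q1 = group("f1", 2, Arrays.asList("term", "second", "another"));
        String q2 = group("f1", 2, Arrays.asList("anothert", "anothert2", "anothert3"));
        // The resulting string would be sent as q with defType=xmlparser.
        System.out.println(anyOf(Arrays.asList(q1, q2)));
    }
}
```

This still builds a string in the end, but the nesting and the per-group minimum are expressed in code rather than typed by hand.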

Regards,
   Alex.

On Wed, 27 Nov 2019 at 22:43,  wrote:
>
> I am trying to simulate the following query(Lucene query builder) using Solr
>
>
>
>
> BooleanQuery.Builder main = new BooleanQuery.Builder();
>
> Term t1 = new Term("f1","term");
> Term t2 = new Term("f1","second");
> Term t3 = new Term("f1","another");
>
> BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> q1.setMinimumNumberShouldMatch(2);
>
> Term t4 = new Term("f1","anothert");
> Term t5 = new Term("f1","anothert2");
> Term t6 = new Term("f1","anothert3");
>
> BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> q2.setMinimumNumberShouldMatch(2);
>
>
> main.add(q1.build(),BooleanClause.Occur.SHOULD);
> main.add(q2.build(),BooleanClause.Occur.SHOULD);
> main.setMinimumNumberShouldMatch(1);
>
> System.out.println(main.build()); // (((f1:term~2 f1:second~2
> f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1   -->
> Invalid Solr Query
>
>
>
>
>
> In a few words :  ( q1 OR q2 )
>
>
>
> Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> search but I also need a minimum of terms to match.
>
>
>
> The best I was able to create was something like this  :
>
>
>
> SolrQuery query = new SolrQuery();
> query.set("fl", "term");
> query.set("q", "term~1 term2~2 term3~2");
> query.set("mm",2);
>
> System.out.println(query);
>
>
>
> And I was unable to find any example that would allow me to do the type of
> query that I am trying to build with only one solr query.
>
>
>
> Is it possible to use the Lucene Query builder with Solr? Is there any way
> to create Boolean queries with Solr? Do I need to build the query as a
> String? If so , how do I set the mm parameter in a String query?
>
>
>
> Thank you
>


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread Mikhail Khludnev
And the Query DSL as well. Although I didn't quite get the point of the
original post.

On Mon, Dec 2, 2019 at 9:16 PM Alexandre Rafalovitch 
wrote:

> What about XMLQueryParser:
>
> https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
>
> Regards,
>Alex.
>
> On Wed, 27 Nov 2019 at 22:43,  wrote:
> >
> > I am trying to simulate the following query(Lucene query builder) using
> Solr
> >
> >
> >
> >
> > BooleanQuery.Builder main = new BooleanQuery.Builder();
> >
> > Term t1 = new Term("f1","term");
> > Term t2 = new Term("f1","second");
> > Term t3 = new Term("f1","another");
> >
> > BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> > q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> > q1.setMinimumNumberShouldMatch(2);
> >
> > Term t4 = new Term("f1","anothert");
> > Term t5 = new Term("f1","anothert2");
> > Term t6 = new Term("f1","anothert3");
> >
> > BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> > q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> > q2.setMinimumNumberShouldMatch(2);
> >
> >
> > main.add(q1.build(),BooleanClause.Occur.SHOULD);
> > main.add(q2.build(),BooleanClause.Occur.SHOULD);
> > main.setMinimumNumberShouldMatch(1);
> >
> > System.out.println(main.build()); // (((f1:term~2 f1:second~2
> > f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1
>  -->
> > Invalid Solr Query
> >
> >
> >
> >
> >
> > In a few words :  ( q1 OR q2 )
> >
> >
> >
> > Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> > search but I also need a minimum of terms to match.
> >
> >
> >
> > The best I was able to create was something like this  :
> >
> >
> >
> > SolrQuery query = new SolrQuery();
> > query.set("fl", "term");
> > query.set("q", "term~1 term2~2 term3~2");
> > query.set("mm",2);
> >
> > System.out.println(query);
> >
> >
> >
> > And I was unable to find any example that would allow me to do the type
> of
> > query that I am trying to build with only one solr query.
> >
> >
> >
> > Is it possible to use the Lucene Query builder with Solr? Is there any
> way
> > to create Boolean queries with Solr? Do I need to build the query as a
> > String? If so , how do I set the mm parameter in a String query?
> >
> >
> >
> > Thank you
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Is it possible to use the Lucene Query Builder? Is there any API to create boolean queries?

2019-12-02 Thread yeikel valdes
Is there any builder for the XMLQueryParser so that we don't need to build as a 
String?


And what query DSL are you referring to?


 On Mon, 02 Dec 2019 08:00:57 -1100 m...@apache.org wrote 


and Query DSL as well. Although, it didn't get the point in the topic
starter.

On Mon, Dec 2, 2019 at 9:16 PM Alexandre Rafalovitch 
wrote:

> What about XMLQueryParser:
>
> https://lucene.apache.org/solr/guide/8_2/other-parsers.html#xml-query-parser
>
> Regards,
> Alex.
>
> On Wed, 27 Nov 2019 at 22:43,  wrote:
> >
> > I am trying to simulate the following query(Lucene query builder) using
> Solr
> >
> >
> >
> >
> > BooleanQuery.Builder main = new BooleanQuery.Builder();
> >
> > Term t1 = new Term("f1","term");
> > Term t2 = new Term("f1","second");
> > Term t3 = new Term("f1","another");
> >
> > BooleanQuery.Builder q1 = new BooleanQuery.Builder();
> > q1.add(new FuzzyQuery(t1,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t2,2), BooleanClause.Occur.SHOULD);
> > q1.add(new FuzzyQuery(t3,2), BooleanClause.Occur.SHOULD);
> > q1.setMinimumNumberShouldMatch(2);
> >
> > Term t4 = new Term("f1","anothert");
> > Term t5 = new Term("f1","anothert2");
> > Term t6 = new Term("f1","anothert3");
> >
> > BooleanQuery.Builder q2 = new BooleanQuery.Builder();
> > q2.add(new FuzzyQuery(t4,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t5,2), BooleanClause.Occur.SHOULD);
> > q2.add(new FuzzyQuery(t6,2), BooleanClause.Occur.SHOULD);
> > q2.setMinimumNumberShouldMatch(2);
> >
> >
> > main.add(q1.build(),BooleanClause.Occur.SHOULD);
> > main.add(q2.build(),BooleanClause.Occur.SHOULD);
> > main.setMinimumNumberShouldMatch(1);
> >
> > System.out.println(main.build()); // (((f1:term~2 f1:second~2
> > f1:another~2)~2) ((f1:anothert~2 f1:anothert2~2 f1:anothert3~2)~2))~1
> -->
> > Invalid Solr Query
> >
> >
> >
> >
> >
> > In a few words : ( q1 OR q2 )
> >
> >
> >
> > Where q1 and q2 are a set of different terms using I'd like to do a fuzzy
> > search but I also need a minimum of terms to match.
> >
> >
> >
> > The best I was able to create was something like this :
> >
> >
> >
> > SolrQuery query = new SolrQuery();
> > query.set("fl", "term");
> > query.set("q", "term~1 term2~2 term3~2");
> > query.set("mm",2);
> >
> > System.out.println(query);
> >
> >
> >
> > And I was unable to find any example that would allow me to do the type
> of
> > query that I am trying to build with only one solr query.
> >
> >
> >
> > Is it possible to use the Lucene Query builder with Solr? Is there any
> way
> > to create Boolean queries with Solr? Do I need to build the query as a
> > String? If so , how do I set the mm parameter in a String query?
> >
> >
> >
> > Thank you
> >
>


--
Sincerely yours
Mikhail Khludnev


Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread yeikel valdes
Hi,


I have an index that stores addresses from different countries.


As every country has different stop words, I was wondering if it is possible to 
apply a different set of stop words depending on the value of a field. 


Or do I need different indexes, or to do it at the ETL step, to accomplish this?




Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Jörn Franke
You can have different fields per country. I am not sure about your stop words,
but if they do not occur in the other languages then you do not have a
problem.
On the other hand, if you need more than stop words (e.g. lemmatization, a
specialized way of tokenizing, etc.) then you need a different field per
language. You don't describe your full use case, but if you have different
fields for different languages then your client application needs to handle this
(not difficult, but you have to be aware of it).
I am not sure whether you need to search a given address in all languages or
whether you use the language of the user, etc.

> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
> 
> Hi,
> 
> 
> I have an index that stores addresses from different countries.
> 
> 
> As every country has different stop words, I was wondering if it is possible 
> to apply a different set of stop words depending on the value of a field. 
> 
> 
> Or do I need different indexes/do itnat the ETL step to accomplish this?
> 
> 


Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Walter Underwood
The best approach is to not use stop words at all. That gives better relevance 
with less configuration, so it is a total win.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 2, 2019, at 12:24 PM, Jörn Franke  wrote:
> 
> You can have different fields by country. I am not sure about your stop words 
> but if they are not occurring in the other languages then you have not a 
> problem. 
> On the other hand: it you need more than stop words (eg lemmatizing, 
> specialized way of tokenization etc) then you need a different field per 
> language. You don’t describe your full use case, but if you have different 
> fields for different language then your client application needs to handle 
> this (not difficult, but you have to be aware).
> Not sure if you need to search a given address in all languages or if you use 
> the language of the user etc.
> 
>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
>> 
>> Hi,
>> 
>> 
>> I have an index that stores addresses from different countries.
>> 
>> 
>> As every country has different stop words, I was wondering if it is possible 
>> to apply a different set of stop words depending on the value of a field. 
>> 
>> 
>> Or do I need different indexes/do itnat the ETL step to accomplish this?
>> 
>> 



Exact match

2019-12-02 Thread OTH
Hello,

What would be the best way to get exact matches (if any) to a query?

E.g.:  Let's say the document text is:  "united states of america".
Currently, any query containing one or more of the three words "united",
"states", or "america" will match the above document.  I would like a
way to make the document match if and only if the query is also
"united states of america" (case-insensitive).

Document field type:  TextField
Index Analyzer: TokenizerChain
Index Tokenizer: StandardTokenizerFactory
Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
SnowballPorterFilterFactory
The Query Analyzer / Tokenizer / Token Filters are the same as the Index
ones above.

FYI I'm relatively novice at Solr / Lucene / Search.

Much appreciated
Omer


Re: Exact match

2019-12-02 Thread David Hastings
If the query is in quotes it will work. Also, not sure if you've been
following, but get rid of
StopFilterFactory and all stop words, or just make your stop word file empty.
If you need it to work for unquoted input, add the quotes to the query after
submission?
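
Adding the quotes after submission could look roughly like the sketch below. It is a plain string helper, not a Solr API; the class and method names are made up, and it only handles embedded quotes and backslashes.

```java
class PhraseQuoting {
    // Wraps raw user input in double quotes so that it is parsed as a phrase query.
    // Input that is already wrapped in quotes is returned unchanged (a simplification:
    // it does not check what lies between the outer quotes).
    static String asPhrase(String userInput) {
        String trimmed = userInput.trim();
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            return trimmed;
        }
        // Escape backslashes first, then embedded quotes.
        String escaped = trimmed.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asPhrase("united states of america"));
    }
}
```

Note that a phrase query still matches any document that contains the phrase somewhere in the field, not only documents whose whole field equals it.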

On Mon, Dec 2, 2019 at 3:44 PM OTH  wrote:

> Hello,
>
> What would be the best way to get exact matches (if any) to a query?
>
> E.g.:  Let's the document text is:  "united states of america".
> Currently, any query containing one or more of the three words "united",
> "states", or "america" will match with the above document.  I would like a
> way so that the document matches only and only if the query were also
> "united states of america" (case-insensitive).
>
> Document field type:  TextField
> Index Analyzer: TokenizerChain
> Index Tokenizer: StandardTokenizerFactory
> Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
> SnowballPorterFilterFactory
> The Query Analyzer / Tokenizer / Token Filters are the same as the Index
> ones above.
>
> FYI I'm relatively novice at Solr / Lucene / Search.
>
> Much appreciated
> Omer
>


Re: Exact match

2019-12-02 Thread Emir Arnautović
Hi Omer,
From a performance perspective, it is best to index the title as a single
token: KeywordTokenizer + LowerCaseFilter

If you need to query that field in some other way, you can index it differently 
as some other field using copyField.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Dec 2019, at 21:43, OTH  wrote:
> 
> Hello,
> 
> What would be the best way to get exact matches (if any) to a query?
> 
> E.g.:  Let's the document text is:  "united states of america".
> Currently, any query containing one or more of the three words "united",
> "states", or "america" will match with the above document.  I would like a
> way so that the document matches only and only if the query were also
> "united states of america" (case-insensitive).
> 
> Document field type:  TextField
> Index Analyzer: TokenizerChain
> Index Tokenizer: StandardTokenizerFactory
> Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
> SnowballPorterFilterFactory
> The Query Analyzer / Tokenizer / Token Filters are the same as the Index
> ones above.
> 
> FYI I'm relatively novice at Solr / Lucene / Search.
> 
> Much appreciated
> Omer



Re: Exact match

2019-12-02 Thread Erick Erickson
There are two different interpretations of “exact match” going on here, so
don’t be confused!

Emir’s version is “the text has to match the _entire_ input”. So a field with
“a b c d” will NOT match “a b” or “a b c” or “b c”, but only “a b c d”.

David’s version is “the text has to contain some sequence of words that exactly
matches my query”, so a field with “a b c d” _would_ match “a b”, “a b c”,
“a b c d”, “b c”, “c d”, etc.

Both are entirely valid use cases, depending on what you mean by “exact match”.
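
The two interpretations can be told apart with a small Java sketch. It is an illustration only, not how Solr evaluates queries: it assumes whitespace tokenization and lower-casing, and the names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ExactMatchKinds {
    // Lower-cases and splits on whitespace (a stand-in for a real analyzer).
    static List<String> tokens(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) {
            return Collections.<String>emptyList();
        }
        return Arrays.asList(t.split("\\s+"));
    }

    // Whole-field match: the query must equal the entire field value.
    // Unlike a true single-token field, this comparison ignores whitespace differences.
    static boolean matchesWholeField(String field, String query) {
        return tokens(field).equals(tokens(query));
    }

    // Phrase match: the query tokens must appear as one contiguous run in the field.
    static boolean containsPhrase(String field, String query) {
        List<String> q = tokens(query);
        return !q.isEmpty() && Collections.indexOfSubList(tokens(field), q) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeField("a b c d", "b c")); // whole-field reading
        System.out.println(containsPhrase("a b c d", "b c"));    // phrase reading
    }
}
```

For the field “a b c d” and the query “b c”, the whole-field check fails while the phrase check succeeds, which is exactly the difference described above.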

Best,
Erick

> On Dec 2, 2019, at 4:38 PM, Emir Arnautović  
> wrote:
> 
> Hi Omer,
> From performance perspective, it is the best if you index title as a single 
> token: KeywordTokenizer + LowerCaseFilter
> 
> If you need to query that field in some other way, you can index it 
> differently as some other field using copyField.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 2 Dec 2019, at 21:43, OTH  wrote:
>> 
>> Hello,
>> 
>> What would be the best way to get exact matches (if any) to a query?
>> 
>> E.g.:  Let's the document text is:  "united states of america".
>> Currently, any query containing one or more of the three words "united",
>> "states", or "america" will match with the above document.  I would like a
>> way so that the document matches only and only if the query were also
>> "united states of america" (case-insensitive).
>> 
>> Document field type:  TextField
>> Index Analyzer: TokenizerChain
>> Index Tokenizer: StandardTokenizerFactory
>> Index Token Filters: StopFilterFactory, LowerCaseFilterFactory,
>> SnowballPorterFilterFactory
>> The Query Analyzer / Tokenizer / Token Filters are the same as the Index
>> ones above.
>> 
>> FYI I'm relatively novice at Solr / Lucene / Search.
>> 
>> Much appreciated
>> Omer
> 



Re: Solr Case Insensitive Search while preserving cases in Index and allowing Boolean AND/OR searches

2019-12-02 Thread Emir Arnautović
Hi Lewin,
I am not sure I follow your example. From what I read, you could have one field
lowercased and another not, then filter on the first field and facet on the
second. There is probably something that I am missing, so an example would
probably help.
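
A sketch of that two-field idea, assuming a hypothetical field "brand" that keeps the original case for faceting and a lowercased copy "brand_lc" used for filtering. Both field names, and the choice to lower-case the values on the client, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CaseInsensitiveFilter {
    // Builds a Boolean OR filter over the lowercased field, e.g. for fq.
    static String filterQuery(String lowercasedField, List<String> values) {
        List<String> quoted = new ArrayList<>();
        for (String v : values) {
            String lowered = v.toLowerCase(Locale.ROOT)
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"");
            quoted.add("\"" + lowered + "\"");
        }
        return lowercasedField + ":(" + String.join(" OR ", quoted) + ")";
    }

    public static void main(String[] args) {
        // fq goes against the lowercased copy; facet.field would stay on "brand"
        // so that the facet values keep their original case.
        System.out.println(filterQuery("brand_lc", Arrays.asList("Apple", "Dell", "HP")));
    }
}
```

The Boolean logic lives in the filter query, so facet.contains is not needed for it.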

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 25 Nov 2019, at 23:00, Lewin Joy (TMNA)  wrote:
> 
> Hi,
> 
> I am exploring possibility to do case insensitive filter/facet queries in 
> solr.
> I would also need to preserve the cases in the index.
> This means that the normal LowerCaseFilterFactory approach would not work as 
> facet values will not preserve cases and will show in all lowercase.
> 
> One method was to use facet.contains along with 
> f.fieldname.facet.ignoreCase=true.
> But, I need an option to do more with the search keyword. 
> Example if possible,  would be something like  --> facet.contains=Apple OR 
> Dell OR HP
> 
> Another approach is to do a filter query with general expressions, which gets 
> costly.
> Or copy field with edge Ngram and LowerCaseFilter factory which is again 
> costly.
> 
> 
> Does anyone have any suggestions? It would be good if we have an option with 
> the facet.contains 
> Just need a Boolean capability in there.
> 
> Thanks,
> Lewin



RE: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread email
To clarify, a document would look like this : 

{
  address: "123 main Street",
  country : "US"
}

What I'd like to do when I configure my index is to apply a set of different 
stop words to the address field depending on the value of the country. For 
example, something like this : 

If (country == US) -> File1
Else If (country == UK) -> File2

Etc..

Hopefully, that clarifies.

-Original Message-
From: Jörn Franke  
Sent: Monday, December 2, 2019 3:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to have different Stop words depending on the value 
of a field?

You can have different fields by country. I am not sure about your stop words 
but if they are not occurring in the other languages then you have not a 
problem. 
On the other hand: it you need more than stop words (eg lemmatizing, 
specialized way of tokenization etc) then you need a different field per 
language. You don’t describe your full use case, but if you have different 
fields for different language then your client application needs to handle this 
(not difficult, but you have to be aware).
Not sure if you need to search a given address in all languages or if you use 
the language of the user etc.

> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
> 
> Hi,
> 
> 
> I have an index that stores addresses from different countries.
> 
> 
> As every country has different stop words, I was wondering if it is possible 
> to apply a different set of stop words depending on the value of a field. 
> 
> 
> Or do I need different indexes/do itnat the ETL step to accomplish this?
> 
> 




Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Dave
It clarifies yes. You need new fields. In this case something like
Address_us
Address_uk
And index and search them accordingly with different stopword files used in 
different field types, hence the copy field from “address” into as many new 
fields as needed
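
One thing to keep in mind: a plain copyField copies every address into every target field, whatever the country. The sketch below therefore routes the value on the client (the ETL step) instead, which is a different technique from copyField. The field naming scheme "address_<country>" and the helper class are assumptions for illustration; each such field would be given its own field type with its own stop word file in the schema.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CountryFieldRouting {
    // Maps a country code to a per-country field name, e.g. "US" to "address_us".
    static String addressFieldFor(String country) {
        return "address_" + country.trim().toLowerCase(Locale.ROOT);
    }

    // Builds the field map for one document: the address goes only into
    // the field that belongs to the document's country.
    static Map<String, Object> toSolrDoc(String address, String country) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("country", country);
        doc.put(addressFieldFor(country), address);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toSolrDoc("123 main Street", "US"));
    }
}
```

At query time the client would pick the field the same way, based on the country it is searching in.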

> On Dec 2, 2019, at 7:33 PM,   wrote:
> 
> To clarify, a document would look like this : 
> 
> {
>  address: "123 main Street",
>  country : "US"
> }
> 
> What I'd like to do when I configure my index is to apply a set of different 
> stop words to the address field depending on the value of the country. For 
> example, something like this : 
> 
> If (country == US) -> File1
> Else If (country == UK) -> File2
> 
> Etc..
> 
> Hopefully, that clarifies.
> 
> -Original Message-
> From: Jörn Franke  
> Sent: Monday, December 2, 2019 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is it possible to have different Stop words depending on the 
> value of a field?
> 
> You can have different fields by country. I am not sure about your stop words 
> but if they are not occurring in the other languages then you have not a 
> problem. 
> On the other hand: it you need more than stop words (eg lemmatizing, 
> specialized way of tokenization etc) then you need a different field per 
> language. You don’t describe your full use case, but if you have different 
> fields for different language then your client application needs to handle 
> this (not difficult, but you have to be aware).
> Not sure if you need to search a given address in all languages or if you use 
> the language of the user etc.
> 
>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
>> 
>> Hi,
>> 
>> 
>> I have an index that stores addresses from different countries.
>> 
>> 
>> As every country has different stop words, I was wondering if it is possible 
>> to apply a different set of stop words depending on the value of a field. 
>> 
>> 
>> Or do I need different indexes/do itnat the ETL step to accomplish this?
>> 
>> 
> 
> 


RE: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread email
That makes sense, thank you for the clarification!

@wun...@wunderwood.org If you can, please expand on your explanation, as it
sounds relevant.
-Original Message-
From: Dave  
Sent: Monday, December 2, 2019 7:38 PM
To: solr-user@lucene.apache.org
Cc: jornfra...@gmail.com
Subject: Re: Is it possible to have different Stop words depending on the value 
of a field?

It clarifies yes. You need new fields. In this case something like Address_us 
Address_uk And index and search them accordingly with different stopword files 
used in different field types, hence the copy field from “address” into as many 
new fields as needed

> On Dec 2, 2019, at 7:33 PM,   wrote:
> 
> To clarify, a document would look like this : 
> 
> {
>  address: "123 main Street",
>  country : "US"
> }
> 
> What I'd like to do when I configure my index is to apply a set of different 
> stop words to the address field depending on the value of the country. For 
> example, something like this : 
> 
> If (country == US) -> File1
> Else If (country == UK) -> File2
> 
> Etc..
> 
> Hopefully, that clarifies.
> 
> -Original Message-
> From: Jörn Franke 
> Sent: Monday, December 2, 2019 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is it possible to have different Stop words depending on the 
> value of a field?
> 
> You can have different fields by country. I am not sure about your stop words 
> but if they are not occurring in the other languages then you have not a 
> problem. 
> On the other hand: it you need more than stop words (eg lemmatizing, 
> specialized way of tokenization etc) then you need a different field per 
> language. You don’t describe your full use case, but if you have different 
> fields for different language then your client application needs to handle 
> this (not difficult, but you have to be aware).
> Not sure if you need to search a given address in all languages or if you use 
> the language of the user etc.
> 
>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
>> 
>> Hi,
>> 
>> 
>> I have an index that stores addresses from different countries.
>> 
>> 
>> As every country has different stop words, I was wondering if it is possible 
>> to apply a different set of stop words depending on the value of a field. 
>> 
>> 
>> Or do I need different indexes/do itnat the ETL step to accomplish this?
>> 
>> 
> 
> 




Re: Convert javabin to json

2019-12-02 Thread Noble Paul
obj = new JavaBinCodec().unmarshal();
Utils#writeJson()

On Thu, Nov 28, 2019 at 11:54 AM Wei  wrote:
>
> Hi,
>
> Is there a reliable way to convert solr's javabin response to json format?
> We use solrj client with wt=javabin, but want to convert the received
> javabin response to json for passing to client.  We don't want to use
> wt=json as javabin is more efficient.  We tried the noggit jsonutil
>
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/noggit/JSONUtil.java
>
> but seems it is not able to convert parts of the query response such as
> facet.  Are there any other options available?
>
> Thanks,
> Wei



-- 
-
Noble Paul


Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread Dave
I’ll add to that since I’m up. Stopwords are, in a practical sense, useless and 
serve no purpose. They are an old way to save index size that’s not needed any 
more. You’d need a very specific use case to want them. Maybe you have one, but 
generally you don’t, unless it’s for training a machine or something a bit 
more on the experimental side. If you can explain *why* you think you need stop 
words, that would help in guiding you to an alternative.

> On Dec 2, 2019, at 7:45 PM,   wrote:
> 
> That makes sense, thank you for the clarification!
> 
> @wun...@wunderwood.org If you can, please build on your explanation as It 
> sounds relevant. 
> -Original Message-
> From: Dave  
> Sent: Monday, December 2, 2019 7:38 PM
> To: solr-user@lucene.apache.org
> Cc: jornfra...@gmail.com
> Subject: Re: Is it possible to have different Stop words depending on the 
> value of a field?
> 
> It clarifies yes. You need new fields. In this case something like Address_us 
> Address_uk And index and search them accordingly with different stopword 
> files used in different field types, hence the copy field from “address” into 
> as many new fields as needed
> 
>> On Dec 2, 2019, at 7:33 PM,   wrote:
>> 
>> To clarify, a document would look like this : 
>> 
>> {
>> address: "123 main Street",
>> country : "US"
>> }
>> 
>> What I'd like to do when I configure my index is to apply a set of different 
>> stop words to the address field depending on the value of the country. For 
>> example, something like this : 
>> 
>> If (country == US) -> File1
>> Else If (country == UK) -> File2
>> 
>> Etc..
>> 
>> Hopefully, that clarifies.
>> 
>> -Original Message-
>> From: Jörn Franke 
>> Sent: Monday, December 2, 2019 3:25 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Is it possible to have different Stop words depending on the 
>> value of a field?
>> 
>> You can have different fields by country. I am not sure about your stop 
>> words but if they are not occurring in the other languages then you have not 
>> a problem. 
>> On the other hand: it you need more than stop words (eg lemmatizing, 
>> specialized way of tokenization etc) then you need a different field per 
>> language. You don’t describe your full use case, but if you have different 
>> fields for different language then your client application needs to handle 
>> this (not difficult, but you have to be aware).
>> Not sure if you need to search a given address in all languages or if you 
>> use the language of the user etc.
>> 
>>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
>>> 
>>> Hi,
>>> 
>>> 
>>> I have an index that stores addresses from different countries.
>>> 
>>> 
>>> As every country has different stop words, I was wondering if it is 
>>> possible to apply a different set of stop words depending on the value of a 
>>> field. 
>>> 
>>> 
>>> Or do I need different indexes/do itnat the ETL step to accomplish this?
>>> 
>>> 
>> 
>> 
> 
> 


RE: Is it possible to have different Stop words depending on the value of a field?

2019-12-02 Thread email
Thank you for jumping in @hastings.recurs...@gmail.com

I have an index with raw addresses in a nonstandardized format such as "123 
main street" or "main street 123", and I am looking to search this index and 
pull the closest addresses from another raw input with a similarly unpredictable 
format. Ideally, I am trying to reduce the number of results as much as 
possible because of time constraints. 

At the moment, I am launching a dismax query with the mm (minimum should match) 
parameter set to a value I am comfortable with (say 50%, for example). 

In an address such as "123 main street CA 90201 US", if I execute a query such 
as "return addresses that match 50% of the tokens" (dismax, with mm set to 50%), 
I will potentially get records with "US Street 123" or "main street CA", which 
is not what I am looking for. I understand that I could increase the mm 
parameter and set it to, say, "100%", but again, I am not sure whether the token 
"street" should be considered when calculating mm, as I could then miss 
a record such as "123 main CA 90201 US".
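
To make the concern concrete, here is a rough sketch of how a percentage mm 
behaves on the example address. It is a simplified model only: it assumes 
whitespace tokenization with lowercasing, one optional clause per token, and a 
percentage rounded down to a whole number of clauses. The class and method 
names are made up for illustration and are not part of Solr.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MinShouldMatchSketch {
    // Lowercase and split on whitespace; a stand-in for a real analyzer chain.
    static Set<String> tokens(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        return new HashSet<>(Arrays.asList(parts));
    }

    // Number of query tokens that must match for a percentage mm (rounded down).
    static int required(int queryTokenCount, int mmPercent) {
        return (queryTokenCount * mmPercent) / 100;
    }

    // True if the candidate contains at least the required number of query tokens.
    static boolean matches(String query, String candidate, int mmPercent) {
        Set<String> queryTokens = tokens(query);
        Set<String> candidateTokens = tokens(candidate);
        int hits = 0;
        for (String token : queryTokens) {
            if (candidateTokens.contains(token)) {
                hits++;
            }
        }
        return hits >= required(queryTokens.size(), mmPercent);
    }

    public static void main(String[] args) {
        String query = "123 main street CA 90201 US"; // 6 tokens
        System.out.println(matches(query, "US Street 123", 50));          // true: 3 of 6
        System.out.println(matches(query, "main street CA", 50));         // true: 3 of 6
        System.out.println(matches(query, "123 main CA 90201 US", 100));  // false: 5 of 6
    }
}
```

With six query tokens, mm=50% needs only three matches, so both unwanted 
records qualify, while mm=100% rejects the record that lacks only "street".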

For longer addresses, the relevance of "main" or "street" is much lower than 
keywords such as apartment number or the city. 

I am not sure if this is the right way to search for unstructured addresses, so 
we are open to suggestions. 

Thank you

-Original Message-
From: Dave  
Sent: Monday, December 2, 2019 7:50 PM
To: solr-user@lucene.apache.org
Cc: wun...@wunderwood.org; jornfra...@gmail.com
Subject: Re: Is it possible to have different Stop words depending on the value 
of a field?

I’ll add to that since I’m up. Stopwords are in a practical sense useless and 
serve no purpose. It’s an old way to save index size that’s not needed any 
more. You’d need very specific use cases to want to use them. Maybe you do, but 
generally you never do unless it’s for training a machine or something a bit 
more on the experimental side. If you can explain *why you think you need stop 
words that would be helpful in perhaps guiding you to an alternative 

> On Dec 2, 2019, at 7:45 PM,   wrote:
> 
> That makes sense, thank you for the clarification!
> 
> @wun...@wunderwood.org If you can, please build on your explanation as It 
> sounds relevant. 
> -Original Message-
> From: Dave  
> Sent: Monday, December 2, 2019 7:38 PM
> To: solr-user@lucene.apache.org
> Cc: jornfra...@gmail.com
> Subject: Re: Is it possible to have different Stop words depending on the 
> value of a field?
> 
> It clarifies yes. You need new fields. In this case something like Address_us 
> Address_uk And index and search them accordingly with different stopword 
> files used in different field types, hence the copy field from “address” into 
> as many new fields as needed
> 
>> On Dec 2, 2019, at 7:33 PM,   wrote:
>> 
>> To clarify, a document would look like this : 
>> 
>> {
>> address: "123 main Street",
>> country : "US"
>> }
>> 
>> What I'd like to do when I configure my index is to apply a set of different 
>> stop words to the address field depending on the value of the country. For 
>> example, something like this : 
>> 
>> If (country == US) -> File1
>> Else If (country == UK) -> File2
>> 
>> Etc..
>> 
>> Hopefully, that clarifies.
>> 
>> -Original Message-
>> From: Jörn Franke 
>> Sent: Monday, December 2, 2019 3:25 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Is it possible to have different Stop words depending on the 
>> value of a field?
>> 
>> You can have different fields by country. I am not sure about your stop 
>> words but if they are not occurring in the other languages then you have not 
>> a problem. 
>> On the other hand: it you need more than stop words (eg lemmatizing, 
>> specialized way of tokenization etc) then you need a different field per 
>> language. You don’t describe your full use case, but if you have different 
>> fields for different language then your client application needs to handle 
>> this (not difficult, but you have to be aware).
>> Not sure if you need to search a given address in all languages or if you 
>> use the language of the user etc.
>> 
>>> Am 02.12.2019 um 20:13 schrieb yeikel valdes :
>>> 
>>> Hi,
>>> 
>>> 
>>> I have an index that stores addresses from different countries.
>>> 
>>> 
>>> As every country has different stop words, I was wondering if it is 
>>> possible to apply a different set of stop words depending on the value of a 
>>> field. 
>>> 
>>> 
>>> Or do I need different indexes/do itnat the ETL step to accomplish this?
>>> 
>>> 
>> 
>> 
> 
> 




Issue in SolrInputDocument on Upgrade time

2019-12-02 Thread vishal patel
Hi,

I am getting the error below while converting JSON to my object. I am using the 
Gson class (gson-2.2.4.jar) to generate JSON from an object and an object from 
JSON. The Gson fromJson() method throws the error below.
Note: This was working fine with solr-solrj-5.2.0.jar, but it causes an issue 
when I use solr-solrj-6.1.0.jar or higher. As far as I can tell, the 
SolrInputDocument class changed in solr-solrj-5.5.0.

java.lang.IllegalArgumentException: Can not set org.apache.solr.common.SolrInputDocument field com.test.common.MySolrMessage.body to com.google.gson.internal.LinkedTreeMap
    at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
    at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
    at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
    at java.lang.reflect.Field.set(Field.java:764)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:108)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
    at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:81)
    at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:1)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:106)
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:185)
    at com.google.gson.Gson.fromJson(Gson.java:825)
    at com.google.gson.Gson.fromJson(Gson.java:790)
    at com.google.gson.Gson.fromJson(Gson.java:739)
    at com.google.gson.Gson.fromJson(Gson.java:711)


public class MySolrMessage<T> implements IMessage
{
    private static final long serialVersionUID = 1L;
    private T body = null;
    private String collection;
    private int action;
    private int errorCode;
    private long msgId;
    //few parameterized constructor
    //getter and setter method of all above attributes
}

public interface IMessage extends Serializable
{
    public long getMsgId();
    public void setMsgId(long id);
    public Object getBody();
    public void setBody(Object o);
    public void setErrorCode(int ec);
    public int getErrorCode();
}

public class Request {
    LinkedList msgList = new LinkedList();

    public Request() {
    }

    public Request(LinkedList l) {
        this.msgList = l;
    }

    public LinkedList getMsgList() {
        return this.msgList;
    }
}

@JsonAutoDetect(JsonMethod.FIELD)
@JsonSerialize(include = JsonSerialize.Inclusion.NON_NULL)
public class Request2
{
    @JsonProperty
    @JsonDeserialize(as=LinkedList.class,contentAs = MySolrMessage.class)
    LinkedList<MySolrMessage<SolrInputDocument>> msgList =
        new LinkedList<MySolrMessage<SolrInputDocument>>();

    public Request2()
    {

    }

    public Request2(LinkedList<MySolrMessage<SolrInputDocument>> l)
    {
        this.msgList = l;
    }

    public LinkedList<MySolrMessage<SolrInputDocument>> getMsgList()
    {
        return this.msgList;
    }
}


public class Test {

    public static void main(String[] args) {
        SolrInputDocument solrDocument = new SolrInputDocument();
        solrDocument.addField("id", "1234");
        solrDocument.addField("name", "test");
        MySolrMessage asm = new MySolrMessage(solrDocument,
            "collection1", 1);
        IMessage message = asm;
        List msgList = new ArrayList();
        msgList.add(message);
        LinkedList ex = new LinkedList();
        ex.addAll(msgList);
        Request request = new Request(ex);
        try
        {
            String json = "";
            Gson gson = (new GsonBuilder()).serializeNulls().create();
            gson.setASessionId((String) null);
            json = gson.toJson(request);
            Gson gson2 = new Gson();
            // This call throws the error above.
            Request2 retObj = gson2.fromJson(json, Request2.class);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Any idea?



Regards,
Vishal