Re: Score higher if multiple terms match

2017-06-08 Thread OTH
Hi - Sorry, it was very late at night for me and I don't think I chose my
wording well.
bq: it is indeed returning documents with only either one of the two query
terms
What I meant was:  Initially, I thought it was only returning documents
which contained both 'tv' and 'promotion'.  Then I realized I was mistaken;
it was also returning documents which contained either 'tv' or 'promotion'
(as well as documents which contained both, which were scored higher).
I hope that clears the confusion.
Thanks

On Thu, Jun 8, 2017 at 9:04 AM, Erick Erickson 
wrote:

> bq: it is indeed returning documents with only either one of the two query
> terms
>
> Uhm, this should not be true. What's the output of adding debug=query?
> And are you totally sure the above is true and you're just not seeing
> the other term in the return? Or that you have a synonyms file that is
> somehow making docs match? Or ???
>
> So you're saying you get the exact same number of hits for
> name:tv OR name:promotion
> and
> name:tv AND name:promotion
> ??? Definitely not expected unless all docs happen to have both these
> terms in the name field either through normal input or synonyms etc.
>
> You should need something like:
> name:tv OR name:promotion OR (name:tv AND name:promotion)^100
> to score all the docs with both terms in the name field higher than just
> one.
>
> Best,
> Erick
>
> On Wed, Jun 7, 2017 at 3:05 PM, OTH  wrote:
> > I'm sorry, there was a mistake.
> >
> > I previously wrote:
> >
> > However, these are returning only those documents which have both the
> terms
> >> 'tv promotion' in them (there are a few).  It's not returning any
> >> document which have only 'tv' or only 'promotion' in them.
> >
> >
> > That's not true at all; it is indeed returning documents with only either
> > one of the two query terms (so, documents with only 'tv' or only
> > 'promotion' in them).  Sorry.  You can disregard my question in the last
> > email.
> >
> > Thanks
> >
> > On Thu, Jun 8, 2017 at 2:03 AM, OTH  wrote:
> >
> >> Thanks.
> >> Both of these are working in my case:
> >> name:"tv promotion"   -->  name:"tv promotion"
> >> name:tv AND name:promotion --> name:tv AND name:promotion
> >> (Although I'm assuming, the first might not have worked if my document
> had
> >> been say 'promotion tv' or 'tv xyz promotion')
> >>
> >> However, these are returning only those documents which have both the
> >> terms 'tv promotion' in them (there are a few).  It's not returning any
> >> document which have only 'tv' or only 'promotion' in them.
> >>
> >> That's not an absolute requirement of mine, I could work around it, but
> I
> >> was just wondering, if it were possible to pass a single solr query with
> >> both the terms 'tv' and 'promotion' in them, and have them return all
> the
> >> documents which contain either of those terms, but with higher scores
> >> attached to those documents with both those terms?
> >>
> >> Much thanks
> >>
> >> On Thu, Jun 8, 2017 at 1:43 AM, David Hastings <
> >> hastings.recurs...@gmail.com> wrote:
> >>
> >>> sorry, i meant debug query where you would get output like this:
> >>>
> >>> "debug": {
> >>> "rawquerystring": "name:tv promotion",
> >>> "querystring": "name:tv promotion",
> >>> "parsedquery": "+name:tv +text:promotion",
> >>>
> >>>
> >>> On Wed, Jun 7, 2017 at 4:41 PM, David Hastings <
> >>> hastings.recurs...@gmail.com
> >>> > wrote:
> >>>
> >>> > well, short answer, use the analyzer to see whats happening.
> >>> > long answer
> >>> >  theres a difference between
> >>> > name:tv promotion   -->  name:tv default_field:promotion
> >>> > name:"tv promotion"   -->  name:"tv promotion"
> >>> > name:tv AND name:promotion --> name:tv AND name:promotion
> >>> >
> >>> >
> >>> > since your default field most likely isnt name, its going to search
> only
> >>> > the default field for it.  you can alter this behavior using qf
> >>> parameters:
> >>> >
> >>> >
> >>> >
> >>> > qf='name^5 text'
> >>> >
> >>> >
> >>> > for example would apply a boost of 5 if it matched the field 'name',
> and
> >>> > only 1 for 'text'
> >>> >
> >>> > On Wed, Jun 7, 2017 at 4:35 PM, OTH  wrote:
> >>> >
> >>> >> Hello,
> >>> >>
> >>> >> I have what I would think to be a fairly simple problem to solve,
> >>> however
> >>> >> I'm not sure how it's done in Solr and couldn't find an answer on
> >>> Google.
> >>> >>
> >>> >> Say I have two documents, "TV" and "TV promotion".  If the search
> >>> query is
> >>> >> "TV promotion", then, obviously, I would like the document "TV
> >>> promotion"
> >>> >> to score higher.  However, that is not the case right now.
> >>> >>
> >>> >> My syntax is something like this:
> >>> >> http://localhost:8983/solr/sales/select?indent=on&wt=json&;
> >>> >> fl=*,score&q=name:tv
> >>> >> promotion
> >>> >> (I tried "q=name:tv+promotion (added the '+'), but it made no
> >>> difference.)
> >>> >>
> >>> >> It's not scoring the document "TV promotion" higher than "TV"; in
> fact
> >>> >> it's
> >>> >> scoring i

Replicate data from Solr to SolrCloud

2017-06-08 Thread Novin Novin
Hi Guys,

I have set up SolrCloud for production, but it is not yet in use; Solr is
currently running with two cores in production. The SolrCloud machines are
separate from the standalone Solr, and SolrCloud has two collections similar
to the Solr cores.

Would it be possible, and useful, to replicate data from Solr to SolrCloud
the way master/slave replication does, or to use some other method to send
data from Solr to SolrCloud?

Let me know if you guys need more information.

Thanks in advance,
Navin


Re: Replicate data from Solr to SolrCloud

2017-06-08 Thread Erick Erickson
You say you have two cores. Are they the same collection? That is, are you
doing distributed search? If not, you can use the replication API's fetchindex
command to manually move them.

For that matter, you can just scp the indexes over too; they're just files.

If you're doing distributed search on your standalone Solr, then you'd
need to ensure that the hash ranges were correct on your two-shard
SolrCloud setup.

Best,
Erick
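
To make the "hash ranges" concern concrete, here is a rough sketch (not
Solr's actual code) of how SolrCloud's compositeId router divides the signed
32-bit hash space evenly among N shards. When moving indexes into an existing
cluster, every document in a shard's index must hash into that shard's
assigned range.

```python
# Sketch: compute the signed 32-bit hash range assigned to each shard,
# mirroring how the compositeId router splits the hash space evenly.

def shard_hash_ranges(num_shards):
    """Return a list of (start, end) signed 32-bit ranges, one per shard."""
    full = 1 << 32                       # size of the 32-bit hash space
    step = full // num_shards
    start = -(1 << 31)                   # Integer.MIN_VALUE
    ranges = []
    for i in range(num_shards):
        # last shard absorbs any rounding remainder up to Integer.MAX_VALUE
        end = (1 << 31) - 1 if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(shard_hash_ranges(2))
# two shards: [-2^31 .. -1] and [0 .. 2^31 - 1]
```

A standalone index copied onto one shard of a multi-shard cluster will almost
certainly contain documents outside that shard's range, which is why the
one-shard-then-SPLITSHARD route discussed later in the thread is safer.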

On Jun 8, 2017 07:06, "Novin Novin"  wrote:

> Hi Guys,
>
> I have set up SolrCloud  for production but ready to use and currently Solr
> running with two core in production. SolrCloud machines are separate than
> standalone Solr and has two collections in SolrCloud similar to Solr.
>
> Is it possible and  would be useful. If I could be replicate data from Solr
> to SolrCloud like master slave does or use some other method to send data
> from Solr to SolrCloud.
>
> Let me know if you guys need more information.
>
> Thanks in advance,
> Navin
>


Re: Score higher if multiple terms match

2017-06-08 Thread Erick Erickson
bq: I hope that clears the confusion.

Nope, doesn't clear it up at all. It's not clear which query you're
talking about at least to me.

If you're searching for
name:tv AND name:promotion

and getting back a document that has only "tv" in the name field
that's simply wrong and you need to find out why.

If you're saying that searching for
name:tv OR name:promotion

returns both and that docs with both terms score higher, that's likely
true although it'll be fuzzy. I'm guessing that the name field is
fairly short, so the length norm will be the same and this will be
fairly reliable. If the field could have a widely varying number of
terms it's less reliable.

Best,
Erick
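
As a toy illustration of why this works (a deliberate simplification, not
Lucene's actual similarity math): with an OR query, each matching clause
contributes to the score, so a document matching both terms naturally
outscores one matching a single term, and an extra boosted AND clause like
`(name:tv AND name:promotion)^100` widens that gap.

```python
# Toy scoring sketch: one point per matched query term, plus an optional
# boost when ALL terms match (mimicking an added boosted AND clause).

def toy_score(doc_terms, query_terms, boost_all=0.0):
    score = sum(1.0 for t in query_terms if t in doc_terms)
    if boost_all and all(t in doc_terms for t in query_terms):
        score += boost_all
    return score

q = {"tv", "promotion"}
print(toy_score({"tv"}, q))                        # → 1.0
print(toy_score({"tv", "promotion"}, q))           # → 2.0
print(toy_score({"tv", "promotion"}, q, 100.0))    # → 102.0
```

Real Lucene scores also factor in term frequency, document frequency, and
length norms, which is why the ordering is "likely true although it'll be
fuzzy" rather than guaranteed.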

On Thu, Jun 8, 2017 at 1:41 AM, OTH  wrote:
> Hi - Sorry it was very late at night for me and I think I didn't pick my
> wordings right.
> bq: it is indeed returning documents with only either one of the two query
> terms
> What I meant was:  Initially, I thought it was only returning documents
> which contained both 'tv' and 'promotion'.  Then I realized I was mistaken;
> it was also returning documents which contained either 'tv' or 'promotion'
> (as well as documents which contained both, which were scored higher).
> I hope that clears the confusion.
> Thanks
>

Re: Score higher if multiple terms match

2017-06-08 Thread David Hastings
Agreed, you need to show the debug query info from your original query:


My syntax is something like this:
>> >>> >> http://localhost:8983/solr/sales/select?indent=on&wt=json&;
>> >>> >> fl=*,score&q=name:tv
>> >>> >> promotion

With that output, we could probably help you get the results you want.
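
For reference, a minimal sketch of building that select URL with debug output
enabled (host and core names taken from the thread; adjust for your install).
The `parsedquery` entry in the debug section is what reveals whether a term
fell back to the default search field.

```python
# Sketch: construct the select URL with debug=query so the response
# includes rawquerystring/querystring/parsedquery for inspection.
from urllib.parse import urlencode

params = {
    "q": "name:tv name:promotion",  # qualify BOTH terms so neither falls
                                    # back to the default search field
    "fl": "*,score",
    "wt": "json",
    "debug": "query",               # adds the "debug" section to the response
}
url = "http://localhost:8983/solr/sales/select?" + urlencode(params)
print(url)
```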


On Thu, Jun 8, 2017 at 10:54 AM, Erick Erickson 
wrote:

> bq: I hope that clears the confusion.
>
> Nope, doesn't clear it up at all. It's not clear which query you're
> talking about at least to me.
>
> If you're searching for
> name:tv AND name:promotion
>
> and getting back a document that has only "tv" in the name field
> that's simply wrong and you need to find out why.
>
> If you're saying that searching for
> name:tv OR name:promotion
>
> returns both and that docs with both terms score higher, that's likely
> true although it'll be fuzzy. I'm guessing that the name field is
> fairly short so the length norm will be the sam and this will be
> fairly reliable. If the field could have a widely varying number of
> terms it's less reliable.
>
> Best,
> Erick
>

RE: Re-Index is not working

2017-06-08 Thread Miller, William K - Norman, OK - Contractor
Sorry I did not give enough information.

"doesn't work" does mean that the documents are not getting indexed.  I am 
using a full import.  I did discover that if I used the Linux touch command 
that the document would re-index.  I don't have any of the logs as I have been 
able to get the documents to index.  You mentioned that the delta import would 
need the timestamp to change to index the documents again, but does the full 
import need this change as well?




~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, June 07, 2017 11:06 PM
To: solr-user
Subject: Re: Re-Index is not working

What does "doesn't work" mean? No documents get indexed? Are you doing a full 
import or a delta import? If the latter, the timestamp not having changed is 
probably causing the doc to be skipped. What does the Solr log say?

Best,
Erick

On Wed, Jun 7, 2017 at 9:46 AM, Miller, William K - Norman, OK - Contractor 
 wrote:

> Hello, I am new to this mailing list and I am having a problem with 
> re-indexing.  I will run an index on an xml file using the 
> DataImportHandler and it will index the file.  Then I delete the index 
> using the <delete><query>*:*</query></delete>, <commit/>, and 
> <optimize/> commands.  Then I attempt to re-index the same file with 
> the same configuration in my dataConfig file for the DIH, but it fails 
> to index the file.  If I make a change to the xml file that is being 
> indexed and re-index it works.
>
>
>
> I don’t understand why this is happening.  Any help with this will be 
> appreciated.
>
>
>
>
>
>
>
>
>
>
>
> ~~~
>
> William Kevin Miller
>
> [image: ecsLogo]
>
> ECS Federal, Inc.
>
> USPS/MTSC
>
> (405) 573-2158
>
>
>


Bringing down ZK without Solr

2017-06-08 Thread Venkateswarlu Bommineni
Hi Team,

Is there any way we can bring down ZK without impacting Solr ?

I know it might be a silly question, as Solr totally depends on ZK for all
I/O operations and configuration changes.

Thanks,
Venkat.


Re: Replicate data from Solr to SolrCloud

2017-06-08 Thread Novin Novin
Thanks Erick

No, I'm not doing distributed search. These two cores hold different types of
information.

If I understand you correctly, I can just use scp to copy the index files from
Solr to any shard of SolrCloud, and then SolrCloud would balance the data
itself.

Cheers





On Thu, 8 Jun 2017 at 15:46 Erick Erickson  wrote:

> You say you have two cores. Are Tha same collection? That is, are you doing
> distributed search? If not, you can use the replication APIs fetchindex
> command to manually move them.
>
> For that matter, you can just scp the indexes over too, they're just files.
>
> If you're doing distributed search on your stand alone Solr, then you'd
> need to insure that the hash ranges were correct on your two-handed
> SolrCloud setup.
>
> Best,
> Erick
>
> On Jun 8, 2017 07:06, "Novin Novin"  wrote:
>
> > Hi Guys,
> >
> > I have set up SolrCloud  for production but ready to use and currently
> Solr
> > running with two core in production. SolrCloud machines are separate than
> > standalone Solr and has two collections in SolrCloud similar to Solr.
> >
> > Is it possible and  would be useful. If I could be replicate data from
> Solr
> > to SolrCloud like master slave does or use some other method to send data
> > from Solr to SolrCloud.
> >
> > Let me know if you guys need more information.
> >
> > Thanks in advance,
> > Navin
> >
>


RE: Re-Index is not working

2017-06-08 Thread Miller, William K - Norman, OK - Contractor
I figured out why it was not re-indexing without changing the timestamp even on 
the full import.  In my DIH I had a parameter in my top level entity that was 
checking for the last indexed time.
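
For reference, this is the kind of DIH entity attribute that causes the
behavior (a hypothetical sketch; the table, column, and entity names are
invented, not taken from the original config):

```xml
<!-- Hypothetical DIH entity whose query filters on the last index time.
     With this predicate in the top-level entity, even a full-import skips
     rows whose timestamp predates ${dataimporter.last_index_time}, so an
     unchanged file is never re-indexed until its timestamp is touched. -->
<entity name="doc"
        query="SELECT * FROM docs
               WHERE last_modified &gt; '${dataimporter.last_index_time}'">
</entity>
```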




~~~
William Kevin Miller

ECS Federal, Inc.
USPS/MTSC
(405) 573-2158


-Original Message-
From: Miller, William K - Norman, OK - Contractor 
Sent: Thursday, June 08, 2017 10:12 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Re-Index is not working

Sorry I did not give enough information.

"doesn't work" does mean that the documents are not getting indexed.  I am 
using a full import.  I did discover that if I used the Linux touch command 
that the document would re-index.  I don't have any of the logs as I have been 
able to get the documents to index.  You mentioned that the delta import would 
need the timestamp to change to index the documents again, but does the full 
import need this change as well?


Re: Replicate data from Solr to SolrCloud

2017-06-08 Thread Erick Erickson
bq: would balance the data itself.

Not if you mean split it up amongst shards. The entire index would be
on a _single_ shard. If you then do ADDREPLICA on that shard, it'll
replicate the entire index to each replica.

Also note that when you scp stuff around I'd recommend the destination
Solr node be down. Otherwise use the fetchindex. Although note that
fetchindex will prevent queries from being served in cloud mode.

What I was thinking is more a one-time transfer rather than something
ongoing. Solr 7.0 will have support for variants of the ongoing theme.
I was thinking something like

1> move the indexes to a single-replica SolrCloud
2> if you need more shards, use SPLITSHARD on the SolrCloud installation.
3> use ADDREPLICA to build out your SolrCloud setup
4> thereafter index directly to your SolrCloud installation
5> when you've proved out your SolrCloud setup, get rid of the old
stand-alone stuff.

Best,
Erick
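
For step 1, a sketch of the replication handler's fetchindex call used to pull
a standalone index into a SolrCloud replica (the host and core names below are
placeholders, not from the thread):

```python
# Sketch: build the /replication?command=fetchindex URL that tells the
# destination core to pull its index from a source core's replication
# handler. A one-time transfer, issued against the destination node.
from urllib.parse import urlencode

def fetchindex_url(dest_core_base, source_core_base):
    params = {
        "command": "fetchindex",
        # where the destination should pull the index from
        "masterUrl": source_core_base + "/replication",
    }
    return dest_core_base + "/replication?" + urlencode(params)

print(fetchindex_url("http://cloud-node:8983/solr/coll1_shard1_replica1",
                     "http://old-solr:8983/solr/core1"))
```

As noted above, while a fetchindex is in progress a cloud replica won't serve
queries, so schedule the transfer accordingly.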

On Thu, Jun 8, 2017 at 8:55 AM, Novin Novin  wrote:
> Thanks Erick
>
> No, I'm not doing distributed search. These two core with different type of
> information.
>
> If I understand you correctly, I can just use scp to copy index files from
> solr to any shard of solrcloud and than solrcloud would balance the data
> itself.
>
> Cheers
>
>
>
>
>


Re: Re-Index is not working

2017-06-08 Thread Erick Erickson
Thanks for bringing closure to that.

Erick

On Thu, Jun 8, 2017 at 9:12 AM, Miller, William K - Norman, OK -
Contractor  wrote:
> I figured out why it was not re-indexing without changing the timestamp even 
> on the full import.  In my DIH I had a parameter in my top level entity that 
> was checking for the last indexed time.
>
>
>
>
> ~~~
> William Kevin Miller
>
> ECS Federal, Inc.
> USPS/MTSC
> (405) 573-2158
>
>


Re: Bringing down ZK without Solr

2017-06-08 Thread Erick Erickson
Well, it depends on what you mean by "impacting".

When ZK drops below quorum you will no longer be able to send indexing
requests to Solr; they'll all fail. At least they better ;).

_Queries_ should continue to work, but you're in somewhat uncharted
territory, nobody I know runs that way very long ;).

The other thing I'd be sure to test is how robust reconnection is from
Solr to ZK when you bring the nodes back up.

bq: Solr totally depends on ZK for all I/O

This is a common misunderstanding. Solr depends on ZK for all changes
in cluster state, i.e. nodes going up/down/changing state (down,
recovering, active etc). Those changes generate traffic between ZK and
Solr.

For a normal I/O request, each Solr node has already been notified by ZK
of the current state of the collection, and that information is cached
locally. So each node knows everything it needs to know to
service the index or query request without talking to ZooKeeper at
all.

I know of installations indexing 100s of K documents each second.
Actually the record I know of is over 1M docs/second. If each of those
requests had to touch ZK to complete, ZK could never keep up.

Best,
Erick
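
To make "drops below quorum" concrete: a ZooKeeper ensemble stays writable
only while a strict majority of its nodes are up, so the failure tolerance is
easy to compute.

```python
# Sketch: how many ZooKeeper nodes an ensemble can lose before it drops
# below quorum (a strict majority of the configured ensemble size).

def zk_tolerance(ensemble_size):
    quorum = ensemble_size // 2 + 1      # strict majority
    return ensemble_size - quorum        # nodes you may lose

for n in (1, 3, 5):
    print(n, "nodes tolerate", zk_tolerance(n), "failures")
# 1 node tolerates 0, 3 tolerate 1, 5 tolerate 2
```

This is why you can restart ZK nodes one at a time in a 3- or 5-node ensemble
without losing quorum, but never take the whole ensemble down while expecting
indexing to continue.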

On Thu, Jun 8, 2017 at 8:49 AM, Venkateswarlu Bommineni
 wrote:
> Hi Team,
>
> Is there any way we can bring down ZK without impacting Solr ?
>
> I know it might be a silly question as Solr tolly depends in ZK for all I/O
> operations and configuration changes.
>
> Thanks,
> Venkat.


Re: Replicate data from Solr to SolrCloud

2017-06-08 Thread Novin Novin
Thanks Erick.

On Thu, 8 Jun 2017 at 17:28 Erick Erickson  wrote:

> bq: would balance the data itself.
>
> not if you mean split it up amongst shards. The entire index would be
> on a _single_ shard. If you then do ADDREPLICA on that shard it'll
> replicate the entire index to each replica
>
> Also note that when you scp stuff around I'd recommend the destination
> Solr node be down. Otherwise use the fetchindex. Although note that
> fetchindex will prevent queries from being served in cloud mode.
>
> What I was thinking is more a one-time transfer rather than something
> ongoing. Solr 7.0 will have support for variants of the ongoing theme.
> I was thinking something like
>
> 1> move the indexes to a single-replica SolrCloud
> 2> if you need more shards, use SPLITSHARD on the SolrCloud installation.
> 3> use ADDREPLICA to build out your SolrCloud setup
> 4> thereafter index directly to your SolrCloud installation
> 5> when you've proved out your SolrCloud setup, get rid of the old
> stand-alone stuff.
>
> Best,
> Erick
>
>


Query fieldNorm through http

2017-06-08 Thread tstusr
I wanted to ask the proper way to query or get the length of a field in
Solr.

I'm trying to get the fieldNorm appended as a result field by querying
localhost:8983/solr/uda/tvrh?q=usage:stuff&fl={!func}norm(usage)&debugQuery=on

Nevertheless, the response to this query is:




true
500
22



org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException
java.lang.ClassCastException

Error from server at
http://172.16.13.121:7574/solr/uda_shard2_replica1:
org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
java.lang.String
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://172.16.13.121:7574/solr/uda_shard2_replica1:
org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
java.lang.String
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:587)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException:
org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
java.lang.String
at
org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:194)
at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:269)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at
org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:585)
... 12 more

500




What we want is to append it to a field, since we consume this field in
Spark (using solr-spark) and then do some processing.

Another way we came up with is to compute the field length in an update
processor and write it to a field, but since norm() is listed among the
query functions, we are trying to obtain it this way.

We really appreciate your help; thanks in advance.
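The update-processor alternative mentioned above can be sketched with Solr's StatelessScriptUpdateProcessorFactory, which runs a script at index time. The field names (usage, usage_len) and the chain wiring are assumptions for illustration, not a tested config:

```javascript
// fieldlen.js -- a sketch for solr.StatelessScriptUpdateProcessorFactory.
// Wire it into an updateRequestProcessorChain in solrconfig.xml, e.g.:
//   <processor class="solr.StatelessScriptUpdateProcessorFactory">
//     <str name="script">fieldlen.js</str>
//   </processor>
// Field names "usage" and "usage_len" are hypothetical; adjust to your schema.
function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var usage = doc.getFieldValue("usage");
  if (usage != null) {
    // Store a simple whitespace token count at index time.
    doc.setField("usage_len", usage.toString().split(/\s+/).length);
  }
}
```

This sidesteps the similarity requirement entirely, at the cost of reindexing, and the stored count is exact rather than the lossily encoded norm.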



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-fieldNorm-through-http-tp4339693.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query fieldNorm through http

2017-06-08 Thread Mikhail Khludnev
I tried to reproduce it on the recent release. Here is what I've got after
adding distrib=false:
 requires a TFIDFSimilarity (such as
ClassicSimilarity) java.lang.UnsupportedOperationException:
requires a TFIDFSimilarity (such as ClassicSimilarity) at
org.apache.lucene.queries.function.valuesource.NormValueSource.getValues(NormValueSource.java:62)
at
org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:92)
at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:170) at
org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)

Don't you have something like that in the logs?

On Thu, Jun 8, 2017 at 10:42 PM, tstusr  wrote:

> I wanted to ask the properly way to query or get the length of a field in
> solr.
>
> I'm trying to ask and append fieldNorm in a result field by querying
> localhost:8983/solr/uda/tvrh?q=usage:stuff&fl={!func}norm(
> usage)&debugQuery=on&debugQuery=on
>
> Nevertheless, the response to this query is:
>
> 
> 
> 
> true
> 500
> 22
> 
> 
> 
>  name="error-class">org.apache.solr.client.solrj.impl.HttpSolrClient$
> RemoteSolrException
> java.
> lang.ClassCastException
> 
> Error from server at
> http://172.16.13.121:7574/solr/uda_shard2_replica1:
> org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
> java.lang.String
>  name="trace">org.apache.solr.client.solrj.impl.HttpSolrClient$
> RemoteSolrException:
> Error from server at http://172.16.13.121:7574/solr/uda_shard2_replica1:
> org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
> java.lang.String
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.
> executeMethod(HttpSolrClient.java:587)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(
> HttpSolrClient.java:279)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(
> HttpSolrClient.java:268)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.
> java:1219)
> at
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(
> HttpShardHandler.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(
> InstrumentedExecutorService.java:176)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException:
> org.apache.solr.common.util.SimpleOrderedMap cannot be cast to
> java.lang.String
> at
> org.apache.solr.common.util.JavaBinCodec.readOrderedMap(
> JavaBinCodec.java:194)
> at
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:269)
> at org.apache.solr.common.util.JavaBinCodec.readVal(
> JavaBinCodec.java:251)
> at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> at
> org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(
> BinaryResponseParser.java:50)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.
> executeMethod(HttpSolrClient.java:585)
> ... 12 more
> 
> 500
> 
> 
>
>
> What we want is to append it to a field since we are using this field on
> spark (using solr-spark) and then make some processing.
>
> Another way we came with, is to compute fieldLength in a processor and
> write
> it on a field, but, since norm is list on query functions we are trying to
> obtain it this way.
>
> We really appreciate your help and thanks in advice.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Query-fieldNorm-through-http-tp4339693.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Query fieldNorm through http

2017-06-08 Thread tstusr
Hi, thanks for the reply.

After setting distrib=true, with the query

localhost:8983/solr/uda/tvrh?q=usage:stuff&fl={!func}norm(usage)&debugQuery=on&distrib=true

I got something similar; I append the complete Solr log.

2017-06-08 20:22:02.065 INFO  (qtp1205044462-18) [c:uda s:shard2
r:core_node2 x:uda_shard2_replica1] o.a.s.c.S.Request [uda_shard2_replica1] 
webapp=/solr path=/tvrh
params={distrib=false&tv=false&df=_text_&debug=false&debug=timing&debug=track&qt=/tvrh&fl=id&fl=score&shards.purpose=4&tv.tf=true&start=0&fsv=true&shard.url=http://172.16.13.121:7574/solr/uda_shard2_replica1/&rid=-uda_shard1_replica1-1496953322061-23&rows=10&tv.tf_idf=true&version=2&q=usage:stuff&requestPurpose=GET_TOP_IDS&tv.df=true&NOW=1496953322060&isShard=true&wt=javabin&debugQuery=false}
hits=10 status=0 QTime=0
2017-06-08 20:22:02.285 INFO  (qtp1205044462-21) [c:uda s:shard2
r:core_node2 x:uda_shard2_replica1] o.a.s.c.S.Request [uda_shard2_replica1] 
webapp=/solr path=/tvrh
params={distrib=false&tv=true&df=_text_&debug=timing&debug=track&qt=/tvrh&fl={!func}norm(usage)&fl=id&shards.purpose=320&tv.tf=true&shard.url=http://172.16.13.121:7574/solr/uda_shard2_replica1/&rid=-uda_shard1_replica1-1496953322061-23&tv.tf_idf=true&version=2&q=usage:stuff&requestPurpose=GET_FIELDS,GET_DEBUG&tv.df=true&NOW=1496953322060&ids=8a647a6b-32dd-4fa0-9499-bc387aa0f647,6f7d1b1e-24d2-4e08-9b23-e011a9ba38ce,ae4f969b-1d07-4c2c-98ff-8cb04385e036,b37d9965-ed61-45bb-9b6c-b810be0c75be,1e3e3018-0f60-4ab2-8728-04dc4c24ffa5&isShard=true&wt=javabin&debugQuery=true}
status=0 QTime=40
2017-06-08 20:22:02.288 ERROR (qtp1205044462-21) [c:uda s:shard2
r:core_node2 x:uda_shard2_replica1] o.a.s.s.HttpSolrCall
null:java.lang.UnsupportedOperationException: requires a TFIDFSimilarity
(such as ClassicSimilarity)
at
org.apache.lucene.queries.function.valuesource.NormValueSource.getValues(NormValueSource.java:62)
at
org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:92)
at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:151)
at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:57)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(BinaryResponseWriter.java:124)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(BinaryResponseWriter.java:143)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:87)
at 
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:234)
at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:218)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:325)
at 
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:223)
at 
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:146)
at
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:51)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
at
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
  

Re: Query fieldNorm through http

2017-06-08 Thread Mikhail Khludnev
You probably need to configure a TFIDFSimilarity (e.g. ClassicSimilarity) in
the schema and rebuild your index. Otherwise, norm() seems of no use to me.
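For reference, a schema.xml sketch of that suggestion; the fieldType name is hypothetical, and the index must be rebuilt after the change:

```xml
<!-- Per-field-type similarity override (hypothetical type name). -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- ClassicSimilarity is a TFIDFSimilarity, which norm() requires. -->
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>
```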

On Thu, Jun 8, 2017 at 11:24 PM, tstusr  wrote:

> Hi, thanks for reply.
>
> After adding true on distrib, with query
>
> localhost:8983/solr/uda/tvrh?q=usage:stuff&fl={!func}norm(
> usage)&debugQuery=on&distrib=true
>
> I've got something similar, I append the complete solr log.
>
> 2017-06-08 20:22:02.065 INFO  (qtp1205044462-18) [c:uda s:shard2
> r:core_node2 x:uda_shard2_replica1] o.a.s.c.S.Request [uda_shard2_replica1]
> webapp=/solr path=/tvrh
> params={distrib=false&tv=false&df=_text_&debug=false&
> debug=timing&debug=track&qt=/tvrh&fl=id&fl=score&shards.purpose=4&tv.tf
> =true&start=0&fsv=true&shard.url=http://172.16.13.121:7574/solr/uda_
> shard2_replica1/&rid=-uda_shard1_replica1-1496953322061-
> 23&rows=10&tv.tf_idf=true&version=2&q=usage:stuff&
> requestPurpose=GET_TOP_IDS&tv.df=true&NOW=1496953322060&
> isShard=true&wt=javabin&debugQuery=false}
> hits=10 status=0 QTime=0
> 2017-06-08 20:22:02.285 INFO  (qtp1205044462-21) [c:uda s:shard2
> r:core_node2 x:uda_shard2_replica1] o.a.s.c.S.Request [uda_shard2_replica1]
> webapp=/solr path=/tvrh
> params={distrib=false&tv=true&df=_text_&debug=timing&debug=
> track&qt=/tvrh&fl={!func}norm(usage)&fl=id&shards.purpose=320&tv.tf
> =true&shard.url=http://172.16.13.121:7574/solr/uda_
> shard2_replica1/&rid=-uda_shard1_replica1-1496953322061-
> 23&tv.tf_idf=true&version=2&q=usage:stuff&requestPurpose=
> GET_FIELDS,GET_DEBUG&tv.df=true&NOW=1496953322060&ids=
> 8a647a6b-32dd-4fa0-9499-bc387aa0f647,6f7d1b1e-24d2-4e08-9b23-e011a9ba38ce,
> ae4f969b-1d07-4c2c-98ff-8cb04385e036,b37d9965-ed61-45bb-9b6c-b810be0c75be,
> 1e3e3018-0f60-4ab2-8728-04dc4c24ffa5&isShard=true&wt=
> javabin&debugQuery=true}
> status=0 QTime=40
> 2017-06-08 20:22:02.288 ERROR (qtp1205044462-21) [c:uda s:shard2
> r:core_node2 x:uda_shard2_replica1] o.a.s.s.HttpSolrCall
> null:java.lang.UnsupportedOperationException: requires a TFIDFSimilarity
> (such as ClassicSimilarity)
> at
> org.apache.lucene.queries.function.valuesource.NormValueSource.getValues(
> NormValueSource.java:62)
> at
> org.apache.solr.response.transform.ValueSourceAugmenter.transform(
> ValueSourceAugmenter.java:92)
> at org.apache.solr.response.DocsStreamer.next(
> DocsStreamer.java:151)
> at org.apache.solr.response.DocsStreamer.next(
> DocsStreamer.java:57)
> at
> org.apache.solr.response.BinaryResponseWriter$Resolver.writeResultsBody(
> BinaryResponseWriter.java:124)
> at
> org.apache.solr.response.BinaryResponseWriter$Resolver.writeResults(
> BinaryResponseWriter.java:143)
> at
> org.apache.solr.response.BinaryResponseWriter$Resolver.
> resolve(BinaryResponseWriter.java:87)
> at org.apache.solr.common.util.JavaBinCodec.writeVal(
> JavaBinCodec.java:234)
> at
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(
> JavaBinCodec.java:218)
> at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(
> JavaBinCodec.java:325)
> at org.apache.solr.common.util.JavaBinCodec.writeVal(
> JavaBinCodec.java:223)
> at org.apache.solr.common.util.JavaBinCodec.marshal(
> JavaBinCodec.java:146)
> at
> org.apache.solr.response.BinaryResponseWriter.write(
> BinaryResponseWriter.java:51)
> at
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:49)
> at
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
> at org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:538)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:347)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:298)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerColle

Sharding vs single index vs separate collection

2017-06-08 Thread Johannes Knaus
Hi,
I have a SolrCloud setup with document routing (implicit routing with a router
field). As the index holds documents with a publication date, I routed
according to the publication year, since in my case most of the search queries
will have a year specified.


Now, what would be the best strategy, as regards performance (i.e. a huge
amount of queries to be processed), for search queries without any year
specified?

1 - Is it enough to define that these queries should go over all routes (i.e. 
route=year1, year2, ..., yearN)?

2 - Would it be better to add a separate node with a separate index that is not
routed (but maybe sharded/split)? If so, how should I deal with such a
separate index? Is it possible to add it to my existing SolrCloud? Would it go
into a separate collection?

Thanks for your advice.

Johannes 
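For option 1, a sketch of what the two query shapes could look like with an implicit router, assuming a hypothetical collection `pubs` whose shards/routes are named by year; `_route_` restricts the query to one shard, and omitting it fans the request out to all shards:

```shell
# Hypothetical collection "pubs" with one shard/route per publication year.
SOLR="http://localhost:8983/solr/pubs/select"

# Year-scoped query: restrict to the matching route.
Q_YEAR="${SOLR}?q=title:lucene&_route_=2016"

# Year-less query: omit _route_ entirely and Solr fans the
# request out to every shard of the collection.
Q_ALL="${SOLR}?q=title:lucene"

echo "$Q_YEAR"
echo "$Q_ALL"
```

So "going over all routes" does not need an explicit route list; it is the default when no route is given.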

Segment information gets deleted

2017-06-08 Thread Chetas Joshi
Hi,

I am trying to understand what the possible root causes for the
following exception could be.


java.io.FileNotFoundException: File does not exist:
hdfs://*/*/*/*/data/index/_2h.si


I had some long GC pauses while executing some queries, which took some of
the replicas down. But how can that affect the segment metadata of the Solr
indexes?


Thanks!


Re: Sharding vs single index vs separate collection

2017-06-08 Thread Susheel Kumar
You mentioned that most of the searches will use document routing based on
year as the route key, correct? And then you mention a huge amount of
searches without routing. Can you give some numbers on how many will utilise
routing vs. not routing?

In general, we should try to serve all the queries with one
index/collection which can be shared if needed or replicated to serve huge
amount of queries.  Having a separate index should be avoided unless you
have very good reasons.

Thnx



On Thu, Jun 8, 2017 at 5:45 PM, Johannes Knaus  wrote:

> Hi,
> I have a solr cloud setup, with document routing (implicit routing with
> router field). As the index is about documents with a publication date, I
> routed according the publication year, as in my case, most of the search
> queries will have a year specified.
>
>
> Now, what would be the best strategy -as regards performance (i.e. a huge
> amount of queries to be processed)- for search queries without any year
> specified?
>
> 1 - Is it enough to define that these queries should go over all routes
> (i.e. route=year1, year2, ..., yearN)?
>
> 2 - Would it be better to add a separate node with a separate index that
> is not routed (but maybe sharded/splitted)? If so, how should I deal with
> such a separate index? Is it possible to add it to my existing Solr cloud?
> Would it go into a separate collection?
>
> Thanks for your advice.
>
> Johannes


Re: Sharding vs single index vs separate collection

2017-06-08 Thread Susheel Kumar
correction: shared => sharded

On Thu, Jun 8, 2017 at 10:10 PM, Susheel Kumar 
wrote:

> You mentioned most of the searches will use document routing based on year
> as route key, correct? and then you mentioning  huge amount of searches
> again without routing.  Can you give some no# how many will utilise routing
> vs not routing?
>
> In general, we should try to serve all the queries with one
> index/collection which can be shared if needed or replicated to serve huge
> amount of queries.  Having a separate index should be avoided unless you
> have very good reasons.
>
> Thnx
>
>
>
> On Thu, Jun 8, 2017 at 5:45 PM, Johannes Knaus  wrote:
>
>> Hi,
>> I have a solr cloud setup, with document routing (implicit routing with
>> router field). As the index is about documents with a publication date, I
>> routed according the publication year, as in my case, most of the search
>> queries will have a year specified.
>>
>>
>> Now, what would be the best strategy -as regards performance (i.e. a huge
>> amount of queries to be processed)- for search queries without any year
>> specified?
>>
>> 1 - Is it enough to define that these queries should go over all routes
>> (i.e. route=year1, year2, ..., yearN)?
>>
>> 2 - Would it be better to add a separate node with a separate index that
>> is not routed (but maybe sharded/splitted)? If so, how should I deal with
>> such a separate index? Is it possible to add it to my existing Solr cloud?
>> Would it go into a separate collection?
>>
>> Thanks for your advice.
>>
>> Johannes
>
>
>


including a minus sign "-" in the token

2017-06-08 Thread Phil Scadden
We have important entities referenced in indexed documents which follow a
naming convention of geographicname-number, e.g. Wainui-8.
I want the tokenizer to treat Wainui-8 as a single token when indexing, and
when I search with a q of Wainui-8 (must it be specified as Wainui\-8?) I want
it to return docs with Wainui-8 but not with Wainui-9 or plain Wainui.

The docs are PDFs, and I am using Tika to extract the text.

How do I set up Solr for queries like this?
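One possible approach (a sketch, untested against your data): a field type whose WordDelimiterGraphFilter keeps the original hyphenated token via preserveOriginal, so Wainui-8 is indexed as a single token. The type name is hypothetical. Note that a hyphen inside a term is not the NOT operator (that only applies at the start of a clause), so escaping is usually unnecessary once the analyzer stops splitting the token:

```xml
<!-- schema.xml sketch: keep hyphenated names like Wainui-8 whole. -->
<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- With the split/generate options off and preserveOriginal on,
         "Wainui-8" survives as one token instead of "wainui" + "8". -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr admin UI is handy for verifying that Wainui-8 comes out as one token at both index and query time.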

Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Highlighter not working on some documents

2017-06-08 Thread Phil Scadden
Do a search with:
fl=id,title,datasource&hl=true&hl.method=unified&limit=50&page=1&q=pressure+AND+testing&rows=50&start=0&wt=json

and I get back a good list of documents. However, some documents are returning
empty fields in the highlighter. E.g., in the highlight array I have:
"W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}

This entry sits well up the list of results, with good highlighted matches
above and below it. Why would the highlighter be failing?



JMX property keys

2017-06-08 Thread Aristedes Maniatis
I want to monitor my Solr instances using JMX and graph their performance.
Using Zabbix notation, I end up with a key that looks like this:

jmx["solr/suburbs-1547_shard1_replica1:type=standard,id=org.apache.solr.handler.component.SearchHandler","5minRateReqsPerSecond"]


My problem here is that the key contains the replica id "_replica1". But this
of course changes across the hosts in the SolrCloud cluster, so monitoring is
a real pain as I roll out nodes. I need to know which replica is running on
which host.

Why is this so? Is there a way to override how the Solr cores expose themselves 
to JMX?

Please cc me since I'm not subscribed here.

Cheers
Ari



-- 
-->
Aristedes Maniatis
CEO, ish
https://www.ish.com.au
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


