Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
I am trying to import a csv file to my solr core.

It looks like this:

"user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
"A2M","Art Morse","amo...@morsemoving.com","Morse
Moving","Morse","","X","blue0show",""
"ABW","Amy Wiedner","amy.wied...@pyramid-logistics.com","Pyramid","","","
","shawn",""
"J2P","Joan Padal","jo...@bergerallied.com","Berger","","","
","skew3cues",""
"ALB","Anna Bachman","an...@bergerallied.com","Berger","","","
","wary#scan",""
"B1B","Bridget Baker","bba...@reliablevan.com","Reliable","","","
","laps,hear",""
"B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
"B1L","Beverly Leonard","bleon...@reliablevan.com","Reliable","","","
","gail6copy",""
"CMD","Christal Davis","christalda...@smmoving.com","SMMoving","","","
","risk-pair",""
"BEB","Bob Barnum","b...@bergerts.com","Berger","",""," ","mets=pol",""

I have set up the schema via the API, and have all the fields that are
listed on the top line of the csv file.

When I finish the import, it returns no errors. But when I go to look at
the schema, it has created two fields in the managed-schema file:

<field name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" type="text_general"/>

and

<copyField source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_" dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str" maxChars="256"/>


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
I am using this command:

curl '
http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv
'

On Mon, Oct 21, 2019 at 1:22 PM Alexandre Rafalovitch 
wrote:

> What command do you use to get the file into Solr? My guess that you
> are somehow not hitting the correct handler. Perhaps you are sending
> it to extract handler (designed for PDF, MSWord, etc) rather than the
> correct CSV handler.
>
> Solr comes with the examples of how to index CSV command.
> See for example:
>
> https://github.com/apache/lucene-solr/blob/master/solr/example/films/README.txt#L39
> Also reference documentation:
>
> https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html
>
> Regards,
>Alex.
>
> On Mon, 21 Oct 2019 at 13:04, rhys J  wrote:
> >
> > I am trying to import a csv file to my solr core.
> >
> > It looks like this:
> >
> >
> "user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
> > "A2M","Art Morse","amo...@morsemoving.com","Morse
> > Moving","Morse","","X","blue0show",""
> > "ABW","Amy Wiedner","amy.wied...@pyramid-logistics.com
> ","Pyramid","","","
> > ","shawn",""
> > "J2P","Joan Padal","jo...@bergerallied.com","Berger","","","
> > ","skew3cues",""
> > "ALB","Anna Bachman","an...@bergerallied.com","Berger","","","
> > ","wary#scan",""
> > "B1B","Bridget Baker","bba...@reliablevan.com","Reliable","","","
> > ","laps,hear",""
> > "B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
> > "B1L","Beverly Leonard","bleon...@reliablevan.com","Reliable","","","
> > ","gail6copy",""
> > "CMD","Christal Davis","christalda...@smmoving.com","SMMoving","","","
> > ","risk-pair",""
> > "BEB","Bob Barnum","b...@bergerts.com","Berger","",""," ","mets=pol",""
> >
> > I have set up the schema via the API, and have all the fields that are
> > listed on the top line of the csv file.
> >
> > When I finish the import, it returns no errors. But when I go to look at
> > the schema, it has created two fields in the managed-schema file:
> >
> > <field
> > name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
> > type="text_general"/>
> >
> > and
> >
> > <copyField
> > source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
> > dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str"
> > maxChars="256"/>
>


Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

2019-10-21 Thread rhys J
Thank you, that worked perfectly. I can't believe I didn't notice the
separator was a tab.
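For comparison, a sketch of the same import for a comma-separated, quote-encapsulated file (same path and core as above; the separator parameter defaults to the comma, and %22 is a literal "):

curl 'http://localhost:8983/solr/users/update/csv?commit=true&encapsulator=%22&stream.file=/tmp/users.csv'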


using the df parameter to set a default to search all fields

2019-10-22 Thread rhys J
 How do I make Solr search on all fields in a document?

I read the documentation about the df field, and added the following to my
solrconfig.xml:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">_text_</str>
</lst>

in my managed-schema file I have the following:

<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

I have deleted the documents, and re-indexed the csv file.

When I do a search in the API for _text_:amy - which should return 2
documents, I get nothing.

If I do a search for 'amy' in the q field, I still get nothing.

If I do an explicit search for name:amy, I get 2 documents returned.


Re: using the df parameter to set a default to search all fields

2019-10-22 Thread rhys J
> Solr does not have a way to ask for all fields on a search.  If you use
> the edismax query parser, you can specify multiple fields with the qf
> parameter, but there is nothing you can put in that parameter as a
> shortcut for "all fields."  Using qf with multiple fields is the
> cleanest way to do this.
>
>
How would I enter qf parameters in the solrconfig.xml?


> Probably what you are looking for here is to set up one or more
> copyField definitions in your schema, which are configured to copy one
> or more of your other fields to _text_ so it can be searched as a
> catchall field.  I find it useful to name that field "catchall" rather
> than something like _text_ which seems like a special field name, but
> isn't.
>

I did as you suggested, and created a field called 'all_fields' and added
copyFields too. I re-indexed, and this works when I do the search.

Thanks

Rhys
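For reference, a sketch of the catchall setup described above, assuming a text_general type (the all_fields name mirrors this thread; the copyField source pattern is an example):

<field name="all_fields" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="all_fields"/>

And qf defaults can live on the handler in solrconfig.xml; a sketch, with field names borrowed from the users.csv thread:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name email client</str>
  </lst>
</requestHandler>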


Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
Is there some reason that text_general fields are returned as arrays, and
other fields are returned as hashes in the json response from a curl query?

Here's my curl query:

curl "http://10.40.10.14:8983/solr/dbtr/select?indent=on&q=debtor_id:393291";

Here's the response:

response":{"numFound":1,"start":0,"docs":[
  {
"agent":[" "],
"assign_id":["587"],
"client_group":[" "],
"credit_hold":false,
"credit_limit":0.0,
"credit_terms":["N30"],
"currency":["USD"],
"debtor_id":"393291",
"dl1":["165743"],
"dl2":["Great Plains"],
"do_not_call":false,
"do_not_report":false,
"in_aris_date":"2009-10-19T00:00:00Z",
"name1":["CRATE & BARREL"],
"name2":[" "],
"next_contact_date":"2019-10-17T00:00:00Z",
"parent_customer_number":["215976"],
"potential_bad_debt":true,
"priority_followup":false,
"reference_no":["165743"],
"report_as":"CRATE & BARREL",
"report_status":[" "],
"risk":["Low"],
"rms_acct_id":["Berger"],
"salesperson":["Corp House"],
"ssn1":["32"],
"ssn2":["EXEMPT"],
"status_code":["173"],
"status_date":"2016-05-12T00:00:00Z",
"watch_list":[0],
"_version_":1648384727255613441,
"data_signature":"f020b831dd6e553eed217125de13de850d1f4bbc"}]
  }}

As you can see, dates and booleans are hashes, and the text_general fields
(the only thing I can think of that is different) are arrays.

Why is this, and how can I make it return just a hash for the code to
handle?

One thing I did notice in the schema API is that even though I did not
choose MultiValued, it's set to true.

Is this a bug?

Thanks,

Rhys


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-25 Thread rhys J
> 
>
> >  "dl2":["Great Plains"],
> >  "do_not_call":false,
>
> There are no hashes inside the document.  If there were, they would be
> surrounded by {} characters.  The whole document is a hash, which is why
> it has {} characters.  Referring to the snippet that I included above,
> dl2 is mapped in the hash to an array, and do_not_call is mapped to a
> boolean, not a hash.
>
> When there is an array in search results, it happens because the field
> is multiValued ... even if there is only one value, it is placed in an
> array for consistency.
>

So I went back to one of the fields that is multi-valued, which I
explicitly did not choose when I created the field, and I re-created it.

It still made the field multi-valued as true.

Why is this?

Thanks,

Rhys

>
> Thanks,
> Shawn
>


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-28 Thread rhys J
> Did you reload the core/collection or restart Solr so the new schema
> would take effect? If it's SolrCloud, did you upload the changes to
> zookeeper and then reload the collection?  SolrCloud does not use config
> files on disk.
>

So I have not done this part yet, but I noticed some things in the
managed-schema.

The first was this (I did verify that the version of the schema is
up-to-date; I am doing an out-of-the-box install of the latest Solr release):

I checked all the fields that I created (I will paste them below), and they
are NOT multi-valued. However, text_general is set to multi-valued as a
default?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Here are some of the fields I created through the API. When I created them,
I did NOT check the multi-valued box at all. However, when I then go to
look at the field through the API, it is marked Multi-valued. I am assuming
this is because of the fieldType definition above? Why is this set to
default to Multi-valued?

Will I break Solr if I change this to default to not multi-valued?

Thanks,

Rhys
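As a side note, a field declaration can override its type's default, so a sketch like the following (the field name is just an example from this schema) leaves text_general alone while making one field single-valued:

<field name="name1" type="text_general" indexed="true" stored="true" multiValued="false"/>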


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

2019-10-28 Thread rhys J
I forgot to include the fields created through the API:

[field definitions stripped by the list archive]

Thanks,

Rhys

On Mon, Oct 28, 2019 at 11:30 AM rhys J  wrote:

>
>
>> Did you reload the core/collection or restart Solr so the new schema
>> would take effect? If it's SolrCloud, did you upload the changes to
>> zookeeper and then reload the collection?  SolrCloud does not use config
>> files on disk.
>>
>
> So I have not done this part yet, but I noticed some things in the
> managed-schema.
>
>  The first was this (I did verify that the version of the schema is
> up-to-date; I am doing an out-of-the-box install of the latest Solr release):
>
> I checked all the fields that I created (I will paste them below), and
> they are NOT multi-valued. However, text_general is set to multi-valued as
> a default?
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Here are some of the fields I created through the API. When I created
> them, I did NOT check the multi-valued box at all. However, when I then go
> to look at the field through the API, it is marked Multi-valued. I am
> assuming this is because of the fieldType definition above? Why is this set
> to default to Multi-valued?
>
> Will I break Solr if I change this to default to not multi-valued?
>
> Thanks,
>
> Rhys
>


creating a core with a custom managed-schema

2019-11-04 Thread rhys J
I have created a tmp directory where I want to have reside custom
managed-schemas to use when creating cores.

/tmp/solr_schema/CORENAME/managed-schema

Based on this page:
https://lucene.apache.org/solr/guide/7_0/coreadmin-api.html#coreadmin-create
, I am running the following command:

sudo -u solr /opt/solr/bin/solr create -c dbtrphon -schema
/tmp/solr_schemas/dbtrphon/managed-schema

I get this error:

ERROR: Unrecognized or misplaced argument: -schema!

How can I create a core with a custom managed-schema?

I'm trying to implement solr in a development environment, but I would like
to have custom schemas, so that when we move it to live, we don't have to
recreate the schemas by hand again.

Thanks,

Rhys




Re: creating a core with a custom managed-schema

2019-11-04 Thread rhys J
On Mon, Nov 4, 2019 at 1:36 PM Erick Erickson 
wrote:

> Well, just what it says. -schema isn’t a recognized parameter, where did
> you get it? Did you try bin/solr create -help and follow the instructions
> there?
>
I am confused.

This page:
https://lucene.apache.org/solr/guide/7_0/coreadmin-api.html#coreadmin-create

says that schema is a valid parameter, and it explains how to use it.

But when I use the command create, I get an error.

Is there no way to use a custom schema to create a core from the command
line? Will I always have to either hand edit the managed-schema, or use the
API?

Thanks,

Rhys
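A sketch of the usual command-line route, assuming bin/solr's documented -d option (which copies a whole configuration directory rather than taking a single schema file; the paths mirror the thread, and the schema must sit next to a solrconfig.xml inside a conf/ directory):

mkdir -p /tmp/solr_schemas/dbtrphon/conf
cp /tmp/solr_schemas/dbtrphon/managed-schema /tmp/solr_schemas/dbtrphon/conf/
# a matching solrconfig.xml goes in conf/ as well
sudo -u solr /opt/solr/bin/solr create -c dbtrphon -d /tmp/solr_schemas/dbtrphon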


Using solr API to return csv results

2019-11-07 Thread rhys J
If I am using the Solr API to query the core, is there a way to tell how
many documents are found if I use wt=CSV?

Thanks,

Rhys
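One approach (a sketch; the CSV writer itself returns only the header and data rows, with no numFound): run the query once with rows=0 and a JSON response to read the count, then fetch the CSV.

curl 'http://localhost:8983/solr/dbtr/select?q=*:*&rows=0&wt=json'
curl 'http://localhost:8983/solr/dbtr/select?q=*:*&rows=1000&wt=csv'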


different results in numFound vs using the cursor

2019-11-11 Thread rhys J
I am using this logic in Perl:

my $decoded  = decode_json( $solrResponse->{_content} );
my $numFound = $decoded->{response}{numFound};

$cursor     = "*";
$prevCursor = '';

while ( $prevCursor ne $cursor )
{
    my $solrURI = "\"http://[SOLR URL]:8983/solr/";
    $solrURI .= $fdat{core};

    $solrSort    = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
    $solrOptions = "/select?indent=on&rows=$getrows&sort=$solrSort&q=";
    $solrURI .= $solrOptions;
    $solrURI .= $query;

    $solrURI .= ( $prevCursor eq '' ) ? "&cursorMark=*\"" : "&cursorMark=$cursor\"";

    print STDERR "solrURI '$solrURI'\n";

    my $solrResponse = $ua->post( $solrURI );
    my $decoded  = decode_json( $solrResponse->{_content} );
    my $numFound = $decoded->{response}{numFound};

    foreach my $d ( $decoded->{response}{docs} )
    {
        my @docs = @$d;
        print STDERR "size of docs '" . scalar( @docs ) . "'\n";

        foreach my $r ( @docs )
        {
            if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
            {
                push( @solrResults, $r->{debtor_id} );
            }
            elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
            {
                push( @solrResults, $r->{debt_id} );
            }
        }
    }

    $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
    $cursor     = $decoded->{nextCursorMark};

    print STDERR "cursor '$cursor'\n";
    print STDERR "prevCursor '$prevCursor'\n";
    print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
}

print out:

http://[SOLR URL]:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoEmMzkzMjkx

The numFound: 35008
final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this
expected?

I've checked my logic, and I'm using the cursors the way this page is using
them in examples:

https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html

Thanks

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter 
wrote:

>
> Based on the info provided, it's hard to be certain, but reading between
> the lines here are the assumptions I'm making...
>
> 1) your core name is "dbtr"
> 2) the uniqueId field for the "dbtr" core is "debtor_id"
>
> ..are those assumptions correct?
>

Yes they are. Sorry I didn't provide that from the beginning.


> Two key pieces of information that don't seem to be assumable from the
> info you've provided:
>
> a) What is the fieldType of the uniqueKey field in use?
>

It is a textField


> b) how are you determining that "The numFound: 35008"
>
>
I do a preliminary query to the solr core and print out the numFound from
this:

 my $solrResponse = $ua->post( $solrURI );

 my $decoded = decode_json( $solrResponse->{_content} );
 my $numFound = $decoded->{response}{numFound};


> ...
>
> You show the code that prints out "size of solrResults: 22006" but nothing
> in your code ever prints $numFound.  there is a snippet of code at the top
>

I am printing numFound every time it loops. This should remain constant,
because it is the total of all documents found. It's not really necessary
that I am printing it.

The number of docs is the size that I also print, and that is 1000 every
time, until the last little bit, and then it is 6 docs found.


> of your perl logic that seems disconnected from the rest of the code which
> makes me think that before you do anything with a cursor you are already
> parsing some *other* query response to get $numFound that way...
>
>
I am running this query first, to get the cursor set:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id
asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

This sets the cursor, and then returns a cursorMark that I start using in
order to grab 1000 documents at a time.



> ...what exactly does all the code *before* this look like? what is the
> request that you are using to get that initial '$solrResponse' that you
> are parsing to extract '$numFound'  are you sure it's exactly the same as
> the query whose cursor you are iterating over?
>
>
query from before the loop:

"http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id
asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*"

query in the loop:

http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoElMTg1MzE=

I do have some logic to make sure i grab the first 1000 from the first
query, but other than that, it's a simple loop.


> It looks like you are (also) extracting 'my $numFound =
> $decoded->{response}{numFound};' on every (cusor) request ... what do you
> get if add this to your cursor loop...
>
>print STDERR "numFound = $numFound at '$cursor'";
>
numFound is always 35008 because that is how many total documents are
found. The number of docs in the response is the number that I care about,
because that shows me how many came back for this slice.


> ...because unless documents are being added/deleted as you iterate over
> hte cursor, the numFound value should be consistent on each request.
>
>
numFound is consistently 35008.

Thanks

Rhys


using fq means no results

2019-11-12 Thread rhys J
If I do this query in the browser:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8

I get 84662 results.

If I do this query:

http://10.40.10.14:8983/solr/debt/select?q=(clt_ref_no:+owl-2924-8)^=1.0+clt_ref_no:owl-2924-8&fq=clt_ref_no

I get 0 results.

Why does using fq do this?

What am I missing in my query?

Thanks,

Rhys


Re: using fq means no results

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 11:57 AM Erik Hatcher 
wrote:

> fq is a filter query, and thus narrows the result set provided by the q
> down to what also matches all specified fq's.
>
>
So this can be used instead of scoring? Or alongside scoring?


> You gave it a query, "cat_ref_no", which literally looks for that string
> in your default field.   Looking at your q parameter, cat_ref_no looks like
> a field name, and your fq should probably also have a value for that field
> (say fq=cat_ref_no:owl-2924-8)
>
> Use debug=true to see how your q and fq's are parsed, and that
> should shed some light on the issue.
>
>
Thank you for your help!

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter 
wrote:

>
> : > a) What is the fieldType of the uniqueKey field in use?
> : >
> :
> : It is a textField
>
> whoa... that's not normal .. what *exactly* does the fieldType declaration
> (with all analyzers) look like, and what does the <field> declaration
> look like?
>
>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


> you should really never use TextField for a uniqueKey ... it's possible,
> but incredibly tricky to get "right".
>
>
I am going to adjust my schema, re-index, and try again. See if that
doesn't fix this problem. I didn't know that having the uniqueKey be a
textField was a bad idea.


> Independent from that, "sorting" on a TextField doesn't always do what you
> might think (again: depending on the analysis in use)
>
> With a cursorMark you have other factors to consider: I bet what's
> happening is that the post-analysis terms for your docs result in
> duplicate values, so the cursorMark is skipping all docs that have the
> same (post analysis) sort value ... this could also manifest itself in
> other weird ways, like trying to deleteById.
>
> Step #1: switch to using a simple StrField for your uniqueKey field and
> see if htat solves all your problems.
>
>
Thanks, doing this now.

Rhys


Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J
> : I am going to adjust my schema, re-index, and try again. See if that
> : doesn't fix this problem. I didn't know that having the uniqueKey be a
> : textField was a bad idea.
>
>
> https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey
>
> "The fieldType of uniqueKey must not be analyzed"
>
> (hence my comment about "possible, but hard to get right" ... you can use
> something like the KeywordTokenizer, but at that point you might as well
> use StrField except in some really esoteric special situations)
>
>
Good news. I added a field called ID, and made it string. Then I deleted
documents, re-indexed my data, and tried the search again.

Now solrResults size and numFound size are exactly the same.

Thanks for your help.

Rhys
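For reference, a sketch of the shape that resolved this (the stock managed-schema declares its id field the same way):

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>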


date fields and invalid date string errors

2019-11-13 Thread rhys J
I have date fields in my documents that are just YYYY-MM-DD.

I set them as pdate fields in the schema as such:

[field definitions stripped by the list archive]

When I use the API to do a search and try:

2018-01-01
[2018-01-01 TO NOW]

I get 'Invalid Date String'.

Did I type my data wrong in the schema? Is there something I'm missing from
the field itself?

According to this page, I should be able to query on just YYYY or YYYY-MM
or YYYY-MM-DD.

https://lucene.apache.org/solr/guide/6_6/working-with-dates.html

Thanks,

Rhys


Re: date fields and invalid date string errors

2019-11-13 Thread rhys J
> If you use DateRangeField instead of DatePointField for your field's
> class, then you can indeed use partial timestamps for both indexing and
> querying.  This only works with DateRangeField.
>
>
I don't see that as an option in the API? Do I need to change what pdate's
type is in the managed-schema for it to take effect?

As in:

<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

to

<fieldType name="pdate" class="solr.DateRangeField"/>

Thanks,

Rhys


Re: date fields and invalid date string errors

2019-11-13 Thread rhys J
> You could do it that way ... but instead, I'd create a new fieldType,
> not change an existing one.  The existing name is "pdate" which implies
> "point date".  I would probably go with "daterange" or "rdate" as the
> name, but that is completely up to you.
>
>
I did that, deleted docs, stopped, started solr, and then re-indexed. And
it's working like I expect it to.

Thanks for the help.

Rhys
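A sketch of what that might look like in the schema (the type name follows the advice above; the field name is only an example, since the actual date fields aren't shown in this thread):

<fieldType name="rdate" class="solr.DateRangeField"/>
<field name="status_date" type="rdate" indexed="true" stored="true"/>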


Query More Than One Core

2019-11-13 Thread rhys J
I have more than one core. Each core represents one database table.

They are coordinated by debt_id/debtor_id, so we can do join statements on
them with Sybase/SQL.

Is there a way to query more than one core at a time, or do I need to do
separate queries per core, and then somehow with perl aggregate them into
one list?

Thanks,

Rhys


Re: Query More Than One Core

2019-11-13 Thread rhys J
On Wed, Nov 13, 2019 at 3:16 PM Jörn Franke  wrote:

> You can use nested indexing and Index both types of documents in one core.
>
> https://lucene.apache.org/solr/guide/8_1/indexing-nested-documents.html


I had read that, but it doesn't really fit our needs right now.

I figured out how to do a join like so:

http://localhost:8983/solr/debt/select?indent=on&rows=100&sort=id asc&q=(debt_id:570856 OR reference_no: *570856*)&fq={!join from=debtor_id to=debt_id fromIndex=dbtr}ssn1:12


However, what is the use case for Solr if you already have a database?
>

The use case is that we have an old search tool that uses the db, but it's
painfully slow, and it doesn't do fuzzy searches very well, or handle
things like searching for phone numbers without it relying on a lot of
regular expressions. A search engine speeds things up, and gets more
precise results.

Thanks,

Rhys


using gt and lt in a query

2019-11-14 Thread rhys J
I am trying to duplicate this line from a db query:

(debt.orig_princ_amt > 0 AND debt.princ_paid > 0 AND debt.orig_princ_amt >
debt.princ_paid)

I have the following, but it returns no results:

http://localhost:8983/solr/debt/select?q=orig_princ_amt: 0 TO * AND princ_paid: 0 TO * AND gt(orig_princ_amt, princ_paid)


I should have 1075459 results, but I get 0.

Thanks,

Rhys


Re: using gt and lt in a query

2019-11-14 Thread rhys J
> Range queries are done with brackets and/or braces.  A square bracket
> indicates that the range should include the precise value mentioned, and
> a curly brace indicates that the range should exclude the precise value
> mentioned.
>
>
> https://lucene.apache.org/solr/guide/8_2/the-standard-query-parser.html#TheStandardQueryParser-RangeSearches
>
>
But I'm not doing a range, I'm doing a query on whether one field is
greater than another field. Or am I missing something here?

Thanks,

Rhys


Re: using gt and lt in a query

2019-11-14 Thread rhys J
On Thu, Nov 14, 2019 at 1:28 PM Erick Erickson 
wrote:

> You might be able to make this work with function queries….
>
>
>
I managed to decipher something along the lines of this:

http://10.40.10.14:8983/solr/debt/select?q=orig_princ_amt: 0 TO * AND princ_paid: 0 TO * &fq={!frange l=0}if( gt(orig_princ_amt, princ_paid), 1, 0 )

but it's still not giving me the entire results that the database gives. So
I'm not sure what I am missing?

Thanks,

Rhys
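A sketch of one way the original SQL condition might be expressed, assuming orig_princ_amt and princ_paid are single-valued numeric fields: exclusive range queries cover the two "> 0" tests, and the frange lower bound should be 1 rather than 0, since if() returns 1 for matching docs and 0 for everything else (l=0 admits both):

http://localhost:8983/solr/debt/select?q=*:*&fq=orig_princ_amt:{0 TO *}&fq=princ_paid:{0 TO *}&fq={!frange l=1}if(gt(orig_princ_amt,princ_paid),1,0)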


using NOT or - to exclude results with a textField type

2019-11-15 Thread rhys J
I'm trying to exclude results based on the documentation about the boolean
NOT symbol, but I keep getting errors.

I've tried:

http://localhost:8983/solr/debt/select?q=clt_ref_no:-”owl-2924-8”

and

http://localhost:8983/solr/debt/select?q=clt_ref_no:NOT”owl-2924-8”

I have tried with and without quotes too.

Am I not able to use the NOT with a textField?

Here are the errors I get from the browser:

"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'clt_ref_no:-”=owl-2924-8”': Encountered \" \"-\" \"- \"\" at line 1,
column 11.\nWas expecting one of:\n ...\n\"(\" ...\n
   \"*\" ...\n ...\n ...\n ...\n
...\n ...\n\"[\" ...\n\"{\"
...\n ...\n\"filter(\" ...\n ...\n",

Thanks,

Rhys
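For reference, a sketch of the negation syntax the reference guide describes, written with straight quotes (the queries above contain curly ” characters, which the parser rejects) and with a match-all clause to subtract from:

http://localhost:8983/solr/debt/select?q=*:* AND -clt_ref_no:"owl-2924-8"
http://localhost:8983/solr/debt/select?q=*:* NOT clt_ref_no:"owl-2924-8"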


attempting to get an exact match on a textField

2019-11-15 Thread rhys J
I am trying to use the API to get an exact match on clt_ref_no.

At one point, I was using ""s to enclose the text such as:

clt_ref_no: "OWL-2924-8", and I was getting 5 results. Which is accurate.

Now when I use it, I only get one match.

If I try to build the url in perl, and then post the url, my response is
this:

http://localhost:8983/solr/debt/select?indent=on&rows=1000&sort=id%20asc&q=(%20clt_ref_no:%22OWL-2924-8%E2%80%9D%20OR%20contract_number:%22OWL-2924-8%22%20)&fq={!join%20from=debtor_id%20to=debt_id%20fromIndex=dbtr}&cursorMark=*&debug=true


Breaking that down, I've got:

q=( clt_ref_no: "OWL-2924-8" OR contract_number: "OWL-2924-8" )
fq= {!join from=debtor_id to=debt_id fromIndex=dbtr}

"error":{
"trace":"java.lang.NullPointerException\n\tat
org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:584)\n\tat
java.base/java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)\n\tat
org.apache.solr.util.ConcurrentLRUCache.get(ConcurrentLRUCache.java:130)\n\tat
org.apache.solr.search.FastLRUCache.get(FastLRUCache.java:165)\n\tat
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:815)\n\tat
org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1026)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1541)\n\tat
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1421)\n\tat
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:568)\n\tat
org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1484)\n\tat
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:398)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:305)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2578)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:780)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:505)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProd

Re: attempting to get an exact match on a textField

2019-11-16 Thread rhys J
I figured it out. It was a combination of problems.

1. Not fully indexing the data; that made the result set come back smaller
than expected.
2. Using the join statement without adding a field at the end of it to
search the other core on.


using scoring to find exact matches while using a cursormark

2019-11-18 Thread rhys J
I am trying to use scoring to get the expected results at the top of the
stack when doing a Solr query.

I am looking up clt_ref_no: OWL-2924-8^2 OR contract_number: OWL-2924-8^2

If I use the following query in the browser, I get the expected results at
the top of the returned values from Solr.

{
  "responseHeader":{
"status":0,
"QTime":41,
"params":{
  "q":"( clt_ref_no:OWL-2924-8 ^2 OR contract_number:OWL-2924-8^2 )",
  "indent":"on",
  "fl":"clt_ref_no, score",
  "rows":"1000"}},
  "response":{"numFound":84663,"start":0,"maxScore":25.664566,"docs":[
  {
"clt_ref_no":"OWL-2924-8",
"score":25.664566},
  {
"clt_ref_no":"OWL-2924-8",
"score":25.664566},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"OWL-2924-8/73847",
"score":23.509575},
  {
"clt_ref_no":"U615-2924-8",
"score":19.244316},
  {
"clt_ref_no":"M1057-2924-8/88543",
"score":17.650301},

If I add the sorting needed for the cursor, my results change
dramatically, and the exact matches are not at the top of the stack.
Example:



{
  "responseHeader":{
"status":0,
"QTime":80,
"params":{
  "q":"( clt_ref_no:OWL-2924-8 ^2 OR contract_number:OWL-2924-8^2 )",
  "indent":"on",
  "fl":"clt_ref_no, score",
  "sort":"score asc, id asc",
  "rows":"1000"}},
  "response":{"numFound":84663,"start":0,"maxScore":25.664566,"docs":[
  {
"clt_ref_no":"MMRO-1258-13/MMRO-1258-13/8",
"score":1.3380225},
  {
"clt_ref_no":"MMMP-151-14/MMMP-151-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMRO-806-14/MMRO-806-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMMP-44-14/MMMP-44-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMRO-45-13/MMRO-45-13/8",
"score":1.3380225},
  {
"clt_ref_no":"MMIN-202-14/MMIN-202-14/8",
"score":1.3380225},
  {
"clt_ref_no":"MMTC-1457-14/MMTC-1457-14/8",
"score":1.3380225},
  {

Should I not be sorting on score? I thought sorting on score was how I
would get the exact matches to return?

If I add in sort=score asc to the first query, it does what the second
query does, and not have expected matches floating to the top of the
results.

Thanks,

Rhys


Re: using scoring to find exact matches while using a cursormark

2019-11-18 Thread rhys J
> ...so w/o a score param you're getting the default sort: score "desc"
> (descending)...
>
>
> https://lucene.apache.org/solr/guide/8_3/common-query-parameters.html#CommonQueryParameters-ThesortParameter
>
> "If the sort parameter is omitted, sorting is performed as though
> the
> parameter were set to score desc."
>
>
>
Oh my goodness, I didn't realize the default was desc! Thanks for pointing
that out. I adjusted my query, and now it's getting the sorting right.

Thanks so much,

Rhys


Attempting to do a join with 3 cores

2019-11-18 Thread rhys J
I was hoping to be able to do a join with 3 cores.

I found this page that seemed to indicate it's possible?

https://stackoverflow.com/questions/52380302/solr-how-to-join-three-cores

Here's my query:

http://localhost:8983/solr/dbtrphon/select?indent=on&rows=1000&sort=score desc, id desc&q=(phone:*Meredith* OR descr:*Meredith*){!join from=debtor_id to=debt_id fromIndex=debt}*&fq={!join from=debtor_id to=debtor_id fromIndex=dbtr}(ssn1:12 OR ssn1:33) AND (assign_id:584 OR assign_id:583)&cursorMark=*

{
  "responseHeader":{
"status":400,
"QTime":14,
"params":{
  "q":"(phone:*Meredith* OR descr:*Meredith*){!join from=debtor_id
to=debt_id fromIndex=debt}*",
  "indent":"on",
  "cursorMark":"*",
  "sort":"score desc, id desc",
  "fq":"{!join from=debtor_id to=debtor_id fromIndex=dbtr
}(ssn1:12 OR ssn1:33) AND (assign_id:584 OR assign_id:583)",
  "rows":"1000"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field: \"debtor_id\"",
"code":400}}


The schema for dbtrphon has debtor_id, the schema for debt has debt_id, and
the schema for dbtr has debtor_id. These fields all should be able to join,
but I get this error.

I've tried substituting debt_id for the debtor_id in the second join, but I
get the same error 'undefined field 'debt_id'.

I am unsure what I'm missing?

Thanks,

Rhys


exact matches on a join

2019-11-19 Thread rhys J
I am trying to do a join, which I have working properly on 2 cores.

One core has report_as, and the other core has debt_id.

If I enter report_as: "Freeman", I expect to get 272 results. But I get
557.

When I do a database search on the matched fields, it shows me that
report_as: "Freeman" is matching also on 'A-1 Freeman'.

I have tried boosting the score as report_as: "Freeman"^2, but I get the
same results from the API, and from the browser itself.

Here is my query:

{
  "responseHeader":{
"status":0,
"QTime":5,
"params":{
  "q":"( * )",
  "indent":"on",
  "fl":"debt_id, score",
  "cursorMark":"*",
  "sort":"score desc, id desc",
  "fq":"{!join from=debtor_id to=debt_id fromIndex=dbtr}(
report_as:\"Freeman\"^2)",
  "rows":"1000"}},
  "response":{"numFound":557,"start":0,"maxScore":1.0,"docs":[
  {
"debt_id":"485435",
"score":1.0},
  {
"debt_id":"485435",
"score":1.0},
  {
"debt_id":"482795",
"score":1.0},
  {
"debt_id":"482795",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},
  {
"debt_id":"482794",
"score":1.0},

SKIP



{
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",
"score":1.0},
  {
"debt_id":"396925",


These are the correct matches that I can verify with the
database, but their scores are the same as the ones matching on
'A-1 Freeman'.

Is my scoring set up wrong?

Thanks,

Rhys


Re: exact matches on a join

2019-11-21 Thread rhys J
On Thu, Nov 21, 2019 at 8:04 AM Jason Gerlowski 
wrote:

> Are these fields "string" or "text" fields?
>
> Text fields receive analysis that splits them into a series of terms.
> That's why the query "Freeman" matches the document "A-1 Freeman".
> "A-1 Freeman" gets split up into multiple terms, and the "Freeman"
> query matches one of those terms.  Text fields are what you use when
> you want matches to have some wiggle room based on your analyzers.
>
> String fields are much more geared towards exact matches.  No analysis
> is done, so a query for "Freeman" would only match docs who have that
> value identically.
>
>
Thanks, this was the conclusion I came to too. When I asked, they decided
that those matches were acceptable, and to keep the field a textField.

Rhys
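For reference, a sketch of a common middle ground, in case exact matching is wanted later: keep the text field for loose matching and copy it into a string field for exact matching (the report_as_exact name is hypothetical):

<field name="report_as" type="text_general" indexed="true" stored="true"/>
<field name="report_as_exact" type="string" indexed="true" stored="false"/>
<copyField source="report_as" dest="report_as_exact"/>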


Highlighting on typing in search box

2019-11-21 Thread rhys J
Are there any recommended APIs or code examples of using Solr and then
highlighting results below the search box?

I'm trying to implement a search box that will search solr as the user
types, if that makes sense?

Thanks,

Rhys


Re: Highlighting on typing in search box

2019-11-21 Thread rhys J
Thank you both! I've got an autocomplete working on a basic format right
now, and I'm working on implementing it to be smart about which core it
searches.

On Thu, Nov 21, 2019 at 11:43 AM Jörn Franke  wrote:

> It sounds like you look for a suggester.
>
> You can use the suggester of Solr.
>
> For the visualization part: Angular has a suggestion box that can ingest
> the results from Solr.
>
> > Am 21.11.2019 um 16:42 schrieb rhys J :
> >
> > Are there any recommended APIs or code examples of using Solr and then
> > highlighting results below the search box?
> >
> > I'm trying to implement a search box that will search solr as the user
> > types, if that makes sense?
> >
> > Thanks,
> >
> > Rhys
>
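For reference, a minimal sketch of a Solr suggester along the lines Jörn mentions (the component and handler names follow the reference guide's example; the field name is an assumption):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name1</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>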


How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread rhys J
I'm implementing an autocomplete search box for Solr.

I'm using JSON as my response style, and this is the jquery code.


var url = 'http://10.40.10.14:8983/solr/' + core + '/select/?q=' + queryField +
    query + '&version=2.2&hl=true&start=0&rows=50&indent=on&wt=json&callback=?&json.wrf=on_data';

jQuery_3_4_1.getJSON(url);

___

function on_data(data)
{
  var docs = data.response.docs;

  jQuery_3_4_1.each(docs, function(i, item) {

    var trLink = '<tr><td>' + item.debtor_id + '</td>';
    trLink += '<td>' + item.name1 + '</td>';
    trLink += '<td>' + item.dl1 + '</td>';
    trLink += '</tr>';

    jQuery_3_4_1('#resultsTable').prepend(jQuery_3_4_1(trLink));
  });
}

the jQuery_3_4_1 variable is replacing $ because I needed to have 2
different versions of jQuery running in the same document.

I'd like to know if there's something I'm missing that will indicate which
core I've used in Solr based on the response.

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-22 Thread rhys J
On Fri, Nov 22, 2019 at 1:39 PM David Hastings 
wrote:

> 2 things (maybe 3):
> 1.  dont have this code facing a client thats not you, otherwise anyone
> could view the source and see where the solr server is, which means they
> can destroy your index or anything they want.  put at the very least a
> simple api/front end in between the javascript page for the user and the
> solr server
>

Is there a way I can fix this?


> 2. i dont think there is a way, you would be better off indexing an
> indicator of sorts into your documents
>

Oh this is a good idea.

Thanks!

> 3. the jquery in your example already has the core identified, not sure why
> the receiving javascript wouldn't be able to read that variable unless im
> missing something.
>
>
There's another function on_data that is being called by the url, which
does not have any indication of what the core was, only the response from
the url.

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 2:10 AM Erik Hatcher  wrote:

> add &core=&echoParams=all and the parameter will be in the response
> header.
>
>Erik
>

Thanks. I just tried this, and all I got was this response:

http://localhost:8983/solr/dbtr/select?q=debtor_id%3A%20393291&echoParams=all



{
  "responseHeader":{
"status":0,
"QTime":14,
"params":{
  "q":"debtor_id: 393291",
  "df":"_text_",
  "rows":"10",
  "echoParams":"all"}},
  "response":{"numFound":1,"start":0,"docs":[
  {

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
> if you are taking the PHP route for the mentioned server part then I would
> suggest
> using a client library, not plain curl.  There is solarium, for instance:
>
> https://solarium.readthedocs.io/en/stable/
> https://github.com/solariumphp/solarium
>
> It can use curl under the hood but you can program your stuff on a higher
> level,
> against an API.
>
>
I am using jquery, so I am using the json package to send and decode the
json that solr sends. I hope that makes sense?

Thanks for your tip!

Our pages are a combo of jquery, javascript, and perl.


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 1:10 AM Paras Lehana 
wrote:

> Hey rhys,
>
> What David suggested is what we do for querying Solr. You can figure out
> our frontend implementation of Auto-Suggest by seeing the AJAX requests
> fired when you type in the search box on www.indiamart.com.
>

 That is pretty cool.

I've ended up with something that highlights the match in a results table.
It's working, and the client seems happy with that implementation for now.


> Why are you using two jQuery files? If you have a web server, you already
> know that which core you queried from. Just convert the Solr JSON response
> and add the key "core" and return the modified JSON response. Keep your
> front-end query simple - just describe your query. All the other parameters
>

We are using 2 jQuery versions because this page runs inside a tool that
has an old version of jQuery attached to it. Because of that, I'm doing the
trick where you can load 2 different versions at the same time.


> can be added on the web server side. Anyways, why do you want to know the
> core name?
>

I need to know the core name, because each core has different values in the
documents, and I want to display those values based on which core was
queried.

This is kind of like an omnibox, where the user will just start typing
stuff into it. Based on what is typed, I will search a different core to
provide the right answer to them.

Thanks,

Rhys


Re: How to tell which core was used based on Json or XML response from Solr

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 10:43 AM David Hastings <
hastings.recurs...@gmail.com> wrote:

> you missed the part about adding &core= to the query:
> &echoParams=all&core=mega
>
> returns for me:
>
>  "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "q":"*:*",
>   "core":"mega",
>   "df":"text",
>   "q.op":"AND",
>   "rows":"10",
>   "echoParams":"all"}},
>

You're right, I missed that. I added it, and it works perfectly.


>
> also we are a perl shop as well, you could implement something as
> simple as this in a cgi script or something:
>
>
> my $url = $searcher;
> my $agent = new LWP::UserAgent;
> my $request = POST($url, $data);
> my $response = $agent->request($request)->decoded_content;
>
>
>
Thanks for this tip.

Rhys


Using an & in an indexed field and then querying for it.

2019-11-25 Thread rhys J
I have some fields that have text like so:

Reliable Van & Storage.

They indexed fine when I used curl and csv files to read them into the core.

Now when I try to query for them, I get errors.

If I try escaping it like so \&, I get the following error:

on_data({
  "responseHeader":{
"status":400,
"QTime":1,
"params":{
  "q":"name1:( reliable van \\",
  "core":"dbtr",
  "json.wrf":"on_data",
  "hl":"true",
  "indent":"on",
  "start":"0",
  "stor )":"",
  "callback":"?",
  "rows":"50",
  "version":"2.2",
  "wt":"json"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.parser.TokenMgrError"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'name1:(
reliable van \\': Lexical error at line 1, column 23.  Encountered:
<EOF> after : \"\"",
"code":400}})

If I try html encoding it like so: & I get the following error:



on_data({
  "responseHeader":{
"status":400,
"QTime":3,
"params":{
  "q":"name1:( reliable van ",
  "core":"dbtr",
  "json.wrf":"on_data",
  "hl":"true",
  "indent":"on",
  "amp; stor )":"",
  "start":"0",
  "callback":"?",
  "rows":"50",
  "version":"2.2",
  "wt":"json"}},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.parser.ParseException"],
"msg":"org.apache.solr.search.SyntaxError: Cannot parse 'name1:(
reliable van ': Encountered \"<EOF>\" at line 1, column 21.\nWas
expecting one of:\n ...\n ...\n ...\n
\"+\" ...\n\"-\" ...\n ...\n\"(\" ...\n\")\"
...\n\"*\" ...\n ...\n ...\n
...\n ...\n ...\n\"[\" ...\n
\"{\" ...\n ...\n\"filter(\" ...\n ...\n
  ",
"code":400}})


How can I search for a field that has an & without breaking the
parser, or is it not possible because & is used as a special
character?

Thanks,

Rhys


Re: Using an & in an indexed field and then querying for it.

2019-11-25 Thread rhys J
On Mon, Nov 25, 2019 at 2:36 PM David Hastings 
wrote:

> its breaking on the & because its in the url and you are most likely
> sending a get request to solr.  you should send it as post or as %26
>
>
The package I am using doesn't have a postJSON function available, so I'm
using their getJSON function.

I changed the & to %26, and that fixed things.

Thanks,

Rhys
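For reference, a sketch of the POST route with curl, which sidesteps the encoding problem (--data-urlencode percent-encodes the & for you):

curl 'http://localhost:8983/solr/dbtr/select' --data-urlencode 'q=name1:"Reliable Van & Storage"' --data-urlencode 'wt=json'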


Search returning unexpected matches at the top

2019-12-06 Thread rhys J
I have a search box that is just searching every possible core, and every
possible field.

When I enter 'owl-2924-8', I expect the clt_ref_no of OWL-2924-8 to float
to the top, however it is the third result in my list.

Here is the code from the search:

on_data({
  "responseHeader":{
"status":0,
"QTime":31,
"params":{
  "hl":"true",
  "indent":"on",
  "fl":"debt_id, clt_ref_no",
  "start":"0",
  "sort":"score desc, id asc",
  "rows":"500",
  "version":"2.2",
  "q":"clt_ref_no:owl\\-2924\\-8 debt_descr:owl\\-2924\\-8
comments:owl\\-2924\\-8 reference_no:owl\\-2924\\-8 ",
  "core":"debt",
  "json.wrf":"on_data",
  "urlquery":"owl-2924-8",
  "callback":"?",
  "wt":"json"}},
  "response":{"numFound":85675,"start":0,"docs":[
  {
"clt_ref_no":"2924",
"debt_id":"574574"},
  {
"clt_ref_no":"2924",
"debt_id":"598663"},
  {
"clt_ref_no":"OWL-2924-8",
"debt_id":"624401"},
  {
"clt_ref_no":"OWL-2924-8",
"debt_id":"628157"},
  {
"clt_ref_no":"2924",
"debt_id":"584807"},
  {
"clt_ref_no":"U615-2924-8",
"debt_id":"628310"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"596713"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"624401"},
  {
"clt_ref_no":"OWL-2924-8/73847",
"debt_id":"628157"},
  {

I'm not interested in having a specific search with quotes around it,
because this is searching everything, so it's a fuzzy search. But I am
interested in understanding why 'owl-2924-8' doesn't come out on top of the
search.

As you can see, I'm sorting by score and then id, which should take care of
things, but it's not.

Thanks,

Rhys


Re: Search returning unexpected matches at the top

2019-12-06 Thread rhys J
On Fri, Dec 6, 2019 at 11:21 AM David Hastings  wrote:

> whats the field type for:
> clt_ref_no
>

It is a text_general field because it can have numbers or alphanumeric
characters.

> *_no isn't a default dynamic character, and owl-2924-8 usually translates
> into
> owl 2924 8
>
>
So it's matching on word breaks, am I understanding properly?

It's matching all things that match either 'owl' or '2924' or '8'?

Thanks,

Rhys


Re: Search returning unexpected matches at the top

2019-12-09 Thread rhys J
On Mon, Dec 9, 2019 at 12:06 AM Paras Lehana 
wrote:

> Hi Rhys,
>
> Use Solr Query Debugger
> <
> https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl?hl=en
> >
> Chrome
> Extension to see what's making up the score for both of them. I guess
> fieldNorm should impact but that should not be the only thing - there's
> another catch here.
>

Oh wow, thank you for this!

I figured out that if I added quotes to the terms, and then added ^2 to the
score, that it floated to the top just like I expected.

Thanks,

Rhys


Re: Search returning unexpected matches at the top

2019-12-10 Thread rhys J
On Tue, Dec 10, 2019 at 12:35 AM Paras Lehana 
wrote:

> That's great.
>
> But I also wanted to know why the concerned document was scored lower in
> the original query. Anyways, glad that the issue is resolved. :)
>
>
That I need to look into. If I find an answer, I will let you know.

Thanks,

Rhys


Re: Search returning unexpected matches at the top

2019-12-10 Thread rhys J
product of:
  5.304949 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    14381 = n, number of documents containing term
    2895437 = N, total number of documents with field
  0.33170202 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    1.0 = freq, occurrences of term within document
    1.2 = k1, term saturation parameter
    0.75 = b, length normalization parameter
    3.0 = dl, length of field
    1.57457 = avgdl, average length of field
3.8241074 = weight(clt_ref_no:2924 in 32270) [SchemaSimilarity], result of:
  3.8241074 = score(freq=1.0), product of:
    11.528743 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      28 = n, number of documents containing term
      2895437 = N, total number of documents with field
    0.33170202 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      3.0 = dl, length of field
      1.57457 = avgdl, average length of field
1.1763713 = weight(clt_ref_no:8 in 32270) [SchemaSimilarity], result of:
  1.1763713 = score(freq=1.0), product of:
    3.5464702 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      83464 = n, number of documents containing term
      2895437 = N, total number of documents with field
    0.33170202 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      3.0 = dl, length of field
      1.57457 = avgdl, average length of field
4.0874248 = weight(reference_no:2924 in 32270) [SchemaSimilarity], result of:
  4.0874248 = score(freq=1.0), product of:
    11.528412 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      28 = n, number of documents containing term
      2894478 = N, total number of documents with field
    0.35455227 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      3.0 = dl, length of field
      1.77578 = avgdl, average length of field
1.2573512 = weight(reference_no:8 in 32270) [SchemaSimilarity], result of:
  1.2573512 = score(freq=1.0), product of:
    3.5463068 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
      83450 = n, number of documents containing term
      2894478 = N, total number of documents with field
    0.35455227 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
      1.0 = freq, occurrences of term within document
      1.2 = k1, term saturation parameter
      0.75 = b, length normalization parameter
      3.0 = dl, length of field
      1.77578 = avgdl, average length of field

"584807-3":
  11.142687 = sum of:
    6.159883 = weight(clt_ref_no:2924 in 27502) [SchemaSimilarity], result of:
      6.159883 = score(freq=1.0), product of:
        11.528743 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          28 = n, number of documents containing term
          2895437 = N, total number of documents with field
        0.5343066 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          1.0 = dl, length of field
          1.57457 = avgdl, average length of field
    4.9828043 = weight(reference_no:2924 in 27502) [SchemaSimilarity], result of:
      4.9828043 = score(freq=1.0), product of:
        11.528412 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          28 = n, number of documents containing term
          2894478 = N, total number of documents with field
        0.4322195 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          2.0 = dl, length of field
          1.77578 = avgdl, average length of field

"628310-6004":
  10.345255 = sum of:
    3.8241074 = weight(clt_ref_no:2924 in 2391) [SchemaSimilarity], result of:
      3.8241074 = score(freq=1.0), product of:
        11.528743 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          28 = n, number of documents containing term
          2895437 = N, total number of documents with field
        0.33170202 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          3.0 = dl, length of field
          1.57457 = avgdl, average length of field
    1.1763713 = weight(clt_ref_no:8 in 2391) [SchemaSimilarity], result of:
      1.1763713 = score(freq=1.0), product of:
        3.5464702 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          83464 = n, number of documents containing term
          2895437 = N, total number of documents with field
        0.33170202 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          3.0 = dl, length of field
          1.57457 = avgdl, average length of field
    4.0874248 = weight(reference_no:2924 in 2391) [SchemaSimilarity], result of:
      4.0874248 = score(freq=1.0), product of:
        11.528412 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          28 = n, number of documents containing term
          2894478 = N, total number of documents with field
        0.35455227 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          3.0 = dl, length of field
          1.77578 = avgdl, average length of field
    1.2573512 = weight(reference_no:8 in 2391) [SchemaSimilarity], result of:
      1.2573512 = score(freq=1.0), product of:
        3.5463068 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          83450 = n, number of documents containing term
          2894478 = N, total number of documents with field
        0.35455227 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          1.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          3.0 = dl, length of field
          1.77578 = avgdl, average length of field



user solr created by install not working with default password

2019-12-11 Thread rhys J
I installed Solr following the directions on this site:

https://lucene.apache.org/solr/guide/6_6/installing-solr.html

I am running standalone Solr with no authentication added because it is all
in-house with no access to outside requests.

When I try to su solr, using the password mentioned here:
https://lucidworks.com/post/securing-solr-basic-auth-permission-rules/, I
get an authentication failure.

I'm trying to chase down a bug, and I need to be able to see the results of
some commands from the user solr's perspective.

What am I doing wrong?

Thanks,

Rhys


backing up and restoring

2019-12-11 Thread rhys J
I made backups with the following command:

sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=backup&location=/tmp/solrbackups/debt/'

I double checked that I had made the backup, and I had a backup.

To test the restore function, I deleted documents with the following
command:

sudo -u solr curl http://localhost:8983/solr/debt/update -H "Content-type:
text/xml" --data-binary '<delete><query>*:*</query></delete>'

I stopped and started the service to verify that there are 0 documents in
the debt core.

Then I ran the following command:

sudo -u solr curl
'http://localhost:8983/solr/debt/replication?command=restore&name=/tmp/solrbackups/debt/snapshot.20191211175715254'

response:

 "responseHeader":{
"status":0,
"QTime":4},
  "status":"OK"}

Then I went to the web API to check the number of documents I had in the
core. It still says 0.

I stopped and started the service to just be sure that that wasn't the
problem, but it still says there are 0 documents.

Am I missing a step in how i restore a backup?

Thanks,

Rhys


Re: user solr created by install not working with default password

2019-12-11 Thread rhys J
> That page talks about setting up authentication for HTTP access to the
> Solr API.  It has nothing at all to do with the OS user created by the
> service install script.
>
> When the service install creates the OS user for the service, it is
> created in such a way that its password is disabled.  You can't use a
> password for that user.  On my Linux machine with a Solr service
> installed, the hashed password entry in /etc/shadow is * - an asterisk.
>
> Here is an excerpt from the man page for shadow:
>
>
Thanks for explaining that. I was confused.
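
Since the account's password is disabled rather than set to something
secret, a shell as that user can still be had via sudo, e.g.:

sudo -u solr /bin/bash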

Rhys


Re: backing up and restoring

2019-12-12 Thread rhys J
On Thu, Dec 12, 2019 at 3:49 AM sudhir kumar 
wrote:


> Once you back up the index to some location, you have to specify the same
> location to restore.
>
> i.e. in your case /tmp/solr is the location the index was backed up to; use
> the same location for the restore.
>
> You did not provide a name, so the latest snapshot will be restored.
>
> curl 'http://localhost:8983/solr/debt/replication?command=backup&location=/tmp/solr'
>
> The snapshot is created at /tmp/solr/snapshot.2019xx
>
> Execute the command below and the latest snapshot will be restored:
>
> curl 'http://localhost:8983/solr/debt/replication?command=restore&location=/tmp/solr'
>
>
>
I figured this out, but even when I specify the location, and even the name,
I get an OK status, yet the index remains empty.

Commands used:

sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=backup&location=/tmp/solrbackups/debt/'

sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=restore&location=/tmp/solrbackups/debt/'



Thanks,

Rhys


Re: backing up and restoring

2019-12-12 Thread rhys J
This page seems to indicate that I should copy the files from the backup
directory back into the index?

Is this accurate?

https://codice.atlassian.net/wiki/spaces/DDF22/pages/2785407/Solr+Standalone+Server+Backup

Thanks,

Rhys


Re: backing up and restoring

2019-12-12 Thread rhys J
I was able to successfully restore a backup by specifying name and location
in the restore command.
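
The command that worked looked roughly like this (treat the name value as a
placeholder; it is the timestamp suffix of the snapshot directory, i.e. the
part after "snapshot."):

sudo -u solr curl 'http://localhost:8983/solr/debt/replication?command=restore&location=/tmp/solrbackups/debt&name=20191211175715254'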

But now when i try to run:

sudo -u solr curl http://localhost:8983/solr/debt/update -H "Content-type:
text/xml" --data-binary '<delete><query>*:*</query></delete>'

I get the following error:

 no segments* file found in
LockValidatingDirectoryWrapper(NRTCachingDirectory(MMapDirectory@/var/solr/data/debt/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@4746f577;
maxCacheMB=48.0 maxMergeSizeMB=4.0)): files: [write.lock]
  org.apache.lucene.index.IndexNotFoundException: no
segments* file found in
LockValidatingDirectoryWrapper(NRTCachingDirectory(MMapDirectory@/var/solr/data/debt/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@4746f577;
maxCacheMB=48.0 maxMergeSizeMB=4.0)): files: [write.lock]

I am just copying the top portion of the error, as it is very long.

What did I do wrong?

Thanks,

Rhys


Updates via curl and json not showing in api

2019-12-13 Thread rhys J
When I do the following update:

curl http://localhost:8983/solr/debt/update -d '[{"id": "393291-18625",
"orig_int_amt": {"set": "2.5"}}]'

and then:

curl http://localhost:8983/solr/debt/get?id=393291-18625

I see the document is updated via the command line.

It shows the following:

{
  "doc":
  {
"id":"393291-18625",
"adjust_int":0.0,
"adjust_princ":0.0,
"clt_id":"8032",
"clt_ref_no":"ORD15096 000",
"comments":" ",
"debt_descr":"PO/XREF: 904-132985337, SHIPPER: BERGER AUSTIN",
"debt_id":"393291",
"debt_no":18625,
"debt_type":"COM",
"delq_date":"2015-02-08T00:00:00Z",
"internal_adjustment":0,
"list_date":"2015-01-13T00:00:00Z",
"orig_clt":"8032",
"orig_int_amt":2.5,
"orig_princ_amt":49.3,
"potential_bad_debt":0,
"princ_paid":49.3,
"reference_no":"invoice:ORD15096 000",
"serv_date":"2015-01-09T00:00:00Z",
"status_code":520,
"status_date":"2015-02-20T00:00:00Z",
"storage_account":0,
"time_stamp":"2015-01-13T06:09:00Z",
"_version_":1652822026780409856}}

But when I use the Solr Web API, I get the following:

  {
"id":"393291-18625",
"adjust_int":0.0,
"adjust_princ":0.0,
"clt_id":"8032",
"clt_ref_no":"ORD15096 000",
"comments":" ",
"debt_descr":"PO/XREF: 904-132985337, SHIPPER: BERGER AUSTIN",
"debt_id":"393291",
"debt_no":18625,
"debt_type":"COM",
"delq_date":"2015-02-08T00:00:00Z",
"internal_adjustment":0,
"list_date":"2015-01-13T00:00:00Z",
"orig_clt":"8032",
"orig_int_amt":0.0,
"orig_princ_amt":49.3,
"potential_bad_debt":0,
"princ_paid":49.3,
"reference_no":"invoice:ORD15096 000",
"serv_date":"2015-01-09T00:00:00Z",
"status_code":520,
"status_date":"2015-02-20T00:00:00Z",
"storage_account":0,
"time_stamp":"2015-01-13T06:09:00Z",
"_version_":1652734816636895232},

Notice that orig_int_amt is still 0.0.

How do I get the updates to show in the API?

Is there a step I'm missing?

Thanks,

Rhys


Re: Updates via curl and json not showing in api

2019-12-13 Thread rhys J
On Fri, Dec 13, 2019 at 11:51 AM Shawn Heisey  wrote:


> > Is there a step I'm missing?
>
> It appears that you have not executed a commit that opens a new searcher.
>
>
Thanks for explaining this.

I added commit=true to the update URL, and everything works as expected.
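
In other words, the same atomic update with the commit parameter added:

curl 'http://localhost:8983/solr/debt/update?commit=true' -d '[{"id": "393291-18625", "orig_int_amt": {"set": "2.5"}}]'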

Thanks again,

Rhys


unable to update using empty strings or 'null' in value

2019-12-13 Thread rhys J
When I do the following update:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{
  "id": "601000",
  "agent": {"set": "null"},
  "assign_id": {"set": "320"},
  "client_group": {"set": "null"},
  "credit_class": {"set": "null"},
  "credit_hold": {"set": "null"},
  "credit_hold_date": {"set": "null"},
  "credit_limit": {"set": "null"},
  "credit_terms": {"set": "null"},
  "currency": {"set": "null"},
  "data_signature": {"set": "null"},
  "debtor_id": {"set": "601000"},
  "dl1": {"set": "25611"},
  "dl2": {"set": "null"},
  "do_not_call": {"set": "null"},
  "do_not_call_date": {"set": "null"},
  "do_not_report": {"set": "null"},
  "in_aris_date": {"set": "2016-09-01 00:00:00"},
  "name1": {"set": "60 Grit Studios"},
  "name2": {"set": "null"},
  "next_contact_date": {"set": "2018-12-24 00:00:00"},
  "parent_customer_number": {"set": "null"},
  "potential_bad_debt": {"set": "null"},
  "priority_followup": {"set": "null"},
  "reference_no": {"set": "25611"},
  "report_as": {"set": "null"},
  "report_status": {"set": "null"},
  "risk": {"set": "null"},
  "rms_acct_id": {"set": "null"},
  "salesperson": {"set": "null"},
  "ssn1": {"set": "null"},
  "ssn2": {"set": "null"},
  "status_code": {"set": "172"},
  "status_date": {"set": "2018-10-30 00:00:00"},
  "timestamp": {"set": "null"},
  "tz_offset": {"set": "null"},
  "warning_item_no": {"set": "null"},
  "watch_list": {"set": "null"},
  "watch_list_date": {"set": "null"}
}]'

I get the following error:

 "error":{
"msg":"For input string: \"null\"",
"trace":"java.lang.NumberFormatException: For input string:
\"null\"\n\tat java.base/jdk.interna
l.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)\n\tat
java.base/jdk.internal.
math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)\n\tat
java.base/java.lang.Float.parseFloat
(Float.java:461)\n\tat
org.apache.solr.schema.IntPointField.toNativeType(IntPointField.java:54)\n\ta
t
org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getNativeFieldValue(AtomicUpdateDocume
ntMerger.java:563)\n\tat
org.apache.solr.update.processor.AtomicUpdateDocumentMerger.doSet(AtomicUpd
ateDocumentMerger.java:436)\n\tat
org.apache.solr.update.processor.AtomicUpdateDocumentMerger.merge(
AtomicUpdateDocumentMerger.java:112)\n\tat
org.apache.solr.update.processor.DistributedUpdateProcess
or.getUpdatedDocument(DistributedUpdateProcessor.java:704)\n\tat
org.apache.solr.update.processor.Di
stributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:372)\n\tat
org.apache.solr.upd
ate.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:337)\n\
tat
org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)\n\tat
org.apache.solr.up
date.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:337)\n\tat
org.
apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:2
23)\n\tat
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(L
ogUpdateProcessorFactory.java:103)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\n\tat
org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:507)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:145)\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:121)\n\tat
org.apache.s

Re: unable to update using empty strings or 'null' in value

2019-12-16 Thread rhys J
On Mon, Dec 16, 2019 at 2:51 AM Paras Lehana 
wrote:

> Hey Rhys,
>
>
> Short Answer: Try using "set": null and not "set": "null".
>
>
Thank you, this worked!
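
For anyone who finds this later, the working shape of the command (trimmed
to two of the fields) is:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{"id": "601000", "agent": {"set": null}, "credit_limit": {"set": null}}]'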

Rhys


Re: backing up and restoring

2019-12-16 Thread rhys J
On Mon, Dec 16, 2019 at 1:42 AM Paras Lehana 
wrote:

> Looks like a write lock. Did reloading the core fix that? I guess it would
> have been fixed by now. I guess you had run the delete query a few moments
> after restoring, no?
>
>
Restoring by setting the name parameter only worked once.

This is my workaround:

Run the backup command.

Delete the documents.

Stop Solr.

Delete the segments* and write.lock files by name.

Copy the index files from the snapshot into the data/index folder.

Start Solr.

Verify the presence of documents via a search for *:*.

I know it's not pretty, but I have found it works every time.
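
Spelled out as shell commands, the workaround is roughly this (paths, core
name, and snapshot name are from my setup, so adjust to yours):

sudo service solr stop
sudo -u solr rm /var/solr/data/debt/data/index/write.lock /var/solr/data/debt/data/index/segments_*
sudo -u solr cp /tmp/solrbackups/debt/snapshot.20191211175715254/* /var/solr/data/debt/data/index/
sudo service solr start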

Thanks,

Rhys


updating documents via csv

2019-12-16 Thread rhys J
Is there a way to update documents already stored in the solr cores via csv?

The reason I am asking is because I am running into a problem with updating
via script with single quotes embedded into the field itself.

Example:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT'L"},"name2": {"set": " "}}]'

I have tried the following as well:

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT\'L"},"name2": {"set": " "}}]'

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ "id":
"356767", "name1": {"set": "NORTH AMERICAN INT\\'L"},"name2": {"set": "
"}}]'

curl http://localhost:8983/solr/dbtr/update?commit=true -d '[{ \\"id\\":
\\"356767\\", \\"name1\\": {\\"set\\": \\"NORTH AMERICAN INT\\'L\\"},}]'

All of these break on the single quote embedded in field name1.

Does anyone have any ideas as to what I can do to get around this?

I will also eventually need to get around having an & inside a field too,
but that hasn't come up yet.

Thanks,

Rhys


Re: updating documents via csv

2019-12-17 Thread rhys J
On Mon, Dec 16, 2019 at 11:58 PM Paras Lehana 
wrote:

> Hi Rhys,
>
> I use CDATA for XMLs:
>
> <field name="name1"><![CDATA[NORTH AMERICAN INT'L]]></field>
>
>
> There should be a similar solution for JSON though I couldn't find the
> specific one on the internet. If you are okay to use XMLs for indexing, you
> can use this.
>
>
We are set on using JSON, but I figured out how to handle the single quote.

If I wrap the curl payload in double quotes and escape the double quotes
inside, the embedded single quote in the field causes no problem.
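
For example, this goes through cleanly:

curl http://localhost:8983/solr/dbtr/update?commit=true -d "[{ \"id\": \"356767\", \"name1\": {\"set\": \"NORTH AMERICAN INT'L\"}, \"name2\": {\"set\": \" \"}}]"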

Thanks for the help!

Rhys


Increase Physical Memory in Solr

2020-01-13 Thread rhys J
I am trying to figure out how to increase the physical memory in Solr.

I see how to increase the JVM heap size, and I've done that. But my physical
memory usage is at 97% of 7.79G, and I'm trying to index a lot more documents
as I move this live.

Is there any documentation that I've missed on how to do this?

Thanks,

Rhys


Re: Increase Physical Memory in Solr

2020-01-13 Thread rhys J
On Mon, Jan 13, 2020 at 3:11 PM Gael Jourdan-Weil <
gael.jourdan-w...@kelkoogroup.com> wrote:

> Hello,
>
> If you are talking about "physical memory" as the bar displayed in the Solr
> UI, that is the actual RAM your host has.
> If you need more, you need more RAM; it's not related to Solr.
>
>
Thanks, that helped me understand what is going on.

I am going to ask to increase the RAM of the machine.

Rhys


Trouble adding a csv file - invalid date string error

2020-01-14 Thread rhys J
I am trying to add a csv file while indexing a core.

curl command:

sudo -u solr curl 'http://localhost:8983/solr/dbtraddr/update/csv?commit=true&escape=\&encapsulator=%7C&stream.file=/tmp/csv/dbtrphon_0.csv'


The header of the csv file:

|id|,|archive|,|contact_type|,|debtor_id|,|descr|,|group_id|,|item_no|,|phone|,|phone_index|,|source|,|source_id|,|status|,|time_stamp|,|type|,|user_id_stamp|


The line that the error is reported on:

|0-1|,|1|,| |,|0|,|Crystal Kolakowski|,|0|,|1|,|
crystal.kolakow...@yellowbook.com|,||,|LEH|,||,|A|,|2010-03-04
09:52:00|,|Email|,|LEH|

managed-schema lines defining the phone field:



managed-schema line defining date field:



The error:

"msg":"ERROR: [doc=0-1] Error adding field 'phone'='
crystal.kolakow...@yellowbook.com' msg=Invalid Date
String:'crystal.kolakow...@yellowbook.com'",
"code":400}}

I am confused because the field it's reporting the error on is not a date
field.

I'm also confused because the rdate field was suggested to me by someone
here, and has worked in all the other indexing operations I've used on
other cores.

Am I missing something key here?

Thanks,

Rhys


Re: Increase Physical Memory in Solr

2020-01-14 Thread rhys J
On Mon, Jan 13, 2020 at 3:42 PM Terry Steichen  wrote:

> Maybe solr isn't using enough of your available memory (a rough check is
> produced by 'solr status'). Do you realize you can start solr with a
> '-m xx' parameter? (for me, xx = 1g)
>
> Terry
>
>
I changed the java_mem field in solr.in.sh, and that also helped the memory
issue.
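
For reference, the line in solr.in.sh looks like this (the heap sizes here
are just examples, not recommendations):

SOLR_JAVA_MEM="-Xms2g -Xmx2g"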

Thanks

Rhys


Re: Trouble adding a csv file - invalid date string error

2020-01-14 Thread rhys J
I went ahead and adjusted the time_stamp field to be UTC, and that took
care of the problem.
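
Concretely, the time_stamp column now carries values like
|2010-03-04T09:52:00Z| (ISO-8601 in UTC, which is what Solr date fields
expect) instead of |2010-03-04 09:52:00|.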



Failed to connect to server

2020-01-16 Thread rhys J
I have noticed that if I am using curl to index a csv file *and* using curl
through a script to update the Solr cores, I get the following error:
curl: (7) Failed to connect to 10.40.10.14 port 8983: Connection refused

Can I only index *or* update, but not do both?

I am not running shards or collections, just a standalone set of cores.

Thanks,

Rhys


Error while updating: java.lang.NumberFormatException: empty String

2020-01-16 Thread rhys J
While updating my Solr core, I ran into a problem with this curl statement.

When I looked up the error, the only reference I could find was that maybe
a float was being added as null. So I changed all the float fields from
'null' to '0.00'. But I still get the error.

Float fields as per the schema:

adjust_int

adjust_princ

cur_bal_original_currency

manual_orig_balance

orig_int_amt

orig_princ_amt

princ_paid

Curl statement:

curl http://localhost:8983/solr/debt/update?commit=true -d "[{
  'id': '636628-242',
  'adjust_int': {'set': '0.00'},
  'adjust_princ': {'set': '0.00'},
  'clt_id': {'set': '3017'},
  'clt_ref_no': {'set': '1057-43261-9/128694'},
  'comments': {'set': ' '},
  'contract_number': {'set': '1057-43261-9'},
  'cur_bal_original_currency': {'set': '0.00'},
  'currency_conv': {'set': '0.00'},
  'debt_descr': {'set': 'PO/XREF: 994042088'},
  'debt_id': {'set': '636628'},
  'debt_no': {'set': '242'},
  'debt_type': {'set': 'COM'},
  'delq_date': {'set': '2020-01-30T00:00:00Z'},
  'internal_adjustment': {'set': '0'},
  'invoice_currency': {'set': null},
  'last_spreadsheet_date': {'set': null},
  'list_date': {'set': '2019-12-31T00:00:00Z'},
  'manual_change': {'set': null},
  'manual_orig_balance': {'set': '0.00'},
  'orig_clt': {'set': '3017'},
  'orig_int_amt': {'set': '0.00'},
  'orig_princ_amt': {'set': '480.00'},
  'original_invoice': {'set': null},
  'potential_bad_debt': {'set': '0'},
  'primary_debtor_id': {'set': null},
  'princ_paid': {'set': '0.00'},
  'reference_no': {'set': 'invoice:1057-43261-9/128694'},
  'reg_number': {'set': null},
  'salesperson': {'set': 'Bob Drummond'},
  'serv_date': {'set': '2019-12-31T00:00:00Z'},
  'shipper_name': {'set': null},
  'status_code': {'set': ' '},
  'status_date': {'set': '2020-01-16T00:00:00Z'},
  'storage_account': {'set': '0'},
  'time_stamp': {'set': '2019-12-31T23:35:00Z'}
}]"


Thanks,


Rhys


Re: Failed to connect to server

2020-01-16 Thread rhys J
On Thu, Jan 16, 2020 at 3:27 PM Edward Ribeiro 
wrote:

> A regular update is a delete followed by an indexing of the document. So
> technically both are indexes. :) If there's an atomic update (
> https://lucene.apache.org/solr/guide/8_4/updating-parts-of-documents.html
> ), Solr would throw some sort of version conflict exception like
>
>
These would have been atomic updates running at the same time I was
importing a csv file into another core.

After the connection errors, I noticed in the log that there was an error
from a curl statement that said 'Error: Solr core is loading'

> The connection refused exception does not seem related to the indexing by
> itself. Maybe it has to do with you hitting the maximum connection requests
> allowed per host. See in the link below the maxConnectionsPerHost and
> maxConnections parameters of your Solr version:
>
>
> https://lucene.apache.org/solr/guide/6_6/format-of-solr-xml.html#Formatofsolr.xml-The%3CshardHandlerFactory%3Eelement
>
>
Thank you for this. This was helpful. I have increased the number of
maxConnections to see if this fixes the problem.
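
For reference, those knobs live in solr.xml and look roughly like this (the
element and parameter names are from the linked doc; the numbers are
placeholders, not recommendations):

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="maxConnectionsPerHost">1000</int>
  <int name="maxConnections">20000</int>
</shardHandlerFactory>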

Rhys


Re: Error while updating: java.lang.NumberFormatException: empty String

2020-01-16 Thread rhys J
On Thu, Jan 16, 2020 at 3:10 PM Edward Ribeiro 
wrote:

> Hi,
>
> There is a status_code in the JSON snippet and it is going as a string with
> single space. Maybe it is an integer?
>
> Best,
> Edward
>
>
Oh wow, yes you are right. When I adjusted the status_code to not be a
space, it fixed everything.

I had forgotten that status_code was an integer.

It turned out that a database update had an error, and the status_code was
not entered. So my script is now handling whether the status_code is empty,
and adjusting accordingly.

Thanks,

Rhys


Re: Failed to connect to server

2020-01-17 Thread rhys J
On Thu, Jan 16, 2020 at 3:48 PM David Hastings 
wrote:

> > 'Error: Solr core is loading'
>
> do you have any suggesters or anything configured that would get rebuilt?
>

I don't think so? But I'm not quite sure what you are asking.

Rhys


Re: Failed to connect to server

2020-01-17 Thread rhys J
On Fri, Jan 17, 2020 at 12:10 PM David Hastings <
hastings.recurs...@gmail.com> wrote:

> something like this in your solr config:
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">autosuggest</str>
>     <str name="exactMatchFirst">false</str>
>     <str name="suggestAnalyzerFieldType">text</str>
>     <float name="threshold">0.005</float>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">title</str>
>     <str name="weightField">weight</str>
>     <str name="buildOnCommit">true</str>
>     <str name="buildOnOptimize">true</str>
>   </lst>
> </searchComponent>
>
>
I checked both /var/solr/solr/data/solr.xml and
/var/solr/data/CORE/solrconfig.xml, and I did not find this entry.

Thanks,

Rhys