Re: Solr Query Suggestion

2017-03-07 Thread vrindavda
Hi Emir,

Grouping is exactly what I wanted to achieve. Thanks!!

Thank you,
Vrinda Davda



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Query-Suggestion-tp4323180p4323743.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Learning to rank - Bad Request

2017-03-07 Thread Vincent

Hi Christine,

Thanks for the reply!

I suppose something in our config doesn't comply with the LTR plugin. If 
I browse to http://[HOST]:[PORT]/solr/[COLLECTION]/schema/feature-store, 
where I upload the features to, the browser can't find the page:


*Not Found*
No REST managed resource registered for path /schema/feature-store
...

So it seems that there /is/ no feature endpoint? I suspect that maybe 
our config doesn't apply the necessary Solr plugins for LTR or something 
similar, despite the -Dsolr.ltr.enabled=true parameter.
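
For completeness, here is what I understand the LTR contrib needs in 
solrconfig.xml (a minimal sketch based on the Solr 6.x techproducts example; 
the lib paths and cache sizes below are assumptions, not our actual config):

<lib dir="${solr.install.dir:../../../..}/contrib/ltr/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />

<queryParser name="ltr" class="org.apache.solr.ltr.search.LTRQParserPlugin"/>

<cache name="QUERY_DOC_FV"
       class="solr.search.LRUCache"
       size="4096"
       initialSize="2048"
       autowarmCount="4096"
       regenerator="solr.search.NoOpRegenerator" />

<transformer name="features"
             class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
  <str name="fvCacheName">QUERY_DOC_FV</str>
</transformer>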


Any advice would be appreciated!

Thanks,

Vincent


On 06-03-17 21:18, Christine Poerschke (BLOOMBERG/ LONDON) wrote:

Hi Vincent,

Would you be comfortable sharing (redacted) details of the exact upload command 
you used and (redacted) extracts of the features json file that gave the upload 
error?

Two things I have encountered commonly myself:
* uploading features to the model endpoint or model to the feature endpoint
* forgotten double-quotes around the numbers in MultipleAdditiveTreesModel json
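
For reference, the upload commands from the cwiki page look like this (host, 
port and collection are placeholders) - features go to the feature-store, the 
model to the model-store:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' --data-binary "@/path/myModel.json" -H 'Content-type:application/json'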

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/06/17 13:22:40

Hi all,

I've been trying to get learning to rank working on our own search
index. Following the LTR-readme
(https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md)
I ran the example python script to train and upload the model, but I
already get an error during the uploading of the features:

Bad Request (400) - Expected Map to create a new ManagedResource but
received a java.util.ArrayList
  at
org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523)
  at
org.apache.solr.rest.ManagedResource.doPost(ManagedResource.java:355)
  at
org.apache.solr.rest.RestManager$ManagedEndpoint.post(RestManager.java:351)
  at
org.restlet.resource.ServerResource.doHandle(ServerResource.java:454)
  ...

This makes sense: the json feature file is an array, and the RestManager
needs a Map in doPut.
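
For reference, a feature file is a JSON array of feature definitions, along 
these lines (a sketch adapted from the techproducts example; the field names 
are assumptions):

[
  {
    "name" : "documentRecency",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : { "q" : "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)" }
  },
  {
    "name" : "isBook",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : { "fq" : ["{!terms f=cat}book"] }
  }
]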

Using the curl command from the cwiki
(https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank)
yields the same error, but instead of it having "received a
java.util.ArrayList" it "received a java.lang.String".

I wonder how this actually is supposed to work, and what's going wrong
in this case. I have tried the LTR with the default techproducts
example, and that worked just fine. Does anyone have an idea of what's
going wrong here?

Thanks in advance!
Vincent





How to enable Gzip compression in Solr v6.1.0 with Jetty 9.3.8.v20160314

2017-03-07 Thread Gul Imran
Hi
I am trying to upgrade Solr from v5.3 to v6.1.0, which comes with Jetty 
9.3.8.v20160314.  However, after the upgrade we seem to have lost gzip 
compression, even though we carried over the old configuration.  When I send 
the following request with the appropriate headers, I do not get a gzipped 
response:
curl -H "Accept-Encoding: gzip,deflate" 
"http://localhost:8983/solr/myApiAlias/select?wt=json&q=uuid:%22146c521c-9966-4f0a-94f9-465cd847b921%22&group=true&group.ngroups=true&group.field=uuid&group.limit=1&sort=start+asc,definition+asc,id+asc&start=0&rows=5";

I should be expecting the "Content-Encoding: gzip" header in the response.  
However, I get the following response:
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 393
Here is the previous configuration for enabling compression:
dir:  /opt/solr/server/contexts/solr-jetty-context.xml

--- Solr v5.3 configuration ---
(solr-jetty-context.xml, DOCTYPE http://www.eclipse.org/jetty/configure_9_0.dtd)
The context set war to /solr-webapp/webapp, defaultsDescriptor to
/etc/webdefault.xml and extractWAR to false, and added the filter
org.eclipse.jetty.servlets.GzipFilter mapped to /*, with init parameters:
  mimetypes:
text/html,text/xml,text/plain,text/css,text/javascript,text/json,application/x-javascript,application/javascript,application/json,application/xml,application/xml+xhtml,image/svg+xml
  methods: GET,POST
I have modified this configuration to use the GzipHandler 
(http://www.eclipse.org/jetty/documentation/9.3.x/gzip-filter.html) and updated 
solr-jetty-context.xml as follows:
--- Solr v6.1.0 configuration ---
(solr-jetty-context.xml, DOCTYPE http://www.eclipse.org/jetty/configure_9_0.dtd)
Same context settings (war /solr-webapp/webapp, defaultsDescriptor
/etc/webdefault.xml, extractWAR false), with the GzipFilter replaced by a gzip
handler on /*, with included mime types:
  text/html, text/xml, text/plain, text/css, text/javascript, text/json,
  application/x-javascript, application/javascript, application/json,
  application/xml, application/xml+xhtml, image/svg+xml
and included methods: GET, POST
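
For comparison, the Jetty 9.3 documentation configures GzipHandler at the 
server level rather than in the webapp context; a minimal sketch of that style 
(e.g. in a jetty.xml include - the exact wiring below is my reading of the 
Jetty 9.3 gzip module, not a tested config):

<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Call name="insertHandler">
    <Arg>
      <New id="GzipHandler" class="org.eclipse.jetty.server.handler.gzip.GzipHandler">
        <!-- only compress responses larger than this -->
        <Set name="minGzipSize">256</Set>
        <!-- restrict to the methods and mime types we care about -->
        <Call name="setIncludedMethods">
          <Arg><Array type="String"><Item>GET</Item><Item>POST</Item></Array></Arg>
        </Call>
        <Call name="addIncludedMimeTypes">
          <Arg><Array type="String">
            <Item>application/json</Item>
            <Item>application/xml</Item>
            <Item>text/html</Item>
            <Item>text/plain</Item>
          </Array></Arg>
        </Call>
      </New>
    </Arg>
  </Call>
</Configure>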
However, when I restart Solr v6.1.0 with the new configuration, it does not 
show any errors in the logs but the application becomes unavailable and all 
requests return 404 Not Found response code.
I have also tried the suggestion posted on Stack Overflow 
(http://stackoverflow.com/questions/30391741/gzip-compression-not-working-in-solr-5-1).
  However, as this is not for Solr v6.1.0, it fails to work as well.
Wondering if someone can please provide a way to configure gzip compression 
with a Solr+Jetty installation.  Many thanks.
Kind regards,
Gul



Solr Update If Record Exists ?

2017-03-07 Thread ~$alpha`
SOLR_URL/update -d '
[
 {"id" : "1",
  "ONLINE" : {"set":"1"}
 }
]'

I am using Solr 6.3. The above command works fine: it updates the ONLINE flag to 1
for id=1. But the issue is that if the record is not present, it adds a new document
with id=1 and ONLINE=1, which is not desired.

So the question is: is it possible to have Solr update the value only if the record
is already present in the index?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Update-If-Record-Exists-tp4323767.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Learning to rank - Bad Request

2017-03-07 Thread Vincent

Update: solved, I missed some config steps.

Thanks for the help,

Vincent

On 07-03-17 12:20, Vincent wrote:

Hi Christine,

Thanks for the reply!

I suppose something in our config doesn't comply with the LTR plugin. 
If I browse to 
http://[HOST]:[PORT]/solr/[COLLECTION]/schema/feature-store, where I 
upload the features to, the browser can't find the page:


*Not Found*
No REST managed resource registered for path /schema/feature-store
...

So it seems that there /is/ no feature endpoint? I suspect that maybe 
our config doesn't apply the necessary Solr plugins for LTR or something 
similar, despite the -Dsolr.ltr.enabled=true parameter.


Any advice would be appreciated!

Thanks,

Vincent


On 06-03-17 21:18, Christine Poerschke (BLOOMBERG/ LONDON) wrote:

Hi Vincent,

Would you be comfortable sharing (redacted) details of the exact 
upload command you used and (redacted) extracts of the features json 
file that gave the upload error?


Two things I have encountered commonly myself:
* uploading features to the model endpoint or model to the feature 
endpoint
* forgotten double-quotes around the numbers in 
MultipleAdditiveTreesModel json


Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/06/17 13:22:40

Hi all,

I've been trying to get learning to rank working on our own search
index. Following the LTR-readme
(https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md) 


I ran the example python script to train and upload the model, but I
already get an error during the uploading of the features:

Bad Request (400) - Expected Map to create a new ManagedResource but
received a java.util.ArrayList
  at
org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523) 


  at
org.apache.solr.rest.ManagedResource.doPost(ManagedResource.java:355)
  at
org.apache.solr.rest.RestManager$ManagedEndpoint.post(RestManager.java:351) 


  at
org.restlet.resource.ServerResource.doHandle(ServerResource.java:454)
  ...

This makes sense: the json feature file is an array, and the RestManager
needs a Map in doPut.

Using the curl command from the cwiki
(https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank)
yields the same error, but instead of it having "received a
java.util.ArrayList" it "received a java.lang.String".

I wonder how this actually is supposed to work, and what's going wrong
in this case. I have tried the LTR with the default techproducts
example, and that worked just fine. Does anyone have an idea of what's
going wrong here?

Thanks in advance!
Vincent








Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-07 Thread Avtar Singh Mehra
Well, I have created some filters using Apache OpenNLP. Will it work?

On 6 March 2017 at 00:30, Joel Bernstein  wrote:

> I believe StanfordCore is licensed under the GPL which means it will be
> incompatible with the Apache License. Would it be possible to port to a
> different NLP library?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Mar 5, 2017 at 12:14 PM, Erick Erickson 
> wrote:
>
> > Well, you've taken the first step ;).
> >
> > Start by going here: https://issues.apache.org/jira/browse/SOLR/ and
> > creating a logon and a JIRA.
> >
> > NOTE: Before you go to the trouble of creating a patch, it's perfectly
> > OK to do a high-level overview of the approach you used and see what
> > the feedback is. It'll be a short discussion if the licensing is
> > incompatible for instance ;).
> >
> > After that, be ready for some discussion back and forth, reviews and
> > the like and we'll see where this goes.
> >
> > Best,
> > Erick
> >
> > On Sun, Mar 5, 2017 at 4:40 AM, Avtar Singh Mehra 
> > wrote:
> > > Hello everyone,
> > > I have developed a project called WiseOwl, which is basically a fact-based
> > > question answering system, and it can be accessed at:
> > > https://github.com/asmehra95/wiseowl
> > >
> > > In the process of making the project work I have developed pluggable
> > > Solr filters optimised for Solr 6.3.0.
> > > I would like to donate them to Solr.
> > > 1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named
> > > entities and it also normalises dates during indexing or searching.
> > > Demonstration screenshots are available on the github profile. But I
> > > don't know how to donate them.
> > >
> > > If there is a way, please let me know, as it may be useful for anyone
> > > doing natural language processing.
> >
>


Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-07 Thread Joel Bernstein
Yes, I think Apache OpenNLP should be fine.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Mar 7, 2017 at 8:09 AM, Avtar Singh Mehra 
wrote:

> Well, I have created some filters using Apache OpenNLP. Will it work?
>
> On 6 March 2017 at 00:30, Joel Bernstein  wrote:
>
> > I believe StanfordCore is licensed under the GPL which means it will be
> > incompatible with the Apache License. Would it be possible to port to a
> > different NLP library?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Sun, Mar 5, 2017 at 12:14 PM, Erick Erickson  >
> > wrote:
> >
> > > Well, you've taken the first step ;).
> > >
> > > Start by going here: https://issues.apache.org/jira/browse/SOLR/ and
> > > creating a logon and a JIRA.
> > >
> > > NOTE: Before you go to the trouble of creating a patch, it's perfectly
> > > OK to do a high-level overview of the approach you used and see what
> > > the feedback is. It'll be a short discussion if the licensing is
> > > incompatible for instance ;).
> > >
> > > After that, be ready for some discussion back and forth, reviews and
> > > the like and we'll see where this goes.
> > >
> > > Best,
> > > Erick
> > >
> > > On Sun, Mar 5, 2017 at 4:40 AM, Avtar Singh Mehra  >
> > > wrote:
> > > > Hello everyone,
> > > > I have developed a project called WiseOwl, which is basically a fact-based
> > > > question answering system, and it can be accessed at:
> > > > https://github.com/asmehra95/wiseowl
> > > >
> > > > In the process of making the project work I have developed pluggable
> > > > Solr filters optimised for Solr 6.3.0.
> > > > I would like to donate them to Solr.
> > > > 1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named
> > > > entities and it also normalises dates during indexing or searching.
> > > > Demonstration screenshots are available on the github profile. But I
> > > > don't know how to donate them.
> > > >
> > > > If there is a way, please let me know, as it may be useful for anyone
> > > > doing natural language processing.
> > >
> >
>


Tokenized querying

2017-03-07 Thread OTH
Hello,

I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
simple question.

Let's say I have these documents with the following values in a single
field (let's call it "name"):

sando...@company.example.com
sandb...@company.example.com
sa...@company.example.com
Sancho Landolt
Sanders Greenley
Sanders Massey
Santa Catarina
San Carlos de Bariloche
San Francisco
San Mateo

I would like, if the search query is "San", for Solr to return the
following and only the following:
San Carlos de Bariloche
San Francisco
San Mateo

So basically, I'd like to search based on tokens.  I'd also like Solr to
return an associated score.  So eg, if the user searches "San Francisco",
it should still return the above results, but obviously the score for the
document with "San Francisco" would be much higher.

I've been doing this pretty easily using Lucene from Java, however I'm
unable to figure out how to do it using Solr.

Much thanks


[Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-07 Thread Elodie Sannier

Hello,

We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has increased.
We found hundreds of references to deleted index files being held by solr.
Before the migration, we had 15-30% of disk space used, after the migration we 
have 60-90% of disk space used.

We are using Solr Cloud with 2 collections.

The commands applied on the collections are:
- for incremental indexation mode: add, deleteById with commitWithin of 30 
minutes
- for full indexation mode: add, deleteById, commit
- for switch between incremental and full mode: deleteByQuery, createAlias, 
reload
- there is also an autocommit every 15 minutes

We have seen the email "Solr leaking references to deleted files"  2016-05-31 
which describes the same problem, but the mentioned bugs are fixed.

We manually tried to force a commit, a reload and an optimize on the 
collections without effect.

Is it a problem of configuration (merge / delete policy) or a possible regression 
in the Solr code?

Thank you


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-07 Thread Erick Erickson
Just as a sanity check, if you restart the Solr JVM, do the files
disappear from disk?

Do you have any custom code anywhere in this chain? If so, do you open
any searchers but
fail to close them? Although why 6.4 would manifest the problem but
other code wouldn't
is a mystery, just another sanity check.

Best,
Erick

On Tue, Mar 7, 2017 at 6:44 AM, Elodie Sannier  wrote:
> Hello,
>
> We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has
> increased.
> We found hundreds of references to deleted index files being held by solr.
> Before the migration, we had 15-30% of disk space used, after the migration
> we have 60-90% of disk space used.
>
> We are using Solr Cloud with 2 collections.
>
> The commands applied on the collections are:
> - for incremental indexation mode: add, deleteById with commitWithin of 30
> minutes
> - for full indexation mode: add, deleteById, commit
> - for switch between incremental and full mode: deleteByQuery, createAlias,
> reload
> - there is also an autocommit every 15 minutes
>
> We have seen the email "Solr leaking references to deleted files"
> 2016-05-31 which describes the same problem, but the mentioned bugs are fixed.
>
> We manually tried to force a commit, a reload and an optimize on the
> collections without effect.
>
> Is it a problem of configuration (merge / delete policy) or a possible
> regression in the Solr code?
>
> Thank you
>
>
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 158 Ter Rue du Temple 75003 Paris
> 425 093 069 RCS Paris
>
> Ce message et les pièces jointes sont confidentiels et établis à l'attention
> exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce
> message, merci de le détruire et d'en avertir l'expéditeur.


RE: Getting an error: was indexed without position data; cannot run PhraseQuery

2017-03-07 Thread Pouliot, Scott
Welcome to IT right?  We're always in some sort of pickle  ;-)  I'm going to 
play with settings on one of our internal environments and see if I can 
replicate the issue and go from there with some test fixes.

Here's a question though...  If I need to re-index...could I do it on another 
instance running the same SOLR version (4.8.0) and then copy the database into 
place instead?  We're using some crappy custom Groovy script run through Aspire 
to do our indexing and it's horribly slow.  50GB would take at least a 
day...maybe 2, and I obviously can't have a client down for that long in 
Production, but if I did it on a backup SOLR box...copying 50GB into place is 
much, much quicker.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, March 6, 2017 8:48 PM
To: solr-user 
Subject: Re: Getting an error:  was indexed without position data; 
cannot run PhraseQuery

You're in a pickle then. If you change the definition you need to re-index.

But you claim you haven't changed anything in years as far as the schema is 
concerned so maybe you're going to get lucky ;).

The error you reported is because somehow there's a phrase search going on 
against this field. You could have changed something in the query parsers or 
eDismax definitions or the query generated on the app side to have a phrase 
query get through. I'm not quite sure if you'll get information back when the 
query fails, but try adding &debug=query to the URL and look at the 
parsed_query and parsed_query_toString() output to see where phrases are getting 
generated.
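
For example, something like this (collection and field taken from the log in 
the quoted message below; host and port are placeholders):

http://[HOST]:[PORT]/solr/Client_AdvanceAutoParts/select?q=*:*&fq=preferredlocations_s:(3799H)&debug=query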

Best,
Erick

On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott  
wrote:
> Hmm.  We haven’t changed data or the definition in YEARS now.  I'll 
> have to do some more digging I guess.  Not sure re-indexing is a great 
> thing to do though since this is a production setup and the database 
> for this user is @ 50GB.  It would take quite a long time to reindex 
> all that data from scratch.  Hmmm
>
> Thanks for the quick reply Erick!
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 6, 2017 5:33 PM
> To: solr-user 
> Subject: Re: Getting an error:  was indexed without position 
> data; cannot run PhraseQuery
>
> Usually an _s field is a "string" type, so be sure you didn't change the 
> definition without completely re-indexing. In fact I generally either index 
> to a new collection or remove the data directory entirely.
>
> right, the field isn't indexed with position information. That combined with 
> (probably) the WordDelimiterFilterFactory in text_en_splitting is generating 
> multiple tokens for inputs like 3799H.
> See the admin/analysis page for how that gets broken up. Term positions are 
> usually enabled by default, so I'm not quite sure why they're gone unless you 
> disabled them.
>
> But you're on the right track regardless. you have to
> 1> include term positions for anything that generates phrase queries
> or
> 2> make sure you don't generate phrase queries. edismax can do this if
> you have it configured to, and then there's autoGeneratePhraseQueries that you 
> may find.
>
> And do reindex completely from scratch if you change the definitions.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott 
>  wrote:
>> We keep getting this in our Tomcat/SOLR Logs and I was wondering if a simple 
>> schema change will alleviate this issue:
>>
>> INFO  - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; 
>> [Client_AdvanceAutoParts] webapp=/solr path=/select 
>> params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000}
>>  status=500 QTime=1480 ERROR - 2017-03-06 07:26:58.766; 
>> org.apache.solr.common.SolrException; null:java.lang.IllegalStateException: 
>> field "preferredlocations_s" was indexed without position data; cannot run 
>> PhraseQuery (term=3799)
>> at 
>> org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
>> at 
>> org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351)
>> at 
>> org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>> at 
>> org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313)
>> at 
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>> at 
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>> at 
>> org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158)
>> at 
>> org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846)
>> at 
>> org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.j

Re: Tokenized querying

2017-03-07 Thread Alexandre Rafalovitch
The default text field definition (text_general) tokenizes on spaces,
so - if I understand the question correctly - it should just work. Are
you by any chance searching against a name field that is defined as
String (and is not tokenized)?

If you do the Solr tutorial, you search on "ipod", which seems like a
similar case to me. So, can you start from there? You can just index
your own text into the example config, for example.

Regards,
   Alex.
P.s. If you are coming from Lucene, the copyField instruction may be
slightly confusing. In the examples provided, your text is copied from
specific named fields to the text/_text_ field, which is actually the
default field searched, using the type definition associated with that
text/_text_ field, rather than with the original field.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 09:30, OTH  wrote:
> Hello,
>
> I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
> simple question.
>
> Let's say I have these documents with the following values in a single
> field (let's call it "name"):
>
> sando...@company.example.com
> sandb...@company.example.com
> sa...@company.example.com
> Sancho Landolt
> Sanders Greenley
> Sanders Massey
> Santa Catarina
> San Carlos de Bariloche
> San Francisco
> San Mateo
>
> I would like, if the search query is "San", for Solr to return the
> following and only the following:
> San Carlos de Bariloche
> San Francisco
> San Mateo
>
> So basically, I'd like to search based on tokens.  I'd also like Solr to
> return an associated score.  So eg, if the user searches "San Francisco",
> it should still return the above results, but obviously the score for the
> document with "San Francisco" would be much higher.
>
> I've been doing this pretty easily using Lucene from Java, however I'm
> unable to figure out how to do it using Solr.
>
> Much thanks


Re: Solr Update If Record Exists ?

2017-03-07 Thread Alexandre Rafalovitch
Try adding _version_:1, as per the Optimistic Concurrency feature:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
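
Applied to the command from the original post, a sketch:

SOLR_URL/update -d '
[
 {"id" : "1",
  "_version_" : 1,
  "ONLINE" : {"set":"1"}
 }
]'

With _version_ set to 1, the update only succeeds if a document with that id
already exists; otherwise Solr returns a version conflict error instead of
creating a new document.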

This should work for a single document update. If you are trying to
update multiple documents, this is slightly more complicated, as there
will be rejections/exceptions generated. Perhaps it can be combined
with the TolerantUpdate URP:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/TolerantUpdateProcessorFactory.html

Regards,
   Alex.
P.s. SOLR-9530 may also be interesting, though its primary use case is
somewhat different

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 07:30, ~$alpha`  wrote:
> SOLR_URL/update -d '
> [
>  {"id" : "1",
>   "ONLINE" : {"set":"1"}
>  }
> ]'
>
> I am using Solr 6.3. The above command works fine: it updates the ONLINE flag to 1
> for id=1. But the issue is that if the record is not present, it adds a new document
> with id=1 and ONLINE=1, which is not desired.
>
> So the question is: is it possible to have Solr update the value only if the record
> is already present in the index?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Update-If-Record-Exists-tp4323767.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-07 Thread Alexandre Rafalovitch
More sanity checks: what are the extensions/types of the files that
are not deleted?

If they are index files, optimize command (even if no longer
recommended for production) should really blow all the old ones away.
So, are they other kinds of files?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 09:55, Erick Erickson  wrote:
> Just as a sanity check, if you restart the Solr JVM, do the files
> disappear from disk?
>
> Do you have any custom code anywhere in this chain? If so, do you open
> any searchers but
> fail to close them? Although why 6.4 would manifest the problem but
> other code wouldn't
> is a mystery, just another sanity check.
>
> Best,
> Erick
>
> On Tue, Mar 7, 2017 at 6:44 AM, Elodie Sannier  
> wrote:
>> Hello,
>>
>> We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has
>> increased.
>> We found hundreds of references to deleted index files being held by solr.
>> Before the migration, we had 15-30% of disk space used, after the migration
>> we have 60-90% of disk space used.
>>
>> We are using Solr Cloud with 2 collections.
>>
>> The commands applied on the collections are:
>> - for incremental indexation mode: add, deleteById with commitWithin of 30
>> minutes
>> - for full indexation mode: add, deleteById, commit
>> - for switch between incremental and full mode: deleteByQuery, createAlias,
>> reload
>> - there is also an autocommit every 15 minutes
>>
>> We have seen the email "Solr leaking references to deleted files"
>> 2016-05-31 which describes the same problem, but the mentioned bugs are fixed.
>>
>> We manually tried to force a commit, a reload and an optimize on the
>> collections without effect.
>>
>> Is it a problem of configuration (merge / delete policy) or a possible
>> regression in the Solr code?
>>
>> Thank you
>>
>>
>> Kelkoo SAS
>> Société par Actions Simplifiée
>> Au capital de € 4.168.964,30
>> Siège social : 158 Ter Rue du Temple 75003 Paris
>> 425 093 069 RCS Paris
>>
>> Ce message et les pièces jointes sont confidentiels et établis à l'attention
>> exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce
>> message, merci de le détruire et d'en avertir l'expéditeur.


RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

2017-03-07 Thread Marquiss, John
Just another bit of information supporting the thought that this has to do 
with recycling the searcher when there is a change to the index directory that 
is named something other than "index".

Running our tests again, this time after restoring the content I shut down Solr 
and renamed the two "restore.#" directories to "index" and updated 
index.properties to reflect this. After restarting Solr the collection searched 
correctly and immediately reflected index updates in search results following 
commit.
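
For reference, index.properties simply points at the active directory; after a 
restore it looks like this (the timestamp below is an example):

index=restore.20170306210157447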

I see two possible solutions for this:

1) Modify the restore process so that it copies index files into a directory 
named "index" instead of "restore.#". This is probably easy but it 
doesn't actually fix the root problem. Something isn't respecting the path in 
index.properties to recycle the searcher after commit.

2) Find and fix the code that creates the new searcher, so that it watches the 
path in index.properties instead of specifically looking for "index". This may 
be harder to find but it fixes the root problem.

We are more than willing to try to fix this if someone could suggest where we 
could start looking into the source to find this.

John Marquiss

>-Original Message-
>From: Marquiss, John [mailto:john.marqu...@wolterskluwer.com] 
>Sent: Monday, March 6, 2017 9:39 PM
>To: solr-user@lucene.apache.org
>Subject: RE: Solrcloud after restore collection, when index new documents into 
>restored collection, leader not write to index.
>
>I couldn't find an issue for this in JIRA so I thought I would add some of our 
>own findings here... We are seeing the same problem with the Solr 6 Restore 
>functionality. While I do not think it is important it happens on both our 
>Linux environments and our local Windows development environments. Also, from 
>our testing, I do not think it has anything to do with actual indexing (if you 
>notice in the order of my test steps documents appear in 
>replicas after creation, without re-indexing).
>
>Test Environment:
>•  Windows 10 (we see the same behavior on Linux as well)
>•  Java 1.8.0_121
>•  Solr 6.3.0 with patch for SOLR-9527 (To fix RESTORE shard distribution 
>and add createNodeSet to RESTORE)
>•  1 Zookeeper node running on localhost:2181
>•  3 Solr nodes running on localhost:8171, localhost:8181 and 
>localhost:8191 (hostname NY07LP521696)
>
>Test and observations:
>1) Create a 2 shard collection 'test'
>   
> http://localhost:8181/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1&collection.configName=testconf&createNodeSet=NY07LP521696:8171_solr,NY07LP521696:8181_solr
>
>2) Index 7 documents to 'test'
>3) Search 'test' - result count 7
>4) Backup collection 'test'
>   
> http://localhost:8181/solr/admin/collections?action=BACKUP&collection=test&name=copy&location=%2FData%2Fsolr%2Fbkp&async=1234
>
>5) Restore 'test' to collection 'test2'
>   
> http://localhost:8191/solr/admin/collections?action=RESTORE&name=copy&location=%2FData%2Fsolr%2Fbkp&collection=test2&async=1234&maxShardsPerNode=1&createNodeSet=NY07LP521696:8181_solr,NY07LP521696:8191_solr
>
>6) Search 'test2' - result count 7
>7) Index 2 new documents to 'test2'
>8) Search 'test2' - result count 7 (new documents do not appear in results)
>9) Create a replica for each of the shards of 'test2'
>   
> http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=NY07LP521696:8181_solr
>   
> http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard2&node=NY07LP521696:8171_solr
>
>*** Note that it is not necessary to try to re-index the 2 new documents 
>before this step, just create replicas and query ***
>10) Repeatedly query 'test2' - result count randomly changes between 7, 8 
>and 9. This is because Solr is randomly selecting replicas of 'test2' and one 
>of the two new docs were added to each of the shards in the collection so if 
>replica0 of both shards are selected the result is 7, if replica0 and 
>replica1 are selected for each of either shard the result is 8 and if 
>replica1 is selected for both shards the result is 9. This is random behavior 
>because we do not know ahead of time which shards the new documents will be 
>added to and if they will be split evenly.
>
>   Query 'test2' with shards parameter of original restored shards - 
> result count 7
>   
> http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica0
>
>   Query 'test2' with shard parameter of one original restored shard and 
> one replica shard - result count 8
>   
> http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica1
>   
> http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,lo

Re: Getting an error: was indexed without position data; cannot run PhraseQuery

2017-03-07 Thread Erick Erickson
First, it's not clear whether you're using SolrCloud or not, so there
may be some irrelevant info in here

bq: ...could I do it on another instance running the same SOLR version
(4.8.0) and then copy the database into place instead

In a word "yes", if you're careful. Assuming you have more than one
shard you have to be sure to copy the shards faithfully. By that I
mean look at your admin UI>>cloud>>tree>>(clusterstate.json or
>>collection>>state.json). You'll see a bunch of information for each
replica but the critical bit is that the hash range should be the same
for the source and destination. It'll be something like
0x0-0x7fffffff for one shard (each replica on a shard has the
same hash range). etc.
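
For example, the relevant bits of state.json look like this for a two-shard
collection (a sketch):

"shard1" : { "range" : "80000000-ffffffff", ... },
"shard2" : { "range" : "0-7fffffff", ... }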

The implication of course is that both collections need to have the
same number of shards.

If you don't have any shards, don't worry about it...

Another possibility, depending on your resources is to create another
collection with the same number of shards and index to _that_. Then
use the Collections API CREATEALIAS command to atomically switch. This
assumes you have enough extra capacity that you can do the reindexing
without unduly impacting prod.

And there are a number of variants on this.
> index to a leader-only collection
> during a small maintenance window you shut down prod and ADDREPLICA for all 
> the shards to build out your new collection
> blow away your old collection when you're comfortable.

But the bottom line is that indexes may be freely copied wherever you
want as long as the bookkeeping is respected wrt hash ranges. I used
to build Lucene indexes on a Windows box and copy it to a Unix server
as long as I used binary copy

Best,
Erick

On Tue, Mar 7, 2017 at 7:04 AM, Pouliot, Scott
 wrote:
> Welcome to IT right?  We're always in some sort of pickle  ;-)  I'm going to 
> play with settings on one of our internal environments and see if I can 
> replicate the issue and go from there with some test fixes.
>
> Here's a question though...  If I need to re-indexcould I do it on 
> another instance running the same SOLR version (4.8.0) and then copy the 
> database into place instead?  We're using some crappy custom Groovy script 
> run through Aspire to do our indexing and it's horribly slow.  50GB would 
> take at least a day...maybe 2 and I obviously can't have a client down for 
> that long in Production, but if I did it on a backup SOLR boxcopying 50GB 
> into place is much much quicker.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 6, 2017 8:48 PM
> To: solr-user 
> Subject: Re: Getting an error:  was indexed without position data; 
> cannot run PhraseQuery
>
> You're in a pickle then. If you change the definition you need to re-index.
>
> But you claim you haven't changed anything in years as far as the schema is 
> concerned so maybe you're going to get lucky ;).
>
> The error you reported is because somehow there's a phrase search going on 
> against this field. You could have changed something in the query parsers or 
> eDismax definitions or the query generated on the app side to have a phrase 
> query get through. I'm not quite sure if you'll get information back when the 
> query fails, but try adding &debug=query to the URL and look at the 
> parsed_query and parsed_query_toString() output to see where phrases are 
> getting generated.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott 
>  wrote:
>> Hmm.  We haven’t changed data or the definition in YEARS now.  I'll
>> have to do some more digging I guess.  Not sure re-indexing is a great
>> thing to do though since this is a production setup and the database
>> for this user is @ 50GB.  It would take quite a long time to reindex
>> all that data from scratch.  Hmmm
>>
>> Thanks for the quick reply Erick!
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Monday, March 6, 2017 5:33 PM
>> To: solr-user 
>> Subject: Re: Getting an error:  was indexed without position
>> data; cannot run PhraseQuery
>>
>> Usually an _s field is a "string" type, so be sure you didn't change the 
>> definition without completely re-indexing. In fact I generally either index 
>> to a new collection or remove the data directory entirely.
>>
>> right, the field isn't indexed with position information. That combined with 
>> (probably) the WordDelimiterFilterFactory in text_en_splitting is generating 
>> multiple tokens for inputs like 3799H.
>> See the admin/analysis page for how that gets broken up. Term positions are 
>> usually enabled by default, so I'm not quite sure why they're gone unless you 
>> disabled them.
>>
>> But you're on the right track regardless. you have to
>> 1> include term positions for anything that generates phrase queries
>> or
>> 2> make sure you don't generate phrase queries. edismax can do this if
>> you have it configured to, and then there's autoGeneratePhraseQueries that 
>> you 

Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

2017-03-07 Thread Erick Erickson
John:

Just skimming, but this certainly seems like it merits a JIRA, please
feel free to create one (you may have to create your own logon first).
Please include the steps for the test you did where new replicas "see"
the restored index. And this last test, where you hand-edited things, is
important.

The only other question I'd have is whether you saw anything odd in
the logs. I'm no expert in this functionality, just covering the
possibility that for some reason the restore didn't finish
successfully even though all the files appear to be copied back.

I don't have any bandwidth to tackle this, but a JIRA will preserve it
for others to look at.

Thanks for all your research on this!

Erick

On Tue, Mar 7, 2017 at 7:44 AM, Marquiss, John
 wrote:
> Just another bit of information supporting the thought that this has to do 
> with recycling the searcher when there is a change to the index directory 
> that is named something other than "index".
>
> Running our tests again, this time after restoring the content I shut down 
> Solr and renamed the two "restore.#" directories to "index" and 
> updated index.properties to reflect this. After restarting Solr the 
> collection searched correctly and immediately reflected index updates in 
> search results following commit.
>
> I see two possible solutions for this:
>
> 1) Modify the restore process so that it copies index files into a directory 
> named "index" instead of "restore.#". This is probably easy but 
> it doesn't actually fix the root problem. Something isn't respecting the path 
> in index.properties to recycle the searcher after commit.
>
> 2) Find and fix the code that creates the new searcher, so that it watches 
> the path in index.properties instead of specifically looking for "index". 
> This may be harder to find but it fixes the root problem.
>
> We are more than willing to try to fix this if someone could suggest where we 
> could start looking into the source to find this.
>
> John Marquiss
>
>>-Original Message-
>>From: Marquiss, John [mailto:john.marqu...@wolterskluwer.com]
>>Sent: Monday, March 6, 2017 9:39 PM
>>To: solr-user@lucene.apache.org
>>Subject: RE: Solrcloud after restore collection, when index new documents 
>>into restored collection, leader not write to index.
>>
>>I couldn't find an issue for this in JIRA so I thought I would add some of 
>>our own findings here... We are seeing the same problem with the Solr 6 
>>Restore functionality. While I do not think it is important it happens on 
>>both our Linux environments and our local Windows development environments. 
>>Also, from our testing, I do not think it has anything to do with actual 
>>indexing (if you notice in the order of my test steps documents appear in 
>>replicas after creation, without re-indexing).
>>
>>Test Environment:
>>•  Windows 10 (we see the same behavior on Linux as well)
>>•  Java 1.8.0_121
>>•  Solr 6.3.0 with patch for SOLR-9527 (To fix RESTORE shard distribution 
>>and add createNodeSet to RESTORE)
>>•  1 Zookeeper node running on localhost:2181
>>•  3 Solr nodes running on localhost:8171, localhost:8181 and 
>>localhost:8191 (hostname NY07LP521696)
>>
>>Test and observations:
>>1) Create a 2 shard collection 'test'
>>   
>> http://localhost:8181/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1&collection.configName=testconf&createNodeSet=NY07LP521696:8171_solr,NY07LP521696:8181_solr
>>
>>2) Index 7 documents to 'test'
>>3) Search 'test' - result count 7
>>4) Backup collection 'test'
>>   
>> http://localhost:8181/solr/admin/collections?action=BACKUP&collection=test&name=copy&location=%2FData%2Fsolr%2Fbkp&async=1234
>>
>>5) Restore 'test' to collection 'test2'
>>   
>> http://localhost:8191/solr/admin/collections?action=RESTORE&name=copy&location=%2FData%2Fsolr%2Fbkp&collection=test2&async=1234&maxShardsPerNode=1&createNodeSet=NY07LP521696:8181_solr,NY07LP521696:8191_solr
>>
>>6) Search 'test2' - result count 7
>>7) Index 2 new documents to 'test2'
>>8) Search 'test2' - result count 7 (new documents do not appear in 
>>results)
>>9) Create a replica for each of the shards of 'test2'
>>   
>> http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=NY07LP521696:8181_solr
>>   
>> http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard2&node=NY07LP521696:8171_solr
>>
>>*** Note that it is not necessary to try to re-index the 2 new documents 
>>before this step, just create replicas and query ***
>>10) Repeatedly query 'test2' - result count randomly changes between 7, 8 
>>and 9. This is because Solr is randomly selecting replicas of 'test2' and 
>>one of the two new docs were added to each of the shards in the collection 
>>so if replica0 of both shards are selected the result is 7, if replica0 and 
>>replica1 are selected for each of eit

Re: Tokenized querying

2017-03-07 Thread OTH
Hello,

Thanks for your response; it turned out the fields were indeed of 'string'
type, and when I changed them to 'text_general', it started to work as I
wanted.
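
(In case it helps others, a sketch of that change via the Schema API's
replace-field command - the core name below is a placeholder, and the
documents need re-indexing after the change:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field" : { "name":"name", "type":"text_general", "stored":true }
}' http://localhost:8983/solr/mycore/schema )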

However, I'm still not sure how to extract the scores; I don't seem to be
getting them in the response.

Much thanks

On Tue, Mar 7, 2017 at 8:07 PM, Alexandre Rafalovitch 
wrote:

> The default text field definition (text_general) tokenizes on spaces,
> so - if I understand the question correctly - it should just work. Are
> you by any chance searching against a name field that is defined as
> String (and is not tokenized)?
>
> If you do the Solr tutorial, you search on "ipod", which seems like a
> similar case to me. So, can you start from there? You can just index
> your own text into the example config for example.
>
> Regards,
>Alex.
> P.s. If you are coming from Lucene, copyField instruction may be
> slightly confusing. In the examples provided, your text is copied from
> named specific fields to text/_text_ field which is actually the
> default field searched, using the type definition associated with that
> text/_text_ field, rather than with the original field.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 09:30, OTH  wrote:
> > Hello,
> >
> > I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
> > simple question.
> >
> > Let's say I have these documents with the following values in a single
> > field (let's call it "name"):
> >
> > sando...@company.example.com
> > sandb...@company.example.com
> > sa...@company.example.com
> > Sancho Landolt
> > Sanders Greenley
> > Sanders Massey
> > Santa Catarina
> > San Carlos de Bariloche
> > San Francisco
> > San Mateo
> >
> > I would like, if the search query is "San", for Solr to return the
> > following and only the following:
> > San Carlos de Bariloche
> > San Francisco
> > San Mateo
> >
> > So basically, I'd like to search based on tokens.  I'd also like Solr to
> > return an associated score.  So eg, if the user searches "San Francisco",
> > it should still return the above results, but obviously the score for the
> > document with "San Francisco" would be much higher.
> >
> > I've been doing this pretty easily using Lucene from Java, however I'm
> > unable to figure out how to do it using Solr.
> >
> > Much thanks
>


Managed schema vs schema.xml

2017-03-07 Thread OTH
Hello

I'm sure this has been asked many times but I'm having some confusion here.

I understand that managed-schema is not supposed to be edited by hand but
only via the "API".  All I understand about this "API" however, is that it
may be referring to the "Schema" page in the Solr browser-based Admin.

However, in this "Schema" page, it provides options for "Add Field", "Add
Dynamic Field", "Add Copy Field"; but when I was trying to add a
"fieldType", I couldn't find any way to do this from this web page.

So I instead edited the managed-schema file by hand, which I understand can
be problematic if the schema is ever edited via the API later on?

I am using v. 6.4.1; when I create a new core, it creates the
managed-schema file in the 'conf' folder.  Is there any way to use the
older 'schema.xml' format instead?  Because there seems to be more
documentation available for that, and like I describe, the browser API
seems to perhaps be lacking.

If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
aware this depends on individual preference, but would be nice to get
others' feedback.)

Thanks


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-07 Thread Elodie Sannier

Thank you Erick for your answer.

The files are deleted even without JVM restart but they are still seen
as DELETED by the kernel.

We have custom code, and for the migration to Solr 6.4.0 we have added
new code with req.getSearcher() but without a "close".
We will decrement the reference count on the searcher resource
(to prevent the searcher from remaining open after a commit) and see if it fixes
the problem.
Elodie

On 03/07/2017 03:55 PM, Erick Erickson wrote:

Just as a sanity check, if you restart the Solr JVM, do the files
disappear from disk?

Do you have any custom code anywhere in this chain? If so, do you open
any searchers but
fail to close them? Although why 6.4 would manifest the problem but
other code wouldn't
is a mystery, just another sanity check.

Best,
Erick

On Tue, Mar 7, 2017 at 6:44 AM, Elodie Sannier  wrote:

Hello,

We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has
increased.
We found hundreds of references to deleted index files being held by solr.
Before the migration, we had 15-30% of disk space used, after the migration
we have 60-90% of disk space used.

We are using Solr Cloud with 2 collections.

The commands applied on the collections are:
- for incremental indexation mode: add, deleteById with commitWithin of 30
minutes
- for full indexation mode: add, deleteById, commit
- for switch between incremental and full mode: deleteByQuery, createAlias,
reload
- there is also an autocommit every 15 minutes

We have seen the email "Solr leaking references to deleted files"
2016-05-31 which describes the same problem, but the mentioned bugs are fixed.

We manually tried to force a commit, a reload and an optimize on the
collections without effect.

Is it a problem of configuration (merge / delete policy) or a possible
regression in the Solr code?

Thank you


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce
message, merci de le détruire et d'en avertir l'expéditeur.



--

Elodie Sannier
Software engineer



*E*elodie.sann...@kelkoo.fr*Skype*kelkooelodies
*T*+33 (0)4 56 09 07 55
*A*Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


Re: Tokenized querying

2017-03-07 Thread Alexandre Rafalovitch
Try adding "score" as a pseudo-field in the 'fl' parameter:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefl(FieldList)Parameter
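
For example (core and field names are placeholders):

http://localhost:8983/solr/mycore/select?q=name:San&fl=name,score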

You can also enable debug and debug.explain.structured, if you want to
go all inception on figuring the scores out:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThedebugParameter
. And if you do, https://www.manning.com/books/relevant-search is your
friend and I think Manning is running 40% discount right now on
Twitter.

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 11:41, OTH  wrote:
> Hello,
>
> Thanks for your response; it turned out the fields were indeed of 'string'
> type, and when I changed them to 'text_general', it started to work as I
> wanted.
>
> However, I'm still not sure how to extract the scores?  I don't seem to be
> getting that in the response.
>
> Much thanks
>
> On Tue, Mar 7, 2017 at 8:07 PM, Alexandre Rafalovitch 
> wrote:
>
>> The default text field definition (text_general) tokenizes on spaces,
>> so - if I understand the question correctly - it should just work. Are
>> you by any chance searching against a name field that is defined as
>> String (and is not tokenized)?
>>
>> If you do the Solr tutorial, you search on "ipod", which seems like a
>> similar case to me. So, can you start from there? You can just index
>> your own text into the example config for example.
>>
>> Regards,
>>Alex.
>> P.s. If you are coming from Lucene, copyField instruction may be
>> slightly confusing. In the examples provided, your text is copied from
>> named specific fields to text/_text_ field which is actually the
>> default field searched, using the type definition associated with that
>> text/_text_ field, rather than with the original field.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>
>>
>> On 7 March 2017 at 09:30, OTH  wrote:
>> > Hello,
>> >
>> > I am new to Solr.  I am using v. 6.4.1.  I have what is probably a pretty
>> > simple question.
>> >
>> > Let's say I have these documents with the following values in a single
>> > field (let's call it "name"):
>> >
>> > sando...@company.example.com
>> > sandb...@company.example.com
>> > sa...@company.example.com
>> > Sancho Landolt
>> > Sanders Greenley
>> > Sanders Massey
>> > Santa Catarina
>> > San Carlos de Bariloche
>> > San Francisco
>> > San Mateo
>> >
>> > I would like, if the search query is "San", for Solr to return the
>> > following and only the following:
>> > San Carlos de Bariloche
>> > San Francisco
>> > San Mateo
>> >
>> > So basically, I'd like to search based on tokens.  I'd also like Solr to
>> > return an associated score.  So eg, if the user searches "San Francisco",
>> > it should still return the above results, but obviously the score for the
>> > document with "San Francisco" would be much higher.
>> >
>> > I've been doing this pretty easily using Lucene from Java, however I'm
>> > unable to figure out how to do it using Solr.
>> >
>> > Much thanks
>>


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-07 Thread Elodie Sannier

Thank you Alex for your answer.

The references to deleted files are only on index files (with .fdt, .doc,
.dvd, ... extensions).

sudo lsof | grep DEL
java   1366kookel  DEL   REG 253,8   15360013
/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index/_2508z.cfs
java   1366kookel  DEL   REG 253,8   15360035
/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index/_25091.fdt
java   1366kookel  DEL   REG 253,8   15425603
/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index/_25091_Lucene50_0.tim
java   1366kookel  DEL   REG 253,8   11624982
/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index/_2508y.fdt
...

We have tried to optimize the collection from the Solr Admin UI but without
effect.

Elodie

On 03/07/2017 04:11 PM, Alexandre Rafalovitch wrote:

More sanity checks: what are the extensions/types of the files that
are not deleted?

If they are index files, optimize command (even if no longer
recommended for production) should really blow all the old ones away.
So, are they other kinds of files?

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 09:55, Erick Erickson  wrote:

Just as a sanity check, if you restart the Solr JVM, do the files
disappear from disk?

Do you have any custom code anywhere in this chain? If so, do you open
any searchers but
fail to close them? Although why 6.4 would manifest the problem but
other code wouldn't
is a mystery, just another sanity check.

Best,
Erick

On Tue, Mar 7, 2017 at 6:44 AM, Elodie Sannier  wrote:

Hello,

We have migrated from Solr 5.4.1 to Solr 6.4.0 and the disk usage has
increased.
We found hundreds of references to deleted index files being held by solr.
Before the migration, we had 15-30% of disk space used, after the migration
we have 60-90% of disk space used.

We are using Solr Cloud with 2 collections.

The commands applied on the collections are:
- for incremental indexation mode: add, deleteById with commitWithin of 30
minutes
- for full indexation mode: add, deleteById, commit
- for switch between incremental and full mode: deleteByQuery, createAlias,
reload
- there is also an autocommit every 15 minutes

We have seen the email "Solr leaking references to deleted files"
2016-05-31 which describes the same problem, but the mentioned bugs are fixed.

We manually tried to force a commit, a reload and an optimize on the
collections without effect.

Is it a problem of configuration (merge / delete policy) or a possible
regression in the Solr code?

Thank you


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce
message, merci de le détruire et d'en avertir l'expéditeur.



--

Elodie Sannier
Software engineer



*E*elodie.sann...@kelkoo.fr*Skype*kelkooelodies
*T*+33 (0)4 56 09 07 55
*A*Parc Sud Galaxie, 6, rue des Méridiens, 38130 Echirolles


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 158 Ter Rue du Temple 75003 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


Re: Managed schema vs schema.xml

2017-03-07 Thread Shawn Heisey
On 3/7/2017 9:41 AM, OTH wrote:
> I understand that managed-schema is not supposed to be edited by hand but
> only via the "API".  All I understand about this "API" however, is that it
> may be referring to the "Schema" page in the Solr browser-based Admin.
>
> However, in this "Schema" page, it provides options for "Add Field", "Add
> Dynamic Field", "Add Copy Field"; but when I was trying to add a
> "fieldType", I couldn't find any way to do this from this web page.

The schema page in the admin UI is not actually the Schema API, but it
USES the Schema API.  The admin UI is a javascript app that runs in your
browser and makes Solr API requests.  Admin UI URLs are useless outside
of a full browser.
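
The Schema API itself can do the things the UI does not yet expose.  For
example, adding a new fieldType is one POST request; a sketch, assuming a
core named "mycore" and a hypothetical type name:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "my_text",
    "class": "solr.TextField",
    "analyzer": { "tokenizer": { "class": "solr.StandardTokenizerFactory" } }
  }
}' http://localhost:8983/solr/mycore/schema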

> So I instead edited the managed-schema page by hand, which I understand can
> be problematic if the schema is ever edited it via the API later on?

Hand-editing is only problematic if you mix those edits with using the
API and forget to reload or restart after a hand-edit and before using
the API.  If you are careful to reload/restart before switching editing
methods, there will be no problems.

> I am using v. 6.4.1; when I create a new core, it creates the
> managed-schema file in the 'conf' folder.  Is there any way to use the
> older 'schema.xml' format instead?  Because there seems to be more
> documentation available for that, and like I describe, the browser API
> seems to perhaps be lacking.

The "format" of the schema never changes.  It is exactly the same with
either file.  It is the filename that is different.  Also, the managed
schema allows the Schema API to be used, so you can edit it with HTTP
requests.  If you switch to the Classic schema, then it will go back to
schema.xml.  Depending on which example configuration you start with,
switching back to Classic may require more config edits beyond just
changing the schema factory.  There are additional features Solr can use
that rely on the managed schema.

> If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> aware this depends on individual preference, but would be nice to get
> others' feedback.)

As for what users prefer, I do not know.  I can tell you that the
default schema factory has been the managed schema since version 5.5,
and all example configs since that version are using it.  When I upgrade
to a 6.x version in production, I plan on keeping the managed schema,
because it's good to go with defaults unless there's a good reason not
to, but I will continue to hand-edit for all changes.

Thanks,
Shawn



RE: Getting an error: was indexed without position data; cannot run PhraseQuery

2017-03-07 Thread Pouliot, Scott
We are NOT using SOLRCloud yet.  I'm still trying to figure out how to get 
SOLRCloud running.  We're using old school master/slave replication still.  So 
sounds like it can be done if I get to that point.  I've got a few non SOLR 
tasks to get done today, so hoping to dig into this later in the week though.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 7, 2017 11:05 AM
To: solr-user 
Subject: Re: Getting an error:  was indexed without position data; 
cannot run PhraseQuery

First, it's not clear whether you're using SolrCloud or not, so there may be 
some irrelevant info in here

bq: ...could I do it on another instance running the same SOLR version
(4.8.0) and then copy the database into place instead

In a word "yes", if you're careful. Assuming you have more than one shard you 
have to be sure to copy the shards faithfully. By that I mean look at your 
admin UI>>cloud>>tree>>(clusterstate.json or
>>collection>>state.json). You'll see a bunch of information for each
replica but the critical bit is that the hash range should be the same for the 
source and destination. It'll be something like 0x80000000-0x7fffffff for one
shard (each replica on a shard has the same hash range). etc.

The implication of course is that both collections need to have the same number 
of shards.

If you don't have any shards, don't worry about it...

Another possibility, depending on your resources is to create another 
collection with the same number of shards and index to _that_. Then use the 
Collections API CREATEALIAS command to atomically switch. This assumes you have 
enough extra capacity that you can do the reindexing without unduly impacting 
prod.
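
The alias switch itself is a single Collections API call; a sketch, with
hypothetical names (alias "prod" pointing at collection "coll_v2"):

curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=prod&collections=coll_v2'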

And there are a number of variants on this.
> index to a leader-only collection
> during a small maintenance window you shut down prod and ADDREPLICA 
> for all the shards to build out your new collection blow away your old 
> collection when you're comfortable.

But the bottom line is that indexes may be freely copied wherever you want as 
long as the bookkeeping is respected wrt hash ranges. I used to build Lucene 
indexes on a Windows box and copy it to a Unix server as long as I used binary 
copy

Best,
Erick

On Tue, Mar 7, 2017 at 7:04 AM, Pouliot, Scott  
wrote:
> Welcome to IT right?  We're always in some sort of pickle  ;-)  I'm going to 
> play with settings on one of our internal environments and see if I can 
> replicate the issue and go from there with some test fixes.
>
> Here's a question though...  If I need to re-index...could I do it on
> another instance running the same SOLR version (4.8.0) and then copy the 
> database into place instead?  We're using some crappy custom Groovy script 
> run through Aspire to do our indexing and it's horribly slow.  50GB would 
> take at least a day...maybe 2 and I obviously can't have a client down for 
> that long in Production, but if I did it on a backup SOLR box...copying 50GB
> into place is much much quicker.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, March 6, 2017 8:48 PM
> To: solr-user 
> Subject: Re: Getting an error:  was indexed without position 
> data; cannot run PhraseQuery
>
> You're in a pickle then. If you change the definition you need to re-index.
>
> But you claim you haven't changed anything in years as far as the schema is 
> concerned so maybe you're going to get lucky ;).
>
> The error you reported is because somehow there's a phrase search going on 
> against this field. You could have changed something in the query parsers or 
> eDismax definitions or the query generated on the app side to have  phrase 
> query get through. I'm not quite sure if you'll get information back when the 
> query fails, but try adding &debug=query to the URL and see what the 
> parsed_query and parsed_query_toString() to see where phrases are getting 
> generated.
>
> Best,
> Erick
>
> On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott 
>  wrote:
>> Hmm.  We haven’t changed data or the definition in YEARS now.  I'll 
>> have to do some more digging I guess.  Not sure re-indexing is a 
>> great thing to do though since this is a production setup and the 
>> database for this user is @ 50GB.  It would take quite a long time to 
>> reindex all that data from scratch.  Hmmm.
>>
>> Thanks for the quick reply Erick!
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Monday, March 6, 2017 5:33 PM
>> To: solr-user 
>> Subject: Re: Getting an error:  was indexed without position 
>> data; cannot run PhraseQuery
>>
>> Usually an _s field is a "string" type, so be sure you didn't change the 
>> definition without completely re-indexing. In fact I generally either index 
>> to a new collection or remove the data directory entirely.
>>
>> right, the field isn't indexed with position information. That combined with 
>> (probably) the WordDelimiterFilterFac

Re: Tokenized querying

2017-03-07 Thread OTH
Hi,

Thanks a lot for the help.  Adding 'score' to 'fl' worked.
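
For the record, the request that now works looks like this (host and core
names are placeholders), with the debug switches you suggested added in:

curl 'http://localhost:8983/solr/mycore/select?q=San&fl=*,score&debug=true&debug.explain.structured=true'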

I had been using Lucene for some time (though not at an expert level), and
I was usually pretty satisfied with the scoring; so I'm assuming Solr
should work fine for me too.  At the time being I'm just trying to get a
handle on how to use Solr in the first place though.

Thanks

On Tue, Mar 7, 2017 at 9:45 PM, Alexandre Rafalovitch 
wrote:

> Try adding "score" as a pseudo-field in the 'fl' parameter:
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#
> CommonQueryParameters-Thefl(FieldList)Parameter
>
> You can also enable debug and debug.explain.structured, if you want to
> go all inception on figuring the scores out:
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#
> CommonQueryParameters-ThedebugParameter
> . And if you do, https://www.manning.com/books/relevant-search is your
> friend and I think Manning is running 40% discount right now on
> Twitter.
>
> Regards,
>Alex.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 11:41, OTH  wrote:
> > Hello,
> >
> > Thanks for your response; it turned out the fields were indeed of
> 'string'
> > type, and when I changed them to 'text_general', it started to work as I
> > wanted.
> >
> > However, I'm still not sure how to extract the scores?  I don't seem to
> be
> > getting that in the response.
> >
> > Much thanks
> >
> > On Tue, Mar 7, 2017 at 8:07 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> The default text field definition (text_general) tokenizes on spaces,
> >> so - if I understand the question correctly - it should just work. Are
> >> you by any chance searching against name field that is defined as
> >> String (and is not tokenized).
> >>
> >> If you do Solr tutorial, you search on "ipod", which seems like a
> >> similar case to me. So, can you start from there? You can just index
> >> your own text into the example config for example.
> >>
> >> Regards,
> >>Alex.
> >> P.s. If you are coming from Lucene, copyField instruction may be
> >> slightly confusing. In the examples provided, your text is copied from
> >> named specific fields to text/_text_ field which is actually the
> >> default field searched, using the type definition associated with that
> >> text/_text_ field, rather than with the original field.
> >> 
> >> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >>
> >>
> >> On 7 March 2017 at 09:30, OTH  wrote:
> >> > Hello,
> >> >
> >> > I am new to Solr.  I am using v. 6.4.1.  I have what is probably a
> pretty
> >> > simple question.
> >> >
> >> > Let's say I have these documents with the following values in a single
> >> > field (let's call it "name"):
> >> >
> >> > sando...@company.example.com
> >> > sandb...@company.example.com
> >> > sa...@company.example.com
> >> > Sancho Landolt
> >> > Sanders Greenley
> >> > Sanders Massey
> >> > Santa Catarina
> >> > San Carlos de Bariloche
> >> > San Francisco
> >> > San Mateo
> >> >
> >> > I would like, if the search query is "San", for Solr to return the
> >> > following and only the following:
> >> > San Carlos de Bariloche
> >> > San Francisco
> >> > San Mateo
> >> >
> >> > So basically, I'd like to search based on tokens.  I'd also like Solr
> to
> >> > return an associated score.  So eg, if the user searches "San
> Francisco",
> >> > it should still return the above results, but obviously the score for
> the
> >> > document with "San Francisco" would be much higher.
> >> >
> >> > I've been doing this pretty easily using Lucene from Java, however I'm
> >> > unable to figure out how to do it using Solr.
> >> >
> >> > Much thanks
> >>
>


Re: Managed schema vs schema.xml

2017-03-07 Thread Ivan Bianchi
Hi OTH,

I personally prefer to use the classic *schema.xml* file as I feel it's
better for core creation with the desired fields than dealing with API
calls.

You can use it by specifying the schemaFactory class as
ClassicIndexSchemaFactory in solrconfig.xml, as follows:

<schemaFactory class="ClassicIndexSchemaFactory"/>

Best regards,
Ivan

2017-03-07 17:41 GMT+01:00 OTH :

> Hello
>
> I'm sure this has been asked many times but I'm having some confusion here.
>
> I understand that managed-schema is not supposed to be edited by hand but
> only via the "API".  All I understand about this "API" however, is that it
> may be referring to the "Schema" page in the Solr browser-based Admin.
>
> However, in this "Schema" page, it provides options for "Add Field", "Add
> Dynamic Field", "Add Copy Field"; but when I was trying to add a
> "fieldType", I couldn't find any way to do this from this web page.
>
> So I instead edited the managed-schema page by hand, which I understand can
> be problematic if the schema is ever edited it via the API later on?
>
> I am using v. 6.4.1; when I create a new core, it creates the
> managed-schema file in the 'conf' folder.  Is there any way to use the
> older 'schema.xml' format instead?  Because there seems to be more
> documentation available for that, and like I describe, the browser API
> seems to perhaps be lacking.
>
> If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> aware this depends on individual preference, but would be nice to get
> others' feedback.)
>
> Thanks
>



-- 
Ivan


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
Hi,

Thanks, that sufficiently answers the question.
It's especially good to know now that hand-editing is fine, as long as it's
separated from API calls with restarts in between.

Thanks

On Tue, Mar 7, 2017 at 9:57 PM, Shawn Heisey  wrote:

> On 3/7/2017 9:41 AM, OTH wrote:
> > I understand that managed-schema is not supposed to be edited by hand but
> > only via the "API".  All I understand about this "API" however, is that
> it
> > may be referring to the "Schema" page in the Solr browser-based Admin.
> >
> > However, in this "Schema" page, it provides options for "Add Field", "Add
> > Dynamic Field", "Add Copy Field"; but when I was trying to add a
> > "fieldType", I couldn't find any way to do this from this web page.
>
> The schema page in the admin UI is not actually the Schema API, but it
> USES the Schema API.  The admin UI is a javascript app that runs in your
> browser and makes Solr API requests.  Admin UI URLs are useless outside
> of a full browser.
>
> > So I instead edited the managed-schema page by hand, which I understand
> can
> > be problematic if the schema is ever edited it via the API later on?
>
> Hand-editing is only problematic if you mix those edits with using the
> API and forget to reload or restart after a hand-edit and before using
> the API.  If you are careful to reload/restart before switching editing
> methods, there will be no problems.
>
> > I am using v. 6.4.1; when I create a new core, it creates the
> > managed-schema file in the 'conf' folder.  Is there any way to use the
> > older 'schema.xml' format instead?  Because there seems to be more
> > documentation available for that, and like I describe, the browser API
> > seems to perhaps be lacking.
>
> The "format" of the schema never changes.  It is exactly the same with
> either file.  It is the filename that is different.  Also, the managed
> schema allows the Schema API to be used, so you can edit it with HTTP
> requests.  If you switch to the Classic schema, then it will go back to
> schema.xml.  Depending on which example configuration you start with,
> switching back to Classic may require more config edits beyond just
> changing the schema factory.  There are additional features Solr can use
> that rely on the managed schema.
>
> > If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> > aware this depends on individual preference, but would be nice to get
> > others' feedback.)
>
> As for what users prefer, I do not know.  I can tell you that the
> default schema factory has been the managed schema since version 5.5,
> and all example configs since that version are using it.  When I upgrade
> to a 6.x version in production, I plan on keeping the managed schema,
> because it's good to go with defaults unless there's a good reason not
> to, but I will continue to hand-edit for all changes.
>
> Thanks,
> Shawn
>
>


Re: Managed schema vs schema.xml

2017-03-07 Thread Alexandre Rafalovitch
Yes, it has been asked many times and has been answered both on the
list and in the - awesome - Reference Guide. I'd recommend reading
that and then coming back again with more specific question:
https://cwiki.apache.org/confluence/display/solr/Overview+of+Documents%2C+Fields%2C+and+Schema+Design

One confusion to clarify though. API is HTTP API, Admin UI just uses
it and does not - yet - expose everything possible. You can always
just hit Solr directly for the missing bits. Again, RTARG (.. Awesome
Reference Guide) and then come back with specifics:
https://cwiki.apache.org/confluence/display/solr/Schema+API

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 11:41, OTH  wrote:
> Hello
>
> I'm sure this has been asked many times but I'm having some confusion here.
>
> I understand that managed-schema is not supposed to be edited by hand but
> only via the "API".  All I understand about this "API" however, is that it
> may be referring to the "Schema" page in the Solr browser-based Admin.
>
> However, in this "Schema" page, it provides options for "Add Field", "Add
> Dynamic Field", "Add Copy Field"; but when I was trying to add a
> "fieldType", I couldn't find any way to do this from this web page.
>
> So I instead edited the managed-schema page by hand, which I understand can
> be problematic if the schema is ever edited it via the API later on?
>
> I am using v. 6.4.1; when I create a new core, it creates the
> managed-schema file in the 'conf' folder.  Is there any way to use the
> older 'schema.xml' format instead?  Because there seems to be more
> documentation available for that, and like I describe, the browser API
> seems to perhaps be lacking.
>
> If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> aware this depends on individual preference, but would be nice to get
> others' feedback.)
>
> Thanks


Re: question related to solr LTR plugin

2017-03-07 Thread Michael Nilsson
Hey Saurabh,

So there are a few things you can do with the LTR plugin and your Solr
collection to solve different degrees of the kind of personalization you
might want.  I'll start with the simplest, which isn't exactly what you're
looking for but is very quick to implement and play around with, and move
on to some more complex features you could write.


   1. Make a simple binary feature to see if the user is the author using
   the existing SolrFeature class.
   {
   "store" : "myFeatureStore",
   "name" : "isUserTheDocumentAuthor ",
   "class" : "org.apache.solr.ltr.feature.SolrFeature",
   "params" : {
   "fq" : [ "{!field f=authorName}${userName}" ]
   }
   }
   Pass in the userName at request time using external feature
   information (efi):
   ...&rq={!ltr model="myModel" reRankDocs=20 efi.userName="Saurabh"}
   2. Similar to the previous one, except this time instead of seeing if
   the user is the document's author, see if the user has a relationship to
   the document's author.  The only difference here is that you would add a
   new multivalued field to your document which could index all the author's
   relationships that you would match against.
   {
   "store" : "myFeatureStore",
   "name" : "doesUserHaveRelationshipToDocumentAuthor ",
   "class" : "org.apache.solr.ltr.feature.SolrFeature",
   "params" : {
   "fq" : [ "{!field f=authorRelationships}${userName}" ]
   }
   }
   The ranking request doesn't change:
   ...&rq={!ltr model="myModel" reRankDocs=20 efi.userName="Saurabh"}
   3. Implement your own custom feature by subclassing Feature.java. You can
   look at the implementation of ValueFeature.java for a simple concrete
   example of how to do that.  Depending on how large your
   user-user map is, and how often it changes, you could do it in a couple
   different ways.

   If the map is pretty static and relatively small, you can upload it
   along with your feature definition in the params section.  Your custom
   feature can parse that dump once at creation time, and then at query time
   you can look up the value of userName+authorName in your map and return
   that in the score() function for the feature's value.  If it wasn't found
   in the map you could return a default value of 0.

   If the map changes more frequently, you could instead pass in only that
   user's relationship map at request time through an efi.  Your custom
   feature can parse that user's specific relationship map (based on whatever
   format you send it in) and then return the authorName lookup value in the
   score() function.

   If neither of those options are good for you, you could of course just
   make your custom feature class do whatever you want and lookup the
   appropriate data in whatever store you have it in.  You could index this
   info in a parallel collection and look it up there.  Solr recently added
   jdbc driver support, so you might even be able to use that to get data from
   a sql store as well.  I haven't messed around with these two options myself
   so you'd be treading somewhat new ground.  If you or anyone does test this
   out and it seems to work nicely, I think this would make an excellent
   generic feature class to contribute back to the community.
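
Whichever route you take, uploading the feature definitions is a single PUT
to the feature store; a sketch, assuming a collection named "techproducts"
and features saved in myFeatures.json:

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' \
  --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'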

Hope that helps,
Michael

On Mon, Mar 6, 2017 at 4:24 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEX) <
sagarwal...@bloomberg.net> wrote:

> Hi,
>
> I do have a question related to solr LTR plugin. I have a use case of
> personalization and wondering whether you can help me there. I would like
> to rerank my query based on the relationship of searcher with the author of
> the returned documents. I do have relationship score in the external
> datastore in form of user1(searcher), user2(author), relationship score. In
> my query, I can pass searcher id as external feature. My question is that
> during querying, how do I retrieve relationship score for each documents as
> a feature and rerank the documents. Would I need to implement a custom
> feature to do so? and How to implement the custom feature.
>
> Thanks,
> Saurabh


[ANNOUNCE] Apache Solr 6.4.2 released

2017-03-07 Thread Ishan Chattopadhyaya
7 March 2017, Apache Solr 6.4.2 available

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.4.2 is available for immediate download at:

   - http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

   - https://lucene.apache.org/solr/6_4_2/changes/Changes.html

Solr 6.4.2 contains 4 bug fixes since the 6.4.1 release:

   - Serious performance degradation in Solr 6.4 due to the metrics
     collection. IndexWriter metrics collection turned off by default,
     directory level metrics collection completely removed (until a better
     design is found)
   - Transaction log replay can hit a NullPointerException due to new
     Metrics code
   - NullPointerException in CloudSolrClient when reading stale alias
   - UnifiedHighlighter and PostingsHighlighter bug in PrefixQuery and
     TermRangeQuery for multi-byte text

Further details of changes are available in the change log available at:
http://lucene.apache.org/solr/6_4_2/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)
Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.


RE: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.

2017-03-07 Thread Marquiss, John
Thanks, I have done that... For those following this on the mailing list or
coming across this in the archives, the JIRA is SOLR-10242.

https://issues.apache.org/jira/browse/SOLR-10242 Cores created by Solr RESTORE 
end up with stale searches after indexing.


Also, we do not see any warnings or errors in any of our logs after the restore 
has finished.

John Marquiss

>-Original Message-
>From: Erick Erickson [mailto:erickerick...@gmail.com] 
>Sent: Tuesday, March 7, 2017 9:53 AM
>To: solr-user 
>Subject: Re: Solrcloud after restore collection, when index new documents into 
>restored collection, leader not write to index.
>
>John:
>
>Just skimming, but this certainly seems like it merits a JIRA, please feel 
>free to create one (you may have to create your own logon first).
>Please include the steps for the test you did where new replicas "see"
>the restored index. And this last where you hand edited things is important.
>
>The only other question I'd have is whether you saw anything odd in the logs. 
>I'm no expert in this functionality, just covering the possibility that for 
>>some reason the restore didn't finish successfully even though all the files 
>appear to be copied back.
>
>I don't have any bandwidth to tackle this, but a JIRA will preserve it for 
>others to look at.
>
>Thanks for all your research on this!
>
>Erick



Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-07 Thread Erik Hatcher
Nice use of the VelocityResponseWriter :)

(and looks like, at quick glance, several other goodies under there too) 

Erik


> On Mar 5, 2017, at 7:40 AM, Avtar Singh Mehra  wrote:
> 
> Hello everyone,
> I have developed a project called WiseOwl which is basically a fact-based
> question answering system which can be accessed at :
> https://github.com/asmehra95/wiseowl
> 
> In the process of making the project work I have developed pluggable Solr
> filters optimised for Solr 6.3.0.
> I would like to donate them to solr.
> 1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named entities
> and it also normalises dates during indexing or searching. Demonstration
> screenshots are available on the GitHub profile. But I don't know how to
> donate them.
> 
> If there is a way then please let me know. As it may be useful for anyone
> doing natural language processing.



Re: query rewriting

2017-03-07 Thread Tim Casey
Hendrik,

I would recommend sticking as close as possible to the query syntax as it
is in Lucene.

However, if you do your own query parse build up, you can use a Lucene
Query object.  I don't know where this bolts into solr, exactly.  But I
have done this extensively with lucene.  The reason was to combine two
distinct portions of content into one unified query language.  Also, we did
some remapping of field names into a normalized user experience.  This
meant the field names could be exposed in the UI, independent of the
metadata of the underlying content.  For what I did, the source content
could be vastly different from one index to another.  Usually this is not
the case.

You end up building up OR/AND query phrases, then passing them off to the
query engine.  If you do this, you can also optimize and add boost terms
under specific circumstances.  If there is a set of required terms/phrases,
then you can boost terms or remove non-required terms without any loss to
the overall result set.  This changes the order in which items are
returned, so it may affect the user's perception of recall, but it can be
acceptable for specific use cases.
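
A minimal sketch of that kind of programmatic composition on the Lucene
side (field and term names here are hypothetical):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// remap the user-facing field name to the real index field, then compose
BooleanQuery.Builder b = new BooleanQuery.Builder();
// required term: it defines the result set
b.add(new TermQuery(new Term("body_text", "solr")), BooleanClause.Occur.MUST);
// optional boosted phrase: it changes ranking, not the result set
b.add(new BoostQuery(new PhraseQuery("body_text", "learning", "rank"), 2.0f),
      BooleanClause.Occur.SHOULD);
Query query = b.build();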

tim

On Sun, Mar 5, 2017 at 11:40 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> I would like to dynamically modify a query, for example by replacing a
> field name with a different one. Given how complex the query parsing is it
> does look error prone to duplicate that so I would like to work on the
> Lucene Query object model instead. The subclasses of Query look relatively
> simple and easy to rewrite on the Lucene side but on the Solr side this
> does not seem to be the case. Any suggestions on how this could be done?
>
> thanks,
> Hendrik
>


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
Hi,

Thanks, I should've consulted this guide more thoroughly.  I actually had
encountered this section when reading the guide, but somehow forgot about
it when asking this question.  I think it doesn't clarify some things very
well, which could leave a beginner a bit confused.

Specifically, that 'managed-schema' could indeed be modified by hand, or
even that what the HTTP API is doing is actually modifying this file.
When I was first checking out Solr, I saw this section and remembered
thinking how verbose it was to make changes this way, because I saw on some
website how someone was making changes to a 'schema.xml' file instead, and
that seemed easier.  This file was supposed to be in 'conf' but I couldn't
find it... so I tried making the changes to managed-schema instead and it
worked.  But then I also read somewhere that you aren't supposed to do
that, so I wasn't sure how to do things going forward.

Anyways, I'm clearer now that the managed-schema does safely allow
hand-edits if done properly, which might in some cases be easier than the
HTTP calls; and at the same time it offers the HTTP API as an option as
well when needed / preferred.

Much thanks

On Tue, Mar 7, 2017 at 9:50 PM, Alexandre Rafalovitch 
wrote:

> Yes, it has been asked many times and has been answered both on the
> list and in the - awesome - Reference Guide. I'd recommend reading
> that and then coming back again with more specific question:
> https://cwiki.apache.org/confluence/display/solr/Overview+of+Documents%2C+
> Fields%2C+and+Schema+Design
>
> One confusion to clarify though. API is HTTP API, Admin UI just uses
> it and does not - yet - expose everything possible. You can always
> just hit Solr directly for the missing bits. Again, RTARG (.. Awesome
> Reference Guide) and then come back with specifics:
> https://cwiki.apache.org/confluence/display/solr/Schema+API
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 11:41, OTH  wrote:
> > Hello
> >
> > I'm sure this has been asked many times but I'm having some confusion
> here.
> >
> > I understand that managed-schema is not supposed to be edited by hand but
> > only via the "API".  All I understand about this "API" however, is that
> it
> > may be referring to the "Schema" page in the Solr browser-based Admin.
> >
> > However, in this "Schema" page, it provides options for "Add Field", "Add
> > Dynamic Field", "Add Copy Field"; but when I was trying to add a
> > "fieldType", I couldn't find any way to do this from this web page.
> >
> > So I instead edited the managed-schema page by hand, which I understand
> can
> > be problematic if the schema is ever edited it via the API later on?
> >
> > I am using v. 6.4.1; when I create a new core, it creates the
> > managed-schema file in the 'conf' folder.  Is there any way to use the
> > older 'schema.xml' format instead?  Because there seems to be more
> > documentation available for that, and like I describe, the browser API
> > seems to perhaps be lacking.
> >
> > If so - what do users usually prefer; schema.xml or managed-schema?  (I'm
> > aware this depends on individual preference, but would be nice to get
> > others' feedback.)
> >
> > Thanks
>


Re: Managed schema vs schema.xml

2017-03-07 Thread Alexandre Rafalovitch
On 7 March 2017 at 15:02, OTH  wrote:
> Specifically, that 'managed-schema' could indeed be modified by hand, or
> even that what the HTTP API is doing is actually modifying this file.

Thank you for the specific feedback. That is something we should fold
into the Guide as you are not the only one asking this specific aspect
of the question. And seeing that you've read the guide first, it is
obvious that the question is not fully answered.

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


RE: Managed schema vs schema.xml

2017-03-07 Thread Phil Scadden
I would second that guide could be clearer on that. I read and reread several 
times trying to get my head around the schema.xml/managed-schema bit. I came 
away from first cursory reading with the idea that managed-schema was mostly 
for schema-less mode and only after some stuff ups and puzzling over comments 
in the basic-config schema file itself did I go back for more careful re-read. 
I am still not sure that I have got all the nuances. My understanding is:

If you don’t want ability to edit it via admin UI or config api, rename to 
schema.xml. Unclear whether you have to make changes to other configs to do 
this. Also unclear to me whether there was any upside at all to using 
schema.xml? Why degrade functionality? Does the capacity for schema.xml only 
exist for backward compatibility?

If you want to run schema-less, you have to use managed-schema? (I didn’t 
delve too deep into this).

In the end, I used basic-config to create core and then hacked managed-schema 
from there.


I would have to say the "basic-config" seems distinctly more than basic. It is 
still a huge file. I thought perhaps I could delete every unused field type, 
but worried there were some "system" dependencies. Ie if you want *target type 
wildcard queries do you need to have text_general_reverse and a copy to it? If 
you always explicitly set only defined fields in a custom indexer, then can you 
dump the whole dynamic fields bit?
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Managed schema vs schema.xml

2017-03-07 Thread Erick Erickson
See SOLR-10241 I just opened for discussion. My first impulse (well
actually second) is to _not_ encourage anyone to hand-edit managed
schema, and especially not put that in the ref guide.

But perhaps put the classic schema factory in a comment in
basic_configs and direct people there (and maybe even from the ref
guide) if they want to do the classic managed schema.

So I think the direction here is to say, basically:
1> if you want to hand-edit, use classic schema factory, see the
comments in configset XX
2> Otherwise use managed schema and modify it via the rest API.

and leave out mention of hand-editing managed-schema, that's expert level stuff.

FWIW,
Erick

Which is entirely separate from clarifying the ref guide

On Tue, Mar 7, 2017 at 12:11 PM, Alexandre Rafalovitch
 wrote:
> On 7 March 2017 at 15:02, OTH  wrote:
>> Specifically, that 'managed-schema' could indeed be modified by hand, or
>> even that what the HTTP API is doing is actually modifying this file.
>
> Thank you for the specific feedback. That is something we should fold
> into the Guide as you are not the only one asking this specific aspect
> of the question. And seeing that you've read the guide first, it is
> obvious that the question is not fully answered.
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced


https

2017-03-07 Thread pubdiverses

Hello,

I would like to access my Solr instance at https://domain.com/solr.

How can I do this?


Re: Managed schema vs schema.xml

2017-03-07 Thread OTH
In the reference guide, in the chapter named "The Well Configured Solr
Instance", it says (I'm copying+pasting from the PDF version) :

Switching from Managed Schema to Manually Edited schema.xml
> If you have started Solr with managed schema enabled and you would like to
> switch to manually editing a schema.xml file, you should take the following
> steps:
> 1. Rename the managed-schema file to schema.xml.
> 2. Modify solrconfig.xml to replace the schemaFactory class: remove any
>    ManagedIndexSchemaFactory definition if it exists, and add a
>    ClassicIndexSchemaFactory definition as shown above.
> 3. Reload the core(s).
> If you are using SolrCloud, you may need to modify the files via ZooKeeper.
> The bin/solr script provides an easy way to download the files from
> ZooKeeper and upload them back after edits. See the section ZooKeeper
> Operations for more information.
> IndexConfig in SolrConfig
> The <indexConfig> section of solrconfig.xml defines low-level behavior of
> the Lucene index writers. By default, the settings are commented out in
> the sample solrconfig.xml included with Solr, which means the defaults are
> used. In most cases, the defaults are fine.
> <indexConfig>
> ...
> </indexConfig>
> Parameters covered in this section: Writing New Segments, Merging Index
> Segments, Compound File Segments, Index Locks, Other Indexing Settings.
> Writing New Segments
> ramBufferSizeMB
> Once accumulated document updates exceed this much memory space (defined
> in megabytes), then the pending updates are flushed. This can also create
> new segments or trigger a merge. Using this setting is generally
> preferable to maxBufferedDocs. If both maxBufferedDocs and ramBufferSizeMB
> are set in solrconfig.xml, then a flush will occur when either limit is
> reached. The default is 100Mb.
> <ramBufferSizeMB>100</ramBufferSizeMB>
> maxBufferedDocs
> Sets the number of document updates to buffer in memory before they are
> flushed as a new segment. This may also trigger a merge. The default Solr
> configuration sets to flush by RAM usage (ramBufferSizeMB).
> <maxBufferedDocs>1000</maxBufferedDocs>
> useCompoundFile
> Controls whether newly written (and not yet merged) index segments should
> use the Compound File Segment format. The default is false.
> <useCompoundFile>false</useCompoundFile>
> To have full control over your schema.xml file, you may also want to
> disable schema guessing, which allows unknown fields to be added to the
> schema during indexing. The properties that enable this feature are
> discussed in the section Schemaless Mode.


On Wed, Mar 8, 2017 at 1:32 AM, Phil Scadden  wrote:

> I would second that guide could be clearer on that. I read and reread
> several times trying to get my head around the schema.xml/managed-schema
> bit. I came away from first cursory reading with the idea that
> managed-schema was mostly for schema-less mode and only after some stuff
> ups and puzzling over comments in the basic-config schema file itself did I
> go back for more careful re-read. I am still not sure that I have got all
> the nuances. My understanding is:
>
> If you don’t want ability to edit it via admin UI or config api, rename to
> schema.xml. Unclear whether you have to make changes to other configs to do
> this. Also unclear to me whether there was any upside at all to using
> schema.xml? Why degrade functionality? Does the capacity for schema.xml
> only exist for backward compatibility?
>
> If you want to run schema-less, you have to use managed-schema? (I
> didn’t delve too deep into this).
>
> In the end, I used basic-config to create core and then hacked
> managed-schema from there.
>
>
> I would have to say the "basic-config" seems distinctly more than basic.
> It is still a huge file. I thought perhaps I could delete every unused
> field type, but worried there were some "system" dependencies. Ie if you
> want *target type wildcard queries do you need to have text_general_reverse
> and a copy to it? If you always explicitly set only defined fields in a
> custom indexer, then can you dump the whole dynamic fields bit?
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.
>


Re: Getting an error: was indexed without position data; cannot run PhraseQuery

2017-03-07 Thread Erick Erickson
OK, you can do kind of the same thing with the core admin API "SWAP" command.
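
A sketch of that call, with placeholder core names:

curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=coreA&other=coreB'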

And in stand-alone it's much simpler. Just index your data somewhere
(I don't particularly care where, your workstation, a spare machine
lying around, whatever) and copy the result to the index directory for
prod. I'd copy it to the master then make sure it propagates to all
the slaves. You can do this by removing the data directory while a
slave is shut down and starting it back up.

Or, you can copy the index to the master and all the slaves in one big go.

Up to you.

Best,
Erick



On Tue, Mar 7, 2017 at 8:59 AM, Pouliot, Scott
 wrote:
> We are NOT using SOLRCloud yet.  I'm still trying to figure out how to get 
> SOLRCloud running.  We're using old school master/slave replication still.  
> So sounds like it can be done if I get to that point.  I've got a few non 
> SOLR tasks to get done today, so hoping to dig into this later in the week 
> though.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, March 7, 2017 11:05 AM
> To: solr-user 
> Subject: Re: Getting an error:  was indexed without position data; 
> cannot run PhraseQuery
>
> First, it's not clear whether you're using SolrCloud or not, so there may be 
> some irrelevant info in here
>
> bq: ...could I do it on another instance running the same SOLR version
> (4.8.0) and then copy the database into place instead
>
> In a word "yes", if you're careful. Assuming you have more than one shard you 
> have to be sure to copy the shards faithfully. By that I mean look at your 
> admin UI>>cloud>>tree>>(clusterstate.json or
>>>collection>>state.json). You'll see a bunch of information for each
> replica but the critical bit is that the hash range should be the same for 
> the source and destination. It'll be something like 0x80000000-0x7fffffff for
> one shard (each replica on a shard has the same hash range). etc.
>
> The implication of course is that both collections need to have the same 
> number of shards.
>
> If you don't have any shards, don't worry about it...
>
> Another possibility, depending on your resources is to create another 
> collection with the same number of shards and index to _that_. Then use the 
> Collections API CREATEALIAS command to atomically switch. This assumes you 
> have enough extra capacity that you can do the reindexing without unduly 
> impacting prod.
>
> And there are a number of variants on this.
>> index to a leader-only collection
>> during a small maintenance window you shut down prod and ADDREPLICA
>> for all the shards to build out your new collection blow away your old 
>> collection when you're comfortable.
>
> But the bottom line is that indexes may be freely copied wherever you want as 
> long as the bookkeeping is respected wrt hash ranges. I used to build Lucene 
> indexes on a Windows box and copy it to a Unix server as long as I used 
> binary copy
>
> Best,
> Erick
>
> On Tue, Mar 7, 2017 at 7:04 AM, Pouliot, Scott 
>  wrote:
>> Welcome to IT right?  We're always in some sort of pickle  ;-)  I'm going to 
>> play with settings on one of our internal environments and see if I can 
>> replicate the issue and go from there with some test fixes.
>>
>> Here's a question though...  If I need to re-index...could I do it on
>> another instance running the same SOLR version (4.8.0) and then copy the 
>> database into place instead?  We're using some crappy custom Groovy script 
>> run through Aspire to do our indexing and it's horribly slow.  50GB would 
>> take at least a day...maybe 2 and I obviously can't have a client down for 
>> that long in Production, but if I did it on a backup SOLR box...copying
>> 50GB into place is much much quicker.
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Monday, March 6, 2017 8:48 PM
>> To: solr-user 
>> Subject: Re: Getting an error:  was indexed without position
>> data; cannot run PhraseQuery
>>
>> You're in a pickle then. If you change the definition you need to re-index.
>>
>> But you claim you haven't changed anything in years as far as the schema is 
>> concerned so maybe you're going to get lucky ;).
>>
>> The error you reported is because somehow there's a phrase search going on 
>> against this field. You could have changed something in the query parsers or 
>> eDismax definitions or the query generated on the app side to have  phrase 
>> query get through. I'm not quite sure if you'll get information back when 
>> the query fails, but try adding &debug=query to the URL and see what the 
>> parsed_query and parsed_query_toString() to see where phrases are getting 
>> generated.
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 6, 2017 at 5:26 PM, Pouliot, Scott 
>>  wrote:
>>> Hmm.  We haven’t changed data or the definition in YEARS now.  I'll
>>> have to do some more digging I guess.  Not sure re-indexing is a
>>> great thing to do though since this is a production setup and the

Re: Managed schema vs schema.xml

2017-03-07 Thread Walter Underwood
Maybe this is expert stuff, but we keep our schema, solrconfig, and everything 
else checked into source control.

I wrote a Python thingy to hit the cluster through the load balancer, get the 
zkHost string from status, upload the files to zookeeper (kazoo is a nice 
library), link the config, then do an async reload.

I’ve been thinking about time stamping the config directories so I can roll 
back to a previous config if the reload fails.
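
For anyone without such a script, the stock tooling covers the basic loop;
a sketch, with placeholder hosts and names:

bin/solr zk upconfig -z zk1:2181,zk2:2181,zk3:2181 -n myconfig -d /path/to/conf
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'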

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 7, 2017, at 12:47 PM, OTH  wrote:
> 
> In the reference guide, in the chapter named "The Well Configured Solr
> Instance", it says (I'm copying+pasting from the PDF version) :
> 
> Switching from Managed Schema to Manually Edited schema.xml
>> If you have started Solr with managed schema enabled and you would like to
>> switch to manually editing a schema.xml file, you should take the following
>> steps:
>> 1. Rename the managed-schema file to schema.xml.
>> 2. Modify solrconfig.xml to replace the schemaFactory class: remove any
>>    ManagedIndexSchemaFactory definition if it exists, and add a
>>    ClassicIndexSchemaFactory definition as shown above.
>> 3. Reload the core(s).
>> If you are using SolrCloud, you may need to modify the files via ZooKeeper.
>> The bin/solr script provides an easy way to download the files from
>> ZooKeeper and upload them back after edits. See the section ZooKeeper
>> Operations for more information.
>> IndexConfig in SolrConfig
>> The <indexConfig> section of solrconfig.xml defines low-level behavior of
>> the Lucene index writers. By default, the settings are commented out in
>> the sample solrconfig.xml included with Solr, which means the defaults are
>> used. In most cases, the defaults are fine.
>> <indexConfig>
>> ...
>> </indexConfig>
>> Parameters covered in this section: Writing New Segments, Merging Index
>> Segments, Compound File Segments, Index Locks, Other Indexing Settings.
>> Writing New Segments
>> ramBufferSizeMB
>> Once accumulated document updates exceed this much memory space (defined
>> in megabytes), then the pending updates are flushed. This can also create
>> new segments or trigger a merge. Using this setting is generally
>> preferable to maxBufferedDocs. If both maxBufferedDocs and ramBufferSizeMB
>> are set in solrconfig.xml, then a flush will occur when either limit is
>> reached. The default is 100Mb.
>> <ramBufferSizeMB>100</ramBufferSizeMB>
>> maxBufferedDocs
>> Sets the number of document updates to buffer in memory before they are
>> flushed as a new segment. This may also trigger a merge. The default Solr
>> configuration sets to flush by RAM usage (ramBufferSizeMB).
>> <maxBufferedDocs>1000</maxBufferedDocs>
>> useCompoundFile
>> Controls whether newly written (and not yet merged) index segments should
>> use the Compound File Segment format. The default is false.
>> <useCompoundFile>false</useCompoundFile>
>> To have full control over your schema.xml file, you may also want to
>> disable schema guessing, which allows unknown fields to be added to the
>> schema during indexing. The properties that enable this feature are
>> discussed in the section Schemaless Mode.
> 
> 
> On Wed, Mar 8, 2017 at 1:32 AM, Phil Scadden  wrote:
> 
>> I would second that guide could be clearer on that. I read and reread
>> several times trying to get my head around the schema.xml/managed-schema
>> bit. I came away from first cursory reading with the idea that
>> managed-schema was mostly for schema-less mode and only after some stuff
>> ups and puzzling over comments in the basic-config schema file itself did I
>> go back for more careful re-read. I am still not sure that I have got all
>> the nuances. My understanding is:
>> 
>> If you don’t want ability to edit it via admin UI or config api, rename to
>> schema.xml. Unclear whether you have to make changes to other configs to do
>> this. Also unclear to me whether there was any upside at all to using
>> schema.xml? Why degrade functionality? Does the capacity for schema.xml
>> only exist for backward compatibility?
>> 
>> If you want to run schema-less, you have to use managed-schema? (I
>> didn’t delve too deep into this).
>> 
>> In the end, I used basic-config to create core and then hacked
>> managed-schema from there.
>> 
>> 
>> I would have to say the "basic-config" seems distinctly more than basic.
>> It is still a huge file. I thought perhaps I could delete every unused
>> field type,

Re: https

2017-03-07 Thread Alexandre Rafalovitch
The first advice is NOT to expose your Solr directly to the public.
Anyone that can hit /search, can also hit /update and wipe out your
index.

Unless you run a proper proxy that secures URLs and sanitizes the
parameters (in GET, in POST, escaped, etc).  And if you are doing
that, you can set up HTTPS in your proxy and have it speak HTTP to
Solr on the backend.

Otherwise, you need middleware, which runs on a server as well, so you
are back into configuring _that_ server (not Solr) for HTTPS.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 15:45, pubdiverses  wrote:
> Hello,
>
> I would like to access my Solr instance at https://domain.com/solr.
>
> How can I do this?


Re: https

2017-03-07 Thread Shawn Heisey

On 3/7/2017 1:45 PM, pubdiverses wrote:

I would like to access my Solr instance at https://domain.com/solr.

How can I do this?


The reference guide covers this.

https://cwiki.apache.org/confluence/display/solr/Enabling+SSL

If you want to change the port to 443 so it will work without a port in 
the URL, then you will need to change the port number and run Solr as 
root or an admin user.  In later versions, Solr will refuse to start 
when run as root without a "force" option.  The root/admin requirement 
comes from the operating system -- this is almost always required to 
bind to a port number below 1024.
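
Roughly, the steps from that page: generate a keystore and point solr.in.sh
at it.  A sketch, where passwords, paths, and the -dname values are
placeholders:

keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
  -keypass secret -storepass secret -keystore solr-ssl.keystore.jks \
  -dname "CN=domain.com, OU=unit, O=org, L=city, ST=state, C=US"

# then in bin/solr.in.sh:
SOLR_SSL_KEY_STORE=/path/to/solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=secret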


Thanks,
Shawn



Re: Managed schema vs schema.xml

2017-03-07 Thread Shawn Heisey

On 3/7/2017 1:32 PM, Phil Scadden wrote:
I would have to say the "basic-config" seems distinctly more than 
basic. It is still a huge file. I thought perhaps I could delete every 
unused field type, but worried there were some "system" dependencies.


This is definitely true.  Solr example configs tend towards including 
"everything and the kitchen sink".  Although this is good at 
illustrating everything that Solr can do, it is also VERY overwhelming 
to new users.  I have found that in my production configs, I tend to 
strip almost everything out and make them very lean.  I have kept a 
number of the schema fieldType definitions from the example, 
particularly those for basic data types, such as numeric fields.


Most of the dependencies in a schema will be contained within the schema 
itself -- fieldTypes that are referenced by field definitions, etc.  
There are a few other possible dependencies, such as a default field 
parameter in a search handler definition that lives in solrconfig.xml.


Thanks,
Shawn



Re: Managed schema vs schema.xml

2017-03-07 Thread Alexandre Rafalovitch
Actually, the main cross-references are from the solrconfig.xml, and
primarily from the Update Request Handler chain that creates the
"schemaless" effect. Then, I think you also have highlighters, etc.

I did that full analysis as a presentation at the last Solr
Revolution: 
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 7 March 2017 at 17:18, Shawn Heisey  wrote:
> On 3/7/2017 1:32 PM, Phil Scadden wrote:
>>
>> I would have to say the "basic-config" seems distinctly more than basic.
>> It is still a huge file. I thought perhaps I could delete every unused field
>> type, but worried there were some "system" dependencies.
>
>
> This is definitely true.  Solr example configs tend towards including
> "everything and the kitchen sink".  Although this is good at illustrating
> everything that Solr can do, it is also VERY overwhelming to new users.  I
> have found that in my production configs, I tend to strip almost everything
> out and make them very lean.  I have kept a number of the schema fieldType
> definitions from the example, particularly those for basic data types, such
> as numeric fields.
>
> Most of the dependencies in a schema will be contained within the schema
> itself -- fieldTypes that are referenced by field definitions, etc.  There
> are a few other possible dependencies, such as a default field parameter in
> a search handler definition that lives in solrconfig.xml.
>
> Thanks,
> Shawn
>


RE: https

2017-03-07 Thread Phil Scadden

>The first advice is NOT to expose your Solr directly to the public.
>Anyone that can hit /search, can also hit /update and wipe out your index.

I would second that too. We have never exposed Solr and I also sanitise queries 
in the proxy.
Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Managed schema vs schema.xml

2017-03-07 Thread Erick Erickson
I suggest we make additional comments on SOLR-10241. I created it as a
result of this discussion and anyone who takes it on would benefit
from the comments being made there.

Anyone can make comments there, there's no special karma required
although you do have to create a login.  From the interest this thread
has generated so far this is definitely something that could stand
some clarification/code fixes...

Best,
Erick

On Tue, Mar 7, 2017 at 3:01 PM, Alexandre Rafalovitch
 wrote:
> Actually, the main cross-references are from the solrconfig.xml, and
> primarily from the Update Request Handler chain that creates the
> "schemaless" effect. Then, I think you also have highlighters, etc.
>
> I did that full analysis as a presentation at the last Solr
> Revolution: 
> https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 7 March 2017 at 17:18, Shawn Heisey  wrote:
>> On 3/7/2017 1:32 PM, Phil Scadden wrote:
>>>
>>> I would have to say the "basic-config" seems distinctly more than basic.
>>> It is still a huge file. I thought perhaps I could delete every unused field
>>> type, but worried there were some "system" dependencies.
>>
>>
>> This is definitely true.  Solr example configs tend towards including
>> "everything and the kitchen sink".  Although this is good at illustrating
>> everything that Solr can do, it is also VERY overwhelming to new users.  I
>> have found that in my production configs, I tend to strip almost everything
>> out and make them very lean.  I have kept a number of the schema fieldType
>> definitions from the example, particularly those for basic data types, such
>> as numeric fields.
>>
>> Most of the dependencies in a schema will be contained within the schema
>> itself -- fieldTypes that are referenced by field definitions, etc.  There
>> are a few other possible dependencies, such as a default field parameter in
>> a search handler definition that lives in solrconfig.xml.
>>
>> Thanks,
>> Shawn
>>


DrillSideWaysSearch on faceting

2017-03-07 Thread Chitra
Hi,
  I am new to Solr. Recently we have been digging into drill-sideways search
(for faceting purposes) in Lucene. Do Solr facets support drill-sideways
search the way Lucene does? If yes, kindly suggest the API or an article on
how to use it.


Any help is much appreciated.


Thanks,
Chitra