Re: How to implement Autosuggestion

2016-04-04 Thread Alessandro Benedetti
Hi Chandan,
I will repeat my answer to a similar topic that got lost:
"First of all: simple string autosuggestion, or document autosuggestion (
with additional fields to show beyond the label)?
Are you interested in analysis of the text to suggest? Fuzzy
suggestions? Exact "beginning of the phrase" suggestions? Infix
suggestions?"

If you need only the category, *payloadField* should be what you need.
I have never used it myself, but it is there [1].
As Reth suggested, at the moment Solr supports only one payloadField,
ignoring the others (the code confirms this).

In the case where you want to show the label AND the category AND more (in
Amazon style, to put it simply), a very straightforward solution is to model
a specific Solr field for your product collection.
This field will contain the name of the product, analyzed according to your
needs.
Your autosuggester will then simply hit that field on each character typed, and
you can show the entire document in the suggestions (with all the fields
you want).
Alternatively, we could take a look at the implementation and contribute
support for multiple *payloadField* parameters.
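As a rough sketch of the pipe-separation workaround Reth describes below (the field contents and the separator are assumptions for illustration, not an official Solr API), the client would split each returned suggestion back into a label and its categories:

```python
def split_suggestion(raw, sep="|"):
    """Split a pipe-separated suggestion like "Nokia|electronics|mobile"
    into the label and its categories. This is a client-side workaround,
    since the suggester currently honors only one payloadField."""
    parts = raw.split(sep)
    return parts[0], parts[1:]

label, categories = split_suggestion("Nokia|electronics|mobile")
print(label, categories)  # Nokia ['electronics', 'mobile']
```

The obvious caveat is that the separator character must never appear in the product names themselves.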

Cheers

[1] https://cwiki.apache.org/confluence/display/solr/Suggester

On Sun, Apr 3, 2016 at 1:09 PM, Reth RM  wrote:

> There is a payload attribute, but I'm not sure if it can be used for such a
> use case. Let's wait for other contributors to confirm.
> A similar question was posted here:
>
> http://stackoverflow.com/questions/32434186/solr-suggestion-with-multiple-payloads
> .
>
> If it's just a category that you need, then the workaround (although not an
> accurate one) that I can think of is to include the category value in the
> same field with pipe separation and extract it from there.
>
> On Sun, Apr 3, 2016 at 11:41 AM, chandan khatri 
> wrote:
>
> > Hi All,
> >
> > I've a query regarding autosuggestion. My use case is as below:
> >
> > 1. User enters product name (say Nokia)
> > 2. I want suggestions along with the category with which the product
> > belongs. (e.g Nokia belongs to "electronics" and "mobile" category) so I
> > want suggestion like Nokia in electronics and Nokia in mobile.
> >
> > I am able to get the suggestions using the OOTB AnalyzingInfixSuggester but
> > not sure how I can get the category along with the suggestion (can this
> > category be considered a facet of the suggestion??)
> >
> > Any help/pointer is highly appreciated.
> >
> > Thanks,
> > Chandan
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread onixterry
I am following the tutorial documentation at
http://lucene.apache.org/solr/quickstart.html. I successfully indexed
the "docs" folder using the SimplePostTool (Windows, using the Java method).

When I attempt the second example, loading the *.xml files, I receive an
error back. I tried just one of the XML files and received the same error.

Here is the output:

C:\solr-5.5.0>java -Dauto -Dc=gettingstarted -jar
example/exampledocs/post.jar example/exampledocs/gb18030-example.xml
SimplePostTool version 5.0.0
Posting files to [base] url
http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file gb18030-example.xml (application/xml) to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
http://localhost:8983/solr/gettingstarted/update
SimplePostTool: WARNING: Response:
<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">8</int></lst>
  <lst name="error">
    <lst name="metadata">
      <str name="error-class">org.apache.solr.common.SolrException</str>
      <str name="root-error-class">java.lang.NumberFormatException</str>
      <str name="error-class">org.apache.solr.common.SolrException</str>
      <str name="root-error-class">org.apache.solr.common.SolrException</str>
    </lst>
    <str name="msg">Bad Request

request: http://10.0.1.36:8983/solr/gettingstarted_shard1_replica2/update?update.chain=add-unknown-fields-to-the-schema&amp;update.distrib=TOLEADER&amp;distrib.from=http%3A%2F%2F10.0.1.36%3A8983%2Fsolr%2Fgettingstarted_shard2_replica2%2F&amp;wt=javabin&amp;version=2</str>
    <int name="code">400</int>
  </lst>
</response>

SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://localhost:8983/solr/gettingstarted/update
1 files indexed.
COMMITting Solr index changes to
http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:00.093

I haven't seen any online posts related to the QuickStart documentation so I
am not sure where to turn for assistance.

Anyone have any suggestions?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-example-loading-of-exampledocs-for-xml-fails-due-to-bad-request-tp4267878.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread Binoy Dalal
The stack trace says that it is a NumberFormatException, which means that
some field expecting a numeric value is receiving a non-numeric value.

You should check your schema for all the fields pertaining to these docs
which are numeric, and check those against the docs themselves to ensure
that those fields in the docs actually contain numeric values.
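A minimal sketch of that check (the field names and doc snippet here are illustrative, not taken from the actual schema): parse the document and flag any value that fails numeric conversion for a field the schema declares numeric.

```python
import xml.etree.ElementTree as ET

def non_numeric_values(doc_xml, numeric_fields):
    """Return (field, value) pairs where a field the schema expects to be
    numeric holds a value that does not parse as a number."""
    bad = []
    for field in ET.fromstring(doc_xml).iter("field"):
        name = field.get("name")
        if name in numeric_fields:
            try:
                float(field.text)
            except (TypeError, ValueError):
                bad.append((name, field.text))
    return bad

doc = '<doc><field name="price">92.0</field><field name="price">8MB cache</field></doc>'
print(non_numeric_values(doc, {"price"}))  # [('price', '8MB cache')]
```

Running something like this over each failing file narrows the 400 down to the offending field before you touch the schema.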

On Mon, 4 Apr 2016, 19:14 onixterry,  wrote:

> I am following the tutorial documentation at
> http://lucene.apache.org/solr/quickstart.html. I successfully indexed
> the "docs" folder using the SimplePostTool (Windows, using the Java
> method).
>
> When I attempt the second example, loading the *.xml files, I receive an
> error back. I tried just one of the XML files and received the same error.
>
> Here is the output:
>
> C:\solr-5.5.0>java -Dauto -Dc=gettingstarted -jar
> example/exampledocs/post.jar example/exampledocs/gb18030-example.xml
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/gettingstarted/update...
> Entering auto mode. File endings considered are
>
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file gb18030-example.xml (application/xml) to [base]
> SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
> http://localhost:8983/solr/gettingstarted/update
> SimplePostTool: WARNING: Response:
> <response>
>   <lst name="responseHeader"><int name="status">400</int><int name="QTime">8</int></lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">java.lang.NumberFormatException</str>
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">Bad Request
>
> request: http://10.0.1.36:8983/solr/gettingstarted_shard1_replica2/update?update.chain=add-unknown-fields-to-the-schema&amp;update.distrib=TOLEADER&amp;distrib.from=http%3A%2F%2F10.0.1.36%3A8983%2Fsolr%2Fgettingstarted_shard2_replica2%2F&amp;wt=javabin&amp;version=2</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
> SimplePostTool: WARNING: IOException while reading response:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http://localhost:8983/solr/gettingstarted/update
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/gettingstarted/update...
> Time spent: 0:00:00.093
>
> I haven't seen any online posts related to the QuickStart documentation so
> I
> am not sure where to turn for assistance.
>
> Anyone have any suggestions?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tutorial-example-loading-of-exampledocs-for-xml-fails-due-to-bad-request-tp4267878.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread onixterry
OK, but this content was provided by the people who created the tutorial.
Perhaps something changed in the recent release, so the data files need to be
modified to work? The tutorial says it is for Solr 5.3 and I am using 5.5.

The XML files all seem very simple. Example:



<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB -
ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="manu_id_s">samsung</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic
Bearing (FDB) motor</field>
  <field name="price">92.0</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <field name="store">35.0752,-97.032</field>
</doc>

<doc>
  <field name="id">6H500F0</field>
  <field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB -
SATA-300</field>
  <field name="manu">Maxtor Corp.</field>
  <field name="manu_id_s">maxtor</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">SATA 3.0Gb/s, NCQ</field>
  <field name="features">8.5ms seek</field>
  <field name="features">16MB cache</field>
  <field name="price">350.0</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="store">45.17614,-93.87341</field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
</doc>
</add>







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-example-loading-of-exampledocs-for-xml-fails-due-to-bad-request-tp4267878p4267881.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread Binoy Dalal
You should check the logs. They'll tell you the exact fields that pose a
problem in this case.

On Mon, 4 Apr 2016, 19:22 onixterry,  wrote:

> OK, but this content was provided by the people who created the tutorial.
> Perhaps there is a change in the recent release as the data files need to
> be
> modified to work?  The tutorial says it is for Solr 5.3 and I am using 5.5.
>
> The XML files all seem very simple.  Example:
>
> 
> 
> <add>
> <doc>
>   <field name="id">SP2514N</field>
>   <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB -
> ATA-133</field>
>   <field name="manu">Samsung Electronics Co. Ltd.</field>
>   <field name="manu_id_s">samsung</field>
>   <field name="cat">electronics</field>
>   <field name="cat">hard drive</field>
>   <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
>   <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic
> Bearing (FDB) motor</field>
>   <field name="price">92.0</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
>   <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
>   <field name="store">35.0752,-97.032</field>
> </doc>
>
> <doc>
>   <field name="id">6H500F0</field>
>   <field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB -
> SATA-300</field>
>   <field name="manu">Maxtor Corp.</field>
>   <field name="manu_id_s">maxtor</field>
>   <field name="cat">electronics</field>
>   <field name="cat">hard drive</field>
>   <field name="features">SATA 3.0Gb/s, NCQ</field>
>   <field name="features">8.5ms seek</field>
>   <field name="features">16MB cache</field>
>   <field name="price">350.0</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
>   <field name="store">45.17614,-93.87341</field>
>   <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
> </doc>
> </add>
> 
> 
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tutorial-example-loading-of-exampledocs-for-xml-fails-due-to-bad-request-tp4267878p4267881.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Sorting question

2016-04-04 Thread Tamás Barta
Hi,

FYI: the final solution I found was to create a custom
"listpos(fieldName, listId)" function; now I can display a sorted list
via:

fq=listid_s:378
sort=listpos(listpos_s,378) asc
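A client-side sketch of what such a function could compute (the "listId:position" encoding of listpos_s below is an assumption for illustration; the thread does not show the actual field format):

```python
def listpos(field_value, list_id):
    """Client-side equivalent of the custom listpos() function: extract a
    document's position within a given list. Assumes the field stores
    space-separated "listId:position" pairs."""
    for pair in field_value.split():
        lid, _, pos = pair.partition(":")
        if lid == str(list_id):
            return int(pos)
    return float("inf")  # documents not in the list sort last

docs = [
    {"id": "a", "listpos_s": "378:2 412:7"},
    {"id": "b", "listpos_s": "378:1"},
]
docs.sort(key=lambda d: listpos(d["listpos_s"], 378))
print([d["id"] for d in docs])  # ['b', 'a']
```

Doing the same computation inside a Solr function query, as Tamas did, avoids fetching the whole result set just to sort it.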

Regards,
Tamas

On Fri, Apr 1, 2016 at 8:55 PM, John Bickerstaff 
wrote:

> Tamas,
>
> This feels a bit like a "user favorites" problem.
>
> I did a little searching and found this...  Don't know if it will help, but
> when I'm looking for stuff like this I find it helps to try to come up with
> generic or different descriptions of my problem and go search those as
> well...
>
>
> http://stackoverflow.com/questions/3931827/solr-merging-results-of-2-cores-into-only-those-results-that-have-a-matching-fie
>
> On Fri, Apr 1, 2016 at 12:40 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > Tamas,
> >
> > I'm brainstorming here - not being careful, just throwing out ideas...
> >
> > One thing that comes up is a separate document in SOLR - one doc for each
> > list.
> >
> > If a user adds a doc to their list, that doc's id gets added to this
> other
> > type of document...
> >
> > So, a document with the title "List 1" would have a multivalue field of
> > ID's and the list order number like so:
> >
> > IDList Position
> > _
> > doc1 ID :   1
> > doc2 ID:2
> > doc3 ID:3
> >
> > and so on...  The big problem I see with this is keeping it organized
> > correctly.  More code would have to be written to handle this when the
> user
> > does any kind of "crud" on the list...
> >
> > I'm pretty sure there's a way to write a query that uses that list to
> > properly order the items returned by your primary search, although I
> > haven't written such a query yet.
> >
> > If you have the luxury of NOT being in production yet with this system,
> > I'd seriously consider pushing to keep application metadata OUT of your
> > product information store.  This particular problem (of ordering the
> > results based on arbitrary user choices) might be more easily handled
> via a
> > separate step that queries a relational database to handle list order -
> > once Solr gives you the documents that match the query and the user's
> list
> > number...
> >
> > Even if you can't use another relational data store - keeping that
> > metadata out of your individual product documents could be argued to be a
> > good design idea...
> >
> > +
> >
> > Here's an alternative brainstorm...
> >
> > Where does the user data live?  What about putting the information about
> > the order of document ID's in the User's lists with the User?  Then you
> can
> > get all documents that match the search terms and are on List X from
> Solr -
> > and then sort them by ID based on the data associated with the User (a
> list
> > of ID's, in order)
> >
> > There is even a way to write a plugin that will go after external data to
> > help sort Solr documents, although I'm guessing you'd rather avoid
> that...
> >
> >
> >
> > On Fri, Apr 1, 2016 at 11:59 AM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> OK - I get it.  List order is totally arbitrary and cannot be tied to an
> >> hard data point.
> >>
> >> I'll have to think - Perhaps billnbell's solution will help, although
> I'm
> >> not totally sure I understand that suggestion yet.
> >>
> >> At this point, you could get all the documents for List X that match the
> >> search terms.  The next problem is sorting.  If you have the listpos
> field
> >> too, you could use that, and some regex to find the proper order for
> these
> >> documents before displaying them (in code I mean) but of course that
> means
> >> you need some kind of "interceptor" to deal with this before the results
> >> are displayed.
> >>
> >> If I had enough control to do this in code, behind the scenes, I'd grab
> >> that second part of the listops field, put it into a variable on each
> >> object and then sort by that.  Then I'd return the entire list to the
> UI.
> >>
> >> I understand that if you could get SOLR to do it all, that would be
> >> ideal...  There is the possibility of writing some new code and
> plugging it
> >> in to Solr, but I'm guessing you don't want to go that far..  As a final
> >> step in the process, with custom code to consume the listpos entry,
> sorting
> >> these would be fairly straightforward.  I'm not sure how you get away
> from
> >> the lispos multivalue field however...
> >>
> >> I'll keep thinking...
> >>
> >> On Fri, Apr 1, 2016 at 11:26 AM, Tamás Barta 
> >> wrote:
> >>
> >>> So, the list order is determined by the user. The user creates a list,
> >>> adds
> >>> products to it and i have to display these list using filters and
> >>> pagination.
> >>>
> >>> Let's assume there is list with 1 products in it. In the website
> >>> where
> >>> i display the list only 50 products are displayed in a page. So if i
> >>> could
> >>> query solr to give me products from list 

Porting LTR plugin for Solr-5.5.0

2016-04-04 Thread Ahmet Anil Pala
I need to use the LTR plugin which compiles for Solr-6.0.0 
[here](https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-rfc-cpoerschke-comments)

I have attempted to port the plugin for Solr-5.5.0 
[here](https://github.com/aanilpala/lucene-solr/commit/94ad14c4b9eae2c899e3941967f59b9fc20401b9)

Reranking works fine; however, when I try to extract features with the
following parameters, I get an NPE:

fv=true&fl=*,[features]

Here is the stack trace :

o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
at 
org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory$FeatureTransformer.transform(LTRFeatureLoggerTransformerFactory.java:131)
at 
org.apache.solr.response.transform.DocTransformers.transform(DocTransformers.java:76)
at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
at 
org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:246)
at 
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:151)
at 
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at 
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at 
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at 
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:52)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:743)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:467)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Regards
Anil

Re: Sorting question

2016-04-04 Thread John Bickerstaff
Thanks for sharing the solution Tamas -- I was hoping you'd let us know...

On Mon, Apr 4, 2016 at 8:05 AM, Tamás Barta  wrote:

> Hi,
>
> FYI: the final solution I found is that I created a custom
> "listpos(fieldName, listId)" function and now I can display a sorted list
> via:
>
> fq=listid_s:378
> sort=listpos(listpos_s,378) asc
>
> Regards,
> Tamas
>
> On Fri, Apr 1, 2016 at 8:55 PM, John Bickerstaff  >
> wrote:
>
> > Tamas,
> >
> > This feels a bit like a "user favorites" problem.
> >
> > I did a little searching and found this...  Don't know if it will help,
> but
> > when I'm looking for stuff like this I find it helps to try to come up
> with
> > generic or different descriptions of my problem and go search those as
> > well...
> >
> >
> >
> http://stackoverflow.com/questions/3931827/solr-merging-results-of-2-cores-into-only-those-results-that-have-a-matching-fie
> >
> > On Fri, Apr 1, 2016 at 12:40 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > Tamas,
> > >
> > > I'm brainstorming here - not being careful, just throwing out ideas...
> > >
> > > One thing that comes up is a separate document in SOLR - one doc for
> each
> > > list.
> > >
> > > If a user adds a doc to their list, that doc's id gets added to this
> > other
> > > type of document...
> > >
> > > So, a document with the title "List 1" would have a multivalue field of
> > > ID's and the list order number like so:
> > >
> > > IDList Position
> > > _
> > > doc1 ID :   1
> > > doc2 ID:2
> > > doc3 ID:3
> > >
> > > and so on...  The big problem I see with this is keeping it organized
> > > correctly.  More code would have to be written to handle this when the
> > user
> > > does any kind of "crud" on the list...
> > >
> > > I'm pretty sure there's a way to write a query that uses that list to
> > > properly order the items returned by your primary search, although I
> > > haven't written such a query yet.
> > >
> > > If you have the luxury of NOT being in production yet with this system,
> > > I'd seriously consider pushing to keep application metadata OUT of your
> > > product information store.  This particular problem (of ordering the
> > > results based on arbitrary user choices) might be more easily handled
> > via a
> > > separate step that queries a relational database to handle list order -
> > > once Solr gives you the documents that match the query and the user's
> > list
> > > number...
> > >
> > > Even if you can't use another relational data store - keeping that
> > > metadata out of your individual product documents could be argued to
> be a
> > > good design idea...
> > >
> > > +
> > >
> > > Here's an alternative brainstorm...
> > >
> > > Where does the user data live?  What about putting the information
> about
> > > the order of document ID's in the User's lists with the User?  Then you
> > can
> > > get all documents that match the search terms and are on List X from
> > Solr -
> > > and then sort them by ID based on the data associated with the User (a
> > list
> > > of ID's, in order)
> > >
> > > There is even a way to write a plugin that will go after external data
> to
> > > help sort Solr documents, although I'm guessing you'd rather avoid
> > that...
> > >
> > >
> > >
> > > On Fri, Apr 1, 2016 at 11:59 AM, John Bickerstaff <
> > > j...@johnbickerstaff.com> wrote:
> > >
> > >> OK - I get it.  List order is totally arbitrary and cannot be tied to
> an
> > >> hard data point.
> > >>
> > >> I'll have to think - Perhaps billnbell's solution will help, although
> > I'm
> > >> not totally sure I understand that suggestion yet.
> > >>
> > >> At this point, you could get all the documents for List X that match
> the
> > >> search terms.  The next problem is sorting.  If you have the listpos
> > field
> > >> too, you could use that, and some regex to find the proper order for
> > these
> > >> documents before displaying them (in code I mean) but of course that
> > means
> > >> you need some kind of "interceptor" to deal with this before the
> results
> > >> are displayed.
> > >>
> > >> If I had enough control to do this in code, behind the scenes, I'd
> grab
> > >> that second part of the listops field, put it into a variable on each
> > >> object and then sort by that.  Then I'd return the entire list to the
> > UI.
> > >>
> > >> I understand that if you could get SOLR to do it all, that would be
> > >> ideal...  There is the possibility of writing some new code and
> > plugging it
> > >> in to Solr, but I'm guessing you don't want to go that far..  As a
> final
> > >> step in the process, with custom code to consume the listpos entry,
> > sorting
> > >> these would be fairly straightforward.  I'm not sure how you get away
> > from
> > >> the lispos multivalue field however...
> > >>
> > >> I'll keep thinking...
> > >>
> > >> On Fri, Apr 1, 2016 at 11:26 AM, Tamás Barta 
> > >> wro

Re: Same origin policy for Apache Solr 5.5

2016-04-04 Thread Upayavira
Why would you want to do this?

On Sun, 3 Apr 2016, at 04:15 AM, Aditya Desai wrote:
> Hello SOLR Experts
> 
> I am interested to know if SOLR 5.5 supports Same Origin Policy. I am
> trying to read the data from http://localhost:8984/Solr_1/my/directory1
> and
> display it on UI on http://localhost:8983/Solr_2/my/directory2.
> 
> http://localhost:8983 has Solr 4.10 running and http://localhost:8984 has
> Solr 5.5 running.I am using Javascript to make XMLHTTP request, but it is
> failing with NS_ERROR. So I doubt SOLR supports same origin policy.
> 
> Is this possible? Any suggestion on how to achieve this?
> 
> Thanks in advance
> 
> -- 
> Aditya Ramachandra Desai
> MS Computer Science Graduate Student
> USC Viterbi School of Engineering
> Los Angeles, CA 90007
> M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


Re: Same origin policy for Apache Solr 5.5

2016-04-04 Thread Aditya Desai
Hello Upayavira

I am trying to build an application that gets data from an independent,
standalone Solr 4.10 and then plots that data on a global map. So effectively
there are two Solrs: one independent (4.10) and one holding the map APIs
(Solr 5.5 here). I want to give customers my entire Solr 5.5 package; they
just need to point it at the collections present in any Solr (here, Solr
4.10). Does this help?
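For what it's worth, a commonly used way to allow such cross-origin XMLHttpRequests against Solr 5.x is to enable Jetty's CrossOriginFilter in server/solr-webapp/webapp/WEB-INF/web.xml. A sketch (the origin and method values below are placeholders and should be restricted appropriately; this is a Jetty-level change, not a supported Solr configuration option):

```xml
<filter>
  <filter-name>cross-origin</filter-name>
  <filter-class>org.eclipse.jetty.servlets.CrossOriginFilter</filter-class>
  <init-param>
    <param-name>allowedOrigins</param-name>
    <param-value>http://localhost:8983</param-value>
  </init-param>
  <init-param>
    <param-name>allowedMethods</param-name>
    <param-value>GET,POST,HEAD</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>cross-origin</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```

Without CORS headers the browser's same-origin policy blocks the request, which matches the NS_ERROR seen here.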

On Mon, Apr 4, 2016 at 9:11 AM, Upayavira  wrote:

> Why would you want to do this?
>
> On Sun, 3 Apr 2016, at 04:15 AM, Aditya Desai wrote:
> > Hello SOLR Experts
> >
> > I am interested to know if SOLR 5.5 supports Same Origin Policy. I am
> > trying to read the data from
> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984_Solr-5F1_my_directory1&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=OZNbFMIY0w8PkqNE-rdtJ1_HXYKHVV14O9xOQHeLaTg&e=
> > and
> > display it on UI on
> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8983_Solr-5F2_my_directory2&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=jbb1jIDNQ5S-5WilIQjNWPWj6odAi1Dw76aUZEeEsR8&e=
> .
> >
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8983&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=Qw1EoPcAPdhlW4lJ7QH1P2CcL--41WTsAPqBaGuqzmQ&e=
> has Solr 4.10 running and
> https://urldefense.proofpoint.com/v2/url?u=http-3A__localhost-3A8984&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=VOLsCmWLyGadKpEldVW2r4VDXnfaJsYQGvUZlXAwPF8&e=
> has
> > Solr 5.5 running.I am using Javascript to make XMLHTTP request, but it is
> > failing with NS_ERROR. So I doubt SOLR supports same origin policy.
> >
> > Is this possible? Any suggestion on how to achieve this?
> >
> > Thanks in advance
> >
> > --
> > Aditya Ramachandra Desai
> > MS Computer Science Graduate Student
> > USC Viterbi School of Engineering
> > Los Angeles, CA 90007
> > M : +1-415-463-9864 | L :
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_in_adityardesai&d=CwICAg&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=aLfk1zsmx4LG4nTElFRiaw&m=YI-5lkYG_20oFkV3OYnMntLrJtskVjWY2d7zJBcsfpo&s=ZYioUPYMkaBFyqZkefbXTCv8WpOtY-i-yf63sTnQMsg&e=
>



-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


SolrCloud backup/restore

2016-04-04 Thread Zisis Tachtsidis
I've tested backup/restore successfully in a SolrCloud installation with a
single node (no replicas). This has been achieved in
https://issues.apache.org/jira/browse/SOLR-6637 
Can you do something similar when more replicas are involved? What I'm
looking for is a restore command that will restore the index in all replicas
of a collection.
Judging from the code in /ReplicationHandler.java/ and
https://issues.apache.org/jira/browse/SOLR-5750 I assume that more work
needs to be done to achieve this.

Is my understanding correct? If so, I guess an alternative would be to
create a new collection, restore the index, and then add replicas. (I'm
using Solr 5.5.0.)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-backup-restore-tp4267954.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sort order for *:* query

2016-04-04 Thread Steven White
Hi everyone,

When I send Solr the query *:* the result I get back is sorted based on
Lucene's internal DocID which is oldest to most recent (can someone correct
me if I get this wrong?)  Given this, the most recently added / updated
document is at the bottom of the list.  Is there a way to reverse this sort
order?  If so, how can I make this the default in Solr's solrconfig.xml
file?

Thanks

Steve


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter

1) The hard coded implicit default sort order is "score desc" 

2) Whenever a sort results in ties, the final ordering of tied documents 
is non-deterministic

3) currently the behavior is that tied documents are returned in "index 
order" but that can change as segments are merged

4) if you wish to change the behavior when there is a tie, just add 
additional deterministic sort clauses to your sort param.  This can be 
done at the request level, or as a user-specified "default" for the 
request handler...

https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig
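In query terms that means something like sort=score desc,id asc. A toy illustration (not Solr code) of why the extra clause makes the ordering deterministic:

```python
# Documents tied on score: index order alone is not stable across segment
# merges, so add a deterministic tiebreaker such as a unique id field.
docs = [
    {"id": "doc3", "score": 1.0},
    {"id": "doc1", "score": 1.0},
    {"id": "doc2", "score": 2.0},
]
# Equivalent of Solr's sort=score desc, id asc
docs.sort(key=lambda d: (-d["score"], d["id"]))
print([d["id"] for d in docs])  # ['doc2', 'doc1', 'doc3']
```

With only the score key, the relative order of doc1 and doc3 would be arbitrary; the id clause pins it down.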


: Date: Mon, 4 Apr 2016 13:34:27 -0400
: From: Steven White 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Sort order for *:* query
: 
: Hi everyone,
: 
: When I send Solr the query *:* the result I get back is sorted based on
: Lucene's internal DocID which is oldest to most recent (can someone correct
: me if I get this wrong?)  Given this, the most recently added / updated
: document is at the bottom of the list.  Is there a way to reverse this sort
: order?  If so, how can I make this the default in Solr's solrconfig.xml
: file?
: 
: Thanks
: 
: Steve
: 

-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread John Bickerstaff
You can sort like this (I believe that _version_ is the internal id/index
number for the document, but you might want to verify)

In the Admin UI, enter the following in the sort field:

_version_ asc

You could also put an entry in the default searchHandler in solrconfig.xml
to do this to every incoming query...

This is the one that gets hit from "/select"

It would look something like this although I haven't tested...  Don't know
if a colon is necessary or not between the fieldname and desc.

<str name="sort">_version_ desc</str>

And, of course, you can put it on the URL you are hitting if that's what
you need to do.



On Mon, Apr 4, 2016 at 11:34 AM, Steven White  wrote:

> Hi everyone,
>
> When I send Solr the query *:* the result I get back is sorted based on
> Lucene's internal DocID which is oldest to most recent (can someone correct
> me if I get this wrong?)  Given this, the most recently added / updated
> document is at the bottom of the list.  Is there a way to reverse this sort
> order?  If so, how can I make this the default in Solr's solrconfig.xml
> file?
>
> Thanks
>
> Steve
>


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread Chris Hostetter

: When I attempt the second example, of loading the *.xml files, I receive an
: error back.  I tried just one of the XMLs and receive the same error.  

Yeah ... there's a poor assumption here in the tutorial.  note in 
particular this paragraph...

--SNIP--
Solr's install includes a handful of Solr XML formatted files with example 
data (mostly mocked tech product data). NOTE: This tech product data has a 
more domain-specific configuration, including schema and browse UI. The 
bin/solr script includes built-in support for this by running bin/solr 
start -e techproducts which not only starts Solr but also then indexes 
this data too (be sure to bin/solr stop -all before trying it out). 
However, the example below assumes Solr was started with bin/solr start -e 
cloud to stay consistent with all examples on this page, and thus the 
collection used is "gettingstarted", not "techproducts".
--SNIP--

If you use "bin/solr start -e techproducts" (or explicitly create a solr 
collection using the "sample_techproducts" config set) then those 
documents will index just fine -- but the assumption written here in the 
tutorial that you can index those tech product documents to the same 
gettingstarted collection you've been indexing to earlier in the tutorial 
is definitely flawed -- the fieldtype deduction logic that's applied for 
the gettingstarted collection (and the specific type deduced from the 
earlier docs) won't necessarily apply to the sample tech product 
documents.

https://issues.apache.org/jira/browse/SOLR-8943


-Hoss
http://www.lucidworks.com/


Solr 4 replication

2016-04-04 Thread abhi Abhishek
Hi all,
Is solr 4 replication push or pull?

Best Regards,
Abhishek


Re: Tutorial example loading of exampledocs for *.xml fails due to bad request

2016-04-04 Thread onixterry
Ah, ok.  I was just figuring that out when I stripped everything down to two
fields and it was still failing until I put a numeric value in a field
called "name".

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-example-loading-of-exampledocs-for-xml-fails-due-to-bad-request-tp4267878p4267990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter

: You can sort like this (I believe that _version_ is the internal id/index
: number for the document, but you might want to verify)

that is not true, and i strongly advise you not to try to sort on the 
_version_ field ... for some queries/testing it may deceptively *look* 
like it's sorting by the order the documents are added, but it will not 
actually sort in any useful way -- two documents added in sequence A, B 
may have version values that are not in ascending sequence (depending on 
the hash bucket their uniqueKeys fall in for routing purposes) so sorting 
on that field will not give you any sort of meaningful order

If you want to sort by "recency" or "date added" you need to add a 
date based field to capture this.  see for example the 
TimestampUpdateProcessorFactory...

https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
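A sketch of how that processor might be wired up in solrconfig.xml (the chain name and field name here are illustrative, not from the thread; check the javadoc above for the exact options):

```xml
<!-- Sketch: stamp each document with its index time, then sort on that
     field instead of _version_. Field name "indexed_at_dt" is made up. -->
<updateRequestProcessorChain name="add-timestamp" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">indexed_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Then a query could use sort=indexed_at_dt desc to get newest-first.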



-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread Yonik Seeley
On Mon, Apr 4, 2016 at 2:24 PM, Chris Hostetter
 wrote:
>
> : You can sort like this (I believe that _version_ is the internal id/index
> : number for the document, but you might want to verify)
>
> that is not true, and i strongly advise you not to try to sort on the
> _version_ field ... for some queries/testing it may deceptively *look*
> like it's sorting by the order the documents are added, but it will not
> actaully sort in any useful way -- two documents added in sequence A, B
> may have version values that are not in ascending sequence (depending on
> the hash bucket their uniqueKeys fall in for routing purposes) so sorting
> on that field will not give you any sort of meaningful order

Not sure I understand... _version_ is time based and hence will give
roughly the same accuracy as something like
TimestampUpdateProcessorFactory that you recommend below.  Both
methods will not be strictly equivalent to indexed order due to
parallelism / thread scheduling, etc., but will generally be pretty
close.
_version_ has the added benefit of being unique in an index (hence a
sort on _version_ won't resort to a tie-break by unstable
internal-id).

-Yonik


> If you want to sort by "recency" or "date added you need to add a
> date based field to capture this.  see for example the
> TimestampUpdateProcessorFactory...
>
> https://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
>
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Solr 4 replication

2016-04-04 Thread Mikhail Khludnev
It's pull, but you can trigger pulling.

On Mon, Apr 4, 2016 at 9:19 PM, abhi Abhishek  wrote:

> Hi all,
> Is solr 4 replication push or pull?
>
> Best Regards,
> Abhishek
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: using custom analyzer on SolrCloud

2016-04-04 Thread Rose, Stuart J
Thanks that simplifies things!

Creating a lib folder under solr home and placing the customanalyzer.jar there 
works for me :) 

I also had to change how I start solr and create the collection, here are the 
steps I followed after copying 'basic_configs' to 'tdt2_configs' in the 
solr/configsets folder: 

bin/solr start -cloud
bin/solr create_collection -c tdt2 -d tdt2_configs -shards 2
bin/solr stop -all
bin/solr start -cloud
bin/solr healthcheck -c tdt2
curl 'http://localhost:8983/solr/tdt2/schema/fields?wt=json'



Stuart


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Sunday, April 03, 2016 8:04 PM
To: solr-user@lucene.apache.org
Subject: Re: using custom analyzer on SolrCloud

On 4/2/2016 9:04 PM, Rose, Stuart J wrote:
> I am trying to setup on my dev workstation a small SolrCloud in order to 
> assess the faceting capability in Solr 5.5 and I have several questions.
>
> First some context:
> I need to be able to add a field that uses a custom analyzer.
>
> In a perfect world I would be able to just drop the 'customanalyzer.jar' 
> somewhere in the solr folder and trigger zookeeper to propagate that to the 
> various cores. As I have not seen any mention of how to do that in the ref 
> guide or online I am assuming I really do need to use the blob store thingy 
> which is unfortunate.

On every server, create a "lib" directory in the Solr home.  Where this lives 
will depend on exactly how you installed Solr and how you're starting it.  
Normally your core instanceDirs will be in the Solr home.

Copy your jar into that lib directory.  It will be loaded once when Solr 
starts, and will be available to every core on that machine.  No "lib"
config elements are necessary, and trying to load jars from this directory will 
probably break those jars, because then they would be loaded twice.
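In script form, the steps Shawn describes amount to something like the following sketch. The Solr home path varies by install (often /var/solr/data for the service installer, server/solr otherwise), so it is parameterized here, and touch stands in for copying your real jar:

```shell
# Assumption: set SOLR_HOME to your real Solr home; a scratch dir is used
# here only so the sketch runs anywhere.
SOLR_HOME="${SOLR_HOME:-$(mktemp -d)}"
mkdir -p "$SOLR_HOME/lib"
touch "$SOLR_HOME/lib/customanalyzer.jar"   # stand-in for: cp customanalyzer.jar ...
ls "$SOLR_HOME/lib"                         # then restart Solr on this node
```

Repeat on every server in the cluster; per Shawn, no <lib> directives are needed for jars in this directory.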

Alternately you could use the blob store, but I have no idea how to do it, or 
which version of Solr includes that feature.

Thanks,
Shawn




Parallel Updates

2016-04-04 Thread Robert Brown

Hi,

Does Solr have any sort of limit when attempting multiple updates, from 
separate clients?


Are there any safe thresholds one should try to stay within?

I have an index of around 60m documents that gets updated at key points 
during the day from ~200 downloaded files - I'd like to fork off 
multiple processes to deal with the incoming data to get it into Solr 
quicker.


Thanks,
Rob




Re: Parallel Updates

2016-04-04 Thread John Bickerstaff
Will the processes be Solr processes?  Or do you mean multiple threads
hitting the same Solr server(s)?

There will be a natural bottleneck at one Solr server if you are hitting it
with a lot of threads - since that one server will have to do all the
indexing.

I don't know if this idea is helpful, but if your underlying challenge is
protecting the user experience and preventing slowdown during the indexing,
you can have a separate Solr server that just accepts incoming documents
(and bears the cost of the indexing) while serving documents from other
Solr servers...

There will be a slight cost for those "serving servers" to get updates from
the "indexing server" but that will be much less than the cost of indexing
directly.

If processing power was really important you could have two or more
"indexing" servers and fire multiple threads at each one...

You probably already know this, but the key is how often you "commit" and
force the indexing to occur...

On Mon, Apr 4, 2016 at 3:33 PM, Robert Brown  wrote:

> Hi,
>
> Does Solr have any sort of limit when attempting multiple updates, from
> separate clients?
>
> Are there any safe thresholds one should try to stay within?
>
> I have an index of around 60m documents that gets updated at key points
> during the day from ~200 downloaded files - I'd like to fork off multiple
> processes to deal with the incoming data to get it into Solr quicker.
>
> Thanks,
> Rob
>
>
>


Re: Parallel Updates

2016-04-04 Thread Robert Brown

Thanks John,

I have 2 shards, 1 replica in each.

The issue is the external processing job(s) I have to convert external 
data into JSON, and then upload it via cURL.


Will one Solr server only accept one update at a time and have any 
others queued?  (And possibly timeout).


I like the idea of having my leaders only deal with indexing, and the 
replicas only deal with searching - how can I actually configure this?  
And is it actually required with my shard setup?


I'm doing hard commits every minute but not opening a new searcher (so I 
know the data is safe), with soft commits happening every 10 minutes to 
make the data visible.
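In solrconfig.xml terms, that commit policy would look roughly like this (times in milliseconds):

```xml
<!-- Sketch of the policy described above: hard commit every minute without
     opening a searcher, soft commit every 10 minutes for visibility. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>600000</maxTime>
</autoSoftCommit>
```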


Cheers,
Rob


On 04/04/16 22:40, John Bickerstaff wrote:

Will the processes be Solr processes?  Or do you mean multiple threads
hitting the same Solr server(s)?

There will be a natural bottleneck at one Solr server if you are hitting it
with a lot of threads - since that one server will have to do all the
indexing.

I don't know if this idea is helpful, but if your underlying challenge is
protecting the user experience and preventing slowdown during the indexing,
you can have a separate Solr server that just accepts incoming documents
(and bearing the cost of the indexing) while serving documents from other
Solr servers...

There will be a slight cost for those "serving servers" to get updates from
the "indexing server" but that will be much less than the cost of indexing
directly.

If processing power was really important you could have two or more
"indexing" servers and fire multiple threads at each one...

You probably already know this, but the key is how often you "commit" and
force the indexing to occur...

On Mon, Apr 4, 2016 at 3:33 PM, Robert Brown  wrote:


Hi,

Does Solr have any sort of limit when attempting multiple updates, from
separate clients?

Are there any safe thresholds one should try to stay within?

I have an index of around 60m documents that gets updated at key points
during the day from ~200 downloaded files - I'd like to fork off multiple
processes to deal with the incoming data to get it into Solr quicker.

Thanks,
Rob







Re: Parallel Updates

2016-04-04 Thread Anshum Gupta
The short answer is - There's no real limit on Solr in terms of
concurrency.

Here are a few things that would impact your numbers though:
* What version of Solr are you using and how ? i.e. SolrCloud, standalone,
traditional replication ?
* Do you use atomic updates?
* How do you index ?

Assuming you are on SolrCloud, you wouldn't be able to have a dedicated
indexing node.

There are a ton of other settings you could read about and tweak to get
good throughput but in general, multi-threading is highly recommended in
terms of indexing.


On Mon, Apr 4, 2016 at 2:33 PM, Robert Brown  wrote:

> Hi,
>
> Does Solr have any sort of limit when attempting multiple updates, from
> separate clients?
>
> Are there any safe thresholds one should try to stay within?
>
> I have an index of around 60m documents that gets updated at key points
> during the day from ~200 downloaded files - I'd like to fork off multiple
> processes to deal with the incoming data to get it into Solr quicker.
>
> Thanks,
> Rob
>
>
>


-- 
Anshum Gupta


Re: Sort order for *:* query

2016-04-04 Thread Chris Hostetter
: 
: Not sure I understand... _version_ is time based and hence will give
: roughly the same accuracy as something like
: TimestampUpdateProcessorFactory that you recommend below.  Both

Hmmm... last time i looked, i thought _version_ numbers were allocated & 
incremented on a per-shard basis and "time" was only used for initial 
seeding when the leader started up -- so in a stable system running for 
a long time, if shardA gets significantly more updates than shardB the 
_version_ numbers can get skewed and a new doc in shardB might be updated 
with a _version_ less than the _version_ of a document added to shardA 
well before that.

But maybe I'm remembering wrong?



-Hoss
http://www.lucidworks.com/


Re: Sort order for *:* query

2016-04-04 Thread Yonik Seeley
On Mon, Apr 4, 2016 at 6:06 PM, Chris Hostetter
 wrote:
> :
> : Not sure I understand... _version_ is time based and hence will give
> : roughly the same accuracy as something like
> : TimestampUpdateProcessorFactory that you recommend below.  Both
>
> Hmmm... last time i looked, i thought _version_ numbers were allocated &
> incremented on a per-shard basis and "time" was only used for initial
> seeding when the leader started up

No, time is used for every version generated.  Upper bits are
milliseconds and lower bits are incremented only if needed for
uniqueness in the shard (i.e. two documents indexed at the same
millisecond).  We have 20 lower bits, so one would need a sustained
indexing rate of over 1M documents per millisecond (or 1B docs/sec) to
introduce a permanent skew due to indexing.

There is system clock skew between shards of course, but an update
processor that added a date field would include that as well.

The code in VersionInfo is:

public long getNewClock() {
  synchronized (clockSync) {
long time = System.currentTimeMillis();
long result = time << 20;
if (result <= vclock) {
  result = vclock + 1;
}
vclock = result;
return vclock;
  }
}
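As a sanity check on that layout (a sketch, not Solr code): packing epoch-millis above a 20-bit counter and shifting back recovers the timestamp.

```shell
# Mirror getNewClock's layout: epoch-millis in the upper bits, a 20-bit
# counter below for same-millisecond uniqueness.
millis=1459800000000            # roughly 2016-04-04 in epoch milliseconds
version=$(( millis << 20 ))     # what a _version_ for that instant looks like
echo $(( version >> 20 ))       # prints 1459800000000 -- the millis back out
```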


-Yonik

> -- so in a stable system running for
> a long time, if shardA gets signifcantly more updates then shardB the
> _version_ numbers can get skewed and a new doc in shardB might be updated
> with a _version_ less then the _version_ of a document added to shardA
> well before that.
>
> But maybe I'm remembering wrong?
>
>
>
> -Hoss
> http://www.lucidworks.com/


Re: Parallel Updates

2016-04-04 Thread John Bickerstaff
Does SOLR cloud push indexing across all nodes?  I've been planning 4 SOLR
boxes with only 3 exposed via the load balancer, leaving the 4th available
internally for my microservices to hit with indexing work.

I was assuming that if I hit my "solr4" IP address, only "solr4" will do
the indexing...  Perhaps I'm making a dangerous assumption?
On Apr 4, 2016 3:49 PM, "Anshum Gupta"  wrote:

The short answer is - There's no real limit on Solr in terms of
concurrency.

Here are a few things that would impact your numbers though:
* What version of Solr are you using and how ? i.e. SolrCloud, standalone,
traditional replication ?
* Do you use atomic updates?
* How do you index ?

Assuming you are on SolrCloud, you wouldn't be able to have a dedicated
indexing node.

There are a ton of other settings you could read about and tweak to get
good throughput but in general, multi-threading is highly recommended in
terms of indexing.


On Mon, Apr 4, 2016 at 2:33 PM, Robert Brown  wrote:

> Hi,
>
> Does Solr have any sort of limit when attempting multiple updates, from
> separate clients?
>
> Are there any safe thresholds one should try to stay within?
>
> I have an index of around 60m documents that gets updated at key points
> during the day from ~200 downloaded files - I'd like to fork off multiple
> processes to deal with the incoming data to get it into Solr quicker.
>
> Thanks,
> Rob
>
>
>


--
Anshum Gupta


Re: Parallel Updates

2016-04-04 Thread Anshum Gupta
Solr would push all updates to all shards that are supposed to host the
data. Documents are first forwarded to the leader of the shard (which can
change dynamically); the leader is responsible for versioning and for
ensuring replication across the followers, but other than that, all nodes
would be equally loaded in most regular situations.

On Mon, Apr 4, 2016 at 3:37 PM, John Bickerstaff 
wrote:

> Does SOLR cloud push indexing across all nodes?  I've been planning 4 SOLR
> boxes with only 3 exposed via the load balancer, leaving the 4th available
> internally for my microservices to hit with indexing work.
>
> I was assuming that if I hit my "solr4" IP address, only "solr4" will do
> the indexing...  Perhaps I'm making a dangerous assumption?
> On Apr 4, 2016 3:49 PM, "Anshum Gupta"  wrote:
>
> The short answer is - There's no real limit on Solr in terms of
> concurrency.
>
> Here are a few things that would impact your numbers though:
> * What version of Solr are you using and how ? i.e. SolrCloud, standalone,
> traditional replication ?
> * Do you use atomic updates?
> * How do you index ?
>
> Assuming you are on SolrCloud, you wouldn't be able to have a dedicated
> indexing node.
>
> There are a ton of other settings you could read about and tweak to get
> good throughput but in general, multi-threading is highly recommended in
> terms of indexing.
>
>
> On Mon, Apr 4, 2016 at 2:33 PM, Robert Brown  wrote:
>
> > Hi,
> >
> > Does Solr have any sort of limit when attempting multiple updates, from
> > separate clients?
> >
> > Are there any safe thresholds one should try to stay within?
> >
> > I have an index of around 60m documents that gets updated at key points
> > during the day from ~200 downloaded files - I'd like to fork off multiple
> > processes to deal with the incoming data to get it into Solr quicker.
> >
> > Thanks,
> > Rob
> >
> >
> >
>
>
> --
> Anshum Gupta
>



-- 
Anshum Gupta


Re: Parallel Updates

2016-04-04 Thread Shawn Heisey
On 4/4/2016 3:46 PM, Robert Brown wrote:
> I have 2 shards, 1 replica in each.
>
> The issue is the external processing job(s) I have to convert external
> data into JSON, and then upload it via cURL.
>
> Will one Solr server only accept one update at a time and have any
> others queued?  (And possibly timeout).
>
> I like the idea of having my leaders only deal with indexing, and the
> replicas only deal with searching - how can I actually configure
> this?  And is it actually required with my shard setup?
>
> I'm doing hard commits every minute but not opening a new searcher (so
> I know the data is safe), with soft commits happening every 10 minutes
> to make the data visible.

You can have exactly the paradigm you outlined with the old master-slave
replication -- one master server indexing, and a separate load balancer
dividing traffic among your slave servers.  The problem with this
paradigm is that you have a single point of failure -- the master
server.  Reconfiguring the machines to choose a new master is possible,
but not automatic, and not exactly trivial.

The terminology you used (shards and replicas) suggests that you're
running SolrCloud -- which handles this *very* differently.  In
SolrCloud, every replica indexes the data independently, and every
replica handles queries.  There is no single point of failure, and if
you use a cloud-aware client (there are only two of these, and one of
them is the Java client included with Solr), you don't even need a load
balancer.

Thanks,
Shawn



Re: Complex Sort

2016-04-04 Thread Chris Hostetter

: I am not sure how to use "Sort By Function" for Case.
: 
: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
: 
: Can you tell how to fetch 40 when input is 10.

Something like...

if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))

But i suspect there may be a much better way to achieve your ultimate goal 
if you tell us what it is.  what do these fields represent? what makes 
these numeric values significant? do you know which values are significant 
when indexing, or do they vary for every query?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
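If the value-to-weight pairs come from data (as the pipe-separated string in the question suggests), the nested function can be generated rather than hand-written. A sketch in shell, assuming a field named "f" and using the first three pairs from the example:

```shell
# Build if(termfreq(f,v),w,...) from value#weight pairs, innermost-first.
pairs="10#40 14#19 33#17"   # first pairs from the question's string
fn="0"                      # fallback weight when no value matches
for p in $(printf '%s\n' $pairs | tac); do
  v=${p%#*}; w=${p#*#}
  fn="if(termfreq(f,$v),$w,$fn)"
done
echo "$fn"                  # pass the result as: sort=<fn> desc
```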




-Hoss
http://www.lucidworks.com/


solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-04 Thread cqlangyi
hi there,


I have a Solr 5.2.1 instance. When I do a data import, after the job is done
it shows 165,191 rows processed successfully.


But when I query with *:*, "numFound" shows only 163,349 docs in the index.


When I tried it again, it showed 165,191 rows processed successfully again,
but the *:* query result is now 162,390.


No errors appear in any log.


any idea?


thank you very much!


cq








At 2016-04-05 09:19:48, "Chris Hostetter"  wrote:
>
>: I am not sure how to use "Sort By Function" for Case.
>: 
>: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
>: 
>: Can you tell how to fetch 40 when input is 10.
>
>Something like...
>
>if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
>
>But i suspect there may be a much better way to achieve your ultimate goal 
>if you tell us what it is.  what do these fields represent? what makes 
>these numeric valuessignificant? do you know which values are significant 
>when indexing, or do they vary for every query?
>
>https://people.apache.org/~hossman/#xyproblem
>XY Problem
>
>Your question appears to be an "XY Problem" ... that is: you are dealing
>with "X", you are assuming "Y" will help you, and you are asking about "Y"
>without giving more details about the "X" so that we can understand the
>full issue.  Perhaps the best solution doesn't involve "Y" at all?
>See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
>
>
>-Hoss
>http://www.lucidworks.com/


Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-04 Thread Binoy Dalal
1) Are you sure you don't have duplicates?
2) All of your records might have been indexed but a new searcher may not
have opened on the updated index yet. Try issuing a commit and see if that
works.

On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:

> hi there,
>
>
> i have an solr 5.2.1,  when i do data import, after the job is done, it's
> shown 165,191 rows processed successfully.
>
>
> but when i query with *:*, the "numFound" shown only 163,349 docs in index.
>
>
> when i tred to do it again, , it's shown 165,191 rows processed
> successfully. but the *:* query result now is 162,390.
>
>
> no errors in any log,
>
>
> any idea?
>
>
> thank you very much!
>
>
> cq
>
>
>
>
>
>
>
>
> At 2016-04-05 09:19:48, "Chris Hostetter" 
> wrote:
> >
> >: I am not sure how to use "Sort By Function" for Case.
> >:
> >: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
> >:
> >: Can you tell how to fetch 40 when input is 10.
> >
> >Something like...
> >
>
> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
> >
> >But i suspect there may be a much better way to achieve your ultimate goal
> >if you tell us what it is.  what do these fields represent? what makes
> >these numeric valuessignificant? do you know which values are significant
> >when indexing, or do they vary for every query?
> >
> >https://people.apache.org/~hossman/#xyproblem
> >XY Problem
> >
> >Your question appears to be an "XY Problem" ... that is: you are dealing
> >with "X", you are assuming "Y" will help you, and you are asking about "Y"
> >without giving more details about the "X" so that we can understand the
> >full issue.  Perhaps the best solution doesn't involve "Y" at all?
> >See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> >
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>
-- 
Regards,
Binoy Dalal


Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-04 Thread John Bickerstaff
The first question is whether you have duplicate ID's in your data set.

I had the same kind of thing a few months back, freaked out, and spent a
few hours trying to figure it out by coding extra logging etc... to keep
track of every single count at every stage of the process..  All the
numbers matched, right until I sent everything to SOLR...  after which I
ended up with fewer Solr documents than I had "rows" of results.

Then my client told me they knew they had duplicates in the data set based
on the way they "harvest" the data...

I can't explain the difference between the first and second results in SOLR
and of course my situation may not match yours...

But - I suggest clearing solr and starting from scratch.  Get a count at
each stage and (if you're in control of the code as I was) build something
that checks for duplicates (a hashmap in Java is a handy tool for this by
virtue of refusing to accept duplicates).

If you don't have control of the code, you might write some SQL or
something against the original data store that would uncover the presence of
duplicates.  If you're dealing with a "canned" set of data, you might want
to parse it in code and check for duplicates...
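For the duplicate check itself, even a one-liner over an exported list of uniqueKey values works; any id printed is one Solr would silently overwrite (the file below is a stand-in for your real export):

```shell
# Any id printed here appears more than once in the source data.
printf 'doc1\ndoc2\ndoc2\ndoc3\n' > /tmp/ids.txt  # stand-in for exported ids
sort /tmp/ids.txt | uniq -d                       # prints: doc2
```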

If you can reproduce the difference between the first and second run,
you've got more going on than duplicate ID's - but that might still be part
of your problem.

On Mon, Apr 4, 2016 at 9:26 PM, cqlangyi  wrote:

> hi there,
>
>
> i have an solr 5.2.1,  when i do data import, after the job is done, it's
> shown 165,191 rows processed successfully.
>
>
> but when i query with *:*, the "numFound" shown only 163,349 docs in index.
>
>
> when i tred to do it again, , it's shown 165,191 rows processed
> successfully. but the *:* query result now is 162,390.
>
>
> no errors in any log,
>
>
> any idea?
>
>
> thank you very much!
>
>
> cq
>
>
>
>
>
>
>
>
> At 2016-04-05 09:19:48, "Chris Hostetter" 
> wrote:
> >
> >: I am not sure how to use "Sort By Function" for Case.
> >:
> >: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
> >:
> >: Can you tell how to fetch 40 when input is 10.
> >
> >Something like...
> >
>
> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
> >
> >But i suspect there may be a much better way to achieve your ultimate goal
> >if you tell us what it is.  what do these fields represent? what makes
> >these numeric valuessignificant? do you know which values are significant
> >when indexing, or do they vary for every query?
> >
> >https://people.apache.org/~hossman/#xyproblem
> >XY Problem
> >
> >Your question appears to be an "XY Problem" ... that is: you are dealing
> >with "X", you are assuming "Y" will help you, and you are asking about "Y"
> >without giving more details about the "X" so that we can understand the
> >full issue.  Perhaps the best solution doesn't involve "Y" at all?
> >See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> >
> >
> >
> >-Hoss
> >http://www.lucidworks.com/
>


Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-04 Thread John Bickerstaff
Sweet - that's a good point - I ran into that too - I had not run the
commit for the last "batch" (I was using SolrJ) and so numbers didn't match
until I did.

On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal  wrote:

> 1) Are you sure you don't have duplicates?
> 2) All of your records might have been indexed but a new searcher may not
> have opened on the updated index yet. Try issuing a commit and see if that
> works.
>
> On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:
>
> > hi there,
> >
> >
> > i have an solr 5.2.1,  when i do data import, after the job is done, it's
> > shown 165,191 rows processed successfully.
> >
> >
> > but when i query with *:*, the "numFound" shown only 163,349 docs in
> index.
> >
> >
> > when i tred to do it again, , it's shown 165,191 rows processed
> > successfully. but the *:* query result now is 162,390.
> >
> >
> > no errors in any log,
> >
> >
> > any idea?
> >
> >
> > thank you very much!
> >
> >
> > cq
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2016-04-05 09:19:48, "Chris Hostetter" 
> > wrote:
> > >
> > >: I am not sure how to use "Sort By Function" for Case.
> > >:
> > >:
> |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
> > >:
> > >: Can you tell how to fetch 40 when input is 10.
> > >
> > >Something like...
> > >
> >
> >
> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
> > >
> > >But i suspect there may be a much better way to achieve your ultimate
> goal
> > >if you tell us what it is.  what do these fields represent? what makes
> > >these numeric valuessignificant? do you know which values are
> significant
> > >when indexing, or do they vary for every query?
> > >
> > >https://people.apache.org/~hossman/#xyproblem
> > >XY Problem
> > >
> > >Your question appears to be an "XY Problem" ... that is: you are dealing
> > >with "X", you are assuming "Y" will help you, and you are asking about
> "Y"
> > >without giving more details about the "X" so that we can understand the
> > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
> > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
> > >
> > >
> > >
> > >
> > >-Hoss
> > >http://www.lucidworks.com/
> >
> --
> Regards,
> Binoy Dalal
>


Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-04 Thread John Bickerstaff
Both of us implied it, but to be completely clear - if you have a duplicate
ID in your data set, SOLR will throw away previous documents with that ID
and index the new one.  That's fine if your duplicates really are
duplicates - it's not OK if there's a problem in the data set and the
duplicate IDs are on documents that are actually unique.

On Mon, Apr 4, 2016 at 9:51 PM, John Bickerstaff 
wrote:

> Sweet - that's a good point - I ran into that too - I had not run the
> commit for the last "batch" (I was using SolrJ) and so numbers didn't match
> until I did.
>
> On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal 
> wrote:
>
>> 1) Are you sure you don't have duplicates?
>> 2) All of your records might have been indexed but a new searcher may not
>> have opened on the updated index yet. Try issuing a commit and see if that
>> works.
>>
>> On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:
>>
>> > hi there,
>> >
>> >
>> > i have an solr 5.2.1,  when i do data import, after the job is done,
>> it's
>> > shown 165,191 rows processed successfully.
>> >
>> >
>> > but when i query with *:*, the "numFound" shown only 163,349 docs in
>> index.
>> >
>> >
>> > when i tred to do it again, , it's shown 165,191 rows processed
>> > successfully. but the *:* query result now is 162,390.
>> >
>> >
>> > no errors in any log,
>> >
>> >
>> > any idea?
>> >
>> >
>> > thank you very much!
>> >
>> >
>> > cq
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > At 2016-04-05 09:19:48, "Chris Hostetter" 
>> > wrote:
>> > >
>> > >: I am not sure how to use "Sort By Function" for Case.
>> > >:
>> > >:
>> |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
>> > >:
>> > >: Can you tell how to fetch 40 when input is 10.
>> > >
>> > >Something like...
>> > >
>> >
>> >
>> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
>> > >
>> > >But i suspect there may be a much better way to achieve your ultimate
>> goal
>> > >if you tell us what it is.  what do these fields represent? what makes
>> > >these numeric valuessignificant? do you know which values are
>> significant
>> > >when indexing, or do they vary for every query?
>> > >
>> > >https://people.apache.org/~hossman/#xyproblem
>> > >XY Problem
>> > >
>> > >Your question appears to be an "XY Problem" ... that is: you are
>> dealing
>> > >with "X", you are assuming "Y" will help you, and you are asking about
>> "Y"
>> > >without giving more details about the "X" so that we can understand the
>> > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
>> > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
>> > >
>> > >
>> > >
>> > >
>> > >-Hoss
>> > >http://www.lucidworks.com/
>> >
>> --
>> Regards,
>> Binoy Dalal
>>
>
>


How to Get info about clusterstate in solr 5.2.1 just like ping request handler with distrib=true

2016-04-04 Thread preeti kumari
Hi,

I am using solr 5.2.1 . We need to configure F5 load balancer with
zookeepers.
For that we need to know whether our cluster as a whole is eligible to
serve queries or not. We can get the cluster state using the ping request
handler, but in Solr 5.2.1 with distrib=true it throws an exception (a known
bug in Solr 5.2.1). So now I need:

1. Any way to get cluster state as a whole to see if cluster can serve
queries without going to individual solr nodes.
2. If we can anyhow get this info from zookeepers
3. can we make ping request handler with distrib=true work in solr 5.2.1

Any info in this regard would be appreciated where i don't want to go to
individual solr nodes.

Thanks
Preeti