Solr suggest, auto complete & spellcheck

2016-01-04 Thread Steven White
Hi,

I'm trying to understand the differences between Solr suggest,
auto complete, and spellcheck.  Isn't each a function of the UI?  If not, can
you provide me with links that show end-to-end examples of setting up Solr to
get all 3 features?

I'm on Solr 5.2.

Thanks

Steve
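
None of the three is purely a UI function; suggest, autocomplete, and
spellcheck are backed by server-side Solr components.  As a hedged sketch
only (the handler and dictionary names below are assumptions, not something
this thread establishes), a SolrJ call against a configured suggester looks
roughly like this:

    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/suggest");            // hypothetical handler configured in solrconfig.xml
    q.set("suggest", "true");
    q.set("suggest.dictionary", "mySuggester"); // hypothetical dictionary name
    q.set("suggest.q", "sol");                  // the prefix the user has typed so far
    QueryResponse r = new HttpSolrClient("http://localhost:8983/solr/core1").query(q);

Spellcheck works the same way through a handler that enables the
SpellCheckComponent (spellcheck=true, spellcheck.q=...).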


Using "join" with 2+ cores

2016-01-19 Thread Steven White
Hi everyone,

Does Solr's join support 2+ cores?  Is there an example?  Is there a
performance impact when I have 5 to 10 cores vs. a single core (all my data
in 1 core)?  Is the relevancy score impacted with multiple cores vs. a single
core?

Thanks

Steve


Faceting and multiValued field type

2016-01-19 Thread Steven White
Hi everyone,

Can I facet on a multiValued field?  If so, how does faceting work
with a field of type "date" set as multiValued?

Thanks

Steve


Re: Faceting and multiValued field type

2016-01-19 Thread Steven White
My apologies for not being clear -- I left out the keyword "range search"
with facet.  Let me try again.

Using the DateRangeField field type, if this field is multiValued and I have 3
date values stored for one record, 5 for another, etc., which of those date
values will be used for faceting when I use range-search faceting on this
field?

Don't I have the same issue on other field types when it comes to range
searches, such as CurrencyField, int, float, etc.?

-- George

On Tue, Jan 19, 2016 at 1:10 PM, Erick Erickson 
wrote:

> Yes.
>
> What do you mean "how does it work"? The low-level
> details or what?
>
> Basically, faceting just... facets. I.e. for each unique
> value in the field specified it counts the number of
> docs in the result set that have that value.
>
> So if you have a doc with two dates and facet on that
> field, say 1/1/2015 and 1/1/2016,
> that doc will be counted in each bucket.
>
> Best,
> Erick
>
> On Tue, Jan 19, 2016 at 8:48 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > Can I facet on a multiValued field?  If so, how does faceting work
> > with a field of type "date" set as multiValued?
> >
> > Thanks
> >
> > Steve
>
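
A minimal SolrJ sketch of the behavior Erick describes, with hypothetical
core and field names: a document is counted once under every distinct value
it holds in the multiValued field.

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("dates");  // hypothetical multiValued date field
    QueryResponse resp = client.query(q);
    for (FacetField.Count c : resp.getFacetField("dates").getValues()) {
        // a doc holding both 1/1/2015 and 1/1/2016 shows up in both counts
        System.out.println(c.getName() + " -> " + c.getCount());
    }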


Re: Using "join" with 2+ cores

2016-01-19 Thread Steven White
I should rephrase my question: this isn't really about "join", it's more
about how I can search across multiple cores as if they were one.  What is
the URL syntax I need to send to Solr to search across N cores as if
I'm searching against 1 core?

I'm on Solr 5.2.1 in a non-cloud setup.

Steve
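
For what it's worth, standalone (non-cloud) Solr can run one distributed
query across several cores via the "shards" parameter; a hedged sketch with
hypothetical field and core names:

    SolrQuery q = new SolrQuery("Data:foo");  // hypothetical field:term
    q.set("shards", "localhost:8983/solr/core1,localhost:8983/solr/core2");
    QueryResponse resp = new HttpSolrClient("http://localhost:8983/solr/core1").query(q);

One caveat relevant to the relevancy part of the original question: in 5.x,
term statistics (IDF) are computed per shard by default, so scores across N
cores can differ from a single combined core.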


On Tue, Jan 19, 2016 at 1:15 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> fromIndex= score=
> I don't think there is a difference between joining within a single core and
> joining across two, but I'm not sure what you mean by 5 to 10 cores.
>
> On Tue, Jan 19, 2016 at 8:43 AM, Steven White 
> wrote:
>
> > Hi everyone,
> >
> > Does Solr's join support 2+ cores?  Is there an example?  Is there a
> > performance impact when I have 5 to 10 cores vs. a single core (all my data
> > in 1 core)?  Is the relevancy score impacted with multiple cores vs. a single
> > core?
> >
> > Thanks
> >
> > Steve
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Memory leak defect or misuse of SolrJ API?

2016-01-30 Thread Steven White
Hi folks,

I'm getting a memory leak in my code.  I narrowed the code down to the
following minimal example that causes the leak:

    while (true) {
        HttpSolrClient client = new HttpSolrClient("http://192.168.202.129:8983/solr/core1");
        client.close();
    }

Is this a defect or an issue in the way I'm using HttpSolrClient?

I'm on Solr 5.2.1

Thanks.

Steve


Re: Memory leak defect or misuse of SolrJ API?

2016-01-31 Thread Steven White
Thank you all for your feedback.

This is code that I inherited, and the example I gave is intended to
demonstrate the memory leak, which according to YourKit is
on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
"Detail "java/lang/OutOfMemoryError" "Java heap space" received".

Here is a more detailed layout of the code.  This is a crawler that runs
24x7 without any recycle logic in place:

    init_data()

    while (true)
    {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1/");    <<<< this is real code

        see_if_we_have_new_data();

        send_new_data_to_solr();

        client.close();    <<<< this is real code

        sleep_for_a_bit(N);    <<<< 'N' can be any positive int
    }

By default, our Java program is given 4 GB of RAM ("-Xmx4g") and N is set to
5 min.  We had a customer set N to 10 seconds and we started seeing core
dumps with OOM.  As I started to debug, I narrowed the OOM down to
HttpSolrClient per my original email.

The follow-up answers I got suggested that I move the construction of the
HttpSolrClient object outside the while loop, which I did (but I also had to
move "client.close()" outside the loop), and the leak is gone.

Given this, is this how HttpSolrClient is supposed to be used?  If so, what's
the point of HttpSolrClient.close()?

Another side question.  I noticed HttpSolrClient has a setBaseUrl().  Now,
if I call it and give it "http://localhost:8983/solr/core1/" (notice the "/"
at the end), the next time I use HttpSolrClient to send Solr data, I get back
a 404.  The fix is to remove the ending "/".  This is not how the constructor
of HttpSolrClient behaves; HttpSolrClient will take the URL with or without
the "/".

In summary, it would be good if someone can confirm if we have a memory leak
in HttpSolrClient when used per my example; if so, this is a defect.  Also,
can someone confirm the fix I used for this issue: move the constructor of
HttpSolrClient outside the loop and reuse the existing object "client".

Again, thank you all for the quick response; it is much appreciated.

Steve
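
The fix summarized above, as a sketch (the helper names are the thread's own
pseudo-code): one HttpSolrClient per Solr server, created once and closed
once.

    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
    try {
        init_data();
        while (true) {
            see_if_we_have_new_data();
            send_new_data_to_solr(client);  // reuse the one client and its connection pool
            sleep_for_a_bit(N);
        }
    } finally {
        client.close();                     // once, at shutdown
    }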



On Sat, Jan 30, 2016 at 1:24 PM, Erick Erickson 
wrote:

> Assuming you're not really using code like above and it's a test case
>
> What's your evidence that memory consumption goes up? Are you sure
> you're not just seeing uncollected garbage?
>
> When I attached Java Mission Control to this program it looked pretty
> scary at first, but the heap allocated after old generation garbage
> collections leveled out to a steady state.
>
>
> On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood 
> wrote:
> > Create one HttpSolrClient object for each Solr server you are talking
> to. Reuse it for all requests to that Solr server.
> >
> > It will manage a pool of connections and keep them alive for faster
> communication.
> >
> > I took a look at the JavaDoc and the wiki doc, neither one explains this
> well. I don’t think they even point out what is thread safe.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Jan 30, 2016, at 7:42 AM, Susheel Kumar 
> wrote:
> >>
> >> Hi Steve,
> >>
> >> Can you please elaborate what error you are getting and i didn't
> understand
> >> your code above, that why initiating Solr client object  is in loop.  In
> >> general  creating client instance should be outside the loop and a one
> time
> >> activity during the complete execution of program.
> >>
> >> Thanks,
> >> Susheel
> >>
> >> On Sat, Jan 30, 2016 at 8:15 AM, Steven White 
> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> I'm getting memory leak in my code.  I narrowed the code to the
> following
> >>> minimal to cause the leak.
> >>>
> >>>while (true) {
> >>>HttpSolrClient client = new HttpSolrClient("http://192.168.202.129:8983/solr/core1");
> >>>client.close();
> >>>}
> >>>
> >>> Is this a defect or an issue in the way I'm using HttpSolrClient?
> >>>
> >>> I'm on Solr 5.2.1
> >>>
> >>> Thanks.
> >>>
> >>> Steve
> >>>
> >
>


Re: Memory leak defect or misuse of SolrJ API?

2016-01-31 Thread Steven White
Thanks Walter.  Yes, I saw your answer and fixed the issue per your
suggestion.

The JavaDoc needs to make this clear.  The fact that there is a close() on this
class and the JavaDoc does not say "your program should have exactly as
many HttpSolrClient objects as there are servers it talks to" makes it a prime
candidate for misuse.

Steve


On Sun, Jan 31, 2016 at 5:20 PM, Walter Underwood 
wrote:

> I already answered this.
>
> Move the creation of the HttpSolrClient outside the loop. Your code will
> run much fast, because it will be able to reuse the connections.
>
> Put another way, your program should have exactly as many HttpSolrClient
> objects as there are servers it talks to. If there is one Solr server, you
> have one object.
>
> There is no leak in HttpSolrClient, you are misusing the class, massively.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 31, 2016, at 2:10 PM, Steven White  wrote:
> >
> > Thank you all for your feedback.
> >
> > This is code that I inherited, and the example I gave is intended to
> > demonstrate the memory leak which based on YourKit is
> > on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
> > "Detail "java/lang/OutOfMemoryError" "Java heap space" received "
> >
> > Here is a more detailed layout of the code.  This is a crawler that runs
> > 24x7 without any recycle logic in place:
> >
> >init_data()
> >
> >while (true)
> >{
> >HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1/");    <<<< this is real code
> >
> >see_if_we_have_new_data();
> >
> >send_new_data_to_solr();
> >
> >client.close();<<<< this is real code
> >
> >sleep_for_a_bit(N);<<<< 'N' can be any positive int
> >}
> >
> > By default, our Java program is given 4gb of ram "-Xmx4g" and N is set
> for
> > 5 min.  We had a customer set N to 10 second and we started seeing core
> > dumps with OOM.  As I started to debug, I narrowed the OOM to
> > HttpSolrClient per my original email.
> >
> > The follow up answers I got suggest that I move the construction of
> > HttpSolrClient object outside the while loop which I did (but I also had
> to
> > move "client.close()" outside the loop) and the leak is gone.
> >
> > Given this, is this how HttpSolrClient is supposed to be used?  If so,
> what's
> > the point of HttpSolrClient.close()?
> >
> > Another side question.  I noticed HttpSolrClient has a setBaseUrl().
> Now,
> > if I call it and give it "http://localhost:8983/solr/core1/" (notice the "/" at the end)
> next
> > time I use HttpSolrClient to send Solr data, I get back 404. The fix is
> to
> > remove the ending "/".  This is not how the constructor of HttpSolrClient
> > behaves; HttpSolrClient will take the URL with or without "/".
> >
> > In summary, it would be good if someone can confirm if we have a memory
> leak
> > in HttpSolrClient if used per my example; if so this is a defect.  Also,
> > can someone confirm the fix I used for this issue: move the constructor
> of
> > HttpSolrClient outside the loop and reuse the existing object "client".
> >
> > Again, thank you all for the quick response it is much appreciated.
> >
> > Steve
> >
> >
> >
> > On Sat, Jan 30, 2016 at 1:24 PM, Erick Erickson  >
> > wrote:
> >
> >> Assuming you're not really using code like above and it's a test
> case
> >>
> >> What's your evidence that memory consumption goes up? Are you sure
> >> you're not just seeing uncollected garbage?
> >>
> >> When I attached Java Mission Control to this program it looked pretty
> >> scary at first, but the heap allocated after old generation garbage
> >> collections leveled out to a steady state.
> >>
> >>
> >> On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>> Create one HttpSolrClient object for each Solr server you are talking
> >> to. Reuse it for all requests to that Solr server.
> >>>
> >>> It will manage a pool of connections and keep them alive for fa

Using Tika that comes with Solr 5.2

2016-02-02 Thread Steven White
Hi,

I'm trying to use Tika that comes with Solr 5.2.  The following code is not
working:

public static void parseWithTika() throws Exception
{
    File file = new File("C:\\temp\\test.pdf");

    FileInputStream in = new FileInputStream(file);
    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
    BodyContentHandler contentHandler = new BodyContentHandler();

    parser.parse(in, contentHandler, metadata);

    String content = contentHandler.toString();    <=== 'content' is always empty

    in.close();
}

'content' is always an empty string unless the file I pass to Tika is a
text file.  Any idea what the issue is?

I have also tried sample codes off https://tika.apache.org/1.8/examples.html
with the same result.


Thanks !!

Steve


Re: Using Tika that comes with Solr 5.2

2016-02-02 Thread Steven White
I'm not using tika-app.jar.  I need to stick with the Tika JARs that come with
Solr 5.2 and still get the full text-extraction feature of Tika (all file
types it supports).

At first, I started to include Tika JARs as needed; I now have all the
Tika-related JARs that come with Solr and yet it is not working.  Here is the
list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar, kite-morphlines-tika-core-0.12.1.jar,
and kite-morphlines-tika-decompress-0.12.1.jar.  As part of my program, I
also have the SolrJ JARs and their dependencies: solr-solrj-5.2.1.jar,
solr-core-5.2.1.jar, etc.

You said "Might not have the parsers on your path within your Solr
framework?".  I'm using Tika outside the Solr framework.  I'm trying to use
Tika from my own crawler application that uses SolrJ to send the raw text
to Solr for indexing.

What is it that I am missing?!

Steve
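
A hedged diagnostic (an assumption on my part, not something the thread
verified): AutoDetectParser discovers parsers from the classpath, so printing
its supported types shows whether tika-parsers and its dependencies were
actually picked up.

    AutoDetectParser parser = new AutoDetectParser();
    // Nearly empty with only tika-core on the classpath; with tika-parsers
    // present it lists application/pdf, the MS Office types, and many more.
    System.out.println(parser.getSupportedTypes(new ParseContext()));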

On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. 
wrote:

> Might not have the parsers on your path within your Solr framework?
>
> Which tika jars are on your path?
>
> If you want the functionality of all of Tika, use the standalone
> tika-app.jar, but do not use the app in the same JVM as Solr...without a
> custom class loader.  The Solr team carefully prunes the dependencies when
> integrating Tika and makes sure that the main parsers _just work_.
>
>
> -Original Message-
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Tuesday, February 02, 2016 2:53 PM
> To: solr-user@lucene.apache.org
> Subject: Using Tika that comes with Solr 5.2
>
> Hi,
>
> I'm trying to use Tika that comes with Solr 5.2.  The following code is not
> working:
>
> public static void parseWithTika() throws Exception {
>     File file = new File("C:\\temp\\test.pdf");
>
>     FileInputStream in = new FileInputStream(file);
>     AutoDetectParser parser = new AutoDetectParser();
>     Metadata metadata = new Metadata();
>     metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>     BodyContentHandler contentHandler = new BodyContentHandler();
>
>     parser.parse(in, contentHandler, metadata);
>
>     String content = contentHandler.toString();    <=== 'content' is always empty
>
>     in.close();
> }
>
> 'content' is always an empty string unless the file I pass to Tika is a
> text file.  Any idea what the issue is?
>
> I have also tried sample codes off
> https://tika.apache.org/1.8/examples.html
> with the same result.
>
>
> Thanks !!
>
> Steve
>


Re: Using Tika that comes with Solr 5.2

2016-02-02 Thread Steven White
I found my issue.  I needed to include the JARs from \solr\contrib\extraction\lib\

Steve

On Tue, Feb 2, 2016 at 4:24 PM, Steven White  wrote:

> I'm not using tika-app.jar.  I need to stick with the Tika JARs that come with
> Solr 5.2 and still get the full text-extraction feature of Tika (all file
> types it supports).
>
> At first, I started to include Tika JARs as needed; I now have all the
> Tika-related JARs that come with Solr and yet it is not working.  Here is the
> list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
> tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar, kite-morphlines-tika-core-0.12.1.jar,
> and kite-morphlines-tika-decompress-0.12.1.jar.  As part of my program, I
> also have the SolrJ JARs and their dependencies: solr-solrj-5.2.1.jar,
> solr-core-5.2.1.jar, etc.
>
> You said "Might not have the parsers on your path within your Solr
> framework?".  I'm using Tika outside the Solr framework.  I'm trying to use
> Tika from my own crawler application that uses SolrJ to send the raw text
> to Solr for indexing.
>
> What is it that I am missing?!
>
> Steve
>
> On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. 
> wrote:
>
>> Might not have the parsers on your path within your Solr framework?
>>
>> Which tika jars are on your path?
>>
>> If you want the functionality of all of Tika, use the standalone
>> tika-app.jar, but do not use the app in the same JVM as Solr...without a
>> custom class loader.  The Solr team carefully prunes the dependencies when
>> integrating Tika and makes sure that the main parsers _just work_.
>>
>>
>> -Original Message-
>> From: Steven White [mailto:swhite4...@gmail.com]
>> Sent: Tuesday, February 02, 2016 2:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Using Tika that comes with Solr 5.2
>>
>> Hi,
>>
>> I'm trying to use Tika that comes with Solr 5.2.  The following code is
>> not
>> working:
>>
>> public static void parseWithTika() throws Exception {
>>     File file = new File("C:\\temp\\test.pdf");
>>
>>     FileInputStream in = new FileInputStream(file);
>>     AutoDetectParser parser = new AutoDetectParser();
>>     Metadata metadata = new Metadata();
>>     metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>>     BodyContentHandler contentHandler = new BodyContentHandler();
>>
>>     parser.parse(in, contentHandler, metadata);
>>
>>     String content = contentHandler.toString();    <=== 'content' is always empty
>>
>>     in.close();
>> }
>>
>> 'content' is always an empty string unless the file I pass to Tika is a
>> text file.  Any idea what the issue is?
>>
>> I have also tried sample codes off
>> https://tika.apache.org/1.8/examples.html
>> with the same result.
>>
>>
>> Thanks !!
>>
>> Steve
>>
>
>


List of file types supported by ExtractingRequestHandler

2016-02-05 Thread Steven White
Hi everyone,

Is there a published list of the Tika extractors and supported file types
that come with Solr 5.2?  For example, I noticed that the ASM JAR (
http://asm.ow2.org/) is not included with Solr.

I can examine the JARs under /solr/contrib/extraction/lib/ and try to come
up with the list, but would rather not if a published list exists somewhere.

Thanks in advance.

Steve


How is Tika used with Solr

2016-02-09 Thread Steven White
Hi folks,

I'm writing a file-system-crawler that will index files.  The file system
is going to be very busy and I anticipate on average 10 new updates per
min.  My application checks for new or updated files once every 1 min.  I
use Tika to extract the raw text off those files and send them over to Solr
for indexing.  My application will be running 24x7xN-days.  It will not
recycle unless the OS is restarted.

Over at Tika mailing list, I was told the following:

"As a side note, if you are handling a bunch of files from the wild in a
production environment, I encourage separating Tika into a separate jvm vs
tying it into any post processing – consider tika-batch and writing
separate text files for each file processed (not so efficient, but
exceedingly robust).  If this is demo code or you know your document set
well enough, you should be good to go with keeping Tika and your
postprocessing steps in the same jvm."

My question is, how does Solr utilize Tika?  Does it run Tika in its own
JVM as an out-of-process application or does it link with Tika JARs
directly?  If it links in directly, are there known issues with Solr
integrated with Tika because of Tika issues?

Thanks

Steve


Re: How is Tika used with Solr

2016-02-09 Thread Steven White
Thank you Erick and Alex.

My main question is about a long-running process using Tika in the same JVM
as my application.  I'm running my file-system-crawler in its own JVM (not
Solr's).  On the Tika mailing list, it is suggested to run Tika's code in its
own JVM and invoke it from my file-system-crawler using
Runtime.getRuntime().exec().

I fully understand, from Alex's suggestion and the link provided by Erick,
the advice to use Tika outside Solr.  But what about using Tika within the
same JVM as my file-system-crawler application, or should I be making a
system call to invoke another JAR, running in its own JVM, to extract the
raw text?  Are there known issues with Tika when used in a long-running
process?

Steve
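
A sketch of the separate-JVM option under discussion, assuming the standalone
tika-app jar is on disk (the path is hypothetical); tika-app's --text switch
prints extracted text to stdout, and a crash or hang in the child process
cannot take the crawler's JVM down with it.

    // imports: java.io.BufferedReader, java.io.InputStreamReader, java.nio.charset.StandardCharsets
    Process p = Runtime.getRuntime().exec(new String[] {
        "java", "-jar", "/opt/tika/tika-app-1.7.jar", "--text", "/path/to/file.pdf"
    });
    StringBuilder text = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
            new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = r.readLine()) != null) {
            text.append(line).append('\n');
        }
    }
    int exit = p.waitFor();  // non-zero exit means the child JVM failed on this file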


On Tue, Feb 9, 2016 at 5:53 PM, Erick Erickson 
wrote:

> Here's a writeup that should help
>
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> On Tue, Feb 9, 2016 at 2:49 PM, Alexandre Rafalovitch
>  wrote:
> > Solr uses Tika directly. And not in the most efficient way. It is
> > there mostly for convenience rather than performance.
> >
> > So, for performance, Solr recommendation is also to run Tika
> > separately and only send Solr the processed documents.
> >
> > Regards,
> > Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> > On 10 February 2016 at 09:46, Steven White  wrote:
> >> Hi folks,
> >>
> >> I'm writing a file-system-crawler that will index files.  The file
> system
> >> is going to be very busy and I anticipate on average 10 new updates per
> >> min.  My application checks for new or updated files once every 1 min.
> I
> >> use Tika to extract the raw-text off those files and send them over to
> Solr
> >> for indexing.  My application will be running 24x7xN-days.  It will not
> >> recycle unless the OS is restarted.
> >>
> >> Over at Tika mailing list, I was told the following:
> >>
> >> "As a side note, if you are handling a bunch of files from the wild in a
> >> production environment, I encourage separating Tika into a separate jvm
> vs
> >> tying it into any post processing – consider tika-batch and writing
> >> separate text files for each file processed (not so efficient, but
> >> exceedingly robust).  If this is demo code or you know your document set
> >> well enough, you should be good to go with keeping Tika and your
> >> postprocessing steps in the same jvm."
> >>
> >> My question is, how does Solr utilize Tika?  Does it run Tika in its own
> >> JVM as an out-of-process application or does it link with Tika JARs
> >> directly?  If it links in directly, are there known issues with Solr
> >> integrated with Tika because of Tika issues?
> >>
> >> Thanks
> >>
> >> Steve
>


Re: Knowing which doc failed to get added in solr during bulk addition in Solr 5.2

2016-02-11 Thread Steven White
For my application, the solution I implemented is that I log the chunk that
failed into a file.  This file is then post-processed one record at a
time.  The ones that fail are reported to the admin and never looked at
again until the admin takes action.  This is not the most efficient
solution right now, but I intend to refactor this code so that the failed
chunk is itself re-processed in smaller chunks until the chunk with the
failed record(s) is down to a 1-record "chunk" that will fail.

Like Debraj, I would love to hear from others how they handle such failures.

Steve
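
The bisecting retry described above, sketched out (the reporting helper is
hypothetical): split a failed batch in half and recurse until the offending
documents are isolated as single-document batches.

    void addWithIsolation(SolrClient client, List<SolrInputDocument> docs) {
        try {
            client.add(docs);
        } catch (Exception e) {
            if (docs.size() == 1) {
                reportToAdmin(docs.get(0), e);  // hypothetical: flag for manual action
                return;
            }
            int mid = docs.size() / 2;
            addWithIsolation(client, docs.subList(0, mid));
            addWithIsolation(client, docs.subList(mid, docs.size()));
        }
    }

For k bad documents in a batch of n this costs O(k log n) extra requests,
versus n single-document adds for a full replay.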


On Thu, Feb 11, 2016 at 2:29 AM, Debraj Manna 
wrote:

> Thanks Erik. How do people handle this scenario? Right now the only option
> I can think of is to replay the entire batch by doing add for every single
> doc. Then this will give me error for all the docs which got added from the
> batch.
>
> On Tue, Feb 9, 2016 at 10:57 PM, Erick Erickson 
> wrote:
>
> > This has been a long standing issue, Hoss is doing some current work on
> it
> > see:
> > https://issues.apache.org/jira/browse/SOLR-445
> >
> > But the short form is "no, not yet".
> >
> > Best,
> > Erick
> >
> > On Tue, Feb 9, 2016 at 8:19 AM, Debraj Manna 
> > wrote:
> > > Hi,
> > >
> > >
> > >
> > > I have a Document Centric Versioning Constraints added in solr schema:-
> > >
> > > <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
> > >   <bool name="ignoreOldUpdates">false</bool>
> > >   <str name="versionField">doc_version</str>
> > > </processor>
> > >
> > > I am adding multiple documents in solr in a single call using SolrJ
> 5.2.
> > > The code fragment looks something like below :-
> > >
> > >
> > > try {
> > >     UpdateResponse resp = solrClient.add(docs.getDocCollection(), 500);
> > >     if (resp.getStatus() != 0) {
> > >         throw new Exception(new StringBuilder(
> > >             "Failed to add docs in solr ").append(resp.toString())
> > >             .toString());
> > >     }
> > > } catch (Exception e) {
> > >     logError("Adding docs to solr failed", e);
> > > }
> > >
> > >
> > > If one of the document is violating the versioning constraints then
> Solr
> > is
> > > returning an exception with error message like "user version is not
> high
> > > enough: 1454587156" & the other documents are getting added perfectly.
> Is
> > > there a way I can know which document is violating the constraints
> either
> > > in Solr logs or from the Update response returned by Solr?
> > >
> > > Thanks
> >
>


Re: How is Tika used with Solr

2016-02-11 Thread Steven White
eb-content-nanite/
> > >
> > >
> > >
> > > -Original Message-
> > > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > > Sent: Tuesday, February 09, 2016 10:05 PM
> > > To: solr-user 
> > > Subject: Re: How is Tika used with Solr
> > >
> > > My impulse would be to _not_ run Tika in its own JVM, just catch any
> > exceptions in my code and "do the right thing". I'm not sure I see any
> > real benefit in yet another JVM.
> > >
> > > FWIW,
> > > Erick
> > >
> > > On Tue, Feb 9, 2016 at 6:22 PM, Allison, Timothy B.
> > > 
> > wrote:
> > >> I have one answer here [0], but I'd be interested to hear what Solr
> > users/devs/integrators have experienced on this topic.
> > >>
> > >> [0]
> > >> http://mail-archives.apache.org/mod_mbox/tika-user/201602.mbox/%3CC
> > >> Y1P
> > >> R09MB0795EAED947B53965BC86874C7D70%40CY1PR09MB0795.namprd09.prod.ou
> > >> tlo
> > >> ok.com%3E
> > >>
> > >> -Original Message-
> > >> From: Steven White [mailto:swhite4...@gmail.com]
> > >> Sent: Tuesday, February 09, 2016 6:33 PM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: How is Tika used with Solr
> > >>
> > >> Thank you Erick and Alex.
> > >>
> > >> My main question is with a long running process using Tika in the
> > >> same
> > JVM as my application.  I'm running my file-system-crawler in its own
> > JVM (not Solr's).  On Tika mailing list, it is suggested to run Tika's
> > code in its own JVM and invoke it from my file-system-crawler using
> > Runtime.getRuntime().exec().
> > >>
> > >> I fully understand from Alex suggestion and link provided by Erick
> > >> to
> > use Tika outside Solr.  But what about using Tika within the same JVM
> > as my file-system-crawler application or should I be making a system
> > call to invoke another JAR, that runs in its own JVM to extract the
> > raw text?  Are there known issues with Tika when used in a long running
> process?
> > >>
> > >> Steve
> > >>
> > >>
> >
>


un-Boosting some Docs at index time

2016-02-12 Thread Steven White
Hi everyone,

I'm trying to figure out if this is possible, if so how do I do it.

I'm indexing records from my database.  The Solr doc has 2 basic fields:
the ID and the Data field.  I lump the data of each field from the record
into Solr's Data field.  At search time, I search on this single field Data.

My need is as follows: given how I'm indexing my data, how, at index time,
do I un-boost certain Solr docs?  I know which Solr docs will need to be
lower-boosted based on a field value in the record that I read off the DB.

Thanks

Steve


Re: un-Boosting some Docs at index time

2016-02-12 Thread Steven White
Thanks Erick!!

Yes, SolrInputDocument.setDocumentBoost() is what I'm looking for.  I was
under the impression boosting is on fields only.

Steve
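
A minimal sketch of that index-time document boost (the field names and the
flag are hypothetical); in SolrJ 5.x the default boost is 1.0f, so values
below that demote the whole document:

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("ID", recordId);
    doc.addField("Data", recordData);
    if (isLowPriority) {             // hypothetical flag read off the DB record
        doc.setDocumentBoost(0.5f);  // below the 1.0f default -> scored lower at search time
    }
    client.add(doc);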

On Fri, Feb 12, 2016 at 11:36 AM, Erick Erickson 
wrote:

> You can use index-time boosting on a per-field basis, here's a place to
> start:
>
> https://lucidworks.com/blog/2011/12/14/options-to-tune-documents-relevance-in-solr/
>
> Does that work?
>
> Best,
> Erick
>
> On Fri, Feb 12, 2016 at 8:30 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I'm trying to figure out if this is possible, if so how do I do it.
> >
> > I'm indexing records from my database.  The Solr doc has 2 basic fields:
> > the ID and the Data field.  I lump the data of each field from the record
> > into Solr's Data field.  At search time, I search on this single field
> Data.
> >
> > My need is as follows: given how I'm indexing my data, at index time, how
> > do I un-boost some Solr doc?  I know which Solr doc will need to be
> > lower-boosted based on a field value in the record that I read off the
> DB.
> >
> > Thanks
> >
> > Steve
>


Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Steven White
Hi folks,

I'm fixing code that I noticed has a defect.  My expectation was that
once I made the fix, the index size would be smaller, but instead I see it
growing.

Here is the stripped down version of the code to show the issue:

Buggy code #1:

  for (String field : fieldsList)
  {
    doc.addField(SolrField_ID_LIST, "1");  // <== Notice how I'm adding the same value over and over
    doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
  }

  docsToAdd.add(doc);

Fixed code #2:

  for (String field : fieldsList)
  {
    doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
  }

  doc.addField(SolrField_ID_LIST, "1");  // <== Notice how I'm now adding this value only once

  docsToAdd.add(doc);

I index the exact same data in both cases; all that changed is the logic of
the code per the above.

On my test index of 1000 records, when I look at Solr's admin page (same is
true looking at the physical disk in the "index" folder) the index size for
#1 is 834.77 KB, but for #2 it is 1.56 MB.

As a side test, I changed the code to the following:

Test code #3:

  for (String field : fieldsList)
  {
    doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
  }

  // doc.addField(SolrField_ID_LIST, "1");  // <== I no longer include this field

  docsToAdd.add(doc);

And now the index size is 2.27 MB !!!

Yes, each time I run the test, I start with a fresh empty index (num docs:
0, index size: 0).

Here are my field definitions:

  
  

My question is, why is my index size going up?  I was expecting it
to go down because I'm now indexing less data into each Solr document.

Thanks

Steve


Re: Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Steven White
That's not the case (please read the entire email).

I'm starting with a fresh index each time when I run my tests.  In fact, I
even tested (multiple times) by deleting the entire "data" folder (stop /
start Solr).  In each case, I get the same exact results.

At one point, I started to wonder if my index was not optimized, but looking
at the Solr admin page, there is a green check next to the "Optimized" text.

Steve
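
A hedged way to check Upayavira's theory from SolrJ: a Luke request reports
both the live document count and the count that still includes deletions.

    LukeRequest luke = new LukeRequest();
    LukeResponse info = luke.process(client);
    // If maxDoc is roughly 2x numDocs, the extra index size is deleted docs.
    System.out.println("numDocs=" + info.getNumDocs() + ", maxDoc=" + info.getMaxDoc());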


On Mon, Feb 15, 2016 at 3:29 PM, Upayavira  wrote:

> Not got time to read your mail in depth, but I bet it is because you are
> overwriting docs. When docs are overwritten, they are effectively marked
> as deleted then re-inserted, thus leaving you with both versions of your
> doc physically in your index. When you query though, the deleted one is
> filtered out.
>
> At some point later in time, when the number of commits you have made
> results on too many segments, a merge will be triggered, and this will
> remove the deleted documents from those merged segments.
>
> Compare the numDocs (number of undeleted docs) and the maxDocs (number
> of documents, whether deleted or not) for your index. I bet one will be
> 2x the other.
>
> Upayavira
>
> On Mon, Feb 15, 2016, at 08:12 PM, Steven White wrote:
> > Hi folks,
> >
> > I'm fixing code that I noticed to have a defect.  My expectation was that
> > once I make the fix, the index size will be smaller but instead I see it
> > growing.
> >
> > Here is the stripped down version of the code to show the issue:
> >
> > Buggy code #1:
> >
> >   for (String field : fieldsList)
> >   {
> > doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm adding
> > the
> > same value over and over
> > doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   docsToAdd.add(doc);
> >
> > Fixed code #2:
> >
> >   for (String field : fieldsList)
> >   {
> > doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm now adding
> > this value only once
> >
> >   docsToAdd.add(doc);
> >
> > I index the exact same data in both cases; all that changed is the logic
> > of
> > the code per the above.
> >
> > On my test index of 1000 records, when I look at Solr's admin page (same
> > is
> > true looking at the physical disk in the "index" folder) the index size
> > for
> > #1 is 834.77 KB, but for #2 it is 1.56 MB.
> >
> > As a side test, I changed the code to the following:
> >
> > Test code #3:
> >
> >   for (String field : fieldsList)
> >   {
> > doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   // doc.addField(SolrField_ID_LIST, "1"); // <== I no longer include
> >   this
> > field
> >
> >   docsToAdd.add(doc);
> >
> > And now the index size is 2.27 MB !!!
> >
> > Yes, each time I run the test, i start with a fresh empty index (num
> > docs:
> > 0, index size: 0).
> >
> > Here are my field definitions:
> >
> >> indexed="true" required="false" stored="false"/>
> >> required="false" stored="false"/>
> >
> > My question is, why my index size is going up in size?  I was expecting
> > it
> > to go down because I'm now indexing less data into each Solr document.
> >
> > Thanks
> >
> > Steve
>


Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
I found the issue: as soon as I restart Solr, the index size goes down.

My index and data size must have been at a borderline where some segments
were not released on my last document commit.

Steve

On Mon, Feb 15, 2016 at 11:09 PM, Shawn Heisey  wrote:

> On 2/15/2016 1:12 PM, Steven White wrote:
> > I'm fixing code that I noticed to have a defect.  My expectation was that
> > once I make the fix, the index size will be smaller but instead I see it
> > growing.
>
> I'm going to assume that SolrField_ID_LIST and SolrField_ALL_FIELDS_DATA
> are String instances that contain "ID_LIST" and "ALL_FIELDS_DATA".
>
> All three pieces of code will add exactly one document with exactly two
> fields.  The value of "field" is never used in any of the code loops,
> and "doc" is never reset/changed.
>
> I'm guessing that the actual code is more complex than the code
> fragments that you shared.  We will need to see actual code, because the
> shared code looks incomplete.
>
> Thanks,
> Shawn
>
>


Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
Here is how I was testing: stop Solr, delete the "data" folder, start Solr,
start indexing, and finally check the index size.

I used the same pattern before and after my fix (see my original email),
and each time I ran this test the index size ended up being larger;
restarting Solr did the trick.

Each document I'm adding is unique, so there is no deletion involved here at
all.

I'm testing this on Windows, so that may be a factor too (the OS not
releasing file handles?!)

Steve


On Tue, Feb 16, 2016 at 11:57 AM, Shawn Heisey  wrote:

> On 2/16/2016 9:37 AM, Steven White wrote:
> > I found the issue: as soon as I restart Solr, the index size goes down.
> >
> > My index and data size must have been at a border line where some
> segments
> > are not released on my last document commit.
>
> I think the only likely thing that could cause this behavior is having
> index segments that are composed fully of deleted documents, which
> supports the idea that Upayavira mentioned.  An optimize would probably
> cause the same behavior as the restart.
>
> If you do enough indexing to cause a segment merge, that would probably
> also remove segments composed only of deleted documents.
>
> Thanks,
> Shawn
>
>


Solr 6.0

2016-02-25 Thread Steven White
Hi,

Where can I learn more about the upcoming Solr 6.0?  I understand the
release date cannot be known, but I hope the features and how it differs
from 5.x are known.

Thank you

Steve


Question about Solr logs

2016-03-04 Thread Steven White
HI folks,

I am analyzing a performance issue with Solr during indexing.  My
simplified pseudo-code looks like this:

    while (more-items) {
        for (int i = 0; i < 100; i++) {
            docs.add(doc);
        }
        UpdateResponse resp = solrConn.add(docs, 1); // <== yes, using "1" is bad, but ...
        docs.clear();
    }

Now looking at Solr's log, I want to understand the events generated by:

    solrConn.add(docs, 1);

Yes, I know using "1" is a bad practice, but that's not what I'm after.  I
set this to "1" so I can understand:

1) what end-to-end operations Solr does to finish this action, and
2) how long this call blocks before it returns.

Looking at the logs, I see this:

org.apache.solr.update.processor.LogUpdateProcessor; [test]
webapp=/solr path=/update params={wt=xml&version=2.2} {add=[5539783
(1527883353280217088), 5539867 (1527883353296994304), , ... (101 adds)]} 0 1174

What does this log tell me?  Is "1174" the time (in milliseconds) it took
Solr to process those 101 documents?  Does this mean "solrConn.add(docs,
1)" was blocked for "1174" milliseconds?

Thanks in advance.

Steve


Re: Question about Solr logs

2016-03-05 Thread Steven White
Thanks Shawn.

To make sure I get this right: I see two methods on the UpdateResponse class.
Is getElapsedTime the client time and getQTime Solr's time?  If so, then
getElapsedTime is how long my call was blocked, right?  And getQTime will
have the value of 1174 (per the log in my example), right?

Steve
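
Both numbers side by side, as a sketch: getQTime() is the server-reported
figure (the 1174 in the log above), while getElapsedTime() is measured at
the client and additionally includes network and serialization overhead.

    UpdateResponse resp = solrConn.add(docs, 1);
    System.out.println("QTime (server):   " + resp.getQTime() + " ms");
    System.out.println("Elapsed (client): " + resp.getElapsedTime() + " ms");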

On Sat, Mar 5, 2016 at 1:33 AM, Shawn Heisey  wrote:

> On 3/4/2016 10:21 PM, Steven White wrote:
> > org.apache.solr.update.processor.LogUpdateProcessor; [test]
> > webapp=/solr path=/update params={wt=xml&version=2.2} {add=[5539783
> > (1527883353280217088), 5539867 (1527883353296994304), , ... (101 adds)]}
> 0
> > 1174
> >
> > What does this log tell me?  Is "1174" the time (in milliseconds) it took
> > Solr to process those 101 documents?  Does this mean "solrConn.add(docs,
> > 1)" was blocked for "1174" milliseconds?
>
> Yes, the QTime on the request was 1174 milliseconds.  The UpdateResponse
> object has a getElapsedTime method that will tell you how long the
> request took from the client's point of view.  Depending on which
> SolrClient implementation you used, as well as other performance
> factors, it may block for more or less time than what Solr reports in
> the QTime parameter.
>
> Thanks,
> Shawn
>
>


Warning and Error messages in Solr's log

2016-03-07 Thread Steven White
Hi folks,

In Solr's solr-8983-console.log I see the following (about 50 in a span of
24 hours while indexing is ongoing):

    WARNING: Couldn't flush user prefs:
    java.util.prefs.BackingStoreException: Couldn't get file lock.

What does it mean?  Should I worry about it?

What about this one:

118316292 [qtp114794915-39] ERROR org.apache.solr.core.SolrCore  [
test_idx] ? java.lang.IllegalStateException: file:
MMapDirectory@/b/vin291f1/vol/vin291f1v3/idx/solr_index/test/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@169f6ad3 appears
both in delegate and in cache: cache=[_2omj.fnm, _2omg_Lucene50_0.doc,
 _2omg.nvm],delegate=[write.lock, _1wuk.si,  segments_2b]
at
org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:103)

What does it mean?

I _think_ the error log is due to the NAS drive being disconnected before
shutting down Solr, but I need a Solr expert to confirm.

Unfortunately, I cannot find anything in solr.log files regarding this
because those files have rotated.

Thanks in advanced.

Steve


Re: Warning and Error messages in Solr's log

2016-03-08 Thread Steven White
Re-posting.  Does anyone have any idea about this question?  Thanks.

Steve

On Mon, Mar 7, 2016 at 5:15 PM, Steven White  wrote:

> Hi folks,
>
> In Solr's solr-8983-console.log I see the following (about 50 in a span of
> 24 hours while indexing is ongoing):
>
> WARNING: Couldn't flush user prefs:
> java.util.prefs.BackingStoreException: Couldn't get file lock.
>
> What does it mean?  Should I worry about it?
>
> What about this one:
>
> 118316292 [qtp114794915-39] ERROR org.apache.solr.core.SolrCore  [
> test_idx] ? java.lang.IllegalStateException: file: 
> MMapDirectory@/b/vin291f1/vol/vin291f1v3/idx/solr_index/test/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@169f6ad3 appears
> both in delegate and in cache: cache=[_2omj.fnm, _2omg_Lucene50_0.doc,
>  _2omg.nvm],delegate=[write.lock, _1wuk.si,  segments_2b]
> at
> org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:103)
>
> What does it mean?
>
> I _think_ the error log is due to the NAS drive being disconnected before
> shutting down Solr, but I need a Solr expert to confirm.
>
> Unfortunately, I cannot find anything in solr.log files regarding this
> because those files have rotated.
>
> Thanks in advanced.
>
> Steve
>


Timeout error during commit

2016-03-09 Thread Steven White
Hi folks,

I'm indexing about 1 billion records (each a small Solr doc, no more than
20 bytes each).  The logic is basically as follows:

    while (data-of-1-billion) {
        read-1000-items from DB
        at-100-items, send the 100 items to Solr, i.e.:
            solrConnection.add(docs);
    }
    solrConnection.commit()

I'm seeing the following exception from SolrJ:

org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://localhost:8983/solr/test_data

Looking at Solr's log, I see this:

INFO  - 2016-01-15 21:15:34.836; [   test_data]
org.apache.solr.update.processor.LogUpdateProcessor; [test_data]
webapp=/solr path=/update params={wt=xml&version=2.2} {add=[, ... (101 adds)]} 0 5172

This tells me it took Solr a bit over 5 seconds to complete the add request.

Now when I created the Solr connection, I used 5 seconds like so:

    solrClient.setConnectionTimeout(5000);
    solrClient.setSoTimeout(5000);

Two questions:

1) Is the time out error because of my use of 5000?
2) Should I be calling "solrConnection.commit()" every now and then inside
the loop?

Thanks

Steve


Re: Timeout error during commit

2016-03-10 Thread Steven White
Thank you for your insight, Shawn; it is always valuable.

Question: if I wait until the very end to issue a commit, wouldn't that mean I
could lose everything if there was an OOM or some other server issue?  I
don't have any commit settings in my solrconfig.xml.

Steve
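
Shawn's suggestions folded into the loop, as a sketch (the helpers are
hypothetical): bigger batches, no per-iteration commit, and a commitWithin
so durability does not hinge on the final commit.  With autoCommit
(openSearcher=false) configured in solrconfig.xml, the transaction log
bounds what an OOM could lose.

    final int COMMIT_WITHIN_MS = 5 * 60 * 1000;           // five minutes
    List<SolrInputDocument> batch = new ArrayList<>();
    while (moreItems()) {                                  // hypothetical helper
        batch.add(nextDoc());                              // hypothetical helper
        if (batch.size() >= 10000) {
            solrConnection.add(batch, COMMIT_WITHIN_MS);   // commitWithin, not a hard commit
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        solrConnection.add(batch, COMMIT_WITHIN_MS);
    }
    solrConnection.commit();                               // one hard commit at the very end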

On Wed, Mar 9, 2016 at 8:32 PM, Shawn Heisey  wrote:

> On 3/9/2016 6:10 PM, Steven White wrote:
> > I'm indexing about 1 billion records (each are small Solr doc, no more
> than
> > 20 bytes each).  The logic is basically as follows:
> >
> > while (data-of-1-billion) {
> > read-1000-items from DB
> > at-100-items send 100 items to Solr: i.e.:
> > solrConnection.add(docs);
> > }
> > solrConnection.commit()
> >
> > I'm seeing the following expection from SolrJ:
> >
> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at: http://localhost:8983/solr/test_data
> 
> > Which tells me it took Solr a bit over 5 sec. to complete the commit.
> >
> > Now when I created the Solr connection, I used 5 seconds like so:
> >
> > solrClient.setConnectionTimeout(5000);
> >   solrClient.setSoTimeout(5000);
> >
> > Two questions:
> >
> > 1) Is the time out error because of my use of 5000?
> > 2) Should I be calling "solrConnection.commit()" every now and then
> inside
> > the loop?
>
> Yes, this problem is happening because you set the SoTimeout value to 5
> seconds.  This is an inactivity timeout on the TCP socket.  It's not
> clear whether the problem happened on the commit operation or on the add
> operation -- it could be either.
>
> Your SoTimeout value should either remain unset, or should be set to
> something *significantly* longer than you ever expect the request to
> take.  I would suggest something between five and fifteen minutes.  I
> use fifteen minutes.  This is long enough that it should only be reached
> if there's a real problem, but short enough that my build program will
> not hang indefinitely, and will have an opportunity to send me email to
> tell me there's a problem.
>
> I would suggest that you don't do *any* commits until the end of the
> loop -- after all one billion docs have been indexed.  If you want to do
> them in your loop, set up something that will do them far less
> frequently, perhaps every 100 times through the loop.  You could include
> a commitWithin parameter on the add request instead of sending actual
> commits, which I would recommend you set to a fairly large value.  I
> would use at least five minutes, but never less than one minute.
> Alternately, you could configure autoSoftCommit in your solrconfig.xml
> file.  I would recommend a maxTime value on that config of at least five
> minutes.
>
> Also, consider increasing your batch size to something larger than 100
> or 1000.  Use 10000 or more.  With 20 byte documents, you could send a
> LOT of documents in each batch without worrying too much about memory.
>
> Regardless of what else you do with commits, if you're running at least
> Solr 4.0, your solrconfig.xml file should include an autoCommit section
> configured with openSearcher set to false and a maxTime between one and
> five minutes.
>
> By now, I hope you've seen a recommendation to read this blog post:
>
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
>
>


Re: Timeout error during commit

2016-03-10 Thread Steven White
Got it.

Last question on this topic (maybe): wouldn't a commit at the very end take
too long on 1 billion items?  Wouldn't a commit every, let's say, 10,000
items be more efficient?

Steve

On Thu, Mar 10, 2016 at 5:44 PM, Shawn Heisey  wrote:

> On 3/10/2016 3:29 PM, Steven White wrote:
> > Thank you for your insight, Shawn; it is always valuable.
> >
> > Question, if I wait to the very end to issue a commit, wouldn't that
> mean I
> > could lose everything if there was an OOM or some other server issue?  I
> > don't have any commit setting set in my solrconfig.xml.
>
> This should not be a worry.  The transaction log should keep everything
> safe.
>
> As I said before, no matter what your intentions with commits are, you
> do want to have autoCommit with openSearcher set to false and a
> reasonably long maxTime.  I recommend one minute or five minutes, but
> you will see 15 seconds commonly recommended.  I use the longer time
> because I don't want Solr to be spending a lot of time doing commits.  A
> commit that doesn't open a new searcher is pretty quick, but it still
> requires CPU/memory/IO resources.
>
> Thanks,
> Shawn
>
>


What is considered too many terms on a field search?

2016-03-12 Thread Steven White
Hi folks

I need to search for terms in a field that will be AND'ed with the user's
real search terms, such as:

user-real-search-terms AND FooField:(a OR b OR c OR d OR e OR ...)

The list of terms in the field FooField can be as large as 1000 items, but
will average around 100.

The list of OR'ed terms will be pre-known for a user.  So user-A will
always have (a OR b) and user-B will have (a OR e OR g OR ...) and user-C
will have some different pre-known list.

Of the 1000 items that can be in the list, at least 80% are shared across all
users for any given search.

The items are SKU numbers (i.e.: simple strings of 20 characters).

My question is this: will this cause issues with the large number of terms
OR'ed in the FooField?  The expected average is 100, but what if I start
hitting 500 or 1000?

Btw, the reason I use OR in the FooField is because my Solr default Boolean
is set to AND.

Thanks in advance.

Steve


Re: What is considered too many terms on a field search?

2016-03-12 Thread Steven White
Thanks Yonik.

1) How would I enforce OR on the list of terms when AND is my default
search Boolean setting in solrconfig.xml?

2) And just to confirm that I understand your solution, here is my current
implementation:


q=user-real-search-terms&fq={!join+fromIndex=sku_idex+from=SkuID+to=SkuFfolder}FooField:(a OR b OR c ...)

Based on what you showed, I'm assuming I can now do the following:


q=user-real-search-terms&fq={!join+fromIndex=sku_idex+from=SkuID+to=SkuFfolder}FooField:({!terms f=FooField}a,b,c,d,e)

Did I get this right?

Steve
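
Yonik's suggestion in SolrJ form, as a sketch.  On question (1): the terms
parser is a set-membership filter that matches any of the listed values, so
it is unaffected by a q.op=AND default.

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
    SolrQuery q = new SolrQuery("user-real-search-terms AND {!terms f=FooField v=$biglist}");
    q.set("biglist", "a,b,c,d,e");  // the pre-known SKU list for this user
    QueryResponse resp = client.query(q);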


On Sat, Mar 12, 2016 at 11:21 AM, Yonik Seeley  wrote:

> On Sat, Mar 12, 2016 at 11:00 AM, Steven White 
> wrote:
> > Hi folks
> >
> > I need to search for terms in a field that will be AND'ed with user's
> real
> > search terms, such as:
> >
> > user-real-search-terms AND FooField:(a OR b OR c OR d OR e OR ...)
> >
> > The list of terms in the field FooField can be as large as 1000 items,
> but
> > will average around 100.
>
>
> Stay away from BooleanQuery for this - it's trappy as it has a limit
> (1024 by default) after which it will start throwing an exception.
> Use {!terms f=FooField}a,b,c,d,e
>
> When embedding in another query, it may be easiest/convenient to use a
> separate parameter for your term list:
> q=user-real-search-terms AND {!terms f=FooField v=$biglist}
> &biglist=a,b,c,d,e
>
>
> -Yonik
>


Sort order for *:* query

2016-04-04 Thread Steven White
Hi everyone,

When I send Solr the query *:*, the results I get back are sorted based on
Lucene's internal DocID, which is oldest to most recent (can someone correct
me if I got this wrong?).  Given this, the most recently added / updated
document is at the bottom of the list.  Is there a way to reverse this sort
order?  If so, how can I make this the default in Solr's solrconfig.xml
file?

Thanks

Steve


Re: Sort order for *:* query

2016-04-05 Thread Steven White
This is all good stuff.  Thank you all for your insight.

Steve
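
One hedged takeaway from the discussion below: since _version_ is
time-based, sorting on it descending approximates most-recently-updated
first (subject to the clock-skew caveats raised in the thread).  To make it
the default, the sort parameter can go in a request handler's defaults in
solrconfig.xml.

    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
    SolrQuery q = new SolrQuery("*:*");
    q.setSort("_version_", SolrQuery.ORDER.desc);  // newest updates first, roughly
    QueryResponse resp = client.query(q);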

On Mon, Apr 4, 2016 at 6:15 PM, Yonik Seeley  wrote:

> On Mon, Apr 4, 2016 at 6:06 PM, Chris Hostetter
>  wrote:
> > :
> > : Not sure I understand... _version_ is time based and hence will give
> > : roughly the same accuracy as something like
> > : TimestampUpdateProcessorFactory that you recommend below.  Both
> >
> > Hmmm... last time i looked, i thought _version_ numbers were allocated &
> > incremented on a per-shard basis and "time" was only used for initial
> > seeding when the leader started up
>
> No, time is used for every version generated.  Upper bits are
> milliseconds and lower bits are incremented only if needed for
> uniqueness in the shard (i.e. two documents indexed at the same
> millisecond).  We have 20 lower bits, so one would need a sustained
> indexing rate of over 1M documents per millisecond (or 1B docs/sec) to
> introduce a permanent skew due to indexing.
>
> There is system clock skew between shards of course, but an update
> processor that added a date field would include that as well.
>
> The code in VersionInfo is:
>
> public long getNewClock() {
>   synchronized (clockSync) {
> long time = System.currentTimeMillis();
> long result = time << 20;
> if (result <= vclock) {
>   result = vclock + 1;
> }
> vclock = result;
> return vclock;
>   }
> }
>
>
> -Yonik
>
> > -- so in a stable system running for
> > a long time, if shardA gets signifcantly more updates then shardB the
> > _version_ numbers can get skewed and a new doc in shardB might be updated
> > with a _version_ less then the _version_ of a document added to shardA
> > well before that.
> >
> > But maybe I'm remembering wrong?
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
>


Indexing date data for facet search

2016-04-11 Thread Steven White
Hi everyone,

I need to index date data into Solr and then use this field for faceted
search.  My question is this: the date data in my DB is stored in the
following format "2016-03-29 15:54:35.461":

1) What format should I be indexing this date + time stamp into Solr?
2) What Solr field type should I be using?  Is it "date"?
3) How do I handle various time zones and locales?
4) Can I insert multi-value date data into the single "date" facet field
and still use this field for faceted search?
5) Based on my need, will all the Date Math per [1] on date facets still
work?  I'm confused here because of my need for (3).

To elaborate on (4) some more.  The need here is this: in my DB, there is
more than one column with date data.  I will be indexing them all into this
single multi-valued Solr field of type Date that I will then use for
faceting.  Is this possible?

I guess this is a two-part question for date faceting: a) how do I properly
index, and b) how do I properly search?

As always, any insight is greatly appreciated.

Steve

[1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates


Re: Indexing date data for facet search

2016-04-12 Thread Steven White
Hi Erick,

In Solr's schema.xml, I cannot find a <fieldType> for "dateRange", not even
in the Apache Solr Reference Guide [1].  What am I missing?  I'm on Solr 5.2.1.

Also, since my date data doesn't have seconds, can I leave ".ssZ" out, or
must I supply it with "00"?

Thanks

Steve

[1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
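
A sketch of converting the DB timestamp from the original question into
Solr's canonical UTC form; the DB's time zone here is an assumption:

    SimpleDateFormat db = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    db.setTimeZone(TimeZone.getTimeZone("America/New_York"));  // assumed DB time zone
    SimpleDateFormat solr = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    solr.setTimeZone(TimeZone.getTimeZone("UTC"));
    String solrDate = solr.format(db.parse("2016-03-29 15:54:35.461"));
    // -> "2016-03-29T19:54:35Z" (EDT is UTC-4 on that date)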

On Mon, Apr 11, 2016 at 9:19 PM, Erick Erickson 
wrote:

> You have two options for dates in this scenario, "tdate" or "dateRange".
> Probably in this case use dateRange, it should be more time and
> space efficient. Here's some background:
>
> https://lucidworks.com/blog/2016/02/13/solrs-daterangefield-perform/
>
> Date types should be indexed as fully specified strings, as
>
> YYYY-MM-DDThh:mm:ssZ
>
> Best,
> Erick
>
> On Mon, Apr 11, 2016 at 3:03 PM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I need to index date data into Solr and then use this field for facet
> > search.  My question is this, the date data in my DB is stored in the
> > following format "2016-03-29 15:54:35.461":
> >
> > 1) What format I should be indexing this date + time stamp into Solr?
> > 2) What Solr field type I should be using?  Is it "date"?
> > 3) How do I handle various time zones and locales?
> > 4) Can I insert multi-value date data into the single "date" facet field
> > and still use this field for facet search?
> > 5) Based on my need, will all the Date Math per [1] on date facet still
> > work? I'm confused here because of my need for (3).
> >
> > To elaborate on (4) some more.  The need here is this.  In my DB, there
> are
> > more than one column with date data.  I will be indexing them all into
> this
> > single multi-value Solr field of type Date that I will then use for
> facet.
> > Is this possible?
> >
> > I guess, this is a two part question, for date facet: a) how to properly
> > index, and b) how do I properly search.
> >
> > As always, any insight is greatly appreciated.
> >
> > Steve
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>


solr.StrField or solr.StringField?

2016-05-03 Thread Steven White
Hi Everyone,

Are solr.StrField and solr.StringField the same thing?

Thanks in advance!

Steve


Re: solr.StrField or solr.StringField?

2016-05-03 Thread Steven White
Thanks John.

Yes, the out-of-the-box schema.xml does not have solr.StringField.
However, a number of Solr pages on the web mention solr.StringField [1], and
thus I'm not sure whether it's a typo or a real thing that is simply missing
from the official Solr wikis.

Steve

[1] https://wiki.apache.org/solr/SolrFacetingOverview,
http://grokbase.com/t/lucene/solr-commits/06cw5038rk/solr-wiki-update-of-solrfacetingoverview-by-jjlarrea

On Tue, May 3, 2016 at 3:35 PM, John Bickerstaff 
wrote:

> My default schema.xml does not have an entry for solr.StringField so I
> can't tell you what that one does.
>
> If you look for solr.StrField in the schema.xml file, you'll get some idea
> of how it's defined.  The default setting is for it not to be analyzed.
>
> On Tue, May 3, 2016 at 10:16 AM, Steven White 
> wrote:
>
> > Hi Everyone,
> >
> > Are solr.StrField and solr.StringField the same thing?
> >
> > Thanks in advance!
> >
> > Steve
> >
>


Re: Basic auth

2015-07-20 Thread Steven White
Hi Everyone,

I don't mean to hijack this thread, but I have auth issues that may be
related to this topic.

I'm on 5.2.1 and trying to set up basic auth using the Jetty realm per
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example.  I also found
other examples on the web which are very similar to the above link.  In
addition, I found Jetty's own basic auth setup at:
http://wiki.eclipse.org/Jetty/Tutorial/Realms

The problem is, no matter what I do I cannot get it to work; I get an HTTP
Error 404 when I try to access Solr's URL, and when I look in
C:\Solr\solr-5.2.1\server\logs\solr.log this is all that I see:

INFO  - 2015-07-20 02:16:12.065; [   ] org.eclipse.jetty.util.log.Log;
Logging initialized @286ms
INFO  - 2015-07-20 02:16:12.231; [   ] org.eclipse.jetty.server.Server;
jetty-9.2.10.v20150310
WARN  - 2015-07-20 02:16:12.240; [   ]
org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
INFO  - 2015-07-20 02:16:12.255; [   ]
org.eclipse.jetty.server.AbstractConnector; Started ServerConnector@5a5fae16
{HTTP/1.1}{0.0.0.0:8983}
INFO  - 2015-07-20 02:16:12.256; [   ] org.eclipse.jetty.server.Server;
Started @478ms

Just to be clear, the example at
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example states to
modify the file in /example/etc/webdefault.xml and in
/example/etc/jetty.xml, but with Solr 5.2.1, those two files are in
C:\solr-5.2.1\server\etc\webdefault.xml and
C:\solr-5.2.1\server\etc\jetty.xml

Lastly, I'm doing the above on Windows.

Thanks

Steve

On Sun, Jul 19, 2015 at 2:20 PM, Erick Erickson 
wrote:

> You're mixing up a couple of things. The Drupal is specific to, well,
> Drupal. You'd probably be best off asking about that on the Drupal
> lists.
>
> SOLR-4470 has not been committed yet, so you can't really use it. This
> may have been superceded by SOLR-7274 and there's a link to the Wiki
> that points to:
> https://cwiki.apache.org/confluence/display/solr/Security
>
> This is all quite new, not sure how much is written in the way of docs.
>
> Best,
> Erick
>
> On Sun, Jul 19, 2015 at 9:35 AM,   wrote:
> > I followed this guide:
> >
> http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr
> >
> > But there is some something wrong, can anyone help or refer to a guide
> on how to setup http basic auth?
> >
> > Regards
> >
> >> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
> >>
> >> SOLR-4470 is about:
> >> Support for basic auth in internal Solr  requests.
> >>
> >> What is wrong with the internal requests?
> >> Can someone help simplify, would it ever be possible to run with basic
> auth? What work arounds?
> >>
> >> Regards
>


Re: Basic auth

2015-07-20 Thread Steven White
Thanks for updating the wiki page.  However, my issue remains, I cannot get
Basic auth working.  Has anyone got it working, on Windows?

Steve

On Mon, Jul 20, 2015 at 9:09 AM, Shawn Heisey  wrote:

> On 7/20/2015 6:06 AM, Steven White wrote:
> > Just to be clear, the example at
> > https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example states to
> > modify the file in /example/etc/webdefault.xml and in
> > /example/etc/jetty.xml, but with Solr 5.2.1, those two files are in
> > C:\solr-5.2.1\server\etc\webdefault.xml and
> > C:\solr-5.2.1\server\etc\jetty.xml
>
> I have updated the wiki page so it has information relevant for 5.x,
> with notes for older versions.  Thanks for letting me know it was outdated!
>
> Thanks,
> Shawn
>
>


Re: Basic auth

2015-07-22 Thread Steven White
SOLR-7692 is for ZK (or did I get it wrong?)  In my case, I'm trying what's
documented here:
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example and it won't
work (see my earlier email).

Steve

On Wed, Jul 22, 2015 at 8:33 AM, Noble Paul  wrote:

> Solr 5.3 is coming with proper basic auth support
>
>
> https://issues.apache.org/jira/browse/SOLR-7692
>
> On Wed, Jul 22, 2015 at 5:28 PM, Peter Sturge 
> wrote:
> > if you're using Jetty you can use the standard realms mechanism for Basic
> > Auth, and it works the same on Windows or UNIX. There's plenty of docs on
> > the Jetty site about getting this working, although it does vary somewhat
> > depending on the version of Jetty you're running (N.B. I would suggest
> > using Jetty 9, and not 8, as 8 is missing some key authentication
> classes).
> > If, when you execute a search query to your Solr instance you get a
> > username and password popup, then Jetty's auth is setup. If you don't
> then
> > something's wrong in the Jetty config.
> >
> > it's worth noting that if you're doing distributed searches Basic Auth on
> > its own will not work for you. This is because Solr sends distributed
> > requests to remote instances on behalf of the user, and it has no
> knowledge
> > of the web container's auth mechanics. We got 'round this by customizing
> > Solr to receive credentials and use them for authentication to remote
> > instances - SOLR-1861 is an old implementation for a previous release,
> and
> > there has been some significant refactoring of SearchHandler since then,
> > but the concept works well for distributed queries.
> >
> > Thanks,
> > Peter
> >
> >
> >
> > On Wed, Jul 22, 2015 at 11:18 AM, O. Klein  wrote:
> >
> >> Steven White wrote
> >> > Thanks for updating the wiki page.  However, my issue remains, I
> cannot
> >> > get
> >> > Basic auth working.  Has anyone got it working, on Windows?
> >>
> >> Doesn't work for me on Linux either.
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Basic-auth-tp4218053p4218519.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>
>
>
> --
> -
> Noble Paul
>


Issues sending mail to the list

2015-07-23 Thread Steven White
Hi Everyone,

I'm seeing that some of my emails are not making it to the mailing list and
I confirmed that I'm subscribed:

Hi! This is the ezmlm program. I'm managing the
solr-user@lucene.apache.org mailing list.

I'm working for my owner, who can be reached
at solr-user-ow...@lucene.apache.org.

Acknowledgment: The address

   swhite4...@gmail.com

was already on the solr-user mailing list when I received
your request, and remains a subscriber.

Any idea what the problem may be?

Thanks

Steve


Basic Auth (again)

2015-07-23 Thread Steven White
(re-posting as new email thread to see if this will make it to the list)


That didn't help.  I still get the same result and virtually no log to help
me figure out where / what things are going wrong.

Here is all that I see in C:\Solr\solr-5.2.1\server\logs\solr.log:

  INFO  - 2015-07-23 05:29:12.065; [   ] org.eclipse.jetty.util.log.Log;
Logging initialized @286ms
  INFO  - 2015-07-23 05:29:12.231; [   ] org.eclipse.jetty.server.Server;
jetty-9.2.10.v20150310
  WARN  - 2015-07-23 05:29:12.240; [   ]
org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
  INFO  - 2015-07-23 05:29:12.255; [   ]
org.eclipse.jetty.server.AbstractConnector; Started ServerConnector@5a5fae16
{HTTP/1.1}{0.0.0.0:8983}
  INFO  - 2015-07-23 05:29:12.256; [   ] org.eclipse.jetty.server.Server;
Started @478ms

Does anyone know where / what logs I should turn on to debug this?  Should
I be posting this issue on the Jetty mailing list?

Steve


On Wed, Jul 22, 2015 at 10:34 AM, Peter Sturge 
 wrote:

> Try adding the "start" call in your jetty.xml:
> <Set name="name">Realm Name</Set>
> <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
> <Set name="refreshInterval">5</Set>
> <Call name="start"></Call>


Re: Basic Auth (again)

2015-07-23 Thread Steven White
Hi Peter,

I'm on Solr 5.2.1 which comes with Jetty 9.2.  I'm setting this up on
Windows 2012 but will need to do the same on Linux too.

I followed the steps per this link:
https://wiki.apache.org/solr/SolrSecurity#Jetty_realm_example very much by
the book.  Here are the changes I made:

File: C:\Solr\solr-5.2.1\server\etc\webdefault.xml

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/db/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>db-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

File: E:\Solr\solr-5.2.1\server\etc\jetty.xml

  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
        <Set name="refreshInterval">0</Set>
      </New>
    </Arg>
  </Call>


File: E:\Solr\solr-5.2.1\server\etc\realm.properties

admin: admin, db-role

I then restarted Solr.  After this, accessing http://localhost:8983/solr/
gives me:

HTTP ERROR: 404

Problem accessing /solr/. Reason:

Not Found
Powered by Jetty://

In a previous post, I asked if anyone has set up Solr 5.2.1 or any 5.x with
Basic Auth and got it working; I have not heard back.  Either this feature
is not tested or not in use.  If it is not in use, how do folks secure
their Solr instances?

Thanks

Steve

On Thu, Jul 23, 2015 at 2:52 PM, Peter Sturge 
wrote:

> Hi Steve,
>
> What version of Jetty are you using?
>
> Have you got a webdefault.xml in your etc folder?
> If so, does it have an entry like this:
>
>   <login-config>
>     <auth-method>BASIC</auth-method>
>     <realm-name>Realm Name as specified in jetty.xml</realm-name>
>   </login-config>
>
> It's been a few years since I set this up, but I believe you also need an
> auth-constraint in webdefault.xml - this tells jetty which apps are using
> which realms:
>
>   <security-constraint>
>     <web-resource-collection>
>       <web-resource-name>A web application name</web-resource-name>
>       <url-pattern>/*</url-pattern>
>     </web-resource-collection>
>     <auth-constraint>
>       <role-name>default-role</role-name>
>     </auth-constraint>
>   </security-constraint>
>
> Your realm.properties should then have user account entries for the role
> similar to:
>
> admin: some-cred, default-role
>
>
> Hope this helps,
> Peter
>
>
> On Thu, Jul 23, 2015 at 7:41 PM, Steven White 
> wrote:
>
> > (re-posting as new email thread to see if this will make it to the list)
> >
> >
> > That didn't help.  I still get the same result and virtually no log to
> help
> > me figure out where / what things are going wrong.
> >
> > Here is all that I see in C:\Solr\solr-5.2.1\server\logs\solr.log:
> >
> >   INFO  - 2015-07-23 05:29:12.065; [   ] org.eclipse.jetty.util.log.Log;
> > Logging initialized @286ms
> >   INFO  - 2015-07-23 05:29:12.231; [   ] org.eclipse.jetty.server.Server;
> > jetty-9.2.10.v20150310
> >   WARN  - 2015-07-23 05:29:12.240; [   ]
> > org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
> >   INFO  - 2015-07-23 05:29:12.255; [   ]
> > org.eclipse.jetty.server.AbstractConnector; Started
> > ServerConnector@5a5fae16
> > {HTTP/1.1}{0.0.0.0:8983}
> >   INFO  - 2015-07-23 05:29:12.256; [   ] org.eclipse.jetty.server.Server;
> > Started @478ms
> >
> > Does anyone know where / what logs I should turn on to debug this?
> Should
> > I be posting this issue on the Jetty mailing list?
> >
> > Steve
> >
> >
> > On Wed, Jul 22, 2015 at 10:34 AM, Peter Sturge 
> >  wrote:
> >
> > > Try adding the "start" call in your jetty.xml:
> > > <Set name="name">Realm Name</Set>
> > > <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
> > > <Set name="refreshInterval">5</Set>
> > > <Call name="start"></Call>
> >
>


Re: Issues sending mail to the list

2015-07-23 Thread Steven White
Three emails to the existing subject of "Basic auth" didn't make it.  As
you may have seen, I started a new email thread on this subject under
"Basic Auth (again)" and now they are making it to the list.

I don't know what to make of this.

Steve

On Thu, Jul 23, 2015 at 4:31 PM, Upayavira  wrote:

> Be sure to be sending plain text emails, not HTML, and watch out for
> things that could be considered spam. Apache mail servers do receive a
> LOT of spam, so need to have relatively aggressive spam filters in
> place.
>
> Upayavira
>
> On Thu, Jul 23, 2015, at 07:29 PM, Steven White wrote:
> > Hi Everyone,
> >
> > I'm seeing that some of my emails are not making it to the mailing list
> > and
> > I confirmed that I'm subscribed:
> >
> > Hi! This is the ezmlm program. I'm managing the
> > solr-user@lucene.apache.org mailing list.
> >
> > I'm working for my owner, who can be reached
> > at solr-user-ow...@lucene.apache.org.
> >
> > Acknowledgment: The address
> >
> >swhite4...@gmail.com
> >
> > was already on the solr-user mailing list when I received
> > your request, and remains a subscriber.
> >
> > Any idea what the problem may be?
> >
> > Thanks
> >
> > Steve
>


HTTP Error 500 on "/admin/ping" request

2015-08-03 Thread Steven White
Hi Everyone,

I cannot figure out why I'm getting HTTP Error 500 off the following code:

// Using: org.apache.wink.client
String contentType = "application/atom+xml";
URI uri = new URI("http://localhost:8983" +
    "/solr/db/admin/ping?wt=xml");
Resource resource = client.resource(uri.toURL().toString());

ClientResponse clientResponse = null;

clientResponse =
    resource.contentType(contentType).accept(contentType).get();

clientResponse.getStatusCode();  // Gives back: 500

Here is the call stack I get back from the call (it's also the same in
solr.log):

ERROR - 2015-08-03 17:30:29.457; [   db]
org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
Bad contentType for search handler :application/atom+xml
request={wt=xml&q=solrpingquery&echoParams=all&distrib=false}
at
org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:74)
at
org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:167)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:140)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:254)
at
org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:211)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)

INFO  - 2015-08-03 17:30:29.459; [   db] org.apache.solr.core.SolrCore;
[db] webapp=/solr path=/admin/ping params={wt=xml} status=400 QTime=6
ERROR - 2015-08-03 17:30:29.459; [   db]
org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
Ping query caused exception: Bad contentType for search handler
:application/atom+xml
request={wt=xml&q=solrpingquery&echoParams=all&distrib=false}
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:263)
at
org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:211)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at java.lang.Thread.run(Thread.java:853)
Caused by: org.apache.solr.common.SolrException: Bad contentType for search
handler :application/atom+xml
request={wt=xml&q=solrpingquery&echoParams=all&distrib=false}
at
org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:74)
at
org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:167)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:140)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:254)
... 27 more
INFO  - 2015-08-03 17:30:29.461; [   db] org.apache.solr.core.SolrCore;
[db] webapp=/solr path=/admin/ping params={wt=xml} status=500 QTime=8
ERROR - 2015-08-03 17:30:29.462; [   db]
org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Ping query caused exception: Bad
contentType for search handler :application/atom+xml
request={wt=xml&q=solrpingquery&echoParams=all&distrib=false}
at
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:263)
at
org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:211)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)

If I use a browser REST client plug-in, it works just fine.

My Java code works with other paths such as
"/solr/db/config/requestHandler?wt=xml" or
"/solr/db/schema/fieldtypes/?wt=xml" or "/solr/db/schema/fields/?wt=xml".

Yes, I did try other content types, the outcome is the same error.

I'm using the default ping handler:

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

Any clues / pointers why "/admin/ping" doesn't work but other query paths
do?

Thanks

Steve


Re: HTTP Error 500 on "/admin/ping" request

2015-08-03 Thread Steven White
Yes, my application is in Java; no, I cannot switch to SolrJ because I'm
working off legacy code that I don't have the luxury to refactor.

If my application is sending the wrong Content-Type HTTP header, which part
is it, and why is the same header working for the other query paths
such as "/solr/db/config/requestHandler?wt=xml" or
"/solr/db/schema/fieldtypes/?wt=xml" or "/solr/db/schema/fields/?wt=xml"?

Steve

On Mon, Aug 3, 2015 at 2:10 PM, Shawn Heisey  wrote:

> On 8/3/2015 11:34 AM, Steven White wrote:
> > Hi Everyone,
> >
> > I cannot figure out why I'm getting HTTP Error 500 off the following
> code:
>
> 
>
> > Ping query caused exception: Bad contentType for search handler
> > :application/atom+xml
>
> Your application is sending an incorrect Content-Type HTTP header that
> Solr doesn't know how to handle.
>
> If your application is Java, why are you not using SolrJ?  You'll likely
> find that to be a lot easier to use than even a REST client.
>
> Thanks,
> Shawn
>
>


Re: HTTP Error 500 on "/admin/ping" request

2015-08-03 Thread Steven White
I found the issue.  With GET, the legacy code I'm calling into was written
like so:

clientResponse =
resource.contentType("application/atom+xml").accept("application/atom+xml").get();

This is a bug, and should have been:

clientResponse = resource.accept("application/atom+xml").get();

Googling the issue helped me narrow it down.  Looks like others ran into it
moving from Solr 5.0 to 5.1 [1] [2].

Steve

[1]
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-td4200314.html
[2] https://github.com/solariumphp/solarium/issues/326


On Mon, Aug 3, 2015 at 2:16 PM, Steven White  wrote:

> Yes, my application is in Java, no I cannot switch to SolrJ because I'm
> working off legacy code for which I don't have the luxury to refactor..
>
> If my application is sending the wrong Content-Type HTTP header, which
> part is it and why the same header is working for the other query paths
> such as: "/solr/db/config/requestHandler?wt=xml" or
> "/solr/db/schema/fieldtypes/?wt=xml" or "/solr/db/schema/fields/?wt=xml" ?
>
> Steve
>
> On Mon, Aug 3, 2015 at 2:10 PM, Shawn Heisey  wrote:
>
>> On 8/3/2015 11:34 AM, Steven White wrote:
>> > Hi Everyone,
>> >
>> > I cannot figure out why I'm getting HTTP Error 500 off the following
>> code:
>>
>> 
>>
>> > Ping query caused exception: Bad contentType for search handler
>> > :application/atom+xml
>>
>> Your application is sending an incorrect Content-Type HTTP header that
>> Solr doesn't know how to handle.
>>
>> If your application is Java, why are you not using SolrJ?  You'll likely
>> find that to be a lot easier to use than even a REST client.
>>
>> Thanks,
>> Shawn
>>
>>
>


Documentation for: solr.EnglishPossessiveFilterFactory

2015-08-03 Thread Steven White
Hi Everyone,

Does anyone know where I can find docs on <filter
class="solr.EnglishPossessiveFilterFactory"/>?  The only one I found is the
API doc:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilterFactory.html
but that's not what I'm looking for; I'm looking for one that describes in
detail how this filter works, with examples.

Thanks

Steve


Re: Documentation for: solr.EnglishPossessiveFilterFactory

2015-08-04 Thread Steven White
Thanks Alex.

Steve

On Mon, Aug 3, 2015 at 9:44 PM, Alexandre Rafalovitch 
wrote:

> Seems simple enough that the source answers all the questions:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_9/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java#L66
>
> It just looks for a couple of versions of apostrophe followed by s or S.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 3 August 2015 at 17:56, Steven White  wrote:
> > Hi Everyone,
> >
> > Does anyone know where I can find docs on <filter
> > class="solr.EnglishPossessiveFilterFactory"/>?  The only one I found is
> > the
> > API doc:
> >
> http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilterFactory.html
> > but that's not what I'm looking for, I'm looking for one to describe in
> > details how this filter works with examples.
> >
> > Thanks
> >
> > Steve
>


Supported languages

2015-08-04 Thread Steven White
Hi Everyone,

I see Solr comes pre-configured with text analyzers for a list of supported
languages e.g.: "text_ar", "text_bq", "text_ca", "text_cjk", "text_ckb",
"text_cz", etc.

My questions are:

1) How well optimized are those languages for general usage?  This is
something I need help with because, other than English, I cannot judge how
well the current pre-configured settings work for best quality.  Yes,
"quality" means a different thing for each customer, but still I'm curious to
know if the out-of-the-box settings are optimal.

2) Is there a landing page that talks about each of the
supported languages: what is available and how to tune the fieldType for
the language in question?

3) What do you do when a language you need is not on the list?  The obvious
answer is to write my own plug-in "fieldType" (or customize an existing
fieldType), but short of that, is there a "general" fieldType that can be
used, even if it means this fieldType will function like SQL's LIKE feature?
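
Something like the following is the language-neutral fallback I have in mind
(a sketch only; the name is made up, and I'm only guessing that
StandardTokenizer's Unicode word-break rules degrade reasonably for unlisted
languages):

  <fieldType name="text_generic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- UAX#29 word-break rules; not tied to any one language -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>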

Thanks

Steve


Solr relevancy score order

2015-08-24 Thread Steven White
Hi Everyone,

When I search for a term in Solr, and it happens that 10 doc end up with
the same score, what's the order of doc ranking in the set of those 10
equally scored doc and what is it based on?  Is there a link I can read
more about this?

Thanks,

Steve


Re: Solr relevancy score order

2015-08-24 Thread Steven White
Thanks Ahmet.

Steve

On Mon, Aug 24, 2015 at 10:09 AM, Ahmet Arslan 
wrote:

> Hi Steven,
>
> When scores produce a tie, internal Lucene document IDs are used to break
> it.
> However, internal Lucene Ids can change when index changes. (merges,
> updates etc).
>
> You can see those values with [docid] - DocIdAugmenterFactory.
>
> If you want 100% stable sorting, use a second sorting criterion. e.g. sort
> = score desc, some_field asc
>
> Ahmet
>
>
>
> On Monday, August 24, 2015 4:56 PM, Steven White 
> wrote:
> Hi Everyone,
>
> When I search for a term in Solr, and it happens that 10 doc end up with
> the same score, what's the order of doc ranking in the set of those 10
> equally scored doc and what is it based on?  Is there a link I can read
> more about this?
>
> Thanks,
>
> Steve
>


Re: Solr relevancy score order

2015-08-24 Thread Steven White
A follow-up question: is the sub-sorting on the Lucene internal doc IDs in
ascending or descending order?  That is, do the most recently indexed docs
show up first in the set of docs that have a tied score?  If not, how can I
have the most recent be first?  Do I have to sort on Lucene's internal doc
IDs?  If so, how do I tell Solr to do that?

Thanks

Steve

On Mon, Aug 24, 2015 at 10:09 AM, Ahmet Arslan 
wrote:

> Hi Steven,
>
> When scores produce a tie, internal Lucene document IDs are used to break
> it.
> However, internal Lucene Ids can change when index changes. (merges,
> updates etc).
>
> You can see those values with [docid] - DocIdAugmenterFactory.
>
> If you want 100% stable sorting, use a second sorting criterion. e.g. sort
> = score desc, some_field asc
>
> Ahmet
>
>
>
> On Monday, August 24, 2015 4:56 PM, Steven White 
> wrote:
> Hi Everyone,
>
> When I search for a term in Solr, and it happens that 10 doc end up with
> the same score, what's the order of doc ranking in the set of those 10
> equally scored doc and what is it based on?  Is there a link I can read
> more about this?
>
> Thanks,
>
> Steve
>


Re: Solr relevancy score order

2015-08-24 Thread Steven White
Thanks Hoss.

I understand the dynamic nature of doc IDs.  All that I care about is that
the most recent docs are at the top of the hit list when there is a tie.
From your reply, it is not clear if that's what happens.  If not, then I
have to sort, but that is something I want to avoid so it won't add cost to
my queries (CPU and RAM).

Can you help me answer those two questions?

Steve

On Mon, Aug 24, 2015 at 2:16 PM, Chris Hostetter 
wrote:

>
> : A follow up question.  Is the sub-sorting on the lucene internal doc IDs
> : ascending or descending order?  That is, do the most recently index doc
>
> you cannot make any generic assumptions about the order of the internal
> lucene doc IDS -- the secondary sort on the internal IDs is stable (and
> FWIW: ascending) for static indexes, but as mentioned before: the *actual*
> order of the IDs changes as the index changes -- if there is an index
> merge, the ids can be totally different and docs can be re-arranged into a
> diff order...
>
> : > However, internal Lucene Ids can change when index changes. (merges,
> : > updates etc).
>
> ...
>
> : show up first in this set of docs that have tied score?  If not, who can
> I
> : have the most recent be first?  Do I have to sort on lucene's internal
> doc
>
> add a "timestamp" or "counter" field when you index your documents that
> means whatevery you want it to mean (order added, order updated, order
> according to some external sort criteria from some external system) and
> then do an explicit sort on that.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: User Authentication

2015-08-24 Thread Steven White
Hi Noble,

Is everything in the link you provided applicable to Solr 5.2.1?

Thanks

Steve

On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul  wrote:

> did you manage to look at the reference guide?
> https://cwiki.apache.org/confluence/display/solr/Securing+Solr
>
> On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom
>  wrote:
> > Alex
> > I got a super secret release of Solr 5.3.1, wasn’t supposed to say
> anything.
> >
> > Yes I’m running 5.2.1, I will check out the release notes for 5.3.
> >
> > Was looking for three types of user authentication, I guess.
> > 1. the Admin Console
> > 2. User auth for each Core ( and select and update) on a server.
> > 3. HTML interface access (example: ajax-solr<
> https://github.com/evolvingweb/ajax-solr>)
> >
> > Thanks
> >
> > Tom LeZotte
> > Health I.T. - Senior Product Developer
> > (p) 615-875-8830
> >
> >
> >
> >
> >
> >
> > On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch  > wrote:
> >
> > Thanks for the email from the future. It is good to start to prepare
> > for 5.3.1 now that 5.3 is nearly out.
> >
> > Joking aside (and assuming Solr 5.2.1), what exactly are you trying to
> > achieve? Solr should not actually be exposed to the users directly. It
> > should be hiding in a backend only visible to your middleware. If you
> > are looking for a HTML interface that talks directly to Solr after
> > authentication, that's not the right way to set it up.
> >
> > That said, some security features are being rolled out and you should
> > definitely check the release notes for the 5.3.
> >
> > Regards,
> >   Alex.
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 24 August 2015 at 10:01, LeZotte, Tom 
> wrote:
> > Hi Solr Community
> >
> > I have been trying to add user authentication to our Solr 5.3.1 RedHat
> install. I’ve found some examples on user authentication on the Jetty side.
> But they have failed.
> >
> > Does any one have a step by step example on authentication for the admin
> screen? And a core?
> >
> >
> > Thanks
> >
> > Tom LeZotte
> > Health I.T. - Senior Product Developer
> > (p) 615-875-8830
> >
> >
> >
> >
> >
> >
> >
>
>
>
> --
> -
> Noble Paul
>


Re: User Authentication

2015-08-24 Thread Steven White
For my project, Kerberos is not a requirement.  What I need is:

1) Basic Auth to Solr server (at all access levels)
2) SSL support

My setup is not using ZK, it's a single core.

Steve

On Mon, Aug 24, 2015 at 4:12 PM, Don Bosco Durai  wrote:

> Just curious, is Kerberos an option for you? If so, mostly all your 3 use
> cases will addressed.
>
> Bosco
>
>
> On 8/24/15, 12:18 PM, "Steven White"  wrote:
>
> >Hi Noble,
> >
> >Is everything in the link you provided applicable to Solr 5.2.1?
> >
> >Thanks
> >
> >Steve
> >
> >On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul  wrote:
> >
> >> did you manage to look at the reference guide?
> >> https://cwiki.apache.org/confluence/display/solr/Securing+Solr
> >>
> >> On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom
> >>  wrote:
> >> > Alex
> >> > I got a super secret release of Solr 5.3.1, wasn't supposed to say
> >> anything.
> >> >
> >> > Yes I'm running 5.2.1, I will check out the release notes for 5.3.
> >> >
> >> > Was looking for three types of user authentication, I guess.
> >> > 1. the Admin Console
> >> > 2. User auth for each Core ( and select and update) on a server.
> >> > 3. HTML interface access (example: ajax-solr<
> >> https://github.com/evolvingweb/ajax-solr>)
> >> >
> >> > Thanks
> >> >
> >> > Tom LeZotte
> >> > Health I.T. - Senior Product Developer
> >> > (p) 615-875-8830
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Aug 24, 2015, at 10:05 AM, Alexandre Rafalovitch
> >> >> <mailto:arafa...@gmail.com>> wrote:
> >> >
> >> > Thanks for the email from the future. It is good to start to prepare
> >> > for 5.3.1 now that 5.3 is nearly out.
> >> >
> >> > Joking aside (and assuming Solr 5.2.1), what exactly are you trying to
> >> > achieve? Solr should not actually be exposed to the users directly. It
> >> > should be hiding in a backend only visible to your middleware. If you
> >> > are looking for a HTML interface that talks directly to Solr after
> >> > authentication, that's not the right way to set it up.
> >> >
> >> > That said, some security features are being rolled out and you should
> >> > definitely check the release notes for the 5.3.
> >> >
> >> > Regards,
> >> >   Alex.
> >> > 
> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > http://www.solr-start.com/
> >> >
> >> >
> >> > On 24 August 2015 at 10:01, LeZotte, Tom 
> >> wrote:
> >> > Hi Solr Community
> >> >
> >> > I have been trying to add user authentication to our Solr 5.3.1 RedHat
> >> install. I've found some examples on user authentication on the Jetty
> >>side.
> >> But they have failed.
> >> >
> >> > Does any one have a step by step example on authentication for the
> >>admin
> >> screen? And a core?
> >> >
> >> >
> >> > Thanks
> >> >
> >> > Tom LeZotte
> >> > Health I.T. - Senior Product Developer
> >> > (p) 615-875-8830
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul
> >>
>
>
>


Re: Solr relevancy score order

2015-08-27 Thread Steven White
Thanks Erick.

Your summary about doc IDs is much helpful.

I tested the second-level sort with a small set of data (10K records) and
didn't see much of a significant impact.  I will test with 10M records at
some later time.
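
For reference, this is the kind of request I tested (assuming an indexed
"timestamp" field; the field name is mine):

  /select?q=foo&sort=score desc,timestamp desc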

Steve

On Mon, Aug 24, 2015 at 11:03 PM, Erick Erickson 
wrote:

> Getting the most recent doc first in the case of a tie
> will _not_ "just happen". I don't think you really get the
> nuance here...
>
> You index doc1, and doc2 later. Let's
> claim that doc1 gets internal Lucene doc ID of 1 and
> doc2 gets an internal doc ID of 2. So far you're golden.
> Let's further claim that doc1 is in a different segment than
> doc2. Sometime later, as you add/update/delete docs,
> segments are merged and doc1 and doc2 may or may
> not be in the merged segment. At that point, doc1 can get an
> internal Lucene doc ID of, say, 823 and doc2 can get an internal
> doc ID of, say 64. So their relative order is changed.
>
> You have to have a secondary sort criteria then. And it has to be
> something monotonically increasing by time that won't ever change
> like internal doc IDs can. Adding a timestamp
> to every doc is certainly an option. Adding your own counter
> is also reasonable.
>
> But this is a _secondary_ sort, so it's not even consulted if the
> first sort (score) is not a tie. You can get a sense of how this would
> affect your query time/CPU usage/RAM by must specifying
> sort=score desc,id asc
> where id is your  field. This won't do what you want,
> but it will simulate it without having to re-index.
>
> Best,
> Erick
>
> On Mon, Aug 24, 2015 at 11:54 AM, Steven White 
> wrote:
> > Thanks Hoss.
> >
> > I understand the dynamic nature of doc-IDs.  All that I care about is the
> > most recent docs be at the top of the hit list when there is a tie.  From
> > your reply, it is not clear if that's what happens.  If not, then I have
> to
> > sort, but this is something I want to avoid so it won't add cost to my
> > queries (CPU and RAM).
> >
> > Can you help me answer those two questions?
> >
> > Steve
> >
> > On Mon, Aug 24, 2015 at 2:16 PM, Chris Hostetter <
> hossman_luc...@fucit.org>
> > wrote:
> >
> >>
> >> : A follow up question.  Is the sub-sorting on the lucene internal doc
> IDs
> >> : ascending or descending order?  That is, do the most recently index
> doc
> >>
> >> you cannot make any generic assumptions about the order of the internal
> >> lucene doc IDS -- the secondary sort on the internal IDs is stable (and
> >> FWIW: ascending) for static indexes, but as mentioned before: the
> *actual*
> >> order of the IDs changes as the index changes -- if there is an index
> >> merge, the ids can be totally different and docs can be re-arranged
> into a
> >> diff order...
> >>
> >> : > However, internal Lucene Ids can change when index changes. (merges,
> >> : > updates etc).
> >>
> >> ...
> >>
> >> : show up first in this set of docs that have tied score?  If not, who
> can
> >> I
> >> : have the most recent be first?  Do I have to sort on lucene's internal
> >> doc
> >>
> >> add a "timestamp" or "counter" field when you index your documents that
> >> means whatevery you want it to mean (order added, order updated, order
> >> according to some external sort criteria from some external system) and
> >> then do an explicit sort on that.
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
>


Looking for Traditional Chinese support

2015-08-27 Thread Steven White
Hi Everyone

Per
https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Language-SpecificFactories
I see the languages Solr supports.  Where is Traditional Chinese?  Is CJK
the one?

Thanks

Steve


Re: Looking for Traditional Chinese support

2015-08-27 Thread Steven White
Hi Jeanne,

I don't understand.  Are you saying "Chinese Tokenizer" per
https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Chinese
is "Traditional Chinese"?  If so, then it "is deprecated as of Solr 3.4"
and I just tried it with Solr 5.2 and could not get Solr started because
solr.ChineseFilterFactory cannot be loaded.

This is what I tried:

  <analyzer>
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.ChineseFilterFactory"/>
  </analyzer>

Thanks

Steve

On Thu, Aug 27, 2015 at 2:20 PM, Jeanne Wang  wrote:

> Chinese instead of Simplified Chinese should be Traditional Chinese.
>
> Jeanne
>
> On Thu, Aug 27, 2015 at 12:51 PM, Steven White 
> wrote:
>
> > Hi Everyone
> >
> > Per
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-Language-SpecificFactories
> > I see the languages Solr supports.  Where is Traditional Chinese?  Is CJK
> > the one?
> >
> > Thanks
> >
> > Steve
> >
>


solr-user@lucene.apache.org

2015-09-02 Thread Steven White
Hi Everyone,

I have the following in my schema (abbreviated; the relevant filter is the
WordDelimiterFilterFactory with a types file):

  <fieldType name="..." class="solr.TextField">
    <analyzer>
      <tokenizer class="..."/>
      ...
      <filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"/>
      ...
    </analyzer>
  </fieldType>

In the text file "wdfftypes.txt", I have this:

  & => DIGIT
  $ => DIGIT

I also tried:

  & => ALPHA
  $ => ALPHA

I then index data that contains the string: "~ ! @ # $ % ^ & * ( ) _ + - =
[ { ] } \ | ; : ' " , < . > / ?"

But when I search on $ or &, I don't get any hits.  Any idea what I'm
doing wrong?

Thanks in advance.

Steve


Using join with edismax

2015-09-10 Thread Steven White
Hi everyone,

Does any one know if "join" across cores supported with edismax?

Thanks!!!

Steve


Passing Basic Auth info to HttpSolrClient

2015-09-28 Thread Steven White
Hi,

I'm using HttpSolrClient to connect to Solr.  Everything worked until I
enabled basic authentication in Jetty.  My question is: how do I pass the
basic auth info to SolrJ so that I don't get a 401 error?
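
For context, here is roughly what I'm experimenting with (a sketch only; I'm
assuming HttpClient's credentials provider is the right hook, and the
user/password values are made up):

  import org.apache.http.auth.AuthScope;
  import org.apache.http.auth.UsernamePasswordCredentials;
  import org.apache.http.client.CredentialsProvider;
  import org.apache.http.impl.client.BasicCredentialsProvider;
  import org.apache.http.impl.client.CloseableHttpClient;
  import org.apache.http.impl.client.HttpClientBuilder;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  // Build an HttpClient that answers the 401 challenge with Basic credentials
  CredentialsProvider provider = new BasicCredentialsProvider();
  provider.setCredentials(AuthScope.ANY,
      new UsernamePasswordCredentials("admin", "admin"));
  CloseableHttpClient httpClient = HttpClientBuilder.create()
      .setDefaultCredentialsProvider(provider).build();

  // Hand the pre-configured HttpClient to SolrJ
  HttpSolrClient client =
      new HttpSolrClient("http://localhost:8983/solr/db", httpClient);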

Thanks in advance

Steve


Re: Passing Basic Auth info to HttpSolrClient

2015-09-29 Thread Steven White
Hi,

Re-posting to see if anyone can help.  If my question is not clear, let me
know.

Thanks!

Steve

On Mon, Sep 28, 2015 at 5:15 PM, Steven White  wrote:

> Hi,
>
> I'm using HttpSolrClient to connect to Solr.  Everything works until when
> I enabled basic authentication in Jetty.  My question is, how do I pass to
> SolrJ the basic auth info. so that I don't get a 401 error?
>
> Thanks in advance
>
> Steve
>


Two separate instances of Solr on the same machine

2015-10-26 Thread Steven White
Hi,

For reasons I have no control over, I'm required to run 2 (maybe more)
instances of Solr on the same server (Windows and Linux).  To be more
specific, I will need to start each instance like so:

  > solr\bin start -p 8983 -s ..\instance_one
  > solr\bin start -p 8984 -s ..\instance_two
  > solr\bin start -p 8985 -s ..\instance_three

Each of those instances is a stand alone Solr (no ZK here at all).

I have tested this over and over and did not see any issue.  However, I did
notice that each instance is writing to the same solr\server\logs\ files
(will this be an issue?!!)

Is the above something I should avoid?  If so, why?

Thanks in advance!!

Steve


Re: Two separate instances of Solr on the same machine

2015-10-27 Thread Steven White
How do I specify a different log directory by editing "log4j.properties"?
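
From poking at the stock server/resources/log4j.properties, I'm guessing
something along these lines (untested; the directory is made up):

  # point the file appender at a per-instance directory
  solr.log=C:/Solr/instance_one/logs
  log4j.appender.file.File=${solr.log}/solr.log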

Steve

On Mon, Oct 26, 2015 at 9:08 PM, Pushkar Raste 
wrote:

> It depends on your case. If you don't mind logs from 3 different instances
> inter-mingled with each other you should be fine.
> You add "-Dsolr.log=<log directory>" to make logs go to different
> directories. If you want logs to go to the same directory but different files
> try updating log4j.properties.
>
> On 26 October 2015 at 13:33, Steven White  wrote:
>
> > Hi,
> >
> > For reasons I have no control over, I'm required to run 2 (maybe more)
> > instances of Solr on the same server (Windows and Linux).  To be more
> > specific, I will need to start each instance like so:
> >
> >   > solr\bin start -p 8983 -s ..\instance_one
> >   > solr\bin start -p 8984 -s ..\instance_two
> >   > solr\bin start -p 8985 -s ..\instance_three
> >
> > Each of those instances is a stand alone Solr (no ZK here at all).
> >
> > I have tested this over and over and did not see any issue.  However, I
> did
> > notice that each instance is writing to the same solr\server\logs\ files
> > (will this be an issue?!!)
> >
> > Is the above something I should avoid?  If so, why?
> >
> > Thanks in advanced !!
> >
> > Steve
> >
>


Re: Two separate instances of Solr on the same machine

2015-10-27 Thread Steven White
That's what I'm doing using "-s" to instruct each instance of Solr where
the data is.

Steve

On Tue, Oct 27, 2015 at 12:52 AM, Jack Krupansky 
wrote:

> Each instance should be installed in a separate directory. IOW, don't try
> running multiple Solr processes for the same data.
>
> -- Jack Krupansky
>
> On Mon, Oct 26, 2015 at 1:33 PM, Steven White 
> wrote:
>
> > Hi,
> >
> > For reasons I have no control over, I'm required to run 2 (maybe more)
> > instances of Solr on the same server (Windows and Linux).  To be more
> > specific, I will need to start each instance like so:
> >
> >   > solr\bin start -p 8983 -s ..\instance_one
> >   > solr\bin start -p 8984 -s ..\instance_two
> >   > solr\bin start -p 8985 -s ..\instance_three
> >
> > Each of those instances is a stand alone Solr (no ZK here at all).
> >
> > I have tested this over and over and did not see any issue.  However, I
> did
> > notice that each instance is writing to the same solr\server\logs\ files
> > (will this be an issue?!!)
> >
> > Is the above something I should avoid?  If so, why?
> >
> > Thanks in advanced !!
> >
> > Steve
> >
>


Closing Windows CMD kills Solr

2015-10-28 Thread Steven White
Hi Folks,

I don't understand if this is an expected behavior or not.

On Windows, I start Solr from a command prompt like so:

bin\solr start -p 8983 -s C:\MySolrIndex

Now, once I close the command prompt, the Java process that started Solr is
killed.  Is this expected?  How do I keep Solr alive when I close the
command prompt?

This does not happen on Linux.
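
The only workaround I've come up with so far (untested) is to detach Solr
from the console entirely, e.g. by registering it with the Task Scheduler
instead of launching it by hand (paths are mine):

  schtasks /Create /TN "Solr8983" /SC ONSTART /RU SYSTEM ^
    /TR "C:\Solr\solr-5.2.1\bin\solr.cmd start -p 8983 -s C:\MySolrIndex"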

Thanks in advance.

Steve


copyField

2015-11-04 Thread Steven White
Hi,

I have hundreds of fields to search against based on some pre-defined static
rules.  So fields A, B, C are to be searched as group-X; fields A, B, D, E, F
as group-Y; and fields B, E, F, G as group-Z.  Each group is made up of
hundreds of fields (at least 500).

I can use copyField declarations to copy into each group (see the sketch
below), or I can code the copy logic into my Java code (which is what I'm
doing now) and have it copy into each group.
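
For illustration, the schema equivalent of what my Java code does would be
something like this (field and group names are from the example above):

  <copyField source="A" dest="group-X"/>
  <copyField source="B" dest="group-X"/>
  <copyField source="C" dest="group-X"/>
  <copyField source="A" dest="group-Y"/>
  ...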

My question is twofold:

1) Give my need, am I losing anything by writing my own copy-field in my
Java code vs. using Solr's copyField in the schema?

2) How do I prevent the case where I copy data from fields A and B, where
A has "Fable of the Throbbing" and B has "Genius of a Tank Town", and they
get copied into group-X as "Fable of the Throbbing Genius of a Tank Town"?
When this happens, a phrase search for "Throbbing Genius" will get me a hit
(when in reality, it shouldn't).  If I were using copyField, wouldn't this
problem still exist?

Thanks in advance.

Steve


Stopping Solr on Linux when run as a service

2015-11-10 Thread Steven White
Hi folks,

This question may be more of a Linux one than a Solr one, but I have to
start someplace.

I'm reading this link
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
to get Solr on Linux (I'm more of a Windows guy).

The page provides a good intro on how to set up Solr to start as a service on
Linux.  Now what I don't get is this: what happens when the system is
shutting down?  How does Solr know to shut down gracefully when nothing on
that page talks about issuing a "stop" command on system shutdown?  Can
someone shed some light on this?  Like I said, I'm more of a "Windows" guy.

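(For example, I can stop it by hand with the command below; what I don't
know is what, if anything, runs it at system shutdown.)

  sudo service solr stop
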
Thanks in advance!!

Steve


Number of fields in qf & fq

2015-11-19 Thread Steven White
Hi everyone

What is considered too many fields for qf and fq?  On average I will have
1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I can
(I have to check with the design), if I cut qf down to 1 field, will I see
a noticeable performance improvement?  It will take a lot of effort to test
this, which is why I'm asking first.

As is, I'm seeing 2-5 sec response times for searches on an index of 1
million records with a total index size (on disk) of 4 GB.  I gave Solr 2 GB
of RAM (also tested at 4 GB); in both cases Solr didn't use more than 1 GB.

Thanks in advance

Steve


Re: Number of fields in qf & fq

2015-11-19 Thread Steven White
Thanks Walter.  I see your point.  Does this apply to fq as well?

Also, how does one go about debugging performance issues in Solr to find
out where time is mostly spent?

Steve

On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood 
wrote:

> With one field in qf for a single-term query, Solr is fetching one posting
> list. With 1500 fields, it is fetching 1500 posting lists. It could easily
> be 1500 times slower.
>
> It might be even slower than that, because we can’t guarantee that: a)
> every algorithm in Solr is linear, b) that all those lists will fit in
> memory.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 19, 2015, at 3:46 PM, Steven White  wrote:
> >
> > Hi everyone
> >
> > What is considered too many fields for qf and fq?  On average I will have
> > 1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I can
> > (I have to check with the design) for qf, if I cut it down to 1 field,
> will
> > I see noticeable performance improvement?  It will take a lot of effort
> to
> > test this which is why I'm asking first.
> >
> > As is, I'm seeing 2-5 sec response time for searches on an index of 1
> > million records with total index size (on disk) of 4 GB.  I gave Solr 2
> GB
> > of RAM (also tested at 4 GB) in both cases Solr didn't use more then 1
> GB.
> >
> > Thanks in advanced
> >
> > Steve
>
>


Re: Number of fields in qf & fq

2015-11-20 Thread Steven White
Thanks Erick.

The 1500 fields is a design that I inherited.  I'm trying to figure out why
it was done as such and what it will take to fix it.

What about my other question: how does one go about debugging performance
issues in Solr to find out where time is mostly spent?  How do I know my
Solr parameters, such as the caches, are set right?  From what I see, we are
using the defaults from solrconfig.xml.
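
The only lead I have so far is the debug parameter, e.g. something like this
to get per-component timings (core name is mine):

  http://localhost:8983/solr/db/select?q=test&debug=timing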

I'm on Solr 5.2

Steve


On Thu, Nov 19, 2015 at 11:36 PM, Erick Erickson 
wrote:

> An fq is still a single entry in your filterCache so from that
> perspective it's the same.
>
> And to create that entry, you're still using all the underlying fields
> to search, so they have to be loaded just like they would be in a q
> clause.
>
> But really, the fundamental question here is why your design even has
> 1,500 fields and, more specifically, why you would want to search them
> all at once. From a 10,000 ft. view, that's a very suspect design.
>
> Best,
> Erick
>
> On Thu, Nov 19, 2015 at 4:06 PM, Walter Underwood 
> wrote:
> > The implementation for fq has changed from 4.x to 5.x, so I’ll let
> someone else answer that in detail.
> >
> > In 4.x, the result of each filter query can be cached. After that, they
> are quite fast.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Nov 19, 2015, at 3:59 PM, Steven White  wrote:
> >>
> >> Thanks Walter.  I see your point.  Does this apply to fq as will?
> >>
> >> Also, how does one go about debugging performance issues in Solr to find
> >> out where time is mostly spent?
> >>
> >> Steve
> >>
> >> On Thu, Nov 19, 2015 at 6:54 PM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> With one field in qf for a single-term query, Solr is fetching one
> posting
> >>> list. With 1500 fields, it is fetching 1500 posting lists. It could
> easily
> >>> be 1500 times slower.
> >>>
> >>> It might be even slower than that, because we can’t guarantee that: a)
> >>> every algorithm in Solr is linear, b) that all those lists will fit in
> >>> memory.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>>> On Nov 19, 2015, at 3:46 PM, Steven White 
> wrote:
> >>>>
> >>>> Hi everyone
> >>>>
> >>>> What is considered too many fields for qf and fq?  On average I will
> have
> >>>> 1500 fields in qf and 100 in fq (all of which are OR'ed).  Assuming I
> can
> >>>> (I have to check with the design) for qf, if I cut it down to 1 field,
> >>> will
> >>>> I see noticeable performance improvement?  It will take a lot of
> effort
> >>> to
> >>>> test this which is why I'm asking first.
> >>>>
> >>>> As is, I'm seeing 2-5 sec response time for searches on an index of 1
> >>>> million records with total index size (on disk) of 4 GB.  I gave Solr
> 2
> >>> GB
> >>>> of RAM (also tested at 4 GB) in both cases Solr didn't use more then 1
> >>> GB.
> >>>>
> >>>> Thanks in advanced
> >>>>
> >>>> Steve
> >>>
> >>>
> >
>


Appending to user's query

2015-11-21 Thread Steven White
Hi everyone,

I have a dozen different edismax request handlers in my solrconfig.xml
file, each customized for a different use.  They all have pre-defined fq
and qf, among other parameters.  Here is an example of one such handler:


"/select_sales_en":{
  "class":"solr.SearchHandler",
  "name":"/select_sales_en",
  "defaults":{
"defType":"edismax",
"echoParams":"explicit",
"fl":"UniqueField,score",
"fq":"TypesList:(CA OR WA)",
"qf":"AllSales",
"rows":"10",
"wt":"xml"}},


"/select_sales_es":{
  "class":"solr.SearchHandler",
  "name":"/select_sales_es",
  "defaults":{
"defType":"edismax",
"echoParams":"explicit",
"fl":"UniqueField,score",
"fq":"TypesList:(CA OR WA OR TX OR FL)",
"qf":"AllSales",
"tie":"1.0",
"wt":"xml"}},

In some of the request handlers, I need to AND a customized search string
to the request, such as "AND (Language:Spanish OR Language:Chinese)" or
"AND (Language:German)", so that it is always part of the search no matter
what the caller provides.  The issue I'm having is that if I put this text
as part of my "fq" in that request handler, such as:

"/select_sales_en":{
  "class":"solr.SearchHandler",
  "name":"/select_sales_en",
  "defaults":{
"defType":"edismax",
"echoParams":"explicit",
"fl":"UniqueField,score",
>>>  "fq":"TypesList:(CA OR WA) AND (Language:English)",  <<<
"qf":"AllSales",
"rows":"10",
"wt":"xml"}},

It is getting replaced by the "fq" that the caller has the option to pass.
So, is there a way to force a search string to always be appended to the
final query before it is passed on to Lucene?
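
One thing I'm wondering about as I write this up: would moving the fixed
clause into an "appends" section keep it as a separate fq that the caller
cannot override?  Something like this (untested sketch):

  "/select_sales_en":{
    "class":"solr.SearchHandler",
    "defaults":{
      "defType":"edismax",
      "qf":"AllSales"},
    "appends":{
      "fq":"Language:English"}},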

I'm using Solr 5.2

Thanks in advance

Steve


Solr memory usage

2015-12-08 Thread Steven White
Hi folks,

My index size on disk (optimized) is 20 GB (single core, single index).  I
have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.

I have run load tests (up to 100 concurrent users) for hours where each
user issuing unique searches (the same search is never executed again for
at least 30 minute since it was last executed).  In all tests I run, Solr's
JVM memory never goes over 10 GB (monitoring http://localhost:8983/).

I read over and over that, for optimal performance, Solr should be given
enough RAM to hold the index in memory.  Well, I have done that and then
some, but I don't see Solr using up all that RAM.  What am I doing wrong?
Is my test at fault?  I doubled the test load (number of users) and didn't
see much of a difference in RAM usage, but my search performance went down
(searches take about 40% longer now).  I ran my tests again, this time with
only 12 GB of RAM given to Solr.  Test results didn't differ much from the
24 GB run, and Solr never used more than 10 GB of RAM.

Can someone help me understand this?  I don't want to give Solr RAM that it
won't use.

PS: This is simply search tests, there is no update to the index at all.

Thanks in advance.

Steve


Re: Solr memory usage

2015-12-09 Thread Steven White
Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
very helpful.

A follow-up question: I also noticed the "JVM-Memory" report on Solr's
home page is fluctuating.  I expect some fluctuation, but it kinda worries
me when it swings up and down by 4 GB and maybe more, i.e., at times it is
at 5 GB and other times it is at 10 GB (this is while I'm running my search
tests).  What does such high fluctuation mean?

If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
first started and before I run any search on it.  I'm taking this as my
base startup memory usage.

Steve

On Tue, Dec 8, 2015 at 3:17 PM, Erick Erickson 
wrote:

> You're doing nothing wrong, that particular bit of advice has
> always needed a bit of explanation.
>
> Solr (well, actually Lucene) uses MMapDirectory for much of
> the index structure which uses the OS memory rather than
> the JVM heap. See Uwe's excellent:
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Plus, the size on disk includes the stored data, which is in the *.fdt
> files in data/index. Very little of the stored data is kept in the JVM
> so that's another reason your Java heap may be smaller than
> your raw index size on disk.
>
> The advice about fitting your entire index into memory really has
> the following caveats (at least).
> 1> "memory" includes the OS memory available to the process
> 2> The size of the index on disk is misleading, the *.fdt files
>  should be subtracted in order to get a truer picture.
> 3> Both Solr and Lucene create structures in the Java JVM
>  that are _not_ reflected in the size on disk.
>
> <1> and <2> mean the JVM memory necessary is smaller
> than the size on disk.
>
> <3> means the JVM memory will be larger than.
>
> So you're doing the right thing, testing and seeing what you
> _really_ need. I'd pretty much take your test, add some
> padding and consider it good. You're _not_ doing the
> really bad thing of using the same query over and over
> again and hoping .
>
> Best,
> Erick
>
>
> On Tue, Dec 8, 2015 at 11:54 AM, Steven White 
> wrote:
> > Hi folks,
> >
> > My index size on disk (optimized) is 20 GB (single core, single index).
> I
> > have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
> >
> > I have run load tests (up to 100 concurrent users) for hours where each
> > user issuing unique searches (the same search is never executed again for
> > at least 30 minute since it was last executed).  In all tests I run,
> Solr's
> > JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
> >
> > I read over and over, for optimal performance, Solr should be given
> enough
> > RAM to hold the index in memory.  Well, I have done that and some but
> yet I
> > don't see Solr using up that whole RAM.  What am I doing wrong?  Is my
> test
> > at fault?  I doubled the test load (number of users) and didn't see much
> of
> > a difference with RAM usage but yet my search performance went down
> (takes
> > about 40% longer now).  I run my tests again but this time with only 12
> GB
> > of RAM given to Solr.  Test result didn't differ much from the 24 GB run
> > and Solr never used more than 10 GB of RAM.
> >
> > Can someone help me understand this?  I don't want to give Solr RAM that
> it
> > won't use.
> >
> > PS: This is simply search tests, there is no update to the index at all.
> >
> > Thanks in advanced.
> >
> > Steve
>


What's the need for <copyField> when you have "fq"

2015-03-31 Thread Steven White
Hi folks,

I'm new to Solr and I have a question about <copyField>, "q" and "fq".

Say I have 50 fields in a Solr doc and I index them without doing any
<copyField> to a catch-all field called "all_text".  During search I use
"fq" to list all 50 fields to search on.  Now how different is this
from not using "fq" and searching against my catch-all field "all_text"
using "q"?

It seems to me that using <copyField> is a waste of space, and it also seems
to me that using "fq" I have better control over which fields will be
searched against.  Also, using "fq" I'm assuming my search terms will be
analyzed using each field's analyzer, in effect giving me better control
over score and results.

Have I got this right, or am I missing something?

The problem that I'm trying to solve is this: user-A can search on a set of
fields which is different from user-B's.  Given this, why should I bother to
use <copyField> when my search will *always* be against a set of fields?
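
In other words, per user I would send something like this (field names made
up), with no catch-all field involved:

  q=some terms&defType=edismax&qf=field_a field_b field_c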

Note: I may be mixing up "fq" with "qf" or even "uf".  Is "uf" what I should
be using vs. "fq"?

Thanks!

Steve


Filtering in Solr

2015-03-31 Thread Steven White
Hi folks,

I need filtering capability just as described here for Lucene:
http://www.javaranch.com/journal/2009/02/filtering-a-lucene-search.html

"Filtering is a mechanism of narrowing the search space, allowing only a
subset of the documents to be considered as possible hits. They can be used
to implement search-within-search features to successively search within a
previous set of results *or to constrain the document search space for
security or external data reasons.* A security filter is a powerful
example, *allowing users to only see search results of documents they own
even if their query technically matches other documents that are off
limits;* we provide an example of a security filter in the section
"Security filters".

How do I get this behavior using Solr?

If there is an example, that's great.
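
What I'm imagining is something along these lines, assuming each document
is indexed with an "acl" field listing the users/groups allowed to see it
(the field name and values here are made up):

  q=user query&fq=acl:(u_steve OR g_sales)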

Thanks

Steve


How to find out which fields a search came from

2015-03-31 Thread Steven White
Hi folks,

When I get my hits back from Solr, is there a way to find out into which
fields my search term matched in?

For example, if the indexed document is:

  doc_1:
title = From Russia with Love
director = Terence Young
starring = Sean Connery, Pedro Armendariz, Lotte Lenya
music_by = John Barry
doc_2:
title = Goldfinger
director = Guy Hamilton
starring = Sean Connery, Honor Blackman, Gert Frobe
music_by = John Barry
doc_3:
title = Skyfall
director = Sam Mendes
starring = Daniel Craig, Javier Bardem, Ralph Fiennes
music_by = Thomas Newman

If my search term is "love john barry guy", Solr will tell me I have a hit
in doc_1 and doc_2.  But I also need to know in which fields my search
terms matched.  How can Solr tell me that doc_1::title, doc_1::music_by,
and doc_2::music_by are where my search terms matched?

It looks to me that the highlighter does this, but I need this feature
without enabling the highlighter.
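
The closest thing I've found so far is debugQuery, whose "explain" section
names the fields each match came from, e.g.:

  http://localhost:8983/solr/db/select?q=love john barry guy&debugQuery=true

but parsing the explain output feels fragile, so I'm hoping there is a
better way.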

Thanks!

Steve


"Taking Solr 5.0 to Production" on Windows

2015-04-02 Thread Steven White
Hi folks,

I'm reading "Taking Solr 5.0 to Production"
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
but I cannot find anything about Windows.  Is there some other link I'm
missing?

This section of the doc is an important part of a successful Solr
deployment, but it is missing Windows instructions.  Without them,
deployments will be scattered, and Windows folks (like me) will miss out
on key aspects that Solr experts know.

Any feedback on this?

Thanks

Steve


How do you manage / update schema.xml file

2015-04-15 Thread Steven White
Hi folks,

What is the best practice to manage and update Solr's schema.xml?

I need to deploy Solr dynamically based on customer configuration (they
will pick which fields are indexed, customize the analyzers
(WordDelimiterFilterFactory, etc.), and specify the language to use).

Is the task of setting up a proper schema.xml outside the scope of Solr
admin, one that I have to manage by writing my own application, or is there
some tool that comes with Solr to help me do this?

I was thinking maybe SolrJ would do this for me, but I couldn't find
anything in it that does.

I also have to do customization to solrconfig.xml, thus the same question
applies here too.

Thanks in advance.

Steve


Differentiating user search term in Solr

2015-04-15 Thread Steven White
Hi folks,

If a user types in the search box (without quotes): "{!q.op=AND df=text
solr sys" and I take that text and build the URL like so:

http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true

This fails with "Expected identifier" because it is not valid Solr query
syntax.

My question is this: is there a flag I can send to Solr with the URL
telling it to treat what's in "q" as raw text vs. having it process it
as Solr syntax?  If not, it means I have to escape all reserved Solr
characters and words.  If so, where can I find the complete list?  Also,
what happens when a new reserved character or word is added to Solr down
the road?  It means I have to upgrade my application too, which is
something I would like to avoid.

Thanks

Steve


Re: How do you manage / update schema.xml file

2015-04-15 Thread Steven White
Thanks, this is exactly what I was looking for!!

Steve
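
PS: For the archives -- if I'm reading the managed schema docs right, adding
a field over HTTP via the Schema API looks something like this (the core
name "db" and the field are just my local example, on Solr 5.x):

  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/db/schema \
    -d '{"add-field":{"name":"title","type":"text_general","stored":true}}'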

On Wed, Apr 15, 2015 at 5:48 PM, Erick Erickson 
wrote:

> Have you looked at the "managed schema" stuff?
> see:
> https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
> There's also some work being done to update at least parts of
> solrconfig.xml, see:
> https://issues.apache.org/jira/browse/SOLR-6533
>
> Best,
> Erick
>
> On Wed, Apr 15, 2015 at 11:46 AM, Steven White 
> wrote:
> > Hi folks,
> >
> > What is the best practice to manage and update Solr's schema.xml?
> >
> > I need to deploy Solr dynamically based on customer configuration (they
> > will pick which fields are indexed, customize the analyzers
> > (WordDelimiterFilterFactory, etc.), and specify the language to use).
> >
> > Is the task of setting up a proper schema.xml outside the scope of Solr
> > admin, one that I have to manage by writing my own application, or is
> > there some tool that comes with Solr to help me do this?
> >
> > I was thinking maybe SolrJ would do this for me, but I couldn't find
> > anything in it that does.
> >
> > I also have to do customization to solrconfig.xml, thus the same question
> > applies here too.
> >
> > Thanks in advance.
> >
> > Steve
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Thanks Shawn.

I cannot use the escapeQueryChars method because my app interacts with Solr
via REST.

The summary of your email is: clients must escape the search string to
prevent Solr from failing.

It would be a nice addition for Solr to provide a new query parameter that
tells it to treat the query text as literal text.  Doing so means you
remove the burden placed on clients to understand and escape reserved Solr
/ Lucene tokens.

Steve

On Wed, Apr 15, 2015 at 7:18 PM, Shawn Heisey  wrote:

> On 4/15/2015 3:54 PM, Steven White wrote:
> > Hi folks,
> >
> > If a user types in the search box (without quotes): "{!q.op=AND df=text
> > solr sys" and I take that text and build the URL like so:
> >
> >
> http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true
> >
> > This fails with "Expected identifier" because it is not valid Solr query
> > syntax.
>
> That isn't valid syntax for the lucene query parser ... the localparams
> are not closed (it would require a } character), and after the
> localparams there would need to be some additional text.
>
> > My question is this: is there a flag I can send to Solr with the URL
> > telling it to treat what's in "q" as raw text vs. having it process it
> > as Solr syntax?  If not, it means I have to escape all reserved Solr
> > characters and words.  If so, where can I find the complete list?  Also,
> > what happens when a new reserved character or word is added to Solr down
> > the road?  It means I have to upgrade my application too, which is
> > something I would like to avoid.
>
> One way to treat the entire input as literal text is to use the terms
> query parser ... but that requires the localparams syntax, and I do not
> know exactly what is going to happen if you use a query string that
> itself is localparams syntax -- {! other params} ... so escaping is
> probably safer.
>
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser
>
> The other way to handle it is to escape every special character with a
> backslash.  The escapeQueryChars method in SolrJ is always kept up to
> date, and can escape every special character.
>
>
> http://lucene.apache.org/solr/4_10_3/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html#escapeQueryChars%28java.lang.String%29
>
> The javadoc for that method points to the queryparser syntax for more
> info on characters that need escaping.  Scroll to the very end of this
> page:
>
>
> http://lucene.apache.org/core/4_10_3/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true
>
> That page lists || and && rather than just the single characters | and &
> ... the escapeQueryChars method in SolrJ will escape both characters, as
> it only works at the character level, not the string level.
>
> If you want the *spaces* in your query to be treated literally also, you
> must escape them too.  The escapeQueryChars method I've mentioned will
> NOT escape spaces.
>
> Note that this does not cover URL escaping -- the & character must be
> sent as %26 or the servlet container will treat it as a special
> character, before it even gets to Solr.
>
> Thanks,
> Shawn
>
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
defType didn't work:


http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=lucene

Gave me error:

org.apache.solr.search.SyntaxError: Expected identifier at pos 27
str='{!q.op=AND df=text solr sys'

Is my use of defType correct?

Steve

On Thu, Apr 16, 2015 at 9:15 AM, Shawn Heisey  wrote:

> On 4/16/2015 7:09 AM, Steven White wrote:
> > I cannot use the escapeQueryChars method because my app interacts with
> > Solr via REST.
> >
> > The summary of your email is: clients must escape the search string to
> > prevent Solr from failing.
> >
> > It would be a nice addition for Solr to provide a new query parameter
> > that tells it to treat the query text as literal text.  Doing so means
> > you remove the burden placed on clients to understand and escape
> > reserved Solr / Lucene tokens.
>
> That's a good idea, although we might already have that.
>
> I wonder what happens if you include defType=term with your request?
> That works for edismax, it might work for other query parsers, at least
> on the q parameter.
>
> Thanks,
> Shawn
>
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
What is "term" in the "defType=term", do you mean the raw word "term" or
something else?  Because I tried that too in two different ways:

Using correct Solr syntax:


http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text}%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=term

This throws a NPE exception:

java.lang.NullPointerException
  at org.apache.solr.schema.IndexSchema$DynamicReplacement$DynamicPattern$NameEndsWith.matches(IndexSchema.java:1033)
  at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:1047)
  at org.apache.solr.schema.IndexSchema.dynFieldType(IndexSchema.java:1303)
  at org.apache.solr.schema.IndexSchema.getFieldTypeNoEx(IndexSchema.java:1280)
  at org.apache.solr.search.TermQParserPlugin$1.parse(TermQParserPlugin.java:56)
  ...

And when I try it with invalid Solr search syntax:


http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=term


This gives me the SyntaxError:

org.apache.solr.search.SyntaxError: Expected identifier at pos 27
str='{!q.op=AND df=text solr sys'

What am I missing?

Steve

On Thu, Apr 16, 2015 at 10:43 AM, Shawn Heisey  wrote:

> On 4/16/2015 7:49 AM, Steven White wrote:
> > defType didn't work:
> >
> >
> >
> http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=lucene
> >
> > Gave me error:
> >
> > org.apache.solr.search.SyntaxError: Expected identifier at pos 27
> > str='{!q.op=AND df=text solr sys'
> >
> > Is my use of defType correct?
>
> If everything is at defaults and you don't have defType in the handler
> definition, then defType=lucene doesn't do anything - it specifically
> says "use the lucene parser" which is the default.  You want
> defType=term instead.
>
> Thanks,
> Shawn
>
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
I don't follow what the "f" parameter is.  Do you have a link where I can
read more about it?  I found this
https://wiki.apache.org/solr/HighlightingParameters and
https://wiki.apache.org/solr/SimpleFacetParameters but I'm not sure this is
what you mean (I'm not doing highlighting or faceting).

Thanks

Steve

On Thu, Apr 16, 2015 at 11:54 AM, Shawn Heisey  wrote:

> On 4/16/2015 9:37 AM, Steven White wrote:
> > What is "term" in the "defType=term", do you mean the raw word "term" or
> > something else?  Because I tried that too in two different ways:
>
> Oops.  I forgot that the term query parser (that's what "term" means --
> the name of the query parser) requires that you specify the field you
> are searching on, so that would be incomplete.  Try also setting the "f"
> parameter to the field that you want to search.  I will not be surprised
> if that doesn't work, though.
>
> Thanks,
> Shawn
>
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Thanks for trying Shawn.

Looks like I have to escape the string on my client side (this isn't a
clean design and can lead to errors if not all reserved tokens are
escaped).

I hope folks from @dev are reading this and will consider adding a
parameter that tells Solr the text is raw text.

Steve

On Thu, Apr 16, 2015 at 12:18 PM, Shawn Heisey  wrote:

> On 4/16/2015 10:10 AM, Steven White wrote:
> > I don't follow what the "f" parameter is.  Do you have a link where I can
> > read more about it?  I found this
> > https://wiki.apache.org/solr/HighlightingParameters and
> > https://wiki.apache.org/solr/SimpleFacetParameters but I'm not sure this
> > is what you mean (I'm not doing highlighting or faceting).
>
> It looks like this isn't going to work.  I just tried it on my index.
>
> To see the reasoning behind what I was suggesting, click here:
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
>
> And then click on "Term Query Parser" in the third column of the list at
> the top of that page.
>
> The syntax for the localparams on this one is {!term f=field}querytext
> ... so I was hoping that f would work as a URL parameter, but from the
> test I just did on Solr 4.9.1, that's not the case.
>
> Thanks,
> Shawn
>
>


Solr 5.x deployment in production

2015-04-16 Thread Steven White
Hi folks,

With Solr 5.0, the WAR file is deprecated and I see Jetty is included with
Solr.  If I have my own Web server into which I need to deploy Solr, how do
I go about doing this correctly without messing things up, while making
sure Solr works?  Or is this not recommended and Jetty is the way to go, no
questions asked?

Thanks

Steve


Re: Solr 5.x deployment in production

2015-04-16 Thread Steven White
Thanks Karl.

In my case, I have to deploy Solr on Windows, AIX, and Linux (all server
editions).  We are a WebSphere shop; moving away from it means I have to
deal with politics and culture.

For Windows, I cannot use NSSM, so I have to figure out a solution for
managing Solr (at least start-up and shutdown).  If anyone has experience
in this area (now that Solr is not in a WAS profile managed by Windows
services) and can share it, please do.  Thanks.
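
(The bundled control script does at least handle manual start/stop on
Windows, e.g. something like:

  bin\solr.cmd start -p 8983
  bin\solr.cmd stop -p 8983

What's missing is the service wrapper around it.)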

Steve

On Thu, Apr 16, 2015 at 3:49 PM, Karl Kildén  wrote:

> I asked a very similar question recently. You should switch to using the
> package as is and forget that it contains a .war. The war is now an
> internal component. Also switch to the new script for startup etc.
>
> I have seen several disappointed users that disagree with this decision but
> I assume the project now has more freedom in the future and also more
> alignment and focus on one experience.
>
> I did my own thing with NSSM because we use windows and I am satisfied.
>
> On 16 April 2015 at 21:36, Steven White  wrote:
>
> > Hi folks,
> >
> > With Solr 5.0, the WAR file is deprecated and I see Jetty is included
> with
> > Solr.  If I have my own Web server into which I need to deploy Solr,
> > how do I go about doing this correctly without messing things up, while
> > making sure Solr works?  Or is this not recommended and Jetty is the
> > way to go, no questions asked?
> >
> > Thanks
> >
> > Steve
> >
>


Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Hi Hoss,

Maybe I'm missing something, but I tried this and got 1 hit:


http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND

Then I tried this and got 0 hits:


http://localhost:8983/solr/db/select?q={!field%20f=title%20v=$qq}&qq=Apache%20Solr%20Notes&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND

It looks to me that "f" with "qq" is doing phrase search, that's not what I
want.  The data in the field "title" is "Apache Solr Release Notes"

I looked over the links you provided and tried out the examples; in each
case, if the user-typed text contains any reserved characters, it fails
with a syntax error (the exception is when I used "f" and "qq", but like I
said, that gave me 0 hits).

If you can give me a concrete example, please do.  My need is to pass to
Solr the text "Apache: Solr Notes" (without quotes) and get a hit as if I
passed "Apache\: Solr Notes" ?

Thanks

Steve

On Thu, Apr 16, 2015 at 5:49 PM, Chris Hostetter 
wrote:

>
> : The summary of your email is: clients must escape the search string to
> : prevent Solr from failing.
> :
> : It would be a nice addition for Solr to provide a new query parameter
> : that tells it to treat the query text as literal text.  Doing so means
> : you remove the burden placed on clients to understand and escape
> : reserved Solr / Lucene tokens.
>
> i'm a little lost as to what exactly you want to do here -- but i'm going
> to focus on your thesis statement here, and assume that you want to
> search on a literal piece of text and you don't want to have to worry
> about escaping any characters and you don't want Solr to treat any part of
> the query string as special.
>
> the only way something like that works is if you only want to search a
> single field -- searching multiple fields, searching multiple clauses,
> etc... none of those types of options make sense in this context.
>
> people have already mentioned the "term" parser -- which is fine if you
> want to search for exactly one literal term, but as a more general
> solution, what people usually want is the "field" parser -- which works
> better with TextFields in general...
>
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FieldQueryParser
>
> Just like the comment you've seen about the "term" parser needing an "f"
> localparam to specify the field, the same is true for the "field" parser.
> But variable references make this trivial to specify -- instead of using
> the full "{!field f=myfield}Foo Bar" syntax in your q param, you can use
> an alternate param ("qq" is common in many examples) for the raw data from
> the user...
>
> q={!field f=myfield v=$qq} & qq=whatever your user types
>
>
>
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
>
>
> -Hoss
> http://www.lucidworks.com/
>


Multilevel nested level support using Solr

2015-04-17 Thread Steven White
Hi folks,

In my DB, my records are nested in a folder-based hierarchy, e.g.:

Level_1
    Level_2
        record_1
        record_2
        Level_3
            record_3
            record_4
        record_5
Level_A
    Level_2
        record_6
        record_7
        record_8

You get the idea.

Is there anything in Solr that will let me preserve this structure and
thus, when I'm searching, tell it at which level to narrow down the
search?  I have four level-based search needs:

1) Be able to search inside a specific level by full path:
Level_1.Level_2.* (and everything under Level_2 from this path).

2) Be able to search inside a level regardless of its path: Level_2.* (no
matter where Level_2 is, I want to search all records under Level_2 and
everything under its path).

3) Same as #1 but limit the search to within that level only (nothing below
that level is searched).

4) Same as #2 but limit the search to within that level only (nothing below
that level is searched).

I found this:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
but it looks like it supports only one level and requires both levels to be
re-indexed even if just one of the docs in the nest is updated.

Thanks

Steve


Re: Solr 5.x deployment in production

2015-04-17 Thread Steven White
Thanks Shawn, this makes a lot of sense.

With the WAR going away, having no mention of a Windows deployment strategy
(see:
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production)
isn't good; there is a gap in Solr's release.  It feels as if Solr 5.x was
rushed out ignoring Windows Server deployments.

-- George

On Fri, Apr 17, 2015 at 9:24 AM, Shawn Heisey  wrote:

> On 4/16/2015 2:07 PM, Steven White wrote:
> > In my case, I have to deploy Solr on Windows, AIX, and Linux (all server
> > edition).  We are a WebSphere shop, moving away from it means I have to
> > deal with politics and culture.
>
> You *can* run Solr 5.0 (and 5.1) in another container, just like you
> could with all previous Solr versions.  There are additional steps that
> have to be taken, such as correctly installing the logging jars and the
> logging config, but if you've used Solr 4.3 or later, you already know
> this:
>
> http://wiki.apache.org/solr/SolrLogging
>
> Eventually, hopefully before we reach the 6.0 release, that kind of
> deployment won't be possible, because Solr will be a true application
> (like Jetty itself), not a webapp contained in a .war file.  It may take
> us quite a while to reach that point.  If you are already using the
> scripts that come with Solr 5.x, you will have a seamless transition to
> the new implementation.
>
> The docs for 5.0 say that we aren't supporting deployment in a
> third-party servlet container, even though that still is possible.
> There are several reasons for this:
>
> * Eventually it won't be possible, because Solr's implementation will
> change.
>
> * We now have scripts that will start Solr in a consistent manner.
> ** This means that our instructions won't have to change for a new
> implementation.
>
> * There are a LOT of containers available.
> ** Each one requires different instructions.
> ** Are problems caused by the container, or Solr?  We may not know.
>
> * Jetty is the only container that gets tested.
> ** Bugs with other containers have happened.
> ** User feedback is usually the only way such bugs can be found.
>
> Thanks,
> Shawn
>
>


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Hoss,

Thanks for that lengthy feedback, it is much appreciated.

Let me reset and bear in mind that I'm new to Solr.

I'm using Solr 5.0 (will switch over to 5.1 later this week) and my need is
as follows.

In my application, a user types "Apache Solr Notes".  I take that text and
send it over to Solr like so:


http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND

And I get a hit on "Apache Solr Release Notes".  This is all good.

Now if the same user types "Apache: Solr Notes" (notice the ":" after
"Apache") I will get a SyntaxError.  The fix is to escape ":" before I send
it to Solr.  What I want to figure out is how I can tell Solr / Lucene to
ignore ":" and escape it for me.  In this example I used ":", but my need
covers all other operators and reserved Solr / Lucene characters.

This needs to be configurable via a URL parameter to Solr / Lucene, because
there are times I will send text to Solr that has valid operators and other
times not.  If such a URL parameter existed, my client application would no
longer have to maintain a list of operators to escape, and it wouldn't have
to keep up with Solr as new operators are added.

What do you think?  I hope I got my message across better this time.

PS: Looking at
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
it seems promising, but it doesn't include an example so I wasn't able to
figure it out, and it looks to me like the list of operators is not
complete (there is no "{", for example).

Thanks

Steve

On Fri, Apr 17, 2015 at 3:02 PM, Chris Hostetter 
wrote:

>
> : It looks to me that "f" with "qq" is doing phrase search, that's not
> what I
> : want.  The data in the field "title" is "Apache Solr Release Notes"
>
> if you don't want phrase queries then you don't want phrase queries and
> that's fine -- but it wasn't clear from any of your original emails
> because you never provided (that i saw) any concrete examples of the types
> of queries you expected, the types of matches you wanted, and the types of
> matches you did *NOT* want.  details matter
>
> https://wiki.apache.org/solr/UsingMailingLists
>
>
> Based on that one concrete example i've now seen of what you *do* want to
> match: it seems that maybe a general description of your objective is that
> each of the "words" in your user input should treated as a mandatory
> clause in a boolean query -- but the concept of a "word" is already
> something that violates your earlier statement about not wanting the query
> parser to treat any "reserved characters" as special -- in order to
> recognize that "Apache", "Solr" and "Notes" should each be treated as
> independent mandatory clauses in a boolean query, then some query parser
> needs to recognize that *whitespace* is a syntactically significant
> character in your query string: it's what separates the "words" in your
> input.
>
> the reason the "field" parser produces phrase queries in the example URLs
> you mentioned is because that parser doesn't have *ANY* special reserved
> characters -- not even whitespace.  it passes the entire input string to
> the analyzer of the configured (f) field.  if you are using TextField with
> a Tokenizer that means it gets split on whitespace, resulting in multiple
> *sequential* tokens, which will result in a phrase query (on the other
> hand, using something like StrField will cause the entire input string,
> spaces and all, to be searched as one single Term)
>
> : I looked over the links you provided and tried out the examples; in each
> : case, if the user-typed text contains any reserved characters, it fails
> : with a syntax error (the exception is when I used "f" and "qq", but like
> : I said, that gave me 0 hits).
>
> As i said: Details matter.  which examples did you try? what configs were
> you using? what data were you using? which version of solr are you using?
> what exactly was the syntax error? etc ?
>
> "f" and "qq" are not magic -- saying you used them just means you used
> *some* parser that supports an "f" param ... if you tried it with the
> "term" or "field" parser then i don't know why you would have gotten a
> SyntaxError, but based on your goal it sounds like those parsers aren't
> really useful to you. (see below)
>
> : If you can give me a concrete example, please do.  My need is to pass to
> : Solr the text "Apache: Solr Notes" (without quotes) and get a hit as if I
> : passed "Apache\: Solr Notes" ?
>
> To re-iterate, saying you want the same behavior as if you passed "Apache\:
> Solr Notes" is a vague statement -- as if you passed that string to *what*?
> to the standard parser? to the dismax parser? using what request
> options? (q.op? qf? df?) ... query strings don't exist in a vacuum.  the
> details & context matter.
>
> (I'm sorry if it feels like i keep hitting you over the head about this,
> i'm just trying to help you realize the breadth and scope of the variables involved.)

Re: Multilevel nested level support using Solr

2015-04-20 Thread Steven White
Resending to see if anyone can help.  Thanks.

Steve

On Fri, Apr 17, 2015 at 12:14 PM, Steven White  wrote:

> Hi folks,
>
> In my DB, my records are nested in a folder-based hierarchy, e.g.:
>
> Level_1
>     Level_2
>         record_1
>         record_2
>         Level_3
>             record_3
>             record_4
>         record_5
> Level_A
>     Level_2
>         record_6
>         record_7
>         record_8
>
> You get the idea.
>
> Is there anything in Solr that will let me preserve this structure and
> thus, when I'm searching, tell it at which level to narrow down the
> search?  I have four level-based search needs:
>
> 1) Be able to search inside a specific level by full path:
> Level_1.Level_2.* (and everything under Level_2 from this path).
>
> 2) Be able to search inside a level regardless of its path: Level_2.* (no
> matter where Level_2 is, I want to search all records under Level_2 and
> everything under its path).
>
> 3) Same as #1 but limit the search to within that level only (nothing
> below that level is searched).
>
> 4) Same as #2 but limit the search to within that level only (nothing
> below that level is searched).
>
> I found this:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> but it looks like it supports only one level and requires both levels to
> be re-indexed even if just one of the docs in the nest is updated.
>
> Thanks
>
> Steve
>


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Shawn,

If the user types "title:(Apache: Solr Notes)" (without quotes) then I want
Solr to treat the whole string as a raw text string, as if I had escaped
":", "(" and ")" and any other reserved Solr keywords / tokens.  Using
dismax it worked for the ":" case, but I still get a SyntaxError if I pass
it the following: "title:(Apache: Solr Notes) AND" (here is the full URL):


http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title

So far, the only solution I can find is for my application to escape all
Solr operators before sending the string to Solr.  This is fine, but it
means my application will have to adapt to Solr's reserved operators as
Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add it to my
application's escape list).  A better solution would be for Solr to support
a new parameter that I can pass as part of the URL.  This parameter would
tell Solr whether or not to do the escaping for me (missing means the same
as "don't do the escaping").

Thanks

Steve

On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey  wrote:

> On 4/20/2015 7:41 AM, Steven White wrote:
> > In my application, a user types "Apache Solr Notes".  I take that text
> and
> > send it over to Solr like so:
> >
> >
> >
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >
> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >
> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
> send
> > it to Solr.  What I want to figure out is how I can tell Solr / Lucene
> > to ignore ":" and escape it for me.  In this example I used ":", but my
> > need covers all other operators and reserved Solr / Lucene characters.
>
> If we assume that what you did for the first query is what you will do
> for the second query, then this is what you would have sent:
>
> q=title:(Apache: Solr Notes)
>
> How is the parser supposed to know that only the second colon should be
> escaped, and not the first one?  If you escape them both (or treat the
> entire query string as query text), then the fact that you are searching
> the "title" field is lost.  The text "title" becomes an actual part of
> the query, and may not match, depending on what you have done with other
> parameters, such as the default operator.
>
> If you use the dismax parser (*NOT* the edismax parser, which parses
> field:value queries and boolean operator syntax just like the lucene
> parser), you may be able to achieve what you're after.
>
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> https://wiki.apache.org/solr/DisMaxQParserPlugin
>
> With dismax, you would use the qf and possibly the pf parameter to tell
> it which fields to search and send this as the query:
>
> q=Apache: Solr Notes
>
> Thanks,
> Shawn
>
>


Re: Multilevel nested level support using Solr

2015-04-20 Thread Steven White
Thanks Andy.

I have been thinking along the same lines as your solution, and your
solution looks like what I will have to do.

In summary, there is no built-in Solr way to achieve my need; I have to
construct my documents and build a query to get this working.
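
(E.g., per your string-field suggestion, I would index the full path on
each record and then filter with something like
fq=folder_hierarchy:"root|foo|bar".)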

Steve

On Mon, Apr 20, 2015 at 10:57 AM, Andrew Chillrud 
wrote:

> Don't know if this is what you are looking for, but we had a similar
> requirement. In our case each folder had a unique identifier associated
> with it.
>
> When generating the Solr input document our code populated 2 fields,
> parent_folder, and folder_hierarchy (multi-valued), and for a document in
> the root->foo->bar folder added:
>
> parent_folder:<id of bar>
> folder_hierarchy:<id of bar>
> folder_hierarchy:<id of foo>
> folder_hierarchy:<id of root>
>
> At search time, if you wanted to restrict your search within the folder
> 'bar' we generated a filter query for either 'parent_folder:<id of
> folder>' or 'folder_hierarchy:<id of folder>' depending on whether you
> wanted only documents directly under the 'bar' folder (your case 3), or at
> any level underneath 'bar' (your case 1).
>
> If your folders don't have unique identifiers then you could achieve
> something similar by indexing the folder paths in string fields:
>
> parent_folder:root|foo|bar
> folder_hierarchy:root|foo|bar
> folder_hierarchy:root|foo
> folder_hierarchy:root
>
> and generating a fq for either 'parent_folder:root|foo|bar' or
> 'folder_hierarchy:root|foo|bar'
>
> If you didn't want to have to generate all the permutations for the
> folder_hierarchy field before sending the document to Solr for indexing you
> should be able to do something like:
>
>    <fieldType name="folderPath" class="solr.TextField"
>               positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="|"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>      </analyzer>
>    </fieldType>
>
>    <field name="parent_folder" type="string" indexed="true" stored="true"
>           multiValued="false"/>
>    <field name="folder_hierarchy" type="folderPath" indexed="true"
>           stored="true" multiValued="true"/>
>
>    <copyField source="parent_folder" dest="folder_hierarchy"/>
>
> In which case you could just send in the 'parent_folder' field and Solr
> would generate the folder_hierarchy field.
>
> For cases 2 and 4 you could do something similar by adding 2 additional
> fields that just index the folder names instead of the paths.
>
> - Andy -
>
> -Original Message-
> From: Steven White [mailto:swhite4...@gmail.com]
> Sent: Monday, April 20, 2015 9:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multilevel nested level support using Solr
>
> Re sending to see if anyone can help.  Thanks
>
> Steve
>
> On Fri, Apr 17, 2015 at 12:14 PM, Steven White 
> wrote:
>
> > Hi folks,
> >
> > In my DB, my records are nested in a folder-based hierarchy, e.g.:
> >
> > Level_1
> >     Level_2
> >         record_1
> >         record_2
> >         Level_3
> >             record_3
> >             record_4
> >         record_5
> > Level_A
> >     Level_2
> >         record_6
> >         record_7
> >         record_8
> >
> > You get the idea.
> >
> > Is there anything in Solr that will let me preserve this structure and
> > thus, when I'm searching, tell it at which level to narrow down the
> > search?  I have four level-based search needs:
> >
> > 1) Be able to search inside a specific level by full path:
> > Level_1.Level_2.* (and everything under Level_2 from this path).
> >
> > 2) Be able to search inside a level regardless of its path: Level_2.*
> > (no matter where Level_2 is, I want to search all records under Level_2
> > and everything under its path).
> >
> > 3) Same as #1 but limit the search to within that level only (nothing
> > below that level is searched).
> >
> > 4) Same as #2 but limit the search to within that level only (nothing
> > below that level is searched).
> >
> > I found this:
> > https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+I
> > ndex+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments
> > but it looks like it supports only one level and requires both levels
> > to be re-indexed even if just one of the docs in the nest is updated.
> >
> > Thanks
> >
> > Steve
> >
>


Re: search by person name

2015-04-20 Thread Steven White
Why not just use q=name:(ana jose)?  Then missing words or word order
won't matter.  No?
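
(I.e., with something like q=name:(ana jose)&q.op=AND, both words become
required clauses, but their order and adjacency no longer matter.)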

Steve

On Mon, Apr 20, 2015 at 12:26 PM, Erick Erickson 
wrote:

> First, a little patience on your part please, we're all volunteers here.
>
> Second, what have you done to try to analyze the problem? Have you
> tried adding &debug=query to your URL? Looked at the analysis page?
> Anything else?
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
>
> My guess (and Rafal provided you a strong clue if my guess is right)
> is that by enclosing "ana jose" in quotes you've created a phrase
> query that requires the two words to be right next to each other and
> they have "maria" between them. Using "slop", i.e. "ana jose"~2 should
> find the doc if I'm correct.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 7:41 AM, Pedro Figueiredo
>  wrote:
> > Any help please?
> >
> > PF
> >
> > -Original Message-
> > From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com]
> > Sent: 20 de abril de 2015 14:19
> > To: solr-user@lucene.apache.org
> > Subject: RE: search by person name
> >
> > yes
> >
> > Pedro Figueiredo
> > Senior Engineer
> >
> > pjlfigueir...@criticalsoftware.com
> > M. 934058150
> >
> >
> > Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. 
> > +351
> 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> >
> > PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
> >
> >
> >
> > -Original Message-
> > From: Rafal Kuc [mailto:ra...@alud.com.pl]
> > Sent: 20 de abril de 2015 14:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: search by person name
> >
> > Hello,
> >
> > What does your query look like? Do you use a phrase query, like q=name:"ana
> jose" ?
> >
> > ---
> > Regards,
> > Rafał Kuć
> >
> >
> >
> >
> >> Wiadomość napisana przez Pedro Figueiredo <
> pjlfigueir...@criticalsoftware.com> w dniu 20 kwi 2015, o godz. 15:06:
> >>
> >> Hi all,
> >>
> >> Can anyone advise on the tokenizers and filters to use for the most
> >> common way to search by people's names?
> >> The basics requirements are:
> >>
> >> For field name – “Ana Maria José”
> >> The following search’s should return the example:
> >> 1.   “Ana”
> >> 2.   “Maria”
> >> 3.   “Jose”
> >> 4.   “ana maria”
> >> 5.   “ana jose”
> >>
> >> With the following configuration I’m not able to satisfy all the
> searches (namely the last one….):
> >> <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <filter class="solr.ASCIIFoldingFilterFactory"/>
> >>
> >> Thanks in advanced,
> >>
> >> Pedro Figueiredo
> >> Senior Engineer
> >>
> >> pjlfigueir...@criticalsoftware.com
> >> 
> >> M. 934058150
> >>
> >>
> >> Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
> >> T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com
> >> 
> >>
> >> PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI®
> >> LEVEL 5 RATED COMPANY  CMMI® is registered
> in the USPTO by CMU "
> >
> >
>


Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Erick,

I didn't know about ClientUtils.escapeQueryChars(); this is good to know.
Unfortunately I cannot use it, because it means I have to import Solr
classes into my client application.  I want to avoid that and keep a
loose coupling between my application and Solr (just rely on REST).
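
I suppose I could replicate it on my side; here is a minimal sketch of what
I understand escapeQueryChars() does (character list as of Solr 5.x -- I
have not verified it against every release):

  // Escapes the Lucene/Solr query-syntax special characters, mirroring
  // org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars().
  public static String escapeQueryChars(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')'
          || c == ':' || c == '^' || c == '[' || c == ']' || c == '"'
          || c == '{' || c == '}' || c == '~' || c == '*' || c == '?'
          || c == '|' || c == '&' || c == ';' || c == '/'
          || Character.isWhitespace(c)) {
        sb.append('\\');  // prefix every special character with a backslash
      }
      sb.append(c);
    }
    return sb.toString();
  }

But that just hard-codes the very list I was hoping not to maintain.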

My suggestion is to add a new URL parameter to Solr, such as
"q.ignoreOperators=[true | false]" (or some other name).  If this parameter
is set to "false" or is missing, the current behavior takes effect; if it
is set to "true", Solr will treat everything in the search string as
literal by first passing it through ClientUtils.escapeQueryChars().  This
way, the client application doesn't have to: a) be tightly coupled with
Solr (required to link with Solr JARs to use escapeQueryChars), and b) keep
up with Solr when new operators are added.

What do you think?

Steve

On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson 
wrote:

> Steve:
>
> In short, no. There's no good way for Solr to solve this problem in
> the _general_ case. Well, actually we could create parsers with rules
> like "if the colon is inside a paren, escape it). Which would
> completely break someone who wants to form queries like
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> You say: " A better solution would be to have Solr support a new
> parameter that I can pass to Solr as part of the URL."
>
> How would Solr know _which_ parts of the URL to escape in the case above?
>
> You have to do this at the app layer as that's the only place that has
> a clue what the peculiarities of the situation are.
>
> But if you're using SolrJ in your app layer, you can use
> ClientUtils.escapeQueryChars() for user-entered data to do the
> escaping without you having to maintain a separate list.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
> wrote:
> > Hi Shawn,
> >
> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I
> > want Solr to treat the whole string as a raw text string, as if I had
> > escaped ":", "(" and ")" and any other reserved Solr keywords / tokens.
> > Using dismax it worked for the ":" case, but I still get a SyntaxError
> > if I pass it the following: "title:(Apache: Solr Notes) AND" (here is
> > the full URL):
> >
> >
> >
> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >
> > So far, the only solution I can find is for my application to escape all
> > Solr operators before sending the string to Solr.  This is fine, but it
> > means my application will have to adapt to Solr's reserved operators as
> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add it to
> > my application's escape list).  A better solution would be for Solr to
> > support a new parameter that I can pass as part of the URL.  This
> > parameter would tell Solr whether or not to do the escaping for me
> > (missing means the same as "don't do the escaping").
> >
> > Thanks
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey 
> wrote:
> >
> >> On 4/20/2015 7:41 AM, Steven White wrote:
> >> > In my application, a user types "Apache Solr Notes".  I take that text
> >> and
> >> > send it over to Solr like so:
> >> >
> >> >
> >> >
> >>
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >> >
> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >> >
> >> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
> >> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
> >> send
> >> > it to Solr.  What I want to figure out is how I can tell Solr /
> >> > Lucene to ignore ":" and escape it for me.  In this example I used
> >> > ":", but my need covers all other operators and reserved Solr /
> >> > Lucene characters.
> >>
> >> If we assume that what you did for the first query is what you will do
> >> for the second query, then this is what you would have sent:
> >>
> >> q=title:(Apache: Solr Notes)
> >>
> >> How is the parser supposed to know that only the second colon should be
> >> escaped, and not the first one?

Re: Differentiating user search term in Solr

2015-04-20 Thread Steven White
Hi Erick,

I think you missed my point.  My request is that Solr support a new URL
parameter.  If this parameter is set, then EVERYTHING in q is treated as
raw text (i.e., Solr does the escaping instead of the client).

Thanks

Steve

On Mon, Apr 20, 2015 at 1:08 PM, Erick Erickson 
wrote:

> How does that address the example query I gave?
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> bq: "Solr will treat everything in the search string by first passing
> it to ClientUtils.escapeQueryChars()."
>
> would incorrectly escape the colons after field1, field, field2 and
> correctly escape the colon after d and in parens. And parens are a
> reserved character too, so it would incorrectly escape _all_ the
> parens except the ones surrounding the colon.
>
> The list of reserved characters is pretty unchanging, so I don't think
> it's too much to ask the app layer, which knows (at least it better
> know) which bits of the query were user entered, what rules apply as
> to whether the user can enter field-qualified searches etc. Only armed
> with that knowledge can the right thing be done, and Solr has no
> knowledge of those rules.
>
> If you insist that the client shouldn't deal with that, you could
> always write a custom component that enforces the rules that are
> particular to your setup. For instance, you may have a rule that you
> can never field-qualify any term, in which case escaping on the Solr
> side would work in _your_ situation. But the general case just doesn't
> fit into the "escape on the Solr side" paradigm.
>
> Best,
> Erick
>
>
> On Mon, Apr 20, 2015 at 9:55 AM, Steven White 
> wrote:
> > Hi Erick,
> >
> > I didn't know about ClientUtils.escapeQueryChars(); this is good to
> > know.  Unfortunately I cannot use it, because it means I have to import
> > Solr classes into my client application.  I want to avoid that and keep
> > a loose coupling between my application and Solr (just rely on REST).
> >
> > My suggestion is to add a new URL parameter to Solr, such as
> > "q.ignoreOperators=[true | false]" (or some other name).  If this
> > parameter is set to "false" or is missing, the current behavior takes
> > effect; if it is set to "true", Solr will treat everything in the search
> > string as literal by first passing it through
> > ClientUtils.escapeQueryChars().  This way, the client application
> > doesn't have to: a) be tightly coupled with Solr (required to link with
> > Solr JARs to use escapeQueryChars), and b) keep up with Solr when new
> > operators are added.
> >
> > What do you think?
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Steve:
> >>
> >> In short, no. There's no good way for Solr to solve this problem in
> >> the _general_ case. Well, actually we could create parsers with rules
> >> like "if the colon is inside a paren, escape it). Which would
> >> completely break someone who wants to form queries like
> >>
> >> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> >> followed by a colon (:)").
> >>
> >> You say: " A better solution would be to have Solr support a new
> >> parameter that I can pass to Solr as part of the URL."
> >>
> >> How would Solr know _which_ parts of the URL to escape in the case
> above?
> >>
> >> You have to do this at the app layer as that's the only place that has
> >> a clue what the peculiarities of the situation are.
> >>
> >> But if you're using SolrJ in your app layer, you can use
> >> ClientUtils.escapeQueryChars() for user-entered data to do the
> >> escaping without you having to maintain a separate list.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Apr 20, 2015 at 8:39 AM, Steven White 
> >> wrote:
> >> > Hi Shawn,
> >> >
> >> > If the user types "title:(Apache: Solr Notes)" (without quotes) then
> >> > I want Solr to treat the whole string as a raw text string, as if I
> >> > had escaped ":", "(" and ")" and any other reserved Solr keywords /
> >> > tokens.  Using dismax it worked for the ":" case, but I still get a
> >> > SyntaxError if I pass it the following: "title:(Apache: Solr Notes)
> >> > AND" (here is the full URL):
