SuggestComponent and edismax type boosting

2016-05-25 Thread james

Hi

I'm setting up an autosuggester for Geonames location data with Solr 
6.0.0, and have followed something like 
https://lucidworks.com/blog/2015/03/04/solr-suggester/


I can rank results with "weightField" for population, but I wonder whether it 
is possible to further boost/rank the results based on the content of other 
fields (a country text code, for example). I have done this through eDisMax, 
which boosts my standard /select results according to 3 other fields 
(population/country/featureCode), but I can't get it to work with the 
SuggestComponent.


Could/should I somehow populate a weightField as a combination of the 3 
boosting fields?
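One way to sketch that idea: since the suggester can only rank by a single numeric weight, fold the three boosting signals into one value when building the weightField. The field names and multiplier tables below are illustrative assumptions, not Geonames or Solr requirements.

```python
COUNTRY_BOOST = {"US": 2.0, "GB": 1.5}          # preferred country codes
FEATURE_BOOST = {"PPLC": 3.0, "PPLA": 1.5}      # e.g. capitals, admin seats

def suggest_weight(population, country_code, feature_code):
    """Combine population with per-field multipliers into one suggester weight."""
    base = max(population, 1)                   # avoid zero weights
    return base * COUNTRY_BOOST.get(country_code, 1.0) \
                * FEATURE_BOOST.get(feature_code, 1.0)

# A large capital outranks an equally large plain town:
print(suggest_weight(8_000_000, "US", "PPLC"))
print(suggest_weight(8_000_000, "DE", "PPL"))
```

The combined value would be written into the weightField at index time, so no suggester-side changes are needed.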


Is this possible, or am I best off indexing with Ngrams like this:
http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx

Thoughts most welcome.

Thanks
James



Howto verify that update is "in-place"

2017-10-17 Thread James
I am using Solr 6.6 and carefully read the documentation about atomic and
in-place updates. I am pretty sure that everything is set up as it should be.

 

But how can I make certain that a simple update command actually performs an
in-place update without internally re-indexing all other fields?

 

I am issuing this command to my server:

(I am using implicit document routing, so I need the "Shard" parameter.)

 

{
  "ID": 1133,
  "Property_2": {"set": 124},
  "Shard": "FirstShard"
}
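For reference, the same payload can be built programmatically before POSTing it to /update. Whether Solr actually executes it in-place depends on the schema, not on the payload: per the Reference Guide, the updated field must be single-valued, non-indexed, non-stored, and have docValues enabled. The field names below just mirror the example above.

```python
import json

def atomic_set(doc_id, field, value, shard):
    """Build an atomic 'set' update for one document (as a JSON array)."""
    return json.dumps([{
        "ID": doc_id,
        field: {"set": value},
        "Shard": shard,   # needed here because of implicit document routing
    }])

payload = atomic_set(1133, "Property_2", 124, "FirstShard")
print(payload)
```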

 

 

The log outputs:

 

2017-10-17 07:39:18.701 INFO  (qtp1937348256-643) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.p.LogUpdateProcessorFactory [MyCollection_FirstShard_replica1] webapp=/solr path=/update params={commitWithin=1000&boost=1.0&overwrite=true&wt=json&_=1508221142230}{add=[1133 (1581489542869811200)]} 0 1

2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.s.SolrIndexSearcher Opening [Searcher@32d539b4[MyCollection_FirstShard_replica1] main]

2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.DirectUpdateHandler2 end_commit_flush

2017-10-17 07:39:19.703 INFO  (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener QuerySenderListener sending requests to Searcher@32d539b4[MyCollection_FirstShard_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_i(6.6.0):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}

2017-10-17 07:39:19.703 INFO  (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener QuerySenderListener done.

2017-10-17 07:39:19.703 INFO  (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.SolrCore [MyCollection_FirstShard_replica1] Registered new searcher Searcher@32d539b4[MyCollection_FirstShard_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_i(6.6.0):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}

 

If I issue another update, this time to a field that is not a docValues field
(so not an in-place update), the log output is very similar. Can I increase
verbosity? Would it then tell me more about the type of update?

 

Thank you!

James

 

 

 



AW: Howto verify that update is "in-place"

2017-10-17 Thread James
Hi Emir and Amrit, thanks for your responses!

@Emir: Nice idea, but after changing any document in any way and committing 
the changes, all doc counters (Num, Max, Deleted) are still the same; the only 
thing that changes is the version (it increases in steps of 2).
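For completeness, Emir's check can be sketched as a small classifier over the core stats: an atomic update re-adds the document (so maxDoc or deletedDocs should grow until a merge), while an in-place update only rewrites docValues (so the counters stay flat). The numbers are illustrative, and as the observation above shows, this heuristic is best-effort and may not move in every setup.

```python
def classify_update(before, after):
    """before/after: dicts with numDocs, maxDoc, deletedDocs from core stats."""
    if after == before:
        return "in-place (or no-op): counters unchanged"
    if after["maxDoc"] > before["maxDoc"] or after["deletedDocs"] > before["deletedDocs"]:
        return "atomic: document was re-indexed"
    return "unclear"

stats = {"numDocs": 5011, "maxDoc": 5012, "deletedDocs": 1}
print(classify_update(stats, stats))
print(classify_update(stats, {"numDocs": 5011, "maxDoc": 5013, "deletedDocs": 2}))
```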

@Amrit: Are you saying that the _version_ field should not change when 
performing an atomic update operation?

Thanks
James


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] 
Sent: Tuesday, 17 October 2017 11:35
To: solr-user@lucene.apache.org
Subject: Re: Howto verify that update is "in-place"

Hi James,

Each update you are doing via an atomic operation contains the "id" / 
"uniqueKey". Comparing the "_version_" field value for one of them would be 
fine for a batch. For the rest, Emir has listed them out.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović < 
emir.arnauto...@sematext.com> wrote:

> Hi James,
> I did not try, but checking max and num doc might give you info if 
> update was in-place or atomic - atomic is reindexing of existing doc 
> so the old doc will be deleted. In-place update should just update doc 
> values of existing doc so number of deleted docs should not change.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 17 Oct 2017, at 09:57, James  wrote:
> >
> > I am using Solr 6.6 and carefully read the documentation about 
> > atomic and in-place updates. I am pretty sure that everything is set 
> > up as it
> should.
> >
> >
> >
> > But how can I make certain that a simple update command actually
> performs an
> > in-place update without internally re-indexing all other fields?
> >
> >
> >
> > I am issuing this command to my server:
> >
> > (I am using implicit document routing, so I need the "Shard" 
> > parameter.)
> >
> >
> >
> > {
> >
> > "ID":1133,
> >
> > "Property_2":{"set":124},
> >
> > "Shard":"FirstShard"
> >
> > }
> >
> >
> >
> >
> >
> > The log outputs:
> >
> >
> >
> > [...]

AW: Howto verify that update is "in-place"

2017-10-17 Thread James
I found a solution which works for me:

1. Add a document with very little tokenized text and write down QTime (for me: 5 ms).
2. Add another document with very much text (I used about 1 MB of Lorem Ipsum sample text) and write down QTime (for me: 70 ms).
3. Perform an update operation on document 2 (the one you want to test for being "in-place") and compare QTime.

For me it was again 70 ms, so I assume that my operation did re-index the whole 
document and was thus not an in-place update.
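The decision rule above can be written down explicitly: if updating the large document costs about as much as re-adding it, assume a full re-index; if it costs about as much as adding the tiny document, assume an in-place update. The midpoint threshold is an ad-hoc choice of mine, and wall-clock QTime is of course noisy.

```python
def looks_in_place(qtime_small_add, qtime_large_add, qtime_update):
    """Heuristic: an in-place update should be much cheaper than a re-add."""
    midpoint = (qtime_small_add + qtime_large_add) / 2
    return qtime_update < midpoint

print(looks_in_place(5, 70, 70))   # the numbers above: not in-place
print(looks_in_place(5, 70, 6))    # would indicate an in-place update
```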


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] 
Sent: Tuesday, 17 October 2017 12:43
To: solr-user@lucene.apache.org
Subject: Re: Howto verify that update is "in-place"

James,

@Amrit: Are you saying that the _version_ field should not change when
> performing an atomic update operation?


It should change; a new version will be allotted to the document. I am not that 
sure about in-place updates; probably a test run will verify that.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 4:06 PM, James  wrote:

> Hi Emir and Amrit, thanks for your responses!
>
> @Emir: Nice idea, but after changing any document in any way and committing 
> the changes, all doc counters (Num, Max, Deleted) are still the same; the 
> only thing that changes is the version (it increases in steps of 2).
>
> @Amrit: Are you saying that the _version_ field should not change when 
> performing an atomic update operation?
>
> Thanks
> James
>
>
> -Original Message-
> From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Sent: Tuesday, 17 October 2017 11:35
> To: solr-user@lucene.apache.org
> Subject: Re: Howto verify that update is "in-place"
>
> Hi James,
>
> Each update you are doing via an atomic operation contains the 
> "id" / "uniqueKey". Comparing the "_version_" field value for one of 
> them would be fine for a batch. For the rest, Emir has listed them out.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović < 
> emir.arnauto...@sematext.com> wrote:
>
> > Hi James,
> > I did not try, but checking max and num doc might give you info if 
> > update was in-place or atomic - atomic is reindexing of existing doc 
> > so the old doc will be deleted. In-place update should just update 
> > doc values of existing doc so number of deleted docs should not change.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> > Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 17 Oct 2017, at 09:57, James  wrote:
> > >
> > > [...]

No in-place updates with router.field set

2017-10-19 Thread James
Steps to reproduce:

1. Use Solr in SolrCloud mode.
2. Create a collection with implicit routing and router.field set to some field, e.g. "routerfield".
3. Index a very small document. Stop time -> X
4. Index a very large document. Stop time -> Y
5. Apply an update to the large document. Note that the update command has at least three entries:

{
  "ID": 1133,
  "Property_2": {"set": 124},
  "routerfield": "FirstShard"
}

QTime of the update will always be closer to Y than to X.

If I repeat these steps without setting router.field while creating the
collection, QTime of the update will be very close to X.
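To make the two setups being compared explicit, here is how the Collections API CREATE calls differ; only the router parameters change. The base URL, collection name, and shard names are placeholders.

```python
from urllib.parse import urlencode

def create_collection_url(name, use_router_field):
    """Build a Collections API CREATE request, with or without router.field."""
    params = {"action": "CREATE", "name": name,
              "shards": "FirstShard,SecondShard"}
    if use_router_field:
        # implicit routing requires naming the shards and the routing field
        params["router.name"] = "implicit"
        params["router.field"] = "routerfield"
    return "http://localhost:8983/solr/admin/collections?" + urlencode(params)

url = create_collection_url("MyCollection", True)
print(url)
```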


From this simple test I conclude that router.field somehow prevents updates
from being performed as in-place updates.
Can anyone confirm? Is this a bug? Would anybody care to open a Jira issue if
necessary?

According to the first comment on
https://issues.apache.org/jira/browse/SOLR-8889, the router.field option is
hardly tested, and there also seem to be other related problems.





BlendedTermQuery for Solr?

2017-10-24 Thread James
 

On my Solr 6.6 server I'd like to use BlendedTermQuery.

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/BlendedTermQuery.html

 

I know it is a Lucene class. Is there a Solr API available to access it? If
not, maybe some workaround?
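As far as I can tell, no stock Solr 6.6 query parser exposes BlendedTermQuery, so the usual route would be a custom QParserPlugin wrapping it. The scoring idea itself is simple enough to sketch: the query rewrites a set of terms so they all share the highest document frequency, so a rare variant is not over-rewarded by idf. The idf formula and numbers below are illustrative, not Lucene's exact implementation.

```python
import math

def idf(doc_freq, num_docs):
    """A BM25-style idf, just for illustration."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def blended_doc_freqs(doc_freqs):
    """Give every blended term the max df, mimicking the blended rewrite."""
    return [max(doc_freqs)] * len(doc_freqs)

num_docs = 100_000
dfs = [3, 4_000]   # e.g. a rare misspelling vs. the common form
print([round(idf(df, num_docs), 2) for df in dfs])
print([round(idf(df, num_docs), 2) for df in blended_doc_freqs(dfs)])
```

After blending, the rare term no longer dominates the score purely because of its tiny document frequency.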

 

Thanks!



how to avoid OOM while merge index

2012-01-09 Thread James
I am building the Solr index on Hadoop, and at the reduce step I run a task that 
merges the indexes. Each part of the index is about 1 GB, and I have 10 indexes to 
merge together. I always run out of Java heap memory, even though the heap size is 
about 2 GB. I wonder which part uses so much memory, and how can I avoid the OOM 
during the merge process?
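One workaround people use is to merge in smaller groups rather than opening all 10 parts at once, so fewer readers and buffers are live at any time (with Lucene that would mean calling IndexWriter.addIndexes on a few directories at a time). A toy sketch of the grouping idea, with sorted term lists standing in for index parts:

```python
import heapq

def merge_in_groups(parts, group_size=2):
    """Repeatedly merge `parts` a few at a time instead of all at once."""
    while len(parts) > 1:
        parts = [sorted(heapq.merge(*parts[i:i + group_size]))
                 for i in range(0, len(parts), group_size)]
    return parts[0]

segments = [["apple", "solr"], ["hadoop", "index"], ["merge", "oom"]]
print(merge_in_groups(segments))
```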


Re:Re: how to avoid OOM while merge index

2012-01-09 Thread James
Since the Hadoop task monitor checks each task and kills it when it finds it 
consuming too much memory, I currently want to find a method to decrease the 
memory usage on the Solr side. Any ideas?
At 2012-01-09 17:07:09,"Tomas Zerolo"  wrote:
>On Mon, Jan 09, 2012 at 01:29:39PM +0800, James wrote:
>> I am build the solr index on the hadoop, and at reduce step I run the task 
>> that merge the indexes, each part of index is about 1G, I have 10 indexes to 
>> merge them together, I always get the java heap memory exhausted, the heap 
>> size is about 2G  also. I wonder which part use these so many memory. And 
>> how to avoid the OOM during the merge process.
>
>There are three issues in there. You should first try to find out which
>one it is (it's not clear to me based on your question):
>
>  - Java heap memory: you can set that as a start option of the JVM.
>You set the maximum with the -Xmxn start option. You get an
>OutOfMemory exception if you reach that (no idea wheter the
>SOLR code bubbles this up, but there are experts on that here).
>  - Operating system limit: you can set the limit for a process's
>use of resources (memory, among others). Typically, Linux based
>systems are shipped with unlimited memory setting; Ralf already
>posted how to check/set that.
>The situation here is a bit complicated, because there are
>different limits (memory size vs. virtual memory size, mainly)
>and they are exercised differently depending on the allocation
>pattern. Anyway, I'd expect malloc() returning NULL in this
>case and the Java runtime translating it (again) into an OutOfMemory
>exception.
>  - Now the OOM killer is quite another kettle of fish. AFAIK, it's
>Linux-specific. Once the global system memory is more-or-less
>exhausted, the kernel kills some applications to try to improve
>the situation. There's some heuristic in deciding which application
>to kill, and there are some knobs to help the kernel in this
>decision. I'd recommend [1]; after reading *that* you know all :-)
>You know you've run into that by looking at the system log.
>
>
>[1] <https://lwn.net/Articles/317814/>
>-- 
>Tomás Zerolo
>Axel Springer AG
>Axel Springer media Systems
>BILD Produktionssysteme
>Axel-Springer-Straße 65
>10888 Berlin
>Tel.: +49 (30) 2591-72875
>tomas.zer...@axelspringer.de
>www.axelspringer.de
>
>Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
>Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
>Vorstand: Dr. Mathias Döpfner (Vorsitzender)
>Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele



Re:Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-08 Thread James
But Solr does not have an in-memory index, am I right?





At 2012-02-08 16:17:49,"Ted Dunning"  wrote:
>This is true with Lucene as it stands.  It would be much faster if there
>were a specialized in-memory index such as is typically used with high
>performance search engines.
>
>On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog  wrote:
>
>> Experience has shown that it is much faster to run Solr with a small
>> amount of memory and let the rest of the ram be used by the operating
>> system "disk cache". That is, the OS is very good at keeping the right
>> disk blocks in memory, much better than Solr.
>>
>> How much RAM is in the server and how much RAM does the JVM get? How
>> big are the documents, and how large is the term index for your
>> searches? How many documents do you get with each search? And, do you
>> use filter queries- these are very powerful at limiting searches.
>>
>> 2012/2/7 James :
>> > Is there any practice to load index into RAM to accelerate solr
>> performance?
>> > The over all documents is about 100 million. The search time around
>> 100ms. I am seeking some method to accelerate the respond time for solr.
>> > Just check that there is some practice use SSD disk. And SSD is also
>> cost much, just want to know is there some method like to load the index
>> file in RAM and keep the RAM index and disk index synchronized. Then I can
>> search on the RAM index.
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>


Documentation Slop (DisMax parser)

2018-01-18 Thread James
Hi:

 

There seems to be an error in the documentation about the slop parameter ps
used by the eDisMax parser. It reads:

 

 

"This means that if the terms "foo" and "bar" appear in the document with
less than 10 terms between each

other, the phrase will match."

 

 

Counterexample:

"Foo one two three four five six seven eight nine bar" will not match with
ps=10

 

It seems that it must be "less than 9".

 

 

However, when more query terms are used it gets complicated when one tries
to count words in between.

 

 

Easier to understand (and correct according to my testing) would be
something like:

 

"This means that if the terms "foo" and "bar" appear in the document within
a group of 10 or less terms, the phrase will match. For example the doc that
says:

*Foo* term1 term2 term3 *bar*

will match the phrase query. A document that says

*Foo* term1 term2 term3 term4 term5 term6 term7 term8 term9 *bar* 

will not (because the search terms are within a group of 11 terms).

Note: If any search term is a MUST-NOT term, the phrase slop query will
never match.

"
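To make the counting in the proposed wording concrete, here is a small helper (my own illustration, not Solr code) that computes the group size for a two-term phrase; which ps values should match then reduces to comparing against this number.

```python
def group_size(doc_terms, first, second):
    """Smallest window (inclusive) containing both terms, or None if absent."""
    positions_a = [i for i, t in enumerate(doc_terms) if t == first]
    positions_b = [i for i, t in enumerate(doc_terms) if t == second]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b) + 1

doc = "foo one two three four five six seven eight nine bar".split()
print(group_size(doc, "foo", "bar"))   # foo + 9 terms in between + bar = 11
```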

 

 

Anybody willing to review and change to documentation?

 

Thanks,

James

 

 



RE: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Teague James
Hi Evert,

I recently needed help with phrase highlighting and was pointed to the 
FastVectorHighlighter which worked out great. I just made a change to the 
configuration to add generateWordParts="0" and generateNumberParts="0" so that 
searches for things like "1a" would get highlighted correctly. You may or may 
not need that feature. You can always remove them or change the value to "1" to 
switch them on explicitly. Anyway, hope this helps!

solrconfig.xml (partial snip)


xml
explicit
10
documentText
on
text
true
100





schema.xml (partial snip)

   




















-Teague

From: Evert R. [mailto:evert.ra...@gmail.com] 
Sent: Tuesday, December 15, 2015 6:25 AM
To: solr-user@lucene.apache.org
Subject: Solr Basic Configuration - Highlight - Begginer

Hi there!

It's my first installation; I'm not sure if this is the right channel...

Here are my steps:

1. Set up a basic install of solr 5.4.0

2. Create a new core through command line (bin/solr create -c test)

3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)

4. Query over the browser: it brings back the correct search results, but it 
does not show the part of the text I am querying, i.e. the highlight.

   I have already flagged the 'hl' option, but still it does not work...

Example: I am looking for the word 'peace' in my pdf file (a book). I have 4 
matches for this word; it shows me the book name (the pdf file) but does not 
bring back which part of the text contains the word peace.


I am probably missing some configuration in schema.xml, which is missing from 
my folder /solr/server/solr/test/conf/

Or even the solrconfig.xml...

I have read a bunch of things about highlighting, checked these files, and 
copied the standard schema.xml to my core/conf folder, but still it does not 
bring back the highlight.


Attached a copy of my solrconfig.xml file.


I am very sorry for this probably dumb and too basic question... It is the 
first time I have seen Solr live.


Any help will be appreciated.



Best regards,


Evert Ramos

mailto:evert.ra...@gmail.com




RE: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Teague James
Sorry to hear that didn't work! Let me ask a couple of questions...

Have you tried the analyzer inside of the Admin Interface? It has helped me 
sort out a number of highlighting issues in the past. To access it, go to your 
Admin interface, select your core, then select Analysis from the list of 
options on the left. In the analyzer, enter the term you are indexing (in other 
words, the term in the document that you expect to get a hit on) in the left and 
right input fields. Select the field that it is destined for (in your case that 
would be 'content'), then hit Analyze. Helps if you have a big screen!

This will show you the impact of the various filter factories that you have 
engaged and their effect on whether or not a 'hit' is being generated. Hits are 
identified by a very faint highlight. (PSST... Developers... It would be really 
cool if the highlight color were more visible or customizable... Thanks y'all) 
If it looks like you're getting hits, but not getting highlighting, then open 
up a new tab with the Admin's query interface. Same place on the left as the 
analyzer. Replace the "*:*" with your search term (assuming you already indexed 
your document) and if necessary you can put something in the FQ like 
"id:123456" to target a specific record.

Did you get a hit? If no, then it's not highlighting that's the issue. If yes, 
then try dumping this in your address bar (using your URL/IP, search term, and 
core name of course. The fq= is an example) :
http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]";

That will dump Solr's output to your browser where you can see exactly what is 
getting hit.
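If you end up building that URL often, it helps to generate it; the base URL, core name, and id filter below are placeholders exactly as in the example, and I add hl=on as an assumption so the highlighting section is populated.

```python
from urllib.parse import urlencode

def select_url(base, core, term, doc_id=None):
    """Build a /select URL with a quoted search term and optional id filter."""
    params = {"q": f'"{term}"', "hl": "on"}
    if doc_id is not None:
        params["fq"] = f"id:{doc_id}"   # narrow to one record
    return f"{base}/solr/{core}/select?" + urlencode(params)

url = select_url("http://localhost:8983", "test", "peace", doc_id=123456)
print(url)
```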

Hope that helps! Let me know how it goes. Good luck.

-Teague

-Original Message-
From: Evert R. [mailto:evert.ra...@gmail.com] 
Sent: Wednesday, December 16, 2015 1:46 PM
To: solr-user 
Subject: Re: Solr Basic Configuration - Highlight - Begginer

Hi Teague!

I configured solrconfig.xml and schema.xml exactly the way you did, only 
substituting the word 'documentText' with 'content' as used by the techproducts 
sample. I reindexed with:

curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result: no highlighting in the response, as below:

"highlighting": { "pdf1": {} }

=(

I really do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be 
missing something, please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James :

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the 
> FastVectorHighlighter which worked out great. I just made a change to 
> the configuration to add generateWordParts="0" and 
> generateNumberParts="0" so that searches for things like "1a" would 
> get highlighted correctly. You may or may not need that feature. You 
> can always remove them or change the value to "1" to switch them on 
> explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> 
> 
> xml
> explicit
> 10
> documentText
> on
> text
> true
> 100
> 
> 
> 
> 
>
> schema.xml (partial snip)
> required="true" multiValued="false" />
> multivalued="true" termVectors="true" termOffsets="true"
> termPositions="true" />
>
>  positionIncrementGap="100">
> 
> 
>  words="stopwords.txt" />
>  catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> generateWordParts="0" />
>  synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
> 
> 
> 
> 
>  catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>  words="stopwords.txt" />
> 
> 
> 
> 
>
> -Teague
>
> From: Evert R. [mailto:evert.ra...@gmail.com]
> Sent: Tuesday, December 15, 2015 6:25 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Basic Configuration - Highlight - Begginer
>
> Hi there!
>
> It´s my f

RE: DIH Caching w/ BerkleyBackedCache

2015-12-16 Thread Dyer, James
Todd,

I have no idea if this will perform acceptably with so many multiple values. I 
doubt the solr/patch code was really optimized for such a use case. In my 
production environment, I have je-6.2.31.jar on the classpath. I don't think 
I've tried it with other versions.

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Teague James
 is being matched (probably
> > something like "text") and then try highlighting on _that_ field. Try
> > adding "debug=query" to the URL and look at the "parsed_query" section
> > of the return and you'll see what field(s) is/are actually being
> > searched against.
> >
> > NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
> >
> > As to why "nietava" isn't being found in the content field, probably
> > you have some kind of analysis chain configured for that field that
> > isn't searching as you expect. See the admin/analysis page for some
> > insight into why that would be. The most frequent reason is that the
> > field is a "string" type which is not broken up into words. Another
> > possibility is that your analysis chain is leaving in the quotes or
> > something similar. As James says, looking at admin/analysis is a good
> > way to figure this out.
> >
> > I still strongly recommend you go from the stock techproducts example
> > and get familiar with how Solr (and highlighting) work before jumping
> > in and changing things. There are a number of ways things can be
> > mis-configured and trying to change several things at once is a fine
> > way to go mad. The admin UI>>schema browser is another way you can see
> > what kind of terms are _actually_ in your index in a particular field.
> >
> > Best,
> > Erick
> >
> >
> >
> >
> > On Wed, Dec 16, 2015 at 12:26 PM, Teague James  >
> > wrote:
> > > [...]

RE: Spellcheck response format differs between a single core and SolrCloud

2016-01-11 Thread Dyer, James
Ryan,

The json response format changed for Solr 5.0.  See 
https://issues.apache.org/jira/browse/SOLR-3029 .  Is the single-core solr 
running a 4.x version with the cloud solr running 5.x ?  If they are both on 
the same major version, then we have a bug.

James Dyer
Ingram Content Group


-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Monday, January 11, 2016 12:32 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck response format differs between a single core and SolrCloud

Hello,

I am using the spellcheck component for spelling suggestions and I've used
the same configurations in two separate projects, the only difference is
one project uses a single core and the other is a collection on SolrCloud
with three shards. The single core has about 56K docs and the one on
SolrCloud has 1M docs. Strangely, the format of the response is slightly
different between the two and I'm not sure why (particularly the collations
part). Was wondering if any can shed some light on this? Below is my
configuration and the results I'm getting.

This is in my "/select" searchHandler:


<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>

And my spellcheck component:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spelling</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

Examples of each output can be found here:
https://gist.github.com/ryac/ceff8da00ec9f5b84106

Thanks,
Ryan


RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-21 Thread Dyer, James
But if you really need more than 2 edits, I think IndexBasedSpellChecker 
supports it.

James Dyer
Ingram Content Group

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, January 21, 2016 11:29 AM
To: solr-user
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

bq: ...is anyway to increase that maxEdit

IIUC, increasing maxEdit beyond 2 increases the space/time required
unacceptably, that limit is there on purpose, put there by people who
know their stuff.

Best,
Erick

On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki  wrote:
> I am using Solr for spell correction. Solr is limited to a maxEdit of 2. Is
> there any way to increase that maxEdit without using phonetic mapping?
> Please any suggestions



RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-22 Thread Dyer, James
See the old docs at 
https://wiki.apache.org/solr/SpellCheckComponent#Configuration

In particular, you need this line in solrconfig.xml:

<str name="spellcheckIndexDir">./spellchecker</str>
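For reference, the full spellchecker definition from that wiki page looks roughly like this (a sketch; `spell` stands in for whatever field you build the dictionary from):

```xml
<lst name="spellchecker">
  <str name="name">index</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <!-- field whose indexed terms become the dictionary -->
  <str name="field">spell</str>
  <!-- sidecar index location, relative to the data dir -->
  <str name="spellcheckIndexDir">./spellchecker</str>
  <str name="buildOnCommit">true</str>
</lst>
```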


James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, January 22, 2016 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

OK, but IndexBasedSpellChecker needs a directory where its index is stored
to do spell checking. I don't have any experience with
IndexBasedSpellChecker. If you could send me a sample configuration, it
would help me. Thanks

On Fri, Jan 22, 2016 at 1:45 AM Dyer, James 
wrote:

> But if you really need more than 2 edits, I think IndexBasedSpellChecker
> supports it.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, January 21, 2016 11:29 AM
> To: solr-user
> Subject: Re: How get around solr's spellcheck maxEdit limit of 2?
>
> bq: ...is anyway to increase that maxEdit
>
> IIUC, increasing maxEdit beyond 2 increases the space/time required
> unacceptably, that limit is there on purpose, put there by people who
> know their stuff.
>
> Best,
> Erick
>
> On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki 
> wrote:
> > I am using Solr for spell correction. Solr is limited to a maxEdit of 2.
> > Is there any way to increase that maxEdit without using phonetic mapping?
> > Please any suggestions
>
>


unmerged index segments

2016-01-25 Thread James Mason
Hi,

I have a large index that has been added to over several years, and I’ve 
discovered that I have many segments that haven’t been updated for well over a 
year, even though I’m adding, updating and deleting records daily. My five 
largest segments all haven’t been updated for over a year.

Meanwhile, the number of segments I have keeps on increasing, and I have 
hundreds of segment files that don’t seem to be getting merged past a certain 
size (e.g. the largest is 2Gb but my older segments are over 100Gb).

My understanding was that background merges should be merging these older 
segments with newer data over time, but this doesn’t seem to be the case.

I’m using Solr 4.9, but I was using an older version at the time that these 
‘older’ segments were created. 

Any help or suggestions on what’s happening would be very much appreciated, as 
would any suggestion on how I can monitor what’s happening with the background 
merges.

Thanks,

James

Re: unmerged index segments

2016-01-26 Thread James Mason
Hi Jack,

Sorry, I should have put them on my original message.

All merge policy settings are at their default except mergeFactor, which I now 
notice is quite high at 45. Unfortunately I don’t have the full history to see 
when this setting was changed, but I do know they haven’t been changed for well 
over a year, and that we did originally run Solr using the default settings.

So reading about mergeFactor it sounds like this is likely the problem, and 
we’re simply not asking Solr to merge into these old and large segments yet?

If I were to change this back down to the default of 10, would you expect we’d 
get quite an immediate and intense period of merging? 

If I were to launch a duplicate test Solr instance, change the merge factor, 
and simply leave it for a few days, would it perform the background merge (so I 
can test to see if there’s enough memory etc for the merge to complete?).
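If it helps, an explicit TieredMergePolicy block in solrconfig.xml (Solr 4.x syntax; the values shown are just the usual defaults, not a recommendation for your index) would look something like:

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- both knobs roughly correspond to the old mergeFactor -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```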

Thanks,

James



> On 25 Jan 2016, at 21:39, Jack Krupansky  wrote:
> 
> What exacting are you merge policy settings in solrconfig? They control
> when the background merges will be performed. Sometimes they do need to be
> tweaked.
> 
> -- Jack Krupansky
> 
> On Mon, Jan 25, 2016 at 1:50 PM, James Mason 
> wrote:
> 
>> Hi,
>> 
>> I have a large index that has been added to over several years, and
>> I’ve discovered that I have many segments that haven’t been updated for
>> well over a year, even though I’m adding, updating and deleting records
>> daily. My five largest segments all haven’t been updated for over a year.
>> 
>> Meanwhile, the number of segments I have keeps on increasing, and I have
>> hundreds of segment files that don’t seem to be getting merged past a
>> certain size (e.g. the largest is 2Gb but my older segments are over 100Gb).
>> 
>> My understanding was that background merges should be merging these older
>> segments with newer data over time, but this doesn’t seem to be the case.
>> 
>> I’m using Solr 4.9, but I was using an older version at the time that
>> these ‘older’ segments were created.
>> 
>> Any help on suggestions of what’s happening would be very much
>> appreciated. And also any suggestion on how I can monitor what’s happening
>> with the background merges.
>> 
>> Thanks,
>> 
>> James



RE: Solr spell check mutliwords

2015-07-30 Thread Dyer, James
Talha,

In your configuration, you have this set:

<int name="spellcheck.maxResultsForSuggest">5</int>

...which means it will consider the query "correctly spelled" and offer no 
suggestions if there are 5 or more results. You could omit this parameter and 
it will always suggest when possible.  

Possibly, a better option would be to add "spellcheck.collateParam.mm=100%" (or, 
if you use the q.op parameter, "spellcheck.collateParam.q.op=AND"), so that when 
testing collations against the index, all the terms are required to match something.  See 
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX for 
more information.
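For example, a request along these lines (hypothetical host, core, and terms) would test collations with all terms required to match:

```
http://localhost:8983/solr/products/select?q=simphony+mbile
    &spellcheck=true
    &spellcheck.collate=true
    &spellcheck.maxCollationTries=5
    &spellcheck.collateParam.mm=100%25
```

Note the `%25` is just `%` URL-encoded; in solrconfig.xml defaults you would write `100%` literally.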

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, July 22, 2015 9:34 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check mutliwords

I could not figure out why my configured Solr spell checker is not giving the
desired output. In my indexed data, the query symphony+mobile has around
3.5K+ docs and the spell checker detects it as correctly spelled. When I
misspell "symphony" in the query symphony+mobile, it shows only results for
"mobile" and the spell checker still detects this query as correctly spelled. I
have searched this query in different combinations. Please find the search result stats:

Query: symphony 
ResultFound: 1190
SpellChecker: correctly spelled

Query: mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: simphony
ResultFound: 0
SpellChecker: symphony 
Collation Hits: 1190

Query: symphony+mobile
ResultFound: 3585
SpellChecker: correctly spelled 

Query: simphony+mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: symphony+mbile
ResultFound: 1190
SpellChecker: correctly spelled 

In the last two queries it should suggest something for the misspelled words
"simphony" and "mbile".

Please find my configuration below. Only the spell check configuration is given.

solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">product_name</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_suggest</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggest</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="field">suggest</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">5</int>
  </lst>
</searchComponent>

schema.xml

  
  





  
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-mutliwords-tp4218580.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Possibly this english-specific analysis in your "text_suggest" field is 
interfering:  solr.EnglishPossessiveFilterFactory ?

Another guess is you're receiving more than 5 results and 
"maxResultsForSuggest" is set to 5.

But I'm not sure.  Maybe someone can help with more information from you?

Can you provide a few document examples that have Bangla text, then the full 
query request with a misspelled Bangla word (from the document examples you 
provide), then the full spellcheck response, and the total # of documents 
returned ? 

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 5:20 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check not showing any suggestions for other language

Solr spell check is not showing any suggestions for another language. I have
indexed multiple languages (English and Bangla) in the same core. It shows
suggestions for a wrongly spelt English word, but for a wrongly spelt
Bangla word it returns "correctlySpelled = false" without showing any
suggestions.

Please check my configuration for spell check below

solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">product_name</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_suggest</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggest</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="field">suggest</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">5</int>
  </lst>
</searchComponent>

schema.xml


  





  
  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Can you try putting your queried keyword in "spellcheck.q" ?

James Dyer
Ingram Content Group


-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 10:13 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr spell check not showing any suggestions for other language

Dear James

Thank you for your reply.

I tested the analyser without “solr.EnglishPossessiveFilterFactory” but still had
no luck. I also updated the analyser; please find it below.


  



  



With the above configuration for “text_suggest” I got the following results.

For the correct Bangla word সহজ the Solr response is
(note: I set rows to 0 to skip results):


  0
  2
  
সহজ
true
0
xml
1438787238383
  




  
true
  



For an incorrect Bangla word, সহগ, where I just changed the last letter, the Solr
response is:


  0
  7
  
সহগ
true
0
xml
1438787208052
  




  
false
  






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950p4221033.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: exclude folder in dataimport handler.

2015-08-20 Thread Dyer, James
I took a quick look at FileListEntityProcessor#init, and it looks like it 
applies the "excludes" regex to the filename element of the path only, and not 
to the directories.

If your filenames do not have a naming convention that would let you use it 
this way, you might be able to write a transformer to get what you want.
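As a rough sketch of that transformer idea (untested; it relies on the implicit fileAbsolutePath column that FileListEntityProcessor populates, and on the standard $skipDoc flag):

```xml
<script><![CDATA[
  function skipTemplates(row) {
    var path = row.get('fileAbsolutePath');
    // flag rows under the excluded folder so DIH skips them
    if (path != null && path.indexOf('templatedata') >= 0) {
      row.put('$skipDoc', 'true');
    }
    return row;
  }
]]></script>
```

The function would then be attached to the entity with transformer="script:skipTemplates".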

James Dyer
Ingram Content Group


-Original Message-
From: coolmals [mailto:coolm...@gmail.com] 
Sent: Thursday, August 20, 2015 12:57 PM
To: solr-user@lucene.apache.org
Subject: exclude folder in dataimport handler.

I am importing files from my file system and want to exclude files in a
folder called templatedata from the import. How do I configure that in the entity?
excludes="templatedata" doesn't seem to work.

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-25 Thread Dyer, James
Max,

If you know the entire list of words you want to spellcheck against, you can 
use FileBasedSpellChecker.  See 
http://wiki.apache.org/solr/FileBasedSpellChecker .

If, however, you have a field you want to spellcheck against but also want 
additional words added, consider using a copy of the field for spellcheck 
purposes, and then index the additional terms to that field.   You may be able 
to accomplish this easily, for instance, by using index-time synonyms in the 
analysis chain for the spellcheck field.  Or you could just append them to any 
document (more than once if you want to boost the term frequency).
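A minimal FileBasedSpellChecker definition along the lines of that wiki page (file and directory names here are placeholders):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <!-- plain-text word list, one term per line -->
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>
```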

Keep in mind that while this will work fine for regular word-by-word spell 
suggestions, collations are not going to work well with these approaches.

James Dyer
Ingram Content Group

-Original Message-
From: Max Chadwick [mailto:mpchadw...@gmail.com] 
Sent: Monday, August 24, 2015 9:43 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck / Suggestions : Append custom dictionary to SOLR default 
index

Is there a way to append a set of words to the out-of-the-box Solr index when
using the spellcheck / suggestions feature?


RE: String index out of range exception from Spell check

2015-09-28 Thread Dyer, James
This looks similar to SOLR-4489, which is marked fixed for version 4.5.  If 
you're using an older version, the fix is to upgrade.  

Also see SOLR-3608, which is similar but here it seems as if the user's query 
is more than spellcheck was designed to handle.  This should still be looked at 
and possibly we can come up with a way to handle these cases.

A way to work around these bugs is to strip your query down to raw terms, 
separated by spaces, and use "spellcheck.q" with the raw terms only.
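For example (the field names here are invented), instead of letting spellcheck parse the full query:

```
q=title:(simphony AND mbile)&spellcheck=true&spellcheck.q=simphony mbile
```

Spellcheck then only sees the two raw terms, not the field prefixes and operators.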

James Dyer
Ingram Content Group


-Original Message-
From: davidphilip cherian [mailto:davidphilipcher...@gmail.com] 
Sent: Sunday, September 27, 2015 3:50 PM
To: solr-user@lucene.apache.org
Subject: String index out of range exception from Spell check

There are irregular exceptions from spell check component. Below is the
stack trace. This is not common for all the q terms but have often seen
them occurring for specific queries after enabling spellcheck.collate
method.



String index out of range: -3



java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789) at
java.lang.StringBuilder.replace(StringBuilder.java:266) at
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
at
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
at
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497) at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:722)



500


Re: highlighting

2015-10-01 Thread Teague James
Hi everyone!

Pardon if it's not proper etiquette to chime in, but that feature would solve 
some issues I have with my app for the same reason. We are using markers now 
and it is very clunky - particularly with phrases and certain special 
characters. I would love to see this feature too Mark! For what it's worth - up 
vote. Thanks!

Cheers!

-Teague James

> On Oct 1, 2015, at 6:12 PM, Koji Sekiguchi  
> wrote:
> 
> Hi Mark,
> 
> I think I saw similar requirement recently in mailing list. The feature 
> sounds reasonable to me.
> 
> > If not, how do I go about posting this as a feature request?
> 
> JIRA can be used for the purpose, but there is no guarantee that the feature 
> is implemented. :(
> 
> Koji
> 
>> On 2015/10/01 20:07, Mark Fenbers wrote:
>> Yeah, I thought about using markers, but then I'd have to search the 
>> text for the markers to
>> determine the locations.  This is a clunky way of getting the results I 
>> want, and it would save two
>> steps if Solr merely had an option to return a start/length array (of what 
>> should be highlighted) in
>> the original string rather than returning an altered string with tags 
>> inserted.
>> 
>> Mark
>> 
>>> On 9/29/2015 7:04 AM, Upayavira wrote:
>>> You can change the strings that are inserted into the text, and could
>>> place markers that you use to identify the start/end of highlighting
>>> elements. Does that work?
>>> 
>>> Upayavira
>>> 
>>>> On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:
>>>> Greetings!
>>>> 
>>>> I have highlighting turned on in my Solr searches, but what I get back
>>>> is  tags surrounding the found term.  Since I use a SWT StyledText
>>>> widget to display my search results, what I really want is the offset
>>>> and length of each found term, so that I can highlight it in my own way
>>>> without HTML.  Is there a way to configure Solr to do that?  I couldn't
>>>> find it.  If not, how do I go about posting this as a feature request?
>>>> 
>>>> Thanks,
>>>> Mark
> 


RE: Spell Check and Privacy

2015-10-12 Thread Dyer, James
Arnon,

Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a 
non-zero value.  This will give you re-written queries that are guaranteed to 
return hits, given the original query and filters.  If you are using an "mm" 
value other than 100%, you will also want to specify 
"spellcheck.collateParam.mm=100%". (or if using "q.op=OR", then use 
"spellcheck.collateParam.q.op=AND")

Of course, the first section of the spellcheck result will still show every 
possible suggestion, so your client needs to discard these and not divulge them 
to the user.  If you need to know word-by-word how the collations were 
constructed, then specify "spellcheck.collateExtendedResults=true".  Use the 
extended collation results for this information and not the first section of 
the spellcheck results.

This is all fairly well-documented on the old solr wiki:  
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate
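Concretely, the approach above might look like this as handler defaults (a sketch, not your exact config):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.collate">true</str>
  <!-- re-test each candidate rewrite against the index before returning it -->
  <int name="spellcheck.maxCollationTries">10</int>
  <str name="spellcheck.collateParam.mm">100%</str>
  <!-- word-by-word details of how each collation was built -->
  <str name="spellcheck.collateExtendedResults">true</str>
</lst>
```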

James Dyer
Ingram Content Group

-Original Message-
From: Arnon Yogev [mailto:arn...@il.ibm.com] 
Sent: Monday, October 12, 2015 2:33 AM
To: solr-user@lucene.apache.org
Subject: Spell Check and Privacy

Hi,

Our system supports many users from different organizations and with 
different ACLs. 
We consider adding a spell check ("did you mean") functionality using 
DirectSolrSpellChecker. However, a privacy concern was raised, as this 
might lead to private information being revealed between users via the 
suggested terms. Using the FileBasedSpellChecker is another option, but 
naturally a static list of terms is not optimal.

Is there a best practice or a suggested method for these kind of cases?

Thanks,
Arnon



RE: File-based Spelling

2015-10-13 Thread Dyer, James
Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.
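For example (hypothetical host, core, and dictionary name), a test query along these lines should show whether the dictionary loaded:

```
http://localhost:8983/solr/core1/select?q=*:*&spellcheck=true
    &spellcheck.q=fenbers&spellcheck.dictionary=file
```

If the dictionary built correctly, words within one or two edits of something in the file should come back as suggestions.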

James Dyer
Ingram Content Group


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov] 
Sent: Monday, October 12, 2015 2:38 PM
To: Solr User Group
Subject: File-based Spelling

Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and I see nothing to 
suggest any sort of problem/error in solr.log.  However, in my 
./data/spFile/ directory, there are only two files: segments_2 with only 
71 bytes in it, and a zero-byte write.lock file.  For a source 
dictionary having 480,000 words in it, I was expecting a bit more 
substance in the ./data/spFile directory.  Something doesn't seem right 
with this.

Moreover, I ran a query on the word Fenbers, which isn't listed in the 
linux.words file, but there are several similar words.  The results I 
got back were odd, and suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

But I expected suggestions like fenders, embers, and fenberry, etc. I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I played with configurables like 
changing the fieldType from text_en to string and the characterEncoding 
from UTF-8 to ASCII, etc., but nothing seemed to yield any different 
results.

Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!

Thanks,
Mark


RE: DIH parallel processing

2015-10-15 Thread Dyer, James
Nabil,

What we do is have multiple dih request handlers configured in solrconfig.xml.  
Then in the sql query we put something like "where mod(id, ${partition})=0".  
Then an external script calls a full import on each request handler at the same 
time and monitors the response.  This isn't the most elegant solution but it 
gets around the fact that DIH is single-threaded.
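As a rough sketch (table, column, and partition count are made up), each of the parallel handlers gets its own data-config whose root entity selects one partition:

```xml
<!-- data-config for handler /dataimport-0 (one of 4 partitions) -->
<entity name="item" pk="id"
        query="SELECT id, title FROM item WHERE mod(id, 4) = 0"/>
```

The other handlers use mod(id, 4) = 1, 2, and 3, and the external script kicks all four off together.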

James Dyer
Ingram Content Group


-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr] 
Sent: Thursday, October 15, 2015 3:58 AM
To: Solr-user
Subject: DIH parallel processing

Hi All,
I'm using DIH to index more than 15M rows from SQL Server to Solr. This takes
more than 2 hours, and a big part of this time is consumed by fetching data from
the database. I'm thinking about a solution with parallel (threaded) load in the
same DIH, where each thread loads a part of the data.
Do you have any experience with this kind of situation?
Regards, Nabil.


RE: DIH Caching with Delta Import

2015-10-21 Thread Dyer, James
The DIH Cache feature does not work with delta import.  Actually, much of DIH 
does not work with delta import.  The workaround you describe is similar to the 
approach described here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which 
in my opinion is the best way to implement partial updates with DIH.
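The approach on that wiki page boils down to one entity whose query serves both full and partial imports; you run it with command=full-import&clean=false for the partial case (a sketch; the table and column names are placeholders):

```xml
<entity name="item" pk="id"
        query="SELECT * FROM item
               WHERE '${dataimporter.request.clean}' != 'false'
                  OR last_modified &gt; '${dataimporter.last_index_time}'"/>
```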

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, October 20, 2015 8:02 PM
To: solr-user@lucene.apache.org
Subject: DIH Caching with Delta Import

It appears that DIH entity caching (e.g. SortedMapBackedCache) does not work
with deltas... is this simply a bug with the DIH cache support or somehow by
design?

Any ideas on a workaround for this? Ideally, I could just omit the
"cacheImpl" attribute but that leaves the query (using the default processor
in my case) without the appropriate where clause including the "cacheKey"
and "cacheLookup". Should SqlEntityProcessor be smart enough to ignore the
cache with deltas and simply append a where clause which includes the
"cacheKey" and "cacheLookup"? Or possibly just include a where clause which
includes ('${dih.request.command}' = 'full-import' or cacheKey =
cacheLookup)? I suppose those could be used to mitigate the issue but I was
hoping for possibly a better solution.

Any help would be greatly appreciated. Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-with-Delta-Import-tp4235598.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DIH Caching w/ BerkleyBackedCache

2015-11-20 Thread Dyer, James
Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"?  Looking 
at the BerkleyBackedCache code, if this is set to true, it deletes the cache and 
assumes the current update is to fully repopulate it.  If you want to do an 
incremental update to the cache, it needs to be false.  You might also need to 
specify "clean=false", but I'm not sure if this is a requirement.

I've used DIH with BerkleyBackedCache for a few years and it works well for us. 
 But rather than using it inline, we have a number of DIH handlers that just 
build caches, then when they're all built, a final DIH joins data from the 
caches and indexes it to solr.  We also do like you are, with several handlers 
running at once, each doing part of the data.

But I have to warn you this code hasn't been maintained by anyone.  I'm using 
an older DIH jar (4.6) with newer solr.  I think there might have been an api 
change or something that prevented the uncommitted caching code from working 
with newer versions, but I honestly forget.  This is probably a viable solution 
if you don't want to write any code, but it might take some trial and error 
getting it to work.

James Dyer
Ingram Content Group


-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.



URL Encoding on Import

2015-11-25 Thread Teague James
Hi everyone!

Does anyone have any suggestions on how to URL encode URLs that I'm
importing from SQL using the DIH? The importer pulls in something like
"http://www.downloadsite.com/document that is being downloaded.doc" and then
the Tika parser can't download the document because it ends up trying to
access "http://www.downloadsite.com/document" and gets a 404 error. What I
need to do is transform the URL to
"http://www.downloadsite.com/document%20that%20is%20being%20downloaded.doc".
I added a regex transformer to the DIH field, but I have not found a
successful regex to accomplish this. Thoughts? 
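One untested option is DIH's RegexTransformer with a replaceWith rule on the URL column before Tika fetches it (the entity, query, and column names here are placeholders):

```xml
<entity name="docs" transformer="RegexTransformer"
        query="SELECT id, url FROM documents">
  <!-- percent-encode spaces so the download URL resolves -->
  <field column="url" sourceColName="url" regex=" " replaceWith="%20"/>
</entity>
```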

Any advice would be appreciated! Thanks!

-Teague



Help With Phrase Highlighting

2015-12-01 Thread Teague James
Hello everyone,

I am having difficulty enabling phrase highlighting and am hoping someone
here can offer some help. This is what I have currently:

Solr 4.9
solrconfig.xml (partial snip)

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">xml</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <str name="hl">on</str>
    <str name="hl.fl">text</str>
    <str name="hl.encoder">html</str>
    <str name="hl.fragsize">100</str>
  </lst>
</requestHandler>



schema.xml (partial snip)

   

Query (partial snip):
...select?fq=id:43040&q="my%20search%20phrase"

Response (partial snip):
...

ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
assentior. (my search


phrase facilitates highlighting). Et option molestiae referrentur
ius. Viris quaeque legimus an pri


The document in which this phrase is found is very long. If I reduce the
document to a single sentence, such as "My search phrase facilitates
highlighting" then the response I get from Solr is:

My search phrase facilitates highlighting


What I am trying to achieve instead, regardless of the document size, is:
<em>My search phrase</em> with a single indicator at the beginning
and end, rather than three separate words that may get distributed between
two different snippets depending on the placement of the snippet in the
larger document.

I tried to follow this guide:
http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-
search-phrase-only/25970452#25970452 but got zero results. I suspect that
this is due to the hl parameters in my solrconfig file, but I cannot find
any specific guidance on the correct parameters should be. I tried
commenting out all of the hl parameters and also got no results.

Can anyone offer any solutions for searching large documents and returning a
single phrase highlight?

-Teague
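
One application-side workaround, independent of the highlighter used:
collapse adjacent highlight tags in the returned snippet. A sketch assuming
the default <em> tags (names are mine):

```java
public class MergeHighlights {
    // Collapse adjacent per-word highlights such as
    // "<em>my</em> <em>search</em> <em>phrase</em>" into "<em>my search phrase</em>".
    static String merge(String snippet) {
        return snippet.replaceAll("</em>(\\s+)<em>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(merge(
            "<em>My</em> <em>search</em> <em>phrase</em> facilitates highlighting"));
        // <em>My search phrase</em> facilitates highlighting
    }
}
```

This is only a band-aid for snippets where all the words land in one
fragment; it does not fix a phrase that Solr splits across two fragments.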



Re: Help With Phrase Highlighting

2015-12-01 Thread Teague James
Hello,

Thanks for replying! I tried using it in a query string, but without success. 
Should I add it to my solrconfig? If so, are there any other hl parameters that 
are necessary? 

-Teague

> On Dec 1, 2015, at 9:01 PM, Philippe Soares  wrote:
> 
> Hi,
> Did you try hl.mergeContiguous=true ?
> 
> On Tue, Dec 1, 2015 at 3:36 PM, Teague James 
> wrote:
> 
>> Hello everyone,
>> 
>> I am having difficulty enabling phrase highlighting and am hoping someone
>> here can offer some help. This is what I have currently:
>> 
>> Solr 4.9
>> solrconfig.xml (partial snip)
>> 
>>
>>xml
>>explicit
>>10
>>text
>>on
>>text
>>html
>>100
>>
>>
>>
>> 
>> 
>> schema.xml (partial snip)
>>   > required="true" multiValued="false" />
>>   
>> 
>> Query (partial snip):
>> ...select?fq=id:43040&q="my%20search%20phrase"
>> 
>> Response (partial snip):
>> ...
>> 
>> ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta
>> assentior. (my search
>> 
>> 
>> phrase facilitates highlighting). Et option molestiae referrentur
>> ius. Viris quaeque legimus an pri
>> 
>> 
>> The document in which this phrase is found is very long. If I reduce the
>> document to a single sentence, such as "My search phrase facilitates
>> highlighting" then the response I get from Solr is:
>> 
>> My search phrase facilitates highlighting
>> 
>> 
>> What I am trying to achieve instead, regardless of the document size is:
>> My search phrase with a single indicator at the beginning
>> and end rather than three separate words that may get dsitributed between
>> two different snippets depending on the placement of the snippet in te
>> larger document.
>> 
>> I tried to follow this guide:
>> 
>> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-
>> search-phrase-only/25970452#25970452 but got zero results. I suspect that
>> this is due to the hl parameters in my solrconfig file, but I cannot find
>> any specific guidance on the correct parameters should be. I tried
>> commenting out all of the hl parameters and also got no results.
>> 
>> Can anyone offer any solutions for searching large documents and returning
>> a
>> single phrase highlight?
>> 
>> -Teague
> 
> 
> -- 
> [image: GQ Life Sciences, Inc.] <http://www.gqlifesciences.com/>Philippe
> Soares Senior Developer   |  [image: ☎] +1 508 599 3963
> GQ Life Sciences, Inc. www.gqlifesciences.comThis email message and any
> attachments are confidential and may be privileged. If you are not the
> intended recipient, please notify GQ Life Sciences immediately by
> forwarding this message to le...@gqlifesciences.com and destroy all copies
> of this message and any attachments without reading or disclosing their
> contents.


Re: highlight

2015-12-02 Thread Teague James
Hello,

Thanks for replying! Yes, I am storing the whole document. The document is 
indexed with a unique id. There are only 3 fields in the schema - id, 
rawDocument, tikaDocument. Search uses the tikaDocument field. Against this I 
am throwing 2-5 word phrases and getting highlighting matches to each 
individual word in the phrases instead of just the phrase. The highlighted text 
that is matched is read by another application for display in the front end UI. 
Right now my app has logic to figure out that multiple highlights indicate a 
phrase, but it isn't perfect. 

In this case Solr is reporting a single 3 word phrase as 2 hits one with 2 of 
the phrase words, the other with 1 of the phrase words. This only happens in 
large documents where the multi word phrase appears across the boundary of one 
of the document fragments that Solr in analyzing (this is a hunch - I really 
don't know the mechanics for certain, but the next statement makes evident how 
I came to this conclusion). However if I make a one sentence document with the 
same multi word phrase, Solr will report 1 hit with all three words 
individually highlighted. At the very least I know Solr is getting the phrase 
correct. The problems are the method of highlighting (I'm trying to get one 
set of tags per phrase) and the occasional breaking of a single phrase into 
2 hits.

Given that setup, what do you recommend? I'm not sure I understand the approach 
you're describing. I appreciate the help!

-Teague James

> On Dec 2, 2015, at 10:09 AM, Rick Leir  wrote:
> 
> For performance, if you have many large documents, you want to index the
> whole document but only store some identifiers. (Maybe this is not a
> consideration for you, stop reading now )
> 
> If you are not storing the whole document, then Solr cannot do the
> highlighting.  You would get an id, then locate your source document (maybe
> in your filesystem) and do highlighting yourself.
> 
>> Can anyone offer any solutions for searching large documents and
> returning a
>> single phrase highlight?


RE: Help With Phrase Highlighting

2015-12-03 Thread Teague James
Thanks everyone who replied! The FastVectorHighlighter did the trick. Here
is how I configured it:

In solrconfig.xml:
In the requestHandler I added:
<str name="hl">on</str>
<str name="hl.fl">text</str>
<bool name="hl.useFastVectorHighlighter">true</bool>
<str name="hl.fragsize">100</str>

In schema.xml:
I modified the text field:
<field name="text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
I restarted Solr, re-indexed the documents and tested. All phrases are
correctly highlighted as phrases! Thanks everyone!

-Teague



RE: Spellcheck error

2015-12-03 Thread Dyer, James
Matt,

Can you give some information about how your spellcheck field is analyzed and 
also if you're using a custom query converter.  Also, try and place the bare 
terms you want checked in spellcheck.q (ex, if your query is q=+movie +theatre, 
then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you 
give the exact query you're using?

This is the very same bug as in the 3 tickets you mention.  We clearly haven't 
solved all of the possible ways this bug can be triggered.  But we cannot fix 
this unless we can come up with a unit test that reliably reproduces it.  At 
the very least, we should handle these problems better than throwing SIOOB like 
this.

Long term, there is probably a better design we could come up with for how 
terms are identified within queries and how collations are generated.

James Dyer
Ingram Content Group


-Original Message-
From: Matt Pearce [mailto:m...@flax.co.uk] 
Sent: Thursday, December 03, 2015 10:40 AM
To: solr-user
Subject: Spellcheck error

Hi,

We're using Solr 5.3.1, and we're getting a 
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done 
some investigation, and it looks like the problem is that the corrected 
string is shorter than the original query.

For example, the search term is "theatre", the suggested correction is 
"there". The error is being thrown when replacing the original query 
with the shorter replacement.

This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
 at 
java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
 at java.lang.StringBuilder.replace(StringBuilder.java:262)
 at 
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
 at 
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
 at 
org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
 at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)

The error looks very similar to those described in 
https://issues.apache.org/jira/browse/SOLR-4489, 
https://issues.apache.org/jira/browse/SOLR-3608 and 
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.

Any suggestions would be appreciated, or should I open a JIRA ticket?

Thanks,

Matt

-- 
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk
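
For what it's worth, the negative index in the trace is reproducible with a
toy StringBuilder. The offset arithmetic below is a guess at the mechanism
(an offset going stale after a shorter replacement), not SpellCheckCollator's
actual code:

```java
public class CollationOffsetDemo {
    // Toy reproduction: after a 7-char token ("theatre") is replaced by a
    // 5-char suggestion ("there"), any offset still based on the original
    // query is 2 too large, and arithmetic that over-corrects goes negative.
    public static void main(String[] args) {
        StringBuilder collation = new StringBuilder("theatre");
        collation.replace(0, 7, "there");   // fine: "there"
        int staleStart = 0 - 2;             // hypothetical stale-offset arithmetic
        try {
            collation.replace(staleStart, 3, "x");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```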



RE: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Dyer, James
Brian,

Be sure to have...

transformer="RegexTransformer"

...in your <entity> tag.  It's the RegexTransformer class that looks for
"splitBy".

See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more 
information.

James Dyer
Ingram Content Group


-Original Message-
From: Brian Narsi [mailto:bnars...@gmail.com] 
Sent: Friday, December 04, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler - Multivalued fields - splitBy

I have the following:





I believe I had the following working (splitting on pipe delimited)



But it does not work now.



In-fact now I have even tried



But I cannot get the values to split into an array.

Any thoughts/suggestions what may be wrong?

Thanks,
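
Since splitBy is handled by the RegexTransformer, its value is a regular
expression, so a literal pipe must be escaped. The equivalent split in plain
Java:

```java
import java.util.Arrays;

public class SplitByDemo {
    public static void main(String[] args) {
        // splitBy is a regex, so a literal pipe must be escaped ("\|" in the
        // config). An unescaped "|" is an empty alternation and matches at
        // every position instead of at the delimiter.
        String raw = "red|green|blue";
        System.out.println(Arrays.toString(raw.split("\\|"))); // [red, green, blue]
    }
}
```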


fuzzy searches and EDISMAX

2015-12-08 Thread Felley, James
I am trying to build an edismax search handler that will allow a fuzzy search, 
using the "query fields" property (qf).

I have two instances of SOLR 4.8.1, one of which has edismax "qf" configured 
with no fuzzy search
...
<str name="qf">ns_name^3.0  i_topic^3.0  i_object_type^3.0</str>

...
And the other with a fuzzy search for ns_name (non-stemmed name)
<str name="qf">ns_name~1^3.0  i_topic^3.0  i_object_type^3.0</str>

...

The index of both includes a record with an ns_name of 'Johnson'

I get no return in either instance with the query
q=Johnso

I get the Johnson record returned in both instances with a query of
q=Johnso~1

The SOLR documentation seems silent on incorporating fuzzy searches in the 
query fields.  I have seen various posts on Google that suggest that 'qf' will 
accept fuzzy search declarations, other posts suggest only the query itself 
will allow fuzzy searches (as seems to be the case for me).

Any guidance will be much appreciated

Jim

Jim Felley
OCIO
Smithsonian Institution
fell...@si.edu
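
For context on why q=Johnso finds nothing while q=Johnso~1 matches: without
the ~ operator the term is matched exactly, and "Johnso" is not an indexed
term; with ~1, any indexed term within edit distance 1 (such as "Johnson")
qualifies. A minimal Levenshtein sketch (illustrative only; Lucene's fuzzy
matching is automaton-based, and the class and method names here are mine):

```java
public class EditDistance {
    // Textbook Levenshtein distance via dynamic programming.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Johnso", "Johnson")); // 1, so Johnso~1 matches
    }
}
```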






RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the 
following section, for details.

Briefly, "count" is the # of suggestions it will return for terms that are 
*not* in your index/dictionary.  "alternativeTermCount" are the # of 
alternatives you want returned for terms that *are* in your dictionary.  You 
can set them to the same value, unless you want fewer suggestions when the 
terms is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello Everyone,
  I am confused about the difference between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Could anyone explain in detail?


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
Here is an example to illustrate what I mean...

- query q=text:(life AND 
hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has "life" in it
- also suppose zero documents in your dictionary field have "hope" in them
- The spellchecker will try to return you up to 10 suggestions for "hope", but 
only up to 5 suggestions for "life"

James Dyer
Ingram Content Group
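
The rule can be sketched as a per-term cap; the helper below is purely
illustrative, not Solr's implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SpellcheckCounts {
    // A term absent from the index gets up to `count` suggestions;
    // a term present in the index gets up to `alternativeTermCount`.
    static int maxSuggestions(Set<String> indexedTerms, String term,
                              int count, int alternativeTermCount) {
        return indexedTerms.contains(term) ? alternativeTermCount : count;
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = new HashSet<>(Arrays.asList("life"));
        System.out.println(maxSuggestions(indexedTerms, "life", 10, 5));  // 5
        System.out.println(maxSuggestions(indexedTerms, "hope", 10, 5));  // 10
    }
}
```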


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James,
If "count" doesn't use the index/dictionary, then where do the
suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
wrote:

> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> the following section, for details.
>
> Briefly, "count" is the # of suggestions it will return for terms that are
> *not* in your index/dictionary.  "alternativeTermCount" are the # of
> alternatives you want returned for terms that *are* in your dictionary.
> You can set them to the same value, unless you want fewer suggestions when
> the terms is in the dictionary.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 5:27 AM
> To: solr-user@lucene.apache.org
> Subject: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hello Everyone,
>   I got confusion between spellcheck.count and
> spellcheck.alternativeTermCount in Solr. Any help in details?
>


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-18 Thread Dyer, James
It will try to give you suggestions up to the number you specify, but if fewer 
are available it will not give you any more.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Thanks James,
  I tried the same thing,
spellcheck.count=10&spellcheck.alternativeTermCount=5, and I got 5
suggestions for both "life" and "hope", not what you described (up to 10
suggestions for "hope" but only up to 5 for "life").


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James 
wrote:

> Here is an example to illustrate what I mean...
>
> - query q=text:(life AND
> hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
> - suppose at least one document in your dictionary field has "life" in it
> - also suppose zero documents in your dictionary field have "hope" in them
> - The spellchecker will try to return you up to 10 suggestions for "hope",
> but only up to 5 suggestions for "life"
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 11:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hi James,
> How can you say that "count" doesn't use
> index/dictionary then from where suggestions come.
>
> On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James <
> james.d...@ingramcontent.com>
> wrote:
>
> > See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> > the following section, for details.
> >
> > Briefly, "count" is the # of suggestions it will return for terms that
> are
> > *not* in your index/dictionary.  "alternativeTermCount" are the # of
> > alternatives you want returned for terms that *are* in your dictionary.
> > You can set them to the same value, unless you want fewer suggestions
> when
> > the terms is in the dictionary.
> >
> > James Dyer
> > Ingram Content Group
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Tuesday, February 17, 2015 5:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: spellcheck.count v/s spellcheck.alternativeTermCount
> >
> > Hello Everyone,
> >   I got confusion between spellcheck.count and
> > spellcheck.alternativeTermCount in Solr. Any help in details?
> >
>


RE: Why collations are coming even I set the value of spellcheck.count to zero(0)

2015-02-18 Thread Dyer, James
I think when you set "count"/"alternativeTermCount" to zero, the defaults (10?) 
are used instead.  Instead of setting these to zero, just use 
"spellcheck=false".  These 2 parameters control suggestions, not collations.

To turn off collations, set "spellcheck.collate=false".  Also, I wouldn't set 
"maxCollationTries" as high as 100, as it could (sometimes) potentially check 
100 possible collations against the index, and that would be very slow.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Wednesday, February 18, 2015 2:37 AM
To: solr-user@lucene.apache.org
Subject: Why collations are coming even I set the value of spellcheck.count to 
zero(0)

Hi Everyone,
I have set the value of spellcheck.count = 0 and
spellcheck.alternativeTermCount = 0. Even so, collations are returned when
I search any misspelled query. Why?
I have also set spellcheck.maxCollations = 100 and
spellcheck.maxCollationTries = 100. As I understand it, collations are built
from suggestions. So do I have some misunderstanding about collations, or is
this a configuration issue? Any help please?


RE: Solr phonetics with spelling

2015-03-10 Thread Dyer, James
Ashish,

I would not recommend using spellcheck against a phonetic-analyzed field.  
Instead, you can use  to create a separate field that is lightly 
analyzed and use the copy for spelling.  

James Dyer
Ingram Content Group


-Original Message-
From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] 
Sent: Tuesday, March 10, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Solr phonetics with spelling

Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within  in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2015-03-19 Thread James Strassburg
Sorry, I've been a bit unfocused from this list for a bit. When I was
working with the APTF code I rewrote a big chunk of it and didn't include
the inclusion of the original tokens as I didn't need it at the time. That
feature could easily be added back in. I will see if I can find a bit of
time for that.

As for the other part of your message, are you suggesting that the token
indexes are not correct? There is a bit of a formatting issue with the text
and I'm not sure what you're getting at. Can you explain further please?

On Sun, Feb 8, 2015 at 3:04 PM, trhodesg  wrote:

> Thanks to everyone for the thought, time and effort put into
> AutoPhrasingTokenFilter(APTF)! It's a real lifesaver.
> While trying to add APTF to my indexing, i discovered that the original
> (TS)
> version throws an exception while indexing a 100MB PDF. The error is
> "Exception writing document to the index; possible analysis error". The
> modified (JS) version runs without error, but it removes the tokens used to
> create the phrase. They are needed.
> Before looking into this i have a question: Solr would normally tokenize
> the phrase "the peoples republic of china is" as
> the(1) peoples(2) republic(3) of(4) china(5) is(6)
> Defining the APTF phrase file accordingly, the Solr admin analysis page
> reports that the APTF indexer tokenizes the phrase with discontinuous
> positions. Would it be possible for someone to explain the reasoning behind
> the discontinuous token numbering? As it is now,
> phrase queries such as "republic of china" will fail. And i can't get
> proximity queries like "republic of"~10 to work either (though it seems
> they
> should). Wouldn't it be more flexible to return a tokenization that keeps
> the original positions? This allows spurious matches such as "peoples
> peoplesrepublic", but it seems like this type of event would be very rare.
> It has the advantage of allowing phrase queries to continue working the way
> most users think.
> Thank you for supporting more than one entity definition per phrase (ie
> peoplesrepublic and peoplesrepublicofchina). This is type of contraction is
> common in longer documents, especially when the first used phrase ends with
> a preposition. It helps support robust matching.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-tp4173808p4184888.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2015-03-20 Thread James Strassburg
I have an autophrase configured for 'wheel chair' and if I run analysis for
'super wheel chair awesome' such that it would index to 'super wheelchair
awesome' this is how mine behaves:
http://i.imgur.com/iR4IgGp.png

When I did the implementation that is how I thought the positioning should
work. Do you think it should be different?
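
The alternative trhodesg describes, keeping each word token at its original
position and emitting the auto-phrase token at the position of its last word,
can be sketched as follows. This illustrates the proposal, not APTF's current
behavior:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhrasePositions {
    // Word tokens keep consecutive 1-based positions; the phrase token is
    // added at the position of the last word it covers, so phrase and
    // proximity queries over the original words keep working.
    static Map<String, Integer> positions(String[] tokens, String phraseToken,
                                          int lastWordIndex) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i++) pos.put(tokens[i], i + 1);
        pos.put(phraseToken, lastWordIndex + 1);
        return pos;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "peoples", "republic", "of", "china", "is"};
        System.out.println(positions(tokens, "peoplesrepublicofchina", 4));
    }
}
```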

On Fri, Mar 20, 2015 at 11:10 AM, trhodesg  wrote:

>
>
>
>
>
> Sorry, i can see my post is munged.
>   This seems to display it legibly
>
>
> http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-td4173808.html
>
>   I'm new to all this, so i hesitate to say the indexing isn't
>   correct. But my understanding is the query, "republic
> of china", will only match
> the indexing, republic(n) of(n+1) china(n+2)  Since
> the original APTF indexes this as republic(n) of(n+3) china(n+7)
>   that query will fail. Wouldn't it be more logical to leave the
>   original token numbering unchanged and just add the phrase token
>   with the same number as the last word in the matched series?
>
>   BTW, i looked at your code re this. It is quite informative to a
>   newbie. Thanks!
>
>
>   On 3/19/2015 11:38 AM, James Strassburg [via Lucene] wrote:
>
>  Sorry, I've been a bit unfocused from this list for a
>   bit. When I was
>
>   working with the APTF code I rewrote a big chunk of it and didn't
>   include
>
>   the inclusion of the original tokens as I didn't need it at the
>   time. That
>
>   feature could easily be added back in. I will see if I can find a
>   bit of
>
>   time for that.
>
>
>   As for the other part of your message, are you suggesting that the
>   token
>
>   indexes are not correct? There is a bit of a formatting issue with
>   the text
>
>   and I'm not sure what you're getting at. Can you explain further
>   please?
>
>
>   On Sun, Feb 8, 2015 at 3:04 PM, trhodesg < [hidden email] >
>   wrote:
>
>
> > Thanks to everyone for the thought, time and effort put
> into
>
> > AutoPhrasingTokenFilter(APTF)! It's a real lifesaver.
>
> > While trying to add APTF to my indexing, i discovered that
> the original
>
> > (TS)
>
> > version throws an exception while indexing a 100MB PDF. The
> error
>
> > isException writing document to the index; possible
> analysis errorThe
>
> > modified (JS) version runs without error, but it removes
> the tokens used to
>
> > create the phrase. They are needed.
>
> > Before looking into this i have a question; Solr would
> normally tokenize
>
> > the
>
> > phrasethe peoples republic of china isasthe(1) peoples(2)
> republic(3) of(4)
>
> > china(5) is(6)
>
> > Defining the APTF phrase file asthe Solr admin analysis
> page reports that
>
> > the APTF indexer tokenizes the phrase asWould it be
> possible for someone to
>
> > explain the reasoning behind the discontinuous token
> numbering? As it is
>
> > now
>
> > phrase queries such as "republic of china" will fail. And i
> can't get
>
> > proximity queries like "republic of"~10 to work either
> (though it seems
>
> > they
>
> > should). Wouldn't it be more flexible to return the
> following
>
> > tokenizationThis allows spurious matches such as "peoples
> peoplesrepublic"
>
> > but it seems like this type of event would be very rare. It
> has the
>
> > advantage of allowing phrase queries to continue working
> the way most users
>
> > think.
>
> > Thank you for supporting more than one entity definition
> per phrase (ie
>
> > peoplesrepublic and peoplesrepublicofchina). This is type
> of contraction is
>
> > common in longer documents, especially when the first used
> phrase ends with
>
> > a preposition. It helps support robust matching.
>
> >
>
> >
>
> >
>
> > --
>
> > View this message in context:
>
> >
> http://lucene.472066.n3.nabb

RE: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread Dyer, James
Elisabeth,

Currently ConjunctionSolrSpellChecker only supports adding 
WordBreakSolrSpellchecker to IndexBased- FileBased- or DirectSolrSpellChecker.  
In the future, it would be great if it could handle other Spell Checker 
combinations.  For instance, if you had a (e)dismax query that searches 
multiple fields, to have a separate spellchecker for each of them.

But CSSC is not hardened for this more general usage, as hinted in the API doc. 
 The check done to ensure all spellcheckers use the same stringdistance object, 
I believe, is a safeguard against using this class for functionality it is not 
able to correctly support.  It looks to me that SOLR-6271 was opened to fix the 
bug in that it is comparing references on the stringdistance.  This is not a 
problem with WBSSC because this one does not support string distance at all.

What you're hoping for, however, is that the requirement for the string 
distances be the same to be removed entirely.  You could try modifying the code 
by removing the check.  However beware that you might not get the results you 
desire!  But should this happen, please, go ahead and fix it for your use case 
and then donate the code.  This is something I've personally wanted for a long 
time.

James Dyer
Ingram Content Group


-Original Message-
From: elisabeth benoit [mailto:elisaelisael...@gmail.com] 
Sent: Tuesday, April 14, 2015 7:37 AM
To: solr-user@lucene.apache.org
Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

Hello,

I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
FileBasedSpellchecker in the same request.

I've applied the change from 135.patch (cf. SOLR-6271). I tried running
the command "patch -p1 -i 135.patch --dry-run" but it didn't work, maybe
because the patch was written against Solr 4.9, so I just replaced the line
in ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance!!! 1:" +
      checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

but still, when I send a spellcheck request, I get the error

msg": "All checkers need to use the same StringDistance!!!
1:org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

From error message I gather both spellchecker use same distanceMeasure
LuceneLevenshteinDistance, but they're not same instance of
LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?

Thanks,
Elisabeth
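
This is why the two LuceneLevenshteinDistance instances in the error compare
unequal even though they are the same class: that class presumably does not
override equals(), so .equals() falls back to Object's reference identity,
exactly like this stub:

```java
public class DistanceEqualsDemo {
    // Stand-in for a StringDistance implementation with no equals() override.
    static class LevenshteinStub { }

    public static void main(String[] args) {
        LevenshteinStub a = new LevenshteinStub();
        LevenshteinStub b = new LevenshteinStub();
        System.out.println(a.equals(b));                  // false: different instances
        System.out.println(a.getClass() == b.getClass()); // true: same class
    }
}
```

If the check were relaxed to compare classes rather than instances, the two
checkers above would be considered compatible; whether that is semantically
safe is the open question James raises.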


Alternate ways to facet spatial data

2015-05-05 Thread James Sewell
Hello all,

I've just started using SOLR for spatial queries and it looks great so far.
I've mostly been investigating importing a large amount of point data,
indexing and searching it.

I've discovered the facet.heatmap functionality, which is great - but I
would like to ask if it is possible to get slightly different results from
this.

Essentially rather than a heatmap I would like either a polygon per cluster
(might be too much computation?) or a point per cluster (centroid would be
great, centre of grid would be ok), coupled with the point count.

Is this currently possible using faceting, or does it seem like a workable
feature I could implement?

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



-- 

James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.
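
Until faceting can emit this directly, a point-per-cluster can be computed
client-side from the points falling into each heatmap cell. A centroid
sketch (class, method, and data shapes here are mine):

```java
public class CellCentroid {
    // Mean lat/lon of the points assigned to one grid cell: the
    // "point per cluster" output, computed client-side.
    static double[] centroid(double[][] points) {
        double lat = 0, lon = 0;
        for (double[] p : points) {
            lat += p[0];
            lon += p[1];
        }
        return new double[] { lat / points.length, lon / points.length };
    }

    public static void main(String[] args) {
        double[][] cell = { { -37.8, 144.9 }, { -37.6, 145.1 } };
        double[] c = centroid(cell);
        System.out.println(c[0] + "," + c[1]); // approximately -37.7,145.0
    }
}
```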


SolrCloud No Active Slice

2015-06-10 Thread James Webster
I'm having a config issue. I'm posting the error from SolrJ, which also
includes the cluster state JSON:

 org.apache.solr.common.SolrException: No active slice servicing hash code
2ee4d125 in DocCollection(rfp365)={
  "shards":{"shard1":{
  "range":"-",
  "state":"active",
  "replicas":{
"core_node1":{
  "state":"active",
  "core":"rfp365_shard1_replica1",
  "node_name":"172.31.58.150:8983_solr",
  "base_url":"http://172.31.58.150:8983/solr"},
"core_node2":{
  "state":"active",
  "core":"rfp365_shard1_replica2",
  "node_name":"172.31.60.137:8983_solr",
  "base_url":"http://172.31.60.137:8983/solr"},
"core_node3":{
  "state":"active",
  "core":"rfp365_shard1_replica3",
  "node_name":"172.31.58.65:8983_solr",
  "base_url":"http://172.31.58.65:8983/solr",
  "leader":"true",
  "replicationFactor":"3",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"1",
  "autoAddReplicas":"true"}
at
org.apache.solr.common.cloud.HashBasedRouter.hashToSlice(HashBasedRouter.java:65)
at
org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:39)
at
org.apache.solr.client.solrj.request.UpdateRequest.getRoutes(UpdateRequest.java:206)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:581)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:948)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:839)
... 6 more

All nodes are active in the solr admin, not sure where to go from here.

Thanks in advance!
James




RE: Spell checking the synonym list?

2015-07-09 Thread Dyer, James
Ryan,

If you use index-time synonyms on the spellcheck field, this will give you what 
you want.

For instance, if the document has "lawyer" and you index both terms 
"lawyer","attorney", then the spellchecker will see that "atorney" is 1 edit 
away from an indexed term and will suggest "attorney". 

You'll need to have the same synonyms set up against the query field, but you 
have the option of making these query-time synonyms if you prefer.
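The mechanics can be sketched in a few lines of Python — the synonym map, the vocabulary, and the misspelling below are illustrative, not taken from any real schema:

```python
# Index-time synonym expansion: every document token is expanded with its
# synonyms, so both variants land in the indexed vocabulary that the
# spellchecker draws suggestions from.
SYNONYMS = {"lawyer": ["lawyer", "attorney"]}

def expand(tokens):
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))
    return out

def edit_distance(a, b):
    # Classic Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

indexed_vocab = set(expand(["lawyer"]))
# "atorney" is one edit away from the indexed synonym "attorney", so a
# spellchecker reading this field can now suggest it.
candidates = [w for w in indexed_vocab if edit_distance("atorney", w) <= 2]
```

Because "attorney" was indexed alongside "lawyer", it shows up as a suggestion candidate even though no document literally contains it.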

James Dyer
Ingram Content Group

-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Thursday, July 09, 2015 2:28 AM
To: solr-user@lucene.apache.org
Subject: Spell checking the synonym list?

Hi all,

I'm wondering if it's possible to have spell checking performed on terms in
the synonym list?

For example, let's say I have documents with the word "lawyer" in them and
I add "lawyer, attorney" in the synonyms.txt file. Then a query is made for
the word "atorney". Is there any way to provide spell checking on this?

Thanks,
Ryan


RE: Protwords in solr spellchecker

2015-07-10 Thread Dyer, James
Kamal,

Given the constraint that you cannot re-index the data, your best bet might be 
to simply filter out the suggestions at the application level, or maybe even 
have a proxy do it.

Possibly another option, you might be able to extend DirectSolrSpellchecker and 
override #getSuggestions(), calling super(), then post-filtering out your stop 
words from the response.  You'll want to request a few more terms so you're 
more likely to get results even if a term or two get filtered out.  You can 
specify your custom spell checker in solrconfig.xml.
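The first option — filtering at the application level — can be sketched like this (the blacklist and the suggestion tuples are illustrative, not a real Solr response):

```python
BLACKLIST = {"sex"}  # illustrative "protected words" list

def filter_suggestions(suggestions, blacklist=BLACKLIST):
    """Drop blacklisted words from (word, freq) suggestion pairs.

    The pairs would come from the parsed spellcheck section of a Solr
    response; request extra suggestions (e.g. spellcheck.count=10) so
    enough of them survive the filter.
    """
    return [(word, freq) for word, freq in suggestions
            if word.lower() not in blacklist]

filtered = filter_suggestions([("sex", 40), ("sexes", 7), ("set", 3)])
```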

James Dyer
Ingram Content Group


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Friday, July 10, 2015 7:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Protwords in solr spellchecker

So let's try to analyse the situation from the spellchecking point of view.
First of all, we follow David's suggestion and add the StopWordsFilter, with
our configured "bad" words, to the query-time analysis.

*Starting scenario*
- we have the protected words in our index, we still want them to be in
there

Let's explore the different kinds of spellcheckers available and where they
take their suggestions from:

*Index Based Spellchecker*
The suggestions will come from an auxiliary index.

*Direct Spellchecker*
The suggestions will come from the current index.

*File based spellchecker*
It uses an external file to get the spelling suggestions from, so we can
curate this file properly with only good words, and we are fine.
But I guess you would like to use a blacklist, in this case we are going to
have a white list.

*Query Time*
At query time *the query is analysed *and a token stream is provided.
Then depending on the implementation we trigger a different lookup.
In the case of the Direct Spellchecker, if I remember well :
For each token a FST with all the supported inflections is generated and an
intersection happen with the Index FST ( based on the field), and the
suggestion is returned.

Unfortunately, a proper *query-time analysis will not help*.
When we analyse the query we have the misspelled word "sexe" that is not
going to be recognised as the bad word.
Then the inflections are calculated, the FST built and the intersection
will actually produce the feared suggestion "sex" .
This because the word is in the index.

If we can't modify the index, the *Direct Spellchecker is not an option*, if
my understanding is correct.

Let's see if the Index Based spellcheck can help …
Unfortunately also in this case, the auxiliary index produced is based on
the analysed form of the original field.

If you really cannot re-index the content, I would suggest an implementation
based on a concept similar to the AnalyzingSuggester in Solr.

Open to clarify your further questions.

2015-07-10 9:31 GMT+01:00 davidphilip cherian:

> Hi Kamal,
>
> Not necessarily. You can have different filters applied at index time and
> query time. (note that the order in which filters are defined matters). You
> could just add the stop filter at query time.
> Have your own custom data type defined (similar to 'text_en' that will be
> in schem.xml) and perhaps use standard/whitespace tokenizer followed by
> stop filter at query time.
>
> Tip: Use analysis tool that is available in solr admin page to further
> understand the analysis chain of data types.
>
> HTH
>
>
>
> On Fri, Jul 10, 2015 at 1:03 PM, Kamal Kishore Aggarwal <
> kkroyal@gmail.com> wrote:
>
> > Hi David,
> >
> > This one is a good suggestion. But, if add these *adult* keywords in the
> > stopwords.txt file, it will be requiring the re-indexing of these
> keywords
> > related data.
> >
> > How can I see the change instantly. Is there any other great suggestion
> > that you can suggest me.
> >
> >
> >
> >
> > On Thu, Jul 9, 2015 at 12:09 PM, davidphilip cherian <
> > davidphilipcher...@gmail.com> wrote:
> >
> > > The best bet is to use solr.StopFilterFactory.
> > > Have all such words added to stopwords.txt and add this filter to your
> > > analyzer.
> > >
> > > Reference links
> > >
> > >
> >
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter
> > >
> > > HTH
> > >
> > >
> > > On Thu, Jul 9, 2015 at 11:50 AM, Kamal Kishore Aggarwal <
> > > kkroyal@gmail.com> wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. Is
> > there

Solr M2M authentication on Jetty

2016-05-18 Thread Gregoric, James
Dear Solr Community,

We would like to provide an in-house group of users access to our Solr database 
in a way that meets the following specifications:

1.   Use the Jetty web service that Solr 6.0 installs by default.

2.   Provide an M2M (machine-to-machine) interface, so a user can set up a 
cron job that periodically executes a query and stores the results.

3.   Authentication credentials for the M2M interface to the Jetty service 
are provided by an LDAP service so it is possible to log who is accessing what 
data.

4.   Result data retrieved from Solr (result UIDs) are recorded by SPLUNK.

Can you offer advice and/or point us to a working example of any of these 
specification items?

Here's what we have so far:

A.  Completed item 1 above.  We've installed Solr 6.0 with Jetty on a Linux 
VM and it works great.

B.  Partially addressed item 3 above in that we can login to Jetty using 
LDAP.  However, our implementation is such that the login credentials are input 
interactively (via a login dialog).  We don't yet know how to perform this 
login from machine to machine.  This is the main sticking point right now.

Any insight you might provide would be greatly appreciated.
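For items 2 and 3, note that a machine-to-machine login is usually nothing more than an Authorization header sent with every request — no interactive dialog is involved. A sketch using Python's standard library (the host, core, and service-account credentials are placeholders):

```python
import base64
import urllib.request

# Placeholder endpoint and service-account credentials.
url = "http://solr.example.org:8983/solr/mycore/select?q=*:*&wt=json"
user, password = "svc_account", "secret"

# Build the request and attach an HTTP Basic Authorization header.
req = urllib.request.Request(url)
token = base64.b64encode(f"{user}:{password}".encode()).decode()
req.add_header("Authorization", "Basic " + token)

# A cron job would now call urllib.request.urlopen(req) and store the
# response; Jetty (or a fronting proxy) validates the credentials
# against LDAP and logs who accessed what.
```

The same header can be sent from curl with `-u svc_account:secret`.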

Regards,
Jim Gregoric
Boston Children's Hospital, Clinical Research Informatics


RE: Solr M2M authentication on Jetty

2016-05-18 Thread Gregoric, James
Correction:  Item 1 is not an absolute requirement; we can use Apache or Tomcat 
if that makes things any easier.

-Original Message-
From: Gregoric, James [mailto:james.grego...@childrens.harvard.edu] 
Sent: Wednesday, May 18, 2016 1:54 PM
To: solr-user@lucene.apache.org
Subject: Solr M2M authentication on Jetty

Dear Solr Community,

We would like to provide an in-house group of users access to our Solr database 
in a way that meets the following specifications:

1.   Use the Jetty web service that Solr 6.0 installs by default.

2.   Provide an M2M (machine-to-machine) interface, so a user can setup a 
cron job that periodically executes a query and stores the results.

3.   Authentication credentials for the M2M interface to the Jetty service 
are provided by an LDAP service so it is possible to log who is accessing what 
data.

4.   Result data retrieved from Solr (result UIDs) are recorded by SPLUNK.

Can you offer advice and/or point us to a working example of any of these 
specification items?

Here's what we have so far:

A.  Completed item 1 above.  We've installed Solr 6.0 with Jetty on a Linux 
VM and it works great.

B.  Partially addressed item 3 above in that we can login to Jetty using 
LDAP.  However, our implementation is such that the login credentials are input 
interactively (via a login dialog).  We don't yet know how to perform this 
login from machine to machine.  This is the main sticking point right now.

Any insight you might provide would be greatly appreciated.

Regards,
Jim Gregoric
Boston Children's Hospital, Clinical Research Informatics


Alternate Port Not Working for Solr 6.0.0

2016-05-31 Thread Teague James
Hello,

I am trying to install Solr 6.0.0 and have been successful with the default
installation, following the instructions provided on the Apache Solr
website. However, I do not want Solr running on port 8983, I want it to run
on port 80. I started a new Ubuntu 14.04 VM, installed open JDK 8, then
installed Solr with the following commands:

Command: tar xzf solr-6.0.0.tgz solr-6.0.0/bin/install_solr_service.sh
--strip-components=2
Response: None, which is good.

Command: ./install_solr_service.sh solr-6.0.0.tgz -p 80
Response: Misplaced or Unknown flag -p

So I tried...
Command: ./install_solr_service.sh solr-6.0.0.tgz -i /opt -d /var/solr -u
solr -s solr -p 80
Response: A dump of the log, which is INFO only with no errors or warnings,
at the top of which is "Solr process 4831 from /var/solr/solr-80.pid not
found"

If I look in the /var/solr directory I find a file called solr-80.pid, but
nothing else. What did I miss? Previous versions of Solr, which I deployed
with Tomcat instead of Jetty, allowed me to control this in the server.xml
file in /etc/tomcat7/, but obviously this no longer applies. I like the ease
of the installation script; I just want to be able to control the port
assignment. Any help is appreciated! Thanks!

-Teague

PS - Please resist the urge to ask me why I want it on port 80. I am well
aware of the security implications, etc., but regardless I still need to
make this operational on port 80. Cheers!



RE: Alternate Port Not Working for Solr 6.0.0

2016-06-02 Thread Teague James
ssues - happy searching!

IF I change the port assignment to 1001, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1250, no issues - happy searching!

IF I change the port assignment to 1100, no issues - happy searching!

IF I change the port assignment to 1050, no issues - happy searching!

IF I change the port assignment to 1025, no issues - happy searching!

IF I change the port assignment to 1015, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1020, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1021, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1022, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1023, same screen dump/failure to load as 
with port 80.

IF I change the port assignment to 1024, no issues - happy searching!

Based on the above, it appears that port 80 itself is not special, but rather 
that Solr does not play nice with any port below 1024. There may exist an upper 
limit, but I did not test for that since my goal was to assign the application 
to port 80.

For the record, there are no other listeners listening to port 80. The only 
listeners are 53 for dnsmasq and 631 for cupsd on my system. Also, I have 
successfully run Solr on port 80 on all 2.x-4.9.1 installations. I never got 
around to upgrading to 5.x, so I do not know if there are issues with low ports 
in that version.

Any insight as to why Solr 6.0.0 does not play nice with ports below 1024 would 
be appreciated. If this is a "feature" of the application, it'd be nice to see 
that in the documentation.

Thanks Shawn!

-Teague
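The cutoff observed above is the classic Unix privileged-port boundary: an unprivileged process cannot bind TCP ports below 1024, and the install script runs Solr as the non-root `solr` user. A quick way to check what the current user may bind, as a sketch:

```python
import socket

def can_bind(port, host="127.0.0.1"):
    """Return True if this process is allowed to bind the given TCP port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:  # PermissionError for privileged ports, EADDRINUSE, etc.
        return False
    finally:
        s.close()
```

Run as the `solr` user on Linux, `can_bind(80)` returns False while `can_bind(8983)` normally returns True. Common workarounds are a fronting proxy, an iptables redirect from 80 to 8983, or granting the JVM the CAP_NET_BIND_SERVICE capability.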

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, May 31, 2016 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Alternate Port Not Working for Solr 6.0.0

On 5/31/2016 2:02 PM, Teague James wrote:
> Hello, I am trying to install Solr 6.0.0 and have been successful with 
> the default installation, following the instructions provided on the 
> Apache Solr website. However, I do not want Solr running on port 8983, 
> I want it to run on port 80. I started a new Ubuntu 14.04 VM, 
> installed open JDK 8, then installed Solr with the following commands:
> Command: tar xzf solr-6.0.0.tgz solr-6.0.0/bin/install_solr_service.sh
> --strip-components=2 Response: None, which is good. Command:
> ./install_solr_service.sh solr-6.0.0.tgz -p 80 Response: Misplaced or 
> Unknown flag -p So I tried... Command: ./install_solr_service.sh 
> solr-6.0.0.tgz -i /opt -d /var/solr -u solr -s solr -p 80 Response: A 
> dump of the log, which is INFO only with no errors or warnings, at the 
> top of which is "Solr process 4831 from /var/solr/solr-80.pid not 
> found" If I look in the /var/solr directory I find a file called 
> solr-80.pid, but nothing else. What did I miss? Previous versions of 
> Solr, which I deployed with Tomcat instead of Jetty, allowed me to 
> control this in the server.xml file in /etc/tomcat7/, but obviously 
> this no longer applies. I like the ease of the installation script; I 
> just want to be able to control the port assignment. Any help is 
> appreciated! Thanks!

The port can be changed after install, although I have been also able to change 
the port during install with the -p parameter.  Check /etc/default/solr.in.sh 
and look for a line setting SOLR_PORT.  On my dev server, it looks like this:

SOLR_PORT=8982

Before making any changes in that file, make sure that Solr is not running at 
all, or you may be forced to manually kill it.

Thanks,
Shawn



RE: Using Solr to index zip files

2016-06-07 Thread BURN, James
Hi
I think you'll need to unzip your zip files with an unzip application before 
you post to Solr. If you do this via an OS-level batch script, you can apply 
logic there to deal with nested zips. Then post your unzipped files to Solr 
via curl.
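The nested-unzip step can be sketched with Python's zipfile module, recursing into inner archives before anything is posted to Solr (the depth limit is an arbitrary safeguard):

```python
import io
import zipfile

def iter_zip_members(data, prefix="", depth=0, max_depth=5):
    """Yield (path, bytes) for every non-zip member of a zip archive,
    recursing into nested zip files up to max_depth levels."""
    if depth > max_depth:
        return
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            payload = zf.read(name)
            if name.lower().endswith(".zip"):
                # Inner zip: descend, keeping the outer path as a prefix.
                yield from iter_zip_members(payload, prefix + name + "/",
                                            depth + 1, max_depth)
            else:
                yield prefix + name, payload
```

Each yielded (path, bytes) pair can then be written out and posted with curl or bin/post.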

James

-Original Message-
From: anupama.gangad...@daimler.com [mailto:anupama.gangad...@daimler.com] 
Sent: 07 June 2016 03:57
To: solr-user@lucene.apache.org
Subject: Using Solr to index zip files

Hi,

I have an use case where I need to search zip files quickly in HDFS. I intend 
to use Solr but not finding any relevant information about whether it can be 
done for zip files.
These are nested zip files i.e. zips within a zip file. Any help/information is 
much appreciated.

Thank you,
Regards,
Anupama




RE: using spell check on phrases

2016-06-10 Thread Dyer, James
Kaveh,

If your query has "mm" set to zero or a low value, then you may want to 
override this when the spellchecker checks possible collations.  For example:

spellcheck.collateParam.mm=100%

You may also want to consider adding "spellcheck.maxResultsForSuggest" to your 
query, so that it will return spelling suggestions even when the query returns 
some results.  Also if you set "spellcheck.alternativeTermCount", then it will 
try to correct all of the query keywords, including those that exist in the 
dictionary.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking for more 
information.
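Put together, the extra parameters mentioned above would ride along with the normal query like this (a sketch; the values are illustrative, not recommendations):

```python
params = {
    "q": "some user query",
    "defType": "edismax",
    "mm": "1",                          # lenient main query
    "spellcheck": "true",
    "spellcheck.collate": "true",
    # Require all terms to match when testing candidate collations,
    # even though the main query uses a low "mm":
    "spellcheck.collateParam.mm": "100%",
    # Still return suggestions when the original query gets up to
    # this many hits:
    "spellcheck.maxResultsForSuggest": 5,
    # Also consider correcting terms that do exist in the index:
    "spellcheck.alternativeTermCount": 5,
}
```

These would be URL-encoded onto the /select request in the usual way.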

James Dyer
Ingram Content Group

-Original Message-
From: kaveh minooie [mailto:ka...@plutoz.com] 
Sent: Monday, June 06, 2016 8:19 PM
To: solr-user@lucene.apache.org
Subject: using spell check on phrases

Hi everyone

I am using Solr 6 with DirectSolrSpellChecker and the edismax parser. The 
problem I am having is that when the query is a phrase, every single word in 
the phrase needs to be misspelled for the spell checker to get activated and 
give suggestions. If only one of the words is misspelled, it just says that 
the spelling is correct:
true

I was wondering if anyone has encountered this situation before and 
knows how to solve it?

thanks,

-- 
Kaveh Minooie



RE: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

2016-07-29 Thread Dyer, James
You need to set the "spellcheck.maxCollationTries" parameter to a value greater 
than zero.  The higher the value, the more queries it checks for hits, and the 
longer it could potentially take.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.maxCollationTriesParameter
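For the "red chillies" query quoted below, that would mean adding something like this (values illustrative):

```python
params = {
    "q": "red chillies",
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.extendedResults": "true",
    # Run up to 10 candidate collations as real queries and only keep
    # the ones that actually return hits, so a zero-result phrase like
    # "red chiller" is skipped:
    "spellcheck.maxCollationTries": 10,
    "spellcheck.maxCollations": 3,
}
```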

James Dyer
Ingram Content Group

-Original Message-
From: SRINI SOLR [mailto:srini.s...@gmail.com] 
Sent: Friday, July 22, 2016 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

Hi all - please help me here

On Thursday, July 21, 2016, SRINI SOLR  wrote:
> Hi All -
> Could you please help me on spell check on multi-word phrase as a whole...
> Scenario -
> I have a problem with solr spellcheck suggestions for multi word phrases.
With the query for 'red chillies'
>
>
q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
>
> I get
>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="chillies">
>       <int name="numFound">2</int>
>       <int name="startOffset">4</int>
>       <int name="endOffset">12</int>
>       <int name="origFreq">0</int>
>       <arr name="suggestion">
>         <lst><str name="word">chiller</str><int name="freq">4</int></lst>
>         <lst><str name="word">challis</str><int name="freq">2</int></lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <str name="collation">red chiller</str>
>   </lst>
> </lst>
>
> The problem is, even though 'chiller' has 4 results in index, 'red
chiller' has none. So we end up suggesting a phrase with 0 result.
>
> What can I do to make spellcheck work on the whole phrase only?
>
> Please help me here ...


Tutorial not working for me

2016-09-16 Thread Pritchett, James
I apologize if this is a really stupid question. I followed all the
instructions in the tutorial, got the data loaded, and everything works
great until I try to query with a field name -- e.g., name:foundation. I
get zero results from this or any other query which specifies a field name.
Simple queries return results, and the field names are listed in those
results correctly. But if I query using names that I know are there and
values that I know are there, I get nothing.

I figure this must be something basic that is not right about the way
things have gotten set up, but I am completely blocked at this point. I
tried blowing it all away and restarting from scratch with no luck. Where
should I be looking for problems here? I am running this on a MacBook, OS X
10.9, latest JDK (1.8).

James

-- 


*James Pritchett*

Leader, Process Redesign and Analysis

__


*Learning Ally™*Together It’s Possible
20 Roszel Road | Princeton, NJ 08540 | Office: 609.243.7608

jpritch...@learningally.org

www.LearningAlly.org <http://www.learningally.org/>

Join us in building a community that helps blind, visually impaired &
dyslexic students thrive.

Connect with our community: *Facebook*
<https://www.facebook.com/LearningAlly.org> | *Twitter*
<https://twitter.com/Learning_Ally> | *LinkedIn*
<https://www.linkedin.com/groups?home=&gid=2644842&trk=anet_ug_hm> |
*Explore1in5* <http://www.explore1in5.org/> | *Instagram*
<https://instagram.com/Learning_Ally/> | *Sign up for our community
newsletter* <https://go.learningally.org/about-learning-ally/stay-in-touch/>

Support us: *Donate*
<https://go.learningally.org/about-learning-ally/donate/> | *Volunteer*
<https://go.learningally.org/about-learning-ally/volunteers/how-you-can-help/>


Re: Tutorial not working for me

2016-09-16 Thread Pritchett, James
I am following the exact instructions, copying and pasting all
commands & queries from the tutorial:
https://lucene.apache.org/solr/quickstart.html. Where it breaks down is
this one:

http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation

This returns no results. Tried in the web admin view as well, also tried
various field:value combinations to no avail. Clearly something didn't get
configured correctly, but I saw no error messages when running all the data
loads, etc. given in the tutorial.

Sorry to be so clueless, but I don't really have anything to go on for
troubleshooting besides asking dumb questions.

James

On Fri, Sep 16, 2016 at 11:24 AM, John Bickerstaff  wrote:

> Please share the exact query syntax?
>
> Are you using a collection you built or one of the examples?
>
> On Fri, Sep 16, 2016 at 9:06 AM, Pritchett, James <
> jpritch...@learningally.org> wrote:
>
> > I apologize if this is a really stupid question. I followed all
> > instructions on installing Tutorial, got data loaded, everything works
> > great until I try to query with a field name -- e.g., name:foundation. I
> > get zero results from this or any other query which specifies a field
> name.
> > Simple queries return results, and the field names are listed in those
> > results correctly. But if I query using names that I know are there and
> > values that I know are there, I get nothing.
> >
> > I figure this must be something basic that is not right about the way
> > things have gotten set up, but I am completely blocked at this point. I
> > tried blowing it all away and restarting from scratch with no luck. Where
> > should I be looking for problems here? I am running this on a MacBook,
> OS X
> > 10.9, latest JDK (1.8).
> >
> > James
> >
> > --
> >
> >
> > *James Pritchett*
> >
> > Leader, Process Redesign and Analysis
> >
> > __
> >
> >
> > *Learning Ally™*Together It’s Possible
> > 20 Roszel Road | Princeton, NJ 08540 | Office: 609.243.7608
> >
> > jpritch...@learningally.org
> >
> > www.LearningAlly.org <http://www.learningally.org/>
> >
> > Join us in building a community that helps blind, visually impaired &
> > dyslexic students thrive.
> >
> > Connect with our community: *Facebook*
> > <https://www.facebook.com/LearningAlly.org> | *Twitter*
> > <https://twitter.com/Learning_Ally> | *LinkedIn*
> > <https://www.linkedin.com/groups?home=&gid=2644842&trk=anet_ug_hm> |
> > *Explore1in5* <http://www.explore1in5.org/> | *Instagram*
> > <https://instagram.com/Learning_Ally/> | *Sign up for our community
> > newsletter* <https://go.learningally.org/about-learning-ally/stay-in-
> > touch/>
> >
> > Support us: *Donate*
> > <https://go.learningally.org/about-learning-ally/donate/> | *Volunteer*
> > <https://go.learningally.org/about-learning-ally/
> > volunteers/how-you-can-help/>
> >
>





Re: Tutorial not working for me

2016-09-16 Thread Pritchett, James
java -classpath /Users/jpritchett/solr-6.2.0/dist/solr-core-6.2.0.jar
-Dauto=yes -Dc=gettingstarted -Ddata=files
org.apache.solr.util.SimplePostTool example/exampledocs/books.json
SimplePostTool version 5.0.0
Posting files to [base] url
http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to
http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:01.782
marplon:solr-6.2.0 jpritchett$ bin/post -c gettingstarted
example/exampledocs/books.csv
java -classpath /Users/jpritchett/solr-6.2.0/dist/solr-core-6.2.0.jar
-Dauto=yes -Dc=gettingstarted -Ddata=files
org.apache.solr.util.SimplePostTool example/exampledocs/books.csv
SimplePostTool version 5.0.0
Posting files to [base] url
http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
1 files indexed.
COMMITting Solr index changes to
http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:00.204
marplon:solr-6.2.0 jpritchett$ curl "
http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation
"
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":264,
"params":{
  "q":"foundation",
  "indent":"true",
  "wt":"json"}},
  "response":{"numFound":4156,"start":0,"maxScore":0.098080166,"docs":[
  {
"id":"0553293354",
"cat":["book"],
"name":["Foundation"],
"price":[7.99],
"inStock":[false],
"author":["Isaac Asimov"],
"series_t":["Foundation Novels"],
"sequence_i":1,
"genre_s":"scifi",
"_version_":1545646368061652992},

[etc.]]
  }}
marplon:solr-6.2.0 jpritchett$ curl "
http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
"
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":47,
"params":{
  "q":"name:foundation",
  "indent":"true",
  "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  }}
marplon:solr-6.2.0 jpritchett$


On Fri, Sep 16, 2016 at 11:40 AM, Erick Erickson wrote:

> My bet:
> the fields (look in managed_schema or, possibly schema.xml)
> has stored="true" and indexed="false" set for the fields
> in question.
>
> Pretty much everyone takes a few passes before this really
> makes sense. "stored" means you see the results returned,
> "indexed" must be true before you can search on something.
>
> Second possibility: You've somehow indexed fields as
> "string" type rather than one of the text based fieldTypes.
> "string" types are not tokenized, thus a field with
> "My dog has fleas" will fail to find "My". It'll even not match
> "my dog has fleas" (note capital "M").
>
> The admin UI>>select core>>analysis page will show you
> lots of this kind of detail, although I admit it takes a bit to
> understand all the info (do un-check the "verbose" button
> for the nonce).
>
> Now, all that aside, please show us the field definition for
> one of the fields in question and, as John mentions, the exact
> query (I'd also add &debugQuery=true to the results).
>
> Saying you followed the exact instructions somewhere isn't
> really helpful. It's likely that there's something innocent-seeming
> that was done differently. Giving the information asked for
> will help us diagnose what's happening and, perhaps,
> improve the docs if we can understand the mis-match.
>
> Best,
> Erick
>
> On Fri, Sep 16, 2016 at 8:28 AM, Pritchett, James
>  wrote:
> > I am following the exact instructions in the tutorial: copy and pasting
> all
> > commands & queries from the tutorial:
> > https://lucene.apache.org/solr/quickstart.html. Where it breaks down is
> > this one:
> >
> > http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> >
> > This returns no results. Tried i

Re: Tutorial not working for me

2016-09-16 Thread Pritchett, James
Second possibility: You've somehow indexed fields as
"string" type rather than one of the text based fieldTypes.
"string" types are not tokenized, thus a field with
"My dog has fleas" will fail to find "My". It'll even not match
"my dog has fleas" (note capital "M").

That appears to be the issue. Searching for name:Foundation indeed returns
the expected result. I will now go find some better entry point to SOLR
than the tutorial, which has wasted enough of my time for one day. Any
suggestions would be welcome.

James
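The string-versus-text distinction Erick describes can be illustrated with a toy model of the two field types (a deliberate simplification of Solr's real analysis chains):

```python
def string_match(field_value, query_term):
    # A "string" field is not tokenized: the query term must equal the
    # whole stored value, including case.
    return field_value == query_term

def text_match(field_value, query_term):
    # A typical "text" field lowercases and splits on whitespace, so
    # individual words match case-insensitively.
    return query_term.lower() in field_value.lower().split()
```

So name:foundation fails against a string-typed field holding "Foundation", while name:Foundation succeeds — exactly the behaviour seen in the tutorial session quoted earlier.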

On Fri, Sep 16, 2016 at 11:40 AM, Erick Erickson wrote:

> My bet:
> the fields (look in managed_schema or, possibly schema.xml)
> has stored="true" and indexed="false" set for the fields
> in question.
>
> Pretty much everyone takes a few passes before this really
> makes sense. "stored" means you see the results returned,
> "indexed" must be true before you can search on something.
>
> Second possibility: You've somehow indexed fields as
> "string" type rather than one of the text based fieldTypes.
> "string" types are not tokenized, thus a field with
> "My dog has fleas" will fail to find "My". It'll even not match
> "my dog has fleas" (note capital "M").
>
> The admin UI>>select core>>analysis page will show you
> lots of this kind of detail, although I admit it takes a bit to
> understand all the info (do un-check the "verbose" button
> for the nonce).
>
> Now, all that aside, please show us the field definition for
> one of the fields in question and, as John mentions, the exact
> query (I'd also add &debugQuery=true to the results).
>
> Saying you followed the exact instructions somewhere isn't
> really helpful. It's likely that there's something innocent-seeming
> that was done differently. Giving the information asked for
> will help us diagnose what's happening and, perhaps,
> improve the docs if we can understand the mis-match.
>
> Best,
> Erick
>
> On Fri, Sep 16, 2016 at 8:28 AM, Pritchett, James
>  wrote:
> > I am following the exact instructions in the tutorial: copy and pasting
> all
> > commands & queries from the tutorial:
> > https://lucene.apache.org/solr/quickstart.html. Where it breaks down is
> > this one:
> >
> > http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> >
> > This returns no results. Tried in the web admin view as well, also tried
> > various field:value combinations to no avail. Clearly something didn't
> get
> > configured correctly, but I saw no error messages when running all the
> data
> > loads, etc. given in the tutorial.
> >
> > Sorry to be so clueless, but I don't really have anything to go on for
> > troubleshooting besides asking dumb questions.
> >
> > James
> >
> > On Fri, Sep 16, 2016 at 11:24 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> >> wrote:
> >
> >> Please share the exact query syntax?
> >>
> >> Are you using a collection you built or one of the examples?
> >>
> >> On Fri, Sep 16, 2016 at 9:06 AM, Pritchett, James <
> >> jpritch...@learningally.org> wrote:
> >>
> >> > I apologize if this is a really stupid question. I followed all
> >> > instructions on installing Tutorial, got data loaded, everything works
> >> > great until I try to query with a field name -- e.g.,
> name:foundation. I
> >> > get zero results from this or any other query which specifies a field
> >> name.
> >> > Simple queries return results, and the field names are listed in those
> >> > results correctly. But if I query using names that I know are there
> and
> >> > values that I know are there, I get nothing.
> >> >
> >> > I figure this must be something basic that is not right about the way
> >> > things have gotten set up, but I am completely blocked at this point.
> I
> >> > tried blowing it all away and restarting from scratch with no luck.
> Where
> >> > should I be looking for problems here? I am running this on a MacBook,
> >> OS X
> >> > 10.9, latest JDK (1.8).
> >> >
> >> > James
> >> >

Re: Tutorial not working for me

2016-09-16 Thread Pritchett, James
Thanks for that. I totally get how it is with complicated, open source
projects. And from experience, I realize that beginner-level documentation
is really hard, especially with these kinds of projects: by the time you
get to documentation, everybody involved is so expert in all the details
that they can't imagine approaching from a blank slate.

Thanks for the suggestions. Had to chuckle, though: one of your links (
quora.com) is the one that I started with. Step 1: "Download Solr, actually
do the tutorial ..."

Best wishes,

James

On Fri, Sep 16, 2016 at 1:41 PM, John Bickerstaff 
wrote:

> I totally empathize about the sense of wasted time.  On Solr in particular
> I pulled my hair out for months - and I had access to people who had been
> using it for over two years!!!
>
> For what it's worth - this is kind of how it goes with most open source
> projects in my experience.  It's painful - and - the more moving parts the
> open source project has, the more painful the learning curve (usually)...
>
> But - the good news is that's why this list is here - we're all trying to
> help each other, so feel free to ping the list sooner rather than later
> when you're frustrated.  My new rule is one hour of being blocked...  I
> used to wait days - but everyone on the list seems to really understand how
> frustrating it is to be stuck and people have really taken time to help me
> - so I'm less hesitant.  And, of course, I try to pay it forward by
> contributing as much as I can in the same way.
>
> On that note: I've been particularly focused on working with Solr in terms
> of being able to keep upgrading simple by just replacing and re-indexing so
> if you have questions on that space (Disaster Recovery, Zookeeper config,
> etc) I may be able to help - and if you're looking for "plan" for building
> and maintaining a simple solrCloud working model on Ubuntu VMs on
> VirtualBox, I can *really* help you.
>
> Off the top of my head - some places to start:
>
> http://yonik.com/getting-started-with-solr/
> https://www.quora.com/What-is-the-best-way-to-learn-SOLR
> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
> http://www.solr-start.com/
>
> I think everyone responsible for those links is also a frequent "helper" on
> this email forum.
>
> Also (and I'm aware it's a glass half-full thing which frequently irritates
> me, but I'll say it anyway).  Having run into this problem I'm willing to
> wager you'll never forget this particular quirk and if you see the problem
> in future, you'll know exactly what's wrong.  It shouldn't have been
> "wrong" with the example, but for my part at least - I've begun to think of
> stuff like this as just part of the learning curve because it happens
> nearly every time.
>
> Software is hard - complex projects like SOLR are hard.  It's why we get
> paid to do stuff like this.  I'm actually getting paid pretty well right
> now because Solr is recognized as difficult and I have (with many thanks to
> this list) become known as someone who "knows Solr"...
>
> It *could* and *should* be better, but open source is what it is as a
> result of the sum total of what everyone has contributed - and we're all
> happy to help you as best we can.
>
>
>
> On Fri, Sep 16, 2016 at 11:13 AM, Pritchett, James <
> jpritch...@learningally.org> wrote:
>
> > Second possibility: You've somehow indexed fields as
> > "string" type rather than one of the text based fieldTypes.
> > "string" types are not tokenized, thus a field with
> > "My dog has fleas" will fail to find "My". It'll even not match
> > "my dog has fleas" (note capital "M").
> >
> > That appears to be the issue. Searching for name:Foundation indeed
> returns
> > the expected result. I will now go find some better entry point to SOLR
> > than the tutorial, which has wasted enough of my time for one day. Any
> > suggestions would be welcome.
> >
> > James
> >
> > On Fri, Sep 16, 2016 at 11:40 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > My bet:
> > > the fields (look in managed_schema or, possibly schema.xml)
> > > has stored="true" and indexed="false" set for the fields
> > > in question.
> > >
> > > Pretty much everyone takes a few passes before this really
> > > makes sense. "stored" means you see the results returned,
> > > "indexed" must be true before you can search on something.
> > 
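The distinction Erick describes (an untokenized "string" field versus an analyzed "text" field) can be illustrated with a toy analyzer. The function names below are simplified stand-ins, not Solr's actual analysis chain:

```python
# A "string" fieldType indexes the whole value as one exact,
# case-sensitive term; a "text" fieldType tokenizes and lowercases.
def analyze_string(value):
    return [value]  # one untokenized term

def analyze_text(value):
    return value.lower().split()  # simplified whitespace + lowercase analysis

doc = "My dog has fleas"

# Query-side analysis lowercases "My" to "my", so:
print("my" in analyze_string(doc))  # False: the only term is the full value
print("my" in analyze_text(doc))    # True: terms are ["my", "dog", "has", "fleas"]
```

This is why name:foundation matched nothing against a string field while name:Foundation (exact case, full value) did.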

Re: Tutorial not working for me

2016-09-21 Thread Pritchett, James
FWIW, my next step was to work with the movie example file, which worked
perfectly and was a much, much better "getting started" intro. You could do
worse than to build a new tutorial/getting started from this example.
Dataset is way more fun, too -- a quality that should never be
underestimated in a tutorial.

James

On Fri, Sep 16, 2016 at 8:34 PM, Chris Hostetter 
wrote:

>
> : I apologize if this is a really stupid question. I followed all
>
> It's not a stupid question, the tutorial is completely broken -- and for
> that matter, in my opinion, the data_driven_schema_configs used by that
> tutorial (and recommended for new users) are largely useless for the same
> underlying reason...
>
> https://issues.apache.org/jira/browse/SOLR-9526
>
> Thank you very much for asking about this - hopefully the folks who
> understand this more (and don't share my opinion that the entire concept
> of data_driven schemas are a terrible idea) can chime in and explain WTF
> is going on here.
>
>
> -Hoss
> http://www.lucidworks.com/
>



-- 


*James Pritchett*

Leader, Process Redesign and Analysis

__


*Learning Ally™*Together It’s Possible
20 Roszel Road | Princeton, NJ 08540 | Office: 609.243.7608

jpritch...@learningally.org

www.LearningAlly.org <http://www.learningally.org/>

Join us in building a community that helps blind, visually impaired &
dyslexic students thrive.

Connect with our community: *Facebook*
<https://www.facebook.com/LearningAlly.org> | *Twitter*
<https://twitter.com/Learning_Ally> | *LinkedIn*
<https://www.linkedin.com/groups?home=&gid=2644842&trk=anet_ug_hm> |
*Explore1in5* <http://www.explore1in5.org/> | *Instagram*
<https://instagram.com/Learning_Ally/> | *Sign up for our community
newsletter* <https://go.learningally.org/about-learning-ally/stay-in-touch/>

Support us: *Donate*
<https://go.learningally.org/about-learning-ally/donate/> | *Volunteer*
<https://go.learningally.org/about-learning-ally/volunteers/how-you-can-help/>


Re: Tutorial not working for me

2016-09-22 Thread Pritchett, James
>
>
>
> From your perspective as a new user, did you find it
> anoying/frustrating/confusing that the README.txt in the films example
> required/instructed you to first create a handful of fields using a curl
> command to hit the Schema API before you could index any of the documents?
>
> https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=
> blob;f=solr/example/films/README.txt
>
>
>
No, I didn't find that to be a problem. In fact, in my view that's not a
bug, that's a feature -- at least from my very limited experience, it seems
like that kind of schema setup is probably pretty standard stuff when
building a SOLR core, and so including it in the example teaches you
something useful that you'll need to do pretty much right off the bat. I
don't think that I did it via curl, though ... I must have used the admin
interface, which was just simpler than copying and pasting that
hairy-looking, multiline command into a terminal. If you used the films
example as the basis for a tutorial and wrote it up in pretty HTML, you
could include screenshots, etc. That would make it completely painless.

James
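For anyone following along, the kind of Schema API call the films README relies on looks roughly like this. It is a sketch (the core name, field name, and type are illustrative), and it assumes a local Solr running on the default port:

```shell
# Define an explicit, indexed + stored text field before loading documents,
# instead of relying on data-driven schema guessing.
curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/films/schema -d '{
  "add-field": {
    "name": "name",
    "type": "text_general",
    "indexed": true,
    "stored": true
  }
}'
```

The same add-field operation can be done through the admin UI's Schema screen, as mentioned above.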


Solr 6 Highlighting Not Working

2016-10-21 Thread Teague James
Can someone please help me troubleshoot my Solr 6.0 highlighting issue? I
have a production Solr 4.9.0 unit configured to highlight responses and it
has worked for a long time now without issues. I have recently been testing
Solr 6.0 and have been unable to get highlighting to work. I used my 4.9
configuration as a guide when configuring my 6.0 machine. Here are the
primary configs:

solrconfig.xml
In my  query requestHandler I have the following:
on
text
html



It is worth noting here that the documentation in the wiki says
hl.simple.pre and hl.simple.post both accept the following:


Using this config in 6.0 causes the core to malfunction at startup throwing
an error that essentially says that an XML statement was not closed. I had
to add the escaped characters just to get the solrconfig to load! Why? That
isn't documented anywhere I looked. It makes me wonder if this is the source
of the problems with highlighting since it works in my 4.9 implementation
without escaping. Is there something wrong with 6's ability to parse XML?

I upload documents using cURL:
curl http://localhost:8983/solr/[CORENAME]/update?commit=true -H
"Content-Type:text/xml" --data-binary '7518TEST02. This is the second
test.'
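The archive has stripped the XML markup out of the curl command above; a well-formed version would look something like this (field names are guesses based on the visible values, not taken from the original message):

```shell
# Post one document as an XML update and commit immediately.
curl 'http://localhost:8983/solr/CORENAME/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<add><doc>
    <field name="id">7518</field>
    <field name="text">TEST02. This is the second test.</field>
  </doc></add>'
```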

When I search using a browser:
http://50.16.13.37:8983/solr/pp/query?indent=true&q=TEST04&wt=xml

The response I get is:


7518

TEST02. This is the second test.



TEST02. This is the second test.


1548827202660859904
2.2499826






Note that nothing appears in the highlight section. Why?



RE: CachedSqlEntityProcessor with delta-import

2016-10-21 Thread Dyer, James
Sowmya,

My memory is that the cache feature does not work with Delta Imports.  In fact, 
I believe that nearly all DIH features except straight JDBC imports do not work 
with Delta Imports.  My advice is to not use the Delta Import feature at all as 
the same result can (often more-efficiently) be accomplished following the 
approach outlined here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

James Dyer
Ingram Content Group

-Original Message-
From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com] 
Sent: Tuesday, October 18, 2016 10:07 AM
To: solr-user@lucene.apache.org
Subject: CachedSqlEntityProcessor with delta-import

Good morning,

Can CachedSqlEntityProcessor be used with delta-import? In my setup when 
running a delta-import with CachedSqlEntityProcessor, the child entity values 
are not correctly updated for the parent record. I am on Solr 4.3. Has anyone 
experienced this and if so how to resolve it?

Thanks,
Sowmya.



Solr 6.0 Highlighting Not Working

2016-10-24 Thread Teague James
Can someone please help me troubleshoot my Solr 6.0 highlighting issue? I
have a production Solr 4.9.0 unit configured to highlight responses and it
has worked for a long time now without issues. I have recently been testing
Solr 6.0 and have been unable to get highlighting to work. I used my 4.9
configuration as a guide when configuring my 6.0 machine. Here are the
primary configs:

solrconfig.xml
In my  query requestHandler I have the following:
on
text
html



It is worth noting here that the documentation in the wiki says
hl.simple.pre and hl.simple.post both accept the following:


Using this config in 6.0 causes the core to malfunction at startup throwing
an error that essentially says that an XML statement was not closed. I had
to add the escaped characters just to get the solrconfig to load! Why? That
isn't documented anywhere I looked. It makes me wonder if this is the source
of the problems with highlighting since it works in my 4.9 implementation
without escaping. Is there something wrong with 6's ability to parse XML?

I upload documents using cURL:
curl http://localhost:8983/solr/[CORENAME]/update?commit=true -H
"Content-Type:text/xml" --data-binary '7518TEST02. This is the second
test.'

When I search using a browser:
http://50.16.13.37:8983/solr/pp/query?indent=true&q=TEST04&wt=xml

The response I get is:
 
7518  TEST02. This is the
second test.



TEST02. This is the second test.


1548827202660859904
2.2499826






Note that nothing appears in the highlight section. Why?

Any help would be appreciated - thanks!

-Teague



RE: Solr 6.0 Highlighting Not Working

2016-10-25 Thread Teague James
Hi - Thanks for the reply, I'll give that a try.  

-Original Message-
From: jimtronic [mailto:jimtro...@gmail.com] 
Sent: Monday, October 24, 2016 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.0 Highlighting Not Working

Perhaps you need to wrap your inner "" and "" tags in the CDATA
structure?
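The CDATA wrapping suggested above would look something like this inside the requestHandler in solrconfig.xml (a sketch; the parameter values are the usual em markers, which the archive stripped from the original message):

```xml
<!-- Literal markup in a config value must be escaped (&lt;em&gt;) or
     wrapped in CDATA so the XML parser does not try to interpret it. -->
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
```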





--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-6-0-Highlighting-Not-Working-tp43027
87p4302835.html
Sent from the Solr - User mailing list archive at Nabble.com.



Two separate instances sharing the same zookeeper cluster

2017-09-14 Thread James Keeney
I have a staging and a production solr cluster. I'd like to have them use
the same zookeeper cluster. It seems like it is possible if I can set a
different directory for the second cluster. I've looked through the
documentation though and I can't quite figure out where to set that up. As
a result my staging cluster nodes keep trying to add themselves to the
production cluster.

If someone could point me in the right direction?

Jim K.
-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *


Re: Two separate instances sharing the same zookeeper cluster

2017-09-15 Thread James Keeney
Mike -

Thank you, this was very helpful. I've been doing some research and
experimenting.

As currently configured solr is launched as a service. I looked at the
sol.in.sh file in /etc/default and we are running using a list of servers
for the zookeeper cluster.

so I think that is translated to -z zookeeper1,zookeeper2,zookeeper3 (these
are defined in the hosts file)

If I understand what I am reading setting a specific configset path would
be done explicitly by adding the path to the end of the zookeeper server
list:

-z zookeeper1,zookeeper2,zookeeper3/solr_dev for example.

However, I'm not sure how to switch the production cluster to explicitly
reference the directory it currently uses. Do I need to setup the directory
first?

As per this?
https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html#TakingSolrtoProduction-ZooKeeperchroot

Would I setup say solr_prod, upconfig all the configs, switch over one node
and then migrate over the rest of the nodes, ending with the leader?

Would that then move production to solr_prod as the config base?

Once that is done I would then setup the dev.

Does any of this make sense?
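A minimal sketch of the chroot setup being discussed, using the zk utility bundled with Solr 6.x (hostnames and paths are illustrative, and each command assumes it is run from the Solr install directory):

```shell
# Create a chroot znode for each environment (one-time setup).
bin/solr zk mkroot /solr_prod -z zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
bin/solr zk mkroot /solr_dev  -z zookeeper1:2181,zookeeper2:2181,zookeeper3:2181

# Upload each environment's configset under its own chroot.
bin/solr zk upconfig -n myconf -d /path/to/conf \
  -z zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr_prod

# Then point each cluster at its chroot in its solr.in.sh, e.g.:
#   ZK_HOST="zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr_prod"
```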

Jim K.


On Thu, Sep 14, 2017 at 4:08 PM Mike Drob  wrote:

> When you specify the zk string for a solr instance, you typically include a
> chroot in it. I think the default is /solr, but it doesn't have to be, so
> you should be able to run with -z zk1:2181/solr-dev and /solr-prod
>
>
> https://lucene.apache.org/solr/guide/6_6/setting-up-an-external-zookeeper-ensemble.html#SettingUpanExternalZooKeeperEnsemble-PointSolrattheinstance
>
> On Thu, Sep 14, 2017 at 3:01 PM, James Keeney 
> wrote:
>
> > I have a staging and a production solr cluster. I'd like to have them use
> > the same zookeeper cluster. It seems like it is possible if I can set a
> > different directory for the second cluster. I've looked through the
> > documentation though and I can't quite figure out where to set that up.
> As
> > a result my staging cluster nodes keep trying to add themselves to the
> > production cluster.
> >
> > If someone could point me in the right direction?
> >
> > Jim K.
> > --
> > Jim Keeney
> > President, FitterWeb
> > E: j...@fitterweb.com
> > M: 703-568-5887 <(703)%20568-5887>
> >
> > *FitterWeb Consulting*
> > *Are you lean and agile enough? *
> >
>
-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *


Quick quester about suggester component

2017-10-17 Thread James Keeney
I've set up the suggester and want to act on the full document when the user
selects one of the suggestions.

Ideally it would be nice to be able to tell the suggester to return more
than just the field that the suggestion index is built from.

If that can't be done, then should I do the following:


   1. Get the suggestions
   2. When the user selects one, take the suggestion term and do a search of
   the field that the suggester used to build its index.

Is that correct?

Jim K.
-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *


Re: Quick quester about suggester component

2017-10-17 Thread James Keeney
Yep. Understood.

On Tue, Oct 17, 2017, 8:14 PM Erick Erickson 
wrote:

> Well, you tell the suggester what field to use in the first place in
> the configuration.
>
> But I don't quite understand. Suggester is not _intended_ to return
> documents. It returns, well, suggestions. It's up to you to do
> something with them, i.e. substitute them into a new query (against
> whatever fields you want) and send that query to Solr. The new query
> you send can use the edismax parser to automatically search across
> several fields and the like.
>
> Suggesters are not intended to automatically do another search if
> that's what you're asking.
>
> Best,
> Erick
>
> On Tue, Oct 17, 2017 at 10:49 AM, James Keeney 
> wrote:
> > I've setup the suggester and want to act on the full document when user
> > selects one of the suggestions.
> >
> > Ideally it would be nice to be able to tell the suggester to return more
> > than just the field that the suggestion index is built from.
> >
> > If that can't be done, then should I do the following:
> >
> >
> >1. Get the suggestions
> >2. When user selects one, take the suggestion term and do a search of
> >the field that the suggester used to build its index.
> >
> > Is that correct?
> >
> > Jim K.
> > --
> > Jim Keeney
> > President, FitterWeb
> > E: j...@fitterweb.com
> > M: 703-568-5887
> >
> > *FitterWeb Consulting*
> > *Are you lean and agile enough? *
>
-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *
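Erick's two-step flow (take the chosen suggestion, then issue a normal edismax search with it) can be sketched as follows. The base URL, field names, and boosts here are illustrative assumptions, not anything mandated by Solr:

```python
from urllib.parse import urlencode

def build_followup_query(suggestion,
                         base="http://localhost:8983/solr/mycore/select"):
    # Substitute the suggestion into a regular search request; edismax
    # lets the follow-up query hit several fields with per-field boosts.
    params = {
        "q": suggestion,
        "defType": "edismax",
        "qf": "name^2 description",  # illustrative field list
        "wt": "json",
    }
    return base + "?" + urlencode(params)

print(build_followup_query("new york"))
```

The application then sends this URL to Solr as an ordinary search and renders the full documents from the response.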


Solr cloud inquiry

2017-11-15 Thread kasinger, james
Hello folks,



To start, we have a sharded SolrCloud configuration running Solr version 5.1.0.
During shard-to-shard communication there is a problem state where queries
are sent to a replica on which the storage is inaccessible. The
node is healthy, so it is still taking requests, which pile up waiting to
read from disk, resulting in a latency increase. We’ve tried resolving this
storage inaccessibility, but it appears related to AWS EBS issues. Has anyone
encountered the same issue?

thanks


Starting SolrCloud

2016-11-28 Thread James Muerle
Hello,

I am very new to Solr, and I'm excited to get it up and running on amazon
ec2 for some prototypical testing. So, I've installed solr (and java) on
one ec2 instance, and I've installed zookeeper on another. After starting
the zookeeper server on the default port of 2181, I run this on the solr
instance: "opt/solr/bin/solr start -c -z ".
us-west-2.compute.amazonaws.com/solr"", which seems to complete
successfully:

Archiving 1 old GC log files to /opt/solr/server/logs/archived
Archiving 1 console log files to /opt/solr/server/logs/archived
Rotating solr logs, keeping a max of 9 generations
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=13038). Happy searching!

But then when I run "/opt/solr/bin/solr status", I get this output:

Found 1 Solr nodes:

Solr process 13038 running on port 8983

ERROR: Failed to get system information from http://localhost:8983/solr due
to: org.apache.http.client.ClientProtocolException: Expected JSON response
from server but received: 


Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/admin/info/system. Reason:
Server ErrorCaused
by:org.apache.solr.common.SolrException: Error processing the
request. CoreContainer is either not initialized or shutting down.
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)





Typically, this indicates a problem with the Solr server; check the Solr
server logs for more information.


I don't quite understand what things could be causing this problem, so I'm
really at a loss at the moment. If you need any additional information, I'd
be glad to provide it.

Thanks for reading!
James


Re: Starting SolrCloud

2016-11-29 Thread James Muerle
Hello,

Thanks for reading this, but it has been resolved. I honestly don't know
what was happening, but restarting my shell and running the exact same
commands today instead of yesterday seems to have fixed it.

Best,
James

On Mon, Nov 28, 2016 at 8:07 PM, James Muerle  wrote:

> Hello,
>
> I am very new to Solr, and I'm excited to get it up and running on amazon
> ec2 for some prototypical testing. So, I've installed solr (and java) on
> one ec2 instance, and I've installed zookeeper on another. After starting
> the zookeeper server on the default port of 2181, I run this on the solr
> instance: "opt/solr/bin/solr start -c -z ".us-
> west-2.compute.amazonaws.com/solr"", which seems to complete successfully:
>
> Archiving 1 old GC log files to /opt/solr/server/logs/archived
> Archiving 1 console log files to /opt/solr/server/logs/archived
> Rotating solr logs, keeping a max of 9 generations
> Waiting up to 180 seconds to see Solr running on port 8983 [|]
> Started Solr server on port 8983 (pid=13038). Happy searching!
>
> But then when I run "/opt/solr/bin/solr status", I get this output:
>
> Found 1 Solr nodes:
>
> Solr process 13038 running on port 8983
>
> ERROR: Failed to get system information from http://localhost:8983/solr
> due to: org.apache.http.client.ClientProtocolException: Expected JSON
> response from server but received: 
> 
> 
> Error 500 Server Error
> 
> HTTP ERROR 500
> Problem accessing /solr/admin/info/system. Reason:
> Server ErrorCaused 
> by:org.apache.solr.common.SolrException:
> Error processing the request. CoreContainer is either not initialized or
> shutting down.
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:263)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:254)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1668)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:581)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1160)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:511)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1092)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:518)
> at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:308)
> at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:244)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(
> FillInterest.java:95)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceAndRun(ExecuteProduceConsume.java:246)
> at org.eclipse.jetty.util.thread.strategy.
> ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:654)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(
> QueuedThreadPool.java:572)
> at java.lang.Thread.run(Thread.java:745)
> 
>
> 
> 
>
> Typically, this indicates a problem with the Solr server; check the Solr
> server logs for more information.
>
>
> I don't quite understand what things could be causing this problem, so I'm
> really at a loss at the moment. If you need any additional information, I'd
> be glad to provide it.
>
> Thanks for reading!
> James
>


Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

2017-01-09 Thread Gethin James
For NOT NULL, I had some success using:


WHERE field_name <> '' (greater or less than empty quotes)


Best regards,

Gethin.


From: Joel Bernstein 
Sent: 05 January 2017 20:12:19
To: solr-user@lucene.apache.org
Subject: Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

IS NULL and IS NOT NULL predicate are not currently supported.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jan 5, 2017 at 2:05 PM, radha krishnan 
wrote:

> Hi,
>
> solr version : 6.3
>
> will WHERE <> IS NULL / IS NOT NULL work with the /sql handler
> ?
>
> " select   name from gettingstarted where name is not null "
>
> the above query is not returning any documents in the response even if
> there are documents with "name"defined
>
>
> Thanks,
> Radhakrishnan D
>


RE: Can't get spelling suggestions to work properly

2017-01-17 Thread Dyer, James
Jimi,

Generally speaking, spellcheck does not work well against fields with stemming, 
or other "heavy" analysis.  I would copyField to a field that is tokenized 
on whitespace with little else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words in the index.  So if 
the user misspells a word but the misspelling is actually some other word that 
is indexed, it will never suggest.  You can orverride this behavior by 
specifying  "spellcheck.alternativeTermCount" with a value >0.  This is how 
many suggestions it should give for words that indeed exist in the index.  This 
can be the same value as "spellcheck.count", but you may wish to set it to a 
lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to 
"spellcheck.alternativeTermCount", but in my opinion, the later gives a better 
experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, 
then the spellchecker will not suggest anything if more results are returned 
than the value you specify.  This is helpful in providing "did you mean"-style 
suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written 
query that returns results, then specify both "spellcheck.collate=true" and 
"spellcheck.maxCollationTries" to a value >0 (possibly 5-10).  This will cause 
it to internally check the re-written queries (aka. Collations) and report back 
on how many results you get for each.  If you are using "q.op=OR" or a low 
value for "mm", then you will likely want to override this with something like 
"spellcheck.collateParam.mm=0".  Otherwise every combination will get reported 
as returning results.
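Pulled together, the parameters James walks through might look like this inside a search handler's defaults (a sketch; the numeric values are within the ranges he suggests, not drop-in recommendations):

```xml
<str name="spellcheck">true</str>
<str name="spellcheck.count">10</str>
<!-- also suggest for terms that already exist in the index -->
<str name="spellcheck.alternativeTermCount">5</str>
<!-- only suggest when the query returns few results -->
<str name="spellcheck.maxResultsForSuggest">5</str>
<!-- verify that collations actually return results -->
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<!-- with q.op=OR or a low mm, loosen mm while testing collations -->
<str name="spellcheck.collateParam.mm">0</str>
```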

I hope this and other comments you've gotten helps demystify spellcheck 
configuration.  I do agree it is fairly complicated and frustrating to get it 
just right.

James Dyer
Ingram Content Group

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good 
thing: now it shows spelling suggestions even on correctly spelled words.

I think what I would need is the logic of 
SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of 
it being hard coded to 0, i.e. just as maxQueryFrequency works.

/Jimi

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot. Although setting 
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I 
also had to set "spellcheck.alternativeTermCount". With that done, I now get 
suggestions when searching for 'mycet' (a misspelling of the Swedish word 
'mycket', that didn't return suggestions before).

Although, I'm still not able to fully understand how to configure this 
properly, because with this change there are now other misspelled searches that 
no longer give suggestions. The problem here is stemming, I suspect, because 
the main search fields use stemming, so that in some cases one can get lots of 
results for spellings that don't exist in the index at all (or, at least not 
in the spelling-field). How can I configure this component so that those 
suggestions are still included? Do I need to set maxResultsForSuggest to a 
really high number? Like Integer.MAX_VALUE? I feel that such a setting would 
defeat the purpose of that parameter, in a way. But I'm not sure how else to 
solve this.

Also, there is one other things I wonder about the spelling suggestions, that 
you might have the answer to. Is there a way to make the logic case 
insensitive, but the presentation case sensitive? For example, a search for 
'georg washington' now would return 'george washington' as a suggestion, but 
'George Washington' would be even better.

Regards
/Jimi


-Original Message-
From: alessandro.benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) we don't provide misspelled suggestions if we set the param to 1, and we 
have a minimum of 1 doc freq for the term .

2) we don't provide misspelled suggestions if the doc frequency of the term is 
greater than the maxQueryFrequency threshold.

RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

2017-01-17 Thread Dyer, James
This sounds a lot like SOLR-4489.  However, it looks like this was fixed prior 
to your version (4.5), so it could be you found another case where this bug 
still exists.

The other thing is the default Query Converter cannot handle all cases, and it 
could be the query you are sending is beyond its abilities?  Even in this case, 
it'd be nice if it failed more gracefully than this.

Could you provide the query parameters you are sending and also how you have 
spellcheck configured?

James Dyer
Ingram Content Group


-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Thursday, January 05, 2017 8:22 AM
To: 'solr-user@lucene.apache.org' 
Subject: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

I am seeing many exceptions like this in my Solr [5.4.1] log:
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -2
	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
	at java.lang.StringBuilder.replace(StringBuilder.java:262)
	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:236)
	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:93)
	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:238)
	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:203)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
	...
	at java.lang.Thread.run(Thread.java:745)

What am I potentially facing here?

Thx
Clemens


Solr 6.0.0 Returns Blank Highlights for Certain Queries

2017-01-18 Thread Teague James
Hello everyone! I have a Solr 6.0.0 instance that is storing documents
peppered with text like "1a", "2e", "4c", etc. If I search the documents for
a word, "ms", "in", "the", etc., I get the correct number of hits and the
results are highlighted correctly in the highlighting section. But when I
search for "1a" or "2e" I get hits, but the highlights are blank:




Where "8667" is the document ID of the record that had the hit, but no
highlight. Other searches, "ms" for example, return:


 
  
   
MS
   
  
 


Why does highlighting fail for "1a" type searches? Any help is appreciated!
Thanks!

-Teague James



Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Teague James
Hello everyone! I'm still stuck on this issue and could really use some
help. I have a Solr 6.0.0 instance that is storing documents peppered with
text like "1a", "2e", "4c", etc. If I search the documents for a word, "ms",
"in", "the", etc., I get the correct number of hits and the results are
highlighted correctly in the highlighting section. But when I search for
"1a" or "2e" I get hits, but the highlights are blank. Further testing
revealed that the highlighter fails to highlight any two-character 
alpha-numeric combination, such as n0, b1, 1z, etc.:

...



Where "8667" is the document ID of the record that had the hit, but no
highlight. Other searches, "ms" for example, return:

...

 
  
   
MS
   
  
 


Why does highlighting fail for "1a" type searches? Any help is appreciated!
Thanks!

-Teague James



RE: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

2017-02-01 Thread Teague James
Hi Erick! Thanks for the reply. The goal is to get two character terms like 1a, 
1b, 2a, 2b, 3a, etc. to get highlighted in the documents. Additional testing 
shows that any alpha-numeric combo returns a blank highlight, regardless of 
length. Thus, "pr0blem" will not highlight because of the zero in the middle of 
the term.

I came across a ServerFault article where it was suggested that the fieldType 
must be tokenized in order for highlighting to work correctly. Setting the 
field type to text_general was suggested as a solution. In my case the data is 
stored as a string fieldType, which is then copied using copyField to a field 
that has a fieldType of text_general, but I'm still not getting a good 
highlight on terms like "1a". Highlighting works for any other 
non-alpha-numeric term though.

Other articles pointed to termVectors and termOffsets, but none of these seemed 
to help. Here's my config:

In the solrconfig file, highlighting is set to use the "text" field.

Thoughts?

Appreciate the help! Thanks!

-Teague

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, February 1, 2017 2:49 PM
To: solr-user 
Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

How far into the text field are these tokens? The highlighter defaults to the 
first 10K characters under control of hl.maxAnalyzedChars. It's vaguely 
possible that the values happen to be farther along in the text than that. Not 
likely, mind you but possible.

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James  wrote:
> Hello everyone! I'm still stuck on this issue and could really use 
> some help. I have a Solr 6.0.0 instance that is storing documents 
> peppered with text like "1a", "2e", "4c", etc. If I search the 
> documents for a word, "ms", "in", "the", etc., I get the correct 
> number of hits and the results are highlighted correctly in the 
> highlighting section. But when I search for "1a" or "2e" I get hits, 
> but the highlights are blank. Further testing revealed that the 
> highlighter fails to highlight any combination of alpha-numeric two character 
> value, such a n0, b1, 1z, etc.:
>  ...
> 
> 
>
> Where "8667" is the document ID of the record that had the hit, but no 
> highlight. Other searches, "ms" for example, return:
>  ...
> 
>  
>   
>
> MS
>
>   
>  
> 
>
> Why does highlighting fail for "1a" type searches? Any help is appreciated!
> Thanks!
>
> -Teague James
>



RE: spellcheck.q and local parameters

2014-04-28 Thread Dyer, James
spellcheck.q is supposed to take a list of raw query terms, so what you're 
trying to do in your example won't work.  What you should do instead is 
space-delimit the actual query terms that exist in "qq" (and nothing else) and 
use that as your value for spellcheck.q.
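As a rough sketch of that idea (a hypothetical helper, not from the thread), assuming "qq" carries the raw user input:

```python
import re

def spellcheck_q_from_qq(qq: str) -> str:
    """Reduce a raw query to the space-delimited bare terms that
    spellcheck.q expects (hypothetical helper; adjust to your syntax)."""
    # Strip common Lucene query syntax characters.
    stripped = re.sub(r'[+\-"(){}\[\]^~*?:\\]', " ", qq)
    # Drop boolean operators.
    tokens = [t for t in stripped.split() if t.upper() not in {"AND", "OR", "NOT"}]
    # Drop leftover numeric boost/slop values, e.g. the 2 in ~2 or 0.5 in ^0.5.
    tokens = [t for t in tokens if not t.replace(".", "").isdigit()]
    return " ".join(tokens)

print(spellcheck_q_from_qq('"mob ile"~2 OR wranglr^0.5'))  # mob ile wranglr
```

The resulting string can then be sent as spellcheck.q alongside the original q parameter.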

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] 
Sent: Monday, April 28, 2014 3:01 PM
To: solr-user@lucene.apache.org
Subject: spellcheck.q and local parameters

Hi,

I'm having some trouble using the spellcheck.q parameter. The user's query is 
defined in the qq parameter, and the q parameter contains several other parameters 
for boosting.
I would like to use the qq parameter as a default for spellcheck.q.
I tried several ways of adding the qq parameter in the spellcheck.q parameter, 
but it doesn't seem to work. Is this at all possible or do I need to write a 
custom QueryConverter?

This is the configuration:

<str name="q">_query_:"{!edismax qf=$qfQuery pf=$pfQuery bq=$boostQuery bf=$boostFunction v=$qq}"</str>
<str name="spellcheck.q">{!v=$qq}</str>

I haven't included all the variables, because they seem unnecessary.

Regards,
Jeroen



RE: solr 4.2.1 spellcheck strange results

2014-05-16 Thread Dyer, James
To achieve what you want, you need to specify a lightly analyzed field (no 
stemming) for spellcheck.  For instance, if your "solr.SpellCheckComponent" in 
solrconfig.xml is set up with "field" of "title_full", then try using 
"title_full_unstemmed".  Also, if you are specifying a 
"queryAnalyzerFieldType", it should be the same as your unstemmed text field.
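For example, a sketch of what that can look like (field and fieldType names here are assumptions, not taken from the thread):

```xml
<!-- schema.xml: copy the stemmed field into a lightly analyzed sibling -->
<field name="title_full" type="text_stemmed" indexed="true" stored="true"/>
<field name="title_full_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title_full" dest="title_full_unstemmed"/>

<!-- solrconfig.xml: spellcheck against the unstemmed field, keeping
     queryAnalyzerFieldType in sync with it -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_unstemmed</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title_full_unstemmed</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```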

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: HL [mailto:freemail.grha...@gmail.com] 
Sent: Saturday, May 10, 2014 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr 4.2.1 spellcheck strange results

Hi

I am querying the Solr server's spellchecker, and although the results I get 
back look OK at first glance, it seems like Solr is replying as if it had made 
the search with the wrong key.

So while I query the server with the word "καρδυα", Solr responds as if it 
had queried the database with the word "καρδυ", eliminating the last char.
---



---

Ideally, Solr should properly indicate that the suggestions correspond 
with "καρδυα" rather than "καρδυ".

Is there a way to make Solr respond with the original search word from the 
query in its response, instead of the one that actually produced the hits?

Regards,
Harry



Here is the complete Solr response:
---


0
23

true
*,score
0
καρδυα
καρδυα

title_short^750 title_full_unstemmed^600 title_full^400 title^500 
title_alt^200 title_new^100 series^50 series2^30 author^300 
author_fuller^150 contents^10 topic_unstemmed^550 topic^500 
geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10 
allfields fulltext isbn issn

basicSpell
arrarr
dismax
xml
0






3
0
6
0


καρδ
5


καρδι
3


καρυ
1



false







RE: Spell check [or] Did you mean this with Phrase suggestion

2014-05-16 Thread Dyer, James
Have you looked at "spellcheck.collate", which re-writes the entire query with 
one or more corrected words?  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate .  There are 
several options shown at this link that controls how the "collate" feature 
works.
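For example, a sketch of the relevant request-handler defaults (the values are illustrative, not from the thread):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">3</str>
  <str name="spellcheck.maxCollationTries">5</str>
  <str name="spellcheck.collateExtendedResults">true</str>
</lst>
```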

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: vanitha venkatachalam [mailto:venkatachalam.vani...@gmail.com] 
Sent: Thursday, May 08, 2014 4:14 AM
To: solr-user@lucene.apache.org
Subject: Spell check [or] Did you mean this with Phrase suggestion

Hi,
We need a spell check component that suggest actual full phrase not just
words.

Say, we have list of brands : "Nike corporation", "Samsung electronics" ,

when I search for "tamsong", I like to get suggestions as "samsung
electronics" ( full phrase ) not just "samsung" ( words)
Please help.
-- 
regards,
Vanitha


RE: spellcheck if docsfound below threshold

2014-05-16 Thread Dyer, James
It's "spellcheck.maxResultsForSuggest".

http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl] 
Sent: Monday, May 12, 2014 2:12 AM
To: solr-user@lucene.apache.org
Subject: spellcheck if docsfound below threshold

Hi,

Is there a setting to only include spellcheck if the number of documents
found is below a certain threshold?

Or would we need to rerun the request with the spellcheck parameters based
on the docs found?

Kind regards,

Jan Verweij


Re: overseer queue clogged

2014-05-22 Thread James Hardwick
We’re seeing something similar to what Ryan reported, e.g. a massively clogged 
overseer queue that gets so bad it brings down our solr nodes. I tried “rmr”ing 
the entire /overseer/queue but it keeps returning with “Node does not exist: 
/overseer/queue/qn-0##”, after which in order to continue I have to 
create the node complained about and then execute the “rmr /overseer/queue” 
again, until it stumbled upon another node that doesn’t exist, rinse, wash, 
repeat…  


This is w/ Solr 4.7.1 and ZooKeeper 3.4.6

--  
James Hardwick


On Thursday, May 1, 2014 at 10:25 AM, Mark Miller wrote:

> What version are you running? This was fixed in a recent release. It can 
> happen if you hit add core with the defaults on the admin page in older 
> versions.
>  
> --  
> Mark Miller
> about.me/markrmiller
>  
> On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:
>  
> I saw an overseer queue clogged as well due to a bad message in the queue.  
> Unfortunately this went unnoticed for a while until there were 130K messages  
> in the overseer queue. Since it was a production system we were not able to  
> simply stop everything and delete all Zookeeper data, so we manually deleted  
> messages by issuing commands directly through the zkCli.sh
> tool. After all  
> the messages had been cleared, some nodes were in the wrong state (e.g.  
> 'down' when should have been 'active'). Restarting the 'down' or 'recovery  
> failed' nodes brought the whole cluster back to a stable and healthy state.  
>  
> Since it can take some digging to determine backlog in the overseer queue,  
> some of the symptoms we saw were:  
> Overseer throwing an exception like "Path must not end with / character"  
> Random nodes throwing an exception like "ClusterState says we are the  
> leader, but locally we don't think so"  
> Bringing up new replicas time out when attempting to fetch shard id  
>  
>  
>  
> --  
> View this message in context: 
> http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
>   
> Sent from the Solr - User mailing list archive at Nabble.com.
>  
>  




RE: Wordbreak spellchecker excessive breaking.

2014-05-27 Thread Dyer, James
You can do this if you set it up like in the main Solr example:

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">name</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
</lst>

The "combineWords" and "breakWords" flags let you tell it which kind of 
wordbreak correction you want.  "maxChanges" controls the maximum number of 
words it can break 1 word into, or the maximum number of words it can combine.  
It is reasonable to set this to 1 or 2.

The best way to use this is in conjunction with a "regular" spellchecker like 
DirectSolrSpellChecker.  When used together with the collation functionality, 
it should take a query like "mob ile" and depending on what actually returns 
results from your data, suggest either "mobile" or perhaps "mob lie" or both.  
The one thing it cannot do is fix a transposition or misspelling and combine or 
break words in one shot.  That is, it cannot detect that "mob lie" should 
become "mobile".
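For example, a sketch of handler defaults that query both spellcheckers together (the dictionary names are assumed to match your spellchecker configuration):

```xml
<lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.count">10</str>
</lst>
```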

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Saturday, May 24, 2014 4:21 PM
To: solr-user@lucene.apache.org
Subject: Wordbreak spellchecker excessive breaking.

I am using the Solr wordbreak spellchecker, and the issue is that when I search
for a term like "mob ile", expecting that the wordbreak spellchecker would
actually return a suggestion for "mobile", it breaks the search term into
letters like "m o b".  I have two issues with this behavior.

 1. How can I make Solr combine "mob ile" to mobile?
 2. Notwithstanding the fact that my search term "mob ile" is being broken
incorrectly into individual letters , I realize that the wordbreak is
needed in certain cases, how do I control the wordbreak so that it does not
break it into letters like "m o b" which seems like excessive breaking to
me ?

Thanks.


How to Get Highlighting Working in Velocity (Solr 4.8.0)

2014-05-27 Thread Teague James
My Solr 4.8.0 index includes a field called 'dom_title'. The field is
displayed in the result set. I want to be able to highlight keywords from
this field in the displayed results. I have tried configuring solrconfig.xml
and I have tried adding parameters to the query "&hl=true&hl.fl=dom_title"
but the searched keyword never gets highlighted in the results. I am
attempting to use the Velocity Browse interface to demonstrate this. Most of
the configuration is right out of the box, except for the fields in the
schema.

From my solrconfig.xml:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="hl">on</str>
    <str name="hl.fl">dom_title</str>
    <str name="hl.encoder">html</str>
  </lst>
</requestHandler>


I omitted a lot of basic query settings and facet field info from this
snippet to focus on the highlighting component. What am I missing?

-Teague



RE: Wordbreak spellchecker excessive breaking.

2014-05-30 Thread Dyer, James
I am not sure why changing spellcheck parameters would prevent your server from 
restarting.  One thing to check is to see if you have warming queries running 
that involve spellcheck.  I think I remember from long ago there was (maybe 
still is) an obscure bug where sometimes it will lock up in rare cases when 
spellcheck is used in warming queries.  I do not remember exactly what caused 
this or if it was ever fixed.

Besides that, you might want to post a stack trace or describe what happens 
when it doesn't restart.  Perhaps someone here will know what the problem is.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Friday, May 30, 2014 12:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Wordbreak spellchecker excessive breaking.

James,

Thanks for clearly stating this; I was not able to find it documented
anywhere. Yes, I am using it with another spellchecker (Direct) with
collation on. I will try maxChanges and let you know.

On a side note, whenever I change a spellchecker parameter I need to delete
the Solr data directory and rebuild the index, as my Tomcat instance would
not even start otherwise. Can you let me know why?

Thanks.




On Tue, May 27, 2014 at 12:21 PM, Dyer, James 
wrote:

> You can do this if you set it up like in the main Solr example:
>
> <lst name="spellchecker">
>   <str name="name">wordbreak</str>
>   <str name="classname">solr.WordBreakSolrSpellChecker</str>
>   <str name="field">name</str>
>   <str name="combineWords">true</str>
>   <str name="breakWords">true</str>
>   <int name="maxChanges">10</int>
> </lst>
>
> The "combineWords" and "breakWords" flags let you tell it which kind of
> workbreak correction you want.  "maxChanges" controls the maximum number of
> words it can break 1 word into, or the maximum number of words it can
> combine.  It is reasonable to set this to 1 or 2.
>
> The best way to use this is in conjunction with a "regular" spellchecker
> like DirectSolrSpellChecker.  When used together with the collation
> functionality, it should take a query like "mob ile" and depending on what
> actually returns results from your data, suggest either "mobile" or perhaps
> "mob lie" or both.  The one thing is cannot do is fix a transposition or
> misspelling and combine or break words in one shot.  That is, it cannot
> detect that "mob lie" should become "mobile".
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Saturday, May 24, 2014 4:21 PM
> To: solr-user@lucene.apache.org
> Subject: Wordbreak spellchecker excessive breaking.
>
> I am using Solr wordbreak spellchecker and the issue is that when I search
> for a term like "mob ile" expecting that the wordbreak spellchecker would
> actually resutn a suggestion for "mobile" it breaks the search term into
> letters like "m o b"  I have two issues with this behavior.
>
>  1. How can I make Solr combine "mob ile" to mobile?
>  2. Not withstanding the fact that my search term "mob ile" is being broken
> incorrectly into individual letters , I realize that the wordbreak is
> needed in certain cases, how do I control the wordbreak so that it does not
> break it into letters like "m o b" which seems like excessive breaking to
> me ?
>
> Thanks.
>


RE: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread Dyer, James
If "wrangle" is not in your index, and if it is within the max # of edits, then 
it should suggest it.

Are you getting anything back from spellcheck at all?  What is the exact query 
you are using?  How is the spellcheck field analyzed?  If you're using 
stemming, then "wrangle" and "wrangler" might be stemmed to the same word. (by 
the way, you shouldn't spellcheck against a stemmed or otherwise 
heavily-analyzed field).

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Monday, June 02, 2014 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: DirectSpellChecker not returning expected suggestions.

OK, I just realized that "wrangle" is a proper English word; probably that's
why I don't get a suggestion for "wrangler" in this case. However, in my
test index there is no "wrangle" present, so even though this is a proper
English word, since there is no occurrence of it in the index shouldn't
Solr suggest "wrangler" to me?


On Mon, Jun 2, 2014 at 2:00 PM, S.L  wrote:

> I do not get any suggestion (when I search for "wrangle") , however I
> correctly get the suggestion wrangler when I search for wranglr , I am
> using the Direct and WordBreak spellcheckers in combination, I have not
> tried using anything else.
>
> Is the distance calculation of Solr different than what Levestien distance
> calculation ? I have set maxEdits to 1 , assuming that this corresponds to
> the maxDistance.
>
> Thanks for your help!
>
>
> On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
>> What do you get then?  Suggestions, but not the one you’re looking for, or
>> is it deemed correctly spelled?
>>
>> Have you tried another spellChecker impl, for troubleshooting purposes?
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sat, May 31, 2014 at 12:33 AM, S.L  wrote:
>>
>> > Hi All,
>> >
>> > I have a small test index of 400 documents , it happens to have an entry
>> > for  "wrangler", When I search for "wranglr", I correctly get the
>> collation
>> > suggestion as "wrangler", however when I search for "wrangle" , I do not
>> > get a suggestion for "wrangler".
>> >
>> > The Levenstien distance between wrangle --> wrangler is same as the
>> > Levestien distance between wranglr-->wrangler , I am just wondering why
>> I
>> > do not get a suggestion for wrangle.
>> >
>> > Below is my Direct spell checker configuration.
>> >
>> > <lst name="spellchecker">
>> >   <str name="name">direct</str>
>> >   <str name="field">suggestAggregate</str>
>> >   <str name="classname">solr.DirectSolrSpellChecker</str>
>> >   <str name="distanceMeasure">internal</str>
>> >   <str name="comparatorClass">score</str>
>> >   <float name="accuracy">0.7</float>
>> >   <int name="maxEdits">1</int>
>> >   <int name="minPrefix">3</int>
>> >   <int name="maxInspections">5</int>
>> >   <int name="minQueryLength">4</int>
>> >   <float name="maxQueryFrequency">0.01</float>
>> > </lst>
>> >
>>
>
>


RE: Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Dyer, James
I believe it will return the terms that are most similar to the queried terms 
but have a greater term frequency than the queried terms.  It doesn't actually 
care what the term frequencies are, only that they are greater than the 
frequencies of the terms you queried on.

I do not know your use case, but you may want to consider using 
"spellcheck.alternativeTermCount" instead of "onlyMorePopular".  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and 
https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153
 for why.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alistair [mailto:ali...@gmail.com] 
Sent: Monday, June 09, 2014 3:06 AM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all,

I was wondering: what does the "onlyMorePopular" option for spellchecking use
as its threshold? Will it always pick the suggestion that returns the most
results, or does it base its result on some threshold that can be
configured? 

Thanks!

Ali.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spellcheck-onlyMorePopular-threshold-tp4140727.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Highlighting not working

2014-06-18 Thread Teague James
Vicky,

I resolved this by making sure that the field that is searched has
"stored=true". By default "text" is searched, which is the destination of
the copyFields and is not stored. If you change your copyField destination
to a field that is stored and use that field as the default search field
then highlighting should work - or at least it did for me.

As a super fast check, change the text field to "stored=true" and test.
Remember that you'll have to restart Solr and re-index first! HTH!
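As a minimal sketch of that change (field and type names here are assumptions):

```xml
<!-- A stored, tokenized copyField target that highlighting can use -->
<field name="content_hl" type="text_general" indexed="true" stored="true" multiValued="true"/>
<copyField source="content" dest="content_hl"/>
```

Then point the default search field and hl.fl at the stored field (e.g. df=content_hl and hl.fl=content_hl).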

-Teague

-Original Message-
From: vicky [mailto:vi...@raytheon.com] 
Sent: Wednesday, June 18, 2014 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting not working

Were you ever able to resolve this issue? I am having the same issue, and
highlighting is not working for me on Solr 4.8.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Highlighting-not-working-tp4112659p4142513.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Spell checker - limit on number of misspelt words in a search term.

2014-06-23 Thread Dyer, James
I do not believe there is such a setting.  Most likely you will need to 
increase the value for "maxCollationTries" to get it to discover the "correct" 
combination. Just be sure not to set this too high as queries with a lot of 
misspelled words (or for something your index simply doesn't have) will take 
longer to complete.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Tuesday, June 17, 2014 4:49 PM
To: solr-user@lucene.apache.org
Subject: Spell checker - limit on number of misspelt words in a search term.

Hi All,

I am using the Direct spellchecker component, and I have collate=true in
my solrconfig.xml.

The issue that I noticed is that when I have a search term with up to two
words in it and both of them are misspelled, I get a collation query as
a suggestion in the spellchecker output. If I increase the search term
length to 3 words and spell all of them incorrectly, then I do not get a
collation query as an output in the spellchecker suggestions.

Is there a setting in the solrconfig.xml file that's controlling this behavior
by restricting collation suggestions to search terms of up to two misspelt
words? If so, I would need to change that property.

Can anyone please let me know how to do so ?

Thanks.

Sent from my mobile.


RE: Endeca to Solr Migration

2014-07-02 Thread Dyer, James
We migrated a big application from Endeca (6.0, I think) several years ago.  
We were not using any of the business UI tools, but we found that Solr is a lot 
more flexible and performant than Endeca.  But with more flexibility comes more 
you need to know.

The hardest thing was to migrate the Endeca dimensions to Solr facets.  We had 
endeca-api specific dependencies throughout the application, even in the 
presentation layer.  We ended up writing a bridge api that allowed us to keep 
our endeca-specific code and translate the queries to solr queries.  We are 
storing a cross-reference between the "N" values from Endeca and key/value 
pairs to translate something like N=4000 to "fq=Language:English".  With Solr, 
there is more you need to do in your app that the backend doesn't manage for 
you.  In the end, though, it lets you separate your concerns better.
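The cross-reference described here can be sketched as a simple lookup table (hypothetical: the N ids and field names below are invented for illustration):

```python
# Hypothetical cross-reference: Endeca dimension-value ids -> Solr field/value.
N_TO_FQ = {
    4000: ("Language", "English"),
    4001: ("Language", "Spanish"),
    5000: ("Format", "Hardcover"),
}

def endeca_nav_to_fq(n_param: str) -> list[str]:
    """Translate an Endeca navigation value like "4000+5000" into
    the equivalent list of Solr fq parameters."""
    fqs = []
    for n in n_param.split("+"):
        field, value = N_TO_FQ[int(n)]
        fqs.append(f'{field}:"{value}"')
    return fqs

print(endeca_nav_to_fq("4000+5000"))  # ['Language:"English"', 'Format:"Hardcover"']
```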

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mrg81 [mailto:maya...@gmail.com] 
Sent: Saturday, June 28, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: Endeca to Solr Migration

Hello --

I wanted to get some details on Endeca to Solr migration. I am
interested in a few topics:

1. We would like to migrate the Faceted Navigation, Boosting individual
records and a few other items. 
2. But the biggest question is about the UI [Experience Manager] - I have
not found a tool that comes close to Experience Manager. I did read about
Hue [In response to Gareth's question on Migration], but it seems that we
will have to do a lot of customization to use that. 

Questions:

1. Is there a UI that we can use? Is it possible to un-hook the Experience
Manager UI and point to Solr?
2. How long does a typical migration take? Assuming that we have to migrate
the Faceted Navigation and Boosted records? 

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-to-Solr-Migration-tp4144582.html
Sent from the Solr - User mailing list archive at Nabble.com.




Of, To, and Other Small Words

2014-07-14 Thread Teague James
Hello all,

I am working with Solr 4.9.0 and am searching for phrases that contain words
like "of" or "to" that Solr seems to be ignoring at index time. Here's what
I tried:

curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
--data-binary '<add><doc><field name="id">100</field><field name="content">blah
blah blah knowledge of science blah blah blah</field></doc></add>'

Then, using a browser:

http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100

I get zero hits. Search for "knowledge" or "science" and I'll get hits.
"knowledge of" or "of science" and I get zero hits. I don't want to use
proximity if I can avoid it, as this may introduce too many undesirable
results. Stopwords.txt is blank, yet clearly Solr is ignoring "of" and "to"
and possibly more words that I have not discovered through testing yet. Is
there some other configuration file that contains these small words? Is
there any way to force Solr to pay attention to them and not drop them from
the phrase? Any advice is appreciated! Thanks!

-Teague




RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Hi Anshum,

Thanks for replying and suggesting this, but the field type I am using (a 
modified text_general) in my schema has the file set to 'stopwords.txt'. 


Just to be double sure I cleared the list in stopwords_en.txt, restarted Solr, 
re-indexed, and searched with still zero results. Any other suggestions on 
where I might be able to control this behavior?

-Teague


-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: Monday, July 14, 2014 4:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Hi Teague,

The StopFilterFactory (which I think you're using) by default uses 
lang/stopwords_en.txt (which wouldn't be empty if you check).
What you're looking at is stopwords.txt. You could either empty that file 
out or change the field type for your field.
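For reference, a sketch of a field type that names the stopwords file explicitly (the surrounding analyzer chain is an assumption):

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- "words" points at the (empty) stopwords.txt, so nothing is removed -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```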


On Mon, Jul 14, 2014 at 12:53 PM, Teague James  wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain 
> words like "of" or "to" that Solr seems to be ignoring at index time. 
> Here's what I tried:
>
> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
> --data-binary '<add><doc><field name="id">100</field><field name="content">blah 
> blah blah knowledge of science blah blah blah</field></doc></add>'
>
> Then, using a broswer:
>
> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=i
> d:100
>
> I get zero hits. Search for "knowledge" or "science" and I'll get hits.
> "knowledge of" or "of science" and I get zero hits. I don't want to 
> use proximity if I can avoid it, as this may introduce too many 
> undesireable results. Stopwords.txt is blank, yet clearly Solr is ignoring 
> "of" and "to"
> and possibly more words that I have not discovered through testing 
> yet. Is there some other configuration file that contains these small 
> words? Is there any way to force Solr to pay attention to them and not 
> drop them from the phrase? Any advice is appreciated! Thanks!
>
> -Teague
>
>



-- 

Anshum Gupta
http://www.anshumgupta.net



RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Jack,

Thanks for replying and the suggestion. I replied to another suggestion with my 
field type, and I do have the "words" attribute set.  There's nothing in the 
stopwords.txt. I even cleaned out stopwords_en.txt just to be certain. Any 
other suggestions on how to control this behavior?

-Teague

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Monday, July 14, 2014 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Or, if you happen to leave off the "words" attribute of the stop filter (or 
misspell the attribute name), it will use the internal Lucene hardwired list of 
stop words.

-- Jack Krupansky

-Original Message-
From: Anshum Gupta
Sent: Monday, July 14, 2014 4:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Hi Teague,

The StopFilterFactory (which I think you're using) by default uses 
lang/stopwords_en.txt (which wouldn't be empty if you check).
What you're looking at is stopwords.txt. You could either empty that file 
out or change the field type for your field.


On Mon, Jul 14, 2014 at 12:53 PM, Teague James 
wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain 
> words like "of" or "to" that Solr seems to be ignoring at index time. 
> Here's what I tried:
>
> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
> --data-binary '<add><doc><field name="id">100</field><field
> name="content">blah blah blah knowledge of science blah blah
> blah</field></doc></add>'
>
> Then, using a browser:
>
> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>
> I get zero hits. Search for "knowledge" or "science" and I'll get hits.
> "knowledge of" or "of science" and I get zero hits. I don't want to 
> use proximity if I can avoid it, as this may introduce too many 
> undesirable results. Stopwords.txt is blank, yet clearly Solr is 
> ignoring "of" and "to"
> and possibly more words that I have not discovered through testing 
> yet. Is there some other configuration file that contains these small 
> words? Is there any way to force Solr to pay attention to them and not 
> drop them from the phrase? Any advice is appreciated! Thanks!
>
> -Teague
>
>



-- 

Anshum Gupta
http://www.anshumgupta.net 



RE: Of, To, and Other Small Words

2014-07-14 Thread Teague James
Alex,

Thanks! Great suggestion. I figured out that it was the EdgeNGramFilterFactory. 
Taking that out of the mix did it.
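
[Archive note: for anyone who hits this later -- an edge n-gram filter with minGramSize=3 emits only prefixes of length 3 and up, so two-letter tokens like "of" and "to" produce no grams at all and never reach the index. A minimal sketch of that behaviour in plain Python (an illustration of the filter's logic, not Solr code):]

```python
def edge_ngrams(token, min_gram=3, max_gram=10):
    """Prefix grams the way an edge n-gram filter with these settings
    produces them; tokens shorter than min_gram yield nothing."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

for tok in ["knowledge", "of", "science"]:
    print(tok, "->", edge_ngrams(tok))
```

"of" maps to an empty list, so after this filter the indexed stream contains only grams of "knowledge" and "science" and the phrase "knowledge of science" can no longer match as typed, regardless of what is in stopwords.txt.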

-Teague

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Monday, July 14, 2014 9:14 PM
To: solr-user
Subject: Re: Of, To, and Other Small Words

Have you tried the Admin UI's Analyze screen. Because it will show you what 
happens to the text as it progresses through the tokenizers and filters. No 
need to reindex.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: 
http://www.solr-start.com/ and @solrstart Solr popularizers community: 
https://www.linkedin.com/groups?gid=6713853


On Tue, Jul 15, 2014 at 8:10 AM, Teague James  wrote:
> Hi Anshum,
>
> Thanks for replying and suggesting this, but the field type I am using (a 
> modified text_general) in my schema has the file set to 'stopwords.txt'.
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Just to be double sure I cleared the list in stopwords_en.txt, restarted 
> Solr, re-indexed, and searched with still zero results. Any other suggestions 
> on where I might be able to control this behavior?
>
> -Teague
>
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Monday, July 14, 2014 4:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Of, To, and Other Small Words
>
> Hi Teague,
>
> The StopFilterFactory (which I think you're using) by default uses 
> lang/stopwords_en.txt (which wouldn't be empty if you check).
> What you're looking at is stopwords.txt. You could either empty that file 
> out or change the field type for your field.
>
>
> On Mon, Jul 14, 2014 at 12:53 PM, Teague James  
> wrote:
>> Hello all,
>>
>> I am working with Solr 4.9.0 and am searching for phrases that 
>> contain words like "of" or "to" that Solr seems to be ignoring at index time.
>> Here's what I tried:
>>
>> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
>> --data-binary '<add><doc><field name="id">100</field><field
>> name="content">blah blah blah knowledge of science blah blah
>> blah</field></doc></add>'
>>
>> Then, using a browser:
>>
>> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>>
>> I get zero hits. Search for "knowledge" or "science" and I'll get hits.
>> "knowledge of" or "of science" and I get zero hits. I don't want to 
>> use proximity if I can avoid it, as this may introduce too many 
>> undesirable results. Stopwords.txt is blank, yet clearly Solr is ignoring 
>> "of" and "to"
>> and possibly more words that I have not discovered through testing 
>> yet. Is there some other configuration file that contains these small 
>> words? Is there any way to force Solr to pay attention to them and 
>> not drop them from the phrase? Any advice is appreciated! Thanks!
>>
>> -Teague
>>
>>
>
>
>
> --
>
> Anshum Gupta
> http://www.anshumgupta.net
>


