SuggestComponent and edismax type boosting
Hi, I'm setting up an autosuggester for Geonames location data with Solr 6.0.0, and have followed something like https://lucidworks.com/blog/2015/03/04/solr-suggester/

I can rank results with "weightField" for population, but I wonder whether it is possible to further boost/rank the results based on the content of other fields (e.g. the country text code). I have done this through eDisMax, which boosts my standard /select results according to 3 other fields (population/country/featureCode), but I can't get it to work with SuggestComponent.

Could/should I somehow populate a weightField as a combination of the 3 boosting fields? Is this possible, or am I best off indexing with N-grams, like this: http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx

Thoughts most welcome.

Thanks,
James
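[Editorial note: one way to fold all three signals into the suggester is to compute a single numeric weight at index time (e.g. population scaled by per-country and per-featureCode factors in your indexing code) and point "weightField" at it. A minimal sketch of such a suggester definition; the "combinedWeight" field and the scaling are assumptions for illustration, not part of the original setup:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">locationSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">name</str>
      <!-- combinedWeight = population weight * country factor * featureCode factor,
           computed by the indexing client before each document is sent -->
      <str name="weightField">combinedWeight</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>
]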
How to verify that update is "in-place"
I am using Solr 6.6 and have carefully read the documentation about atomic and in-place updates. I am pretty sure that everything is set up as it should be. But how can I make certain that a simple update command actually performs an in-place update, without internally re-indexing all other fields?

I am issuing this command to my server (I am using implicit document routing, so I need the "Shard" parameter):

  {
    "ID":1133,
    "Property_2":{"set":124},
    "Shard":"FirstShard"
  }

The log outputs:

  2017-10-17 07:39:18.701 INFO (qtp1937348256-643) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.p.LogUpdateProcessorFactory [MyCollection_FirstShard_replica1] webapp=/solr path=/update params={commitWithin=1000&boost=1.0&overwrite=true&wt=json&_=1508221142230}{add=[1133 (1581489542869811200)]} 0 1
  2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
  2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.s.SolrIndexSearcher Opening [Searcher@32d539b4[MyCollection_FirstShard_replica1] main]
  2017-10-17 07:39:19.703 INFO (commitScheduler-283-thread-1) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.u.DirectUpdateHandler2 end_commit_flush
  2017-10-17 07:39:19.703 INFO (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener QuerySenderListener sending requests to Searcher@32d539b4[MyCollection_FirstShard_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_i(6.6.0):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}
  2017-10-17 07:39:19.703 INFO (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.QuerySenderListener QuerySenderListener done.
  2017-10-17 07:39:19.703 INFO (searcherExecutor-268-thread-1-processing-n:192.168.117.142:8983_solr x:MyCollection_FirstShard_replica1 s:FirstShard c:MyCollection r:core_node27) [c:MyCollection s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1] o.a.s.c.SolrCore [MyCollection_FirstShard_replica1] Registered new searcher Searcher@32d539b4[MyCollection_FirstShard_replica1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_i(6.6.0):C5011/1) Uninverting(_j(6.6.0):C478) Uninverting(_k(6.6.0):C345) Uninverting(_l(6.6.0):C4182) Uninverting(_m(6.6.0):C317) Uninverting(_n(6.6.0):C399) Uninverting(_q(6.6.0):C1)))}

If I issue another, non-in-place update to a different field which is not a docValues field, the log output is very similar. Can I increase the verbosity? Would it then tell me more about the type of update performed?

Thank you!
James
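[Editorial note: one way to check this without raising log verbosity is via the [docid] document transformer. This sketch assumes the behavior that an atomic update re-adds the document (giving it a new internal Lucene document id) while an in-place update leaves the internal id unchanged until a segment merge; the collection and ID are taken from the example above:

  # note the internal Lucene id before the update
  curl 'http://localhost:8983/solr/MyCollection/select?q=ID:1133&fl=ID,[docid],_version_'
  # ...send the update, wait for the commit, then repeat:
  curl 'http://localhost:8983/solr/MyCollection/select?q=ID:1133&fl=ID,[docid],_version_'

If [docid] is unchanged and only _version_ moved, the update was applied in place.]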
Re: How to verify that update is "in-place"
Hi Emir and Amrit, thanks for your responses!

@Emir: Nice idea, but after changing any document in any way and committing the changes, all of the doc counters (Num, Max, Deleted) are still the same; the only thing that changes is the Version (it increases in steps of 2).

@Amrit: Are you saying that the _version_ field should not change when performing an atomic update operation?

Thanks
James

-----Original Message-----
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
Sent: Tuesday, 17 October 2017 11:35
To: solr-user@lucene.apache.org
Subject: Re: How to verify that update is "in-place"

Hi James,

As each update you are doing via an atomic operation contains the "id" / "uniqueKey", comparing the "_version_" field value for one of them would be fine for a batch. The rest, Emir has listed out.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi James,
> I did not try it, but checking max doc and num docs might tell you whether an
> update was in-place or atomic - an atomic update is a reindex of the existing
> doc, so the old doc will be deleted. An in-place update should just update the
> doc values of the existing doc, so the number of deleted docs should not change.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 17 Oct 2017, at 09:57, James wrote:
> >
> > I am using Solr 6.6 and carefully read the documentation about
> > atomic and in-place updates.
> > [remainder of the original message and log output quoted in full; trimmed]
Re: How to verify that update is "in-place"
I found a solution which works for me:

1. Add a document with very little tokenized text and write down QTime (for me: 5 ms).
2. Add another document with very much text (I used about 1 MB of Lorem Ipsum sample text) and write down QTime (for me: 70 ms).
3. Perform the update operation you want to test on document 2 and compare QTime.

For me it was again 70 ms, so I assume that my operation did re-index the whole document and was thus not an in-place update.

-----Original Message-----
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
Sent: Tuesday, 17 October 2017 12:43
To: solr-user@lucene.apache.org
Subject: Re: How to verify that update is "in-place"

James,

> @Amrit: Are you saying that the _version_ field should not change when
> performing an atomic update operation?

It should change; a new version will be allotted to the document. I am not that sure about in-place updates; probably a test run will verify that.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Tue, Oct 17, 2017 at 4:06 PM, James wrote:

> Hi Emir and Amrit, thanks for your responses!
> [remainder of the earlier messages quoted in full; trimmed]
No in-place updates with router.field set
Steps to reproduce:

1. Use Solr in SolrCloud mode.
2. Create a collection with implicit routing and router.field set to some field, e.g. "routerfield".
3. Index a very small document. Stop the time -> X.
4. Index a very large document. Stop the time -> Y.
5. Apply an update to the large document. Note that the update command has at least three entries:

  {
    "ID":1133,
    "Property_2":{"set":124},
    "routerfield":"FirstShard"
  }

The QTime of the update will always be closer to Y than to X. If I repeat these steps without setting router.field while creating the collection, the QTime of the update will be very close to X.

From this simple test I conclude that router.field somehow prevents updates from being performed as in-place updates. Can anyone confirm? Is this a bug? Anybody care to open a Jira item if necessary?

According to the first comment on https://issues.apache.org/jira/browse/SOLR-8889 the router.field option is hardly tested, and there seem to be other related problems as well.
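[Editorial note: for reference, step 2 can be done with the Collections API; a minimal sketch, with collection and shard names made up for illustration:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=MyCollection&router.name=implicit&router.field=routerfield&shards=FirstShard,SecondShard&replicationFactor=1'
]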
BlendedTermQuery for Solr?
On my Solr 6.6 server I'd like to use BlendedTermQuery: https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/BlendedTermQuery.html

I know it is a Lucene class. Is there a Solr API available to access it? If not, maybe some workaround? Thanks!
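[Editorial note: as far as I know there is no stock Solr query parser for BlendedTermQuery, but a common workaround is a small custom QParserPlugin. A minimal sketch; the package, class, and field names are assumptions for illustration. Register it in solrconfig.xml with <queryParser name="blended" class="com.example.BlendedQParserPlugin"/> and query with {!blended}term:

  package com.example;

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BlendedTermQuery;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class BlendedQParserPlugin extends QParserPlugin {
    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() {
          // blend the same user term across two fields so that document
          // frequencies are shared between them
          BlendedTermQuery.Builder builder = new BlendedTermQuery.Builder();
          builder.add(new Term("title", qstr));
          builder.add(new Term("body", qstr));
          return builder.build();
        }
      };
    }
  }
]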
How to avoid OOM while merging indexes
I am building the Solr index on Hadoop, and at the reduce step I run the task that merges the indexes. Each part of the index is about 1 GB, and I have 10 indexes to merge together. I always get the Java heap exhausted; the heap size is about 2 GB. I wonder which part uses so much memory, and how to avoid the OOM during the merge process.
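[Editorial note: for reference, a reduce-side merge like this is typically done with plain Lucene. A minimal sketch using a recent Lucene API, with paths and the buffer size made up for illustration:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import java.nio.file.Paths;

  public class MergeParts {
    public static void main(String[] args) throws Exception {
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setRAMBufferSizeMB(64); // cap the in-heap indexing buffer used while merging
      try (IndexWriter writer = new IndexWriter(
               FSDirectory.open(Paths.get("/data/index-merged")), cfg)) {
        Directory[] parts = new Directory[10];
        for (int i = 0; i < parts.length; i++) {
          parts[i] = FSDirectory.open(Paths.get("/data/index-part-" + i));
        }
        writer.addIndexes(parts); // segment-level copy; does not re-analyze documents
        writer.commit();
      }
    }
  }
]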
Re: Re: How to avoid OOM while merging indexes
Since the Hadoop task monitor checks each task and kills it when it finds it consuming too much memory, I currently want to find a method to decrease the memory usage on the Solr side. Any ideas?

At 2012-01-09 17:07:09, "Tomas Zerolo" wrote:

>On Mon, Jan 09, 2012 at 01:29:39PM +0800, James wrote:
>> I am building the Solr index on Hadoop, and at the reduce step I run the task
>> that merges the indexes. Each part of the index is about 1 GB, and I have 10
>> indexes to merge together. I always get the Java heap exhausted; the heap
>> size is about 2 GB. I wonder which part uses so much memory, and how to
>> avoid the OOM during the merge process.
>
>There are three issues in there. You should first try to find out which
>one it is (it's not clear to me based on your question):
>
> - Java heap memory: you can set that as a start option of the JVM.
>   You set the maximum with the -Xmx<n> start option. You get an
>   OutOfMemory exception if you reach it (no idea whether the
>   SOLR code bubbles this up, but there are experts on that here).
> - Operating system limit: you can set the limit for a process's
>   use of resources (memory, among others). Typically, Linux-based
>   systems are shipped with an unlimited memory setting; Ralf already
>   posted how to check/set that.
>   The situation here is a bit complicated, because there are
>   different limits (memory size vs. virtual memory size, mainly)
>   and they are exercised differently depending on the allocation
>   pattern. Anyway, I'd expect malloc() returning NULL in this
>   case and the Java runtime translating it (again) into an OutOfMemory
>   exception.
> - Now the OOM killer is quite another kettle of fish. AFAIK, it's
>   Linux-specific. Once the global system memory is more-or-less
>   exhausted, the kernel kills some applications to try to improve
>   the situation. There's some heuristic for deciding which application
>   to kill, and there are some knobs to help the kernel in this
>   decision. I'd recommend [1]; after reading *that* you know all :-)
>   You know you've run into it by looking at the system log.
>
>[1] <https://lwn.net/Articles/317814/>
>--
>Tomás Zerolo
>Axel Springer AG
>tomas.zer...@axelspringer.de
>www.axelspringer.de
Re: Re: Is there any practice to load the index into RAM to accelerate Solr performance?
But Solr does not have an in-memory index, am I right?

At 2012-02-08 16:17:49, "Ted Dunning" wrote:

>This is true of Lucene as it stands. It would be much faster if there
>were a specialized in-memory index such as is typically used in high-
>performance search engines.
>
>On Tue, Feb 7, 2012 at 9:50 PM, Lance Norskog wrote:
>
>> Experience has shown that it is much faster to run Solr with a small
>> amount of memory and let the rest of the RAM be used by the operating
>> system "disk cache". That is, the OS is very good at keeping the right
>> disk blocks in memory, much better than Solr.
>>
>> How much RAM is in the server, and how much RAM does the JVM get? How
>> big are the documents, and how large is the term index for your
>> searches? How many documents do you get with each search? And do you
>> use filter queries? These are very powerful at limiting searches.
>>
>> 2012/2/7 James:
>> > Is there any practice to load the index into RAM to accelerate Solr
>> > performance? The overall document count is about 100 million, and the
>> > search time is around 100 ms. I am seeking some method to accelerate the
>> > response time of Solr. I know there is the practice of using SSD disks,
>> > but SSDs also cost a lot. I just want to know whether there is some method
>> > to load the index files into RAM and keep the RAM index and disk index
>> > synchronized; then I could search on the RAM index.
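[Editorial note: for completeness, Solr can be pointed at a RAM-resident directory implementation, though it is non-persistent and usually worse than relying on the OS page cache as Lance describes. A sketch of the solrconfig.xml setting; check the DirectoryFactory documentation for your version:

  <directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>
]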
Documentation Slop (DisMax parser)
Hi:

There seems to be an error in the documentation about the phrase-slop parameter ps used by the eDisMax parser. It reads:

"This means that if the terms "foo" and "bar" appear in the document with less than 10 terms between each other, the phrase will match."

Counterexample: "Foo one two three four five six seven eight nine bar" will not match with ps=10. It seems that it would have to say "less than 9". However, when more query terms are used, it gets complicated when one tries to count the words in between. Easier to understand (and correct according to my testing) would be something like:

"This means that if the terms "foo" and "bar" appear in the document within a group of 10 or fewer terms, the phrase will match. For example, a document that says

  *Foo* term1 term2 term3 *bar*

will match the phrase query. A document that says

  *Foo* term1 term2 term3 term4 term5 term6 term7 term8 term9 *bar*

will not (because the search terms are within a group of 11 terms). Note: if any search term is a MUST-NOT term, the phrase slop query will never match."

Anybody willing to review and change the documentation?

Thanks,
James
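[Editorial note: for context, ps applies to the implicit phrase query that eDisMax builds from the pf fields. A typical request looks like this; the field names are illustrative:

  http://localhost:8983/solr/collection1/select?defType=edismax&q=foo+bar&qf=text&pf=text&ps=10
]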
RE: Solr Basic Configuration - Highlight - Beginner
Hi Evert,

I recently needed help with phrase highlighting and was pointed to the FastVectorHighlighter, which worked out great. I just made a change to the configuration to add generateWordParts="0" and generateNumberParts="0" so that searches for things like "1a" would get highlighted correctly. You may or may not need that feature. You can always remove them or change the value to "1" to switch them on explicitly. Anyway, hope this helps!

[The solrconfig.xml and schema.xml snippets were stripped of their XML tags by the list archive; only the values survived: xml, explicit, 10, documentText, on, text, true, 100.]

-Teague

From: Evert R. [mailto:evert.ra...@gmail.com]
Sent: Tuesday, December 15, 2015 6:25 AM
To: solr-user@lucene.apache.org
Subject: Solr Basic Configuration - Highlight - Beginner

Hi there!

It's my first installation, and I am not sure if this is the right channel... Here are my steps:

1. Set up a basic install of Solr 5.4.0.
2. Create a new core through the command line (bin/solr create -c test).
3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/).
4. Query over the browser, and it brings back the correct search results, but it does not show the part of the text I am querying, i.e. the highlight.

I have already flagged the 'hl' option, but still it does not work...

Example: I am looking for the word 'peace' in my PDF file (a book). I have 4 matches for this word; it shows me the book name (the PDF file) but does not bring back the part of the text that has the word 'peace' in it.

I am probably missing some configuration in schema.xml, which is missing from my folder /solr/server/solr/test/conf/ - or even in solrconfig.xml... I have read a bunch of things about highlighting, checked these files, and copied the standard schema.xml to my core/conf folder, but still it does not bring the highlight.

Attached is a copy of my solrconfig.xml file. I am very sorry for this, probably, dumb and too-basic question... it is the first time I have seen Solr live. Any help will be appreciated.

Best regards,

Evert Ramos
evert.ra...@gmail.com
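[Editorial note: a reconstruction of what the stripped snippets most likely looked like, based on the surviving values above and the attribute fragments quoted later in this thread. This is a best guess, not the original file; the "text_en" field type in particular is an assumption:

  <!-- solrconfig.xml: request handler defaults enabling the FastVectorHighlighter -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="wt">xml</str>
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">documentText</str>
      <str name="hl">on</str>
      <str name="hl.fl">text</str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="hl.fragsize">100</str>
    </lst>
  </requestHandler>

  <!-- schema.xml: the highlighted field must store term vectors for FVH -->
  <field name="documentText" type="text_en" indexed="true" stored="true"
         multiValued="true" termVectors="true" termOffsets="true"
         termPositions="true"/>
]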
RE: Solr Basic Configuration - Highlight - Beginner
Sorry to hear that didn't work! Let me ask a couple of questions...

Have you tried the analyzer inside of the Admin interface? It has helped me sort out a number of highlighting issues in the past. To access it, go to your Admin interface, select your core, then select Analysis from the list of options on the left. In the analyzer, enter the term you are indexing (in other words, the term in the document you are indexing that you expect to get a hit on) in the top-left and right input fields. Select the field that it is destined for (in your case that would be 'content'), then hit Analyze. It helps if you have a big screen!

This will show you the impact of the various filter factories that you have engaged and their effect on whether or not a 'hit' is being generated. Hits are identified by a very faint highlight. (PSST... Developers... It would be really cool if the highlight color were more visible or customizable... Thanks, y'all.)

If it looks like you're getting hits but not getting highlighting, then open up a new tab with the Admin's query interface, in the same place on the left as the analyzer. Replace the "*:*" with your search term (assuming you already indexed your document), and if necessary you can put something in the FQ like "id:123456" to target a specific record.

Did you get a hit? If no, then it's not highlighting that's the issue. If yes, then try dumping this in your address bar (using your URL/IP, search term, and core name, of course; the fq= is an example):

  http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"

That will dump Solr's output to your browser, where you can see exactly what is getting hit.

Hope that helps! Let me know how it goes. Good luck.

-Teague

-----Original Message-----
From: Evert R. [mailto:evert.ra...@gmail.com]
Sent: Wednesday, December 16, 2015 1:46 PM
To: solr-user
Subject: Re: Solr Basic Configuration - Highlight - Beginner

Hi Teague!

I configured solrconfig.xml and schema.xml exactly the way you did, only substituting the word 'documentText' with 'content', as used by the techproducts sample. I reindexed through:

  curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result, i.e. no highlight in the response, as below:

  "highlighting": { "pdf1": {} }

=(

Really... I do not know what to do... Thanks for your time. If you have any more suggestions about where I could be missing something, please let me know.

Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James:

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the
> FastVectorHighlighter which worked out great.
> [remainder of the earlier message, including the configuration snippets, trimmed]
RE: DIH Caching w/ BerkleyBackedCache
Todd,

I have no idea if this will perform acceptably with so many multiple values. I doubt the solr/patch code was really optimized for such a use case. In my production environment, I have je-6.2.31.jar on the classpath. I don't think I've tried it with other versions.

James Dyer
Ingram Content Group

-----Original Message-----
From: Todd Long [mailto:lon...@gmail.com]
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.

Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false"). I'm actually seeing this issue when running a full clean/import. It appears that the Berkeley DB "cleaner" is always removing the oldest file once there are three. In this case, I'll see two 1GB files and then, as the third file is being written (after ~200MB), the oldest 1GB file will fall off (i.e. get deleted). I'm only utilizing ~13% disk space at the time.

I'm using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any other configuration properties beyond what I mentioned before. I simply cannot figure out what is going on with the "cleaner" logic that would deem that file "lowest utilized". Is there any other Berkeley DB or system configuration I could consider that would affect this?

It's possible that this caching simply might not be suitable for our data set, where one document might contain a field with tens of thousands of values... maybe this is the bottleneck with using this database, as every add copies in the prior data and then the "cleaner" removes the old stuff. Maybe it's working like it should, but just incredibly slowly... I can get a full index without caching in about two hours; however, when using this caching it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd
Re: Solr Basic Configuration - Highlight - Beginner
[The beginning of this message is missing from the archive.]

> > is being matched (probably something like "text") and then try
> > highlighting on _that_ field. Try adding "debug=query" to the URL and
> > look at the "parsed_query" section of the return, and you'll see what
> > field(s) is/are actually being searched against.
> >
> > NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
> >
> > As to why "nietava" isn't being found in the content field, probably
> > you have some kind of analysis chain configured for that field that
> > isn't searching as you expect. See the admin/analysis page for some
> > insight into why that would be. The most frequent reason is that the
> > field is a "string" type, which is not broken up into words. Another
> > possibility is that your analysis chain is leaving in the quotes or
> > something similar. As James says, looking at admin/analysis is a good
> > way to figure this out.
> >
> > I still strongly recommend you go from the stock techproducts example
> > and get familiar with how Solr (and highlighting) work before jumping
> > in and changing things. There are a number of ways things can be
> > mis-configured, and trying to change several things at once is a fine
> > way to go mad. The admin UI >> schema browser is another way you can
> > see what kind of terms are _actually_ in your index in a particular field.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 16, 2015 at 12:26 PM, Teague James wrote:
> > > Sorry to hear that didn't work! Let me ask a couple of questions...
> > > [remainder of the earlier messages in this thread quoted in full; trimmed]
RE: Spellcheck response format differs between a single core and SolrCloud
Ryan,

The JSON response format changed for Solr 5.0. See https://issues.apache.org/jira/browse/SOLR-3029 . Is the single-core Solr running a 4.x version, with the cloud Solr running 5.x? If they are both on the same major version, then we have a bug.

James Dyer
Ingram Content Group

-----Original Message-----
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com]
Sent: Monday, January 11, 2016 12:32 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck response format differs between a single core and SolrCloud

Hello,

I am using the spellcheck component for spelling suggestions, and I've used the same configuration in two separate projects; the only difference is that one project uses a single core and the other is a collection on SolrCloud with three shards. The single core has about 56K docs and the one on SolrCloud has 1M docs.

Strangely, the format of the response is slightly different between the two, and I'm not sure why (particularly the collations part). I was wondering if anyone can shed some light on this? Below is my configuration and the results I'm getting.

[The XML snippets were stripped of their tags by the list archive. The surviving "/select" search handler values were: on, false, 5, 2, 5, true, true, 5, 3; the surviving spellcheck component values were: default, spelling, solr.DirectSolrSpellChecker, internal, 0.5, 2, 1, 5, 4, 0.01.]

Examples of each output can be found here: https://gist.github.com/ryac/ceff8da00ec9f5b84106

Thanks,
Ryan
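[Editorial note: the surviving values line up with the stock spellcheck example shipped with Solr, so the stripped configuration was most likely close to the following. This is a reconstruction, not Ryan's actual file:

  <!-- "/select" handler defaults -->
  <str name="spellcheck">on</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">5</str>
  <str name="spellcheck.alternativeTermCount">2</str>
  <str name="spellcheck.maxResultsForSuggest">5</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <str name="spellcheck.maxCollationTries">5</str>
  <str name="spellcheck.maxCollations">3</str>

  <!-- spellcheck component -->
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spelling</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
  </searchComponent>
]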
RE: How to get around Solr's spellcheck maxEdits limit of 2?
But if you really need more than 2 edits, I think IndexBasedSpellChecker supports it.

James Dyer
Ingram Content Group

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, January 21, 2016 11:29 AM
To: solr-user
Subject: Re: How to get around Solr's spellcheck maxEdits limit of 2?

bq: ...is anyway to increase that maxEdit

IIUC, increasing maxEdits beyond 2 increases the space/time required unacceptably; that limit is there on purpose, put there by people who know their stuff.

Best,
Erick

On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki wrote:
> I am using Solr for spell correction. Solr is limited to a maxEdits of 2.
> Is there any way to increase that maxEdits without using phonetic mapping?
> Any suggestions, please.
RE: How to get around Solr's spellcheck maxEdits limit of 2?
See the old docs at https://wiki.apache.org/solr/SpellCheckComponent#Configuration

In particular, you need this line in the spellchecker's configuration in solrconfig.xml (the surrounding XML tag was stripped by the archive; this matches the wiki example):

  <str name="spellcheckIndexDir">./spellchecker</str>

James Dyer
Ingram Content Group

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Friday, January 22, 2016 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: How to get around Solr's spellcheck maxEdits limit of 2?

Ok, but IndexBasedSpellChecker needs a directory where all the indexes are stored to do the spell check. I don't have any idea about IndexBasedSpellChecker. If you could send me a snap of that configuration, it would help me. Thanks.

On Fri, Jan 22, 2016 at 1:45 AM Dyer, James wrote:
> But if you really need more than 2 edits, I think IndexBasedSpellChecker
> supports it.
> [remainder of the earlier messages quoted in full; trimmed]
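[Editorial note: in response to Nitin's request, a minimal IndexBasedSpellChecker definition along the lines of the old wiki page; the field name and path are illustrative:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!-- directory where the sidecar spelling index is built -->
      <str name="spellcheckIndexDir">./spellchecker</str>
      <!-- field in the main index the dictionary is built from -->
      <str name="field">content</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
]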
unmerged index segments
Hi,

I have a large index that has been added to over several years, and I've discovered that I have many segments that haven't been updated for well over a year, even though I'm adding, updating and deleting records daily. My five largest segments all haven't been updated for over a year.

Meanwhile, the number of segments I have keeps on increasing, and I have hundreds of segment files that don't seem to be getting merged past a certain size (e.g. the largest of these is 2 GB, while my older segments are over 100 GB).

My understanding was that background merges should be merging these older segments with newer data over time, but this doesn't seem to be the case.

I'm using Solr 4.9, but I was using an older version at the time that these 'older' segments were created.

Any suggestions as to what's happening would be very much appreciated, as would any suggestion on how I can monitor what's happening with the background merges.

Thanks,
James
Re: unmerged index segments
Hi Jack,

Sorry, I should have put them in my original message. All merge policy settings are at their defaults except mergeFactor, which I now notice is quite high at 45. Unfortunately I don't have the full history to see when this setting was changed, but I do know it hasn't been changed for well over a year, and that we did originally run Solr using the default settings.

Reading about mergeFactor, it sounds like this is likely the problem, and we're simply not asking Solr to merge into these old and large segments yet?

If I were to change this back down to the default of 10, would you expect we'd get quite an immediate and intense period of merging? If I were to launch a duplicate test Solr instance, change the merge factor, and simply leave it for a few days, would it perform the background merge (so I can test whether there's enough memory etc. for the merge to complete)?

Thanks,
James

> On 25 Jan 2016, at 21:39, Jack Krupansky wrote:
>
> What exactly are your merge policy settings in solrconfig? They control
> when the background merges will be performed. Sometimes they do need to
> be tweaked.
>
> -- Jack Krupansky
>
> On Mon, Jan 25, 2016 at 1:50 PM, James Mason wrote:
>
>> Hi,
>>
>> I have a large index that has been added to over several years, and
>> I've discovered that I have many segments that haven't been updated for
>> well over a year, even though I'm adding, updating and deleting records
>> daily.
>> [remainder of the original message quoted in full; trimmed]
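[Editorial note: for reference, the setting under discussion lives in the <indexConfig> section of solrconfig.xml. A sketch of dialing it back to the default; on Solr 4.x, mergeFactor maps onto TieredMergePolicy's maxMergeAtOnce and segmentsPerTier, so the two forms below are alternatives, not meant to be combined:

  <indexConfig>
    <mergeFactor>10</mergeFactor>
    <!-- or, equivalently, configure the policy directly: -->
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
  </indexConfig>
]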
RE: Solr spell check multiwords
Talha,

In your configuration, you have spellcheck.maxResultsForSuggest set to 5, which means it will consider the query "correctly spelled" and offer no suggestions whenever there are 5 or more results. You could omit this parameter, and it will then always suggest when possible.

Possibly a better option would be to add "spellcheck.collateParam.mm=100%" or "spellcheck.collateParam.q.op=AND", so that when testing collations against the index, all the terms are required to match something. See https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX for more information.

James Dyer
Ingram Content Group

-----Original Message-----
From: talha [mailto:talh...@gmail.com]
Sent: Wednesday, July 22, 2015 9:34 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check multiwords

I could not figure out why my configured Solr spell checker is not giving the desired output. In my indexed data, the query symphony+mobile has around 3.5K+ docs, and the spell checker detects it as correctly spelled. When I misspell "symphony" in the query symphony+mobile, it shows only results for "mobile" and the spell checker detects this query as correctly spelled. I have searched this query in different combinations; please find the search result stats below:

  Query: symphony, ResultFound: 1190, SpellChecker: correctly spelled
  Query: mobile, ResultFound: 2850, SpellChecker: correctly spelled
  Query: simphony, ResultFound: 0, SpellChecker: symphony, Collation Hits: 1190
  Query: symphony+mobile, ResultFound: 3585, SpellChecker: correctly spelled
  Query: simphony+mobile, ResultFound: 2850, SpellChecker: correctly spelled
  Query: symphony+mbile, ResultFound: 1190, SpellChecker: correctly spelled

In the last two queries it should suggest something for the misspelled words "simphony" and "mbile".

[The solrconfig.xml and schema.xml snippets were stripped of their XML tags by the list archive; among the surviving values: product_name, default, wordbreak, text_suggest, solr.DirectSolrSpellChecker, solr.WordBreakSolrSpellChecker.]
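[Editorial note: a sketch of a request with James's suggested parameters applied; the handler and query are from the thread, the rest is illustrative (%25 is the URL-encoded percent sign):

  http://localhost:8983/solr/collection1/select?q=simphony+mobile&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5&spellcheck.collateParam.mm=100%25
]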
RE: Solr spell check not showing any suggestions for other language
Talha,

Possibly this English-specific analysis in your "text_suggest" field is interfering: solr.EnglishPossessiveFilterFactory? Another guess is that you're receiving more than 5 results and "maxResultsForSuggest" is set to 5. But I'm not sure; perhaps someone can help once there is more information from you. Can you provide a few example documents that have Bangla text, then the full query request with a misspelled Bangla word (taken from the document examples you provide), then the full spellcheck response, and the total number of documents returned?

James Dyer
Ingram Content Group

-----Original Message-----
From: talha [mailto:talh...@gmail.com]
Sent: Wednesday, August 05, 2015 5:20 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check not showing any suggestions for other language

The Solr spell check is not showing any suggestions for another language. I have indexed multiple languages (English and Bangla) in the same core. It shows suggestions for a wrongly spelt English word, but in the case of a wrongly spelt Bangla word it shows "correctlySpelled = false" without showing any suggestions for it.

[The solrconfig.xml and schema.xml snippets that followed were stripped of their XML tags by the list archive; the surviving values match the configuration from the earlier "Solr spell check multiwords" thread.]
RE: Solr spell check not showing any suggestions for other language
Talha,

Can you try putting your queried keyword in "spellcheck.q"?

James Dyer
Ingram Content Group

-----Original Message-----
From: talha [mailto:talh...@gmail.com]
Sent: Wednesday, August 05, 2015 10:13 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr spell check not showing any suggestions for other language

Dear James,

Thank you for your reply. I tested the analyzer without "solr.EnglishPossessiveFilterFactory", but still no luck. I also updated the analyzer [the updated analyzer definition was stripped of its XML tags by the list archive].

With the above configuration for "text_suggest" I got the following results (note: I set rows to 0 to skip results).

For the correct Bangla word সহজ the Solr response is:

  [XML response stripped by the archive; surviving values: 0, 2, সহজ, true, 0, xml, 1438787238383, true]

For an incorrect Bangla word সহগ, where I just changed the last letter, the Solr response is:

  [XML response stripped by the archive; surviving values: 0, 7, সহগ, true, 0, xml, 1438787208052, false]
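[Editorial note: a sketch of James's suggestion as a request; the core name is illustrative and the Bangla term is from the thread:

  http://localhost:8983/solr/collection1/select?q=সহগ&spellcheck=true&spellcheck.q=সহগ&rows=0
]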
RE: exclude folder in dataimport handler.
I took a quick look at FileListEntityProcessor#init, and it looks like it applies the "excludes" regex to the filename element of the path only, and not to the directories. If your filenames do not have a naming convention that would let you use it this way, you might be able to write a transformer to get what you want.

James Dyer
Ingram Content Group

-----Original Message-----
From: coolmals [mailto:coolm...@gmail.com]
Sent: Thursday, August 20, 2015 12:57 PM
To: solr-user@lucene.apache.org
Subject: exclude folder in dataimport handler.

I am importing files from my file system and want to exclude the import of files from a folder called templatedata. How do I configure that in the entity? excludes="templatedata" doesn't seem to work.
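[Editorial note: a sketch of the transformer approach James mentions, using DIH's JavaScript ScriptTransformer and the $skipDoc flag to drop files under the excluded folder. The entity attributes are illustrative; fileAbsolutePath is one of the fields FileListEntityProcessor emits:

  <dataConfig>
    <script><![CDATA[
      function skipTemplateData(row) {
        // mark any file whose path contains the excluded folder
        if (row.get("fileAbsolutePath").indexOf("templatedata") >= 0) {
          row.put("$skipDoc", "true");
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="files" processor="FileListEntityProcessor"
              baseDir="/data/docs" fileName=".*" recursive="true"
              transformer="script:skipTemplateData">
        <!-- nested entity to parse each file would go here -->
      </entity>
    </document>
  </dataConfig>
]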
RE: Spellcheck / Suggestions : Append custom dictionary to SOLR default index
Max,

If you know the entire list of words you want to spellcheck against, you can use FileBasedSpellChecker. See http://wiki.apache.org/solr/FileBasedSpellChecker .

If, however, you have a field you want to spellcheck against but also want additional words added, consider using a copy of the field for spellcheck purposes, and then index the additional terms into that field. You may be able to accomplish this easily, for instance, by using index-time synonyms in the analysis chain for the spellcheck field. Or you could just append the terms to any document (more than once if you want to boost the term frequency). Keep in mind that while this will work fine for regular word-by-word spelling suggestions, collations are not going to work well with these approaches.

James Dyer
Ingram Content Group

-----Original Message-----
From: Max Chadwick [mailto:mpchadw...@gmail.com]
Sent: Monday, August 24, 2015 9:43 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck / Suggestions: Append custom dictionary to SOLR default index

Is there a way to append a set of words to the out-of-box Solr index when using the spellcheck/suggestions feature?
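[Editorial note: for reference, a minimal FileBasedSpellChecker definition along the lines of the wiki page; the file and directory names are illustrative, and spellings.txt is a plain list of words, one per line:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">file</str>
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>
]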
RE: String index out of range exception from Spell check
This looks similar to SOLR-4489, which is marked as fixed for version 4.5. If you're using an older version, the fix is to upgrade. Also see SOLR-3608, which is similar, but there it seems as if the user's query was more than spellcheck was designed to handle. This should still be looked at, and possibly we can come up with a way to handle these cases.

A way to work around these bugs is to strip your query down to raw terms, separated by spaces, and use "spellcheck.q" with the raw terms only.

James Dyer
Ingram Content Group

-----Original Message-----
From: davidphilip cherian [mailto:davidphilipcher...@gmail.com]
Sent: Sunday, September 27, 2015 3:50 PM
To: solr-user@lucene.apache.org
Subject: String index out of range exception from Spell check

There are irregular exceptions from the spell check component; below is the stack trace. This is not common to all q terms, but I have often seen them occurring for specific queries after enabling the spellcheck.collate method.

  String index out of range: -3
  java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:722)
  500
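[Editorial note: a sketch of the workaround James describes above; the core, handler, and field names are illustrative:

  # keep the full query in q, but hand spellcheck only the raw terms
  http://localhost:8983/solr/collection1/select?q=title:(foo+AND+bar)&spellcheck=true&spellcheck.q=foo%20bar
]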
Re: highlighting
Hi everyone! Pardon me if it's not proper etiquette to chime in, but that feature would solve some issues I have with my app, for the same reason. We are using markers now and it is very clunky, particularly with phrases and certain special characters. I would love to see this feature too, Mark! For what it's worth: up vote. Thanks!

Cheers!
-Teague James

> On Oct 1, 2015, at 6:12 PM, Koji Sekiguchi wrote:
>
> Hi Mark,
>
> I think I saw a similar requirement recently on the mailing list. The
> feature sounds reasonable to me.
>
> > If not, how do I go about posting this as a feature request?
>
> JIRA can be used for the purpose, but there is no guarantee that the
> feature gets implemented. :(
>
> Koji
>
>> On 2015/10/01 20:07, Mark Fenbers wrote:
>> Yeah, I thought about using markers, but then I'd have to search the
>> text for the markers to determine the locations. This is a clunky way
>> of getting the results I want, and it would save two steps if Solr
>> merely had an option to return a start/length array (of what should be
>> highlighted) in the original string, rather than returning an altered
>> string with tags inserted.
>>
>> Mark
>>
>>> On 9/29/2015 7:04 AM, Upayavira wrote:
>>> You can change the strings that are inserted into the text, and could
>>> place markers that you use to identify the start/end of highlighting
>>> elements. Does that work?
>>>
>>> Upayavira
>>>
>>>> On Mon, Sep 28, 2015, at 09:55 PM, Mark Fenbers wrote:
>>>> Greetings!
>>>>
>>>> I have highlighting turned on in my Solr searches, but what I get back
>>>> is tags surrounding the found term. Since I use an SWT StyledText
>>>> widget to display my search results, what I really want is the offset
>>>> and length of each found term, so that I can highlight it in my own way
>>>> without HTML. Is there a way to configure Solr to do that? I couldn't
>>>> find it. If not, how do I go about posting this as a feature request?
>>>>
>>>> Thanks,
>>>> Mark
RE: Spell Check and Privacy
Arnon,

Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a non-zero value. This will give you re-written queries that are guaranteed to return hits, given the original query and filters. If you are using an "mm" value other than 100%, you will also want to specify "spellcheck.collateParam.mm=100%" (or, if using "q.op=OR", then use "spellcheck.collateParam.q.op=AND").

Of course, the first section of the spellcheck result will still show every possible suggestion, so your client needs to discard these and not divulge them to the user. If you need to know word-by-word how the collations were constructed, then specify "spellcheck.collateExtendedResults=true" and use the extended collation results for this information, not the first section of the spellcheck results.

This is all fairly well documented on the old Solr wiki: https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

James Dyer
Ingram Content Group

-----Original Message-----
From: Arnon Yogev [mailto:arn...@il.ibm.com]
Sent: Monday, October 12, 2015 2:33 AM
To: solr-user@lucene.apache.org
Subject: Spell Check and Privacy

Hi,

Our system supports many users from different organizations and with different ACLs. We are considering adding a spell check ("did you mean") functionality using DirectSolrSpellChecker. However, a privacy concern was raised, as this might lead to private information being revealed between users via the suggested terms.

Using the FileBasedSpellChecker is another option, but naturally a static list of terms is not optimal.

Is there a best practice or a suggested method for this kind of case?

Thanks,
Arnon
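[Editorial note: a sketch of a request along the lines James describes, where fq carries the user's ACL filter so collations are only returned if they yield hits that user is allowed to see. The field and filter values are illustrative; %25 is the URL-encoded percent sign:

  http://localhost:8983/solr/collection1/select?q=some+missspelled+terms&fq=acl:group42&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.collateParam.mm=100%25
]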
RE: File-based Spelling
Mark,

The older spellcheck implementations create an n-gram sidecar index, which is why you're seeing your name split into 2-grams like this. See the IR Book by Manning et al., section 3.3.4, for more information.

Based on the results you're getting, I think it is loading your file correctly. You should now try a query against this spelling index, using words *not* in the file you loaded that are within 1 or 2 edits of something that is in the dictionary. If it doesn't yield suggestions, then post the relevant sections of solrconfig.xml and schema.xml, and also the query string you are trying.

James Dyer
Ingram Content Group

-----Original Message-----
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: Monday, October 12, 2015 2:38 PM
To: Solr User Group
Subject: File-based Spelling

Greetings!

I'm attempting to use a file-based spell checker. My sourceLocation is /usr/share/dict/linux.words, and my spellcheckIndexDir is set to ./data/spFile. BuildOnStartup is set to true, and I see nothing to suggest any sort of problem/error in solr.log. However, in my ./data/spFile/ directory there are only two files: segments_2, with only 71 bytes in it, and a zero-byte write.lock file. For a source dictionary having 480,000 words in it, I was expecting a bit more substance in the ./data/spFile directory. Something doesn't seem right with this.

Moreover, I ran a query on the word Fenbers, which isn't listed in the linux.words file, although there are several similar words. The results I got back were odd, and the suggestions included the following:

  fenber
  f en be r
  f e nb er
  f en b er
  f e n be r
  f en b e r
  f e nb e r
  f e n b er
  f e n b e r

But I expected suggestions like fenders, embers, fenberry, etc. I also ran a query on Mark (which IS listed in linux.words) and got back two suggestions in a similar format.

I played with configurables like changing the fieldType from text_en to string and the characterEncoding from UTF-8 to ASCII, etc., but nothing seemed to yield any different results. Can anyone offer suggestions as to what I'm doing wrong? I've been struggling with this for more than 40 hours now! I'm surprised my persistence has lasted this long!

Thanks,
Mark
RE: DIH parallel processing
Nabil,

What we do is have multiple DIH request handlers configured in solrconfig.xml. Then, in the SQL query, we put something like "where mod(id, ${partition})=0". An external script calls a full import on each request handler at the same time and monitors the responses. This isn't the most elegant solution, but it gets around the fact that DIH is single-threaded.

James Dyer
Ingram Content Group

-----Original Message-----
From: nabil Kouici [mailto:koui...@yahoo.fr]
Sent: Thursday, October 15, 2015 3:58 AM
To: Solr-user
Subject: DIH parallel processing

Hi All,

I'm using DIH to index more than 15M documents from SQL Server to Solr. This takes more than 2 hours, and a big part of this time is consumed by fetching the data from the database. I'm thinking about a solution that performs a parallel (threaded) load in the same DIH, where each thread loads a part of the data. Do you have any experience with this kind of situation?

Regards,
Nabil
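[Editorial note: a sketch of the pattern James describes, where each of several request handlers points at a data-config whose query takes a different slice. The names and the 4-way split are illustrative; ${dataimporter.request.part} would be passed as part=0..3 on each full-import call, and the modulo syntax should match your database:

  <!-- data-config.xml used by each of the parallel DIH handlers -->
  <dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost;databaseName=mydb" user="solr"/>
    <document>
      <entity name="item"
              query="select id, title from item where mod(id, 4) = ${dataimporter.request.part}"/>
    </document>
  </dataConfig>
]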
RE: DIH Caching with Delta Import
The DIH cache feature does not work with delta import. Actually, much of DIH does not work with delta import. The workaround you describe is similar to the approach described here: https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which in my opinion is the best way to implement partial updates with DIH.

James Dyer
Ingram Content Group

-----Original Message-----
From: Todd Long [mailto:lon...@gmail.com]
Sent: Tuesday, October 20, 2015 8:02 PM
To: solr-user@lucene.apache.org
Subject: DIH Caching with Delta Import

It appears that DIH entity caching (e.g. SortedMapBackedCache) does not work with deltas... is this simply a bug with the DIH cache support, or is it somehow by design? Any ideas on a workaround for this? Ideally, I could just omit the "cacheImpl" attribute, but that leaves the query (using the default processor, in my case) without the appropriate where clause including the "cacheKey" and "cacheLookup".

Should SqlEntityProcessor be smart enough to ignore the cache with deltas and simply append a where clause which includes the "cacheKey" and "cacheLookup"? Or possibly just include a where clause with ('${dih.request.command}' = 'full-import' or cacheKey = cacheLookup)? I suppose those could be used to mitigate the issue, but I was hoping for possibly a better solution.

Any help would be greatly appreciated. Thank you.
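[Editorial note: for reference, a sketch of the wiki's delta-via-full-import pattern; the table and column names are illustrative. It is run as command=full-import&clean=false, so cache-backed child entities keep working:

  <entity name="item" pk="id"
          query="select * from item
                 where '${dataimporter.request.clean}' != 'false'
                    or last_modified &gt; '${dataimporter.last_index_time}'">
]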
RE: DIH Caching w/ BerkleyBackedCache
Todd, With the DIH request, are you specifying "cacheDeletePriorData=false"? Looking at the BerkleyBackedCache code, if this is set to true, it deletes the cache and assumes the current update is to fully repopulate it. If you want to do an incremental update to the cache, it needs to be false. You might also need to specify "clean=false", but I'm not sure if this is a requirement. I've used DIH with BerkleyBackedCache for a few years and it works well for us. But rather than using it inline, we have a number of DIH handlers that just build caches; then, when they're all built, a final DIH joins data from the caches and indexes it to solr. We also do like you are, with several handlers running at once, each doing part of the data. But I have to warn you this code hasn't been maintained by anyone. I'm using an older DIH jar (4.6) with newer solr. I think there might have been an api change or something that prevented the uncommitted caching code from working with newer versions, but I honestly forget. This is probably a viable solution if you don't want to write any code, but it might take some trial and error getting it to work. James Dyer Ingram Content Group -Original Message- From: Todd Long [mailto:lon...@gmail.com] Sent: Tuesday, November 17, 2015 8:11 AM To: solr-user@lucene.apache.org Subject: Re: DIH Caching w/ BerkleyBackedCache Mikhail Khludnev wrote > It's worth to mention that for really complex relations scheme it might be > challenging to organize all of them into parallel ordered streams. This will most likely be the issue for us, which is why I would like to have the Berkley cache solution to fall back on, if possible. Again, I'm not sure why, but it appears that the Berkley cache is overwriting itself (i.e. cleaning up unused data) when building the database... I've read plenty of other threads where it appears folks are having success using that caching solution. Mikhail Khludnev wrote > threads... you said? Which ones? Declarative parallelization in > EntityProcessor worked only with certain 3.x version. We are running multiple DIH instances which query against specific partitions of the data (i.e. mod of the document id we're indexing).
URL Encoding on Import
Hi everyone! Does anyone have any suggestions on how to URL-encode URLs that I'm importing from SQL using the DIH? The importer pulls in something like "http://www.downloadsite.com/document that is being downloaded.doc" and then the Tika parser can't download the document because it ends up trying to access "http://www.downloadsite.com/document" and gets a 404 error. What I need to do is transform the URL to "http://www.downloadsite.com/document%20that%20is%20being%20downloaded.doc". I added a regex transformer to the DIH field, but I have not found a successful regex to accomplish this. Thoughts? Any advice would be appreciated! Thanks! -Teague
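One possibility, assuming the DIH RegexTransformer and a hypothetical raw column name: since regex plus replaceWith does a replace-all, a single field mapping can swap each space for %20 (this only covers spaces; full RFC 3986 escaping would need something like a ScriptTransformer):

  <entity name="doc" transformer="RegexTransformer" query="...">
    <field column="url" sourceColName="raw_url" regex=" " replaceWith="%20"/>
  </entity>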
Help With Phrase Highlighting
Hello everyone, I am having difficulty enabling phrase highlighting and am hoping someone here can offer some help. This is what I have currently: Solr 4.9 solrconfig.xml (partial snip; the XML tags were stripped by the list archive - the surviving values are: xml, explicit, 10, text, on, text, html, 100) schema.xml (partial snip; tags stripped here as well) Query (partial snip): ...select?fq=id:43040&q="my%20search%20phrase" Response (partial snip): ... ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta assentior. (my search phrase facilitates highlighting). Et option molestiae referrentur ius. Viris quaeque legimus an pri The document in which this phrase is found is very long. If I reduce the document to a single sentence, such as "My search phrase facilitates highlighting", then the response I get from Solr is: My search phrase facilitates highlighting What I am trying to achieve instead, regardless of the document size, is: My search phrase with a single indicator at the beginning and end, rather than three separate words that may get distributed between two different snippets depending on the placement of the snippet in the larger document. I tried to follow this guide: http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452 but got zero results. I suspect that this is due to the hl parameters in my solrconfig file, but I cannot find any specific guidance on what the correct parameters should be. I tried commenting out all of the hl parameters and also got no results. Can anyone offer any solutions for searching large documents and returning a single phrase highlight? -Teague
Re: Help With Phrase Highlighting
Hello, Thanks for replying! I tried using it in a query string, but without success. Should I add it to my solrconfig? If so, are there any other hl parameters that are necessary? -Teague > On Dec 1, 2015, at 9:01 PM, Philippe Soares wrote: > > Hi, > Did you try hl.mergeContiguous=true ? > > On Tue, Dec 1, 2015 at 3:36 PM, Teague James > wrote: > >> Hello everyone, >> >> I am having difficulty enabling phrase highlighting and am hoping someone >> here can offer some help. This is what I have currently: >> >> Solr 4.9 >> solrconfig.xml (partial snip; XML tags stripped by the list archive) >> >> schema.xml (partial snip; tags stripped here as well) >> >> Query (partial snip): >> ...select?fq=id:43040&q="my%20search%20phrase" >> >> Response (partial snip): >> ... >> ipsum dolor sit amet, pro ne verear prompta, sea te aeterno scripta >> assentior. (my search >> phrase facilitates highlighting). Et option molestiae referrentur >> ius. Viris quaeque legimus an pri >> >> The document in which this phrase is found is very long. If I reduce the >> document to a single sentence, such as "My search phrase facilitates >> highlighting", then the response I get from Solr is: >> >> My search phrase facilitates highlighting >> >> What I am trying to achieve instead, regardless of the document size, is: >> My search phrase with a single indicator at the beginning >> and end, rather than three separate words that may get distributed between >> two different snippets depending on the placement of the snippet in the >> larger document. >> >> I tried to follow this guide: >> http://stackoverflow.com/questions/25930180/solr-how-to-highlight-the-whole-search-phrase-only/25970452#25970452 but got zero results. I suspect that >> this is due to the hl parameters in my solrconfig file, but I cannot find >> any specific guidance on what the correct parameters should be. I tried >> commenting out all of the hl parameters and also got no results. >> >> Can anyone offer any solutions for searching large documents and returning a >> single phrase highlight? >> >> -Teague
Re: highlight
Hello, Thanks for replying! Yes, I am storing the whole document. The document is indexed with a unique id. There are only 3 fields in the schema - id, rawDocument, tikaDocument. Search uses the tikaDocument field. Against this I am throwing 2-5 word phrases and getting highlighting matches on each individual word in the phrases instead of just the phrase. The highlighted text that is matched is read by another application for display in the front end UI. Right now my app has logic to figure out that multiple highlights indicate a phrase, but it isn't perfect. In this case Solr is reporting a single 3-word phrase as 2 hits: one with 2 of the phrase words, the other with 1 of the phrase words. This only happens in large documents where the multi-word phrase appears across the boundary of one of the document fragments that Solr is analyzing (this is a hunch - I really don't know the mechanics for certain, but the next statement makes evident how I came to this conclusion). However, if I make a one-sentence document with the same multi-word phrase, Solr will report 1 hit with all three words individually highlighted. At the very least I know Solr is getting the phrase correct. The problems are the method of highlighting (I'm trying to get one set of tags per phrase) and the occasional breaking of a single phrase into 2 hits. Given that setup, what do you recommend? I'm not sure I understand the approach you're describing. I appreciate the help! -Teague James > On Dec 2, 2015, at 10:09 AM, Rick Leir wrote: > > For performance, if you have many large documents, you want to index the > whole document but only store some identifiers. (Maybe this is not a > consideration for you, stop reading now ) > > If you are not storing the whole document, then Solr cannot do the > highlighting. You would get an id, then locate your source document (maybe > in your filesystem) and do highlighting yourself. > >> Can anyone offer any solutions for searching large documents and > returning a >> single phrase highlight?
RE: Help With Phrase Highlighting
Thanks everyone who replied! The FastVectorHighlighter did the trick. Here is how I configured it: In solrconfig.xml, in the requestHandler I added the hl parameters (XML tags stripped by the list archive; the surviving values are: on, text, true, 100). In schema.xml I modified the text field (tags stripped here as well). I restarted Solr, re-indexed the documents and tested. All phrases are correctly highlighted as phrases! Thanks everyone! -Teague
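Judging from those surviving values, the configuration was presumably along these lines (a sketch, not the literal config; the field type name is illustrative). In the requestHandler defaults:

  <str name="hl">on</str>
  <str name="hl.fl">text</str>
  <str name="hl.useFastVectorHighlighter">true</str>
  <str name="hl.fragsize">100</str>

and in schema.xml, since the FastVectorHighlighter requires term vectors with positions and offsets:

  <field name="text" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>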
RE: Spellcheck error
Matt, Can you give some information about how your spellcheck field is analyzed, and also whether you're using a custom query converter? Also, try placing the bare terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre, then spellcheck.q=movie theatre). Does it work in this case? Also, could you give the exact query you're using? This is the very same bug as in the 3 tickets you mention. We clearly haven't solved all of the possible ways this bug can be triggered. But we cannot fix this unless we can come up with a unit test that reliably reproduces it. At the very least, we should handle these problems better than throwing a SIOOB like this. Long term, there is probably a better design we could come up with for how terms are identified within queries and how collations are generated. James Dyer Ingram Content Group -Original Message- From: Matt Pearce [mailto:m...@flax.co.uk] Sent: Thursday, December 03, 2015 10:40 AM To: solr-user Subject: Spellcheck error Hi, We're using Solr 5.3.1, and we're getting a StringIndexOutOfBoundsException from the SpellCheckCollator. I've done some investigation, and it looks like the problem is that the corrected string is shorter than the original query. For example, the search term is "theatre", the suggested correction is "there". The error is being thrown when replacing the original query with the shorter replacement. This is the stack trace: java.lang.StringIndexOutOfBoundsException: String index out of range: -2 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824) at java.lang.StringBuilder.replace(StringBuilder.java:262) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) The error looks very similar to those described in https://issues.apache.org/jira/browse/SOLR-4489, https://issues.apache.org/jira/browse/SOLR-3608 and https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed. Any suggestions would be appreciated, or should I open a JIRA ticket? Thanks, Matt -- Matt Pearce Flax - Open Source Enterprise Search www.flax.co.uk
RE: Data Import Handler - Multivalued fields - splitBy
Brian, Be sure to have transformer="RegexTransformer" in your entity tag. It's the RegexTransformer class that looks for "splitBy". See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more information. James Dyer Ingram Content Group -Original Message- From: Brian Narsi [mailto:bnars...@gmail.com] Sent: Friday, December 04, 2015 3:10 PM To: solr-user@lucene.apache.org Subject: Data Import Handler - Multivalued fields - splitBy I have the following: (XML stripped by the list archive) I believe I had the following working (splitting on a pipe delimiter): (XML stripped) But it does not work now. In fact, now I have even tried: (XML stripped) But I cannot get the values to split into an array. Any thoughts/suggestions on what may be wrong? Thanks,
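A minimal working shape, for reference (entity and column names are illustrative; the pipe must be escaped because splitBy is a regex):

  <entity name="product" transformer="RegexTransformer" query="select id, categories from product">
    <field column="categories" splitBy="\|"/>
  </entity>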
fuzzy searches and EDISMAX
I am trying to build an edismax search handler that will allow a fuzzy search, using the "query fields" property (qf). I have two instances of SOLR 4.8.1, one of which has edismax "qf" configured with no fuzzy search ... ns_name^3.0 i_topic^3.0 i_object_type^3.0 ... And the other with a fuzzy search for ns_name (non-stemmed name) ns_name~1^3.0 i_topic^3.0 i_object_type^3.0 ... The index of both includes a record with an ns_name of 'Johnson' I get no return in either instance with the query q=Johnso I get the Johnson record returned in both instances with a query of q=Johnso~1 The SOLR documentation seems silent on incorporating fuzzy searches in the query fields. I have seen various posts on Google that suggest that 'qf' will accept fuzzy search declarations, other posts suggest only the query itself will allow fuzzy searches (as seems to be the case for me). Any guidance will be much appreciated Jim Jim Felley OCIO Smithsonian Institution fell...@si.edu
RE: spellcheck.count v/s spellcheck.alternativeTermCount
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the following section, for details. Briefly, "count" is the # of suggestions it will return for terms that are *not* in your index/dictionary. "alternativeTermCount" is the # of alternatives you want returned for terms that *are* in your dictionary. You can set them to the same value, unless you want fewer suggestions when the term is in the dictionary. James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 5:27 AM To: solr-user@lucene.apache.org Subject: spellcheck.count v/s spellcheck.alternativeTermCount Hello Everyone, I am confused about the difference between spellcheck.count and spellcheck.alternativeTermCount in Solr. Can anyone explain in detail?
RE: spellcheck.count v/s spellcheck.alternativeTermCount
Here is an example to illustrate what I mean... - query q=text:(life AND hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5 - suppose at least one document in your dictionary field has "life" in it - also suppose zero documents in your dictionary field have "hope" in them - The spellchecker will try to return you up to 10 suggestions for "hope", but only up to 5 suggestions for "life" James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 11:35 AM To: solr-user@lucene.apache.org Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount Hi James, If "count" doesn't use the index/dictionary, then where do its suggestions come from? On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James wrote: > See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and > the following section, for details. > > Briefly, "count" is the # of suggestions it will return for terms that are > *not* in your index/dictionary. "alternativeTermCount" is the # of > alternatives you want returned for terms that *are* in your dictionary. > You can set them to the same value, unless you want fewer suggestions when > the term is in the dictionary. > > James Dyer > Ingram Content Group > > -Original Message- > From: Nitin Solanki [mailto:nitinml...@gmail.com] > Sent: Tuesday, February 17, 2015 5:27 AM > To: solr-user@lucene.apache.org > Subject: spellcheck.count v/s spellcheck.alternativeTermCount > > Hello Everyone, > I am confused about the difference between spellcheck.count and > spellcheck.alternativeTermCount in Solr. Can anyone explain in detail?
RE: spellcheck.count v/s spellcheck.alternativeTermCount
It will try to give you suggestions up to the number you specify, but if fewer are available it will not give you any more. James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Tuesday, February 17, 2015 11:40 PM To: solr-user@lucene.apache.org Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount Thanks James, I tried the same thing: spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5 suggestions for both "life" and "hope" - not, as described, *up to 10 suggestions for "hope" but only up to 5 suggestions for "life"*. On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James wrote: > Here is an example to illustrate what I mean... > > - query q=text:(life AND > hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5 > - suppose at least one document in your dictionary field has "life" in it > - also suppose zero documents in your dictionary field have "hope" in them > - The spellchecker will try to return you up to 10 suggestions for "hope", > but only up to 5 suggestions for "life" > > James Dyer > Ingram Content Group > > -Original Message- > From: Nitin Solanki [mailto:nitinml...@gmail.com] > Sent: Tuesday, February 17, 2015 11:35 AM > To: solr-user@lucene.apache.org > Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount > > Hi James, > If "count" doesn't use the index/dictionary, then where do its > suggestions come from? > > On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James < > james.d...@ingramcontent.com> > wrote: > > > See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and > > the following section, for details. > > > > Briefly, "count" is the # of suggestions it will return for terms that are > > *not* in your index/dictionary. "alternativeTermCount" is the # of > > alternatives you want returned for terms that *are* in your dictionary. > > You can set them to the same value, unless you want fewer suggestions when > > the term is in the dictionary. > > > > James Dyer > > Ingram Content Group > > > > -Original Message- > > From: Nitin Solanki [mailto:nitinml...@gmail.com] > > Sent: Tuesday, February 17, 2015 5:27 AM > > To: solr-user@lucene.apache.org > > Subject: spellcheck.count v/s spellcheck.alternativeTermCount > > > > Hello Everyone, > > I am confused about the difference between spellcheck.count and > > spellcheck.alternativeTermCount in Solr. Can anyone explain in detail? > >
RE: Why collations are coming even I set the value of spellcheck.count to zero(0)
I think when you set "count"/"alternativeTermCount" to zero, the defaults (10?) are used instead. Instead of setting these to zero, just use "spellcheck=false". These 2 parameters control suggestions, not collations. To turn off collations, set "spellcheck.collate=false". Also, I wouldn't set "maxCollationTries" as high as 100, as it could (sometimes) potentially check 100 possible collations against the index, and that would be very slow. James Dyer Ingram Content Group -Original Message- From: Nitin Solanki [mailto:nitinml...@gmail.com] Sent: Wednesday, February 18, 2015 2:37 AM To: solr-user@lucene.apache.org Subject: Why collations are coming even I set the value of spellcheck.count to zero(0) Hi Everyone, I have set the value of spellcheck.count = 0 and spellcheck.alternativeTermCount = 0. Even so, collations are returned when I search with a misspelled query. Why? I have also set spellcheck.maxCollations = 100 and spellcheck.maxCollationTries = 100. What I know is that collations are built on suggestions. So do I have a misunderstanding about collations, or is there some other configuration issue? Any help please?
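In query form, roughly (query and handler are illustrative):

  /select?q=text:(some+missspelled+terms)&spellcheck=true&spellcheck.collate=false

or, to keep collations but bound the work:

  /select?q=...&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5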
RE: Solr phonetics with spelling
Ashish, I would not recommend using spellcheck against a phonetic-analyzed field. Instead, you can use copyField to create a separate field that is lightly analyzed, and use the copy for spelling. James Dyer Ingram Content Group -Original Message- From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] Sent: Tuesday, March 10, 2015 7:05 AM To: solr-user@lucene.apache.org Subject: Solr phonetics with spelling Hello, A couple of questions related to phonetics - 1. If I enable the phonetic filter in the managed-schema file for a particular field, how does it affect the spell handler? 2. What is the meaning of the inject attribute on the phonetic filter in managed-schema? The documentation is not very clear about it. Regards, Ashish
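A sketch of that copyField approach (field and type names are illustrative):

  <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="spell" type="text_spell" indexed="true" stored="false"/>
  <copyField source="name" dest="spell"/>

The spellcheck component then points at "spell" (via its "field" setting in solrconfig.xml), while search and phonetic matching stay on the original field.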
Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?
Sorry, I've been a bit unfocused from this list for a bit. When I was working with the APTF code I rewrote a big chunk of it and didn't include the original tokens in the output, as I didn't need them at the time. That feature could easily be added back in. I will see if I can find a bit of time for that. As for the other part of your message, are you suggesting that the token indexes are not correct? There is a bit of a formatting issue with the text and I'm not sure what you're getting at. Can you explain further please? On Sun, Feb 8, 2015 at 3:04 PM, trhodesg wrote: > Thanks to everyone for the thought, time and effort put into > AutoPhrasingTokenFilter (APTF)! It's a real lifesaver. > While trying to add APTF to my indexing, I discovered that the original (TS) > version throws an exception while indexing a 100MB PDF. The error is "Exception writing document to the index; possible analysis error". The > modified (JS) version runs without error, but it removes the tokens used to > create the phrase. They are needed. > Before looking into this I have a question: Solr would normally tokenize the > phrase "the peoples republic of china is" as the(1) peoples(2) republic(3) of(4) > china(5) is(6). > Defining the APTF phrase file as [snippet lost in the archive], the Solr admin analysis page reports that > the APTF indexer tokenizes the phrase as [snippet lost]. Would it be possible for someone to > explain the reasoning behind the discontinuous token numbering? As it is now, > phrase queries such as "republic of china" will fail. And I can't get > proximity queries like "republic of"~10 to work either (though it seems they > should). Wouldn't it be more flexible to return the following > tokenization: [snippet lost]? This allows spurious matches such as "peoples peoplesrepublic", > but it seems like this type of event would be very rare. It has the > advantage of allowing phrase queries to continue working the way most users > think. > Thank you for supporting more than one entity definition per phrase (i.e. > peoplesrepublic and peoplesrepublicofchina). This type of contraction is > common in longer documents, especially when the first used phrase ends with > a preposition. It helps support robust matching.
Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?
I have an autophrase configured for 'wheel chair', and if I run analysis for 'super wheel chair awesome', such that it would index to 'super wheelchair awesome', this is how mine behaves: http://i.imgur.com/iR4IgGp.png When I did the implementation that is how I thought the positioning should work. Do you think it should be different? On Fri, Mar 20, 2015 at 11:10 AM, trhodesg wrote: > Sorry, I can see my post is munged. This seems to display it legibly: > http://lucene.472066.n3.nabble.com/Have-anyone-used-Automatic-Phrase-Tokenization-AutoPhrasingTokenFilterFactory-td4173808.html > I'm new to all this, so I hesitate to say the indexing isn't correct. But my understanding > is that the query "republic of china" will only match the indexing > republic(n) of(n+1) china(n+2). Since the original APTF indexes this as > republic(n) of(n+3) china(n+7), that query will fail. Wouldn't it be more > logical to leave the original token numbering unchanged and just add the > phrase token with the same number as the last word in the matched series? > BTW, I looked at your code re this. It is quite informative to a newbie. Thanks!
RE: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1
Elisabeth, Currently ConjunctionSolrSpellChecker only supports adding WordBreakSolrSpellchecker to IndexBased-, FileBased- or DirectSolrSpellChecker. In the future, it would be great if it could handle other spell checker combinations. For instance, if you had a (e)dismax query that searches multiple fields, to have a separate spellchecker for each of them. But CSSC is not hardened for this more general usage, as hinted in the API doc. The check done to ensure all spellcheckers use the same stringdistance object, I believe, is a safeguard against using this class for functionality it is not able to correctly support. It looks to me that SOLR-6271 was opened to fix the bug that the stringdistance objects are compared by reference. This is not a problem with WBSSC because this one does not support string distance at all. What you're hoping for, however, is that the requirement for the string distances to be the same is removed entirely. You could try modifying the code by removing the check. However, beware that you might not get the results you desire! But should this happen, please go ahead and fix it for your use case and then donate the code. This is something I've personally wanted for a long time. James Dyer Ingram Content Group -Original Message- From: elisabeth benoit [mailto:elisaelisael...@gmail.com] Sent: Tuesday, April 14, 2015 7:37 AM To: solr-user@lucene.apache.org Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1 Hello, I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and FileBasedSpellchecker in the same request. I've applied the change from patch 135.patch (cf. SOLR-6271). I tried running the command "patch -p1 -i 135.patch --dry-run" but it didn't work, maybe because the patch was a fix to Solr 4.9, so I just replaced this line in ConjunctionSolrSpellChecker: else if (!stringDistance.equals(checker.getStringDistance())) { throw new IllegalArgumentException("All checkers need to use the same StringDistance."); } with: else if (!stringDistance.equals(checker.getStringDistance())) { throw new IllegalArgumentException("All checkers need to use the same StringDistance!!! 1:" + checker.getStringDistance() + " 2: " + stringDistance); } as was done in the patch. But still, when I send a spellcheck request, I get the error msg "All checkers need to use the same StringDistance!!! 1:org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2: org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08" From the error message I gather both spellcheckers use the same distance measure, LuceneLevenshteinDistance, but they're not the same instance of LuceneLevenshteinDistance. Is the condition right? What should be done to fix this properly? Thanks, Elisabeth
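For reference, the failing guard could be relaxed along these lines (an untested sketch: compare implementations rather than instances, since LuceneLevenshteinDistance does not override equals()):

  else if (stringDistance != null
      && !stringDistance.getClass().equals(checker.getStringDistance().getClass())) {
    throw new IllegalArgumentException("All checkers need to use the same StringDistance implementation.");
  }

As James notes, loosening or removing the check is uncharted territory, so test the resulting collations carefully.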
Alternate ways to facet spatial data
Hello all, I've just started using SOLR for spatial queries and it looks great so far. I've mostly been investigating importing a large amount of point data, indexing and searching it. I've discovered the facet.heatmap functionality, which is great - but I would like to ask if it is possible to get slightly different results from this. Essentially, rather than a heatmap I would like either a polygon per cluster (might be too much computation?) or a point per cluster (centroid would be great, centre of grid would be ok), coupled with the point count. Is this currently possible using faceting, or does it seem like a workable feature I could implement? Cheers, James Sewell, PostgreSQL Team Lead / Solutions Architect, Level 2, 50 Queen St, Melbourne VIC 3000
SolrCloud No Active Slice
I'm having a config issue. I'm posting the error from Solrj, which also includes the cluster state JSON (the JSON is truncated and mangled by the mail archive): org.apache.solr.common.SolrException: No active slice servicing hash code 2ee4d125 in DocCollection(rfp365)={ "shards":{"shard1":{ "range":"-", "state":"active", "replicas":{ "core_node1":{ "state":"active", "core":"rfp365_shard1_replica1", "node_name":"172.31.58.150:8983_solr", "base_url":"http://172.31.58.150:8983/solr"}, "core_node2":{ "state":"active", "core":"rfp365_shard1_replica2", "node_name":"172.31.60.137:8983_solr", "base_url":"http://172.31.60.137:8983/solr"}, "core_node3":{ "state":"active", "core":"rfp365_shard1_replica3", "node_name":"172.31.58.65:8983_solr", "base_url":"http://172.31.58.65:8983/solr", "leader":"true", "replicationFactor":"3", "router":{"name":"compositeId"}, "maxShardsPerNode":"1", "autoAddReplicas":"true"} at org.apache.solr.common.cloud.HashBasedRouter.hashToSlice(HashBasedRouter.java:65) at org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:39) at org.apache.solr.client.solrj.request.UpdateRequest.getRoutes(UpdateRequest.java:206) at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:581) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:948) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:839) ... 6 more All nodes are active in the solr admin, not sure where to go from here. Thanks in advance! James
RE: Spell checking the synonym list?
Ryan, If you use index-time synonyms on the spellcheck field, this will give you what you want. For instance, if the document has "lawyer" and you index both terms "lawyer","attorney", then the spellchecker will see that "atorney" is 1 edit away from an indexed term and will suggest "attorney". You'll need to have the same synonyms set up against the query field, but you have the option of making these query-time synonyms if you prefer. James Dyer Ingram Content Group -Original Message- From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] Sent: Thursday, July 09, 2015 2:28 AM To: solr-user@lucene.apache.org Subject: Spell checking the synonym list? Hi all, I'm wondering if it's possible to have spell checking performed on terms in the synonym list? For example, let's say I have documents with the word "lawyer" in them and I add "lawyer, attorney" in the synonyms.txt file. Then a query is made for the word "atorney". Is there any way to provide spell checking on this? Thanks, Ryan
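A sketch of that index-time synonym setup (type and file names are illustrative; expand="true" ensures both "lawyer" and "attorney" land in the spelling field):

  <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>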
RE: Protwords in solr spellchecker
Kamal, Given the constraint that you cannot re-index the data, your best bet might be to simply filter out the suggestions at the application level, or maybe even have a proxy do it. Possibly another option: you might be able to extend DirectSolrSpellchecker and override #getSuggestions(), calling super(), then post-filtering your stop words out of the response. You'll want to request a few more terms so you're more likely to get results even if a term or two get filtered out. You can specify your custom spell checker in solrconfig.xml. James Dyer Ingram Content Group -Original Message- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Friday, July 10, 2015 7:00 AM To: solr-user@lucene.apache.org Subject: Re: Protwords in solr spellchecker So let's try to analyse the situation from the spellchecking point of view. First of all, we follow David's suggestion and add the StopWordsFilter, with our configured "bad" words, to the query-time analysis. *Starting scenario* - we have the protected words in our index, and we still want them to be there. Let's explore the different kinds of spellcheckers available - where do their suggestions come from? *Index Based Spellchecker* The suggestions will come from an auxiliary index. *Direct Spellchecker* The suggestions will come from the current index. *File based spellchecker* It uses an external file to get the spelling suggestions from, so we can curate this file properly with only good words, and we are fine. But I guess you would like to use a blacklist; in this case we are going to have a whitelist instead. *Query Time* At query time *the query is analysed* and a token stream is produced. Then, depending on the implementation, we trigger a different lookup. In the case of the Direct Spellchecker, if I remember well: for each token an FST with all the supported inflections is generated, an intersection happens with the index FST (based on the field), and the suggestion is returned. Unfortunately a proper *query-time analysis will not help*. When we analyse the query we have the misspelled word "sexe", which is not going to be recognised as the bad word. Then the inflections are calculated, the FST built, and the intersection will actually produce the feared suggestion "sex". This is because the word is in the index. If we can't modify the index, the *Direct Spellchecker is not an option*, if my understanding is correct. Let's see if the Index Based spellchecker can help … Unfortunately, also in this case, the auxiliary index produced is based on the analysed form of the original field. If you really cannot re-index content, I would suggest an implementation based on a concept similar to the AnalyzingSuggester in Solr. Open to clarify your further questions. 2015-07-10 9:31 GMT+01:00 davidphilip cherian : > Hi Kamal, > Not necessarily. You can have different filters applied at index time and > query time (note that the order in which filters are defined matters). You > could just add the stop filter at query time. > Have your own custom data type defined (similar to 'text_en' in > schema.xml) and perhaps use a standard/whitespace tokenizer followed by a > stop filter at query time. > Tip: Use the analysis tool available on the Solr admin page to further > understand the analysis chain of data types. > HTH > > On Fri, Jul 10, 2015 at 1:03 PM, Kamal Kishore Aggarwal < > kkroyal@gmail.com> wrote: > > Hi David, > > This one is a good suggestion.
But, if I add these *adult* keywords to the > > stopwords.txt file, it will require re-indexing the data related to these keywords. > > How can I see the change instantly? Is there any other great suggestion > > that you can offer? > > On Thu, Jul 9, 2015 at 12:09 PM, davidphilip cherian < > > davidphilipcher...@gmail.com> wrote: > > > The best bet is to use solr.StopFilterFactory. > > > Have all such words added to stopwords.txt and add this filter to your > > > analyzer. > > > Reference links > > > https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory > > > https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter > > > HTH > > > On Thu, Jul 9, 2015 at 11:50 AM, Kamal Kishore Aggarwal < > > > kkroyal@gmail.com> wrote: > > > > Hi Team, > > > > I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. Is > > > > there …
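A rough sketch of the custom spellchecker James describes, against the 4.x spelling API (the class name and the blocked-word list are illustrative; in practice the list would be loaded from configuration):

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.Set;
  import org.apache.lucene.analysis.Token;
  import org.apache.solr.spelling.DirectSolrSpellChecker;
  import org.apache.solr.spelling.SpellingOptions;
  import org.apache.solr.spelling.SpellingResult;

  public class ProtectedWordSpellChecker extends DirectSolrSpellChecker {
    // suggestions we never want to surface (illustrative)
    private static final Set<String> BLOCKED = new HashSet<>(Arrays.asList("sex"));

    @Override
    public SpellingResult getSuggestions(SpellingOptions options) throws IOException {
      SpellingResult raw = super.getSuggestions(options);
      SpellingResult filtered = new SpellingResult();
      // copy everything through except the blocked suggestions
      for (Map.Entry<Token, LinkedHashMap<String, Integer>> perToken : raw.getSuggestions().entrySet()) {
        for (Map.Entry<String, Integer> sugg : perToken.getValue().entrySet()) {
          if (!BLOCKED.contains(sugg.getKey())) {
            filtered.add(perToken.getKey(), sugg.getKey(), sugg.getValue());
          }
        }
      }
      return filtered;
    }
  }

The class would then be named in the spellchecker's <str name="classname"> in solrconfig.xml, with spellcheck.count raised a little so filtering still leaves enough suggestions.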
Solr M2M authentication on Jetty
Dear Solr Community, We would like to provide an in-house group of users access to our Solr database in a way that meets the following specifications: 1. Use the Jetty web service that Solr 6.0 installs by default. 2. Provide an M2M (machine-to-machine) interface, so a user can setup a cron job that periodically executes a query and stores the results. 3. Authentication credentials for the M2M interface to the Jetty service are provided by an LDAP service so it is possible to log who is accessing what data. 4. Result data retrieved from Solr (result UIDs) are recorded by SPLUNK. Can you offer advice and/or point us to a working example of any of these specification items? Here's what we have so far: A. Completed item 1 above. We've installed Solr 6.0 with Jetty on a Linux VM and it works great. B. Partially addressed item 3 above in that we can login to Jetty using LDAP. However, our implementation is such that the login credentials are input interactively (via a login dialog). We don't yet know how to perform this login from machine to machine. This is the main sticking point right now. Any insight you might provide would be greatly appreciated. Regards, Jim Gregoric Boston Children's Hospital, Clinical Research Informatics
RE: Solr M2M authentication on Jetty
Correction: Item 1 is not an absolute requirement; we can use Apache or Tomcat if that makes things any easier. -Original Message- From: Gregoric, James [mailto:james.grego...@childrens.harvard.edu] Sent: Wednesday, May 18, 2016 1:54 PM To: solr-user@lucene.apache.org Subject: Solr M2M authentication on Jetty Dear Solr Community, We would like to provide an in-house group of users access to our Solr database in a way that meets the following specifications: 1. Use the Jetty web service that Solr 6.0 installs by default. 2. Provide an M2M (machine-to-machine) interface, so a user can setup a cron job that periodically executes a query and stores the results. 3. Authentication credentials for the M2M interface to the Jetty service are provided by an LDAP service so it is possible to log who is accessing what data. 4. Result data retrieved from Solr (result UIDs) are recorded by SPLUNK. Can you offer advice and/or point us to a working example of any of these specification items? Here's what we have so far: A. Completed item 1 above. We've installed Solr 6.0 with Jetty on a Linux VM and it works great. B. Partially addressed item 3 above in that we can login to Jetty using LDAP. However, our implementation is such that the login credentials are input interactively (via a login dialog). We don't yet know how to perform this login from machine to machine. This is the main sticking point right now. Any insight you might provide would be greatly appreciated. Regards, Jim Gregoric Boston Children's Hospital, Clinical Research Informatics
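For the M2M piece, a cron-driven pull can be as simple as the following sketch, assuming Jetty is set up for HTTP BASIC authentication backed by its LDAP login module (host, collection, credential handling, and paths are all illustrative):

  # crontab entry: query Solr nightly at 02:15 and store the JSON (% must be escaped in crontab)
  15 2 * * * curl -s -u "svc_account:$(cat /etc/solr-query/passwd)" "http://solr-host:8983/solr/mycollection/select?q=*:*&rows=100&wt=json" > /data/solr-pulls/$(date +\%Y\%m\%d).json

curl's -u flag sends the Basic credentials, so both Jetty's logs and the Splunk side can tie each request back to the LDAP account.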
Alternate Port Not Working for Solr 6.0.0
Hello, I am trying to install Solr 6.0.0 and have been successful with the default installation, following the instructions provided on the Apache Solr website. However, I do not want Solr running on port 8983, I want it to run on port 80. I started a new Ubuntu 14.04 VM, installed open JDK 8, then installed Solr with the following commands: Command: tar xzf solr-6.0.0.tgz solr-6.0.0/bin/install_solr_service.sh --strip-components=2 Response: None, which is good. Command: ./install_solr_service.sh solr-6.0.0.tgz -p 80 Response: Misplaced or Unknown flag -p So I tried... Command: ./install_solr_service.sh solr-6.0.0.tgz -i /opt -d /var/solr -u solr -s solr -p 80 Response: A dump of the log, which is INFO only with no errors or warnings, at the top of which is "Solr process 4831 from /var/solr/solr-80.pid not found" If I look in the /var/solr directory I find a file called solr-80.pid, but nothing else. What did I miss? Previous versions of Solr, which I deployed with Tomcat instead of Jetty, allowed me to control this in the server.xml file in /etc/tomcat7/, but obviously this no longer applies. I like the ease of the installation script; I just want to be able to control the port assignment. Any help is appreciated! Thanks! -Teague PS - Please resist the urge to ask me why I want it on port 80. I am well aware of the security implications, etc., but regardless I still need to make this operational on port 80. Cheers!
RE: Alternate Port Not Working for Solr 6.0.0
[...] no issues - happy searching! IF I change the port assignment to 1001, same screen dump/failure to load as with port 80. IF I change the port assignment to 1250, no issues - happy searching! IF I change the port assignment to 1100, no issues - happy searching! IF I change the port assignment to 1050, no issues - happy searching! IF I change the port assignment to 1025, no issues - happy searching! IF I change the port assignment to 1015, same screen dump/failure to load as with port 80. IF I change the port assignment to 1020, same screen dump/failure to load as with port 80. IF I change the port assignment to 1021, same screen dump/failure to load as with port 80. IF I change the port assignment to 1022, same screen dump/failure to load as with port 80. IF I change the port assignment to 1023, same screen dump/failure to load as with port 80. IF I change the port assignment to 1024, no issues - happy searching! Based on the above, it appears that port 80 itself is not special, but rather that Solr does not play nice with any port below 1024. There may exist an upper limit, but I did not test for that since my goal was to assign the application to port 80. For the record, there are no other listeners on port 80. The only listeners are 53 for dnsmasq and 631 for cupsd on my system. Also, I have successfully run Solr on port 80 on all 2.x-4.9.1 installations. I never got around to upgrading to 5.x, so I do not know if there are issues with low ports and that version. Any insight as to why Solr 6.0.0 does not play nice with ports below 1024 would be appreciated. If this is a "feature" of the application, it'd be nice to see that in the documentation. Thanks Shawn! -Teague -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Tuesday, May 31, 2016 4:31 PM To: solr-user@lucene.apache.org Subject: Re: Alternate Port Not Working for Solr 6.0.0 On 5/31/2016 2:02 PM, Teague James wrote: > Hello, I am trying to install Solr 6.0.0 and have been successful with > the default installation, following the instructions provided on the > Apache Solr website. However, I do not want Solr running on port 8983, > I want it to run on port 80. I started a new Ubuntu 14.04 VM, > installed open JDK 8, then installed Solr with the following commands: > Command: tar xzf solr-6.0.0.tgz solr-6.0.0/bin/install_solr_service.sh > --strip-components=2 Response: None, which is good. Command: > ./install_solr_service.sh solr-6.0.0.tgz -p 80 Response: Misplaced or > Unknown flag -p So I tried... Command: ./install_solr_service.sh > solr-6.0.0.tgz -i /opt -d /var/solr -u solr -s solr -p 80 Response: A > dump of the log, which is INFO only with no errors or warnings, at the > top of which is "Solr process 4831 from /var/solr/solr-80.pid not > found" If I look in the /var/solr directory I find a file called > solr-80.pid, but nothing else. What did I miss? Previous versions of > Solr, which I deployed with Tomcat instead of Jetty, allowed me to > control this in the server.xml file in /etc/tomcat7/, but obviously > this no longer applies. I like the ease of the installation script; I > just want to be able to control the port assignment. Any help is > appreciated! Thanks! The port can be changed after install, although I have been also able to change the port during install with the -p parameter. Check /etc/default/solr.in.sh and look for a line setting SOLR_PORT.
On my dev server, it looks like this: SOLR_PORT=8982 Before making any changes in that file, make sure that Solr is not running at all, or you may be forced to manually kill it. Thanks, Shawn
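For what it's worth, on Linux only root may bind ports below 1024, and the install script runs Solr as the unprivileged "solr" user - which would explain the sharp 1023/1024 boundary observed above. A common workaround (a sketch; adjust ports to taste) is to leave Solr on 8983 and redirect port 80 in the kernel:

  # redirect external traffic arriving on port 80 to Solr's 8983
  sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8983
  # do the same for connections originating on the box itself
  sudo iptables -t nat -A OUTPUT -o lo -p tcp --dport 80 -j REDIRECT --to-ports 8983

This keeps Solr running without root privileges while still answering on port 80.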
RE: Using Solr to index zip files
Hi, I think you'll need to do some unzipping of your zip files using an unzip application before you post to Solr. If you do this via an OS-level batch script you can apply logic there to deal with nested zips. Then post your unzipped files to Solr via curl. James -Original Message- From: anupama.gangad...@daimler.com [mailto:anupama.gangad...@daimler.com] Sent: 07 June 2016 03:57 To: solr-user@lucene.apache.org Subject: Using Solr to index zip files Hi, I have a use case where I need to search zip files quickly in HDFS. I intend to use Solr but am not finding any relevant information about whether it can be done for zip files. These are nested zip files, i.e. zips within a zip file. Any help/information is much appreciated. Thank you, Regards, Anupama
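A rough sketch of that batch approach (paths, collection name, and the use of bin/post are illustrative):

  #!/bin/bash
  src=/data/zips        # where the zips land
  work=/tmp/unzipped    # scratch area for extracted files
  mkdir -p "$work"
  for z in "$src"/*.zip; do
    unzip -o "$z" -d "$work"
  done
  # keep extracting until no nested zips remain
  while find "$work" -name '*.zip' | grep -q .; do
    find "$work" -name '*.zip' | while read -r z; do
      unzip -o "$z" -d "${z%.zip}"
      rm -f "$z"   # remove even on failure so the loop can't spin forever
    done
  done
  # post everything extracted; bin/post recurses into directories
  bin/post -c mycollection "$work"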
RE: using spell check on phrases
Kaveh, If your query has "mm" set to zero or a low value, then you may want to override this when the spellchecker checks possible collations. For example: spellcheck.collateParam.mm=100% You may also want to consider adding "spellcheck.maxResultsForSuggest" to your query, so that it will return spelling suggestions even when the query returns some results. Also if you set "spellcheck.alternativeTermCount", then it will try to correct all of the query keywords, including those that exist in the dictionary. See https://cwiki.apache.org/confluence/display/solr/Spell+Checking for more information. James Dyer Ingram Content Group -Original Message- From: kaveh minooie [mailto:ka...@plutoz.com] Sent: Monday, June 06, 2016 8:19 PM To: solr-user@lucene.apache.org Subject: using spell check on phrases Hi everyone I am using solr 6 and DirectSolrSpellChecker, and edismax parser. the problem that I am having is that when the query is a phrase, every single word in the phrase need to be misspelled for the spell checker to gets activated and gives suggestions. if only one of the word is misspelled then it just says that spelling is correct: true I was wondering if anyone has encountered this situation before and knows how to solve it? thanks, -- Kaveh Minooie
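Pulling those pieces together into one request, roughly (the query itself is illustrative):

  /select?q=movie+theatre&defType=edismax&mm=1&spellcheck=true&spellcheck.collate=true&spellcheck.collateParam.mm=100%25&spellcheck.maxResultsForSuggest=5&spellcheck.alternativeTermCount=5

(%25 is the URL-encoded form of the % sign in "100%".)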
RE: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE
You need to set the "spellcheck.maxCollationTries" parameter to a value greater than zero. The higher the value, the more queries it checks for hits, and the longer it could potentially take. See https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.maxCollationTriesParameter James Dyer Ingram Content Group -Original Message- From: SRINI SOLR [mailto:srini.s...@gmail.com] Sent: Friday, July 22, 2016 12:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE Hi all - please help me here On Thursday, July 21, 2016, SRINI SOLR wrote: > Hi All - > Could you please help me on spell check on a multi-word phrase as a whole... > Scenario - > I have a problem with solr spellcheck suggestions for multi-word phrases. With the query for 'red chillies' > q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true > I get (response XML stripped by the mail archive; the surviving values show the suggestions "chiller" with a frequency of 4 and "challis" with a frequency of 2, correctlySpelled=false, and the collation "red chiller") > The problem is, even though 'chiller' has 4 results in the index, 'red chiller' has none. So we end up suggesting a phrase with 0 results. > What can I do to make spellcheck work on the whole phrase only? > Please help me here ...
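In query form, roughly:

  q=red+chillies&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.collateExtendedResults=true

With maxCollationTries greater than zero, the collator actually runs each candidate collation against the index and only returns ones that yield hits, so a 0-hit phrase like "red chiller" is discarded.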
Tutorial not working for me
I apologize if this is a really stupid question. I followed all instructions on installing the tutorial, got data loaded, and everything works great until I try to query with a field name -- e.g., name:foundation. I get zero results from this or any other query which specifies a field name. Simple queries return results, and the field names are listed in those results correctly. But if I query using names that I know are there and values that I know are there, I get nothing. I figure this must be something basic that is not right about the way things have gotten set up, but I am completely blocked at this point. I tried blowing it all away and restarting from scratch with no luck. Where should I be looking for problems here? I am running this on a MacBook, OS X 10.9, latest JDK (1.8). James
Re: Tutorial not working for me
I am following the exact instructions in the tutorial, copying and pasting all commands & queries from https://lucene.apache.org/solr/quickstart.html. Where it breaks down is this one: http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation This returns no results. Tried in the web admin view as well, and also tried various field:value combinations, to no avail. Clearly something didn't get configured correctly, but I saw no error messages when running all the data loads, etc. given in the tutorial. Sorry to be so clueless, but I don't really have anything to go on for troubleshooting besides asking dumb questions. James On Fri, Sep 16, 2016 at 11:24 AM, John Bickerstaff wrote: > Please share the exact query syntax? > > Are you using a collection you built or one of the examples? > > On Fri, Sep 16, 2016 at 9:06 AM, Pritchett, James < > jpritch...@learningally.org> wrote: > > I apologize if this is a really stupid question. I followed all > > instructions on installing the tutorial, got data loaded, and everything > > works great until I try to query with a field name -- e.g., name:foundation. > > I get zero results from this or any other query which specifies a field name. > > Simple queries return results, and the field names are listed in those > > results correctly. But if I query using names that I know are there and > > values that I know are there, I get nothing. > > > > I figure this must be something basic that is not right about the way > > things have gotten set up, but I am completely blocked at this point. I > > tried blowing it all away and restarting from scratch with no luck. Where > > should I be looking for problems here? I am running this on a MacBook, OS X > > 10.9, latest JDK (1.8). > > > > James
Re: Tutorial not working for me
r-core-6.2.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/books.json SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/gettingstarted/update... Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing file books.json (application/json) to [base]/json/docs 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update... Time spent: 0:00:01.782 marplon:solr-6.2.0 jpritchett$ bin/post -c gettingstarted example/exampledocs/books.csv java -classpath /Users/jpritchett/solr-6.2.0/dist/solr-core-6.2.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/books.csv SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/gettingstarted/update... Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing file books.csv (text/csv) to [base] 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update... Time spent: 0:00:00.204 marplon:solr-6.2.0 jpritchett$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" { "responseHeader":{ "zkConnected":true, "status":0, "QTime":264, "params":{ "q":"foundation", "indent":"true", "wt":"json"}}, "response":{"numFound":4156,"start":0,"maxScore":0.098080166,"docs":[ { "id":"0553293354", "cat":["book"], "name":["Foundation"], "price":[7.99], "inStock":[false], "author":["Isaac Asimov"], "series_t":["Foundation Novels"], "sequence_i":1, "genre_s":"scifi", "_version_":1545646368061652992}, [etc.]] }} marplon:solr-6.2.0 jpritchett$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation" { "responseHeader":{ "zkConnected":true, "status":0, "QTime":47, "params":{ "q":"name:foundation", "indent":"true", "wt":"json"}}, "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[] }} marplon:solr-6.2.0 jpritchett$ On Fri, Sep 16, 2016 at 11:40 AM, Erick Erickson wrote: > My bet: > the fields (look in managed_schema or, possibly schema.xml) > has stored="true" and indexed="false" set for the fields > in question. > > Pretty much everyone takes a few passes before this really > makes sense. "stored" means you see the results returned, > "indexed" must be true before you can search on something. > > Second possibility: You've somehow indexed fields as > "string" type rather than one of the text based fieldTypes. > "string" types are not tokenized, thus a field with > "My dog has fleas" will fail to find "My". It'll even not match > "my dog has fleas" (note capital "M"). > > The admin UI>>select core>>analysis page will show you > lots of this kind of detail, although I admit it takes a bit to > understand all the info (do un-check the "verbose" button > for the nonce). > > Now, all that aside, please show us the field definition for > one of the fields in question and, as John mentions, the exact > query (I'd also add &debugQuery=true to the results). > > Saying you followed the exact instructions somewhere isn't > really helpful. It's likely that there's something innocent-seeming > that was done differently. Giving the information asked for > will help us diagnose what's happening and, perhaps, > improve the docs if we can understand the mis-match.
> > Best, > Erick
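A minimal sketch of the three schema situations Erick describes, in schema.xml syntax (the field name is hypothetical; each line is an alternative, not a combined config):

  <!-- searchable and tokenized: a field query like name:foundation matches -->
  <field name="name" type="text_general" indexed="true" stored="true"/>
  <!-- "string" type, not tokenized: only an exact, case-sensitive match like name:Foundation works -->
  <field name="name" type="string" indexed="true" stored="true"/>
  <!-- stored but not indexed: shows up in results, never matches any query -->
  <field name="name" type="text_general" indexed="false" stored="true"/>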
Re: Tutorial not working for me
Second possibility: You've somehow indexed fields as "string" type rather than one of the text based fieldTypes. "string" types are not tokenized, thus a field with "My dog has fleas" will fail to find "My". It'll even not match "my dog has fleas" (note capital "M"). That appears to be the issue. Searching for name:Foundation indeed returns the expected result. I will now go find some better entry point to SOLR than the tutorial, which has wasted enough of my time for one day. Any suggestions would be welcome. James On Fri, Sep 16, 2016 at 11:40 AM, Erick Erickson wrote: > My bet: > the fields (look in managed_schema or, possibly schema.xml) > has stored="true" and indexed="false" set for the fields > in question. > > Pretty much everyone takes a few passes before this really > makes sense. "stored" means you see the results returned, > "indexed" must be true before you can search on something.
Re: Tutorial not working for me
Thanks for that. I totally get how it is with complicated, open source projects. And from experience, I realize that beginner-level documentation is really hard, especially with these kinds of projects: by the time you get to documentation, everybody involved is so expert in all the details that they can't imagine approaching from a blank slate. Thanks for the suggestions. Had to chuckle, though: one of your links (quora.com) is the one that I started with. Step 1: "Download Solr, actually do the tutorial ..." Best wishes, James On Fri, Sep 16, 2016 at 1:41 PM, John Bickerstaff wrote: > I totally empathize about the sense of wasted time. On Solr in particular > I pulled my hair out for months - and I had access to people who had been > using it for over two years!!! > > For what it's worth - this is kind of how it goes with most open source > projects in my experience. It's painful - and - the more moving parts the > open source project has, the more painful the learning curve (usually)... > > But - the good news is that's why this list is here - we're all trying to > help each other, so feel free to ping the list sooner rather than later > when you're frustrated. My new rule is one hour of being blocked... I > used to wait days - but everyone on the list seems to really understand how > frustrating it is to be stuck and people have really taken time to help me > - so I'm less hesitant. And, of course, I try to pay it forward by > contributing as much as I can in the same way. > > On that note: I've been particularly focused on working with Solr in terms > of being able to keep upgrading simple by just replacing and re-indexing so > if you have questions on that space (Disaster Recovery, Zookeeper config, > etc) I may be able to help - and if you're looking for a "plan" for building > and maintaining a simple solrCloud working model on Ubuntu VMs on > VirtualBox, I can *really* help you. > > Off the top of my head - some places to start: > > http://yonik.com/getting-started-with-solr/ > https://www.quora.com/What-is-the-best-way-to-learn-SOLR > http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/ > http://www.solr-start.com/ > > I think everyone responsible for those links is also a frequent "helper" on > this email forum. > > Also (and I'm aware it's a glass half-full thing which frequently irritates > me, but I'll say it anyway). Having run into this problem I'm willing to > wager you'll never forget this particular quirk and if you see the problem > in future, you'll know exactly what's wrong. It shouldn't have been > "wrong" with the example, but for my part at least - I've begun to think of > stuff like this as just part of the learning curve because it happens > nearly every time. > > Software is hard - complex projects like SOLR are hard. It's why we get > paid to do stuff like this. I'm actually getting paid pretty well right > now because Solr is recognized as difficult and I have (with many thanks to > this list) become known as someone who "knows Solr"... > > It *could* and *should* be better, but open source is what it is as a > result of the sum total of what everyone has contributed - and we're all > happy to help you as best we can.
Re: Tutorial not working for me
FWIW, my next step was to work with the movie example file, which worked perfectly and was a much, much better "getting started" intro. You could do worse than to build a new tutorial/getting started from this example. Dataset is way more fun, too -- a quality that should never be underestimated in a tutorial. James On Fri, Sep 16, 2016 at 8:34 PM, Chris Hostetter wrote: > > : I apologize if this is a really stupid question. I followed all > > It's not a stupid question, the tutorial is completely broken -- and for > that matter, in my opinion, the data_driven_schema_configs used by that > tutorial (and recommended for new users) are largely useless for the same > underlying reason... > > https://issues.apache.org/jira/browse/SOLR-9526 > > Thank you very much for asking about this - hopefully the folks who > understand this more (and don't share my opinion that the entire concept > of data_driven schemas are a terrible idea) can chime in and explain WTF > is going on here) > > -Hoss > http://www.lucidworks.com/
Re: Tutorial not working for me
> > > > From your perspective as a new user, did you find it > annoying/frustrating/confusing that the README.txt in the films example > required/instructed you to first create a handful of fields using a curl > command to hit the Schema API before you could index any of the documents? > > https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/example/films/README.txt No, I didn't find that to be a problem. In fact, in my view that's not a bug, that's a feature -- at least from my very limited experience, it seems like that kind of schema setup is probably pretty standard stuff when building a SOLR core, and so including it in the example teaches you something useful that you'll need to do pretty much right off the bat. I don't think that I did it via curl, though ... I must have used the admin interface, which was just simpler than copying and pasting that hairy-looking, multiline command into a terminal. If you used the films example as the basis for a tutorial and wrote it up in pretty HTML, you could include screenshots, etc. That would make it completely painless. James
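For readers who do want the curl route, the films README's schema setup is roughly the following Schema API call (field names follow the films example; the exact README may differ slightly):

  curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true},
    "add-field": {"name":"initial_release_date", "type":"tdate", "stored":true}
  }'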
RE: CachedSqlEntityProcessor with delta-import
Sowmya, My memory is that the cache feature does not work with Delta Imports. In fact, I believe that nearly all DIH features except straight JDBC imports do not work with Delta Imports. My advice is to not use the Delta Import feature at all, as the same result can (often more efficiently) be accomplished following the approach outlined here: https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport James Dyer Ingram Content Group -Original Message- From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com] Sent: Tuesday, October 18, 2016 10:07 AM To: solr-user@lucene.apache.org Subject: CachedSqlEntityProcessor with delta-import Good morning, Can CachedSqlEntityProcessor be used with delta-import? In my setup when running a delta-import with CachedSqlEntityProcessor, the child entity values are not correctly updated for the parent record. I am on Solr 4.3. Has anyone experienced this and if so how to resolve it? Thanks, Sowmya.
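The approach on that wiki page, sketched (table and column names are hypothetical): run an ordinary full-import whose query restricts itself to changed rows unless a clean rebuild was requested:

  <entity name="item" pk="ID"
          query="SELECT * FROM item
                 WHERE '${dataimporter.request.clean}' != 'false'
                    OR last_modified > '${dataimporter.last_index_time}'"/>

Invoked with command=full-import&clean=false this behaves like a delta import, and because it runs through the full-import code path, features like CachedSqlEntityProcessor work normally.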
Solr 6.0 Highlighting Not Working
Can someone please help me troubleshoot my Solr 6.0 highlighting issue? I have a production Solr 4.9.0 unit configured to highlight responses and it has worked for a long time now without issues. I have recently been testing Solr 6.0 and have been unable to get highlighting to work. I used my 4.9 configuration as a guide when configuring my 6.0 machine. Here are the primary configs: solrconfig.xml In my query requestHandler I have the following: <str name="hl">on</str> <str name="hl.fl">text</str> <str name="hl.encoder">html</str> It is worth noting here that the documentation in the wiki says hl.simple.pre and hl.simple.post both accept the following: <em> and </em> Using this config in 6.0 causes the core to malfunction at startup throwing an error that essentially says that an XML statement was not closed. I had to add the escaped characters just to get the solrconfig to load! Why? That isn't documented anywhere I looked. It makes me wonder if this is the source of the problems with highlighting since it works in my 4.9 implementation without escaping. Is there something wrong with 6's ability to parse XML? I upload documents using cURL: curl http://localhost:8983/solr/[CORENAME]/update?commit=true -H "Content-Type:text/xml" --data-binary '<add><doc><field name="id">7518</field><field name="text">TEST02. This is the second test.</field></doc></add>' When I search using a browser: http://50.16.13.37:8983/solr/pp/query?indent=true&q=TEST04&wt=xml The response I get is: 7518 TEST02. This is the second test. TEST02. This is the second test. 1548827202660859904 2.2499826 Note that nothing appears in the highlight section. Why? Any help would be appreciated - thanks! -Teague
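For reference, the escaped form that makes solrconfig.xml parse would look like this (a sketch; <em> and </em> are the conventional highlight markers):

  <str name="hl.simple.pre">&lt;em&gt;</str>
  <str name="hl.simple.post">&lt;/em&gt;</str>

Raw angle brackets inside an XML element are parsed as markup in any Solr version, so the working 4.9 config presumably escaped or CDATA-wrapped them somewhere.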
RE: Solr 6.0 Highlighting Not Working
Hi - Thanks for the reply, I'll give that a try. -Original Message- From: jimtronic [mailto:jimtro...@gmail.com] Sent: Monday, October 24, 2016 3:56 PM To: solr-user@lucene.apache.org Subject: Re: Solr 6.0 Highlighting Not Working Perhaps you need to wrap your inner "<em>" and "</em>" tags in the CDATA structure? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-6-0-Highlighting-Not-Working-tp4302787p4302835.html Sent from the Solr - User mailing list archive at Nabble.com.
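The CDATA variant jimtronic suggests would look roughly like this, equivalent to entity-escaping:

  <str name="hl.simple.pre"><![CDATA[<em>]]></str>
  <str name="hl.simple.post"><![CDATA[</em>]]></str>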
Two separate instances sharing the same zookeeper cluster
I have a staging and a production solr cluster. I'd like to have them use the same zookeeper cluster. It seems like it is possible if I can set a different directory for the second cluster. I've looked through the documentation though and I can't quite figure out where to set that up. As a result my staging cluster nodes keep trying to add themselves to the production cluster. If someone could point me in the right direction? Jim K. -- Jim Keeney President, FitterWeb E: j...@fitterweb.com M: 703-568-5887 *FitterWeb Consulting* *Are you lean and agile enough? *
Re: Two separate instances sharing the same zookeeper cluster
Mike - Thank you, this was very helpful. I've been doing some research and experimenting. As currently configured solr is launched as a service. I looked at the solr.in.sh file in /etc/default and we are running using a list of servers for the zookeeper cluster, so I think that is translated to -z zookeeper1,zookeeper2,zookeeper3 (these are defined in the hosts file). If I understand what I am reading, setting a specific chroot path would be done explicitly by adding the path to the end of the zookeeper server list: -z zookeeper1,zookeeper2,zookeeper3/solr_dev for example. However, I'm not sure how to switch the production cluster to explicitly reference the directory it currently uses. Do I need to set up the directory first? As per this? https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html#TakingSolrtoProduction-ZooKeeperchroot Would I set up, say, solr_prod, upconfig all the configs, switch over one node and then migrate over the rest of the nodes, ending with the leader? Would that then move production to solr_prod as the config base? Once that is done I would then set up the dev. Does any of this make sense? Jim K. On Thu, Sep 14, 2017 at 4:08 PM Mike Drob wrote: > When you specify the zk string for a solr instance, you typically include a > chroot in it. I think the default is /solr, but it doesn't have to be, so > you should be able to run with -z zk1:2181/solr-dev and /solr-prod > > https://lucene.apache.org/solr/guide/6_6/setting-up-an-external-zookeeper-ensemble.html#SettingUpanExternalZooKeeperEnsemble-PointSolrattheinstance -- Jim Keeney President, FitterWeb E: j...@fitterweb.com M: 703-568-5887 *FitterWeb Consulting* *Are you lean and agile enough? *
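For reference, the mechanical steps Mike's chroot suggestion implies, sketched with the 6.6 tooling (host and chroot names are made up for illustration): first create the chroot znode, then point every node's solr.in.sh at it and restart:

  bin/solr zk mkroot /solr_prod -z zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
  # then in /etc/default/solr.in.sh on each node:
  ZK_HOST="zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/solr_prod"

The collection configs also have to be uploaded (upconfig) under the new chroot before nodes are started against it; an empty chroot looks like a brand-new cluster to Solr.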
Quick question about suggester component
I've set up the suggester and want to act on the full document when the user selects one of the suggestions. Ideally it would be nice to be able to tell the suggester to return more than just the field that the suggestion index is built from. If that can't be done, then should I do the following: 1. Get the suggestions 2. When the user selects one, take the suggestion term and do a search of the field that the suggester used to build its index. Is that correct? Jim K. -- Jim Keeney President, FitterWeb E: j...@fitterweb.com M: 703-568-5887 *FitterWeb Consulting* *Are you lean and agile enough? *
Re: Quick question about suggester component
Yep. Understood. On Tue, Oct 17, 2017, 8:14 PM Erick Erickson wrote: > Well, you tell the suggester what field to use in the first place in > the configuration. > > But I don't quite understand. Suggester is not _intended_ to return > documents. It returns, well, suggestions. It's up to you to do > something with them, i.e. substitute them into a new query (against > whatever fields you want) and send that query to Solr. The new query > you send can use the edismax parser to automatically search across > several fields and the like. > > Suggesters are not intended to automatically do another search if > that's what you're asking. > > Best, > Erick > > On Tue, Oct 17, 2017 at 10:49 AM, James Keeney > wrote: > > I've setup the suggester and want to act on the full document when user > > selects one of the suggestions. > > > > Ideally it would be nice to be able to tell the suggester to return more > > than just the field that the suggestion index is built from. > > > > If that can't be done, then should I do the following: > > > > > >1. Get the suggestions > >2. When user selects one, take the suggestion term and do a search of > >the field that the suggester used to build it's index. > > > > Is that correct? > > > > Jim K. > > -- > > Jim Keeney > > President, FitterWeb > > E: j...@fitterweb.com > > M: 703-568-5887 > > > > *FitterWeb Consulting* > > *Are you lean and agile enough? * > -- Jim Keeney President, FitterWeb E: j...@fitterweb.com M: 703-568-5887 *FitterWeb Consulting* *Are you lean and agile enough? *
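One middle ground worth noting: a suggester built with DocumentDictionaryFactory can carry one extra stored field along with each suggestion via payloadField. A sketch (field names are hypothetical):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="weightField">popularity</str>
      <str name="payloadField">id</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

Returning the document id as the payload lets the client fetch the full document with a follow-up q=id:... query, which is essentially the two-step flow Jim describes.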
Solr cloud inquiry
Hello folks, To start, we have a sharded solr cloud configuration running solr version 5.1.0. During shard-to-shard communication there is a problem state where queries are sent to a replica whose storage is inaccessible. The node is healthy as far as the cluster is concerned, so it keeps taking requests, which pile up waiting to read from disk, resulting in a latency increase. We've tried resolving this storage inaccessibility but it appears related to AWS EBS issues. Has anyone encountered the same issue? thanks
Starting SolrCloud
Hello, I am very new to Solr, and I'm excited to get it up and running on amazon ec2 for some prototypical testing. So, I've installed solr (and java) on one ec2 instance, and I've installed zookeeper on another. After starting the zookeeper server on the default port of 2181, I run this on the solr instance: "opt/solr/bin/solr start -c -z ". us-west-2.compute.amazonaws.com/solr"", which seems to complete successfully: Archiving 1 old GC log files to /opt/solr/server/logs/archived Archiving 1 console log files to /opt/solr/server/logs/archived Rotating solr logs, keeping a max of 9 generations Waiting up to 180 seconds to see Solr running on port 8983 [|] Started Solr server on port 8983 (pid=13038). Happy searching! But then when I run "/opt/solr/bin/solr status", I get this output: Found 1 Solr nodes: Solr process 13038 running on port 8983 ERROR: Failed to get system information from http://localhost:8983/solr due to: org.apache.http.client.ClientProtocolException: Expected JSON response from server but received: Error 500 Server Error HTTP ERROR 500 Problem accessing /solr/admin/info/system. Reason: Server ErrorCaused by:org.apache.solr.common.SolrException: Error processing the request. CoreContainer is either not initialized or shutting down. at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:518) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:745) Typically, this indicates a problem with the Solr server; check the Solr server logs for more information. 
I don't quite understand what things could be causing this problem, so I'm really at a loss at the moment. If you need any additional information, I'd be glad to provide it. Thanks for reading! James
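For comparison, a well-formed start command of the kind the ref guide describes would look something like this (hostname is illustrative; if a /solr chroot is used it must already exist in ZooKeeper):

  /opt/solr/bin/solr start -c -z zk-host.us-west-2.compute.amazonaws.com:2181/solr

The stray quote characters in the command quoted above could plausibly make Solr connect to a nonexistent ZooKeeper path, which would fit the "CoreContainer is either not initialized or shutting down" error.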
Re: Starting SolrCloud
Hello, Thanks for reading this, but it has been resolved. I honestly don't know what was happening, but restarting my shell and running the exact same commands today instead of yesterday seems to have fixed it. Best, James On Mon, Nov 28, 2016 at 8:07 PM, James Muerle wrote: > Hello, > > I am very new to Solr, and I'm excited to get it up and running on amazon > ec2 for some prototypical testing. So, I've installed solr (and java) on > one ec2 instance, and I've installed zookeeper on another.
Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL
For NOT NULL, I had some success using: WHERE field_name <> '' (a greater-than/less-than comparison against empty quotes) Best regards, Gethin. From: Joel Bernstein Sent: 05 January 2017 20:12:19 To: solr-user@lucene.apache.org Subject: Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL IS NULL and IS NOT NULL predicates are not currently supported. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jan 5, 2017 at 2:05 PM, radha krishnan wrote: > Hi, > > solr version : 6.3 > > will WHERE <> IS NULL / IS NOT NULL work with the /sql handler? > > " select name from gettingstarted where name is not null " > > the above query is not returning any documents in the response even if > there are documents with "name" defined > > Thanks, > Radhakrishnan D
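As a complete request, Gethin's workaround might look like this against the 6.x /sql handler (collection and field names follow the thread):

  curl --data-urlencode "stmt=SELECT name FROM gettingstarted WHERE name <> '' LIMIT 10" http://localhost:8983/solr/gettingstarted/sql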
RE: Can't get spelling suggestions to work properly
Jimi, Generally speaking, spellcheck does not work well against fields with stemming, or other "heavy" analysis. I would use a field that is tokenized on whitespace with little else, and use that field for spellcheck. By default, the spellchecker does not suggest for words in the index. So if the user misspells a word but the misspelling is actually some other word that is indexed, it will never suggest. You can override this behavior by specifying "spellcheck.alternativeTermCount" with a value >0. This is how many suggestions it should give for words that indeed exist in the index. This can be the same value as "spellcheck.count", but you may wish to set it to a lower value. I do not recommend using "spellcheck.onlyMorePopular". It is similar to "spellcheck.alternativeTermCount", but in my opinion, the latter gives a better experience. You might also wish to set "spellcheck.maxResultsForSuggest". If you set this, then the spellchecker will not suggest anything if more results are returned than the value you specify. This is helpful in providing "did you mean"-style suggestions for queries that return few results. If you would like to ensure the suggestions combine nicely into a re-written query that returns results, then specify both "spellcheck.collate=true" and "spellcheck.maxCollationTries" to a value >0 (possibly 5-10). This will cause it to internally check the re-written queries (aka. collations) and report back on how many results you get for each. If you are using "q.op=OR" or a low value for "mm", then you will likely want to override this with something like "spellcheck.collateParam.mm=0". Otherwise every combination will get reported as returning results. I hope this and other comments you've gotten help demystify spellcheck configuration. I do agree it is fairly complicated and frustrating to get it just right. James Dyer Ingram Content Group -Original Message- From: jimi.hulleg...@svensktnaringsliv.se [mailto:jimi.hulleg...@svensktnaringsliv.se] Sent: Friday, January 13, 2017 5:16 AM To: solr-user@lucene.apache.org Subject: RE: Can't get spelling suggestions to work properly I just noticed why setting maxResultsForSuggest to a high value was not a good thing. Because now it shows spelling suggestions even on correctly spelled words. I think what I would need is the logic of SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of it being hard coded to 0. I.e. just as maxQueryFrequency works. /Jimi -Original Message- From: jimi.hulleg...@svensktnaringsliv.se [mailto:jimi.hulleg...@svensktnaringsliv.se] Sent: Friday, January 13, 2017 5:56 PM To: solr-user@lucene.apache.org Subject: RE: Can't get spelling suggestions to work properly Hi Alessandro, Thanks for your explanation. It helped a lot. Although setting "spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I also had to set "spellcheck.alternativeTermCount". With that done, I now get suggestions when searching for 'mycet' (a misspelling of the Swedish word 'mycket', that didn't return suggestions before). Although, I'm still not able to fully understand how to configure this properly. Because with this change there now are other misspelled searches that no longer give suggestions. The problem here is stemming, I suspect. Because the main search fields use stemming, so that in some cases one can get lots of results for spellings that don't exist in the index at all (or, at least not in the spelling-field).
How can I configure this component so that those suggestions are still included? Do I need to set maxResultsForSuggest to a really high number? Like Integer.MAX_VALUE? I feel that such a setting would defeat the purpose of that parameter, in a way. But I'm not sure how else to solve this. Also, there is one other thing I wonder about the spelling suggestions, that you might have the answer to. Is there a way to make the logic case insensitive, but the presentation case sensitive? For example, a search for 'georg washington' now would return 'george washington' as a suggestion, but 'Georg Washington' would be even better. Regards /Jimi -Original Message- From: alessandro.benedetti [mailto:abenede...@apache.org] Sent: Thursday, January 12, 2017 5:14 PM To: solr-user@lucene.apache.org Subject: Re: Can't get spelling suggestions to work properly Hi Jimi, taking a look at the *maxQueryFrequency* param: Your understanding is correct. 1) we don't provide misspelled suggestions if we set the param to 1, and we have a minimum of 1 doc freq for the term. 2) we don't provide misspelled suggestions if the doc frequency of the term is greater t
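Pulling James Dyer's recommendations together into one request, a sketch (the parameter values are illustrative starting points, not tuned numbers):

  /select?q=mycet&spellcheck=true&spellcheck.count=10&spellcheck.alternativeTermCount=5&spellcheck.maxResultsForSuggest=5&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.collateParam.mm=0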
RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation
This sounds a lot like SOLR-4489. However it looks like this was fixed prior to your version (in 4.5). So it could be you found another case where this bug still exists. The other thing is the default Query Converter cannot handle all cases, and it could be the query you are sending is beyond its abilities. Even in this case, it'd be nice if it failed more gracefully than this. Could you provide the query parameters you are sending and also how you have spellcheck configured? James Dyer Ingram Content Group -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Thursday, January 05, 2017 8:22 AM To: 'solr-user@lucene.apache.org' Subject: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation I am seeing many exceptions like this in my Solr [5.4.1] log: null:java.lang.StringIndexOutOfBoundsException: String index out of range: -2 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824) at java.lang.StringBuilder.replace(StringBuilder.java:262) at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:236) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:93) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:238) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658) ... at java.lang.Thread.run(Thread.java:745) What am I potentially facing here? Thx Clemens
Solr 6.0.0 Returns Blank Highlights for Certain Queries
Hello everyone! I have a Solr 6.0.0 instance that is storing documents peppered with text like "1a", "2e", "4c", etc. If I search the documents for a word, "ms", "in", "the", etc., I get the correct number of hits and the results are highlighted correctly in the highlighting section. But when I search for "1a" or "2e" I get hits, but the highlights are blank: the highlighting section contains an entry for "8667" -- the document ID of the record that had the hit -- but no highlighted snippet. Other searches, "ms" for example, return the term wrapped in the highlight markers: <em>MS</em> Why does highlighting fail for "1a" type searches? Any help is appreciated! Thanks! -Teague James
Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos
Hello everyone! I'm still stuck on this issue and could really use some help. I have a Solr 6.0.0 instance that is storing documents peppered with text like "1a", "2e", "4c", etc. If I search the documents for a word, "ms", "in", "the", etc., I get the correct number of hits and the results are highlighted correctly in the highlighting section. But when I search for "1a" or "2e" I get hits, but the highlights are blank. Further testing revealed that the highlighter fails to highlight any two-character alpha-numeric combination, such as n0, b1, 1z, etc.: the highlighting section contains an entry for "8667" -- the document ID of the record that had the hit -- but no highlighted snippet. Other searches, "ms" for example, return the term wrapped in the highlight markers: <em>MS</em> Why does highlighting fail for "1a" type searches? Any help is appreciated! Thanks! -Teague James
RE: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos
Hi Erick! Thanks for the reply. The goal is to get two character terms like 1a, 1b, 2a, 2b, 3a, etc. to get highlighted in the documents. Additional testing shows that any alpha-numeric combo returns a blank highlight, regardless of length. Thus, "pr0blem" will not highlight because of the zero in the middle of the term. I came across a ServerFault article where it was suggested that the fieldType must be tokenized in order for highlighting to work correctly. Setting the field type to text_general was suggested as a solution. In my case the data is stored as a string fieldType, which is then copied using copyField to a field that has a fieldType of text_general, but I'm still not getting a good highlight on terms like "1a". Highlighting works for any other non-alpha-numeric term though. Other articles pointed to termVectors and termOffsets, but none of these seemed to help. Here's my config: in the solrconfig file, highlighting is set to use the text field: <str name="hl.fl">text</str> Thoughts? Appreciate the help! Thanks! -Teague -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, February 1, 2017 2:49 PM To: solr-user Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos How far into the text field are these tokens? The highlighter defaults to the first 10K characters under control of hl.maxAnalyzedChars. It's vaguely possible that the values happen to be farther along in the text than that. Not likely, mind you, but possible. Best, Erick
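A sketch of the schema shape the earlier advice implies -- the field named in hl.fl must be tokenized, indexed, and stored (field names are hypothetical; termVectors/termOffsets are optional speed-ups rather than requirements):

  <field name="content" type="string" indexed="false" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>
  <copyField source="content" dest="text"/>

If hl.fl points at a field that is indexed but not stored, Solr can match the query yet return empty highlight entries, which matches the symptom in this thread.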
RE: spellcheck.q and local parameters
spellcheck.q is supposed to take a list of raw query terms, so what you're trying to do in your example won't work. What you should do instead is space-delimit the actual query terms that exist in "qq" (and nothing else) and use that for your value of spellcheck.q. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] Sent: Monday, April 28, 2014 3:01 PM To: solr-user@lucene.apache.org Subject: spellcheck.q and local parameters Hi, I'm having some trouble using the spellcheck.q parameter. The user's query is defined in the qq parameter and the q parameter contains several other parameters for boosting. I would like to use the qq parameter as a default for spellcheck.q. I tried several ways of adding the qq parameter in the spellcheck.q parameter, but it doesn't seem to work. Is this at all possible or do I need to write a custom QueryConverter? This is the configuration: _query_:"{!edismax qf=$qfQuery pf=$pfQuery bq=$boostQuery bf=$boostFunction v=$qq}" {!v=$qq} I haven't included all the variables, because they seem unnecessary. Regards, Jeroen
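In request terms, James's advice amounts to passing the user's raw terms twice -- once via qq for the boosted query, and once, space-delimited, as spellcheck.q. A sketch (the query text is made up; parameter names follow the thread):

  /select?qq=ipod%20nano&q={!edismax qf=$qfQuery v=$qq}&spellcheck=true&spellcheck.q=ipod%20nano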
RE: solr 4.2.1 spellcheck strange results
To achieve what you want, you need to specify a lightly analyzed field (no stemming) for spellcheck. For instance, if your "solr.SpellCheckComponent" in solrconfig.xml is set up with "field" of "title_full", then try using "title_full_unstemmed". Also, if you are specifying a "queryAnalyzerFieldType", it should be the same as your unstemmed text field. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: HL [mailto:freemail.grha...@gmail.com] Sent: Saturday, May 10, 2014 9:12 AM To: solr-user@lucene.apache.org Subject: solr 4.2.1 spellcheck strange results Hi, I am querying the solr server spellcheck and although the results I get back look ok at first glance, it seems like solr is replying as if it made the search with the wrong key. So while I query the server with the word "καρδυα", Solr responds as if it was querying the database with the word "καρδυ", eliminating the last char --- --- Ideally, Solr should properly indicate that the suggestions correspond with "καρδυα" rather than "καρδυ". Is there a way to make solr respond with the original search word from the query in its response, instead of the one that is getting the hits?? Regards, Harry here is the complete solr response --- 0 23 true *,score 0 καρδυα καρδυα title_short^750 title_full_unstemmed^600 title_full^400 title^500 title_alt^200 title_new^100 series^50 series2^30 author^300 author_fuller^150 contents^10 topic_unstemmed^550 topic^500 geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10 allfields fulltext isbn issn basicSpell arrarr dismax xml 0 3 0 6 0 καρδ 5 καρδι 3 καρυ 1 false
RE: Spell check [or] Did you mean this with Phrase suggestion
Have you looked at "spellcheck.collate", which re-writes the entire query with one or more corrected words? See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate . There are several options shown at this link that control how the "collate" feature works. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: vanitha venkatachalam [mailto:venkatachalam.vani...@gmail.com] Sent: Thursday, May 08, 2014 4:14 AM To: solr-user@lucene.apache.org Subject: Spell check [or] Did you mean this with Phrase suggestion Hi, We need a spell check component that suggests the actual full phrase, not just words. Say, we have a list of brands: "Nike corporation", "Samsung electronics". When I search for "tamsong", I'd like to get suggestions as "samsung electronics" (the full phrase), not just "samsung" (words). Please help. -- regards, Vanitha
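A sketch of collation defaults in a request handler, along the lines that wiki section describes (the values are illustrative):

  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollations">3</str>
  <str name="spellcheck.maxCollationTries">10</str>
  <str name="spellcheck.collateExtendedResults">true</str>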
RE: spellcheck if docsfound below threshold
Its "spellcheck.maxResultsForSuggest". http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl] Sent: Monday, May 12, 2014 2:12 AM To: solr-user@lucene.apache.org Subject: spellcheck if docsfound below threshold Hi, Is there a setting to only include spellcheck if the number of documents found is below a certain threshold? Or would we need to rerun the request with the spellcheck parameters based on the docs found? Kind regards, Jan Verweij
Re: overseer queue clogged
We’re seeing something similar to what Ryan reported, e.g. a massively clogged overseer queue that gets so bad it brings down our solr nodes. I tried “rmr”ing the entire /overseer/queue but it keeps returning with “Node does not exist: /overseer/queue/qn-0##”, after which in order to continue I have to create the node complained about and then execute the “rmr /overseer/queue” again, until it stumbled upon another node that doesn’t exist, rinse, wash, repeat… This is w/ Solr 4.7.1 and ZooKeeper 3.4.6 -- James Hardwick On Thursday, May 1, 2014 at 10:25 AM, Mark Miller wrote: > What version are you running? This was fixed in a recent release. It can > happen if you hit add core with the defaults on the admin page in older > versions. > > -- > Mark Miller > about.me/markrmiller (http://about.me/markrmiller) > > On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com > (mailto:ryan.co...@gmail.com)) wrote: > > I saw an overseer queue clogged as well due to a bad message in the queue. > Unfortunately this went unnoticed for a while until there were 130K messages > in the overseer queue. Since it was a production system we were not able to > simply stop everything and delete all Zookeeper data, so we manually deleted > messages by issuing commands directly through the zkCli.sh (http://zkCli.sh) > tool. After all > the messages had been cleared, some nodes were in the wrong state (e.g. > 'down' when should have been 'active'). Restarting the 'down' or 'recovery > failed' nodes brought the whole cluster back to a stable and healthy state. > > Since it can take some digging to determine backlog in the overseer queue, > some of the symptoms we saw were: > Overseer throwing an exception like "Path must not end with / character" > Random nodes throwing an exception like "ClusterState says we are the > leader, but locally we don't think so" > Bringing up new replicas time out when attempting to fetch shard id > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html > > Sent from the Solr - User mailing list archive at Nabble.com > (http://Nabble.com). > >
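For reference, the kind of zkCli.sh session used for this sort of manual cleanup (ZooKeeper 3.4.x; the queue-entry name is illustrative):

  ./zkCli.sh -server zk1:2181
  ls /overseer/queue
  delete /overseer/queue/qn-0000000123
  rmr /overseer/queue

As the thread notes, clearing the queue is a last resort, and nodes stuck in a wrong state may need a restart afterwards.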
RE: Wordbreak spellchecker excessive breaking.
You can do this if you set it up like in the main Solr example: <lst name="spellchecker"> <str name="name">wordbreak</str> <str name="classname">solr.WordBreakSolrSpellChecker</str> <str name="field">name</str> <str name="combineWords">true</str> <str name="breakWords">true</str> <int name="maxChanges">10</int> </lst> The "combineWords" and "breakWords" flags let you tell it which kind of wordbreak correction you want. "maxChanges" controls the maximum number of words it can break 1 word into, or the maximum number of words it can combine. It is reasonable to set this to 1 or 2. The best way to use this is in conjunction with a "regular" spellchecker like DirectSolrSpellChecker. When used together with the collation functionality, it should take a query like "mob ile" and depending on what actually returns results from your data, suggest either "mobile" or perhaps "mob lie" or both. The one thing it cannot do is fix a transposition or misspelling and combine or break words in one shot. That is, it cannot detect that "mob lie" should become "mobile". James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: S.L [mailto:simpleliving...@gmail.com] Sent: Saturday, May 24, 2014 4:21 PM To: solr-user@lucene.apache.org Subject: Wordbreak spellchecker excessive breaking. I am using the Solr wordbreak spellchecker and the issue is that when I search for a term like "mob ile", expecting that the wordbreak spellchecker would actually return a suggestion for "mobile", it breaks the search term into letters like "m o b". I have two issues with this behavior. 1. How can I make Solr combine "mob ile" to mobile? 2. Notwithstanding the fact that my search term "mob ile" is being broken incorrectly into individual letters, I realize that the wordbreak is needed in certain cases; how do I control the wordbreak so that it does not break it into letters like "m o b", which seems like excessive breaking to me? Thanks.
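To get the combined behavior described here, both dictionaries are named in the request; a sketch (dictionary names follow the configs in this thread):

  /select?q=mob%20ile&spellcheck=true&spellcheck.dictionary=default&spellcheck.dictionary=wordbreak&spellcheck.collate=true

Solr merges the suggestions from multiple spellcheck.dictionary parameters, so Direct and WordBreak corrections can land in the same collation.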
How to Get Highlighting Working in Velocity (Solr 4.8.0)
My Solr 4.8.0 index includes a field called 'dom_title'. The field is displayed in the result set. I want to be able to highlight keywords from this field in the displayed results. I have tried configuring solrconfig.xml and I have tried adding parameters to the query "&hl=true&hl.fl=dom_title" but the searched keyword never gets highlighted in the results. I am attempting to use the Velocity Browse interface to demonstrate this. Most of the configuration is right out of the box, except for the fields in the schema. From my solrconfig.xml: <str name="echoParams">explicit</str> <str name="wt">velocity</str> <str name="v.template">browse</str> <str name="v.layout">layout</str> <str name="hl">on</str> <str name="hl.fl">dom_title</str> <str name="hl.encoder">html</str> I omitted a lot of basic query settings and facet field info from this snippet to focus on the highlighting component. What am I missing? -Teague
RE: Wordbreak spellchecker excessive breaking.
I am not sure why changing spellcheck parameters would prevent your server from restarting. One thing to check is to see if you have warming queries running that involve spellcheck. I think I remember from long ago there was (maybe still is) an obscure bug where sometimes it will lock up in rare cases when spellcheck is used in warming queries. I do not remember exactly what caused this or if it was ever fixed. Besides that, you might want to post a stack trace or describe what happens when it doesn't restart. Perhaps someone here will know what the problem is. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: S.L [mailto:simpleliving...@gmail.com] Sent: Friday, May 30, 2014 12:36 AM To: solr-user@lucene.apache.org Subject: Re: Wordbreak spellchecker excessive breaking. James, Thanks for clearly stating this, I was not able to find this documented anywhere. Yes, I am using it with another spell checker (Direct) with the collation on. I will try the maxChanges and let you know. On a side note, whenever I change the spellchecker parameters, I need to rebuild the index and delete the solr data directory before that, as my Tomcat instance would not even start. Can you let me know why? Thanks.
RE: DirectSpellChecker not returning expected suggestions.
If "wrangle" is not in your index, and if it is within the max # of edits, then it should suggest it. Are you getting anything back from spellcheck at all? What is the exact query you are using? How is the spellcheck field analyzed? If you're using stemming, then "wrangle" and "wrangler" might be stemmed to the same word. (by the way, you shouldn't spellcheck against a stemmed or otherwise heavily-analyzed field). James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: S.L [mailto:simpleliving...@gmail.com] Sent: Monday, June 02, 2014 1:06 PM To: solr-user@lucene.apache.org Subject: Re: DirectSpellChecker not returning expected suggestions. OK, I just realized that "wrangle" is a proper english word, probably thats why I dont get a suggestion for "wrangler" in this case. How ever in my test index there is no "wrangle" present , so even though this is a proper english word , since there is no occurence of it in the index should'nt Solr suggest me "wrangler" ? On Mon, Jun 2, 2014 at 2:00 PM, S.L wrote: > I do not get any suggestion (when I search for "wrangle") , however I > correctly get the suggestion wrangler when I search for wranglr , I am > using the Direct and WordBreak spellcheckers in combination, I have not > tried using anything else. > > Is the distance calculation of Solr different than what Levestien distance > calculation ? I have set maxEdits to 1 , assuming that this corresponds to > the maxDistance. > > Thanks for your help! > > > On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com < > david.w.smi...@gmail.com> wrote: > >> What do you get then? Suggestions, but not the one you’re looking for, or >> is it deemed correctly spelled? >> >> Have you tried another spellChecker impl, for troubleshooting purposes? >> >> ~ David Smiley >> Freelance Apache Lucene/Solr Search Consultant/Developer >> http://www.linkedin.com/in/davidwsmiley >> >> >> On Sat, May 31, 2014 at 12:33 AM, S.L wrote: >> >> > Hi All, >> > >> > I have a small test index of 400 documents , it happens to have an entry >> > for "wrangler", When I search for "wranglr", I correctly get the >> collation >> > suggestion as "wrangler", however when I search for "wrangle" , I do not >> > get a suggestion for "wrangler". >> > >> > The Levenstien distance between wrangle --> wrangler is same as the >> > Levestien distance between wranglr-->wrangler , I am just wondering why >> I >> > do not get a suggestion for wrangle. >> > >> > Below is my Direct spell checker configuration. >> > >> > >> > direct >> > suggestAggregate >> > solr.DirectSolrSpellChecker >> > >> > internal >> > score >> > >> > >> > 0.7 >> > >> > 1 >> > >> > 3 >> > >> > 5 >> > >> > 4 >> > >> > 0.01 >> > >> > >> > >> > >> > >
RE: Solr spellcheck - onlyMorePopular threshold?
I believe it will return the terms that are most similar to the queried terms but have a greater term frequency than the queried terms. It doesn't actually care what the term frequencies are, only that they are greater than the frequencies of the terms you queried on.

I do not know your use case, but you may want to consider using "spellcheck.alternativeTermCount" instead of "onlyMorePopular". See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount and https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153 for why.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alistair [mailto:ali...@gmail.com]
Sent: Monday, June 09, 2014 3:06 AM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all,

I was wondering what the "onlyMorePopular" option for spellchecking uses as its threshold. Will it always pick the suggestion that occurs most frequently, or does it base its result on some threshold that can be configured?

Thanks!

Ali.
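A minimal sketch of the alternativeTermCount approach James suggests (values are illustrative, set in the request or the handler defaults):

<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>

alternativeTermCount asks for suggestions even when the queried term exists in the index, while maxResultsForSuggest suppresses suggestions once the query already returns more than that many hits - together they cover most of what onlyMorePopular is typically used for, with more predictable behavior.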
RE: Highlighting not working
Vicky,

I resolved this by making sure that the field that is searched has stored="true". By default "text" is searched, which is the destination of the copyFields and is not stored. If you change your copyField destination to a field that is stored, and use that field as the default search field, then highlighting should work - or at least it did for me. As a super fast check, change the text field to stored="true" and test. Remember that you'll have to restart Solr and re-index first!

HTH!

-Teague

-----Original Message-----
From: vicky [mailto:vi...@raytheon.com]
Sent: Wednesday, June 18, 2014 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting not working

Were you ever able to resolve this issue? I am having the same issue, and highlighting is not working for me on Solr 4.8.
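To make the fix concrete, a sketch of the relevant schema.xml pieces (the field and type names here are illustrative):

<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<copyField source="dom_title" dest="text"/>

The highlighter needs the original field value to build snippets from, which is why the field being searched and highlighted must be stored.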
RE: Spell checker - limit on number of misspelt words in a search term.
I do not believe there is such a setting. Most likely you will need to increase the value of "maxCollationTries" to get it to discover the "correct" combination. Just be sure not to set this too high, as queries with a lot of misspelled words (or for something your index simply doesn't have) will take longer to complete.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: S.L [mailto:simpleliving...@gmail.com]
Sent: Tuesday, June 17, 2014 4:49 PM
To: solr-user@lucene.apache.org
Subject: Spell checker - limit on number of misspelt words in a search term.

Hi All,

I am using the Direct spellchecker component and have collate=true in my solrconfig.xml. The issue I noticed is that when I have a search term of up to two words and both of them are misspelled, I get a collation query as a suggestion in the spellchecker output; if I increase the search term length to three words and spell all of them incorrectly, then I do not get a collation query in the spellchecker suggestions.

Is there a setting in the solrconfig.xml file that controls this behavior by restricting collation to search terms with at most two misspelt words? If so, I would need to change that property. Can anyone please let me know how to do so?

Thanks.

Sent from my mobile.
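For reference, a sketch of the collation parameters involved (values are illustrative, not recommendations):

<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">3</str>

maxCollationTries bounds how many candidate combinations are tested against the index before the component gives up; with three misspelled words the number of combinations grows multiplicatively, so a value that worked for two-word queries can be exhausted before any three-word collation verifies.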
RE: Endeca to Solr Migration
We migrated a big application from Endeca (6.0, I think) several years ago. We were not using any of the business UI tools, but we found that Solr is a lot more flexible and performant than Endeca. But with more flexibility comes more you need to know.

The hardest thing was to migrate the Endeca dimensions to Solr facets. We had Endeca-API-specific dependencies throughout the application, even in the presentation layer. We ended up writing a bridge API that allowed us to keep our Endeca-specific code and translate the queries to Solr queries. We are storing a cross-reference between the "N" values from Endeca and key/value pairs to translate something like N=4000 to "fq=Language:English".

With Solr, there is more you need to do in your app that the backend doesn't manage for you. In the end, though, it lets you separate your concerns better.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mrg81 [mailto:maya...@gmail.com]
Sent: Saturday, June 28, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: Endeca to Solr Migration

Hello --

I wanted to get some details on an Endeca to Solr migration. I am interested in a few topics:

1. We would like to migrate the faceted navigation, boosting of individual records, and a few other items.
2. But the biggest question is about the UI [Experience Manager] - I have not found a tool that comes close to Experience Manager. I did read about Hue [in response to Gareth's question on migration], but it seems that we will have to do a lot of customization to use it.

Questions:
1. Is there a UI that we can use? Is it possible to un-hook the Experience Manager UI and point it at Solr?
2. How long does a typical migration take, assuming that we have to migrate the faceted navigation and boosted records?

Thanks
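To make the cross-reference idea concrete, the bridge translation amounts to a lookup table along these lines (the IDs and field names are illustrative; N=4000+5120 follows Endeca's plus-separated dimension-value syntax):

Endeca: N=4000        ->  Solr: fq=Language:English
Endeca: N=4000+5120   ->  Solr: fq=Language:English&fq=Format:Paperback

Each Endeca dimension-value ID maps to a field:value pair, and multi-select dimensions simply become additional fq parameters.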
Of, To, and Other Small Words
Hello all,

I am working with Solr 4.9.0 and am searching for phrases that contain words like "of" or "to" that Solr seems to be ignoring at index time. Here's what I tried:

curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">100</field><field name="content">blah blah blah knowledge of science blah blah blah</field></doc></add>'

Then, using a browser:

http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100

I get zero hits. Search for "knowledge" or "science" and I'll get hits; "knowledge of" or "of science" and I get zero hits. I don't want to use proximity if I can avoid it, as this may introduce too many undesirable results. stopwords.txt is blank, yet clearly Solr is ignoring "of" and "to", and possibly more words that I have not discovered through testing yet. Is there some other configuration file that contains these small words? Is there any way to force Solr to pay attention to them and not drop them from the phrase? Any advice is appreciated! Thanks!

-Teague
RE: Of, To, and Other Small Words
Hi Anshum,

Thanks for replying and suggesting this, but the field type I am using (a modified text_general) in my schema has the file set to 'stopwords.txt'. Just to be doubly sure, I cleared the list in stopwords_en.txt, restarted Solr, re-indexed, and searched - still zero results. Any other suggestions on where I might be able to control this behavior?

-Teague

-----Original Message-----
From: Anshum Gupta [mailto:ans...@anshumgupta.net]
Sent: Monday, July 14, 2014 4:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Hi Teague,

The StopFilterFactory (which I think you're using) by default uses lang/stopwords_en.txt (which wouldn't be empty if you check). What you're looking at is stopwords.txt. You could either empty that file out or change the field type for your field.

On Mon, Jul 14, 2014 at 12:53 PM, Teague James wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain
> words like "of" or "to" that Solr seems to be ignoring at index time.
> Here's what I tried:
>
> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
> --data-binary '<add><doc><field name="id">100</field><field
> name="content">blah blah blah knowledge of science blah blah
> blah</field></doc></add>'
>
> Then, using a browser:
>
> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>
> I get zero hits. Search for "knowledge" or "science" and I'll get hits;
> "knowledge of" or "of science" and I get zero hits. I don't want to
> use proximity if I can avoid it, as this may introduce too many
> undesirable results. stopwords.txt is blank, yet clearly Solr is ignoring
> "of" and "to", and possibly more words that I have not discovered through
> testing yet. Is there some other configuration file that contains these
> small words? Is there any way to force Solr to pay attention to them and
> not drop them from the phrase? Any advice is appreciated! Thanks!
>
> -Teague

--
Anshum Gupta
http://www.anshumgupta.net
RE: Of, To, and Other Small Words
Jack,

Thanks for replying and the suggestion. I replied to another suggestion with my field type, and I do have words="stopwords.txt" set on the filter. There's nothing in stopwords.txt. I even cleaned out stopwords_en.txt just to be certain. Any other suggestions on how to control this behavior?

-Teague

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, July 14, 2014 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Or, if you happen to leave off the "words" attribute of the stop filter (or misspell the attribute name), it will use the internal Lucene hardwired list of stop words.

-- Jack Krupansky

-----Original Message-----
From: Anshum Gupta
Sent: Monday, July 14, 2014 4:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

Hi Teague,

The StopFilterFactory (which I think you're using) by default uses lang/stopwords_en.txt (which wouldn't be empty if you check). What you're looking at is stopwords.txt. You could either empty that file out or change the field type for your field.

On Mon, Jul 14, 2014 at 12:53 PM, Teague James wrote:
> Hello all,
>
> I am working with Solr 4.9.0 and am searching for phrases that contain
> words like "of" or "to" that Solr seems to be ignoring at index time.
> Here's what I tried:
>
> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
> --data-binary '<add><doc><field name="id">100</field><field
> name="content">blah blah blah knowledge of science blah blah
> blah</field></doc></add>'
>
> Then, using a browser:
>
> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>
> I get zero hits. Search for "knowledge" or "science" and I'll get hits;
> "knowledge of" or "of science" and I get zero hits. I don't want to
> use proximity if I can avoid it, as this may introduce too many
> undesirable results. stopwords.txt is blank, yet clearly Solr is ignoring
> "of" and "to", and possibly more words that I have not discovered through
> testing yet. Is there some other configuration file that contains these
> small words? Is there any way to force Solr to pay attention to them and
> not drop them from the phrase? Any advice is appreciated! Thanks!
>
> -Teague

--
Anshum Gupta
http://www.anshumgupta.net
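For anyone hitting the same wall, the declaration Jack describes, with the words attribute spelled out so the hardwired list is never used, looks like this:

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

If "words" is omitted or misspelled, Lucene falls back to its built-in English stop word set, which does include "of" and "to".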
RE: Of, To, and Other Small Words
Alex,

Thanks! Great suggestion. I figured out that it was the EdgeNGramFilterFactory. Taking that out of the mix did it.

-Teague

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 14, 2014 9:14 PM
To: solr-user
Subject: Re: Of, To, and Other Small Words

Have you tried the Admin UI's Analysis screen? It will show you what happens to the text as it progresses through the tokenizers and filters. No need to re-index.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 8:10 AM, Teague James wrote:
> Hi Anshum,
>
> Thanks for replying and suggesting this, but the field type I am using (a
> modified text_general) in my schema has the file set to 'stopwords.txt'.
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Just to be doubly sure, I cleared the list in stopwords_en.txt, restarted
> Solr, re-indexed, and searched - still zero results. Any other suggestions
> on where I might be able to control this behavior?
>
> -Teague
>
>
> -----Original Message-----
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: Monday, July 14, 2014 4:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Of, To, and Other Small Words
>
> Hi Teague,
>
> The StopFilterFactory (which I think you're using) by default uses
> lang/stopwords_en.txt (which wouldn't be empty if you check).
> What you're looking at is stopwords.txt. You could either empty that file
> out or change the field type for your field.
>
>
> On Mon, Jul 14, 2014 at 12:53 PM, Teague James wrote:
>> Hello all,
>>
>> I am working with Solr 4.9.0 and am searching for phrases that
>> contain words like "of" or "to" that Solr seems to be ignoring at index
>> time. Here's what I tried:
>>
>> curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
>> --data-binary '<add><doc><field name="id">100</field><field
>> name="content">blah blah blah knowledge of science blah blah
>> blah</field></doc></add>'
>>
>> Then, using a browser:
>>
>> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
>>
>> I get zero hits. Search for "knowledge" or "science" and I'll get hits;
>> "knowledge of" or "of science" and I get zero hits. I don't want to
>> use proximity if I can avoid it, as this may introduce too many
>> undesirable results. stopwords.txt is blank, yet clearly Solr is ignoring
>> "of" and "to" and possibly more words that I have not discovered through
>> testing yet. Is there some other configuration file that contains these
>> small words? Is there any way to force Solr to pay attention to them and
>> not drop them from the phrase? Any advice is appreciated! Thanks!
>>
>> -Teague
>
> --
> Anshum Gupta
> http://www.anshumgupta.net
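For the archive, the likely mechanism (my reading of the filter's behavior, worth verifying with the Analysis screen): EdgeNGramFilterFactory with minGramSize="3" emits no grams for tokens shorter than three characters, so "of" and "to" vanish from the index no matter what the stopword files contain. If edge n-grams are still wanted for partial matching, one alternative is to lower the minimum:

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>

at the cost of a larger index and noisier short-prefix matches.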