Creating a Custom Query Response Writer

2014-12-05 Thread Ryan Yacyshyn
Hey Everyone, I'm a little stuck on building a custom query response writer. I want to create a response writer similar to the one explained in the book, Taming Text, on the TypeAheadResponseWriter. I know I need to implement the QueryResponseWriter, but I'm not sure where to find the Solr JAR fil

DocsEnum and TermsEnum "reuse" in lucene join library?

2014-12-05 Thread Darin Amos
Hi All, I have been working on a custom query and I am going off of samples in the lucene join library (4.3.0) and I am a little unclear about a couple lines. 1) When getting a TermsEnum in TermsIncludingScoreQuery.createWeight(…).scorer()… A previous TermsEnum is used like the following: seg

Re: Preferred Schema/Config for Chinese Language Cores?

2014-12-05 Thread Tom Zimmermann
Thanks for the links. The dzone link was nice and concise, but unfortunately makes use of the now-deprecated CJK tokenizer. Does anyone out there have some examples or experience working with the recommended replacement for CJK? Thanks, TZ

Re: unable to build spellcheck in solr

2014-12-05 Thread Alexandre Rafalovitch
What's your suggester XML definition? Do you have a link similar to: fuzzysuggest.txt That particular code path seems to be expecting it. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr populari

Re: unable to build spellcheck in solr

2014-12-05 Thread Erick Erickson
Not sure, of course. Sure seems like a better error message is in order, is there anything above the message you pasted in the log file that sheds more light on the subject? Erick On Fri, Dec 5, 2014 at 1:22 PM, Min L wrote: > Thanks for your reply. > > This is all it is in the solr log, no stac

Re: unable to build spellcheck in solr

2014-12-05 Thread Min L
Thanks for your reply. This is all there is in the Solr log; there is no stack trace. It fails when building manually, regardless of whether buildOnCommit is true or false. The file fst.bin was created. I found the source code in Suggester.java where it logged the error. Perhaps lookup.store(new FileOutputStream(target)) fai

RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread steve
Likely a good HTTP debugger would help (Wireshark or Fiddler2, for example): http://www.telerik.com/fiddler https://www.wireshark.org/download.html For example, it could show the HTTP header that the "client" uses to request info from an API, then show the results of that query. One small caveat:

Re: unable to build spellcheck in solr

2014-12-05 Thread Erick Erickson
What's the rest of the stack trace? There should be a root cause somewhere. Best, Erick On Fri, Dec 5, 2014 at 11:07 AM, Min L wrote: > Hi all: > > My code using solr spellchecker to suggest keywords worked fine locally, > however in qa solr env, it failed to build it with the following error in

Re: Using Solr for finding Flight Routes

2014-12-05 Thread Nazik Huq
Check Grant's SOLR Air reference app here http://www.ibm.com/developerworks/library/j-solr-lucene/index.html . @Nazik_Huq On Dec 5, 2014, at 1:19 PM, Robin Woods wrote: > Thanks Alex. I'll check the GraphDB solutions. > > On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch > wrote: > >>

Logging in Solr's DataImportHandler

2014-12-05 Thread Dan Davis
I have a script transformer and a log transformer, and I'm not seeing the log messages, at least not where I expect. Is there any way I can simply log a custom message from within my script? Can the script easily interact with its container's logger?

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Darin Amos
In this case I was thinking about something like the following, if you changed the Query implementation or created your own similar query. Consider this query: q={!scorejoin from=parent to=id}type:child public class ScoreJoinQuery extends Query { private Query q = null;

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Roman Chyla
Not sure I understand. It is the searcher which executes the query, how would you 'convince' it to pass the query? First the Weight is created, weight instance creates scorer - you would have to change the API to do the passing (or maybe not...?) In my case, the relationships were across index segm

unable to build spellcheck in solr

2014-12-05 Thread Min L
Hi all: My code using solr spellchecker to suggest keywords worked fine locally, however in qa solr env, it failed to build it with the following error in solr log: ERROR Suggester Store Lookup build from index on field: myfieldname failed reader has: xxx docs I checked the solr directory and th

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Darin Amos
Couldn’t you just keep passing the wrapped query and searcher down to Weight.scorer()? This would allow you to wait until the query is executed to do term collection. If you want to protect against creating and executing the query with different searchers, you would have to make the query facto

Re: Using Solr for finding Flight Routes

2014-12-05 Thread Robin Woods
Thanks Alex. I'll check the GraphDB solutions. On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch wrote: > Sounds like a standard graph-database problem. I think some GraphDBs > integrate with Solr (or at least Lucene) for search. > > Regards, >Alex. > > > Personal: http://www.outerthough

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Roman Chyla
Hi Mikhail, I think you are right, it won't be a problem for Solr, but it is likely an antipattern inside a Lucene component, because custom components may create join queries, hold on to them, and then execute them much later against a different searcher. One approach would be to postpone term collection unt

[ANN] Heliosearch 0.09 (JSON Request API + Distrib for Facet API)

2014-12-05 Thread Yonik Seeley
http://heliosearch.org/download Heliosearch v0.09 Features: o Heliosearch v0.09 is based on (and contains all features of) Lucene/Solr 4.10.2 + most of 4.10.3 o Distributed search support for the new faceted search module / JSON Facet API: http://heliosearch.org/json-facet-api/ o Automatic conv

RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Hi Erik, I probably celebrated too soon. When I tested {!field} it seemed to work, but only because the data I queried made it look like it was working. Using the example that I originally mentioned, searching for Tom Hanks Major: 1) If I search {!field f=displayName}: Hanks Major, it works

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Darin Amos
Thanks for the information! The reason I ask is I am doing a POC on building a custom Query+QueryParser+Facet Component customization. I have had some issues finding exactly what I am looking for OOTB and I believe I need something custom. (It's also a really good learning exercise.) I do ecomm

RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread Teague James
Alex, Your suggestion might be a solution, but the issue isn't that the resource isn't found. Like Walter said 400 is a "bad request" which makes me wonder, what is the DIH/Tika doing when trying to access the documents? What is the "request" that is bad? Is there any other way to suss this out

Re: Get the new terms of fields since last update

2014-12-05 Thread lboutros
I think payloads are per-posting information, which means that it's not trivial (to me at least ;)) to get terms for a given payload. And it's quite intensive to scan all postings. I will look into the Bloom filter idea. Thx Ludovic. - Jouve France. -- View this message in context: http:

Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
Dinesh - indeed. You can compose arbitrarily complex queries using what has been termed “nested queries” like this. It used to be q=_query_:"{!…}..." OR _query_:"{!…}…", but the _query_ trick isn’t strictly necessary now (though care has to be taken to make sure these complex nested expressions

Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
But also, to spell out the more typical way to do that: q=field1:"…" OR field2:"…" The nice thing about {!field} is that the value doesn’t have to have quotes and deal with escaping issues, but if you just want phrase queries and quote/escaping isn’t a hassle maybe that’s cleaner for you.

Re: Get the new terms of fields since last update

2014-12-05 Thread Alexandre Rafalovitch
On 5 December 2014 at 10:21, lboutros wrote: > Alex, I will check, this seems to be a good idea. > Is it possible to filter terms with payloads in index readers ? I did not > see anything like that in my first investigation. > I suppose it would take some additional disk space. Payloads are kind

RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Thanks a lot, Erik. {!field} seems to solve our issue. I much appreciate your help. Regards, Dinesh Babu. -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: 05 December 2014 16:00 To: solr-user@lucene.apache.org Subject: Re: How to stop Solr tokenising search terms

RE: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
One more quick question, Erik: if I want to search on multiple fields using {!field}, is there a query similar to what {!prefix} has: q={!prefix f=field1 v=$f1_val} OR {!prefix f=field2 v=$f2_val} where &f1_val=&f2_val= Regards, Dinesh Babu. -Original Message- From: Dinesh Babu

Re: How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Erik Hatcher
Try using {!field} instead of {!prefix}. {!field} will create a phrase query (or term query if it’s just one term) after analysis. [it also could construct other query types if the analysis overlaps tokens, but maybe not relevant here] Also note that you can use multiple of these expressions i
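As a rough illustration of combining several {!field} clauses with parameter dereferencing, here is a minimal Python sketch that only assembles the request parameters. The field names and the f1_val/f2_val parameter names mirror the ones used elsewhere in this thread; the helper name and the sample values are hypothetical. It does not contact Solr, and, as noted in the thread, care is needed to make sure such nested expressions are parsed the way you intend on your Solr version.

```python
from urllib.parse import urlencode, parse_qs


def build_field_query_params(field_values):
    """Build request parameters that OR together one {!field} clause per field.

    field_values maps a field name to the raw (unquoted) value to search for.
    Each value is passed as its own request parameter and referenced from the
    query with v=$name, so the value itself needs no quoting or escaping in q.
    """
    clauses = []
    params = {}
    for i, (field, value) in enumerate(field_values.items(), start=1):
        ref = f"f{i}_val"  # parameter name, following the thread's f1_val / f2_val
        clauses.append(f"{{!field f={field} v=${ref}}}")
        params[ref] = value
    params["q"] = " OR ".join(clauses)
    return params


# Hypothetical sample values; only the parameter layout comes from the thread.
params = build_field_query_params({"field1": "Tom Hanks", "field2": "Hanks Major"})
query_string = urlencode(params)  # URL-encoded, ready to append after "select?"
```

Keeping the values in separate parameters is the design point here: the spaces in "Tom Hanks" are handled by URL encoding rather than by query-syntax quoting.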

Re: Get the new terms of fields since last update

2014-12-05 Thread Sujit Pal
Hi Ludovic, A bit late to the party, sorry, but here is a bit of a riff off Eric's idea. Why not store the previous terms in a Bloom filter and once you get the terms from this week, check to see if they are not in the set. Once you find the set, add them to the Bloom filter. Bloom filters are spa
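The Bloom filter idea can be sketched in a few lines of Python. This is an illustrative toy, not a tuned implementation: the class, the default sizes, and the double-hashing scheme built on SHA-256 are all assumptions chosen for brevity, and a real deployment with around a billion terms would need a far larger bit array. Note also that a Bloom filter can report false positives, so a genuinely new term may occasionally be treated as already seen.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over strings (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(term.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, term):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(term)
        )


def new_terms(seen, this_week_terms):
    """Return terms not yet in the filter, adding each one as it is found."""
    fresh = []
    for term in this_week_terms:
        if term not in seen:
            fresh.append(term)
            seen.add(term)
    return fresh
```

In use, the filter would be persisted between weekly runs and each week's term list streamed through `new_terms`; how the terms are pulled out of the index is left open here.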

How to stop Solr tokenising search terms with spaces

2014-12-05 Thread Dinesh Babu
Hi, We are using Solr 4.10.2 to store user names from LDAP. I want Solr not to tokenise my search term when it has a space in it. E.g., if there is a user by the name Tom Hanks Major, then 1) When I do a query for " Tom Hanks Major " , I don't want Solr to break this search phrase and search for individ

Re: Proximity Search with Grouping

2014-12-05 Thread Emre ERKEK
Thanks for the answer. On Fri, Dec 5, 2014 at 2:01 PM, Allison, Timothy B. wrote: > With updated link (sorry!): > https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser > > -Original Message- > From: Emre ERKEK [mailto:h.emre.er...@gmail.com] > S

Re: Get the new terms of fields since last update

2014-12-05 Thread lboutros
The Apache Solr community is sooo great ! Interesting problem with 3 interesting answers in less than 2 hours ! Thank you all, really. Erik, I'm already saving the billion terms each week. It's hard to diff 1 billion terms. I'm already rebuilding the whole dictionaries each week in a cust

Re: Get the new terms of fields since last update

2014-12-05 Thread Michael Sokolov
How about creating a new core that only holds a single week's documents, and retrieving all of its terms? Then each week, flush it and start over. -Mike On 12/05/2014 07:54 AM, lboutros wrote: Dear all, I would like to get the new terms of fields since last update (once a week). If I retriev

Re: Using Solr for finding Flight Routes

2014-12-05 Thread Alexandre Rafalovitch
Sounds like a standard graph-database problem. I think some GraphDBs integrate with Solr (or at least Lucene) for search. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community:

Re: Get the new terms of fields since last update

2014-12-05 Thread Alexandre Rafalovitch
What about using payloads to store timestamps? And then some sort of post-filtering to remove what's too old. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.li

Re: Get the new terms of fields since last update

2014-12-05 Thread Erik Hatcher
Interesting problem. Can you save last week’s results and then “diff” them? My first thought would be to use the terms component or faceting to get all the terms, save that off (in a simple alpha sorted text file, maybe) and then next week do the same thing and diff the files? The lower lev
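The save-and-diff approach can be sketched as a single streaming pass over two sorted term lists, which avoids holding either list in memory. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, both inputs must be sorted in the same order (Python's default string ordering here), and each list is assumed to be deduplicated. How the term lists are exported from Solr is not covered.

```python
def new_terms_sorted(old_terms, current_terms):
    """Yield terms present in current_terms but absent from old_terms.

    Both arguments are iterables of strings sorted ascending with the same
    ordering; each is consumed once, like a merge step in merge sort.
    """
    old_iter = iter(old_terms)
    old = next(old_iter, None)
    for term in current_terms:
        # Advance the old list until it catches up with the current term.
        while old is not None and old < term:
            old = next(old_iter, None)
        if old is None or old != term:
            yield term


last_week = ["apple", "cherry", "date"]
this_week = ["apple", "banana", "date", "elderberry"]
added = list(new_terms_sorted(last_week, this_week))
```

With files, each argument could be a generator such as `(line.rstrip("\n") for line in f)`, provided the files were sorted by code point rather than by a locale-specific collation.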

Get the new terms of fields since last update

2014-12-05 Thread lboutros
Dear all, I would like to get the new terms of fields since last update (once a week). If I retrieve some terms which were already present, it's not a problem (but terms which did not exist before must be retrieved). Is there an easy way to do that ? I'm currently investigating the possibility

RE: Proximity Search with Grouping

2014-12-05 Thread Allison, Timothy B.
With updated link (sorry!): https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser -Original Message- From: Emre ERKEK [mailto:h.emre.er...@gmail.com] Sent: Friday, December 05, 2014 2:42 AM To: solr Subject: Proximity Search with Grouping

RE: Proximity Search with Grouping

2014-12-05 Thread Allison, Timothy B.
Yes, if you use the ComplexPhraseQueryParser: http://wiki.apache.org/solr/ComplexPhraseQueryParser . -Original Message- From: Emre ERKEK [mailto:h.emre.er...@gmail.com] Sent: Friday, December 05, 2014 2:42 AM To: solr Subject: Proximity Search with Grouping Hi All, Can I use proximit

Re: Anti-Pattern in lucene-join jar?

2014-12-05 Thread Mikhail Khludnev
Thanks Roman! Let's expand it for the sake of completeness. Such an issue is not possible in Solr, because caches are associated with the searcher. As long as you follow this design (see Solr userCache) and don't update what has been cached once, there is no chance of shooting yourself in the foot. There were few caches inside

Re: REST API Alternative to admin/luke

2014-12-05 Thread Ahmet Arslan
Hi, I use it in production with the numTerms=0 parameter set. Ahmet On Thursday, December 4, 2014 10:48 PM, Constantin Wolber wrote: Hi, Basically, using an endpoint in the admin section is something that makes me wonder whether there is an alternative. And it would have been nice to have a straight