setNeedDocSet and queries cached in filter_cache
Hi,

We've written a few SearchComponents that use rb.setNeedDocSet(true). The trouble is that the query then gets cached in the filterCache, and we think these entries are purging our more 'useful' DocSets from it. Has anyone else noticed this, and is there a useful remedy? We are currently using solr.FastLRUCache and Solr 4.5.1.

I was thinking of creating the DocSet in the first component that needs it, caching it for use by the other components, and finally discarding it.

Thanks in advance for any help.
Dan
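Roughly what I was thinking of -- an untested sketch, assuming the Solr 4.x DocSetCollector API, with a context key name I've made up:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.DocSet;
  import org.apache.solr.search.DocSetCollector;
  import org.apache.solr.search.SolrIndexSearcher;

  public class SharedDocSetComponent extends SearchComponent {
    // hypothetical key; any unique string shared by the cooperating components
    private static final String DOCSET_KEY = "sharedDocSet";

    @Override
    public void prepare(ResponseBuilder rb) {
      // deliberately NOT calling rb.setNeedDocSet(true), to stay out of the filterCache
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      DocSet docs = (DocSet) rb.req.getContext().get(DOCSET_KEY);
      if (docs == null) {
        // build the DocSet directly, bypassing the filterCache
        SolrIndexSearcher searcher = rb.req.getSearcher();
        DocSetCollector collector =
            new DocSetCollector(searcher.maxDoc() >> 6, searcher.maxDoc());
        searcher.search(rb.getQuery(), collector);
        docs = collector.getDocSet();
        rb.req.getContext().put(DOCSET_KEY, docs); // kept for later components
      }
      // ... use docs; the request context is discarded with the request
    }

    @Override
    public String getDescription() { return "Shares a DocSet across components"; }

    @Override
    public String getSource() { return null; }
  }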
facet filtering
Hi,

How can I have faceting on a subset of the query DocSet, e.g. with something akin to:

  SimpleFacets.base = SolrIndexSearcher.getDocSet(mainQuery, SolrIndexSearcher.getDocSet(filterQuery))

Is there anything like facet.fq?

Cheers,
Dan
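The kind of thing I mean, inside a component's process() -- a sketch only, assuming Solr 4.x (facetFq is a made-up name for the extra filter query):

  SolrIndexSearcher searcher = req.getSearcher();
  DocSet base = searcher.getDocSet(mainQuery);          // the normal result DocSet
  DocSet facetBase = searcher.getDocSet(facetFq, base); // intersected with the extra filter
  SimpleFacets facets = new SimpleFacets(req, facetBase, params);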
Dynamic Query Analyzer
Hi,

We have a need to specify a different query analyzer dynamically, depending on input parameters. We need this so that we can use different stopword lists at query time. Would anyone know how I might achieve this in Solr?

I'm aware of the solution of specifying different field types, each with a different query analyzer, but I'd like not to have to index the field multiple times.

Many thanks,
Dan
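To give an idea of what I'm after -- an untested sketch of a custom QParserPlugin, assuming Lucene/Solr 4.x; the stopwords request parameter is my invention (in practice you'd probably map a list name to a file loaded via the resource loader):

  import java.io.Reader;
  import java.util.Arrays;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.core.StopFilter;
  import org.apache.lucene.analysis.core.WhitespaceTokenizer;
  import org.apache.lucene.analysis.util.CharArraySet;
  import org.apache.lucene.queryparser.classic.ParseException;
  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.util.Version;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;
  import org.apache.solr.search.SyntaxError;

  public class DynamicStopwordQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() throws SyntaxError {
          // stopwords arrive per request, e.g. &stopwords=the,a,an
          final CharArraySet stops = new CharArraySet(Version.LUCENE_45,
              Arrays.asList(getParams().get("stopwords", "").split(",")), true);

          // per-request analyzer: whitespace tokens minus the requested stopwords
          Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String field, Reader reader) {
              Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_45, reader);
              return new TokenStreamComponents(tok,
                  new StopFilter(Version.LUCENE_45, tok, stops));
            }
          };

          try {
            return new QueryParser(Version.LUCENE_45, getParams().get("df"), analyzer)
                .parse(getString());
          } catch (ParseException e) {
            throw new SyntaxError(e.getMessage());
          }
        }
      };
    }
  }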
Re: Additional field informations?
Hi,

Have a look at DocTransformers (http://wiki.apache.org/solr/DocTransformers), and at ExplainAugmenterFactory as an example.

Cheers,
Dan

On Tue, Nov 20, 2012 at 3:08 PM, Sebastian Hofmann wrote:
> Hello all,
>
> We import XML documents into Solr with SolrJ. We use XSL to process the
> "objects" into fields. We have the language information in our "objects".
> After the XSL, our documents look like this:
>
> ...
> german title
> english title
> french title
> ...
>
> Our schema.xml looks like this (we use it as a filter too):
> ...
> multiValued="true" />
> ...
>
> Our results look like this (we want to transform them directly to HTML
> with XSL):
>
> german title
> english title
> french title
>
> Is there any possibility to get a result like this:
>
> german title
> english title
> german title
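A rough sketch of a custom transformer for the case above -- the parallel title_lang field is my assumption (adapt to the real schema), against the Solr 4.x transformer API:

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.Iterator;
  import java.util.List;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.transform.DocTransformer;
  import org.apache.solr.response.transform.TransformerFactory;

  public class LangFilterTransformerFactory extends TransformerFactory {
    @Override
    public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) {
      final String wanted = params.get("lang", "de"); // e.g. fl=*,[langfilter lang=de]
      return new DocTransformer() {
        @Override
        public String getName() { return field; }

        @Override
        public void transform(SolrDocument doc, int docid) {
          Collection<Object> titles = doc.getFieldValues("title");
          Collection<Object> langs = doc.getFieldValues("title_lang");
          if (titles == null || langs == null) return;
          // keep only title values whose parallel language entry matches
          List<Object> kept = new ArrayList<Object>();
          Iterator<Object> ti = titles.iterator(), li = langs.iterator();
          while (ti.hasNext() && li.hasNext()) {
            Object title = ti.next();
            if (wanted.equals(li.next())) kept.add(title);
          }
          doc.setField("title", kept);
        }
      };
    }
  }

You'd register it in solrconfig.xml with something like <transformer name="langfilter" class="com.example.LangFilterTransformerFactory"/> (package name is a placeholder).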
Re: Custom ranking solutions?
Hi,

The product function query needs a ValueSource, not the pseudo 'score' field. You probably need something like (with Solr 4.0):

q={!lucene}*:*&sort=product(query($q),2) desc,score desc&fl=score,_score_:product(query($q),2),[explain]

Cheers,
Dan

On Tue, Nov 20, 2012 at 2:29 AM, Floyd Wu wrote:
> Hi there,
>
> Before ExternalFileField was introduced, I changed document boost values to
> achieve custom ranking. My client app updates each document's boost value
> daily and that seemed to work fine; the actual ranking could be predicted
> from the boost value (which is calculated from clicks, recency, and rating).
>
> I'm now trying to use ExternalFileField to do some ranking, but after some
> tests I did not get what I expected.
>
> I'm sorting like this:
>
> sort=product(score,abs(rankingField)) desc
>
> But the query result ranking won't change.
>
> The external file is as follows:
> doc1=3
> doc2=5
> doc3=9
>
> The original scores from the Solr result are as follows:
> doc1=41.042
> doc2=10.1256
> doc3=8.2135
>
> Expected ranking:
> doc1
> doc3
> doc2
>
> What's wrong in my test? Please kindly help.
>
> Floyd
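Applied to the case above -- an untested sketch, assuming rankingField is the ExternalFileField:

  sort=product(query($q),abs(rankingField)) desc&fl=score,[explain]

i.e. wrap the main query in query($q) to get its score as a ValueSource, and reference the external field by name.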
overlap function query
Hi,

I'm wondering if there exists, or if someone has implemented, something like the following as a function query:

overlap(query,field) = (number of matching terms in field) / (number of terms in field)

e.g. with three docs having tokens (from A, B, C) in a field D:

1: A B B
2: A B
3: A

The overlap for these queries would be (-- marks the likely highest-scoring doc):

Q: A      1: 1/3   2: 1/2    3: 1/1 --
Q: A B    1: 2/3   2: 2/2 --  3: 1/1
Q: A B C  1: 2/3   2: 2/2 --  3: 1/1

The objective is to pick the most likely doc, using the overlap to boost the score.

Cheers,
Dan
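One approximation I'm considering with stock function queries -- a rough sketch, assuming Solr 4.0+ (for if()/exists()), a field termcount_d holding D's token count (a name I've made up), and the client splitting the query into one parameter per term:

  q=A B&defType=edismax&qf=D&t1=D:A&t2=D:B
  &boost=div(sum(if(exists(query($t1)),1,0),if(exists(query($t2)),1,0)),termcount_d)

i.e. count how many of the query terms match the field, then divide by the field's term count.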
Re: overlap function query
Hi Mikhail,

Thanks for the reply. I think coord works at the document level; I was thinking of something that works at the field level, against a 'principal/primary' field. I'm using edismax with tie=1 (a.k.a. disjunction sum) and several fields, but docs with greater query overlap on the primary field should score higher, if you see what I mean.

Cheers,
Dan

On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Daniel,
>
> You can start from here:
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> but it requires a deep understanding of Lucene internals.
>
> On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher wrote:
>
> > Hi,
> >
> > I'm wondering if there exists, or if someone has implemented, something
> > like the following as a function query:
> >
> > overlap(query,field) = (number of matching terms in field) / (number of
> > terms in field)
> >
> > e.g. with three docs having tokens (from A, B, C) in a field D:
> >
> > 1: A B B
> > 2: A B
> > 3: A
> >
> > The overlap for these queries would be (-- marks the likely highest-scoring doc):
> >
> > Q: A      1: 1/3   2: 1/2    3: 1/1 --
> > Q: A B    1: 2/3   2: 2/2 --  3: 1/1
> > Q: A B C  1: 2/3   2: 2/2 --  3: 1/1
> >
> > The objective is to pick the most likely doc, using the overlap to boost
> > the score.
> >
> > Cheers,
> > Dan
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> <http://www.griddynamics.com>
Re: Synonym Filter: Removing all original tokens, retain matched synonyms
Ah ha .. good thinking ... thanks!

Dan

On Wed, Oct 10, 2012 at 2:39 PM, Ahmet Arslan wrote:
> > Token_Input:
> > the fox jumped over the lazy dog
> >
> > Synonym_Map:
> > fox => vulpes
> > dog => canine
> >
> > Token_Output:
> > vulpes canine
> >
> > So remove all tokens, but retain those matched against the
> > synonym map
>
> Maybe you can make use of
> http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html
>
> You need to copy the entries (vulpes, canine) from synonym.txt into the
> keepwords.txt file.
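For the archives, the chain I ended up sketching -- assuming keepwords.txt holds the right-hand sides (vulpes, canine) of the synonym map:

  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>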
Re: Auto complete
Hi,

This is how we implement our autocomplete feature, excerpt from schema.xml (a sketch of the chain follows at the end of this message):

- First accept the input as-is, without alteration.
- Lowercase the input, and eliminate all non a-z0-9 chars to normalize it.
- Split into multiple tokens with EdgeNGramFilterFactory, up to a max of 100 chars, all starting from the beginning of the input; e.g. 'hello' becomes h, he, hel, hell, hello.
- For queries we accept the first 20 chars.

Hope this helps.

...

Regards,
Dan

On Mon, 2008-07-07 at 17:12 +0000, sundar shankar wrote:
> Hi All,
>
> I have been using Solr for some time and am having trouble with an
> autocomplete feature that I have been trying to incorporate. I am indexing
> a database column mapped to a Solr field. I have tried various configs
> mentioned in the Solr user community suggestions and a few options of my
> own too. Each of them seems to either not bring me the exact data I want
> or to bring back excess data.
>
> I have tried:
> text_ws, text, string, EdgeNGramTokenizerFactory, the subword example,
> textTight, and juggling some of the filters and analyzers together.
>
> I couldn't get dismax to work, as somehow it wasn't able to connect the
> field defined in my schema to the qf param that I was passing in the
> request.
>
> textTight gave the best results I had, but the problem there was it
> searched for whole words and not partial words. For example, if my query
> string was field1:Word1 word2* I got results back, but if my query string
> was field1:Word1 wor* I didn't get a result back.
>
> I am a little perplexed about how to implement this.
>
> The schema:
> termVectors="true"/>
> stored="false" multiValued="true"/>
> termVectors="true" multiValued="true"/>
> termVectors="true" multiValued="true"/>
> multiValued="true" termVectors="true"/>
> multiValued="true" termVectors="true"/>
>
> I index institution.name only; the rest are copyFields of the same.
>
> Any help is appreciated.
>
> Thanks
> Sundar

Daniel Rosher
Developer
www.thehotonlinenetwork.com
d: 0207 3489 912 t: 0845 4680 568 f: 0845 4680 868
Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS
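A sketch of the analyzer chain described above -- my reconstruction from the bullet points, not the original excerpt:

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z0-9]" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z0-9]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>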
solr.WordDelimiterFilterFactory
Hi,

I'm trying to index some content that has things like 'java/J2EE', but with solr.WordDelimiterFilterFactory and parameters [generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"] this ends up tokenized as 'java','j','2','EE'.

Does anyone know a way of having this tokenized as 'java','j2ee'? Perhaps this filter needs something like a protected list of tokens not to tokenize, like EnglishPorterFilter?

Cheers,
Dan
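What I'm imagining is something like this -- a hypothetical sketch, assuming a tokenizer that splits on '/' first and a protected attribute on the filter (as EnglishPorterFilterFactory has; recent Solr releases do accept protected="protwords.txt" here):

  <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s/]+"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="0"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="0" protected="protwords.txt"/>

With J2EE listed in protwords.txt, 'java/J2EE' would come out as 'java', 'J2EE' (lowercased later in the chain).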
FunctionQuery step function
Hi All,

We'd like to down-weight documents with older modified dates using a step function, rather than the suggested method recip(rord(creationDate),1,1000,1000). I'm wondering whether the following might do it, and if anyone else has had to solve this before?

bf="map(map(modified,0,0,today),0,12monthago,0.2)
    map(map(modified,0,0,today),12monthago,6monthago,0.3)
    map(map(modified,0,0,today),6monthago,today,1)"

Is this inefficient? Basically:
- if older than 12 months, multiply by 0.2
- if 6-12 months old, multiply by 0.3
- all other cases, multiply by 1

today, 12monthago and 6monthago are epoch secs since 1/1/1970 (they change for each query).

Cheers,
Dan
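To make it concrete with made-up epoch values (today=1200000000, 12monthago=1168464000, 6monthago=1184232000), a doc with modified=1190000000 should hit only the third branch:

  map(map(1190000000,0,0,1200000000),0,1168464000,0.2)          -- older than 12 months
  map(map(1190000000,0,0,1200000000),1168464000,1184232000,0.3) -- 6-12 months old
  map(map(1190000000,0,0,1200000000),1184232000,1200000000,1)   -- more recent

(the inner map only rewrites a missing/zero date to 'today')

One caveat I've noticed while writing this out: the four-argument map passes the input through unchanged when it falls outside [min,max], so the two non-matching branches above would each contribute the raw epoch value rather than 0. The five-argument form, map(x,min,max,target,default), with default=0 looks necessary -- assuming the Solr release in use supports it.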
solrj.embedded.JettySolrRunner and logging to file instead of STDERR
Hi,

I've modified a copy of ./src/test/org/apache/solr/TestDistributedSearch.java for my own build process. It compiles fine, but running the test always logs to STDERR:

INFO: Logging to STDERR via org.mortbay.log.StdErrLog

This method appears to be deprecated:

//public JettySolrRunner( String context, String home, String dataDir, int port, boolean log )

How can I log to a file instead of STDERR?

Many thanks,
Dan
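For the record, the workaround I'm experimenting with -- an untested sketch, assuming Jetty 6 (the org.mortbay classes) with slf4j and a log4j binding on the test classpath -- is to swap Jetty's logger implementation before the runner starts:

  // Jetty 6 consults this system property before falling back to StdErrLog
  System.setProperty("org.mortbay.log.class", "org.mortbay.log.Slf4jLog");

and then point log4j at a file, e.g. in log4j.properties:

  log4j.rootLogger=INFO, file
  log4j.appender.file=org.apache.log4j.FileAppender
  log4j.appender.file.File=solr-test.log
  log4j.appender.file.layout=org.apache.log4j.PatternLayout
  log4j.appender.file.layout.ConversionPattern=%d %p %c - %m%n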