setNeedDocSet and queries cached in filter_cache

2013-11-18 Thread Daniel Rosher
Hi,

We've written a few SearchComponents that make use
of rb.setNeedDocSet(true); the trouble with this is that the query's DocSet
gets cached in the filterCache, and we think these entries are purging our
more 'useful' DocSets from the cache.

Has anyone else noticed this and has a useful remedy?

We are currently using solr.FastLRUCache and Solr 4.5.1. I was thinking of
creating the DocSet in the first component that uses it, caching it for use
by the other components, and finally discarding it.
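A minimal sketch of that idea (Solr 4.x APIs; the context key is made up, and
getDocSetNC's visibility varies between versions, so treat that call as an
assumption rather than a confirmed recipe):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class SharedDocSetComponent extends SearchComponent {
  // hypothetical request-context key shared by the cooperating components
  private static final String CTX_KEY = "sharedDocSet";

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // deliberately do NOT call rb.setNeedDocSet(true)
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    DocSet docSet = (DocSet) rb.req.getContext().get(CTX_KEY);
    if (docSet == null) {
      SolrIndexSearcher searcher = rb.req.getSearcher();
      // compute the DocSet without going through the filterCache
      docSet = searcher.getDocSetNC(rb.getQuery(), null);
      rb.req.getContext().put(CTX_KEY, docSet);
    }
    // ... use docSet; later components read CTX_KEY, and the DocSet is
    // discarded with the request ...
  }

  @Override
  public String getDescription() { return "shares a per-request DocSet"; }

  @Override
  public String getSource() { return null; }
}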

Thanks in advance for any help.

Dan


facet filtering

2013-07-15 Thread Daniel Rosher
Hi,

How can I have faceting on a subset of the query DocSet, e.g. with something
akin to:

SimpleFacets.base =
    SolrIndexSearcher.getDocSet(
        Query mainQuery,
        SolrIndexSearcher.getDocSet(Query filter)
    )

Is there anything like facet.fq?
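One possible shape for this, sketched against the Solr 4.x API (the extra
"facet.fq" parameter name is made up; SimpleFacets and DocSet.intersection
are real):

import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SimpleFacets;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;

// Facet over the intersection of the main query's DocSet and an extra
// filter, rather than over the full query DocSet.
static SimpleFacets filteredFacets(SolrQueryRequest req, Query mainQuery)
    throws Exception {
  SolrIndexSearcher searcher = req.getSearcher();
  SolrParams params = req.getParams();
  Query extra = QParser.getParser(params.get("facet.fq"), null, req).getQuery();
  DocSet base = searcher.getDocSet(mainQuery);
  DocSet facetBase = base.intersection(searcher.getDocSet(extra));
  return new SimpleFacets(req, facetBase, params);
}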

Cheers,
Dan


Dynamic Query Analyzer

2013-09-03 Thread Daniel Rosher
Hi,

We have a need to dynamically specify a different query analyzer depending
on input parameters.

We need this so that we can use different stopword lists at query time.

Would anyone know how I might be able to achieve this in Solr?

I'm aware of the solution of specifying different field types, each with a
different query analyzer, but I'd prefer not to have to index the field
multiple times.
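One direction is a custom QParserPlugin that picks the analyzer per request; a
minimal sketch (Lucene/Solr 5.x-style signatures -- 4.x constructors also take
a Version argument -- and the stopwords.lang parameter and hard-coded analyzer
map are assumptions):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class DynamicAnalyzerQParserPlugin extends QParserPlugin {
  private final Map<String, Analyzer> analyzers = new HashMap<>();

  public void init(NamedList args) {
    // assumed: one analyzer per stopword list; real lists would be loaded
    // from files named in solrconfig.xml
    analyzers.put("en", new StandardAnalyzer(new CharArraySet(0, true)));
    analyzers.put("de", new StandardAnalyzer(new CharArraySet(0, true)));
  }

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        String lang = params.get("stopwords.lang", "en"); // hypothetical param
        Analyzer a = analyzers.getOrDefault(lang, analyzers.get("en"));
        // one indexed field, many query-time analyzers
        return new QueryBuilder(a).createBooleanQuery(
            params.get(CommonParams.DF), qstr);
      }
    };
  }
}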

Many thanks
Dan


Re: Additional field informations?

2012-11-20 Thread Daniel Rosher
Hi,

Have a look at DocTransformers
(http://wiki.apache.org/solr/DocTransformers), with ExplainAugmenterFactory
as an example.
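A skeleton of the idea (Solr 4.x-era transform signature; the transformer
name and field logic are placeholders):

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

// A custom transformer that decorates each returned document with an extra
// computed field, requested as fl=*,[langinfo] (name hypothetical).
public class LangInfoAugmenterFactory extends TransformerFactory {
  @Override
  public DocTransformer create(String display, SolrParams params,
                               SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() { return display; }

      @Override
      public void transform(SolrDocument doc, int docid) {
        // compute the per-document info here, e.g. look up which language
        // each stored title value belongs to, and attach it
        doc.setField(display, "...");
      }
    };
  }
}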

Cheers,
Dan

On Tue, Nov 20, 2012 at 3:08 PM, Sebastian Hofmann wrote:

> Hello all,
> We import XML documents into Solr with SolrJ. We use XSL to process the
> "objects" into fields, and the language information is present in our
> "objects".
> After the XSL our documents look like this (the XML markup was stripped by
> the mail archive; only the field values survive):
>
> ...
> german title
> english title
> french title
> ...
>
> Our schema.xml looks like this (we use it as a filter too); only a fragment
> of the field definition survives:
>
> ... multiValued="true" />
>
> Our results look like this (we want to transform them directly to HTML with
> xsl):
>
> german title
> english title
> french title
>
> Is there any possibility to get a result like this (markup stripped; the
> desired response apparently attaches the language to each value):
>
> german title
> english title
> german title
>
>


Re: Custom ranking solutions?

2012-11-20 Thread Daniel Rosher
Hi

The product() function query needs a ValueSource, not the pseudo 'score'
field.

You probably need something like this (with Solr 4.0):

q={!lucene}*:*&sort=product(query($q),2) desc,score
desc&fl=score,_score_:product(query($q),2),[explain]
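Applied to the quoted ExternalFileField case, the same pattern would be
something like this (a sketch; rankingField is the external field from the
question, which is usable only through function queries):

sort=product(query($q),abs(rankingField)) desc
fl=score,_score_:product(query($q),abs(rankingField))

i.e. obtain the relevancy score as a ValueSource via query($q) and multiply
it by the external value, instead of referencing 'score' directly.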

Cheers,
Dan

On Tue, Nov 20, 2012 at 2:29 AM, Floyd Wu  wrote:

> Hi there,
>
> Before ExternalFileField was introduced, we changed document boost values
> to achieve custom ranking. My client app updated the boost value for each
> document daily, and that seemed to work fine: the actual ranking could be
> predicted from the boost value (which is calculated from clicks, recency
> and rating).
>
> I'm now trying to use ExternalFileField to do some ranking, but after some
> tests I did not get what I expected.
>
> I'm doing a sort like this:
>
> sort=product(score,abs(rankingField))+desc
>
> But the query result ranking won't change.
>
> The external file is as follows:
> doc1=3
> doc2=5
> doc3=9
>
> The original scores from Solr are as follows:
> doc1=41.042
> doc2=10.1256
> doc3=8.2135
>
> Expected ranking:
> doc1
> doc3
> doc2
>
> What is wrong in my test? Please kindly help with this.
>
> Floyd
>


overlap function query

2013-01-29 Thread Daniel Rosher
Hi,

I'm wondering if there exists or if someone has implemented something like
the following as a function query:

overlap(query,field) = (number of matching query terms in the field) /
(total number of terms in the field)

e.g. with three docs having these tokens (drawn from the set A, B, C) in a
field D:
1: A B B
2: A B
3: A

The overlap for these queries would be ('--' marks the likely
highest-scoring doc):

Q:A
1: 1/3
2: 1/2
3: 1/1 --

Q:A B
1: 2/3
2: 2/2 --
3: 1/1

Q:A B C
1: 2/3
2: 2/2 --
3: 1/1

The objective is to pick the most likely doc, using the overlap to boost the
score.
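A rough sketch of the per-document computation (Lucene 4.x API; it assumes
the field is indexed with termVectors="true", and it would still need
wrapping in a ValueSource to be usable from a function query):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public final class Overlap {
  // overlap = distinct matching query terms / total tokens in the field,
  // e.g. doc "A B B" with query "A B" gives 2/3, as in the table above
  public static float overlap(AtomicReaderContext ctx, int docId, String field,
                              Set<String> queryTerms) throws IOException {
    Terms tv = ctx.reader().getTermVector(docId, field);
    if (tv == null) return 0f;
    long totalTokens = 0;
    int matched = 0;
    TermsEnum te = tv.iterator(null); // 4.x signature; iterator() in 5+
    for (BytesRef term = te.next(); term != null; term = te.next()) {
      totalTokens += te.totalTermFreq(); // within-document occurrences
      if (queryTerms.contains(term.utf8ToString())) matched++;
    }
    return totalTokens == 0 ? 0f : (float) matched / totalTokens;
  }
}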

Cheers,
Dan


Re: overlap function query

2013-01-30 Thread Daniel Rosher
Hi Mikhail,

Thanks for the reply.

I think coord works at the document level; I was thinking of having
something that works at the field level, against a 'principal/primary'
field.

I'm using edismax with tie=1 (a.k.a. disjunction sum) and several fields,
but docs with greater query overlap on the primary field should score
higher, if you see what I mean.

Cheers,
Dan

On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Daniel,
>
> You can start from here
>
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> but it requires a deep understanding of Lucene internals.
>
>
>
> On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher  wrote:
>
> > [original message quoted in full; see the thread above]
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> http://www.griddynamics.com


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Daniel Rosher
Ah ha .. good thinking ... thanks!

Dan

On Wed, Oct 10, 2012 at 2:39 PM, Ahmet Arslan  wrote:

>
> > Token_Input:
> > the fox jumped over the lazy dog
> >
> > Synonym_Map:
> > fox => vulpes
> > dog => canine
> >
> > Token_Output:
> > vulpes canine
> >
> > So remove all tokens, but retain those matched against the
> > synonym map
>
> Maybe you can make use of
> http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html
>
> You need to copy the entries (vulpes, canine) from synonym.txt into the
> keepwords.txt file.
>
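For reference, a sketch of the resulting analysis chain in schema.xml terms
(file names as in the suggestion; the tokenizer choice is an assumption):

<fieldType name="text_syn_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- map fox=>vulpes, dog=>canine -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
            ignoreCase="true" expand="false"/>
    <!-- keep only the synonym targets listed in keepwords.txt -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>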


Re: Auto complete

2008-07-08 Thread Daniel Rosher
Hi,

This is how we implement our autocomplete feature (excerpt from
schema.xml below):

- First accept the input as-is, without alteration
- Lowercase the input and eliminate all non a-z0-9 chars to normalize it
- Split into multiple tokens with EdgeNGramFilterFactory, up to a max of
  100 chars, all starting from the beginning of the input; e.g. hello
  becomes h, he, hel, hell, hello
- For queries we accept only the first 20 chars

Hope this helps.

[schema.xml excerpt stripped by the mail archive]

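A reconstruction of the described chain (a sketch only, since the original
excerpt was stripped; the type name and attribute values are assumptions
based on the prose above):

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <!-- accept the input as-is, as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- eliminate all non a-z0-9 chars -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z0-9]" replacement="" replace="all"/>
    <!-- hello becomes h, he, hel, hell, hello; up to 100 chars -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z0-9]" replacement="" replace="all"/>
    <!-- the 20-char cap on queries is assumed to be enforced by the app -->
  </analyzer>
</fieldType>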
Regards,
Dan




On Mon, 2008-07-07 at 17:12 +, sundar shankar wrote:
> Hi All,
> I have been using Solr for some time and am having trouble with an
> autocomplete feature that I have been trying to incorporate. I am indexing
> a database column mapped to a Solr field. I have tried various configs
> that were mentioned in the Solr user community suggestions and have tried
> a few options of my own too. Each of them seems to either not bring me the
> exact data I want or to bring back excess data.
>
> I have tried:
> text_ws,
> text,
> string,
> EdgeNGramTokenizerFactory,
> the subword example,
> textTight,
> and juggling some of the filters and analyzers together.
>
> I couldn't get dismax to work, as somehow it wasn't able to connect my
> field defined in the schema to the qf param that I was passing in the
> request.
>
> textTight gave the best results I had, but the problem there was that it
> was searching for whole words and not partial words. For example, if my
> query string was field1:Word1 word2* I was getting back results, but if my
> query string was field1:Word1 wor* I didn't get a result back.
>
> I am a little perplexed about how to implement this. I don't know what has
> to be done.
>
> The schema:
>
> [field definitions stripped by the mail archive; the surviving fragments
> show several fields with termVectors="true" and multiValued="true"]
>
> I index institution.name only; the rest are copy fields of the same.
>
> Any help is appreciated.
>
> Thanks
> Sundar


solr.WordDelimiterFilterFactory

2008-11-20 Thread Daniel Rosher
Hi,

I'm trying to index some content that contains things like 'java/J2EE', but
with solr.WordDelimiterFilterFactory and the parameters
[generateWordParts="1" generateNumberParts="0" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"] this ends up
tokenized as 'java','j','2','EE'.

Does anyone know a way of having this tokenized as 'java','j2ee'?

Perhaps this filter needs something like a protected list of tokens not to
split, as EnglishPorterFilter has?
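For readers of the archive: later Solr releases added a splitOnNumerics
option to WordDelimiterFilterFactory that covers this case; a hedged sketch
(not available in the release current at the time of this thread):

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="0"
        splitOnCaseChange="0" splitOnNumerics="0"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>

With splitOnNumerics="0" the letter/digit transitions inside 'J2EE' no
longer split, while 'java/J2EE' still splits on the slash, giving
'java','J2EE' (lowercased by a downstream filter).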

Cheers,
Dan


FunctionQuery step function

2008-02-13 Thread Daniel Rosher
Hi All,

We'd like to down-weight older modified documents with a step function,
rather than the suggested method:

recip(rord(creationDate),1,1000,1000)

I'm wondering whether the following might do it, and whether anyone else has
had to solve this before:

bf="map(map(modified,0,0,today),0,12monthsago,0.2)
map(map(modified,0,0,today),12monthsago,6monthsago,0.3)
map(map(modified,0,0,today),6monthsago,today,1)"

Is this inefficient?

Basically:

- if older than 12 months, multiply by 0.2
- if 6-12 months old, multiply by 0.3
- in all other cases, multiply by 1

Here today, 12monthsago and 6monthsago stand for epoch seconds since
1/1/1970 (recomputed for each query).
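For readers of the archive, a short reading of the nesting --
map(x,min,max,target) replaces values of x falling in [min,max] with target
and passes other values through:

map(modified,0,0,today)
    treats a missing (0) modified date as if the doc were modified today
map(map(modified,0,0,today),0,12monthsago,0.2)
    dates between the epoch and 12 months ago yield 0.2; values outside the
    range pass through unchanged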

Cheers,
Dan


solrj.embedded.JettySolrRunner and logging to file instead of STDERR

2008-06-19 Thread Daniel Rosher
Hi,

I've modified a copy of
./src/test/org/apache/solr/TestDistributedSearch.java for my own build
process. It compiles fine, but running the test always logs to STDERR:

INFO:  Logging to STDERR via org.mortbay.log.StdErrLog

The constructor that took a log flag appears to be deprecated:

// public JettySolrRunner(String context, String home, String dataDir, int
// port, boolean log)

How can I log to a file instead of STDERR?
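One hedged workaround (this assumes Jetty 6's org.mortbay.log.Log, which
consults the org.mortbay.log.class system property before falling back to
StdErrLog; the file name is arbitrary):

import java.io.FileOutputStream;
import java.io.PrintStream;

public final class LogSetup {
  // call this before constructing JettySolrRunner
  public static void redirect() throws Exception {
    // route Jetty 6 logging through slf4j instead of StdErrLog; needs an
    // slf4j binding (e.g. log4j) on the classpath, configured to a file
    System.setProperty("org.mortbay.log.class", "org.mortbay.log.Slf4jLog");

    // or, the blunt approach: redirect STDERR itself to a file
    System.setErr(new PrintStream(new FileOutputStream("solr-test.log", true), true));
  }
}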

Many thanks,
Dan