I would use that mm value to decrease it in case the user's request gets no
answer.
I deal with requests potentially containing a lot of noise words, and I
want to programmatically lower mm in a second-try request if necessary. But I
don't want to decrease it too much, to avoid getting too many i
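That second-try logic can be sketched as a small helper. This is my own sketch, not anything Solr provides; the function names, the step size, and the floor are assumptions:

```python
def search_with_mm_fallback(run_query, initial_mm, floor=2, step=1):
    """Run a query, lowering mm by `step` on zero hits, never below `floor`.

    `run_query(mm)` is assumed to send the request with that mm value and
    return the hit count. Returns the (mm, hits) pair finally used, so the
    effective mm can be logged for debugging.
    """
    mm = initial_mm
    while True:
        hits = run_query(mm)
        if hits > 0 or mm <= floor:
            return mm, hits
        mm = max(floor, mm - step)
```

The floor keeps the retry from loosening mm so far that every noisy request matches everything.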
Hello,
Is there some magic in edismax or one of the QPs that would make this possible:
boost documents which match both name and desc,
include docs which match just name,
and exclude docs which only match desc?
One could use a very high field weight for name and a very low weight for the desc
field in o
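For what it's worth, the asked-for behavior can be sketched in plain Lucene query syntax (field names from the question above; the boost factor is an assumption):

```
q=+name:(foo) desc:(foo)^2
```

The required `+` clause excludes desc-only matches, name-only docs still match, and the optional boosted desc clause lifts docs that match both.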
It would be good to output the mm value for debugging.
Something like mm_value = 2
Then you would know the results are right.
On 10/5/11 9:58 AM, "Shawn Heisey" wrote:
>On 10/5/2011 9:06 AM, elisabeth benoit wrote:
>> thanks for answering.
>>
>> echoParams just echos mm value in solrconfig.xm
Markus,
The calculation is correct.
Look at your output.
Result = queryWeight(text:gb) * fieldWeight(text:gb in 1)
Result = (idf(docFreq=6, numDocs=26) * queryNorm) *
(tf(termFreq(text:gb)=2) * idf(docFreq=6, numDocs=26) *
fieldNorm(field=text, doc=1))
Thus you should notice that idf(docFreq=6
Shawn,
Have you looked
at http://www.sematext.com/products/dym-researcher/index.html as a solution to
the ZeroHits problem?
If that doesn't work, then yes, offline word/phrase co-occurrence may work.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search ::
Hi,
I noticed that every "interesting term" returned by the
MoreLikeThisHandler always has a boost of 1. How would one go about
making a term have a different boost?
Say I have a paragraph of text and I do a more like this query on the
paragraph. But if term XX or YY is in the paragraph
Using ShingleFilterFactory at index time may help.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
--- On Wed, 10/5/11, Mikhail Khludnev wrote:
> From: Mikhail Khludnev
> Subject: Re: How do i get results for quering with separated words?
> To: solr-use
Your use-case is pretty unique. One solution might be to use MemoryIndex, which
is designed for "prospective search".
http://lucene.apache.org/java/2_4_0/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html
Your documents will be your stored "huge numbers of queries". Your user en
On Thu, Oct 6, 2011 at 1:49 AM, Jamie Johnson wrote:
> I have a field named test_txt which I am populating in some cases, and
> not in others. I also have a copy field directive to copy data from
> _txt to text_txt. Things seem to work except I believe the field is
> also copying to itself. Is
I have a field named test_txt which I am populating in some cases, and
not in others. I also have a copy field directive to copy data from
_txt to text_txt. Things seem to work except I believe the field is
also copying to itself. Is there any way to avoid this behavior?
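A sketch of the likely copyField line, assuming the stripped source pattern was the wildcard *_txt (the archive seems to have eaten the asterisk as markup):

```xml
<!-- The wildcard source *_txt also matches the destination text_txt,
     so the field gets copied onto itself. Avoid it by renaming the
     destination so the pattern no longer matches it, or by listing
     the source fields explicitly. -->
<copyField source="*_txt" dest="text_txt"/>
```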
I am trying to figure out how we can begin offering search suggestions
to people, especially when a user types in something that results in few
or zero results. For background, we have an archive of about 60 million
objects, most of which are photographs. There are also a number of text
artic
Hi,
You can also check out LUCENE-3413 [1] and the CombiningFilter that
I wrote and associated example. This lets you:
1. perform normal tokenization and analysis in your analysis chain
2. recombine the tokens at the end for sorting purposes
HTH,
Chris
[1] https://issues.apache.org/jira/browse
Have you tried to correct spaces with a spelling dictionary?
If you build your dictionary from non-tokenized terms, you'll have starwars
-> Star Wars and super rtl -> superrtl corrections.
WDYT?
On Wed, Oct 5, 2011 at 7:13 PM, elisabeth benoit
wrote:
> I think you could define star wars and starwars a
A while back I sent a question to the list about only returning the most
recent version of a document, based on a numerical version field stored in
each record. Someone suggested that I use field collapsing to do so, and in
most cases it seems to work well. However, I've hit a snag and I'd
apprec
wow awesome hahaha thanks!
On Oct 6, 2011 8:36 AM, "Gora Mohanty" wrote:
> On Thu, Oct 6, 2011 at 12:55 AM, James Lin wrote:
>> Hi,
>>
>> I got an area index which only has one area name field, the field type is
>> using the "text_en_splitting"
>> some sample data will be: "Auckland", "North Shor
OK, I'm going to answer my own question -- it was probably so obvious that
nobody else wanted to answer such an easy one!
I simply needed to apply
after
instead of before, as I had it originally. Otherwise "the\s" and "a\s" are
never matched!
Hope this maybe helps somebody else...
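Since the filter tags above were eaten by the list archive, here is a sketch of what the fix probably looks like (the filter classes, the field type name, and the exact pattern are my assumptions): the pattern-replace is applied after the lowercase filter, so "The " and "A " actually match:

```xml
<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercase first, so the pattern below can match "the " and "a " -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(the|a)\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>
```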
On Thu, Oct 6, 2011 at 12:55 AM, James Lin wrote:
> Hi,
>
> I got an area index which only has one area name field, the field type is
> using the "text_en_splitting"
> some sample data will be: "Auckland", "North Shore" etc.
>
> If I have a search query "hotels in auckland", the result doesn't mat
Hi,
I got an area index which only has one area name field, the field type is
using the "text_en_splitting"
some sample data will be: "Auckland", "North Shore" etc.
If I have a search query "hotels in auckland", the result doesn't match
anything. How would I change the index config to make it mat
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote:
> Do you mean both BM25 and BM25F?
>
>
No, BM25F and other "fielded" or structured models are somewhat different.
In these models, if you have two fields (body/title), you are saying
that "dogs" in body is actually the same term as "dogs" in title.
On Wed, Oct 5, 2011 at 11:42 PM, FionaY wrote:
> We have Solr integrated, but we are having some issues with search relevance
> and we need some help fine tuning the search results. Anyone think they can
> help?
Well, you would at least need to describe what problems you
are facing, e.g., some ex
Do you mean both BM25 and BM25F?
On Wed, Oct 5, 2011 at 11:44 AM, Robert Muir wrote:
> On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote:
> > Hi,
> >
> > According to the JIRA issue 2959,
> > https://issues.apache.org/jira/browse/LUCENE-2959
> >
> > BM25 will be included in the next release of L
The example does not include the evidence. But we do use eDisMax for
scoring in Solr.
The following is from solrconfig.xml:
edismax
Here is a short snippet of the explained result, where 0.1 is the Tie
breaker in DisMax/eDisMax.
6.446447 = (MATCH) max plus 0.1 times others of:
0.63826215
Hello, Andrzej.
First of all, thanks for your help. The thing is that I'm not using Lucene:
I'm using Solr to index (well, I know that it involves Lucene). I know about
Solr replication, but the index is being modified in real time, including new
documents from incoming petitions. In summary, from
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote:
> Hi,
>
> According to the JIRA issue 2959,
> https://issues.apache.org/jira/browse/LUCENE-2959
>
> BM25 will be included in the next release of LUCENE.
>
> 1). Will BM25F be included in the next release as well as part
> of LUCENE-2959?
should be
probably can't help, but please keep the topic on the list, as it is important to
me too!
On Wed, Oct 5, 2011 at 14:12, FionaY wrote:
> We have Solr integrated, but we are having some issues with search
> relevance
> and we need some help fine tuning the search results. Anyone think they can
> help?
We have Solr integrated, but we are having some issues with search relevance
and we need some help fine tuning the search results. Anyone think they can
help?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Search-Relevance-Assistance-tp3397404p3397404.html
Sent from the So
Hi,
I followed the very simple instructions found at '
http://wiki.apache.org/solr/Flare/HowTo
but run into a problem at step 4
Launch Solr:
cd ; java -Dsolr.solr.home= -jar start.jar
where Solr complains that it can't find solrconfig.xml in either the
classpath or the solr-ruby home dir. Can
: Thanks! What's the procedure to report this if it's a bug?
: EDisMax has similar behavior.
what you are seeing isn't specific to dismax & edismax (in fact: there's
no evidence in your example that dismax is even being used)
what you are seeing is the basic scoring of a TermQuery using the
D
We generated our own concatenated key (original customer, who may historically
have different addresses, etc.). If there is a way for Solr to do that
automagically, I'd love to hear about it.
I don't think that the extra bytes for the key itself (String vs. binary
integer) is all that much o
On 05/10/2011 19:21, Luis Cappa Banda wrote:
Hello.
I've been looking for information trying to find an easy way to do index
backups with Solr, and I've read that lukeall has an application called
XMLExporter that creates an XML dump from a Lucene index with its complete
information. I've got s
Hi,
According to the JIRA issue 2959,
https://issues.apache.org/jira/browse/LUCENE-2959
BM25 will be included in the next release of LUCENE.
1). Will BM25F be included in the next release as well as part
of LUCENE-2959?
2). What's the timeline of the next release that new scoring modules will
be
OK, here is the calculation of the score:
0.18314168 = 2.3121865 * 0.15502669 * 1.4142135 * 2.3121865 * 0.15625
2.3121865 is multiplied twice here. That is what I mean: tf x idf^2 is
used instead of tf x idf.
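The product can be checked directly. In Lucene's explain output, queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, which is exactly how idf enters twice (all constants copied from the debug output above):

```python
# Constants copied from the explain output in this thread.
idf = 2.3121865          # idf(docFreq=6, numDocs=26)
query_norm = 0.15502669
tf = 1.4142135           # sqrt(termFreq=2)
field_norm = 0.15625

query_weight = idf * query_norm        # idf appears here...
field_weight = tf * idf * field_norm   # ...and again here
score = query_weight * field_weight    # hence tf * idf^2 overall
```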
On Wed, Oct 5, 2011 at 10:42 AM, Markus Jelsma
wrote:
> Hi,
>
> I don't see
: the response. When I add "&mm=50%25" to the URL in my browser (%25 being the
: URL encoding for the percent symbol), the response changes the mm value to
: "50%" as expected, overriding the value in solrconfig.xml. I have not tried
that is the value of the mm param, but elisabeth seems to be
Hi,
I don't see 2.3121865 * 2 anywhere in your debug output or something that
looks like that.
> Hi Markus,
>
> The idf calculation itself is correct.
> What I am trying to understand here is why idf value is multiplied twice
> in the final score calculation. Essentially, tf x idf^2 is used
I could solve it by using the fq parameter: fq=type:pdf
but I want to have both PDF files and other formats like doc and docx. What
query should I use to have pdf, doc, and docx files in my search?
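For reference, the fq parameter accepts a Boolean group, so (assuming the field really is named type, as in the message) one filter query could cover all three formats:

```
fq=type:(pdf OR doc OR docx)
```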
On Tue, Oct 4, 2011 at 9:23 PM, ahmad ajiloo wrote:
> Hi
> I'm using Nutch for crawing and indexed my d
Hi Markus,
The idf calculation itself is correct.
What I am trying to understand here is why idf value is multiplied twice in
the final score calculation. Essentially, tf x idf^2 is used instead of tf
x idf.
I'd like to understand the rationale behind that.
On Wed, Oct 5, 2011 at 9:43 AM, Ma
Hello.
I've been looking for information trying to find an easy way to do index
backups with Solr, and I've read that lukeall has an application called
XMLExporter that creates an XML dump from a Lucene index with its complete
information. I've got some questions about this alternative:
1. Do
In Lucene's default similarity, idf = 1 + ln(numDocs / (docFreq + 1)).
1 + ln(26 / 7) ≈ 2.3121865
I don't see a problem.
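A quick check that the formula reproduces the number in the explain output, with the docFreq and numDocs from this thread:

```python
import math

# DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
num_docs, doc_freq = 26, 6
idf = 1 + math.log(num_docs / (doc_freq + 1))
```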
> Hi,
>
>
> When I examine the score calculation of DisMax in Solr, it looks to me
> that DisMax is using tf x idf^2 instead of tf x idf.
> Does anyone have insight why tf x id
I figured it out.. thanks for pointing me in the right direction... so in the
end the Solr field type text was changed to text_general.
I was only missing these filters
On Wed, Oct 5, 2011 at 10:52 AM, jo [via Lucene] <
ml-node+s472066n3396737...@n3.nabble.com> wrote:
> Okay I did
Thanks! What's the procedure to report this if it's a bug?
EDisMax has similar behavior.
On Tue, Oct 4, 2011 at 11:24 PM, Bill Bell wrote:
> This seems like a bug to me.
>
> On 10/4/11 6:52 PM, "David Ryan" wrote:
>
> >Hi,
> >
> >
> >When I examine the score calculation of DisMax in Solr, it
Hi,
for my application, I would like to be able to create web queries
(wget/curl) that get "more like this" for either a single arbitrarily
specified URL or for the first x terms in a search query. I want to return
the results to myself as a csv file using wt=csv. How can I accomplish the
MLT pie
On 10/5/2011 9:18 AM, David GUYOT wrote:
I'm currently trying to benchmark my Solr install with a custom script,
but this benchmark must be run with all Solr caches empty; is there a
way to erase Solr caches by a command, or to restart Solr with an option
to avoid cache autowarming?
Remove any f
On 10/5/2011 9:06 AM, elisabeth benoit wrote:
thanks for answering.
echoParams just echoes the mm value in solrconfig.xml (in my case mm = 4<-1
6<-2), not the actual value of mm for one particular request.
I think it would be very useful to be able to know which mm value was
effectively used, in partic
Hello, everybody.
Firstly, I must advise you that I'm a newbie with mailing lists and a
Froggie, so please excuse what could look like obvious errors, in both
computing and language.
I'm currently trying to benchmark my Solr install with a custom script,
but this benchmark must be run with all Solr
I think you could define star wars and starwars as synonyms in
synonyms.txt...
maybe not generic enough?
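A minimal sketch of what such synonyms.txt entries might look like (comma-separated terms are treated as equivalent):

```
starwars, star wars
superrtl, super rtl
```

Whether hand-maintaining such pairs scales to a whole catalog is exactly the "maybe not generic enough" concern.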
2011/10/5 Mike Mander
> Isn't this more a problem of the query string?
>
> Let's assume I have a game name like "Nintendo 3DS - 'Star Wars - Clone
> Wars'".
> Can I copy that name to a fiel
thanks for answering.
echoParams just echoes the mm value in solrconfig.xml (in my case mm = 4<-1
6<-2), not the actual value of mm for one particular request.
I think it would be very useful to be able to know which mm value was
effectively used, in particular for request with stopwords.
It's of course
Hi all!
I have documents, all of which have a title, and I would like to sort by
that title. The catch is, I wish to sort ignoring any "A" or "The" at the
beginning of the title.
My first (and only) attempt is by creating a type that looks like:
Okay, I did use the analysis tool and it did make me notice a few things, but
more importantly what changed:
there is no longer a field type named "text" in the new schema, there is
only "text_en", which is weird as the text field is the default when doing a
query..
anyway, when I used the analysis tool a
anyway, when I used the analysis tool a
Isn't this more a problem of the query string?
Let's assume I have a game name like "Nintendo 3DS - 'Star Wars - Clone
Wars'".
Can I copy that name to a field, cutting the - and ', lowercasing the
result string,
and removing the whitespaces? So that I have "nintendo3dsstarwarsclonewars".
Is that "f
On 10/5/2011 1:01 AM, elisabeth benoit wrote:
Hello,
I'd like to be able to know programmatically what value mm was set to for one
request (to avoid having to parse the query, identify stopwords, calculate
mm based on solrconfig.xml). Is there a way to get mm value in solr
response?
To suppleme
I've uploaded the file here:
http://www.filesonic.com/file/2342166624/Starting_a_Search_Application.pdf
try this, thanks
2011/10/5 Michael McCandless
> Hmm, no attachment; maybe it's too large?
>
> Can you send it directly to me?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/1
Hmm, no attachment; maybe it's too large?
Can you send it directly to me?
Mike McCandless
http://blog.mikemccandless.com
2011/10/5 Héctor Trujillo :
> This is the file that give me errors.
>
> 2011/10/5 Michael McCandless
>>
>> Can you attach this PDF to an email & send to the list? Or is it
Thanks but, unfortunately that will not solve the problem since it will bring
back both the first and second doc. Besides, the query terms are: a b y z,
not just: a b
index this field without whitespaces? XD
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 8 Cores,
1 Core with 45 Million Documents other Cores < 200.000
- Solr1 for Search-Requests - commit every Minute - 5GB Xmx
-
Can you attach this PDF to an email & send to the list? Or is it too
large for that?
Or, you can try running Tika directly on the PDF to see if it's able
to extract the text.
Mike McCandless
http://blog.mikemccandless.com
2011/10/5 Héctor Trujillo :
> Sorry you have the reason, this file was i
Thanks stockii,
but WDFF is splitting on numeric or case changes only.
For "Star Wars" in the index and "starwars" in the query this means that both
are not equal, right?
Thanks
Mike
which type in the schema.xml do you use.
try out WordDelimiterFilterFactory or some other filters from this site:
http://wik
Hello,
To parse the document you can use Apache Tika before sending the content to
Solr (via SolrJ).
Regards,
Marc.
On Wed, Oct 5, 2011 at 1:16 AM, Chris Hostetter wrote:
>
> : I want to index some document with solrj API's but the URL of theses
> : documents is FTP,
> : How to set username an
Héctor,
I meant you need another way to reference the file *to the mailing list*.
Sorry for the confusion.
I do not think there's anything special to the set of interfaces you're using
if the delivery is the same for the solr client and the acrobat plugin. To make
sure of it, you could t
which type in the schema.xml do you use.
try out WordDelimiterFilterFactory or some other filters from this site:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
Hello,
I have configured a catchall searchword field. Into this I copy the value
of the field name. Name value = "Star Wars".
Now I try to find this document by the searchword "starwars". But it's not
found.
Vice versa, same problem. Name value = "SuperRTL", searchword is "super rtl".
Replacing all whit
Sorry, you are right: this file was indexed with a .NET web service
client, that calls a Java application (a web service) that calls Solr using
SolrJ.
I will try to index this in a different way; maybe this will resolve the
problem.
Thanks
Best regards
On October 5, 2011 at 08:42, Héctor Tr
It seems unreasonable that if I want to index a local file, I have to
reference this local file by a URL.
This isn't a strange file; this is a file downloaded from the Lucid web portal,
called: Starting a Search Application.pdf
This problem may be an encoding problem, or a charset problem. I op
You could index the date as a text field (or use a new text field to store
the date as text) and then try it on this new field.
Thanx
Pravesh
Hi,
since this isn't logged anywhere, as far as I can say, there are two ways:
either you apply mm within your URL call, so that you get the whole mm
param back per request and calculate the applied mm with this
information (sounds bad), or you recalculate it within your own custom
search componen
You can explicitly pass mm for every search, and get it in your response;
otherwise use debugQuery=true, which will give you all implicitly used
defaults (but you wouldn't want to use this in production).
Thanx
Pravesh
>in a certain time period (say christmas) I will promote a doc in "christmas"
keyword
You might check the QueryElevation component in SOLR.
>or based on users interest I will boost a specific category of products.
>or (I am not sure how can I do this one) I will boost docs that current
>user's fr
Hello,
I'd like to be able to know programmatically what value mm was set to for one
request (to avoid having to parse the query, identify stopwords, calculate
mm based on solrconfig.xml). Is there a way to get mm value in solr
response?
Thanks,
Elisabeth