Re: is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
I would use that mm value to decrease it in case user's request would get no answer. I deal with requests potentially containing a lot of parasite words, and I want to programmatically lower mm in a second try request if necessary. But I don't want to decrease it too much to avoid getting too many i
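The retry idea described in this message could be sketched roughly as follows. This is only an illustration: `search_with_mm_fallback` and the `search` callable are hypothetical names, and the callable stands in for whatever client code actually sends the request to Solr with a given mm value.

```python
from typing import Callable, List, Sequence, Tuple


def search_with_mm_fallback(
    search: Callable[[str, str], List[dict]],
    query: str,
    mm_values: Sequence[str],
) -> Tuple[str, List[dict]]:
    """Try each mm value in order (strictest first) until one returns hits.

    `search` is a hypothetical callable (query, mm) -> list of documents.
    Returns the mm value that was actually used and the documents found.
    """
    if not mm_values:
        raise ValueError("mm_values must contain at least one value")
    results: List[dict] = []
    for mm in mm_values:
        results = search(query, mm)
        if results:
            return mm, results
    # Nothing matched even with the loosest value: report the last one tried.
    return mm_values[-1], results
```

Keeping the list of mm values short bounds how far the constraint is relaxed, which addresses the concern about getting too many loosely matching results.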

Excluding docs from results based on matched field

2011-10-05 Thread Otis Gospodnetic
Hello, Is there some magic in edismax or one of the QPs that would make this possible: Boost documents which match name and desc; include docs which just match name; and exclude docs which only match desc. ? One could use very high field weight for name and very low weight for desc field in o

Re: is there a way to know which mm value was used?

2011-10-05 Thread Bill Bell
It would be good to output the mm value for debugging. Something like mm_value = 2 Then you should know the results are right. On 10/5/11 9:58 AM, "Shawn Heisey" wrote: >On 10/5/2011 9:06 AM, elisabeth benoit wrote: >> thanks for answering. >> >> echoParams just echos mm value in solrconfig.xm

Re: Scoring of DisMax in Solr

2011-10-05 Thread Bill Bell
Markus, The calculation is correct. Look at your output. Result = queryWeight(text:gb) * fieldWeight(text:gb in 1) Result = (idf(docFreq=6, numDocs=26) * queryNorm) * (tf(termFreq(text:gb)=2) * idf(docFreq=6, numDocs=26) * fieldNorm(field=text, doc=1)) Thus you should notice that idf(docFreq=6

Re: Offering search suggestions - a discussion of multi-term phrases

2011-10-05 Thread Otis Gospodnetic
Shawn, Have you looked at http://www.sematext.com/products/dym-researcher/index.html as a solution to the ZeroHits problem? If that doesn't work, then yes, offline word/phrase co-occurrence may work. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

interestingTerms=details

2011-10-05 Thread dan whelan
Hi, I noticed that every "interesting term" returned using the MoreLikeThisHandler always has a boost of 1. How would one go about making a term have a different boost? Say I have a paragraph of text and I do a more like this query on the paragraph. But if term XX or YY is in the paragraph

Re: How do i get results for quering with separated words?

2011-10-05 Thread Ahmet Arslan
Using ShingleFilterFactory at index time may help. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory --- On Wed, 10/5/11, Mikhail Khludnev wrote: > From: Mikhail Khludnev > Subject: Re: How do i get results for quering with separated words? > To: solr-use

Re: A simple query?

2011-10-05 Thread Ahmet Arslan
Your use-case is pretty unique. One solution might be to use MemoryIndex which is designed for "Prospective search". http://lucene.apache.org/java/2_4_0/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html Your documents will be your stored "huge numbers of queries". Your user en

Re: CopyField copying to self

2011-10-05 Thread Gora Mohanty
On Thu, Oct 6, 2011 at 1:49 AM, Jamie Johnson wrote: > I have a field named test_txt which I am populating in some cases, and > not in others. I also have a copy field directive to copy data from > _txt to text_txt. Things seem to work except I believe the field is > also copying to itself. Is

CopyField copying to self

2011-10-05 Thread Jamie Johnson
I have a field named test_txt which I am populating in some cases, and not in others. I also have a copy field directive to copy data from _txt to text_txt. Things seem to work except I believe the field is also copying to itself. Is there any way to avoid this behavior?

Offering search suggestions - a discussion of multi-term phrases

2011-10-05 Thread Shawn Heisey
I am trying to figure out how we can begin offering search suggestions to people, especially when a user types in something that results in few or zero results. For background, we have an archive of about 60 million objects, most of which are photographs. There are also a number of text artic

Re: Sorting by article title

2011-10-05 Thread Mattmann, Chris A (388J)
Hi, You can also check out LUCENE-3413 [1] and the CombiningFilter that I wrote and associated example. This lets you: 1. perform normal tokenization and analysis in your analysis chain 2. recombine the tokens at the end for sorting purposes HTH, Chris [1] https://issues.apache.org/jira/browse

Re: How do i get results for quering with separated words?

2011-10-05 Thread Mikhail Khludnev
Have you tried to correct spaces by spelling dictionary? If you build your dictionary from non-tokenized terms, you'll have starwars -> Star Wars and super rtl->superrtl corrections. WDYT? On Wed, Oct 5, 2011 at 7:13 PM, elisabeth benoit wrote: > I think you could define star wars and starwars a

Field Collapsing and Record Filtering

2011-10-05 Thread Daniel Skiles
A while back I sent a question to the list about only returning the most recent version of a document, based on a numerical version field stored in each record. Someone suggested that I use field collapsing to do so, and in most cases it seems to work well. However, I've hit a snag and I'd apprec

Re: make search "hotels in auckland" match "auckland" in index

2011-10-05 Thread James Lin
wow awesome hahaha thanks! On Oct 6, 2011 8:36 AM, "Gora Mohanty" wrote: > On Thu, Oct 6, 2011 at 12:55 AM, James Lin wrote: >> Hi, >> >> I got an area index which only has one area name field, the field type is >> using the "text_en_splitting" >> some sample data will be: "Auckland", "North Shor

Re: Sorting by article title

2011-10-05 Thread themanwho
OK, I'm going to answer my own question -- it was probably so obvious that nobody else wanted to answer such an easy one! I simply needed to apply after instead of before, as I had it originally. Otherwise "the\s" and "a\s" is never matched! Hope this maybe helps somebody else...
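The ordering issue described here (lowercase first, then strip the leading article) can be illustrated outside Solr with a small sketch. The pattern below is an assumption modelled on the "the\s" and "a\s" patterns quoted in the message, and `title_sort_key` is a hypothetical helper, not part of any Solr analysis chain.

```python
import re

# Assumed pattern: a leading "the" or "a" followed by whitespace.
# It is written in lowercase, so it only matches after the title is lowercased.
_LEADING_ARTICLE = re.compile(r"^(the|a)\s+")


def title_sort_key(title: str) -> str:
    """Lowercase the title first, then strip a leading article."""
    lowered = title.strip().lower()
    return _LEADING_ARTICLE.sub("", lowered)


titles = ["The Zebra", "A Moon", "Apple"]
sorted_titles = sorted(titles, key=title_sort_key)
```

If the replacement ran before lowercasing, "The Zebra" would keep its article, because the lowercase pattern would never match the capitalized word.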

Re: make search "hotels in auckland" match "auckland" in index

2011-10-05 Thread Gora Mohanty
On Thu, Oct 6, 2011 at 12:55 AM, James Lin wrote: > Hi, > > I got an area index which only has one area name field, the field type is > using the "text_en_splitting" > some sample data will be: "Auckland", "North Shore" etc. > > If I have a search query "hotels in auckland", the result doesn't mat

make search "hotels in auckland" match "auckland" in index

2011-10-05 Thread James Lin
Hi, I got an area index which only has one area name field, the field type is using the "text_en_splitting" some sample data will be: "Auckland", "North Shore" etc. If I have a search query "hotels in auckland", the result doesn't match anything. How would I change the index config to make it mat

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 3:03 PM, David Ryan wrote: > Do you mean both BM25 and BM25F? > > No, BM25F and other "fielded" or structured models are somewhat different. In these models, if you have two fields (body/title) you are saying that "dogs" in body is actually the same term as "dogs" in title.

Re: Search Relevance Assistance

2011-10-05 Thread Gora Mohanty
On Wed, Oct 5, 2011 at 11:42 PM, FionaY wrote: > We have Solr integrated, but we are having some issues with search relevance > and we need some help fine tuning the search results. Anyone think they can > help? Well, you would at least need to describe what problems you are facing, e.g., some ex

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread David Ryan
Do you mean both BM25 and BM25F? On Wed, Oct 5, 2011 at 11:44 AM, Robert Muir wrote: > On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote: > > Hi, > > > > According to the JIRA issue 2959, > > https://issues.apache.org/jira/browse/LUCENE-2959 > > > > BM25 will be included in the next release of L

Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
The example does not include the evidence. But we do use eDisMax for scoring in Solr. The following is from solrconfig.xml: edismax Here is a short snippet of the explained result, where 0.1 is the Tie breaker in DisMax/eDisMax. 6.446447 = (MATCH) max plus 0.1 times others of: 0.63826215

Re: Backup with lukeall XMLExporter.

2011-10-05 Thread Luis Cappa Banda
Hello, Andrzej. First of all, thanks for your help. The thing is that I'm not using Lucene: I'm using Solr to index (well, I know that it involves Lucene). I know about Solr replication, but the index is being modified in real time, including new documents with new requests incoming. In summary, from

Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread Robert Muir
On Wed, Oct 5, 2011 at 2:23 PM, David Ryan wrote: > Hi, > > According to the JIRA issue 2959, > https://issues.apache.org/jira/browse/LUCENE-2959 > > BM25 will be included in the next release of LUCENE. > > 1). Will BM25F be included in the next release as well as part > of LUCENE-2959? should be

Re: Search Relevance Assistance

2011-10-05 Thread Fred Zimmerman
probably can't help, but pls keep the topic on list, as it is important for me too! On Wed, Oct 5, 2011 at 14:12, FionaY wrote: > We have Solr integrated, but we are having some issues with search > relevance > and we need some help fine tuning the search results. Anyone think they can > help?

Search Relevance Assistance

2011-10-05 Thread FionaY
We have Solr integrated, but we are having some issues with search relevance and we need some help fine tuning the search results. Anyone think they can help? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-Relevance-Assistance-tp3397404p3397404.html Sent from the So

getting started with Solr Flare

2011-10-05 Thread Fred Zimmerman
Hi, I followed the very simple instructions found at http://wiki.apache.org/solr/Flare/HowTo but run into a problem at step 4 Launch Solr: cd ; java -Dsolr.solr.home= -jar start.jar where Solr complains that it can't find solrconfig.xml in either the classpath or the solr-ruby home dir. Can

Re: Scoring of DisMax in Solr

2011-10-05 Thread Chris Hostetter
: Thanks! What's the procedure to report this if it's a bug? : EDisMax has similar behavior. what you are seeing isn't specific to dismax & edismax (in fact: there's no evidence in your example that dismax is even being used) what you are seeing is the basic scoring of a TermQuery using the D

RE: composite Unique Keys?

2011-10-05 Thread Jaeger, Jay - DOT
We generated our own concatenated key (original customer, who may historically have different addresses, etc.). If there is a way for Solr to do that automagically, I'd love to hear about it. I don't think that the extra bytes for the key itself (String vs. binary integer) is all that much o

Re: Backup with lukeall XMLExporter.

2011-10-05 Thread Andrzej Bialecki
On 05/10/2011 19:21, Luis Cappa Banda wrote: Hello. I've been looking for information trying to find an easy way to do index backups with Solr and I've read that lukeall has an application called XMLExporter that creates an XML dump from a lucene index with its complete information. I've got s

New scoring models in LUCENE/SOLR (LUCENE-2959)

2011-10-05 Thread David Ryan
Hi, According to the JIRA issue 2959, https://issues.apache.org/jira/browse/LUCENE-2959 BM25 will be included in the next release of LUCENE. 1). Will BM25F be included in the next release as well as part of LUCENE-2959? 2). What's the timeline of the next release that new scoring modules will be

Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Ok, here is the calculation of the score: 0.18314168 = 2.3121865 * 0.15502669 * 1.4142135 * 2.3121865 * 0.15625. 2.3121865 is multiplied twice here. That is what I mean: tf x idf^2 is used instead of tf x idf. On Wed, Oct 5, 2011 at 10:42 AM, Markus Jelsma wrote: > Hi, > > I don't see
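The product quoted in this message can be checked with a few lines of arithmetic. The sketch below assumes the five factors map, in order, onto the queryWeight * fieldWeight breakdown given elsewhere in this thread (idf * queryNorm, then tf * idf * fieldNorm); that mapping is inferred from the values and is not stated in the message itself.

```python
import math

# Values taken from the message; the role assigned to each is an assumption.
idf = 2.3121865          # idf(docFreq=6, numDocs=26)
query_norm = 0.15502669  # assumed to be queryNorm
tf = math.sqrt(2)        # tf for termFreq=2, about 1.4142135
field_norm = 0.15625     # assumed to be fieldNorm

query_weight = idf * query_norm        # idf appears once on the query side
field_weight = tf * idf * field_norm   # and once on the document side
score = query_weight * field_weight

print(f"{score:.6f}")
```

Seen this way, idf enters once through the query weight and once through the field weight, which is why it shows up twice in the final product.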

Re: is there a way to know which mm value was used?

2011-10-05 Thread Chris Hostetter
: the response. When I add "&mm=50%25" to the URL in my browser (%25 being the : URL encoding for the percent symbol), the response changes the mm value to : "50%" as expected, overriding the value in solrconfig.xml. I have not tried that is the value of the mm param, but elisabeth seems to be
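For clients that build the request URL in code rather than in a browser, the percent sign can be encoded automatically. A minimal sketch using Python's standard library follows; the query text and the other parameters are made up for illustration, only the mm value comes from the message.

```python
from urllib.parse import urlencode

# Hypothetical request parameters; "50%" is the mm value discussed above.
params = {"q": "star wars", "mm": "50%", "echoParams": "all"}

# urlencode escapes "%" as "%25" and the space as "+".
query_string = urlencode(params)
print(query_string)
```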

Re: Scoring of DisMax in Solr

2011-10-05 Thread Markus Jelsma
Hi, I don't see 2.3121865 * 2 anywhere in your debug output or something that looks like that. > Hi Markus, > > The idf calculation itself is correct. > What I am trying to understand here is why idf value is multiplied twice > in the final score calculation. Essentially, tf x idf^2 is used

Re: Search on content_type

2011-10-05 Thread ahmad ajiloo
I could solve it by using the fq parameter: fq=type:pdf but I want to have both pdf files and other formats like doc and docx. What query should I use to have pdf, doc and docx files in my search? On Tue, Oct 4, 2011 at 9:23 PM, ahmad ajiloo wrote: > Hi > I'm using Nutch for crawling and indexed my d
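One common way to match several values of a single field is to OR them inside parentheses in the filter query. The sketch below only builds that string; `build_type_filter` is a hypothetical helper, and it assumes the field is called `type` as in the message and that the values need no escaping.

```python
from typing import Sequence


def build_type_filter(field: str, values: Sequence[str]) -> str:
    """Build a filter such as type:(pdf OR doc OR docx)."""
    if not values:
        raise ValueError("at least one value is required")
    return f"{field}:({' OR '.join(values)})"


fq = build_type_filter("type", ["pdf", "doc", "docx"])
print(fq)
```

The resulting string would be passed as the fq parameter, URL-encoded like any other parameter value.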

Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Hi Markus, The idf calculation itself is correct. What I am trying to understand here is why idf value is multiplied twice in the final score calculation. Essentially, tf x idf^2 is used instead of tf x idf. I'd like to understand the rationale behind that. On Wed, Oct 5, 2011 at 9:43 AM, Ma

Backup with lukeall XMLExporter.

2011-10-05 Thread Luis Cappa Banda
Hello. I've been looking for information trying to find an easy way to do index backups with Solr and I've read that lukeall has an application called XMLExporter that creates an XML dump from a lucene index with its complete information. I've got some questions about this alternative: *1. *Do

Re: Scoring of DisMax in Solr

2011-10-05 Thread Markus Jelsma
In Lucene's default similarity idf = 1 + ln(numDocs / (df + 1)). 1 + ln(26 / 7) =~ 2.3121865 I don't see a problem. > Hi, > > > When I examine the score calculation of DisMax in Solr, it looks to me > that DisMax is using tf x idf^2 instead of tf x idf. > Does anyone have insight why tf x id
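The idf value quoted here can be reproduced directly from the formula. A minimal sketch, using docFreq=6 and numDocs=26 from the explain output quoted in this thread; `default_idf` is a hypothetical helper name, not a Lucene API.

```python
import math


def default_idf(doc_freq: int, num_docs: int) -> float:
    """idf = 1 + ln(numDocs / (docFreq + 1)), as stated in the message."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


idf = default_idf(doc_freq=6, num_docs=26)
print(f"{idf:.6f}")
```

Small differences in the last digits are expected, since Lucene computes this in single-precision floats while Python uses doubles.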

Re: schema changes changes 3.3 to 3.4?

2011-10-05 Thread jo
I figured it out.. thanks for pointing me in the right direction... so in the end the Solr field type text was changed to text_general. I was only missing these filters On Wed, Oct 5, 2011 at 10:52 AM, jo [via Lucene] < ml-node+s472066n3396737...@n3.nabble.com> wrote: > Okay I did

Re: Scoring of DisMax in Solr

2011-10-05 Thread David Ryan
Thanks! What's the procedure to report this if it's a bug? EDisMax has similar behavior. On Tue, Oct 4, 2011 at 11:24 PM, Bill Bell wrote: > This seems like a bug to me. > > On 10/4/11 6:52 PM, "David Ryan" wrote: > > >Hi, > > > > > >When I examine the score calculation of DisMax in Solr, it

"more like this"

2011-10-05 Thread Fred Zimmerman
Hi, for my application, I would like to be able to create web queries (wget/curl) that get "more like this" for either a single arbitrarily specified URL or for the first x terms in a search query. I want to return the results to myself as a csv file using wt=csv. How can I accomplish the MLT pie

Re: How to empty SolR Cache

2011-10-05 Thread Shawn Heisey
On 10/5/2011 9:18 AM, David GUYOT wrote: I'm currently trying to benchmark my SolR install with a custom script, but this benchmark must be run with all SolR caches empty; is there a way to erase SolR caches by a command or to restart SolR with an option to avoid cache autowarming? Remove any f

Re: is there a way to know which mm value was used?

2011-10-05 Thread Shawn Heisey
On 10/5/2011 9:06 AM, elisabeth benoit wrote: thanks for answering. echoParams just echos mm value in solrconfig.xml (in my case mm = 4<-1 6<-2), not the actual value of mm for one particular request. I think it would be very useful to be able to know which mm value was effectively used, in partic

How to empty SolR Cache

2011-10-05 Thread David GUYOT
Hello, everybody. Firstly, I must advise you that I'm a probie with mailing lists and a Froggie, so please excuse what could look like obvious errors, in both computing and language. I'm currently trying to benchmark my SolR install with a custom script, but this benchmark must be run with all SolR

Re: How do i get results for quering with separated words?

2011-10-05 Thread elisabeth benoit
I think you could define star wars and starwars as synonyms in synonyms.txt... maybe not generic enough? 2011/10/5 Mike Mander > Isn't this more a problem of the query string? > > Let's assume i have a game name like "Nintendo 3DS - 'Star Wars - Clone > Wars'". > Can i copy that name to a fiel

Re: is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
thanks for answering. echoParams just echos mm value in solrconfig.xml (in my case mm = 4<-1 6<-2), not the actual value of mm for one particular request. I think it would be very useful to be able to know which mm value was effectively used, in particular for request with stopwords. It's of course

Sorting by article title

2011-10-05 Thread themanwho
Hi all! I have documents, all of which have a title, and I would like to sort by that title. The catch is, I wish to sort ignoring any "A" or "The" at the beginning of the title. My first (and only) attempt is by creating a type that looks like:

Re: schema changes changes 3.3 to 3.4?

2011-10-05 Thread jo
Okay I did use the analysis tool and it did make me notice a few things but more important what changed there is no longer a field type named "text" on the new schema, there is only "text_en" which is weird as text field is the default when doing a query.. anyway, when I used the analysis tool a

Re: How do i get results for quering with separated words?

2011-10-05 Thread Mike Mander
Isn't this more a problem of the query string? Let's assume i have a game name like "Nintendo 3DS - 'Star Wars - Clone Wars'". Can i copy that name to a field cutting the - and ', lowercase the result string and remove the whitespaces? So that i have "nintendo3dsstarwarsclonewars". Is that "f

Re: is there a way to know which mm value was used?

2011-10-05 Thread Shawn Heisey
On 10/5/2011 1:01 AM, elisabeth benoit wrote: Hello, I'd like to be able to know programmatically what value mm was set to for one request (to avoid having to parse the query, identify stopwords, calculate mm based on solrconfig.xml). Is there a way to get mm value in solr response? To suppleme

Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
I've uploaded the file here: http://www.filesonic.com/file/2342166624/Starting_a_Search_Application.pdf try this, thanks 2011/10/5 Michael McCandless > Hmm, no attachment; maybe it's too large? > > Can you send it directly to me? > > Mike McCandless > > http://blog.mikemccandless.com > > 2011/1

Re: Indexing PDF

2011-10-05 Thread Michael McCandless
Hmm, no attachment; maybe it's too large? Can you send it directly to me? Mike McCandless http://blog.mikemccandless.com 2011/10/5 Héctor Trujillo : > This is the file that give me errors. > > 2011/10/5 Michael McCandless >> >> Can you attach this PDF to an email & send to the list?  Or is it

Re: A simple query?

2011-10-05 Thread alexw
Thanks but, unfortunately that will not solve the problem since it will bring back both the first and second doc. Besides, the query terms is: a b y z, not just: a b -- View this message in context: http://lucene.472066.n3.nabble.com/A-simple-query-tp3395465p3396297.html Sent from the Solr - User

Re: How do i get results for quering with separated words?

2011-10-05 Thread stockii
index this field without whitespaces ? XD - --- System One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents other Cores < 200.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx -

Re: Indexing PDF

2011-10-05 Thread Michael McCandless
Can you attach this PDF to an email & send to the list? Or is it too large for that? Or, you can try running Tika directly on the PDF to see if it's able to extract the text. Mike McCandless http://blog.mikemccandless.com 2011/10/5 Héctor Trujillo : > Sorry you have the reason, this file was i

Re: How do i get results for quering with separated words?

2011-10-05 Thread Mike Mander
Thanks stockii, but WDFF is splitting on Numeric or NameChange only. For Star Wars in index and starwars in query this means that both are not equal. Or? Thanks Mike which type in the schema.xml do you use. try out WordDelimiterFilterFactory or some other filters from this site: http://wik

Re: indexing FTP documet with solrj

2011-10-05 Thread Marc SCHNEIDER
Hello, To crawl the document you can use Apache Tika before sending the content to Solr (via Solrj). Regards, Marc. On Wed, Oct 5, 2011 at 1:16 AM, Chris Hostetter wrote: > > : I want to index some document with solrj API's but the URL of these > : documents is FTP, > : How to set username an

Re: Indexing PDF

2011-10-05 Thread Paul Libbrecht
Héctor, I was meaning you need another way to reference the file *to the mailing list*. Sorry for the confusion. I do not think there's anything special to the set of interfaces you're using if the delivery is the same for the solr client and the acrobat plugin. To make sure of it, you could t

Re: How do i get results for quering with separated words?

2011-10-05 Thread stockii
which type in the schema.xml do you use. try out WordDelimiterFilterFactory or some other filters from this site: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory - --- System One

How do i get results for quering with separated words?

2011-10-05 Thread Mike Mander
Hello, i have configured a catchall searchword field. In this i copy the value of field name. Name value = "Star Wars". Now i try to find this document by searchword "starwars". But it's not found. Vice versa same problem. Name value = "SuperRTL", searchword is "super rtl". Replacing all whit

Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
Sorry, you are right: this file was indexed with a .Net web service client that calls a Java application (a web service) that calls Solr using SolrJ. I will try to index this in a different way; maybe this will resolve the problem. Thanks Best regards On 5 October 2011 08:42, Héctor Tr

Re: Indexing PDF

2011-10-05 Thread Héctor Trujillo
It seems unreasonable that if I want to index a local file, I have to reference this local file by a URL. This isn't a strange file, this is a file downloaded from lucid web portal called: Starting a Search Application.pdf This problem may be an encoding problem, or a character set problem. I op

Re: Hierarchical faceting with Date

2011-10-05 Thread pravesh
You could index the date as a text field (or use a new text field to store date as text) and then try it on this new field Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Hierarchical-faceting-with-Date-tp3394521p3395824.html Sent from the Solr - User mailing lis

Re: is there a way to know which mm value was used?

2011-10-05 Thread Em
Hi, since this isn't logged anywhere, as far as I can say, there are two ways: Either you apply mm within your url-call, so that you get the whole mm param back per request and calculate the applied mm with this information (sounds bad), or you recalculate it within your own custom search componen
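The "recalculate it yourself" option mentioned here could look roughly like the sketch below. It reflects my reading of the documented min-should-match syntax (plain integers, percentages, and conditional `N<value` parts such as the `4<-1 6<-2` spec quoted in this thread); it is not Solr's own code, `min_should_match` is a hypothetical helper, and it assumes the caller already knows the number of optional clauses left after analysis (for example after stopword removal) and that the spec has no spaces around `<`.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Estimate how many optional clauses must match for a given mm spec."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts are checked left to right, e.g. "4<-1 6<-2".
        for part in spec.split():
            bound_text, sub_spec = part.split("<", 1)
            if clause_count <= int(bound_text):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = int(clause_count * percent / 100)  # truncates toward zero
    else:
        calc = int(spec)
    # Negative values mean "all but this many" clauses.
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


# With mm = "4<-1 6<-2": up to 4 clauses all are required,
# 5 or 6 clauses allow one miss, more than 6 allow two misses.
for n in (3, 5, 8):
    print(n, min_should_match(n, "4<-1 6<-2"))
```

The hard part in practice is obtaining the clause count, since it depends on how the query analyzer tokenizes the input and which stopwords it drops.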

Re: is there a way to know which mm value was used?

2011-10-05 Thread pravesh
You can explicitly pass /mm/ for every search, and get it in your response, otherwise use /debugQuery=true/, it will give you all implicitly used defaults (but you wouldn't want to use this in production) Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-

Re: boosting and relevancy options from solr extensibility points -java-

2011-10-05 Thread pravesh
>in a certain time period (say christmas) I will promote a doc in "christmas" keyword You might check the QueryElevation component in SOLR. >or based on users interest I will boost a specific category of products. >or (I am not sure how can I do this one) I will boost docs that current >user's fr

is there a way to know which mm value was used?

2011-10-05 Thread elisabeth benoit
Hello, I'd like to be able to know programmatically what value mm was set to for one request (to avoid having to parse the query, identify stopwords, calculate mm based on solrconfig.xml). Is there a way to get mm value in solr response? Thanks, Elisabeth