Re: query with stemming, prefix and fuzzy?

2009-01-29 Thread Gert Brinkmann
Shalin Shekhar Mangar wrote: Quite the opposite, you are actually working with some advanced stuff :) Thank you for the response. Please have some patience, someone is Ok, I will have (what else could I do? ;) ). Meanwhile I will try some things and continue to search the web. Greetings

Re: Highlighting does not work?

2009-01-29 Thread Jarek Zgoda
Added an appropriate amendment to the FAQ, but I'd consider reorganizing the information in the whole wiki, for example by creating a section titled "Common Tasks". A bit of redundancy does not hurt when it comes to documentation. Message written on 2009-01-28, at 20:01, by Mike Klaas: Well, both pag

Re: newbie question --- multiple schemas

2009-01-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Have two different cores, and you can have a separate schema for each. On Thu, Jan 29, 2009 at 1:20 PM, Cheng Zhang wrote: > Hello, > > Is it possible to define more than one schema? I'm reading the example > schema.xml. It seems that we can only define one schema? What about if I want > to define

Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-01-29 Thread Ilan Rabinovitch
We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing so required two changes: 1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The weblogic.xml file is required to disable Solr's filter on FORWARD. The contents of weblogic.xml should be: http://www.bea.

Registration for ApacheCon Europe 2009 is now open!

2009-01-29 Thread Erik Hatcher
Cross-posting this announcement. There are several relevant Lucene/Solr talks including: Trainings - Lucene Boot Camp (Grant Ingersoll) - Solr Boot Camp (Erik Hatcher) Sessions - Introducing Apache Mahout (Grant) - Lucene Case Studies (Erik) - Advanced Indexing Techniques with Apach

Re: query with stemming, prefix and fuzzy?

2009-01-29 Thread Gert Brinkmann
Gert Brinkmann wrote: >> A) fuzzy search >> >> What can I do to speed up the fuzzy query? Setting ramBufferSizeMB to a higher value seems to speed up the query slightly. I have to continue with tuning though. >> B) combine stemming, prefix and fuzzy search >> >> Is there a way to combine all th

Re: Pagination by facet?

2009-01-29 Thread Bruno Aranda
Further investigation leads me to think that I could achieve this by using the parameters facet.offset and facet.limit. I wonder how to do this with SolrJ, as I can see the SolrQuery.setFacetLimit() method but not a method to specify the facet offset. I guess I can extend the class and the offset
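A minimal sketch of the parameters involved, in Python, that only builds the query string (no HTTP call). The field name "category" and the page size are hypothetical. On the SolrJ side, SolrQuery inherits a generic set(String, int) from ModifiableSolrParams, so query.set("facet.offset", n) should work without extending the class, though that is worth checking against your SolrJ version.

```python
from urllib.parse import urlencode


def facet_page_params(field, page, page_size, query="*:*"):
    """Build Solr request parameters for one 'page' of facet values.

    facet.offset skips the first page * page_size constraints and
    facet.limit caps how many are returned; rows=0 suppresses the
    document list because only the facet counts are wanted.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    params = [
        ("q", query),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.offset", page * page_size),
        ("facet.limit", page_size),
    ]
    return urlencode(params)


# Third page (index 2) of 20 values for a hypothetical "category" field.
print(facet_page_params("category", page=2, page_size=20))
```

Each further page is just the same request with a different page number; keeping the facet sort order unchanged between requests keeps the pages consistent.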

Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-01-29 Thread Mark Miller
We should get this on the wiki. - Mark Ilan Rabinovitch wrote: We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing so required two changes: 1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The weblogic.xml file is required to disable Solr's filter on F

Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-01-29 Thread Alexander Ramos Jardim
Ilan, I had the same problem some months ago and had to remove the quoted line in the JSP. But I never hit the other problem you mentioned with 1.3 on WebLogic. 2009/1/29 Ilan Rabinovitch > > We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing so > required two changes: > > 1) Creatin

RE: DIH handling of missing files

2009-01-29 Thread Nathan Adams
I'm running the example from the DIH wiki page: http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/example-solr-home.jar -Nathan From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] Sent: Wed 01/28/2009 11:32 PM To: solr-user

RE: DIH handling of missing files

2009-01-29 Thread Nathan Adams
Which appears to be v1.3, which explains the problem. Thanks! From: Nathan Adams [mailto:na...@umich.edu] Sent: Thu 01/29/2009 8:28 AM To: solr-user@lucene.apache.org Subject: RE: DIH handling of missing files I'm running the example from the DIH wiki page: h

fuzzy search and uppercased word. finds moo~ not Moo~

2009-01-29 Thread Julian Davchev
Hi, I am doing fuzzy search, and it works correctly. For some reason, though, it has problems with uppercase words: e.g. if I search moo~ I get results, but if I search Moo~ I don't. I see in the analyzer that LowerCaseFilterFactory is hit, but I guess with fuzzy it's getting messy. Does anyone have a clue? Ch

Data Directory Sync.

2009-01-29 Thread Kalidoss MM
Hi, I have the following requirement: there is a running Solr instance with around 10K records indexed in it. Now I have to index another set of 30K records. The 10K records are already live, and I don't have the option of inserting the 30K records into the live instance. Is there any way to run Solr in

Re: multilanguage + howto search in all languages?

2009-01-29 Thread Julian Davchev
Thank you both for the pointers. For now I am handling it with fuzzy search. Let's hope this will do for some time :) Walter Underwood wrote: > I've done this. There are five cases for the tokens in the search > index: > > 1. Tokens that are unique after stemming (this is good). > 2. Tokens that are common

Re: Data Directory Sync.

2009-01-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Jan 29, 2009 at 7:27 PM, Kalidoss MM wrote: > Hi, > > I have a requirement like, There is a running solr and having around > 10K records indexed in it. Now i have to index another set of 30K records? > > The 10K data already in live, And i dont have an option to insert > that 3

check snapshoot/snapinstaller

2009-01-29 Thread sunnyfr
Hi, I would like to know how I can properly check whether a snapshot has been taken, other than by looking in the data directory. Is the snapshooter.log file updated when snapshooter runs automatically after a commit? Is there a way to check the last activity of snapshooter or snapinstaller via stats.jsp, don't

MASTER / SLAVES numdoc

2009-01-29 Thread sunnyfr
Hi, I have one master and several slaves, and I would like to know whether, from host.name/solr/admin/stats.jsp, there is a way to see the difference in numDocs per server. Thanks a lot
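One way to compare document counts across master and slaves without opening stats.jsp on each box is to send every server q=*:*&rows=0 and compare numFound, which counts live (non-deleted) documents just like numDocs. The Python sketch below assumes the JSON response writer (wt=json) and uses hypothetical host names and counts; fetching is left out so only the parsing and comparison are shown.

```python
import json


def num_found(response_text):
    """Extract numFound from a wt=json response to q=*:*&rows=0.

    numFound for a match-all query counts live (non-deleted) documents.
    """
    return json.loads(response_text)["response"]["numFound"]


def diff_from_master(counts, master):
    """Per slave: positive = number of documents it is behind the master."""
    base = counts[master]
    return {host: base - n for host, n in counts.items() if host != master}


# Each server would be queried (hosts hypothetical) with something like:
#   http://<host>:8983/solr/select?q=*:*&rows=0&wt=json
sample = '{"responseHeader":{"status":0},"response":{"numFound":1000,"start":0,"docs":[]}}'
counts = {"master": num_found(sample), "slave1": 1000, "slave2": 990}
print(diff_from_master(counts, "master"))
```

A non-zero difference right after a commit on the master is expected until the next snapinstaller run on that slave.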

Re: check snapshoot/snapinstaller

2009-01-29 Thread Bill Au
snapshooter.log logs all invocations of snapshooter, including the automatic ones triggered by a commit/optimize. There are log files in the logs directory with various status/stats: http://wiki.apache.org/solr/SolrCollectionDistributionStatusStats These status/stats can be displayed in the adm

warmupTime : 0

2009-01-29 Thread sunnyfr
Hi, Do you think it's normal to have warmupTime: 0? searcher class: org.apache.solr.search.SolrIndexSearcher version: 1.0 description: index searcher stats: searcherName : Searcher@6f7cf6b6 main caching : true numDocs : 8207035 maxDoc : 8239991 readerImpl : ReadOnlyMultiSe

Re: fuzzy search and uppercased word. finds moo~ not Moo~

2009-01-29 Thread Mark Miller
Julian Davchev wrote: Hi, I am doing fuzzy search. And works correctly. For some reason though it has problems with uppercase words. e.g if I search moo~I get results but if I do Moo~ I don't. I see in analyzer that LowerCaseFilterFactory is hitting but I gess with fuzzy it's getting mess
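Assuming the cause is that Solr's query parser does not pass fuzzy (or wildcard/prefix) terms through the field's analyzer, so LowerCaseFilterFactory never sees the "Moo" in "Moo~", a common workaround is to lowercase such terms on the client before sending the query. A minimal Python sketch; the regular expression is a simplification of Lucene query syntax, not a full parser.

```python
import re

# A run of word characters immediately followed by "~" and an optional
# similarity such as ~0.7. Field prefixes like "title:" are not touched
# because the word before ":" is not followed by "~".
FUZZY_TERM = re.compile(r"(\w+)(~[0-9.]*)")


def lowercase_fuzzy_terms(q):
    """Lowercase only the terms that carry a fuzzy '~' suffix."""
    return FUZZY_TERM.sub(lambda m: m.group(1).lower() + m.group(2), q)


print(lowercase_fuzzy_terms("title:Moo~ AND Cow"))
```

Plain terms such as "Cow" are left alone, since the analyzer already lowercases those at query time.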

Solr Gaze and Multicore?

2009-01-29 Thread Jacob Singh
Sorry if this is the wrong place to ask, since Solr Gaze is Lucid's project, but I was trying to install it in a multicore environment, and it doesn't seem to be working. It says to add the plugin to solr.home/lib. Which solr.home? I got to /gaze and of course, it doesn't know where to look. Thank

ranged query on multivalued field doesnt seem to work

2009-01-29 Thread zqzuk
Hi all, in my schema I have two multivalued fields, and I issued the query start_year:[400 TO *]. The result seems to be incorrect, because I got some records with start year = -3000... and also start year = -2147483647 (Integer.MIN_VALUE). Also when I combine start_year with end_year, it a
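One possible explanation, not confirmed in this thread: a range query on a multivalued field matches a document when any one of its values lies in the range, so a record holding both -3000 and 450 satisfies start_year:[400 TO *], and start_year/end_year clauses combined with AND may each be satisfied by a different value. The toy Python model below only illustrates that matching rule; it is not Solr code. (Separately, if the fields use the plain integer type rather than the sortable sint type, Solr 1.3 compares range terms as strings, which is another common source of surprises.)

```python
def range_matches(values, low=None, high=None):
    """Toy model of a range query on a multivalued field.

    The document matches if ANY single value lies in [low, high];
    None stands for an open end, like '*' in start_year:[400 TO *].
    """
    return any(
        (low is None or v >= low) and (high is None or v <= high)
        for v in values
    )


start_years = [-3000, 450]                   # one hypothetical record
print(range_matches(start_years, low=400))   # True: 450 is in range
print(range_matches(start_years, high=0))    # True: -3000 is in range
```

The same record matches both [400 TO *] and [* TO 0], which is why results can look contradictory when each value is inspected on its own.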

Re: query with stemming, prefix and fuzzy?

2009-01-29 Thread Mark Miller
Truncation queries and stemming are difficult partners. You likely have to accept compromise. You can try using multiple fields like you are, you can try indexing the full term at the same position as the stemmed term, or you can accept the weirdness that comes from matching on a stemmed form (
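Mark's second option, indexing the full term at the same position as its stem, means the token stream carries two terms at one position (the extra term has a position increment of 0). A toy Python model of that stream: crude_stem is a hypothetical stand-in for a real stemming filter such as Porter or Snowball, and in Solr the same effect would need a custom token filter in the analyzer chain rather than client code.

```python
def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens_with_originals(text):
    """Emit (term, position) pairs; when stemming changes a word, the
    original form is emitted too, at the same position as its stem."""
    out = []
    for pos, word in enumerate(text.lower().split()):
        stem = crude_stem(word)
        out.append((stem, pos))
        if stem != word:
            out.append((word, pos))  # position increment 0
    return out


print(tokens_with_originals("Walking dogs"))
```

Ordinary queries (stemmed the same way) hit the stems, while a prefix or fuzzy query can match the unstemmed forms such as "walking"; the compromise is that stemmed words contribute extra terms to scoring.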

Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Jon Drukman
Julian Davchev wrote: Hi, Are there any documents I can read on how locks work and how I can control them: when they occur, etc.? The only way I got out of this mess was restarting Tomcat. SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: w

Re: Solr Gaze and Multicore?

2009-01-29 Thread Mark Miller
Jacob Singh wrote: Sorry if this is the wrong place to ask, since Solr Gaze is Lucid's project, but I was trying to install this in a multicore environment, and it doesn't seem to be working. It says to add the plugin to solr.home/lib. Which solr.home? I got to /gaze and of course, it doesn't know

permanently setting log level?

2009-01-29 Thread Jon Drukman
If I go to /solr/admin/logging, I can set the "root" log level to WARNING, which is what I want. However, every time Solr restarts, it is set back to INFO. Is there a way to get the WARNING level to stick permanently? -jsd-

Question about rating documents

2009-01-29 Thread Reece
Currently I'm using Solr 1.2 to index a few million documents. It's been requested that users be able to rate the documents, so that something rated higher would show up higher in search results and vice versa. I've been thinking about it, but can't come up with a good way to do this and

Re: permanently setting log level?

2009-01-29 Thread Vannia Rajan
On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman wrote: > if i go to /solr/admin/logging, i can set the "root" log level to WARNING, > which is what i want. however, every time solr restarts, it is set back to > INFO. Is there a way to get the WARNING level to stick permanently? > > Hi, You can se

Re: Question about rating documents

2009-01-29 Thread Matthew Runo
You could use a boost function to gently boost up items which were marked as more popular. You would send the function query in the "bf" parameter with your query, and you can find out more about syntax here: http://wiki.apache.org/solr/FunctionQuery Thanks for your time! Matthew Runo Soft
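A minimal Python sketch of such a request, assuming the dismax handler is registered as qt=dismax (as in the example solrconfig.xml) and a hypothetical numeric per-document field named "rating". The bf value follows the ord(popularity)^0.5 pattern from Solr's example configuration; the weight is a tuning guess, not a recommendation.

```python
from urllib.parse import urlencode


def rated_search_params(user_query, weight=0.5):
    """Dismax request that adds a gentle rating-based boost via 'bf'.

    'rating' is a hypothetical numeric field; ord(...) mirrors the
    ord(popularity)^0.5 example from Solr's sample solrconfig.xml.
    """
    return urlencode({
        "qt": "dismax",
        "q": user_query,
        "bf": f"ord(rating)^{weight}",
    })


print(rated_search_params("something"))
```

Because dismax adds each bf function to the main query as an extra optional clause, the rating nudges the relevance score rather than replacing it; lowering the weight makes the boost gentler.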

Re: How to handle database replication delay when using DataImportHandler?

2009-01-29 Thread Gregg Donovan
Noble, Thanks for the suggestion. The unfortunate thing is that we really don't know ahead of time what sort of replication delay we're going to encounter -- it could be one millisecond or it could be one hour. So, we end up needing to do something like: For delta-import run N: 1. query DB slave
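A generic sketch of one way to tolerate an unknown replication lag (not necessarily the approach Gregg outlines): start each delta window a safety margin before the previous run's start, so rows that reached the DB slave late are read again. Re-reading is harmless because Solr replaces documents that share a uniqueKey. The margin is a hypothetical knob and would have to cover the worst-case lag.

```python
from datetime import datetime, timedelta


def next_delta_window(prev_run_start, now, safety_margin):
    """Return (since, until) for the next delta query against the DB slave.

    'since' is pulled back by safety_margin so rows that replicated late
    are read again; duplicates are harmless because Solr replaces any
    document that has the same uniqueKey.
    """
    if safety_margin < timedelta(0):
        raise ValueError("safety_margin must not be negative")
    return prev_run_start - safety_margin, now


since, until = next_delta_window(
    prev_run_start=datetime(2009, 1, 29, 12, 0),
    now=datetime(2009, 1, 29, 12, 15),
    safety_margin=timedelta(minutes=5),  # hypothetical worst-case lag
)
print(since, until)
```

The trade-off is that every run re-reads the overlap; with a lag that can reach an hour, the margin (and the re-read volume) has to be that large too.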

Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Yonik Seeley
On Thu, Jan 29, 2009 at 1:16 PM, Jon Drukman wrote: > Julian, have you had any luck figuring this out? My production instance > just started having this problem. It seems to crop up after solr's been > running for several hours. Our usage is very light (maybe one query every > few seconds). I

Re: Solr Gaze and Multicore?

2009-01-29 Thread Jacob Singh
Hi Mark, Thanks, I've got it working now. Still waiting for the stats to update... This is really cool! I've also been working pretty hard at an automated benchmark suite using jmeter, rightscale and amazon web services. Next time I'm in Boston (March I think), it would be great to show you. B

Re: permanently setting log level?

2009-01-29 Thread Jon Drukman
Vannia Rajan wrote: On Thu, Jan 29, 2009 at 11:55 PM, Jon Drukman wrote: if i go to /solr/admin/logging, i can set the "root" log level to WARNING, which is what i want. however, every time solr restarts, it is set back to INFO. Is there a way to get the WARNING level to stick permanently?

Solr 1.3 and spellcheck.onlyMorePopular=true

2009-01-29 Thread Nicholas Piasecki
Hello All, I'm new to Solr, so forgive me if I'm overlooking something obvious. My observation is that the spellcheck.onlyMorePopular property of the SpellCheckComponent does not seem to do what I expect. If I send the query "calvin klien" to my data store, then the spell checker correctly suggests "

Re: I get SEVERE: Lock obtain timed out

2009-01-29 Thread Jon Drukman
Yonik Seeley wrote: On Thu, Jan 29, 2009 at 1:16 PM, Jon Drukman wrote: Julian, have you had any luck figuring this out? My production instance just started having this problem. It seems to crop up after solr's been running for several hours. Our usage is very light (maybe one query every fe

Re: Question about rating documents

2009-01-29 Thread Reece
Hmm, I already boost certain fields, but from what I know about it, you would need to know the boost value ahead of time, which is not possible, as it would be a different boost for each document depending on how it was rated. I did think of one thing though. If I had a field that had a value of 1-

Re: Optimizing & Improving results based on user feedback

2009-01-29 Thread Walter Underwood
Thanks, I didn't know there was so much research in this area. Most of the papers at those workshops are about tuning the entire ranking algorithm with machine learning techniques. I am interested in adding one more feature, click data, to an existing ranking algorithm. In my case, I have enough d

Re: Question about rating documents

2009-01-29 Thread Erick Erickson
This may not be practical, as it would involve re-indexing all your documents periodically, but here goes anyway... You could think about *index-time* boosts. Somewhere you keep a record of the recommendations, then re-index your corpus adding some suitable boost to each field in your document bas
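Erick's index-time idea, sketched in Python: Solr's XML update format accepts a boost attribute on <doc>, so a re-indexing job could derive one from each document's rating. The rating-to-boost mapping and the id/body field names are hypothetical. Note that Lucene folds document boosts into the field norms with coarse precision, so small boost differences may disappear, and fields with omitNorms="true" ignore them.

```python
import xml.etree.ElementTree as ET


def rating_to_boost(avg_rating):
    """Hypothetical mapping: a 3-star average keeps the default boost 1.0;
    each star above or below moves it by 0.25 (range 0.5 .. 1.5)."""
    return 1.0 + 0.25 * (avg_rating - 3)


def add_xml(docs):
    """Build a Solr <add> message with a per-document boost attribute."""
    add = ET.Element("add")
    for doc in docs:
        el = ET.SubElement(add, "doc", boost=str(rating_to_boost(doc["rating"])))
        for name in ("id", "body"):
            field = ET.SubElement(el, "field", name=name)
            field.text = str(doc[name])
    return ET.tostring(add, encoding="unicode")


print(add_xml([{"id": "42", "body": "something", "rating": 5}]))
```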

Re: Solr 1.3 and spellcheck.onlyMorePopular=true

2009-01-29 Thread Mark Miller
I am not super familiar with the lucene/solr spell checking implementations, but here is my take: By saying to only allow more popular, you are restricting suggestions to only those that have a higher instance frequency in the index. The score is still by edit distance, but only terms with a h

Re: Solr 1.3 and spellcheck.onlyMorePopular=true

2009-01-29 Thread Mark Miller
Let me try that again. I think my email client is going nuts: I am not super familiar with the lucene/solr spell checking implementations, but here is my take: By saying to only allow more popular, you are restricting suggestions to only those that have a higher instance frequency in the inde
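A toy Python model of the behavior Mark describes: candidates are ranked by edit distance, and with onlyMorePopular a candidate must occur more often in the index than the queried word. The term frequencies are invented for illustration, and Lucene's SpellChecker actually finds candidates through an n-gram index, so this models only the filtering rule.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, freqs, only_more_popular=False, max_dist=2):
    """Candidates within max_dist edits, closest (then most frequent) first;
    with only_more_popular, a candidate must be more frequent than word."""
    word_freq = freqs.get(word, 0)
    cands = [
        t for t in freqs
        if t != word and levenshtein(word, t) <= max_dist
        and (not only_more_popular or freqs[t] > word_freq)
    ]
    return sorted(cands, key=lambda t: (levenshtein(word, t), -freqs[t]))


freqs = {"klein": 50, "kline": 120, "klien": 80}  # invented counts
print(suggest("klien", freqs))
print(suggest("klien", freqs, only_more_popular=True))
```

With these invented counts the misspelling "klien" is more frequent than "klein", so onlyMorePopular drops "klein" and only "kline" remains.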

Re: Question about rating documents

2009-01-29 Thread Reece
Re-indexing so much would be a pretty big pain. I do have a unique ID for each document though that I use for updating them every day as they change. -Reece On Thu, Jan 29, 2009 at 2:40 PM, Erick Erickson wrote: > This may not be practical, as it would involve re-indexing > all your document

RE: Solr 1.3 and spellcheck.onlyMorePopular=true

2009-01-29 Thread Nicholas Piasecki
Thanks for this lucid explanation. Indeed, turning the option off seems to give more intelligent results. I think that this was more of an example of me seeing "onlyMorePopular" and thinking "hmm, that must be good!" without fully understanding the consequences of the setting. The key point in y

Re: permanently setting log level?

2009-01-29 Thread Vannia Rajan
> > i'm not using tomcat, i'm using the default jetty setup that comes with > solr. i grepped through the entire solr installation for 'INFO' but i don't > see it. > > i don't really know anything about jetty other than i have to run java -jar > start.jar to get it to run solr. > > If you are not

Re: Question about rating documents

2009-01-29 Thread Reece
Okay, so what if I added a "rating" field users could update from like 1-5, and then did something like this: /solr/select?indent=on&debugQuery=on&rows=99&q=body:+something AND type:I _val_:product(score, rating); _val_ desc, id desc Would that sort the resultset by the product of the score and t

got background_merge_hit_exception during optimization

2009-01-29 Thread Qingdi
We got the following background_merge_hit_exception during optimization: exception: background merge hit exception: _4zsg:C136887658 _50nf:C995992 _51i9:C995977 _52d5:C995968 _537y:C995999 _54xm:C1892345 _54xl:C99593 into _54xn [optimize] java.io.IOException: background merge hit exception: _4zsg:C136887658 _5

Re: warmupTime : 0

2009-01-29 Thread Yonik Seeley
On Thu, Jan 29, 2009 at 12:12 PM, sunnyfr wrote: > Do you think it's normal to have warmupTime : 0 ?? Sure, if the caches were empty or almost empty (say on startup). -Yonik

RE: warmupTime : 0

2009-01-29 Thread Feak, Todd
This usually represents anything less than 8ms if you are on a Windows system. The timer granularity on Windows systems is around 16ms. -Todd Feak -Original Message- From: sunnyfr [mailto:johanna...@gmail.com] Sent: Thursday, January 29, 2009 9:13 AM To: solr-user@lucene.apache.org S

Re: got background_merge_hit_exception during optimization

2009-01-29 Thread Otis Gospodnetic
Hi, I didn't look into this deeply, but you didn't say which version of Solr you are using (looks like it might be 1.3). If using a nightly build is an option, you might try that instead - Yonik updated the Lucene jars recently and that might be enough to solve this problem. Otis -- Sematext

Re: Question about rating documents

2009-01-29 Thread Otis Gospodnetic
Reece, Solr does have the ability to read custom field values from an external file. This is suitable for cases where these values change a lot. You might want to consider that instead of updating the index. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original
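Otis is presumably referring to ExternalFileField (available, as far as I know, from Solr 1.3, so it would need an upgrade from 1.2). As I understand its javadoc, values are read from a file named external_<fieldname> in the data directory (check the exact location for your version), one uniqueKey=floatValue line per document, and the field can only be used in function queries, for example in bf. A minimal Python sketch that renders such a file; the ratings, document keys, and path are hypothetical.

```python
def external_file_lines(ratings):
    """Render {uniqueKey: rating} as 'key=value' lines, sorted by key
    (the javadoc reportedly notes that unsorted files load more slowly)."""
    return "".join(
        f"{key}={float(value)}\n" for key, value in sorted(ratings.items())
    )


def write_external_file(path, ratings):
    """Write the file, e.g. to <dataDir>/external_rating (path hypothetical)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(external_file_lines(ratings))


print(external_file_lines({"doc2": 4, "doc1": 3.5}), end="")
```

Updating a rating then means rewriting this file rather than re-indexing the document itself.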

Re: Solr Gaze and Multicore?

2009-01-29 Thread Mark Miller
Jacob Singh wrote: This is really cool! I've also been working pretty hard at an automated benchmark suite using jmeter, rightscale and amazon web services. Next time I'm in Boston (March I think), it would be great to show you. That sounds excellent! One problem with Solr's efficiency is tha

Re: Highlighting does not work?

2009-01-29 Thread Mike Klaas
Thanks, Jarek. -Mike On 29-Jan-09, at 12:20 AM, Jarek Zgoda wrote: Added an appropriate amendment to the FAQ, but I'd consider reorganizing the information in the whole wiki, for example by creating a section titled "Common Tasks". A bit of redundancy does not hurt when it comes to documentation. Message writ

Re: Optimizing & Improving results based on user feedback

2009-01-29 Thread Matthew Runo
Agreed, it seems that a lot of the algorithms in these papers would almost be a whole new RequestHandler à la Dismax. Luckily a lot of them seem to be built on Lucene (at least the ones that I looked at that had code samples). Which papers did you see that actually talked about using clicks?

Re: Optimizing & Improving results based on user feedback

2009-01-29 Thread Walter Underwood
"A Decision Theoretic Framework for Ranking using Implicit Feedback" uses clicks, but the best part of that paper is all the side comments about difficulties in evaluation. For example, if someone clicks on three results, is that three times as good or two failures and a success? We have to know th

Re: How to handle database replication delay when using DataImportHandler?

2009-01-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yeah that is an option. On Fri, Jan 30, 2009 at 12:27 AM, Gregg Donovan wrote: > Noble, > > Thanks for the suggestion. The unfortunate thing is that we really don't > know ahead of time what sort of replication delay we're going to encounter > -- it could be one millisecond or it could be one hou