Issue in indexing Zip file content with apache-solr-3.3.0

2011-08-22 Thread Jagdish Kumar
Hi All, I am using apache-solr-3.3.0 with apache-solr-cell-3.3.0.jar. Although I am able to index the zip files, I get no results when I search for content present in a zip file. Please suggest a possible solution. Thanks and regards, Jagdish

Re: copyField for big indexes

2011-08-22 Thread Tom
Bill, I was using it as a simple default search field. I realise now that's not a good reason to use copyField. As I see it now, it should be used if you want to search in a different way (using different analyzers, etc.), not just for searching on multiple fields in a single query. Thank

Boost or BQ?

2011-08-22 Thread Bill Bell
What is the difference between boost= and bq=? I cannot find any documentation.

Re: copyField for big indexes

2011-08-22 Thread Bill Bell
It depends. copyField may be good if you want to copy into a Soundex field and then boost the Soundex field lower than the tokenized field. What are you trying to do? On 8/22/11 11:14 AM, "Tom" wrote: > Is it a good rule of thumb, that when dealing with large indexes copyField should not be

Re: hierarchical faceting in Solr?

2011-08-22 Thread Bill Bell
Naomi, just create a login and update it!! On 8/22/11 12:27 PM, "Erick Erickson" wrote: > Try searching the Solr user's list for "hierarchical", this topic has been discussed numerous times. It would be great if you could collate the various solutions and update the wiki, all you have to d

Re: Terms.regex performance issue

2011-08-22 Thread Bill Bell
We do something like: http://localhost:8983/solr/provs/terms?terms.fl=payor&terms.regex.flag=case_insensitive&terms.regex=%28.*%29WHAT USER TYPES%28.*%29&terms.limit=-1 We want to match not just a prefix but anywhere in the terms. On 8/19/11 5:21 PM, "Chris Hostetter" wrote: > : Subject: Terms.regex
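A sketch of building the request above in Java, assuming terms.regex is evaluated as a java.util.regex pattern (as far as I know, that is how TermsComponent handles it). The user's text is wrapped in Pattern.quote so characters such as . or ( match literally instead of being read as regex syntax. The class name TermsRegexUrl is hypothetical. This only covers escaping; it does nothing about the cost of running a leading (.*) regex over every term in the field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

final class TermsRegexUrl {

    // Wraps the user's text in "(.*)...(.*)" so it can match anywhere in a term.
    // Pattern.quote makes characters such as '.', '*' or '(' match literally.
    static String anywhereRegex(String userInput) {
        return "(.*)" + Pattern.quote(userInput) + "(.*)";
    }

    // Builds a TermsComponent request with the same parameters as the message:
    // case-insensitive regex, no limit on the number of returned terms.
    static String build(String coreUrl, String field, String userInput) {
        return coreUrl + "/terms?terms.fl=" + encode(field)
                + "&terms.regex.flag=case_insensitive"
                + "&terms.regex=" + encode(anywhereRegex(userInput))
                + "&terms.limit=-1";
    }

    // URL-encodes a parameter value as UTF-8.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```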

Re: can i create filters of score range

2011-08-22 Thread Chris Hostetter
: before going into the Lucene doc id, I have a creationDate datetime field in my index which I can use as a page definition using a filter query. I have learned that exposing the Lucene docid won't be a clever idea, as it's, again, relative to the index instance, whereas my index date field will be unique.

Re: can i create filters of score range

2011-08-22 Thread jame vaalet
Thanks Hoss, that's a really good explanation. Well, I don't care about the sort order; I just want all of the docs. And yes, score values may be duplicated, which will deteriorate my search performance... Before going into the Lucene doc id, I have a creationDate datetime field in my index which I can
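One way to read the creationDate idea: instead of deep paging with start/rows, split the date range into windows and issue one filter query per window, paging only within each window. A minimal sketch, assuming creationDate is a Solr date field in UTC; the class name and window size are hypothetical. Both range bounds are inclusive, so each window ends one millisecond before the next one begins and no document falls into two windows.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class DateWindowFilters {

    // Splits [from, to) into consecutive windows and returns one filter query per window.
    static List<String> build(String field, Instant from, Instant to, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        List<String> filters = new ArrayList<>();
        for (Instant start = from; start.isBefore(to); start = start.plus(window)) {
            Instant next = start.plus(window);
            if (next.isAfter(to)) {
                next = to; // last window may be shorter
            }
            // Inclusive upper bound one millisecond before the next window starts.
            Instant end = next.minusMillis(1);
            // Instant.toString() yields ISO-8601 UTC text such as 2011-08-01T00:00:00Z.
            filters.add(field + ":[" + start + " TO " + end + "]");
        }
        return filters;
    }
}
```

Each string would be sent (URL-encoded) as an fq alongside the original q; if one window still holds too many documents, a smaller window keeps each page shallow.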

Re: how can i develop client application with solr url using javascript?

2011-08-22 Thread caman
Search for 'ajax-solr' on Google. To handle the Solr URL, look at setting up a proxy. Good luck. -- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-develop-client-application-with-solr-url-using-javascript-tp3275506p3276269.html Sent from the Solr - User mailing list archive

Re: how can i develop client application with solr url using javascript?

2011-08-22 Thread Alexei Martchenko
Before setting up your Solr to respond directly to jQuery, did you manage to bulletproof it against unwanted deletes? How will you protect your database? Be careful before exposing Solr directly to 'the world'. 2011/8/22 nagarjuna > hi everybody, i have solr url which produces json response
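Following that advice, a proxy in front of Solr can refuse anything that is not a read-only search. A minimal sketch of the check such a proxy might run, assuming the browser should only reach /select and /terms (a hypothetical whitelist). It also refuses a qt parameter because, as far as I know, Solr 3.x can use qt on /select to dispatch a request to a different handler. This is not a complete security layer; a real proxy would also need to decode parameter names and cap expensive parameters such as very large rows values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class SolrProxyGuard {

    // Hypothetical whitelist of read-only handlers the browser may reach.
    private static final Set<String> ALLOWED_PATHS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("/select", "/terms")));

    // Returns true only if the request targets a whitelisted handler and does
    // not try to switch handlers through the qt parameter.
    static boolean isAllowed(String path, String query) {
        if (path == null || !ALLOWED_PATHS.contains(path)) {
            return false; // e.g. /update is refused outright
        }
        if (query != null) {
            for (String pair : query.split("&")) {
                String name = pair.split("=", 2)[0];
                if (name.equals("qt")) {
                    return false;
                }
            }
        }
        return true;
    }
}
```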

Re: How to implement Spell Checker using Solr?

2011-08-22 Thread Alexei Martchenko
What is the error? 2011/8/22 anupamxyz > The changes for Solrconfig.xml in solr is as follows > default > solr.IndexBasedSpellChecker > spell > ./spellchecker > 0.7 > .0001 > jarowinkler > lowerfilt > name="di

Re: Sorting results by Range

2011-08-22 Thread Chris Hostetter
: 1) The user gives a query, and also has an option to choose the "from" and "to" values for a specific field. (For example: Give me all documents that match the query Solr Users, but with those that were last updated between 10th and 20th of August ranked on top) -Over here, I am currently

Re: can i create filters of score range

2011-08-22 Thread Chris Hostetter
: Retrieving 1 million docids from Solr through paging is resulting in deep paging issues, so I wonder if I can use filter queries to fetch all 1 million docids chunk by chunk. So for me the best filter would be score... if I can find the maximum score I can filter out other docs. : w

Re: Update field value in the document based on value of another field in the document

2011-08-22 Thread Chris Hostetter
: Now that I have set it up using UpdateProcessorChain, I am running into a null exception. Are you sure you pasted the correct stack trace? That's not a null (pointer) exception; it's an AbstractMethodError... : Aug 20, 2011 10:48:43 AM org.apache.solr.common.SolrException log : SEVERE: java.

Re: Requiring multiple matches of a term

2011-08-22 Thread Simon Willnauer
On Mon, Aug 22, 2011 at 8:10 PM, Chris Hostetter wrote: > : One simple way of doing this is maybe to write a wrapper for TermQuery that only returns docs with a Term Frequency > X as far as I understand the question those terms don't have to be within a certain window right? > I d

Re: heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Simon Willnauer
Shawn, as long as you are only using a released version of Lucene/Solr, you don't need to worry at all. This is an index format change that has never been released. You should reindex only if you use an svn checkout. simon On Mon, Aug 22, 2011 at 8:56 PM, Shawn Heisey wrote: > On 8/22/2011 12:

Re: heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Shawn Heisey
On 8/22/2011 12:38 PM, Shawn Heisey wrote: Just to be clear, if you are not using a compound file, do you need to worry about this? I am using 3.2, but I've got the compound file turned off and have 11 files per segment. Upgrading is in my near future, but I think 3.4 will be out by the time

Re: heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Shawn Heisey
On 8/22/2011 10:24 AM, Simon Willnauer wrote: I just reverted a previous commit related to CompoundFile in the 3.x stable branch. If you are using the unreleased 3.x branch, you need to reindex. See here for details: https://issues.apache.org/jira/browse/LUCENE-3218 If you are using a released

Re: hierarchical faceting in Solr?

2011-08-22 Thread Erick Erickson
Try searching the Solr user's list for "hierarchical"; this topic has been discussed numerous times. It would be great if you could collate the various solutions and update the wiki; all you have to do is create a login... Best Erick On Mon, Aug 22, 2011 at 1:49 PM, Naomi Dushay wrote: > Chris,

Re: Requiring multiple matches of a term

2011-08-22 Thread Chris Hostetter
: One simple way of doing this is maybe to write a wrapper for TermQuery that only returns docs with a Term Frequency > X as far as I understand the question those terms don't have to be within a certain window right? I don't think you could do it as a Query Wrapper -- it would have to be

Re: Sorting results by Range

2011-08-22 Thread Sowmya V.B.
Thanks for the clarification Erick! On Mon, Aug 22, 2011 at 4:30 PM, Erick Erickson wrote: > OK, I think I get it now. There's nothing in Solr that I know of that'll let you do this. Although you could add a clause boosted insanely high, something like date_modified:[aug10 TO aug 20]^1

Re: copyField for big indexes

2011-08-22 Thread Tom
Thanks Erick

hierarchical faceting in Solr?

2011-08-22 Thread Naomi Dushay
Chris, Is there a document somewhere on how to do this? If not, might you create one? I could even imagine such a document living on the Solr wiki ... this one has mostly ancient content: http://wiki.apache.org/solr/HierarchicalFaceting - Naomi

Re: copyField for big indexes

2011-08-22 Thread Erick Erickson
copyField should only be used if there's a good reason, that is, if you need to tokenize/analyze stuff differently, for instance for faceting. It's not so much a matter of the index size as whether the copyFields are necessary to get your needed functionality. You're right that you can construct queries

Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
No, I think you're right; I've never seen pipes as comments before... 2011/8/22 Erick Erickson > Ahh, you're right. I was way off base there. So I guess the question is how you know the words aren't being removed? A common problem is to look at *stored* fields rather than what's ac

Re: how can i develop client application with solr url using javascript?

2011-08-22 Thread Gora Mohanty
On Mon, Aug 22, 2011 at 9:38 PM, nagarjuna wrote: > hi everybody, i have solr url which produces json response format ...i would like to develop a client application using javascript which is automatic search field please send me any samples or any sample code. i need to use my

Re: can i create filters of score range

2011-08-22 Thread Erick Erickson
Even if you could do this, I think you'd still have the "deep paging" issue. Your irreducible problem is that you have to return all million docs. Have you tried just setting &rows=1000? And maybe return JSON format for brevity? Best Erick On Mon, Aug 22, 2011 at 1:07 PM, jame vaalet wr

Re: how to perform scheduling?

2011-08-22 Thread Erick Erickson
Scheduling what? It might be a good thing to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Aug 22, 2011 at 12:49 PM, nagarjuna wrote: > Hi everybody.. i am new to solr and i would like to know about the "scheduling" can anybody please explain wha

copyField for big indexes

2011-08-22 Thread Tom
Is it a good rule of thumb that, when dealing with large indexes, copyField should not be used? It seems to duplicate the indexing of data. You don't need copyField to be able to search on multiple fields. For example, if I have two fields, title and post, and I want to search on both, I could just qu
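As Tom notes, two fields can be searched in one request without copyField. A sketch of the request parameters using the dismax query parser, whose qf parameter lists the fields to search; the field names title and post come from the message, and the class name is hypothetical. Per-field weights (for example title^2) can be written into the field names, which is one way to express Bill's idea of boosting a Soundex field lower than the tokenized one.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class MultiFieldQuery {

    // Searches several fields in one request with the dismax parser instead of
    // copying them into a catch-all field. qf is a space-separated field list.
    static String dismaxParams(String userQuery, String... fields) {
        return "defType=dismax"
                + "&qf=" + encode(String.join(" ", fields))
                + "&q=" + encode(userQuery);
    }

    // URL-encodes a parameter value as UTF-8 (spaces become '+').
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```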

Re: Text Analysis and copyField

2011-08-22 Thread Erick Erickson
I suspect that the things going into TermsDictionary are from fields other than CorrectlySpelledTerms. In other words I don't think that anything is getting into TermsDictionary from CorrectlySpelledTerms... Be careful to remove the index between schema changes, just to be sure that you're not se

Re: can i create filters of score range

2011-08-22 Thread jame vaalet
Thanks Erick for the answer. My index has around 20 million documents in it, and each query of mine will yield around 1 million hits (numFound). For each query, I store the hit document ids in a database for further processing. Retrieving 1 million docids from Solr through paging

how to perform scheduling?

2011-08-22 Thread nagarjuna
Hi everybody, I am new to Solr and I would like to know about "scheduling". Can anybody please explain what scheduling is and how to perform it? Please provide some sample config files that perform scheduling. Thanks in advance

Re: SSD experience

2011-08-22 Thread Rich Cariens
Thanks folks! On Mon, Aug 22, 2011 at 11:13 AM, Erick Erickson wrote: > That link appears to be foo'd, and I can't find the original doc. But others (mostly on the user's list historically) have seen very significant performance improvements with SSDs, *IF* the entire index doesn't fit

Re: Full sentence spellcheck

2011-08-22 Thread William Oberman
I had an NPE on the same line, but from googling it seems like that NPE can happen for different reasons, so I couldn't say if my situation was exactly the same as yours. I will say I get phrase-based suggestions now. will On Mon, Aug 22, 2011 at 11:28 AM, Valentin wrote: > Thanks, but i have

Re: Problem using stop words

2011-08-22 Thread Erick Erickson
Ahh, you're right. I was way off base there. So I guess the question is how you know the words aren't being removed? A common problem is to look at *stored* fields rather than what's actually in the inverted index. The TermsComponent can help here: http://wiki.apache.org/solr/TermsComponent

heads up: re-index 3.x branch Lucene/Solr indices

2011-08-22 Thread Simon Willnauer
I just reverted a previous commit related to CompoundFile in the 3.x stable branch. If you are using the unreleased 3.x branch, you need to reindex. See here for details: https://issues.apache.org/jira/browse/LUCENE-3218 If you are using a released version of Lucene/Solr, then you can ignore this m

Dictionary of Correctly Spelled terms

2011-08-22 Thread Herman Kiefus
My objective is to end up with a field that can be used to build the spellcheck dictionary; however, that field will only contain correctly spelled terms other than those terms originating from two other 'proper name' fields. I thought I had this working, but feedback from a separate thread seem

how can i develop client application with solr url using javascript?

2011-08-22 Thread nagarjuna
Hi everybody, I have a Solr URL which produces a JSON response. I would like to develop a client application in JavaScript with an automatic search field. Please send me any samples or sample code. I need to use my Solr URL in a JScript or jQuery file to implement automatic s

RE: Text Analysis and copyField

2011-08-22 Thread Herman Kiefus
That's what I thought, but my experiments show differently. In actuality: I have a number of fields that are of type "text" (the default as it is packaged). I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in index-time analysis, using a file of terms which are known

Re: Custom FilterFactory is when called

2011-08-22 Thread simon
On Mon, Aug 22, 2011 at 5:34 AM, occurred <schaubm...@infodienst-ausschreibungen.de> wrote: > Hi, I've created my own custom FilterFactory or better to say rewritten an existing one: KeywordMarkerFilterFactory to: CachingKeywordMarkerFilterFactory. It will/should reload the protwor

Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
That very txt said "A Spanish stop word list. Comments begin with vertical bar. Each stop word is at the start of a line." Solr's comments are #s, not pipes. The Brazilian stopwords file is kinda raw... http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/l
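Since Solr's stopword files use # for comments and are usually just one word per line, one option is to convert the Snowball-style list once before pointing StopFilterFactory at it. A minimal sketch, assuming the file has already been read into a list of lines; the class name is hypothetical. Everything after a | is dropped as a comment, and each remaining word becomes its own entry, ready to be written out one per line.

```java
import java.util.ArrayList;
import java.util.List;

final class SnowballStopwords {

    // Converts Snowball-format lines ('|' starts a comment) into plain words.
    static List<String> toPlainWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = bar >= 0 ? line.substring(0, bar) : line;
            // A line may hold several words before the comment; keep each one.
            for (String word : content.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```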

Re: Full sentence spellcheck

2011-08-22 Thread Valentin
Thanks, but I have a last question before trying your solution: did you have the same NullPointerException before? I want to be sure that this is the only way to resolve my problem before modifying some Java files...

Re: Problem using stop words

2011-08-22 Thread Alexei Martchenko
The funny thing is that the stopwords files in the examples shown in http://wiki.apache.org/solr/LanguageAnalysis#Spanish actually use pipes and other terms. See the Spanish one in http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowbal

Re: SSD experience

2011-08-22 Thread Erick Erickson
That link appears to be foo'd, and I can't find the original doc. But others (mostly on the user's list historically) have seen very significant performance improvements with SSDs, *IF* the entire index doesn't fit in memory. If your index does fit entirely in memory, there will probably be some

Re: SSD experience

2011-08-22 Thread Daniel Skiles
I haven't tried it with Solr yet, but with straight Lucene about two years ago we saw about a 40% boost in performance on our tests with no changes except the disk. On Mon, Aug 22, 2011 at 10:54 AM, Rich Cariens wrote: > Ahoy ahoy! > > Does anyone have any experiences or stories they can share wi

SSD experience

2011-08-22 Thread Rich Cariens
Ahoy ahoy! Does anyone have any experiences or stories they can share with the list about how SSDs impacted search performance for better or worse? I found a Lucene SSD performance benchmark doc

Re: Sorting results by Range

2011-08-22 Thread Erick Erickson
OK, I think I get it now. There's nothing in Solr that I know of that'll let you do this, although you could add a clause boosted insanely high, something like date_modified:[aug10 TO aug 20]^1, that would bubble your target results toward the top of your list... Note that that's not the correct dat
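Erick notes that the date syntax in his clause is not correct. A sketch of producing the clause in Solr's date format (ISO-8601 in UTC with a trailing Z) covering whole days; the class name and the boost value used below are hypothetical, and the year 2011 is assumed. With the dismax parser, a clause like this could be passed as bq, which raises the score of matching documents without adding documents that don't match the main query.

```java
import java.time.LocalDate;

final class DateRangeBoost {

    // Builds a boosted range clause covering whole days from 'from' through 'toInclusive'.
    // LocalDate.toString() yields yyyy-MM-dd, e.g. 2011-08-10.
    static String clause(String field, LocalDate from, LocalDate toInclusive, int boost) {
        return field + ":[" + from + "T00:00:00Z TO " + toInclusive + "T23:59:59.999Z]^" + boost;
    }
}
```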

Re: can i create filters of score range

2011-08-22 Thread Erick Erickson
I don't believe that this is possible, and I strongly question whether it's useful (not to mention the syntax error, score:[1 TO *], notice the colon). Scores really are "dimensionless". A normalized score of 0.5 for a particular query doesn't really say anything about how "good" the document is,

Re: Problem using stop words

2011-08-22 Thread Erick Erickson
What does the admin/analysis page show? And if you're really putting the pipe symbol (|) in your stopwords file, I have no clue what Solr will make of it. The stopwords file format is usually just one word per line. I'm assuming your name of "string" for the field type is just a placeholder or

Re: Sorting results by Range

2011-08-22 Thread Sowmya V.B.
Hi Erick, let me clarify: 1) The user gives a query, and also has an option to choose the "from" and "to" values for a specific field. (For example: Give me all documents that match the query Solr Users, but with those that were last updated between 10th and 20th of August ranked on top) -Over here, I

Re: Sorting results by Range

2011-08-22 Thread Erick Erickson
I guess I'm having trouble understanding this. "I just wanted all the results, along with an option to sort the results, if the user wants it." What does "all the results" mean? The results you would have had if you didn't have a sort? There's no way to guarantee that the results returned by re

Re: Text Analysis and copyField

2011-08-22 Thread Stephen Duncan Jr
On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus wrote: > Is my thinking correct? I have a field 'F1' of type 'T1' whose index time analysis employs the StopFilterFactory. I also have a field 'F2' of type 'T2' whose index time analysis does NOT employ the StopFilterFactory. There i

Text Analysis and copyField

2011-08-22 Thread Herman Kiefus
Is my thinking correct? I have a field 'F1' of type 'T1' whose index time analysis employs the StopFilterFactory. I also have a field 'F2' of type 'T2' whose index time analysis does NOT employ the StopFilterFactory. There is a copyField directive source="F1" dest="F2". F2 will not contain any
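copyField is generally described as copying the original input value of the source field, before any analysis, so the destination's own analyzer decides what gets indexed. If that holds, F2 in the question would still contain the stopwords. A toy model of that behavior in plain Java with no Lucene classes; the lowercase/whitespace "analyzer" and the stopword set are simplified stand-ins for types T1 and T2, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CopyFieldModel {

    // Toy analysis: lowercase, split on whitespace, drop stopwords if a set is given.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && (stopwords == null || !stopwords.contains(t))) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Models copyField source="F1" dest="F2": F2 receives F1's raw input value,
    // not the tokens F1's analyzer produced, and then runs its own analysis.
    static Map<String, List<String>> index(String f1Value, Set<String> f1Stopwords) {
        Map<String, List<String>> indexed = new LinkedHashMap<>();
        indexed.put("F1", analyze(f1Value, f1Stopwords)); // type T1: with stop filter
        indexed.put("F2", analyze(f1Value, null));        // type T2: no stop filter
        return indexed;
    }
}
```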

Re: Full sentence spellcheck

2011-08-22 Thread William Oberman
I listed the basic steps I took in the other thread (recently), which were:
-Downloaded apache-solr-3.3.0 archive (I like to stick with releases vs. svn)
-Untar (tar -xzvf) and cd
-ant (to compile)
-mkdir something, cd something (e.g. create a peer directory in apache-solr-3.3.0)
-Wrote my class b

Re: Full sentence spellcheck

2011-08-22 Thread Valentin
I found the thread "Suggester Issues". You said to write a new java class:
package com.civicscience;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;
/**

can i create filters of score range

2011-08-22 Thread jame vaalet
Hi. Is it possible to say fq=score[1 TO *]? I have tried, but Solr is throwing an error. Can this be done with some other syntax? -- -JAME

Re: Count rows with tokens

2011-08-22 Thread lee carroll
Hi, this looks like a faceting problem. See http://wiki.apache.org/solr/SolrFacetingOverview cheers lee c On 22 August 2011 11:52, tom135 wrote: > Hello, I want to use Solr as a search engine. I have indexed data like: ID | TEXT | CREATION_DATE Daily increase by 500 000 rows. My pr

Count rows with tokens

2011-08-22 Thread tom135
Hello, I want to use Solr as a search engine. I have indexed data like: ID | TEXT | CREATION_DATE. The data grows by 500 000 rows daily. My problem: *INPUT:* a fixed set of tokens (max size 40) and a set of days. *RESULT:* How many rows (TEXT) contain the fixed set of tokens and were created on day1, day2, ..., day
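Following lee's faceting suggestion, one request can return a count per day: require every token in q, ask for no rows, and add one facet.query per day. A sketch that builds the parameters as unencoded name=value pairs, assuming TEXT and CREATION_DATE (from the message) are indexed fields, CREATION_DATE is a Solr date field, tokens are plain words without query syntax, and days are given as yyyy-MM-dd; the class name is hypothetical. The values still need URL encoding when sent, in particular the + signs.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenDayCounts {

    // Builds parameters that return no documents, only one facet count per day
    // for documents whose TEXT contains *all* of the tokens.
    static List<String> build(List<String> tokens, List<String> days) {
        StringBuilder q = new StringBuilder("TEXT:(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                q.append(' ');
            }
            q.append('+').append(tokens.get(i)); // '+' makes each token required
        }
        q.append(')');

        List<String> result = new ArrayList<>();
        result.add("q=" + q);
        result.add("rows=0");
        result.add("facet=true");
        for (String day : days) {
            // One whole UTC day per facet.query.
            result.add("facet.query=CREATION_DATE:[" + day + "T00:00:00Z TO " + day + "T23:59:59.999Z]");
        }
        return result;
    }
}
```

For a contiguous run of days, facet.range on CREATION_DATE with facet.range.gap=+1DAY (available since Solr 3.1) is a more compact alternative.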

Problem using stop words

2011-08-22 Thread Lucas Miguez
Hi, I am trying to use Spanish stop words, but the stop words are not working. Part of the schema.xml file:

Re: Solr Queries

2011-08-22 Thread Shalin Shekhar Mangar
Hi Abhijeet, On Mon, Aug 22, 2011 at 3:09 PM, abhijit bashetti wrote: > 1. Can I update a specific field while re-indexing? Solr doesn't support updating specific fields. You must always create a complete document with values for all fields while indexing. If you keep the same value for the
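Because a document can only be replaced as a whole, "updating one field" means re-sending every field. A library-free sketch of the client-side merge, where the maps stand in for a document fetched from Solr and for the changed fields (with SolrJ, the merged result would be turned into a SolrInputDocument, not shown). It assumes every field is stored; an unstored field's value cannot be read back and would be lost. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FullDocumentUpdate {

    // Returns a new map holding every stored field, overwritten by the changed
    // fields; this complete map is what would be re-indexed. Inputs are not modified.
    static Map<String, Object> merge(Map<String, Object> storedDoc, Map<String, Object> changes) {
        Map<String, Object> full = new LinkedHashMap<>(storedDoc);
        full.putAll(changes);
        return full;
    }
}
```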

Solr Queries

2011-08-22 Thread abhijit bashetti
Hi, I have some questions about Solr: 1. Can I update a specific field while re-indexing? 2. What are the ways to improve indexing performance? 3. What should be the ideal system configuration for a Solr indexing server? Regards, Abhijit

Custom FilterFactory is when called

2011-08-22 Thread occurred
Hi, I've created my own custom FilterFactory, or rather rewritten an existing one: KeywordMarkerFilterFactory to CachingKeywordMarkerFilterFactory. It will/should reload the protwords every minute. But now I found out that this FilterFactory is only called a few times when my server startu
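If, as the observation suggests, Solr creates the filter factory when the schema is loaded and then reuses what it built, a reload tied to factory creation will not run per request. One alternative is to check the age of the protected-word set at lookup time. A library-free sketch with a hypothetical loader (for example, something that re-reads protwords.txt) and an injectable clock; wiring it into CachingKeywordMarkerFilterFactory and the filter it creates is not shown.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ReloadingWordSet {
    private final Supplier<Set<String>> loader; // hypothetical: re-reads the word file
    private final LongSupplier clockMillis;     // e.g. System::currentTimeMillis
    private final long maxAgeMillis;
    private Set<String> words;
    private long loadedAt;

    ReloadingWordSet(Supplier<Set<String>> loader, LongSupplier clockMillis, long maxAgeMillis) {
        this.loader = loader;
        this.clockMillis = clockMillis;
        this.maxAgeMillis = maxAgeMillis;
        this.words = Collections.unmodifiableSet(loader.get());
        this.loadedAt = clockMillis.getAsLong();
    }

    // Called on every lookup, not only at creation, so a stale set is
    // refreshed on the first lookup after maxAgeMillis has passed.
    synchronized Set<String> get() {
        long now = clockMillis.getAsLong();
        if (now - loadedAt >= maxAgeMillis) {
            words = Collections.unmodifiableSet(loader.get());
            loadedAt = now;
        }
        return words;
    }
}
```

The filter would call get() each time it starts processing a stream instead of keeping the set it saw at startup, so with a maximum age of 60,000 ms a changed file is picked up on the first lookup after a minute has passed.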

Re: Tomcat freezes one or more times a day

2011-08-22 Thread Markus Jelsma
>> Part Solr log part GC log? Strange. > The log is catalina.out. Solr requests are apparently logged into this file, as are Tomcat's garbage collections (-verbose:gc). Ah yes. You can store the GC log in a separate file. Makes life easier. >> Anyway, your permgen is full, do not reload So

RE: Tomcat freezes one or more times a day

2011-08-22 Thread Reinier Kip
> Part Solr log part GC log? Strange. The log is catalina.out. Solr requests are apparently logged into this file, as are Tomcat's garbage collections (-verbose:gc). > Anyway, your permgen is full, do not reload Solr in Tomcat as it, by default, does not free previous permgen space. Thank you, I

Re: Tomcat freezes one or more times a day

2011-08-22 Thread Markus Jelsma
> Tomcat freezes one or more times a day. It may be related to the reloading (in Tomcat's manager) of the Solr application. Searching the internet led me to the conclusion that this might be due to garbage collection. I was advised to enable more detailed logging of garbage collection with

Sorting results by Range

2011-08-22 Thread Sowmya V.B.
Dear All, I have been searching for how to sort Solr results based on a range query, but did not find decipherable answers. Hence, mailing again. Sorry if the question was already posed; I was not able to find it. I want to add an option to sort the results based on a chosen field range. (For eg: "So