timezone DIH and dataimport.properties

2011-04-26 Thread stockii
Hello. How can I set the Java timezone in my Java properties? My problem is that dataimport.properties contains the wrong timezone, and I don't know how to set the correct one. Thanks. - --- System One Server,

Re: how to concatenate two nodes of xml with xpathentityprocessor

2011-04-26 Thread Stefan Matheis
Vishal, i don't really understand what you're trying to achieve? indexing what (complete/sample documents, valid if possible)? And getting what exactly as result? Regards Stefan On Mon, Apr 25, 2011 at 5:01 PM, vrpar...@gmail.com wrote: > hello , > > i am using Xpathentityprocessor to do index

Re: timezone DIH and dataimport.properties

2011-04-26 Thread Stefan Matheis
java -Duser.timezone=UTC -jar start.jar ? On Tue, Apr 26, 2011 at 9:54 AM, stockii wrote: > Hello. > > How can i set the timezone oft java in my java properties ? > > my problem is, that in the dataimport-properties is a wrong timezone and i > dont know how to set the correct timezone ... !?!? th
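A side illustration of why the stored time can look wrong: the sketch below formats one instant under two different offsets, the way a timestamp written without zone information changes with the process's default timezone. It is written in Python rather than Java purely for brevity. The instant, the +02:00 offset, the function name, and the date pattern are all assumptions made for the example, not taken from DIH itself.

```python
from datetime import datetime, timedelta, timezone


def format_without_zone(instant, tz):
    """Render an instant as local wall-clock text, dropping the zone (assumed pattern)."""
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical instant; any timezone-aware datetime works here.
instant = datetime(2011, 4, 26, 7, 54, 0, tzinfo=timezone.utc)

utc_text = format_without_zone(instant, timezone.utc)
plus2_text = format_without_zone(instant, timezone(timedelta(hours=2)))

print(utc_text)    # wall-clock text under UTC
print(plus2_text)  # same instant, two hours later on the wall clock
```

Starting the JVM with -Duser.timezone=UTC, as suggested above, pins the default zone so the written text no longer depends on the server's locale settings.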

Problem with autogeneratePhraseQueries

2011-04-26 Thread Solr Beginner
Hi, I'm new to solr. My solr instance version is: Solr Specification Version: 3.1.0 Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07 Lucene Specification Version: 3.1.0 Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58 Current Time: Tue Apr 26 08:

Re: Problem with autogeneratePhraseQueries

2011-04-26 Thread Robert Muir
What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to "Lucene 2.9" emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. the default in the

Re: Query regarding solr plugin.

2011-04-26 Thread Erick Erickson
Sorry, but there's too much here to debug remotely. I strongly advise you back way up. Undo (but save) all your changes. Start by doing the simplest thing you can, just get a dummy class in place and get it called. Perhaps create a really dumb logger method that opens a text file, writes a messa

Re: Problem with autogeneratePhraseQueries

2011-04-26 Thread Solr Beginner
Thank you very much for the answer. You were right. There was no luceneMatchVersion in solrconfig.xml of our dev core. We thought that values not present in core configuration are copied from main solrconfig.xml. I will investigate if our administrators did something wrong during upgrade to 3.1. On T

Re: how to concatenate two nodes of xml with xpathentityprocessor

2011-04-26 Thread vrpar...@gmail.com
Thanks Stefan currently in dataconfig file part of xPathEntityProcessor and when i do make search i get following search result CustomerA AnyC 1

What initialize new searcher?

2011-04-26 Thread Solr Beginner
Hi, I'm reading solr cache documentation - http://wiki.apache.org/solr/SolrCaching I found there "The current Index Searcher serves requests and when a new searcher is opened...". Could you explain when new searcher is opened? Does it have something to do with index commit? Best Regards, Solr Beg

TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread mdz-munich
Hi! We've got one index split into 4 shards of 70,000 records each, containing large full-text data from (very dirty) OCR. Thus we got a lot of "unique" terms. Now we try to obtain the first 400 most common words for "CommonGramsFilter" via TermsComponent, but the request always runs out of memory. The VM is

org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread vrpar...@gmail.com
Hello, i got following source org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) at or

Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread Stefan Matheis
http://www.lucidimagination.com/blog/2011/04/01/solr-powered-isfdb-part-8/ On Tue, Apr 26, 2011 at 3:34 PM, vrpar...@gmail.com wrote: > Hello, > > i got following source > > org.apache.solr.common.SolrException: Error loading class > 'org.apache.solr.handler.dataimport.DataImportHandler' at > org

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic wrote: > But somehow this feels bad (well, so does sticking word variations in what's > supposed to be a synonyms file), partly because it means that the person > adding > new synonyms would need to know what they stem to (or always check it aga

WhitespaceTokenizer and scoring(field length)

2011-04-26 Thread roySolr
Hello, I have a problem with the WhitespaceTokenizer and scoring. An example:

id  Titel
1   Manchester united
2   Manchester

With the WhitespaceTokenizer, "Manchester united" will be split into "Manchester" and "united". When I search for

RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread Burton-West, Tom
Don't know your use case, but if you just want a list of the 400 most common words you can use the Lucene contrib HighFreqTerms.java with the -t flag. You have to point it at your Lucene index. You also probably don't want Solr to be running and want to give the JVM running HighFreqTerms a l

Apache Solr 3.1.0

2011-04-26 Thread Wodek Siebor
I'm trying to tokenize email and IP addresses using StandardTokenizerFactory. It does correctly tokenize IP address but it divides email address into two tokens one with value before '@' and the other with value after that. It works correctly under Solr 1.4.1 Has anybody else tried similar thin

RE: Apache Solr 3.1.0

2011-04-26 Thread Steven A Rowe
Hi Wodek, UAX29URLEmailTokenizer includes all of StandardTokenizer's rules and adds rules to tokenize URLs and Emails: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.UAX29URLEmailTokenizerFactory Steve > -Original Message- > From: Wodek Siebor [mailto:siebor_wlo...@ba

Problems with Spellchecker in 3.1

2011-04-26 Thread Bob Sandiford
Hi, all. Sorry for any duplication - seems like what I sent yesterday never made it through... We're having some troubles with the Solr Spellcheck Response. We're running version 3.1. Overview: If we search for something really ugly like: "kljhklsdjahfkljsdhf book rck" then wh

Ebay Kleinanzeigen and Auto Suggest

2011-04-26 Thread Eric Grobler
Hi Someone told me that ebay is using solr. I was looking at their Auto Suggest implementation and I guess they are using Shingles and the TermsComponent. I managed to get a satisfactory implementation but I have a problem with category specific filtering. Ebay suggestions are sensitive to catego

Solr Newbie: Starting embedded server with multicore

2011-04-26 Thread Simon, Richard T
I'm just starting with Solr. I'm using Solr 3.1.0, and I want to use EmbeddedSolrServer with a multicore setup, even though I currently have only one core (various documents I read suggest starting that way even if you have one core, to get the better administrative tools supported by multicore

RE: TermsCompoment + Dist. Search + Large Index + HEAP SPACE

2011-04-26 Thread mdz-munich
Thanks for your suggestion. It seems to be the use of shards and TermsComponent together. Now we simply request shard-by-shard without the "shards" and "shards.qt" params and merge the results via XSLT. Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/TermsCompo
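The shard-by-shard workaround described above can be sketched roughly as follows. This is a Python stand-in for the XSLT merge, not the poster's actual code: each shard is assumed to have already returned a mapping of term to document frequency, and the function name and sample counts are hypothetical.

```python
from collections import Counter


def merge_top_terms(shard_counts, limit):
    """Sum per-shard term counts and return the `limit` most frequent terms."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # adds counts term by term
    return total.most_common(limit)


# Hypothetical per-shard results, one dict per shard queried separately.
shards = [
    {"the": 5, "and": 3},
    {"the": 2, "of": 4},
]

top_terms = merge_top_terms(shards, 2)
print(top_terms)
```

One caveat with this approach: if each shard only returns its own top N terms, the merged counts are approximate, because a term just below the cutoff on one shard is silently missing from the sum.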

Re: What initialize new searcher?

2011-04-26 Thread Erick Erickson
You're on the right track. In a system where the indexing process and search process are on the same machine, commits by the index process cause a new searcher to be opened. In a master/slave situation (assuming you are indexing on the master and searching on the slave), then the searchers are reopen

Re: WhitespaceTokenizer and scoring(field length)

2011-04-26 Thread Erick Erickson
First, you can give us some more data to work with ... In particular, attach &debugQuery=on to your http request and post the results. That will show how the documents got their score. Also, show us the field type definition and the field definition for the field in question. Best Erick On Tue, Apr 26, 2011 at 10
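For anyone unsure how to attach the parameter, here is a minimal sketch that builds such a request URL in Python. The base URL (a local Solr on port 8983 with the /select handler) and the query string are assumptions for illustration; only the debugQuery=on parameter comes from the advice above.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query):
    """Return a select URL with debugQuery=on appended to the query."""
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Assumed host, port, and handler path; adjust to your installation.
url = build_debug_url("http://localhost:8983/solr/select", "Titel:Manchester")
print(url)
```

The response to such a request carries an extra debug section with a score explanation per document, which is the part worth posting to the list.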

Question on Batch process

2011-04-26 Thread Charles Wardell
I am sure that this question has been asked a few times, but I can't seem to find the sweetspot for indexing. I have about 100,000 files each containing 1,000 xml documents ready to be posted to Solr. My desire is to have it index as quickly as possible and then once completed the daily stream

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Mike Sokolov
Suppose your analysis stack includes lower-casing, but your synonyms are only supposed to apply to upper-case tokens. For example, "PET" might be a synonym of "positron emission tomography", but "pet" wouldn't be. -Mike On 04/26/2011 09:51 AM, Robert Muir wrote: On Tue, Apr 26, 2011 at 12:24

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Robert Muir
Mike, thanks a lot for your example: the idea here would be you would put the lowercasefilter after the synonymfilter, and then you get this exact flexibility? e.g. WhitespaceTokenizer SynonymFilter -> no lowercasing of tokens are done as it "analyzes" your synonyms with just the tokenizer LowerCa
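The ordering idea in this exchange (synonyms applied before lowercasing, so "PET" and "pet" are treated differently) can be modelled with a toy analysis chain. This is a simplified Python sketch, not Lucene's SynonymFilter: it ignores token positions and multi-word matching, and the synonym table and function name are made up for the example.

```python
# Hypothetical case-sensitive synonym table: only upper-case PET expands.
SYNONYMS = {"PET": ["positron", "emission", "tomography"]}


def analyze(text):
    """Whitespace tokenize, expand synonyms case-sensitively, then lowercase."""
    expanded = []
    for token in text.split():
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))  # exact-case lookup
    return [token.lower() for token in expanded]  # lowercase runs last


acronym_tokens = analyze("PET scan")
animal_tokens = analyze("pet food")
print(acronym_tokens)
print(animal_tokens)
```

Because lowercasing happens after the lookup, the final tokens are all lower case, yet only the upper-case form picked up the expansion.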

Re: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2011-04-26 Thread Scott Bigelow
I experienced the same issue. With Solr 1.x, I was copying out the 'example' directory to make my solr installation. However, for the Solr 3.x distributions, the DataImportHandler class exists in a directory that is at the same level as example: "dist", not a directory within. You'll either want t

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
OK this is even more weird... everything is working much better except for one thing: I was testing use cases with our top query terms to make sure the below query settings wouldn't break any existing behavior, and got this most unusual result. The analyzer stack completely eliminated the word McA

Re: WhitespaceTokenizer and scoring(field length)

2011-04-26 Thread Otis Gospodnetic
Hi, If you run your query with debugQuery=true you will see the explanation about how Lucene/Solr went about scoring your 2 docs. If you can't figure out what's going on from there, send the relevant part to the list, along with the parsed query (which you can also see from debugQuery=true out

Re: What initialize new searcher?

2011-04-26 Thread Otis Gospodnetic
Hi, Yes, typically after your index has been replicated from master to a slave a commit will be issued and the new searcher will be opened. Before being exposed to regular clients it's a good practice to warm things up. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucen

Re: Automatic synonyms for multiple variations of a word

2011-04-26 Thread Mike Sokolov
Yes, I see. Makes sense. It is a bit hard to see a "bad" case for your proposal in that light. Here is one other example; I'm not sure whether it presents difficulties or not, and may be a bit contrived, but hey, food for thought at least: Say you have set up synonyms between names and commo

Re: Ebay Kleinanzeigen and Auto Suggest

2011-04-26 Thread Otis Gospodnetic
Hi Eric, Before using the terms component, allow me to point out: * http://sematext.com/products/autocomplete/index.html (used on http://search-lucene.com/ for example) * http://wiki.apache.org/solr/Suggester Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem

SynonymFilterFactory case changes

2011-04-26 Thread Robert Petersen
So if there is a hit in the synonym filter factory, do I need to put the various case changes for a term so that the following WordDelimiterFilter analyzer can do its 'split on case changes' work? Here we see SynonymFilterFactory makes all terms lowercase because this is what is in my synonyms.txt

Re: Question on Batch process

2011-04-26 Thread Otis Gospodnetic
Charlie, How's this:
* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you ran ramBufferSizeMB
* use SolrStreamingUpdateServer (with params matching your number of CPU cores) or send batches
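On the "send batches" option at the end of that list, a minimal sketch of splitting a document list into fixed-size batches is shown here. It only covers the chunking step; how each batch is posted to Solr is left out, and the helper name and batch size are assumptions rather than anything recommended in the thread.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical document ids standing in for real documents.
documents = list(range(5))
chunks = list(batches(documents, 2))
print(chunks)
```

Each chunk would then go out as one update request, with a single commit at the end rather than one per batch.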

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Otis Gospodnetic
Hi Robert, I'm no WDFF expert, but all these zeros look suspicious: org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0} A quick visit to http://wiki.apache.org/solr/AnalyzersTokeni

RE: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Robert Petersen
Yeah I am about to try turning one on at a time and see what happens. I had a meeting so couldn't do it yet... (darn those meetings) (lol) -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, April 26, 2011 2:37 PM To: solr-user@lucene.apache.or

Reader per query request

2011-04-26 Thread cyang2010
Hi, I was wondering if Solr opens a new Lucene IndexReader for every query request? From a performance point of view, is there any problem with opening a lot of IndexReaders concurrently, or should the application have some logic to reuse the same IndexReader? Thanks, cy -- View this message in

Field Length and Highlight

2011-04-26 Thread Alejandro Delgadillo
Hi, I've been using Solr with ColdFusion 9. I've made a couple of adjustments to it in order to fulfill the needs of my client. I'm using Solr as a document search engine for an online library which has documents larger than 20MB, and some of them have more than 20 pages. The thing is that... At firs

Re: SynonymFilterFactory case changes

2011-04-26 Thread Erick Erickson
Yes, order does matter. You're right, putting, say, lowercase in front of WordDelimiter... will mess up the operations of WDFF. The admin/analysis page is *extremely* useful for understanding what happens in the analysis of input. Make sure to check the "verbose" checkbox. Best Erick On Tue, Ap

Re: term position question from analyzer stack for WordDelimiterFilterFactory

2011-04-26 Thread Erick Erickson
I second Otis' comments. Is it possible that you've gotten twisted around by trying to modify these settings and would be better off going back to the WDFF settings in the example schema? I've sometimes found that to be very useful. Also (although I don't think it applies in this case) be aware th

Re: Reader per query request

2011-04-26 Thread Erick Erickson
See below On Tue, Apr 26, 2011 at 6:15 PM, cyang2010 wrote: > Hi, > > I was wondering if solr open a new lucene IndexReader for every query > request? > no, absolutely not. Solr only opens a reader when the underlying index has changed, say a commit or a replication happens. > From performance p

Re: Too many open files exception related to solrj getServer too often?

2011-04-26 Thread cyang2010
Just pushing up the topic and looking for answers.

Re: Reader per query request

2011-04-26 Thread cyang2010
Thanks a lot. That makes sense. -- CY

RE: SynonymFilterFactory case changes

2011-04-26 Thread Robert Petersen
But in this case lowercase is after WDF. The question is that when you get a hit in the SynonymFilter on a synonym and where the entries in synonyms.txt file are all in lower case do I need to add the case changing versions to make WDF work on case changes because it appears the synonym text i

Re: Field Length and Highlight

2011-04-26 Thread Koji Sekiguchi
(11/04/27 7:35), Alejandro Delgadillo wrote: Hi, I've been using solr with Coldfusion9, I've made a couple of adjustment to it in order to fulfill my needs of my client, I'm using solr as a document search engine for a online library which has documents larger then 20MB and some of them have mo

Re: Question on Batch process

2011-04-26 Thread Charles Wardell
Thank you Otis. Without trying to appear too stupid, when you refer to having the params matching your # of CPU cores, you are talking about the # of threads I can spawn with the StreamingUpdateSolrServer object? Up until now, I have been just utilizing post.sh or post.jar. Are these capable of t

Re: SynonymFilterFactory case changes

2011-04-26 Thread Erick Erickson
Ahhh, I misread your post. First, it's not the SynonymFilterFactory that's lowercasing anything. The ignoreCase="true" affects the matching, not the output. The output is probably lowercased because you have it that way in the synonyms.txt file. At least that's what I just saw using the analysis
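The distinction drawn here (case-insensitive matching, output taken verbatim from the synonyms file) can be illustrated with a toy model. This is a Python sketch of the behaviour as described in the message, not Solr's implementation; the rule table, the sample term, and the function name are hypothetical.

```python
def apply_synonyms(tokens, rules, ignore_case=True):
    """Replace each matching token with the rule's output, exactly as written in `rules`."""
    lookup = {(key.lower() if ignore_case else key): value
              for key, value in rules.items()}
    output = []
    for token in tokens:
        key = token.lower() if ignore_case else token
        # Unmatched tokens pass through with their original case intact.
        output.extend(lookup.get(key, [token]))
    return output


# Hypothetical rule, written in lower case as in the poster's file.
rules = {"macbook": ["macbook", "notebook"]}

matched = apply_synonyms(["MacBook", "Pro"], rules)
unmatched = apply_synonyms(["MacBook", "Pro"], rules, ignore_case=False)
print(matched)
print(unmatched)
```

In this model the mixed-case input only loses its capitals when a rule fires, which is why a later split-on-case-change step would see nothing to split for matched terms.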

Suggester or spellcheck return stored fields

2011-04-26 Thread wakemaster 39
Hello all, I am trying to build an autocomplete solution for a website that I run. The current implementation of it is going to be used on who you want to send PMs to. I have it basically working up to this point. The UI is done and the suggester is working in returning possible solutions withou

Re: How to Update Value of One Field of a Document in Index?

2011-04-26 Thread Peter Spam
My schema: id, name, checksum, body, notes, date I'd like for a user to be able to add notes to the notes field, and not have to re-index the document (since the body field may contain 100MB of text). Some ideas: 1) How about creating another core which only contains id, checksum, and notes?

Re: What initialize new searcher?

2011-04-26 Thread Solr Beginner
Thank you for the answers. I'm moving forward and have few more questions but for separate threads. On Tue, Apr 26, 2011 at 10:47 PM, Otis Gospodnetic wrote: > Hi, > > Yes, typically after your index has been replicated from master to a slave a > commit will be issued and the new searcher will be

fieldCache only on stats page

2011-04-26 Thread Solr Beginner
Hi, I can see only fieldCache (nothing about filter, query or document cache) on the stats page. What am I doing wrong? We have two servers with replication. There are two cores (prod, dev) on each server. Maybe I have to add something to solrconfig.xml of the cores? Best Regards, Solr Beginner

DataImportHandler in Solr 3.1.0: not updating dataimport.properties last_index_time on delta-import?

2011-04-26 Thread Scott Bigelow
Title pretty much says it all; I've configured the DIH in 3.1.0, and it works great, except the delta-imports are always from the last time a full-import happened, not a delta-import. After a delta-import, dataimport.properties is completely untouched. The documentation implies that the delta-impor

Re: Ebay Kleinanzeigen and Auto Suggest

2011-04-26 Thread Eric Grobler
Thanks for the links Otis, I will have a look. Regards Ericz On Tue, Apr 26, 2011 at 10:06 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hi Eric, > > Before using the terms component, allow me to point out: > > * http://sematext.com/products/autocomplete/index.html (used on > http