Re: SOLR memory usage jump in JVM

2012-09-18 Thread Bernd Fehling
Hi Otis, because I see this on my slave without replication, there is no index file change. I also have tons of logged data to dig into :-) I took dumps at different stages: freshly installed, after the 5GB jump, after the system was hanging right after replication, ... The last one was interesting when

Wildcard searches don't work

2012-09-18 Thread Alex Cougarman
Hi, We're having difficulty with some wildcard searches in Solr 4.0 Beta. We're using a copyField to copy a "tdate" into a "text_general" field, using the default definition for the "text_general" field type. Here's the sample data it holds: 2010-01-2

Re: SOLR memory usage jump in JVM

2012-09-18 Thread Bernd Fehling
Hi Lance, thanks for this hint. Something I also see: a sawtooth. This is coming from the Eden space together with Survivor 0 and 1. I should switch to a Java 7 release to get rid of this and see how heap usage looks there. Maybe something else is also fixed. Regards Bernd Am 19.09.2012 05:29, schri

Re: Compound File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
Thanks Lance. I did try going back to LogByteSizeMergePolicy, which we were using with 1.3, with useCompoundFile=true, but even then this leads to a non-compound index file format. There seems to be no config way to go back to a truly compound index file format. Regards Sujatha On Wed, Sep 19, 2012 at 9:33 AM

Re: SOLR memory usage jump in JVM

2012-09-18 Thread Otis Gospodnetic
Hi Bernd, On Tue, Sep 18, 2012 at 3:09 AM, Bernd Fehling wrote: > Hi Otis, > > not really a problem because I have plenty of memory ;-) > -Xmx25g -Xms25g -Xmn6g Good. > I'm just interested into this. > Can you report similar jumps within JVM with your monitoring at sematext? Yes. More importan

Re: Compound File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Lance Norskog
1) Use fewer segments. 2) Start the service with a higher limit on the number of open files. It used to be that the kernel allocated fixed resources for maximum number, but that is no longer true. This is not really an important limit. 3) That Lucene issue was closed in 3.1. This must be some ot

Re: SOLR memory usage jump in JVM

2012-09-18 Thread Lance Norskog
There is a known JVM garbage collection bug that causes this. It has to do with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage collection collides with this bug and the result is that old field cache data is retained after closing the index. The bug is more common with mo

Re: FilterCache Memory consumption high

2012-09-18 Thread Lance Norskog
The same answer as in another thread: There is a known JVM garbage collection bug that causes this. It has to do with reclaiming Weak references, I think in WeakHashMap. Concurrent garbage collection collides with this bug and the result is that old field cache data is retained after closing th

Re: Compound File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
anybody? On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun wrote: > Hi , > > The default Index file creation format in 3.6.1 [migrating from 1.3] > in-spite of setting the usecompoundfile to true seems to be to create non > compound files due to Lucene > 2790

Re: File content indexing

2012-09-18 Thread Erik Hatcher
Solr Cell can already do this. See the stream.file parameter and content steam info on the wiki. Erik On Sep 18, 2012, at 19:56, "Zhang, Lisheng" wrote: > Hi, > > Sorry I just sent out an unfinished message! > > Reading Solr cell, we indexing a file by first upload it through HTTP to
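Erik's suggestion can be sketched as follows. This is a minimal Python sketch, not code from the thread: it assumes a Solr instance at localhost:8983 with the Solr Cell handler at its default /update/extract path; the file path and document id are placeholders. The point is that `stream.file` hands Solr a local path to read itself, so the file body never travels over HTTP.

```python
from urllib.parse import urlencode

def build_extract_url(base_url, file_path, doc_id):
    """Build a Solr Cell URL that tells Solr to read a local file itself,
    instead of the client uploading the file body over HTTP."""
    params = urlencode({
        "stream.file": file_path,  # path readable by the Solr server process
        "literal.id": doc_id,      # supply the unique key as a literal field
        "commit": "true",
    })
    return "%s/update/extract?%s" % (base_url, params)

# Hypothetical path and id, for illustration only
url = build_extract_url("http://localhost:8983/solr", "/data/docs/report.pdf", "doc1")
print(url)
```

The request itself can then be issued with any HTTP client; only the short URL crosses the wire.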

File content indexing

2012-09-18 Thread Zhang, Lisheng
Hi, Sorry I just sent out an unfinished message! Reading about Solr Cell, we index a file by first uploading it through HTTP to Solr; in my experience it is rather expensive to pass a big file through HTTP. If the file is local, maybe the better way is to pass the file path to Solr so that Solr can

File content indexing

2012-09-18 Thread Zhang, Lisheng
Hi, Reading about Solr Cell, we index a file by first uploading it through HTTP to Solr; in my experience it is rather expensive to pass a big file through HTTP. If the file is local, maybe the better way is to pass the file path to Solr so that Solr can use the java.io API to get the file content; maybe this

A strange Solr NullPointerException while shutting down Tomcat, possible connection to messed-up index files

2012-09-18 Thread Bryan Loofbourrow
I’m using Solr/Lucene 3.6 under Tomcat 6. When shutting down an indexing server after much indexing activity, occasionally, I see the following NullPointerException trace from Tomcat: INFO: Stopping Coyote HTTP/1.1 on http-1800 Exception in thread "Lucene Merge Thread #1" org.apache.lucene.i

Re: Solr4 how to make it do this?

2012-09-18 Thread george123
Thanks everyone for the fast response. Pre processing it is. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-how-to-make-it-do-this-tp4008574p4008774.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr4 how to make it do this?

2012-09-18 Thread Jack Krupansky
For the future, it would certainly be worth considering some kind of generic mechanism for "query preprocessor" so that one could do a pseudo-synonym expansion before the query text gets sent to the query parser. As it is, you must do any such preprocessing yourself before sending the final quer

Re: Duplicate in copyField

2012-09-18 Thread Jack Krupansky
See SOLR-3814, which says that this issue is fixed by SOLR-3743, which will be in 4.0. See: https://issues.apache.org/jira/browse/SOLR-3814 -- Jack Krupansky -Original Message- From: Jonatan Fournier Sent: Tuesday, September 18, 2012 1:35 PM To: solr-user@lucene.apache.org Subject: D

Re: Duplicate in copyField

2012-09-18 Thread Jonatan Fournier
I didn't realize that copyField targets are implemented via multiValued fields; I thought they were flat fields. What I was trying to do was to have one common field between two different schemas, so that my GUI could use both index sources for listing by title... I guess I will populate this field manually from my

Duplicate in copyField

2012-09-18 Thread Jonatan Fournier
Hi, I have something strange happening (4.0-BETA), I have a title field: And a copyField: Note that I don't have multivalue set for the title field, but I do end up with multiple value in my field: { "responseHeader":{ "status":0, "QTime":371, "params":{ "indent":"true",

Compound File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
Hi, The default index file creation format in 3.6.1 [migrating from 1.3], in spite of setting useCompoundFile to true, seems to be to create non-compound files due to LUCENE-2790. I have tried the following, but everything seems to create no

RE: broken links in solr wiki

2012-09-18 Thread Petersen, Robert
OK I made a login and corrected the links. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, September 17, 2012 5:07 PM To: solr-user@lucene.apache.org Subject: Re: broken links in solr wiki Hi Robert, Anyone can edit wiki, you just need to create user. Reg

"Intersects" spatial query returns polygons it shouldn't

2012-09-18 Thread solr-user
I have noticed that the Lucene spatial toolkit (aka LSP aka spatial4j) returns docs it shouldn't when I execute a polygon "intersects" query. I am wondering if this is a bug in how the spatial code works or something that I am doing wrong. Here is my test scenario: 1. my schema: where "geohas

RE: poor language detection

2012-09-18 Thread Markus Jelsma
Hi, You should avoid using Tika's language detector, it supports only about 15 languages. Use the LangDetect library instead, it detects more languages by default and has higher accuracy. For both detectors you can create custom (better) profiles. Cheers -Original message- > From:
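The setup Markus describes can be wired in through Solr's language identification update processor. A minimal solrconfig.xml sketch, assuming Solr 3.5+ with the langdetect contrib jars on the classpath; the field names `text` and `language` are placeholders, not names from the thread:

```xml
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- field(s) whose content is sampled for detection (placeholder name) -->
    <str name="langid.fl">text</str>
    <!-- field that receives the detected language code (placeholder name) -->
    <str name="langid.langField">language</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then needs to be referenced from the update handler's `update.chain` parameter for it to run on indexed documents.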

poor language detection

2012-09-18 Thread tomtom
Hi, I've got a problem with language detection. There are about 120 documents in different languages to import, mostly Chinese, English, German and others. English and German are classified quite well, but Chinese, Japanese and others stray into a field 'fieldname_lt', the Lithuanian language field. A

Re: Installing Tomcat as the user solr?

2012-09-18 Thread Michael Della Bitta
Yeah, that link appears to be broken for me too, like the person editing the page wrote the link and never provided the file. I don't believe there's anything Solr-specific about that file, so I would just google for directions for installing Tomcat on your platform and follow those. I know for U

Re: Personalized Boosting

2012-09-18 Thread Tom Mortimer
Hi, Would this do the job? http://wiki.apache.org/solr/QueryElevationComponent Tom On 18 Sep 2012, at 01:36, deniz wrote: > Hello All, > > I have a requirement or a pre-requirement for our search application. > Basically the engine will be on a website with plenty of users and more than > 2

Re: SOLR memory usage jump in JVM

2012-09-18 Thread Yonik Seeley
On Tue, Sep 18, 2012 at 7:45 AM, Bernd Fehling wrote: > I used GC in different situations and tried back and forth. > Yes, it reduces the used heap memory, but not by 5GB. > Even so that GC from jconsole (or jvisualvm) is "Full GC". Whatever "Full GC" means ;-) In the past at least, I've found th

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Jack Krupansky
The copyField is providing the "source" or "input" for your truncated_description field and is limiting (truncating) that text to 168 characters. In short, a much longer input can come in, all of the keywords get indexed, but only the truncated portion of the source text is "stored". -- Jack
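A minimal schema.xml sketch of the arrangement Jack describes, with assumed field names from the thread (description, truncated_description) and the 168-character limit; `maxChars` on copyField caps how much of the source text reaches the destination:

```xml
<!-- full text: indexed for search, not stored -->
<field name="description" type="text_general" indexed="true" stored="false"/>
<!-- truncated copy: stored so it can be displayed in results -->
<field name="truncated_description" type="text_general" indexed="false" stored="true"/>
<!-- copy at most 168 characters of the source into the destination -->
<copyField source="description" dest="truncated_description" maxChars="168"/>
```

With this layout the full text remains searchable while only the short snippet is kept for display.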

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Spadez
Ok, thank you for the reply. I have one more question, then I think everything is cleared up. If I have this code: The truncated_description is the one I need to display in search results. If I set this to stored=true as above (so it can be displayed in results), does it mean that I am storing t

Re: solr user group

2012-09-18 Thread Jack Krupansky
Did you send them from the exact same email address as the original subscriptions? Did you follow all of the suggestions listed at the "Problems?" link on the discussions page? See: https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists -- Jack Krupansky -Original Message--

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Jack Krupansky
To help unravel your confusion, to put it simply, there are three distinct "values" for any field: 1. The "source" or "input" value - what you place in SolrXML or the raw value you "add" to a SolrInputDocument. 2. The "indexed" value that can be queried against. The source value is (optionally

Re: Taking a full text, then truncate and duplicate with stopwords

2012-09-18 Thread Ahmet Arslan
Hi James, > In order to do the copyField > technique, I need to store the original full text document > within Solr, like > this: > > stored="false"> > indexed="true" > stored="true*"> No, that's not true. You can use copyField with stored="false". In other words, the source field (keyword_d

Re: Taking a full text, then truncate and duplicate with stopwords

No, you do not have to store anything for copyField to work. You're overthinking the problem. Way up top, when the original data comes in to a field (indexed or not, stored or not), the schema is scanned for any copyField declarations that use the field as a source. Then the whole input is sent to both fields

solr user group

sorry for the broadcast, but the solr list server is just not taking the hint yet, I have issued the following commands on the following dates: Sent Mon 08/27/2012 10:37 PM to 'solr-user-unsubscr...@lucene.apache.org' subject = unsubscribe Sent Mon 07/16/2012 6:53 AM to 'solr-user-unsubscr...@

Re: Solr4 how to make it do this?

Extracting meaning from user input is notoriously difficult, it's such a short bit of text. The whole notion of applying some intelligence to parsing the user input and trying to "do the right thing" is, indeed, a common request. The problem is that "the right thing" varies so wildly from problem

Re: Best way to index Solr XML from w/in the same servlet container

Hi Jay, I would like to see the ZooKeeper Watcher as part of DIH in Solr. Possibly you could extend org.apache.solr.handler.dataimport.DataSource. If you want to call Solr without HTTP you can use SolrJ: org.apache.solr.client.solrj.embedded.EmbeddedSolrServer Best regards Karsten

Re: SOLR memory usage jump in JVM

I used GC in different situations and tried back and forth. Yes, it reduces the used heap memory, but not by 5GB, even though the GC from jconsole (or jvisualvm) is a "Full GC". But while you bring GC into this, there is another interesting thing: I have one slave running for a week which ends up aro

Re: Solr4 how to make it do this?

Thanks. Yeah, I was hoping someone might have a solution. Seems to me a potentially common scenario (a search term being/stemming to an actual field). I did think I might have to filter before passing to Solr, but that's the worst-case scenario for me. -- View this message in context: http://lucene.4

Re: Taking a full text, then truncate and duplicate with stopwords

Ok, I've been doing a bit more research. In order to do the copyField technique, I need to store the original full text document within Solr, like this: true*"> What about instead if I imported the same full text into two separate fields for Solr from my Python script: truncated_description=post.d

Re: SOLR memory usage jump in JVM

What happens if you attach jconsole (it should ship with your JDK) and force a GC? Does the extra 5G go away? I'm wondering if you got a couple of warming searchers going simultaneously and happened to measure after that. Uwe has an interesting blog about memory; he recommends using as little as pos

Re: Solr4 how to make it do this?

Hi George, I don't think this will work. The synonyms will be added after the query is parsed, so you'll have terms like "bed:3" rather than matching "3" against the bed field. If I was implementing this I'd try doing some pattern matching before passing the query to Solr, e.g.: "3 bed
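Tom's pre-parsing idea could look something like this minimal Python sketch; the field name `bed` and the exact pattern are illustrative assumptions, not code from the thread:

```python
import re

# Rewrite "<number> bed(s)" in free text into a fielded Solr clause
# "bed:<number>" before the query string is sent to Solr.
BED_PATTERN = re.compile(r"\b(\d{1,2})\s+beds?\b", re.IGNORECASE)

def preprocess_query(user_query):
    """Turn '3 bed house sydney' into 'bed:3 house sydney'."""
    return BED_PATTERN.sub(lambda m: "bed:%s" % m.group(1), user_query)

print(preprocess_query("3 bed house sydney"))  # bed:3 house sydney
```

Doing this rewriting client-side, before query parsing, sidesteps the problem Tom notes with synonym expansion happening after the parser has already split the terms.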

JUnit tests add entrys to data/solr/folder

Is it possible to distinguish between JUnit and manual tests? I would like to create a new folder for JUnit tests... so that I have 2 different folders for my test entries. e.g.) - manual tests go to /data/solr/person/index - JUnit tests should go to /data/solr/person-junit/index the folder is co

Re: Solr4 how to make it do this?

I guess I could come up with a synonyms.txt file, and for every instance of "3 bed" I change it to "bed:3"; it "should" work. e.g. 3 bed => bed:3 Not exactly a synonym, or what it was designed for, but it might work? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-how-to-make-it-

Re: 3.6.1 unable to create compound Index files?

Hi, Just discovered that this seems to depend on noCFSRatio [LUCENE-2790]. If I need to make the compound file format the default, where should I change it? Can this be changed only at code level, or is there any config setting which allows me to specify that it should always be comp

Solr4 how to make it do this?

Hi all, Simple scenario, but I don't think there's a simple solution, for a real estate website. I have an example schema; field values in "bed" are numbers 0-20. In my website search box (simple HTML text input) I have a scenario where, in the keyword input box, people may type in a natural search similar

Re: SOLR memory usage jump in JVM

Hi Otis, not really a problem because I have plenty of memory ;-) -Xmx25g -Xms25g -Xmn6g I'm just interested in this. Can you report similar jumps within the JVM with your monitoring at Sematext? Actually I would expect to see jumps of 0.5GB or even 1GB, but 5GB? And what is the cause, a cache? A