RE: Userdefined Field type - Faceting

2010-12-14 Thread Viswa S
This worked, thanks Yonik. -Viswa > Date: Mon, 13 Dec 2010 22:54:35 -0500 > Subject: Re: Userdefined Field type - Faceting > From: yo...@lucidimagination.com > To: solr-user@lucene.apache.org > > Perhaps try overriding indexedToReadable() also? > > -Yonik > http://www.lucidimagination.com > >

Re: facet.pivot for date fields

2010-12-14 Thread pankaj bhatt
Hi Adeel, You can make use of the facet.query parameter to make faceting work across a range of dates. Here I am using duration; just replace the field with a date field and the range values with dates in Solr format, so your query parameters will look like this (you can pass multiple parameters
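
For illustration, a set of facet.query parameters appended to the search URL can bucket a date field into ranges; the field name pub_date is hypothetical and the spaces inside the brackets must be URL-encoded:

&facet=true
&facet.query=pub_date:[NOW/DAY-7DAYS TO NOW]
&facet.query=pub_date:[NOW/DAY-1MONTH TO NOW/DAY-7DAYS]
&facet.query=pub_date:[* TO NOW/DAY-1MONTH]

Each facet.query comes back in the response with its own count, so arbitrary date buckets can be faceted without facet.pivot support.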

Re: Google like search

2010-12-14 Thread Bhavnik Gajjar
Hi Satya, Coming to your original question, there is one possibility to make Solr emit snippets like Google. The Solr query would look like http://localhost:8080/solr/DefaultInstance/select/?q=java&version=2.2&start=0&rows=10&indent=on&hl=true&hl.snippets=5&hl.fl=Field_Text&fl=Field_Text Note that

Re: Search with facet.pivot

2010-12-14 Thread Anders Dam
I forgot to mention that the query is handled by the Dismax Request Handler. Grant, from the tag down you see all the query parameters used. The only thing varying from query to query is the actual query (q). When searching, for example, on '1000' (q=1000), the facet.pivot fields are correctly returned

Re: limit the search results to one category

2010-12-14 Thread sara motahari
I guess so. I didn't know I could use it with dismax; I'll try. Thanks Ahmet. From: Ahmet Arslan To: solr-user@lucene.apache.org Sent: Tue, December 14, 2010 5:42:51 PM Subject: Re: limit the search results to one category > I am using a dismax request handler

Re: limit the search results to one category

2010-12-14 Thread Ahmet Arslan
> I am using a dismax request handler with various fields that > it searches, but I > also want to enable the users to select a category from a > drop-down list > and only get the results that belong to that category. It > seems I can't use a > nested query with dismax as the first one and standard
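
A common way to handle this is to leave dismax in charge of the user's keywords and add the category restriction as a filter query, which works alongside dismax. A minimal sketch, in which the category field name, qf fields, and URL are assumptions:

http://localhost:8983/solr/select?defType=dismax&qf=title+description&q=user+terms&fq=category:books

The fq parameter restricts results to the selected category without affecting dismax scoring, and the filter is cached separately in the filterCache.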

limit the search results to one category

2010-12-14 Thread sara motahari
Hi all, I am using a dismax request handler with various fields that it searches, but I also want to enable the users to select a category from a drop-down list and only get the results that belong to that category. It seems I can't use a nested query with dismax as the first one and standard as

Re: Newbie: Indexing unrelated MySQL tables

2010-12-14 Thread Alexey Serba
> I figured I would create three entities and relevant > schema.xml entries in this way: > > dataimport.xml: > > > That's correct. You can list several entities under the document element. You can index them separately using the entity parameter (i.e. add entity=Users to your full-import HTTP request). D
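
A data-config.xml with several root entities under one document element might look like the following sketch (driver, connection details, and table/column names are placeholders):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <document>
    <entity name="Users" query="SELECT user_id AS id, name FROM users"/>
    <entity name="Products" query="SELECT product_id AS id, title FROM products"/>
  </document>
</dataConfig>

A single entity can then be indexed on its own with, e.g., http://localhost:8983/solr/dataimport?command=full-import&entity=Users (assuming the handler is registered at /dataimport).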

Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey
On 12/14/2010 5:05 PM, Mark wrote: Excellent reply. You mentioned: "I've been experimenting with FastLRUCache versus LRUCache, because I read that below a certain hitratio, the latter is better." Do you happen to remember what that threshold is? Thanks Looks like it's 75%, and that it's do
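
For reference, the cache implementation is chosen per cache in solrconfig.xml, so the two classes can be compared side by side on the same index; the sizes and autowarm counts below are purely illustrative:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

FastLRUCache trades more expensive puts for cheaper gets, which is why it only pays off above a fairly high hit ratio.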

Re: Need some guidance on solr-config settings

2010-12-14 Thread Mark
Excellent reply. You mentioned: "I've been experimenting with FastLRUCache versus LRUCache, because I read that below a certain hitratio, the latter is better." Do you happen to remember what that threshold is? Thanks On 12/14/10 7:59 AM, Shawn Heisey wrote: On 12/14/2010 8:31 AM, Mark wrot

Re: my index has 500 million docs, how to improve Solr search performance?

2010-12-14 Thread Alexey Serba
How much memory do you allocate for the JVMs? Considering you have 10 JVMs per server (10*N), you might not have enough memory left for the OS file system cache (you need to keep some memory free for that). > all indexs size is about 100G Is this per server or the whole size? On Mon, Nov 15, 2010 at 8:35 AM, lu.

Re: Syncing 'delta-import' with 'select' query

2010-12-14 Thread Alexey Serba
What Solr version do you use? It seems that the sync flag has been added to the 3.1 and 4.0 (trunk) branches and not to 1.4: https://issues.apache.org/jira/browse/SOLR-1721 On Wed, Dec 8, 2010 at 11:21 PM, Juan Manuel Alvarez wrote: > Hello everyone! > I have been doing some tests, but it seems I can't m

Re: Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Robert Muir
On Tue, Dec 14, 2010 at 5:51 PM, Jan Høydahl / Cominvent wrote: > Hi, > > A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5 Just as an FYI, this isn't true in trunk (4.0) any more. The defaults are changed so that it never enumerates the entire dictionary (slow) like before

Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Jan Høydahl / Cominvent
Hi, A fuzzy query foo~ defaults to a similarity of 0.5, i.e. it is equal to foo~0.5. I want to set the default to 0.8 so that if a user enters the query foo~ it equals foo~0.8. I have not seen a way to do this in Solr. A param &fuzzy.minSim=0.8 would do the trick. Anything like this, or shall I open

Re: [DIH] Example for SQL Server

2010-12-14 Thread Erick Erickson
The config isn't really any different for various SQL instances; about the only difference is the driver. Have you seen the example in the distribution, somewhere like /example/example-DIH/solr/db/conf/db-data-config.xml? Also, there's a magic URL for debugging DIH at: .../solr/admin/dataimport.jsp
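
As a sketch, a db-data-config.xml for SQL Server differs from the bundled example mainly in the driver and URL; the database, table, and column names below are placeholders:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, name, description FROM items"/>
  </document>
</dataConfig>

The Microsoft JDBC driver jar (sqljdbc4.jar, or whichever version matches your JVM) goes into the Solr core's lib directory so DIH can load it.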

[DIH] Example for SQL Server

2010-12-14 Thread Adam Estrada
Does anyone have an example config.xml file I can take a look at for SQL Server? I need to index a lot of data from a DB and can't seem to figure out the right syntax, so any help would be greatly appreciated. What is the correct jar file to use, and where do I put it in order for it to work? Thank

Re: changing data type

2010-12-14 Thread Wodek Siebor
The DIH statement works fine if I run it directly in SQL Developer. It's something like: decode(<column>, 0, 'string_1', 1, 'string_2'). The <column> is of type int, and in the schema.xml, since the decode output is a string, the corresponding indexed field is of type string. Is there a problem declaring a field in schema.xml
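
As a sketch of the pattern being described (column, alias, and field names are hypothetical), the decoded value can be aliased in the DIH entity query and mapped to a string field in schema.xml:

<entity name="item"
        query="SELECT id, decode(status, 0, 'string_1', 1, 'string_2') AS status_label FROM items"/>

<field name="status_label" type="string" indexed="true" stored="true"/>

Mixing an int source column with a string Solr field is fine as long as the value actually handed to Solr is the decoded string; the exception trace should show which field and value the mismatch occurs on.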

Re: changing data type

2010-12-14 Thread Erick Erickson
You haven't given us much to go on. Please post: 1> your DIH statement 2> your schema file, particularly the <field> and <fieldType> definitions in question 3> the exception trace. 4> Anything else that comes to mind. Remember we know nothing about your particular setup... Best Erick On Tue, Dec 14, 2010 at 3:17 PM, Wodek S

facet.pivot for date fields

2010-12-14 Thread Adeel Qureshi
It doesn't seem like pivot faceting works on dates. I was just curious if that's how it's supposed to be or if I am doing something wrong. If I include a date field in the pivot list, I simply don't get any facet results back for that date field. Thanks Adeel

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
Thanks Shawn, that helps explain things. So the issue there, with using maxWarmingSearchers to try and prevent out-of-control RAM/CPU usage from overlapping on-deck searchers, combined with replication... is if you're still pulling down replications very frequently but using maxWarmingSearchers to prevent ov

changing data type

2010-12-14 Thread Wodek Siebor
Using DataImportHandler. In the select statement I use the Oracle decode() function. As a result I have to change the indexed field from int to string. However, during the load Solr throws an exception. Any experience with that? Thanks

Re: RAM usage issues

2010-12-14 Thread Shawn Heisey
On 12/13/2010 9:46 PM, Cameron Hurst wrote: When I start the server I am using about 90MB of RAM, which is fine, and from the Google searches I found that is normal. The issue comes when I start indexing data. In my solrconfig.xml file my maximum RAM buffer is 32MB. In my mind that means that the

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Shawn Heisey
On 12/14/2010 9:02 AM, Jonathan Rochkind wrote: 1. Will the existing index searcher have problems because the files have been changed out from under it? 2. Will a future replication -- at which NO new files are available on master -- still trigger a future commit on slave? I'm not really sur

Re: Very high load

2010-12-14 Thread Shawn Heisey
On 12/13/2010 9:15 PM, Mark wrote: No cache warming queries, and our machines have 8GB of memory in them with about 5120MB of RAM dedicated to Solr. When our index is around 10-11GB in size everything runs smoothly. At around 20GB+ it just falls apart. I just replied to your new email thread, c

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey
On 12/14/2010 9:13 AM, Tim Heckman wrote: Once per day in the morning, I run a full index + optimize into an "on deck" core. When this is complete, I swap the "on deck" with the live core. A side-effect of this is that the version number / generation of the live index just went backwards, since t

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
On Tue, Dec 14, 2010 at 10:37 AM, Shawn Heisey wrote: > It's supposed to take care of removing the old indexes on its own - when > everything is working, it builds an index.<timestamp> directory, replicates, > swaps that directory in to replace index, and deletes the directory with the > timestamp. I have n

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
Yeah, I understand basically how caches work. What I don't understand is what happens in replication if the new segment files are successfully copied but the actual commit fails due to maxWarmingSearchers. The new files are on disk... but the commit could not succeed and there is NOT a new

Re: Need some guidance on solr-config settings

2010-12-14 Thread Shawn Heisey
On 12/14/2010 8:31 AM, Mark wrote: Can anyone offer some advice on what some good settings would be for an index of around 6 million documents totaling around 20-25GB? It seems like when our index gets to this size our CPU load spikes tremendously. If you are adding, deleting, or updating documents

Re: Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Shawn Heisey
On 12/14/2010 8:31 AM, Tim Heckman wrote: When using the index replication over HTTP that was introduced in Solr 1.4, what is the recommended way to periodically clean up old indexes on the slaves? I found references to the snapcleaner script, but that seems to be for the older ssh/rsync replica

Need some guidance on solr-config settings

2010-12-14 Thread Mark
Can anyone offer some advice on what some good settings would be for an index of around 6 million documents totaling around 20-25GB? It seems like when our index gets to this size our CPU load spikes tremendously. What would be some appropriate settings for ramBufferSize and mergeFactor? We cu
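
For reference, both settings live in the <indexDefaults> (and <mainIndex>) section of solrconfig.xml; the values below are only illustrative placeholders, not recommendations for this index size:

<indexDefaults>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>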

Solr 1.4 replication, cleaning up old indexes

2010-12-14 Thread Tim Heckman
When using the index replication over HTTP that was introduced in Solr 1.4, what is the recommended way to periodically clean up old indexes on the slaves? I found references to the snapcleaner script, but that seems to be for the older ssh/rsync replication model. thanks, Tim

Re: Google like search

2010-12-14 Thread Tanguy Moal
To do so, you have several possibilities; I don't know if there is a best one. It depends pretty much on the format of the input file(s), your affinity with a given programming language, some libraries you might need, and the time you're ready to spend on this task. Consider having a look at SolrJ

Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy, Thanks for your reply. Sorry to ask this type of question: how can we index each chapter of a file as a separate document? As far as I know, we just give the path of the file to Solr to index it... Can you point me to any sources for this, I mean any blogs or wikis? Regards,

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Upayavira
A Lucene index is made up of segments. Each commit writes a segment. Sometimes, upon commit, some segments are merged together into one, to reduce the overall segment count, as too many segments hinders performance. Upon optimisation, all segments are (typically) merged into a single segment. Repl

Re: Google like search

2010-12-14 Thread Tanguy Moal
Satya, In fact the highlighter will select the relevant part of the whole text and return it with the matched terms highlighted. If you do so for a whole book, you will face the issue spotted by Dave (the text is too long). To address that issue, you can split your book into chapters,
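
A sketch of what posting chapters as separate documents could look like with the plain XML update format (the field names here are hypothetical, apart from Field_Text mentioned earlier in the thread):

<add>
  <doc>
    <field name="id">book42-ch01</field>
    <field name="chapter_title">Chapter One</field>
    <field name="Field_Text">full text of chapter one ...</field>
  </doc>
  <doc>
    <field name="id">book42-ch02</field>
    <field name="chapter_title">Chapter Two</field>
    <field name="Field_Text">full text of chapter two ...</field>
  </doc>
</add>

POSTed to http://localhost:8080/solr/update (followed by a commit), each chapter then gets its own highlighting snippets and its own hit in the result list.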

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
But the entirety of the old indexes (no longer on disk) wasn't cached in memory, right? Or is it? Maybe this is me not understanding Lucene enough. I thought that portions of the index were cached in memory, but that sometimes the index reader still has to go to disk to get things that aren't cu

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Grant Ingersoll
For this functionality, you are probably better off using trunk or branch_3x. There are quite a few patches related to that particular one that you will need to apply in order to have it work correctly. On Dec 13, 2010, at 10:06 PM, Adam Estrada wrote: > All, > > Can anyone shed some light o

Re: Search with facet.pivot

2010-12-14 Thread Grant Ingersoll
The formatting of your message is a bit hard to read. Could you please clarify which commands worked and which ones didn't? Since the pivot stuff is relatively new, there could very well be a bug, so if you can give a simple test case that shows what is going on that would also be helpful, alb

Re: Solr Tika, Text with style

2010-12-14 Thread Grant Ingersoll
To do that, you need to keep the original content and store it in a field. On Dec 11, 2010, at 10:56 AM, ali fathieh wrote: > Hello, > I've seen this link: > http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika > What I got is pure text without any s

Re: RAM usage issues

2010-12-14 Thread Erick Erickson
Several observations: 1> If by RAM buffer size you're referring to the value in solrconfig.xml, <ramBufferSizeMB>, that is a limit on the size of the internal buffer while indexing. When that limit is reached the data is flushed to disk. It is irrelevant to searching. 2> When you run searches, various inter

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Anyway, try putting the jar in work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/ On Tuesday 14 December 2010 11:10:47 Markus Jelsma wrote: > Where did you put the jar? > > > All, > > > > Can anyone shed some light on this error. I can't seem to get this > > class to load. I a

Re: De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Markus Jelsma
Check this setting: <bool name="overwriteDupes">false</bool> On Tuesday 14 December 2010 14:26:21 Jason Brown wrote: > I have configured de-duplication according to the Wiki.. > > My signature field is defined thus... > > multiValued="false" /> > > and my updateRequestProcessor as follows > > > class="

De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Jason Brown
I have configured de-duplication according to the Wiki.. My signature field is defined thus... and my updateRequestProcessor as follows true false signature content org.apache.solr.update.processor.Lookup3Signature I am using
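
The stripped values above (true, false, signature, content, Lookup3Signature) match the parameters of the standard SignatureUpdateProcessorFactory, so the chain presumably looks something like this sketch:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

With overwriteDupes set to false the signature is only computed and stored, so identical documents still enter the index; setting it to true makes a new document replace existing ones with the same signature.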

Re: Query-Expansion, copyFields, flexibility and size of Index (Solr-3.1-SNAPSHOT)

2010-12-14 Thread mdz-munich
Okay, I'll start guessing: - Do we have to write a customized QueryParserPlugin? - At which point does the RequestHandler/QueryParser/whatever decide which query analyzer to use? 10% for every copied field is a lot for us; we're facing terabytes of digitized book data, so we want to keep the index

RE: Google like search

2010-12-14 Thread Dave Searle
Highlighting is exactly what you need, although if you highlight the whole book, this could slow down your queries. Index/store the first 5000-1 characters and see how you get on -Original Message- From: satya swaroop [mailto:satya.yada...@gmail.com] Sent: 14 December 2010 10:08 To:

Re: Is there a way to view the values of "stored=false" fields in search results?

2010-12-14 Thread Ahmet Arslan
> But now I have a situation where I need to debug to see the > value of these fields. > So is there a way to see the value of stored=false fields? You cannot see the original values. But you can see what is indexed. http://www.getopt.org/luke/ can display it.

Is there a way to view the values of "stored=false" fields in search results?

2010-12-14 Thread Swapnonil Mukherjee
Hi All, I have set up certain fields to be indexed=true and stored=false. According to the documentation, fields marked as stored=false do not appear in search results, which is perfectly OK. But now I have a situation where I need to debug to see the value of these fields. So is there a way to

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Where did you put the jar? > All, > > Can anyone shed some light on this error. I can't seem to get this > class to load. I am using the distribution of Solr from Lucid > Imagination and the Spatial Plugin from here > https://issues.apache.org/jira/browse/SOLR-773. I don't know how to > apply a p

Re: Google like search

2010-12-14 Thread satya swaroop
Hi Tanguy, I am not asking for highlighting. I think it can be explained with an example, so here I illustrate it: when I post a query like this: http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on I would get the result as follows:

Re: Google like search

2010-12-14 Thread Tanguy Moal
Hi Satya, I think what you're looking for is called "highlighting", in the sense of highlighting the query terms in their matching context. You could start by googling "solr highlight"; surely the first results will make sense. Solr's wiki results are usually a good entry point: http://wiki.apa

Google like search

2010-12-14 Thread satya swaroop
Hi All, Can we get results like Google's, with some context about the match? I was able to get the first 300 characters of a file, but that is not helpful for me. Can I get the text around the first occurrence of the search key in that file? Regards, Satya

Re: Query performance very slow even after autowarming

2010-12-14 Thread johnnyisrael
Hi Chris, Thanks for looking into it. Here is the sample query: http://localhost:8080/solr/core0/select/?qt=autosuggest&q=a I am using a request handler named autosuggest with the following configuration. json name,score scor
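
The stripped defaults at the end (json, name,score, scor...) suggest a handler definition roughly along these lines; this is a guess at its shape, not the actual configuration:

<requestHandler name="autosuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="fl">name,score</str>
    <str name="sort">score desc</str>
  </lst>
</requestHandler>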

Re: Solr Memory Usage

2010-12-14 Thread Toke Eskildsen
On Tue, 2010-12-14 at 06:07 +0100, Cameron Hurst wrote: [Cameron expected 150MB overhead] > As I start to index data and pass queries to the database, I notice a steady rise in RAM but it doesn't stop at 150MB. If I continue to reindex the exact same data set with no additional data ent