Re: Testing Solr

2010-12-16 Thread Dennis Gearon
There are websites with data sets out there. 'Data sets' may not be the right search terms, but it's something like that. Beyond that, I couldn't guess exactly what you want. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Dennis Gearon
BTW, what is a Delta (in this context, not an equipment line or a rocket, please :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself.

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
We now face the same situation and want to implement it like this: we add new documents to a RAMDirectory and search two indices, the index on disk and the RAM index. Regularly (e.g. every hour) we flush the RAMDirectory to disk and make a new segment to prevent errors. Before adding to the RAMDirectory, we w

Testing Solr

2010-12-16 Thread satya swaroop
Hi All, I built Solr successfully and I am planning to test it with nearly 300 PDF files, 300 docs, 300 Excel files, and so on, roughly 300 files of each type. Is there any dummy data available for testing Solr? Otherwise I need to download each and every file individually. An

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
I think it will not, because the default configuration allows only 2 newSearcher threads, but the delay will grow longer and longer. The newer newSearcher will wait for the 2 earlier ones to finish. 2010/12/1 Jonathan Rochkind : > If your index warmings take longer than two minutes, but you're doing a > co

A schema inside a Solr Schema (Schema in a can)

2010-12-16 Thread Dennis Gearon
Is it possible to put name/value pairs of any type in a native Solr index field type? Like JSON/XML/YML? The reason that I ask, since you asked, is that I want my main index schema to be a base object, and another multivalued column to hold the attributes of descendants inherited from the base object. Is ther

Solr (and maybe Java?) version numbering systems

2010-12-16 Thread Dennis Gearon
I've inferred from a bunch of posts that Solr 1.4 is actually the upcoming 4.x release? And the numbering systems on other Java products don't seem to match what's really out there, i.e. Eclipse and Sun Java. So what IS the Solr version numbering system? Can anyone give a (maybe possible) chrono

Re: Got error when range query and highlight

2010-12-16 Thread Ahmet Arslan
> > Adding &hl.requireFieldMatch=true should probably > solve your problem. > > > Yes, adding &hl.requireFieldMatch=true can solve my > problem, but in my > solution , I have a "content" field indexing all fields' > contents to > support full text search, but I also have another 2 fields > "title"

Re: Got error when range query and highlight

2010-12-16 Thread Qi Ouyang
Thank you for the reply. > Are you using hl.highlightMultiTerm=true? Pasting your search URL can give > more hints. > Yes, I used "hl.highlightMultiTerm=true"; my search query is as follows: start=0&rows=10&facet.mincount=1&facet.field=authornav&facet.field=contentsearchkeywordnav&facet.field=

Re: Got error when range query and highlight

2010-12-16 Thread Ahmet Arslan
> I got an error as follows when I do a range query search > ([1 TO *]) > on a numeric field and highlight is set on another text > field. Are you using hl.highlightMultiTerm=true? Pasting your search URL can give more hints. Adding &hl.requireFieldMatch=true should probably solve your problem.

Got error when range query and highlight

2010-12-16 Thread Qi Ouyang
Hello all, I got an error as follows when I do a range query search ([1 TO *]) on a numeric field and highlight is set on another text field. 2010/12/15 10:58:55 org.apache.solr.common.SolrException log Fatal: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024

Re: bulk commits

2010-12-16 Thread Dennis Gearon
Thanks Adam! Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EAR

Re: facet.pivot for date fields

2010-12-16 Thread Adeel Qureshi
I guess this is one last call for help. I am assuming that for people who wrote or have used pivot faceting this should be a yes/no question: are date fields supported? On Wed, Dec 15, 2010 at 12:58 PM, Adeel Qureshi wrote: > Thanks Pankaj - that was useful to know. I haven't used the query stuff >

Re: bulk commits

2010-12-16 Thread Adam Estrada
One very important thing I forgot to mention is that you will have to increase the Java heap size for larger data sets. Set JAVA_OPT to something acceptable. Adam On Thu, Dec 16, 2010 at 3:27 PM, Yonik Seeley wrote: > On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon > wrote: > > That easy, huh?

Re: Faceted Search Slows Down as index gets larger

2010-12-16 Thread Yonik Seeley
Another thing you can try is trunk. This specific case has been improved by an order of magnitude recently. The case that has been sped up is initial population of the filterCache, or when the filterCache can't hold all of the unique values, or when faceting is configured to not use the filterCache

Re: Query Problem

2010-12-16 Thread Erick Erickson
OK, it works perfectly for me on a 1.4.1 instance. I've looked over your files a couple of times and see nothing obvious (but you'll never find anyone better at overlooking the obvious than me!). Tokenizing and stemming are irrelevant in this case because your type is "string", which is an untoken

Re: Jquery Autocomplete Json formatting ?

2010-12-16 Thread lee carroll
I think this could be down to the same-origin rule applied to Ajax requests. You're not allowed to display content from two different servers :-( The good news is that Solr supports JSONP, which is a neat trick around this. Try this (pasted from another thread): queryString = "*:*" $.getJSON( "ht

Re: how to config DataImport Scheduling

2010-12-16 Thread Ahmet Arslan
> I also have the same problem, i configure > dataimport.properties file as shown > in > http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example > but no change occur, can any one help me What version of solr are you using? This seems a new feature. So it won't work on solr 1

Re: Faceted Search Slows Down as index gets larger

2010-12-16 Thread Furkan Kuru
I am sorry for reviving this thread after 6 months, but we still have problems with faceted search on full-text fields. We try to get the most frequent words in a text field for documents created within 1 hour. The faceted search takes too much time even though the number of matching documents (created_at within 1

Re: Jquery Autocomplete Json formatting ?

2010-12-16 Thread Anurag
Installed Firebug. Now getting the following error: 4139 matches.call( document.documentElement, "[test!='']:sizzle" ); Though my Solr server is running on port 8983, I am not using any server to run this jQuery; it's just an HTML file in my home folder that I am opening in my Firefox browser.

Re: Query Problem

2010-12-16 Thread Ezequiel Calderara
The jars are named like *1.4.1*, so I suppose it's version 1.4.1. Thanks! On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson wrote: > OK, what version of Solr are you using? I can take a quick check to see > what behavior I get > > Erick > > On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderar

Re: Query Problem

2010-12-16 Thread Erick Erickson
OK, what version of Solr are you using? I can take a quick check to see what behavior I get Erick On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderara wrote: > I'll check the Tokenizer to see if that's the problem. > The results of Analysis Page for "SectionName:Programas_Home" > Query Analy

Re: Query Problem

2010-12-16 Thread Ezequiel Calderara
I'll check the Tokenizer to see if that's the problem. The results of Analysis Page for "SectionName:Programas_Home" Query Analyzer org.apache.solr.schema.FieldType$DefaultAnalyzer {} term position 1 term text Programas_Home term type word source start,end 0,14 payload So it's not having problem

Re: Memory use during merges (OOM)

2010-12-16 Thread Yonik Seeley
On Thu, Dec 16, 2010 at 5:51 AM, Michael McCandless wrote: > If you are doing false deletions (calling .updateDocument when in fact > the Term you are replacing cannot exist) it'd be best if possible to > change the app to not call .updateDocument if you know the Term > doesn't exist. FWIW, if yo

Re: Memory use during merges (OOM)

2010-12-16 Thread Robert Muir
On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom wrote: >>>Your setting isn't being applied to the reader IW uses during >>>merging... its only for readers Solr opens from directories >>>explicitly. >>>I think you should open a jira issue! > > Do I understand correctly that this setting in theory

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom wrote: >>>Your setting isn't being applied to the reader IW uses during >>>merging... its only for readers Solr opens from directories >>>explicitly. >>>I think you should open a jira issue! > > Do I understand correctly that this setting in theory

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
>>Your setting isn't being applied to the reader IW uses during >>merging... its only for readers Solr opens from directories >>explicitly. >>I think you should open a jira issue! Do I understand correctly that this setting in theory could be applied to the reader IW uses during merging but is no

Re: Query Problem

2010-12-16 Thread Erick Erickson
Ezequiel: Nice job of including relevant details, by the way. Unfortunately I'm puzzled too. Your SectionName is a "string" type, so it should be placed in the index as-is. Be a bit cautious about looking at returned results (as I see in one of your xml files) because the returned values are the v

Re: bulk commits

2010-12-16 Thread Yonik Seeley
On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon wrote: > That easy, huh? Heck, this gets better and better. > > BTW, how about escaping? The CSV escaping? It's configurable to allow for loading different CSV dialects. http://wiki.apache.org/solr/UpdateCSV By default it uses double quote encapsu
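As a rough illustration of the double-quote encapsulation mentioned in the reply above, here is a minimal Python sketch. It assumes the common CSV convention (a field containing commas or quotes is wrapped in double quotes, and an embedded quote is doubled). The field names and the sample row are made up, and the reply is cut off before it spells out Solr's exact defaults, so treat this as an assumption and check the UpdateCSV wiki page before relying on it.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with double-quote encapsulation; embedded quotes are doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quotechar='"', doublequote=True,
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse the text back into rows using the same quoting settings."""
    return list(csv.reader(io.StringIO(text), quotechar='"', doublequote=True))


# Hypothetical sample data: the title contains both a comma and quotes.
rows = [["id", "title"], ["1", 'He said "hi", then left']]
encoded = to_csv(rows)
print(encoded)
```

If the loader on the other end uses a different dialect, the separator and quote character would have to be changed on both sides to match.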

Re: bulk commits

2010-12-16 Thread Dennis Gearon
That easy, huh? Heck, this gets better and better. BTW, how about escaping? Dennis Gearon

Re: Dataimport performance

2010-12-16 Thread Glen Newton
Hi, LuSqlv2 beta comes out in the next few weeks, and is designed to address this issue (among others). LuSql original (http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql now moved to: https://code.google.com/p/lusql/) is a JDBC-->Lucene high performance loader. You may have se

Re: Memory use during merges (OOM)

2010-12-16 Thread Robert Muir
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom wrote: > > I always get confused about the two different divisors and their names in the > solrconfig.xml file This one (for the writer) isn't configurable by Solr. Want to open an issue? > > We are setting termInfosIndexDivisor, which I think t

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
Actually the terms index is something different. If you don't use CFS, go and look at the size of *.tii in your index directory -- those are the terms index. The terms index picks a subset of the terms (by default every 128th term) to hold in RAM (plus some metadata) in order to make seeking to a specific term fa
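To make the idea of a sparse terms index concrete, here is a small Python sketch. It is a simplification under stated assumptions, not Lucene's actual file format: the term list lives in a plain Python list, the function names are hypothetical, and the default interval of 128 follows the message above.

```python
import bisect


def build_terms_index(sorted_terms, interval=128):
    """Keep every `interval`-th term, with its position, in memory."""
    return [(sorted_terms[i], i) for i in range(0, len(sorted_terms), interval)]


def seek(sorted_terms, index, target, interval=128):
    """Jump to the nearest indexed term at or before `target`, then scan forward."""
    keys = [term for term, _ in index]
    slot = bisect.bisect_right(keys, target) - 1
    if slot < 0:
        return -1  # target sorts before the first term
    start = index[slot][1]
    # The scan never covers more than `interval` terms.
    for pos in range(start, min(start + interval, len(sorted_terms))):
        if sorted_terms[pos] == target:
            return pos
    return -1


# Hypothetical term dictionary of 1000 sorted terms.
terms = [f"term{n:06d}" for n in range(1000)]
index = build_terms_index(terms)
print(len(index), seek(terms, index, "term000500"))
```

In this toy model a larger interval keeps fewer entries in RAM at the cost of a longer scan per lookup, which is the trade-off the thread is discussing.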

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom wrote: > Thanks Mike, > >>>But, if you are doing deletions (or updateDocument, which is just a >>>delete + add under-the-hood), then this will force the terms index of >>>the segment readers to be loaded, thus consuming more RAM. > > Out of 700,000

RE: Memory use during merges (OOM)

2010-12-16 Thread Robert Petersen
Thanks Mike! When you say 'term index of the segment readers', are you referring to the term vectors? In our case our index of 8 million docs holds pretty 'skinny' docs containing searchable product titles and keywords, with the rest of the doc only holding Ids for faceting upon. Docs typical

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
Thanks Mike, >>But, if you are doing deletions (or updateDocument, which is just a >>delete + add under-the-hood), then this will force the terms index of >>the segment readers to be loaded, thus consuming more RAM. Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance a

Query Problem

2010-12-16 Thread Ezequiel Calderara
Hi all, I have the following problem. I have this set of data (View data (Pastebin)). If I do a search for: *SectionName:Programas_Home* I get no results: Returned Data (PasteBin). If I do a search for: *Programas_Home* I have only 1 re

Re: bulk commits

2010-12-16 Thread Adam Estrada
This is how I import a lot of data from a CSV file. There are close to 100k records in there. Note that you can either pre-define the column names using the fieldnames param like I did here *or* include header=true which will automatically pick up the column header if your file has it. curl " http
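Since the curl command above is cut off, here is a hedged Python sketch of how such a request URL could be assembled. The host, port, file path and column names are made up for illustration. The parameter names (fieldnames, header, commit, stream.file) are the ones described on the UpdateCSV wiki page, and stream.file only works if remote streaming is enabled on the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def csv_update_url(base_url, csv_path, fieldnames=None, commit=True):
    """Build a CSV update URL, either with explicit column names or header=true."""
    params = {"stream.file": csv_path, "commit": "true" if commit else "false"}
    if fieldnames:
        # Column names supplied by the caller. Whether a header row in the file
        # must also be skipped depends on the file and is not handled here.
        params["fieldnames"] = ",".join(fieldnames)
    else:
        # Let the loader read the column names from the first line of the file.
        params["header"] = "true"
    return f"{base_url}/update/csv?{urlencode(params)}"


# Hypothetical server location, file path and column names.
url = csv_update_url("http://localhost:8983/solr", "/tmp/data.csv",
                     fieldnames=["id", "name", "price"])
print(url)
print(parse_qs(urlsplit(url).query))
```

Building the query string with urlencode avoids the quoting problems that tend to show up when a long URL is pasted into a shell command.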

Re: bulk commits

2010-12-16 Thread Adam Estrada
what is it that you are trying to commit? a On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon wrote: > What have people found as the best way to do bulk commits either from the > web or > from a file on the system? > > Dennis Gearon > > > Signature Warning > > It is always a good

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
It's not that it's "bad", it's just that Lucene must do extra work to check if these deletes are real or not, and that extra work requires loading the terms index which will consume additional RAM. For most apps, though, the terms index is relatively small and so this isn't really an issue. But i

Re: Thank you!

2010-12-16 Thread Dennis Gearon
If I ever make it, Wikipedia, Stack Overflow, PHP, Symfony, Doctrine, and Apache are all going to get donations. I already sent $20 to Wikipedia; they're hurting now. Dennis Gearon

RE: Memory use during merges (OOM)

2010-12-16 Thread Robert Petersen
Hello we occasionally bump into the OOM issue during merging after propagation too, and from the discussion below I guess we are doing thousands of 'false deletions' by unique id to make sure certain documents are *not* in the index. Could anyone explain why that is bad? I didn't really unders

Re: PHPSolrClient

2010-12-16 Thread Dennis Gearon
So just use add and overwrite. OK, thanks Dennis Gearon Signature Warning - - Original Message From: Tanguy Moal To: solr-user@lucene.apache.org Sent: Thu, December 16, 2010 1:33:36 AM Subject: Re: PHPSolrClient Hi Dennis, Not particular to the client you use (solr-php-client) for

RE: Dataimport performance

2010-12-16 Thread Dyer, James
We have ~50 long-running SQL queries that need to be joined and denormalized. Not all of the queries are to the same db, and some data comes from fixed-width data feeds. Our current search engine (that we are converting to SOLR) has a fast disk-caching mechanism that lets you cache all of thes

Re: Why does Solr commit block indexing?

2010-12-16 Thread Michael McCandless
Unfortunately, (I think?) Solr currently commits by closing the IndexWriter, which must wait for any running merges to complete, and then opening a new one. This is really rather silly because IndexWriter has had its own commit method (which does not block ongoing indexing nor merging) for quite s

Re: Multicore Search broken

2010-12-16 Thread Jörg Agatz
I have tried some things and now have news. When I search in: http://localhost:8080/solr/mail/select?q=*:*&shards=localhost:8080/solr/mail,localhost:8080/solr/

Re: STUCK Threads at "org.apache.lucene.document.CompressionTools.decompress"

2010-12-16 Thread Alexander Ramos Jardim
2010/12/16 Erick Erickson > What are you trying to do? It sounds like you're storing fields compressed, > is > that true (i.e. defining compressed=true in your field defs)? If so, why? > It > may be > costing you more than you benefit. > > No compressed fields in my schema > A quick test would

Case Insensitive sorting while preserving case during faceted search

2010-12-16 Thread shan2812
Hi, I am trying to do a facet search and sort the facet values too. First I tried with 'solr.TextField' as field type. But this does not return sorted facet values. After referring to FAQ(http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F), I changed it to 'solr.St

Re: Determining core name from a result?

2010-12-16 Thread Mark Allan
Oops! Sorry, I thought shard and core were one and the same and the terms could be used interchangeably - I've got a multicore setup which I'm able to search across by using the shards parameter. I think you're right, that *is* the question I was asking. Thanks for letting me know it's not

Re: Query performance issue while using EdgeNGram

2010-12-16 Thread Erick Erickson
A couple of observations: 1> your regex at query time is interesting. You're using KeywordTokenizer, so input of "search me" becomes "searchme" before it goes through the parser. Is this your intent? 2> Why are you using EdgeNGrams for auto suggest? The TermsComponent is an easier, more e

Re: Determining core name from a result?

2010-12-16 Thread Chris Hostetter
: Subject: Determining core name from a result? FYI: some people may be confused because of terminology -- I think what you are asking is how to know which *shard* a document came from when doing a distributed search. This isn't currently supported, there is an open issue tracking it... https

Re: STUCK Threads at "org.apache.lucene.document.CompressionTools.decompress"

2010-12-16 Thread Erick Erickson
What are you trying to do? It sounds like you're storing fields compressed, is that true (i.e. defining compressed=true in your field defs)? If so, why? It may be costing you more than you benefit. A quick test would be to stop returning anything except the score by specifying &fl=score. Or at lea

Re: how to config DataImport Scheduling

2010-12-16 Thread do3do3
I also have the same problem: I configured the dataimport.properties file as shown in http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example but no change occurs. Can anyone help me?

Multicore Search broken

2010-12-16 Thread Jörg Agatz
Hello users, I have created a multicore Solr instance with Tomcat 6. I created two cores, "mail" and "index2". At first, mail and index2 had the same config; after this, I changed the mail config and indexed 30 XML files. Now when I search in each core: http://localhost:8080/solr/mail/select?q=*:*&s

Re: indexing a lot of XML dokuments

2010-12-16 Thread Adam Estrada
I have been very successful in following this example http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example Adam On Thu, Dec 16, 2010 at 5:44 AM, Jörg Agatz wrote: > hi, users, i serch e way to indexing a lot of

Why does Solr commit block indexing?

2010-12-16 Thread Renaud Delbru
Hi, See log at [1]. We are using the latest snapshot of lucene_branch3.1. We have configured Solr to use the ConcurrentMergeScheduler: When a commit() runs, it blocks indexing (all incoming update requests are blocked until the commit operation is finished) ... at the end of the log we noti

STUCK Threads at "org.apache.lucene.document.CompressionTools.decompress"

2010-12-16 Thread Alexander Ramos Jardim
Hello guys, I am getting threads stuck forever at "* org.apache.lucene.document.CompressionTools.decompress*". I am using Weblogic 10.02, with solr deployed as ear and no work manager specifically configured for this instance. Only doing simple queries at this node (q=itemId:9 or q:skuId:

Re: Determining core name from a result?

2010-12-16 Thread Mark Allan
Hi Grant, Thanks for your reply. I'm using solrj to connect via http, which eventually sends this query http://localhost:8984/solr/core0/select/?q=id:022-80633905&version=2&start=0&rows=1&fl=*&indent=on&shards=localhost:8984/solr/core0,localhost:8984/solr/core1,localhost:8984/solr/core2,local

Re: Thank you!

2010-12-16 Thread kenf_nc
Hear hear! In the beginning of my journey with Solr/Lucene I couldn't have done it without this site. Smiley and Pugh's book was useful, but this forum was invaluable. I don't have as many questions now, but each new venture, Geospatial searching, replication and redundancy, performance tuning, b

Re: Determining core name from a result?

2010-12-16 Thread Grant Ingersoll
How are you querying the core to begin with? On Dec 16, 2010, at 6:46 AM, Mark Allan wrote: > Hi all, > > I've been bashing my head against the wall for a few hours now, trying to get > mlt (more-like-this) queries working across multiple cores. I've since seen a > JIRA issue and documentation

Re: PHPSolrClient

2010-12-16 Thread Erick Erickson
As Tanguy says, simply re-adding a document with the same unique key will automatically delete and re-add the doc. But I wanted to add a caution about your phrase "read-delete-modify-create". You only get back what you #stored#. So generally the update is done from the original source rather than the index. So i

Re: Google like search

2010-12-16 Thread satya swaroop
Hi All, Thanks for your suggestions.. I got the result of what i expected.. Cheers, Satya

Determining core name from a result?

2010-12-16 Thread Mark Allan
Hi all, I've been bashing my head against the wall for a few hours now, trying to get mlt (more-like-this) queries working across multiple cores. I've since seen a JIRA issue and documentation saying that multicore doesn't yet support mlt queries. Oops! Anyway, to get around this, I was

Re: Results from More then One Cors?

2010-12-16 Thread Jörg Agatz
OK, it worked great at the beginning, but now I get a big error :-( HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:462) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:298)

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
RAM usage for merging is tricky. First off, merging must hold open a SegmentReader for each segment being merged. However, it's not necessarily a full segment reader; for example, merging doesn't need the terms index nor norms. But it will load deleted docs. But, if you are doing deletions (or

indexing a lot of XML dokuments

2010-12-16 Thread Jörg Agatz
Hi users, I am searching for a way to index a lot of XML documents as fast as possible. I have more than 1 million docs on server 1 and a Solr multicore on server 2 with Tomcat. I don't know how I can do it easily and fast. I can't find an idea in the wiki; maybe you have some ideas? King

Re: PHPSolrClient

2010-12-16 Thread Tanguy Moal
Hi Dennis, Not particular to the client you use (solr-php-client) for sending documents, think of update as an overwrite. This means that if you update a particular document, the previous version indexed is lost. Therefore, when updating a document, make sure that all the fields to be indexed and
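The "update is an overwrite" point can be pictured with a toy in-memory index. This Python sketch only mimics the replace-by-key behavior described above; the class and field names are hypothetical and it is not how Solr is implemented.

```python
class TinyIndex:
    """Toy stand-in for an index keyed by a unique id."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # Re-adding a document with an existing key replaces it wholesale.
        self.docs[doc[self.unique_key]] = dict(doc)

    def get(self, key):
        return self.docs.get(key)


index = TinyIndex()
index.add({"id": "42", "title": "Old title", "body": "Full text"})

# "Updating" only the title by sending a partial document drops the body field.
index.add({"id": "42", "title": "New title"})
partial = index.get("42")

# Sending the complete document again keeps every field.
index.add({"id": "42", "title": "New title", "body": "Full text"})
complete = index.get("42")
print(partial, complete)
```

This is why the advice is to rebuild the whole document from the original source before sending it again.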

Re: Thank you!

2010-12-16 Thread Dennis Gearon
I feel the same way about this group and the Postgres group. VERY helpful people. All of us helping each other. Dennis Gearon Signature Warning - Original Message From: Adam Estrada Subject: Thank you! I just want to say that this list serve has been invaluable t

PHPSolrClient

2010-12-16 Thread Dennis Gearon
First of all, it's a very nice piece of work. I am just getting my feet wet with Solr in general. So I am not even sure how a document is NORMALLY deleted. The library's PHPDocs list 'add', 'get', and 'delete', but does anyone know about 'update'? (Obviously one can read-delete-modify-create.) Denn

RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
Check out http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e This approach of not using sub entities really improved our load time. Ephraim Ofir -Original Message- From: Robert Gründler [mail

Re: Memory use during merges (OOM)

2010-12-16 Thread Upayavira
How long does it take to reach this OOM situation? Is it possible for you to try a merge with each setting in turn, and evaluate what impact they each have? That is, indexing speed and memory consumption? It might be interesting to watch garbage collection too while it is running with jstat, as tha

Re: search for a number within a range, where range values are mentioned in documents

2010-12-16 Thread lee carroll
During data import, can you update each record with min and max fields? These would be equal in the case of a single non-range value. I know this is not a Solr solution but a data pre-processing one, but would it work? Failing the above, I've seen in the docs a reference to a compound value field (in the con
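The pre-processing idea above could look roughly like this in Python. It is a sketch under assumptions: source values are non-negative integers written either as "7" or as "3-9", and the field names range_min and range_max are hypothetical and would have to exist as numeric fields in the schema.

```python
def to_min_max(value):
    """Turn '7' or '3-9' into (min, max); a single value gets min == max."""
    text = str(value).strip()
    if "-" in text:
        low, high = text.split("-", 1)
        return int(low), int(high)
    number = int(text)
    return number, number


def range_contains_query(number, min_field="range_min", max_field="range_max"):
    """Build a query matching documents whose [min, max] interval covers `number`."""
    return f"{min_field}:[* TO {number}] AND {max_field}:[{number} TO *]"


# Hypothetical source values as they might appear before import.
for raw in ["7", "3-9"]:
    print(raw, to_min_max(raw))
print(range_contains_query(5))
```

A document stored with range_min=3 and range_max=9 would then match a search for 5, while a single value 7 would match only a search for 7.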