Solr edismax NOT operator behavior

2012-07-26 Thread Alok Bhandari
Hello, I am using the edismax parser, and the query submitted by the application is of the format price:1000 AND ( NOT ( launch_date:[2007-06-07T00:00:00.000Z TO 2009-04-07T23:59:59.999Z] AND product_type:electronic)). Solr gives an unexpected result when executing it. I suspect it is because of the AND (
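Not from the thread, but a common cause of this behavior: a Lucene/Solr boolean sub-clause that contains only negative terms matches nothing, so a parenthesized pure NOT usually has to be anchored to all documents. A sketch of the usual rewrite, using the same fields as above:

    price:1000 AND (*:* NOT (launch_date:[2007-06-07T00:00:00.000Z TO 2009-04-07T23:59:59.999Z] AND product_type:electronic))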

Re: Solr - hl.fragsize Issue

2012-07-26 Thread meghana
Hi @iorixxx, I use DefaultSolrHighlighter, and yes, the fragment size also includes the tags, but even if we remove them from the fragment, the average fragment size is 110 instead of 100. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-hl-fragsize-Issue-tp3997457p3997656.ht

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
That is exactly what I want. I want the distributed Hadoop TaskNode to be running on the same server that is holding the local distributed solr index. This way there is no need to move any data around... I think other people call this feature 'data locality' of map/reduce. I believe HBase and Had

Re: solr host name on solrconfig.xml

2012-07-26 Thread stockii
okay, thx. I know this way but it's not so nice :P I set a new variable in my core.properties file which I load in solr.xml for each core =)) -- View this message in context: http://lucene.472066.n3.nabble.com/solr-host-name-on-solrconfig-xml-tp3997371p3997652.html Sent from the Solr - User ma
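A rough sketch of the workaround stockii describes, assuming the legacy multi-core solr.xml (the host name, core name, property file name and the <str> element are illustrative only): the per-core properties file is referenced from solr.xml, and its entries become substitutable variables in that core's solrconfig.xml.

    # core.properties (one per core)
    solr.host=search01.example.com

    <!-- solr.xml -->
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="core0" properties="core.properties"/>
    </cores>

    <!-- solrconfig.xml: reference the variable, with an optional default -->
    <str name="hostName">${solr.host:localhost}</str>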

Re: Significance of Analyzer Class attribute

2012-07-26 Thread Rajani Maski
Hi All, Thank you for the replies. --Regards Rajani On Fri, Jul 27, 2012 at 9:58 AM, Chris Hostetter wrote: > > : > When I specify analyzer class in schema, something > : > like below and do > : > analysis on this field in analysis page : I cant see > : > verbose output on > : > tokenize

Re: Significance of Analyzer Class attribute

2012-07-26 Thread Chris Hostetter
: > When I specify analyzer class in schema,  something : > like below and do : > analysis on this field in analysis page : I cant  see : > verbose output on : > tokenizer and filters The reason for that is that if you use an explicit Analyzer implementation, the analysis tool doesn't know what
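For illustration, the difference Chris describes is between an analyzer that delegates everything to a single Analyzer class and one that spells out the tokenizer/filter chain (the fieldType names here are made up):

    <!-- opaque: one Analyzer class; the analysis page cannot break it into stages -->
    <fieldType name="text_opaque" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldType>

    <!-- explicit chain: each tokenizer and filter gets its own verbose output -->
    <fieldType name="text_explicit" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>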

Re: Updating a SOLR index with a properties file

2012-07-26 Thread Florian Popescu
Thanks! I will try it out and see how it works. This is for indexing a bunch of java resource bundles and trying to 'refactor' the keys. Basically trying to figure out if a key is used in multiple places and extracting it out if applicable. Florian On Jul 26, 2012, at 10:46 PM, Lance Norsko

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Lance Norskog
No. This is just a Hadoop file input class. Distributed Hadoop has to get files from a distributed file service. It sounds like you want some kind of distributed file service that maps a TaskNode (??) on a given server to the files available on that server. There might be something that does this.

Re: Updating a SOLR index with a properties file

2012-07-26 Thread Lance Norskog
You can use the DataImportHandler. The DIH file would use a file reader, then the line reader tool, then separate the line with a regular expression into two fields. If you need a unique ID, look up the UUID tools. I have never heard of this use case. On Thu, Jul 26, 2012 at 1:56 PM, Florian Pope
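A rough, untested sketch of the configuration Lance outlines (the file path and field names are hypothetical): LineEntityProcessor reads the properties file line by line from a FileDataSource, and a RegexTransformer splits each line into key and value fields.

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="line"
                processor="LineEntityProcessor"
                url="/path/to/messages.properties"
                rootEntity="true"
                transformer="RegexTransformer">
          <!-- LineEntityProcessor exposes each line as the 'rawLine' column -->
          <field column="key"   regex="^([^=]+)=.*$" sourceColName="rawLine"/>
          <field column="value" regex="^[^=]+=(.*)$" sourceColName="rawLine"/>
        </entity>
      </document>
    </dataConfig>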

Re: leaks in solr

2012-07-26 Thread Lance Norskog
What does the "Statistics" page in the Solr admin say? There might be several "searchers" open: org.apache.solr.search.SolrIndexSearcher Each searcher holds open different generations of the index. If obsolete index files are held open, it may be old searchers. How big are the caches? How long doe

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
Can it read distributed lucene indexes in SolrCloud? On Jul 26, 2012 7:11 PM, "Lance Norskog" wrote: > Mahout includes a file reader for Lucene indexes. It will read from > HDFS or local disks. > > On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni > wrote: > > You raise an interesting possibility.

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Lance Norskog
Mahout includes a file reader for Lucene indexes. It will read from HDFS or local disks. On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni wrote: > You raise an interesting possibility. A map/reduce solr handler over > solrcloud... > > On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote: > >> I

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Darren Govoni
You raise an interesting possibility. A map/reduce solr handler over solrcloud... On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote: > I think the performance should be close to Hadoop running on HDFS, if > somehow Hadoop job can directly read the Solr Index file while executing > the job o

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
I think the performance should be close to Hadoop running on HDFS, if somehow the Hadoop job can directly read the Solr index file while executing the job on the local Solr node. Kinda like how HBase and Cassandra integrate with Hadoop. Plus, we can run the map reduce job on a standby Solr4 cluster.

Re: leaks in solr

2012-07-26 Thread Karthick Duraisamy Soundararaj
Mark, We use solr 3.6.0 on freebsd 9. Over a period of time, it accumulates lots of space! On Thu, Jul 26, 2012 at 8:47 PM, roz dev wrote: > Thanks Mark. > > We are never calling commit or optimize with openSearcher=false. > > As per logs, this is what is happening > > openSearcher=true,

Re: Binary content index with multiple cores

2012-07-26 Thread Chris Hostetter
: Here is my solrconfig.xml for one of the core : ... : : ... : I've added the maven dependencies like this for the solr war : ... : : org.apache.sol

Re: leaks in solr

2012-07-26 Thread roz dev
Thanks Mark. We are never calling commit or optimize with openSearcher=false. As per logs, this is what is happening openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} -- But, We are going to use 4.0 Alpha and see if that helps. -Saroj On Thu, Jul 26, 2012 at

Re: leaks in solr

2012-07-26 Thread Mark Miller
I'd take a look at this issue: https://issues.apache.org/jira/browse/SOLR-3392 Fixed late April. On Jul 26, 2012, at 7:41 PM, roz dev wrote: > it was from 4/11/12 > > -Saroj > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller wrote: > >> >> On Jul 26, 2012, at 3:18 PM, roz dev wrote: >> >>>

Re: solr host name on solrconfig.xml

2012-07-26 Thread Chris Hostetter
: i need the host name of my solr-server in my solrconfig.xml : anybody knows the correct variable? : : something like ${solr.host} or ${solr.host.name} ... : : is there any documentation about ALL available variables in the solr : namespaces? Off the top of my head I don't know that there are any

Re: UUID generation not working

2012-07-26 Thread Chris Hostetter
: : 1. I am using UUID to generate unique id in my collection but when I tried : to index the collection it could not find any documents. can you please : tell me how to use UUID in schema.xml In general, if you are having a problem achieving a goal, please post what you've tried and what kinds

Re: leaks in solr

2012-07-26 Thread roz dev
it was from 4/11/12 -Saroj On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller wrote: > > On Jul 26, 2012, at 3:18 PM, roz dev wrote: > > > Hi Guys > > > > I am also seeing this problem. > > > > I am using SOLR 4 from Trunk and seeing this issue repeat every day. > > > > Any inputs about how to resol

Re: leaks in solr

2012-07-26 Thread Mark Miller
On Jul 26, 2012, at 3:18 PM, roz dev wrote: > Hi Guys > > I am also seeing this problem. > > I am using SOLR 4 from Trunk and seeing this issue repeat every day. > > Any inputs about how to resolve this would be great > > -Saroj Trunk from what date? - Mark

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Schmidt Jeff
It's not free (for production use anyway), but you might consider DataStax Enterprise: http://www.datastax.com/products/enterprise It is a very nice consolidation of Cassandra, Solr and Hadoop. No ETL required. Cheers, Jeff On Jul 26, 2012, at 3:55 PM, Trung Pham wrote: > Is it possible to r

UUID generation not working

2012-07-26 Thread gopes
Hi, 1. I am using UUID to generate a unique id in my collection, but when I tried to index the collection it could not find any documents. Can you please tell me how to use UUID in schema.xml? Thanks, Sarala -- View this message in context: http://lucene.472066.n3.nabble.com/UUID-generation-no
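Not an answer from the thread, but the usual Solr 3.x schema.xml pattern is a UUIDField whose default is NEW, so a fresh UUID is generated for any document that does not supply one (assuming the field is also the uniqueKey):

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
    <uniqueKey>id</uniqueKey>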

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Darren Govoni
Of course you can do it, but the question is whether this will produce the performance results you expect. I've seen talk about this in other forums, so you might find some prior work here. Solr and HDFS serve somewhat different purposes. The key issue would be if your map and reduce code overload

Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
Is it possible to run map reduce jobs directly on Solr4? I'm asking this because I want to use Solr4 as the primary storage engine. And I want to be able to run near real time analytics against it as well. Rather than export solr4 data out to a hadoop cluster.

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi, I really appreciate your quick help! 1) I want to let Solr not cache any IndexReader (hopefully it is possible), because our app is made of many Lucene folders and each of them is not very large; from my previous test it seems that performance is fine if each time we just create an IndexReader

Re: separation of indexes to optimize facet queries without fulltext

2012-07-26 Thread Daniel Brügge
Hi Chris, thanks for the answer. The plan is that in lots of queries I just need faceted values and don't even do a fulltext search. On the other hand, I need the fulltext search for exactly one task in my application, which is searching documents and returning them. Here no faceting at all is ne

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
IIRC, about two months ago a problem with such a scheme was discussed here, but I can't remember the exact details. The scheme is generally correct. But you didn't tell how you let Solr know that it needs to reread the new index generation after the indexer fsyncs the segments. Btw, it might be a possible issue: https://

Re: language detection and phonetic

2012-07-26 Thread Paul Libbrecht
On 26 July 2012 at 21:22, Alireza Salimi wrote: > The question is: is there any cleaner way to do that? I've always done phonetic matching using a separate phonetic field (title-ph for example) and copyField. There's one considerable advantage to that: using something such as dismax, you can say "prefe
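A minimal sketch of the separate-field approach Paul describes (field and type names are hypothetical): the plain field is copied into a phonetic sibling, and a dismax qf such as title^10 title_ph can then prefer exact matches over phonetic ones.

    <fieldType name="text_phonetic" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldType>

    <field name="title"    type="text_general"  indexed="true" stored="true"/>
    <field name="title_ph" type="text_phonetic" indexed="true" stored="false"/>
    <copyField source="title" dest="title_ph"/>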

Re: leaks in solr

2012-07-26 Thread roz dev
Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj On Thu, Jul 26, 2012 at 8:33 AM, Karthick Duraisamy Soundararaj < karthick.soundara...@gmail.com> wrote: > Did you find any m

Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Exactly. Creating a new index from the aggregated documents is the plan I described above. I don't really know how long this will take for each new index. Hopefully under 1 hour or so. That would be tolerable. Thanks. Daniel On Thu, Jul 26, 2012 at 8:47 PM, Chantal Ackermann < c.ackerm...@it-agen

Re: separation of indexes to optimize facet queries without fulltext

2012-07-26 Thread Chris Hostetter
: My thought was, that I could separate indexes. So for the facet queries : where I don't need : fulltext search (so also no indexed fulltext field) I can use a completely : new setup of a : sharded Solr which doesn't include the indexed fulltext, so the index is : kept small containing : just the

Re: querying using filter query and lots of possible values

2012-07-26 Thread Chantal Ackermann
Hi Daniel, depending on how you decide on the list of ids, in the first place, you could also create a new index (core) and populate it with DIH which would select only documents from your main index (core) in this range of ids. When updating you could try a delta import. Of course, this is on

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi, I think at least before Lucene 4.0 we can only allow one process/thread to write to a Lucene folder. Based on this fact my initial plan is: 1) There is one set of Lucene index folders. 2) The Solr server only performs queries on those folders. 3) Have a separate process (multi-threaded) to index

Re: Skip first word

2012-07-26 Thread in.abdul
That is the best option; I have also used ShingleFilterFactory. On Jul 26, 2012 10:03 PM, "Chantal Ackermann-2 [via Lucene]" < ml-node+s472066n399748...@n3.nabble.com> wrote: > Hi, > > use two fields: > 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 > for inputs of length

Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Thanks Alexandre, the list of IDs is constant for a longer time. I will take a look at this join topic. Maybe another solution would be to really create a whole new collection or set of documents containing the aggregated documents (from the ids) from scratch and to execute queries on this col

Re: querying using filter query and lots of possible values

2012-07-26 Thread Alexandre Rafalovitch
You can't update the original documents except by reindexing them, so no easy group assignment option. If you create this 'collection' once but query it multiple times, you may be able to use SOLR4 join with IDs being stored separately and joined on. Still not great because the performance is an is

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Coming back to your original question, I'm a little puzzled. It's not clear where you want to call the Lucene API directly from. If you mean that you have a standalone indexer which writes index files, and then it stops and these files become available for the Solr process, it will work. Sharing an index between proces

Is it possible or wise to query multiple cores in parallel in SolrCloud

2012-07-26 Thread Daniel Brügge
Hi, I am playing around with a SolrCloud setup (4 shards) and thousands of cores. I am thinking of executing queries on hundreds of cores like a distributed query. Is this possible at all from the SolrCloud side? And is this wise? Thanks & regards Daniel

Re: querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Hey Chantal, thanks for your answer. The range queries would not work, because the values are not contiguous. They can be randomly ordered with gaps. The above was just an example. Excluding is also not a solution, because the list of excluded ids would be even longer. To be more specific: the ID

Re: Bulk indexing data into solr

2012-07-26 Thread Mikhail Khludnev
Right in time, guys: https://issues.apache.org/jira/browse/SOLR-3585 Here is a server-side update processing "fork". It does its best to halt processing when an exception occurs. Plug in this UpdateProcessor and specify the number of threads, then submit a lazy iterator into StreamingUpdateSolrServer on the client side

RE: Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Thanks very much, both your and Rafal's advice are very helpful! -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, July 26, 2012 8:47 AM To: solr-user@lucene.apache.org Subject: Re: Bulk indexing data into solr On 7/26/2012 7:34 AM, Rafał Kuć wrote: > If yo

Re: Skip first word

2012-07-26 Thread Chantal Ackermann
Hi, use two fields: 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3, 2. the other one tokenized as appropriate with minsize=3 and longer for all longer inputs Cheers, Chantal Am 26.07.2012 um 09:05 schrieb Finotti Simone: > Hi Ahmet, > busine
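A rough sketch of the two-field idea (type names are hypothetical; EdgeNGram is used here since the thread is about prefix suggestions, though a plain NGram filter would follow the wording more literally). The grams are produced only at index time, so the user's input is matched as typed:

    <!-- 1-2 character prefixes of the whole input, for queries shorter than 3 chars -->
    <fieldType name="suggest_short" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- word prefixes of 3+ characters, for all longer queries -->
    <fieldType name="suggest_long" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>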

Re: querying using filter query and lots of possible values

2012-07-26 Thread Chantal Ackermann
Hi Daniel, index the id into a field of type tint or tlong and use a range query (http://wiki.apache.org/solr/SolrQuerySyntax?highlight=%28rangequery%29): fq=id:[200 TO 2000] If you want to exclude certain ids it might be wiser to simply add an exclusion query in addition to the range query in
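A concrete form of the exclusion Chantal mentions (the excluded ids are made up): the negative clause can live in the same filter query, e.g. fq=id:[200 TO 2000] -id:(250 OR 512).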

Re: Expression Sort in Solr

2012-07-26 Thread lavesh
Hi, I know we could look to create it at index time; however, all values are dynamic. if(exists(query(COUNTRY:(22 33 44)),100,20),INCOME ) IS NOT WORKING. ALSO I NEED NESTED IF. On Thu, Jul 26,

querying using filter query and lots of possible values

2012-07-26 Thread Daniel Brügge
Hi, I am facing the following issue: I have a couple of million documents, which have a field called "source_id". My problem is that I want to retrieve all the documents which have a source_id in a specific range of values. This range can be pretty big, so for example a list of 200 to 2000 source

Re: Bulk indexing data into solr

2012-07-26 Thread Shawn Heisey
On 7/26/2012 7:34 AM, Rafał Kuć wrote: If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. A caveat to what Rafał said: The streaming object

Re: leaks in solr

2012-07-26 Thread Karthick Duraisamy Soundararaj
Did you find any more clues? I have this problem in my machines as well.. On Fri, Jun 29, 2012 at 6:04 AM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > Hi list, > > while monitoring my solr 3.6.1 installation I recognized an increase of > memory usage > in OldGen JVM heap on my slave.

Re: Solr - hl.fragsize Issue

2012-07-26 Thread Ahmet Arslan
> i am using solr 3.5 , and in search > query i set hl.fragsize = 100 , but my > fragment does not contain exact 100 chars , average fragment > size is 120 . > > Can anybody have idea about this issue?? Are you using FastVectorHighlighter or DefaultSolrHighlighter? Could it be that 120 includes c

Re: Expression Sort in Solr

2012-07-26 Thread Erik Hatcher
How dynamic are those numbers? If this expression can be computed at index time into a "sort_order" field, that'd be best. Otherwise, if these factors are truly dynamic at run-time, look at the function query sorting capability here:
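As a minimal illustration of the sort-by-function capability Erik points to (available since Solr 3.1; the field names are hypothetical): sort=sum(priority, product(popularity, 2)) desc. Mapping the IF/IN expression from the original question onto function queries would additionally need map() or, on newer Solr versions, if()/exists().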

Expression Sort in Solr

2012-07-26 Thread lavesh
I am working on Solr for search. I need to perform an expression sort such that ORDER BY (IF(COUNTRY=1,100,0) + IF(AVAILABLE=2,1000,IF(AVAILABLE=1,60,0)) + IF (DELIVERYIN IN (5,6,7),100,IF (DELIVERYIN IN (80,90),50,0))) DESC. Can anyone tell me how this is possible? -- View this message in c

Solr - hl.fragsize Issue

2012-07-26 Thread meghana
I am using Solr 3.5, and in the search query I set hl.fragsize=100, but my fragments do not contain exactly 100 chars; the average fragment size is 120. Does anybody have an idea about this issue? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-hl-fragsize-Issue-tp3

Re: Binary content index with multiple cores

2012-07-26 Thread Ahmet Arslan
> About the solr.war, when i start my mvn cargo:run i put into > the pom.xml the > fact that he create the sol.war and for solr-cell tomcat > needs some > dependencies like solr-cell, solr-core, solr-solrj, > tika-core and slf4j-api. > > Have you any idea about where is my mistake ? Okey, for sol

Re: Bulk indexing data into solr

2012-07-26 Thread Rafał Kuć
Hello! If you use Java (and I think you do, because you mention Lucene) you should take a look at StreamingUpdateSolrServer. It not only allows you to send data in batches, but also index using multiple threads. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
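A minimal SolrJ 3.x sketch of what Rafał suggests (the URL, queue size and thread count are illustrative values only; see also Shawn's caveat above about the streaming object):

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Buffer up to 1000 documents and flush them with 4 background threads.
            StreamingUpdateSolrServer server =
                    new StreamingUpdateSolrServer("http://localhost:8983/solr", 1000, 4);
            for (int i = 0; i < 100000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                doc.addField("name", "document " + i);
                server.add(doc);   // queued and sent in batches by the background threads
            }
            server.commit();       // make the documents searchable
        }
    }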

solr host name on solrconfig.xml

2012-07-26 Thread stockii
Hello, I need the host name of my Solr server in my solrconfig.xml. Does anybody know the correct variable? Something like ${solr.host} or ${solr.host.name} ... Is there any documentation about ALL available variables in the Solr namespaces? thx a lot -- View this message in context: http://lucene.4

Re: Skip first word

2012-07-26 Thread Finotti Simone
Hi Ahmet, business asked me to apply EdgeNGram with minGramSize=1 on the first term and with minGramSize=3 on the later terms. We are developing a search-suggestion mechanism; the idea is that if the user types "D", the engine should suggest "Dolce & Gabbana", but if we type "G", it should sug

Re: solr spellchecker hogging all of my memory

2012-07-26 Thread Michael Della Bitta
Do the spellcheck objects eventually get collected off the heap? Maybe you should dump the heap later and ensure those objects get collected, in which case, I'd call this a normal heap expansion due to a temporary usage spike. Michael Della Bitta A

Bulk indexing data into solr

2012-07-26 Thread Zhang, Lisheng
Hi, I am starting to use Solr. Now I need to index a rather large amount of data, and it seems that calling Solr to pass data through HTTP is rather inefficient. I am thinking of still calling the Lucene API directly for bulk indexing but using Solr for search; is this design OK? Thanks very much for your help, Li

Re: Binary content index with multiple cores

2012-07-26 Thread davidbougearel
Thanks for replying, here is my dependency related to solr-cell : org.apache.solr:solr-cell:jar:3.6.0:compile [INFO] | +- com.ibm.icu:icu4j:jar:4.8.1.1:compile [INFO] | +- *org.apache.tika:tika-parsers:jar:1.0:compile* [INFO] | | +- org.apache.tika:tika-core:jar:1.0:compile [INFO] | | +- ed

Re: numFound inconsistent for different rows-param

2012-07-26 Thread patrick
i resolved my confusion and discovered that the documents of the second shard contained the same 'unique' id. rows=0 displayed the 'correct' numFound since (as i understand) there was no merge of the results. cheerio, patrick On 25.07.2012 17:07, patrick wrote: hi, i'm running two solr v3.

Re: Binary content index with multiple cores

2012-07-26 Thread davidbougearel
To help finding the solution, with my JUnit test here is the stack trace : org.apache.solr.client.solrj.SolrServerException: Server at http://localhost:8983/solr/document returned non ok status:500, message:Internal Server Error at org.apache.solr.client.solrj.impl.HttpSolrServer.request(H

Re: Binary content index with multiple cores

2012-07-26 Thread davidbougearel
OK, I found a way to use it; it was a problem with libraries. In fact I don't want to index PDF or Word directly, I just want to get the content to add into my document content, so I guess I will have to use Tika to get the XML and to get the node that I want. -- View this message in context: http