Re: Custom Solr indexer/searcher

2012-11-15 Thread Mikhail Khludnev
Scott, It sounds like you need to look into a few samples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the given one in an FST for query expansion. Generic query expansion is done via MultiTermQuery. Index time terms expansion is shown in TrieField and btw

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Mikhail Khludnev
Yun, Literally you can call another QParser from the middle of a query and apply local params to it via the nested queries feature http://searchhub.org/2009/03/31/nested-queries-in-solr/ (the syntax is a little bit tricky, though). But calling another QParser and attempting to specify the number of rows for it makes ab

Re: Custom Solr indexer/searcher

2012-11-15 Thread John Whelan
Scott, I probably have no idea as to what I'm saying, but if you're looking for finding results in a N-dimensional space, you might look at creating a field of type 'point'. Point-type fields have a dimension attribute; I believe that it can be set to a large integer value. Barring that, there is

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
First query is OK; it just doesn't fit your need, if I understand. Could you confirm that the expected result is 6 rows (3 rows w/ppt plus 3 rows/pdf) ? 2012/11/15 jefferyyuan > Thanks :) > local param is very useful, but seems it doesn't work here: > I tried: > q={!rows=3}ext_name:pdf OR ext_n

Re: BM25 model for solr 4?

2012-11-15 Thread Floyd Wu
Thanks everyone, especially Tom, for the detailed explanation of this topic. Of course in academia we need to interpret results carefully; what I care about is the end user's point of view: will using BM25 give better ranking than Lucene's original VSM+Boolean model? Ho

Re: how make a suggester?

2012-11-15 Thread Otis Gospodnetic
Hi Iwo, This is kind of a common question. Have a look at http://search-lucene.com/?q=autocomplete+OR+suggester&fc_project=Solr&fc_type=mail+_hash_+user for lots of discussions on this topic. In short, you could use the Suggester that comes with Solr or you could do http://www.cominvent.com/2012/

Re: Patch Needed for Issue Solr-3790

2012-11-15 Thread mechravi25
Hi Koji, Thank you for your reply..will test for the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Patch-Needed-for-Issue-Solr-3790-tp4019256p4020651.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: cores shards and disks in SolrCloud

2012-11-15 Thread Otis Gospodnetic
Hi, I think here you want to use a single JVM per server - no need for multiple JVMs, JVM per Collection and such. If you can spread data over more than 1 disk on each of your servers, great, that will help. Re data loss - yes, you really should just be using replication. Sharding a ton will mini

Re: Solr 4.0 indexing performance

2012-11-15 Thread Jack Krupansky
Did you start from scratch, or did you bulk index into an existing index? There is some "backcompat" logic in there, which is convenient, but not necessarily the best performance. -- Jack Krupansky -Original Message- From: Nils Weinander Sent: Thursday, November 15, 2012 1:29 AM To:

Re: Solr 4.0 indexing performance

2012-11-15 Thread Otis Gospodnetic
But slower indexing with solr 4.0 sounds suspicious to me... you compared your configs? JVM parameters? GC? IO? CPU? Otis -- Performance Monitoring - http://sematext.com/spm On Nov 15, 2012 5:26 AM, "Nils Weinander" wrote: > Ah, thanks Markus! > > That's a good thing. I tried disabling the tran

Re: consistency in SolrCloud replication

2012-11-15 Thread Otis Gospodnetic
I think Bill was asking about search. I think the question is whether the query hitting the shard where a doc was sent for indexing would see that doc even before that doc has been copied to replicas. I didn't test it, but I'd think the answer would be positive because of the xa log. Otis -- Performa

Re: zkcli issues

2012-11-15 Thread Nick Chase
Unfortunately, this doesn't seem to solve the issue; now I'm beginning to wonder if maybe it's because I'm on Windows. Has anyone successfully run ZkCLI on Windows? Nick On 11/12/2012 2:27 AM, Jeevanandam Madanagopal wrote: Nick - Sorry, embedded links are not shown in previous email.

RE: cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
The main reason to split a collection into 25 shards is to reduce the impact of the loss of a disk. I was running an older version of solr, a disk went down, and my entire collection was offline. Solr 4 offers shards.tolerant to reduce the impact of the loss of a disk: fewer documents will be

Re: High Slave CPU Intermittently After Replication

2012-11-15 Thread Upayavira
One question is, why optimise? The newer TieredMergePolicy, as I understand it, takes away much of the need for optimising an index. As to maxing, after a replication, your caches need warming. Watch how often you replicate, and check on the admin UI how long it takes to warm caches. You may be max

Re: cores shards and disks in SolrCloud

2012-11-15 Thread Upayavira
Personally I see no benefit to have more than one JVM per node, cores can handle it. I would say that splitting a 20m index into 25 shards strikes me as serious overkill, unless you expect to expand significantly. 20m would likely be okay with two or three shards. You can store the indexes for each

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Sorry, you're out of luck. SRPT could be generalized but that's a bit of work. The trickiest part I think would be writing a multi-dimensional SpatialPrefixTree impl. If the # of discrete values at each dimension is pretty small (<100? ish?), then there is a way using term positions and span que

Re: PointType multivalued query

2012-11-15 Thread blopez
Sorry I tried to explain it too fast. Imagine the usecase that I wrote on the first post. A document can have more than one 6-Dimensions point. So my first approach was: 1 2,2,2,2,2,2 2 3,3,3,3,3,3 3 4,4,4,4,4,4 It works fine and I don't think it gives us bad performance,

Re: PointType multivalued query

2012-11-15 Thread blopez
Hi, I think it's not a good idea to make Join operations between Solr cores because of the performance (we manage a lot of data). The point is that we want to store documents, each one with several information sets (let's name them Points), each one identified by 6 values (that's why I was tryin

cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
Hi, I have a question about the optimal way to distribute solr indexes across a cloud. I have a small number of collections (less than 10). And a small cluster (6 nodes), but each node has several disks - 5 of which I am using for my solr indexes. The cluster is also a hadoop cluster, so the

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Borja, Umm, I'm quite confused with the use-case you present. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

Re: Admin Permissions

2012-11-15 Thread Michael Long
I figured out you can disable the core admin in solr.xml, but then it breaks the admin as apparently it relies on that. I tried tomcat security but haven't been able to make it work. I think at this point I may just write a query/debugging app that the developers could use. On 11/13/2012 07:12

Re: PointType multivalued query

2012-11-15 Thread blopez
Hi David, thanks for your reply. I've tested this datatype and the values are indexed fine (I'm using 6-dimension points). I'm trying to retrieve results and it works only with the first 2 dimensions (X and Y), but it's not taking into account the other 4 dimensions. I've been reading the do

RE: DIH nested entities don't work

2012-11-15 Thread Dyer, James
Depending on how much data you're pulling back, 2 hours might be a reasonable amount of time. Of course if you had it a lot faster with Endeca & Forge, I can understand your questioning this. Keep in mind that the way you're setting up, it will build each cache, 1 at a time. I'm pretty sure Fo

Re: BM25 model for solr 4?

2012-11-15 Thread Tom Burton-West
Hello Floyd, There is a ton of research literature out there comparing BM25 to vector space. But you have to be careful interpreting it. BM25 originally beat the SMART vector space model in the early TRECs because it did better tf and length normalization. Pivoted Document Length normalizatio

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Wasn't obvious ;). Maybe you could try local params...something like q={!q.op=OR%20rows=3}yourQueryHere Hope this helps Dom 2012/11/15 jefferyyuan > Thanks for the reply. > > I am using SolrEntityProcessor to import data from another remote solr > server - not database, so the query here is

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Hi yun, I'm not sure I understand your need... There is no relationship between a query string and DIH. What you want to achieve (if "fetch 1 rows" means "select 1 rows from a table") can be done by limiting the number of rows your SQL select will return (the syntax differs from DBMS to DBMS).

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Oh I'm sorry, I should have read your question more clearly. I totally forgot that solr.PointType supports a configurable number of dimensions. If you need more than 2 dimensions as your example shows you do, then you'll have to resort to indexing your spatial data in another Solr core as non-mul

Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Yes, my first attempt was with a List, but it didn't work. Then I started to try other ways, such as a String[] array, with no success. Regards, - Luis Cappa. 2012/11/15 Sami Siren > hi, > > did you try setting your values in a List, for example ArrayList it should > work when you use that even

Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Sami Siren
hi, did you try setting your values in a List, for example ArrayList? It should work when you use that, even without specifying the request writer/response parser. -- Sami Siren On Thu, Nov 15, 2012 at 4:56 PM, Luis Cappa Banda wrote: > Hello, > > I´ve found what It seems to be a bug > JIRA-SOLR4080< > h

Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
It depends - no commit necessary for realtime get. Otherwise, yes, you would need to do at least a soft commit. That works the same way though - so if you make your update, then do a soft commit, you can be sure your next search will see the update on all the replicas. And with realtime get, of
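A minimal sketch of the two request shapes this reply contrasts, assuming a default-style setup where the realtime get handler is registered at `/get` and updates go to `/update`. The host, port and collection name are placeholders, not values from the thread.

```python
from urllib.parse import urlencode

# Placeholder base URL; adjust to your own host and collection.
SOLR = "http://localhost:8983/solr/collection1"


def realtime_get_url(doc_id):
    """URL for a realtime get: returns the latest version of a document
    by id, without requiring any commit."""
    return SOLR + "/get?" + urlencode({"id": doc_id, "wt": "json"})


def soft_commit_update_url():
    """URL for an update request that also asks for a soft commit, so
    that subsequent searches can see the change."""
    return SOLR + "/update?" + urlencode({"softCommit": "true"})


print(realtime_get_url("doc 1"))
print(soft_commit_update_url())
```

The first URL is for reading back a single document right after an update; the second is what you would POST documents to when a normal search has to see them.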

how make a suggester?

2012-11-15 Thread iwo
Hello, I would like to implement a suggester with Solr. Which is the best way now, in your opinion? Thanks in advance. I. - Complicare è facile, semplificare è difficile. Complicated is easy, simple is hard. quote: http://it.wikipedia.org/wiki/Bruno_Munari

Re: SolrCloud: Shard resize

2012-11-15 Thread Erick Erickson
Currently you have to re-index all of your data. If you don't you'll have a situation in which the same document (by uniqueKey) exists in two shards and that document may show up twice in your results list. NOTE: by "reindex all your data", you need to _delete_ all your data first. If you just add

Re: Nested Join Queries

2012-11-15 Thread Erick Erickson
Gerald: Here's the place to start: http://wiki.apache.org/solr/HowToContribute But the basic setup is 1> create a JIRA login (anyone can) 2> create a JIRA if one doesn't exist 3> generate the patch. From your root level (the one that contains "solr" and "lucene" dirs) and "svn diff > SOLR-###.patc

Re: Run multiple instances of solr using single data directory

2012-11-15 Thread Erick Erickson
I think this is rather dangerous. How would these multiple slaves coordinate replication? Would they all replicate at once? If only one was configured to replicate, how would the others know to reopen searchers? Furthermore, simply opening up more Solr instances on the same machine isn't expanding

Re: best practicies dealing with solr collections and instances

2012-11-15 Thread Erick Erickson
Well, what does "maintenance" entail? Changing schema? Rebuilding the index? Many operations under the "maintenance" rubric can be done with core admin handler requests, see: http://wiki.apache.org/solr/CoreAdmin But if that doesn't solve your problem, then probably running in two separate JVMs i

Re: Solr defining Schema structure trouble.

2012-11-15 Thread Jack Krupansky
Ah... sure, you can create a schema that has several different document types in it, with extra fields that are used in some but not all documents - books have the metadata fields but no page bodies while pages have page bodies but no metadata. And maybe even do a Solr join for the "block" of p

Re: Unable to run two multicore Solr instances under Tomcat

2012-11-15 Thread Erick Erickson
Thanks for wrapping this up, it's always nice to get closure, especially when it comes to googling .. On Wed, Nov 14, 2012 at 5:34 AM, Adam Neal wrote: > Just to wrap up this one. Previously all the lib jars were located in the > war file on our setup, this was mainly to ease deployment as it's

Re: Solr 4.0 Dismax woes (2 specifically)

2012-11-15 Thread Erick Erickson
OK, I'm going to reach a bit here. First, you're right, (e)dismax distributes the terms across all the fields, there's no good way around that. But for your specific example, why don't fielded queries work with the default query parser? e.g. q=tag:clothe cid:95? You can use the fuzzy syntax here

High Slave CPU Intermittently After Replication

2012-11-15 Thread richardg
Here is our setup: Solr 4.0. Master replicates to three slaves after optimize. We have a problem where every so often after replication the CPU load on the slave servers maxes out and requests slow to a crawl. We do a dataimport every 10 minutes and depending on the number of updates since the las

Re: consistency in SolrCloud replication

2012-11-15 Thread David Smiley (@MITRE.org)
Mark Miller-3 wrote > I'm talking about an update request. So if you make an update, when it > returns, your next search will see the update, because it will be on > all replicas. I presume this is only the case if (of course) the client also sent a commit. So you're saying the commit call will n

Re: DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Andy Lester
On Nov 15, 2012, at 8:02 AM, Sébastien Lorber wrote: > > > I don't know where you're getting the ${JOB_EXEC.JOB_INSTANCE_ID}. I believe that if you want to get parameters passed in, it looks like this: WHERE batchid = ${dataimporter.request.batchid} when I kick

DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Sébastien Lorber
Hello, I don't know if this is a bug or a missing feature, nor if it was corrected in new versions of Solr (can't find any JIRA about it), so I just want to show you the problem... I can't test with Solr 4.0, I have a legacy system, not a lot of time, not a Solr expert at all and it seems just up

Re: Solr Indexing MAX FILE LIMIT

2012-11-15 Thread Alexandre Rafalovitch
Maybe you can start by testing this with split -l and xargs :-) These are standard Unix toolkit approaches and since you use one of them (curl) you may be happy to use others too. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalov
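For readers who would rather do the chunking in a script than with `split -l`, here is a minimal sketch of the same idea: break a large line-oriented input into batches of N lines so each batch can be posted separately. The function name and batch size are illustrative, and the posting step is left out on purpose.

```python
from itertools import islice


def chunk_lines(lines, size):
    """Yield successive lists of at most `size` lines, like `split -l size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Example: five records split into batches of two.
for batch in chunk_lines(["a", "b", "c", "d", "e"], 2):
    print(batch)
```

Because the input is consumed lazily, the same function works on an open file handle without loading the whole file into memory.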

CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Hello, I've found what seems to be a bug (SOLR-4080) with CloudSolrServer during atomic updates via SolrJ. Thanks to Sami I detec

RE: DIH nested entities don't work

2012-11-15 Thread mroosendaal
Hi James, Just gave it a go and it worked! That's the good news. The problem now is getting it to work faster. It took over 2 hours just to index 4 views and I need to get information from 26. I tried adding the defaultRowPrefetch="2" as a jdbc parameter but it does not seem to honour that. I

Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
I'm talking about an update request. So if you make an update, when it returns, your next search will see the update, because it will be on all replicas. Another process that is searching rapidly may see an "eventually" consistent view though (very briefly). We have some ideas to make that view "mo

Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread David Smiley (@MITRE.org)
The particular JavaScript I referred to is this:
function processAdd(cmd) {
  doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
  lat = doc.getFieldValue("LATITUDE");
  lon = doc.getFieldValue("LONGITUDE");
  if (lat != null && lon != null)
    doc.setField("latLon", lat+","+l

RE: Error loading class solr.CJKBigramFilterFactory

2012-11-15 Thread Frederico Azeiteiro
:) Just installed 3.6.1 and it's working just fine. Something must be wrong with my tomcat/solr install. Thank you Robert. //Frederico -Original message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, 14 November 2012 19:18 To: solr-user@lucene.apache.or

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Uhm, after setting both the response parser and the request writer it worked OK with HttpSolrServer. I've tried to find a way to set BinaryResponseParser and BinaryRequestWriter with CloudSolrServer (or even via LBHttpSolrServer) but I found nothing. Suggestions? :-/ - Luis Cappa. 2012/11/15 Sami Siren

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Try setting the request writer to binary like this: server.setParser(new BinaryResponseParser()); server.setRequestWriter(new BinaryRequestWriter()); Or, instead of a string array, use an ArrayList that contains your strings as the value for the map. On Thu, Nov 15, 2012 at 3:58 PM, Luis

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hi, Sami. Doing some tests I've used the same code as you and did a quick execution: HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solrserver/core1"); try { HashMap editTags = new HashMap(); editTags.put("

Re: consistency in SolrCloud replication

2012-11-15 Thread Bill Au
Thanks for the info, Mark. By "a request won't return until it's affected all replicas", are you referring to the update request or the query? Bill On Wed, Nov 14, 2012 at 7:57 PM, Mark Miller wrote: > It's included as soon as it has been indexed - though a request won't > return until it's a

Re: Solr defining Schema structure trouble.

2012-11-15 Thread denl0
Yes, this is what I'm trying to do. But data related to the document, like language/title/... (I have many more fields), is stored many times. Each page has a part of its data that is the same. Is it possible to separate that data?

PointType multivalued query

2012-11-15 Thread blopez
Hi all, I'm using a multivalued PointType (6 dimensions) in my Solr schema. Imagine that I have one doc indexed in Solr: -1 1,1,1,1,1,1 5,5,5,5,5,5 Now imagine that I launch some queries: point:[0,0,0,0,0,0 TO 2,2,2,2,2,2]: Works OK (matches with the first doc point and retu

Re: Faceting Question

2012-11-15 Thread Alexey Serba
Seems like pivot faceting is what you are looking for ( http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting ) Note: it currently does not work in distributed mode - see https://issues.apache.org/jira/browse/SOLR-2894 On Thu, Nov 15, 2012 at 7:46 AM, Jamie Johnson
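As a rough illustration of what a pivot facet request looks like on the wire, the sketch below builds a select URL with the `facet.pivot` parameter. The base URL and the field names (`category`, `author`) are made-up placeholders, not fields from the original question.

```python
from urllib.parse import urlencode, parse_qs


def pivot_facet_url(base_url, query, pivot_fields, rows=0):
    """Build a /select URL asking Solr for pivot (decision tree) facets
    over the given fields, in the order given."""
    params = {
        "q": query,
        "rows": rows,  # rows=0: only facet counts are wanted here
        "facet": "true",
        "facet.pivot": ",".join(pivot_fields),
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = pivot_facet_url("http://localhost:8983/solr/collection1", "*:*",
                      ["category", "author"])
# Decode the query string again to show the parameters that were sent.
print(parse_qs(url.split("?", 1)[1]))
```

The order of the fields matters: the first field forms the top level of the tree and each later field is nested under it.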

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
I'll have a look at the Solr source code and try to fix the bug. If I succeed I'll update the JIRA issue with it, :-) 2012/11/15 Sami Siren > Actually it seems that xml/binary request writers only behave differently > when using array[] as the value. if I use ArrayList it also works with the > xml form

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Actually it seems that xml/binary request writers only behave differently when using array[] as the value. if I use ArrayList it also works with the xml format (4.1 branch). Still it's annoying that the two request writers behave differently so I guess it's worth adding the jira anyway. The Affect

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Ok, done: https://issues.apache.org/jira/browse/SOLR-4080 Regards, - Luis Cappa. 2012/11/15 Luis Cappa Banda > Hello, Sami. > > It will be the first issue that I open so, should I create it under Solr > 4.0 version or in Solr 4.1.0 one? > > Thanks, > > - Luis Cappa. > > > 2012/11/15 Sami Sir

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello, Sami. It will be the first issue that I open so, should I create it under Solr 4.0 version or in Solr 4.1.0 one? Thanks, - Luis Cappa. 2012/11/15 Sami Siren > On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda >wrote: > > > Thread update: > > > > When I use a simple: > > > > *Map oper

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda wrote: > Thread update: > > When I use a simple: > > *Map operation = new HashMap();* > > > Instead of: > > *Map> operation = new HashMap List>();* > > > The result looks better, but it´s still wrong: > > fieldName: [ > "[Value1, Value2]" > ], > >

Re: Solr 4.0 indexing performance

2012-11-15 Thread Nils Weinander
Ah, thanks Markus! That's a good thing. I tried disabling the transaction log; the difference in performance is marginal. So, I'll stick with the transaction logging. On Thu, Nov 15, 2012 at 11:02 AM, Markus Jelsma wrote: > Hi - you're likely seeing a drop in performance because of durability > wh

RE: Solr 4.0 indexing performance

2012-11-15 Thread Markus Jelsma
Hi - you're likely seeing a drop in performance because of durability which is enabled by default via a transaction log. When disabled 4.0 is iirc slightly faster than 3.x. -Original message- > From:Nils Weinander > Sent: Thu 15-Nov-2012 10:35 > To: solr-user@lucene.apache.org > Subj

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Thread update: When I use a simple: Map operation = new HashMap(); instead of: Map<String, List<String>> operation = new HashMap<String, List<String>>(); the result looks better, but it's still wrong: fieldName: [ "[Value1, Value2]" ]. The List value is received as a simple String "[Value1, Value2]". In other words, Sol

Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread jmlucjav
If you are using DIH, it is just a matter of doing something like this in the SQL query (example from a MySQL project I have around): CONCAT(lat, ',', lon) as latlon

SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello everyone, I've tested atomic updates via Ajax calls and now I'm starting with atomic updates via SolrJ... but the way I'm proceeding doesn't seem to work well. Here is the snippet: SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", "myId"); Map<String, List<String>> operation = new HashM
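For comparison with the SolrJ attempt, the sketch below shows the JSON body an atomic update takes when sent straight to the update handler: the field value is a small map from a modifier (`set`, `add` or `inc`) to the new value. The field name `tags` and the helper name are assumptions for illustration; only the `id` value and the two example values come from the thread.

```python
import json

# Atomic update modifiers available in Solr 4.0.
MODIFIERS = ("set", "add", "inc")


def atomic_update_payload(doc_id, field, value, op="set"):
    """Return the JSON body for an atomic update of one field of one document."""
    if op not in MODIFIERS:
        raise ValueError("unsupported modifier: %r" % (op,))
    return json.dumps([{"id": doc_id, field: {op: value}}])


# 'tags' is a hypothetical multivalued field name.
payload = atomic_update_payload("myId", "tags", ["Value1", "Value2"])
print(payload)
```

If the server stores the list as the single string "[Value1, Value2]", as reported in this thread, comparing what the client actually sends against this shape is one way to see whether the client-side writer flattened the list.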