Re: Changing Index directory?

2012-06-11 Thread Bruno Mannina
On 12/06/2012 08:49, Bruno Mannina wrote: Dear All, For tests, I would like to install Solr in the standard directory (/home/solr) but with the index on an external hard disk (/media/myExthdd). I suppose it will decrease performance, but that's not a problem. Where can I find the index directory path variable?

Changing Index directory?

2012-06-11 Thread Bruno Mannina
Dear All, For tests, I would like to install Solr in the standard directory (/home/solr) but with the index on an external hard disk (/media/myExthdd). I suppose it will decrease performance, but that's not a problem. Where can I find the index directory path variable? Thanks a lot, Bruno
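For reference, the index location is normally controlled by the dataDir element in solrconfig.xml; a minimal sketch, where the exact path under the mount point is only an example:
<dataDir>/media/myExthdd/solr/data</dataDir>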

Re: help with map reduce

2012-06-11 Thread Sachin Aggarwal
OK... my fault. On Tue, Jun 12, 2012 at 11:31 AM, Gora Mohanty wrote: > On 12 June 2012 11:13, Sachin Aggarwal wrote: > > hello, > > I need help to write a map-reduce program that can take the records from > > an HBase table and insert them into a Lily repository... > > which method will be a bet

Re: help with map reduce

2012-06-11 Thread Gora Mohanty
On 12 June 2012 11:13, Sachin Aggarwal wrote: > hello, > I need help to write a map-reduce program that can take the records from > an HBase table and insert them into a Lily repository... > which would be the better option: doing the indexing in the same job > or just performing the insertion operation f

Re: Exception when optimizing index

2012-06-11 Thread Rok Rejc
Just as an addendum: I deleted the whole index directory and loaded the data from scratch. After the data was loaded (and I committed it) I ran CheckIndex again. Again, there was a bunch of broken segments. I will try with the latest trunk to see if the problem still exists. Regards, Rok On Mo

Re: Something like 'bf' or 'bq' with MoreLikeThis

2012-06-11 Thread Jack Krupansky
The MLT handler may not have those params, but you could use the MLT search "component" to generate the MLT queries (and results) and then add your own component that would revise the MLT queries to be boosted as you desire. -- Jack Krupansky -Original Message- From: entdeveloper Sen

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Is this method equivalent to setting vm.swappiness, which is global? Or can it set the swappiness for just the JVM process? On Tue, Jun 12, 2012 at 5:11 AM, Mikhail Khludnev wrote: > The point about premature optimization makes sense to me. However, some time > ago I bookmarked a potentially useful approach > htt

Re: Indexing Multiple Datasources

2012-06-11 Thread Jack Krupansky
You can do it by giving each database data source a "name" attribute, which is what you reference in the dataSource attribute of your entity. See: http://wiki.apache.org/solr/DataImportHandler#multipleds Or, are you in fact trying to join or merge the tables based on first name and last name o
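A minimal db-data-config.xml sketch of the named-dataSource approach; the driver, connection URLs, entity names, and queries below are placeholders, not taken from the thread:
<dataConfig>
  <dataSource name="db1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://host1;databaseName=db1" user="user" password="pass"/>
  <dataSource name="db2" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://host2;databaseName=db2" user="user" password="pass"/>
  <document>
    <entity name="personsDb1" dataSource="db1" query="SELECT id, firstname, lastname FROM persons"/>
    <entity name="personsDb2" dataSource="db2" query="SELECT id, firstname, lastname FROM persons"/>
  </document>
</dataConfig>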

Something like 'bf' or 'bq' with MoreLikeThis

2012-06-11 Thread entdeveloper
I'm looking for a way to improve the relevancy of my MLT results. For my index of movies, the MoreLikeThisHandler is doing a great job of returning related documents by the fields I specify, like 'genre', but within my "bands" of results (groups of documents with the same score because they all

Re: After a full data import from a database

2012-06-11 Thread Jack Krupansky
Do a query such as: http://localhost:8983/solr/select/?q=*:* to see the count of documents that were indexed. -- Jack Krupansky -Original Message- From: Michael Della Bitta Sent: Monday, June 11, 2012 6:41 PM To: solr-user@lucene.apache.org Subject: Re: After a full data import from

Re: After a full data import from a database

2012-06-11 Thread Michael Della Bitta
Hi Jin, The file never shows up on disk anywhere. It's parsed and various bits of it are stored in various different ways, depending on your schema. The raw stored data, if you've so specified, is in the .fdt file, but that's not going to be a very convenient file format for you to look at directl

After a full data import from a database

2012-06-11 Thread Jin Chun
Hi there, I just did a full import of data from a database... where should I be looking for the indexed file (likely to be in XML format)? I've already checked the folder I set in solrconfig.xml, and all I see in there is a bunch of .fdt, .fdx, .frq, .tis files... Any suggestions? Thanks, J

Re: what's better for in memory searching?

2012-06-11 Thread Mikhail Khludnev
The point about premature optimization makes sense to me. However, some time ago I bookmarked a potentially useful approach: http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html. On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen wrote: > On Mon, 2012-06-11 at 11

Re: Question on addBean and deleteByQuery

2012-06-11 Thread Chris Hostetter
: Transfer-Encoding: chunked : Content-Type: application/xml; charset=UTF-8 : : 47 : name:fred AND currency:USD : 0 ... : Due to the way our servers are setup, we get an error and we think it is due : to these numbers being in the body of the request. please be specific about the erro

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Chris Hostetter
: In this day and age, a custom update handler is almost never the right : > answer to a problem -- nor is a custom request handler that does updates : > (those two things are actually different) ... my advice is always to : > start by trying to implement what you need as an UpdateRequestProcesso

Re: score filter

2012-06-11 Thread Chris Hostetter
: I need to frame a query that is a combination of two query parts and I use a : 'function' query to prepare the same. Something like: : q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1)) : : where $uq and $cq are two queries. : : Now, I want a search result returned only if I

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Aaron Daubman
While I look into doing some refactoring, as well as creating some new UpdateRequestProcessors (and/or backporting), would you please point me to some reading material on why you say the following: In this day and age, a custom update handler is almost never the right > answer to a problem -- nor

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Chris Hostetter
: The new FieldValueSubsetUpdateProcessorFactory classes look phenomenal. I : haven't looked yet, but what are the chances these will be back-ported to : 3.6 (or how hard would it be to backport them?)... I'll have to check out : the source in more detail. 3.x is bug fix only as we now focus on 4

Re: Sorting with customized function of score

2012-06-11 Thread Chris Hostetter
: I'm using the Solr 4.0 nightly build version. In fact I intend to sort with : a more complicated function including score, geodist() and other factors, so : this example is to simplify the issue that I cannot sort with a customized : function of score. : More concretely, how can I make the sort lik

Re: Sharing common config between different search handlers

2012-06-11 Thread Chris Hostetter
: But I would like those two SearchHandlers to share the rest of their : configuration. Because if anything needs to be changed, it needs to be : done for both SearchHandlers. I think that's kind of ugly. Take a look at using XML includes (aka "xinclude") ... that would let you keep the common ch
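A minimal sketch of the XInclude approach, assuming the shared parameters live in a hypothetical common-params.xml next to solrconfig.xml, containing a single shared element such as a <lst name="defaults"> block:
<requestHandler name="/search-a" class="solr.SearchHandler">
  <xi:include href="common-params.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
</requestHandler>
The same include line goes into the second handler, so a change to common-params.xml is picked up by both.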

Re: search for alphabetic version of numbers

2012-06-11 Thread Jack Krupansky
You can certainly do a modest number of special cases as replacement synonyms, but if you are serious about arbitrary number support, it might be best to go with a custom update processor and query preprocessor that map text numbers to simple numeric form. How about cases like 2,300 or 2,300.0
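For the simple-case route, the replacement-synonym entries in synonyms.txt might look like this (entries are only illustrative):
one => 1
two => 2
twenty three, twenty-three => 23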

Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread roz dev
I think that there is no way around doing custom logic in this case. If indexing process knows that documents have to be grouped then they better be together. -Saroj On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy wrote: > Martijn, > > How do we add a custom algorithm for distributing documents

Re: edismax and untokenized field

2012-06-11 Thread Vijay Ramachandran
Thank you for your reply. Sending this as a phrase query does change the results as expected. On Mon, Jun 11, 2012 at 4:39 PM, Tanguy Moal wrote: > I think you have to issue a phrase query in such a case because otherwise > each "token" is searched independently in the merchant field : the query

Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky
Correct. In 3.6 it is simply ignored. In 4.x it currently does work. Generally, Solr ignores any elements that it does not support. -- Jack Krupansky -Original Message- From: Rohit Sent: Monday, June 11, 2012 12:55 PM To: solr-user@lucene.apache.org Subject: RE: defaultSearchField n

RE: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Thanks for the pointers Jack. Actually, the strange part is that the defaultSearchField element is present and uncommented, yet not working. docKey searchText Regards, Rohit -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: 11 June 2012 20:35 To: solr-user@lu

Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
Yeah, I'll have to play with it to see how useful it is; I really don't know at this point. On another note, we are already using some binning like what is described in the wiki you sent, specifically http://code.google.com/p/javageomodel/, for other purposes. Not sure if that could be used or not, guess I'd have to

Indexing Multiple Datasources

2012-06-11 Thread Kay
Hello, We have 2 MS SQL Server databases which we want to index, but most of the columns in the databases have the same names. E.g. both DBs have the columns First name, Last name, etc. How can you index multiple databases using a single db-data-config file and one schema? Here is my d

Re: Building a heat map from geo data in index

2012-06-11 Thread Tanguy Moal
Yes it looks interesting and is not too difficult to do. However, the length of the geohashes gives you very little control over the size of the regions to colorize. Quoting Wikipedia (geohash length -> km error): 1 -> ±2500, 2 -> ±630, 3 -> ±78, 4 -> ±20, 5 -> ±2.4, 6 -> ±0.61, 7 -> ±0.076, 8 -> ±0.019. This is interes

Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
If you look at the Stack Overflow response from David, he suggested breaking the geohash up into pieces and then using a prefix for refining precision. I hadn't imagined limiting this to a particular area, just limiting it based on the prefix (which would be based on the user's zoom level or something) allow
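As a rough sketch of the prefix idea, assuming each document also indexes its geohash truncated to the desired precision in a hypothetical geohash_4 field, a facet query along these lines would return a count per 4-character cell, which the client could then color as a heat map:
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=geohash_4&facet.prefix=dr
Here facet.prefix restricts the counts to cells under the currently visible region's geohash prefix.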

Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky
Just to clarify one point from my original response, the "df" parameter is already set for the default request handlers, so all you need to do is change it from the "text" field to your preferred default field. Or, you can simply uncomment the deprecated defaultSearchField element in your sche

Re: Building a heat map from geo data in index

2012-06-11 Thread Tanguy Moal
There is definitely something interesting to do around geohashes. I'm wondering how one could map the N by N requested tiles to a range of geohashes (where the gap would be a function of N). What I mean is that I don't know if a bijective function exists between tiles and geohash rang

Re: How to do custom sorting in Solr?

2012-06-11 Thread Afroz Ahmad
You may want to look at http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html. While it is not the same requirement, this should give you an idea of how to do custom sorting. Thanks Afroz On Sun, Jun 10, 2012 at 4:43 PM, roz dev wrote: > Yes, these documents have lots

Re: Building a heat map from geo data in index

2012-06-11 Thread Dmitry Kan
So it sounds to me that the geohash is just a hash representation of lat,lon coordinates for easier referencing (see e.g. http://en.wikipedia.org/wiki/Geohash). I would probably start with something easier, having bbox lat,lon coordinate pairs of the top-left corner (or in some coordinate systems,

RE: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Hi Jack, I understand that df would make this work normally, but why did defaultSearchField stop working suddenly? I notice that there is talk about deprecating it, but even then it should continue to work, right? Regards, Rohit -Original Message- From: Jack Krupansky [mailto:j...@basetec

Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
That is certainly an option, but collecting the heat map data is really the question. I saw http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes but don't have a really good understanding of how this would be accomplished.

Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Martijn, How do we add a custom algorithm for distributing documents in Solr Cloud? According to this discussion http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html , Mark discourages users from using custom d

RE: Writing custom data import handler for Solr.

2012-06-11 Thread Dyer, James
More specifically, the 3.6 Data Import Handler code (DIH) can be seen here: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/ The main wiki page is here: http://wiki.apache.org/solr/DataImportHandler T

Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky
Add the "df" parameter to your query request handler. It names the default field. Or use "qf" for the edismax query parser. -- Jack Krupansky -Original Message- From: Rohit Sent: Monday, June 11, 2012 8:58 AM To: solr-user@lucene.apache.org Subject: defaultSearchField not working afte

Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Jack Krupansky
Is there a Solr wiki that discusses these issues, such as "Groups can't cross shard boundaries"? Seems like it should be highlighted prominently, maybe here: http://wiki.apache.org/solr/FieldCollapsing Seems like it should be mentioned on the distributed/SolrCloud wiki(s) as well. Is this a

defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Hi, We have just migrated from Solr 3.5 to Solr 3.6. All this time we have been querying Solr as http://122.166.9.144:8080/solr/ <>/?q=apple but now this is not working, and the name of the search field needs to be provided

Re: Building a heat map from geo data in index

2012-06-11 Thread Stefan Matheis
I'm not entirely sure that it has to be that complicated... what about using, for example, http://www.patrick-wied.at/static/heatmapjs/ ? You could collect all the geo-related data and do the (heat)map stuff on the client. On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote: > I had a req

Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Martijn v Groningen
The ngroups value returns the number of groups that have matched the query. However, if you want ngroups to be correct in a distributed environment, you need to put documents belonging to the same group into the same shard. Groups can't cross shard boundaries. I guess you need to do some manual documen

Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Version: Solr 4.0 (svn build, 30th May 2012) with SolrCloud (2 slices and 2 shards). The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud We are doing distributed search. While querying, we use field collapsing with "ngroups" set to true, as we need the number of search resul
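For context, the kind of request being described would look roughly like this (the collection name and group field are illustrative):
http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=category&group.ngroups=true
With shards involved, the ngroups value in the response is only accurate if all documents of a given group live on the same shard, as noted in the replies above.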

Re: edismax and untokenized field

2012-06-11 Thread Tanguy Moal
Hello, I think you have to issue a phrase query in such a case, because otherwise each "token" is searched independently in the merchant field: the query parser splits the query on spaces! Check the difference between debug outputs when you search for "Jones New York"; you'd get what you expected.
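A hedged illustration of the difference being described, using edismax parameters; the merchant field name is taken from the thread:
q=Jones New York&defType=edismax&qf=merchant (each term is searched independently)
q="Jones New York"&defType=edismax&qf=merchant (matched as a single phrase against the untokenized field)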

edismax and untokenized field

2012-06-11 Thread Vijay Ramachandran
Hello. I'm trying to understand the behaviour of edismax in solr 3.4 when it comes to searching fields similar to "string" types, i.e., untokenized. My document is data about products available in various stores. One of the fields in my schema is the name of the merchant, and I would like to match

Re: what's better for in memory searching?

2012-06-11 Thread Toke Eskildsen
On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote: > Yes, I need an average query time of less than 10 ms; the faster the better. > I have enough memory for Lucene because I know there is not too much > data. There are not many modifications: every day there are only a few > hundred document updates. If inde

Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
On 11 June 2012 at 11:16, Li Li wrote: > do you mean software RAM disk? Right. OS level. > using RAM to simulate disk? Yes. That generally makes a disk which is very fast in reading and writing. > How to deal with Persistence? Synchronization (slaving?). paul

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I found this: http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux It can provide fine-grained control of swapping. On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann wrote: > Set the swappiness to 0 to avoid memory pages being swapped to disk too > early. > > http://en.wi

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Yes, I need an average query time of less than 10 ms; the faster the better. I have enough memory for Lucene because I know there is not too much data. There are not many modifications: every day there are only a few hundred document updates. If the indexes are not in physical memory, then IO operations will

Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
You cannot guarantee this when you're running out of RAM; you'd have a problem then anyway. Why do you care that much? Have you had performance issues yet? 1GB should load really fast, and both auto-warming and the OS cache should help a lot as well. With such an index, you usually don't need t

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I am sorry, I made a mistake: even using RAMDirectory, I cannot guarantee the pages are not swapped out. On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann wrote: > Set the swappiness to 0 to avoid memory pages being swapped to disk too > early. > > http://en.wikipedia.org/wiki/Swappiness > > -Kuli > > A

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Do you mean a software RAM disk, using RAM to simulate a disk? How do I deal with persistence? Maybe I can hack it by increasing RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024. That may waste some memory, but I can adjust my merge policy to avoid too many segments. I will have a "big" segment and a "small" segme

Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
Li Li, have you considered allocating a RAM disk? It's not the most flexible thing... but it's certainly close in performance to a RAMDirectory. MMapping on that is likely to be useless, but I doubt you can set it to zero. That would need experimenting. Also, don't caching and auto-warming provide th

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
1. This setting is global; I just want my Lucene search program not to swap. Other, less important programs can still swap. 2. Do I need to call MappedByteBuffer.load() explicitly, or do I have to warm up the indexes to guarantee all my files are in physical memory? On Mon, Jun 11, 2012 at 4:45

Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
Set the swappiness to 0 to avoid memory pages being swapped to disk too early. http://en.wikipedia.org/wiki/Swappiness -Kuli On 11.06.2012 10:38, Li Li wrote: I have roughly read the code of RAMDirectory; it uses a list of 1024-byte arrays and has a lot of overhead. But as far as I know, using MMap
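For reference, this is the Linux vm.swappiness kernel setting; a system-wide example (as the thread notes, it is global rather than per process) would be to run sysctl vm.swappiness=0, or to add vm.swappiness=0 to /etc/sysctl.conf and apply it with sysctl -p.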

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I have roughly read the code of RAMDirectory; it uses a list of 1024-byte arrays and has a lot of overhead. But as far as I know, using MMapDirectory, I can't prevent page faults: the OS will swap less frequently used pages out. Even if I allocate enough memory for the JVM, I can't guarantee all the files in the direc

Re: Issues with whitespace tokenization in QueryParser

2012-06-11 Thread Bernd Fehling
Because in many cases we use multi-term search together with synonyms as a thesaurus, we had to develop a solution for this. There is a whole chain of pitfalls through the system, and you have to be careful. The thesaurus (synonym.txt) handles not only single terms to multi-terms but also multi-terms t