Re: Solr 4.0 => Spatial Search - How to

2011-01-13 Thread caman
Thanks. Here was the issue: concatenating 2 floats (lat, lng) at the MySQL end converted the result to a BLOB, and indexing would fail when storing a BLOB in a 'location' type field. After the BLOB issue was resolved, all worked OK. Thank you all for your help -- View this message in context: http://lucene.472066.n3.na

Re: Solr 4.0 => Spatial Search - How to

2011-01-13 Thread Grijesh.singh
I have done that type of location searching, but I have not used spatial search. I wrote my logic at the application end. I have cached the location ids and their lat/long. When queries come in for any location, say "New Delhi", my location search logic at the application end calculates the distanc
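The application-side approach described in this message (cached location ids with their lat/long, distance computed outside Solr) could look roughly like the Python sketch below. The message is cut off before it names the distance calculation, so the great-circle (haversine) formula is an assumption; the function names `haversine_km` and `nearby`, the cache layout, and the approximate coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def nearby(cache, lat, lon, radius_km):
    """Return cached location ids within radius_km, nearest first."""
    hits = []
    for loc_id, (loc_lat, loc_lon) in cache.items():
        d = haversine_km(lat, lon, loc_lat, loc_lon)
        if d <= radius_km:
            hits.append((d, loc_id))
    return [loc_id for _, loc_id in sorted(hits)]

# Hypothetical cache: location id -> (lat, long), approximate values.
LOCATION_CACHE = {
    "new_delhi": (28.6139, 77.2090),
    "gurgaon": (28.4595, 77.0266),
    "mumbai": (19.0760, 72.8777),
}
```

Calling `nearby(LOCATION_CACHE, 28.6139, 77.2090, 50)` would return the ids of cached locations within 50 km of the query point, which the application could then pass to Solr as an ordinary filter on the location id.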

Re: Adding a new site to existing solr configuration

2011-01-13 Thread Gora Mohanty
On Thu, Jan 13, 2011 at 10:47 PM, PeterKerk wrote: > > I still have the default Solr example config running on Jetty. I use Cygwin > to start my current site. > > Now I already have fully configured one solr instance with these files: > \example\example-DIH\solr\db\conf\my-data-config.xml > \examp

Re: Variable datasources

2011-01-13 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 1:02 AM, tjpoe wrote: [...] > I also tried creating datasources for each local and then using a variable > datasource in the entity such as: > > > > > > > and then the document as: > > >   rootEntity="false"> >   >   > > > but the ${local.code} variable is not resolve

Re: Improving Solr performance

2011-01-13 Thread Gora Mohanty
On Thu, Jan 13, 2011 at 10:10 PM, supersoft wrote: > > On the one hand, I found really interesting those comments about the reasons > for sharding. Documentation agrees you about why to split an index in > several shards (big sizes problems) but I don't find any explanation about > the inconvenien

Re: use of schema.xml

2011-01-13 Thread Dennis Gearon
I could put 1-10,000 fields in any one document, as long as they are told what type or they are dynamically matched by dynamic fields relative to what's in the schema.xml file? It's very much like google 'big tables' or 'elastic search' that way, right? It's up to me to enforce any field nam

Re: use of schema.xml

2011-01-13 Thread Lance Norskog
Wait- it does enforce the schema names. What it does not enforce is field contents when you change the schema. Since Lucene does not have field replacement, it is not practical to remove or add a field to all existing documents when you change the schema. On Thu, Jan 13, 2011 at 8:15 PM, Lance Nor

Re: use of schema.xml

2011-01-13 Thread Lance Norskog
Correct. Solr and Lucene do not store or enforce the schema. You're on your own :) On Thu, Jan 13, 2011 at 8:09 PM, Dennis Gearon wrote: > I'm going to buy the book for Solr, since it looks like I need to do more of > the > work than I thought I would. > > But, from looking at it, the schema fil

Re: Solr 4.0 => Spatial Search - How to

2011-01-13 Thread Lance Norskog
Spatial does not support separate fields: you don't need lat/long, only 'coord'. To get latitude/longitude in the coord field from the DIH, you need to use a transformer in the DIH script. It would populate a field 'coord' with a text string made from the lat and lon fields: http://wiki.
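To illustrate the kind of text value such a transformer would produce, here is a minimal Python sketch. It is not the DIH transformer itself (the message is cut off before the linked example), and the "lat,lon" comma-separated order, the function name `make_coord`, and the sample row are assumptions made for illustration.

```python
def make_coord(lat, lon):
    """Join latitude and longitude into a single 'lat,lon' text value."""
    lat, lon = float(lat), float(lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"

# Hypothetical database row with separate lat and lon columns.
row = {"lat": "28.6139", "lon": "77.209"}
row["coord"] = make_coord(row["lat"], row["lon"])
```

Building the string from parsed floats, rather than concatenating columns inside the SQL query, also keeps the value a plain text string by the time it reaches the indexer.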

use of schema.xml

2011-01-13 Thread Dennis Gearon
I'm going to buy the book for Solr, since it looks like I need to do more of the work than I thought I would. But, from looking at it, the schema file only says: A/ What types of data can be in the 'fields' of the documents B/ If there are any dynamically assigned fields. C/ What parsers are av

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Estrada Groups
Ahhh...the fun of open source software ;-). Requires a ton of trial and error! I found what worked for me and figured it was worth passing it along. If you don't mind...when you sort everything out on your end, please post results for the rest of us to take a gander at. Cheers, Adam On Jan 13

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Lance Norskog
1) CheckIndex is not supposed to change a corrupt segment, only remove it. 2) Are you using local hard disks, or do you run on a common SAN or remote file server? I have seen corruption errors on SANs, where existing files have random changes. On Thu, Jan 13, 2011 at 11:06 AM, Michael McCandless wrot

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Chamnap Chhorn
Thanks for your reply. However, it doesn't work for my case at all. I think it's a problem with the query parser or something else. It forces me to put double quotes around the search query in order to get any results. "sim 010" "sim 010" +DisjunctionMaxQuery((keyphrase:sim 010)) () +(keyphrase:sim

[sfield] Missing in Spatial Search

2011-01-13 Thread Adam Estrada
According to the documentation here: http://wiki.apache.org/solr/SpatialSearch the field that identifies the spatial point data is "sfield". See the console output below. Jan 13, 2011 6:49:40 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/select params={spellcheck=true&f.jtyp

Re: Solr + Hadoop

2011-01-13 Thread Alexander Kanarsky
Joan, make sure that you are running the job on a Hadoop 0.21 cluster. (It looks like you have compiled the apache-solr-hadoop jar with Hadoop 0.21 but are using it on a 0.20 cluster). -Alexander

Searchers and Warmups

2011-01-13 Thread David Cramer
I'm trying to understand the mechanics behind warming up, when new searchers are registered, and their costs. A quick Google didn't point me in the right direction, so hoping for some of that here. -- David Cramer

RE: start value in queries zero or one based?

2011-01-13 Thread Jonathan Rochkind
You could have tried it and seen for yourself on any Solr server in your possession in less time than it took to have this thread. And if you don't have a Solr server, then why do you care? But the answer is 0. http://wiki.apache.org/solr/CommonQueryParameters#start "The default value is "0""
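Since `start` is zero-based, a 1-based page number has to be converted before it is sent. A minimal Python sketch of that arithmetic follows; the helper names `start_for_page` and `select_params` are hypothetical, and only the `q`, `start` and `rows` parameter names come from Solr's common query parameters.

```python
def start_for_page(page, rows):
    """Zero-based 'start' offset for a 1-based page number."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be >= 1")
    return (page - 1) * rows

def select_params(query, page, rows=10):
    """Build the paging-related request parameters for one result page."""
    return {"q": query, "start": start_for_page(page, rows), "rows": rows}
```

With 10 rows per page, page 1 maps to `start=0` and page 3 maps to `start=20`.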

RE: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
So you're allowed to put the entire original document in a stored field in Solr, but you aren't allowed to stick it in, say, a redis or couchdb too? Ah, bureaucracy. But no reason what you are doing won't work, as you of course already know from doing it. If you actually know the charset of a

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Paul
Thanks for all the responses. CharsetDetector does look promising. Unfortunately, we aren't allowed to keep the original of much of our data, so the solr index is the only place it exists (to us). I do have a java app that "reindexes", i.e., reads all documents out of one index, does some transfor

Re: start value in queries zero or one based?

2011-01-13 Thread Dennis Gearon
I'm migrating to CTO/CEO status in life due to building a small company. I find I don't have too much time for theory. I work with what is. So, what is it, not what should it be. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is us

RE: start value in queries zero or one based?

2011-01-13 Thread Steven A Rowe
> Please, read every wiki page you can find and write notes. NO!!! Once you start down this road, there is no turning back! Soon you will feel the need to turn your notes into a new wiki page or a blog post, and people will read those and write notes, and the process will repeat, ad infinitum

Re: start value in queries zero or one based?

2011-01-13 Thread Markus Jelsma
Perhaps it would be more useful to RTFM instead of messing around on the mailing list: http://wiki.apache.org/solr/CommonQueryParameters#start Please, read every wiki page you can find and write notes. > Do I even need a body for this message? ;-) > > Dennis Gearon > > > Signature Warning >

Re: Solr + Hadoop

2011-01-13 Thread Em
Hi Joan, I am not sure whether it applies, but are you really using Solr 1.4 (not 1.4.1) and were also using the Hadoop-Jars provided by this patch (0.20.1 not 0.0.21)? I ask, because I had some other issues with other classes that were related to different package-definitions etc. - in short: so

Re: start value in queries zero or one based?

2011-01-13 Thread Walter Underwood
On Jan 13, 2011, at 1:28 PM, Dennis Gearon wrote: > Do I even need a body for this message? ;-) > > Dennis Gearon Are you asking "is it" or "should it be"? If the latter, we can also discuss Emacs and vi. wunder -- Walter Underwood K6WRU

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Robert Muir
On Thu, Jan 13, 2011 at 2:05 PM, Jonathan Rochkind wrote: > > There are various packages of such heuristic algorithms to guess char > encoding, I wouldn't try to write my own. icu4j might include such an > algorithm, not sure. > it does: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Chars

start value in queries zero or one based?

2011-01-13 Thread Dennis Gearon
Do I even need a body for this message? ;-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com

Variable datasources

2011-01-13 Thread tjpoe
I have several similar databases that I'd like to import from, 14 to be exact. There is also a 15th database where I can get a listing of the 14 databases. I'm trying to do a variable datasource such as: then my import query looks like this The above configuration works, but the $

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Michael McCandless
The tokens that Lucene sees (pre-4.0) are char[] based (ie, UTF16), so the first place where invalid UTF8 is detected/corrected/etc. is during your analysis process, which takes your raw content and produces char[] based tokens. Second, during indexing, Lucene ensures that the incoming char[] toke

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Michael McCandless
Generally it's not safe to run CheckIndex if a writer is also open on the index. It's not safe because CheckIndex could hit FNFE's on opening files, or, if you use -fix, CheckIndex will change the index out from under your other IndexWriter (which will then cause other kinds of corruption). That

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
Scanning for only 'valid' utf-8 is definitely not simple. You can eliminate some obviously not valid utf-8 things by byte ranges, but you can't confirm valid utf-8 alone by byte ranges. There are some bytes that can only come after or before other certain bytes to be valid utf-8. There is no
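The point that byte ranges alone are not enough can be seen with a strict decoder, which checks the sequencing rules as well. The Python sketch below is only an illustration (the thread itself is about Java and Solr), and the function name `is_valid_utf8` is hypothetical. Passing this check shows that the bytes form well-formed UTF-8; it does not prove the text was originally written in that charset.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as well-formed UTF-8 under strict rules."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

SAMPLES = {
    "two-byte sequence (e-acute)": b"caf\xc3\xa9",
    "lead byte with nothing after it": b"caf\xc3",
    "continuation byte with no lead byte": b"\xa9",
    "overlong encoding of '/'": b"\xc0\xaf",
}
```

In these hypothetical samples, 0xC3 and 0xA9 are acceptable only as a pair in that order; each one on its own is rejected.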

DataimportHandler development issue

2011-01-13 Thread Derek Werthmuller
We're just getting started with Solr and are very interested in using Solr for search applications. I've got the RSS example working; 1.4.1 didn't work out of the box, but we figured it out, then found fixes in the svn. Anyway, we are learning how to load the data/rss & atom feeds into the Solr in

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Peter Karich
take a look also into icu4j which is one of the contrib projects ... > converting on the fly is not supported by Solr but should be relative > easy in Java. > Also scanning is relative simple (accept only a range). Detection too: > http://www.mozilla.org/projects/intl/chardet.html > >> We've crea

Adding a new site to existing solr configuration

2011-01-13 Thread PeterKerk
I still have the default Solr example config running on Jetty. I use Cygwin to start my current site. Now I already have fully configured one solr instance with these files: \example\example-DIH\solr\db\conf\my-data-config.xml \example\example-DIH\solr\db\conf\schema.xml \example\example-DIH\solr

Re: Term frequency across multiple documents

2011-01-13 Thread Ahmet Arslan
So you are interested in collection frequency of words. TermsComponent gives you document frequency of terms. You can modify it to give collection frequency info. http://search-lucene.com/m/of5Fn1PUOHU/ --- On Wed, 1/12/11, Juan Grande wrote: > From: Juan Grande > Subject: Re: Term frequency
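The difference between the two statistics can be shown on a toy corpus. This Python sketch does not use TermsComponent or any Solr API; the whitespace tokenizer, the function name `term_stats`, and the sample documents are assumptions for illustration only.

```python
from collections import Counter

def term_stats(docs):
    """Return (document frequency, collection frequency) per term."""
    df, cf = Counter(), Counter()
    for text in docs:
        tokens = text.lower().split()
        cf.update(tokens)       # every occurrence counts
        df.update(set(tokens))  # each document counts once per term
    return df, cf

# Hypothetical three-document corpus.
DOCS = ["solr solr lucene", "lucene index", "solr index index index"]
df, cf = term_stats(DOCS)
```

Here "index" appears in two documents, so its document frequency is 2, but it occurs four times in total, so its collection frequency is 4.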

Re: Improving Solr performance

2011-01-13 Thread supersoft
On the one hand, I found those comments about the reasons for sharding really interesting. The documentation agrees with you about why to split an index into several shards (big-size problems), but I don't find any explanation about the inconveniences, such as an Access Control List. I guess there should be some an

RE: StopFilterFactory and "qf" containing some fields that use it and some that do not

2011-01-13 Thread Dyer, James
I appreciate the reply and blog posting. For now, I just enabled stopwords for all the fields on "Qf". We have a very short list anyhow and our legacy search engine didn't even allow field-by-field configuration (stopwords are global on that system). I do wonder...what if (e)dismax had a flag

Re: Tuning StatsComponent

2011-01-13 Thread Johannes Goll
What field type do you recommend for a float stats.field for optimal Solr 1.4.1 StatsComponent performance ? float, pfloat or tfloat ? Do you recommend to index the field ? 2011/1/12 stockii > > my field Type is "double" maybe "sint" is better ? but i need double ... > =( > -- > View this m

Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

2011-01-13 Thread Jonathan Rochkind
It's a known 'issue' in dismax, (really an inherent part of dismax's design with no clear way to do anything about it), that qf over fields with different stop word definitions will produce odd results for a query with a stopword. Here's my understanding of what's going on: http://bibwild.wor

Re: segment gets corrupted (after background merge ?)

2011-01-13 Thread Stéphane Delprat
I understand less and less what is happening to my solr. I did a checkIndex (without -fix) and there was an error... So I did another checkIndex with -fix and then the error was gone. The segment was alright. During checkIndex I do not shut down the solr server, I just make sure no client co

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-13 Thread Adam Estrada
Hi, the following seems to work pretty well.

Get nearby words?

2011-01-13 Thread darren
Hi, Is there a way to get the relevant nearby words in the entire index given a single word? I want to know all the relevance ranked words before and after the queried word. thanks for any tips. Darren

Re: Solr boolean operators

2011-01-13 Thread Xavier SCHEPLER
Ok, thanks. That's what I expected :D > > From: dante stroe > Sent: Thu Jan 13 15:56:33 CET 2011 > To: > Subject: Re: Solr boolean operators > > > To my understanding: in terms of the results that will be matched by your > query ... it's the same. In te

Re: Solr boolean operators

2011-01-13 Thread dante stroe
To my understanding: in terms of the results that will be matched by your query ... it's the same. In terms of the score of the results, no, since, if you are using the first query, the documents that match both the "a" and the "b" terms will score higher than the ones matching just the "

Re: basic document crud in an index

2011-01-13 Thread Markus Jelsma
To fill the gaps: b. the old version remains on disk but is flagged for deletion d. optimize equals merging, the difference is how many segments come out e. yes On Thursday 13 January 2011 15:21:54 kenf_nc wrote: > A/ You have to update all the fields, if you leave one off, it won't be in > the d

Solr boolean operators

2011-01-13 Thread Xavier Schepler
Hi, with the Lucene query syntax, is : a AND (a OR b) equivalent to : a (absorption) ?
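As far as the set of matching documents goes, the absorption law can be checked by brute force over both truth values. The Python sketch below treats `a` and `b` as plain booleans ("document contains term a / term b"), which is an assumption about matching only and says nothing about how either query is scored.

```python
from itertools import product

def matches_long_form(a, b):
    """Boolean reading of: a AND (a OR b)."""
    return a and (a or b)

def absorption_holds():
    """Check a AND (a OR b) == a for every combination of a and b."""
    return all(matches_long_form(a, b) == a
               for a, b in product([False, True], repeat=2))
```

`absorption_holds()` evaluates all four combinations, so the two forms select the same documents under this boolean reading.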

Re: basic document crud in an index

2011-01-13 Thread kenf_nc
A/ You have to update all the fields, if you leave one off, it won't be in the document anymore. I have my 'persisted' data stored outside of Solr, so on update I get the stored data, modify it and update Solr with every field (even if one changed). You could also do a Query/Modify/Update directly

Re: Question on deleting all rows for an index

2011-01-13 Thread kenf_nc
If this is a one-time cleanup, not something you need to do programmatically, you could delete the index directory ( /data/index ). In my case I have to stop Tomcat, delete .\index and restart Tomcat. It is very fast and starts me out with a fresh, empty, index. Noticed you are multi-core, I'm not

Solr + Hadoop

2011-01-13 Thread Joan
Hi, I'm trying to build a Solr index with MapReduce (Hadoop) and I'm using https://issues.apache.org/jira/browse/SOLR-1301 but I have a problem with the Hadoop version and this patch. When I compile this patch, I use the 0.21.0 Hadoop version and I don't have any problem, but when I'm trying to run my job in Hadoop

range queries in solr

2011-01-13 Thread ur lops
Hi, I am sorry to ask this silly question but I could not find the documentation about this and I am very new to Lucene/Solr. I want to run a range query on one of the multivalued fields, e.g. I have a point say [10,20], which is the point of intersection of the diagonals of a rectangle. Now I w

Re: Dismax, Sharding and Elevation

2011-01-13 Thread Grijesh.singh
As far as I have seen in the code for QueryElevationComponent, there is no support for distributed search, i.e. query elevation does not work with shards. - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-Sharding-and-Elevation-tp2247156p2247522.html Sent from the Solr

Re: spell suggest response

2011-01-13 Thread Grijesh.singh
I have done a similar type of work earlier by combining the spell-check component with auto-suggest. Autosuggest provides the words starting with the query term, and spellcheck returns the words similar to it. I have combined both sets of suggestions in a single list to display - Grijesh -- View this
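Combining the two suggestion sources into one display list might be done along these lines. This Python sketch assumes both components have already returned plain lists of words; the function name `merge_suggestions`, the ordering policy (autosuggest first) and the sample words are hypothetical and are not taken from the message.

```python
def merge_suggestions(autosuggest, spellcheck):
    """Combine both lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*autosuggest, *spellcheck]))

# Hypothetical results for the query term "sol".
prefix_matches = ["solr", "solar"]    # words starting with the term
similar_words = ["solar", "sold"]     # words similar to the term
combined = merge_suggestions(prefix_matches, similar_words)
```

Putting the prefix matches first is one possible design choice; a real application might instead interleave or rank the two sources.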

Dismax, Sharding and Elevation

2011-01-13 Thread Oliver Marahrens
Hi all, I have discovered a strange thing with Dismax and Elevation and hope someone can enlighten me as to what to do. Whenever I search for something using the elevation Request Handler the hits are from a normal Lucene query (with elevated results if the search term was defined in elevation.xml). El