solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread sunnyfr
Hi, I'm using solr1.3 mysql and tomcat55, can you please help to sort this out? How can I index data in UTF8 ? I tried to add the parameter encoding="UTF-8" in the datasource in data-config.xml. | character_set_client| latin1 | characte

Re: Out of Memory Errors

2008-10-22 Thread Nick Jenkin
Have you confirmed Java's -Xmx setting? (Max memory) e.g. java -Xmx2000m -jar start.jar -Nick On Wed, Oct 22, 2008 at 3:24 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > How much RAM in the box total? How many sort fields and what types? Sorts on > each core? > > Willie Wong wrote: >> >> Hello, >>

Re: error with delta import

2008-10-22 Thread Shalin Shekhar Mangar
Actually, most XML parsers don't require you to escape such characters in attributes. You are welcome to try this out, just look at the example-DIH :) On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Wow, I really should read more closely before I respond - I see now,

Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread Shalin Shekhar Mangar
Hi, The best way to manage international characters is to keep everything in UTF-8. Otherwise it will be difficult to figure out the source of the problem. 1. Make sure the program which writes data into MySQL is using UTF-8 2. Make sure the MySQL tables are using UTF-8. 3. Make sure MySQL client

function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread sunnyfr
I have a function that converts strings from latin1 to UTF-8. I would like to know where exactly I should put it in the Java code so that strings are cleaned up before indexing. Thanks a lot for this information, Sunny I'm using solr1.3, mysql, tomcat55 -- View this message in context: http://www.nabbl

Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread sunnyfr
Hi Shalin, Thanks for your answer, but it doesn't work with just -Dfile.encoding; I was hoping it would. I definitely can't change the database, so I guess I must change the Java code. I have a function to change a latin-1 string to utf8, but I don't really know where I should put it. Thanks for your

Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard
I am seeing odd behavior where a query such as: http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc works until I add q.op=AND http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc whic

RE: Sorting performance

2008-10-22 Thread Beniamin Janicki
:so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5minutes + warming time stale. This is what I do as w

Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread Jérôme Etévé
Hi, See http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html and http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String) Also note that you cannot transform a latin1 string in a utf-8 string. What you can do is to decode a latin1 octet array
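The preview above is cut off, so what follows is only one common reading of the advice: treat the latin1 data as raw octets, decode them into text, then encode that text as UTF-8. A minimal sketch, written in Python rather than Java purely for brevity; the function name `latin1_bytes_to_utf8` and the sample bytes are assumptions for illustration and do not come from the thread.

```python
def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Decode octets as Latin-1 into text, then encode that text as UTF-8."""
    text = raw.decode("latin-1")   # octets -> Unicode string
    return text.encode("utf-8")    # Unicode string -> UTF-8 octets


# 0xE9 is "é" in Latin-1; in UTF-8 the same character is the two bytes 0xC3 0xA9.
sample = b"caf\xe9"
print(latin1_bytes_to_utf8(sample))
```

The point of the two-step shape is that a string has no encoding of its own; only the byte arrays on either side do.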

Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard
By the way, the fq parameter is being used to apply a facet value as a refinement which is why it is not tokenized and is a string. jayson.minard wrote: > > I am seeing odd behavior where a query such as: > > http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=do

RE: Out of Memory Errors

2008-10-22 Thread r.prieto
Hi Willie, Are you using highlighting? If the answer is yes, you need to know that for each document retrieved, Solr highlighting loads into memory the full field that is used for this functionality. If the field is too long, you will have problems with memory. You can solve the problem using th

Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can try out a Transformer to translate that On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > I've a function to clear up string which are in latin1 to UTF8, I would like > to know where exactly should I put it in the java code to clear up string > before indexing ? > > T

Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard
Thinking about this, I could work around it by quoting the facet value so that the AND isn't inserted between tokens in the fq parameter. jayson.minard wrote: > > BY the way, the fq parameter is being used to apply a facet value as a > refinement which is why it is not tokenized and is a stri
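The workaround described above (quoting the facet value so it is parsed as one phrase) might look roughly like this when the request URL is built client-side. This is a sketch under assumptions: the helper name `build_select_query` is hypothetical, and only the `q`, `q.op` and `fq` parameters from the thread are included.

```python
from urllib.parse import urlencode


def build_select_query(q: str, facet_field: str, facet_value: str) -> str:
    """Build a query string whose fq value is a single quoted phrase."""
    # Escape backslashes and double quotes inside the value, then wrap the
    # whole value in double quotes so the query parser sees one phrase.
    escaped = facet_value.replace("\\", "\\\\").replace('"', '\\"')
    fq = f'{facet_field}:"{escaped}"'
    params = [("q", q), ("q.op", "AND"), ("fq", fq)]
    return urlencode(params)


print(build_select_query("moss", "docType", "Fancy Doc"))
# prints: q=moss&q.op=AND&fq=docType%3A%22Fancy+Doc%22
```

With the quotes in place, the space inside the facet value is no longer a token boundary for the default operator to act on.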

Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread sunnyfr
Can you tell me more about it ? Noble Paul നോബിള്‍ नोब्ळ् wrote: > > you can try out a Transformer to translate that > > On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr <[EMAIL PROTECTED]> wrote: >> >> I've a function to clear up string which are in latin1 to UTF8, I would >> like >> to know where e

Solr for Whole Web Search

2008-10-22 Thread John Martyniak
I am very new to Solr, but I have played with Nutch and Lucene. Has anybody used Solr for a whole web indexing application? Which Spider did you use? How does it compare to Nutch? Thanks in advance for all of the info. -John

Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://wiki.apache.org/solr/DataImportHandler#head-eb523b0943596587f05532f3ebc506ea6d9a606b On Wed, Oct 22, 2008 at 4:41 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > Can you tell me more about it ? > > > Noble Paul നോബിള്‍ नोब्ळ् wrote: >> >> you can try out a Transformer to translate that >> >> On

Re: Ocean realtime search + Solr

2008-10-22 Thread Jason Rutherglen
Not quite yet, there is the IndexReader.clone patch that needs to be completed that Ocean depends on https://issues.apache.org/jira/browse/LUCENE-1314. I had it completed but then things changed in IndexReader so now it doesn't work and I have not had time to complete it again. Otherwise the Ocea

Re: immediatley commit of docs doesnt work in multiCore case

2008-10-22 Thread Parisa
I should mention that I have already added this his tag in my SolrConfig.xml of all cores. and It works in single core but unfortunately doesn't work in multi core . -- View this message in context: http://www.nabble.com/immediatley-commit-of-docs-doesnt-work-in-multiCore-case-tp20072378p2

FileNotFoundException on slave after replication - script bug?

2008-10-22 Thread Jim Murphy
We're seeing strange behavior on one of our slave nodes after replication. When the new searcher is created we see FileNotFoundExceptions in the log and the index is strangely invalid/corrupted. We may have identified the root cause but wanted to run it by the community. We figure there is a bu

Boosting Question

2008-10-22 Thread Manepalli, Kalyan
Hi, I am working on a usecase where I want to boost a document if there are certain group of words near the keywords searched by the user. For eg: if the user is searching for keyword "pool", I want to boost the documents which have words like "excellent pool", "nice pool", "awesome pool",

Re: Solr for Whole Web Search

2008-10-22 Thread Grant Ingersoll
On Oct 22, 2008, at 7:57 AM, John Martyniak wrote: I am very new to Solr, but I have played with Nutch and Lucene. Has anybody used Solr for a whole web indexing application? Which Spider did you use? How does it compare to Nutch? There is a patch that combines Nutch + Solr. Nutch is used

Re: Hierarchical Faceting

2008-10-22 Thread Marian Steinbach
On Tue, Oct 21, 2008 at 3:59 PM, Sachit P. Menon <[EMAIL PROTECTED]> wrote: > Hi, > > I have gone through the archive in search of Hierarchical Faceting but was > not clear as what should I exactly do to achieve that. > > Suppose, I have 3 categories like politics, science and sports. In the > sc

Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy
Thanks Yonik, I have more information... 1. We do indeed have large indexes: 40GB on disk, 30M documents - and is just a test server we have 8 of these in parallel. 2. The performance problem I was seeing followed replication, and first query on a new searcher. It turns out we didn't configur

Re: Solr for Whole Web Search

2008-10-22 Thread John Martyniak
Grant thanks for the response. A couple of other people have recommended trying the Nutch + Solr approach, but I am not sure what the real benefit of doing that is. Since Nutch provides most of the same features as Solr and Solr has some nice additional features (like spell checking, incre

Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread John Martyniak
Jim, This is an off-topic question. But for your 30M documents, did you fetch those from external web sites (Whole Web Search)? Or are they internal documents? If they are external what method did you use to fetch them and which spider? I am in the process of deciding between using Nutch

Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy
We index RSS content using our own home grown distributed spiders - not using Nutch. We use ruby processes to do the feed fetching and XML shredding, and Amazon SQS to queue up work packets to insert into our Solr cluster. Sorry can't be of more help. -- View this message in context: http://

Re: Boosting Question

2008-10-22 Thread Otis Gospodnetic
Hi, Without changing any of the internals a simple approach might be to take the query "pool" and expand the query with those other keywords, form query phrases in addition to just plain "pool" keyword, and boost those expanded phrases to make them bubble up - if they exist. Otis -- Sematext
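The query-expansion idea above can be sketched as a small client-side helper. The function name, the modifier list and the boost value are all assumptions for illustration; the only Lucene syntax relied on is a quoted phrase followed by `^boost`.

```python
def expand_with_boosted_phrases(keyword, modifiers, boost=5.0):
    """Return the keyword ORed with boosted phrases such as "nice pool"^5.0."""
    phrases = [f'"{modifier} {keyword}"^{boost}' for modifier in modifiers]
    return " OR ".join([keyword] + phrases)


print(expand_with_boosted_phrases("pool", ["excellent", "nice"]))
# prints: pool OR "excellent pool"^5.0 OR "nice pool"^5.0
```

Documents that only contain the plain keyword still match; those containing one of the phrases pick up the extra score and move up.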

Question about copyField

2008-10-22 Thread Aleksey Gogolev
Hello. I have a field "description" in my schema. And I want to make a field "suggestion" with the same content. So I added the following line to my schema.xml: But I also want to modify the "description" string before copying it to the "suggestion" field. I want to remove all commas, dots and slashes. Here

Re: Understanding prefix query searching

2008-10-22 Thread Otis Gospodnetic
Hi, You probably lower-case tokens during indexing (LowerCaseFilterFactory). Wildcard queries are not analyzed the way non-wildcard ones are (this is explained in Lucene FAQ, I believe), so your capitalized Robert doesn't match the lower-cased robert in your index. Otis -- Sematext -- http://sematext
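Since the wildcard term skips analysis, one way to line it up with a lower-cased index is to lower-case the user's input in the client before appending the `*`. A minimal sketch; the helper name and the field name `name` are hypothetical.

```python
def prefix_query(field: str, user_input: str) -> str:
    """Build a prefix query, lower-casing the term because wildcard
    terms are not passed through the field's analyzer."""
    return f"{field}:{user_input.strip().lower()}*"


print(prefix_query("name", "Robert"))
# prints: name:robert*
```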

Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread Otis Gospodnetic
Hi Jayson, That's exactly what I was going to suggest: fq="docType:Fancy Doc" Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: jayson.minard <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, October 22, 2008 5:26:03

RE: Question about copyField

2008-10-22 Thread Feak, Todd
The filters and tokenizer that are applied to the copy field are determined by its type in the schema. Simply create a new field type in your schema with the filters you would like, and use that type for your copy field. So, the field description would have its old type, but the field suggestion

Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread John Martyniak
Thank you, that is good information, as that is kind of the way that I am leaning. So when you fetch the content from RSS, does that get rendered to an XML document that Solr indexes? Also what were a couple of decision points for using Solr as opposed to using Nutch, or even straight Lucene?

Re: Out of Memory Errors

2008-10-22 Thread Otis Gospodnetic
Hi, Without knowing the details I suspect it's just that 1.5GB heap is not enough. Yes, sort will use your heap, as will various Solr caches. As will norms, so double-check your schema to make sure you are using field types like string where you can, not text, for example. If you sort by tim

RE: error with delta import

2008-10-22 Thread Steven A Rowe
Hi Shalin, I wasn't talking about the behavior of parsers in the wild, but rather about the XML specification (paraphrasing): 1. An XML document is not well-formed unless it matches the production labeled document. 2. Violations of well-formedness constraints are fatal errors. 3. Once a fatal e
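The well-formedness point can be checked with any conforming parser. The sketch below uses Python's standard library rather than the JDK parser discussed in the thread; the `entity` element and the SQL text are made-up stand-ins for a data-config.xml entry, not taken from the original messages.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical SQL containing a "<", which must be escaped inside an attribute.
raw_query = "select id from item where price < 10"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


unescaped = f'<entity name="item" query="{raw_query}"/>'
escaped = f'<entity name="item" query={quoteattr(raw_query)}/>'

print(is_well_formed(unescaped))  # the raw "<" in the attribute is rejected
print(is_well_formed(escaped))    # "&lt;" is accepted
print(ET.fromstring(escaped).get("query") == raw_query)
```

`quoteattr` adds the surrounding quotes itself, and the parser hands back the original SQL text, so escaping costs nothing on the reading side.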

Re[2]: Question about copyField

2008-10-22 Thread Aleksey Gogolev
Thanks for the reply. I want to make your point more exact, because I'm not sure that I correctly understood you :) As far as I know (correct me please, if I'm wrong) type defines the way in which the field is indexed and queried. But I don't want to index or query the "suggestion" field in a different way, I

Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy
We shred the RSS into individual items then create Solr XML documents to insert. Solr is an easy choice for us over straight Lucene since it adds the server infrastructure that we would mostly be writing ourselves - caching, data types, master/slave replication. We looked at nutch too - but that
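Turning shredded feed items into Solr XML update documents could be sketched as below. The poster's pipeline is in Ruby and is not shown in the thread; this Python version, its function name and the field names `id` and `title` are assumptions, while the `<add><doc><field name="...">` layout is Solr's XML update format.

```python
import xml.etree.ElementTree as ET


def items_to_solr_add_xml(items):
    """Serialize a list of {field_name: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for item in items:
        doc = ET.SubElement(add, "doc")
        for name, value in item.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


print(items_to_solr_add_xml([{"id": "1", "title": "A & B"}]))
# prints: <add><doc><field name="id">1</field><field name="title">A &amp; B</field></doc></add>
```

The resulting string is what would be POSTed to the update handler, followed by a commit.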

RE: Re[2]: Question about copyField

2008-10-22 Thread Feak, Todd
Yes, using fieldType, you can have Solr run the PatternReplaceFilter for you. So, for example, you can declare something like this: -- ... ... Put the PatternReplaceFilter in here. At least for indexing, maybe for query as well ... ... --- I would suggest doing this i

Re: error with delta import

2008-10-22 Thread Walter Underwood
On 10/22/08 8:57 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote: > Telling people that it's not a problem (or required!) to write non-well-formed > XML, because a particular XML parser can't accept well-formed XML is kind of > insidious. I'm with you all the way on this. A parser which accepts no

Re: Out of Memory Errors

2008-10-22 Thread Jae Joo
Here is what I am doing to check the memory status. 1. Run the Servlet and Solr application. 2. On command prompt, jstat -gc <pid> 5s (5s means sampling every 5 seconds.) 3. Watch it or pipe to the file. 4. Analyze the data gathered. Jae On Tue, Oct 21, 2008 at 9:48 PM, Willie Wong <[EMAIL P

Re[4]: Question about copyField

2008-10-22 Thread Aleksey Gogolev
FT> I would suggest doing this in your schema, then starting up Solr and FT> using the analysis admin page to see if it will index and search the way FT> you want. That way you don't have to pay the cost of actually indexing FT> the data to find out. Thanks. I did it exactly like you said. I cr

RE: Re[4]: Question about copyField

2008-10-22 Thread Feak, Todd
My bad. I misunderstood what you wanted. The example I gave was for the searching side of things. Not the data representation in the document. -Todd -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 11:14 AM To: Feak, Todd Subject: Re[

RE: Re[4]: Question about copyField

2008-10-22 Thread Joe Nguyen
It doesn't need to be a copy field, right? Could you create a new field "ex", extract value from description, delete digits, and set to "ex" field before add/index to solr server? -Original Message- From: Feak, Todd [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 11:25 Joe

Re[6]: Question about copyField

2008-10-22 Thread Aleksey Gogolev
JN> It doesn't need to be a copy field, right? Could you create a new field JN> "ex", extract value from description, delete digits, and set to "ex" JN> field before add/index to solr server? Yes, I can. I just was wondering can I use solr for this purpose or not. JN> -Original Message-

RE: Re[6]: Question about copyField

2008-10-22 Thread Joe Nguyen
Could you post fieldType specification for "ex"? What your regex look like? -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 11:39 Joe To: Joe Nguyen Subject: Re[6]: Question about copyField JN> It doesn't need to be a copy field, r

RE: Issue with Query Parsing '+' works as 'OR'

2008-10-22 Thread Lance Norskog
URI encoding turns a space into a plus, then (maybe) Lucene takes that as a space. Also you want a + in front of first_name. A AND B -> +first_name:joe++last_name:smith B AND maybe A -> first_name:joe++last_name:smith Some of us need sample use cases to understand these things; documenta

Re[8]: Question about copyField

2008-10-22 Thread Aleksey Gogolev
Here it is, the regex is very simple: But the problem is not about the field type. The problem is: how to retrieve the final token and put it into the field. Theoretically I can re

Re: Issue with Query Parsing '+' works as 'OR'

2008-10-22 Thread Walter Underwood
To pass a plus sign in a URL parameter, use %2B. This query: foo +bar Looks like this in a URL: q=foo+%2Bbar wunder On 10/22/08 11:52 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote: > URI encoding turns a space into a plus, then (maybe) Lucene takes that as a > space. Also you want a + in
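The encoding rule above is what a standard URL-encoding routine does anyway, so letting a library build the query string avoids the problem. A small sketch with Python's standard library:

```python
from urllib.parse import parse_qs, urlencode

# A literal "+" must travel as %2B; a space may travel as "+".
encoded = urlencode({"q": "foo +bar"})
print(encoded)
# prints: q=foo+%2Bbar

# Decoding on the server side restores the original query text.
decoded = parse_qs(encoded)["q"][0]
print(decoded)
# prints: foo +bar
```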

Re: Re[6]: Question about copyField

2008-10-22 Thread Shalin Shekhar Mangar
If you want your indexed value changed, you can use an analyzer (either PatternReplaceFilter or a custom one). If you want the stored value changed, you can use a custom UpdateRequestProcessor. However, taking care of this in your application may be easier than bothering with the two particularly i
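Handling this in the application, as suggested above, could be as small as the sketch below: strip commas, dots and slashes from the description and send the result as the suggestion field. The function names are hypothetical, and "slashes" is assumed to mean forward slashes.

```python
import re

# Commas, dots and forward slashes, as described earlier in the thread.
_PUNCTUATION = re.compile(r"[,./]")


def make_suggestion(description: str) -> str:
    """Return the description with commas, dots and slashes removed."""
    return _PUNCTUATION.sub("", description)


def build_document(doc_id: str, description: str) -> dict:
    """Prepare both fields before the document is sent to Solr."""
    return {
        "id": doc_id,
        "description": description,
        "suggestion": make_suggestion(description),
    }


print(build_document("1", "Nice pool, great view. 24/7 bar")["suggestion"])
# prints: Nice pool great view 247 bar
```

Because the cleaned text is sent as the field value itself, both the indexed and the stored form of suggestion carry the change.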

Re: Issues with facet

2008-10-22 Thread Jeremy Hinegardner
On Tue, Oct 21, 2008 at 06:57:03AM -0700, prerna07 wrote: > > Hi, > > On using Facet in solr query I am facing various issues. > > Scenario 1: > I have 11 Index with tag : productIndex > > my search query is appended by facet parameters : > facet=true&facet.field=Index_Type_s&qt=dismaxrequest

SolrSharp gone?

2008-10-22 Thread Otis Gospodnetic
Hello, It looks like we might have lost SolrSharp: http://wiki.apache.org/solr/SolrSharp It looks like its home is http://www.codeplex.com/solrsharp , but the site is no longer available. Does anyone know its status? There is also http://code.google.com/p/deveel-solr/ , but this seems brand new

Advice needed on master-slave configuration

2008-10-22 Thread William Pierce
Folks: I have two instances of solr running one on the master (U) and the other on the slave (Q). Q is used for queries only, while U is where updates/deletes are done. I am running on Windows so unfortunately I cannot use the distribution scripts. Every N hours when changes are committed

Re: SolrSharp gone?

2008-10-22 Thread Ryan McKinley
On Oct 22, 2008, at 4:17 PM, Otis Gospodnetic wrote: Hello, It looks like we might have lost SolrSharp: http://wiki.apache.org/solr/SolrSharp It looks like its home is http://www.codeplex.com/solrsharp , but the site is no longer available. Does anyone know its status? looks like it is

Re: Advice needed on master-slave configuration

2008-10-22 Thread Otis Gospodnetic
Normally you don't have to start Q, but only "reload" Solr searcher when the index has been copied. However, you are on Windows, and its FS has the tendency not to let you delete/overwrite files that another app (Solr/java) has opened. Are you able to copy the index from U to Q? How are you do

How to search a DataImportHandler solr index

2008-10-22 Thread Nick80
Hi, I'm using a couple of Solr 1.1 powered indexes and have relied on my "old" Solr installation for more than two years now. I'm working on a new project that is a bit more complex than my previous ones and I thought I'd have a look at all the new goodies in Solr. One item that caught my attention is

Re: How to search a DataImportHandler solr index

2008-10-22 Thread Matthew Runo
DataImportHandler is only a way to get data into your index, from a relational database of some sort. It won't affect your Solr reads in any way - so everything that Solr normally does will still work the same. (I have not had a chance to look at it in depth, but searching the index would

Re: Advice needed on master-slave configuration

2008-10-22 Thread William Pierce
Otis, Yes, I had forgotten that Windows will not permit me to overwrite files currently in use. So my copy scripts are failing. Windows will not even allow a rename of a folder containing a file in use so I am not sure how to do this I am going to dig around and see what I can come u

Re: Solr for Whole Web Search

2008-10-22 Thread Jon Baer
If that is the case you should look @ the DataImportHandler examples as they can already index RSS, I'm doing it now for ~ a dozen feeds on an hourly basis. (This is also for any XML-based feed for XHTML, XML, etc). I find Nutch more useful for plain vanilla HTML (something that was built

Re: Issues with facet

2008-10-22 Thread prerna07
Thanks, it helped. We were using *_s fields which had analyser section. We used to copy all fields in some other field type and used this new type in facet. It is working fine now. Thanks, Prerna prerna07 wrote: > > Hi, > > On using Facet in solr query I am facing various issues. > > Sc

Re: error with delta import

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The case in point is DIH. DIH uses the standard DOM parser that comes w/ JDK. If it reads the xml properly do we need to complain?. I guess that data-config.xml may not be used for any other purposes. On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > On 10/22/08 8:5

Re: Advice needed on master-slave configuration

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you are using a nightly you can try the new SolrReplication feature http://wiki.apache.org/solr/SolrReplication On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> wrote: > Otis, > > Yes, I had forgotten that Windows will not permit me to overwrite files > currently in use.