Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
Hi Ken, >>> But in general escaping characters in a query gets tricky - if you >>> can >>> directly build queries versus pre-processing text sent to the query >>> parser, you'll save yourself some pain and suffering. >> >> What do you mean by these two alternatives? That is, what exactly >> co

Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Ken Krugler
Hi Michael, But in general escaping characters in a query gets tricky - if you can directly build queries versus pre-processing text sent to the query parser, you'll save yourself some pain and suffering. What do you mean by these two alternatives? That is, what exactly could I do better?

Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
On 03.09.2010 00:57 Ken Krugler wrote: > The docs need to be updated, I believe. From some code I wrote back in > 2006... > [...] Thanks this explains it very well. > But in general escaping characters in a query gets tricky - if you can > directly build queries versus pre-processing text se

Re: Hardware Specs Question

2010-09-02 Thread Shawn Heisey
On 9/2/2010 2:54 AM, Toke Eskildsen wrote: We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in RAID 0). The harddisk setups never stood a chance for searching. With current SSD's being faster than harddisk

Re: Auto Suggest

2010-09-02 Thread Lance Norskog
What does analysis.jsp show? On Thu, Sep 2, 2010 at 5:53 AM, Jason Rutherglen wrote: > I'm having a different issue with the EdgeNGram technique described > here: > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ > > That is one word queries q=

Re: Solr crawls during replication

2010-09-02 Thread Lance Norskog
Yes, the rsync scripts are still there. And they still work fine. It definitely helps to be a Unix shell wiz. You would add an option to the rsync call in the scripts that does rsync throttling. Rsync is just a standard copying tool in the SSH toolsuite. It's 12 years old and works quite well. L

Re: Do commits block updates in SOLR 1.4?

2010-09-02 Thread Lance Norskog
Yes, indexing is synchronized during commits. You can call commit all you want, and index docs, and commit will finish and then indexing will restart. Previous Solr releases did this also; how far back is your existing Solr? On Thu, Sep 2, 2010 at 1:11 PM, Robert Petersen wrote: > Hello sorry to bot

Re: Alphanumeric wildcard search problem

2010-09-02 Thread Erick Erickson
All I can say is that I just tried it with the following definitions and it works as expected. That is: http://localhost:8983/solr/select/?q=eoe:R-1*&version=2.2&start=0&rows=10&indent=on returns nothing (casing i

Re: how/why would I use LiteralValueSource and can I create a custom string function?

2010-09-02 Thread Gerald
Thanks Grant Am looking forward to the day when I can create a SOLR URL that looks something like this: http://mysolrserver:8080/solr/select?q=*:* AND mycustomstrfunction(mysolrstrfield):'somestringvalue' AND mycustomintfunction(mysolrintfield):[1 TO 100] -- View this message in context: http:

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-02 Thread Scott Gonyea
Hi Grant, Thanks for replying--sorry for sticking this on dev; I had imagined that development against the Solr codebase would be inevitable. The application has to do with regulatory and legal compliance work by a non-profit, and is "socially good," but I need to 'abstract' the problem/goals--as

Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Ken Krugler
On Sep 2, 2010, at 12:35pm, Michael Lackhoff wrote: According to http://lucene.apache.org/java/2_9_1/queryparsersyntax.html only these characters need escaping: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ but with this simple query: TI:stroke; AND TI:journal I got the error message: HTTP ERROR: 400

Re: Throttling replication

2010-09-02 Thread Koji Sekiguchi
(10/09/03 5:42), Brandon Evans wrote: On 9/2/10 11:16 AM, Mark wrote: I am using the built in replication. Can you send me a link to the patch so I can give it a try? Thanks I see my email wasn't very clear. Sorry to get your hopes up. The patch I have is only for the rsync based replica

Re: Download document from solr

2010-09-02 Thread Lance Norskog
Yes. Indexing a PDF&other types with '/extract' means that Solr finds words in the document and indexes those in a field 'content'. It does not save the binary contents of the file. You could make a request handler that fetches one document and generates a redirect to the link. On Thu, Sep 2, 2010

Re: Indexing boolean value

2010-09-02 Thread Lance Norskog
It is possible that the DIH JDBC interface does not handle the MySQL 'bit' type. In general, it is easiest to make a DB view for your DIH query. It lets you see what the DIH gets, and the DIH syntax is easier. You can throw in a translator function there to turn a 'bit' into an integer. On Thu,

Re: shingles work in analyzer but not real data

2010-09-02 Thread Dennis Gearon
I thought shingles were either a viral infection or roof material? (Hey, it's crazy friday early for me) Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 9/2/10, J

Re: shingles work in analyzer but not real data

2010-09-02 Thread Jonathan Rochkind
I've run into this before too. Both the dismax and solr-lucene _query parsers_ will tokenize a query on whitespace _before_ they pass the query to any field analyzers. There are some reasons for this, lots of things wouldn't work if they didn't do this. But it makes your approach kind of har
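The effect described above can be illustrated with a toy simulation. This is only a sketch, not Solr code: `analyze`, `shingles`, `index_time` and `query_time` are hypothetical helpers, and the "analyzer" here just lowercases and splits on whitespace. It shows why a shingle filter that sees one whitespace-separated chunk at a time can never produce a two-word shingle at query time.

```python
def analyze(text):
    """Toy field analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def shingles(tokens, size=2):
    """Join each run of `size` adjacent tokens into one shingle."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def index_time(text):
    """At index time the analyzer sees the whole field value at once."""
    tokens = analyze(text)
    return tokens + shingles(tokens)


def query_time(query):
    """Simulates a query parser that splits on whitespace BEFORE analysis:
    each chunk is analyzed alone, so there is never a neighbour to shingle with."""
    terms = []
    for chunk in query.split():
        tokens = analyze(chunk)
        terms.extend(tokens + shingles(tokens))
    return terms


print(index_time("apple pie"))  # ['apple', 'pie', 'apple pie']
print(query_time("apple pie"))  # ['apple', 'pie']
```

The indexed field contains the shingle "apple pie", but the query side only ever produces the single words, which matches the behaviour reported in this thread.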

Re: Affinity ranking

2010-09-02 Thread Ukyo Virgden
Hi Lance, After doing some research I'm heading the same direction. The task seems to be a link graph process, a perfect mapreduce type of task. Thanks for the reply. On Tue, Aug 31, 2010 at 5:38 AM, Lance Norskog wrote: > This is a mass batch-processing task, rather than a search task. > Maho

Re: how/why would I use LiteralValueSource and can I create a custom string function?

2010-09-02 Thread Grant Ingersoll
On Sep 2, 2010, at 3:39 PM, Gerald wrote: > > while investigating custom functions in solr, I noticed LiteralValueSource > > according to the one line documentation will "Pass a the field value through > as a String, no matter the type" > > how would I use such a value source? if it is a valu

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-02 Thread Grant Ingersoll
Dropping d...@lucene.a.o. How about we step back and please explain the problem you are trying to solve, as opposed to the proposed solution to the problem below. You can likely do what you want below in Solr/Lucene (modulo replacing the index with a new document), but the bigger question is "

Re: Throttling replication

2010-09-02 Thread Brandon Evans
On 9/2/10 11:16 AM, Mark wrote: I am using the built in replication. Can you send me a link to the patch so I can give it a try? Thanks I see my email wasn't very clear. Sorry to get your hopes up. The patch I have is only for the rsync based replication. Not the built in java based repl

RE: Do commits block updates in SOLR 1.4?

2010-09-02 Thread Robert Petersen
Hello sorry to bother but does anyone know the answer to this? This is the closest thing I can find on the subject: http://lucene.472066.n3.nabble.com/Autocommit-blocking-adds-AutoCommit-Speedup-td498465.html -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Wedn

how/why would I use LiteralValueSource and can I create a custom string function?

2010-09-02 Thread Gerald
while investigating custom functions in solr, I noticed LiteralValueSource according to the one line documentation will "Pass a the field value through as a String, no matter the type" how would I use such a value source? if it is a value source, I should be able to use it in functionqueries fo

Purpose of SolrDocument.java

2010-09-02 Thread stockii
I am working through the Solr code and I want to know how the class SolrDocument is used in Solr. When I start and debug a normal search, SolrDocument is never used (standard SearchHandler with a q query). I thought this class was a representation of a doc from the index, as a higher-level doc above t

Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
According to http://lucene.apache.org/java/2_9_1/queryparsersyntax.html only these characters need escaping: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ but with this simple query: TI:stroke; AND TI:journal I got the error message: HTTP ERROR: 400 Unknown sort order: TI:journal My first guess was that i
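For readers who pre-process user text before sending it to the query parser, a minimal sketch of an escaping helper follows. It assumes that a backslash in front of the semicolon is enough to stop Solr from treating it as the start of a sort specification; that is an assumption drawn from this thread, not from the linked syntax page. `escape_query_text` and `SPECIAL_CHARS` are hypothetical names.

```python
# Characters listed on the Lucene 2.9 query parser syntax page, plus ';'.
# Adding ';' is an assumption based on this thread: the error message shows
# the text after ';' being read as a sort order.
# Note: '&' and '|' are escaped one character at a time here, although the
# syntax page only lists the two-character operators '&&' and '||'.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\;')


def escape_query_text(text):
    """Prefix every special character in `text` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


# Escape only the user-supplied values, not the field names or operators.
query = "TI:" + escape_query_text("stroke;") + " AND TI:" + escape_query_text("journal")
print(query)  # TI:stroke\; AND TI:journal
```

Only the values are escaped, so the `TI:` field prefix and the `AND` operator keep their meaning.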

RE: shingles work in analyzer but not real data

2010-09-02 Thread Steven A Rowe
Hi Jeff, Have you seen PositionFilterFactory?: Steve > -Original Message- > From: Jeff Rose [mailto:j...@globalorange.nl] > Sent: Thursday, September 02, 2010 9:06 AM > To: solr-user@lucene.apache.

Re: Throttling replication

2010-09-02 Thread Mark
On 9/2/10 10:21 AM, Brandon Evans wrote: Are you using rsync replication or the built in replication available in solr 1.4? I have a patch that easily allows the --bwlimit option to be added to the rsyncd command line. Either way I agree that a way to throttle the replication bandwidt

false matches with ReversedWildcardFilterFactory

2010-09-02 Thread Landon Kuhn
Hello, I am using the ReversedWildcardFilterFactory, and I am wondering if there is a way to prevent false matches when a query token matches the reversed indexed token. For instance, the query *zemog* matches documents that contain Gomez. I am pretty much using the fieldType configuration from the

Re: Throttling replication

2010-09-02 Thread Brandon Evans
Are you using rsync replication or the built in replication available in solr 1.4? I have a patch that easily allows the --bwlimit option to be added to the rsyncd command line. Either way I agree that a way to throttle the replication bandwidth would be nice. -brandon On 9/2/10 7:4

Re: Localsolr with Dismax

2010-09-02 Thread Luke Tebbs
Thanks Dan, That seems to have moved things forwards, however if I do this I get two sets, presumably one from localsolr and one from dismax. e.g - 0 116 ... ... Also it seems to explode with a NullPointerException if I dare to try and sort by distance - INFO: [testCore] webapp=/so

Re: Throttling replication

2010-09-02 Thread Mark
On 9/2/10 8:27 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: There is no way to currently throttle replication. It consumes the whole bandwidth available. It is a nice to have feature On Thu, Sep 2, 2010 at 8:11 PM, Mark wrote: Is there any way or forthcoming patch that would allow configuration of

Re: Solr crawls during replication

2010-09-02 Thread Mark
On 8/6/10 5:03 PM, Chris Hostetter wrote: : We have an index around 25-30G w/ 1 master and 5 slaves. We perform : replication every 30 mins. During replication the disk I/O obviously shoots up : on the slaves to the point where all requests routed to that slave take a : really long time... somet

Re: Throttling replication

2010-09-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no way to currently throttle replication. It consumes the whole bandwidth available. It is a nice to have feature On Thu, Sep 2, 2010 at 8:11 PM, Mark wrote: > Is there any way or forthcoming patch that would allow configuration of how > much network bandwidth (and ultimately disk I/O) a

Re: Localsolr with Dismax

2010-09-02 Thread dan whelan
I experienced the same issue. The localsolr site says to configure like this: localsolr facet mlt highlight debug but the default solr components are (note the above config is missing query): query facet mlt highlight stats debug I fixed it by doing this instead localsolr On 9/2/

Throttling replication

2010-09-02 Thread Mark
Is there any way or forthcoming patch that would allow configuration of how much network bandwidth (and ultimately disk I/O) a slave is allowed during replication? We have the current problem of while replicating our disk I/O goes through the roof. I would much rather have the replication take

Re: Download document from solr

2010-09-02 Thread Matteo Moci
Thank you for the suggestions, I just completed the tutorial at http://lucene.apache.org/solr/tutorial.html and i understood that in the GET parameters I can choose wt=standard (and obtain an xml structure in the results), wt=json or wt=php. All of them display the results inline, in the sense

Re: anybody using solr with Cassandra?

2010-09-02 Thread Christos Constantinou
This is a very interesting topic. Nick, if you could give some more howtos or information on your setup it would be great! What things are you using out of the box and what did you have to develop? Christos On 31 Aug 2010, at 07:12, Siju George wrote: > We will be using Solr for indexing and Ca

Re: solr user

2010-09-02 Thread kenf_nc
You are querying for 'branch' and trying to place it in 'skill'. Also, you have Name and Column backwards, it should be:

Exception in Field Collapsing

2010-09-02 Thread Moazzam Khan
Hi Guys, My index contains 89k documents and I don't store some of the text fields because they are way too big. When I run a normal search without field collapsing everything works but when I enable field collapsing on the same search, it crashes with a 500 error (exception). The URL I call is t

Re: Need help with field collapsing and out of memory error

2010-09-02 Thread Moazzam Khan
Oh, I don't know if this matters but I store text fields in Solr but I never get them from the index (I only get the ID field from the index and everything else is pulled from DB cache). I store all the fields just in case I need to debug search queries, etc and want to see the data. Regards, Moa

Re: shingles work in analyzer but not real data

2010-09-02 Thread Jeff Rose
On Wed, Sep 1, 2010 at 3:35 PM, Robert Muir wrote: > On Wed, Sep 1, 2010 at 8:21 AM, Jeff Rose wrote: > > > Hi, > > We are using SOLR to match query strings with a keyword database, where > > some of the keywords are actually more than one word. For example a > > keyword > > might be "apple pi

Re: Auto Suggest

2010-09-02 Thread Jason Rutherglen
I'm having a different issue with the EdgeNGram technique described here: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ That is one word queries q=app on the query_text field, work fine however "q=app mou" do not. Why would this be or is ther

Re: morelikethis - "stored=true" is necessary?

2010-09-02 Thread Markus Jelsma
The following table [1] will be most helpful! Keep it referenced! [1]: http://wiki.apache.org/solr/FieldOptionsByUseCase On Thursday 02 September 2010 13:20:33 zqzuk wrote: > Hi all > I am learning to use morelikethis handler, which seems very straightforward > but I got some problems when testin

Standard tokenizer and SOLR-1630

2010-09-02 Thread Andre Hagenbruch
Hi all, just a quick question: am I supposed to still experience the problems outlined in with 3.1-dev 991853 and the field to do the spell checking on configured as in

Re: stream.url

2010-09-02 Thread satya swaroop
Hi, I ran the curl command from the shell (command prompt or terminal) with the escaped characters, but the error is the same. When I looked at the remote system, the request is not getting there. Is there anything to be changed in the config file in order to enable the escaped characters for stream.url

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Luke Tebbs
Antonio Calo' wrote: Il 02/09/2010 8.51, Lance Norskog ha scritto: Loading a servlet creates a bunch of classes via reflection. These are in PermGen and never go away. If you load&unload over and over again, any PermGen setting will fill up. I agree , taking a look to all the links suggested by

morelikethis - "stored=true" is necessary?

2010-09-02 Thread zqzuk
Hi all I am learning to use morelikethis handler, which seems very straightforward but I got some problems when testing and I wonder if you could help me. In my schema I have With this schema when I use the query parameter mlt.fl=page_content The returned XML results in the "moreLikeThis" se

Localsolr with Dismax

2010-09-02 Thread Luke Tebbs
Anyone? I'm really lost as to what to do here... if anyone has any experience with this or even ideas of things to try I'd really appreciate your input. It seems like what I'm trying to do should work but for some reason 'defType' seems to be ignored Thankyou Luke Original Me

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Luke Tebbs
I agree. I wasn't proposing it as a fix merely as a means to reduce the time between restarts. Luke Lance Norskog wrote: Loading a servlet creates a bunch of classes via reflection. These are in PermGen and never go away. If you load&unload over and over again, any PermGen setting will fill

Re: Alphanumeric wildcard search problem

2010-09-02 Thread Hasnain
Erick, I have checked with lowercasing, and yes there are items by this name. I'm not getting anywhere with this, tried many things and I'm really perplexed. Any other suggestions? Oh dear. Wildcard queries aren't analyzed, so I suspect it's a casing issue. Try two things: 1> searc
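The casing issue suspected above can be sketched in a few lines. This is a toy model under stated assumptions: the field is assumed to lowercase tokens at index time, and Python's `fnmatchcase` stands in for Lucene's wildcard matching, which it only roughly resembles. `index_token` and `wildcard_matches` are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def index_token(value):
    """Assumed index-time analysis: the field lowercases every token."""
    return value.lower()


def wildcard_matches(indexed_token, pattern):
    """Wildcard terms skip analysis, so the pattern is compared as typed."""
    return fnmatchcase(indexed_token, pattern)


indexed = index_token("R-1234")                     # stored as 'r-1234'
print(wildcard_matches(indexed, "R-1*"))            # False: upper-case pattern
print(wildcard_matches(indexed, "R-1*".lower()))    # True: lowercased by the client
```

If the field really does lowercase at index time, lowercasing the wildcard term on the client side before sending the query is one way to check whether casing is the cause.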

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Antonio Calo'
Il 02/09/2010 8.51, Lance Norskog ha scritto: Loading a servlet creates a bunch of classes via reflection. These are in PermGen and never go away. If you load&unload over and over again, any PermGen setting will fill up. I agree , taking a look to all the links suggested by Peter seems that thi

Re: Hardware Specs Question

2010-09-02 Thread Toke Eskildsen
On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote: > I don't know how much SSD disks cost, but they will certainly cure the > disk i/o problem. We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in RAID 0

MoreLikethis and fq not giving exact results ?

2010-09-02 Thread Sumit Arora
> > Hi All, > > I have provided identifications ,While submitting document to Solr e.g; > jp_ for job posting , cp_ for career profile , and it stores id in a form of > : jp_1, or jp_2 etc or cp_1 or cp_2 etc. > > So when I perform standard query with fq=cp_ , then its provide me the > results be

Re: stream.url

2010-09-02 Thread Stefan Moises
Hi, well, you'll have to write a routine which escapes all filenames before transmitting... whether in a shell, in Java, PHP, Javascript or wherever you are submitting your CURL calls. Here is a javascript example that helps with escaping: http://www.xs4all.nl/~jlpoutre/BoT/Javascript/Utils/e

Re: stream.url

2010-09-02 Thread satya swaroop
Hi Stefan, I used escape characters and made it work... It is not a problem for a single file like 'solr &apache', but it shows the same problem for files like Wireless lan.ppt, Tom info.pdf. The curl I sent is: curl " http://localhost:8080/solr/update/extract?stream.url=http://remot

RE: Indexing boolean value

2010-09-02 Thread kirsty
Michael Griffiths wrote: > > Copyfield copies the field so you can have multiple versions. Useful to > dump all fields into one "super" field you can search on, for perf > reasons. > > If the column isn't being indexed, I'd suggest the problem is in DIH. No > suggestions as to why, I'm afraid.

Re: stream.url

2010-09-02 Thread Stefan Moises
Hi, this has nothing to do with Solr... you can't use a filename containing "&" as a URL parameter... if you really need to submit such a weird named file, you have to escape the "&", see http://www.december.com/html/spec/esccodes.html for the code... Cheers, Stefan Am 02.09.2010 09:35, sc
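A minimal sketch of the escaping step, in Python rather than a shell: the file name is percent-encoded inside the download URL, and that whole URL is then percent-encoded again as the value of stream.url. `build_extract_url` is a hypothetical helper, and the host names, the `file` parameter and the `literal.id` value are taken from or modelled on the curl commands in this thread, so treat them as placeholders.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_extract_url(solr_extract_url, download_url, filename, doc_id):
    """Build a /update/extract URL whose stream.url points at `filename`.

    The inner URL is encoded first (file name with '&' or spaces), then the
    whole inner URL is encoded again as a parameter of the outer request.
    """
    inner = download_url + "?" + urlencode({"file": filename}, quote_via=quote)
    outer_params = {"stream.url": inner, "literal.id": doc_id}
    return solr_extract_url + "?" + urlencode(outer_params, quote_via=quote)


url = build_extract_url(
    "http://localhost:8080/solr/update/extract",   # from the thread
    "http://remotehost/file_download.yaws",        # placeholder host
    "solr &apache.pdf",
    "schb4",
)
print(url)

# Round trip: decode the outer query, then the inner one.
inner_url = parse_qs(urlsplit(url).query)["stream.url"][0]
print(parse_qs(urlsplit(inner_url).query)["file"][0])  # solr &apache.pdf
```

Encoding twice looks odd, but each layer is decoded once by its own receiver: Solr decodes the outer parameter to get the download URL, and the remote server decodes the file name.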

stream.url

2010-09-02 Thread satya swaroop
Hi all, I am using stream.url to index the files in the remote system. when i use the url as 1) curl " http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws?file=yaws_presentation.pdf&literal.id=schb4 " it works and i get the response as the file got

Solr-related meeting in Delhi, India: At Dilli Haat: 5pm, Sun., 5th Sep.

2010-09-02 Thread Gora Mohanty
Hi, A Solr-related meeting will take place in Delhi, India, as per the details below. As the FOSS community in Delhi/NCR is quite small, and getting increasingly fragmented, we have tried to combine more than one topic, in the interest of getting more attendees. Event:FOSS meeting

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Peter Karich
Hi, that issue is not really related to solr. See this: http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error Increasing maxpermsize -XX:MaxPermSize=128m does not really solve this issue but you will see less errors :-) I have written a mini mon