DisMaxRequestHandler bf configuration

2010-01-05 Thread Andy
I'd like to boost every query using {!boost b=log(popularity)}. But I'd rather not have to prepend that to every query. It'd be much cleaner for me to configure Solr to use that as default. My plan is to make DisMaxRequestHandler the default handler and add the following to solrconfig.xml:   

Re: Rules engine and Solr

2010-01-05 Thread Ravi Gidwani
Avlesh: I am currently working on some kind of rules in front (application side) of our Solr instance. These rules are more application-specific and are not general, like deciding which fields to facet, which fields to return in the response, which fields to highlight, boost value for each f

Re: Solr Replication Questions

2010-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jan 6, 2010 at 2:51 AM, Giovanni Fernandez-Kincade wrote: > http://wiki.apache.org/solr/SolrReplication > > I've been looking over this replication wiki and I'm still unclear on two > points about Solr Replication: > > 1. If there have been small changes to the index on the master,

Re: replicating extension JARs

2010-01-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
jars are not replicated. It is by design. But that is not to say that we can't do it. open an issue . On Wed, Jan 6, 2010 at 6:20 AM, Ryan Kennedy wrote: > Will the built-in Solr replication replicate extension JAR files in > the "lib" directory? The documentation appears to indicate that only >

Re: Rules engine and Solr

2010-01-05 Thread Avlesh Singh
> > Your question appears to be an "XY Problem" ... that is: you are dealing > with "X", you are assuming "Y" will help you, and you are asking about "Y" > without giving more details about the "X" so that we can understand the full > issue. Perhaps the best solution doesn't involve "Y" at all? Se

Re: Using IDF to find Collocations and SIPs . . ?

2010-01-05 Thread Christopher Ball
Hoss, Thanks for your reply. As you pointed out the Terms Component alone with the terms.maxcount did the trick for single terms. And ShingleFilter did the trick for phrases. I have not ventured into Hadoop just yet - any examples you could point me to of simple map/reduce jobs?

RE: Listing Terms by Ascending IDF value . . ?

2010-01-05 Thread Christopher Ball
Thanks - I was overlooking the Terms Component and given I can specify terms.maxcount I can live without the ascending order. -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Tuesday, January 05, 2010 2:56 AM To: solr-user@lucene.apache.org Subject: Re

replicating extension JARs

2010-01-05 Thread Ryan Kennedy
Will the built-in Solr replication replicate extension JAR files in the "lib" directory? The documentation appears to indicate that only the index and any specified configuration files will be replicated, however if your solrconfig.xml references a class in a JAR file added to the lib directory the

Custom Analyzer/Tokenizer works but results were not saved

2010-01-05 Thread MitchK
Hello community, I wrote another mail today, but I think something went wrong (I can't find my post on the mailing list) - if not, I am sorry for posting a "double post" - I am using a mailing list for the first time. I have created a custom analyzer, which consists of a LowerCaseTokenizer, a StopFilt

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-05 Thread Caleb Land
I've tracked this problem down to the fact that I'm using the WordDelimiterFilter. I don't quite understand what's happening, but if I add preserveOriginal="1" as an option, everything looks fine. I think it has to do with the period being stripped in the token stream. On Tue, Jan 5, 2010 at 2:05

Re: Tokenizing problem with numbers in query

2010-01-05 Thread Ahmet Arslan
> Thanks to both of you for the quick > answers, > > analysis.jsp shows that the WordDelimiterFilterFactory is > performing the > split > > I was experimenting around with the delimiters for the last > two days but am > still unable to obtain the desired result. > > I tried entirely kicking sol

Re: Tokenizing problem with numbers in query

2010-01-05 Thread Bernd Brod
Hi, On Tue, Jan 5, 2010 at 5:17 PM, Erick Erickson wrote: > We need to back up, this is looking like an XY problem. That is, > you're asking for specifics when what would probably be more > helpful is for you to describe *what* the problem you're trying > to solve is rather than *how* to make a s

Solr Replication Questions

2010-01-05 Thread Giovanni Fernandez-Kincade
http://wiki.apache.org/solr/SolrReplication I've been looking over this replication wiki and I'm still unclear on two points about Solr Replication: 1. If there have been small changes to the index on the master, does the slave copy the entire contents of the index files that were affecte

Re: internal XML parser used in Solr

2010-01-05 Thread Chris Hostetter
: For ex : which jar of solr contains org.xml.sax. .. package. none of them. it's an "endorsed standard" provided by the JRE (but overridable at runtime if you'd like to use an alternate implementation) ... http://java.sun.com/j2se/1.5.0/docs/guide/standards/ -Hoss
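A minimal sketch of using the JRE-provided parser directly, with no Solr jar involved. It relies only on the standard `javax.xml.parsers`, `javax.xml.xpath` and `org.w3c.dom` packages; the class name `JaxpSketch`, the helper `evaluate`, and the sample XML are made up for illustration and are not taken from Solr's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class JaxpSketch {
    // Parses an XML string with the JRE's built-in JAXP parser and returns
    // the string value of an XPath expression evaluated against it.
    // (Hypothetical helper, for illustration only.)
    public static String evaluate(String xml, String expression) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException("could not parse or evaluate", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this example.
        String xml =
            "<schema name=\"example\"><field name=\"id\" type=\"string\"/></schema>";
        System.out.println(evaluate(xml, "/schema/@name"));       // prints "example"
        System.out.println(evaluate(xml, "/schema/field/@type")); // prints "string"
    }
}
```

Which parser implementation actually runs depends on what the JRE (or an endorsed override) supplies; the calling code stays the same either way.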

Re: internal XML parser used in Solr

2010-01-05 Thread Smith G
Hello.., 1) Yeah, I have found that before. But, which .jar file of Solr ( should be one of the jars inside Solr ) contains all the supporting classes related to xml parsing. For ex : which jar of solr contains org.xml.sax. .. package. 2) Do you mean, I can straightly use SAX api, f

Re: Rules engine and Solr

2010-01-05 Thread Chris Hostetter
: I am planning to build a rules engine on top of search. The rules are database : driven and can't be stored inside solr indexes. These rules would ultimately : do two things - : :1. Change the order of Lucene hits. :2. Add/remove some results to/from the Lucene hits. : : What should be my

Re: performance question

2010-01-05 Thread Chris Hostetter
: > So, in general, there is no *significant* performance difference with using : > dynamic fields. Correct? : : Correct. There's not even really an "insignificant" performance difference. : A dynamic field is the same as a regular field in practically every way on the : search side of things.

Re: Reload synonyms

2010-01-05 Thread Chris Hostetter
: Subject: Reload synonyms : References: <00b501ca8db9$7e119c70$0301a...@cgifederal.com> : <69de18141001042355l4c98e147r8cd0ae73d3836...@mail.gmail.com> : In-Reply-To: <69de18141001042355l4c98e147r8cd0ae73d3836...@mail.gmail.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking o

RE: Solr Cell - PDFs plus literal metadata - GET or POST ?

2010-01-05 Thread Giovanni Fernandez-Kincade
Really? Doesn't it have to be delimited differently, if both the file contents and the document metadata will be part of the POST data? How does Solr Cell tell the difference between the literals and the start of the file? I've tried this before and haven't had any luck with it. -Original

Basic sentence parsing with the regex highlighter fragmenter

2010-01-05 Thread Caleb Land
Hello, I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse basic sentences, and I'm running into a problem. I'm using the default regex specified in the example solr configuration: [-\w ,/\n\"']{20,200} But I am using a larger fragment size (140) with a slop of 1.0. Given th

Re: internal XML parser used in Solr

2010-01-05 Thread Peter Wolanin
Config.java (which parses e.g. solrconfig.xml) in the solr core code has: import org.w3c.dom.Document; import org.w3c.dom.Node; import org.xml.sax.SAXException; import org.apache.solr.common.SolrException; import org.apache.solr.common.util.DOMUtil; import javax.xml.parsers.*; import javax.xml.xpa

Re: Indexing large text documents

2010-01-05 Thread Grant Ingersoll
I haven't tried it, but you might be able to use either (and this is just me thinking aloud): DataImportHandler with the FileEntityProcessor Remote Streaming - (you might have to write out Solr XML or do something else) -Grant On Jan 5, 2010, at 4:05 AM, Mark N wrote: > SolrInputDocument doc1

dramatic load from stats.jsp page

2010-01-05 Thread Peter Wolanin
The attached screenshot shows the transition on a master search server when we updated from a Solr 1.4 dev build (revision 779609 from 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a cron task to log some of the data from the stats.jsp page from each core (about 100 cores, mos

Re: Indexing the latests MS Office documents

2010-01-05 Thread Jay Hill
The version of Tika in the 1.4 release definitely parses the most current Office formats (.docx, .pptx, etc.) and they index as expected. -Jay On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin wrote: > You must have been searching old documentation - I think tika 0,3+ has > support for the new MS f

Re: Tokenizing problem with numbers in query

2010-01-05 Thread Erick Erickson
We need to back up, this is looking like an XY problem. That is, you're asking for specifics when what would probably be more helpful is for you to describe *what* the problem you're trying to solve is rather than *how* to make a specific behavior happen. Although re-reading your original e-mail do

Re: Tokenizing problem with numbers in query

2010-01-05 Thread Bernd Brod
Thanks to both of you for the quick answers, analysis.jsp shows that the WordDelimiterFilterFactory is performing the split I was experimenting around with the delimiters for the last two days but am still unable to obtain the desired result. I tried entirely kicking solr.WordDelimiterFilterFact

internal XML parser used in Solr

2010-01-05 Thread Smith G
Hello, There are some project-specific schema XML files which should be parsed. I have used the JDOM API for this. But it seems cleaner to shift to the XML parser used by Solr itself. I have gone through the source code. It's a bit confusing. I have found javax.xml package and also org.xml.sax

Re: Indexing large text documents

2010-01-05 Thread Glen Newton
(In Lucene) I break the document into smaller pieces, then add each piece to the Document field in a loop. This seems to work better, but will mess-around with analysis like term offsets. This should work in your example. In Lucene, you can also add the field using a Reader to the file in question
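A rough sketch of the "break the document into smaller pieces" idea, using only the JDK. It reads from a `Reader` in fixed-size pieces and hands each piece to a callback; what the callback does (for example, adding the piece as another value of the same field) is left to the caller. The class name `ChunkedReaderSketch`, the method `forEachChunk`, and the chunk size are assumptions for illustration, not Lucene or Solr API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedReaderSketch {
    // Reads the input in pieces of at most chunkSize characters and passes
    // each piece to the sink, so the full text is never held in one String.
    // Returns the number of pieces produced. (Hypothetical helper.)
    public static int forEachChunk(Reader in, int chunkSize, Consumer<String> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        char[] buf = new char[chunkSize];
        int chunks = 0;
        int filled = 0;
        try {
            int n;
            // read() may return fewer chars than requested, so keep filling
            // the buffer until it is full or the stream ends.
            while ((n = in.read(buf, filled, chunkSize - filled)) != -1) {
                filled += n;
                if (filled == chunkSize) {
                    sink.accept(new String(buf, 0, filled));
                    chunks++;
                    filled = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (filled > 0) { // trailing partial piece
            sink.accept(new String(buf, 0, filled));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> pieces = new ArrayList<>();
        int count = forEachChunk(new StringReader("abcdefghij"), 4, pieces::add);
        System.out.println(count);  // prints 3
        System.out.println(pieces); // prints [abcd, efgh, ij]
    }
}
```

Cutting at fixed character boundaries can split a word across two pieces, which is one way the analysis side effects mentioned above show up; splitting on whitespace or paragraph boundaries would be a gentler variant.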

Re: Facets and distributed search

2010-01-05 Thread Aleksander Stensby
Hi Yonik! I've tried recreating the problem now to get some log output and the problem just doesn't seem to be there anymore... This puzzles me a bit, as the problem WAS definitely there before. I've done one change and that is to optimize the index on one of the servers. But should that impact thi

Re: Invalid CRLF - StreamingUpdateSolrServer ?

2010-01-05 Thread Patrick Sauts
The issue was sometimes null result during facet navigation or simple search, results were back after a refresh, we tried to change the cache to . But same behaviour. That is strange. Just to make sure, you were using the same LBHttpSolrServer instance for all requests, weren't you?

Removing facets whose frequency matches the result count

2010-01-05 Thread joeMcElroy
Is there any way to specify to solr only to bring back facet filter options where the frequency is less than the total results found? I found facets which match the result count are not helpful to the user, and produce noise within the UI to filter results. I can obviously do this within the vie
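A minimal client-side sketch of the filtering described above, assuming the facet counts and the total hit count have already been read out of the response into plain Java values. The class `FacetFilterSketch`, the method `dropNonNarrowing`, and the sample values are hypothetical; this does not show a Solr-side setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetFilterSketch {
    // Keeps only facet values whose count is lower than the total number of
    // hits, i.e. values that would actually narrow the result set.
    // Insertion order of the input map is preserved. (Hypothetical helper.)
    public static Map<String, Long> dropNonNarrowing(Map<String, Long> facetCounts,
                                                     long numFound) {
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : facetCounts.entrySet()) {
            if (entry.getValue() < numFound) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample counts invented for this example; 10 documents matched.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("in-stock", 10L); // matches every hit, so it is noise
        counts.put("red", 3L);
        counts.put("blue", 1L);
        System.out.println(dropNonNarrowing(counts, 10L)); // prints {red=3, blue=1}
    }
}
```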

Re: Invalid CRLF - StreamingUpdateSolrServer ?

2010-01-05 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 7:13 PM, Patrick Sauts wrote: > The issue was sometimes null result during facet navigation or simple > search, results were back after a refresh, we tried to change the cache to > . But same behaviour. > > That is strange. Just to make sure, you were using the same LBHttpS

Re: Reload synonyms

2010-01-05 Thread Siddhant Goel
On Tue, Jan 5, 2010 at 2:24 PM, Peter A. Kirk wrote: > Thanks for the answer. How does one "reload" a core? Is there an API, or a > url one can use? > I think this should be it - http://wiki.apache.org/solr/CoreAdmin#RELOAD -- - Siddhant
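For illustration, a small sketch that builds the RELOAD request URL described on the CoreAdmin wiki page linked above. The base URL `http://localhost:8983/solr`, the core name `core0`, and the admin path `/admin/cores` are assumptions (the admin path is whatever `adminPath` is set to in solr.xml), and the class `CoreReloadUrlSketch` is made up for this example. The code only builds the URL; it does not send the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CoreReloadUrlSketch {
    // Builds a CoreAdmin RELOAD URL for the given core.
    // Assumes the core admin handler is mounted at /admin/cores.
    public static String reloadUrl(String solrBaseUrl, String coreName) {
        try {
            return solrBaseUrl + "/admin/cores?action=RELOAD&core="
                    + URLEncoder.encode(coreName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Host, port and core name are placeholders.
        System.out.println(reloadUrl("http://localhost:8983/solr", "core0"));
        // prints http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
    }
}
```

Opening the resulting URL in a browser or fetching it with any HTTP client should trigger the reload, provided the instance runs in a multicore setup with the admin handler enabled.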

Indexing large text documents

2010-01-05 Thread Mark N
SolrInputDocument doc1 = new SolrInputDocument(); doc1.addField( "Fulltext", strContent); strContent is a string variable which contains the contents of a text file (assume that the text file is located in c:\files\abc.txt). In my case abc.txt (text files) could be very large, ~ 2 GB, so it is not a

RE: Reload synonyms

2010-01-05 Thread Peter A. Kirk
Thanks for the answer. How does one "reload" a core? Is there an API, or a url one can use? Med venlig hilsen / Best regards Peter Kirk E-mail: mailto:p...@alpha-solutions.dk -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: 5. januar 2010 21:46 To:

Re: Reload synonyms

2010-01-05 Thread Shalin Shekhar Mangar
On Tue, Jan 5, 2010 at 2:03 PM, Peter A. Kirk wrote: > > Is it possible to reload the synonym list, if for example "synonyms.txt" is > changed, without having to restart the server? Is the same possible with > stop-words? > > Yes you can reload a core but there are two catches: 1. Reloading a

Reload synonyms

2010-01-05 Thread Peter A. Kirk
Hi Is it possible to reload the synonym list, if for example "synonyms.txt" is changed, without having to restart the server? Is the same possible with stop-words? Thanks, Peter