Re: Using hl.regex.pattern to print complete lines
Ah, this makes sense. I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns. Thanks for the hint! -Pete On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote: > > : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single > : is available that is for getting entire field contents with search terms > : highlighted. To use it, set hl.useFastVectorHighlighter to true. > > He doesn't want the entire field -- his stored field values contain > multi-line strings (using newline characters) and he wants to make > fragments per "line" (ie: bounded by newline characters, or the start/end > of the entire field value) > > Peter: i haven't looked at the code, but i expect that the problem is that > the java regex engine isn't being used in a way that makes ^ and $ match > any line boundary -- they are probably only matching the start/end of the > field (and . is probably only matching non-newline characters) > > java regexes support embedded flags (ie: "(?xyz)your regex") so you might > try that (i don't remember what the correct modifier flag is for the > multiline mode off the top of my head) > > -Hoss >
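A standalone Java sketch of the embedded-flag behavior described above ("(?m)" is the inline form of Pattern.MULTILINE, which makes ^ and $ match at line boundaries; the sample text is made up):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MultilineRegexDemo {
        public static void main(String[] args) {
            String storedField = "first line\nsecond line\nthird line";
            // equivalent to Pattern.compile("^.*$", Pattern.MULTILINE);
            // '.' still excludes newlines, so each match is one full line
            Matcher m = Pattern.compile("(?m)^.*$").matcher(storedField);
            while (m.find()) {
                System.out.println("fragment: [" + m.group() + "]");
            }
        }
    }

Run as-is, this prints one bracketed fragment per line of the stored value, which is the line-boundary behavior hl.regex.pattern should reproduce.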
Job offer / Oferta de trabajo - Madrid, Spain
Hello, not sure if I should really send this kind of stuff to the list, but since I guess it's only positive and someone might be interested... The company I work at is looking for people with experience with Solr/Lucene. Below, the offer: http://www.infojobs.net/pozuelo-de-alarcon/programador-solr/of-icbca57230549aab73e4b484023657f cheers, [In Spanish:] Hello, I'm not sure I should send this kind of thing to the list, but since I imagine it might be useful to someone... The company I work for is looking for people with experience in Solr/Lucene. The offer follows below: http://www.infojobs.net/pozuelo-de-alarcon/programador-solr/of-icbca57230549aab73e4b484023657f regards, Leonardo Menezes
Re: solr connection question
xD On Thu, Jul 8, 2010 at 2:58 PM, Alejandro Gonzalez wrote: > ok please don't forget it :) > > 2010/7/8 Ruben Abad > >> Jorl, ok I'll have to change my vacation request :( >> Rubén Abad >> >> >> On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS < >> g.zarogki...@multirama.gr> wrote: >> >> > Hi solr users >> > >> > I need to know how solr manages connections when we make a >> > request (select, update, commit) >> > Is there any connection pooling, or an article to learn about its >> connection >> > management? >> > How can I log the solr server's connections to a file? >> > >> > I have set up my solr 1.4 with tomcat >> > >> > Thanks in advance >> > >> > >> > >> > -- Whether it's science, technology, personal experience, true love, astrology, or gut feelings, each of us has confidence in something that we will never fully comprehend. --Roy H. William
Re: solr connection question
jarrlll On Fri, Jul 9, 2010 at 10:20 AM, Óscar Marín Miró wrote: > xD > > On Thu, Jul 8, 2010 at 2:58 PM, Alejandro Gonzalez > wrote: > > ok please don't forget it :) > > > > 2010/7/8 Ruben Abad > > > >> Jorl, ok I'll have to change my vacation request :( > >> Rubén Abad > >> > >> > >> On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS < > >> g.zarogki...@multirama.gr> wrote: > >> > >> > Hi solr users > >> > > >> > I need to know how solr manages connections when we make a > >> > request (select, update, commit) > >> > Is there any connection pooling, or an article to learn about its > >> connection > >> > management? > >> > How can I log the solr server's connections to a file? > >> > > >> > I have set up my solr 1.4 with tomcat > >> > > >> > Thanks in advance > >> > > >> > > >> > > >> > > > > > > -- > Whether it's science, technology, personal experience, true love, > astrology, or gut feelings, each of us has confidence in something > that we will never fully comprehend. > --Roy H. William >
Re: index format error because disk full
Disk full should never lead to index corruption (except for very old versions of Lucene). Lucene always writes (and closes) all files associated with the segment, then fsync's them, before writing & fsync'ing the segments_N file that refers to these files. Can you describe in more detail the events that led up to the zero-bytes del file? What OS/filesystem? Is there any external process that could have truncated the file? Or possibly filesystem corruption? Mike On Wed, Jul 7, 2010 at 10:12 PM, Li Li wrote: > I used SegmentInfos to read the segments_N file and found the error is > that it tries to load deletedDocs but the .del file's size is 0 (because > of the disk error). So I used SegmentInfos to set delGen=-1 to ignore > deleted docs. > But I think there is some bug. The write logic may be: it first > writes the .del file, then writes the segments_N file. But it only writes > to a buffer and doesn't flush to disk immediately. So when the disk is full, it > may happen that the segments_N file is flushed but the .del file failed. > > 2010/7/8 Lance Norskog : >> If autocommit does not do an automatic rollback, that is a serious bug. >> >> There should be a way to detect that an automatic rollback has >> happened, but I don't know what it is. Maybe something in the Solr >> MBeans? >> >> On Wed, Jul 7, 2010 at 5:41 AM, osocurious2 >> wrote: >>> >>> I haven't used this myself, but Solr supports a >>> http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 rollback >>> function. It is supposed to roll back to the state at the previous commit. So >>> you may want to turn off auto-commit on the index you are updating if you >>> want to control what that last commit level is. >>> >>> However, in your case if the index gets corrupted due to a disk full >>> situation, I don't know what rollback would do, if anything, to help. You >>> may need to play with the scenario to see what would happen. >>> >>> If you are using the DataImportHandler it may handle the rollback for >>> you...again, however, it may not deal with disk full situations gracefully >>> either. >>> -- >>> View this message in context: >>> http://lucene.472066.n3.nabble.com/index-format-error-because-disk-full-tp948249p948968.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> >> >> -- >> Lance Norskog >> goks...@gmail.com >>
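When a segments_N file does end up referencing a broken file, Lucene ships a recovery tool; a sketch of invoking it (back up the index directory first -- -fix permanently drops any segment it cannot read; the jar name is a placeholder and should match whatever Lucene version your Solr ships, e.g. 2.9.x for Solr 1.4):

    java -cp lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex \
        /path/to/solr/data/index -fix

Running it without -fix only reports what is broken, which is a safer first step.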
Sort by Day - Use of DateMathParser in Function Query?
Dear all, this is not a new problem, I just wanted to check whether with 1.4 there might have been changes that allow a different approach. In my query, I retrieve results that have a date field. I have to sort the result by day only, then by a different string field. The time of that date shall not be used for sorting. I cannot filter the results on a certain date (day). This thread confirms my first thought that I need another field in the index: http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5 However, is it possible to use the DateMathParser somehow in the function queries? If it's not yet possible -- why not: (a) is there a great risk that the performance would be bad, or some other reason that discourages this solution? (b) simply not implemented? In case of (b), I might try to implement it. Thanks! Chantal
Sort by Day - Use of DateMathParser in Function Query?
[P.S. to my first post] Further contemplating http://wiki.apache.org/solr/FunctionQuery. I am using 1.4.1, the date field is configured like this: (The schema has been created using the schema file from 1.4.0, and I haven't changed anything when upgrading to 1.4.1. TrieDate is said to be the default in 1.4, so I would expect this date field to have that type?) On the wiki page, the following example is listed: Example: ms(NOW/DAY) Could I do the same thing with my own date? ms(start_date/DAY) I tried that query: http://192.168.2.40:8080/solr/epg/select?qt=dismax&fl=start_date,title&sort=ms%28start_date/DAY%29%20asc,title%20asc (a search for all *:* configured in solrconfig.xml for dismax) I get the following error message back: """ message can not sort on undefined field: ms(start_date/DAY) description The request sent by the client was syntactically incorrect (can not sort on undefined field: ms(start_date/DAY)). """ I am a complete newbie when it comes to function queries. Thanks for any suggestions! Chantal On Fri, 2010-07-09 at 11:44 +0200, Chantal Ackermann wrote: > Dear all, > > this is not a new problem, I just wanted to check whether with 1.4 there > might have been changes that allow a different approach. > > In my query, I retrieve results that have a date field. I have to sort > the result by day only, then by a different string field. The time of > that date shall not be used for sorting. > I cannot filter the results on a certain date (day). > > This thread confirms my first thought that I need another field in the > index: > http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5 > > However, is it possible to use the DateMathParser somehow in the > function queries? > If it's not yet possible -- why not: > (a) is there a great risk that the performance would be bad, or some > other reason that discourages this solution? > (b) simply not implemented? > > In case of (b), I might try to implement it. > > Thanks! > Chantal >
AW: Sort by Day - Use of DateMathParser in Function Query?
Hi Chantal, why don't you just add another field to your index where you put the day only? You can sort by this field then in your queries. cheers. -Original Message- From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de] Sent: Friday, July 9, 2010 11:45 To: solr-user@lucene.apache.org Subject: Sort by Day - Use of DateMathParser in Function Query? Dear all, this is not a new problem, I just wanted to check whether with 1.4 there might have been changes that allow a different approach. In my query, I retrieve results that have a date field. I have to sort the result by day only, then by a different string field. The time of that date shall not be used for sorting. I cannot filter the results on a certain date (day). This thread confirms my first thought that I need another field in the index: http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5 However, is it possible to use the DateMathParser somehow in the function queries? If it's not yet possible -- why not: (a) is there a great risk that the performance would be bad, or some other reason that discourages this solution? (b) simply not implemented? In case of (b), I might try to implement it. Thanks! Chantal
Re: Sort by Day - Use of DateMathParser in Function Query?
Sorry for the pollution. Sorting by function will only be possible with 1.5. In https://issues.apache.org/jira/browse/SOLR-1297, Grant writes: """ Note, there is a temporary workaround for this: (main query)^0 func(...) """ Is that workaround an option for my use case? Thanks, Chantal On Fri, 2010-07-09 at 12:08 +0200, Chantal Ackermann wrote: > [P.S. to my first post] > > Further contemplating http://wiki.apache.org/solr/FunctionQuery. > > I am using 1.4.1, the date field is configured like this: > omitNorms="true"/> > > (The schema has been created using the schema file from 1.4.0, and I > haven't changed anything when upgrading to 1.4.1. TrieDate is said to be > the default in 1.4, so I would expect this date field to have that > type?) > > On the wiki page, the following example is listed: > Example: ms(NOW/DAY) > Could I do that same thing with my own date? > ms(start_date/DAY) > > I tried that query: > http://192.168.2.40:8080/solr/epg/select?qt=dismax&fl=start_date,title&sort=ms%28start_date/DAY%29%20asc,title%20asc > > (search for all *:* configured in solrconfig.xml for dismax) > > I get the following error message back: > """ > message can not sort on undefined field: ms(start_date/DAY) > > description The request sent by the client was syntactically incorrect > (can not sort on undefined field: ms(start_date/DAY)). > """ > > I am a complete newbie when it comes to function queries. > > Thanks for any suggestions! > Chantal > > On Fri, 2010-07-09 at 11:44 +0200, Chantal Ackermann wrote: > > Dear all, > > > > this is not a new problem, I just wanted to check whether with 1.4 there > > might have been changes that allow a different approach. > > > > In my query, I retrieve results that have a date field. I have to sort > > the result by day only, then by a different string field. The time of > > that date shall not be used for sorting. > > I cannot filter the results on a certain date (day). > > > > This thread confirms my first thought that I need another field in the > > index: > > http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5 > > > > However, is it possible to use the DateMathParser somehow in the > > function queries? > > If it's not yet possible - why not: > > (a) is there are great risk that the performance would be bad? Or some > > other reason that discourages this solution. > > (b) simple not implemented > > > > In case of (b), I might try to implement it. > > > > Thanks! > > Chantal > > >
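A sketch of what that workaround might look like for this use case, following the same shape as Koji's example further down in this digest (it assumes ms() accepts the field at all, which -- per Hoss's reply below -- requires a Trie-based date type; whether the function parser also accepts date math on a field argument, as in ms(start_date/DAY), is not settled in this thread):

    q=(*:*)^0 _val_:"ms(start_date/DAY)"&sort=score asc,title asc

The ^0 zeroes out the main query's score contribution, so the function value alone determines the score being sorted on.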
Re: AW: Sort by Day - Use of DateMathParser in Function Query?
Hi Bastian, that is an option but it would be more flexible to sort using a function query. It looks like I'll have to add that field, however. At least, for as long as using 1.4. Thanks, Chantal On Fri, 2010-07-09 at 12:08 +0200, Bastian Spitzer wrote: > Hi Chantal, > > why dont you just add another Field to your Index where u put the Day only, > you can sort by this filed then > in your queries > > cheers. > > -Ursprüngliche Nachricht- > Von: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de] > Gesendet: Freitag, 9. Juli 2010 11:45 > An: solr-user@lucene.apache.org > Betreff: Sort by Day - Use of DateMathParser in Function Query? > > Dear all, > > this is not a new problem, I just wanted to check whether with 1.4 there > might have been changes that allow a different approach. > > In my query, I retrieve results that have a date field. I have to sort the > result by day only, then by a different string field. The time of that date > shall not be used for sorting. > I cannot filter the results on a certain date (day). > > This thread confirms my first thought that I need another field in the > index: > http://search.lucidimagination.com/search/document/422dc30e0a222c28/sorting_dates_with_reduced_precision#46566037750d7b5 > > However, is it possible to use the DateMathParser somehow in the function > queries? > If it's not yet possible - why not: > (a) is there are great risk that the performance would be bad? Or some other > reason that discourages this solution. > (b) simple not implemented > > In case of (b), I might try to implement it. > > Thanks! > Chantal >
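For reference, a sketch of that extra field (the field name is made up; "tdate" is the Trie-based date type from the 1.4 example schema). Note that copyField copies values verbatim, so the day rounding has to happen before the value reaches the field -- either client-side, or, if the field type accepts date math appended to literal dates, by sending the /DAY suffix in the value:

    <!-- schema.xml -->
    <field name="start_day" type="tdate" indexed="true" stored="false"/>

    <!-- in the update XML: the value rounded down to the day -->
    <field name="start_day">2010-07-09T11:44:00Z/DAY</field>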
Last day to submit your Surge 2010 CFP!
Today is your last chance to submit a CFP abstract for the 2010 Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victory in Web Applications or Internet Architectures. You can find more information, including suggested topics and our current list of speakers, online: http://omniti.com/surge/2010 The final lineup should be available on the conference website next week. If you have questions about the CFP, attending Surge, or having your business sponsor/exhibit at Surge 2010, please contact us at su...@omniti.com. Thanks! -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241
MLT with boost capability
I've asked this question in the past without too much success. I figured I would try to revive it. Is there a way I can incorporate boost functions with a MoreLikeThis search? Can it be accomplished at the MLT request handler level or would I need to create a custom request handler which in turn delegates the majority of the search to a specialized instance of MLT? Can someone point me in the right direction? Thanks
Function Query Sorting vs 'Sort' parameter?
Hi, I'm making some basic sorting (date, price, etc.) using the "sort" parameter (sort=field+asc), and it's working fine. I'm wondering whether there's a significant argument to use function query sorting instead of the "sort" parameter? Thanks, -S
Re: Realtime + Batch indexing
Replication does not transfer files that already exist on the slave and have the same metadata (size, last modified, etc) as the master. As far as deleting files, it will only do so if they do not exist on the master. In most cases, the only way that it would delete and copy the entire index is if the slave index were optimized after updating, which would result in different filenames with entirely different sizes and modification times. The wiki has more detail: http://wiki.apache.org/solr/SolrReplication#How_does_it_work.3F My build scripts use DIH full-import for a reindex, DIH delta-import for adding new content, and the XML update handler for deletes. Replication is very fast after an update on the master. I've got my replication interval set to 15 seconds, and once it's triggered, it typically only takes a second or two. I optimize one of my shards every day, and when that happens, replicating that shard (12GB) does take a little while. On 7/8/2010 10:48 PM, bbarani wrote: One final question about replication.. When I initiate replication I thought SOLR would delete the existing index in slave and just transfers the master index in to Slave. If thats the case there wont be any sync up issues right? I am asking this because everytime I initiate replication the index size of both slave and master becomes the same (even if for some reason if index size of slave is bigger than master it gets reduced to the same size as master after replication) so thought that SOLR just deletes the slave index and then moves all the files from master..
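For reference, a sketch of the slave side of such a setup with a 15-second poll (Solr 1.4 ReplicationHandler; the host, port and core name are placeholders):

    <!-- solrconfig.xml on the slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
        <str name="pollInterval">00:00:15</str>
      </lst>
    </requestHandler>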
PDF remote streaming extract with lots of multiValues
How would I go about setting a large number of literal values in a call to index a remote PDF? I'm currently calling: http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf And that works great, except now I'm coming across use cases where I need to send in hundreds, up to thousands, of different values for 'mycategory'. So with mycategory defined as a multiValued string, I can call: http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf and that works as expected. But when I try to embed thousands of literal.mycategory parameters in the call, eventually my container says 'look, I've been forgiving about letting you GET URLs far longer than 1500 characters, but this is ridiculous' and barfs on it. I've tried POSTing a ... command, but it only pays attention to parameters in the URL query string, ignoring everything in the document. I've seen some other threads that seem related, but now I'm just confused. What's the best way to tackle this? -dKt
Polish language support?
In IRC trying to help someone find Polish-language support for Solr. Seems lucene has nothing to offer? Found one stemmer that looks to be compatibly licensed in case someone wants to take a shot at incorporating it: http://www.getopt.org/stempel/ -Peter -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Polish language support?
Hi Peter, this stemmer is integrated into trunk and 3x. http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/stempel/ http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/analyzers/stempel/ On Fri, Jul 9, 2010 at 2:38 PM, Peter Wolanin wrote: > In IRC trying to help someone find Polish-language support for Solr. > > Seems lucene has nothing to offer? Found one stemmer that looks to be > compatibly licensed in case someone wants to take a shot at > incorporating it: http://www.getopt.org/stempel/ > > -Peter > > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com > -- Robert Muir rcm...@gmail.com
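A sketch of what a Polish field type wired to that stemmer might look like, assuming the stempel jar is on Solr's classpath and exposes a factory under this name (the factory name should be verified against the checkout, since the integration is recent):

    <fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StempelPolishStemFilterFactory"/>
      </analyzer>
    </fieldType>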
Re: Custom PhraseQuery
: Query: "foo bar" : Doc1: "foo bar baz" : Doc2: "foo bar foo bar" : : These two documents should be scored exactly the same. I accomplished the : above in the "normal" query use-case by using the SweetSpotSimilarity class. You can change this by subclassing SweetSpotSimilarity (or any Similarity class) and overriding the tf(float) function. tf(int) is called for terms, while tf(float) is called for phrases -- the float value is lower for phrases with a lot of slop, and higher for exact matches. unfortunately, the input to tf(float) is lossy in accounting for docs that match the phrase multiple times ... the value of "1.0f" might mean it matches the phrase once exactly, or it might mean that it matches many times in a sloppy manner. in your case, it sounds like you just want it to return "1" for any input except "0.0f" -Hoss
Re: Custom PhraseQuery
Oh, I didn't know about the different signatures of tf. Thanks for that clarification. It sounds like all I need to do is actually override tf(float) in the SweetSpotSimilarity class to delegate to baselineTF just like tf(int) does. Is this correct? Thanks
Re: SolrQueryResponse - Solr Documents
: How can I view solr docs in response writers before the response is sent : to the client ? What I get is only DocSlice with int values having size : equal the docs requested. All this while debugging on the : SolrQueryResponse Object. if you are writing a custom ResponseWriter you can get the Documents corresponding to a DocList (or DocSlice) by fetching the Documents from the SolrIndexSearcher (which is associated with the SolrQueryRequest) the int's in the DocSlice are the Lucene internal docIds that the IndexSearcher.document(int) method expects. Note: if you subclass the BaseResponseWriter class this is a lot easier -- it takes care of the hard work and converts the Document into a SolrDocument -- all you have to do is implement writeDoc(SolrDocument) -Hoss
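A minimal Java sketch of that lookup, inside a custom writer's write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp) method ("response" is the key under which the standard components store the DocList; exception handling omitted):

    SolrIndexSearcher searcher = req.getSearcher();
    DocList docs = (DocList) rsp.getValues().get("response");
    DocIterator iter = docs.iterator();
    while (iter.hasNext()) {
        int internalId = iter.nextDoc();           // Lucene internal docId
        Document doc = searcher.doc(internalId);   // fetch the stored fields
        // ... write out doc's fields ...
    }

(DocList and DocIterator live in org.apache.solr.search; Document is org.apache.lucene.document.Document.)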
Re: ClassCastException SOLR
: If you look at the Lucene factories, they all subclass from : BaseTokenFilterFactory which then subclasses from : BaseTokenStreamFactory. That last one does various things for the : child factories (I don't know what they are). Note also that if you really did copy the body of SynonymFilterFactory exactly (so that it already subclasses BaseTokenFilterFactory) the other possible cause of this problem is a classloader issue .. if the only thing in your plugin jar is your new factory, and this jar is in a lib dir that is either in your solr home dir, or configured in your solrconfig.xml, then you shouldn't have a problem. BUT! ... this wording here jumps out at me... : > I'm using the same dependencies as SOLR 1.4.1, because it caused problems : > with newer versions of lucene-core. ...you should be *compiling* against the same lucene/solr jars that come with Solr, but you should not be trying to include any of those classes/jars in your classpath yourself -- having multiple instances of a class in the classloader can cause problems like the one you are seeing. -Hoss
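For the solrconfig.xml option mentioned above, the relevant line looks like this (the path is a placeholder; relative paths are resolved against the core's instance dir):

    <lib dir="/path/to/plugin/jars" />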
RE: solr connection question
: Yes I mean HTTP-requests : How can I log them? it's entirely dependent on your Servlet Container (ie: jetty, tomcat, resin, weblogic, etc...) If you are using the example jetty provided in the solr releases (ie: java -jar start.jar) they show up in example/logs -Hoss
Re: Custom PhraseQuery
: It sounds like all I need to do is actually override tf(float) in the : SweetSpotSimilarity class to delegate to baselineTF just like tf(int) does. : Is this correct? you have to decide how you want to map the float->int (ie: round, truncate, etc...) but otherwise: yes that should work fine. -Hoss
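Putting the thread together, a sketch of the subclass (the class name is made up; baselineTf is SweetSpotSimilarity's own method, and the truncation here means any non-zero sloppy-phrase freq counts as exactly one match):

    import org.apache.lucene.misc.SweetSpotSimilarity;

    public class FlatPhraseSimilarity extends SweetSpotSimilarity {
        @Override
        public float tf(float freq) {
            // any phrase match -- exact or sloppy, single or repeated --
            // scores like exactly one occurrence
            return freq > 0.0f ? baselineTf(1.0f) : 0.0f;
        }
    }

It would then be registered in schema.xml with <similarity class="FlatPhraseSimilarity"/>.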
Re: making rotating timestamped logs from solr output
The entire wording/phrasing of your email leads me to suspect that you are using the example jetty server provided with solr (ie: java -jar start.jar) and that you aren't clear on the distinction between the logs generated by jetty and the logs generated by solr. the simple instance of Jetty that you get when running java -jar start.jar does request logging into the example/logs directory -- while various debug/info/warn/error messages from the java code are all configured to be logged to the console, specifically because it's an example: we want you to see what types of things are logged. For a "real" installation of Solr, i would recommend you look into something like init.d or "services" in windows (i think that's what they are called) to ensure that the servlet container is started as a daemon (independent of your user session). You can then configure your servlet container to log any way you want it to... http://wiki.apache.org/solr/SolrLogging That said: "request" logging from your servlet container only knows about the HTTP level request/response information -- it has no way of knowing about things like number of hits. those things are logged by Solr, but there is a single log message per request that does include this information, so you can configure LogHandlers to direct copies of these specific messages to a special file (i can't remember the pattern off the top of my head) : Hello, : : I would like to log the solr console. although solr logs requests in : timestamped format, this only logs the requests, i.e. does not log : number of hits for a given query, etc. : : is there any easy way to do this other than reverting to methods for : capturing solr output. I usually run solr on my server using screen : command first, running solr, then detaching from console. : : but it would be nice to have output logging instead of request logging. : : best regards, : c.b. : -Hoss
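For the java.util.logging route, a sketch of a rotating-file configuration (pass it with -Djava.util.logging.config.file=/path/to/logging.properties when starting the container; the paths and sizes are placeholders):

    handlers = java.util.logging.FileHandler
    .level = INFO

    # rotate across 10 files of ~10MB each; %g is the generation number
    java.util.logging.FileHandler.pattern = /var/log/solr/solr_%g.log
    java.util.logging.FileHandler.limit = 10000000
    java.util.logging.FileHandler.count = 10
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter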
Re: Delta Import by ID
I'm not certain but i think what you want is something like this... deltaQuery="select '${dataimporter.request.do_this_id}'" deltaImportQuery="select ... from destinations where DestID='${dataimporter.delta.id}' " ...and then hit the handler with a URL like.. /dataimport?config=data-config.xml&command=delta-import&do_this_id=XYZ& Normally, the job of deltaQuery is to pick a list of IDs based on ${dataimporter.last_index_time}, and then deltaImportQuery fetches all the data for those IDs -- but in your case you don't care about the last index time, you just want to force it to index a specific id. so you just need to select that id as-is in your request params. : : : However I really dont want to use CreationDate, but rather just pass in the : id (as done in the deltaImportQuery) - Can I do that directly - if so how do : I specify the value for dataimporter.delta.id? : : (P.S. sorry for a new thread, I kept getting my mail bounced back when I did : a reply, so I'm trying a new thread.) : -Hoss
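Assembled into a data-config.xml entity, that comes out roughly like this (a sketch: the "select ..." parts stand in for the poster's real column list, and the "as id" alias is added so the delta variable has a predictable name):

    <entity name="destination" pk="id"
            query="select ... from destinations"
            deltaQuery="select '${dataimporter.request.do_this_id}' as id"
            deltaImportQuery="select ... from destinations
                              where DestID='${dataimporter.delta.id}'"/>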
Re: Function Query Sorting vs 'Sort' parameter?
(10/07/10 0:54), Saïd Radhouani wrote: Hi, I'm making some basic sorting (date, price, etc.) using the "sort" parameter (sort=field+asc), and it's working fine. I'm wondering whether there's a significant argument to use function query sorting instead of the "sort" parameter? Thanks, -S I'm not sure if I understand your question correctly, but sort by function will be available in next version of Solr: https://issues.apache.org/jira/browse/SOLR-1297 q=ipod&sort=func(price) asc Or you can sort by function via _val_ in Solr 1.4: q=ipod^0 _val_:"func(price)"&sort=score asc Koji -- http://www.rondhuit.com/en/
Re: Sort by Day - Use of DateMathParser in Function Query?
: In https://issues.apache.org/jira/browse/SOLR-1297, : Grant writes: : """ : Note, there is a temporary workaround for this: (main query)^0 : func(...) : """ : : Is that workaround an option for my use case? that would in fact be a workaround for sorting by function where the function uses "ms" to get the milliseconds of a rounded date field -- however... : > I am using 1.4.1, the date field is configured like this: : > omitNorms="true"/> : > : > (The schema has been created using the schema file from 1.4.0, and I : > haven't changed anything when upgrading to 1.4.1. TrieDate is said to be : > the default in 1.4, so I would expect this date field to have that : > type?) ...somewhere you got confused, or misunderstood something. There is no "default" date field in Solr, there are only recommendations and examples provided in the example schema.xml -- in Solr 1.4.1 *and* in Solr 1.4 the recommended field type for dealing with dates is "solr.TrieDateField" As noted in the FunctionQuery wiki page you mentioned, the ms() function does not work with "solr.DateField". (most likely your schema.xml originally started from the example in Solr 1.3 or earlier ... *OR* ... you needed the sortMissingLast/sortMissingFirst functionality that DateField supports but TrieDateField does not. the 1.4 example schema.xml explains the differences) -Hoss
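For comparison, the Trie-based pieces from the 1.4 example schema.xml that ms() requires (the field line is a sketch for the start_date field in question):

    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
               precisionStep="6" positionIncrementGap="0"/>

    <field name="start_date" type="tdate" indexed="true" stored="true"/>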
Re: Function Query Sorting vs 'Sort' parameter?
Yes, indeed, you understood my question. Looking forward to the next version then. To your reply, I'd add that _val_ is used for standard request handler, and bf is used for dismax, right? -S On Jul 10, 2010, at 12:05 AM, Koji Sekiguchi wrote: > (10/07/10 0:54), Saïd Radhouani wrote: >> Hi, >> >> I'm making some basic sorting (date, price, etc.) using the "sort" parameter >> (sort=field+asc), and it's working fine. I'm wondering whether there's a >> significant argument to use function query sorting instead of the "sort" >> parameter? >> >> Thanks, >> -S >> > I'm not sure if I understand your question correctly, > but sort by function will be available in next version of Solr: > > https://issues.apache.org/jira/browse/SOLR-1297 > > q=ipod&sort=func(price) asc > > Or you can sort by function via _val_ in Solr 1.4: > > q=ipod^0 _val_:"func(price)"&sort=score asc > > Koji > > -- > http://www.rondhuit.com/en/ >
Re: Function Query Sorting vs 'Sort' parameter?
(10/07/10 7:15), Saïd Radhouani wrote: Yes, indeed, you understood my question. Looking forward to the next version then. To your reply, I'd add that _val_ is used for standard request handler, and bf is used for dismax, right? -S Right. Koji -- http://www.rondhuit.com/en/
Re: Realtime + Batch indexing
Hi, Thanks a lot for your replies. Here is the exact problem I am facing right now.. I have a scheduled batch indexing happening in master every 2 days for 3 sources (Ex: s1, s2, s3). Once the batch indexing gets completed I replicate that to the slave instance for user queries. There is one more app which posts XML (for s3) to the SOLR slave instance (to perform real time indexing), and the posted XML can add / update documents in the slave index (created by batch indexing). Now since the data posted via XML is also available for batch indexing, if I do a batch indexing for s3 after 2 days and replicate it to the slave, users should be able to view all data. I am posting just to the slave first in order to have a kind of real time indexing where the user can see the results immediately, but whenever the XML post happens to SOLR there is a db entry corresponding to that post.. Now I am afraid that I might run into an issue when someone kicks off real time indexing from the app while batch indexing is in progress, as the batch indexing might not pick up the changes made to the slave at that time. Has anyone faced this kind of scenario? My ideal solution is that I should be able to do real time (XML post) / batch indexing at the same time, and also I can't use shards as real time data may even need to update the existing index (not just add a new document).. My assumption is that I could use shards if we were going to maintain indexes separately for real time / batch indexing, but if I need to update an existing document using an XML post I don't think shards would work... I also thought of doing this: I will always write both XML posts / batch indexing to the master and do a replication to the slave every 15 seconds.. even in this case, if I am doing a batch indexing I suppose SOLR will lock the index files and I won't be able to do an XML push to the same index at that time.. please correct me if I am wrong.. Any suggestion / thoughts would be greatly appreciated. Thanks, BB
Re: PDF remote streaming extract with lots of multiValues
POSTing the individual parameters (literal.id, literal.mycategory, literal.mycategory) as name value pairs to 1.4's /update/extract does work. I just realized the POST's content type hadn't been set to 'application/x-www-form-urlencoded'. Set it to that and it accepts all the parameters. -dKt From: David Thompson To: solr-user@lucene.apache.org Sent: Fri, July 9, 2010 12:17:59 PM Subject: PDF remote streaming extract with lots of multiValues How would I go about setting a large number of literal values in a call to index a remote PDF? I'm currently calling: http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf And that works great, except now I'm coming across usecases where I need send in hundreds, up to thousands, of different values for 'mycategory'. So with mycategory defined as a multiValued string, I can call: http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf and that works as expected. But when I try to embed thousands of literal.mycategory parameters in the call, eventually my container says 'look, I've been forgiving about letting you GET URLs far longer than 1500 characters, but this is ridiculous' and barfs on it. I've tried POSTing a ... command, but it only pays attention to parameters in the URL query string, ignoring everything in the document. I've seen some other threads that seem related, but now I'm just confused. What's the best way to tackle this? -dKt
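A sketch of that POST with curl (--data-urlencode sends application/x-www-form-urlencoded by default; repeat the parameter once per value):

    curl 'http://host/solr/update/extract' \
        --data-urlencode 'literal.id=abc' \
        --data-urlencode 'literal.mycategory=blah' \
        --data-urlencode 'literal.mycategory=foo' \
        --data-urlencode 'literal.mycategory=bar' \
        --data-urlencode 'stream.url=http://otherhost/some/file.pdf'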
Problem with linux
I have problems when I execute my program on Linux with the following piece of code:

    Document d;
    Analyzer analyzer = new PorterStemAnalyzer();
    System.out.println("1");
    Directory index = FSDirectory.open(new File("index1"));
    System.out.println("2");
    IndexWriter w = new IndexWriter(index, analyzer, true,
        IndexWriter.MaxFieldLength.UNLIMITED);  // MY PROG HANGS UP HERE
    System.out.println("3");
    ...

Strangely, this exact program runs well on Windows. It simply hangs (doesn't halt) while creating the IndexWriter object on Linux. The account via which I'm logged in has sufficient rights for the concerned folder. -Sarfaraz
Re: Realtime + Batch indexing
It's possible to get near real-time adds and updates (every two minutes in our case) with a multi-shard setup, if you have a shard dedicated to new content and have the right combination of unique identifiers on your data. I'll respond off-list with a full description of my setup. On 7/9/2010 4:41 PM, bbarani wrote: I have a scheduled batch indexing happening in master every 2 days for 3 sources (Ex: s1, s2, s3) Once the batch indexing gets completed I replicate that to slave instance for user queries. There is one more app which posts the XML (of s3) to SOLR slave instance (to perform real time indexing) and the posted XML can add / update document to the slave index (created by batch indexing). Now since the data posted via XML is also available for batch indexing, If I do a batch indexing for s3 after 2 days and replicate it in slave users should be able to view all data. I am posting just to slave first in order to have a kind of real time indexing where the user can see the results immediately but whenever the XML post happens to SOLR there is a db entry corresponding to that post.. Now I am afraid that I might run in to an issue when someone kicks off real time indexing from the app when batch indexing is in progress as the batch indexing might not pick up the changes made to slave at that time (when the batch indexing is in progress). Has anyone faced this kind of scenario.. My ideal solution is that I should be able to do real time (XML post) / batch indexing at same time and also I cant use shards as real time data may even need to update the existing index (not just add a new document)..My assumption is that I can use shards if we are going to maintain index separately for real time / batch indexing but if I need to update an existing document using XML post I don't think Shards would work... I also thought of doing this.. I will always write both XML post / batch indexing to Master and do a replication to slave every 15 seconds.. even in this case if I am doing a batch indexing I suppose SOLR will lock the index files and I wont be able to do a XML push to the same index at that time.. please correct me if I am wrong..
Re: Field Collapsing SOLR-236
Hi Rakhi, Sorry, I didn't see this email until just now. Did you get it working? If not here's some things that might help. - Download the patch first. - Check the date on which the patch was released. - Download the version of the trunk that existed at that date. - Apply the patch using the patch program in linux. There is a Windows program for patching but I can't remember right now. - After applying the patch just compile the whole thing It might be better if you used the example folder first and modify the config to work for multicore (at least that's what I did) . You can compile example by doing ant example (if I remember correctly) For config stuff refer to this link : http://wiki.apache.org/solr/FieldCollapsing HTH :) - Moazzam I'd give you the On Wed, Jun 23, 2010 at 7:23 AM, Rakhi Khatwani wrote: > Hi, > But these is almost no settings in my config > heres a snapshot of what i have in my solrconfig.xml > > > > > > multipartUploadLimitInKB="2048" /> > > > default="true" /> > > class="org.apache.solr.handler.admin.AdminHandlers" /> > > > > *:* > > > > class="org.apache.solr.handler.component.CollapseComponent" /> > > > Am i goin wrong anywhere? > Regards, > Raakhi > > On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi wrote: > >> fieldType:analyzer without class or tokenizer & filter list seems to point >> to the config - you may want to correct. >> >> >> On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani >> wrote: >> >> > Hi, >> > I checked out modules & lucene from the trunk. >> > Performed a build using the following commands >> > ant clean >> > ant compile >> > ant example >> > >> > Which compiled successfully. >> > >> > >> > I then put my existing index(using schema.xml from solr1.4.0/conf/solr/) >> in >> > the multicore folder, configured solr.xml and started the server >> > >> > When i type in http://localhost:8983/solr >> > >> > i get the following error: >> > org.apache.solr.common.SolrException: Plugin init failure for >> [schema.xml] >> > fieldType:analyzer without class or tokenizer & filter list >> > at >> > >> > >> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168) >> > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480) >> > at org.apache.solr.schema.IndexSchema.(IndexSchema.java:122) >> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429) >> > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286) >> > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198) >> > at >> > >> > >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123) >> > at >> > >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86) >> > at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) >> > at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> > at >> > >> > >> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662) >> > at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) >> > at >> > >> > >> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250) >> > at >> > org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517) >> > at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467) >> > at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> > at >> > >> > >> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) >> > at >> > >> > >> 
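The patch-and-build steps Moazzam describes, as a shell sketch (the checkout date and patch filename are placeholders -- match them to the actual SOLR-236 attachment; on current trunk the checkout root is lucene/dev/trunk, with Solr under its solr/ subdirectory):

    # check out trunk as it existed on the patch's date
    svn co -r {2010-06-01} http://svn.apache.org/repos/asf/lucene/dev/trunk solr-trunk
    cd solr-trunk

    # apply the field-collapsing patch from JIRA
    patch -p0 < SOLR-236.patch

    # build and start the example server
    cd solr
    ant clean compile example
    cd example
    java -jar start.jar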
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) >> > at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> > at >> > >> > >> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) >> > at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> > at >> > org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) >> > at org.mortbay.jetty.Server.doStart(Server.java:224) >> > at >> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) >> > at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985) >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> > at >> > >> > >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> > at >> > >> > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> > at java.lang.reflect.Method.invoke(Method.java:597) >> > at org.mortbay.start.Main.invokeMain(Main.java:194) >> > at org.mortbay.start.Main.start(Main.java:534) >> > at org.mortbay.start.Main.start(Main.java:441) >> > at org.mortbay.start.Main.main(Main.java:119) >> > Caused by: org.apache.solr.common.SolrException: analyzer without class >> or >> > tokenizer & filter list >> > at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908) >> > at org.apache.solr.schema.IndexSchema.access