Custom sort (score + custom value)
Hi,

I want to implement a custom sort in Solr based on a combination of relevance (which Solr already gives me as the score) and a custom value I've calculated previously for each document. I see two options:

1. Use a function query (I'm using a DisMaxRequestHandler).
2. Create a component that sets a SortSpec with a sort that has a custom ComparatorSource (similar to QueryElevationComponent).

The first option has this problem: while the relevance value changes for every query, my custom value is constant for each doc. That implies queries whose documents have high relevance are less affected by my custom value, while queries with low relevance are affected a lot by it. Can it be made proportional with a function query (i.e. docs with low relevance are also less affected by my custom value)?

The second option has this problem: the Solr score isn't normalized, and I need it normalized in order to apply my custom value in the sortValue function of ScoreDocComparator.

What do you think? What's the best option in that case? Another option?

Thank you in advance,
George
Re: Custom sort (score + custom value)
Ok Yonik, thank you.

I've tried to execute the following query: "{!boost b=log(myrank) defType=dismax}q" and it works great.

Do you know if I can do the same (combine a DisjunctionMaxQuery with a BoostedQuery) in solrconfig.xml?

George

On Sun, Nov 2, 2008 at 3:01 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Sun, Nov 2, 2008 at 5:09 AM, George <[EMAIL PROTECTED]> wrote:
> > I want to implement a custom sort in Solr based on a combination of
> > relevance (Solr gives me it yet => score) and a custom value I've calculated
> > previously for each document. I see two options:
> >
> > 1. Use a function query (I'm using a DisMaxRequestHandler).
> > 2. Create a component that set SortSpec with a sort that has a custom
> > ComparatorSource (similar to QueryElevationComponent).
> >
> > The first option has the problem: While the relevance value changes for
> > every query, my custom value is constant for each doc.
>
> Yes, that can be an issue when adding unrelated scores.
> Multiplying them might give you better results:
> http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
>
> -Yonik
Re: Custom sort (score + custom value)
Todd: Yes, I looked into these arguments before I found the problem I described in the first email.

Yonik: It's exactly what I was looking for.

George

On Mon, Nov 3, 2008 at 7:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 3, 2008 at 12:37 PM, George <[EMAIL PROTECTED]> wrote:
> > Ok Yonik, thank you.
> >
> > I've tried to execute the following query: "{!boost b=log(myrank)
> > defType=dismax}q" and it works great.
> >
> > Do you know if I can do the same (combine a DisjunctionMaxQuery with a
> > BoostedQuery) in solrconfig.xml?
>
> Do you mean set it as a default for a handler in solrconfig.xml? That
> should work.
> You could set a default of q={!boost b=log(myrank) defType=dismax v=$uq}
> then all the client would have to pass in is uq (the user query)
>
> -Yonik
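For reference, Yonik's suggested default would look something like this in solrconfig.xml (the handler name here is illustrative, not from the thread):

  <requestHandler name="/usersearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q">{!boost b=log(myrank) defType=dismax v=$uq}</str>
    </lst>
  </requestHandler>

With that in place, the client only passes uq=<user query> and the boost wrapper is applied server-side.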
Index an Oracle DATE column with a Solr DateField
Hi,

I have an Oracle DATE column that I want to index with a Solr DateField. The relevant part of my schema.xml looks like this:

[...]

I use DataImportHandler. When I do a search, this field is returned one day earlier. Oracle: 2006-12-10. Solr: 2006-12-09T23:00:00Z.

If I index it as a String, it's indexed as expected (with the same "string date" as I see it in Oracle).

Does anyone know where the problem is?

Thanks in advance
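For context, the dropped schema snippet would presumably have been a plain date field along these lines (field name assumed):

  <field name="myDate" type="date" indexed="true" stored="true"/>

where "date" is solr.DateField. Solr stores and returns dates in UTC, and 2006-12-09T23:00:00Z is exactly what a JVM running in a UTC+1 timezone produces when the JDBC driver's local-time value for 2006-12-10 00:00:00 is converted to UTC. Starting the indexing JVM with -Duser.timezone=UTC is one common workaround.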
DataImportHandler Full Import completed successfully after SQLException
Hi,

Yesterday I found the following exception while trying to index from an Oracle database in my indexing process:

2009-06-23 14:57:29,205 WARN [org.apache.solr.handler.dataimport.JdbcDataSource] Error reading data
java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number 1 with name "_SYSSMU1$" too small
        at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
        at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:110)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:171)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
        at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
        at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1000)
        at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
        at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
        at org.jboss.resource.adapter.jdbc.WrappedResultSet.next(WrappedResultSet.java:1184)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:223)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:258)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:73)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:231)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
2009-06-23 14:57:29,206 INFO [org.apache.solr.handler.dataimport.DocBuilder] Full Import completed successfully
2009-06-23 14:57:29,206 INFO [org.apache.solr.update.UpdateHandler] start commit(optimize=true,waitFlush=false,waitSearcher=true)

As you can see, the full import "completed successfully" after indexing only a part (about 7) of all expected documents (about 15). I don't know if it is a bug or not, but it's certainly not the behaviour I expect in this situation. It should have rolled back, shouldn't it?

Reading the Solr code, I can see that line 314 of JdbcDataSource.java throws a DataImportHandlerException with a SEVERE errCode, so I can't understand why my indexing process finishes correctly.

I'm working with the Solr trunk version (rev. 785397) and no custom properties (i.e. the onError value is the default 'abort') in DataImportHandler.

George
Re: DataImportHandler Full Import completed successfully after SQLException
Noble, thank you for fixing this issue! :)

2009/6/25 Noble Paul നോബിള് नोब्ळ्
> OK, this should be a bug with JdbcDataSource.
>
> look at the line
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
>
> it is eating up the exception and logs error and goes back. I shall
> raise an issue
>
> thanks
>
> On Wed, Jun 24, 2009 at 11:12 PM, George wrote:
> > Hi,
> > Yesterday I found out the following exception trying to index from an Oracle
> > Database in my indexing process:
> >
> > 2009-06-23 14:57:29,205 WARN
> > [org.apache.solr.handler.dataimport.JdbcDataSource] Error reading data
> > java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number
> > 1 with name "_SYSSMU1$" too small
> > [... full stack trace as in the previous message ...]
> > 2009-06-23 14:57:29,206 INFO
> > [org.apache.solr.handler.dataimport.DocBuilder] Full Import completed
> > successfully
> > 2009-06-23 14:57:29,206 INFO [org.apache.solr.update.UpdateHandler] start
> > commit(optimize=true,waitFlush=false,waitSearcher=true)
> >
> > As you can see, Full Import completed successfully indexing a part (about
> > 7) of all expected documents (about 15). I don't know if it is a bug
> > or not but certainly it's not the behaviour I expect in this situation.
> > It should have rolled back, shouldn't it?
> >
> > Reading Solr code I can see that in line 314 of JdbcDataSource.java it
> > throws a DataImportHandlerException with SEVERE errCode so I can't
> > understand why my indexing process finishes correctly.
> >
> > I'm working with Solr trunk version (rev. 785397) and no custom properties
> > (i.e. onError value is default 'abort') in DataImportHandler.
> >
> > George
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
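For anyone hitting this before the fix lands: the change amounts to propagating the exception instead of swallowing it. A rough sketch of the direction (not the committed patch; the real method also has to close the ResultSet, and resultSet here is the iterator's field):

  private boolean hasnext() {
    try {
      return resultSet.next();
    } catch (SQLException e) {
      // propagate, so that the default onError=abort actually aborts the import
      throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
          "Error reading data from ResultSet", e);
    }
  }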
JSON facet not working with dates
Hi all,

I am using Solr 6.5.0, and I want to do pivot faceting including a date field. My simple facet.json is:

{
  "dates": {
    "type": "range",
    "field": "observationStart.TimeOP",
    "start": "3000-01-01T00:00:00Z",
    "end": "3000-01-02T00:00:00Z",
    "gap": "%2B15MINUTE",
    "facet": { "x": "sum(trafficCnt)" }
  }
}

What I get back is an error though:

"error-class", "org.apache.solr.common.SolrException",
"root-error-class", "org.apache.solr.common.SolrException"],
"msg": "Unable to range facet on field:observationStart.TimeOP{type=date_range,properties=indexed,stored,omitTermFreqAndPositions,useDocValuesAsStored}"

On the other hand, if I use the old interface, it seems to work:

"facet": "on",
"facet.range.start": "3000-01-01T00:00:00Z",
"facet.range.end": "3000-01-01T00:00:00Z+1DAY",
"facet.range.gap": "+15MINUTE"

I get:

"facet_ranges": {
  "observationStart.TimeOP": {
    "counts": [
      "3000-01-01T00:00:00Z", 258,
      "3000-01-01T00:15:00Z", 261,
      "3000-01-01T00:30:00Z", 258,
      "3000-01-01T00:45:00Z", 254,
      ...

My date fields are of type solr.DateRangeField. Searching for the error I get, I found this source file:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java

where line 180 has "if (ft instanceof TrieField || ft.isPointField()". Is that related to my problem? Is the new JSON facet interface not working with date ranges?

Regards,
George
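If that line-180 check is indeed the gate, the JSON range facet accepts only Trie/point fields, so the same request should work against a plain date field (e.g. a TrieDate or DatePointField copy of the data; the field name below is hypothetical). Note also that the gap can be written literally in the JSON body; %2B is only needed as a URL escape for "+":

  {
    "dates": {
      "type": "range",
      "field": "observationStart.time",
      "start": "3000-01-01T00:00:00Z",
      "end": "3000-01-02T00:00:00Z",
      "gap": "+15MINUTE",
      "facet": { "x": "sum(trafficCnt)" }
    }
  }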
Solr Grouping - sorting groups based on the sum of the scores of the documents within each group
Currently, Solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts groups "by the score of the top document within each group". E.g.

[...]
"groups":[{
  "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
  "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
    { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
    { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
    { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
  }},
[...]

Is there an out-of-the-box way to instead sort groups by the sum of the scores of the documents within each group? E.g.

[...]
"groups":[{
  "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
  "doclist":{"numFound":9,"start":0,*"scoreSum":13.739738*,"docs":[
    { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
    { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
    { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
  }},
[...]

With the release of sorting by function query (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there should be a way to use the sum() function (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough, since the "score" field is not part of the documents.

I feel like I'm close but I'm missing some obvious piece. I'm using Solr 3.5.

Thank you in advance for your time.
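For reference, the out-of-the-box behaviour described above corresponds to a request like the following (field name hypothetical), where sort orders the groups by their top document and group.sort orders the documents within each group:

  /select?q=foo&group=true&group.field=groupValue_field&sort=score+desc&group.sort=score+desc

Neither parameter accepts an aggregate over the scores of a group's members, which is exactly the gap the question is about.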
Alternate score-based sorting for Solr Grouping
My previous subject line was not very scannable. Apologies for the re-post; I'm just hoping to get more eyeballs and hopefully some insights. Thank you in advance for your time. See below. -GS

On Mon, Dec 5, 2011 at 1:37 PM, George Stathis wrote:
> Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing)
> sorts groups "by the score of the top document within each group". E.g.
>
> [...]
> "groups":[{
>   "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
>   "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
>     { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
>     { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
>     { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
>   }},
> [...]
>
> Is there an out-of-the-box way to instead sort groups by the sum of the
> scores of the documents within each group? E.g.
>
> [...]
> "groups":[{
>   "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
>   "doclist":{"numFound":9,"start":0,*"scoreSum":13.739738*,"docs":[
>     { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
>     { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
>     { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
>   }},
> [...]
>
> With the release of sorting by Function Query (
> https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
> should be a way to use the sum() function (
> http://wiki.apache.org/solr/FunctionQuery). But it's not quite close
> enough since the "score" field is not part of the documents.
>
> I feel like I'm close but I'm missing some obvious piece. I'm using Solr
> 3.5.
>
> Thank you in advance for your time.
anybody using solr with Cassandra?
Hi,

Is anybody using Solr with Cassandra? Are there any gotchas?

Thanks
--Siju
Re: anybody using solr with Cassandra?
Thanks a million Nick,

We are currently debating whether we should use Cassandra or Membase or HBase with Solr. Do you have anything to contribute as advice to us?

Thanks again :-)

--Siju

On Tue, Aug 31, 2010 at 5:15 AM, nickdos wrote:
> Yes, we are using Cassandra. There is nothing much to say really, it just works.
> Note we are generating SOLR indexes using Java & SolrJ (embedded mode) and
> reading data out of Cassandra with Java. Index generation is fast.
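For anyone curious what the embedded-SolrJ indexing side of such a setup looks like, a minimal sketch (Solr 1.4-era API; core name, field names, and the Cassandra values are illustrative; exception handling reduced to a throws clause):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.core.CoreContainer;

  // assumes -Dsolr.solr.home points at a configured Solr home
  void indexFromCassandra() throws Exception {
    CoreContainer container = new CoreContainer.Initializer().initialize();
    SolrServer server = new EmbeddedSolrServer(container, "core0");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "cassandra-row-key");                  // hypothetical row key
    doc.addField("body", "column values read from Cassandra"); // hypothetical payload
    server.add(doc);
    server.commit();
    container.shutdown();
  }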
Re: anybody using solr with Cassandra?
We will be using Solr for indexing and Cassandra/Membase/HBase instead of a database. That is the idea now, unless somebody gives a better solution :-)

thanks
--Siju

On Tue, Aug 31, 2010 at 11:39 AM, Amit Nithian wrote:
> I am curious about this too.. are you talking about using HBase/Cassandra
> as an aux store of large data or using Cassandra to store the actual lucene
> index (as in LuCandra)?
>
> On Mon, Aug 30, 2010 at 11:06 PM, Siju George wrote:
> > Thanks a million Nick,
> >
> > We are currently debating whether we should use cassandra or membase or
> > hbase with solr.
> > Do you have anything to contribute as advice to us?
> >
> > Thanks again :-)
> >
> > --Siju
> >
> > On Tue, Aug 31, 2010 at 5:15 AM, nickdos wrote:
> > > Yes, we are Cassandra. There is nothing much to say really, it just
> > > works. Note we are SOLR generating indexes using Java & SolrJ (embedded
> > > mode) and reading data out of Cassandra with Java. Index generation is fast.
SOLR geospatial
In looking at some of the docs on support for geospatial search, I see this functionality is mostly scheduled for the upcoming release 4.0 (with some playing around with backported code possible).

I note the support for the bounding box filter, but will "bounding box" be one of the supported *data* types for use with this filter? For example, if my lat/long data describes the "footprint" of a map, I'm curious whether that type of coordinate data can be used by the bounding box filter (or in any other way, for similar limiting/filtering capability). I see it can work with point-type data, but I'm curious about functionality with bounding-box-type data (in contrast to simple point lat/long data).

Thanks,
George
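For what it's worth, the Solr 4 spatial field (SpatialRecursivePrefixTreeFieldType) does accept rectangles as indexed shapes, so a map footprint can be stored and filtered directly. A sketch, with field names illustrative and noting that early 4.x builds need the JTS jar on the classpath for WKT parsing:

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
  <field name="footprint" type="location_rpt" indexed="true" stored="true"/>

Index the footprint as ENVELOPE(minX, maxX, maxY, minY), e.g. ENVELOPE(-100, -80, 40, 30), and filter with:

  fq=footprint:"Intersects(ENVELOPE(-120, -60, 50, 20))"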
RE: Near Real Time
> > Further, without the NRT features present, what's the closest I can
> > expect to real time for the typical use case (obviously this will vary,
> > but for the average deploy)? One hour? One minute? It seems like there
> > are a few hacks to get somewhat close. Thanks so much.
>
> Depends a lot on the nature of the requests and the size of the index,
> but one minute is often doable.
> On a large index that facets on many fields per request, one minute is
> probably still out of reach.

With no facets, what index size is considered, in general, out of reach for NRT? Is a 9GB index with 7 million records out of reach? How about 3GB with 3 million records? 3GB with 800K records? This is for a 1 min. NRT setting.

Thanks.

-- George
Solr* != solr*
Hi Folks,

Can someone tell me what I might have set up wrong? After indexing my data, I can search just fine on, let's say, "sol*" but not on "Sol*" (note upper case 'S' vs. lower case 's'); I get 0 hits.

Here is my customized schema.xml setting:

[...]

Btw, "Solr", "solr", "sOlr", etc. work. It's a problem with wildcards only.

Thanks in advance.

-- George
schema.xml for CJK, German, French, etc.
Hi Folks,

Has anyone created a schema.xml for languages other than English? I'd like to see a working example, mainly for CJK, German and French. If you have one, can you share it?

To get me started, I created the following for German:

[...]

Will those filters work on German text?

Thanks.

-- George
RE: schema.xml for CJK, German, French, etc.
Thanks Erik!

Trouble is, I don't know those languages well enough to conclude that my setup is correct, especially for CJK. It's less problematic for European languages, but then again, should I be using those English filters with the German SnowballPorterFilterFactory? That is, will WordDelimiterFilterFactory work with a German filter? Etc.

It would be nice if folks shared their settings (generic, for each language) and then we could add them to a Solr wiki.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 02, 2008 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: schema.xml for CJK, German, French, etc.
>
> On Jul 2, 2008, at 9:16 PM, George Aroush wrote:
> > Has anyone created schema.xml for languages other then English?
>
> Indeed.
>
> > I like to see a working example mainly for CJK, German and French.
> > If you have can you share them?
> >
> > TO get me started, I created the following for German:
> >
> >   <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >     generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >     catenateAll="0"/>
> >   <filter class="solr.SnowballPorterFilterFactory" language="German" />
> >
> > Will those filters work on German text?
>
> One tip that will help is visiting
> http://localhost:8983/solr/admin/analysis.jsp
> and test it out to see that you're getting the tokenization
> that you desire on some sample text. Solr's analysis
> introspection is quite nice and easy to tinker with.
>
> Removing stop words before lower casing won't quite work
> though, as StopFilter is case-sensitive with all stop words
> generally lowercased, but other than relocating the
> StopFilterFactory in that chain it seems reasonable.
>
> As always, though, it depends on what you want to do with
> these languages to offer more concrete recommendations.
>
> Erik
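Putting Erik's note into practice, a minimal German field type with lowercasing moved ahead of the stop filter might look like this (type name and stopword file name assumed; WordDelimiterFilter left out until its interaction with German stemming is verified):

  <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords_de.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    </analyzer>
  </fieldType>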
RE: Solr* != solr*
Hi Erik and all,

I'm still trying to solve this issue and I'd like to know how others might have solved it in their clients. I can't modify Solr / Lucene code, and I'm using Solr 1.2.

What I have done is simple: given a user input, I break it into words and then analyze each word. Any word that contains wildcards (* or ?), I lowercase. While the logic is simple, I'm not comfortable with it, because the word-breaking isn't based on the analyzer in use by Lucene; in my case, I can't tell which analyzer is used.

So my question is: did you run into this problem, and if so, how did you work around it? That is, is breaking on generic whitespace (independent of the analyzer in use) "good enough"?

Thanks.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 01, 2008 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr* != solr*
>
> George - wildcard expressions, in Lucene/Solr's QueryParser,
> are not analyzed. There is one trick in the API that isn't
> yet wired to Solr's configuration, and that is setLowercaseExpandedTerms(true).
> This would solve the Sol* issue because when indexed all
> terms for the "text" field are lowercased during analysis.
>
> A functional alternative, of course, is to have the client
> lowercase the query expression before requesting to Solr
> (careful, though - consider AND/OR/NOT).
>
> Erik
>
> On Jul 1, 2008, at 8:14 PM, George Aroush wrote:
> > Hi Folks,
> >
> > Can someone tell me what I might have setup wrong? After indexing my
> > data, I can search just fine on, let say "sol*" but not on "Sol*"
> > (note upper case 'S' vs. lower case 's') I get 0 hits.
> >
> > Here is my customize schema.xml setting:
> >
> > [...]
> >
> > Btw, "Solr", "solr", "sOlr", etc. works. It's a problem with wild
> > cards.
> >
> > Thanks in advance.
> >
> > -- George
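A sketch of the client-side workaround described above, in Java (whitespace splitting only approximates the field's real analyzer, as the post notes):

  static String lowercaseWildcardTerms(String input) {
    StringBuilder out = new StringBuilder();
    for (String word : input.split("\\s+")) {
      // leave boolean operators untouched; AND/OR/NOT are case-sensitive
      boolean hasWildcard = word.indexOf('*') >= 0 || word.indexOf('?') >= 0;
      boolean isOperator = word.equals("AND") || word.equals("OR") || word.equals("NOT");
      if (hasWildcard && !isOperator) {
        word = word.toLowerCase();
      }
      if (out.length() > 0) out.append(' ');
      out.append(word);
    }
    return out.toString();
  }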
Re: Can Solr be used to search public websites(Newbie).
Dear Con,

Searching the entire Internet is a non-trivial computer science problem. It's kind of like asking a brain surgeon the best way to remove a tumor: the answer should be "First, spend 16 years becoming a neurosurgeon". My point is, there is a whole lot you need to know beyond "is Solr the correct tool for the job". However, the short answer is that Nutch is probably better suited for what you want to do, once you have the funding, hardware and expertise to do it.

I'm not mocking or denigrating you in any way, but I think you need to do a bit more basic research into how search engines work. I found this very readable and accurate site the other day:

http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC

Regards,
George

On Sep 17, 2008, at 8:39 AM, convoyer wrote:

Hi all. I am quite new to solr. I am just checking whether this tool suits my application. I am developing a search application that searches all publicly available websites and also some selective websites. Can I use solr for this purpose? If yes, how can I get started? All the tutorials are pointing to loading data from an xml file and searching those values..:-(:-( . Instead, how can I give the URL of a website and search the contents of that site (just like in nutch)?

Expecting reply. thanks in advance
con
Commit frequency
Hi Folks,

I'm trying to collect some data -- if you can share them -- about the commit frequency you have set on your index, and at what rate you found it acceptable. This is for a non-master/slave setup.

For my case, in a test environment, I have experimented with a 1 minute interval (each minute I commit anywhere between 0 to 10 new documents and 0 to 10 updated documents). While the commit is ongoing, I'm also searching on the index. For this experiment, my index size is about 3.5 Gb, and I have about 1.2 million documents. The experiment was done on a Windows 2003 server with 4 Gb RAM and 2x 3 GHz Xeon CPUs.

So, if you can share your setup, at least the commit frequency, I would appreciate it. What I'm trying to get out of this is the lowest commit frequency that Solr can handle.

Regards,

-- George
some hyphenated words not found
I have a nearly generic out-of-the-box installation of Solr. When I search on a short text document containing a few hyphenated words, I get hits on *some* of the words, but not all. I'm quite puzzled as to why. I've checked that the text is only plain ASCII. How can I find out what's wrong?

In the file below, Solr finds life-long, but not love-lorn. Here's the file:

This is a small sample document just to insure that a type *.doc can be accessed by X Documentation.

It is sung to the moon by a love-lorn loon, who fled from the mocking throng O!
It’s the song of a merryman, moping mum, whose soul was sad and whose glance was glum.
Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb, As he sighed for the love of a ladye!
Who sipped no sup, and who craved no crumb, As he sighed for the love of a ladye.
Heighdy! heighdy! Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb, As he sighed for the love of a ladye!
I have a song to sing, O! Sing me your song, O!
It is sung with the ring Of the songs maids sing Who love with a love life-long, O!
It's the song of a merrymaid, peerly proud, Who loved a lord, and who laughed aloud
At the moan of the merryman, moping mum, Whose soul was sad, and whose glance was glum,
Who sipped no sup, and who craved no crumb, As he sighed for the love of a ladye!
Heighdy! heighdy! Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb, As he sighed for the love of a ladye!

--
georgeryo...@gmail.com
Re: TikaEntityProcessor not working?
Which version of Tika do you have? There was a problem introduced somewhere between Tika 0.6 and Tika 0.7 whereby the TikaConfig method config.getParsers() was returning an empty parser list, due to class loader scope issues with Solr running under an application server.

There is a fix in the Tika 0.8 branch, and I note that a 0.8 snapshot of Tika is included in the Solr trunk. I've not tried to get this to work and am not sure what config is needed to make it work. I simply installed Tika 0.6, which can be downloaded from the Apache Tika website.
Indexing stops after exception
I have a list of files in a database that I am indexing (it is a Liferay database and the file lists are attachments). I'm encountering the following error on one of the PDF documents, and it causes indexing to stop (the TikaEntityProcessor throws a SEVERE exception):

https://issues.apache.org/jira/browse/PDFBOX-709

Is it possible to ignore this exception and continue indexing through some kind of Solr configuration? It seems reasonable to do this in my case, as I do not want indexing to stop due to a non-critical error beyond my control. Currently I've modified the TikaEntityProcessor to return null in this case.

BTW, shouldn't the input stream close be in a finally block?
Re: Resume Solr indexing CSV after exception
I modified TikaEntityProcessor to ignore these exceptions. If the TikaEntityProcessor encounters an exception, it will stop indexing, so I had to make two fixes to work around this problem.

From the Solr SVN trunk, edit the file:

~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java

First of all, if a file is not found on disk we want to continue indexing. At the top of nextRow() add:

  File f = new File(context.getResolvedEntityAttribute(URL));
  if (!f.exists()) {
    return null;
  }

Secondly, if the document parser throws an error (for example, certain PDF revisions can cause the PDFBox parser to barf), we trap the exception and continue:

  try {
    tikaParser.parse(is, contentHandler, metadata, new ParseContext());
  } catch (Exception e) {
    return null;
  } finally {
    IOUtils.closeQuietly(is);
  }

We also close the stream with IOUtils.closeQuietly in the finally block, which is not done in the original code.

Build and deploy the extras jar in the solr-instance/lib directory.

see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration
Re: Resume Solr indexing CSV after exception
Cool, I will try that.
Re: [slightly ot] Looking for Lucene/Solr consultant in Germany
Dear Jan,

I just saw your post on the SOLR mailing list. I hope I'm not too late.

First off, I don't exactly match your required qualifications. I do have 9 years at Verity and 1 year at Autonomy in enterprise search, however. I'm in the middle of coming up to speed on SOLR and applying my considerable expertise in general enterprise search to the SOLR/Lucene platform. So, your specific requirements for a Lucene/SOLR expert are not quite met. But I've been in the business of enterprise search for 10 years. Think of it as asking an Oracle expert to look at your MySQL implementation.

My normal rate is USD 200/hour, and I do command that rate more often than not. I'd be interested in taking on the challenge in my spare time, free of charge, just to get my bearings and to see how my consulting skills translate from the closed-source Verity/IDOL world to the open source world. I think this could be beneficial to both of us: I would get some expertise in specific SOLR idiosyncrasies, and you would get the benefit of 10 years of general enterprise search experience.

I've been studying SOLR and Lucene, and even developing my own project using them as a basis. That being said, I expect to make some mistakes as I try to match my existing skill set with what's available in SOLR. Fortunately, I found that with the transition from Verity K2 to Autonomy IDOL, the underlying concepts of full-text search are pretty much universal.

Another fly in the ointment is that I live in the USA (St. Pete Beach, Florida to be exact), so there would be some time zone issues. Also, I don't speak German, which will be a handicap when it comes to analyzing stemming options.

If you can live with those limitations, I'd be happy to help. Let me know if you're interested.

George Everitt
Applied Relevance LLC
[EMAIL PROTECTED]
Tel: +1 (727) 641-4660
Fax: +1 (727) 233-0672

On Aug 8, 2007, at 12:43 PM, Jan Miczaika wrote:

Hello,

we are looking for a Lucene/Solr consultant in Germany. We have set up a Lucene/Solr server (currently live at http://www.hitflip.de). It returns search results, but the results are not really very good. We have been tweaking the parameters a bit, following suggestions from the mailing list, but are unsure of the effects this has.

We are looking for someone to do the following:
- analyse the search patterns on our website
- define a methodology for defining the quality of search
- analyse the data we have available
- specify which data is required in the index
- modify the search patterns used to query the data
- test and evaluate the results

The requirements: deep knowledge of Lucene/Solr, examples of implemented working search engines, theoretical knowledge.

Is anyone interested? Please feel free to circulate this offer.

Thanks in advance,
Jan

--
Geschäftsführer / Managing Director
Hitflip Media Trading GmbH
Gürzenichstr. 7, 50667 Köln
http://www.hitflip.de - new: http://www.hitflip.co.uk
Tel. +49-(0)221-272407-27
Fax. 0221-272407-22 (that's so 1990s)
HRB 59046, Amtsgericht Köln
Geschäftsführer: Andre Alpar, Jan Miczaika, Gerald Schönbucher
MoreLikeThis throwing NPE
I have been trying the MLT query using EmbeddedSolr and SolrJ clients, which is resulting in an NPE. It looks like the problem mentioned here:

https://issues.apache.org/jira/browse/LUCENE-819

If that is the case, how do I fix it? The MLT field has been stored with term vectors. The query I used and the exception are below. In debug mode, Lucene's Term.hashCode() has been receiving some fields in the index and breaking on the id field; I was hoping to see the exception happening because of the MLT fields/field values, but it didn't. Can somebody help me please? I have already spent a whole Saturday night with the trunk code ;-(

SolrQuery q = new SolrQuery();
q.setQuery("id:11");
q.addFacetField("l");
q.setFacet(true);
q.setFacetMinCount(1);
q.setParam("mlt", true);
q.setParam("mlt.fl", "field1");
q.setParam("mlt.minwl", "1");
q.setParam("mlt.mintf", "1");
q.setParam("mlt.mindf", "1");
QueryResponse response = server.query(q);

SEVERE: java.lang.NullPointerException
        at org.apache.lucene.index.Term.hashCode(Term.java:78)
        at org.apache.lucene.search.TermQuery.hashCode(TermQuery.java:175)
        at org.apache.lucene.search.BooleanClause.hashCode(BooleanClause.java:108)
        at java.util.AbstractList.hashCode(AbstractList.java:630)
        at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:445)
        at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:47)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:725)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1241)
        at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:280)
        at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThese(MoreLikeThisHandler.java:310)
        at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:142)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:894)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:106)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:80)
        at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:99)
        at my.padam.solr.SolrJQueryTest.testQuery(SolrJQueryTest.java:134)
        at my.padam.solr.SolrJQueryTest.main(SolrJQueryTest.java:165)

--
Thanks
George L
Re: MoreLikeThis throwing NPE
Looks like the query field has to be stored for MLT. It was failing when I had both the query field and the similarity fields unstored before. MLT is working fine with this configuration:

query_field - indexed and stored
similarity_field - indexed, unstored, and term vectors stored

But why should the query field be stored? It would be nice if http://wiki.apache.org/solr/FieldOptionsByUseCase were updated.

--
Thanks
George L

On 9/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> George L wrote:
> > I have been trying the MLT Query using EmbeddedSolr and SolrJ clients,
> > which is resulting in NPE.
>
> Do you get the same error without solrj?
>
> Can you run the same query with:
>    http://localhost:8987/solr/select?q=id:11&mlt=true
>
> (just to make sure we only need to look at the MLT code)
>
> ryan
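For the record, the working configuration described above corresponds to schema entries along these lines (field names as used in the post, field type assumed):

  <field name="query_field" type="text" indexed="true" stored="true"/>
  <field name="similarity_field" type="text" indexed="true" stored="false" termVectors="true"/>

with the MLT request then using mlt=true&mlt.fl=similarity_field.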
RE: multiple indices
> > I was going through some old emails on this topic. Rafael Rossini
> > figured out how to run multiple indices on a single instance of jetty,
> > but it has to be jetty plus. I guess jetty doesn't allow this? I
> > suppose I can add additional jars and make it work, but I haven't tried
> > that. It'll always be much safer/simpler/less playing around if a
> > feature is available out of the box.
>
> The example that comes with Solr is meant to be a starting
> point for users. It is a relatively functional and
> well-commented example, and its config files are pretty much
> the canonical documentation for solr config, and for many
> people they can modify it for their own production use
> but it is still just an example application.
>
> By the time people want to do expert-level activities with
> Solr (multi-index falls into that category), they should be
> able to configure their own servlet container, whether it be
> jetty plus, tomcat, resin, etc.

Does this mean Solr 1.2 supports MultiSearcher?

-- George
Re: Can you parse the contents of a field to populate other fields?
I'm not sure I fully understand your ultimate goal or Yonik's response. However, in the past I've been able to represent hierarchical data as a simple enumeration of delimited paths:

root
root/region
root/region/north america
root/region/south america

Then, at response time, you can walk the result facet and build a hierarchy with counts that can be put into a tree view (see the sketch at the end of this message). The tree can be any arbitrary depth, and documents can live in any combination of nodes on the tree.

In addition, you can represent any arbitrary name-value pair (attribute/tuple) as a two-level tree. That way, you can put any combination of attributes in the facet and parse them out at results-list time. For example, you might be indexing computer hardware: memory, bus speed and resolution may be valid for some objects but not for others. Just put them in a facet and specify a separator:

memory:1GB
busspeed:133Mhz
voltage:110/220
manufacturer:Shiangtsu

When you do a facet query, you can easily display the categories appropriate to the object, and do facet selections like "show me all green things" and "show me all size 4 things".

Even if that's not your goal, this might help someone else.

George Everitt

On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:

So, I think I have things set up correctly in my schema, but it doesn't appear that any logic is being applied to my Category_# fields - they are being populated with the full string copied from the Category field (facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc. I have several different field types, each with a different regex to match a specific part of the input string. In this example, I'm matching facet1 in the input string facet1::facet2::facet3...facetn. I have copyFields set up for each Category_# field. Anything obviously wrong?

Thanks!
Kristen

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Yonik Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other fields?

On 11/6/07, Kristen Roth <[EMAIL PROTECTED]> wrote:
> Yonik - thanks so much for your help! Just to clarify; where should
> the regex go for each field?

Each field should have a different FieldType (referenced by the "type" XML attribute). Each fieldType can have its own analyzer. You can use a different PatternTokenizer (which specifies a regex) for each analyzer.

-Yonik
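A sketch of the response-time tree building described above, in Java (the facet comes back as flat path-to-count pairs; class and field names are illustrative):

  import java.util.Map;
  import java.util.TreeMap;

  class CategoryNode {
    final Map<String, CategoryNode> children = new TreeMap<String, CategoryNode>();
    long count;
  }

  static CategoryNode buildTree(Map<String, Long> facetCounts) {
    CategoryNode root = new CategoryNode();
    for (Map.Entry<String, Long> e : facetCounts.entrySet()) {
      CategoryNode cur = root;
      // walk/create one node per path segment
      for (String part : e.getKey().split("/")) {
        CategoryNode child = cur.children.get(part);
        if (child == null) {
          child = new CategoryNode();
          cur.children.put(part, child);
        }
        cur = child;
      }
      cur.count = e.getValue(); // count attaches to the full path's node
    }
    return root;
  }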
Heritrix and Solr
I'm looking for a web crawler to use with Solr. The objective is to crawl about a dozen public web sites on a specific topic. After a lot of googling, I came across Heritrix, which seems to be the most robust, well-supported open source crawler out there. Heritrix has an integration with Nutch (NutchWax), but not with Solr.

I'm wondering if anybody can share any experience using Heritrix with Solr. It seems that there are three options for integration:

1. Write a custom Heritrix "Writer" class which submits documents to Solr for indexing.
2. Write an ARC-to-Solr input XML format converter to import the ARC files.
3. Use the filesystem mirror writer and then another program to walk the downloaded files.

Has anybody looked into this, or have any suggestions on an alternative approach? The optimal answer would be "You dummy, just use XXX to crawl your web sites - there's no 'integration' required at all. Can you believe the temerity? What a poltroon."

Yours in Revolution,

George
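For option 1, the Solr side of a custom Writer reduces to posting XML to the update handler; a minimal sketch (URL and field names illustrative; real code must XML-escape the field values):

  import java.io.IOException;
  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  // POST one crawled page to Solr's XML update handler.
  void postToSolr(String pageUrl, String pageText) throws IOException {
    URL url = new URL("http://localhost:8983/solr/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    String xml = "<add><doc>"
        + "<field name=\"id\">" + pageUrl + "</field>"
        + "<field name=\"text\">" + pageText + "</field>"
        + "</doc></add>";
    OutputStream out = conn.getOutputStream();
    out.write(xml.getBytes("UTF-8"));
    out.close();
    if (conn.getResponseCode() != 200) {
      throw new IOException("Solr update failed: HTTP " + conn.getResponseCode());
    }
  }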
Re: Heritrix and Solr
Otis:

There are many reasons I prefer Solr to Nutch:

1. I actually tried to do some of the crawling with Nutch, but found the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general. I have a long background in Verity and Autonomy search, and Solr is a bit closer to them than Nutch.
3. I really like the schema support in Solr.
4. I really really like the facets/parametric search in Solr.
5. I really really really like the REST interface in Solr.
6. Finally, and not to put too fine a point on it, Hadoop frightens the bejeebers out of me. I've skimmed some of the papers and it looks like a lot of study before I will fully understand it. I'm not saying I'm stupid and lazy, but if the map-reduce algorithm fits, I'll wear it. Plus, I'm trying to get a mental handle on Jeff Hawkins' HTM and its application to the real world. It all makes my cerebral cortex itchy.

Thanks for the suggestion, though. I'll probably revisit Nutch again if Heritrix lets me down. I had no luck getting the Nutch crawler Solr patch to work, either. Sadly, I'm the David Lee Roth of Java programmers - I may think that I'm hard-core, but I'm not, really. And my groupies are getting a bit saggy.

BTW - add my voice to the paeans of praise for Lucene in Action. You and Erik did a bang-up job, and I surely appreciate all the feedback you give on this forum, especially over the past few months as I feel my way through Solr and Lucene.

On Nov 22, 2007, at 10:10 PM, Otis Gospodnetic wrote:

The answer to that question, Norberto, would depend on versions.

George: why not just use straight Nutch and forget about Heritrix?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 22, 2007 5:54:32 PM
Subject: Re: Heritrix and Solr

On Thu, 22 Nov 2007 10:41:41 -0500 George Everitt <[EMAIL PROTECTED]> wrote:

> After a lot of googling, I came across Heritrix, which seems to be the
> most robust well supported open source crawler out there. Heritrix has
> an integration with Nutch (NutchWax), but not with Solr. I'm wondering
> if anybody can share any experience using Heritrix with Solr.

out on a limb here... both Nutch and SOLR use Lucene for the actual indexing / searching. Would the indexes generated with Nutch be compatible / readable with SOLR?

_
{Beto|Norberto|Numard} Meijome

"Why do you sit there looking like an envelope without any address on it?" Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Facets - What's a better term for non technical people?
I don't think you have to give the user a label other than the name of the facet field. The beauty of facets is that they are pretty intuitive.

Manufacturer
  Microsoft (140)
  Logitech Inc. (128)
  Belkin (127)
  Rosewill (124)
  APEVIA (Aspire) (119)
  STARTECH (97)

That said, I've seen them called:

Parametric Tag Names
  Facet (200)
  Parameter (122)
  Tag (100)
  Advanced Selection (20)
  Select (15)
  Navigate (13)
  Filter (2)
  Bucket (1)
  Enumeration (1)
  Category (1)
  Topic (1)

Regards,
George

On Dec 11, 2007, at 11:16 PM, Otis Gospodnetic wrote:

Isn't that GROUP BY ColumnX, count(1) type of thing? I'd think "group by" would be a good label.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 9:38:37 PM
Subject: RE: Facets - What's a better term for non technical people?

In SQL terms they are: 'select unique'. Except on only one field.

-----Original Message-----
From: Charles Hornberger [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Facets - What's a better term for non technical people?

FAST calls them "navigators" (which I think is a terrible term - YMMV of course :-)) I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the essential function.

On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > So, has anyone got a good example of the language they might use over,
> > say, a set of radio buttons and fields on a web form, to indicate that
> > selecting one or more of these would return facets. 'Show grouping by'
> > or 'List the sets that the results fall into' or something similar.
>
> Here's what i found some time ago:
> http://www.searchtools.com/info/faceted-metadata.html
>
> It has been quite useful to me.
>
> André Davignon
Re: does solr handle hierarchical facets?
On Dec 13, 2007, at 1:56 AM, Chris Hostetter wrote:

ie, if this is your hierarchy...

Products/
Products/Computers/
Products/Computers/Laptops
Products/Computers/Desktops
Products/Cases
Products/Cases/Laptops
Products/Cases/CellPhones

Then this trick won't work (because Laptops appears twice), but if you have numeric IDs that correspond with each of those categories (so that the two instances of Laptops are unique)...

1/
1/2/
1/2/3
1/2/4
1/5/
1/5/6
1/5/7

Why not just use the whole path as the unique identifying token for a given node on the hierarchy? That way, you don't need to map nodes to unique numbers; just use a prefix query:

taxonomy:Products/Computers/Laptops*

or

taxonomy:Products/Cases/Laptops*

Sorry - that may be bogus query syntax, but you get the idea. Products/Computers/Laptops* and Products/Cases/Laptops* are two unique identifiers. You just need to make sure they are tokenized properly - which is beyond my current off-the-cuff expertise.

At least that is the way I've been doing it with IDOL lately. I dearly hope I can do the same in Solr when the time comes.

I have a whole mess of Java code which parses arbitrary path-separated values out into real tree structures. I think it would be a useful addition to Solr, or maybe Solrj. It's been knocking around my hard drives for the better part of a decade. If I get enough interest, I'll clean it up and figure out how to offer it up as a part of the code base. I'm pretty naive when it comes to FLOSS, so any authoritative, non-condescending hints on how to go about this would be greatly appreciated.

Regards,
George
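The same path trick composes with the facet.prefix parameter for pulling back only one subtree of counts at a time, e.g. (field name as in the post):

  /select?q=*:*&facet=true&facet.field=taxonomy&facet.prefix=Products/Computers/

so only facet values starting with Products/Computers/ come back.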
Re: Newbie question about Solr use in web applications
On Dec 14, 2007, at 9:55 AM, Stuart Sierra wrote:

On Dec 13, 2007 9:20 PM, solruser2 <[EMAIL PROTECTED]> wrote:
> Let's say I have a database containing people, groups, and projects (these
> all have different fields). I want to index these different kinds of
> objects with a view to eventually present search results from all three
> types mashed together and sorted by relevance. Using separate indices (and
> thus separate Solr processes) would make mashing the results together very
> difficult, so I'm guessing I just add the separate fields to the schema
> along with an 'object_type' field or equivalent?

That is the approach I would take. Having three separate indices would make your searches slower and more complicated.

I agree.

> Secondly, should I just store the database row id for each object (while
> still indexing the field contents) so a query on the index returns a list
> of id's that I can then fetch from the database?

It depends. :) If you want highlighted snippets in your search results, then you have to store the field contents in the index. In some situations you can make your search pages faster by storing all the critical fields (the ones you want to appear in search results) in the index, so that you don't have to fetch a dozen records from the database just to display a list of search results. On the other hand, if your database records are small and you don't need highlighting, it may be faster to only store database IDs in the index.

I agree with this also. However, I've never seen a case where a separate database query to retrieve metadata stored in a database will be faster than just storing the necessary fields directly in the search index and retrieving them with the search results. I've found it helpful to think of the full-text index as a very simple, very fast, very flat database engine. You may not be able to do outer joins and correlated subqueries on it, but you can get a list of documents and titles really fast.

Hope this sheds some light,
-Stuart Sierra
AltLaw.org
Solr and WebSphere 6.1
Hi folks,

Has anyone managed to get Solr 1.2 to run under WebSphere 6.1? If so, can you share your experience: what configuration, settings, etc. you had to do?

Someone asked this question earlier this month, but I don't see that anyone followed up, so I'm asking again since I have this need too.

Thanks.

-- George
Inverted Search Engine
Verity had a function called "profiler" which was essentially an inverted search engine. Instead of evaluating a single query at a time against a large corpus of documents, the profiler evaluated a single document at a time against a large number of queries. This kind of functionality is used for alert notifications, where a large number of users can each have their own queries, and as documents are indexed into the system the queries are matched and some kind of notification is made to the owner of the query (e-mail, SMS, etc). Think "Google Alerts".

I'm wondering if anybody has implemented this kind of functionality with Solr, and if so, what strategy did you use? If you haven't implemented something like that, I would still be interested in ideas on how to do it with Solr, or how to perhaps use Lucene to patch that functionality into Solr. I have my own thoughts, but they are still a bit primitive, and I'd like to throw it over the transom and see who bites...

George Everitt
Applied Relevance LLC
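One way to build this on the Lucene side is the contrib MemoryIndex: index the single incoming document in memory and run every stored query against it. A minimal sketch (the savedQueries map and the notification step are hypothetical):

  import java.util.Map;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.memory.MemoryIndex;
  import org.apache.lucene.search.Query;

  // Match one incoming document against many saved queries.
  void matchAgainstProfiles(String docText, Map<String, Query> savedQueries) {
    MemoryIndex index = new MemoryIndex();
    index.addField("body", docText, new StandardAnalyzer());
    for (Map.Entry<String, Query> e : savedQueries.entrySet()) {
      float score = index.search(e.getValue());
      if (score > 0.0f) {
        // a hit: queue an alert for this subscriber (delivery not shown)
        System.out.println("notify " + e.getKey() + " (score " + score + ")");
      }
    }
  }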
Re: Inverted Search Engine
Wow, that's spooky. Thanks for the heads up - looks like a good list to subscribe to as well.

George Everitt
Applied Relevance LLC
[EMAIL PROTECTED]
Tel: +1 (727) 641-4660
Fax: +1 (727) 233-0672
Skype: geverit4
AIM: [EMAIL PROTECTED]

On Jan 23, 2008, at 2:30 PM, Erick Erickson wrote:

As chance would have it, this was just discussed over on the Lucene users' list. See the thread "Inverted search / Search on profile net".

Best
Erick

On Jan 23, 2008 1:38 PM, George Everitt <[EMAIL PROTECTED]> wrote:

Verity had a function called "profiler" which was essentially an inverted search engine. Instead of evaluating a single query at a time against a large corpus of documents, the profiler evaluated a single document at a time against a large number of queries. This kind of functionality is used for alert notifications, where a large number of users can have their own queries and as documents are indexed into the system, the queries are matched and some kind of notification is made to the owner of the query (e-mail, SMS, etc). Think "Google Alerts".

I'm wondering if anybody has implemented this kind of functionality with Solr, and if so what strategy did you use? If you haven't implemented something like that I would still be interested in ideas on how to do it with Solr, or how to perhaps use Lucene to patch that functionality into Solr? I have my own thoughts, but they are still a bit primitive, and I'd like to throw it over the transom and see who bites...

George Everitt
Applied Relevance LLC
return only sorted Field, but with a different Field Name
Hi all,

Sorry if this is an easy one, but apparently my research isn't working for now. Here is what I want to do.

Currently I index the contents of my database using Solr. After a search result is retrieved from Solr, I extract only the key fields that I need (mostly the unique ID and score) and then match them with the permissions in the database before I present them to a user.

I have a tonne of dynamic fields in the index, and sometimes I want to sort by them. That is easy enough. For example, say I want to sort by the field '162_sortable_s'; then I add a parameter like 'sort=162_sortable_s desc'. I need to change the settings so that when the result set is returned from Solr, it takes the values of '162_sortable_s' and inserts them into a separate field called 'SortedField', so that the returned doc looks like this:

0 0 *,score on 162_sortable_s desc on 0 chouchin standard standard 10 2.2
2.075873 Brecher, Henry 4077f1ed-6885-4170-badc-c72816d5b473
2.075873 Charles, Tom 951ecbc9-0cd6-4ba5-b32f-e5d6bc42ce29
2.5168622 Zeke 530760aa-bf25-4f74-ab8b-caca744b9362

How or where do I change that setting? Do I have to rewrite some part of the RequestHandler?

Thanks,
George
filter query: comparing values between fields
Hi all,

I am using the DisMaxRequestHandler. Is there any way to use the fq param to compare two fields? For example, each of the documents in the index has two fields which are slightly related to each other within the context of the document; say these fields are blah1 and blah2. When I do a search, I want the fq param in solrconfig.xml to look like this:

blah1:blah2

Of course the above won't work right now, but is there any way to specify that blah2 is actually a field and not a value?

Thanks,
George
Re: filter query: comparing values between fields
You know, that's ultimately what I have done. My thinking is that doing field comparisons could be too intensive an operation anyway.

Thanks,
George

On 4/11/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 4/11/07, George Abraham <[EMAIL PROTECTED]> wrote:
> > I am using the DisMaxRequestHandler. Is there any way to use the fq
> > param to compare two fields?
>
> Find all docs where fielda=fieldb is not currently supported (in Lucene
> or Solr), but it could be in the future if it solves a common enough
> problem. Do you have many such fields you compare like that?
>
> If you are only comparing fielda and fieldb, then you could index another
> field fielda_is_fieldb:true for documents when the values match.
>
> -Yonik
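A sketch of the index-time side of Yonik's suggestion, in SolrJ (field names as in the thread; blah1 and blah2 are the values being indexed):

  import org.apache.solr.common.SolrInputDocument;

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("blah1", blah1);
  doc.addField("blah2", blah2);
  // precompute the comparison once, at index time
  doc.addField("blah1_is_blah2", blah1.equals(blah2));
  // ...add the doc as usual, then filter searches with fq=blah1_is_blah2:true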
Solr approaches to re-indexing large document corpus
We are looking for some recommendations around systematically re-indexing in Solr an ever-growing corpus of documents (tens of millions now, hundreds of millions in less than a year) without taking the currently running index down. Re-indexing is needed on a periodic basis because:

- New features are introduced around searching the existing corpus that require additional schema fields which we can't always anticipate in advance.
- The corpus is indexed across multiple shards. When it grows past a certain threshold, we need to create more shards and re-balance documents evenly across all of them (which SolrCloud does not yet seem to support).

The current index receives very frequent updates and additions, which need to be available for search within minutes. Therefore, approaches where the corpus is re-indexed in batch offline don't really work, as by the time the batch is finished, new documents will have been made available.

The approaches we are looking into at the moment are:

- Create a new cluster of shards and batch re-index there while the old cluster is still available for searching. New documents that are not part of the re-indexed batch are sent to both the old cluster and the new cluster. When ready to switch, point the load balancer to the new cluster.
- Use CoreAdmin: spawn a new core per shard and send the re-indexed batch to the new cores. New documents that are not part of the re-indexed batch are sent to both the old cores and the new cores. When ready to switch, use CoreAdmin to dynamically swap cores.

We'd appreciate it if folks could either confirm or poke holes in either or all of these approaches. Is one more appropriate than the other? Or are we completely off? Thank you in advance.
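For the CoreAdmin approach, the switch itself is a single call per shard; a sketch (host and core names hypothetical):

  http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1&other=shard1_reindex

SWAP atomically exchanges the two names, so requests to /solr/shard1 start hitting the re-indexed core while the old one stays available under the other name in case a rollback is needed.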