Custom sort (score + custom value)

2008-11-02 Thread George
Hi,

I want to implement a custom sort in Solr based on a combination of
relevance (which Solr already gives me as the score) and a custom value I've
calculated previously for each document. I see two options:

1. Use a function query (I'm using a DisMaxRequestHandler).
2. Create a component that sets a SortSpec with a sort that has a custom
ComparatorSource (similar to QueryElevationComponent).

The first option has a problem: while the relevance value changes for every
query, my custom value is constant for each doc. This implies that queries
whose documents have high relevance are less affected by my custom value,
while queries with low relevance are affected a lot by it. Can the effect be
made proportional with a function query (i.e. so that docs with low
relevance are less affected by my custom value)?

The second option has a problem: the Solr score isn't normalized, and I need
it normalized in order to apply my custom value in the sortValue function of
ScoreDocComparator.

What do you think? What's the best option in this case? Or is there another option?

Thank you in advance,

George


Re: Custom sort (score + custom value)

2008-11-03 Thread George
Ok Yonik, thank you.

I've tried to execute the following query: "{!boost b=log(myrank)
defType=dismax}q" and it works great.

Do you know if I can do the same (combine a DisjunctionMaxQuery with a
BoostedQuery) in solrconfig.xml?

George

On Sun, Nov 2, 2008 at 3:01 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Sun, Nov 2, 2008 at 5:09 AM, George <[EMAIL PROTECTED]> wrote:
> > I want to implement a custom sort in Solr based on a combination of
> > relevance (which Solr already gives me as the score) and a custom value
> > I've calculated previously for each document. I see two options:
> >
> > 1. Use a function query (I'm using a DisMaxRequestHandler).
> > 2. Create a component that sets a SortSpec with a sort that has a custom
> > ComparatorSource (similar to QueryElevationComponent).
> >
> > The first option has a problem: while the relevance value changes for
> > every query, my custom value is constant for each doc.
>
> Yes, that can be an issue when adding unrelated scores.
> Multiplying them might give you better results:
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
>
> -Yonik
>


Re: Custom sort (score + custom value)

2008-11-04 Thread George
Todd: Yes, I looked into these arguments before I found the problem I
described in the first email.

Yonik: It's exactly what I was looking for.

George

On Mon, Nov 3, 2008 at 7:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Mon, Nov 3, 2008 at 12:37 PM, George <[EMAIL PROTECTED]> wrote:
> > Ok Yonik, thank you.
> >
> > I've tried to execute the following query: "{!boost b=log(myrank)
> > defType=dismax}q" and it works great.
> >
> > Do you know if I can do the same (combine a DisjunctionMaxQuery with a
> > BoostedQuery) in solrconfig.xml?
>
> Do you mean set it as a default for a handler in solrconfig.xml?  That
> should work.
> You could set a default of q={!boost b=log(myrank) defType=dismax v=$uq}
> then all the client would have to pass in is uq (the user query)
>
> -Yonik
>
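
For reference, a minimal solrconfig.xml sketch of what Yonik describes
above; the handler name and the myrank field are illustrative assumptions:

<!-- solrconfig.xml: wrap every user query (uq) in a multiplicative boost -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">{!boost b=log(myrank) defType=dismax v=$uq}</str>
  </lst>
</requestHandler>

Clients then pass only uq=<the user query>, and the boost is applied
server-side.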


Index an Oracle DATE column with a Solr DateField

2009-03-06 Thread George
Hi,

I have an Oracle DATE column that I want to index with a Solr DateField; the
column is mapped in my schema.xml to a field whose type uses solr.DateField.

I use DataImportHandler. When I do a search, this field is returned one day
early. Oracle: 2006-12-10. Solr: 2006-12-09T23:00:00Z

If I index it as a String, it's indexed as expected (with the same "string
date" as I see it in Oracle).

Does anyone know where the problem is?
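
For what it's worth, the symptom (exactly one hour of offset, rolling the
date back to 23:00:00Z) looks like a JVM time-zone conversion: the JDBC
driver reads the DATE in the JVM's default time zone and Solr then renders
it in UTC. A hedged workaround, assuming the servlet container's JVM runs in
a UTC+1 zone, is to start it in UTC so both sides agree:

java -Duser.timezone=UTC -jar start.jar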

Thanks in advance


DataImportHandler Full Import completed successfully after SQLException

2009-06-24 Thread George
Hi,
Yesterday I ran into the following exception in my indexing process while
trying to index from an Oracle database:

2009-06-23 14:57:29,205 WARN
 [org.apache.solr.handler.dataimport.JdbcDataSource] Error reading data
java.sql.SQLException: ORA-01555: snapshot too old: rollback segment number
1 with name "_SYSSMU1$" too small

at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:110)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:171)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1000)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
at org.jboss.resource.adapter.jdbc.WrappedResultSet.next(WrappedResultSet.java:1184)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:223)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:258)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:73)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:231)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
2009-06-23 14:57:29,206 INFO
 [org.apache.solr.handler.dataimport.DocBuilder] Full Import completed
successfully
2009-06-23 14:57:29,206 INFO  [org.apache.solr.update.UpdateHandler] start
commit(optimize=true,waitFlush=false,waitSearcher=true)

As you can see, "Full Import completed successfully" after indexing only a
part (about 7) of all expected documents (about 15). I don't know if it is a
bug or not, but it's certainly not the behaviour I expect in this situation.
It should have rolled back, shouldn't it?

Reading the Solr code, I can see that line 314 of JdbcDataSource.java throws
a DataImportHandlerException with a SEVERE errCode, so I can't understand
why my indexing process finishes "successfully".

I'm working with Solr trunk version (rev. 785397) and no custom properties
(i.e. onError value is default 'abort') in DataImportHandler.
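
For illustration, one failure mode that would produce exactly this behaviour
is a catch-log-continue pattern: if the result-set iterator's hasNext()
catches the SQLException, logs it, and returns false, the caller sees a
normal end-of-data and the import "completes". A schematic sketch only, not
the actual Solr source:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Logger;

// Schematic of the bug pattern only -- not the real JdbcDataSource code.
class ResultSetIteratorSketch {
    private static final Logger LOG = Logger.getLogger("JdbcDataSource");
    private final ResultSet resultSet;

    ResultSetIteratorSketch(ResultSet resultSet) {
        this.resultSet = resultSet;
    }

    boolean hasNext() {
        try {
            return resultSet.next();                 // ORA-01555 surfaces here
        } catch (SQLException e) {
            LOG.warning("Error reading data: " + e); // logged as a warning...
            return false;  // ...then swallowed: caller sees a clean end-of-data
        }
    }
}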

George


Re: DataImportHandler Full Import completed successfully after SQLException

2009-06-24 Thread George
Noble, thank you for fixing this issue! :)

2009/6/25 Noble Paul നോബിള്‍ नोब्ळ् 

> OK, this should be a bug with JdbcDataSource.
>
> Look at the line
>
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
>
> It is eating up the exception: it logs an error and goes back. I shall
> raise an issue.
>
> thanks
>
>
> On Wed, Jun 24, 2009 at 11:12 PM, George wrote:
> > Hi,
> > Yesterday I ran into the following exception in my indexing process
> > while trying to index from an Oracle database:
> >
> > 2009-06-23 14:57:29,205 WARN
> >  [org.apache.solr.handler.dataimport.JdbcDataSource] Error reading data
> > java.sql.SQLException: ORA-01555: snapshot too old: rollback segment
> > number 1 with name "_SYSSMU1$" too small
> >
> > at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
> > at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:110)
> > at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:171)
> > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:455)
> > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:413)
> > at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:1030)
> > at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
> > at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1000)
> > at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
> > at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
> > at org.jboss.resource.adapter.jdbc.WrappedResultSet.next(WrappedResultSet.java:1184)
> > at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:326)
> > at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:223)
> > at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:258)
> > at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:73)
> > at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> > at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:231)
> > at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
> > at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
> > at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
> > at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
> > at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
> > at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
> > 2009-06-23 14:57:29,206 INFO
> >  [org.apache.solr.handler.dataimport.DocBuilder] Full Import completed
> > successfully
> > 2009-06-23 14:57:29,206 INFO  [org.apache.solr.update.UpdateHandler] start
> > commit(optimize=true,waitFlush=false,waitSearcher=true)
> >
> > As you can see, "Full Import completed successfully" after indexing only
> > a part (about 7) of all expected documents (about 15). I don't know if
> > it is a bug or not, but it's certainly not the behaviour I expect in
> > this situation. It should have rolled back, shouldn't it?
> >
> > Reading the Solr code, I can see that line 314 of JdbcDataSource.java
> > throws a DataImportHandlerException with a SEVERE errCode, so I can't
> > understand why my indexing process finishes "successfully".
> >
> > I'm working with Solr trunk version (rev. 785397) and no custom
> > properties (i.e. onError value is default 'abort') in DataImportHandler.
> >
> > George
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


JSON facet not working with dates

2017-10-25 Thread George Petasis

Hi all,

I am using Solr 6.5.0, and I want to do pivot faceting including a date
field. My simple facet.json is:


{
  "dates": {
    "type": "range",
    "field": "observationStart.TimeOP",
    "start": "3000-01-01T00:00:00Z",
    "end":   "3000-01-02T00:00:00Z",
    "gap":   "%2B15MINUTE",
    "facet": {
      "x": "sum(trafficCnt)"
    }
  }
}

What I get back is an error though:

error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Unable to range facet on 
field:observationStart.TimeOP{type=date_range,properties=indexed,stored,omitTermFreqAndPositions,useDocValuesAsStored}"


On the other hand, if I use the old interface, it seems to work:

"facet":"on",
"facet.range.start":"3000-01-01T00:00:00Z",
"facet.range.end":"3000-01-01T00:00:00Z+1DAY"
"facet.range.gap":"+15MINUTE"

I get:

"facet_ranges":{
  "observationStart.TimeOP":{
    "counts":[
  "3000-01-01T00:00:00Z",258,
  "3000-01-01T00:15:00Z",261,
  "3000-01-01T00:30:00Z",258,
  "3000-01-01T00:45:00Z",254,
  ...


My date fields are of type solr.DateRangeField.

Searching for the error I get, I found this source file:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/facet/FacetRange.java

where line 180 has "if (ft instanceof TrieField || ft.isPointField())".

Is this related to my problem? Does the new JSON facet interface not work
with date ranges?
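
That check is consistent with the error: in this version the JSON range
facet path only accepts trie and point fields, so a solr.DateRangeField
cannot be range-faceted there, while the classic facet.range code path still
handles it. A sketch of a workaround, assuming the values are re-indexed (or
copied) into a plain date type such as solr.DatePointField under a
hypothetical field name observationStart.TimeOP_dt; note the gap can be
written literally, since a JSON body needs no URL escaping:

{
  "dates": {
    "type": "range",
    "field": "observationStart.TimeOP_dt",
    "start": "3000-01-01T00:00:00Z",
    "end":   "3000-01-02T00:00:00Z",
    "gap":   "+15MINUTES",
    "facet": {
      "x": "sum(trafficCnt)"
    }
  }
}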


Regards,

George



Solr Grouping - sorting groups based on the sum of the scores of the documents within each group

2011-12-05 Thread George Stathis
Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing)
sorts groups "by the score of the top document within each group". E.g.

[...]
"groups":[{
"groupValue":"81cb63020d0339adb019a924b2a9e0c2",
"doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
{
  "id":"7481df771afe39fab368ce19dfeeb528",
  [...],
  "score":4.729042},
{
  "id":"c879e95b5f16343dad8b1248133727c2",
  [...],
  "score":4.6635237},
{
  "id":"485b9aec90fd3ef381f013c51ab6a4df",
  [...],
  "score":4.347174}]
}},
[...]

Is there an out-of-the-box way to instead sort groups by the sum of the
scores of the documents within each group? E.g.

[...]
"groups":[{
"groupValue":"81cb63020d0339adb019a924b2a9e0c2",
"doclist":{"numFound":9,"start":0,*"scoreSum":13.739738*,"docs":[
{
  "id":"7481df771afe39fab368ce19dfeeb528",
  [...],
  "score":4.729042},
{
  "id":"c879e95b5f16343dad8b1248133727c2",
  [...],
  "score":4.6635237},
{
  "id":"485b9aec90fd3ef381f013c51ab6a4df",
  [...],
  "score":4.347174}]
}},
[...]

With the release of sorting by Function Query (
https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
should be a way to use the sum() function (
http://wiki.apache.org/solr/FunctionQuery). But it doesn't quite get there,
since "score" is not an actual field of the documents.

I feel like I'm close but I'm missing some obvious piece. I'm using Solr
3.5.
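
For reference, a sketch of what the stock parameters do allow (field name
illustrative): groups are ordered by their top document via sort, and
documents inside each group via group.sort; neither aggregates scores across
a group.

q=foo
&group=true
&group.field=dedup_hash
&sort=score desc          (orders groups by each group's top document)
&group.sort=score desc    (orders documents within each group)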

Thank you in advance for your time.


Alternate score-based sorting for Solr Grouping

2011-12-06 Thread George Stathis
My previous subject line was not very scannable. Apologies for the re-post;
I'm just hoping to get more eyeballs and hopefully some insights. Thank
you in advance for your time. See below.

-GS

On Mon, Dec 5, 2011 at 1:37 PM, George Stathis  wrote:

> Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing)
> sorts groups "by the score of the top document within each group". E.g.
>
> [...]
> "groups":[{
> "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
> "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
> {
>   "id":"7481df771afe39fab368ce19dfeeb528",
>   [...],
>   "score":4.729042},
> {
>   "id":"c879e95b5f16343dad8b1248133727c2",
>   [...],
>   "score":4.6635237},
> {
>   "id":"485b9aec90fd3ef381f013c51ab6a4df",
>   [...],
>   "score":4.347174}]
> }},
> [...]
>
> Is there an out-of-the-box way to instead sort groups by the sum of the
> scores of the documents within each group? E.g.
>
> [...]
> "groups":[{
> "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
> "doclist":{"numFound":9,"start":0,*"scoreSum":13.739738*,"docs":[
> {
>   "id":"7481df771afe39fab368ce19dfeeb528",
>   [...],
>   "score":4.729042},
> {
>   "id":"c879e95b5f16343dad8b1248133727c2",
>   [...],
>   "score":4.6635237},
> {
>   "id":"485b9aec90fd3ef381f013c51ab6a4df",
>   [...],
>   "score":4.347174}]
> }},
> [...]
>
> With the release of sorting by Function Query (
> https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
> should be a way to use the sum() function (
> http://wiki.apache.org/solr/FunctionQuery). But it doesn't quite get
> there, since "score" is not an actual field of the documents.
>
> I feel like I'm close but I'm missing some obvious piece. I'm using Solr
> 3.5.
>
> Thank you in advance for your time.
>


anybody using solr with Cassandra?

2010-08-29 Thread Siju George
Hi,

Is anybody using Solr with Cassandra?
Are there any Gotcha's?

Thanks

--Siju


Re: anybody using solr with Cassandra?

2010-08-30 Thread Siju George
Thanks a million Nick,

We are currently debating whether we should use cassandra or membase or
hbase with solr.
Do you have anything to contribute as advice to us?

Thanks again :-)

--Siju

On Tue, Aug 31, 2010 at 5:15 AM, nickdos  wrote:

>
> Yes, we are using Cassandra. There is nothing much to say really, it just
> works. Note we are generating SOLR indexes using Java & SolrJ (embedded
> mode) and reading data out of Cassandra with Java. Index generation is fast.
>


Re: anybody using solr with Cassandra?

2010-08-30 Thread Siju George
We will be using Solr for indexing and Cassandra/Membase/HBase instead of a
database. That is the idea for now, unless somebody suggests a better
solution :-)

thanks

--Siju

On Tue, Aug 31, 2010 at 11:39 AM, Amit Nithian  wrote:

> I am curious about this too.. are you talking about using HBase/Cassandra
> as
> an aux store of large data or using Cassandra to store the actual lucene
> index (as in LuCandra)?
>
> On Mon, Aug 30, 2010 at 11:06 PM, Siju George 
> wrote:
>
> > Thanks a million Nick,
> >
> > We are currently debating whether we should use cassandra or membase or
> > hbase with solr.
> > Do you have anything to contribute as advice to us?
> >
> > Thanks again :-)
> >
> > --Siju
> >
> > On Tue, Aug 31, 2010 at 5:15 AM, nickdos 
> wrote:
> >
> > >
> > > Yes, we are using Cassandra. There is nothing much to say really, it
> > > just works. Note we are generating SOLR indexes using Java & SolrJ
> > > (embedded mode) and reading data out of Cassandra with Java. Index
> > > generation is fast.
> > >
> >
>


SOLR geospatial

2010-12-10 Thread George Anthony
I've been looking at some of the docs on support for geospatial search.

I see this functionality is mostly scheduled for the upcoming 4.0 release
(with some playing around with backported code).

I note the support for the bounding box filter, but will "bounding box" be
one of the supported *data* types for use with this filter?  For example, if
my lat/long data describes the "footprint" of a map, I'm curious whether
that type of coordinate data can be used by the bounding box filter (or in
any other way for similar limiting/filtering capability). I see it can work
with point-type data, but I'm curious about functionality with
bounding-box-type data (in contrast to simple point lat/long data).
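
For what it's worth, a sketch of how the 4.0-era spatial work handles shape
(not just point) data; the field and type names here are illustrative
assumptions, and the box is given in WKT ENVELOPE(minX, maxX, maxY, minY)
form:

<!-- schema.xml: a field type that can index rectangles as well as points -->
<fieldType name="bbox_shape" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009"/>
<field name="footprint" type="bbox_shape" indexed="true" stored="true"/>

A map footprint is then indexed as ENVELOPE(-93.9, -93.5, 45.2, 44.9) and
queried with an intersection filter:

fq=footprint:"Intersects(ENVELOPE(-94.0, -93.0, 46.0, 44.0))"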

Thanks,
George


RE: Near Real Time

2009-10-21 Thread George Aroush
> >   Further without the NRT features present what's the closest I can 
> > expect to real time for the typical use case (obviously this will vary
> > but the average deploy). One hour? One Minute? It seems like there are 
> > a few hacks to get somewhat close. Thanks so much.
> 
> Depends a lot on the nature of the requests and the size of the index,
> but one minute is often doable.
> On a large index that facets on many fields per request, one minute is
> probably still out of reach.

With no facets, what index size is considered, in general, out of reach for
NRT?  Is a 9GB index with 7 million records out of reach?  How about 3GB
with 3 million records?  3GB with 800K records?  This is for a 1-minute NRT
setting.

Thanks.

-- George



Solr* != solr*

2008-07-01 Thread George Aroush
Hi Folks,

Can someone tell me what I might have set up wrong?  After indexing my data,
I can search just fine on, let's say, "sol*", but on "Sol*" (note upper case
'S' vs. lower case 's') I get 0 hits.

Here is my customized schema.xml setting:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Btw, "Solr", "solr", "sOlr", etc. works.  It's a problem with wild cards.

Thanks in advance.

-- George



schema.xml for CJK, German, French, etc.

2008-07-02 Thread George Aroush
Hi Folks,

Has anyone created a schema.xml for languages other than English?  I'd like
to see working examples, mainly for CJK, German and French.  If you have
any, can you share them?

To get me started, I created the following for German:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>

Will those filters work on German text?

Thanks.

-- George



RE: schema.xml for CJK, German, French, etc.

2008-07-02 Thread George Aroush
Thanks Erik!

Trouble is, I don't know those languages well enough to conclude that my
setup is correct, especially for CJK.

It's less problematic for European languages, but then again, should I be
using those English filters with the German SnowballPorterFilterFactory?
That is, will WordDelimiterFilterFactory work with a German filter?  Etc.

It would be nice if folks shared their settings (generic ones for each
language); then we could add them to the Solr Wiki.

-- George

> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, July 02, 2008 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: schema.xml for CJK, German, French, etc.
> 
> 
> On Jul 2, 2008, at 9:16 PM, George Aroush wrote:
> > Has anyone created a schema.xml for languages other than English?
> 
> Indeed.
> 
> > I'd like to
> > see working examples, mainly for CJK, German and French.
> > If you have any, can you share them?
> >
> > To get me started, I created the following for German:
> >
> > <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
> >   </analyzer>
> > </fieldType>
> >
> > Will those filters work on German text?
> 
> 
> One tip that will help is visiting 
> http://localhost:8983/solr/admin/analysis.jsp
> and testing it out to see that you're getting the tokenization 
> that you desire on some sample text.  Solr's analysis 
> introspection is quite nice and easy to tinker with.
> 
> Removing stop words before lower casing won't quite work 
> though, as StopFilter is case-sensitive with all stop words 
> generally lowercased, but other than relocating the 
> StopFilterFactory in that chain it seems reasonable.
> 
> As always, though, it depends on what you want to do with 
> these languages to offer more concrete recommendations.
> 
>   Erik
> 



RE: Solr* != solr*

2008-08-01 Thread George Aroush
Hi Erik and all,

I'm still trying to solve this issue, and I'd like to know how others might
have solved it in their clients.  I can't modify Solr / Lucene code and I'm
using Solr 1.2.

What I have done is simple.  Given a user's input, I break it into words and
then analyze each word.  Any word that contains a wildcard (* or ?) I
lowercase.

While the logic is simple, I'm not comfortable with it because the
word-breaker isn't based on the analyzer in use by Lucene.  In my case, I
can't tell which analyzer is used.

So my question is: did you run into this problem, and if so, how did you
work around it?  That is, is breaking on generic whitespace (independent of
the analyzer in use) "good enough"?
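
For illustration, a minimal sketch of the client-side approach described
above; it splits naively on whitespace (an assumption -- this is not the
analyzer Lucene uses) and, because only wildcard terms are lowercased,
AND/OR/NOT operators pass through untouched, which addresses Erik's caveat
below:

// Sketch only: lowercase just the wildcard terms of a raw query string.
public class WildcardLowercaser {

    public static String lowercaseWildcards(String userQuery) {
        StringBuilder out = new StringBuilder();
        for (String word : userQuery.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            // Only terms with * or ? bypass analysis, so only they need help.
            if (word.indexOf('*') >= 0 || word.indexOf('?') >= 0) {
                out.append(word.toLowerCase());
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: sol* AND title:lucene?
        System.out.println(lowercaseWildcards("Sol* AND title:Lucene?"));
    }
}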

Thanks.

-- George

> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 01, 2008 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr* != solr*
> 
> George - wildcard expressions, in Lucene/Solr's QueryParser, 
> are not analyzed.  There is one trick in the API that isn't 
> yet wired to  
> Solr's configuration, and that is setLowercaseExpandedTerms(true).   
> This would solve the Sol* issue because when indexed all 
> terms for the "text" field are lowercased during analysis.
> 
> A functional alternative, of course, is to have the client 
> lowercase the query expression before requesting to Solr 
> (careful, though - consider AND/OR/NOT).
> 
>   Erik
> 
> 
> 
> On Jul 1, 2008, at 8:14 PM, George Aroush wrote:
> 
> > Hi Folks,
> >
> > Can someone tell me what I might have set up wrong?  After indexing my
> > data, I can search just fine on, let's say, "sol*", but on "Sol*"
> > (note upper case 'S' vs. lower case 's') I get 0 hits.
> >
> > Here is my customize schema.xml setting:
> >
> > > positionIncrementGap="100">
> >  
> >
> >
> > > words="stopwords.txt"/>
> > > generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0"/>
> >
> > > protected="protwords.txt"/>
> >
> >  
> >      
> >
> >
> > > words="stopwords.txt"/>
> > > generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0"/>
> >
> > > protected="protwords.txt"/>
> >
> >  
> >
> >
> > Btw, "Solr", "solr", "sOlr", etc. works.  It's a problem with wild 
> > cards.
> >
> > Thanks in advance.
> >
> > -- George
> 



Re: Can Solr be used to search public websites(Newbie).

2008-09-17 Thread George Everitt

Dear Con,

Searching the entire Internet is a non-trivial computer science  
problem.  It's kind of like asking a brain surgeon the best way to  
remove a tumor.  The answer should be "First, spend 16 years becoming  
a neurosurgeon".  My point is, there is a whole lot you need to know  
beyond "is Solr the correct tool for the job".


However, the short answer is that Nutch is probably better suited for  
what you want to do, when you get the funding, hardware and expertise  
to do it.


I'm not mocking or denigrating you in any way, but I think you need to  
do a bit more basic research into how search engines work.


I found this very readable and accurate site the other day:

http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC

Regards,
George


On Sep 17, 2008, at 8:39 AM, convoyer wrote:



Hi all.
I am quite new to Solr. I am just checking whether this tool suits my
application.
I am developing a search application that searches all publicly available
websites and also some selective websites. Can I use Solr for this purpose?

If yes, how can I get started?
All the tutorials point to loading data from an XML file and searching
those values :-( Instead, how can I give the URL of a website and search
the contents of that site (just like in Nutch)?

Expecting a reply,
thanks in advance
con
con







Commit frequency

2009-01-16 Thread George Aroush
Hi Folks,

I'm trying to collect some data -- if you can share it -- about the commit
frequency you have set on your index and what rate you found acceptable.
This is for a non-master/slave setup.

In my case, in a test environment, I have experimented with a 1-minute
interval (every minute I commit anywhere between 0 and 10 new documents and
0 and 10 updated documents).  While the commit is ongoing, I'm also
searching the index.

For this experiment, my index size is about 3.5 GB, and I have about 1.2
million documents.  The experiment was done on a Windows 2003 server with 4
GB RAM and two 3 GHz Xeon CPUs.

So, if you can share your setup, at least the commit frequency, I would
appreciate it.

What I'm trying to get out of this is the shortest commit interval (i.e.
the highest commit frequency) that Solr can handle.

Regards,

-- George



some hyphenated words not found

2010-03-14 Thread george young
I have a nearly generic out-of-the-box installation of solr.  When I
search on a short text document containing a few hyphenated words, I
get hits on *some* of the words, but not all.  I'm quite puzzled as to
why.  I've checked that the text is only plain ascii.  How can I find
out what's wrong?  In the file below, solr finds life-long, but not
love-lorn.

Here's the file:
This is a small sample document just to insure that a type *.doc can
be accessed by X Documentation.
It is sung to the moon by a love-lorn loon,
who fled from the mocking throng O!
It’s the song of a merryman, moping mum,
whose soul was sad and whose glance was glum. Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!
Who sipped no sup, and who craved no crumb,
As he sighed for the love of a ladye.
Heighdy! heighdy! Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!

I have a song to sing, O!
Sing me your song, O!

It is sung with the ring
Of the songs maids sing
Who love with a love life-long, O!
It's the song of a merrymaid, peerly proud,
Who loved a lord, and who laughed aloud
At the moan of the merryman, moping mum,
Whose soul was sad, and whose glance was glum,
Who sipped no sup, and who craved no crumb,
As he sighed for the love of a ladye!
Heighdy! heighdy!
Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!


-- 
georgeryo...@gmail.com


Re: TikaEntityProcessor not working?

2010-06-03 Thread David George

Which version of Tika do you have? There was a problem introduced somewhere
between Tika 0.6 and Tika 0.7 whereby the TikaConfig method
config.getParsers() returns an empty parser list, due to class-loader
scope issues with Solr running under an application server.

There is a fix in the Tika 0.8 branch, and I note that a 0.8 snapshot of Tika
is included in the Solr trunk. I've not tried to get this to work and am
not sure what config is needed to make it work. I simply installed Tika
0.6, which can be downloaded from the Apache Tika website.


Indexing stops after exception

2010-06-03 Thread David George

I have a list of files in a database that I am indexing (it is a Liferay
database and the files are attachments). I'm encountering the following
error

https://issues.apache.org/jira/browse/PDFBOX-709

on one of the PDF documents, and this causes indexing to stop (the
TikaEntityProcessor throws a SEVERE exception). Is it possible to ignore
this exception and continue indexing via some kind of Solr configuration?

It seems reasonable to do this in my case, as I do not want indexing to stop
due to a non-critical error beyond my control. Currently I've modified the
TikaEntityProcessor to return null in this case. BTW, shouldn't the
InputStream close be in a finally block?


Re: Resume Solr indexing CSV after exception

2010-06-11 Thread David George

I modified TikaEntityProcessor to ignore these exceptions:

If the Tika Entity processor encounters an exception it will stop indexing.
I had to make two fixes to TikaEntityProcessor to work around this problem.

From the Solr SVN trunk, edit the file:

~/src/solr-svn/trunk/solr/contrib/dataimporthandler/src/extras/main/java/org/apache/solr/handler/dataimport/TikaEntityProcessor.java

First of all, if a file is not found on disk we want to continue indexing.
At the top of nextRow() add:

// Skip rows whose backing file is missing instead of aborting the import.
File f = new File(context.getResolvedEntityAttribute(URL));
if (!f.exists()) {
  return null;
}

Secondly, if the document parser throws an error (for example, certain PDF
revisions can cause the PDFBox parser to barf), we trap the exception
and continue:

try {
  tikaParser.parse(is, contentHandler, metadata , new ParseContext());
} catch (Exception e) {
  return null;
} finally {
  IOUtils.closeQuietly(is);
}

We also close the stream with IOUtils.closeQuietly in the finally block,
which is not done in the original code. Build and deploy the extras.jar in
the solr-instance/lib directory.

see also: http://www.abcseo.com/tech/search/solr-and-liferay-integration


Re: Resume Solr indexing CSV after exception

2010-06-11 Thread David George

cool I will try that.


Re: [slightly ot] Looking for Lucene/Solr consultant in Germany

2007-08-15 Thread George Everitt

Dear Jan,

I just saw your post on the SOLR mailing list.  I hope I'm not too late.

First off, I don't exactly match your required qualifications.  I do  
have 9 years at Verity and 1 year at Autonomy in enterprise search,  
however.   I'm in the middle of coming up to speed on SOLR and  
applying my considerable expertise in general Enterprise Search to  
the SOLR/Lucene platform.   So, your specific requirements for a  
Lucene/SOLR expert are not quite met.  But, I've been in the business  
of enterprise search for 10 years.   Think of it as asking an Oracle  
expert to look at your MySQL implementation.


My normal rate is USD 200/hour, and I do command that rate more often  
than not.  I'd be interested in taking on the challenge in my spare  
time, free of charge, just to get my bearings and to see how my  
consulting skills translate from the closed-source Verity/IDOL world  
to the open source world.  I think this could be beneficial to both  
of us:   I would get some expertise in specific SOLR idiosyncrasies,  
and you would get the benefit of 10 years of general enterprise  
search experience.


I've been studying SOLR and Lucene, and even developing my own  
project using them as a basis.  That being said, I expect to make  
some mistakes as I try to match my existing skill set with what's  
available in SOLR.  Fortunately, I found that with the transition  
from Verity K2 to Autonomy IDOL the underlying concepts of full-text  
search are pretty much universal.


Another fly in the ointment is that I live in the USA (St. Pete  
Beach, Florida to be exact), so there would be some time zone  
issues.  Also, I don't speak German, which will be a handicap when it  
comes to analyzing stemming options.   If you can live with those  
limitations, I'd be happy to help.


Let me know if you're interested.

George Everitt
Applied Relevance LLC
[EMAIL PROTECTED]
Tel: +1 (727) 641-4660
Fax: +1 (727) 233-0672






On Aug 8, 2007, at 12:43 PM, Jan Miczaika wrote:


Hello,

we are looking for a Lucene/Solr consultant in Germany. We have set  
up a Lucene/Solr server (currently live at http://www.hitflip.de).  
It returns search results, but the results are not really very  
good. We have been tweaking the parameters a bit, following  
suggestions from the mailing list, but are unsure of the effects  
this has.


We are looking for someone to do the following:
- analyse the search patterns on our website
- define a methodology for defining the quality of search
- analyse the data we have available
- specify which data is required in the index
- modify the search patterns used to query the data
- test and evaluate the results

The requirements: deep knowledge of Lucene/Solr, examples of  
implemented working search engines, theoretical knowledge


Is anyone interested? Please feel free to circulate this offer.

Thanks in advance

Jan

--
Geschäftsführer / Managing Director
Hitflip Media Trading GmbH
Gürzenichstr. 7, 50667 Köln
http://www.hitflip.de - new: http://www.hitflip.co.uk

Tel. +49-(0)221-272407-27
Fax. 0221-272407-22 (that's so 1990s)
HRB 59046, Amtsgericht Köln

Managing Directors (Geschäftsführer): Andre Alpar, Jan Miczaika, Gerald Schönbucher






MoreLikeThis throwing NPE

2007-09-09 Thread George L
I have been trying the MLT query using the EmbeddedSolr and SolrJ clients,
and it results in an NPE.

It looks like the problem mentioned here:
https://issues.apache.org/jira/browse/LUCENE-819
If that is the case, how do I fix it?

The MLT field has been stored with term vectors.

The query I used and the exception are below.

In debug mode, Lucene's Term.hashCode() has been receiving some fields in
the index and breaking on the id field. I was hoping to see the exception
happening because of the MLT fields/field values, but it didn't.

Can somebody help me please? I have already spent a whole Saturday night
with the trunk code ;-(

SolrQuery q = new SolrQuery();
q.setQuery("id:11");
q.addFacetField("l");
q.setFacet(true);
q.setFacetMinCount(1);
q.setParam("mlt", true);
q.setParam("mlt.fl", "field1");
q.setParam("mlt.minwl", "1");
q.setParam("mlt.mintf", "1");
q.setParam("mlt.mindf", "1");
QueryResponse response = server.query(q);



SEVERE: java.lang.NullPointerException
at org.apache.lucene.index.Term.hashCode(Term.java:78)
at org.apache.lucene.search.TermQuery.hashCode(TermQuery.java:175)
at org.apache.lucene.search.BooleanClause.hashCode(BooleanClause.java:108)
at java.util.AbstractList.hashCode(AbstractList.java:630)
at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:445)
at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:47)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:725)
at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1241)
at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:280)
at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThese(MoreLikeThisHandler.java:310)
at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:142)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:894)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:106)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:80)
at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:99)
at my.padam.solr.SolrJQueryTest.testQuery(SolrJQueryTest.java:134)
at my.padam.solr.SolrJQueryTest.main(SolrJQueryTest.java:165)
-- 
Thanks
George L


Re: MoreLikeThis throwing NPE

2007-09-10 Thread George L
Looks like the query field has to be stored for MLT.

It was failing when I had both the query field and the similarity field
unstored.

MLT is working fine with this configuration:

query_field - indexed and stored
similarity_field - indexed, unstored, and term vectors stored.
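
In schema.xml terms, the working configuration corresponds to something
like this (field names and type are illustrative):

<field name="query_field" type="text" indexed="true" stored="true"/>
<field name="similarity_field" type="text" indexed="true" stored="false"
       termVectors="true"/>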

But why should the query field be stored?

It will be nice if http://wiki.apache.org/solr/FieldOptionsByUseCase is
updated.

-- 
Thanks
George L


On 9/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> George L wrote:
> > I have been trying the MLT Query using EmbeddedSolr and SolrJ clients,
> which
> > is resulting in NPE.
> >
>
> Do you get the same error without solrj?
>
> Can you run the same query with:
>http://localhost:8987/solr/select?q=id:11&mlt=true
>
> (just to make sure we only need to look at the MLT code)
>
> ryan
>


RE: multiple indices

2007-09-11 Thread George Aroush
> > I was going through some old emails on this topic. Rafael Rossini 
> > figured out how to run multiple indices on single instance of jetty 
> > but it has to be jetty plus. I guess jetty doesn't allow this? I 
> > suppose I can add additional jars and make it work but I 
> haven't tried 
> > that. It'll always be much safer/simpler/less playing around if a 
> > feature is available out of box.
> 
> The example that comes with Solr is meant to be a starting 
> point for users.  It is a relatively functional and 
> well-commented example, and its config files are pretty much 
> the canonical documentation for solr config, and for many 
> people they can modifying it for their own production use
> 
> but it is still just an example application.
> 
> By the time people want to do expert-level activities with 
> Solr (multi-index falls into that category), they should be 
> able to configure their own servlet container, whether it be 
> jetty plus, tomcat, resin, etc.

Does this mean Solr 1.2 supports MultiSearcher?

-- George



Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread George Everitt
I'm not sure I fully understand your ultimate goal or Yonik's  
response.  However, in the past I've been able to represent  
hierarchical data as a simple enumeration of delimited paths:


root
root/region
root/region/north america
root/region/south america

Then, at response time, you can walk the result facet and build a  
hierarchy with counts that can be put into a tree view.  The tree can  
be any arbitrary depth, and documents can live in any combination of  
nodes on the tree.
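
A minimal sketch of that walk (hypothetical names; it assumes the facet
values are '/'-delimited paths with counts, as in the enumeration above):

import java.util.LinkedHashMap;
import java.util.Map;

// Builds a nested tree from flat "a/b/c" facet paths and their counts.
class FacetTreeBuilder {
    static class Node {
        long count;
        final Map<String, Node> children = new LinkedHashMap<>();
    }

    static Node build(Map<String, Long> facetCounts) {
        Node root = new Node();
        for (Map.Entry<String, Long> e : facetCounts.entrySet()) {
            Node cur = root;
            for (String part : e.getKey().split("/")) {
                cur = cur.children.computeIfAbsent(part, k -> new Node());
            }
            cur.count = e.getValue();  // count for this exact path
        }
        return root;
    }
}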


In addition, you can represent any arbitrary name value pair  
(attribute/tuple) as a two level tree.   That way, you can put any  
combination of attributes in the facet and parse them out at results  
list time.  For example, you might be indexing computer hardware.
Memory, Bus Speed and Resolution may be valid for some objects but not  
for others.   Just put them in a facet and specify a separator:


memory:1GB
busspeed:133Mhz
voltage:110/220
manufacturer:Shiangtsu


When you do a facet query, you can easily display the categories  
appropriate to the object.  And do facet selections like "show me all  
green things" and "show me all size 4 things".



Even if that's not your goal, this might help someone else.


George Everitt







On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:

So, I think I have things set up correctly in my schema, but it  
doesn't

appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category  
field

(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to
match a specific part of the input string.  In this example, I'm
matching facet1 in the input string facet1::facet2::facet3...facetn.

I have copyfields set up for each Category_# field.  Anything  
obviously

wrong?

Thanks!
Kristen

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other
fields?

On 11/6/07, Kristen Roth <[EMAIL PROTECTED]> wrote:

Yonik - thanks so much for your help!  Just to clarify; where should

the

regex go for each field?


Each field should have a different FieldType (referenced by the "type"
XML attribute).  Each fieldType can have it's own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik





Heritrix and Solr

2007-11-22 Thread George Everitt
I'm looking for a web crawler to use with Solr.  The objective is to  
crawl about a dozen public web sites regarding a specific topic.


After a lot of googling, I came across Heritrix, which seems to be the  
most robust well supported open source crawler out there.   Heritrix  
has an integration with Nutch (NutchWax), but not with Solr.   I'm  
wondering if anybody can share any experience using Heritrix with Solr.


It seems that there are three options for integration:

1. Write a custom Heritrix "Writer" class which submits documents to  
Solr for indexing (sketched below).
2. Write an ARC to Sol input XML format converter to import the ARC  
files.
3. Use the filesystem mirror writer and then another program to walk  
the downloaded files.
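
A minimal sketch of the Solr side of option 1, assuming a stock Solr at
localhost:8983 and illustrative field names; a real Heritrix Writer would
call something like this per fetched page:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: post one crawled page to Solr's XML update endpoint.
public class SolrPoster {

    static String esc(String s) {  // minimal XML escaping for field values
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void post(String id, String pageUrl, String text) throws Exception {
        String xml = "<add><doc>"
                + "<field name=\"id\">" + esc(id) + "</field>"
                + "<field name=\"url\">" + esc(pageUrl) + "</field>"
                + "<field name=\"text\">" + esc(text) + "</field>"
                + "</doc></add>";
        HttpURLConnection con = (HttpURLConnection)
                new URL("http://localhost:8983/solr/update").openConnection();
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = con.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        if (con.getResponseCode() != 200) {
            throw new RuntimeException("Solr update failed: " + con.getResponseCode());
        }
    }
}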


Has anybody looked into this or have any suggestions on an alternative  
approach?  The optimal answer would be "You dummy, just use XXX to  
crawl your web sites - there's no 'integration' required at all.   Can  
you believe the temerity?   What a poltroon."


Yours in Revolution,
George










Re: Heritrix and Solr

2007-11-22 Thread George Everitt

Otis:

There are many reasons I prefer Solr to Nutch:

1. I actually tried to do some of the crawling with Nutch, but found  
the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general.  I have a long background in  
Verity and Autonomy search, and Solr is a bit closer to them than Nutch.

3. I really like the schema support in Solr.
4. I really really like the facets/parametric search in Solr.
5. I really really really like the REST interface in Solr.
6. Finally, and not to put too fine a point on it, hadoop frightens  
the bejeebers out of me.  I've skimmed some of the papers and it looks  
like a lot of study before I will fully understand it.  I'm not saying  
I'm stupid and lazy, but if the map-reduce algorithm fits, I'll wear  
it.  Plus, I'm trying to get a mental handle on Jeff Hawkins' HTM and  
its application to the real world.   It all makes my cerebral cortex  
itchy.


Thanks for the suggestion, though.   I'll probably revisit Nutch again  
if Heritrix lets me down.  I had no luck getting the Nutch crawler  
Solr patch to work, either.   Sadly, I'm the David Lee Roth of Java  
programmers - I may think that I'm hard-core, but I'm not, really. And  
my groupies are getting a bit saggy.


BTW - add my voice to the paeans of praise for Lucene in Action.   You  
and Erik did a bang up job, and I surely appreciate all the feedback  
you give on this forum, Especially over the past few months as I feel  
my way through Solr and Lucene.




On Nov 22, 2007, at 10:10 PM, Otis Gospodnetic wrote:


The answer to that question, Norberto, would depend on versions.

George: why not just use straight Nutch and forget about Heritrix?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 5:54:32 PM
Subject: Re: Heritrix and Solr

On Thu, 22 Nov 2007 10:41:41 -0500
George Everitt <[EMAIL PROTECTED]> wrote:


After a lot of googling, I came across Heritrix, which seems to be

the

most robust well supported open source crawler out there.   Heritrix



has an integration with Nutch (NutchWax), but not with Solr.   I'm
wondering if anybody can share any experience using Heritrix with

Solr.

out on a limb here... both Nutch and SOLR use Lucene for the actual
indexing / searching. Would the indexes generated with Nutch be  
compatible

/ readable with SOLR?

_
{Beto|Norberto|Numard} Meijome

"Why do you sit there looking like an envelope without any address on
it?"
 Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery  
when

wet. Reading disclaimers makes you go blind. Writing them is worse.
You have been Warned.








Re: Facets - What's a better term for non technical people?

2007-12-17 Thread George Everitt
I don't think you have to give the user a label other than the name of  
the facet field.   The beauty of facets is that they are pretty  
intuitive.


Manufacturer
Microsoft (140)
Logitech Inc. (128)
Belkin (127)
Rosewill (124)
APEVIA (Aspire) (119)
STARTECH (97)

That said, I've seen them called:

Parametric Tag Names
Facet (200)
Parameter (122)
Tag (100)
Advanced Selection (20)
Select (15)
Navigate (13)
Filter (2)
Bucket  (1)
Enumeration (1)
Category (1)
Topic (1)

Regards,
George



On Dec 11, 2007, at 11:16 PM, Otis Gospodnetic wrote:


Isn't that GROUP BY ColumnX, count(1) type of thing?

I'd think "group by" would be a good label.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 9:38:37 PM
Subject: RE: Facets - What's a better term for non technical people?

In SQL terms they are: 'select unique'. Except on only one field.

-Original Message-
From: Charles Hornberger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Facets - What's a better term for non technical people?

FAST calls them "navigators" (which I think is a terrible term - YMMV
of course :-))

I tend to think that "filters" -- or perhaps "dynamic filters" --
captures the essential function.

On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC"
<[EMAIL PROTECTED]> wrote:

Hi,


So, has anyone got a good example of the language they might use
over, say, a set of radio buttons and fields on a web form, to
indicate that selecting one or more of these would return facets.

'Show grouping by'

or 'List the sets that the results fall into' or something similar.


Here's what I found some time ago:
http://www.searchtools.com/info/faceted-metadata.html

It has been quite useful to me.

André Davignon











Re: does solr handle hierarchical facets?

2007-12-17 Thread George Everitt



On Dec 13, 2007, at 1:56 AM, Chris Hostetter wrote:


ie, if this is your hierarchy...

   Products/
   Products/Computers/
   Products/Computers/Laptops
   Products/Computers/Desktops
   Products/Cases
   Products/Cases/Laptops
   Products/Cases/CellPhones

Then this trick won't work (because Laptops appears twice) but if  
you have
numeric IDs that corrispond with each of those categories (so that  
the two

instances of Laptops are unique...

   1/
   1/2/
   1/2/3
   1/2/4
   1/5/
   1/5/6
   1/5/7


Why not just use the whole path as the unique identifying token for a  
given node on the hierarchy?   That way, you don't need to map nodes  
to unique numbers, just use a prefix query.


taxonomy:Products/Computers/Laptops* or taxonomy:Products/Cases/Laptops*

Sorry - that may be bogus query syntax, but you get the idea.

Products/Computers/Laptops* and Products/Cases/Laptops* are two unique  
identifiers.  You just need to make sure they are tokenized properly -  
which is beyond my current off-the-cuff expertise.
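
A sketch of the idea with facet.prefix (hedged: the field name is
illustrative, and it assumes the path is indexed as a single untokenized
term, e.g. a string field or KeywordTokenizer):

q=*:*
&facet=true
&facet.field=taxonomy
&facet.prefix=Products/Computers/

This returns counts for all paths under Products/Computers/; the client can
then collapse them to the immediate children for the expanded tree node.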


At least that is the way I've been doing it with IDOL lately.  I  
dearly hope I can do the same in Solr when the time comes.


I have a whole mess of Java code which parses out arbitrary path  
separated values into real tree structures.  I think it would be a  
useful addition to Solr, or maybe Solrj.  It's been knocking around my  
hard drives for the better part of a decade.   If I get enough  
interest, I'll clean it up and figure out how to offer it up as a part  
of the code base.  I'm pretty naive when it comes to FLOSS, so any  
authoritative non-condescending hints on how to go about this would be  
greatly appreciated.


Regards,
George


Re: Newbie question about Solr use in web applications

2007-12-18 Thread George Everitt


On Dec 14, 2007, at 9:55 AM, Stuart Sierra wrote:


On Dec 13, 2007 9:20 PM, solruser2 <[EMAIL PROTECTED]> wrote:
Let's say I have a database containing people, groups, and projects  
(these
all have different fields). I want to index these different kinds  
of objects
with a view to eventually present search results from all three  
types mashed
together and sorted by relevance. Using separate indices (and thus  
separate
Solr processes) would make mashing the results together very  
difficult so
I'm guessing I just add the separate fields to the schema along  
with an

'object_type' field or equivalent?


That is the approach I would take.  Having three separate indices
would make your searches slower and more complicated.


I agree.




Secondly should I just store the database row id for each object  
(while
still indexing the field contents) so a query on the index returns  
a list of

id's that I can then fetch from the database?


It depends. :)  If you want highlighted snippets in your search
results, then you have to store the field contents in the index.  In
some situations you can make your search pages faster by storing all
the critical fields (the ones you want to appear in search results) in
the index, so that you don't have to fetch a dozen records from the
database just to display a list of search results.  On the other hand,
if your database records are small and you don't need highlighting, it
may be faster to only store database ID's in the index.



I agree with this also.   However, I've never seen a case where a  
separate database query to retrieve metadata stored in a database will  
be faster than just storing the necessary fields directly in the  
search index and retrieving them with the search results.I've  
found it helpful to think of the full-text index as a very simple,  
very fast, very flat database engine.  You may not be able to do outer  
joins and correlated subqueries on it, but you can get a list of  
documents and titles really fast.



Hope this sheds some light,
-Stuart Sierra
AltLaw.org




Solr and WebSphere 6.1

2007-12-28 Thread George Aroush
Hi folks,

Has anyone managed to get Solr 1.2 to run under WebSphere 6.1?  If so, can
you share your experience: what configuration, settings, etc. you had to use?

Someone asked this questions earlier this month, but I don't see anyone
followed up -- so I'm asking again since I have this need too.

Thanks.

-- George



Inverted Search Engine

2008-01-23 Thread George Everitt
Verity had a function called "profiler" which was essentially an  
inverted search engine.  Instead of evaluating a single query at a  
time against a large corpus of documents, the profiler evaluated a  
single document at a time against a large number of queries.   This  
kind of functionality is used for alert notifications, where a large  
number of users can have their own queries and as documents are  
indexed into the system,  the queries are matched and some kind of  
notification is made to the owner of the query (e-mail, SMS, etc).  
Think "Google Alerts".


I'm wondering if anybody has implemented this kind of functionality  
with Solr, and if so what strategy did you use?  If you haven't  
implemented something like that I would still be interested in ideas  
on how to do it with Solr, or how to perhaps use Lucene to patch that  
functionality into Solr?  I have my own thoughts, but they are still a  
bit primitive, and I'd like to throw it over the transom and see who  
bites...
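
One classic building block for this (a hedged suggestion, not something
settled in this thread) is Lucene's contrib MemoryIndex: index the single
incoming document into a throwaway in-memory index and run each stored
query against it. A minimal sketch with illustrative profile data:

import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryparser.classic.QueryParser;

// Sketch: evaluate many stored "profile" queries against one incoming document.
public class ProfileMatcher {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Map<String, String> profiles = Map.of(
                "alice", "solr AND facets",
                "bob", "\"inverted search\"");

        String incomingDoc = "New article about Solr facets and filtering";

        // Index just this one document, in memory only...
        MemoryIndex index = new MemoryIndex();
        index.addField("body", incomingDoc, analyzer);

        // ...then run every saved query against it and notify on matches.
        QueryParser parser = new QueryParser("body", analyzer);
        for (Map.Entry<String, String> p : profiles.entrySet()) {
            if (index.search(parser.parse(p.getValue())) > 0.0f) {
                System.out.println("Notify " + p.getKey());
            }
        }
    }
}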


George Everitt
Applied Relevance LLC







Re: Inverted Search Engine

2008-01-23 Thread George Everitt

Wow, that's spooky.

Thanks for the heads up - looks like a good list to subscribe to as well

George Everitt
Applied Relevance LLC
[EMAIL PROTECTED]
Tel: +1 (727) 641-4660
Fax: +1 (727) 233-0672
Skype: geverit4
AIM: [EMAIL PROTECTED]




On Jan 23, 2008, at 2:30 PM, Erick Erickson wrote:


As chance would have it, this was just discussed over on the lucene
user's list. See the thread..

Inverted search / Search on profilenetBest
Erick


On Jan 23, 2008 1:38 PM, George Everitt  
<[EMAIL PROTECTED]>

wrote:


Verity had a function called "profiler" which was essentially an
inverted search engine.  Instead of evaluating a single query at a
time against a large corpus of documents, the profiler evaluated a
single document at a time against a large number of queries.   This
kind of functionality is used for alert notifications, where a large
number of users can have their own queries and as documents are
indexed into the system,  the queries are matched and some kind of
notification is made to the owner of the query (e-mail, SMS, etc).
Think "Google Alerts".

I'm wondering if anybody has implemented this kind of functionality
with Solr, and if so what strategy did you use?  If you haven't
implemented something like that I would still be interested in ideas
on how to do it with Solr, or how to perhaps use Lucene to patch that
functionality into Solr?  I have my own thoughts, but they are  
still a

bit primitive, and I'd like to throw it over the transom and see who
bites...

George Everitt
Applied Relevance LLC










return only sorted Field, but with a different Field Name

2008-03-10 Thread George Abraham
Hi all,
Sorry if this is an easy one, but my research isn't getting me anywhere so
far. Here is what I want to do.

Currently I index the contents of my database using Solr. After a search
result is retrieved from Solr, I extract only the key fields that I need
(mostly the unique ID and score) and then match it with the permissions in
the database before I present it to a user. I have a tonne of dynamic fields
in the index, and sometimes I want to sort by them. That is easy enough.

For example, say I want to sort by the field '162_sortable_s'; then I add a
parameter like 'sort=162_sortable_s desc'. I need to change the settings so
that when the result set is returned from Solr, it takes the values of
'162_sortable_s' and inserts them into a separate field called 'SortedField',
so that the returned doc looks like this:



<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="fl">*,score</str>
    <str name="indent">on</str>
    <str name="sort">162_sortable_s desc</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">chouchin</str>
    <str name="qt">standard</str>
    <str name="wt">standard</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
    <float name="score">2.075873</float>
    <str name="SortedField">Brecher, Henry</str>
    <str name="id">4077f1ed-6885-4170-badc-c72816d5b473</str>
  </doc>
  <doc>
    <float name="score">2.075873</float>
    <str name="SortedField">Charles, Tom</str>
    <str name="id">951ecbc9-0cd6-4ba5-b32f-e5d6bc42ce29</str>
  </doc>
  <doc>
    <float name="score">2.5168622</float>
    <str name="SortedField">Zeke</str>
    <str name="id">530760aa-bf25-4f74-ab8b-caca744b9362</str>
  </doc>
</result>
</response>

How or where do I change that setting? Do I have to rewrite some part of the
RequestHandler?

Thanks,
George


filter query: comparing values between fields

2007-04-11 Thread George Abraham

Hi all,
I am using the DisMaxRequestHandler. Is there any way to use the fq param to
compare two fields? For example, each of the documents in the index have two
fields which are slightly related to each other within the context of the
document; say these fields are: blah1 and blah2. When I do a search, I want
the fq param in solrconfig.xml to look like this:


  blah1:blah2



Of course the above code won't work right now, but is there any way to
specify that blah2 is actually a field and not a value?


Thanks,
George


Re: filter query: comparing values between fields

2007-04-12 Thread George Abraham

You know, that's ultimately what I have done. My thinking is that doing
field comparisons could be too intensive an operation anyway.

Thanks,
George

On 4/11/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 4/11/07, George Abraham <[EMAIL PROTECTED]> wrote:
> I am using the DisMaxRequestHandler. Is there any way to use the fq
param to
> compare two fields?

Find all docs where fielda=fieldb is not currently supported (in
Lucene or Solr), but it could be in the future if it solves a common
enough problem.

Do you have many such fields you compare like that?
If you are only comparing fielda and fieldb, then you could index another
field
fielda_is_fieldb:true for documents when the values match.

-Yonik
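
For what it's worth, on numeric fields later Solr versions can express the
comparison directly as a function range filter (hedged: this postdates the
thread); it keeps only documents where the two values are equal:

fq={!frange l=0 u=0}sub(blah1,blah2)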



Solr approaches to re-indexing large document corpus

2011-05-09 Thread George P. Stathis
We are looking for recommendations around systematically re-indexing an
ever-growing corpus of documents in Solr (tens of millions now, hundreds of
millions in less than a year) without taking the currently running index
down. Re-indexing is needed on a periodic basis because:

   - New features are introduced around searching the existing corpus that
   require additional schema fields which we can't always anticipate in advance
   - The corpus is indexed across multiple shards. When it grows past a
   certain threshold, we need to create more shards and re-balance documents
   evenly across all of them (which SolrCloud does not yet seem to support).

The current index receives very frequent updates and additions, which need
to be available for search within minutes. Therefore, approaches where the
corpus is re-indexed in batch offline don't really work as by the time the
batch is finished, new documents will have been made available.

The approaches we are looking into at the moment are:

   - Create a new cluster of shards and batch re-index there while the old
   cluster is still available for searching. New documents that are not part of
   the re-indexed batch are sent to both the old cluster and the new cluster.
   When ready to switch, point the load balancer to the new cluster.
   - Use CoreAdmin: spawn a new core per shard and send the re-indexed batch
   to the new cores. New documents that are not part of the re-indexed batch
   are sent to both the old cores and the new cores. When ready to switch, use
   CoreAdmin to dynamically swap cores (see the example below).
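
For the second approach, the swap itself is a single CoreAdmin call once the
new core has caught up (core names illustrative):

http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_reindexed&other=shard1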

We'd appreciate it if folks could either confirm or poke holes in any or
all of these approaches. Is one more appropriate than the other? Or are we
completely off? Thank you in advance.