Re: spellcheck /too many open files

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 11:15 AM, revas  wrote:

>
> 1)Does the spell check component support all languages?
>

SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
you can find an analyzer/tokenizer for your language, spell checker can
work.


> 2) I have a scenario where I have about 20 webapps in a single container. We
> get too many open files at index time / while restarting Tomcat.


Is that because of SpellCheckComponent?


> The mergefactor is at default.
>
> If I reduce the merge factor to 2 and optimize the index, will the open
> files be closed automatically, or would I have to reindex to close the open
> files, or how do I close the already opened files? This is on Linux with
> Solr
> 1.3 and Tomcat 5.5.
>

Lucene/Solr does not keep any file open longer than necessary. But
decreasing the merge factor should help. You can also increase the open file
limit on your system.
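
For example (values here are illustrative, not recommendations), the merge
factor is set in the index section of solrconfig.xml:

  <mainIndex>
    <mergeFactor>2</mergeFactor>
  </mainIndex>

and on Linux the per-process open file limit can be raised for the shell
that launches Tomcat with something like:

  ulimit -n 8192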

-- 
Regards,
Shalin Shekhar Mangar.


Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread gateway0

Ok here it goes:
"


  
  

  
  
  
  
  

  
  
  

  
  
  
  

  
  

  
  
  

  

"
The name of the database is "dbA" and the table name is "project".

Everything works out fine except the comment part highlighted (bold). That
works too, as I stated, if I change the phrase to:
"



"
so that I don't use my primary key "id" twice, but the problem is I need to
use "id" for the comment part too.

kind regards, Sebastian


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>>
>> Thanks for your answer.
>>
>> "${db.tableA.id}" that specifies the sql query that the Dataimporthandler
>> should Use the sql field "id" in table "tableA" located in Database "db".
> 
> The naming convention does not work like that.
> 
> if the entity name is 'tableA' then the field 'id' is addressed as
> 'tableA.id'
> 
> As I said earlier, if you could provide me with the entire
> data-config.xml it would be more helpful
> 
>>
>> like in the example from the Solr Wiki:
>> "
>> 
>> "
>>
>> It's strange, I know, but when I use something other than "id" as the
>> foreign
>> key for the query, everything works!
>>
>> like:
>> "${db.tableA.anotherid}"
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> what is ${db.tableA.id} ?
>>>
>>> I think there is something extra in that
>>>
>>> can you paste the whole data-config.xml?
>>>
>>> can you paste
>>>
>>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:

 Hi,

 I tried to do the following:

 "
 

 
        
 
 "

 So I use the SQL table field "id" twice: once for "db_id" in my index
 and once
 for
 the sql query as "fid=id".

 That doesn't work!

 But when I change the query from "fid=id" to something like
 "fid=otherkey" it does
 work!
 Like:
 "
 

 
        
 
 "

 Is there any other kind of a workaround so I can use the SQL Field "id"
 twice as I wanted to? Thanks

 kind regards, Sebastian
 --
 View this message in context:
 http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23938282.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Can you avoid the "." dots in the entity name and try it out? Dots are
special characters and could have caused the problem.
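
For example (just a sketch, the queries are made up), with dot-free entity
names the variables resolve unambiguously:

  <entity name="project" dataSource="dbA" query="select * from project">
    <entity name="comment" dataSource="dbA"
            query="select * from comment where fid='${project.id}'"/>
  </entity>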

On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:
>
> Ok here it goes:
> "
> 
> 
>   driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
> user="root" password=""/>
>  
>     transformer="TemplateTransformer" query="select *, 'dbA.project' from
> project">
>      
>       template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
>      
>      
>      
>        
>      
>      
>      
>        
>      
>       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
>       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
>      
>        
>      
>      
>        
>      
>      
>      
>    
>  
> 
> "
> The name of the database is "dbA" and the table name is "project".
>
> Everything works out fine except the comment part highlighted (bold). That
> works too, as I stated, if I change the phrase to:
> "
> 
>        
> 
> "
> so that I don't use my primary key "id" twice, but the problem is I need to
> use "id" for the comment part too.
>
> kind regards, Sebastian
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>>>
>>> Thanks for your answer.
>>>
>>> "${db.tableA.id}" that specifies the sql query that the Dataimporthandler
>>> should Use the sql field "id" in table "tableA" located in Database "db".
>>
>> The naming convention does not work like that.
>>
>> if the entity name is 'tableA' then the field 'id' is addressed as
>> 'tableA.id'
>>
>> As I said earlier, if you could provide me with the entire
>> data-config.xml it would be more helpful
>>
>>>
>>> like in the example from the Solr Wiki:
>>> "
>>> 
>>> "
>>>
>>> It's strange, I know, but when I use something other than "id" as the
>>> foreign
>>> key for the query, everything works!
>>>
>>> like:
>>> "${db.tableA.anotherid}"
>>>
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 what is ${db.tableA.id} ?

 I think there is something extra in that

 can you paste the whole data-config.xml?

 can you paste

 On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:
>
> Hi,
>
> I tried to do the following:
>
> "
> 
>
> 
>        
> 
> "
>
> So I use the SQL table field "id" twice: once for "db_id" in my index
> and once
> for
> the sql query as "fid=id".
>
> That doesn't work!
>
> But when I change the query from "fid=id" to something like
> "fid=otherkey" it does
> work!
> Like:
> "
> 
>
> 
>        
> 
> "
>
> Is there any other kind of a workaround so I can use the SQL Field "id"
> twice as I wanted to? Thanks
>
> kind regards, Sebastian
> --
> View this message in context:
> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23938282.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr Multiple Queries?

2009-06-09 Thread Aleksander M. Stensby

Hi there Samnang!
Please see inline for comments:

On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun wrote:



Hi all,
I just got started looking at using Solr as my search web service. But I
don't know whether Solr supports these kinds of queries:

- Startswith
This is what we call prefix queries and wildcard queries. For instance, if
you want something that starts with "man", you can search for man*



- Exact Match

Exact matching is done with double quotes (a phrase query): "Solr rocks"


- Contain
Hmm, what do you mean by contain? Inside a given word? That might be a bit
more tricky. We have an issue open at the moment for supporting leading
wildcards, which might allow you to search for *cogn* and match
"recognition" etc. If that was what you meant, you can look at the ongoing
issue http://issues.apache.org/jira/browse/SOLR-218



- Doesn't Contain
NOT or - are keywords to exclude something (Solr supports all the boolean
operators that Lucene supports).



- In the range

Range queries in Solr are done by using brackets.
For instance,
price:[500 TO 1000]
will return all results with prices ranging from 500 to 1000.
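
Putting a few of these together in one request (the field names are invented
for illustration):

  http://localhost:8983/solr/select?q=name:man*+AND+price:[500+TO+1000]+NOT+category:used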

There is a lot of information on the Wiki that you should check out:
http://wiki.apache.org/solr/




Could anyone guide me how to implement those features in Solr?

Cheers,
Samnang



Cheers,
 Aleks


--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


solr in distributed mode

2009-06-09 Thread Rakhi Khatwani
Hi,
I was looking for ways in which we can use Solr in distributed mode.
Is there any way we can use Solr indexes across machines, or by using the
Hadoop Distributed File System?

It has been mentioned in the wiki that
When an index becomes too large to fit on a single system, or when a single
query takes too long to execute, an index can be split into multiple shards,
and Solr can query and merge results across those shards.

What I understand is that a shard is a partition. Are shards on the same
machine, or can they be on different machines? Do we have to manually
split the indexes to store them in different shards?

Do you have an example or some tutorial which demonstrates distributed index
searching/storing using shards?

Regards,
Raakhi


Re: User Credentials for Solr Data Dir

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
nope

On Tue, Jun 9, 2009 at 4:59 AM, vaibhav joshi wrote:
>
> Hi,
>
>
>
> I am currently using Solr 1.3 and running Solr as an NT service. I need to
> store data indexes on a remote filer machine. The filer needs user
> credentials in order to access the same. Is there a Solr configuration which
> I can use to pass these credentials?
>
>
>
> I was reading some blogs and they suggested running the NT service as a user
> who can access the resource needed. But I need to use the existing build and
> deploy tools in the company, and they always run the NT service as "Local
> System", which cannot access other resources.
>
>
>
> That's why I am trying to explore whether it's possible to pass these
> credentials via JNDI/system variables. Is it possible?
>
>
>
> Thanks
>
> Vaibhav
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread gateway0

No, I changed the entity name to "dbA:project" but still the same problem.

Interesting sidenote: if I use my data-config as posted (with the "id" field
in the comment section), none of the other entities work anymore; for
example:
"
<entity name="user" dataSource="dbA" query="select username from
 ci_user where userid='${dbA.project.created_by}' ">

  
"
returns an empty result.

Still can't figure out why I can't use the (SQL) table's primary key
- once to save it in the index directly and
- a second time to query against my comment table





Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> Can you avoid the "." dots in the entity name and try it out? Dots are
> special characters and could have caused the problem.
> 
> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:
>>
>> Ok here it goes:
>> "
>> 
>> 
>>  > driver="com.mysql.jdbc.Driver"
>> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
>> user="root" password=""/>
>>  
>>    > transformer="TemplateTransformer" query="select *, 'dbA.project' from
>> project">
>>      
>>      > template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
>>      
>>      
>>      
>>        
>>      
>>      
>>      
>>        
>>      
>>      > dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>      > dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>      
>>        
>>      
>>      
>>        
>>      
>>      
>>      
>>    
>>  
>> 
>> "
>> The name of the database is "dbA" and the table name is "project".
>>
>> Everything works out fine except the comment part highlighted (bold).
>> That
>> works too, as I stated, if I change the phrase to:
>> "
>> 
>>        
>> 
>> "
>> so that I don't use my primary key "id" twice, but the problem is I need
>> to
>> use "id" for the comment part too.
>>
>> kind regards, Sebastian
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:

 Thanks for your answer.

 "${db.tableA.id}" that specifies the sql query that the
 Dataimporthandler
 should Use the sql field "id" in table "tableA" located in Database
 "db".
>>>
>>> The naming convention does not work like that.
>>>
>>> if the entity name is 'tableA' then the field 'id' is addressed as
>>> 'tableA.id'
>>>
>>> As I said earlier, if you could provide me with the entire
>>> data-config.xml it would be more helpful
>>>

 like in the example from the Solr Wiki:
 "
 
 "

 It's strange, I know, but when I use something other than "id" as the
 foreign
 key for the query, everything works!

 like:
 "${db.tableA.anotherid}"



 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>
> what is ${db.tableA.id} ?
>
> I think there is something extra in that
>
> can you paste the whole data-config.xml?
>
> can you paste
>
> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:
>>
>> Hi,
>>
>> I tried to do the following:
>>
>> "
>> 
>>
>> 
>>        
>> 
>> "
>>
>> So I use the SQL table field "id" twice: once for "db_id" in my index
>> and once
>> for
>> the sql query as "fid=id".
>>
>> That doesn't work!
>>
>> But when I change the query from "fid=id" to something like
>> "fid=otherkey" it does
>> work!
>> Like:
>> "
>> 
>>
>> 
>>        
>> 
>> "
>>
>> Is there any other kind of a workaround so I can use the SQL Field
>> "id"
>> twice as I wanted to? Thanks
>>
>> kind regards, Sebastian
>> --
>> View this message in context:
>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>
>

 --
 View this message in context:
 http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23938282.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23939391.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck /too many open files

2009-06-09 Thread revas
But the spell check component uses the n-gram analyzer and hence should work
for any language - is this correct? Also, we can refer to an external
dictionary for suggestions; could this be in any language?

The open files issue is not because of spell check, as we have not
implemented that yet. Every time we restart Solr we need to raise the
ulimit, otherwise it does not work. So is there any workaround to
permanently close these open files? Does optimizing the index close them?

Regards
Sujatha

On Tue, Jun 9, 2009 at 12:53 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 11:15 AM, revas  wrote:
>
> >
> > 1)Does the spell check component support all languages?
> >
>
> SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
> you can find an analyzer/tokenizer for your language, spell checker can
> work.
>
>
> > 2) I have a scenario where I have about 20 webapps in a single
> > container. We
> > get too many open files at index time / while restarting Tomcat.
>
>
> Is that because of SpellCheckComponent?
>
>
> > The mergefactor is at default.
> >
> > If I reduce the merge factor to 2 and optimize the index, will the open
> > files be closed automatically, or would I have to reindex to close the
> > open
> > files, or how do I close the already opened files? This is on Linux with
> > Solr
> > 1.3 and Tomcat 5.5.
> >
>
> Lucene/Solr does not keep any file open longer than necessary. But
> decreasing the merge factor should help. You can also increase the open file
> limit on your system.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Lucene2.9-dev version in Solr nightly-build and FieldCache memory usage

2009-06-09 Thread Marc Sturlese

Hey there,
Does the Lucene 2.9-dev used in the current Solr nightly build (9-6-2009)
include the patch LUCENE-1662 to avoid doubling memory usage in the Lucene
FieldCache?
Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Lucene2.9-dev-version-in-Solr-nightly-build-and-FieldCache-memory-usage-tp23939495p23939495.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck /too many open files

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 2:56 PM, revas  wrote:

> But the spell check component uses the n-gram analyzer and hence should
> work
> for any language - is this correct? Also, we can refer to an external
> dictionary for suggestions; could this be in any language?
>

Yes it does use n-grams but there's an analysis step before the n-grams are
created. For example, if you are creating your spell check index from a Solr
field, SpellCheckComponent uses that field's index time analyzer. So you
should create your language-specific fields in such a way that the analysis
works correctly for that language.
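
As a sketch (the field and type names are invented), a language-specific
spelling field could be declared in schema.xml like this:

  <fieldType name="textSpell" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="spell" type="textSpell" indexed="true" stored="false"/>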


> The open files issue is not because of spell check, as we have not
> implemented that yet. Every time we restart Solr we need to raise the
> ulimit, otherwise it does not work. So is there any workaround to
> permanently close these open files? Does optimizing the index close it?
>

Optimization merges the segments of the index into one big segment. So it
will reduce the number of files. However, during the merge it may create
many more files. The old files are cleaned up by Lucene a while after the
merge (unless you have changed the defaults in the IndexDeletionPolicy
section in solrconfig.xml).
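
For instance, an optimize can be triggered by posting to the update handler
(URL assumes the example setup):

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
    --data-binary '<optimize/>'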

-- 
Regards,
Shalin Shekhar Mangar.


Multiple queries in one, something similar to a SQL "union"

2009-06-09 Thread Avlesh Singh
I have an index with two fields - name and type. I need to perform a search
on the name field so that *an equal number of results is fetched for each
type*.
Currently, I am achieving this by firing multiple queries with a different
type and then merging the results.
In my database driven version, I used to do a "union" of multiple queries
(and not separate SQL queries) to achieve this.

Can Solr do something similar? If not, can this be a possible enhancement?

Cheers
Avlesh


Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
There should be no problem if you re-use the same variable.

Are you sure you removed the dots from everywhere?


On Tue, Jun 9, 2009 at 2:55 PM, gateway0 wrote:
>
> No I changed the entity name to "dbA:project" but still the same problem.
>
> Interesting sidenote If I use my Data-Config as posted (with the "id" field
> in the comment section) none of the other entities works anymore like for
> example:
> "
> entity name="user" dataSource="dbA" query="select username from
>  ci_user where userid='${dbA.project.created_by}' ">
>        
>      
> "
> returns an empty result.
>
> Still can't figure out why I can't use the (SQL) table's primary key
> - once to save it in the index directly and
> - a second time to query against my comment table
>
>
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> Can you avoid the "." dots in the entity name and try it out? Dots are
>> special characters and could have caused the problem.
>>
>> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:
>>>
>>> Ok here it goes:
>>> "
>>> 
>>> 
>>>  >> driver="com.mysql.jdbc.Driver"
>>> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
>>> user="root" password=""/>
>>>  
>>>    >> transformer="TemplateTransformer" query="select *, 'dbA.project' from
>>> project">
>>>      
>>>      >> template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
>>>      
>>>      
>>>      
>>>        
>>>      
>>>      
>>>      
>>>        
>>>      
>>>      >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>>      >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>>      
>>>        
>>>      
>>>      
>>>        
>>>      
>>>      
>>>      
>>>    
>>>  
>>> 
>>> "
>>> The name of the database is "dbA" and the table name is "project".
>>>
>>> Everything works out fine except the comment part highlighted (bold).
>>> That
>>> works too, as I stated, if I change the phrase to:
>>> "
>>> 
>>>        
>>> 
>>> "
>>> so that I don't use my primary key "id" twice, but the problem is I need
>>> to
>>> use "id" for the comment part too.
>>>
>>> kind regards, Sebastian
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>
> Thanks for your answer.
>
> "${db.tableA.id}" that specifies the sql query that the
> Dataimporthandler
> should Use the sql field "id" in table "tableA" located in Database
> "db".

 The naming convention does not work like that.

 if the entity name is 'tableA' then the field 'id' is addressed as
 'tableA.id'

 As I said earlier, if you could provide me with the entire
 data-config.xml it would be more helpful

>
> like in the example from the Solr Wiki:
> "
> 
> "
>
> It's strange, I know, but when I use something other than "id" as the
> foreign
> key for the query, everything works!
>
> like:
> "${db.tableA.anotherid}"
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> what is ${db.tableA.id} ?
>>
>> I think there is something extra in that
>>
>> can you paste the whole data-config.xml?
>>
>> can you paste
>>
>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:
>>>
>>> Hi,
>>>
>>> I tried to do the following:
>>>
>>> "
>>> 
>>>
>>> 
>>>        
>>> 
>>> "
>>>
>>> So I use the SQL table field "id" twice: once for "db_id" in my index
>>> and once
>>> for
>>> the sql query as "fid=id".
>>>
>>> That doesn't work!
>>>
>>> But when I change the query from "fid=id" to something like
>>> "fid=otherkey" it does
>>> work!
>>> Like:
>>> "
>>> 
>>>
>>> 
>>>        
>>> 
>>> "
>>>
>>> Is there any other kind of a workaround so I can use the SQL Field
>>> "id"
>>> twice as I wanted to? Thanks
>>>
>>> kind regards, Sebastian
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23938282.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -
>> Noble Pau

Re: Multiple queries in one, something similar to a SQL "union"

2009-06-09 Thread Aleksander M. Stensby
I don't know if I follow you correctly, but you are saying that you want X  
results per type?
So you do something like limit=X and query = type:Y etc. and merge the  
results?
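
That is, one request per type with rows as the per-type limit, along the
lines of (field values invented):

  http://localhost:8983/solr/select?q=name:foo&fq=type:book&rows=10
  http://localhost:8983/solr/select?q=name:foo&fq=type:dvd&rows=10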


- Aleks


On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh  wrote:

I have an index with two fields - name and type. I need to perform a  
search
on the name field so that *equal number of results are fetched for each  
type

*.
Currently, I am achieving this by firing multiple queries with a  
different

type and then merging the results.
In my database driven version, I used to do a "union" of multiple queries
(and not separate SQL queries) to achieve this.

Can Solr do something similar? If not, can this be a possible  
enhancement?


Cheers
Avlesh




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: spellcheck /too many open files

2009-06-09 Thread revas
Thanks Shalin. When we use the external file dictionary (if there is
one), then it should work fine for spell check, right? Also, is there any
format for this file?

Regards
Sujatha

On Tue, Jun 9, 2009 at 3:03 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 2:56 PM, revas  wrote:
>
> > But the spell check component uses the n-gram analyzer and hence should
> > work
> > for any language - is this correct? Also, we can refer to an external
> > dictionary for suggestions; could this be in any language?
> >
>
> Yes it does use n-grams but there's an analysis step before the n-grams are
> created. For example, if you are creating your spell check index from a
> Solr
> field, SpellCheckComponent uses that field's index time analyzer. So you
> should create your language-specific fields in such a way that the analysis
> works correctly for that language.
>
>
> > The open files issue is not because of spell check, as we have not
> > implemented that yet. Every time we restart Solr we need to raise the
> > ulimit, otherwise it does not work. So is there any workaround to
> > permanently close these open files? Does optimizing the index close it?
> >
>
> Optimization merges the segments of the index into one big segment. So it
> will reduce the number of files. However, during the merge it may create
> many more files. The old files are cleaned up by Lucene a while after the
> merge (unless you have changed the defaults in the IndexDeletionPolicy
> section in solrconfig.xml).
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Sharding strategy

2009-06-09 Thread Aleksander M. Stensby

Hi all,
I'm trying to figure out how to shard our index as it is growing rapidly  
and we want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial thought is to shard the index by date, but I wonder if you have  
any input on this and how to best solve this...


I know that the most frequent queries will be executed against the
"latest" shard, but then let's say we shard by year: how do we best solve
the situation that will occur at the beginning of a new year? (Some of the
data will be in the last shard, but most of it will be on the second-to-last
shard.)


Would it be stupid to have a "latest" shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain that
index in addition to the regular yearly shards? Anyone else facing a
similar situation with a good solution?


Any input would be greatly appreciated :)

Cheers,
 Aleksander



--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: spellcheck /too many open files

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 4:32 PM, revas  wrote:

> Thanks ShalinWhen we use the external  file  dictionary (if there is
> one),then it should work fine ,right for spell check,also is there any
> format for this file
>

The external file should have one token per line. See
http://wiki.apache.org/solr/FileBasedSpellChecker

The default analyzer is WhitespaceAnalyzer. So all tokens in the file will
be split on whitespace and the resulting tokens will be used for giving
suggestions. If you want to change the analyzer, specify fieldType in the
spell checker configuration and the component will use the analyzer
configured for that field type.
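
A minimal sketch (file name and paths are placeholders):

spellings.txt, one token per line:

  solr
  lucene
  search

and the spell checker configuration in solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>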

-- 
Regards,
Shalin Shekhar Mangar.


Re: spellcheck /too many open files

2009-06-09 Thread revas
Thanks

On Tue, Jun 9, 2009 at 5:14 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Jun 9, 2009 at 4:32 PM, revas  wrote:
>
> > Thanks Shalin. When we use the external file dictionary (if there is
> > one), then it should work fine for spell check, right? Also, is there any
> > format for this file?
> >
>
> The external file should have one token per line. See
> http://wiki.apache.org/solr/FileBasedSpellChecker
>
> The default analyzer is WhitespaceAnalyzer. So all tokens in the file will
> be split on whitespace and the resulting tokens will be used for giving
> suggestions. If you want to change the analyzer, specify fieldType in the
> spell checker configuration and the component will use the analyzer
> configured for that field type.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Multiple queries in one, something similar to a SQL "union"

2009-06-09 Thread Avlesh Singh
>
> I don't know if I follow you correctly, but you are saying that you want X
> results per type?
>
You are right. I need "X" number of results per type.

So you do something like limit=X and query = type:Y etc. and merge the
> results?
>
That is what the question is! Which means, if I have 4 types, I am currently
making 4 queries to Solr.

The question is aimed to find a possibility of doing it in a single query
... and, suggesting that the implementation could be more on the lines of a
SQL Union.

Cheers
Avlesh

On Tue, Jun 9, 2009 at 4:29 PM, Aleksander M. Stensby <
aleksander.sten...@integrasco.no> wrote:

> I don't know if I follow you correctly, but you are saying that you want X
> results per type?
> So you do something like limit=X and query = type:Y etc. and merge the
> results?
>
> - Aleks
>
>
>
> On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh  wrote:
>
>  I have an index with two fields - name and type. I need to perform a
>> search
>> on the name field so that *equal number of results are fetched for each
>> type
>> *.
>> Currently, I am achieving this by firing multiple queries with a different
>> type and then merging the results.
>> In my database driven version, I used to do a "union" of multiple queries
>> (and not separate SQL queries) to achieve this.
>>
>> Can Solr do something similar? If not, can this be a possible enhancement?
>>
>> Cheers
>> Avlesh
>>
>
>
>
> --
> Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
>
> Please consider the environment before printing all or any of this e-mail
>


Re: Multiple queries in one, something similar to a SQL "union"

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 4:03 PM, Avlesh Singh  wrote:

> I have an index with two fields - name and type. I need to perform a search
> on the name field so that *equal number of results are fetched for each
> type
> *.
> Currently, I am achieving this by firing multiple queries with a different
> type and then merging the results.
> In my database driven version, I used to do a "union" of multiple queries
> (and not separate SQL queries) to achieve this.
>
> Can Solr do something similar? If not, can this be a possible enhancement?
>

Not right now. There's an issue open:

https://issues.apache.org/jira/browse/SOLR-1093

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr in distributed mode

2009-06-09 Thread Mark Miller

Rakhi Khatwani wrote:

Hi,
I was looking for ways in which we can use Solr in distributed mode.
Is there any way we can use Solr indexes across machines, or by using the
Hadoop Distributed File System?

It has been mentioned in the wiki that
When an index becomes too large to fit on a single system, or when a single
query takes too long to execute, an index can be split into multiple shards,
and Solr can query and merge results across those shards.

What I understand is that a shard is a partition. Are shards on the same
machine, or can they be on different machines? Do we have to manually
split the indexes to store them in different shards?

Do you have an example or some tutorial which demonstrates distributed index
searching/storing using shards?

Regards,
Raakhi

  
You might check out this article to get an idea of how Solr scales (lots
of extra Lucene stuff in there too, just skip around):

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

You can also check out the wiki: 
http://wiki.apache.org/solr/DistributedSearch
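
In short, you split the index across separate Solr instances yourself and
then send the query to any one of them with a shards parameter listing all
of them, e.g. (hosts are examples):

  http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=ipod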


Also see:

Solr 1.4 : http://wiki.apache.org/solr/SolrReplication
Solr 1.3,1.4: http://wiki.apache.org/solr/CollectionDistribution

--
- Mark

http://www.lucidimagination.com





Re: solr distributed search example - exception

2009-06-09 Thread Mark Miller

Thanks for bringing closure to this Raakhi.

- Mark

Rakhi Khatwani wrote:

Hi Mark,
 I actually got this error because I was using an old version of
Java. Now the problem is solved.

Thanks anyways
Raakhi

On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani  wrote:

  

Hi Mark,
Yeah, I would like to open a JIRA issue for it. How do I go about
that?

Regards,
Raakhi



On Mon, Jun 8, 2009 at 7:58 PM, Mark Miller  wrote:



That is a very odd cast exception to get. Do you want to open a JIRA issue
for this?

It looks like an odd exception because the call is:

  NodeList nodes = (NodeList)solrConfig.evaluate(configPath,
XPathConstants.NODESET); // cast exception if we get an ArrayList rather
than NodeList

Which leads to:

Object o = xpath.evaluate(xstr, doc, type);

where type = XPathConstants.NODESET

So you get back an Object based on the XPathConstant passed. There does
not appear to be a value that would return an ArrayList.
Using XPathConstants.NODESET gets you a NodeList according to the XPath
API.

I'm not sure what could cause this to happen.

- Mark


Rakhi Khatwani wrote:

  

Hi,
I was executing a simple example which demonstrates
DistributedSearch.
example provided in the following link:

 http://wiki.apache.org/solr/DistributedSearch

however, when i startup the server in both port nos: 8983 and 7574, i get
the following exception:

SEVERE: Could not start SOLR. Check solr/home property
java.lang.ClassCastException: java.util.ArrayList cannot be cast to
org.w3c.dom.NodeList
  at

org.apache.solr.search.CacheConfig.getMultipleConfigs(CacheConfig.java:61)
  at org.apache.solr.core.SolrConfig.(SolrConfig.java:131)
  at org.apache.solr.core.SolrConfig.(SolrConfig.java:70)
  at

org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
  at

org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
  at

org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
  at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
  at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at

org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
  at org.mortbay.jetty.Server.doStart(Server.java:210)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
  at java.lang.reflect.Method.invoke(libgcj.so.7rh)
  at org.mortbay.start.Main.invokeMain(Main.java:183)
  at org.mortbay.start.Main.start(Main.java:497)
  at org.mortbay.start.Main.main(Main.java:115)
2009-06-08 18:36:28.016::WARN:  failed SolrRequestFilter
java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore
  at java.lang.Class.initializeClass(libgcj.so.7rh)
  at

org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:77)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
  at

org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
  at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
  at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at

org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at

org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
  at org.mortbay.jetty.Server.doStart(Server.java:210)
  at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.xml.XmlConfiguratio

Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread gateway0

Noticed this "warning" in the log file:
"
Jun 9, 2009 2:53:35 PM
org.apache.solr.handler.dataimport.TemplateTransformer transformRow
WARNING: Unable to resolve variable: dbA.project.id while parsing
expression: ${dbA.project.dbA.project},id:${dbA.project.id}
"

OK? What's that supposed to mean?



gateway0 wrote:
> 
> No I changed the entity name to "dbA:project" but still the same problem.
> 
> Interesting sidenote If I use my Data-Config as posted (with the "id"
> field in the comment section) none of the other entities works anymore
> like for example:
> "
> entity name="user" dataSource="dbA" query="select username from
>  ci_user where userid='${dbA.project.created_by}' ">
> 
>   
> "
> returns an empty result.
> 
> Still can't figure out why I can't use the (SQL) table's primary key
> - once to save it in the index directly and
> - a second time to query against my comment table
> 
> 
> 
> 
> 
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>> 
>> Can you avoid the "." dots in the entity name and try it out? Dots are
>> special characters and could have caused the problem.
>> 
>> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:
>>>
>>> Ok here it goes:
>>> "
>>> 
>>> 
>>>  >> driver="com.mysql.jdbc.Driver"
>>> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
>>> user="root" password=""/>
>>>  
>>>    >> transformer="TemplateTransformer" query="select *, 'dbA.project' from
>>> project">
>>>      
>>>      >> template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
>>>      
>>>      
>>>      
>>>        
>>>      
>>>      
>>>      
>>>        
>>>      
>>>      >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>>      >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
>>>      
>>>        
>>>      
>>>      
>>>        
>>>      
>>>      
>>>      
>>>    
>>>  
>>> 
>>> "
>>> The name of the database is "dbA" and the table name is "project".
>>>
 Everything works out fine except the comment part highlighted (bold).
 That
 works too, as I stated, if I change the phrase to:
>>> "
>>> 
>>>        
>>> 
>>> "
 so that I don't use my primary key "id" twice, but the problem is I need
 to
 use "id" for the comment part too.
>>>
>>> kind regards, Sebastian
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>
> Thanks for your answer.
>
> "${db.tableA.id}" that specifies the sql query that the
> Dataimporthandler
> should Use the sql field "id" in table "tableA" located in Database
> "db".

 The naming convention does not work like that.

 if the entity name is 'tableA' then the field 'id' is addressed as
 'tableA.id'

 As I said earlier, if you could provide me with the entire
 data-config.xml it would be more helpful

>
> like in the example from the Solr Wiki:
> "
> 
> "
>
> It's strange, I know, but when I use something other than "id" as the
> foreign
> key for the query, everything works!
>
> like:
> "${db.tableA.anotherid}"
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> what is ${db.tableA.id} ?
>>
>> I think there is something extra in that
>>
>> can you paste the whole data-config.xml?
>>
>> can you paste
>>
>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:
>>>
>>> Hi,
>>>
>>> I tried to do the following:
>>>
>>> "
>>> 
>>>
>>> 
>>>        
>>> 
>>> "
>>>
>>> So I use the SQL table field "id" twice: once for "db_id" in my index
>>> and once
>>> for
>>> the sql query as "fid=id".
>>>
>>> That doesn't work!
>>>
>>> But when I change the query from "fid=id" to something like
>>> "fid=otherkey" it does
>>> work!
>>> Like:
>>> "
>>> 
>>>
>>> 
>>>        
>>> 
>>> "
>>>
>>> Is there any other kind of a workaround so I can use the SQL Field
>>> "id"
>>> twice as I wanted to? Thanks
>>>
>>> kind regards, Sebastian
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p239382

Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread gateway0

Noticed this "warning" in the log file:
"
Jun 9, 2009 2:53:35 PM
org.apache.solr.handler.dataimport.TemplateTransformer transformRow
WARNING: Unable to resolve variable: dbA.project.id while parsing
expression: ${dbA.project.dbA.project},id:${dbA.project.id}
"

OK? What's that supposed to mean?

And yes, I replaced the dots (with ":") in the entity names as you
suggested. Still no change.





Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> There should be no problem if you re-use the same variable
> 
> are you sure you removed the dots from everywhere?
> 
> 
> On Tue, Jun 9, 2009 at 2:55 PM, gateway0 wrote:
>>
>> No I changed the entity name to "dbA:project" but still the same problem.
>>
>> Interesting sidenote If I use my Data-Config as posted (with the "id"
>> field
>> in the comment section) none of the other entities works anymore like for
>> example:
>> "
>> entity name="user" dataSource="dbA" query="select username from
>>  ci_user where userid='${dbA.project.created_by}' ">
>>        
>>      
>> "
>> returns an empty result.
>>
>> Still can't figure out why I can't use the (SQL) table's primary key
>> - once to save it in the index directly and
>> - a second time to query against my comment table
>>
>>
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> Can you avoid the "." dots in the entity name and try it out? Dots are
>>> special characters and could have caused the problem.
>>>
>>> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:

 Ok here it goes:
 "
 
 
  >>> driver="com.mysql.jdbc.Driver"
 url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
 user="root" password=""/>
  
    >>> transformer="TemplateTransformer" query="select *, 'dbA.project' from
 project">
      
      >>> template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
      
      
      
        
      
      
      
        
      
      >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
      >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
      
        
      
      
        
      
      
      
    
  
 
 "
 The name of the database is "dbA" and the table name is "project".

 Everything works out fine except the comment part highlighted (bold).
 That
 works too, as I stated, if I change the phrase to:
 "
 
        
 
 "
 so that I don't use my primary key "id" twice, but the problem is I need
 to
 use "id" for the comment part too.

 kind regards, Sebastian


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>
> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>>
>> Thanks for your answer.
>>
>> "${db.tableA.id}" that specifies the sql query that the
>> Dataimporthandler
>> should Use the sql field "id" in table "tableA" located in Database
>> "db".
>
> The naming convention does not work like that.
>
> if the entity name is 'tableA' then the field 'id' is addressed as
> 'tableA.id'
>
> As I said earlier, if you could provide me with the entire
> data-config.xml it would be more helpful
>
>>
>> like in the example from the Solr Wiki:
>> "
>> 
>> "
>>
>> It's strange, I know, but when I use something other than "id" as the
>> foreign
>> key for the query, everything works!
>>
>> like:
>> "${db.tableA.anotherid}"
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> what is ${db.tableA.id} ?
>>>
>>> I think there is something extra in that
>>>
>>> can you paste the whole data-config.xml?
>>>
>>> can you paste
>>>
>>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0
>>> wrote:

 Hi,

 I tried to do the following:

 "
 

 
        
 
 "

 So I use the SQL table field "id" twice: once for "db_id" in my
 index
 and once
 for
 the sql query as "fid=id".

 That doesn't work!

 But when I change the query from "fid=id" to something like
 "fid=otherkey" it does
 work!
 Like:
 "
 

 
        
 
 "

 Is there any other kind of a workaround so I can use the SQL Field
 "id"
 twice as I wanted to? Thanks

 kind regards, Sebastian
 --
 View this message in context:
 http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this mess

Re: Lucene2.9-dev version in Solr nightly-build and FieldCache memory usage

2009-06-09 Thread Yonik Seeley
Yep.  CHANGES.txt for Solr has this:
34. Upgraded to Lucene 2.9-dev r779312 (yonik)

And if you click the "All" tab for LUCENE-1662, it says the committed
revision was 779277.

-Yonik
http://www.lucidimagination.com



On Tue, Jun 9, 2009 at 5:32 AM, Marc Sturlese  wrote:
>
> Hey there,
> Does the Lucene 2.9-dev used in the current Solr nightly build (9-6-2009)
> include the patch LUCENE-1662 to avoid doubling memory usage in the Lucene
> FieldCache?
> Thanks in advance
> --
> View this message in context: 
> http://www.nabble.com/Lucene2.9-dev-version-in-Solr-nightly-build-and-FieldCache-memory-usage-tp23939495p23939495.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Lucene2.9-dev version in Solr nightly-build and FieldCache memory usage

2009-06-09 Thread Marc Sturlese

Thanks Yonik, I didn't know how to check for the last committed revision.

Yonik Seeley-2 wrote:
> 
> Yep.  CHANGES.txt for Solr has this:
> 34. Upgraded to Lucene 2.9-dev r779312 (yonik)
> 
> And if you click the "All" tab for LUCENE-1662, it says the committed
> revision was 779277.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Tue, Jun 9, 2009 at 5:32 AM, Marc Sturlese 
> wrote:
>>
>> Hey there,
>> Does the Lucene 2.9-dev used in the current Solr nightly build (9-6-2009)
>> include
>> the patch LUCENE-1662 to avoid doubling memory usage in the Lucene
>> FieldCache?
>> Thanks in advance
>> --
>> View this message in context:
>> http://www.nabble.com/Lucene2.9-dev-version-in-Solr-nightly-build-and-FieldCache-memory-usage-tp23939495p23939495.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Lucene2.9-dev-version-in-Solr-nightly-build-and-FieldCache-memory-usage-tp23939495p23943239.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fq vs. q

2009-06-09 Thread Michael Ludwig

Martin Davidsson schrieb:

I've tried to read up on how to decide, when writing a query, what
criteria goes in the q parameter and what goes in the fq parameter, to
achieve optimal performance. Is there [...] some kind of rule of thumb
to help me decide how to split things up when querying against one or
more fields?


This is a good question. I don't know if there is any such rule. I'm
going to sum up my understanding of filter queries hoping that the pros
will point out any flaws in my assumptions.

http://wiki.apache.org/solr/SolrCaching - filterCache

A filter query is cached, which means that it is the more useful the
more often it is repeated. We know how often certain queries arise, or
at least have the means to collect that data - so we know what might be
candidates for filtering.

The result of a filter query is cached and then used to filter a primary
query result using set intersection. If my filter query result comprises
more than 50 % of the entire document collection, its selectivity is
poor. I might need it despite this fact, but it might also be worthwhile
thinking about how to reframe the requirement, allowing for more
efficient filters.

Memory consumption is probably not a great concern here as the cache
stores only document IDs. (And if those are integers, it's just 4 bytes
each.) So having 100 filters containing 100,000 items on average, the
memory consumption increase should be around 40 MB.

By the way, are these document IDs (used in filterCache, documentCache,
queryResultCache) the ones I configure in schema.xml or does Solr map my
IDs to integers in order to ensure efficiency?

A filter query should probably be orthogonal to the primary query, which
means in plain English: unrelated to the primary query. To give an
example, I have a field "category", which is a required field. In the
class of searches where I use a filter on that field, the primary search
is for something entirely different, so in most cases, it will not, or
not necessarily, bias the primary result to any particular distribution
of the category values. I then allow the application to apply filtering
by category, incidentally, using faceting, which is a typical usage
pattern, I guess.
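
To make the split concrete (field names invented): a search within a
user-selected category might be sent as

  q=title:elephant&fq=category:zoology

so the one-off part of the request stays in q while the frequently
repeated, cacheable restriction goes in fq.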

Michael Ludwig


filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig

Common cache configuration parameters include @size ("size" attribute).

http://wiki.apache.org/solr/SolrCaching
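
For reference, these attributes live in solrconfig.xml and look something
like this (the numbers are just the stock example values):

  <filterCache class="solr.LRUCache" size="512" initialSize="512"
               autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
               autowarmCount="256"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>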

For each of the following, does this mean the maximum size of:

* filterCache/@size - filter query results?
* queryResultCache/@size - query results?
* documentCache/@size - documents?

So if I know my tiny documents don't take up much memory (just 500
bytes on average), I'd want to have very different settings for the
documentCache than if I decided to store 10 KB per doc in Solr?

And if I know that only 100 filters are possible, there is no point
raising the filterCache/@size above that threshold?

Given the following three filtering scenarios of (a) x:bla, (b) y:blub,
and (c) x:bla AND y:blub, will I end up with two or three distinct
filters? In other words, may filters be composites or are they
decomposed as far as their number (relevant for @size) is concerned?

Michael Ludwig


Re: Use the same SQL Field in Dataimporthandler twice?

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Jun 9, 2009 at 6:39 PM, gateway0 wrote:
>
> Noticed this "warning" in the log file:
> "
> Jun 9, 2009 2:53:35 PM
> org.apache.solr.handler.dataimport.TemplateTransformer transformRow
> WARNING: Unable to resolve variable: dbA.project.id while parsing
> expression: ${dbA.project.dbA.project},id:${dbA.project.id}
> "
>
> OK? What's that supposed to mean?

This means that you still have dots in the entity name, so
${dbA.project.id} does not get resolved correctly.
>
>
>
> gateway0 wrote:
>>
>> No I changed the entity name to "dbA:project" but still the same problem.
>>
>> Interesting sidenote If I use my Data-Config as posted (with the "id"
>> field in the comment section) none of the other entities works anymore
>> like for example:
>> "
>> entity name="user" dataSource="dbA" query="select username from
>>  ci_user where userid='${dbA.project.created_by}' ">
>>         
>>       
>> "
>> returns an empty result.
>>
>> Still can't figure out why I can't use the (SQL) table's primary key
>> - once to save it in the index directly and
>> - a second time to query against my comment table
>>
>>
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> Can you avoid the "." dots in the entity name and try it out? Dots are
>>> special characters and could have caused the problem.
>>>
>>> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:

 Ok here it goes:
 "
 
 
  >>> driver="com.mysql.jdbc.Driver"
 url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
 user="root" password=""/>
  
    >>> transformer="TemplateTransformer" query="select *, 'dbA.project' from
 project">
      
      >>> template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
      
      
      
        
      
      
      
        
      
      >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
      >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
      
        
      
      
        
      
      
      
    
  
 
 "
 The name of the database is "dbA" and the table name is "project".

 Everything works out fine except the comment part highlighted (bold).
 That
 works to as I stated If I change the phrase to:
 "
 
        
 
 "
 so that I don´t use my primary key "id" twice but the problem is I need
 to
 use "id" for the comment part too.

 kind regards, Sebastian


 Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>
> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>>
>> Thanks for your answer.
>>
>> "${db.tableA.id}" that specifies the sql query that the
>> Dataimporthandler
>> should Use the sql field "id" in table "tableA" located in Database
>> "db".
>
> The naming convention does not work like that.
>
> if the entity name is 'tableA' then the field 'id' is addressed as
> 'tableA.id'
>
> As I said earlier, if you could provide me with the entire
> data-config.xml it would be more helpful
>
>>
>> like in the example from the Solr Wiki:
>> "
>> 
>> "
>>
>> It's strange, I know, but when I use something other than "id" as the
>> foreign
>> key for the query, everything works!
>>
>> like:
>> "${db.tableA.anotherid}"
>>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> what is ${db.tableA.id} ?
>>>
>>> I think there is something extra in that
>>>
>>> can you paste the whole data-config.xml?
>>>
>>> can you paste
>>>
>>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:

 Hi,

 I tried to do the following:

 "
 

 
        
 
 "

 So I use the SQL table field "id" twice: once for "db_id" in my index
 and once
 for
 the sql query as "fid=id".

 That doesn't work!

 But when I change the query from "fid=id" to something like
 "fid=otherkey" it does
 work!
 Like:
 "
 

 
        
 
 "

 Is there any other kind of a workaround so I can use the SQL Field
 "id"
 twice as I wanted to? Thanks

 kind regards, Sebastian
 --
 View this message in context:
 http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html
>> Sent from

Re: Terms Component

2009-06-09 Thread Anshuman Manur
I just got the nightly build, and terms comp works great!!!

merci beaucoup

On Mon, Jun 8, 2009 at 8:00 PM, Aleksander M. Stensby <
aleksander.sten...@integrasco.no> wrote:

> You can try out the nightly build of solr (which is the solr 1.4 dev
> version) containing all the new nice and shiny features of Solr 1.4:)
> To use Terms Component you simply need to configure the handler as
> explained in the documentation / wiki.
>
> Cheers,
>  Aleksander
>
>
>
> On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur <
> anshuman_ma...@stragure.com> wrote:
>
>  while on the subject, can anybody tell me when Solr 1.4 might come out?
>>
>> Thanks
>> Anshuman Manur
>>
>> On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur
>> wrote:
>>
>>  I'm using Solr 1.3 apparently, and Solr 1.4 is not out yet.
>>> Sorry... my mistake!
>>>
>>>
>>> On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur <
>>> anshuman_ma...@stragure.com> wrote:
>>>
>>>  Hello,

 I want to use the terms component in Solr 1.4:

 http://localhost:8983/solr/terms?terms.fl=name


 But, I get the following error with the above query:

 java.lang.NullPointerException
at
 org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
at
 org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
at org.apache.solr.search.QParser.getQuery(QParser.java:88)


at
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)


at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)


at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)


at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)


at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)


at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
at
 org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44)


at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)


at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)


 Any help would be great.

 Thanks
 Anshuman Manur


>>>
>>>
>
>
> --
> Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
>
> Please consider the environment before printing all or any of this e-mail
>


Initializing Solr Example

2009-06-09 Thread Mukerjee, Neiloy (Neil)
In trying to run the example distributed with Solr 1.3.0 from the command line, 
the process seems to stop at the following line:
INFO: [] Registered new searcher searc...@147c1db main

The searcher ID is not always the same, but it repeatedly gets caught at this 
line. Any suggestions?


Re: filter on millions of IDs from external query

2009-06-09 Thread Michael Ludwig

Ryan McKinley wrote:

I am working with an in index of ~10 million documents.  The index
does not change often.

I need to perform some external search criteria that will return some
number of results -- this search could take up to 5 mins and return
anywhere from 0-10M docs.


If it really takes so long, then something is likely wrong. You might be
able to achieve a significant improvement by reframing your requirement.


I would like to use the output of this long running query as a filter
in solr.

Any suggestions on how to wire this all together?


Just use it as a filter query. The result will be cached, the query
won't have to be executed again (if I'm not mistaken) until a new index
searcher is opened (after an index update and a commit), or until the
filter query result is evicted from the cache, which you should make
sure won't happen if your query really is so terribly expensive.
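
A minimal illustration (field name and value assumed):

  http://localhost:8983/solr/select?q=ryan&fq=category:books

The document set for fq=category:books goes into the filterCache, and any
later request carrying the same fq reuses it instead of re-running the
filter.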

Michael Ludwig


Re: Solr relevancy score - conversion

2009-06-09 Thread Matt Weber
Solr does not support this.  You can do it yourself by taking the  
highest score and using that as 100% and calculating other percentages  
from that number.  For example if the max score is 10 and the next  
result has a score of 5, you would do (5 / 10) * 100 = 50%.  Hope this  
helps.


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On Jun 8, 2009, at 10:05 PM, Vijay_here wrote:



Hi,

I am using Solr to index some legal documents, where I need the Solr
search engine to return a relevancy ranking score for each search result.
As of now I am getting scores like 3.12, 1.23, 0.23 and so on.

I would need a more proportionate score, scaled to 100% (95% relevant,
80% relevant and so on). Is there a way to make Solr return such relevance
scores? Any other approach to arrive at these scores would also be
appreciated.

thanks
vijay
--
View this message in context: 
http://www.nabble.com/Solr-relevancy-score---conversion-tp23936413p23936413.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Field Compression

2009-06-09 Thread Michael Ludwig

Fer-Bj wrote:

for all the documents we have a field called "small_body", which is a
60-chars-max text field where we store the "abstract" for each
article.



we need to display this small_body, which we want to compress, every time.


If this works like compressing individual files, the overhead for just
60 characters (which may be no more than 60 bytes) may mean that any
attempt at compression results in inflation.

On the other hand, if lower-level units (pages) are compressed (as
opposed to individual fields), then I don't know what sense a
configurable compression threshold might make.

Maybe one of the pros can clarify.


Last question: what's the best way to determine the compress
threshold ?


One fairly obvious way would be to index the same set of documents
twice, with compression and then without, and then to compare the index
size on disk. If you don't save, say, five or ten percent (YMMV), it
might not be worth the effort.
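
A sketch of the two variants in schema.xml, assuming the 1.3-era
"compressed" field option (the field definition is otherwise made up):

  <field name="small_body" type="text" indexed="true" stored="true"
         compressed="true"/>
  <field name="small_body" type="text" indexed="true" stored="true"/>

Index the same documents against each and compare the size of the data
directory on disk.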

Michael Ludwig


Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig

Yao Ge wrote:


The facet query is considerably slower compared to other facets from
structured database fields (with highly repeated values). What I found
interesting is that even after I constrained search results to just a
few hundred hits using other facets, these text facets are still very
slow.

I understand that text fields are not good candidates for faceting, as
they can contain a very large number of unique values. However, why is it
still slow after my matching documents are reduced to hundreds? Is it
because the whole filter is cached (regardless of the matching docs) and
I don't have enough filter cache size to fit the whole list?


Very interesting questions! I think an answer would both require and
further an understanding of how filters work, which might even lead to
a more general guideline on when and how to use filters and facets.

Even though faceting appears to have changed in 1.4 vs 1.3, it would
still be interesting to understand the 1.3 side of things.


Lastly, what I really want is to give the user a chance to visualize
and filter on the top relevant words in the free-text fields. Are there
alternatives to the facet field approach? Term vectors? I can do
client-side processing based on the top N (say 100) hits for this, but it
is my last option.


Also a very interesting data mining question! I'm sorry I don't have any
answers for you. Maybe someone else does.

Best,

Michael Ludwig


Solr update performance decrease after a while

2009-06-09 Thread Vincent Pérès

Hello,

We are indexing approximately 500 documents per day. My benchmark says an
update is done in 0.7 sec just after Solr has been started, but it quickly
degrades to 2.2 secs per update!
I have been focused on the schema until now, and haven't changed much
in the solrconfig file. Maybe you have some tips which could help me
keep update times more linear?

Thanks a lot
Vincent
-- 
View this message in context: 
http://www.nabble.com/Solr-update-performance-decrease-after-a-while-tp23945947p23945947.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trie Patches- Backportable?

2009-06-09 Thread Amit Nithian
I take it by the deafening silence that this is not possible? :-)

On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian  wrote:

> Hi,
> I am still using Solr 1.2 with the Lucene 2.2 that came with that version
> of Solr. I am interested in taking advantage of the trie filtering to
> alleviate some performance problems and was wondering how back-portable
> these patches are?
>
> I am also trying to understand how the Trie algorithm cuts down the number
> of term queries compared to a normal range query. I was at the recent Bay
> Area lucene/solr meetup where this was covered but missed some of the
> details.
>
> I know the ideal case is to upgrade to a newer Solr/Lucene but we are
> resource constrained and can't devote the time right now to test and upgrade
> our production systems to a newer Solr.
>
> Thanks!
> Amit
>


Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig

Yonik Seeley wrote:

Are you using Solr 1.3?
You might want to try the latest 1.4 test build -
faceting has changed a lot.


I found two significant changes (but there may well be more):

[#SOLR-911] multi-select facets - ASF JIRA
https://issues.apache.org/jira/browse/SOLR-911

Yao,

it sounds like the following (which is in 1.4) might have a chance of
helping your faceting performance issue:

[#SOLR-475] multi-valued faceting via un-inverted field - ASF JIRA
https://issues.apache.org/jira/browse/SOLR-475

Yonik,

from your initial comment for SOLR-475:

| * To save space and speed up faceting, any term that matches enough
| * documents will not be un-inverted... it will be skipped while
| * building the un-inverted field structore, and will use a set
| * intersection method during faceting.

Does this mean that frequently occurring terms (which we can use for
faceting in 1.3 without problems) are handled exactly as they were
before, by allocating a slot in the filter cache upon request, while
those zillions of pesky little fringe terms outside the mainstream,
for which allocating a slot in the filter cache would be overkill
(and possibly cause inefficient contention, eviction, and, hence,
a performance penalty) are now handled by the new structure mapping
documents to term numbers?

So doing faceting for a given set of documents would result in (a) doing
set intersection using those filter query results that have been set up
(for the terms occurring in many documents), and (b) collecting all the
pesky little terms from the new structure mapping documents to term
numbers?

So basically, depending on expediency, you (a) know the facets and count
the documents which display them, or you (b) take the documents and see
what facets they have?

Michael Ludwig


Re: Trie Patches- Backportable?

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 10:19 PM, Amit Nithian  wrote:

> I take it by the deafening silence that this is not possible? :-)
>

Anything is possible :)

However, it might be easier to upgrade to 1.4 instead.


>
> On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian  wrote:
>
> > Hi,
> > I am still using Solr 1.2 with the Lucene 2.2 that came with that version
> > of Solr. I am interested in taking advantage of the trie filtering to
> > alleviate some performance problems and was wondering how back-portable
> > these patches are?
> >
>

Trie is a new functionality. It does have a few dependencies on new Lucene
APIs (TokenStream/TermAttribute etc.). On the Solr side I think it'd be
easier.


>
> > I am also trying to understand how the Trie algorithm cuts down the
> number
> > of term queries compared to a normal range query. I was at the recent Bay
> > Area lucene/solr meetup where this was covered but missed some of the
> > details.
> >
>

See the javadocs. It has the link to the paper in which it is described in
more detail.

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-queries/org/apache/lucene/search/trie/package-summary.html
-- 
Regards,
Shalin Shekhar Mangar.


Re: User Credentials for Solr Data Dir

2009-06-09 Thread Walter Underwood
I do not recommend using network storage for indexes. This is almost always
extremely slow. When I tried it, indexing ran 100X slower.

If you don't mind terrible performance, configure your NT service to
run as a specific user. The default user is one that has almost no
privileges. Create a new user, perhaps "solr", give that user the desired
privs, and configure the service to run as that user.
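
(For example, from a command prompt -- service and account names assumed;
note that sc requires the space after obj= and password=:

  sc config solr obj= ".\solr" password= "yourpassword"

then restart the service.)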

But you should still use local disk.

wunder

On 6/9/09 1:55 AM, "Noble Paul നോബിള്‍  नोब्ळ्" 
wrote:

> nope
> 
> On Tue, Jun 9, 2009 at 4:59 AM, vaibhav joshi wrote:
>> 
>> Hi,
>> 
>> I am currently using Solr 1.3 and running Solr as an NT service. I need to
>> store data indexes on a remote filer machine. The filer needs user
>> credentials in order to access it. Is there a Solr configuration which
>> I can use to pass these credentials?
>> 
>> I was reading some blogs and they suggested running the NT service as a user
>> who can access the needed resource. But I need to use the existing build and
>> deploy tools in the company, and they always run the NT service as "LOCAL
>> SYSTEM", which cannot access other resources.
>> 
>> That's why I am trying to explore whether it's possible to pass these
>> credentials via JNDI/system variables? Is it possible?
>>  
>> Thanks
>> 
>> Vaibhav





Re: fq vs. q

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig  wrote:

>
> http://wiki.apache.org/solr/SolrCaching - filterCache
>
> A filter query is cached, which means that it is the more useful the
> more often it is repeated. We know how often certain queries arise, or
> at least have the means to collect that data - so we know what might be
> candidates for filtering.


Correct.


> The result of a filter query is cached and then used to filter a primary
> query result using set intersection. If my filter query result comprises
> more than 50 % of the entire document collection, its selectivity is
> poor. I might need it despite this fact, but it might also be worth
> while thinking about how to reframe the requirement, allowing for more
> efficient filters.


Correct.


> Memory consumption is probably not a great concern here as the cache
> stores only document IDs. (And if those are integers, it's just 4 bytes
> each.) So having 100 filters containing 100,000 items on average, the
> memory consumption increase should be around 40 MB.
>

A lot of times it is stored as a bitset, so the memory requirements may be
even lower.


>
> By the way, are these document IDs (used in filterCache, documentCache,
> queryResultCache) the ones I configure in schema.xml, or does Solr map my
> IDs to integers in order to ensure efficiency?
>

These are internal doc ids assigned by Lucene.


> A filter query should probably be orthogonal to the primary query, which
> means in plain English: unrelated to the primary query. To give an
> example, I have a field "category", which is a required field. In the
> class of searches where I use a filter on that field, the primary search
> is for something entirely different, so in most cases, it will not, or
> not necessarily, bias the primary result to any particular distribution
> of the category values. I then allow the application to apply filtering
> by category, incidentally, using faceting, which is a typical usage
> pattern, I guess.
>

Yes and no. There are use-cases where the query is applicable only to the
filtered set. For example, when the same index contains many different
"types" of documents. It is just that the intersection may need to do more
or less work.
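
For example (type values assumed), in

  q=title:solr&fq=type:article

the query only makes sense within the filtered subset, yet both parts are
still computed over the entire index and then intersected.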

-- 
Regards,
Shalin Shekhar Mangar.


Re: statistics about word distances in solr

2009-06-09 Thread Michael Ludwig

Hi Jens,

Jens Fischer wrote:

I was wondering if there's an option to return statistics about
distances from the query terms to the most frequent terms in the
result documents.



The additional information I'm looking for is the average distance
between these terms and my search term.

So let's say I have two docs

"the house is red"
"I live in a red house"

The search for "house" should also return the info

the:1
is:1
red:1.5
I:5
live:4


Could you explain what the "distance" here is? Something like "edit
distance"? Ah, I see: You want the textual distance between the search
term and other terms in the document, and then you want that averaged,
i.e. the cumulative distance divided by the number of occurrences.

No idea if that functionality is available.

However, the sort of calculation you want to perform requires the engine
to not only collect all the terms to present as facets (much improved in
1.4, as I've just learned), but to also analyze each document (if I'm
not mistaken) to determine the distance for each facet term from your
primary query term. (Or terms.)

The number of lookup operations is likely to scale as the product of
the number of your primary search results, the number of your search
terms, and the number of your facets.

I assume this is an expensive operation.

Michael Ludwig


Re: fq vs. q

2009-06-09 Thread Michael Ludwig

Shalin Shekhar Mangar wrote:

On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig 
wrote:



A filter query should probably be orthogonal to the primary query,
which means in plain English: unrelated to the primary query. To give
an example, I have a field "category", which is a required field. In
the class of searches where I use a filter on that field, the primary
search is for something entirely different, so in most cases, it will
not, or not necessarily, bias the primary result to any particular
distribution of the category values. I then allow the application to
apply filtering by category, incidentally, using faceting, which is a
typical usage pattern, I guess.


Yes and no. There are use-cases where the query is applicable only to
the filtered set. For example, when the same index contains many
different "types" of documents. It is just that the intersection may
need to do more or less work.


Sorry, I don't understand. I used to think that the engine applies the
filter to the primary query result. What you're saying here sounds as if
it could also pre-filter my document collection to then apply a query to
it (which should yield the same result). What does it mean that "the
query is applicable only to the filtered set"?

And thanks for having clarified the other points!

Michael Ludwig


Re: filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig  wrote:

> Common cache configuration parameters include @size ("size" attribute).
>
> http://wiki.apache.org/solr/SolrCaching
>
> For each of the following, does this mean the maximum size of:
>
> * filterCache/@size - filter query results?


Maximum number of filters that can be cached.


> * queryResultCache/@size - query results?


Maximum number of queries (DocLists) that can be cached.


> * documentCache/@size - documents?


Correct.


> So if I know my tiny documents don't take up much memory (just 500
> Bytes on average), I'd want to have very different settings for the
> documentCache than if I decided to store 10 KB per doc in Solr?


Correct.


> And if I know that only 100 filters are possible, there is no point
> raising the filterCache/@size above that threshold?


Correct. Faceting also uses the filterCache so keep that in mind too.


> Given the following three filtering scenarios of (a) x:bla, (b) y:blub,
> and (c) x:bla AND y:blub, will I end up with two or three distinct
> filters? In other words, may filters be composites or are they
> decomposed as far as their number (relevant for @size) is concerned?
>

It will be three. If you want to cache separately, send them as separate fq
parameters.
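
That is, with the fields from your example:

  fq=x:bla&fq=y:blub    -> two filterCache entries, reusable independently
  fq=x:bla AND y:blub   -> one composite entry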

-- 
Regards,
Shalin Shekhar Mangar.


Re: fq vs. q

2009-06-09 Thread Shalin Shekhar Mangar
On Tue, Jun 9, 2009 at 11:11 PM, Michael Ludwig  wrote:

>
> Sorry, I don't understand. I used to think that the engine applies the
> filter to the primary query result. What you're saying here sounds as if
> it could also pre-filter my document collection to then apply a query to
> it (which should yield the same result). What does it mean that "the
> query is applicable only to the filtered set"?
>

Sorry for not being clear. No, both filters and queries are computed on the
entire index.

My comment was related to the "A filter query should probably be orthogonal
to the primary query..." part. I meant that both kinds of use-cases are
common.

-- 
Regards,
Shalin Shekhar Mangar.


Re: filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig

Shalin Shekhar Mangar wrote:

On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig 
wrote:



Given the following three filtering scenarios of (a) x:bla, (b)
y:blub, and (c) x:bla AND y:blub, will I end up with two or three
distinct filters? In other words, may filters be composites or are
they decomposed as far as their number (relevant for @size) is
concerned?


It will be three. If you want to cache separately, send them as
separate fq parameters.


Thanks a lot for clarifying all my questions.

Michael Ludwig


Re: fq vs. q

2009-06-09 Thread Michael Ludwig

Shalin Shekhar Mangar wrote:


No, both filters and queries are computed on the entire index.

My comment was related to the "A filter query should probably be
orthogonal to the primary query..." part. I meant that both kinds of
use-cases are common.


Got it. Thanks :-)

Michael Ludwig


ExtractingRequestHandler and local files

2009-06-09 Thread doraiswamy thirumalai

Hi,

 

I would greatly appreciate a quick response to this question.
 
Is there a means of passing a local file to the ExtractingRequestHandler (as 
the enableRemoteStreaming/stream.file option does with the other handlers) so 
the file contents can directly be read from the local disk versus going over 
HTTP?
 
Per the Solr wiki entry for ExtractingRequestHandler, enableRemoteStreaming is 
not used?
 
This is also a tad confusing because the Ruby example off:
http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
explicitly recommends setting this parameter?
 
Thanks!



Re: ExtractingRequestHandler and local files

2009-06-09 Thread Grant Ingersoll
I haven't tried it, but I thought the enableRemoteStreaming stuff  
should work.  That stuff is handled by Solr in other places, if I  
recall correctly.  Have you tried it?
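
Something along these lines should exercise it (path and literal field are
placeholders, and the exact parameter names may differ between builds;
remote streaming must be enabled in solrconfig.xml):

  curl "http://localhost:8983/solr/update/extract?stream.file=/tmp/doc.pdf&literal.id=doc1&commit=true"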


-Grant

On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:



Hi,



I would greatly appreciate a quick response to this question.

Is there a means of passing a local file to the  
ExtractingRequestHandler (as the enableRemoteStreaming/stream.file  
option does with the other handlers) so the file contents can  
directly be read from the local disk versus going over HTTP?


Per the Solr wiki entry for ExtractingRequestHandler,  
enableRemoteStreaming is not used?


This is also a tad confusing because the Ruby example off:
http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
explicitly recommends setting this parameter?

Thanks!







--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Configure Collection Distribution in Solr 1.3

2009-06-09 Thread MaheshR

Hi Aleksander,


I went through the links below and successfully configured rsync using
Cygwin on Windows XP. The Solr documentation mentions many script files
like rsyncd-enable, snapshooter, etc. These are all Unix shell scripts.
Where do I get these script files for Windows?

Any help on this would be great helpful.

Thanks
MaheshR.



Aleksander M. Stensby wrote:
> 
> You'll find everything you need in the Wiki.
> http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline
> 
> http://wiki.apache.org/solr/SolrCollectionDistributionScripts
> 
> If things are still uncertain, I've written a guide from when we used the  
> Solr distribution scripts on our Lucene index earlier. You can read that  
> guide here:
> http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53
> 
> Cheers,
>   Aleksander
> 
> 
> On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR   
> wrote:
> 
>>
>> Hi,
>>
>> we configured multi-core solr 1.3 server in Tomcat 6.0.18 servlet  
>> container.
>> Its working great. Now I need to configure collection Distribution to
>> replicate indexing data between master and 2 slaves. Please provide me  
>> step
>> by step instructions to configure collection distribution between master  
>> and
>> slaves would be helpful.
>>
>> Thanks in advance.
>>
>> Thanks
>> Mahesh.
> 
> 
> 
> -- 
> Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
> 
> Please consider the environment before printing all or any of this e-mail
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Configure-Collection-Distribution-in-Solr-1.3-tp23927332p23949324.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Faceting on text fields

2009-06-09 Thread Yonik Seeley
Yep, all that sounds right.
An additional optimization counts terms for the documents *not* in the
set when the base set is over half the size of the index.

-Yonik
http://www.lucidimagination.com


On Tue, Jun 9, 2009 at 1:01 PM, Michael Ludwig  wrote:
> Yonik,
>
> from your initial comment for SOLR-475:
>
> | * To save space and speed up faceting, any term that matches enough
> | * documents will not be un-inverted... it will be skipped while
> | * building the un-inverted field structore, and will use a set
> | * intersection method during faceting.
>
> Does this mean that frequently occurring terms (which we can use for
> faceting in 1.3 without problems) are handled exactly as they were
> before, by allocating a slot in the filter cache upon request, while
> those zillions of pesky little fringe terms outside the mainstream,
> for which allocating a slot in the filter cache would be overkill
> (and possibly cause inefficient contention, eviction, and, hence,
> a performance penalty) are now handled by the new structure mapping
> documents to term numbers?
>
> So doing faceting for a given set of documents would result in (a) doing
> set intersection using those filter query results that have been set up
> (for the terms occurring in many documents), and (b) collecting all the
> pesky little terms from the new structure mapping documents to term
> numbers?
>
> So basically, depending on expediency, you (a) know the facets and count
> the documents which display them, or you (b) take the documents and see
> what facets they have?
>
> Michael Ludwig
>


Re: Initializing Solr Example

2009-06-09 Thread Grant Ingersoll
Define caught?  When I start up Solr, here's what I see (and know it's  
working):

2009-06-09 15:18:33.726::INFO:  Started SocketConnector @ 0.0.0.0:8983
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={q=static+firstSearcher+warming 
+query+from+solrconfig.xml} hits=0 status=0 QTime=30
Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener  
newSearcher

INFO: QuerySenderListener done.
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher

INFO: Loading spell index for spellchecker: default
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher

INFO: Loading spell index for spellchecker: jarowinkler
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher

INFO: Loading spell index for spellchecker: file
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@f7378ab main

What happens if you browse to http://localhost:8983/solr/admin?  Or,  
what happens if you index documents?


Granted, the message could probably be clearer that "Solr is ready to  
go"


HTH,
Grant

On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote:

In trying to run the example distributed with Solr 1.3.0 from the  
command line, the process seems to stop at the following line:

INFO: [] Registered new searcher searc...@147c1db main

The searcher ID is not always the same, but it repeatedly gets  
caught at this line. Any suggestions?


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Fetching Dynamic Fields

2009-06-09 Thread Erik Hatcher
One option is to hit the Luke request handler (&numTerms=0 for best  
performance), grab all the field names there, then build the fl list  
(or facet.field in the cases I've used this trick for) from the fields  
with the prefix you desire.
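
For example, with the stock example port:

  http://localhost:8983/solr/admin/luke?numTerms=0

returns every field name in the index, dynamic ones included, without the
expensive per-field term statistics.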


Erik

On Jun 8, 2009, at 11:40 AM, Manepalli, Kalyan wrote:


Hi all,
   Is there a way to select all the dynamic fields in the fl  
field without using *. Here is what I am looking for.

Fields in the schema, locationName_*, locationId,description,content.
I want to select just the locationName_* and locationId. How can I  
do this without using fl=*, coz I don't want to fetch all the other  
fields.


Any suggestions in this regard will be helpful.

Thanks,
Kalyan Manepalli





Re: Faceting on text fields

2009-06-09 Thread Yao Ge

Michael,

Thanks for the update! I definitely need to get a 1.4 build and see if it
makes a difference.

BTW, maybe instead of using faceting for text
mining/clustering/visualization purposes, we could build a separate feature
in Solr for this. Many of the commercial search engines I have experience
with (Google Search Appliance, Vivisimo, etc.) provide dynamic term
clustering based on the top N ranked documents (N is a configurable
parameter). When a facet field is highly fragmented (say, a text field),
the existing set-intersection-based approach might no longer be optimal.
Aggregating term vectors over the top N docs might be more attractive.
Another feature I would really appreciate is search-time n-gram term
clustering. Maybe this is better suited to the "spell checker", as it is
just a different way to display alternative search terms.

-Yao


Michael Ludwig-4 wrote:
> 
> Yao Ge wrote:
> 
>> The facet query is considerably slower compared to other facets from
>> structured database fields (with highly repeated values). What I found
>> interesting is that even after I constrained search results to just a
>> few hundred hits using other facets, these text facets are still very
>> slow.
>>
>> I understand that text fields are not good candidates for faceting, as
>> they can contain a very large number of unique values. However, why is it
>> still slow after my matching documents are reduced to hundreds? Is it
>> because the whole filter is cached (regardless of the matching docs) and
>> I don't have enough filter cache size to fit the whole list?
> 
> Very interesting questions! I think an answer would both require and
> further an understanding of how filters work, which might even lead to
> a more general guideline on when and how to use filters and facets.
> 
> Even though faceting appears to have changed in 1.4 vs 1.3, it would
> still be interesting to understand the 1.3 side of things.
> 
>> Lastly, what I really want is to give the user a chance to visualize
>> and filter on the top relevant words in the free-text fields. Are there
>> alternatives to the facet field approach? Term vectors? I can do
>> client-side processing based on the top N (say 100) hits for this, but it
>> is my last option.
> 
> Also a very interesting data mining question! I'm sorry I don't have any
> answers for you. Maybe someone else does.
> 
> Best,
> 
> Michael Ludwig
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Initializing Solr Example

2009-06-09 Thread Mukerjee, Neiloy (Neil)
After that comes up in the command line, I can access the localhost address, 
but I can't enter anything on the command line. 

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, June 09, 2009 3:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Initializing Solr Example

Define caught?  When I start up Solr, here's what I see (and know it's  
working):
2009-06-09 15:18:33.726::INFO:  Started SocketConnector @ 0.0.0.0:8983
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={q=static+firstSearcher+warming 
+query+from+solrconfig.xml} hits=0 status=0 QTime=30
Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener  
newSearcher
INFO: QuerySenderListener done.
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: default
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: jarowinkler
Jun 9, 2009 3:18:33 PM  
org.apache.solr.handler.component.SpellCheckComponent 
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: file
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@f7378ab main

What happens if you browse to http://localhost:8983/solr/admin?  Or,  
what happens if you index documents?

Granted, the message could probably be clearer that "Solr is ready to  
go"

HTH,
Grant

On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote:

> In trying to run the example distributed with Solr 1.3.0 from the  
> command line, the process seems to stop at the following line:
> INFO: [] Registered new searcher searc...@147c1db main
>
> The searcher ID is not always the same, but it repeatedly gets  
> caught at this line. Any suggestions?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Initializing Solr Example

2009-06-09 Thread Grant Ingersoll
Solr is a server running in the Jetty web container and accepting  
requests over HTTP.  There is no command line tool, at least not in  
Solr itself, for interacting with Solr.   Typically people interact  
with it programmatically or via a Web Browser.


I'd start by walking through: http://lucene.apache.org/solr/tutorial.html 
 to familiarize yourself with Solr.


-Grant



On Jun 9, 2009, at 3:55 PM, Mukerjee, Neiloy (Neil) wrote:

After that comes up in the command line, I can access the localhost  
address, but I can't enter anything on the command line.


-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, June 09, 2009 3:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Initializing Solr Example

Define caught?  When I start up Solr, here's what I see (and know it's
working):
2009-06-09 15:18:33.726::INFO:  Started SocketConnector @ 0.0.0.0:8983
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={q=static+firstSearcher+warming
+query+from+solrconfig.xml} hits=0 status=0 QTime=30
Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener
newSearcher
INFO: QuerySenderListener done.
Jun 9, 2009 3:18:33 PM
org.apache.solr.handler.component.SpellCheckComponent
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: default
Jun 9, 2009 3:18:33 PM
org.apache.solr.handler.component.SpellCheckComponent
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: jarowinkler
Jun 9, 2009 3:18:33 PM
org.apache.solr.handler.component.SpellCheckComponent
$SpellCheckerListener newSearcher
INFO: Loading spell index for spellchecker: file
Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@f7378ab main

What happens if you browse to http://localhost:8983/solr/admin?  Or,
what happens if you index documents?

Granted, the message could probably be clearer that "Solr is ready to
go"

HTH,
Grant

On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote:


In trying to run the example distributed with Solr 1.3.0 from the
command line, the process seems to stop at the following line:
INFO: [] Registered new searcher searc...@147c1db main

The searcher ID is not always the same, but it repeatedly gets
caught at this line. Any suggestions?


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



RE: ExtractingRequestHandler and local files

2009-06-09 Thread doraiswamy thirumalai

Thanks for the quick response, Grant.

 

We tried it and it seems to work.

 

The confusion stemmed from the fact that the wiki states that the parameter is 
not used - there are also comments in the test cases for the handler that say: 

 

//TODO: stop using locally defined fields once stream.file and stream.body 
start working everywhere

 

So I wanted to confirm.

> From: gsing...@apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: ExtractingRequestHandler and local files
> Date: Tue, 9 Jun 2009 14:50:43 -0400
> 
> I haven't tried it, but I thought the enableRemoteStreaming stuff 
> should work. That stuff is handled by Solr in other places, if I 
> recall correctly. Have you tried it?
> 
> -Grant
> 
> On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:
> 
> >
> > Hi,
> >
> >
> >
> > I would greatly appreciate a quick response to this question.
> >
> > Is there a means of passing a local file to the 
> > ExtractingRequestHandler (as the enableRemoteStreaming/stream.file 
> > option does with the other handlers) so the file contents can 
> > directly be read from the local disk versus going over HTTP?
> >
> > Per the Solr wiki entry for ExtractingRequestHandler, 
> > enableRemoteStreaming is not used?
> >
> > This is also a tad confusing because the Ruby example off:
> > http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
> > explicitly recommends setting this parameter?
> >
> > Thanks!
> >
> >
> 
> 
> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 


Re: Initializing Solr Example

2009-06-09 Thread Mat Brown
Neil - when started using the packaged start.jar, Solr runs in the
foreground; that's why you can't type anything in the command line after
starting it.
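
(On a Unix-like shell you can background it if you need the prompt back,
for example:

  java -jar start.jar > solr.log 2>&1 &

On Windows, just open a second console.)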

Mat

On Tue, Jun 9, 2009 at 15:55, Mukerjee, Neiloy (Neil) <
neil.muker...@alcatel-lucent.com> wrote:

> After that comes up in the command line, I can access the localhost
> address, but I can't enter anything on the command line.
>
> -Original Message-
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Tuesday, June 09, 2009 3:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Initializing Solr Example
>
> Define caught?  When I start up Solr, here's what I see (and know it's
> working):
> 2009-06-09 15:18:33.726::INFO:  Started SocketConnector @ 0.0.0.0:8983
> Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=null path=null params={q=static+firstSearcher+warming
> +query+from+solrconfig.xml} hits=0 status=0 QTime=30
> Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener
> newSearcher
> INFO: QuerySenderListener done.
> Jun 9, 2009 3:18:33 PM
> org.apache.solr.handler.component.SpellCheckComponent
> $SpellCheckerListener newSearcher
> INFO: Loading spell index for spellchecker: default
> Jun 9, 2009 3:18:33 PM
> org.apache.solr.handler.component.SpellCheckComponent
> $SpellCheckerListener newSearcher
> INFO: Loading spell index for spellchecker: jarowinkler
> Jun 9, 2009 3:18:33 PM
> org.apache.solr.handler.component.SpellCheckComponent
> $SpellCheckerListener newSearcher
> INFO: Loading spell index for spellchecker: file
> Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: [] Registered new searcher searc...@f7378ab main
>
> What happens if you browse to http://localhost:8983/solr/admin?  Or,
> what happens if you index documents?
>
> Granted, the message could probably be clearer that "Solr is ready to
> go"
>
> HTH,
> Grant
>
> On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote:
>
> > In trying to run the example distributed with Solr 1.3.0 from the
> > command line, the process seems to stop at the following line:
> > INFO: [] Registered new searcher searc...@147c1db main
> >
> > The searcher ID is not always the same, but it repeatedly gets
> > caught at this line. Any suggestions?
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


Re: Refresh synonyms.txt file via replication

2009-06-09 Thread mlathe


Shalin Shekhar Mangar wrote:
> 
>> Second Question:
>> If i force an empty commit, like this:
>> curl
>> http://localhost:8080/solr_rep_master/core/update?stream.body=%3Ccommit/%3E
>> Then the changed synonym.txt config file are replicated to the slave.
>> Unfortunately now I need to do a core "RELOAD" on both the master and
>> slave
>> to get them to see the updated synonym.txt file.
>>
> Calling RELOAD on slave should not be necessary. If a configuration file
> is
> replicated, the slave is always reloaded. Can you try using the
> analysis.txt
> on a field which has the SynonymFilterFactory enabled to see if the new
> file
> is indeed not getting used?
> 

I'm a bit confused now. It's not doing what I saw before.
Now I can't get it to replicate when I do an "empty" commit. Rather, I need
to do a real data update and a commit; then any changes to the
solr_rep_master's conf/synonyms.txt file get replicated to the slave, and
the slave seems to pick up the change without reloading.

I'm not really sure what you mean by the analysis.txt file. Do you mean the
/analysis request handler? I've been making synonyms for "solr", so it
would be pretty obvious if the change were picked up.

Can you explain what you expect should happen? ie
1) should the slave replicate when you do an empty commit on the master?
2) If you change a master config file, and it is replicated to the slave,
would you expect the slave to pick it up automatically, but the master will
require a reload?

Thanks
--Matthias
-- 
View this message in context: 
http://www.nabble.com/Refresh-synonyms.txt-file-via-replication-tp23789187p23951978.html
Sent from the Solr - User mailing list archive at Nabble.com.



qf boost Versus field boost for Dismax queries

2009-06-09 Thread ashokc

When 'dismax' queries are used, where is the best place to apply boost
values/factors? While indexing, by supplying the 'boost' attribute to the
field, or in solrconfig.xml, by specifying the 'qf' parameter with the same
boosts? What are the advantages/disadvantages of each? What happens if both
boosts are present? Do they get multiplied?

Thanks

- ashok
-- 
View this message in context: 
http://www.nabble.com/qf-boost-Versus-field-boost-for-Dismax-queries-tp23952323p23952323.html
Sent from the Solr - User mailing list archive at Nabble.com.



facets and stopwords

2009-06-09 Thread JCodina

I have a text field from which I remove stopwords. As a first approximation
I use facets to see the most common words in the text, but the stopwords are
there, and if I search for documents containing the stopwords, no documents
are returned.
You can test it at this address (using solrjs; the texts are in Spanish, but
you can check in the top words that "que" or "en" are there), but if you
click on them to perform the search, no results are given:
http://projecte01.development.barcelonamedia.org/fonetic/
or the admin at
http://projecte01.development.barcelonamedia.org/solr/admin
so you can check what's going on with the content field.
I use the DataImportHandler to import the data, and the
Solr analyzer shows me how the stopwords are removed from both the query
and the indexed text, so why do the facets show me these words?

-- 
View this message in context: 
http://www.nabble.com/facets-and-stopwords-tp23952823p23952823.html
Sent from the Solr - User mailing list archive at Nabble.com.



Problem using db-data-config.xml

2009-06-09 Thread jayakeerthi s
Hi All,

I am facing an issue while fetching records from the database when I
reference the value '${prod.prod_cd}' in db-data-config.xml.
It works fine if I provide the exact value of the product code, i.e.
'302437-413'.

Here is the db-data-config.xml I am using:






[db-data-config.xml omitted here: the XML markup was stripped by the mail
archive. Only fragments survive, e.g. a sub-entity whose WHERE clause ends
with: AND p.prod_cd = '302437-413']

The issue is: if I replace *AND prod_cd ='${prod.prod_cd}'   AND reg_id =
'${prod_reg.reg_id'">* with the exact value '302437-413', I get the
results; if not, the prod_reg and prod_reg_cmrc_styl entities are not
executed.

Please advise anything I am missing in the above db-data-config.xml.

Thanks in advance.

Regards,
Jayakeerthi


Servlet filter for Solr

2009-06-09 Thread vivek sar
Hi,

  I have to intercept every request to Solr (search and update) and log
some performance numbers. In order to do so I tried a servlet filter
and added this to Solr's web.xml:

  
<filter>
    <filter-name>IndexFilter</filter-name>
    <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
    <init-param>
        <param-name>test-param</param-name>
        <param-value>This parameter is for testing.</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <servlet-name>SolrUpdate</servlet-name>
    <servlet-name>SolrServer</servlet-name>
</filter-mapping>


But this doesn't seem to be working. A couple of questions:

1) What's wrong with my web.xml setting?
2) Is there any easier way to intercept calls to Solr without changing
its web.xml? Basically, can I just change solrconfig.xml to do so
(besides request handlers) so I don't have to customize the solr.war?

Thanks,
-vivek


How to index data without token in Solr

2009-06-09 Thread chem leakhina
Hi all,
I am very new in Solr and I want to use Solr to index data without token to
match with my search.
Does anyone know how to index data without token in Solr?
if possible, can you give me an example?

Thanks in advance,
LEE


Re: Refresh synonyms.txt file via replication

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
Unfortunately, the problem is that an 'empty' commit does not really
do anything; it is not a real commit. Solr takes a look to find out
whether the index has changed, and if not, the call is ignored.

When we designed it, one choice was to also look at all the changed conf
files to decide whether a replication is required. That proved to be
expensive and error prone, so we relied on an index change to trigger it.


Take the case of a schema.xml change: schema.xml is changed first, and
then indexing is done. If schema.xml alone is replicated and the slave
core reloaded, it will cause errors.

you can raise an issue and we can find out a better way to do this.

--Noble




On Wed, Jun 10, 2009 at 3:23 AM, mlathe wrote:
>
>
> Shalin Shekhar Mangar wrote:
>>
>>> Second Question:
>>> If i force an empty commit, like this:
>>> curl
>>> http://localhost:8080/solr_rep_master/core/update?stream.body=%3Ccommit/%3E
>>> Then the changed synonym.txt config file are replicated to the slave.
>>> Unfortunately now I need to do a core "RELOAD" on both the master and
>>> slave
>>> to get them to see the updated synonym.txt file.
>>>
>> Calling RELOAD on slave should not be necessary. If a configuration file
>> is
>> replicated, the slave is always reloaded. Can you try using the
>> analysis.txt
>> on a field which has the SynonymFilterFactory enabled to see if the new
>> file
>> is indeed not getting used?
>>
>
> I'm a bit confused now. It's not doing what i saw before.
> Now I can't get it to replicate when i do an "empty" commit. Rather I need
> to do a real data update, and a commit, then any changes to the
> solr_rep_master's conf/synonyms.txt file get replicated to the slave, and
> the slave seems to pick up the change without reloading.
>
> I'm not really sure what you mean by the analysis.txt file. Do you mean the
> /analysis request handler? I've been making synonyms for "solr" so it is
> pretty obvious if it was picked up.
>
> Can you explain what you expect should happen? ie
> 1) should the slave replicate when you do an empty commit on the master?
> 2) If you change a master config file, and it is replicated to the slave,
> would you expect the slave to pick it up automatically, but the master will
> require a reload?
>
> Thanks
> --Matthias
> --
> View this message in context: 
> http://www.nabble.com/Refresh-synonyms.txt-file-via-replication-tp23789187p23951978.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Servlet filter for Solr

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you wish to intercept "read" calls, a filter is the only way.


On Wed, Jun 10, 2009 at 6:35 AM, vivek sar wrote:
> Hi,
>
>  I've to intercept every request to solr (search and update) and log
> some performance numbers. In order to do so I tried a Servlet filter
> and added this to Solr's web.xml,
>
>        <filter>
>                <filter-name>IndexFilter</filter-name>
>                <filter-class>com.xxx.index.filter.IndexRequestFilter</filter-class>
>                <init-param>
>                        <param-name>test-param</param-name>
>                        <param-value>This parameter is for testing.</param-value>
>                </init-param>
>        </filter>
>        <filter-mapping>
>                <filter-name>IndexFilter</filter-name>
>             <servlet-name>SolrUpdate</servlet-name>
>             <servlet-name>SolrServer</servlet-name>

I guess you cannot put servlets in the filter mapping.
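Try mapping the filter to a URL pattern instead -- a sketch, keeping your
filter name:

<filter-mapping>
    <filter-name>IndexFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>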
>        
>
> but, this doesn't seem to be working. Couple of questions,
>
> 1) What's wrong with my web.xml setting?
> 2) Is there any easier way to intercept calls to Solr without changing
> its web.xml? Basically can I just change the solrconfig.xml to do so
> (beside requesthandlers) so I don't have to customize the solr.war?
>
> Thanks,
> -vivek
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Problem using db-data-config.xml

2009-06-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are you sure prod_cd and reg_id are emitted by the respective entities
under the same names? If not, you may need to alias those fields (using AS).

Keep in mind, the field names are case sensitive. To see which values are
emitted, use debug mode or the LogTransformer.
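
For example, a sketch with your column names (standard SQL aliasing):

  query="select p.prod_cd as prod_cd, ... from product p ..."

or, to log what each row emits (attribute names per the DIH wiki):

  <entity name="prod" transformer="LogTransformer"
          logTemplate="prod_cd=${prod.prod_cd}" logLevel="info" ...>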

On Wed, Jun 10, 2009 at 4:55 AM, jayakeerthi s wrote:
> Hi All,
>
> I am facing an issue while fetching the records from database by providing
> the value" '${prod.prod_cd}' " in this type at db-data-config.xml.
> It is working fine If I provide the exact value of the product code ie
> '302437-413'
>
> Here is the db-data-config.xml I am using:
>
> [db-data-config.xml omitted: the XML markup was stripped by the mail
> archive]
>
> The issue is IF I replace the *AND prod_cd ='${prod.prod_cd}'   AND reg_id =
> '${prod_reg.reg_id'">* with the exact value '302437-413' I am getting the
> result If not it is not
> executing the prod_reg and prod_reg_cmrc_styl entity.
>
> Please advise anything I am missing in the above db-data-config.xml.
>
> Thanks in advance.
>
> Regards,
> Jayakeerthi
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Upgrading 1.2.0 to 1.3.0 solr

2009-06-09 Thread Otis Gospodnetic
Francis,

If you can wait another month or so, you could skip 1.3.0, and jump to 1.4 
which will be released soon.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


>
>From: Francis Yakin 
>To: "solr-user@lucene.apache.org" 
>Sent: Wednesday, June 10, 2009 1:17:25 AM
>Subject: Upgrading 1.2.0 to 1.3.0 solr
>
> > 
>I am in the process of upgrading our Solr 1.2.0 to Solr 1.3.0.
> 
>Our Solr 1.2.0 is working fine now; we just want to upgrade it because we have 
>an application that requires some functionality from 1.3.0 (we call it 
>autocomplete).
> 
>Currently our config files on 1.2.0 are as follows:
> 
>Solrconfig.xml
>Schema.xml (we wrote this in house)
>Index_synonyms.txt (we also modified and wrote this in house)
>Scripts.conf
>Protwords.txt
>Stopwords.txt
>Synonyms.txt
> 
>I understand that 1.3.0 has a new solrconfig.xml.
> 
>My questions are:
> 
>1) Which config files can I reuse from 1.2.0 for 1.3.0?
>   Can I use the same schema.xml?
>2) Solrconfig.xml: can I use the 1.2.0 version, or do I have to stick with
>   the 1.3.0 one? If I need to stick with 1.3.0, what do I need to change?
> 
>As of right now I am testing it in my sandbox, and it doesn't work.
> 
>Please advise; if you have any docs for upgrading 1.2.0 to 1.3.0, let me know.
> 
>Thanks in advance
> 
>Francis
> 
>Note: I attached my solrconfig and schema.xml in this email
>  
>
>
>-Inline Attachment Follows-
>
>[schema.xml attachment: the XML markup was stripped by the mail archive;
>the content is not recoverable here]
>
>-Inline Attachment Follows-
>
>[solrconfig.xml attachment: the XML markup was stripped by the mail archive,
>so the configuration is not recoverable here. The surviving fragments match
>the stock example config: default index settings (mergeFactor 10,
>maxBufferedDocs 1000), the standard LRU caches (size 512), the default
>warming queries, and a requestParsers element with
>multipartUploadLimitInKB="2048".]

Re: How to index data without token in Solr

2009-06-09 Thread Otis Gospodnetic

Hello,

I don't follow the "index data without token to match with my search" part.  
Could you please give an example of what you mean?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: chem leakhina 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 10:06:35 PM
> Subject: How to index data without token in Solr
> 
> Hi all,
> I am very new to Solr and I want to use Solr to index data without token to
> match with my search.
> Does anyone know how to index data without token in Solr?
> if possible, can you give me an example?
> 
> Thanks in advance,
> LEE



Re: qf boost Versus field boost for Dismax queries

2009-06-09 Thread Otis Gospodnetic

It's like cooking.  If you put too much salt in your food, it's kind of hard to 
undo that and you end up with a salty meal.  Boosting at search time makes it 
easy to change boosts (e.g. when trying to find the best boost values), while 
boosting at index time "hard-codes" them.  You can use both and they should be 
multiplied.
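
For illustration (field names and boost values here are made up), a search-time
boost goes into the dismax handler's qf parameter in solrconfig.xml:

  <str name="qf">title^2.0 description^0.5</str>

while an index-time boost is baked into each XML update message:

  <doc>
    <field name="title" boost="2.0">Some title</field>
  </doc>

Changing the former only needs a config reload; changing the latter means
reindexing.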

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: ashokc 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 6:17:37 PM
> Subject: qf boost Versus field boost for Dismax queries
> 
> 
> When 'dismax' queries are used, where is the best place to apply boost
> values/factors? While indexing by supplying the 'boost' attribute to the
> field, or in solrconfig.xml by specifying the 'qf' parameter with the same
> boosts? What are the advantages/disadvantages to each? What happens if both
> boosts are present? Do they get multiplied?
> 
> Thanks
> 
> - ashok
> -- 
> View this message in context: 
> http://www.nabble.com/qf-boost-Versus-field-boost-for-Dismax-queries-tp23952323p23952323.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Faceting on text fields

2009-06-09 Thread Otis Gospodnetic

Yao,

Solr can already cluster top N hits using Carrot2:
http://wiki.apache.org/solr/ClusteringComponent
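
For reference, the component is registered in solrconfig.xml along these lines
(a sketch from memory of that wiki page -- check it for the exact class and
parameter names in your version):

  <searchComponent name="clustering"
      class="org.apache.solr.handler.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">default</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    </lst>
  </searchComponent>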

I've also done ugly "manual counting" of terms in top N hits.  For example, 
look at the right side of this:
http://www.simpy.com/user/otis/tag/%22machine+learning%22

Something like http://www.sematext.com/product-key-phrase-extractor.html could 
also be used.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Yao Ge 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 3:46:13 PM
> Subject: Re: Faceting on text fields
> 
> 
> Michael,
> 
> Thanks for the update! I definitely need to get a 1.4 build see if it makes
> a difference.
> 
> BTW, maybe instead of using faceting for text
> mining/clustering/visualization purposes, we can build a separate feature in
> SOLR for this. Many of the commercial search engines I have experience with
> (Google Search Appliance, Vivisimo etc.) provide dynamic term clustering
> based on the top N ranked documents (N is a configurable parameter). When
> a facet field is highly fragmented (say a text field), the existing
> set-intersection-based approach might no longer be optimal. Aggregating term
> vectors over the top N docs might be more attractive. Another feature I would
> really appreciate is search-time n-gram term clustering. Maybe
> this would be better suited to the "spell checker", as it is just a different
> way to display alternative search terms.
> 
> -Yao
> 
> 
> Michael Ludwig-4 wrote:
> > 
> > Yao Ge schrieb:
> > 
> >> The facet query is considerably slower compared to other facets from
> >> structured database fields (with highly repeated values). What I found
> >> interesting is that even after I constrained the search results to just a
> >> few hundred hits using other facets, these text facets are still very
> >> slow.
> >>
> >> I understand that text fields are not good candidates for faceting, as
> >> they can contain a very large number of unique values. However, why is it
> >> still slow after my matching documents are reduced to hundreds? Is it
> >> because the whole filter is cached (regardless of the matching docs) and
> >> I don't have enough filter cache size to fit the whole list?
> > 
> > Very interesting questions! I think an answer would both require and
> > further an understanding of how filters work, which might even lead to
> > a more general guideline on when and how to use filters and facets.
> > 
> > Even though faceting appears to have changed in 1.4 vs 1.3, it would
> > still be interesting to understand the 1.3 side of things.
> > 
> >> Lastly, what I really want is to give the user a chance to visualize
> >> and filter on the top relevant words in the free-text fields. Are there
> >> alternatives to the facet field approach? Term vectors? I can do client-
> >> side processing based on the top N (say 100) hits for this, but it is my
> >> last option.
> > 
> > Also a very interesting data mining question! I'm sorry I don't have any
> > answers for you. Maybe someone else does.
> > 
> > Best,
> > 
> > Michael Ludwig
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sharding strategy

2009-06-09 Thread Otis Gospodnetic

Aleksander,

In a sense you are lucky you have time-ordered data.  That makes it very easy 
to shard and cheaper to search - you know exactly which shards you need to 
query.  The beginning of the year situation should also be easy.  Do start with 
the latest shard for the current year, and go to the next shard only if you have to 
(e.g. if you don't get enough results from the first shard).
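
To make that concrete (host and core names are hypothetical), the first request
would hit only the newest shard, and the fallback would add the previous one:

  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr/shard2009
  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr/shard2009,host2:8983/solr/shard2008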

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Aleksander M. Stensby 
> To: "solr-user@lucene.apache.org" 
> Sent: Tuesday, June 9, 2009 7:07:47 AM
> Subject: Sharding strategy
> 
> Hi all,
> I'm trying to figure out how to shard our index as it is growing rapidly and 
> we 
> want to make our solution scalable.
> So, we have documents that are most commonly sorted by their date. My initial 
> thought is to shard the index by date, but I wonder if you have any input on 
> this and how to best solve this...
> 
> I know that the most frequent queries will be executed against the "latest" 
> shard, but then let's say we shard by year, how do we best solve the 
> situation 
> that will occur in the beginning of a new year? (Some of the data will be in 
> the 
> last shard, but most of it will be on the second last shard.)
> 
> Would it be stupid to have a "latest" shard with duplicate data (always 
> consisting of the last 6 months or something like that) and maintain that 
> index 
> in addition to the regular yearly shards? Anyone else facing a similar 
> situation with a good solution?
> 
> Any input would be greatly appreciated :)
> 
> Cheers,
> Aleksander
> 
> 
> 
> --
> Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
> 
> Please consider the environment before printing all or any of this e-mail



Re: Solr update performance decrease after a while

2009-06-09 Thread Otis Gospodnetic

Vincent,

It's hard to tell, but some things to look at are your JVM heap size, the 
status of the various generations in the JVM, the possibility of running out 
of memory, too-frequent GC, etc. All of this can be seen with jconsole.
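
For example (the pid and the heap sizes are illustrative), attach jconsole to
the running JVM, and if the heap turns out to be too small, raise it at startup:

  jconsole 12345
  java -Xms512m -Xmx1024m -jar start.jar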

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Vincent Pérès 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 12:00:32 PM
> Subject: Solr update performance decrease after a while
> 
> 
> Hello,
> 
> We are indexing approximately 500 documents per day. My benchmark says an
> update is done in 0.7 sec just after Solr has been started, but it quickly
> climbs to 2.2 secs per update!
> I have just been focused on the schema until now, and haven't changed much
> in the solrconfig file. Maybe you have some tips which could help me keep
> the performance more consistent?
> 
> Thanks a lot
> Vincent
> -- 
> View this message in context: 
> http://www.nabble.com/Solr-update-performance-decrease-after-a-while-tp23945947p23945947.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr in distributed mode

2009-06-09 Thread Otis Gospodnetic

Hello,

All of this is covered on the Wiki, search for: distributed search
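
In short (hostnames are hypothetical): with Solr's distributed search you split
the index yourself, give each shard to its own Solr instance (on the same or on
different machines), and fan a query out with the shards parameter:

  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr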

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Rakhi Khatwani 
> To: solr-user@lucene.apache.org
> Cc: ninad.r...@germinait.com; ranjit.n...@germinait.com; 
> saurabh.maha...@germinait.com
> Sent: Tuesday, June 9, 2009 4:55:55 AM
> Subject: solr in distributed mode
> 
> Hi,
> I was looking for ways in which we can use Solr in distributed mode.
> Is there any way we can use Solr indexes across machines, or by using the
> Hadoop Distributed File System?
> 
> Its has been mentioned in the wiki that
> When an index becomes too large to fit on a single system, or when a single
> query takes too long to execute, an index can be split into multiple shards,
> and Solr can query and merge results across those shards.
> 
> What I understand is that shards are partitions. Are shards on the same
> machine, or can they be on different machines? Do we have to manually
> split the indexes to store them in different shards?
> 
> do you have an example or some tutorial which demonstrates distributed index
> searching/ storing using shards?
> 
> Regards,
> Raakhi



Re: Example folder - can we change it?

2009-06-09 Thread Otis Gospodnetic

Francis,

But that really is an example.  It's something that you can try and something 
that you can copy and base your own Solr setup on.
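
For instance (paths are illustrative), you can copy the example directory under
a name of your own and point Jetty at it with the solr.solr.home system
property:

  cp -r /opt/apache-solr-1.3.0/example /opt/apache-solr-1.3.0/myapp
  cd /opt/apache-solr-1.3.0/myapp
  java -Dsolr.solr.home=./solr -jar start.jar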

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Francis Yakin 
> To: "solr-user@lucene.apache.org" 
> Sent: Monday, June 8, 2009 2:02:53 PM
> Subject: Example folder - can we change it?
> 
> 
> When I install Solr, by default it installs under
> /opt/apache-solr-1.3.0/
>
> The bin, config files and data are under /opt/apache-solr-1.3.0/example/solr
>
> Is there any way to change "example" to something else?
> Because "example" can be interpreted wrongly (like a sample, i.e. not real)
> 
> 
> Francis



Re: creating new fields at index time - is it possible?

2009-06-09 Thread Otis Gospodnetic

Hello,

It might be expensive/slow, but you could write a custom 
UpdateRequestProcessor, "manually" run a field through the analyzer and then 
add/delete other fields right there, in the URP.
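
A minimal sketch of the add-fields part (the analysis step is omitted, and the
field names and derived value are made up; it would be wired in through a
matching UpdateRequestProcessorFactory in an update processor chain):

  import java.io.IOException;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  public class DerivedFieldProcessor extends UpdateRequestProcessor {

    public DerivedFieldProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object title = doc.getFieldValue("title");
      if (title != null) {
        // derive a new field from an existing one before the doc is indexed
        doc.addField("title_length", title.toString().length());
      }
      super.processAdd(cmd); // hand the document to the rest of the chain
    }
  }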

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Kir4 
> To: solr-user@lucene.apache.org
> Sent: Sunday, June 7, 2009 8:18:43 PM
> Subject: Re: creating new fields at index time - is it possible?
> 
> 
> Now that I plan on adding new fields based on the data already present, it
> would be best to read the existing field after it has been processed
> (cleaned up) by the other analyzers.
> I was therefore planning on creating a custom analyzer that is started after
> the other default ones have been run; said analyzer would read the field and
> add new ones based on several rules and some data.
> 
> I have been told that UpdateRequestProcessor probably cannot be invoked from
> an analyzer.
> Is there any way for an analyzer to add new fields? 
> It would be enough to just populate them: I could add empty fields to the
> original document, and define for them analyzers that read the data of other
> fields previously analyzed and populate the empty field.
> 
> Thanks to anyone that may have answers to my questions. =)
> Best regards,
> G.
> 
> 
> 
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> > 
> > 
> > If you wish to plug in your code, try this
> > http://wiki.apache.org/solr/UpdateRequestProcessor
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/creating-new-fields-at-index-time---is-it-possible--tp23741267p23916728.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to index data without token in Solr

2009-06-09 Thread chem leakhina
That's fine, I've now got a solution for this.

Thanks anyway

On Wed, Jun 10, 2009 at 12:29 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Hello,
>
> I don't follow the "index data without token to match with my search" part.
>  Could you please give an example of what you mean?
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: chem leakhina 
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, June 9, 2009 10:06:35 PM
> > Subject: How to index data without token in Solr
> >
> > Hi all,
> > I am very new to Solr and I want to use Solr to index data without token
> > to match with my search.
> > Does anyone know how to index data without token in Solr?
> > if possible, can you give me an example?
> >
> > Thanks in advance,
> > LEE
>
>


How to search date in Solr

2009-06-09 Thread chem leakhina
Hi,
Could you tell me how to make a query to search dates in Solr with these
conditions:

Before, After, Between, All

Could you please write some example for me?

Regards,
LEE


Re: How to search date in Solr

2009-06-09 Thread Otis Gospodnetic

Hello,

These are all done with range queries.  They tend to look like this:

&q=add_date:[BeginDateHere TO EndDateHere]


You can use * for either BeginDateHere or EndDateHere to get the "before/after" 
effect.

"All" is just q=*:*

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: chem leakhina 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 10, 2009 2:28:46 AM
> Subject: How to search date in Solr
> 
> Hi,
> Could you tell me how to make a query to search dates in Solr with these
> conditions:
> 
> Before, After, Between, All
> 
> Could you please write some example for me?
> 
> Regards,
> LEE



Re: How to search date in Solr

2009-06-09 Thread chem leakhina
Thanks Otis

On Wed, Jun 10, 2009 at 1:32 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Hello,
>
> These are all done with range queries.  They tend to look like this:
>
> &q=add_date:[BeginDateHere TO EndDateHere]
>
>
> You can use * for either BeginDateHere or EndDateHere to get the
> "before/after" effect.
>
> "All" is just q=*:*
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: chem leakhina 
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, June 10, 2009 2:28:46 AM
> > Subject: How to search date in Solr
> >
> > Hi,
> > Could you tell me how to make a query to search dates in Solr with these
> > conditions:
> >
> > Before, After, Between, All
> >
> > Could you please write some example for me?
> >
> > Regards,
> > LEE
>
>


Re: Sharding strategy

2009-06-09 Thread Aleksander M. Stensby

Hi Otis,
thanks for your reply!
You could say I'm lucky (and I totally agree since I've made the choice of  
ordering the data that way:p).
What you describe is what I've thought about doing and I'm happy to read  
that you approve. It is always nice to know that you are not doing things  
completely off track - that's what I love about this mailing list!


I've implemented a sharded "yellow pages" that builds up the shard  
parameter, and it will obviously be easy to search two shards to  
overcome the beginning-of-the-year situation; I just thought it might be a  
bit stupid to search for 1% of the data in the "latest" shard and the rest  
in shard n-1. How much of a performance decrease do you reckon I will get  
from searching two shards instead of one?
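
The lookup is roughly of this shape (hostnames are made up; a sketch only):

  import java.util.ArrayList;
  import java.util.List;

  public class ShardYellowPages {

    // build the shards parameter for a year range, newest shard first
    public static String shardsParam(int fromYear, int toYear) {
      List<String> shards = new ArrayList<String>();
      for (int year = toYear; year >= fromYear; year--) {
        shards.add("solr" + year + ".example.com:8983/solr");
      }
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < shards.size(); i++) {
        if (i > 0) sb.append(',');
        sb.append(shards.get(i));
      }
      return sb.toString();
    }
  }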


Anyways, thanks for confirming things, Otis!

Cheers,
 Aleksander




On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic  
 wrote:




Aleksander,

In a sense you are lucky you have time-ordered data.  That makes it very  
easy to shard and cheaper to search - you know exactly which shards you  
need to query.  The beginning of the year situation should also be  
easy. Do start with the latest shard for the current year, and go to the  
next shard only if you have to (e.g. if you don't get enough results  
from the first shard).


 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Aleksander M. Stensby 
To: "solr-user@lucene.apache.org" 
Sent: Tuesday, June 9, 2009 7:07:47 AM
Subject: Sharding strategy

Hi all,
I'm trying to figure out how to shard our index as it is growing  
rapidly and we

want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial
thought is to shard the index by date, but I wonder if you have any  
input on

this and how to best solve this...

I know that the most frequent queries will be executed against the  
"latest"
shard, but then let's say we shard by year, how do we best solve the  
situation
that will occur in the beginning of a new year? (Some of the data will  
be in the

last shard, but most of it will be on the second last shard.)

Would it be stupid to have a "latest" shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain  
that index

in addition to the regular yearly shards? Anyone else facing a similar
situation with a good solution?

Any input would be greatly appreciated :)

Cheers,
Aleksander



--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this  
e-mail







--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


RE: ExtractingRequestHandler and local files

2009-06-09 Thread Fergus McMenemie
I had also been wondering about this, but was too lazy/busy to post a
question. Now that it is resolved, it would help a lot if you could
post an example of how you invoked enableRemoteStreaming for your
document(s).

Rgds Fergus.

>Thanks for the quick response, Grant.
>
> 
>
>We tried it and it seems to work.
>
> 
>
>The confusion stemmed from the fact that the wiki states that the parameter is 
>not used - there are also comments in the test cases for the handler that say: 
>
> 
>
>//TODO: stop using locally defined fields once stream.file and stream.body 
>start working everywhere
>
> 
>
>So wanted to confirm.
>
>> From: gsing...@apache.org
>> To: solr-user@lucene.apache.org
>> Subject: Re: ExtractingRequestHandler and local files
>> Date: Tue, 9 Jun 2009 14:50:43 -0400
>> 
>> I haven't tried it, but I thought the enableRemoteStreaming stuff 
>> should work. That stuff is handled by Solr in other places, if I 
>> recall correctly. Have you tried it?
>> 
>> -Grant
>> 
>> On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:
>> 
>> >
>> > Hi,
>> >
>> >
>> >
>> > I would greatly appreciate a quick response to this question.
>> >
>> > Is there a means of passing a local file to the 
>> > ExtractingRequestHandler (as the enableRemoteStreaming/stream.file 
>> > option does with the other handlers) so the file contents can 
>> > directly be read from the local disk versus going over HTTP?
>> >
>> > Per the Solr wiki entry for ExtractingRequestHandler, 
>> > enableRemoteStreaming is not used?
>> >
>> > This is also a tad confusing because the Ruby example off:
>> > http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
>> > explicitly recommends setting this parameter?
>> >
>> > Thanks!
>> >
>> >
>> 
>> 
>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>
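
A sketch of what the working setup typically looks like (paths and parameter
values are illustrative): remote streaming is switched on in solrconfig.xml,

  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

after which a local file can be handed to the extracting handler without going
over HTTP:

  curl "http://localhost:8983/solr/update/extract?stream.file=/data/docs/report.pdf&literal.id=doc1&commit=true"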