RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jason Brown
Thanks Jonathan.
 
To further clarify, I understand that the match of 
 
my blue rabbit
 
would have to be found in one element (of my multi-valued field) for the 
phrase boost on that field to kick in.
 
If for example my document had the following 3 entries for the multi-value 
field
 
 
my black cat
his blue car
her pink rabbit
 
Then I assume the phrase boost would not kick in, as the search term (my blue 
rabbit) isn't found in a single element (but can be found across them).
 
Thanks again
 
Jason.



From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Tue 19/10/2010 17:27
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields



You are correct.  The query needs to match as a phrase. It doesn't need
to match "everything". Note that if a value is:

"long sentence with my blue rabbit in it",

then query "my blue rabbit" will also match as a phrase, for phrase
boosting or query purposes.
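
A rough sketch of why, in Python: Lucene indexes all values of a multi-valued field into one position space, separated by a large positionIncrementGap (Solr's example schema uses 100), so a phrase query can only match inside a single value. The whitespace tokenization and gap value below are simplifications, not Solr internals:

```python
def phrase_matches(values, phrase, gap=100):
    """Return True if `phrase` occurs within a single value.

    Mimics how Lucene indexes a multi-valued field: tokens from all
    values share one position space, separated by positionIncrementGap,
    so a phrase query cannot span two values.
    """
    positions = {}          # token -> list of positions
    pos = 0
    for value in values:
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap          # the gap keeps phrases from crossing values

    tokens = phrase.lower().split()
    for start in positions.get(tokens[0], []):
        if all(start + i in positions.get(t, []) for i, t in enumerate(tokens)):
            return True
    return False

# Jason's example: no single value contains the whole phrase
print(phrase_matches(["my black cat", "his blue car", "her pink rabbit"],
                     "my blue rabbit"))   # False
# Phrase embedded in a longer single value still matches
print(phrase_matches(["long sentence with my blue rabbit in it"],
                     "my blue rabbit"))   # True
```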

Jonathan

Jason Brown wrote:
> 
>
> Hi - I have a multi-value field, so say for example it consists of
>
> 'my black cat'
> 'my white dog'
> 'my blue rabbit'
>
> The field is whitespace parsed when put into the index.
>
> I have a phrase query boost configured on this field which I understand kicks 
> in when my search term is found entirely in this field.
>
> So, if the search term is 'my blue rabbit', then I understand that my phrase 
> boost will be applied as this is found entirley in this field.
>
> My question/presumption is that as this is a multi-valued field, only 1 value 
> of the multi-value needs to match for the phrase query boost (given my very 
> imaginative set of test data :-) above, you can see that this obviously 
> matches 1 value and not them all)
>
> Thanks for your help.
>
>
>
>
>
>
> If you wish to view the St. James's Place email disclaimer, please use the 
> link below
>
> http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
>
>  



If you wish to view the St. James's Place email disclaimer, please use the link 
below

http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
I tried this work-around, but it doesn't seem to work for me.
I still get an array of scores in the response.

I have two physical servers, A and B:

localhost --> A
test -->B

I issue a query to A like this:

http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard
Hi Hoss,

But when I change query to

http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

The score will be normal. (That's just like issuing the query to test:8983.)

any idea?



2010/10/16 Chris Hostetter 

>
> : Thanks. But do you have any suggest or work-around to deal with it?
>
> Posted in SOLR-2140
>
>   
>
> ...the key is to make sure Solr knows "score" is not multiValued
>
>
> -Hoss
>


Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
Ok, I did a little test after my previous email. The work-around that Hoss
provided does not work when you issue the query "*:*".

I tried a query like "key:aaa", and the work-around works no matter whether
the shard count is one, two, or more.

Thanks, Hoss. Maybe you could try this and help me confirm that this is not a
coincidence.




2010/10/20 Floyd Wu 

> I tried this work-around, but it doesn't seem to work for me.
> I still get an array of scores in the response.
>
> I have two physical server A and B
>
> localhost --> A
> test -->B
>
> I issue query to A like this
>
>
> http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard
> Hi Hoss,
>
> But when I change query to
>
> http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard
>
> The score will be normal. (That's just like issuing the query to test:8983.)
>
> any idea?
>
>
>
> 2010/10/16 Chris Hostetter 
>
>
>> : Thanks. But do you have any suggest or work-around to deal with it?
>>
>> Posted in SOLR-2140
>>
>>   
>>
>> ..this key is to make sure solr knows "score" is not multiValued
>>
>>
>> -Hoss
>>
>
>


Re: Multiple partial word searching with dismax handler

2010-10-20 Thread Chamnap Chhorn
Can anyone suggest how to do multiple partial-word searching?

On Wed, Oct 20, 2010 at 11:42 AM, Chamnap Chhorn wrote:

> Hi,
>
> I have some problems combining the query with multiple partial-word
> searches in the dismax handler. To make multiple partial-word searching
> work, I use EdgeNGramFilterFactory, and my query must be something like
> this: "name_ngram:sun name_ngram:hot" in q.alt, combined with my search
> handler (
> http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&qt=products).
> I wonder how to combine this with my search handler.
>
> Here is my search handler config:
>   
> 
>   explicit
>   20
>   dismax
>   name^200 full_text
>   fap^15
>   uuid
>   2.2
>   on
>   0.1
> 
> 
>   type:Product
> 
> 
>   false
> 
> 
>   spellcheck
>   elevateProducts
> 
>   
>
> If I query with this url 
> http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&q=sun
> hot&qt=products,
> it doesn't show the correct answer the way the previous query does.
>
> How could I configure this in my search handler with a boost score?
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
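
For what it's worth, the reason a query like "sun hot" can match a name like "Sunshine Hotel" is that EdgeNGramFilterFactory indexes every front-anchored prefix of each token at index time. A toy Python sketch of that index-time behaviour (the gram sizes and example name are illustrative, not from the thread):

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Front-anchored n-grams, roughly what EdgeNGramFilterFactory emits."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def partial_match(name, query):
    """True if every query word is an edge n-gram of some word in `name`."""
    grams = set()
    for word in name.lower().split():
        grams.update(edge_ngrams(word))
    return all(q in grams for q in query.lower().split())

print(partial_match("Sunshine Hotel", "sun hot"))   # True: both are prefixes
print(partial_match("Sunshine Hotel", "shine"))     # False: not front-anchored
```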


Re: xi:include

2010-10-20 Thread Stefan Matheis
Wouldn't it be easier to ensure that your config.aspx returns valid
XML? Wrap your existing code with some exception handling and return your
fallback XML if something goes wrong?


Searching with Number fields

2010-10-20 Thread Hasnain

Hi,

   I'm having trouble searching number fields. If this field contains
alphanumerics the search works perfectly, but not when the value is all
numbers. Can anyone suggest a solution?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-with-Number-fields-tp1737513p1737513.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: xi:include

2010-10-20 Thread Peter A. Kirk
Hi

Thanks for your reply. In actual fact, "config.aspx" will either return valid 
XML, or it will return an empty string - and unfortunately an empty string is 
not considered valid XML by the Solr XML parser.

The "config.aspx" is a rather general application, returning all sorts of data, 
depending on the parameters supplied to it. It doesn't know what fallback xml 
is appropriate in a specific instance.

For example, it might be called like this:
http://localhost/config/config.aspx?id=core1dismax&weight=qf

But if configuration in solrconfig.xml is entered incorrectly (eg maybe the 
"id" parameter to config.aspx is incorrect) then config.aspx returns an empty 
string.

Other XML parsers which handle xi:include and xi:fallback actually invoke the 
fallback if any error occurs during the include (not only if the included 
resource does not exist). Is it possible to configure the Solr parser so it 
invokes the fallback on any error?

Thanks,
Peter
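
For reference, the include/fallback idiom under discussion looks roughly like this (a hypothetical solrconfig.xml fragment; the handler layout and fallback value are illustrative). As noted above, whether the fallback also fires on a parse error, rather than only a missing resource, depends on the parser:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <xi:include href="http://localhost/config/config.aspx?id=core1dismax&amp;weight=qf"
                xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:fallback>
        <!-- illustrative static fallback used when the include fails -->
        <str name="qf">name description</str>
      </xi:fallback>
    </xi:include>
  </lst>
</requestHandler>
```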


From: Stefan Matheis [matheis.ste...@googlemail.com]
Sent: Wednesday, 20 October 2010 22:05
To: solr-user@lucene.apache.org
Subject: Re: xi:include

Wouldn't it be easier to ensure that your config.aspx returns valid
XML? Wrap your existing code with some exception handling and return your
fallback XML if something goes wrong?

Solr on WebSphere 7

2010-10-20 Thread laurent altaber
Hello experts,

Has anyone succeeded in configuring and running Solr on WebSphere 7, and
would be kind enough to help me with this?

New to Solr and WebSphere, I am looking for any hints on how to configure
Solr on WebSphere 7. I was able to configure and run it on Tomcat and from
the embedded Jetty.

The wiki page is very poor on this particular application server; any
help would be greatly appreciated.


Thanks,

Veraz


Re: Not able to subscribe to ML

2010-10-20 Thread Tharindu Mathew
I had the same problem. The workaround was to send mails in plain text.

On Wed, Oct 20, 2010 at 10:21 AM, Abdullah Shaikh
 wrote:
> Just a test mail to check if my mails are reaching the ML.
>
> I dont know, but my mails are failing to reach the ML with the following
> error :
>
> Delivery to the following recipient failed permanently:
>
>    solr-u...@lucene.apache.org
>
> Technical details of permanent failure:
> Google tried to deliver your message, but it was rejected by the recipient
> domain. We recommend contacting the other email provider for further
> information about the cause of this error. The error that the other server
> returned was: 552 552 spam score (5.7) exceeded threshold (state 18).
>
>
> - Abdullah
>



-- 
Regards,

Tharindu


Announcing " Blaze - Appliance for Solr "

2010-10-20 Thread Initcron Labs
Initcron Labs announces "Blaze - Appliance for Solr".

Read more and download at: http://www.initcron.org/blaze

Blaze is a tailor-made appliance, preinstalled and preconfigured with Apache
Solr running within the Tomcat servlet container. It lets you focus on
developing applications based on the Apache Solr platform and not worry about
installation and configuration complexities.

Blaze Appliance is built with SUSE Studio and is available in the following
formats:

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You can get your Solr installation set up and running within minutes.
The appliance is also production-ready, being configured with Tomcat. It
comes with WebYaST for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org



Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
Hi everyone! (my first post)

I am new, but really curious about the usefulness of Lucene/Solr for document
search from web applications. I use Ruby on Rails to create one, with the
plugin "acts_as_solr_reloaded" that makes the connection between the web app
and Solr easy.

So I am at a point where I know that a good solution is to prepare
multi-language documents with fields like:
question_en, answer_en,
question_fr, answer_fr,
question_pl, answer_pl... etc.

I need to create an index that works with 6 languages: English, French,
German, Russian, Ukrainian and Polish.

My questions are:
1. Is it doable to have just one search field that behaves like Google's for
all those documents? It can be an option to indicate a language to search.
2. How should I begin changing the solr/conf/schema.xml (or other) file to
tailor it to my needs? As a real rookie here, I am still a bit confused about
"fields", "fieldTypes" and their connection with a particular field (e.g.
answer_fr), and about the "tokenizers" and "analyzers". If someone can
provide a basic step-by-step tutorial on how to make it work in two languages
I would be more than happy.
3. Are all those languages supported (officially/unofficially) by
Lucene/Solr?

Thank you for help,
Jakub Godawa.
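
Not a full tutorial, but a minimal two-language sketch of how the schema.xml pieces connect: a fieldType bundles a tokenizer and filters (the analyzer), and each per-language field points at the matching type. All names below are illustrative, assuming the stock Snowball stemmers that ship with Solr:

```xml
<types>
  <!-- one analyzed type per language; the analyzer chain does the work -->
  <fieldType name="text_en" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_fr" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- each per-language field points at the matching type -->
  <field name="question_en" type="text_en" indexed="true" stored="true"/>
  <field name="answer_en"   type="text_en" indexed="true" stored="true"/>
  <field name="question_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="answer_fr"   type="text_fr" indexed="true" stored="true"/>
</fields>
```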


Re: Announcing " Blaze - Appliance for Solr "

2010-10-20 Thread Stefan Moises

 Sounds good, but there is nothing to download on Sourceforge?
Is this free or do you charge for it?

Cheers,
Stefan

Am 20.10.2010 13:03, schrieb Initcron Labs:

Initcron Labs Announces "Blaze - Appliance for Solr" .

Read more at and download from :  http://www.initcron.org/blaze

Blaze is a tailor made appliance  preinstalled and preconfigured with Apache
Solr  running within Tomcat servlet  container. It  lets you focus on
developing applications based on Apache Solr platform  and not worry about
installation, configuration complexities.

Blaze Appliance is built with Suse Studio and is available in following
formats

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You could get your solr installation setup and running within minutes.
The appliance is also production ready being configured with Tomcat. Comes
with webyast for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org



--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



Re: Announcing " Blaze - Appliance for Solr "

2010-10-20 Thread Stefan Matheis
Did you visit http://sourceforge.net/projects/blazeappliance/files/ ?
There are currently Blaze__Appliance_for_Solr.i686-0.1.1.oem.tar.gz (412MB)
& Blaze__Appliance_for_Solr.i686-0.1.1.ovf.tar.gz (434MB) to download

On Wed, Oct 20, 2010 at 3:23 PM, Stefan Moises  wrote:

>  Sounds good, but there is nothing to download on Sourceforge?
> Is this free or do you charge for it?
>
> Cheers,
> Stefan
>
> Am 20.10.2010 13:03, schrieb Initcron Labs:
>
>  Initcron Labs Announces "Blaze - Appliance for Solr" .
>>
>> Read more at and download from :  http://www.initcron.org/blaze
>>
>> Blaze is a tailor made appliance  preinstalled and preconfigured with
>> Apache
>> Solr  running within Tomcat servlet  container. It  lets you focus on
>> developing applications based on Apache Solr platform  and not worry about
>> installation, configuration complexities.
>>
>> Blaze Appliance is built with Suse Studio and is available in following
>> formats
>>
>> - LiveCD
>> - USB Drive/ HDD Image
>> - Preload ISO
>> - Virtual Machine Images
>> - Xen
>> - VMWare, Virtualbox
>> - OVM Open Format
>> - Amazon EC2 Image Format
>>
>>
>> You could get your solr installation setup and running within minutes.
>> The appliance is also production ready being configured with Tomcat. Comes
>> with webyast for web administration and configuration of the appliance.
>>
>>
>>
>> Thanks
>>
>> Initcron Labs
>>
>> www.initcron.org
>>
> --
> ***
> Stefan Moises
> Senior Softwareentwickler
>
> shoptimax GmbH
> Guntherstraße 45 a
> 90461 Nürnberg
> Amtsgericht Nürnberg HRB 21703
> GF Friedrich Schreieck
>
> Tel.: 0911/25566-25
> Fax:  0911/25566-29
> moi...@shoptimax.de
> http://www.shoptimax.de
> ***
>
>


Re: Boosting documents based on the vote count

2010-10-20 Thread Alexandru Badiu
Thanks, will look into those.

Andu

On Mon, Oct 18, 2010 at 4:14 PM, Ahmet Arslan  wrote:
>> I know but I can't figure out what
>> functions to use. :)
>
> Oh, I see. Why not just use {!boost b=log(vote)}?
>
> May be scale(vote,0.5,10)?
>
>
>
>


Shards VS Merged Core?

2010-10-20 Thread ahammad

Hello all,

I'm just wondering what the benefits/consequences are of using shards or
merging all the cores into a single core. Personally I have tried both, but
my document set is not large enough that I can actually test performance and
whatnot.

What is a better approach of implementing a search mechanism on multiple
cores (10-15 cores)?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Shards-VS-Merged-Core-tp1738771p1738771.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Announcing " Blaze - Appliance for Solr "

2010-10-20 Thread Stefan Moises
Oh, I guess they have just uploaded it... when I checked, the file
list was empty :)


Am 20.10.2010 15:36, schrieb Stefan Matheis:

Did you visit http://sourceforge.net/projects/blazeappliance/files/ ?
There are currently Blaze__Appliance_for_Solr.i686-0.1.1.oem.tar.gz (412MB)
&  Blaze__Appliance_for_Solr.i686-0.1.1.ovf.tar.gz (434MB) to download

On Wed, Oct 20, 2010 at 3:23 PM, Stefan Moises  wrote:


  Sounds good, but there is nothing to download on Sourceforge?
Is this free or do you charge for it?

Cheers,
Stefan

Am 20.10.2010 13:03, schrieb Initcron Labs:

  Initcron Labs Announces "Blaze - Appliance for Solr" .

Read more at and download from :  http://www.initcron.org/blaze

Blaze is a tailor made appliance  preinstalled and preconfigured with
Apache
Solr  running within Tomcat servlet  container. It  lets you focus on
developing applications based on Apache Solr platform  and not worry about
installation, configuration complexities.

Blaze Appliance is built with Suse Studio and is available in following
formats

- LiveCD
- USB Drive/ HDD Image
- Preload ISO
- Virtual Machine Images
- Xen
- VMWare, Virtualbox
- OVM Open Format
- Amazon EC2 Image Format


You could get your solr installation setup and running within minutes.
The appliance is also production ready being configured with Tomcat. Comes
with webyast for web administration and configuration of the appliance.



Thanks

Initcron Labs

www.initcron.org




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***



Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Dennis Gearon
There's approximately a 100% chance that you are going to go through a
server-side language (PHP, Ruby, Perl, Java, VB/ASP/.NET [cough, cough])
before you get to Solr/Lucene. I'd recommend it anyway.

This code should look at the user's browser locale (en_US, pl_PL, es_CO,
etc.). The server-side language would then choose which language to search
by and display.

NOW, that being said, are you going to have the exact same content for all
languages, just translated? The temptation would be to translate to a common
language like English, then do the search, then get the translation. I
wouldn't recommend it, but I'm no expert. Translation of single words can be
OK, but multi-word ideas and especially sentences don't work so well that
way.

You probably will have separate content for that reason, AND another:
different cultures are interested in different things and only have common
ground on certain things like international news (but with different
opinions) and medical news. So: different content for different cultures
speaking different languages.

Are you trying to address different languages in some place like the US or
Great Britain, with LOTS of different languages spoken in minority cultures?
Only then would you want a geographically centered server and
information-gathering organization. If you were going to have search for
other countries, then I'd recommend those resources be geographically close
to their source culture.
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Wed, 10/20/10, Jakub Godawa  wrote:

> From: Jakub Godawa 
> Subject: Step by step tutorial for multi-language indexing and search
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 20, 2010, 6:03 AM
> Hi everyone! (my first post)
> 
> I am new, but really curious about usefullness of
> lucene/solr in documents
> search from the web applications. I use Ruby on Rails to
> create one, with
> plugin "acts_as_solr_reloaded" that makes connection
> between web app and
> solr easy.
> 
> So I am in a point, where I know that good solution is to
> prepare
> multi-language documents with fields like:
> question_en, answer_en,
> question_fr, answer_fr,
> question_pl,  answer_pl... etc.
> 
> I need to create an index that would work with 6 languages:
> english, french,
> german, russian, ukrainian and polish.
> 
> My questions are:
> 1. Is it doable to have just one search field that behaves
> like Google's for
> all those documents? It can be an option to indicate a
> language to search.
> 2. How should I begin changing the solr/conf/schema.xml (or
> other) file to
> tailor it to my needs? As I am a real rookie here, I am
> still a bit confused
> about "fields", "fieldTypes" and their connection with
> particular field (ex.
> answer_fr) and the "tokenizers" and "analyzers". If someone
> can provide a
> basic step by step tutorial on how to make it work in two
> languages I would
> be more that happy.
> 3. Do all those languages are supported
> (officially/unofficialy) by
> lucene/solr?
> 
> Thank you for help,
> Jakub Godawa.
>


Re: why solr search is slower than lucene so much?

2010-10-20 Thread Yonik Seeley
Careful comparing apples to oranges ;-)
For one, your lucene code doesn't retrieve stored fields.
Did you try the Solr request more than once (with a different q, but
the same filters)?

Also, by default, Solr independently caches the filters.  This can be
higher up-front cost, but a win when filters are reused.  If you want
something closer to your lucene code, you could add all the filters to
 the main query and not use "fq".

-Yonik
http://www.lucidimagination.com
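
The last suggestion above, folding the filters into the main query so the filter cache is bypassed, amounts to a request rewrite along these lines (hypothetical helper; q and fq are the standard Solr parameters):

```python
from urllib.parse import urlencode

def fold_filters(params):
    """Merge every fq clause into q as a mandatory clause, mimicking the
    single BooleanQuery built in the Lucene test code (no filter cache)."""
    q = params.pop("q")
    fqs = params.pop("fq", [])
    params["q"] = " ".join("+(%s)" % clause for clause in [q] + fqs)
    return urlencode(params, doseq=True)

# q=xx with two filters becomes one boolean query:
# +(xx) +(fid:1) +(atm:[int_time1 TO int_time2])
print(fold_filters({"q": "xx",
                    "fq": ["fid:1", "atm:[int_time1 TO int_time2]"],
                    "rows": 20}))
```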



On Wed, Oct 20, 2010 at 7:07 AM, kafka0102  wrote:
> Hi,
> my Solr search has had a performance problem recently.
> My query is like this: q=xx&fq=fid:1&fq=atm:[int_time1 TO int_time2].
> fid's field type has precisionStep="0" omitNorms="true"
> positionIncrementGap="0"; atm's field type has precisionStep="8"
> omitNorms="true" positionIncrementGap="0".
> My index's size is about 500M and the record count is 3,984,274.
> When I use Solr's SolrIndexSearcher.search(QueryResult qr, QueryCommand
> cmd), it costs about 70ms. When I switched to Lucene's API, like the code
> below:
>
>      final SolrQueryRequest req = rb.req;
>      final SolrIndexSearcher searcher = req.getSearcher();
>      final SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
>      final ExecuteTimeStatics timeStatics =
> ExecuteTimeStatics.getExecuteTimeStatics();
>      final ExecuteTimeUnit staticUnit =
> timeStatics.addExecuteTimeUnit("test2");
>      staticUnit.start();
>      final List<Query> query = cmd.getFilterList();
>      final BooleanQuery booleanFilter = new BooleanQuery();
>      for (final Query q : query) {
>        booleanFilter.add(new BooleanClause(q,Occur.MUST));
>      }
>      booleanFilter.add(new BooleanClause(cmd.getQuery(),Occur.MUST));
>      logger.info("q:"+query);
>      final Sort sort = cmd.getSort();
>      final TopFieldDocs docs = searcher.search(booleanFilter,null,20,sort);
>      final StringBuilder sbBuilder = new StringBuilder();
>      for (final ScoreDoc doc :docs.scoreDocs) {
>        sbBuilder.append(doc.doc+",");
>      }
>      logger.info("hits:"+docs.totalHits+",result:"+sbBuilder.toString());
>      staticUnit.end();
>
> It costs only about 20ms.
> I'm so confused. In Solr's config, I disabled the cache. For the test, I
> first called Lucene's, and then Solr's.
> Maybe I should look at Solr's source more carefully. But for now, does
> anyone know the reason?
>
>
>


Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
2010/10/20 Dennis Gearon 

> Thre's approximately a 100% chance that you are going to go through a
> server side langauge(php, ruby, pearl, java, VB/asp/,net[cough,cough]),
> before you get to Solr/Lucene. I'd recommend it anyway.
>

I use a server side language (Ruby) as I build the web application.


> This code will should look at the user's browser locale (en_US, pl_PL,
> es_CO, etc). The server side langauge would then choose wich language to
> search by and display.
>

As I said, I may provide locale as an addition to the search query.


> NOW, that being said, are you going to have the exact same content for all
> langauges, just translated? The temptation would be to translate to a common
> language like English, then do the search, then get the translation. I
> wouln'dt recommend it, but I'm no expert. Translation of single words can be
> OK, but mulitword ideas and especially sentences doesn't work so well that
> way.
>

I would like not to yield to that temptation. I know that Solr/Lucene can
work with many languages, and I think that has a purpose, like languages'
semantic diversity. What's more, you often don't translate things literally
even if they are just translations.


> you probably will have separate content for that reason, AND another.
> Different cultures are interested in different things and only have common
> ground on cetain things like international news (but with different
> opinions) and medical news. So different content for differnt cultures
> speaking different languages.
>

I need to treat each culture separately regarding the subject of the query.


> Are you tryihg to address differnt languages in some place like the US or
> Great Britain, with LOTS of different languages spoken in minority cultures?
> Only then would you want a geographically centered server and information
> gathering organization. If you were going to have search for other
> countries, then I'd recommend those resources be geogrpahically close to
> their source culture.
>

No, I am not trying to address minority cultures.

Thanks for the answer,
Jakub Godawa.

Dennis Gearon
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself. from '
> http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
>  otherwise we all die.
>
>
> --- On Wed, 10/20/10, Jakub Godawa  wrote:
>
> > From: Jakub Godawa 
> > Subject: Step by step tutorial for multi-language indexing and search
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, October 20, 2010, 6:03 AM
> > Hi everyone! (my first post)
> >
> > I am new, but really curious about usefullness of
> > lucene/solr in documents
> > search from the web applications. I use Ruby on Rails to
> > create one, with
> > plugin "acts_as_solr_reloaded" that makes connection
> > between web app and
> > solr easy.
> >
> > So I am in a point, where I know that good solution is to
> > prepare
> > multi-language documents with fields like:
> > question_en, answer_en,
> > question_fr, answer_fr,
> > question_pl,  answer_pl... etc.
> >
> > I need to create an index that would work with 6 languages:
> > english, french,
> > german, russian, ukrainian and polish.
> >
> > My questions are:
> > 1. Is it doable to have just one search field that behaves
> > like Google's for
> > all those documents? It can be an option to indicate a
> > language to search.
> > 2. How should I begin changing the solr/conf/schema.xml (or
> > other) file to
> > tailor it to my needs? As I am a real rookie here, I am
> > still a bit confused
> > about "fields", "fieldTypes" and their connection with
> > particular field (ex.
> > answer_fr) and the "tokenizers" and "analyzers". If someone
> > can provide a
> > basic step by step tutorial on how to make it work in two
> > languages I would
> > be more that happy.
> > 3. Do all those languages are supported
> > (officially/unofficialy) by
> > lucene/solr?
> >
> > Thank you for help,
> > Jakub Godawa.
> >
>


Multiple facet - fq

2010-10-20 Thread Yavuz Selim YILMAZ
Under the category facet there are multiple selections, which can be
personal, corporate, or other.

How can I get both the "personal" and "corporate" ones? I tried
fq=category:corporate&fq=category:personal

It looks easy, but I can't find the solution.


--

Yavuz Selim YILMAZ


London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Hi all,

We've booked a London Search Social for Thursday the 28th Sept. Come
along if you fancy geeking out about search and related technology
over a beer.

Please note that we're not meeting in the same place as usual. Details
on the meetup page.
http://www.meetup.com/london-search-social/

Rich


Re: London open-source search social - 28th Nov - NEW VENUE

2010-10-20 Thread Richard Marr
Wow, apologies for utter stupidity. Both subject line and body should
have read 28th OCT.



On 20 October 2010 15:42, Richard Marr  wrote:
> Hi all,
>
> We've booked a London Search Social for Thursday the 28th Sept. Come
> along if you fancy geeking out about search and related technology
> over a beer.
>
> Please note that we're not meeting in the same place as usual. Details
> on the meetup page.
> http://www.meetup.com/london-search-social/
>
> Rich
>



-- 
Richard Marr


Re: Multiple facet - fq

2010-10-20 Thread Markus Jelsma
It should work fine. Make sure the field is indexed and check your index.

On Wednesday 20 October 2010 16:39:03 Yavuz Selim YILMAZ wrote:
> Under the category facet there are multiple selections, which can be
> personal, corporate, or other.
> 
> How can I get both "personal" and "corporate" ones, I tried
> fq=category:corporate&fq=category:personal
> 
> It looks easy, but I can't find the solution.
> 
> 
> --
> 
> Yavuz Selim YILMAZ

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread bbarani

Hi,

I have a very common question but couldn't find any post related to it in
this forum.

I currently initiate a full import each week, but data that has been deleted
from the source is not updated in my documents, as I am using clean=false.

We index multiple data by data type, hence we can't delete the index and do a
complete re-indexing each week; we also want to delete the orphan Solr
documents (for which the data is not present in the back-end DB) on a daily
basis.

Now my question is: is there a way I can use preImportDeleteQuery to delete
the documents from Solr for which the data doesn't exist in the back-end DB?
I don't have anything like a delete status in the DB; instead I need to get
all the UIDs from the Solr documents, compare them with all the UIDs in the
back end, and delete from Solr the documents whose UIDs are not present in
the DB.

Any suggestions/ideas would be of great help.

Note: I have currently developed a simple program which fetches the UIDs from
the Solr documents, then connects to the back-end DB to check for orphan
UIDs, and deletes the documents from the Solr index corresponding to those
orphan UIDs. I just don't want to re-invent the wheel if this feature is
already present in Solr, as I need to do more testing in terms of
performance/scalability for my program.

Thanks,
Barani


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
Sent from the Solr - User mailing list archive at Nabble.com.
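
The client-side diff-and-delete step described above can be sketched as follows; the UID lists stand in for whatever your Solr query and DB query return, and the function names, field name and batch size are all illustrative:

```python
def orphan_uids(solr_uids, db_uids):
    """UIDs indexed in Solr but no longer present in the source DB."""
    return sorted(set(solr_uids) - set(db_uids))

def delete_queries(uids, batch=100):
    """Yield Solr delete-by-query strings, batched to keep each query short."""
    for i in range(0, len(uids), batch):
        chunk = uids[i:i + batch]
        yield "uid:(%s)" % " OR ".join(chunk)

orphans = orphan_uids(["a1", "a2", "a3"], ["a1", "a3"])
print(orphans)                        # ['a2']
print(list(delete_queries(orphans)))  # ['uid:(a2)']
```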


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread Ezequiel Calderara
Can't you, on each delete of that data, save the ids in another table,
and then process those ids against Solr to delete them?
On Wed, Oct 20, 2010 at 11:51 AM, bbarani  wrote:

>
> Hi,
>
> I have a very common question but couldnt find any post related to my
> question in this forum,
>
> I am currently initiating a full import each week but the data that have
> been deleted in the source is not update in my document as I am using
> clean=false.
>
> We are indexing multiple data by data types hence cant delete the index and
> do a complete re-indexing each week also we want to delete the orphan solr
> documents (for which the data is not present in back end DB) on a daily
> basis.
>
> Now my question is.. Is there a way I can use preImportDeleteQuery to
> delete
> the documents from SOLR for which the data doesnt exist in back end db? I
> dont have anything called delete status in DB, instead I need to get all
> the
> UID's from SOLR document and compare it with all the UID's in back end and
> delete the data from SOLR document for the UID's which is not present in
> DB.
>
> Any suggestion / ideas would be of great help.
>
> Note: Currently I have developed a simple program which will fetch the
> UID's
> from SOLR document and then connect to backend DB to check the orphan UID's
> and delete the documents from SOLR index corresponding to orphan UID's. I
> just dont want to re-invent the wheel if this feature is already present in
> SOLR as I need to do more testing in terms of performance / scalability for
> my program..
>
> Thanks,
> Barani
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: Mulitple facet - fq

2010-10-20 Thread Pradeep Singh
fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ  wrote:

> Under the category facet, there are multiple selections, which can be
> personal, corporate or other.
>
> How can I get both "personal" and "corporate" ones, I tried
> fq=category:corporate&fq=category:personal
>
> It looks easy, but I can't find the solution.
>
>
> --
>
> Yavuz Selim YILMAZ
>


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread bbarani

ironicnet,

Thanks for your reply.

We actually use a virtual DB modelling tool to fetch the data from various
sources at run time, hence we don't have any control over the sources.

We consolidate the data from more than one source and index the consolidated
data using Solr. We don't have any kind of update / access rights to the
source data.

Thanks.
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739642.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread Mike Sokolov
Since you are performing a complete reload of all of your data, I don't 
understand why you can't create a new core, load your new data, swap 
your application to look at the new core, and then erase the old one, if 
you want.


Even so, you could track the timestamps on all your documents, which 
will be updated when you update the content.  Then when you're done you 
could delete anything with a timestamp prior to the time you started the 
latest import.
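That timestamp sweep boils down to a single delete-by-query; a sketch
(the field name "timestamp" and the date format are assumptions to adapt to
your schema):

```python
# Build a Solr delete-by-query removing everything whose timestamp predates
# the start of the latest full import. Note [* TO x] is inclusive, so
# documents stamped exactly at the import start time would also be deleted.

def delete_older_than(import_start_iso):
    """XML update message deleting docs indexed before the given instant."""
    return ('<delete><query>timestamp:[* TO %s]</query></delete>'
            % import_start_iso)

msg = delete_older_than("2010-10-20T12:00:00Z")
print(msg)
# <delete><query>timestamp:[* TO 2010-10-20T12:00:00Z]</query></delete>
```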


-Mike

On 10/20/2010 11:59 AM, bbarani wrote:

ironicnet,

Thanks for your reply.

We actually use virtual DB modelling tool to fetch the data from various
sources during run time hence we dont have any control over the source.

We consolidate the data from more than one source and index the consolidated
data using SOLR. We dont have any kind of update / access rights to source
data.

Thanks.
Barani
   


Re: Spatial

2010-10-20 Thread Pradeep Singh
Thanks for your response Grant.

I already have the bounding box based implementation in place. And on a
document base of around 350K it is super fast.

What about a document base of millions of documents? While a tier based
approach will narrow down the document space significantly this concern
might be misplaced because there are other numeric range queries I am going
to run anyway which don't have anything to do with spatial query. But the
keyword here is numeric range query based on NumericField, which is going to
be significantly faster than regular number based queries. I see that the
dynamic field type _latLon is of type double and not tdouble by default. Can
I have your input about that decision?

-Pradeep

On Tue, Oct 19, 2010 at 6:10 PM, Grant Ingersoll wrote:

>
> On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:
>
> > https://issues.apache.org/jira/browse/LUCENE-2519
> >
> > If I change my code as per 2519
> >
> > to have this  -
> >
> > public double[] coords(double latitude, double longitude) {
> >double rlat = Math.toRadians(latitude);
> >double rlong = Math.toRadians(longitude);
> >double nlat = rlong * Math.cos(rlat);
> >return new double[]{nlat, rlong};
> >
> >  }
> >
> >
> > return this -
> >
> > x = (gamma - gamma[0]) cos(phi)
> > y = phi
> >
> > would it make it give correct results? Correct projections, tier ids?
>
> I'm not sure.  I have a lot of doubt around that code.  After making that
> correction, I spent several days trying to get the tests to pass and
> ultimately gave up.  Does that mean it is wrong?  I don't know.  I just
> don't have enough confidence to recommend it given that the tests I were
> asking it to do I could verify through other tools.  Personally, I would
> recommend seeing if one of the non-tier based approaches suffices for your
> situation and use that.
>
> -Grant


RE: Mulitple facet - fq

2010-10-20 Thread Tim Gilbert
As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:




You can always specify an explicit operator between the facet clauses in your search:


fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets with
AND, OR, + and - options, and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com] 
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Mulitple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
 wrote:

> Under the category facet, there are multiple selections, which can be
> personal, corporate or other.
>
> How can I get both "personal" and "corporate" ones, I tried
> fq=category:corporate&fq=category:personal
>
> It looks easy, but I can't find the solution.
>
>
> --
>
> Yavuz Selim YILMAZ
>


RE: Mulitple facet - fq

2010-10-20 Thread Tim Gilbert
Sorry, what Pradeep said, not Prasad.  My apologies Pradeep.

-Original Message-
From: Tim Gilbert 
Sent: Wednesday, October 20, 2010 12:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Mulitple facet - fq

As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:




You can always specify an explicit operator between the facet clauses in your search:


fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets with
AND, OR, + and - options, and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com] 
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Mulitple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
 wrote:

> Under the category facet, there are multiple selections, which can be
> personal, corporate or other.
>
> How can I get both "personal" and "corporate" ones, I tried
> fq=category:corporate&fq=category:personal
>
> It looks easy, but I can't find the solution.
>
>
> --
>
> Yavuz Selim YILMAZ
>


EmbeddedSolrServer with one core and schema.xml loaded via ClassLoader, is it possible?

2010-10-20 Thread Paolo Castagna

Hi,
I am trying to use EmbeddedSolrServer with just one core and I'd like to
load solrconfig.xml, schema.xml and other configuration files from a jar
via getResourceAsStream(...).

I've tried to use SolrResourceLoader, but all my attempts failed with a
RuntimeException: Can't find resource [...].

Is it possible to construct an EmbeddedSolrServer loading all the config
files from a jar file?

Thank you in advance for your help,
Paolo


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread Shawn Heisey

On 10/20/2010 9:59 AM, bbarani wrote:

We actually use virtual DB modelling tool to fetch the data from various
sources during run time hence we dont have any control over the source.

We consolidate the data from more than one source and index the consolidated
data using SOLR. We dont have any kind of update / access rights to source
data.


It seems likely that those who are in control of the data sources would 
be maintaining some kind of delete log, and that they should be able to 
make those logs available to you.


For my index, the data comes from a MySQL database.  When a delete is 
done at the database level, a database trigger records the old 
information to a main delete log table, as well as a separate table for 
the search system.  The build system uses that separate table to run 
deletes every ten minutes and keeps it trimmed to 48 hours of delete 
history.
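The trigger described above can be sketched roughly like this (table and
column names are invented; adapt to your own schema):

```sql
-- Hypothetical MySQL setup: on every delete from the source table, record
-- the deleted row's id for the search system's delete sweep.
CREATE TABLE search_deletes (
    uid        BIGINT    NOT NULL,
    deleted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER log_item_delete
AFTER DELETE ON items
FOR EACH ROW
BEGIN
    INSERT INTO search_deletes (uid) VALUES (OLD.id);
END//
DELIMITER ;

-- The build system then reads search_deletes periodically, issues the
-- corresponding Solr deletes, and trims rows older than 48 hours.
```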





Re: Announcing " Blaze - Appliance for Solr "

2010-10-20 Thread Initcron Labs
 oh, I guess they have just uploaded it... when I've checked the file list
> was empty :)
>
>
Yes. Upload is still in progress.

Currently all formats are on the Suse Gallery page. On Sourceforge I have
managed to upload four formats so far, including the live CD, preload CD,
HDD/USB image and OVF format. Two more formats to go, for Xen and
VMware/VirtualBox. Visit the Sourceforge page in a few hours and you'll see
all the files. Thanks for your patience.

Is this free or do you charge for it?
>>>
>>
This is both libre and gratis :)  Feel free to use, share and modify as you
like, as long as you adhere to the licenses Solr and Tomcat come with.

And please, please... please give us feedback and suggestions on what
you would like to see added to this appliance. As a next step we are
thinking of including ajax-solr, a very neat AJAX-based user interface for
Solr.


here are the download pages again for your reference,

Suse Studio :  http://susegallery.com/a/Kr7Ayv/blaze-appliance-for-solr
Sourceforge.net :  http://sourceforge.net/projects/blazeappliance/files/
Do check a Cool Prezi at : http://www.initcron.org/blaze/

Thanks
Initcron Labs





>
>>> Cheers,
>>> Stefan
>>>
>>> Am 20.10.2010 13:03, schrieb Initcron Labs:
>>>
>>>  Initcron Labs Announces "Blaze - Appliance for Solr" .
>>>
 Read more at and download from :  http://www.initcron.org/blaze

 Blaze is a tailor made appliance  preinstalled and preconfigured with
 Apache
 Solr  running within Tomcat servlet  container. It  lets you focus on
 developing applications based on Apache Solr platform  and not worry
 about
 installation, configuration complexities.

 Blaze Appliance is built with Suse Studio and is available in following
 formats

 - LiveCD
 - USB Drive/ HDD Image
 - Preload ISO
 - Virtual Machine Images
 - Xen
 - VMWare, Virtualbox
 - OVM Open Format
 - Amazon EC2 Image Format


 You could get your solr installation setup and running within minutes.
 The appliance is also production ready being configured with Tomcat.
 Comes
 with webyast for web administration and configuration of the appliance.



 Thanks

 Initcron Labs

 www.initcron.org



  --
>>> ***
>>> Stefan Moises
>>> Senior Softwareentwickler
>>>
>>> shoptimax GmbH
>>> Guntherstraße 45 a
>>> 90461 Nürnberg
>>> Amtsgericht Nürnberg HRB 21703
>>> GF Friedrich Schreieck
>>>
>>> Tel.: 0911/25566-25
>>> Fax:  0911/25566-29
>>> moi...@shoptimax.de
>>> http://www.shoptimax.de
>>> ***
>>>
>>>
>>>


Multiple Similarity

2010-10-20 Thread raimon.bosch


Hi,

Is it possible to define different Similarity classes for different fields?
We have a use case where we are interested in avoiding term frequency (tf)
when our fields are multiValued.

Regards,
Raimon Bosch.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Similarity-tp1740290p1740290.html
Sent from the Solr - User mailing list archive at Nabble.com.


Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
Hi Solr Users,

I used the TermsComponent to walk through all the indexed terms and find
ones of particular interest (named entities). And now, I'd like to search
for documents that contain these particular entities. I have both query-time
and index-time stemming set for the field, which means I can't just hit the
normal search handler because as I understand, it will stem the
already-stemmed term. Any ideas about how to search directly for the indexed
term? Maybe something I can do at query-time to disable stemming?

Thanks!
sasank


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread Ezequiel Calderara
You could also maybe set an expiration policy, and delete documents that
have expired after some time... but I don't know if you can iterate over the
existing IDs...

On Wed, Oct 20, 2010 at 1:34 PM, Shawn Heisey  wrote:

> On 10/20/2010 9:59 AM, bbarani wrote:
>
>> We actually use virtual DB modelling tool to fetch the data from various
>> sources during run time hence we dont have any control over the source.
>>
>> We consolidate the data from more than one source and index the
>> consolidated
>> data using SOLR. We dont have any kind of update / access rights to source
>> data.
>>
>
> It seems likely that those who are in control of the data sources would be
> maintaining some kind of delete log, and that they should be able to make
> those logs available to you.
>
> For my index, the data comes from a MySQL database.  When a delete is done
> at the database level, a database trigger records the old information to a
> main delete log table, as well as a separate table for the search system.
>  The build system uses that separate table to run deletes every ten minutes
> and keeps it trimmed to 48 hours of delete history.
>
>
>


-- 
__
Ezequiel.

Http://www.ironicnet.com


Sorting and filtering on fluctuating multi-currency price data?

2010-10-20 Thread Gregg Donovan
In our current search app, we have sorting and filtering based on item
prices. We'd like to extend this to support sorting and filtering in the
buyer's native currency with the items themselves listed in the seller's
native currency. E.g: as a buyer, if my native currency is the Euro, my
search of all items between 10 and 20 Euros would also find all items listed
in USD between 13.90 and 27.80, in CAD between 14.29 and 28.58, etc.

I wanted to run a few possible approaches by the list to see if we were on
the right track or not. Our index is updated every few minutes, but we only
update our currency conversions every few hours.

The easiest approach would be to update the documents with non-USD listings
every few hours with the USD-converted price. That will be fine, but if the
number of non-USD listings is large, this would be too expensive (i.e. large
parts of the index getting recreated frequently).

Another approach would be to use ExternalFileField and keep the price data,
normalized to USD, outside of the index. Every time the currency rates
changed, we would calculate new normalized prices for every document in the
index.
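For the ExternalFileField route, the wiring is roughly as follows (field
and file names here are illustrative, not taken from the poster's schema):

```xml
<!-- schema.xml: a float field whose values come from a file, not the index -->
<fieldType name="pricefile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="price_usd" type="pricefile" indexed="false" stored="false"/>
```

The data file (e.g. `external_price_usd` in the index data directory) holds
one `id=value` line per document and is re-read when a new searcher opens,
so regenerating it every few hours when rates change avoids rewriting the
index itself.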

Still another approach would be to do the currency conversion at IndexReader
warmup time. We would index native price and currency code and create a
normalized currency field on the fly. This would be somewhat like
ExternalFileField in that it involved data from outside the index, but it
wouldn't need to be scoped to the parent SolrIndexReader, but could be
per-segment. Perhaps a custom poly-field could accomplish something like
this?

Has anyone dealt with this sort of problem? Do any of these approaches sound
more or less reasonable? Are we missing anything?

Thanks for the help!

Gregg Donovan
Technical Lead, Search
Etsy.com


Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Pradeep Singh
Here's what I would do -

Search all of the fields every time, regardless of language. Use one handler
and specify all of them in "qf" and "pf":
question_en, answer_en,
question_fr, answer_fr,
question_pl,  answer_pl

Individual field based analyzers will take care of appropriate tokenization
and you will get a match across all languages.

Even with this setup if you wanted you could also have a separate field
called "language" and use a "fq" to limit searches to that language only.
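In solrconfig.xml that setup looks roughly like this (handler name is
arbitrary, and you would normally add per-field boosts):

```xml
<requestHandler name="/multilang" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">question_en answer_en question_fr answer_fr question_pl answer_pl</str>
    <str name="pf">question_en answer_en question_fr answer_fr question_pl answer_pl</str>
  </lst>
</requestHandler>
```

A request could then add e.g. `fq=language:fr` to restrict results to one
language.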

-Pradeep

On Wed, Oct 20, 2010 at 6:03 AM, Jakub Godawa wrote:

> Hi everyone! (my first post)
>
> I am new, but really curious about the usefulness of lucene/solr in document
> search from web applications. I use Ruby on Rails to create one, with
> plugin "acts_as_solr_reloaded" that makes connection between web app and
> solr easy.
>
> So I am in a point, where I know that good solution is to prepare
> multi-language documents with fields like:
> question_en, answer_en,
> question_fr, answer_fr,
> question_pl,  answer_pl... etc.
>
> I need to create an index that would work with 6 languages: english,
> french,
> german, russian, ukrainian and polish.
>
> My questions are:
> 1. Is it doable to have just one search field that behaves like Google's
> for
> all those documents? It can be an option to indicate a language to search.
> 2. How should I begin changing the solr/conf/schema.xml (or other) file to
> tailor it to my needs? As I am a real rookie here, I am still a bit
> confused
> about "fields", "fieldTypes" and their connection with particular field
> (ex.
> answer_fr) and the "tokenizers" and "analyzers". If someone can provide a
> basic step by step tutorial on how to make it work in two languages I would
> be more that happy.
> 3. Do all those languages are supported (officially/unofficialy) by
> lucene/solr?
>
> Thank you for help,
> Jakub Godawa.
>


Multiple indexes inside a single core

2010-10-20 Thread ben boggess
We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes
and another is only updated every week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.
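One Solr answer to this is multi-core: one Solr instance, one core per
update cadence, each with its own searcher and caches. A sketch of solr.xml
(core names are invented):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- frequently updated data: searcher re-warmed every 20 minutes -->
    <core name="news"    instanceDir="news"/>
    <!-- slow-moving data: searcher re-warmed weekly -->
    <core name="archive" instanceDir="archive"/>
  </cores>
</solr>
```

A query can then span several cores via the `shards` parameter, though (as
noted above) distributed scoring assumes reasonably even term statistics
across shards.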

The other problem is that we will likely need to shard this large single
index and there isn't a clean way to shard randomly and evenly across the of
the data.  We would, however like to shard a single data type.  If we could
use multiple indexes, we would likely be also sharding a small sub-set of
them.

Thanks in advance,

Ben


xpath processing

2010-10-20 Thread pghorpade


I am trying to import MODS XML data into Solr using the xml/http datasource.

This does not work with the XPathEntityProcessor of the DataImportHandler:
xpath="/mods/name/namePart[@type = 'date']"

I actually have 143 records with the type attribute set to 'date' for the
namePart element.


Thank you
Parinita


Re: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Erick Erickson
Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, "black cat his blue" #would#
match.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown  wrote:

> Thanks Jonathan.
>
> To further clarify, I understand the the match of
>
> my blue rabbit
>
> would have to be found in 1 element (of my multi-valued defined field) for
> the phrase boost on that field to kick in.
>
> If for example my document had the following 3 entries for the multi-value
> field
>
>
> my black cat
> his blue car
> her pink rabbit
>
> Then I assume the phrase boost would not kick-in as the search term (my
> blue rabbit) isnt found in a single element (but can be found across them).
>
> Thanks again
>
> Jason.
>
> 
>
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Tue 19/10/2010 17:27
> To: solr-user@lucene.apache.org
> Subject: Re: Dismax phrase boosts on multi-value fields
>
>
>
> You are correct.  The query needs to match as a phrase. It doesn't need
> to match "everything". Note that if a value is:
>
> "long sentence with my blue rabbit in it",
>
> then query "my blue rabbit" will also match as a phrase, for phrase
> boosting or query purposes.
>
> Jonathan
>
> Jason Brown wrote:
> >
> >
> > Hi - I have a multi-value field, so say for example it consists of
> >
> > 'my black cat'
> > 'my white dog'
> > 'my blue rabbit'
> >
> > The field is whitespace parsed when put into the index.
> >
> > I have a phrase query boost configured on this field which I understand
> kicks in when my search term is found entirely in this field.
> >
> > So, if the search term is 'my blue rabbit', then I understand that my
> phrase boost will be applied as this is found entirley in this field.
> >
> > My question/presumption is that as this is a multi-valued field, only 1
> value of the multi-value needs to match for the phrase query boost (given my
> very imaginative set of test data :-) above, you can see that this obviously
> matches 1 value and not them all)
> >
> > Thanks for your help.
> >
> >
> >
> >
> >
> >
> > If you wish to view the St. James's Place email disclaimer, please use
> the link below
> >
> > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
> >
> >
>
>
>
> If you wish to view the St. James's Place email disclaimer, please use the
> link below
>
> http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
>


Re: Searching with Number fields

2010-10-20 Thread Erick Erickson
I don't see anything obvious. Try going to the admin page and click the
"analysis" link. That'll let you see pretty much exactly how things get
parsed both for indexing and querying.

Unless your synonyms are somehow getting in the way, but I don't
see how.

Best
Erick

On Wed, Oct 20, 2010 at 5:15 AM, Hasnain  wrote:

>
> Hi,
>
>   Im having trouble with searching with number fields, if this field has
> alphanumerics then search is working perfect but not with all numbers, can
> anyone suggest me  solution???
>
> 
>  
>
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>
>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>
>  
>  
>
> ignoreCase="true" expand="true"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
>
>  
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Searching-with-Number-fields-tp1737513p1737513.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching for Documents by Indexed Term

2010-10-20 Thread Erick Erickson
This may be a wild herring, but have you tried "raw"? NOTE: I'm a little
out of my depth here on what this actually does, so don't waste time by
thinking I'm an authority on this one. See:
http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html

and
http://wiki.apache.org/solr/SolrQuerySyntax
(this last under "built in query parsers").
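If the raw parser behaves as its documentation says, the request would look
something like this (the field name `content` is a placeholder):

```
http://localhost:8983/solr/select?q={!raw f=content}already_stemmed_term
```

Since no analysis is applied, the term must match the indexed form exactly,
which is what you want when feeding back terms from the TermsComponent.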

HTH
Erick

On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri  wrote:

> Hi Solr Users,
>
> I used the TermsComponent to walk through all the indexed terms and find
> ones of particular interest (named entities). And now, I'd like to search
> for documents that contain these particular entities. I have both
> query-time
> and index-time stemming set for the field, which means I can't just hit the
> normal search handler because as I understand, it will stem the
> already-stemmed term. Any ideas about how to search directly for the
> indexed
> term? Maybe something I can do at query-time to disable stemming?
>
> Thanks!
> sasank
>


Re: How can i get collect stemmed query?

2010-10-20 Thread Jerad

Thank you very much~! I'll try it :)


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1742898.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread Erick Erickson
<<>>

Can you make delete by query work? Something like delete all Solr docs of
a certain type and do a full re-index of just that type?

I have no idea whether this is practical or not

But your solution also works. There's really no way Solr #can# know about
deleted database records, especially since the  field is
completely
arbitrarily defined.

Best
Erick

On Wed, Oct 20, 2010 at 10:51 AM, bbarani  wrote:

>
> Hi,
>
> I have a very common question but couldnt find any post related to my
> question in this forum,
>
> I am currently initiating a full import each week but the data that have
> been deleted in the source is not update in my document as I am using
> clean=false.
>
> We are indexing multiple data by data types hence cant delete the index and
> do a complete re-indexing each week also we want to delete the orphan solr
> documents (for which the data is not present in back end DB) on a daily
> basis.
>
> Now my question is.. Is there a way I can use preImportDeleteQuery to
> delete
> the documents from SOLR for which the data doesnt exist in back end db? I
> dont have anything called delete status in DB, instead I need to get all
> the
> UID's from SOLR document and compare it with all the UID's in back end and
> delete the data from SOLR document for the UID's which is not present in
> DB.
>
> Any suggestion / ideas would be of great help.
>
> Note: Currently I have developed a simple program which will fetch the
> UID's
> from SOLR document and then connect to backend DB to check the orphan UID's
> and delete the documents from SOLR index corresponding to orphan UID's. I
> just dont want to re-invent the wheel if this feature is already present in
> SOLR as I need to do more testing in terms of performance / scalability for
> my program..
>
> Thanks,
> Barani
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching for Documents by Indexed Term

2010-10-20 Thread Sasank Mudunuri
That looks very promising based on a couple of quick queries. Any objections
if I move the javadoc help into the wiki, specifically:

Create a term query from the input value without any text analysis or
> transformation whatsoever. This is useful in debugging, or when raw terms
> are returned from the terms component (this is not the default).


Thanks Eric!
sasank

On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson wrote:

> This may be a wild herring, but have you tried "raw"? NOTE: I'm a little
> out of my depth here on what this actually does, so don't waste time by
> thinking I'm an authority on this one. See:
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html
>
> and
> http://wiki.apache.org/solr/SolrQuerySyntax
> (this last under "built in query parsers").
>
> HTH
> Erick
>
> On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri  wrote:
>
> > Hi Solr Users,
> >
> > I used the TermsComponent to walk through all the indexed terms and find
> > ones of particular interest (named entities). And now, I'd like to search
> > for documents that contain these particular entities. I have both
> > query-time
> > and index-time stemming set for the field, which means I can't just hit
> the
> > normal search handler because as I understand, it will stem the
> > already-stemmed term. Any ideas about how to search directly for the
> > indexed
> > term? Maybe something I can do at query-time to disable stemming?
> >
> > Thanks!
> > sasank
> >
>


Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Erick Erickson
See below:

But also search the archives for multilanguage, this topic has been
discussed
many times before. Lucid Imagination maintains a Solr-powered (of course)
searchable
list at: http://www.lucidimagination.com/search/



On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa wrote:

> Hi everyone! (my first post)
>
> I am new, but really curious about the usefulness of lucene/solr in document
> search from web applications. I use Ruby on Rails to create one, with
> plugin "acts_as_solr_reloaded" that makes connection between web app and
> solr easy.
>
> So I am in a point, where I know that good solution is to prepare
> multi-language documents with fields like:
> question_en, answer_en,
> question_fr, answer_fr,
> question_pl,  answer_pl... etc.
>
> I need to create an index that would work with 6 languages: english,
> french,
> german, russian, ukrainian and polish.
>
> My questions are:
> 1. Is it doable to have just one search field that behaves like Google's
> for
> all those documents? It can be an option to indicate a language to search.
>

This depends on what you mean by do-able. Are you going to allow a French
user to search an English document (& etc)? But the real answer is "yes, you
can
if you .". There'll be tradeoffs.

Take a look at the dismax handler. It's kind of hard to grok all at once,
but you
can cause it to search across multiple fields. That is, the user types
"language",
and you can turn it into a complex query under the covers like
lang_en:language lang_fr:language lang_ru:language, etc. You can also
apply boosts. Note that this has obvious problems with, say, Russian. Half
your
job will be figuring out what will satisfy the user.

You could also have a #different# dismax handler defined for various
languages. Say
the user was coming from Spanish. Consider a browseES handler. See
solrconfig.xml
for the default dismax handler. The Solr book mentioned above describes
this.


> 2. How should I begin changing the solr/conf/schema.xml (or other) file to
> tailor it to my needs? As I am a real rookie here, I am still a bit
> confused
> about "fields", "fieldTypes" and their connection with particular field
> (ex.
> answer_fr) and the "tokenizers" and "analyzers". If someone can provide a
> basic step by step tutorial on how to make it work in two languages I would
> be more that happy.
>

You have several choices here:
> books "Lucene in Action" and "Solr 1.4, Enterprise SearchServer" both have
discussions here.
> Spend some time on the solr/admin/analysis page. That page allows you to
see
   pretty much exactly what each of the steps in an analyzer chain
accomplish.


> 3. Do all those languages are supported (officially/unofficialy) by
> lucene/solr?
>

See:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/Analyzer.html
Remember that Solr is built on Lucene, so these analyzers are available.


>
> Thank you for help,
> Jakub Godawa.
>

Best
Erick


RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jonathan Rochkind
Which is why the positionIncrementGap is set to a high number normally (100 in 
the sample schema.xml).  With this being so, phrases won't match across values 
in a multi-valued field. If for some reason you were using a dismax ps phrase 
slop that was higher than your positionIncrementGap, you could get phrase boost 
matches across individual values.  But normally that won't happen unless you 
do something odd to make it happen because you actually want it to, because 
positionIncrementGap is 100. If for some reason you wanted to use a phrase slop 
of over 100 but still make sure it didn't go across individual value 
boundaries, you could just set positionIncrementGap to something absurdly high 
(I'm not entirely sure why it isn't something absurdly high in the sample 
schema.xml, instead of the high-but-not-absurdly-so 100, since most people will 
probably expect individual values to be entirely separate). 

Jason, are you _trying_ to make that happen, or hoping it won't?  Ordinarily, 
it won't. 
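
For reference, the two knobs being discussed live in different files; a
minimal sketch (values illustrative, taken from the sample configs):

```xml
<!-- schema.xml: successive values of a multi-valued field of this type are
     100 positions apart, so a phrase query with slop below 100 cannot
     match across value boundaries. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml, inside the dismax handler's defaults: "ps" is the
     phrase slop applied to the boosting phrase query. -->
<str name="ps">2</str>
```

With ps well under the gap, as here, phrase boosts only fire within a single
value.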

From: Erick Erickson [erickerick...@gmail.com]
Sent: Wednesday, October 20, 2010 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, "black cat his blue" #would#
match.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown  wrote:

> Thanks Jonathan.
>
> To further clarify, I understand that the match of
>
> my blue rabbit
>
> would have to be found in 1 element (of my multi-valued defined field) for
> the phrase boost on that field to kick in.
>
> If for example my document had the following 3 entries for the multi-value
> field
>
>
> my black cat
> his blue car
> her pink rabbit
>
> Then I assume the phrase boost would not kick in as the search term (my
> blue rabbit) isn't found in a single element (but can be found across them).
>
> Thanks again
>
> Jason.
>
> 
>
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Tue 19/10/2010 17:27
> To: solr-user@lucene.apache.org
> Subject: Re: Dismax phrase boosts on multi-value fields
>
>
>
> You are correct.  The query needs to match as a phrase. It doesn't need
> to match "everything". Note that if a value is:
>
> "long sentence with my blue rabbit in it",
>
> then query "my blue rabbit" will also match as a phrase, for phrase
> boosting or query purposes.
>
> Jonathan
>
> Jason Brown wrote:
> >
> >
> > Hi - I have a multi-value field, so say for example it consists of
> >
> > 'my black cat'
> > 'my white dog'
> > 'my blue rabbit'
> >
> > The field is whitespace parsed when put into the index.
> >
> > I have a phrase query boost configured on this field which I understand
> kicks in when my search term is found entirely in this field.
> >
> > So, if the search term is 'my blue rabbit', then I understand that my
> phrase boost will be applied as this is found entirely in this field.
> >
> > My question/presumption is that as this is a multi-valued field, only 1
> value of the multi-value needs to match for the phrase query boost (given my
> very imaginative set of test data :-) above, you can see that this obviously
> matches 1 value and not them all)
> >
> > Thanks for your help.
> >
> >
> >
> >
> >
> >
> > If you wish to view the St. James's Place email disclaimer, please use
> the link below
> >
> > http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer
> >
> >
>
>
>
>


Re: Multiple indexes inside a single core

2010-10-20 Thread Erick Erickson
It seems to me that multiple cores are along the lines you
need, a single instance of Solr that can search across multiple
sub-indexes that do not necessarily share schemas, and are
independently maintainable.

This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
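
A minimal multi-core solr.xml along the lines of that wiki page might look
like this (core names and paths are made up for illustration):

```xml
<!-- One Solr instance, several independent sub-indexes. Each core has its
     own schema, solrconfig, and index directory, and can be reloaded or
     reindexed without touching the others. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="fast-changing" instanceDir="cores/fast"/>
    <core name="weekly" instanceDir="cores/weekly"/>
  </cores>
</solr>
```

Each core is then queried at its own URL path, e.g. /solr/fast-changing/select.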

HTH
Erick

On Wed, Oct 20, 2010 at 3:23 PM, ben boggess  wrote:

> We are trying to convert a Lucene-based search solution to a
> Solr/Lucene-based solution.  The problem we have is that we currently have
> our data split into many indexes and Solr expects things to be in a single
> index unless you're sharding.  In addition to this, our indexes wouldn't
> work well using the distributed search functionality in Solr because the
> documents are not evenly or randomly distributed.  We are currently using
> Lucene's MultiSearcher to search over subsets of these indexes.
>
> I know this has been brought up a number of times in previous posts and the
> typical response is that the best thing to do is to convert everything into
> a single index.  One of the major reasons for having the indexes split up
> the way we do is because different types of data need to be indexed at
> different intervals.  You may need one index to be updated every 20 minutes
> and another is only updated every week.  If we move to a single index, then
> we will constantly be warming and replacing searchers for the entire
> dataset, and will essentially render the searcher caches useless.  If we
> were able to have multiple indexes, they would each have a searcher and
> updates would be isolated to a subset of the data.
>
> The other problem is that we will likely need to shard this large single
> index and there isn't a clean way to shard randomly and evenly across the
> of
> the data.  We would, however like to shard a single data type.  If we could
> use multiple indexes, we would likely be also sharding a small sub-set of
> them.
>
> Thanks in advance,
>
> Ben
>


Re: Searching for Documents by Indexed Term

2010-10-20 Thread Erick Erickson
Help updating/clarifying the Wiki is #always# appreciated.

Erick

On Wed, Oct 20, 2010 at 9:10 PM, Sasank Mudunuri  wrote:

> That looks very promising based on a couple of quick queries. Any
> objections
> if I move the javadoc help into the wiki, specifically:
>
> Create a term query from the input value without any text analysis or
> > transformation whatsoever. This is useful in debugging, or when raw terms
> > are returned from the terms component (this is not the default).
>
>
> Thanks Erick!
> sasank
>
> On Wed, Oct 20, 2010 at 6:00 PM, Erick Erickson  >wrote:
>
> > This may be a wild herring, but have you tried "raw"? NOTE: I'm a little
> > out of my depth here on what this actually does, so don't waste time by
> > thinking I'm an authority on this one. See:
> >
> >
> http://lucene.apache.org/solr/api/org/apache/solr/search/RawQParserPlugin.html
> >
> > and
> > http://wiki.apache.org/solr/SolrQuerySyntax
> > (this last under "built in query parsers").
> >
> > HTH
> > Erick
> >
> > On Wed, Oct 20, 2010 at 1:47 PM, Sasank Mudunuri 
> wrote:
> >
> > > Hi Solr Users,
> > >
> > > I used the TermsComponent to walk through all the indexed terms and
> find
> > > ones of particular interest (named entities). And now, I'd like to
> search
> > > for documents that contain these particular entities. I have both
> > > query-time
> > > and index-time stemming set for the field, which means I can't just hit
> > the
> > > normal search handler because as I understand, it will stem the
> > > already-stemmed term. Any ideas about how to search directly for the
> > > indexed
> > > term? Maybe something I can do at query-time to disable stemming?
> > >
> > > Thanks!
> > > sasank
> > >
> >
>
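
To make the quoted suggestion concrete, the raw query parser is selected with
Solr's local-params syntax and bypasses all query-time analysis (including
stemming) for the given field; a sketch, where the field name subject_s is
made up:

```
q={!raw f=subject_s}run
```

This searches for the literal indexed term "run" in subject_s, exactly as the
terms component returned it, with no stemming or other transformation applied.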


Re: Multiple indexes inside a single core

2010-10-20 Thread Ben Boggess
Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely 
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson  wrote:

> It seems to me that multiple cores are along the lines you
> need, a single instance of Solr that can search across multiple
> sub-indexes that do not necessarily share schemas, and are
> independently maintainable.
> 
> This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
> 
> HTH
> Erick
> 
> On Wed, Oct 20, 2010 at 3:23 PM, ben boggess  wrote:
> 
>> We are trying to convert a Lucene-based search solution to a
>> Solr/Lucene-based solution.  The problem we have is that we currently have
>> our data split into many indexes and Solr expects things to be in a single
>> index unless you're sharding.  In addition to this, our indexes wouldn't
>> work well using the distributed search functionality in Solr because the
>> documents are not evenly or randomly distributed.  We are currently using
>> Lucene's MultiSearcher to search over subsets of these indexes.
>> 
>> I know this has been brought up a number of times in previous posts and the
>> typical response is that the best thing to do is to convert everything into
>> a single index.  One of the major reasons for having the indexes split up
>> the way we do is because different types of data need to be indexed at
>> different intervals.  You may need one index to be updated every 20 minutes
>> and another is only updated every week.  If we move to a single index, then
>> we will constantly be warming and replacing searchers for the entire
>> dataset, and will essentially render the searcher caches useless.  If we
>> were able to have multiple indexes, they would each have a searcher and
>> updates would be isolated to a subset of the data.
>> 
>> The other problem is that we will likely need to shard this large single
>> index and there isn't a clean way to shard randomly and evenly across the
>> of
>> the data.  We would, however like to shard a single data type.  If we could
>> use multiple indexes, we would likely be also sharding a small sub-set of
>> them.
>> 
>> Thanks in advance,
>> 
>> Ben
>> 


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread ben boggess
> Now my question is.. Is there a way I can use preImportDeleteQuery to
> delete
> the documents from SOLR for which the data doesn't exist in the back-end db?
> I don't have anything called delete status in DB; instead I need to get all
> the
> UID's from the SOLR document and compare them with all the UID's in the back
> end and delete the data from the SOLR document for the UID's which are not
> present in DB.

I've done something like this with raw Lucene and I'm not sure how or if you
could do it with Solr as I'm relatively new to it.

We stored a timestamp for when we started to import and stored an update
timestamp field for every document added to the index.  After the data
import, we did a delete by query that matched all documents with a timestamp
older than when we started.  The assumption being that if we didn't update
the timestamp during the load, then the record must have been deleted from
the database.
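
In Solr terms, that final step would be a delete-by-query posted to the update
handler; a sketch, assuming a hypothetical date field named last_indexed_dt
that is set on every document at add time:

```xml
<!-- Remove every document not touched since the import started
     (cutoff timestamp recorded before the import began). -->
<delete>
  <query>last_indexed_dt:[* TO 2010-10-20T00:00:00Z]</query>
</delete>
```

Post this to /update and follow it with a &lt;commit/&gt; for the deletes to
become visible.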

Hope this helps.

Ben

On Wed, Oct 20, 2010 at 8:05 PM, Erick Erickson wrote:

> << and
> do a complete re-indexing each week also we want to delete the orphan solr
> documents (for which the data is not present in back end DB) on a daily
> basis.>>>
>
> Can you make delete by query work? Something like delete all Solr docs of
> a certain type and do a full re-index of just that type?
>
> I have no idea whether this is practical or not
>
> But your solution also works. There's really no way Solr #can# know about
> deleted database records, especially since the  field is
> completely
> arbitrarily defined.
>
> Best
> Erick
>
> On Wed, Oct 20, 2010 at 10:51 AM, bbarani  wrote:
>
> >
> > Hi,
> >
> > I have a very common question but couldnt find any post related to my
> > question in this forum,
> >
> > I am currently initiating a full import each week, but the data that have
> > been deleted in the source are not updated in my document as I am using
> > clean=false.
> >
> > We are indexing multiple data by data types hence cant delete the index
> and
> > do a complete re-indexing each week also we want to delete the orphan
> solr
> > documents (for which the data is not present in back end DB) on a daily
> > basis.
> >
> > Now my question is.. Is there a way I can use preImportDeleteQuery to
> > delete
> > the documents from SOLR for which the data doesn't exist in the back-end
> > db? I don't have anything called delete status in DB; instead I need to
> > get all the
> > UID's from the SOLR document and compare them with all the UID's in the
> > back end
> > and delete the data from the SOLR document for the UID's which are not
> > present in
> > DB.
> >
> > Any suggestion / ideas would be of great help.
> >
> > Note: Currently I have developed a simple program which will fetch the
> > UID's
> > from SOLR document and then connect to backend DB to check the orphan
> UID's
> > and delete the documents from SOLR index corresponding to orphan UID's. I
> > just don't want to re-invent the wheel if this feature is already present
> in
> > SOLR as I need to do more testing in terms of performance / scalability
> for
> > my program..
> >
> > Thanks,
> > Barani
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


RE: Dismax phrase boosts on multi-value fields

2010-10-20 Thread Jason Brown
Thanks - I was hoping it wouldn't match - and I believe you've confirmed it 
won't in my case as the default positionIncrementGap is set.

Many Thanks

Jason.


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Thu 21/10/2010 02:27
To: solr-user@lucene.apache.org
Subject: RE: Dismax phrase boosts on multi-value fields
 
Which is why the positionIncrementGap is set to a high number normally (100 in 
the sample schema.xml).  With this being so, phrases won't match across values 
in a multi-valued field. If for some reason you were using a dismax ps phrase 
slop that was higher than your positionIncrementGap, you could get phrase boost 
matches across individual values.  But normally that won't happen unless you 
do something odd to make it happen because you actually want it to, because 
positionIncrementGap is 100. If for some reason you wanted to use a phrase slop 
of over 100 but still make sure it didn't go across individual value 
boundaries, you could just set positionIncrementGap to something absurdly high 
(I'm not entirely sure why it isn't something absurdly high in the sample 
schema.xml, instead of the high-but-not-absurdly-so 100, since most people will 
probably expect individual values to be entirely separate). 

Jason, are you _trying_ to make that happen, or hoping it won't?  Ordinarily, 
it won't. 

From: Erick Erickson [erickerick...@gmail.com]
Sent: Wednesday, October 20, 2010 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax phrase boosts on multi-value fields

Well, it all depends (tm). Your example wouldn't match, but if you
didn't have an increment gap greater than 1, "black cat his blue" #would#
match.

Best
Erick


On Wed, Oct 20, 2010 at 3:22 AM, Jason Brown  wrote:

> Thanks Jonathan.
>
> To further clarify, I understand that the match of
>
> my blue rabbit
>
> would have to be found in 1 element (of my multi-valued defined field) for
> the phrase boost on that field to kick in.
>
> If for example my document had the following 3 entries for the multi-value
> field
>
>
> my black cat
> his blue car
> her pink rabbit
>
> Then I assume the phrase boost would not kick in as the search term (my
> blue rabbit) isn't found in a single element (but can be found across them).
>
> Thanks again
>
> Jason.
>
> 
>
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Tue 19/10/2010 17:27
> To: solr-user@lucene.apache.org
> Subject: Re: Dismax phrase boosts on multi-value fields
>
>
>
> You are correct.  The query needs to match as a phrase. It doesn't need
> to match "everything". Note that if a value is:
>
> "long sentence with my blue rabbit in it",
>
> then query "my blue rabbit" will also match as a phrase, for phrase
> boosting or query purposes.
>
> Jonathan
>
> Jason Brown wrote:
> >
> >
> > Hi - I have a multi-value field, so say for example it consists of
> >
> > 'my black cat'
> > 'my white dog'
> > 'my blue rabbit'
> >
> > The field is whitespace parsed when put into the index.
> >
> > I have a phrase query boost configured on this field which I understand
> kicks in when my search term is found entirely in this field.
> >
> > So, if the search term is 'my blue rabbit', then I understand that my
> phrase boost will be applied as this is found entirely in this field.
> >
> > My question/presumption is that as this is a multi-valued field, only 1
> value of the multi-value needs to match for the phrase query boost (given my
> very imaginative set of test data :-) above, you can see that this obviously
> matches 1 value and not them all)
> >
> > Thanks for your help.
> >
> >
> >
> >
> >
> >
> >
> >
>
>
>
>




Looking for Solr/Lucene Developers in India(Pune)

2010-10-20 Thread ST ST
If you are a Solr/Lucene developer in Pune, India and are interested in a
consulting opportunity overseas,
or on Projects local to the area, please get in touch with me.

Thanks


Re: does solr support posting gzipped content?

2010-10-20 Thread Gora Mohanty
On Tue, Oct 19, 2010 at 9:34 PM, danomano  wrote:
>
> Hi folks, I was wondering if there is any native support for posting gzipped
> files to solr?
>
> i.e. I'm testing a project where we inject our log files into solr for
> indexing, these logs files are gzipped, and I figure it would take less
> network bandwith to inject gzipped files directl.

What do you mean by "inject"? Are you POSTing XML to Solr,
using a DataImportHandler, or what?

>  is there a way to do
> this, other than implementing my own ServletFilter or some such?

As far as I am aware there is no existing way to post gzipped
content to Solr. Integrating this into the DataImportHandler would
probably be the way to go.

Regards,
Gora


Re: how can i use solrj binary format for indexing?

2010-10-20 Thread Gora Mohanty
On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim  wrote:

Sorry for the delay in replying. Was caught up in various things this
week.

> Thank you for reply, Gora
>
> But I still have several questions.
> Did you use separate index?
> If so, you indexed 0.7 million Xml files per instance
> and merged it. Is it Right?

Yes, that is correct. We sharded the data by user ID, so that each of the 25
cores held approximately 0.7 million out of the 3.5 million records. We could
have used the sharded indices directly for search, but at least for now have
decided to go with a single, merged index.

> Please let me know how to work multiple instances and cores in your case.
[...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml:
  http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e.,
  schema, solrconfig.xml, etc. need to be replicated across the
  cores.
* Decide which XML files you will post to which core, and do the
  POST with curl, as usual. You might need to write a little script
  to do this.
* After indexing on the cores is done, make sure to do a commit
  on each.
* Merge the sharded indexes (if desired) as described here:
  http://wiki.apache.org/solr/MergingSolrIndexes . One thing to
  watch out for here is disk space. When merging with Lucene
  IndexMergeTool, we found that a rough rule of thumb was that
  intermediate steps in the merge would require about twice as
  much space as the total size of the indexes to be merged. I.e.,
  if one is merging 40GB of data in sharded indexes, one should
  have at least 120GB free.
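
The merge step on that wiki page boils down to invoking Lucene's
IndexMergeTool from the command line; a sketch, where the jar names and index
paths are assumptions for a Solr 1.4-era install (the tool lives in the
Lucene misc contrib jar):

```shell
# The merged index is written to the first directory;
# the remaining directories are read-only inputs.
java -cp lucene-core-2.9.3.jar:lucene-misc-2.9.3.jar \
  org.apache.lucene.misc.IndexMergeTool \
  /data/merged/index \
  /data/core0/data/index /data/core1/data/index
```

Remember the disk-space rule of thumb above: budget roughly twice the total
size of the input indexes for the intermediate steps.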

Regards,
Gora


RAM increase

2010-10-20 Thread satya swaroop
Hi all,
  I increased my RAM size to 8GB and I want 4GB of it to be used
for Solr itself. Can anyone tell me the way to allocate the RAM for
Solr?


Regards,
satya


Re: Mulitple facet - fq

2010-10-20 Thread Yavuz Selim YILMAZ
Thnx guys.
--

Yavuz Selim YILMAZ


2010/10/20 Tim Gilbert 

> Sorry, what Pradeep said, not Prasad.  My apologies Pradeep.
>
> -Original Message-
> From: Tim Gilbert
> Sent: Wednesday, October 20, 2010 12:18 PM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Mulitple facet - fq
>
> As Prasad said:
>
>fq=(category:corporate category:personal)
>
> But you might want to check your schema.xml to see what you have here:
>
>
>
>
> You can always specify your operator in your search between your facets.
>
>
>fq=(category:corporate AND category:personal)
>
> or
>
>fq=(category:corporate OR category:personal)
>
> I have an application where I am using searches on 10 more facets with
> AND OR + and - options and it works flawlessly.
>
>fq=(+category:corporate AND -category:personal)
>
> meaning category is corporate and not personal.
>
> Tim
>
> -Original Message-
> From: Pradeep Singh [mailto:pksing...@gmail.com]
> Sent: Wednesday, October 20, 2010 11:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Mulitple facet - fq
>
> fq=(category:corporate category:personal)
>
> On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ
>  > wrote:
>
> > Under category facet, there are multiple selections, whicih can be
> > personal, corporate or other 
> >
> > How can I get both "personal" and "corporate" ones, I tried
> > fq=category:corporate&fq=category:personal
> >
> > It looks easy, but I can't find the solution.
> >
> >
> > --
> >
> > Yavuz Selim YILMAZ
> >
>


Re: RAM increase

2010-10-20 Thread Gora Mohanty
On Thu, Oct 21, 2010 at 10:46 AM, satya swaroop  wrote:
> Hi all,
>              I increased my RAM size to 8GB and i want 4GB of it to be used
> for solr itself. can anyone tell me the way to allocate the RAM for the
> solr.
[...]

You will need to set up the allocation of RAM for Java, via the the -Xmx
and -Xms variables. If you are using something like Tomcat, that would
be done in the Tomcat configuration file. E.g., this option can be added
inside /etc/init.d/tomcat6 on new Debian/Ubuntu systems.
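
Concretely, that usually means adding something like the following line to the
Tomcat startup environment (the exact file varies by distribution;
/etc/default/tomcat6 and bin/catalina.sh are common places, and the 4g value
here just reflects the goal stated above):

```shell
# Give the JVM running Solr a 4GB heap.
# Setting -Xms equal to -Xmx avoids heap-resizing pauses under load.
JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx4g"
```

Restart Tomcat afterwards and verify the settings took effect, e.g. by
checking the JVM arguments on Solr's admin statistics page.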

Regards,
Gora