XS DateTime format
Hi, I just have a small question regarding the output format of fields of type TrieDateField. If a document containing the date 0001-01-01T01:01:01Z is passed to Solr and I then try to search for that document, the output of the date field has the format Y-MM-DDThh:mm:ssZ. The first three zeros are missing. According to the XML Schema specification found on w3.org, the year in an xs:dateTime is a four-or-more digit, optionally negative-signed numeral. Is it intentional that Solr strips the leading zeros from the first four digits? Thanks Jens Jørgen Flaaris
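For comparison, the xs:dateTime behavior described above can be reproduced with the standard java.time API. This is only a sketch of the required padding, not Solr's own formatting code, and it treats the trailing Z as a literal rather than doing real zone handling:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class XsDateTimeDemo {
    // xs:dateTime requires a four-or-more digit year; the "uuuu"
    // pattern zero-pads the year to four digits.
    static final DateTimeFormatter XS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'");

    static String format(LocalDateTime t) {
        return XS.format(t);
    }

    public static void main(String[] args) {
        // prints 0001-01-01T01:01:01Z, with the leading zeros kept
        System.out.println(format(LocalDateTime.of(1, 1, 1, 1, 1, 1)));
    }
}
```

An output of `1-01-01T01:01:01Z` would correspond to a single-letter year pattern, which is what the reporter appears to be seeing.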
fq parameter with partial value
Hello, I would like to know if there is a way to use the fq parameter with a partial value. For instance, if I have a request with fq=NAME:Joe, and I would like to retrieve all answers where NAME contains Joe, including those with NAME = Joe Smith. Thanks, Elisabeth
Re: fq parameter with partial value
Hi Elisabeth, that's not what FilterQueries are made for :) What speaks against using that criteria in the query? Perhaps you want to describe your use case and we'll see if there's another way to solve it? Regards Stefan On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit wrote: > [...]
Spatial Search
Dear list :) I am new to Solr and am trying to use the spatial search feature which was added in 3.1. In my schema.xml I have 2 double fields for latitude and longitude. How can I get them into the location field type? I use SolrJ to fill the index with data. If I used a location field instead of two double fields, how could I fill it with SolrJ? I use annotations to link the data from my DTOs to the index fields... Hope you get my problem... best regards, Jonas
Re: fq parameter with partial value
Hi Stefan, Thanks for answering. In more detail, my problem is the following. I'm working on searching points of interest (POIs), which can be hotels, restaurants, plumbers, psychologists, etc. Those POIs can be identified among other things by categories or by brand, and a single POI might have different categories (no maximum number). A user might enter a query like McDonald's Paris or Restaurant Paris or many other possible queries. First I want to do a facet search on brand and categories, to find out which case is the current case: http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY and get an answer like 598 451 Then I want to send a request with fq=CATEGORY:Restaurant and still get answers with CATEGORY = Restaurant Hotel. One solution would be to modify the data to add a new document every time we have a new category, so a POI with three different categories would be indexed three times, each time with a different category. But I was wondering if there was another way around it. Thanks again, Elisabeth 2011/4/28 Stefan Matheis > [...]
how to update database record after indexing
Hello, I am using DataImportHandler to import data from a SQL Server database. My requirement is: when Solr has completed indexing a particular database record, I want to update that record in the database. Alternatively, after indexing all records, if I can get all the ids, I can update all the records at once. How can I achieve this? Thanks Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p2874171.html Sent from the Solr - User mailing list archive at Nabble.com.
manual background re-indexing
Hello list, I am planning to implement a setup, to be run from unix scripts, that should perform a full pull-and-reindex on a background server and then deploy that index. All should happen on the same machine. I thought the replication methods would help me, but they seem to solve the issues of distribution, while what I need is only the ability to: - suspend the queries - swap the directories with the new index - close all searchers - reload and warm up the searcher on the new index Is there a part of the replication utilities (http or unix) that I could use to perform the above tasks? I intend to do this only on occasion... maybe once a month or even less. Is "reload" the right term to use? paul
Re: Formatted date/time in long field and javabinRW exception
Any thoughts on this one? Why does Solr output a string in a long field with XMLResponseWriter but fail doing so (as it should) with the javabin format? On Tuesday 19 April 2011 10:52:33 Markus Jelsma wrote: > Hi, > > Nutch 1.3-dev seems to have changed its tstamp field from a long to a > properly formatted Solr-readable date/time, but the example Solr schema for > Nutch still configures the tstamp field as a long. This results in a > formatted date/time in a long field, which I think should not be allowed > in the first place by Solr. > > 2011-04-19T08:16:31.675Z > > While the above is strange enough, I only found out it's all wrong when > using the javabin format. The following query will throw an exception, > while using the XML response writer works fine and returns the tstamp as a long > but formatted as a proper date/time. > > javabin: > > curl "http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=javabin&rows=2&version=1" > > Apr 19, 2011 10:34:50 AM org.apache.solr.request.BinaryResponseWriter$Resolver getDoc > WARNING: Error reading a field from document : SolrDocument[{digest=7ff92a31c58e43a34fd45bc6d87cda03}] > java.lang.NumberFormatException: For input string: "2011-04-19T08:16:31.675Z" > at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) > at java.lang.Long.parseLong(Long.java:419) > at java.lang.Long.valueOf(Long.java:525) > at org.apache.solr.schema.LongField.toObject(LongField.java:82) > at org.apache.solr.schema.LongField.toObject(LongField.java:33) > at org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:148) > at org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:124) > at org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88) > at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:143) > at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:133) > at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:221) > at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:138) > at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87) > at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48) > at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254) > more trace from Jetty > > Here's wt=xml working fine and showing output for the tstamp field: > > markus@midas:~$ curl "http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=xml&rows=2&version=1" > > [XML response, markup lost in the archive: responseHeader and echoed params, then two documents with id "idfield": digest 478e77f99f7005ae71aa92a879be2fd4 with tstamp 2011-04-19T08:16:31.689Z, and digest 7ff92a31c58e43a34fd45bc6d87cda03 with tstamp 2011-04-19T08:16:31.675Z] > > Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: manual background re-indexing
Hi Paul Would a multi-core set up and the swap command do what you want it to do? http://wiki.apache.org/solr/CoreAdmin Shaun On 28 April 2011 12:49, Paul Libbrecht wrote: > [...]
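For reference, the swap described on that wiki page is a single CoreAdmin call. The core names here (live, rebuild) are illustrative, not anything from the original thread:

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild
```

After the swap, queries hitting "live" are served by the freshly built index, while the previous index remains available under "rebuild" until the next rebuild cycle.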
Re: fq parameter with partial value
So, I assume your CATEGORY field is multiValued but each value is not broken up into tokens, right? If that's the case, would it work to have a second field CATEGORY_TOKENIZED and run your fq against that field instead? You could have this be a multiValued field with an increment gap if you wanted to prevent matches across separate entries, and have your fq do a proximity search where the proximity was less than the increment gap. Best Erick On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit wrote: > [...]
Re: how to update database record after indexing
I don't think you can do this through DIH, you'll probably have to write a separate process that queries the Solr index and updates your table. You'll have to be a bit cautious that you coordinate the commits, that is wait for the DIH to complete and commit before running your separate db update process. Best Erick On Thu, Apr 28, 2011 at 6:59 AM, vrpar...@gmail.com wrote: > [...]
Re: Spatial Search
On Thu, Apr 28, 2011 at 5:15 AM, Jonas Lanzendörfer wrote: > [...] I've not used the annotation stuff in SolrJ, but since the value sent in must be of the form 10.3,20.4, I guess one would have to have a String field with this value on your object. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
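Following Yonik's suggestion, building that "lat,lon" string from two doubles is straightforward. The class and method names below are made up for illustration; only the "lat,lon" value format comes from the thread:

```java
public class LatLonField {
    // Combine separate latitude/longitude values into the single
    // "lat,lon" string that a Solr location (LatLonType) field expects.
    static String toLocation(double lat, double lon) {
        // Double.toString always uses '.' as the decimal separator,
        // so this is safe regardless of the default locale.
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        System.out.println(toLocation(10.3, 20.4)); // prints 10.3,20.4
    }
}
```

With SolrJ annotations, the bean field annotated for the location column would then be a String holding this combined value rather than two separate doubles.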
Re: manual background re-indexing
Just where do I put the new index data with such a command? Simply replacing the segment files appears dangerous to me. Also, what is the best practice to move from single-core to multi-core? My current set-up is single-core; do I simply need to add a solr.xml in my solr-home and one core1 directory with the data that was there previously? paul Le 28 avr. 2011 à 14:04, Shaun Campbell a écrit : > [...]
Re: manual background re-indexing
It would probably be safest just to set up a separate system as multi-core from the start, get the process working, and then either use the new machine or copy the whole setup to the production machine. Best Erick On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht wrote: > [...]
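A minimal solr.xml for the multi-core layout discussed here might look as follows. The core names and instance directories are illustrative, following the CoreAdmin wiki conventions of that era:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- the core serving queries; its data dir holds the current index -->
    <core name="live" instanceDir="live" />
    <!-- the core the background reindex writes into, swapped in when done -->
    <core name="rebuild" instanceDir="rebuild" />
  </cores>
</solr>
```

Each instanceDir would contain its own conf/ (schema.xml, solrconfig.xml) and data/ directory, so migrating a single-core setup means moving the existing conf and data under one of these core directories.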
Re: fq parameter with partial value
Yes, the multivalued field is not broken up into tokens. So, if I understand what you mean, I could have a field CATEGORY with multiValued="true", a field CATEGORY_TOKENIZED with multiValued="true", and then some POI POI_Name ... Restaurant Hotel Restaurant Hotel, do faceting on CATEGORY and fq on CATEGORY_TOKENIZED. But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED? Best regards Elisabeth 2011/4/28 Erick Erickson > [...]
Re: manual background re-indexing
I sure would need downtime to migrate from single-core to multi-core! The question is however whether there are typical steps for a migration. paul Le 28 avr. 2011 à 15:01, Erick Erickson a écrit : > [...]
RE: fq parameter with partial value
Yep, what you describe is what I do in similar situations; it works fine. It is certainly possible to facet on a tokenized field... but your individual facet values will be the _tokens_, not the complete values. And they'll be the post-analyzed tokens at that. Which is rarely what you want. Thus the use of two fields: one tokenized and analyzed, one not tokenized and minimally analyzed (for instance, not stemmed). From: elisabeth benoit [elisaelisael...@gmail.com] Sent: Thursday, April 28, 2011 9:03 AM To: solr-user@lucene.apache.org Subject: Re: fq parameter with partial value [...]
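The two-field setup this thread converges on could be sketched in schema.xml roughly like this. The type names string and text are assumptions borrowed from the stock example schema, not something stated in the thread:

```xml
<field name="CATEGORY" type="string" indexed="true" stored="true"
       multiValued="true"/>
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

Faceting then uses the untokenized field while filtering uses the tokenized copy, e.g. `facet.field=CATEGORY&fq=CATEGORY_TOKENIZED:Restaurant`, so a POI with CATEGORY "Restaurant Hotel" still matches.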
boost fields which have value
Hi, How can I achieve that documents which don't have field1 and field2 filled in are returned at the end of the search results? I have tried with the *bf* parameter, which seems to work, but just with one field. Is there any function query which I can use in the bf value to boost two fields? Thank you. Regards, Zoltan
Boost newer documents only if date is different from timestamp
I am trying to boost newer documents in Solr queries. The ms function http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents seems to be the right way to go, but I need to add an additional condition: I am using the last-Modified-Date from crawled web pages as the date to consider, and that does not always provide a meaningful date. Therefore I would like the function to only boost documents where the date (not time) found in the last-Modified-Date is different from the timestamp, eliminating results that just return the current date as the last-Modified-Date. Suggestions are appreciated!
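The recency boost from that FAQ is a reciprocal function of document age. One way to add the extra condition would be to compute, at index time, a boolean field that is true only when the last-Modified-Date's date part differs from the crawl timestamp's, and combine the two at query time. A sketch, where both field names (lastModified, dateDiffers) are hypothetical:

```
bf=recip(ms(NOW,lastModified),3.16e-11,1,1)
bq=dateDiffers:true^2.0
```

Using bq boosts the trustworthy dates without dropping the others; replacing it with fq=dateDiffers:true would exclude documents whose last-Modified-Date merely echoes the crawl date.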
Searching for escaped characters
I'm trying to create a test to make sure that character sequences like "&amp;egrave;" are successfully converted to their equivalent UTF character (that is, in this case, "è"). So, I'd like to search my Solr index using the equivalent of the following regular expression: &\w{1,6}; to find any escaped sequences that might have slipped through. Is this possible? I have indexed these fields with text_lu, which looks like this: Thanks, Paul
Re: Concatenate multivalued DIH fields
I solved this problem using the flatten="true" attribute. Given this schema Joe Smith attr_names is a multiValued field in my schema.xml. The flatten attribute tells solr to take all the text from the specified node and below.
RE: boost fields which have value
I believe the sortMissingLast fieldtype attribute is what you want: http://wiki.apache.org/solr/SchemaXml -Original Message- From: Zoltán Altfatter [mailto:altfatt...@gmail.com] Sent: Thursday, April 28, 2011 6:11 AM To: solr-user@lucene.apache.org Subject: boost fields which have value [...]
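For illustration, sortMissingLast is set on the field type in schema.xml; this sketch is based on the stock example schema's string type:

```xml
<fieldType name="string" class="solr.StrField"
           sortMissingLast="true" omitNorms="true"/>
```

Note that sortMissingLast applies when sorting on the field, so documents missing field1 or field2 sort to the end of results ordered by those fields; it does not change relevancy scoring the way bf does.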
Re: Searching for escaped characters
StandardTokenizer will have stripped punctuation, I think. You might try searching for all the entity names though: (agrave | egrave | omacron | etc...) The names are pretty distinctive, although you might have problems with Greek letters. -Mike On 04/28/2011 12:10 PM, Paul wrote: > [...]
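Since the tokenizer strips the punctuation the regex relies on, another option is to run the check outside Solr, against the stored field values retrieved from the index. A sketch of that client-side check, using Paul's pattern (class and method names are made up):

```java
import java.util.regex.Pattern;

public class EntityResidueCheck {
    // Matches HTML/XML entity escapes such as &egrave; or &amp;
    // (an ampersand, 1-6 word characters, then a semicolon).
    private static final Pattern ENTITY = Pattern.compile("&\\w{1,6};");

    static boolean hasEntity(String text) {
        return ENTITY.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasEntity("caf&egrave;")); // true: escape slipped through
        System.out.println(hasEntity("caf\u00e8"));   // false: properly converted
    }
}
```

Iterating over documents with a `*:*` query and applying this to each stored text field would flag any document where an escape survived indexing.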
Re: SolrQuery#setStart(Integer) ???
Hi Erick, Correct, I cut some zeros while reading the javadocs, thanks for the heads up! [ ]'s Leonardo da S. Souza °v° Linux user #375225 /(_)\ http://counter.li.org/ ^ ^ On Wed, Apr 27, 2011 at 8:13 PM, Erick Erickson wrote: > Well, the java native int format is 32 bits, so unless you're returning > over 2 billion documents, you should be OK. But you'll run into other > issues > long before you get to that range. > > Best > Erick > > On Wed, Apr 27, 2011 at 5:25 PM, Leonardo Souza > wrote: > > Hi Guys, > > > > We have an index with more than 3 million documents, and we use the > pagination > > feature through the SolrQuery#setStart and SolrQuery#setRows > > methods. Some queries can return a huge amount of documents and I'm worried > > about the integer parameter of the setStart method; this parameter > > should be a long, don't you think? For now I'm considering using the > > ModifiableSolrParams class. Any suggestion is welcome! > > > > thanks! > > > > > > [ ]'s > > Leonardo Souza > > °v° Linux user #375225 > > /(_)\ http://counter.li.org/ > > ^ ^ > > >
Re: Replicaiton Fails with Unreachable error when master host is responding.
Anybody? On 04/27/2011 01:51 PM, Jed Glazner wrote: Hello All, I'm having a very strange problem that I just can't figure out. The slave is not able to replicate from the master, even though the master is reachable from the slave machine. I can telnet to the port it's running on, I can use text based browsers to navigate the master from the slave. I just don't understand why it won't replicate. The admin screen gives me an Unreachable in the status, and in the log there is an exception thrown. Details below: BACKGROUND: OS: Arch Linux Solr Version: svn revision 1096983 from https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ No custom plugins, just whatever came with the version above. Java Setup: java version "1.6.0_22" OpenJDK Runtime Environment (IcedTea6 1.10) (ArchLinux-6.b22_1.10-1-x86_64) OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode) We have 3 cores running, and all 3 cores are not able to replicate. The admin on the slave shows the Master as http://solr-master-01_dev.la.bo:8983/solr/music/replication - *Unreachable* Replication def on the slave [config markup lost in the archive: masterUrl http://solr-master-01_dev.la.bo:8983/solr/music/replication, pollInterval 00:15:00] Replication def on the master: [config markup lost in the archive: replicateAfter commit and startup, confFiles schema.xml,stopwords.txt] Below is the log start to finish for replication attempts; note that it says connection refused, however, I can telnet to 8983 from the slave to the master, so I know it's up and reachable from the slave: telnet solr-master-01_dev.la.bo 8983 Trying 172.12.65.58... Connected to solr-master-01_dev.la.bo. Escape character is '^]'. I double checked the master to make sure that it didn't have replication turned off, and it's not. So I should be able to replicate but it can't. I just don't know what else to check. The log from the slave is below. Apr 27, 2011 7:39:45 PM org.apache.solr.request.SolrQueryResponse WARNING: org.apache.solr.request.SolrQueryResponse is deprecated.
Please use the corresponding class in org.apache.solr.response Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry INFO: Retrying request Apr 27, 2011 7:39:45 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails WARNING: Exception while invoking 'details' method for replication on master java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140) at 
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:193) at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:188) at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:588)
Re: Replication Fails with Unreachable error when master host is responding.
No clue. Try wireshark to gather more data?

On 04/28/2011 02:53 PM, Jed Glazner wrote:
> Anybody?
>
> On 04/27/2011 01:51 PM, Jed Glazner wrote:
>> Hello All, I'm having a very strange problem that I just can't figure out. The slave is not able to replicate from the master, even though the master is reachable from the slave machine.
>> [...]
Re: fq parameter with partial value
See below:

On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit wrote:
> yes, the multivalued field is not broken up into tokens.
>
> so, if I understand well what you mean, I could have
>
> a field CATEGORY with multiValued="true"
> a field CATEGORY_TOKENIZED with multiValued="true"
>
> and then some POI
>
> POI_Name
> ...
> CATEGORY: Restaurant Hotel
> CATEGORY_TOKENIZED: Restaurant
> CATEGORY_TOKENIZED: Hotel

If the above is the document you're sending, then no. The document would be indexed with

CATEGORY: Restaurant Hotel
CATEGORY_TOKENIZED: Restaurant Hotel

Or even just:

CATEGORY: Restaurant Hotel

with a <copyField> set up to copy the value from CATEGORY to CATEGORY_TOKENIZED.

The multiValued part comes from your "And a single POI might have different categories", so your document could have several CATEGORY values, which would look like:

CATEGORY: Restaurant Hotel
CATEGORY: Health Spa
CATEGORY: Dance Hall

and your document would be counted for each of those entries, while searches against CATEGORY_TOKENIZED would match things like "dance", "spa", etc.

But do notice that if you did NOT want a search for restaurant hall (no quotes) to match, you could do proximity searches with a slop less than your increment gap, e.g. (this time with the quotes) "restaurant hall"~50, which would NOT match if your increment gap were 100.

Best
Erick

> do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.
>
> But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?
>
> Best regards
> Elisabeth
>
> 2011/4/28 Erick Erickson
>> So, I assume your CATEGORY field is multiValued but each value is not
>> broken up into tokens, right? If that's the case, would it work to have a
>> second field CATEGORY_TOKENIZED and run your fq against that
>> field instead? 
>> >> You could have this be a multiValued field with an increment gap if you >> wanted >> to prevent matches across separate entries and have your fq do a proximity >> search where the proximity was less than the increment gap >> >> Best >> Erick >> >> On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit >> wrote: >> > Hi Stefan, >> > >> > Thanks for answering. >> > >> > In more details, my problem is the following. I'm working on searching >> > points of interest (POIs), which can be hotels, restaurants, plumbers, >> > psychologists, etc. >> > >> > Those POIs can be identified among other things by categories or by >> brand. >> > And a single POIs might have different categories (no maximum number). >> User >> > might enter a query like >> > >> > >> > McDonald’s Paris >> > >> > >> > or >> > >> > >> > Restaurant Paris >> > >> > >> > or >> > >> > >> > many other possible queries >> > >> > >> > First I want to do a facet search on brand and categories, to find out >> which >> > case is the current case. >> > >> > >> > http://localhost:8080/solr /select?q=restaurant paris >> > &facet=true&facet.field=BRAND& facet.field=CATEGORY >> > >> > and get an answer like >> > >> > >> > >> > >> > >> > 598 >> > >> > 451 >> > >> > >> > >> > Then I want to send a request with fq= CATEGORY: Restaurant and still get >> > answers with CATEGORY= Restaurant Hotel. >> > >> > >> > >> > One solution would be to modify the data to add a new document every time >> we >> > have a new category, so a POI with three different categories would be >> index >> > three times, each time with a different category. >> > >> > >> > But I was wondering if there was another way around. >> > >> > >> > >> > Thanks again, >> > >> > Elisabeth >> > >> > >> > 2011/4/28 Stefan Matheis >> > >> >> Hi Elisabeth, >> >> >> >> that's not what FilterQueries are made for :) What against using that >> >> Criteria in the Query? 
>> >> Perhaps you want to describe your UseCase and we'll see if there's >> >> another way to solve it? >> >> >> >> Regards >> >> Stefan >> >> >> >> On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit >> >> wrote: >> >> > Hello, >> >> > >> >> > I would like to know if there is a way to use the fq parameter with a >> >> > partial value. >> >> > >> >> > For instance, if I have a request with fq=NAME:Joe, and I would like >> to >> >> > retrieve all answers where NAME contains Joe, including those with >> NAME = >> >> > Joe Smith. >> >> > >> >> > Thanks, >> >> > Elisabeth >> >> > >> >> >> > >> >
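Erick's suggestion corresponds to a schema.xml along these lines (the field and type names here are assumptions for illustration, not taken from the thread):

```xml
<!-- Untokenized, multiValued field: facet counts keep whole values like "Restaurant Hotel" -->
<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- Tokenized copy: fq=CATEGORY_TOKENIZED:Restaurant also matches "Restaurant Hotel" -->
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false" multiValued="true"/>
<!-- Populate the tokenized field automatically at index time -->
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

Faceting stays on CATEGORY while the fq runs against CATEGORY_TOKENIZED; the increment-gap/proximity trick applies to the analyzer of the tokenized field.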
Re: Extra facet query from within a custom search component
Have you looked at: http://wiki.apache.org/solr/TermsComponent? Best Erick On Thu, Apr 28, 2011 at 2:44 PM, Frederik Kraus wrote: > Hi Guys, > > I'm currently working on a custom search component and need to fetch a list > of all possible values within a certain field. > An internal facet (wildcard) query first came to mind, but I'm not quite sure > how to best create and then execute such a query ... > > What would be the best way to do this? > > Can anyone please point me in the right direction? > > Thanks, > > Fred.
Problem with autogeneratePhraseQueries=false
Hi, I'm new to Solr. My Solr instance version is:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Tue Apr 26 08:01:09 CEST 2011
Server Start Time: Tue Apr 26 07:59:05 CEST 2011

I have the following definition for the textgen type:

I'm using this type for the name field in my index. As you can see, I'm using autoGeneratePhraseQueries="false", but for the query sony vaio 4gb I'm getting the following in the debug output:

rawquerystring: sony vaio 4gb
querystring: sony vaio 4gb
parsedquery: +name:sony +name:vaio +MultiPhraseQuery(name:"(4gb 4) gb")
parsedquery_toString: +name:sony +name:vaio +name:"(4gb 4) gb"

Do you have any idea how I can avoid this MultiPhraseQuery?

Best Regards,
solr_beginner
Re: Problem with autogeneratePhraseQueries
Thank you very much for the answer. You were right. There was no luceneMatchVersion in the solrconfig.xml of our dev core. We thought that values not present in the core configuration were copied from the main solrconfig.xml. I will investigate whether our administrators did something wrong during the upgrade to 3.1.

On Tue, Apr 26, 2011 at 1:35 PM, Robert Muir wrote:
> What do you have in solrconfig.xml for luceneMatchVersion?
>
> If you don't set this, then it's going to default to "Lucene 2.9"
> emulation so that old Solr 1.4 configs work the same way. I tried your
> example and it worked fine here, and I'm guessing this is probably
> what's happening.
>
> the default in the example/solrconfig.xml looks like this:
>
> <luceneMatchVersion>LUCENE_31</luceneMatchVersion>
>
> On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner wrote:
> > Hi,
> >
> > I'm new to solr. My solr instance version is:
> >
> > Solr Specification Version: 3.1.0
> > Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
> > Lucene Specification Version: 3.1.0
> > Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
> > Current Time: Tue Apr 26 08:01:09 CEST 2011
> > Server Start Time: Tue Apr 26 07:59:05 CEST 2011
> >
> > I have following definition for textgen type:
> >
> > positionIncrementGap="100" autoGeneratePhraseQueries="false">
> > words="stopwords.txt" enablePositionIncrements="true" />
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
> > maxGramSize="15" side="front" preserveOriginal="1"/>
> > ignoreCase="true" expand="true"/>
> > ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
> >
> > I'm using this type for name field in my index. 
As you can see I'm > > using autoGeneratePhraseQueries="false" but for query sony vaio 4gb I'm > > getting following query in debug: > > > > > > sony vaio 4gb > > sony vaio 4gb > > +name:sony +name:vaio > +MultiPhraseQuery(name:"(4gb > > 4) gb") > > +name:sony +name:vaio +name:"(4gb 4) > > gb" > > > > Do you have any idea how can I avoid this MultiPhraseQuery? > > > > Best Regards, > > solr_beginner > > >
Dynamically loading xml files from webapplication to index
In our webapp, we need to upload an XML data file from the UI (dialogue box) for indexing. We were not able to find a solution in the documentation. Please suggest a way to implement it.

--
View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-loading-xml-files-from-webapplication-to-index-tp2865890p2865890.html
Sent from the Solr - User mailing list archive at Nabble.com.
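For reference, one common pattern (a sketch, not from the thread; the field names are assumptions): have the webapp take the uploaded file and POST it to Solr's XML update handler. The update message format looks like:

```xml
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">Example title</field>
  </doc>
</add>
```

POST this with Content-Type text/xml to http://localhost:8983/solr/update, then POST <commit/> so the new documents become visible to searches.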
Re: fieldCache only on stats page
Solr version:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Wed Apr 27 14:28:34 CEST 2011
Server Start Time: Wed Apr 27 11:07:00 CEST 2011

On the stats page I can see only the following cache information:

CACHE
name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the Lucene FieldCache, this is **NOT** a cache that is managed by Solr.
sourceid: $Id: SolrFieldCacheMBean.java 984594 2010-08-11 21:42:04Z yonik $
source: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/SolrFieldCacheMBean.java $

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
sourceid: $Id: FastLRUCache.java 1065312 2011-01-30 16:08:25Z rmuir $
source: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/FastLRUCache.java $

Nothing about filterCache or documentCache ;/

Best Regards,
Solr Beginner

On Wed, Apr 27, 2011 at 2:00 PM, Erick Erickson wrote:
> There's nothing special you need to do to be able to view the various
> stats from admin/stats.jsp. If another look doesn't show them, could you
> post a screenshot?
>
> And please include the version of Solr you're using; I checked with 1.4.1.
>
> Best
> Erick
>
> On Wed, Apr 27, 2011 at 1:44 AM, Solr Beginner wrote:
>> Hi,
>>
>> I can see only fieldCache (nothing about filter, query or document
>> cache) on the stats page. What am I doing wrong? We have two servers with
>> replication. There are two cores (prod, dev) on each server. Maybe I
>> have to add something to the solrconfig.xml of the cores?
>>
>> Best Regards,
>> Solr Beginner
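One possible explanation (a sketch, assuming the dev core's solrconfig.xml simply never declares these caches): caches only show up on the stats page if they are configured in the <query> section. The sizes below are illustrative, not recommendations:

```xml
<query>
  <!-- caches declared here appear on admin/stats.jsp once the core is reloaded -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>
```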
Re: Extra facet query from within a custom search component
Haaa fantastic! Thanks a lot!

Fred.

On Thursday, 28 April 2011 at 22:21, Erick Erickson wrote:
> Have you looked at: http://wiki.apache.org/solr/TermsComponent?
>
> Best
> Erick
>
> On Thu, Apr 28, 2011 at 2:44 PM, Frederik Kraus wrote:
> > Hi Guys,
> >
> > I'm currently working on a custom search component and need to fetch a list
> > of all possible values within a certain field.
> > An internal facet (wildcard) query first came to mind, but I'm not quite
> > sure how to best create and then execute such a query ...
> >
> > What would be the best way to do this?
> >
> > Can anyone please point me in the right direction?
> >
> > Thanks,
> >
> > Fred.
Re: AlternateDistributedMLT.patch not working (SOLR-788)
On 2/23/2011 11:53 AM, Otis Gospodnetic wrote:
> Hi Isha,
> The patch is out of date. You need to look at the patch and rejection and update your local copy of the code to match the logic from the patch, if it's still applicable to the version of Solr source code you have.

We have a need for distributed More Like This. We're gearing up for a deployment of 3.1, so a patch against 1.4.1 is not very useful for us. I've spent the last couple of days trying to rework both the original and the alternate patches on SOLR-788 to work against 3.1. I don't understand enough about the code to know how to fix it. I knew I had to change the value of PURPOSE_GET_MLT_RESULTS to 0x800 because of the conflict with PURPOSE_GET_TERMS, but the changes in MoreLikeThisComponent.java are beyond me.

Thanks,
Shawn
Re: Spatial Search
1) Create an extra String field on your bean, as Yonik suggests, or
2) Write an UpdateRequestProcessor which reads the doubles and creates the LatLon value from them

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28 Apr 2011, at 14:44, Yonik Seeley wrote:
> On Thu, Apr 28, 2011 at 5:15 AM, Jonas Lanzendörfer wrote:
>> I am new to solr and try to use the spatial search feature which was added
>> in 3.1. In my schema.xml I have 2 double fields for latitude and longitude.
>> How can I get them into the location field type? I use solrj to fill the
>> index with data. If I would use a location field instead of two double
>> fields, how could I fill this with solrj? I use annotations to link the data
>> from my dto´s to the index fields...
>
> I've not used the annotation stuff in SolrJ, but since the value sent
> in must be of the form 10.3,20.4 then
> I guess one would have to have a String field with this value on your object.
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
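A minimal sketch of option 1. The class name, field name store, and coordinates are hypothetical; the point is just that the bean exposes the combined "lat,lon" string that a LatLonType field expects:

```java
// Hypothetical DTO: two doubles combined into the "lat,lon" string
// that Solr's LatLonType field takes. With SolrJ you would put
// @Field("store") on a String member holding this value before sending
// the bean (the annotation is omitted so the sketch is self-contained).
public class Poi {
    private final double latitude;
    private final double longitude;

    public Poi(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    // Value to index into the location field, e.g. "10.3,20.4"
    public String getStore() {
        return latitude + "," + longitude;
    }

    public static void main(String[] args) {
        System.out.println(new Poi(10.3, 20.4).getStore()); // prints 10.3,20.4
    }
}
```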
Re: manual background re-indexing
> It would probably be safest just to set up a separate system as
> multi-core from the start, get the process working, and then either use
> the new machine or copy the whole setup to the production machine.
>
> On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht wrote:
>> Just where do I put the new index data with such a command? Simply replacing the segment files appears dangerous to me.

Any idea where I should put the data directory before calling the reload command?

paul
Re: Re: manual background re-indexing
You simply create two cores, one in solr/cores/core1 and another in solr/cores/core2. They each have a separate conf and data directory, and the index is in core#/data/index. Really, it's just introducing one more level.

You can experiment just by configuring a core and copying your index to solr/cores/yourcore/data/index. After, of course, configuring solr.xml to understand cores.

Best
Erick

On Thu, Apr 28, 2011 at 7:27 PM, Paul Libbrecht wrote:
>> It would probably be safest just to set up a separate system as
>> multi-core from the start, get the process working and then either use
>> the new machine or copy the whole setup to the production machine.
>>
>> On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht wrote:
>>> Just where do I put the new index data with such a command? Simply
>>> replacing the segment files appears dangerous to me.
>
> Any idea where I should put the data directory before calling the reload
> command?
> paul
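Erick's layout maps to a solr.xml along these lines (core names and paths are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="cores/core1"/>
    <core name="core2" instanceDir="cores/core2"/>
  </cores>
</solr>
```

Once cores are configured, a rebuilt index can be picked up by reloading (or swapping) a core through the CoreAdmin handler, e.g. /solr/admin/cores?action=RELOAD&core=core1.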
Location of Solr Logs
Hi,

I am a newbie to Solr. Can you please help me find where I can see the logs written by Solr? Is there any configuration required to see the Solr logs?

Thanks for your time and help,
Geeta
Can the Suggester be updated incrementally?
I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for auto-complete on the field "Document Title". Does Suggester (either FST, TST or Jaspell) support incremental updates? Say I want to add a new document title to the Suggester, or to change the weight of an existing document title, would I need to rebuild the entire tree for every update? Also, can the Suggester be sharded? If the size of the tree gets bigger than the RAM size, is it possible to shard the Suggester across multiple machines? Thanks Andy
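For context, the Suggester from that wiki page is wired up as a spellcheck-style search component, roughly like this (the component name and the field name title are assumptions):

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <!-- TST-based lookup; FST and Jaspell variants plug in the same way -->
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- field whose indexed terms feed the suggester -->
    <str name="field">title</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```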
Re: Can the Suggester be updated incrementally?
It's answered on the wiki site:

"TSTLookup - ternary tree based representation, capable of immediate data structure updates"

Although the EdgeNGram technique is probably more widely adopted; e.g., it's closer to what Google has implemented.

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

On Thu, Apr 28, 2011 at 9:37 PM, Andy wrote:
> I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for
> auto-complete on the field "Document Title".
>
> Does Suggester (either FST, TST or Jaspell) support incremental updates? Say
> I want to add a new document title to the Suggester, or to change the weight
> of an existing document title, would I need to rebuild the entire tree for
> every update?
>
> Also, can the Suggester be sharded? If the size of the tree gets bigger than
> the RAM size, is it possible to shard the Suggester across multiple machines?
>
> Thanks
> Andy
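The EdgeNGram alternative lives entirely in schema.xml; a sketch of such a field type (the type name and gram sizes are assumptions):

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- keep each title as one token, then expand it into its prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- the user's partial input is matched as-is against the indexed prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because each title is expanded into prefixes at index time, a prefix lookup becomes an ordinary term query, and updates arrive through normal document adds and commits.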
Re: Question on Batch process
Charles,

Maybe the question to ask is why you are committing at all? Do you need somebody to see index changes while you are indexing? If not, commit just at the end. And optimize if you won't touch the index for a while.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Charles Wardell
> To: solr-user@lucene.apache.org
> Sent: Wed, April 27, 2011 7:51:20 PM
> Subject: Re: Question on Batch process
>
> Thank you for your response. I did not make the StreamingUpdate application yet, but I did change the other settings that you mentioned. It gave me a huge boost in indexing speed. (I am still using post.sh but hope to change that soon.)
>
> One thing I noticed is the indexing speed was incredibly fast last night, but today the commits are taking so long. Is this to be expected?
>
> --
> Best Regards,
> Charles Wardell
> Blue Chips Technology, Inc.
> www.bcsolution.com
>
> On Wednesday, April 27, 2011 at 6:15 PM, Otis Gospodnetic wrote:
> > Hi Charles,
> >
> > Yes, the threads I was referring to are in the context of the client/indexer, so one of the params for StreamingUpdateSolrServer.
> > post.sh/jar are just there because they are handy. Don't use them for production.
> >
> > It's impossible to tell how long indexing of 100M documents may take. They could be very big or very small. You could perform very light or no analysis or heavy analysis. They could contain 1 or 100 fields. :)
> >
> > Otis
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message -----
> > > From: Charles Wardell
> > > To: solr-user@lucene.apache.org
> > > Sent: Tue, April 26, 2011 8:01:28 PM
> > > Subject: Re: Question on Batch process
> > >
> > > Thank you Otis. 
> > > Without trying to appear too stupid: when you refer to having the params
> > > matching my # of CPU cores, you are talking about the # of threads I can
> > > spawn with the StreamingUpdateSolrServer object?
> > > Up until now, I have been just utilizing post.sh or post.jar. Are these
> > > capable of that, or do I need to write some code to collect a bunch of
> > > files into the buffer and send it off?
> > >
> > > Also, do you have a sense for how long it should take to index 100,000
> > > files, or in my case 100,000,000 documents?
> > >
> > > StreamingUpdateSolrServer:
> > > public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException
> > >
> > > Thanks again,
> > > Charlie
> > >
> > > --
> > > Best Regards,
> > > Charles Wardell
> > > Blue Chips Technology, Inc.
> > > www.bcsolution.com
> > >
> > > On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:
> > > > Charlie,
> > > >
> > > > How's this:
> > > > * -Xmx2g
> > > > * ramBufferSizeMB 512
> > > > * mergeFactor 10 (default, but you could up it to 20 or 30 if ulimit -n allows)
> > > > * ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
> > > > * use StreamingUpdateSolrServer (with params matching your number of CPU cores)
> > > > or send batches of, say, 1000 docs with the other SolrServer impl using N threads
> > > > (N = # of your CPU cores)
> > > >
> > > > Otis
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >
> > > > ----- Original Message -----
> > > > > From: Charles Wardell
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Tue, April 26, 2011 2:32:29 PM
> > > > > Subject: Question on Batch process
> > > > >
> > > > > I am sure that this question has been asked a few times, but I can't seem
> > > > > to find the sweet spot for indexing.
> > > > >
> > > > > I have about 100,000 files each containing 1,000 xml documents ready to be
> > > > > posted to Solr. My desire is to have it index as quickly as possible, and
> > > > > then once completed the daily stream of ADDs will be small in comparison.
> > > > >
> > > > > The individual documents are small. Essentially web postings from the net:
> > > > > Title, postPostContent, date.
> > > > >
> > > > > What would be the ideal configuration? For ramBufferSizeMB, mergeFactor,
> > > > > maxBufferedDocs, etc.
> > > > >
> > > > > My machine is a quad core hyper-threaded, so it shows up as 8 CPUs in top.
> > > > > I have 16GB of available RAM.
> > > > >
> > > > > Thanks in advance.
> > > > > Charlie
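Otis's server-side settings map onto the <indexDefaults> section of solrconfig.xml roughly like this (a sketch using the values from his list, not a drop-in config). The client side pairs it with new StreamingUpdateSolrServer(url, queueSize, threadCount), with threadCount matching the number of CPU cores:

```xml
<indexDefaults>
  <!-- buffer ~512MB of documents before flushing a segment -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <!-- default; can be raised to 20-30 if ulimit -n allows more open files -->
  <mergeFactor>10</mergeFactor>
  <!-- maxBufferedDocs deliberately omitted: it is not used once ramBufferSizeMB is set -->
</indexDefaults>
```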
Re: Can the Suggester be updated incrementally?
--- On Fri, 4/29/11, Jason Rutherglen wrote: > It's answered on the wiki site: > > "TSTLookup - ternary tree based representation, capable of > immediate > data structure updates" > But how to update it? The wiki talks about getting data sources from a file or from the main index. In either case it sounds like the entire data structure will be rebuilt, no?
Re: Location of Solr Logs
You can see Solr's logs in your servlet container's log file, i.e. if you are using Tomcat they can be found at [CATALINA_HOME]/logs/catalina.XXX.log

-Thanx:
Grijesh
www.gettinhahead.co.in
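To give Solr its own log file instead of the container default (a sketch; the path is illustrative), you can point the JVM at a custom java.util.logging configuration, since Solr logs through java.util.logging by default:

```properties
# Pass to the JVM with:
#   -Djava.util.logging.config.file=/path/to/logging.properties
handlers = java.util.logging.FileHandler
.level = INFO
java.util.logging.FileHandler.pattern = /var/log/solr/solr-%u.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
```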