Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks so much Otis. This is working great.

Now, I'm trying to facet on pictures: docs w/ pics vs. docs w/o pics.

To the best of my knowledge, everyone says that faceting cannot be done on 
dynamic fields (only on explicitly defined field names). So I tried the following, 
and it works: I assume that the stored pictures have sequential numbers 
(_1, _2, etc.), i.e., if pic_url_1 exists in the index, the underlying doc 
has at least one picture: 

...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*

While this is working fine, I'm wondering whether there's a cleaner way to do 
the same thing without assuming that pictures have a sequential number.

Also, do you have any documentation about handling dynamic fields with SolrJ? 
So far, I've only found JIRA issues about it, but no documentation.

Thanks a lot.

-Saïd

On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:

> Saïd,
> 
> Dynamic fields could help here, for example imagine a doc with:
> id
> pic_url_*
> pic_caption_*
> pic_description_*
> 
> See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
> 
> So, for you:
> 
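>   <dynamicField name="pic_url_*" ... stored="true"/>
>   <dynamicField name="pic_caption_*" ... stored="true"/>
>   <dynamicField name="pic_description_*" ... stored="true"/>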
> 
> Then you can add docs with an unlimited number of 
> pic_(url|caption|description)_* fields, e.g.
> 
> id
> pic_url_1
> pic_caption_1
> pic_description_1
> 
> id
> pic_url_2
> pic_caption_2
> pic_description_2
> 
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: Saïd Radhouani 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, June 25, 2010 6:01:13 PM
>> Subject: Setting many properties for a multivalued field. Schema.xml ? 
>> External file?
>> 
>> Hi,
>>
>> I'm trying to index data containing a multivalued field "picture",
>> that has three properties: url, caption and description:
>>
>> Thus, each indexed document might have many pictures, each of them
>> with a url, a caption, and a description.
>>
>> I wonder whether it's possible to store this data using only
>> schema.xml. I couldn't figure it out so far. Instead, I'm thinking of
>> using an external file to store the properties of each picture, but I
>> haven't tried this solution yet, waiting for your suggestions...
>>
>> Thanks,
>> -Saïd
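Otis's dynamic-field layout can be sketched on the client side as a function that flattens a list of pictures into numbered fields. This is a minimal Python illustration, not SolrJ: the function name `picture_fields` and the input keys `url`, `caption` and `description` are hypothetical, and the resulting dict would still have to be sent to Solr by whatever client you use.

```python
from typing import Dict, List


def picture_fields(doc_id: str, pictures: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten pictures into pic_url_N / pic_caption_N / pic_description_N fields.

    Hypothetical helper: the input dict keys are assumptions, not a Solr API.
    """
    doc = {"id": doc_id}
    # Number pictures from 1 so the suffixes follow the _1, _2, ... convention.
    for n, pic in enumerate(pictures, start=1):
        doc[f"pic_url_{n}"] = pic["url"]
        doc[f"pic_caption_{n}"] = pic.get("caption", "")
        doc[f"pic_description_{n}"] = pic.get("description", "")
    return doc
```

Each picture contributes three fields, so a document with N pictures carries 3N + 1 fields, which is part of the overhead discussed later in the thread.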



Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Geert-Jan Brits
You can treat dynamic fields like any other field, so you can facet, sort,
filter, etc. on these fields (afaik).

I believe the confusion arises because the use case for dynamic fields is
sometimes misunderstood: people expect to use them for some kind of
wildcard search, e.g. searching for a value in all of the dynamic fields at
once, like pic_url_*. This, however, is NOT possible.

As far as your question goes:

> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc w/o pic
> To the best of my knowledge, everyone is saying that faceting cannot be
> done on dynamic fields (only on definitive field names). Thus, I tried the
> following and it's working: I assume that the stored pictures have a
> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index, it
> means that the underlying doc has at least one picture:
> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
> While this is working fine, I'm wondering whether there's a cleaner way to
> do the same thing without assuming that pictures have a sequential number.

If I understand your question correctly: faceting on docs with and without
pics could of course be done the way you mention, but it would be more
efficient to define an extra field, hasAtLeastOnePic, with values (0 | 1),
and use that to facet / filter on.

You can extend this to NrOfPics [0,N) if you need to filter / facet on docs
with a certain number of pics.
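A minimal sketch of this suggestion, assuming the counts are computed on the client at index time: the field names `hasAtLeastOnePic` and `NrOfPics` come from the message above, while the helper `add_picture_summary` and the dict-based document are hypothetical. In schema.xml, `NrOfPics` would presumably need an integer field type that supports range queries and sorting.

```python
from typing import Dict, List


def add_picture_summary(doc: Dict[str, object], pictures: List[dict]) -> Dict[str, object]:
    """Return a copy of doc with the derived fields NrOfPics and hasAtLeastOnePic."""
    out = dict(doc)  # copy, so the caller's dict is left untouched
    count = len(pictures)
    out["NrOfPics"] = count                           # usable for range filters and sorting
    out["hasAtLeastOnePic"] = 1 if count > 0 else 0   # the (0 | 1) flag to facet / filter on
    return out
```

The flag is redundant once `NrOfPics` exists (it equals `NrOfPics > 0`), so one could keep only the count and filter on a range instead.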

Also, I wondered what else you wanted to do with this pic-related info. Do
you want to search on pic-description / pic-caption, for instance? In that
case the dynamic-fields approach may not be what you want: how would you
know which dynamic field to search for a particular term? Would it be
pic_desc_1, or pic_desc_x? Of course you could OR over all the dynamic fields,
but you'd need to know an upper bound on the number of pics, and it
really doesn't feel right, to me at least.

If you need to search on pic_description, for instance, but don't care which
pic matches, you could create a single field pic_description, put the
concatenation of all pic descriptions in it, and search on that; or just make
it a multi-valued field.

If you don't need to search on these fields at all, the best thing imo is to
store all pic-related info of all pics together, concatenated with some
delimiter that you know how to separate on the client side.
That, or just store it in an external RDB, since Solr is just sitting on the
data and not doing anything intelligent with it.
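The concatenation approach can be sketched as an encode/decode pair. This assumes each picture is a (url, caption, description) tuple and uses U+241F and U+241E (the printable "symbol for unit separator" and "symbol for record separator" characters) as hypothetical delimiters, picked because they are unlikely to occur in real data and, unlike the raw ASCII control characters, are legal in XML update messages. The encoder refuses values that contain a delimiter instead of silently corrupting the field.

```python
from typing import List, Tuple

# Hypothetical delimiter choice; any character guaranteed absent from the data works.
FIELD_SEP = "\u241f"   # between url, caption and description of one picture
RECORD_SEP = "\u241e"  # between pictures

Picture = Tuple[str, str, str]  # (url, caption, description)


def encode_pictures(pictures: List[Picture]) -> str:
    """Concatenate all pictures into one string for a single stored field."""
    for pic in pictures:
        for value in pic:
            if FIELD_SEP in value or RECORD_SEP in value:
                raise ValueError(f"value contains a reserved delimiter: {value!r}")
    return RECORD_SEP.join(FIELD_SEP.join(pic) for pic in pictures)


def decode_pictures(stored: str) -> List[Picture]:
    """Split the stored string back into (url, caption, description) tuples."""
    if not stored:
        return []
    result = []
    for record in stored.split(RECORD_SEP):
        url, caption, description = record.split(FIELD_SEP)
        result.append((url, caption, description))
    return result
```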

I assume, btw, that you don't want to sort / facet on pic_desc / pic_caption /
pic_url either (I have a hard time thinking of a useful use case for that).

HTH,

Geert-Jan




Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks Geert-Jan for the detailed answer. Actually, I don't search on these 
fields at all. I'm only filtering (w/ vs. w/o pic) and sorting (based on the 
number of pictures). Thus, your suggestion of adding an extra field NrOfPics 
[0,N] would be the best solution.
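With `NrOfPics` in the index, the filter, sort and facet requests described here could look roughly like the sketch below. `q`, `fq`, `sort`, `facet` and `facet.query` are standard Solr request parameters; the helper name and the choice of two facet queries (at least one picture vs. none) are illustrative assumptions, and the resulting string would be appended to a `/select?` URL.

```python
from urllib.parse import urlencode


def picture_query_params(query: str, only_with_pics: bool) -> str:
    """Build a query string that sorts by picture count and facets w/ vs. w/o pics."""
    params = [
        ("q", query),
        ("sort", "NrOfPics desc"),             # docs with the most pictures first
        ("facet", "on"),
        ("facet.query", "NrOfPics:[1 TO *]"),  # count of docs with at least one picture
        ("facet.query", "NrOfPics:0"),         # count of docs without pictures
    ]
    if only_with_pics:
        # A filter query restricts results without affecting scoring.
        params.append(("fq", "NrOfPics:[1 TO *]"))
    return urlencode(params)
```

Because the filter is on a single known field, this no longer depends on pictures being numbered sequentially.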

Regarding the other suggestion:

> If you don't need to search on these fields at all, the best thing imo is to
> store all pic-related info of all pics together, concatenated with some
> delimiter that you know how to separate on the client side.
> That, or just store it in an external RDB, since Solr is just sitting on the
> data and not doing anything intelligent with it.

If I understand your suggestion correctly, you're saying there's NO need to 
have many dynamic fields; instead, we can have one explicitly defined field 
that stores a long string (the concatenation of information about tens of 
pictures), e.g., using "-" and "%" delimiters: 
pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...

I don't clearly see the reason for doing this. Is there a gain in terms of 
performance? Does it make programming on the client side easier? Or 
something else?


My other question was: if we use dynamic fields, is there any documentation 
about using SolrJ for this purpose? 

Thanks
-Saïd


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Geert-Jan Brits
> If I understand your suggestion correctly, you're saying there's NO need to
> have many dynamic fields; instead, we can have one explicitly defined field
> that stores a long string (the concatenation of information about tens of
> pictures), e.g., using "-" and "%" delimiters:
> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
> I don't clearly see the reason for doing this. Is there a gain in terms of
> performance? Does it make programming on the client side easier? Or
> something else?

I think you should ask the exact opposite question. If you don't do anything
with these fields that Solr is particularly good at (searching / filtering
/ faceting / sorting), why go through the trouble of creating dynamic fields?
(More fields means more overhead / tracking cost, no matter how you look at
it.)

Moreover, it is indeed easier from the client's point of view the way I
suggested, since otherwise:
- you would have to ask (through SolrJ) for all the dynamic fields to be
returned in the fl parameter (
http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult,
because a priori you don't know how many dynamic fields to request. In
other words, you can't just ask Solr (through SolrJ, like you asked) to
return all dynamic fields beginning with pic_* (afaik).
- your client code that iterates over the pics is a bit more involved.
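To make the client-side cost concrete, here is a hypothetical helper that regroups `pic_url_N` / `pic_caption_N` / `pic_description_N` values from one returned document. It assumes the document was fetched with `fl=*` (so all stored fields come back) and already parsed into a plain dict, e.g. from a `wt=json` response; with the single concatenated field you would only need to decode one value.

```python
import re
from typing import Dict, List

# Matches the dynamic-field names used in this thread, capturing property and number.
_PIC_FIELD = re.compile(r"^pic_(url|caption|description)_(\d+)$")


def collect_pictures(doc: Dict[str, str]) -> List[Dict[str, str]]:
    """Group pic_*_N fields of one document into a list of per-picture dicts."""
    by_number: Dict[int, Dict[str, str]] = {}
    for name, value in doc.items():
        m = _PIC_FIELD.match(name)
        if m:
            prop, num = m.group(1), int(m.group(2))
            by_number.setdefault(num, {})[prop] = value
    # Sort numerically so pic_url_10 comes after pic_url_9, not after pic_url_1.
    return [by_number[n] for n in sorted(by_number)]
```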

HTH, Cheers,

Geert-Jan


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Geert-Jan Brits
btw, be careful with your delimiters: pic_url may well contain a '-',
etc.
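One way to sidestep delimiter collisions entirely, swapped in here as an alternative rather than what was proposed in the thread, is to serialize the picture list as JSON into the single stored field. A minimal sketch using only the standard library; the function names are hypothetical.

```python
import json
from typing import Dict, List


def pictures_to_field(pictures: List[Dict[str, str]]) -> str:
    # json.dumps escapes quotes and backslashes, so '-', '%' or any other
    # character inside a url or caption cannot be mistaken for structure.
    return json.dumps(pictures, ensure_ascii=False, separators=(",", ":"))


def field_to_pictures(stored: str) -> List[Dict[str, str]]:
    # An empty or missing field means "no pictures".
    return json.loads(stored) if stored else []
```

`ensure_ascii=False` keeps non-ASCII text such as accented captions as-is, which is fine as long as the update is posted as UTF-8.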


Re: Setting many properties for a multivalued field. Schema.xml ? External file?

2010-06-26 Thread Saïd Radhouani
Thanks Geert-Jan, this is indeed very helpful.

The delimiters I gave were just for the sake of the example. I will use an 
infrequent delimiter.

Cheers,
-Saïd


Re: Recommended MySQL JDBC driver

2010-06-26 Thread Marc Sturlese

I suppose you use batchSize=-1 to index that amount of data. Starting with
the 5.1.7 connector there's this parameter:
netTimeoutForStreamingResults
The default value is 600 (seconds). Increasing it may help (2400, for example?).
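For reference, Connector/J accepts configuration properties in the JDBC URL query string, so the parameter mentioned above could be set as in this sketch. Host, port and database name are placeholders, and the helper is hypothetical; in a DataImportHandler setup the resulting string would go into the `url` attribute of the `JdbcDataSource`, where any additional `&` separators would have to be written as `&amp;` in the XML config.

```python
from urllib.parse import urlencode


def mysql_jdbc_url(host: str, port: int, database: str, net_timeout_s: int = 2400) -> str:
    """Build a MySQL JDBC URL that sets netTimeoutForStreamingResults (seconds)."""
    props = {"netTimeoutForStreamingResults": net_timeout_s}
    return f"jdbc:mysql://{host}:{port}/{database}?{urlencode(props)}"
```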
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-tp817458p924107.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: example solr xml working fine but my own xml files not working

2010-06-26 Thread codar

I'm struggling with this very same problem. I can index the example files
fine. When I try adding a custom file, it appears to index without issue,
but I get no search results via the admin console. I've also tried
modifying one of the files (monitor.xml); that did not update either. I'm
using Solr 1.4.1 on a Mac. Any help would be greatly appreciated.

I added these fields to schema.xml:

Here's my custom xml:


ZS1
ZS1
RW
How to index Solr on the Mac


I cd to the exampledocs dir and run: java -jar post.jar my_data.xml

Here are the results:

SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file tr_single.xml
SimplePostTool: COMMITting Solr index changes..

So it appears to have indexed without issue, yet when I search for
ZS1, I get no results.

Thanks in advance
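Since SimplePostTool only accepts UTF-8, one way to rule out encoding or escaping problems is to generate the update message programmatically. This sketch builds Solr's `<add><doc><field name="...">` update format with ElementTree; the helper name and the field names in the usage line are hypothetical and must match fields (or dynamic fields) declared in your schema.xml. Assuming the stock example schema, it may also be worth querying `id:ZS1` explicitly, since a bare query only searches the default search field, which custom fields feed only if a copyField is set up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def solr_add_xml(docs: List[Dict[str, str]]) -> bytes:
    """Serialize documents into a Solr <add> update message, encoded as UTF-8."""
    add = ET.Element("add")
    for fields in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field_el = ET.SubElement(doc_el, "field", name=name)
            field_el.text = value  # ElementTree escapes &, < and > automatically
    return ET.tostring(add, encoding="utf-8")


if __name__ == "__main__":
    print(solr_add_xml([{"id": "ZS1", "title": "How to index Solr on the Mac"}]).decode("utf-8"))
```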


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp504245p924113.html
Sent from the Solr - User mailing list archive at Nabble.com.


phrase highlighting

2010-06-26 Thread Lukas Kahwe Smith
Hi,

From googling and looking at JIRA tickets, it seems like phrase highlighting 
should work out of the box, but even enabling it manually didn't get me the 
desired result:
http://resolutionfinder.org/search?q=%22security+council%22&=&tm=any&s=Search

generates the following query:

INFO: [Clause_en] webapp=/solr path=/select 
params={hl.fragsize=0&facet=true&sort=score+desc&hl.simple.pre=&hl.fl=*&json.nl=map&wt=json&hl=true&rows=21&hl.highlightMultiTerm=true&fl=*,score&start=0&q=(_query_:"{!dismax+qf%3D'content+document_title'+pf%3D'content+document_title'+v%3D$qq}")&hl.simple.post=&facet.field={!ex%3Ddt+key%3Dorig_legal_value}legal_value&facet.field={!ex%3Ddt+key%3Dorig_adoption_year}adoption_year&facet.field={!ex%3Ddt+key%3Dorig_organisation_id}organisation_id&facet.field={!ex%3Ddt+key%3Dorig_addressee_ids}addressee_ids&facet.field={!ex%3Ddt+key%3Dorig_documenttype_id}documenttype_id&facet.field={!ex%3Ddt+key%3Dorig_information_type_id}information_type_id&facet.field={!ex%3Ddt+key%3Dorig_operative_phrase_id}operative_phrase_id&facet.field={!ex%3Ddt+key%3Dorig_tag_ids}tag_ids&hl.usePhraseHighlighter=true&qq="security+council"}
 hits=0 status=0 QTime=31 

but as you can see in the above website "security" and "council" are still 
highlighted separately.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: phrase highlighting

2010-06-26 Thread Koji Sekiguchi

(10/06/26 22:19), Lukas Kahwe Smith wrote:


Lukas,

What do you mean by '"security" and "council" are still highlighted
separately'?

If you expect to get "security council" highlighted as one unit, the
highlighter cannot do that. The highlighter emphasizes terms and phrases by
tagging each term individually, even if you set hl.usePhraseHighlighter to
true.

Koji

--
http://www.rondhuit.com/en/




Re: phrase highlighting

2010-06-26 Thread Lukas Kahwe Smith

On 26.06.2010, at 16:22, Koji Sekiguchi wrote:

> Lukas,
> 
> What do you mean by '"security" and "council" are still highlighted
> separately'?
> If you expect to get "security council" highlighted as one unit, the
> highlighter cannot do that. The highlighter emphasizes terms and phrases by
> tagging each term individually, even if you set hl.usePhraseHighlighter to true.


ah ok .. then i will just replace  in my custom code.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
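Lukas's workaround can be sketched as a post-processing step on each highlight snippet. This is a minimal sketch, assuming the snippets use `<em>`/`</em>` (Solr's defaults for hl.simple.pre/hl.simple.post; the actual values were lost from the log above). The class name `HighlightMerge` is illustrative only:

```java
import java.util.regex.Pattern;

class HighlightMerge {
    // Collapses a closing tag, whitespace, and an opening tag into just the whitespace,
    // e.g. "<em>security</em> <em>council</em>" -> "<em>security council</em>".
    static String mergeAdjacent(String fragment, String pre, String post) {
        // Pattern.quote matches the tag text literally; group 1 keeps the original whitespace
        Pattern gap = Pattern.compile(Pattern.quote(post) + "(\\s+)" + Pattern.quote(pre));
        return gap.matcher(fragment).replaceAll("$1");
    }
}
```

As Lukas notes in his next message, this also merges terms that were highlighted separately but happen to sit next to each other. The highlighter output does not record which tags came from one phrase match.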





Re: phrase highlighting

2010-06-26 Thread Lukas Kahwe Smith

On 26.06.2010, at 16:30, Lukas Kahwe Smith wrote:

> 
> On 26.06.2010, at 16:22, Koji Sekiguchi wrote:
> 
>> Lukas,
>> 
>> What do you mean by '"security" and "council" are still highlighted 
>> separately'?
>> If you expect to get "security council" highlighted as one unit, the
>> highlighter cannot do that. The highlighter emphasizes terms and phrases by
>> tagging each term individually, even if you set hl.usePhraseHighlighter to true.
> 
> 
> ah ok .. then i will just replace  in my custom code.


hmm, then again that's probably also not good in case there are separately
highlighted terms/phrases right after each other. ah well, i guess i can
accept the default behavior.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





upload PDF using curl

2010-06-26 Thread go canal
Hello,
I am following the example at 
http://wiki.apache.org/solr/ExtractingRequestHandler 

I am using Windows XP, curl 7.19.5, Solr 1.4.1

the command is:

curl http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F 
"myfi...@tutorial.pdf"
 
I got these errors:
HTTP Error:  400.  missing content stream.
'commit' is not recognized as an internal or external command, operable 
program or batch file.

Any idea what's wrong ?
rgds,
canal
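A likely reading of the two errors: cmd.exe does not treat single quotes as quoting, so the unquoted `&` ended the curl command right after `literal.id=doc1`. curl then posted no file, which produced "missing content stream", and cmd tried to run `commit=true' -F ...` as a second command. Putting the whole URL in double quotes should avoid that in cmd.exe. As an alternative that involves no shell, here is a minimal Java 11+ sketch using `java.net.http`. The class name and the multipart field name `file` are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExtractUpload {
    // Query string for /update/extract; '&' is only a URL separator here, no shell sees it.
    static String extractUri(String solrBase, String id) {
        return solrBase + "/update/extract?literal.id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8) + "&commit=true";
    }

    // One file wrapped in a multipart/form-data body, the same format curl -F sends.
    static byte[] multipartBody(String boundary, String field, String fileName, byte[] data) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + field + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(data);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // POSTs the file and returns the HTTP status code reported by Solr.
    static int upload(String solrBase, String id, Path file) throws IOException, InterruptedException {
        String boundary = "solr-upload-" + System.nanoTime();
        byte[] body = multipartBody(boundary, "file", file.getFileName().toString(), Files.readAllBytes(file));
        HttpRequest request = HttpRequest.newBuilder(URI.create(extractUri(solrBase, id)))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For example, `ExtractUpload.upload("http://localhost:8983/solr", "doc1", Path.of("tutorial.pdf"))` would send the same request the wiki's curl example describes.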



  

Re: example solr xml working fine but my own xml files not working

2010-06-26 Thread Erick Erickson
The first place you should go for this type of question is the
Solr admin page, to look at what's actually in your index.

Luke is also a very handy tool for this. Get a copy of it (google
Lucene Luke), point it at your index, and poke around
to see if what's actually in your index is what you expect.

If that doesn't help, post more information, particularly
the query you're submitting that you expect to return
results.

Also, try executing the query with &debugQuery=on; that may
give you some clues (also note that there's a checkbox for debug info on the
Admin page if you go to the "full interface").

HTH
Erick

On Sat, Jun 26, 2010 at 8:46 AM, codar  wrote:

>
> I'm struggling with this very same problem.  I can index the example files
> fine. When I try adding a custom file, it appears to index without issue,
> but I get no search results via the admin console.  I've also tried
> modifying one of the files (monitor.xml); it also did not update.  I'm
> using Solr 1.4.1 on a Mac.  Any help would be greatly appreciated.
>
> I added these fields to the schema.xml
>multiValued="true" />
>multiValued="true" />
>multiValued="true" />
>
> Here's my custom xml:
>
> 
>ZS1
>ZS1
>RW
>How to index Solr on the Mac
> 
>
> I cd to the exampledocs dir and run: java -jar post.jar my_data.xml
>
> Here are the results:
>
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
> other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file tr_single.xml
> SimplePostTool: COMMITting Solr index changes..
>
> So it appears to have indexed without issue, yet when I search for
> ZS1, I get no results.
>
> Thanks in advance
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp504245p924113.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
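Erick's debugQuery suggestion can be tried with a hand-built request URL. Below is a small sketch (Java 10+ for `URLEncoder.encode(String, Charset)`). `myfield` is a hypothetical field name, since the schema lines in the quoted message were lost. Comparing a bare `ZS1` with a field-qualified `myfield:ZS1` should help: the parsedquery entry in the debug output shows which field each query actually searched.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SelectUrl {
    // Builds a /select URL for q, optionally asking Solr to include its debug section.
    static String select(String solrBase, String q, boolean debug) {
        // URLEncoder escapes reserved characters, e.g. ':' becomes %3A
        String url = solrBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
        return debug ? url + "&debugQuery=on" : url;
    }
}
```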


NGramFilterFactory usage

2010-06-26 Thread Indika Tantrigoda
Hi all,

I've been working with Solr for a while and the search components work as
expected.
Recently I needed to search on partial words, so I set up the
NGramFilterFactory.

My schema.xml is as follows:

Furthermore, I am using the dismax query handler and have set a boost on the
nGram_text field.

If I run a *:* query in the Solr administration interface, it shows the
nGram_text field as populated.
However, if I search for "plan" (assume I indexed the word "Plane"), no
results are shown.
Is there any other configuration that needs to be done?

Thanks in advance,

Regards,
Indika


Re: NGramFilterFactory usage

2010-06-26 Thread Robert Muir
yes, you need to use ngramfilter at query-time too.

On Sat, Jun 26, 2010 at 3:55 PM, Indika Tantrigoda wrote:

> Hi all,
>
> I've been working with Solr for a while and the search components work as
> expected.
> Recently I needed to search on partial words, so I set up the
> NGramFilterFactory.
>
> My schema.xml is as follows:
>
> positionIncrementGap="100" stored="false" multiValued="true">
>
>
>
>  maxGramSize="15"/>
>
>
>
>
>
>
>
>  multiValued="false"/>
>  multiValued="true"/>
> 
>
> Furthermore, I am using the dismax query handler and have set a boost on the
> nGram_text field.
>
> If I run a *:* query in the Solr administration interface, it shows the
> nGram_text field as populated.
> However, if I search for "plan" (assume I indexed the word "Plane"), no
> results are shown.
> Is there any other configuration that needs to be done?
>
> Thanks in advance,
>
> Regards,
> Indika
>



-- 
Robert Muir
rcm...@gmail.com
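An n-gram filter stores each token as its substrings, so the indexed word "plane" becomes terms such as "pl", "pla", and "plan". The sketch below illustrates that set of terms only and is not Lucene's implementation; token order and positions differ. It assumes the token was already lowercased. It uses 2 as a placeholder minGramSize because the schema's value was lost from the archive; maxGramSize="15" comes from the quoted schema.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NGrams {
    // Every substring of term whose length lies between minGram and maxGram,
    // i.e. the set of terms an n-gram filter would add for this one token.
    static Set<String> grams(String term, int minGram, int maxGram) {
        Set<String> out = new LinkedHashSet<>();
        for (int len = minGram; len <= Math.min(maxGram, term.length()); len++) {
            for (int start = 0; start + len <= term.length(); start++) {
                out.add(term.substring(start, start + len));
            }
        }
        return out;
    }
}
```

With these settings the 4-gram "plan" is among the indexed terms, and so is every gram of the query word "plan". A match is therefore possible with or without query-time n-gramming, provided the query term reaches that field lowercased. With dismax, the field must also be listed in qf.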


How to index rich document with XML payload?

2010-06-26 Thread Steve Johnson

Greetings,

I am new to Solr, but have gotten as far as successfully indexing 
documents both by sending XML describing the document and by sending the 
document itself using "update/extract".  What I want to do now is, in 
effect, do both of these on each of my documents.  I want to be able to 
have Tika do its magic first, and then I want to add additional fields 
to my document entries using XML.


Is there any way to do this?  In general, is there any way to apply 
multiple update requests to a single document entry?


I do understand that I can put literal values on the "update/extract" 
URL to do what I'm asking.  This is what I'll have to do if I can't 
figure out another way, but it seems messy to me...I'd much rather send 
an XML payload.


TIA for any help.



Re: example solr xml working fine but my own xml files not working

2010-06-26 Thread codar

Thanks, Erick.

I downloaded Luke and pointed it to my index.  I can see the data I indexed
via Luke, but still can't query it through the admin console.  I queried for
ZS1 and still got no results, but when I look at the index via Luke, I
see the document was indexed.  I'm stumped.

Jeff

On Sat, Jun 26, 2010 at 2:54 PM, Erick Erickson [via Lucene] <ml-node+924591-989457010-302...@n3.nabble.com> wrote:


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp504245p924851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: example solr xml working fine but my own xml files not working

2010-06-26 Thread codar

To add to my last message: when I query *:* I get the results I expect, but if
I query a term (ZS2), it doesn't find any matches.  I must be missing
something simple.  I'm new to Solr, so it's possible I just don't understand
how to query it.

Jeff

On Sat, Jun 26, 2010 at 6:42 PM, Jeff Kemble  wrote:

> Thanks, Erick.
>
> I downloaded Luke and pointed it to my index.  I can see the data I indexed
> via Luke, but still can't query it through the admin console.  I queried for
> ZS1 and still got no results, but when I look at the index via Luke, I
> see the document was indexed.  I'm stumped.
>
> Jeff
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp504245p924880.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: [ANN] Solr 1.4.1 Released

2010-06-26 Thread Jason Chaffee
It appears the 1.4.1 version was deployed to Maven with a new groupId.

For example, if you are trying to download solr-core, here are the differences 
between 1.4.0 and 1.4.1:

1.4.0
groupId: org.apache.solr
artifactId: solr-core

1.4.1
groupId: org.apache.solr.solr
artifactId: solr-core

Was this change intentional or a mistake?  If it was a mistake, can someone 
please fix it in Maven's central repository?

thanks,

Jason

-----Original Message-----
From: Mark Miller [mailto:markrmil...@apache.org]
Sent: Fri 6/25/2010 6:23 AM
To: solr-user@lucene.apache.org; gene...@lucene.apache.org; annou...@apache.org
Subject: [ANN] Solr 1.4.1 Released
 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Apache Solr 1.4.1 has been released and is now available for public
download!
http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project.  Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, and rich document (e.g., Word, PDF)
handling.  Solr is highly scalable, providing distributed search and
index replication, and it powers the search and navigation features of
many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server
within a servlet container such as Tomcat.  Solr uses the Lucene Java
search library at its core for full-text indexing and search, and has
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually
any programming language.  Solr's powerful external configuration allows
it to be tailored to almost any type of application without Java coding,
and it has an extensive plugin architecture when more advanced
customization is required.

Solr 1.4.1 is a bug fix release for Solr 1.4 that includes many Solr bug
fixes as well as Lucene bug fixes from Lucene 2.9.3.

See all of the CHANGES here:
http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt


- - Mark Miller on behalf of the Solr team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJMJK3AAAoJED+/0YJ4eWrIrfAP/RLD7QvreOBFebICN/eiRzCH
1dHOt9Scn7qGQU4RvXZ8GQq37AuoRMgmgckntttFLCCD5w5A29/GxzyZbAoQDQ0B
OkaHsYIcUuhbLq8QtlTjt+rK3gc6oxMoCRMJBS7DfUFUyROl6om4gpYAVem50qDy
FfBdgRxp4VZ07E7VwmMvma03nSrKuvX0bwE8NXksaCAVsvkmi8Sh7aLMPPVHgsuD
pbY8kB0hXCULJgs9ZAc2t6+T38+eV9wxJSeAktVlGAvNlYTavW2bxzF5wQk+kXCd
DwGjdlU9/ebHdx3MHJyE0zXSl4rGFsy8zfh/ntk7UV7qklQ2jn5Ur18zLqv4vkb1
Ea78GpoqCZWlMGcRUSErtH33cGs4blo/kuJZj/VLrk6jxO4x4beUsAfRcM/YliJW
Z6OuFtpcdVDjVl4aB2xbAMwDl2DXqgyNmlxs8vvqdRoDhN8wZ91raO0kkbrkzj1f
5gPD//Efx6RcrYtXAV3HKAwI7FLP8MhzFu1Y2FK2FY7DyFNmirad03+pB6bFs1xq
ARU6pdeTYvv+PsWH3Keaw/L/nb0BYbU8R1sVhkvjm+S9gJ6cCcKJkeAkNgL+6QNm
JPJ5VeXVFGVmwzQ5mE3j6qX1uDrEmLA2T5Dd7bssWtwveLoyfo0s7qezIfbRamnc
T3iyCE6cuSU9CvCEqN+o
=nBB9
-----END PGP SIGNATURE-----



REST calls

2010-06-26 Thread Jason Chaffee
The Solr docs say it is RESTful, yet it doesn't seem to use HTTP headers in a
RESTful way.  For example, it doesn't seem to use the Accept request header to
determine the media type to return; instead, it requires a query parameter in
the URL.  Also, it doesn't seem to return 304 Not Modified when the
If-Modified-Since request header is sent.

Am I doing something wrong, or is Solr not completely RESTful?

thanks,


Jason


URLDataSource

2010-06-26 Thread Jason Chaffee
I would like to use the URLDataSource to make RESTful calls to get content and
only re-index when the content changes.  This means using HTTP headers to make
a request and using the response headers to decide when to make the next one.
For example,

Request Headers:

Accept: application/xml
If-Modified-Since: timestamp


Response Headers:

Expires: timestamp
ETag: etag

In this case Solr would request the specified media type by putting it in the
Accept header.  It would also send If-Modified-Since on every request after the
first, using the time indexing last took place as the timestamp, so we only
index again if something changed.  The RESTful service would return content the
first time it is contacted, along with an Expires header that tells Solr when
it should next check for new content.  At that point the service could return
304 Not Modified or new content.  If it returns new content, that content is
indexed; otherwise, Solr reads the new Expires header to see when it should
make the next request.


My question is whether there is anything in Solr that currently supports this,
or whether I would have to implement it myself.  I wasn't able to find
anything.

thanks,

Jason
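The polling logic described above could be prototyped outside Solr with plain `HttpURLConnection` (Java 9+ for `readAllBytes`). This is a hypothetical standalone helper with invented class and method names. It is not part of Solr or DataImportHandler, and wiring it into an import is left open. It sends Accept and If-Modified-Since, treats 304 as "nothing to re-index", and schedules the next poll from the Expires header:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class ConditionalPoll {
    // What one poll found: new content (null on 304), the ETag, and when to poll next.
    static final class Result {
        final byte[] body;
        final String etag;
        final long nextCheckMillis;

        Result(byte[] body, String etag, long nextCheckMillis) {
            this.body = body;
            this.etag = etag;
            this.nextCheckMillis = nextCheckMillis;
        }

        boolean changed() { return body != null; }
    }

    static Result poll(URL url, long lastIndexedMillis, long defaultIntervalMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/xml");
        if (lastIndexedMillis > 0) {
            conn.setIfModifiedSince(lastIndexedMillis); // sent as an If-Modified-Since header
        }
        try {
            int status = conn.getResponseCode();
            long next = nextCheck(conn.getExpiration(), System.currentTimeMillis(), defaultIntervalMillis);
            String etag = conn.getHeaderField("ETag");
            if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return new Result(null, etag, next); // unchanged: skip re-indexing
            }
            try (InputStream in = conn.getInputStream()) {
                return new Result(in.readAllBytes(), etag, next);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Honour a future Expires header (getExpiration() returns 0 if absent); else use a fixed interval.
    static long nextCheck(long expiresMillis, long nowMillis, long defaultIntervalMillis) {
        return expiresMillis > nowMillis ? expiresMillis : nowMillis + defaultIntervalMillis;
    }
}
```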




Re: [ANN] Solr 1.4.1 Released

2010-06-26 Thread Ken Krugler


On Jun 26, 2010, at 5:18pm, Jason Chaffee wrote:


It appears the 1.4.1 version was deployed to Maven with a new groupId.

For example, if you are trying to download solr-core, here are the  
differences between 1.4.0 and 1.4.1:


1.4.0
groupId: org.apache.solr
artifactId: solr-core

1.4.1
groupId: org.apache.solr.solr
artifactId: solr-core

Was this change intentional or a mistake?  If it was a mistake, can  
someone please fix it in Maven's central repository?


I believe it was a mistake. From a recent email thread on this list,  
Mark Miller said:



Can a solr/maven dude look at this? I simply used the copy command on
the release to-do wiki (sounds like it should be updated).

If no one steps up, I'll try and straighten it out later.

On 6/25/10 10:28 AM, Stevo Slavić wrote:

Congrats on the release!

Something seems to be wrong with the Solr 1.4.1 Maven artifacts: there is an
extra "solr" in the path. E.g. solr-parent-1.4.1.pom is at
http://repo1.maven.org/maven2/org/apache/solr/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom
while it should be at
http://repo1.maven.org/maven2/org/apache/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom

The POMs seem to contain the correct Maven artifact coordinates.

Regards,
Stevo.


-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
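The extra `solr` directory Stevo describes follows from Maven's repository layout, where each dot in the groupId becomes a directory. A small illustrative sketch (the class name is hypothetical) reproduces both paths from the thread:

```java
class MavenPath {
    // Maven 2 repository layout: groupId with '.' -> '/', then artifactId/version/artifactId-version.ext
    static String path(String groupId, String artifactId, String version, String ext) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + ext;
    }
}
```

With groupId `org.apache.solr` the POM lands under `org/apache/solr/solr-parent/...`. With the 1.4.1 groupId `org.apache.solr.solr` it lands under `org/apache/solr/solr/solr-parent/...`, which matches the misplaced URL above.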






Re: How to index rich document with XML payload?

2010-06-26 Thread go canal
Hi,
I just started using Solr. I am using the SolrJ client, but I upload the file 
directly to Solr. I think we can use Tika in our own code first.

Here I send the file directly to Solr, which does the text extraction:

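import java.io.File;
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
solr.setRequestWriter(new BinaryRequestWriter());

// Send the raw file to Solr Cell; Solr runs Tika and extracts the text
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("tutorial.pdf"));
up.setParam("literal.id", "tutorial.pdf");  // literal.<field>=value sets a field directly
// commit afterwards (waitFlush=true, waitSearcher=true)
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solr.request(up);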

So what we need to do is to add Tika.

I have a question about up.setParam: am I able to create my own fields?
 rgds,
canal





From: Steve Johnson 
To: solr-user@lucene.apache.org
Sent: Sun, June 27, 2010 6:50:01 AM
Subject: How to index rich document with XML payload?



  

Re: NGramFilterFactory usage

2010-06-26 Thread Indika Tantrigoda
Hello,

Applying the NGramFilterFactory in the analyzer with type="query" didn't solve
the issue.
From the examples I've seen, it is only necessary to have the
NGramFilterFactory at index time, right?

Regards,
Indika



Re: How to index rich document with XML payload?

2010-06-26 Thread go canal
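Here is simple code for the Tika part:


import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

InputStream input = new FileInputStream(new File("test.pdf"));
Metadata metadata = new Metadata();
BodyContentHandler handler = new BodyContentHandler(); // collects the document body as plain text
AutoDetectParser parser = new AutoDetectParser();       // picks the right parser (PDF, Word, ...) by content
try {
    parser.parse(input, handler, metadata);
} finally {
    input.close();
}

The extracted content is handler.toString().
rgds,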
canal
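Steve's original goal was a single XML payload containing both the extracted text and extra fields. One approach is to run Tika on the client as above and then build a normal update message. Solr 1.4 cannot update part of an existing document: a second add with the same uniqueKey replaces it entirely, so both parts need to go in one request. Below is a minimal sketch of building that message. The class name and the field names `id`, `content`, and `category` are hypothetical, and the fields must exist in your schema.xml:

```java
import java.util.Map;

class SolrAddXml {
    // Escapes the characters that are special in XML text and double-quoted attributes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a single-document <add> message in Solr's XML update format.
    static String addDoc(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }
}
```

For example, put `id`, `content` (from `handler.toString()`), and `category` into a `LinkedHashMap` and POST the result to http://localhost:8983/solr/update with Content-Type `text/xml`, followed by a commit. A Map holds one value per name, so multiValued fields would need repeated `<field>` elements instead.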





From: go canal 
To: solr-user@lucene.apache.org
Sent: Sun, June 27, 2010 9:45:57 AM
Subject: Re: How to index rich document with XML payload?

  

Chinese chars are not indexed ?

2010-06-26 Thread go canal
Hello,
When I enter Chinese characters in the admin console to search for matching
documents, it does not return any, even though I have uploaded some documents
that contain Chinese characters.

I guess the Chinese characters are not indexed. Is there any configuration I 
need to change in Solr?
 rgds,
canal