Re: How to extend IndexSchema and SchemaField

2010-10-09 Thread Renaud Delbru

 Hi Chris,

I have opened an issue (SOLR-2146 [1]) following that discussion.

[1] https://issues.apache.org/jira/browse/SOLR-2146

cheers
--
Renaud Delbru

On 14/09/10 01:06, Chris Hostetter wrote:

: Yes, I have thought of that, or even extending field type. But this does not
: work for my use case, since I can have multiple fields of a same type
: (therefore with the same field type, and same analyzer), but each one of them
: needs specific information. Therefore, I think the only "nice" way to achieve
: this is to have the possibility to add attributes to any field definition.

Right, at the moment custom FieldType classes can specify whatever
attributes they want to use in the <fieldType/> declaration -- but it's
not possible to specify arbitrary attributes that can be used in the
<field/> declaration.

By all means, please open an issue requesting this as a feature.

I don't know that anyone explicitly set out to impose this limitation, but
one of the reasons it likely exists is because SchemaField is not
something that is intended to be customized -- while FieldType
objects are constructed once at startup, SchemaField objects are
frequently created on the fly when dealing with dynamicFields, so
initialization complexity is kept to a minimum.

That said -- this definitely seems like the type of use case that we
should try to find *some* solution for -- even if it just means having
Solr automatically create hidden FieldType instances for you on startup
based on attributes specified in the <field/> that the corresponding
FieldType class understands.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





How to enable solr MoreLikeThis

2010-10-09 Thread Titash Neogi

Hello all,

I am trying to follow the documentation given here 
http://wiki.apache.org/solr/MoreLikeThisHandler to enable MoreLikeThis 
in my application. However, when I execute any URL such as the one below


/solr/mlt?stream.body=social media in india&mlt.fl=content&mlt.mintf=0

I get a 404 error with the following message

*type* Status report

*message* _/solr/mlt_

*description* _The requested resource (/solr/mlt) is not available._

I have made all the changes in solrconfig.xml to enable the mlt component. 
My solr version is 1.4.0 833479. When I run regular /select/ queries, I 
get results as expected.
Any help on this would be appreciated greatly. I am not sure what else I 
need to do to make this work.


kind regards,
Titash




Re: How to enable solr MoreLikeThis

2010-10-09 Thread Ahmet Arslan


--- On Sat, 10/9/10, Titash Neogi  wrote:

> From: Titash Neogi 
> Subject: How to enable solr MoreLikeThis
> To: "solr-user@lucene.apache.org" 
> Date: Saturday, October 9, 2010, 3:28 PM
> Hello all,
> 
> I am trying to follow the documentation given here 
> http://wiki.apache.org/solr/MoreLikeThisHandler to
> enable MoreLikeThis in my application. However, when I
> execute any URL such as the one below
> 
> /solr/mlt?stream.body=social media in
> india&mlt.fl=content&mlt.mintf=0
> 
> I get a 404 error with the following message
> 
> *type* Status report
> 
> *message* _/solr/mlt_
> 
> *description* _The requested resource (/solr/mlt) is not
> available.
> _
> 
> _
> _I have made all change in solrconfig.xml to enable the mlt
> component. My solr version is 1.4.0 833479. When I run
> regular /select/ queries, I get results as expected.
> Any help on this would be appreciated greatly. I am not
> sure what else I need to do to make this work.

The /solr/mlt? syntax addresses a request handler. You need to register a request 
handler named mlt in solrconfig.xml.
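
Once a handler answers at that path, note that the URL in the question also contains raw spaces, which should be URL-encoded. A minimal Python sketch of building the request, assuming a Solr instance at http://localhost:8983/solr (a hypothetical address) and a handler reachable at /mlt; the helper name mlt_url is made up for illustration:

```python
from urllib.parse import urlencode

def mlt_url(base_url, handler, text, fields, min_tf=0):
    """Build a MoreLikeThis request URL with properly encoded parameters."""
    params = {
        "stream.body": text,          # free text to find similar documents for
        "mlt.fl": ",".join(fields),   # fields used for similarity
        "mlt.mintf": min_tf,          # minimum term frequency
    }
    return "%s/%s?%s" % (base_url.rstrip("/"), handler.strip("/"), urlencode(params))

# Hypothetical host and port; adjust to your deployment.
url = mlt_url("http://localhost:8983/solr", "mlt", "social media in india", ["content"])
print(url)
```

If this URL still returns a 404, the handler is most likely not registered under that name, or the core was not reloaded after editing solrconfig.xml.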



list




  


Re: Speeding up solr indexing

2010-10-09 Thread Otis Gospodnetic
Related.  Can't be larger than -Xmx. :)  Or even equal to -Xmx, because other 
things need to live in the heap.  There is no exact function, so be more on the 
conservative side in order to avoid OOME.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dennis Gearon 
> To: solr-user@lucene.apache.org
> Sent: Sat, October 9, 2010 12:58:18 AM
> Subject: Re: Speeding up solr indexing
> 
> How does that have to work with Java's memory? 
> 
> In lockstep, a certain  percentage, not related, what, or at all?
> 
> 
> Dennis  Gearon
> 
> Signature Warning
> 
> It is always a good idea  to learn from your own mistakes. It is usually a 
>better idea to learn from  others’ mistakes, so you do not have to make them 
>yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH  has a Right To Life,
>   otherwise we all die.
> 
> 
> --- On Fri,  10/8/10, Otis Gospodnetic   wrote:
> 
> > From: Otis Gospodnetic 
> >  Subject: Re: Speeding up solr indexing
> > To: solr-user@lucene.apache.org
> >  Date: Friday, October 8, 2010, 9:13 PM
> > Hi,
> > 
> > Assuming  your DB/network/something else is not the
> > bottleneck, increase your 
> > ramBufferSizeMB (in solrconfig).
> > 
> > Otis
> >  
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem  search :: http://search-lucene.com/
> > 
> > 
> > 
> > - Original  Message 
> > > From: sivaprasad 
> >  > To: solr-user@lucene.apache.org
> >  > Sent: Fri, October 8, 2010 2:59:45 PM
> > > Subject: Speeding up  solr indexing
> > > 
> > > 
> > > Hi,
> > > I am  indexing the data using DIH.Data coming from
> > mysql.Each   document
> > > contains 30 fields.Some of the fields are multi
> >  valued.When i am  trying to
> > > index 10 million records it taking more  time to
> > index.
> > > 
> > > Any  body has suggestions to  speed up indexing
> > process?Any suggestions on
> > > solr  admin  level configurations?
> > > 
> > > 
> > > Thanks,
> >  > JS
> > > -- 
> > > View this message  in context: 
> >  
>>http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1667054.html
>
> >  >
> > > Sent  from the Solr - User mailing list archive
> > at  Nabble.com.
> > > 
> >
>


Re: dynamic "stop" words?

2010-10-09 Thread Geert-Jan Brits
That might work, although depending on your use-case it might be hard to
have a good controlled vocab on citynames (hotel metropole bruxelles, hotel
metropole brussels, hotel metropole brussel, etc.)  Also 'hotel paris
bruxelles' stinks...

given your example:

> Doc 1
> name => "Holiday  Inn"
> city => "Denver"
>
> Doc 2
> name => "Holiday Inn,  Denver"
> city => "Denver"
>
> q=name:(Holiday Inn, Denver)

turning it upside down, perhaps an alternative would be to query on:
q=name:Holiday Inn+city:Denver

and configure field 'name' in such a way that doc1 and doc2 score the same.
I believe that must be possible, just not sure how to config it exactly at
the moment.

Of course, it depends on your scenario if you have enough knowledge on the
clientside to transform:
q=name:(Holiday Inn, Denver)  to   q=name:Holiday Inn+city:Denver
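
A rough Python sketch of that client-side transformation, assuming you have a dictionary of city names available (the KNOWN_CITIES set and both function names are hypothetical). It emits an explicit AND between the two clauses, which is one possible reading of the query above, and it only handles single-word city names:

```python
import re

# Hypothetical controlled vocabulary; a real one would need spelling variants
# (bruxelles, brussels, brussel) and multi-word city names.
KNOWN_CITIES = {"denver", "boston", "brussels"}

def split_city(user_query, cities=KNOWN_CITIES):
    """Split free text into (name_terms, city) using the city dictionary."""
    tokens = re.findall(r"\w+", user_query)
    name_terms = [t for t in tokens if t.lower() not in cities]
    city_terms = [t for t in tokens if t.lower() in cities]
    return name_terms, (city_terms[0] if city_terms else None)

def build_query(user_query, cities=KNOWN_CITIES):
    """Rewrite the query so the city term is matched against the city field only."""
    name_terms, city = split_city(user_query, cities)
    clauses = []
    if name_terms:
        clauses.append("name:(%s)" % " ".join(name_terms))
    if city:
        clauses.append("city:%s" % city)
    return " AND ".join(clauses)

print(build_query("Holiday Inn, Denver"))
```

For the example above this prints name:(Holiday Inn) AND city:Denver, so the city no longer contributes to the score on the name field.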

Hth,
Geert-Jan

2010/10/9 Otis Gospodnetic 

> Matt,
>
> The first thing that came to my mind is that this might be interesting to
> try
> with a dictionary (of city names) if this example is not a made-up one.
>
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Matt Mitchell 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, October 8, 2010 11:22:36 AM
> > Subject: dynamic "stop" words?
> >
> > Is it possible to have certain query terms not effect score, if that
> > same  query term is present in a field? For example, I have an index of
> > hotels.  Each hotel has a name and city. If the name of a hotel has the
> > name of the  city in it's "name" field, I want to completely ignore
> > that and not have it  influence score.
> >
> > Example:
> >
> > Doc 1
> > name => "Holiday  Inn"
> > city => "Denver"
> >
> > Doc 2
> > name => "Holiday Inn,  Denver"
> > city => "Denver"
> >
> > q=name:(Holiday Inn, Denver)
> >
> > I'd  like those docs to have the same score in the response. I don't
> > want Doc2 to  have a higher score, just because it has all of the query
> > terms.
> >
> > Is  this possible without using stop words? I hope this makes  sense!
> >
> > Thanks,
> > Matt
> >
>


Re: Speeding up solr indexing

2010-10-09 Thread sivaprasad

Hi,
Please find the configurations below.

Machine configurations(Solr running here):

RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz



Machine configurations(Mysql server is running here):
RAM - 4 GB
HardDisk - 180GB
Os - Red Hat linux version 5
Processor-2x Intel Core 2 Duo CPU @2.66GHz

MySQL server details:
MySQL version - MySQL 5.0.22

Solr configuration details:

 
  
false

20
   

100
2147483647
1
1000
1

   
   


   

single
  

  

false
100
20
   

2147483647
1
false
  

  
  
10
 
  1 
  6




  

Solr document details:

21 fields are indexed and stored.
3 fields are indexed only.
3 fields are stored only.
3 fields are indexed, stored and multivalued.
2 fields are indexed and multivalued.

I am also copying some of the indexed fields. Two of these fields are
multivalued and have thousands of values.

In the db-config file the main table contains 0.6 million records.

When I tested with the same records, indexing took 1 hr 30 min. In that
case the table behind one of the multivalued fields had no records. After
putting data into this table, it has thousands of records for each main
table record, and this field is indexed and stored. Indexing now takes more
than 24 hrs.

Solr 1.4.1 is running on Tomcat 6.0.26 with jdk1.6.0_17.

I am using the JVM's default settings.

Why is this taking so much time? Does anybody have suggestions about where I
am going wrong?

Thanks,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1670737.html
Sent from the Solr - User mailing list archive at Nabble.com.


xml-aware highlighting

2010-10-09 Thread Michael Sokolov
 I have a requirement to highlight search results, and to display 
documents with matching terms highlighted in the context of the original 
XML document structure.


It seems like this must be a very common use case, but I am having 
trouble finding a way to accomplish what we need to do using solr and/or 
lucene.  Using the standard highlighting support in solr, we have been 
able to retrieve KWIC text fragments for search results, which is 
great.  But what we would ideally like to do is to apply similar 
highlighting logic while preserving the original document structure.


1) When the user selects a matching document, we render it as HTML with 
paragraphs, headers, text styles such as italics, and so on, so we need 
to highlight either the rendered HTML or the original XML and then 
process that.  We need to find the text fragments that matched the 
original query and highlight those.  And this has to use the same logic 
used by solr/lucene to do the searching, so that the tokenization and 
analysis is applied properly, and query semantics are respected: if the 
original query was a phrase query, only phrases should match, and so on.


2) In addition, we also want to be able to display KWIC phrases that are 
rendered with type styles based on the original XML; this requires some 
XML tree surgery in order to pull out a fragment of a structured 
document while preserving enough xml structure to render type styles, 
which we can do, but it also requires a mapping of matching tokens back 
into the original document.


I am hoping this is a solved problem, but if not, I'd also be interested 
in pointers to the best places to start an implementation.  I think the 
problem at base is to maintain a map relating positions of matching 
terms in the indexed and stored field in lucene to corresponding 
positions in an original XML document.  Ideally the original positions 
could be stored directly in term vectors, but they could also be 
translated at render/highlight time using an additional lookup.
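
To make the idea of the position map concrete, here is a simplified Python sketch. It assumes the indexed text is simply the XML with tags removed, and that the highlighter reports character offsets into that stripped text; it ignores entities, comments and CDATA. The function names are made up for illustration:

```python
import re

TAG = re.compile(r"<[^>]*>")

def strip_tags_with_offsets(xml):
    """Return (text, offsets) where offsets[i] is the index in xml of text[i]."""
    text_chars = []
    offsets = []
    pos = 0
    for match in TAG.finditer(xml):
        # Copy the character data between the previous tag and this one.
        for i in range(pos, match.start()):
            text_chars.append(xml[i])
            offsets.append(i)
        pos = match.end()
    # Copy whatever follows the last tag.
    for i in range(pos, len(xml)):
        text_chars.append(xml[i])
        offsets.append(i)
    return "".join(text_chars), offsets

def map_span(offsets, start, end):
    """Map a [start, end) span in the stripped text back to the original XML."""
    return offsets[start], offsets[end - 1] + 1

xml = "<p>Stump <i>the</i> chump</p>"
text, offsets = strip_tags_with_offsets(xml)
start = text.index("the")
xml_start, xml_end = map_span(offsets, start, start + len("the"))
print(text, xml[xml_start:xml_end])
```

A span that crosses a tag boundary maps to a range that includes the tag, so real highlighting code would still have to split such a range per text node.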


I see code in org.apache.lucene.search.highlight in solr and also 
something in lucene/contrib/highlighter. Is that the state of the art 
now, or is there anywhere else I should be looking as well?


Thanks for any pointers

-Mike Sokolov


Re: xml-aware highlighting

2010-10-09 Thread Michael Sokolov
 OK - I read a bit more and it appears an appropriate analysis pipeline 
(which would extract text from XML using SAX, say) is all that's 
required, and existing highlighting ought to be able to accomplish what 
I'm after.  So I guess the only question I have now before writing code 
is where is the existing implementation :)  - anyone?


-Mike


On 10/9/2010 12:51 PM, Michael Sokolov wrote:
 I have a requirement to highlight search results, and to display 
documents with matching terms highlighted in the context of the 
original XML document structure.


It seems like this must be a very common use case, but I am having 
trouble finding a way to accomplish what we need to do using solr 
and/or lucene.  Using the standard highlighting support in solr, we 
have been able to retrieve KWIC text fragments for search results, 
which is great.  But what we would ideally like to do is to apply 
similar highlighting logic while preserving the original document 
structure.


1) When the user selects a matching document, we render it as HTML 
with paragraphs, headers, text styles such as italics, and so on, so 
we need to highlight either the rendered HTML or the original XML and 
then process that.  We need to find the text fragments that matched 
the original query and highlight those.  And this has to use the same 
logic used by solr/lucene to do the searching, so that the 
tokenization and analysis is applied properly, and query semantics are 
respected: if the original query was a phrase query, only phrases 
should match, and so on.


2) In addition, we also want to be able to display KWIC phrases that 
are rendered with type styles based on the original XML; this requires 
some XML tree surgery in order to pull out a fragment of a structured 
document while preserving enough xml structure to render type styles, 
which we can do, but it also requires a mapping of matching tokens 
back into the original document.


I am hoping this is a solved problem, but if not, I'd also be 
interested in pointers to the best places to start an implementation.  
I think the problem at base is to maintain a map relating positions of 
matching terms in the indexed and stored field in lucene to 
corresponding positions in an original XML document.  Ideally the 
original positions could be stored directly in term vectors, but they 
could also be translated at render/highlight time using an additional 
lookup.


I see code in org.apache.lucene.search.highlight in solr and also 
something in lucene/contrib/highlighter. Is that the state of the art 
now, or is there anywhere else I should be looking as well?


Thanks for any pointers

-Mike Sokolov




Re: Speeding up solr indexing

2010-10-09 Thread Dennis Gearon
Looking at it, and not knowing how much memory the other processes on your box 
use (nor how much memory you have set aside for Java), I would start with 
DOUBLING your ram. Make sure that you have enough Java memory.

You will know if it has some effect by using the 2:1 size ratio. 100mb for all 
that data is pretty small, I think.


Use the scientific method: change only one parameter at a time and check 
the results.

It's always one of four things:
(in different order depending on task, but listed alphabetically here)
--
Memory (process assigned and/or actual physical memory)
Processor
Network Bandwidth
Hard Drive Bandwidth
(Sometimes you can add motherboard I/O paths also.
 As of this date, AMD has many more I/O paths in their
 consumer line of processors.)

In order of ease of experimenting with (easiest to hardest):
---
Application/process assigned memory
Physical memory
Network Bandwidth
HardDrive Bandwidth
  Screaming fast SCSI 15K rpm drives
  RAID arrays, casual
  RAID arrays, professional
  External DRAM drive 64 gig max/RAID them for more
Processor(s) 
  Put maximum speed/cache size motherboard will take.
  Otherwise, USUALLY requires changing motherboard/HOSTING setup
I/O channels
  USUALLY requires changing motherboard/HOSTING setup





Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Sat, 10/9/10, sivaprasad  wrote:

> From: sivaprasad 
> Subject: Re: Speeding up solr indexing
> To: solr-user@lucene.apache.org
> Date: Saturday, October 9, 2010, 8:09 AM
> 
> Hi,
> Please find the configurations below.
> 
> Machine configurations(Solr running here):
> 
> RAM - 4 GB
> HardDisk - 180GB
> Os - Red Hat linux version 5
> Processor-2x Intel Core 2 Duo CPU @2.66GHz
> 
> 
> 
> Machine configurations(Mysql server is running here):
> RAM - 4 GB
> HardDisk - 180GB
> Os - Red Hat linux version 5
> Processor-2x Intel Core 2 Duo CPU @2.66GHz
> 
> My sql Server deatils:
> My sql version - Mysql 5.0.22
> 
> Solr configuration details:
> 
>  
>   
>    
> false
> 
>     20
>    
>    
>  
>   
>    
> 100
>    
> 2147483647
>    
> 1
>    
> 1000
>    
> 1
>    
> 
>    
>    
> 
>     
>    
> 
>     single
>   
> 
>   
>     
>    
> false
>    
> 100
>     20
>    
>    
> 
>    
> 2147483647
>    
> 1
>    
> false
>   
> 
>   
>    class="solr.DirectUpdateHandler2">
>    
> 10
>      
>       1 
>       6
>     
>     
>     
>     
>   
> 
> Solr document details:
> 
> 21 fields are indexed and stored
> 3 fileds are indexed only.
> 3 fileds are stored only.
> 3 fileds are indexed,stored and multi valued
> 2 fileds indexed and multi valued
> 
> And i am copying some of the indexed fileds.In this 2
> fileds are multivalued
> and has thousands of values.
> 
> In db-config-file the main table contains 0.6 million
> records.
> 
> When i tested for the same records, the index has taken 1hr
> 30 min.In this
> case one of the multivalued filed table doesn't have
> records.After putting
> data into this table,for each main table record , this
> table has thousands
> of records and this filed is indexed and stored.It is
> taking more than 24
> hrs .
> 
> Solr is running on tomcat 6.0.26, jdk1.6.0_17 and solr
> 1.4.1
> 
> I am using JVM's default settings.
> 
> Why this is taking this much time?Any body has suggestions,
> where i am going
> wrong.
> 
> Thanks,
> JS
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Speeding-up-solr-indexing-tp1667054p1670737.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
>


Re: multi level faceting

2010-10-09 Thread Peter Karich
Hi,

there are two relatively similar solutions for this problem.

I will describe one of them:
 * create a multivalued string field called 'category'
 * you have a category tree, so make sure a document gets not only the
   leaf category, but all categories (name or id) up to the root
 * now facet over category with '-1' as limit
 * What if you want to display only the categories of one level? (e.g.
   if you don't want other levels at the same time or if there are too many.)
   Then index the category field as <level>_<category> and use
   facet.prefix to filter the category list
 * clicking on a category entry should result in a filter query like
   fq=category:"selectedCategoryWithLevel"
   the slightly tricky part is that your UI or middle tier has to
   parse the level, e.g. 2, and then append 2+1=3 to the query: facet.prefix=3_
 * if you filter by level then one question remains:
   Q: how can you display the path from the selected category up to the
      root category?
   A: Either get the category parents via the DB (which is easy if you
      store the category ids in solr) or
      get the parents from the parameter list, which is a bit more
      complicated (in this case I think it is best to store the category
      names in solr).

(The second approach is: instead of using facet.prefix you can use
dynamic fields like category_<level>_s)
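
The facet.prefix approach can be sketched in Python roughly like this. It assumes levels are numbered from 1 at the root and joined to the category name with an underscore; the function names are hypothetical, and only standard Solr facet parameters are used:

```python
def category_tokens(path):
    """Turn a root-to-leaf path into level-prefixed tokens, one per ancestor."""
    return ["%d_%s" % (level, name) for level, name in enumerate(path, start=1)]

def next_level_params(selected_token):
    """Given a clicked token such as '2_Men', build params for the next request."""
    level = int(selected_token.split("_", 1)[0])
    return {
        "fq": 'category:"%s"' % selected_token,   # restrict to the clicked category
        "facet": "on",
        "facet.field": "category",
        "facet.limit": -1,
        "facet.prefix": "%d_" % (level + 1),      # show only the next level down
    }

tokens = category_tokens(["Sneakers", "Men", "Size 7"])
params = next_level_params("2_Men")
print(tokens, params["facet.prefix"])
```

Because every document carries the tokens of all its ancestors, the filter on "2_Men" keeps the whole subtree, and the prefix "3_" limits the facet list to the children. If a document can sit in several branches of the tree, the level-3 list may include children of other level-2 categories.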

Is this explanation missing something, or unclear?

Kind Regards,
Peter.

> Hi,
>
> there is a solution without the patch. Here it should be explained:
> http://www.lucidimagination.com/blog/2010/08/11/stumped-with-solr-chris-hostetter-of-lucene-pmc-at-lucene-revolution/
>
> If not, I will do on 9.10.2010 ;-)
>
> Regards,
> Peter.
>
>   
>> I've a similar problem with a project I'm working on now.  I am holding out 
>> for either SOLR-64 or SOLR-792 being a bit more mature before I need the 
>> functionality but if not I was thinking I could do multi-level faceting by 
>> indexing the data as a "String" like this:
>>
>> id: 1
>> SHOE: Sneakers|Men|Size 7
>>
>> id: 2
>> SHOE: Sneakers|Men|Size 8
>>
>> id: 3
>> SHOE: Sneakers|Women|Size 6
>>
>> etc
>>
>> and then in the UI, show just up to the first delimiter (you'll have to sum 
>> the counts in the UI too).  Once the user clicks on "Sneakers", you would 
>> then add fq=SHOE:Sneakers|* to the query and then show the values up to the 
>> 2nd delimiter, etc.  
>>
>> Alternatively, if you didn't want to use a wildcard query, you could index 
>> each level separately like this:
>>
>> id: 1
>> SHOE1: Sneakers
>> SHOE2: Sneakers|Men
>> SHOE3: Sneakers|Men|Size 7
>>
>> Then after the user clicks on the 1st level, fq on SHOE1 and show SHOE2, 
>> etc.  This wouldn't work so well if you had more than a few levels in your 
>> hierarchy.
>>
>> I haven't actually tried this and like I said I'm hoping I could just use a 
>> patch (really I hope 3.x gets released GA with the functionality but I won't 
>> hold my breath...)  But I do think this would work in a pinch if need be.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -Original Message-
>> From: Nguyen, Vincent (CDC/OD/OADS) (CTR) [mailto:v...@cdc.gov] 
>> Sent: Tuesday, October 05, 2010 8:22 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: multi level faceting
>>
>> Just to clarify, the effect I was look for was this.  
>>
>> Sneakers
>>Men (22)
>>Women (43)
>>
>> AFTER a user filters by one of those, they would be presented with a NEW
>> facet field such as 
>>
>> Sneakers
>>Men
>>   Size 7 (10)
>>   Size 8 (11)
>>   Size 9 (23)
>>
>> Vincent Vu Nguyen
>>
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
>> Sent: Monday, October 04, 2010 11:44 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: multi level faceting
>>
>> Hi,
>>
>> I *think* this is not what Vincent was after.  If I read the suggestions
>>
>> correctly, you are saying to use &fq=x&fq=y -- multiple fqs.
>> But I think Vincent is wondering how to end up with something that will
>> let him 
>> create a UI with multi-level facets (with a single request), e.g.
>>
>> Footwear (100)
>>   Sneakers (20)
>> Men (1)
>> Women (19)
>>
>>   Dancing shoes (10)
>> Men (0)
>> Women (10)
>> ...
>>
>> If this is what Vincent was after, I'd love to hear suggestions myself.
>> :)
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> - Original Message 
>>   
>> 
>>> From: Jason Brown 
>>> To: solr-user@lucene.apache.org
>>> Sent: Mon, October 4, 2010 11:34:56 AM
>>> Subject: RE: multi level faceting
>>>
>>> Yes, by adding fq back into the main query you will get results
>>> 
>>>   
>> increasingly  
>>   
>> 
>>> filtered each time.
>>>
>>> You may run into an issue if you are displaying facet  counts, as the
>>> 
>>>   
>> facet 
>>   
>> 
>>> part of the query will also obey the increasingly filtered  fq, and so
>>>

"OR" facet queries?

2010-10-09 Thread Andy
I want to enable users to select multiple facet values for a specific facet 
field. For example, if "color" is a facet field, I'd like to let users 
select "red" OR "blue".

Please note, I've set the default query operator to AND
because I want "q=hello+world" to mean that "hello" and "world" are AND'ed together.

1) What is the syntax of doing that? Can I implement that by putting "OR" 
within the fq clause?
E.g.
&facet=on&facet.field=color&facet.field=size
&fq=color:(red OR blue)
&fq=size:(M OR L)

2) Is there a performance penalty associated with using "OR" on the facet 
values like that? If so how much of a penalty?

Thanks





  

Re: xml-aware highlighting

2010-10-09 Thread Ahmet Arslan
>  OK - I read a bit more and it
> appears an appropriate analysis pipeline (which would
> extract text from XML using SAX, say) is all that's
> required, and existing highlighting ought to be able to
> accomplish what I'm after.  So I guess the only
> question I have now before writing code is where is the
> existing implementation :)  - anyone?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 may remove xml tags too.





Re: "OR" facet queries?

2010-10-09 Thread Ahmet Arslan
> I want to enable users to select
> multiple facet values for a specific facet fields. For
> example, if "color" is a facet field, I'd like to let users
> to select "red" OR "blue".
> 
> Please note, I've set
> 
> because I want "q=hello+world" means "hello" and "world"
> are AND'ed together.
> 
> 1) What is the syntax of doing that? Can I implement that
> by putting "OR" within the fq clause?
> E.g.
> &facet=on&facet.field=color&facet.field=size
> &fq=color:(red OR blue)
> &fq=size:(M OR L)

Yes, you can do that with filter queries.

You may find this interesting. 
http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
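
As a rough illustration of what that wiki section describes, here is a Python sketch that builds the parameters for multi-select faceting with the {!tag=...} and {!ex=...} local params. Using the field name as the tag name and the function name multi_select_params are my own choices for the example:

```python
from urllib.parse import urlencode

def multi_select_params(query, selections, facet_fields):
    """Build Solr params where each selected field is OR'ed and excluded from its own facet counts."""
    params = [("q", query), ("facet", "on")]
    for field, values in selections.items():
        clause = " OR ".join('"%s"' % v for v in values)
        # Tag the filter so the matching facet can ignore it.
        params.append(("fq", "{!tag=%s}%s:(%s)" % (field, field, clause)))
    for field in facet_fields:
        if field in selections:
            params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
        else:
            params.append(("facet.field", field))
    return params

params = multi_select_params("hello world", {"color": ["red", "blue"]}, ["color", "size"])
print(urlencode(params))
```

With the exclusion in place, the color facet still shows counts for the unselected colors, so users can add another value instead of only narrowing further.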





  


Re: xml-aware highlighting

2010-10-09 Thread Michael Sokolov

 Yes - that looks right; I was thrown a bit by the name -

Thanks!

On 10/9/2010 5:23 PM, Ahmet Arslan wrote:

  OK - I read a bit more and it
appears an appropriate analysis pipeline (which would
extract text from XML using SAX, say) is all that's
required, and existing highlighting ought to be able to
accomplish what I'm after.  So I guess the only
question I have now before writing code is where is the
existing implementation :)  - anyone?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 may remove xml tags too.







Re: Sorting on arbitary 'custom' fields

2010-10-09 Thread Erick Erickson
I'm confused. What do you mean that a user can "set any
number of arbitrarily named fields on a document"? It sounds
like you are talking about a user adding arbitrarily many entries
to a multi-valued field. Or is it some kind of key:value pairs
in a field in your schema?

Under any circumstances, sorting on a multi-valued field is...er...
hard. What does sorting mean there? Sort by the first value entered?
The second? The 15th? This is indeterminate behavior.

What is the over-arching problem you're addressing? I wonder
if this is an XY problem. see:
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Fri, Oct 8, 2010 at 8:15 PM, Simon Wistow  wrote:

> On Fri, Oct 08, 2010 at 04:56:38PM -0700, kenf_nc said:
> >
> > What behavior are you trying to see? You are allowed to sort on fields
> that
> > are potentially empty, they just sort to the top or bottom depending on
> your
> > sort order. Now, if you Query on the fields that could be empty, you
> won't
> > see the result, but if your document is valid for the query, you can sort
> on
> > whatever field you want whether the document has that field or not.
>
> A user can set any number of arbitarily named fields on a document. We'd
> like to be able to sort by those fields.
>
> The problem is that users can set multiple arbitary fields and we may
> have thousands of them - it would be impractical for us to have these as
> actual fields in the schema.
>
> If I could sort on only the matching values of a multi valued field then
> this would be easy - I'd just collapse down key / value pairs to
> _ and then search for user_field:_*
>
>
>
>
>
>