Re: replication --> missing field data file

2010-01-11 Thread Shalin Shekhar Mangar
On Thu, Jan 7, 2010 at 9:34 PM, Giovanni Fernandez-Kincade <
gfernandez-kinc...@capitaliq.com> wrote:

> Right, but if you want to take periodic backups and ship them to tape or
> some DR site, you need to be able to tell when the backup is actually
> complete.
>
> It seems very strange to me that you can actually track the replication
> progress on a slave, but you can't track the backup progress on a master.
>
>
You are right. This can be improved. See
https://issues.apache.org/jira/browse/SOLR-1714

-- 
Regards,
Shalin Shekhar Mangar.


Re: Adaptive search?

2010-01-11 Thread Shalin Shekhar Mangar
On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic  wrote:

>
> - Original Message 
>
> > From: Shalin Shekhar Mangar 
> > To: solr-user@lucene.apache.org
> > Sent: Wed, December 23, 2009 2:45:21 AM
> > Subject: Re: Adaptive search?
> >
> > On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote:
> >
> > > Nice!
> > >
> > > Siddhant: Another problem to watch out for is the feedback problem:
> > > someone clicks on a link and it automatically becomes more
> > > interesting, so someone else clicks, and it gets even more
> > > interesting... So you need some kind of suppression. For example, as
> > > individual clicks get older, you can push them down. Or you can put a
> > > cap on the number of clicks used to rank the query.
> > >
> > >
> > We use clicks/views instead of just clicks to avoid this problem.
>
> Doesn't a click imply a view?  You click to view.  I must be missing
> something...
>
>
I was talking about boosting documents using past popularity. So a user
searches for X and gets 10 results. This view is recorded for each of the 10
documents and added to the index later. If a user clicks on result #2, the
click is recorded for doc #2 and added to index. We boost using clicks/view.
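
For example (the field names are hypothetical), the ratio can be applied as a
dismax boost function over stored numeric fields holding the accumulated counts:

   defType=dismax
   bf=div(clicks_i,max(views_i,1))

where div() and max() are standard function queries, and max(views_i,1) simply
avoids dividing by zero for documents that have never been rendered.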

-- 
Regards,
Shalin Shekhar Mangar.


Re: Understanding the query parser

2010-01-11 Thread rswart

I am running into the same issue. I have tried to replace my
WhitespaceTokenizerFactory with a PatternTokenizerFactory with pattern
(\s+|-) but I still seem to get a phrase query. Why is that?




Ahmet Arslan wrote:
> 
> 
>> I am using Solr 1.3.
>> I have an index with a field called "name". It is of type
>> "text"
>> (unmodified, stock text field from solr).
>> 
>> My query
>> field:foo-bar
>> is parsed as a phrase query
>> field:"foo bar"
>> 
>> I was rather expecting it to be parsed as
>> field:(foo bar)
>> or
>> field:foo field:bar
>> 
>> Is there an expectation mismatch? Can I make it work as I
>> expect it to?
> 
> If the query analyzer produces two or more tokens from a single token,
> QueryParser constructs PhraseQuery. Therefore it is expected. 
> 
> Without writing custom code it seems impossible to alter this behavior.
> 
> Modifying QueryParser to change this behavior will be troublesome. 
> I think the easiest way is to replace '-' with whitespace before the
> analysis phase, probably on the client side, or in a custom RequestHandler.
> 
> Maybe you can set qp.setPhraseSlop(Integer.MAX_VALUE); so that 
> field:foo-bar and field:(foo AND bar) will be virtually equal.
> 
> hope this helps.
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Understanding-the-query-parser-tp27071483p27107523.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk
You could take the code for SynonymFilterFactory as a starting point and 
adapt it to obtain the synonym configuration from a source other than a 
text file.

But I'm not sure what you mean by checking for synonyms at query time. As I 
understand it, Solr works like that anyway - depending on how you configure it. 
The only difference between your new SynonymFilterFactory and Solr's default 
would be where it obtains the synonym configuration from.

You can get Solr to re-read the configuration by issuing a "reload" command. 
See http://wiki.apache.org/solr/CoreAdmin#RELOAD.
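
For example, with a multicore setup (solr.xml), a request along these lines
reloads a core named "core0" (host, port and core name are placeholders):

   http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0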

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk


-Original Message-
From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] 
Sent: 10. januar 2010 16:20
To: solr-user@lucene.apache.org
Subject: Synonyms from Database

Hi :
 Is there any work done in providing synonyms from a database instead of
synonyms.txt file ? Idea is to have a dictionary in DB that can be enhanced
on the fly in the application. This can then be used at query time to check
for synonyms.

I know I am not putting thoughts to the performance implications of this
approach, but will love to hear about others thoughts.

~Ravi.



Re: Synonyms from Database

2010-01-11 Thread Ravi Gidwani
Thanks all for your replies.

I guess what I meant by Query time, and as I understand solr  (and I may be
wrong here) I can add synonyms.txt in the query analyzer as follows:

   <analyzer type="query">
     ...
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ... />
     ...
   </analyzer>

By this my understanding is that, even if the document (at index time) has a
word "mathematics" and my synonyms.txt file has:

mathematics=>math,maths,

a query for "math" will match "mathematics", since we have synonyms.txt
in the query analyzer. So I was curious about the database approach along
similar lines.

I get the point about performance, and I think that is a big NO NO for this
approach. But the idea was to allow changing the synonyms on the fly (more
like adaptive synonyms) and improve the hits.

I guess the only way is to rewrite the file (as Otis suggested) and reload the
configuration (as Peter suggested). This might be a performance hit (rewrite
the file and reload), but I guess it is still much better than reading from
the DB?

Thanks again for your comments.

~Ravi.


2010/1/10 Noble Paul നോബിള്‍ नोब्ळ् 

> On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic
>  wrote:
> > Ravi,
> >
> > I think if your synonyms were in a DB, it would be trivial to
> periodically dump them into a text file Solr expects.  You wouldn't want to
> hit the DB to look up synonyms at query time...
> Why query time. Can it not be done at startup time ?
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Ravi Gidwani 
> >> To: solr-user@lucene.apache.org
> >> Sent: Sat, January 9, 2010 10:20:18 PM
> >> Subject: Synonyms from Database
> >>
> >> Hi :
> >>  Is there any work done in providing synonyms from a database
> instead of
> >> synonyms.txt file ? Idea is to have a dictionary in DB that can be
> enhanced
> >> on the fly in the application. This can then be used at query time to
> check
> >> for synonyms.
> >>
> >> I know I am not putting thoughts to the performance implications of this
> >> approach, but will love to hear about others thoughts.
> >>
> >> ~Ravi.
> >
> >
>
>
>
> --
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
>


Re: Adaptive search?

2010-01-11 Thread Ravi Gidwani
Shalin:
   Can you point me to pages/resources that talk about this approach
in details ? OR can you provide more details on the schema and the
function(?) used for ranking the documents.

Thanks,
~Ravi.

On Mon, Jan 11, 2010 at 1:00 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Fri, Jan 8, 2010 at 3:41 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com
> > wrote:
>
> >
> > - Original Message 
> >
> > > From: Shalin Shekhar Mangar 
> > > To: solr-user@lucene.apache.org
> > > Sent: Wed, December 23, 2009 2:45:21 AM
> > > Subject: Re: Adaptive search?
> > >
> > > On Wed, Dec 23, 2009 at 4:09 AM, Lance Norskog wrote:
> > >
> > > > Nice!
> > > >
> > > > Siddhant: Another problem to watch out for is the feedback problem:
> > > > someone clicks on a link and it automatically becomes more
> > > > interesting, so someone else clicks, and it gets even more
> > > > interesting... So you need some kind of suppression. For example, as
> > > > individual clicks get older, you can push them down. Or you can put a
> > > > cap on the number of clicks used to rank the query.
> > > >
> > > >
> > > We use clicks/views instead of just clicks to avoid this problem.
> >
> > Doesn't a click imply a view?  You click to view.  I must be missing
> > something...
> >
> >
> I was talking about boosting documents using past popularity. So a user
> searches for X and gets 10 results. This view is recorded for each of the
> 10
> documents and added to the index later. If a user clicks on result #2, the
> click is recorded for doc #2 and added to index. We boost using
> clicks/view.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


RE: Synonyms from Database

2010-01-11 Thread Peter A. Kirk
Hi - I don't think you'll see a "performance hit" using a DB for your synonym 
configuration as opposed to a text file. 

The configuration is only done once (at startup) - or when you "reload". You 
won't be reloading every minute, will you? After reading the configuration, the 
synonyms are available to Solr via the SynonymFilter object (at least as I 
understand it from looking at the code).

The reload feature actually sounds quite neat - it will reload "in the 
background", and "switch in" the newly read configuration when it's ready - so 
hopefully no down-time waiting for configuration.

Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:p...@alpha-solutions.dk


-Original Message-
From: Ravi Gidwani [mailto:ravi.gidw...@gmail.com] 
Sent: 11. januar 2010 22:43
To: solr-user@lucene.apache.org; noble.p...@gmail.com
Subject: Re: Synonyms from Database

Thanks all for your replies.

I guess what I meant by Query time, and as I understand solr  (and I may be
wrong here) I can add synonyms.txt in the query analyzer as follows:

   <analyzer type="query">
     ...
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ... />
     ...
   </analyzer>

By this my understanding is that, even if the document (at index time) has a
word "mathematics" and my synonyms.txt file has:

mathematics=>math,maths,

a query for "math" will match "mathematics", since we have synonyms.txt
in the query analyzer. So I was curious about the database approach along
similar lines.

I get the point about performance, and I think that is a big NO NO for this
approach. But the idea was to allow changing the synonyms on the fly (more
like adaptive synonyms) and improve the hits.

I guess the only way is to rewrite the file (as Otis suggested) and reload the
configuration (as Peter suggested). This might be a performance hit (rewrite
the file and reload), but I guess it is still much better than reading from
the DB?

Thanks again for your comments.

~Ravi.


2010/1/10 Noble Paul നോബിള്‍ नोब्ळ् 

> On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic
>  wrote:
> > Ravi,
> >
> > I think if your synonyms were in a DB, it would be trivial to
> periodically dump them into a text file Solr expects.  You wouldn't want to
> hit the DB to look up synonyms at query time...
> Why query time. Can it not be done at startup time ?
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Ravi Gidwani 
> >> To: solr-user@lucene.apache.org
> >> Sent: Sat, January 9, 2010 10:20:18 PM
> >> Subject: Synonyms from Database
> >>
> >> Hi :
> >>  Is there any work done in providing synonyms from a database
> instead of
> >> synonyms.txt file ? Idea is to have a dictionary in DB that can be
> enhanced
> >> on the fly in the application. This can then be used at query time to
> check
> >> for synonyms.
> >>
> >> I know I am not putting thoughts to the performance implications of this
> >> approach, but will love to hear about others thoughts.
> >>
> >> ~Ravi.
> >
> >
>
>
>
> --
> -
> Noble Paul | Systems Architect| AOL | http://aol.com
>



Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:
The reload feature actually sounds quite neat - it will reload "in  
the background", and "switch in" the newly read configuration when  
it's ready - so hopefully no down-time waiting for configuration.


Correct me if I'm wrong, but I don't think that it's true about a  
reload working in the background.  While a core is reloading (and  
warming), it is unavailable for search.  right?  I think you have to  
create a new core, and then swap to keep things alive constantly.


Erik



Re: Synonyms from Database

2010-01-11 Thread Shalin Shekhar Mangar
On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher wrote:

>
> On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:
>
>> The reload feature actually sounds quite neat - it will reload "in the
>> background", and "switch in" the newly read configuration when it's ready -
>> so hopefully no down-time waiting for configuration.
>>
>
> Correct me if I'm wrong, but I don't think that it's true about a reload
> working in the background.  While a core is reloading (and warming), it is
> unavailable for search.  right?  I think you have to create a new core, and
> then swap to keep things alive constantly.
>
>
Core reload swaps the old core with a new core on the same configuration
files with no downtime. See CoreContainer#reload.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Understanding the query parser

2010-01-11 Thread Ahmet Arslan

> I am running in to the same issue. I have tried to replace
> my
> WhitespaceTokenizerFactory with a PatternTokenizerFactory
> with pattern
> (\s+|-) but I still seem to get a phrase query. Why is
> that?

It is in the source code of QueryParser's getFieldQuery(String field, String
queryText) method, line#660. If numTokens > 1 it returns a PhraseQuery.

Modifications in the analysis phase (CharFilterFactory, TokenizerFactory,
TokenFilterFactory) won't change this behavior. Something must be done before
the analysis phase.

But i think in your case you can obtain a match by modifying the parameters of
WordDelimiterFilterFactory, even with a PhraseQuery.
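
For instance, a sketch of such a field type (the type name and attribute values
are only an illustration): applied at both index and query time, a document
value like "foo-bar" is indexed as the adjacent tokens "foo" and "bar", so the
PhraseQuery "foo bar" produced by the query parser still matches.

   <fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1"
               catenateWords="0" catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>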


  


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread MitchK

Hello Hossman,

sorry for my late response.

For this specific case, you are right. It makes more sense to do such work
"on the fly".
However, at the moment I am only testing what one can do with Solr and what
not.

Is the UpdateProcessor something that comes from Lucene itself or from
Solr?

Thanks!


hossman wrote:
> 
> 
> : Is there a way to prepare a document the described way with Lucene/Solr,
> : before I analyze it?
> : My use case is to categorize several documents in an automatic way,
> which
> : includes that I have to "create" data from the given input doing some
> : information retrieval.
> 
> As Ryan mentioned earlier: this is what the UpdateRequestProcessor API 
> is for -- it allows you to modify Documents (regardless of how they were 
> added: csv, xml, dih) prior to Solr processing them...
> 
> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html
> 
> Personally, i think you may be looking at your problem from the wrong 
> dirrection...
> 
> : >> Imagine you would analyze, index and store them like you normally do
> and
> : >> afterwards you want to set, whether the document belongs to the
> expensive
> : >> item-group or not.
> : >> If the price for the item is higher than 500$, it belongs to the
> : >> expensive
> : >> ones, otherwise not.
> 
> ...for a situation like that, i wouldn't attempt to "classify" the docs as 
> "expensive" or "cheap" when adding them.  instead i would use numeric 
> ranges for faceting and filtering to show me how many docs where 
> "expensive" or "cheap" at query time -- that way when the ecomony tanks i 
> can redifine my definition of "expensive" on the fly w/o needing to 
> reindex a million documents.
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27109760.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multi language support

2010-01-11 Thread Daniel Persson
Hi Solr users.

I'm trying to set up a site with Solr search integrated. And I use the
SolJava API to feed the index with search documents. At the moment I
have only activated search on the English portion of the site. I'm
interested in using as many features of solr as possible. Synonyms,
Stopwords and stems all sounds quite interesting and useful but how do
I set up this in a good way for a multilingual site?

The site doesn't have a huge text mass, so performance issues don't
really bother me, but I'd still like to hear your suggestions before I
try to implement a solution.

Best regards

Daniel


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 7:33 AM, MitchK wrote:
Is the UpdateProcessor something that comes froms Lucene itself or  
from

Solr?


It's at the Solr level - 


Erik



Re: Synonyms from Database

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 5:50 AM, Shalin Shekhar Mangar wrote:

On Mon, Jan 11, 2010 at 4:15 PM, Erik Hatcher  
wrote:




On Jan 11, 2010, at 4:51 AM, Peter A. Kirk wrote:

The reload feature actually sounds quite neat - it will reload "in  
the
background", and "switch in" the newly read configuration when  
it's ready -

so hopefully no down-time waiting for configuration.



Correct me if I'm wrong, [me saying something wrong]


Core reload swaps the old core with a new core on the same  
configuration

files with no downtime. See CoreContainer#reload.


Sweet!  Thanks for the correction.

Erik



Re: Could not start SOLR issue

2010-01-11 Thread Grant Ingersoll

On Jan 11, 2010, at 1:38 AM, dipti khullar wrote:

> Hi
> 
> We are running master/slave Solr 1.3 version on production since about 5
> months.
> 
> Yesterday, we faced following issue on one of the slaves for the first time
> because of which we had to restart the slave.
> 
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file
> found in 
> org.apache.lucene.store.FSDirectory@/opt/solr/solr_slave/solr/data/index:
> files: null

It looks like your index was removed out from under you.  Perhaps this is due 
to the failed snapshot install?

Can you replicate the problem?  Stopping the slave and deleting the index 
directory and then restarting it should resolve it for now.

> 
> I searched on forums but couldn't find any relevant info which could have
> possibly caused the issue.
> 
> In snapinstaller logs, following failed logs were observed:
> 
> 2010/01/11 04:20:06 started by solr
> 2010/01/11 04:20:06 command:
> /opt/solr/solr_slave/solr/solr/bin/snapinstaller
> 2010/01/11 04:20:07 installing snapshot
> /opt/solr/solr_slave/solr/data/snapshot.20100111041402
> 2010/01/11 04:20:07 notifing Solr to open a new Searcher
> 2010/01/11 04:20:07 failed to connect to Solr server
> 2010/01/11 04:20:07 snapshot installed but Solr server has not open a new
> Searcher
> 2010/01/11 04:20:08 failed (elapsed time: 1 sec)
> 
> 
> Configurations:
> There are 2 search servers in a virtualized VMware environment. Each has  2
> instances of Solr running on separates ports in tomcat.
> Server 1: hosts 1 master(application 1), 1 slave (application 1)
> Server 2: hosta 1 master (application 2), 1 slave (application 1)
> 
> Both servers have 4 CPUs and 4 GB RAM.
> Master
> - 4GB RAM
> - 1GB JVM Heap memory is allocated to Solr
> Slave1/Slave2:
> - 4GB RAM
> - 2GB JVM Heap memory is allocated to Solr
> 
> Can there be any possible reasons that solr/home property couldn't be found?
> 
> Thanks
> Dipti



Re: Could not start SOLR issue

2010-01-11 Thread dipti khullar
We were able to resolve the problem by restarting the slave. Also, these
failed snapshot-install incidents occurred after the exception was observed,
which seems logically consistent:
"Could not start SOLR. Check solr/home property"

We just want to avoid such incidents in the future. Is it possible that at any
point in time the solr/home property can get corrupted?

One more thing we observed was that tomcat-users.xml was overwritten. Should
we investigate that as well?

Thanks
Dipti

On Mon, Jan 11, 2010 at 6:55 PM, Grant Ingersoll wrote:

>
> On Jan 11, 2010, at 1:38 AM, dipti khullar wrote:
>
> > Hi
> >
> > We are running master/slave Solr 1.3 version on production since about 5
> > months.
> >
> > Yesterday, we faced following issue on one of the slaves for the first
> time
> > because of which we had to restart the slave.
> >
> > SEVERE: Could not start SOLR. Check solr/home property
> > java.lang.RuntimeException: java.io.FileNotFoundException: no segments*
> file
> > found in org.apache.lucene.store.FSDirectory@
> /opt/solr/solr_slave/solr/data/index:
> > files: null
>
> It looks like your index was removed out from under you.  Perhaps this is
> due to the failed snapshot install?
>
> Can you replicate the problem?  Stopping the slave and deleting the index
> directory and then restarting it should resolve it for now.
>
> >
> > I searched on forums but couldn't find any relevant info which could have
> > possibly caused the issue.
> >
> > In snapinstaller logs, following failed logs were observed:
> >
> > 2010/01/11 04:20:06 started by solr
> > 2010/01/11 04:20:06 command:
> > /opt/solr/solr_slave/solr/solr/bin/snapinstaller
> > 2010/01/11 04:20:07 installing snapshot
> > /opt/solr/solr_slave/solr/data/snapshot.20100111041402
> > 2010/01/11 04:20:07 notifing Solr to open a new Searcher
> > 2010/01/11 04:20:07 failed to connect to Solr server
> > 2010/01/11 04:20:07 snapshot installed but Solr server has not open a new
> > Searcher
> > 2010/01/11 04:20:08 failed (elapsed time: 1 sec)
> >
> >
> > Configurations:
> > There are 2 search servers in a virtualized VMware environment. Each has
>  2
> > instances of Solr running on separates ports in tomcat.
> > Server 1: hosts 1 master(application 1), 1 slave (application 1)
> > Server 2: hosta 1 master (application 2), 1 slave (application 1)
> >
> > Both servers have 4 CPUs and 4 GB RAM.
> > Master
> > - 4GB RAM
> > - 1GB JVM Heap memory is allocated to Solr
> > Slave1/Slave2:
> > - 4GB RAM
> > - 2GB JVM Heap memory is allocated to Solr
> >
> > Can there be any possible reasons that solr/home property couldn't be
> found?
> >
> > Thanks
> > Dipti
>
>


update solr index

2010-01-11 Thread Marc Des Garets
Hi,

I am running solr in tomcat and I have about 35 indexes (between 2 and
80 million documents each). Currently, if I try to update a few documents
in an index (let's say the one which contains 80 million documents)
while tomcat is running and therefore receiving requests, I get a few
very long garbage collections (about 60 sec each). I am running tomcat with
-Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using
ConcMarkSweepGC.

I have 2 questions:
1. Is solr doing something specific while an index is being updated, like
updating something in memory, which would cause the garbage collection?

2. Any idea how I could solve this problem? Currently I stop tomcat,
update the index, and start tomcat again. I would like to be able to update
my index while tomcat is running. I was thinking about running more tomcat
instances with less memory for each, and each running a few of my indexes.
Do you think that would be the best way to go?


Thanks,
Marc

Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-11 Thread MitchK

Is there any schema that explains which class is responsible for which
stage of processing my data into the index?

My example was: I have categorized whether something is cheap or expensive.
Let's say I didn't do that on the fly, but with the help of the
UpdateRequestProcessor.
Imagine there is a query like "harry potter dvd-collection cheap" or "cheap
Harry Potter dvd-collection".
How can I customize Solr so that, if something is said about the category
"cheap", it uses a faceting query on "cat:cheap"? To do so, I have to
alter the original query - how can I do that?
 

Erik Hatcher-4 wrote:
> 
> 
> On Jan 11, 2010, at 7:33 AM, MitchK wrote:
>> Is the UpdateProcessor something that comes froms Lucene itself or  
>> from
>> Solr?
> 
> It's at the Solr level -
>   
>  >
> 
>   Erik
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27111504.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to display Highlight with VelocityResponseWriter?

2010-01-11 Thread qiuyan . xu

Hi,

we need a web gui for solr and we've noticed that
VelocityResponseWriter is integrated into the solr project for that purpose.
But i have no idea how i can configure solrconfig.xml so that snippets
with highlighting can also be displayed in the web gui. I've added
<str name="hl">true</str> into the standard responseHandler and it already
works, i.e. without velocity. But the same line doesn't take effect in
/itas. Should i configure anything else? Thanks in advance.


with best regards,
Qiuyan




  
[The attached solrconfig.xml lost its XML markup in the list archive, so it is
omitted here. Its /itas (Solritas) request handler used the "browse" velocity
template (wt=velocity, velocity.properties), defType=dismax, a qf of
content^0.5 url^0.5 cluster^0.5, hl.fragmenter=regex, and listed "highlight"
among the handler's components.]



Re: Getting solr response data in a JS query

2010-01-11 Thread Gregg Hoshovsky
You might be running into  an Ajax restriction.

See if an article like this helps.


http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/


On 1/9/10 11:37 PM, "Otis Gospodnetic"  wrote:

Dan,

You didn't mention whether you tried &wt=json .  Does it work if you use that 
to tell Solr to return its response in JSON format?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message 
> From: Dan Yamins 
> To: solr-user@lucene.apache.org
> Sent: Sat, January 9, 2010 10:05:54 PM
> Subject: Getting solr response data in a JS query
>
> Hi:
>
> I'm trying to use figure out how to get solr responses and use them in my
> website.I'm having some problems figure out how to
>
> 1) My initial thought is is to use ajax, and insert a line like this in my
> script:
>
>  data = eval($.get("http://localhost:8983/solr/select/?q=*:*
> ").responseText)
>
> ... and then do what I want with the data, with logic being done in
> Javascript on the front page.
>
> However, this is just not working technically:  no matter what alternative I
> use, I always seem to get no response to this query.  I think I'm having
> exactly the same problem as described here:
>
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html<%20http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html>
>
> and here:
>
> http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code
>
> Just like those two OPs, I can definitely access my solr responese through a
> web browser, but my jquery is getting nothing.Unfortunately, in neither
> thread did the answer seem to have been figured out satisfactorily.   Does
> anybody know what the problem is?
>
>
> 2)  As an alternative, I _can_ use  the ajax-solr library.   Code like this:
>
> var Manager;
> (function ($) {
>   $(function () {
> Manager = new AjaxSolr.Manager({
>   solrUrl: 'http://localhost:8983/solr/'
>});
>
>   Manager.init();
>   Manager.store.addByValue('q', '*:*');
>   Manager.store.addByValue('rows', '1000');
>   Manager.doRequest();
>   });
> })(jQuery);
>
> does indeed load solr data into my DOM.Somehow, ajax-solr's doRequest
> method is doing something that makes it possible to receive the proper
> response from the solr servlet, but I don't know what it is so I can't
> replicate it with my own ajax.   Does anyone know what is happening?
>
> (Of course, I _could_ just use ajax-solr, but doing so would mean figuring
> out how to re-write my existing application for how to display search
> results in a form that works with the ajax-solr api, and I' d rather avoid
> this if possible since it looks somewhat nontrivial.)
>
>
> Thanks!
> Dan




Re: Getting solr response data in a JS query

2010-01-11 Thread Matt Mitchell
I remember having a difficult time getting jquery to work as I thought it
would. Something to do with the wt. I ended up creating a little client lib.
Maybe this will be useful in finding your problem?

example:
  http://github.com/mwmitchell/get_rest/blob/master/solr_example.html
lib:
  http://github.com/mwmitchell/get_rest/blob/master/solr_client.jquery.js

Matt

On Mon, Jan 11, 2010 at 11:22 AM, Gregg Hoshovsky  wrote:

> You might be running into  an Ajax restriction.
>
> See if an article like this helps.
>
>
>
> http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/
>
>
> On 1/9/10 11:37 PM, "Otis Gospodnetic"  wrote:
>
> Dan,
>
> You didn't mention whether you tried &wt=json .  Does it work if you use
> that to tell Solr to return its response in JSON format?
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> - Original Message 
> > From: Dan Yamins 
> > To: solr-user@lucene.apache.org
> > Sent: Sat, January 9, 2010 10:05:54 PM
> > Subject: Getting solr response data in a JS query
> >
> > Hi:
> >
> > I'm trying to use figure out how to get solr responses and use them in my
> > website.I'm having some problems figure out how to
> >
> > 1) My initial thought is is to use ajax, and insert a line like this in
> my
> > script:
> >
> >  data = eval($.get("http://localhost:8983/solr/select/?q=*:*
> > ").responseText)
> >
> > ... and then do what I want with the data, with logic being done in
> > Javascript on the front page.
> >
> > However, this is just not working technically:  no matter what
> alternative I
> > use, I always seem to get no response to this query.  I think I'm having
> > exactly the same problem as described here:
> >
> > http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html
> <%20http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html>
> >
> > and here:
> >
> >
> http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code
> >
> > Just like those two OPs, I can definitely access my solr responese
> through a
> > web browser, but my jquery is getting nothing.Unfortunately, in
> neither
> > thread did the answer seem to have been figured out satisfactorily.
> Does
> > anybody know what the problem is?
> >
> >
> > 2)  As an alternative, I _can_ use  the ajax-solr library.   Code like
> this:
> >
> > var Manager;
> > (function ($) {
> >   $(function () {
> > Manager = new AjaxSolr.Manager({
> >   solrUrl: 'http://localhost:8983/solr/'
> >});
> >
> >   Manager.init();
> >   Manager.store.addByValue('q', '*:*');
> >   Manager.store.addByValue('rows', '1000');
> >   Manager.doRequest();
> >   });
> > })(jQuery);
> >
> > does indeed load solr data into my DOM.Somehow, ajax-solr's doRequest
> > method is doing something that makes it possible to receive the proper
> > response from the solr servlet, but I don't know what it is so I can't
> > replicate it with my own ajax.   Does anyone know what is happening?
> >
> > (Of course, I _could_ just use ajax-solr, but doing so would mean
> figuring
> > out how to re-write my existing application for how to display search
> > results in a form that works with the ajax-solr api, and I' d rather
> avoid
> > this if possible since it looks somewhat nontrivial.)
> >
> >
> > Thanks!
> > Dan
>
>
>


Re: How to display Highlight with VelocityResponseWriter?

2010-01-11 Thread Sascha Szott

Qiuyan,


> ... so that snippets with highlighting can also be displayed in the web gui.
> I've added <str name="hl">true</str> into the standard responseHandler and it
> already works, i.e. without velocity. But the same line doesn't take effect
> in /itas. Should i configure anything else? Thanks in advance.
First of all, just a few notes on the /itas request handler in your 
solrconfig.xml:


1. The "highlight" entry in the handler's components list is obsolete, since
the highlighting component is a default search component [1].


2. Note that since you didn't specify a value for hl.fl highlighting 
will only affect the fields listed inside of qf.


3. Why did you override the default value of hl.fragmenter? In most 
cases the default fragmenting algorithm (gap) works fine - and maybe in 
yours as well?



To make sure all your hl related settings are correct, can you post an 
xml output (change the wt parameter to xml) for a search with 
highlighted results.
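
For example, assuming the handler is registered as /itas (host, port and query
are placeholders), something like this returns the raw response including the
highlighting section:

   http://localhost:8983/solr/itas?q=test&hl=true&wt=xml&indent=on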


And finally, can you post the vtl code snippet that should produce the 
highlighted output.


-Sascha

[1] http://wiki.apache.org/solr/SearchComponent








Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello,


We have implemented language-specific search in Solr using language-specific
fields and field types. For instance, an en_text field type can
use an English stemmer and lists of stopwords and synonyms. We, however,
did not use language-specific stopwords; instead we used one list shared by
both languages.

So you would have a field type like:

  
  

etc etc.
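
A minimal sketch of such a type, assuming the stock analysis factories (the
type name and the stopword/synonym file names are illustrative):

   <fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     </analyzer>
   </fieldType>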



Cheers,

-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17


On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:

> Hi Solr users.
> 
> I'm trying to set up a site with Solr search integrated. And I use the
> SolJava API to feed the index with search documents. At the moment I
> have only activated search on the English portion of the site. I'm
> interested in using as many features of solr as possible. Synonyms,
> Stopwords and stems all sounds quite interesting and useful but how do
> I set up this in a good way for a multilingual site?
> 
> The site don't have a huge text mass so performance issues don't
> really bother me but still I'd like to hear your suggestions before I
> try to implement an solution.
> 
> Best regards
> 
> Daniel


Replication problem

2010-01-11 Thread Jason Rutherglen
Hi, sorry for the somewhat inane question:

I setup replication request handler on the master however I'm not
seeing any replicatable indexes via
http://localhost:8080/solr/main/replication?command=indexversion
Queries such as *:* yield results on the master (so I assume the
commit worked).  The replication console shows an index, so not sure
what's going on.  Here's the request handler XML on the master:



   true
   

   
   schema.xml,synonyms.txt,stopwords.txt,elevate.xml
   

Re: Replication problem

2010-01-11 Thread Yonik Seeley
Did you try adding "startup" to the list of events to replicate after?

-Yonik
http://www.lucidimagination.com

On Mon, Jan 11, 2010 at 12:25 PM, Jason Rutherglen
 wrote:
> Hi, sorry for the somewhat inane question:
>
> I setup replication request handler on the master however I'm not
> seeing any replicatable indexes via
> http://localhost:8080/solr/main/replication?command=indexversion
> Queries such as *:* yield results on the master (so I assume the
> commit worked).  The replication console shows an index, so not sure
> what's going on.  Here's the request handler XML on the master:
>
> 
>    
>       true
>       
>
>       
>        name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml
>       
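
For reference, a master-side configuration that replicates after both startup
and commit and ships the config files would look roughly like this (a sketch
using the standard Solr 1.4 ReplicationHandler parameters):

   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="master">
       <str name="replicateAfter">startup</str>
       <str name="replicateAfter">commit</str>
       <str name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
     </lst>
   </requestHandler>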

Re: Replication problem

2010-01-11 Thread Jason Rutherglen
Yonik,

I added startup to replicateAfter, however no dice... There are no
errors in the Tomcat log.

The output of:
http://localhost-master:8080/solr/main/replication?command=indexversion


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>

The master replication UI:
Local Index  Index Version: 1263182366335, Generation: 3
Location: /mnt/solr/main/data/index
Size: 1.08 KB

Master solrconfig.xml, and tomcat was restarted:



   true
   

   
   schema.xml,synonyms.txt,stopwords.txt,elevate.xml
   
>>
>>       
>>       > name="confFiles">schema.xml,synonyms.txt,stopwords.txt,elevate.xml
>>       

help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
hello *, I'm looking for help on writing queries to implement a few
business rules.


1. Given a set of fields, how do I return matches that match across them
but not just one specific one? E.g. I'm using a dismax parser currently,
but I want to exclude any results that only match against a field
called 'description2'.


2. Given a set of fields, how do I return matches that match across them
but, for one specific field, match as a phrase only? E.g. I'm using a dismax
parser currently, but I want matches against the field called 'people' to
only match as a phrase.


thx much,

--joe


Re: help implementing a couple of business rules

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:

1. given a set of fields how to return matches that match across them
but not just one specific one, ex im using a dismax parser currently
but i want to exclude any results that only match against a field
called 'description2'


One way could be to add an fq parameter to the request:

   &fq=-description2:()


2. given a set of fields how to return matches that match across them
but on one specific field match as a phrase only, ex im using a dismax
parser currently but i want matches against a field called 'people' to
only match as a phrase


Doesn't setting pf=people accomplish this?

Erik



Re: help implementing a couple of business rules

2010-01-11 Thread Joe Calderon
thx, but I'm not sure that covers all edge cases. To clarify:
1. matching description2 is okay if other fields are matched too, but
results matching only description2 should be omitted

2. it's okay to not match against the people field, but matches against
the people field should only be phrase matches

sorry if i was unclear

--joe
On Mon, Jan 11, 2010 at 10:13 AM, Erik Hatcher  wrote:
>
> On Jan 11, 2010, at 12:56 PM, Joe Calderon wrote:
>>
>> 1. given a set of fields how to return matches that match across them
>> but not just one specific one, ex im using a dismax parser currently
>> but i want to exclude any results that only match against a field
>> called 'description2'
>
> One way could be to add an fq parameter to the request:
>
>   &fq=-description2:()
>
>> 2. given a set of fields how to return matches that match across them
>> but on one specific field match as a phrase only, ex im using a dismax
>> parser currently but i want matches against a field called 'people' to
>> only match as a phrase
>
> Doesn't setting pf=people accomplish this?
>
>        Erik
>
>
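
One way to express rule 1 while keeping Erik's pf suggestion for rule 2 (the
field names and the example query are hypothetical): leave description2 in the
dismax qf so it can still contribute, list people only in pf so individual
terms never match it on their own, and add a filter query that requires a
match outside description2, e.g. via a nested dismax query:

   q=harry potter
   defType=dismax
   qf=title description description2
   pf=people
   fq={!dismax qf='title description'}harry potter

The fq removes any document whose only match was in description2, while the
main query still scores across all the fields.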


Re: Understanding the query parser

2010-01-11 Thread Avlesh Singh
>
> It is in the source code of QueryParser's getFieldQuery(String field,
> String queryText)  method line#660. If numTokens > 1 it returns Phrase
> Query.
>
That's exactly the question. It would be nice to hear from someone as to why
it is that way.

Cheers
Avlesh

On Mon, Jan 11, 2010 at 5:10 PM, Ahmet Arslan  wrote:

>
> > I am running in to the same issue. I have tried to replace
> > my
> > WhitespaceTokenizerFactory with a PatternTokenizerFactory
> > with pattern
> > (\s+|-) but I still seem to get a phrase query. Why is
> > that?
>
> It is in the source code of QueryParser's getFieldQuery(String field,
> String queryText)  method line#660. If numTokens > 1 it returns Phrase
> Query.
>
> Modifications in analysis phase (CharFilterFactory, TokenizerFactory,
> TokenFilterFactory) won't change this behavior. Something must be done
> before analysis phase.
>
> But i think in your case, you can obtain match with modifying parameters of
> WordDelimeterFilterFactory even with PhraseQuery.
>
>
>
>


Cores + Replication Config

2010-01-11 Thread Giovanni Fernandez-Kincade
If you want to share one config among master & slaves, using Solr 1.4
replication, is there a way to specify whether a core is Master or Slave when
using the CREATE core command?

Thanks,
Gio.


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

Thanks, we were having the same issue.
We are trying to store article content, and we are storing a field like
"This article is for blah" wrapped in html tags.
When i see the analysis.jsp page it does strip out the tags and the text is
indexed, but when we fetch the document it returns the field with the
tags.
From solr's point of view that's correct, but our issue is that these html
tags are screwing up the display of our page. Is there an easy way to ensure
the html tags are stripped out, or do we have to take care of that manually?

Thanks
Rashid


aseem cheema wrote:
> 
> Alright. It turns out that escapedTags is not for what I thought it is
> for.
> The problem that I am having with HTMLStripCharFilterFactory is that
> it strips the html while indexing the field, but not while storing the
> field. That is why what is see in analysis.jsp, which is index
> analysis, does not match what gets stored... because.. well HTML is
> stripped only for indexing. Makes so much sense.
> 
> Thanks to Ryan McKinley for clarifying this.
> Aseem
> 
> On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema 
> wrote:
>> I am trying to post a document with the following content using SolrJ:
>> content
>> I need the xml/html tags to be ignored. Even though this works fine in
>> analysis.jsp, this does not work with SolrJ, as the client escapes the
>> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
>> strip those escaped tags. How can I achieve this? Any ideas will be
>> highly appreciated.
>>
>> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
>> there a way to get that to work?
>> Thanks
>> --
>> Aseem
>>
> 
> 
> 
> -- 
> Aseem
> 
> 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
What do your FieldTypes look like for the fields in question?

On Jan 10, 2010, at 10:05 AM, rswart wrote:

> 
> Hi,
> 
> This is probably an easy question. 
> 
> I am doing a simple query on postcode and house number. If the housenumber
> contains a minus sign like:
> 
> q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
> 
> the resulting parsed query contains a phrase query:
> 
> +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
> 
> This never matches.
> 
> What I want solr to do is generate the following parsed query (essentially
> an OR for both house numbers):
> 
> +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
> 
> Solr generates this based on the following query (so a space instead of a
> minus sign):
> 
> q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
> 
> 
> I tried two things to have Solr generate the desired parsed query:
> 
> 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in
> a phrase query
> 2. PatternTokenizerFactory that splits on (\s+|-).
> 
> But both options don't work. 
> 
> Any suggestions on how to get rid of the phrase query?
> 
> Thanks,
> 
> Richard
> -- 
> View this message in context: 
> http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Tokenizer question

2010-01-11 Thread Grant Ingersoll
And also, what query parser are you using? 
On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:

> What do your FieldTypes look like for the fields in question?
> 
> On Jan 10, 2010, at 10:05 AM, rswart wrote:
> 
>> 
>> Hi,
>> 
>> This is probably an easy question. 
>> 
>> I am doing a simple query on postcode and house number. If the housenumber
>> contains a minus sign like:
>> 
>> q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
>> 
>> the resulting parsed query contains a phrase query:
>> 
>> +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
>> 
>> This never matches.
>> 
>> What I want solr to do is generate the following parsed query (essentially
>> an OR for both house numbers):
>> 
>> +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
>> 
>> Solr generates this based on the following query (so a space instead of a
>> minus sign):
>> 
>> q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
>> 
>> 
>> I tried two things to have Solr generate the desired parsed query:
>> 
>> 1. WordDelimiterFilterFactory with generateNumberParts=1 but this results in
>> a phrase query
>> 2. PatternTokenizerFactory that splits on (\s+|-).
>> 
>> But both options don't work. 
>> 
>> Any suggestions on how to get rid of the phrase query?
>> 
>> Thanks,
>> 
>> Richard
>> -- 
>> View this message in context: 
>> http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Erick Erickson
This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
shows you many
of the SOLR analyzers and filters. Would one of
the various *HTMLStrip* stuff work?

HTH
ERick

On Mon, Jan 11, 2010 at 2:44 PM, darniz  wrote:

>
> Thanks we were having the saem issue.
> We are trying to store article content and we are strong a field like
> This article is for blah .
> Wheni see the analysis.jsp page it does strip out the  tags and is
> indexed. but when we fetch the document it returns the field with the 
> tags.
> From solr point of view, its correct but our issue is that this kind of
> html
> tags is screwing up our display of our page. Is there an easy way to esure
> how to strip out hte html tags, or do we have to take care of manually.
>
> Thanks
> Rashid
>
>
> aseem cheema wrote:
> >
> > Alright. It turns out that escapedTags is not for what I thought it is
> > for.
> > The problem that I am having with HTMLStripCharFilterFactory is that
> > it strips the html while indexing the field, but not while storing the
> > field. That is why what is see in analysis.jsp, which is index
> > analysis, does not match what gets stored... because.. well HTML is
> > stripped only for indexing. Makes so much sense.
> >
> > Thanks to Ryan McKinley for clarifying this.
> > Aseem
> >
> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema 
> > wrote:
> >> I am trying to post a document with the following content using SolrJ:
> >> content
> >> I need the xml/html tags to be ignored. Even though this works fine in
> >> analysis.jsp, this does not work with SolrJ, as the client escapes the
> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
> >> strip those escaped tags. How can I achieve this? Any ideas will be
> >> highly appreciated.
> >>
> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
> >> there a way to get that to work?
> >> Thanks
> >> --
> >> Aseem
> >>
> >
> >
> >
> > --
> > Aseem
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

Well, that's the whole discussion we are having.
I had the impression that the html tags are filtered and then the field is
stored without tags. But it looks like the html tags are removed and the terms are
indexed purely for indexing, and the actual text is stored in raw format.

Let's say for example i enter a field value like
"honda car road review" with html tags around it.
When i do analysis on the body field the html filter removes the tags and
indexes the words honda, car, road, review. But when i fetch the body field to
display in my document it returns the value with the tags still in it.

I hope i make sense.
thanks
darniz



Erick Erickson wrote:
> 
> This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> shows you
> many
> of the SOLR analyzers and filters. Would one of
> the various *HTMLStrip* stuff work?
> 
> HTH
> ERick
> 
> On Mon, Jan 11, 2010 at 2:44 PM, darniz  wrote:
> 
>>
>> Thanks we were having the saem issue.
>> We are trying to store article content and we are strong a field like
>> This article is for blah .
>> Wheni see the analysis.jsp page it does strip out the  tags and is
>> indexed. but when we fetch the document it returns the field with the 
>> tags.
>> From solr point of view, its correct but our issue is that this kind of
>> html
>> tags is screwing up our display of our page. Is there an easy way to
>> esure
>> how to strip out hte html tags, or do we have to take care of manually.
>>
>> Thanks
>> Rashid
>>
>>
>> aseem cheema wrote:
>> >
>> > Alright. It turns out that escapedTags is not for what I thought it is
>> > for.
>> > The problem that I am having with HTMLStripCharFilterFactory is that
>> > it strips the html while indexing the field, but not while storing the
>> > field. That is why what is see in analysis.jsp, which is index
>> > analysis, does not match what gets stored... because.. well HTML is
>> > stripped only for indexing. Makes so much sense.
>> >
>> > Thanks to Ryan McKinley for clarifying this.
>> > Aseem
>> >
>> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema 
>> > wrote:
>> >> I am trying to post a document with the following content using SolrJ:
>> >> content
>> >> I need the xml/html tags to be ignored. Even though this works fine in
>> >> analysis.jsp, this does not work with SolrJ, as the client escapes the
>> >> < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
>> >> strip those escaped tags. How can I achieve this? Any ideas will be
>> >> highly appreciated.
>> >>
>> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
>> >> there a way to get that to work?
>> >> Thanks
>> >> --
>> >> Aseem
>> >>
>> >
>> >
>> >
>> > --
>> > Aseem
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adaptive search?

2010-01-11 Thread Chris Hostetter

: I was talking about boosting documents using past popularity. So a user
: searches for X and gets 10 results. This view is recorded for each of the 10
: documents and added to the index later. If a user clicks on result #2, the
: click is recorded for doc #2 and added to index. We boost using clicks/view.

FWIW: I've observed three problems with this type of metric...

1) "render" vs "view" ... what you are calling a "view" is really a 
"rendering" -- you are sending the data back to include the item in the 
list of 10 items on the page, and the brwoser is rendering it, but that 
doesn't mean the users is actaully "viewing" it -- particularly in a 
webpage type situation where only the first 3-5 results might actually 
appear "above the fold" and the user has to scroll to see the rest.  Even 
in a smaller UI element (like a left or right nav info box, there's no 
garuntee that the user acctually "views" any of the items, which can bias 
things.

2) It doesn't take into account people who click on a result, decide it's 
terrible, hit the back arrow and click on a differnet result -- both of 
those wind up scoring "equally".  Some really complex session+click 
analysis can overcome this, but not a lot of people have the resources to 
do that all the time.

3) ignoring #1 and #2 above (because i havne't found many better options) 
you face the popularity problem -- or what my coworkers and i use to call 
the "TRL Problem" back in the 90s:  MTV's Total Request Live was a Top X 
countdown show of videos, featuring hte most popular videos of the week 
based on requests -- but it was also the number one show on the network, 
occupying something like 4/24 broadcast hours of every day, when there was 
only a total of 6/24 hours that actaully showed music videoes.  So for 
them ost part the only videos peopel ever saw were on TRL, so those were 
the only videos that ever got requested.

In a nutshell: once something becomes "popular" and is what everybody 
sees, it stays popular, because it's what everybody sees and they don't 
know that there is better stuff out there.

Even if everyone looks at the full list of results and actaully reads all 
of the first 10 summaries, in the absense of ay other bias their 
inclination is going to be to assume #1 is the best.  So they might click 
on that even if another result on the list appears better bassed on their 
opinion.

A variation that i did some experiments with, but never really refined 
because i didn't have the time/energy to really go to town on it, is to 
weight the "clicks" based on position:  a click on item #1 whould't be 
worth anything -- it's hte number one result, the expectation is that it 
better get clicked or something is wrong.  A click on #2 is worth 
soemthing to that item, and a click on #3 is worth more to that item, and 
so on ... so that if the #9 item gets a click, that's huge.  To do it 
right, I think what you really want to do is penalize items that get views 
but no clicks -- because if someone loads up resuolts 1-10, and doesn't 
click on any of them, that should be a vote in favor of moving all of them 
"down" and moving item #11 up (even though it got no views or clicks)

But like i said: i never experimented with this idea enough to come up 
with a good formula, or verify that the idea was sound.

-Hoss
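
Purely as an illustration of that last idea (not a validated formula), the
per-event adjustment could be something like:

   // Illustrative only: reward clicks more the further down the page they
   // happen, and apply a small penalty when a result is rendered but never
   // clicked.
   final class ClickWeight {
     static double adjustment(int position, boolean clicked, int rows) {
       if (clicked) {
         // a click on #1 earns ~0, a click on the last rendered row earns ~1
         return (position - 1) / (double) Math.max(rows - 1, 1);
       }
       return -0.1;  // rendered but unclicked: nudge the item down a little
     }
   }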



Re: Getting solr response data in a JS query

2010-01-11 Thread James McKinney

AJAX Solr does more or less the following:

jQuery.getJSON('http://localhost:8983/solr/select/?q=*:*&wt=json&json.wrf=?',
{}, function (data) {
// do something with data, which is the eval'd JSON response
});
-- 
View this message in context: 
http://old.nabble.com/Getting-solr-response-data-in-a-JS-query-tp27095224p27116970.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Chris Hostetter

: stored without tags. But looks like the html tags are removed and terms are
: indexed purely for indexing, and the actual text is stored in raw format.

Correct. Analysis is all about "indexing"; it has nothing to do with 
"stored" content.

You can write UpdateProcessors that modify the content before it is either 
indexed or stored, but there aren't a lot of Processors provided out of 
the box at the moment.

-Hoss
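
For anyone who wants to try the UpdateProcessor route, a rough sketch follows
(the "body" field name and the regex are only placeholders, a real
implementation would want a proper HTML parser, and the factory still has to
be registered in an updateRequestProcessorChain in solrconfig.xml):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class StripHtmlProcessorFactory extends UpdateRequestProcessorFactory {
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object body = doc.getFieldValue("body");          // example field name
        if (body instanceof String) {
          // crude tag removal so the *stored* value is tag-free as well
          doc.setField("body", ((String) body).replaceAll("<[^>]*>", ""));
        }
        super.processAdd(cmd);
      }
    };
  }
}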



Re: Tokenizer question

2010-01-11 Thread rswart

We are using the standard query parser (so no dismax).

Fieldtype is solr.TextField with the following query analyzer:


 
  











Grant Ingersoll-6 wrote:
> 
> And also, what query parser are you using? 
> On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:
> 
>> What do your FieldTypes look like for the fields in question?
>> 
>> On Jan 10, 2010, at 10:05 AM, rswart wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> This is probably an easy question. 
>>> 
>>> I am doing a simple query on postcode and house number. If the
>>> housenumber
>>> contains a minus sign like:
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
>>> 
>>> the resulting parsed query contains a phrase query:
>>> 
>>> +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
>>> 
>>> This never matches.
>>> 
>>> What I want solr to do is generate the following parsed query
>>> (essentially
>>> an OR for both house numbers):
>>> 
>>> +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
>>> 
>>> Solr generates this based on the following query (so a space instead of
>>> a
>>> minus sign):
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
>>> 
>>> 
>>> I tried two things to have Solr generate the desired parsed query:
>>> 
>>> 1. WordDelimiterFilterFactory with generateNumberParts=1 but this
>>> results in
>>> a phrase query
>>> 2. PatternTokenizerFactory that splits on (\s+|-).
>>> 
>>> But both options don't work. 
>>> 
>>> Any suggestions on how to get rid of the phrase query?
>>> 
>>> Thanks,
>>> 
>>> Richard
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27117036.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread Erick Erickson
Ah, I read your post too fast and ignored the title. Sorry 'bout that.

Erick

On Mon, Jan 11, 2010 at 2:55 PM, darniz  wrote:

>
> Well thats the whole discussion we are talking about.
> I had the impression that the html tags are filtered and then the field is
> stored without tags. But looks like the html tags are removed and terms are
> indexed purely for indexing, and the actual text is stored in raw format.
>
> Lets say for example if i enter a field like
> honda car road review
> When i do analysis on the body field the html filter removes the  tag
> and
> indexed works honda, car, road, review. But when i fetch body field to
> display in my document it returns honda car road review
>
> I hope i make sense.
> thanks
> darniz
>
>
>
> Erick Erickson wrote:
> >
> > This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> > shows you
> > many
> > of the SOLR analyzers and filters. Would one of
> > the various *HTMLStrip* stuff work?
> >
> > HTH
> > ERick
> >
> > On Mon, Jan 11, 2010 at 2:44 PM, darniz  wrote:
> >
> >>
> >> Thanks we were having the saem issue.
> >> We are trying to store article content and we are strong a field like
> >> This article is for blah .
> >> Wheni see the analysis.jsp page it does strip out the  tags and is
> >> indexed. but when we fetch the document it returns the field with the
> 
> >> tags.
> >> From solr point of view, its correct but our issue is that this kind of
> >> html
> >> tags is screwing up our display of our page. Is there an easy way to
> >> esure
> >> how to strip out hte html tags, or do we have to take care of manually.
> >>
> >> Thanks
> >> Rashid
> >>
> >>
> >> aseem cheema wrote:
> >> >
> >> > Alright. It turns out that escapedTags is not for what I thought it is
> >> > for.
> >> > The problem that I am having with HTMLStripCharFilterFactory is that
> >> > it strips the html while indexing the field, but not while storing the
> >> > field. That is why what is see in analysis.jsp, which is index
> >> > analysis, does not match what gets stored... because.. well HTML is
> >> > stripped only for indexing. Makes so much sense.
> >> >
> >> > Thanks to Ryan McKinley for clarifying this.
> >> > Aseem
> >> >
> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema 
> >> > wrote:
> >> >> I am trying to post a document with the following content using
> SolrJ:
> >> >> content
> >> >> I need the xml/html tags to be ignored. Even though this works fine
> in
> >> >> analysis.jsp, this does not work with SolrJ, as the client escapes
> the
> >> >> < and > with < and > and HTMLStripCharFilterFactory does not
> >> >> strip those escaped tags. How can I achieve this? Any ideas will be
> >> >> highly appreciated.
> >> >>
> >> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
> >> >> there a way to get that to work?
> >> >> Thanks
> >> >> --
> >> >> Aseem
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Aseem
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Search query log using solr

2010-01-11 Thread Chris Hostetter

: application. I am planning to add a search query log that will capture all
: the search queries (and more information like IP,user info,date time,etc).
: I understand I can easily do this on the application side capturing all the
: search request, logging them in a DB/File before sending them to solr for
: execution.
:  But I wanted to check with the forum if there was any better
: approach OR best practices OR anything that has been added to Solr for such
: requirement.

doing this in your application is probably the best bet ... you could put 
all of the extra info in query args to solr, which would be ignored but 
included in Solr's own logs, except that would muck up any HTTP Caching 
you might do (and putting an Accelerator Cache in front of Solr is a 
really easy way to reduce load in a lot of common situations)

-Hoss
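
(For example -- the parameter names here are invented -- a request like
http://localhost:8983/solr/select?q=ipod&uid=42&clientIp=10.1.2.3 leaves the
extra uid/clientIp parameters unused by the handler, but they do show up in
the params={...} portion of Solr's request log; the trade-off is exactly the
HTTP caching issue above, since otherwise-identical queries now have distinct
URLs.)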



Re: Understanding the query parser

2010-01-11 Thread Erik Hatcher


On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:



>> It is in the source code of QueryParser's getFieldQuery(String field,
>> String queryText) method, line #660. If numTokens > 1 it returns
>> PhraseQuery.
>
> That's exactly the question. Would be nice to hear from someone as to
> why is it that way?


Suppose you indexed "Foo Bar".  It'd get indexed as two tokens [foo]  
followed by [bar].  Then someone searches for foo-bar, which would get  
analyzed into two tokens also.  A PhraseQuery is the most logical  
thing for it to turn into, no?


What's the alternative?

Of course it's tricky business though, impossible to do the right  
thing for all cases within SolrQueryParser.  Thankfully it is  
pleasantly subclassable and overridable for this method.


Erik
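
A sketch of such an override (illustration only: it rewrites every PhraseQuery
coming out of getFieldQuery() into SHOULD clauses, which would also flatten
phrase-like analysis such as shingles, and it still needs to be wired in
through a QParserPlugin):

import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrQueryParser;

// Turns the PhraseQuery produced for input like "foo-bar" into (field:foo field:bar).
public class OrInsteadOfPhraseQueryParser extends SolrQueryParser {

  public OrInsteadOfPhraseQueryParser(IndexSchema schema, String defaultField) {
    super(schema, defaultField);
  }

  protected Query getFieldQuery(String field, String queryText) throws ParseException {
    Query q = super.getFieldQuery(field, queryText);
    if (q instanceof PhraseQuery) {
      BooleanQuery bq = new BooleanQuery(true);             // true = disable coord
      for (Term t : ((PhraseQuery) q).getTerms()) {
        bq.add(new TermQuery(t), BooleanClause.Occur.SHOULD);
      }
      return bq;
    }
    return q;
  }
}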



Commons Lang

2010-01-11 Thread Jeff Newburn
We have a solr plugin that would be much easier to write if commons-lang was
available.  Why does solr not have this library?  Are there any drawbacks to
pulling in commons-lang for StringUtils?
-- 
Jeff Newburn
Software Engineer, Zappos.com


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2010-01-11 Thread darniz

no problem

Erick Erickson wrote:
> 
> Ah, I read your post too fast and ignored the title. Sorry 'bout that.
> 
> Erick
> 
> On Mon, Jan 11, 2010 at 2:55 PM, darniz  wrote:
> 
>>
>> Well thats the whole discussion we are talking about.
>> I had the impression that the html tags are filtered and then the field
>> is
>> stored without tags. But looks like the html tags are removed and terms
>> are
>> indexed purely for indexing, and the actual text is stored in raw format.
>>
>> Lets say for example if i enter a field like
>> honda car road review
>> When i do analysis on the body field the html filter removes the  tag
>> and
>> indexed works honda, car, road, review. But when i fetch body field to
>> display in my document it returns honda car road review
>>
>> I hope i make sense.
>> thanks
>> darniz
>>
>>
>>
>> Erick Erickson wrote:
>> >
>> > This page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> > shows you
>> > many
>> > of the SOLR analyzers and filters. Would one of
>> > the various *HTMLStrip* stuff work?
>> >
>> > HTH
>> > ERick
>> >
>> > On Mon, Jan 11, 2010 at 2:44 PM, darniz 
>> wrote:
>> >
>> >>
>> >> Thanks we were having the saem issue.
>> >> We are trying to store article content and we are strong a field like
>> >> This article is for blah .
>> >> Wheni see the analysis.jsp page it does strip out the  tags and is
>> >> indexed. but when we fetch the document it returns the field with the
>> 
>> >> tags.
>> >> From solr point of view, its correct but our issue is that this kind
>> of
>> >> html
>> >> tags is screwing up our display of our page. Is there an easy way to
>> >> esure
>> >> how to strip out hte html tags, or do we have to take care of
>> manually.
>> >>
>> >> Thanks
>> >> Rashid
>> >>
>> >>
>> >> aseem cheema wrote:
>> >> >
>> >> > Alright. It turns out that escapedTags is not for what I thought it
>> is
>> >> > for.
>> >> > The problem that I am having with HTMLStripCharFilterFactory is that
>> >> > it strips the html while indexing the field, but not while storing
>> the
>> >> > field. That is why what is see in analysis.jsp, which is index
>> >> > analysis, does not match what gets stored... because.. well HTML is
>> >> > stripped only for indexing. Makes so much sense.
>> >> >
>> >> > Thanks to Ryan McKinley for clarifying this.
>> >> > Aseem
>> >> >
>> >> > On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema
>> 
>> >> > wrote:
>> >> >> I am trying to post a document with the following content using
>> SolrJ:
>> >> >> content
>> >> >> I need the xml/html tags to be ignored. Even though this works fine
>> in
>> >> >> analysis.jsp, this does not work with SolrJ, as the client escapes
>> the
>> >> >> < and > with < and > and HTMLStripCharFilterFactory does not
>> >> >> strip those escaped tags. How can I achieve this? Any ideas will be
>> >> >> highly appreciated.
>> >> >>
>> >> >> There is escapedTags in HTMLStripCharFilterFactory constructor. Is
>> >> >> there a way to get that to work?
>> >> >> Thanks
>> >> >> --
>> >> >> Aseem
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Aseem
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116434.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27116601.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/XmlUpdateRequestHandler-with-HTMLStripCharFilterFactory-tp26305561p27118304.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well.

2010/1/11 Markus Jelsma 

> Hello,
>
>
> We have implemented language specific search in Solr using language
> specific fields and field types. For instance, an en_text field type can
> use an English stemmer, and list of stopwords and synonyms. We, however
> did not use specific stopwords, instead we used one list shared by both
> languages.
>
> So you would have a field type like:
>   
>  
>  
>
> etc etc.
>
>
>
> Cheers,
>
> -
> Markus Jelsma  Buyways B.V.
> Technisch ArchitectFriesestraatweg 215c
> http://www.buyways.nl  9743 AD Groningen
>
>
> Alg. 050-853 6600  KvK  01074105
> Tel. 050-853 6620  Fax. 050-3118124
> Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17
>
>
> On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
>
> > Hi Solr users.
> >
> > I'm trying to set up a site with Solr search integrated. And I use the
> > SolJava API to feed the index with search documents. At the moment I
> > have only activated search on the English portion of the site. I'm
> > interested in using as many features of solr as possible. Synonyms,
> > Stopwords and stems all sounds quite interesting and useful but how do
> > I set up this in a good way for a multilingual site?
> >
> > The site don't have a huge text mass so performance issues don't
> > really bother me but still I'd like to hear your suggestions before I
> > try to implement an solution.
> >
> > Best regards
> >
> > Daniel
>
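
A per-language field type along the lines Markus describes might look roughly
like this in schema.xml (type, file and field names are only examples):

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<field name="title_en" type="text_en" indexed="true" stored="true"/>

A parallel text_fr/title_fr (and so on) would then get its own stemmer and
word lists, while a shared stopword file can simply be referenced from both
types.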


Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

I am in the process of building a Solr search solution for my application and
have run into a roadblock with the schema design.  Trying to match criteria
in one multi-valued field with corresponding criteria in another
multi-valued field.  Any advice would be greatly appreciated.

BACKGROUND:
My RDBMS data model is such that for every one of my "Product" entities,
there are one-to-many "SKU" entities available for purchase. Each SKU entity
can have its own price, as well as one-to-many options, etc.  The web
frontend displays available "Product" entities on both directory and detail
pages.

In order to take advantage of Solr's facet count, paging, and sorting
functionality, I decided to base the Solr schema on "Product" documents; so
none of my documents currently contain duplicate "Product" data, and all
"SKU" related data is denormalized as necessary, but into multi-valued
fields.  For example, I have a document with an "id" field set to
"Product:7," a "docType" field is set to "Product" as well as multi-valued
"SKU" related fields and data like, "sku_color" {Red | Green | Blue},
"sku_size" {Small | Medium | Large}, "sku_price" {10.00 | 10.00 | 7.99}

I hit the roadblock when I tried to answer the question, "Which products are
available that contain skus with color Green, size M, and a price of $9.99
or less?"...and have now begun the switch to "SKU" level indexing.  This
also gives me what I need for faceted browsing/navigation, and search
refinement...leading the user to "Product" entities having purchasable "SKU"
entities.  But this also means I now have documents which are mostly
duplicates for each "Product," and all facet counts, paging and sorting are
then inaccurate; so it appears I need to do this myself, with multiple Solr
requests.

Is this really the best approach; and if so, should I use the Solr
Deduplication update processor when indexing and querying?

Thanks in advance,
Kelly
-- 
View this message in context: 
http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


I am not entirely sure if i understand your problem correctly. But i
believe your first approach is the right one.

Your question: "Which products are available that contain skus with color
Green, size M, and a price of $9.99 or less?" can be easily answered using
a schema like yours.

id = 1
color = [green, blue]
size = [M, S]
price = 6

id = 2
color = [red, blue]
size = [L, S]
price = 12

id = 3
color = [green, red, blue]
size = [L, S, M]
price = 5

Using the data above you can answer your question using a basic Solr query
[1] like the following: q=color:green AND price:[0 TO 9.99] AND size:M

Of course, you would make this a function query [2] but this, if i
understood your question well enough, answers it.

[1] http://wiki.apache.org/solr/SolrQuerySyntax
[2] http://wiki.apache.org/solr/FunctionQuery


Cheers,


Kelly Taylor zei:
>
> I am in the process of building a Solr search solution for my
> application and have run into a roadblock with the schema design.
> Trying to match criteria in one multi-valued field with corresponding
> criteria in another
> multi-valued field.  Any advice would be greatly appreciated.
>
> BACKGROUND:
> My RDBMS data model is such that for every one of my "Product" entities,
> there are one-to-many "SKU" entities available for purchase. Each SKU
> entity can have its own price, as well as one-to-many options, etc.  The
> web frontend displays available "Product" entities on both directory and
> detail pages.
>
> In order to take advantage of Solr's facet count, paging, and sorting
> functionality, I decided to base the Solr schema on "Product" documents;
> so none of my documents currently contain duplicate "Product" data, and
> all "SKU" related data is denormalized as necessary, but into
> multi-valued fields.  For example, I have a document with an "id" field
> set to
> "Product:7," a "docType" field is set to "Product" as well as
> multi-valued "SKU" related fields and data like, "sku_color" {Red |
> Green | Blue}, "sku_size" {Small | Medium | Large}, "sku_price" {10.00 |
> 10.00 | 7.99}
>
> I hit the roadblock when I tried to answer the question, "Which products
> are available that contain skus with color Green, size M, and a price of
> $9.99 or less?"...and have now begun the switch to "SKU" level indexing.
>  This also gives me what I need for faceted browsing/navigation, and
> search refinement...leading the user to "Product" entities having
> purchasable "SKU" entities.  But this also means I now have documents
> which are mostly duplicates for each "Product," and all, facet counts,
> paging and sorting is then inaccurate;  so it appears I need do this
> myself, with multiple Solr requests.
>
> Is this really the best approach; and if so, should I use the Solr
> Deduplication update processor when indexing and querying?
>
> Thanks in advance,
> Kelly
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
> Sent from the Solr - User mailing list archive at Nabble.com.





EOF IOException Query

2010-01-11 Thread Osborn Chan
Hi all,

I got following exception for SOLR, but the index is still searchable. (At 
least it is searchable for query "*:*".)
I am just wondering what is the root cause.

Thanks,
Osborn

INFO: [publicGalleryPostMaster] webapp=/multicore path=/select 
params={wt=javabin&rows=12&start=0&sort=/gallery/1/postlist/1Rank_i+desc&q=%2B(comm
unityList_s_m:/gallery/1/postlist/1)+%2Bstate_s:A&version=1} status=500 QTime=3
Jan 11, 2010 12:23:01 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:712)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
at org.apache.lucene.search.Searcher.search(Searcher.java:171)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)


Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

Hi Markus,

Thanks for your reply.

Using the current schema and query like you suggest, how can I identify the
unique combination of options and price for a given SKU?   I don't want the
user to arrive at a product which doesn't completely satisfy their search
request.  For example, with the "color:Green", "size:M", and "price:[0 to
9.99]" search refinements applied,  no products should be displayed which
only have "size:M" in "color:Blue"

The actual data in the database for a product to display on the frontend
could be as follows:

product id = 1
product name = T-shirt

related skus...
-- sku id = 7 [color=green, size=S, price=10.99]
-- sku id = 9 [color=green, size=L, price=10.99]
-- sku id = 10 [color=blue, size=S, price=9.99]
-- sku id = 11 [color=blue, size=M, price=10.99]
-- sku id = 12 [color=blue, size=L, price=10.99]

Regards,
Kelly


Markus Jelsma - Buyways B.V. wrote:
> 
> Hello Kelly,
> 
> 
> I am not entirely sure if i understand your problem correctly. But i
> believe your first approach is the right one.
> 
> Your question: "Which products are available that contain skus with color
> Green, size M, and a price of $9.99 or less?" can be easily answered using
> a schema like yours.
> 
> id = 1
> color = [green, blue]
> size = [M, S]
> price = 6
> 
> id = 2
> color = [red, blue]
> size = [L, S]
> price = 12
> 
> id = 3
> color = [green, red, blue]
> size = [L, S, M]
> price = 5
> 
> Using the data above you can answer your question using a basic Solr query
> [1] like the following: q=color:green AND price:[0 TO 9,99] AND size:M
> 
> Of course, you would make this a function query [2] but this, if i
> understood your question well enough, answers it.
> 
> [1] http://wiki.apache.org/solr/SolrQuerySyntax
> [2] http://wiki.apache.org/solr/FunctionQuery
> 
> 
> Cheers,
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


Simple boolean algebra: you tell Solr you want color = green AND size = M,
so it will only return green t-shirts in size M. If you, however, turn the
AND into an OR, it will return all t-shirts that are green OR in size M, thus
you can then get M-sized shirts in the blue color or green shirts in size
XXL.

I suggest you just give it a try and perhaps come back later to find
some improvements for your query. It would also be a good idea - if i may
say so - to read the links provided in the earlier message.

Hope you will find what you're looking for :)


Cheers,

Kelly Taylor zei:
>
> Hi Markus,
>
> Thanks for your reply.
>
> Using the current schema and query like you suggest, how can I identify
> the unique combination of options and price for a given SKU?   I don't
> want the user to arrive at a product which doesn't completely satisfy
> their search request.  For example, with the "color:Green", "size:M",
> and "price:[0 to 9.99]" search refinements applied,  no products should
> be displayed which only have "size:M" in "color:Blue"
>
> The actual data in the database for a product to display on the frontend
> could be as follows:
>
> product id = 1
> product name = T-shirt
>
> related skus...
> -- sku id = 7 [color=green, size=S, price=10.99]
> -- sku id = 9 [color=green, size=L, price=10.99]
> -- sku id = 10 [color=blue, size=S, price=9.99]
> -- sku id = 11 [color=blue, size=M, price=10.99]
> -- sku id = 12 [color=blue, size=L, price=10.99]
>
> Regards,
> Kelly
>
>
> Markus Jelsma - Buyways B.V. wrote:
>>
>> Hello Kelly,
>>
>>
>> I am not entirely sure if i understand your problem correctly. But i
>> believe your first approach is the right one.
>>
>> Your question: "Which products are available that contain skus with
>> color Green, size M, and a price of $9.99 or less?" can be easily
>> answered using a schema like yours.
>>
>> id = 1
>> color = [green, blue]
>> size = [M, S]
>> price = 6
>>
>> id = 2
>> color = [red, blue]
>> size = [L, S]
>> price = 12
>>
>> id = 3
>> color = [green, red, blue]
>> size = [L, S, M]
>> price = 5
>>
>> Using the data above you can answer your question using a basic Solr
>> query [1] like the following: q=color:green AND price:[0 TO 9,99] AND
>> size:M
>>
>> Of course, you would make this a function query [2] but this, if i
>> understood your question well enough, answers it.
>>
>> [1] http://wiki.apache.org/solr/SolrQuerySyntax
>> [2] http://wiki.apache.org/solr/FunctionQuery
>>
>>
>> Cheers,
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Kelly Taylor

Hi Markus,

Thanks again. I wish this were simple boolean algebra. This is something I
have already tried. So either I am missing the boat completely, or have
failed to communicate it clearly. I didn't want to confuse the issue further
but maybe the following excerpts will help...

Excerpt from  "Solr 1.4 Enterprise Search Server" by David Smiley & Eric
Pugh...

"...the criteria for this hypothetical search involves multi-valued fields,
where the index of one matching criteria needs to correspond to the same
value in another multi-valued field in the same index. You can't do that..."

And this excerpt is from "Solr and RDBMS: The basics of designing your
application for the best of both" by by Amit Nithianandan...

"...If I wanted to allow my users to search for wiper blades available in a
store nearby, I might create an index with multiple documents or records for
the same exact wiper blade, each document having different location data
(lat/long, address, etc.) to represent an individual store. Solr has a
de-duplication component to help show unique documents in case that
particular wiper blade is available in multiple stores near me..."

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics

Remember, with my original schema definition I have multi-valued fields, and
when the "product" document is built, these fields do contain an array of
values retrieved from each of the related skus. Skus are children of my
products.

Using your example data, which t-shirt sku is available for purchase as a
child of t-shirt product with id 3? Is it really the green, M, or have we
found a product document related to both a green t-shirt and a Medium
t-shirt of some other color, which will thereby leave the user with nothing
to purchase?

sku = 9 [color=green, size=L, price=10.99], product id = 3
sku = 10 [color=blue, size=S, price=9.99], product id = 3
sku = 11 [color=blue, size=M, price=10.99], product id = 3

>> id = 1
>> color = [green, blue]
>> size = [M, S]
>> price = 6
>>
>> id = 2
>> color = [red, blue]
>> size = [L, S]
>> price = 12
>>
>> id = 3
>> color = [green, red, blue]
>> size = [L, S, M]
>> price = 5

If this is still unclear, I'll post a new question based on findings from
this conversation. Thanks for all of your help.

-Kelly


Markus Jelsma - Buyways B.V. wrote:
> 
> Hello Kelly,
> 
> 
> Simple boolean algebra, you tell Solr you want color = green AND size = M
> so it will only return green t-shirts in size M. If you, however, turn the
> AND in a OR it will return all t-shirts that are green OR in size M, thus
> you can then get M sized shirts in the blue color or green shirts in size
> XXL.
> 
> I suggest you'd just give it a try and perhaps come back later to find
> some improvements for your query. It would also be a good idea - if i may
> say so - to read the links provided in the earlier message.
> 
> Hope you will find what you're looking for :)
> 
> 
> Cheers,
> 
> Kelly Taylor zei:
>>
>> Hi Markus,
>>
>> Thanks for your reply.
>>
>> Using the current schema and query like you suggest, how can I identify
>> the unique combination of options and price for a given SKU?   I don't
>> want the user to arrive at a product which doesn't completely satisfy
>> their search request.  For example, with the "color:Green", "size:M",
>> and "price:[0 to 9.99]" search refinements applied,  no products should
>> be displayed which only have "size:M" in "color:Blue"
>>
>> The actual data in the database for a product to display on the frontend
>> could be as follows:
>>
>> product id = 1
>> product name = T-shirt
>>
>> related skus...
>> -- sku id = 7 [color=green, size=S, price=10.99]
>> -- sku id = 9 [color=green, size=L, price=10.99]
>> -- sku id = 10 [color=blue, size=S, price=9.99]
>> -- sku id = 11 [color=blue, size=M, price=10.99]
>> -- sku id = 12 [color=blue, size=L, price=10.99]
>>
>> Regards,
>> Kelly
>>
>>
>> Markus Jelsma - Buyways B.V. wrote:
>>>
>>> Hello Kelly,
>>>
>>>
>>> I am not entirely sure if i understand your problem correctly. But i
>>> believe your first approach is the right one.
>>>
>>> Your question: "Which products are available that contain skus with
>>> color Green, size M, and a price of $9.99 or less?" can be easily
>>> answered using a schema like yours.
>>>
>>> id = 1
>>> color = [green, blue]
>>> size = [M, S]
>>> price = 6
>>>
>>> id = 2
>>> color = [red, blue]
>>> size = [L, S]
>>> price = 12
>>>
>>> id = 3
>>> color = [green, red, blue]
>>> size = [L, S, M]
>>> price = 5
>>>
>>> Using the data above you can answer your question using a basic Solr
>>> query [1] like the following: q=color:green AND price:[0 TO 9,99] AND
>>> size:M
>>>
>>> Of course, you would make this a function query [2] but this, if i
>>> understood your question well enough, answers it.
>>>
>>> [1] http://wiki.apache.org/solr/SolrQuerySyntax
>>> [2] http://wiki.apache.org/solr/FunctionQuery
>>>
>>>
>>> Cheers,
>>>
>>>
>>
>> --
>> View this message in con
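
For what it's worth, a minimal sketch of the SKU-level documents Kelly
describes (SolrJ; the field names are invented for the example): one document
per SKU, carrying its parent product id as an ordinary field, so that color,
size and price always refer to the same purchasable item.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexSkus {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // One document per SKU; the product it belongs to is just another field.
    SolrInputDocument sku = new SolrInputDocument();
    sku.addField("id", "Sku:11");
    sku.addField("docType", "Sku");
    sku.addField("product_id", "Product:1");
    sku.addField("product_name", "T-shirt");
    sku.addField("color", "blue");
    sku.addField("size", "M");
    sku.addField("price", 10.99);
    solr.add(sku);
    solr.commit();
  }
}

A query such as color:green AND size:M AND price:[* TO 9.99] then only matches
SKUs where those values really co-occur; getting product-level counts and
paging back out of the SKU-level results is the part that needs grouping,
field collapsing or deduplication.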

Re: Commons Lang

2010-01-11 Thread Erik Hatcher
There's no point in moving it to Solr core unless something in core  
depends on it.


The VelocityResponseWriter depends on commons-lang, though, and I am  
aiming to integrate that into core at some point.


But, you can put commons-lang in your Solr home's lib directory and your
plugin will be able to see it fine.


Erik


On Jan 11, 2010, at 4:39 PM, Jeff Newburn wrote:

We have a solr plugin that would be much easier to write if commons- 
lang was
available.  Why does solr not have this library?  Is there any  
drawbacks to

pulling in the commons lang for StringUtils?
--
Jeff Newburn
Software Engineer, Zappos.com




Re: Tokenizer question

2010-01-11 Thread Chris Hostetter

: q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
: 
: the resulting parsed query contains a phrase query:
: 
: +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")

This stems from some fairly fundamental behavior in the QueryParser ... 
each "chunk" of input that isn't deemed "markup" (ie: not field names, or 
special characters) is sent to the analyzer.  If the analyzer produces 
multiple tokens at different positions, then a PhraseQuery is constructed. 
-- Things like simple phrase searches and N-Gram based partial matching 
require this behavior.

If the analyzer produces multiple Tokens, but they all have the same 
position, then the QueryParser produces a BooleanQuery with all SHOULD 
clauses.  -- This is what allows simple synonyms to work.

If you write a simple TokenFilter to "flatten" all of the positions to be 
the same, and use it after WordDelimiterFilter, then it should give you the 
"OR" style query you want.

This isn't the default behavior because the Phrase behavior of WDF fits 
its intended case better --- someone searching for a product sku 
like X3QZ-D5 expects it to match X-3QZD5, but not just "X" or "3QZ".

-Hoss
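
A minimal sketch of that "flatten" filter, using the Lucene 2.9 attribute API
(the class name is made up, and you would also need a trivial factory
extending BaseTokenFilterFactory whose create() wraps the stream in this
filter so it can be referenced from schema.xml):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Forces every token after the first one to a position increment of 0, so the
// query parser builds a BooleanQuery of SHOULD clauses instead of a PhraseQuery.
public final class FlattenPositionsFilter extends TokenFilter {
  private final PositionIncrementAttribute posIncr =
      addAttribute(PositionIncrementAttribute.class);
  private boolean first = true;

  public FlattenPositionsFilter(TokenStream input) {
    super(input);
  }

  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) return false;
    if (first) {
      first = false;                  // leave the first token's increment alone
    } else {
      posIncr.setPositionIncrement(0);
    }
    return true;
  }

  public void reset() throws IOException {
    super.reset();
    first = true;
  }
}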



Re: Tokenizer question

2010-01-11 Thread Avlesh Singh
>
> If the analyzer produces multiple Tokens, but they all have the same
> position then the QueryParser produces a BooleanQuery will all SHOULD
> clauses.  -- This is what allows simple synonyms to work.
>
You rock Hoss!!! This is exactly the explanation I was looking for .. it is
as simple as it sounds. Thanks!

Cheers
Avlesh

On Tue, Jan 12, 2010 at 6:37 AM, Chris Hostetter
wrote:

>
> : q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
> :
> : the resulting parsed query contains a phrase query:
> :
> : +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
>
> This stems from some fairly fundemental behavior i nthe QueryParser ...
> each "chunk" of input that isn't deemed "markup (ie: not field names, or
> special characters) is sent to the analyzer.  If the analyzer produces
> multiple tokens at differnet positions, then a PhraseQuery is constructed.
> -- Things like simple phrase searchs and N-Gram based partial matching
> require this behavior.
>
> If the analyzer produces multiple Tokens, but they all have the same
> position then the QueryParser produces a BooleanQuery will all SHOULD
> clauses.  -- This is what allows simple synonyms to work.
>
> If you write a simple TokenFilter to "flatten" all of the positions to be
> the same, and use it after WordDelimiterFilter then it should give you the
> "OR" style query you want.
>
> This isn't hte default behavior because the Phrase behavior of WDF fits
> it's intended case better --- someone searching for a product sku
> like X3QZ-D5 expects it to match X-3QZD5, but not just "X" or "3QZ"
>
> -Hoss
>
>


Re: Understanding the query parser

2010-01-11 Thread Avlesh Singh
Thanks Erik for responding.
Hoss explained the behavior with nice corollaries here -
http://www.lucidimagination.com/search/document/8bc351d408f24cf6/tokenizer_question

Cheers
Avlesh

On Tue, Jan 12, 2010 at 2:21 AM, Erik Hatcher wrote:

>
> On Jan 11, 2010, at 1:33 PM, Avlesh Singh wrote:
>
>
>>> It is in the source code of QueryParser's getFieldQuery(String field,
>>> String queryText)  method line#660. If numTokens > 1 it returns Phrase
>>> Query.
>>>
>>>  That's exactly the question. Would be nice to hear from someone as to
>> why is
>> it that way?
>>
>
> Suppose you indexed "Foo Bar".  It'd get indexed as two tokens [foo]
> followed by [bar].  Then someone searches for foo-bar, which would get
> analyzed into two tokens also.  A PhraseQuery is the most logical thing for
> it to turn into, no?
>
> What's the alternative?
>
> Of course it's tricky business though, impossible to do the right thing for
> all cases within SolrQueryParser.  Thankfully it is pleasantly subclassable
> and overridable for this method.
>
>Erik
>
>


Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Kelly Taylor

Hi,

Is there a step-by-step for applying the patch for SOLR-236 to enable field
collapsing in Solr 1.4?

Thanks,
Kelly
-- 
View this message in context: 
http://old.nabble.com/Solr-1.4-Field-collapsing---What-are-the-steps-for-applying-the-SOLR-236-patch--tp27122621p27122621.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?

2010-01-11 Thread Joe Calderon
it seems to be in flux right now as the solr developers slowly make 
improvements and ingest the various pieces into the solr trunk; i think 
your best bet might be to use the 12/24 patch and fix any errors where 
it doesn't apply cleanly


im using solr trunk r892336 with the 12/24 patch
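
For reference, applying a JIRA patch against a specific trunk revision usually
boils down to something like the following (the trunk URL is the pre-merge
Solr location as of early 2010, and as noted above you should expect to fix
rejected hunks by hand):

svn checkout -r 892336 http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
cd solr-trunk
patch -p0 < SOLR-236.patch       # the 2009-12-24 patch attached to the JIRA issue
ant clean dist                   # or "ant example" for the self-contained Jetty setup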


--joe
On 01/11/2010 08:48 PM, Kelly Taylor wrote:

Hi,

Is there a step-by-step for applying the patch for SOLR-236 to enable field
collapsing in Solr 1.4?

Thanks,
Kelly
   




Seattle Hadoop / HBase / Lucene / NoSQL meetup Jan 27th!

2010-01-11 Thread Bradford Stephens
Greetings,

A friendly reminder that the Seattle Hadoop, NoSQL, etc. meetup is on
January 27th at University of Washington in the Allen Computer Science
Building, room 303.

I believe Razorfish will be giving a talk on how they use Hadoop.

Here's the new, shiny meetup.com link with more detail:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup

-- 
http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: Tokenizer question

2010-01-11 Thread rswart


Crystal clear. Thanks for your response & time!
-- 
View this message in context: 
http://old.nabble.com/Tokenizer-question-tp27099119p27123281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: update solr index

2010-01-11 Thread Shalin Shekhar Mangar
On Mon, Jan 11, 2010 at 7:42 PM, Marc Des Garets wrote:

>
> I am running solr in tomcat and I have about 35 indexes (between 2 and
> 80 millions documents each). Currently if I try to update few documents
> from an index (let's say the one which contains 80 millions documents)
> while tomcat is running and therefore receiving requests, I am getting
> few very long garbage collection (about 60sec). I am running tomcat with
> -Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using
> ConcMarkSweepGC.
>
> I have 2 questions:
> 1. Is solr doing something specific while an index is being updated like
> updating something in memory which would cause the garbage collection?
>

Solr's caches are thrown away and a fixed number of old queries are
re-executed to re-generate the cache on the new index (known as
auto-warming). This happens on a commit.


>
> 2. Any idea how I could solve this problem? Currently I stop tomcat,
> update index, start tomcat. I would like to be able to update my index
> while tomcat is running. I was thinking about running more tomcat
> instance with less memory for each and each running few of my indexes.
> Do you think it would be the best way to go?
>
>
If you stop tomcat, how do you update the index? Are you running a
multi-core setup? Perhaps it is better to split up the indexes among
multiple boxes. Also, you should probably lower the JVM heap so that the
full GC pause doesn't make your index unavailable for such a long time.

Also see
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

-- 
Regards,
Shalin Shekhar Mangar.
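
If the long pauses line up with commits, the autowarmCount settings on the
caches in solrconfig.xml are also worth tuning down; for example (sizes and
counts here are arbitrary):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="64"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

Fewer autowarmed queries means less work on each new searcher, at the cost of
colder caches right after the commit.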


What is this error means?

2010-01-11 Thread Ellery Leung
When I am building the index for around 2 ~ 25000 records, sometimes I
come across this error:

 

Uncaught exception "Exception" with message '0' Status: Communication Error

 

I searched Google & Yahoo but found no answer.

 

I am now committing documents to Solr every 10 records fetched from a
SQLite database with PHP 5.3.

 

Platform: Windows 7 Home

Web server: Nginx

Solr Specification Version: 1.4.0

Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
12:33:40

Lucene Specification Version: 2.9.1

Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

Solr hosted in jetty 6.1.3

 

All the above are in one single test machine.

 

The situation is that sometimes when I build the index, it can be created
successfully.  But sometimes it will just stop with the above error.

 

Any clue?  Please help.

 

Thank you in advance.