Re: Rules engine and Solr

2010-01-06 Thread Avlesh Singh
Thanks for the reply, Ravi.

I am currently working on some kind of rules in front
> (application side) of our Solr instance. These rules are application
> specific rather than general -- like deciding which fields to facet, which
> fields to return in the response, which fields to highlight, and the boost
> value for each field (both at query time and at index time).
>  The approach I have taken is to define a database table which
> holds these field parameters, which are then interpreted by my application
> to decide the query to be sent to Solr. This allows tweaking the Solr fields
> on the fly and hence influencing the search results.
>
I guess this is the usual usage of a Solr server, and my case is no
different: search queries have a personalized experience, which means
behaviors for facets, highlighting, etc. are customizable. We pull it off
using databases and Java data structures.

I will be interested to hear from you about the "kind" of rules you talk
> about and your approach towards it. Are these "rules" like a regular
> expression that, when matched with the "user query", executes a specific
> "solr query"?
>
http://en.wikipedia.org/wiki/Business_rules_engine

Cheers
Avlesh

On Wed, Jan 6, 2010 at 12:12 PM, Ravi Gidwani wrote:

> Avlesh:
>   I am currently working on some kind of rules in front
> (application side) of our Solr instance. These rules are application
> specific rather than general -- like deciding which fields to facet, which
> fields to return in the response, which fields to highlight, and the boost
> value for each field (both at query time and at index time).
>  The approach I have taken is to define a database table which
> holds these field parameters, which are then interpreted by my application
> to decide the query to be sent to Solr. This allows tweaking the Solr fields
> on the fly and hence influencing the search results.
>
> I will be interested to hear from you about the "kind" of rules you talk
> about and your approach towards it. Are these "rules" like a regular
> expression that, when matched with the "user query", executes a specific
> "solr query"?
>
> ~Ravi
>
> On Tue, Jan 5, 2010 at 8:25 PM, Avlesh Singh  wrote:
>
> > >
> > > Your question appears to be an "XY Problem" ... that is: you are
> dealing
> > > with "X", you are assuming "Y" will help you, and you are asking about
> > "Y"
> > > without giving more details about the "X" so that we can understand the
> > full
> > > issue.  Perhaps the best solution doesn't involve "Y" at all? See Also:
> > > http://www.perlmonks.org/index.pl?node_id=542341
> > >
> > Hahaha, that's classic Hoss!
> > Thanks for introducing me to the XY problem. Had I known the two completely,
> > I wouldn't have posted it on the mailing list. And I wasn't looking for a
> > "solution" either. Anyway, as I replied earlier, I'll get back with
> > questions once I get more clarity.
> >
> > Cheers
> > Avlesh
> >
> > On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter <
> hossman_luc...@fucit.org
> > >wrote:
> >
> > >
> > > : I am planning to build a rules engine on top of search. The rules are
> > > : database driven and can't be stored inside solr indexes. These rules
> > > : would ultimately do two things -
> > > :
> > > :1. Change the order of Lucene hits.
> > > :2. Add/remove some results to/from the Lucene hits.
> > > :
> > > : What should be my starting point? Custom search handler?
> > >
> > > This smells like an XY problem ... can you elaborate on the types of
> > > rules/conditions/situations when you want #1 and #2 listed above to
> > > happen?
> > >
> > > http://people.apache.org/~hossman/#xyproblem
> 
> > 
> > > XY Problem
> > >
> > > Your question appears to be an "XY Problem" ... that is: you are
> dealing
> > > with "X", you are assuming "Y" will help you, and you are asking about
> > "Y"
> > > without giving more details about the "X" so that we can understand the
> > > full issue.  Perhaps the best solution doesn't involve "Y" at all?
> > > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> > >
> > >
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> >
>


readOnly=true IndexReader

2010-01-06 Thread Patrick Sauts
In the Wiki page
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found:
- Open the IndexReader with readOnly=true. This makes a big difference
when multiple threads are sharing the same reader, as it removes certain
sources of thread contention.


How do I open the IndexReader with readOnly=true?
I can't find anything related to this parameter.

Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any
effect on Solr with a standard solrconfig.xml?


Thank you for your answers.

Patrick.


Re: readOnly=true IndexReader

2010-01-06 Thread Shalin Shekhar Mangar
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts wrote:

> In the Wiki page :
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found
> -Open the IndexReader with readOnly=true. This makes a big difference when
> multiple threads are sharing the same reader, as it removes certain sources
> of thread contention.
>
> How to open the IndexReader with readOnly=true ?
> I can't find anything related to this parameter.
>
>
Solr always opens the IndexReader with readOnly=true. This was added in SOLR-730
and released in Solr 1.3.
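
For anyone opening a reader directly through the Lucene API rather than
through Solr, a minimal sketch of the readOnly idiom against the Lucene
2.9-era API, assuming an index at /path/to/index:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReadOnlyReaderExample {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        // readOnly=true returns a reader that disallows modifications
        // (deletes, norms changes) and so can skip the associated
        // synchronization -- the thread-contention win the wiki describes
        // when many search threads share this one instance.
        IndexReader reader = IndexReader.open(dir, true);
        try {
            System.out.println("numDocs: " + reader.numDocs());
        } finally {
            reader.close();
        }
    }
}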

-- 
Regards,
Shalin Shekhar Mangar.


Re: readOnly=true IndexReader

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts  wrote:
> In the Wiki page : http://wiki.apache.org/lucene-java/ImproveSearchingSpeed,
> I've found
> -Open the IndexReader with readOnly=true. This makes a big difference when
> multiple threads are sharing the same reader, as it removes certain sources
> of thread contention.
>
> How to open the IndexReader with readOnly=true ?
> I can't find anything related to this parameter.
>
> Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any
> effect on Solr with a standard solrconfig.xml?
These are not variables used by Solr itself. They are just substituted in
solrconfig.xml and probably consumed by the ReplicationHandler (this is
not a standard).
>
> Thank you for your answers.
>
> Patrick.
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


schema.xml and Xinclude

2010-01-06 Thread Patrick Sauts
As the <fieldType> definitions in schema.xml are the same between all our
indexes, I'd like to make them an XInclude, so I tried declaring the
XInclude namespace on the enclosing element:

xmlns:xi="http://www.w3.org/2001/XInclude">

My syntax might not be correct? Or is it just not possible yet?

Thank you again for your time.

Patrick.


Yankee's Solr integration

2010-01-06 Thread Nicolas Kern
Hello everybody,

I was wondering how Yankee (
http://www.yankeegroup.com/search.do?searchType=advancedSearch) managed to
provide the ability to create alerts, save searches, and generate an RSS
feed out of a custom search using Solr. Do you have any idea?

Thanks a lot,
Best regards & happy new year !
Nicolas


Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson
Hmmm, the name WordDelimiterFilterFactory might be leading
you astray. Its purpose isn't to break things up into "words"
that have anything to do with grammatical rules. Rather, its
purpose is to break up strings of funky characters into
searchable stuff. See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

In the grammatical sense, PowerShot should just be
PowerShot, not power shot (which is what WordDelimiterFactory
gives you, options permitting). So I think you probably want
one of the other analyzers

Have you tried any other analyzers? StandardAnalyzer might be
more friendly

HTH
Erick

On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land  wrote:

> I've tracked this problem down to the fact that I'm using the
> WordDelimiterFilter. I don't quite understand what's happening, but if I
> add preserveOriginal="1" as an option, everything looks fine. I think it
> has
> to do with the period being stripped in the token stream.
>
> On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land  wrote:
>
> > Hello,
> > I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse
> > basic sentences, and I'm running into a problem.
> >
> > I'm using the default regex specified in the example solr configuration:
> >
> > [-\w ,/\n\"']{20,200}
> >
> > But I am using a larger fragment size (140) with a slop of 1.0.
> >
> > Given the passage:
> >
> > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a
> > ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue
> > vitae, molestie quis nunc.
> >
> > When I search for "Nulla" (the first word of the second sentence) and
> grab
> > the first highlighted snippet, this is what I get:
> >
> > . Nulla a neque a ipsum accumsan iaculis at id lacus
> >
> > As you can see, there's a leading period from the previous sentence and
> the
> > period from the current sentence is missing.
> >
> > I understand this regex isn't that advanced, but I've tried everything I
> > can think of, regex-wise, to get this to work, and I always end up with
> this
> > problem.
> >
> > For example, I've tried: \w[^.!?]{0,200}[.!?]
> >
> > Which seems like it should include the ending punctuation, but it
> doesn't,
> > so I think I'm missing something.
> >
> > Does anybody know a regex that works?
> > --
> > Caleb Land
> >
>
>
>
> --
> Caleb Land
>


Re: performance question

2010-01-06 Thread A. Steven Anderson
> Strictly speaking there are some insignificant distinctions in performance
> related to how a field name is resolved -- Grant alluded to this
> earlier in this thread -- but it only comes into play when you actually
> refer to that field by name and Solr has to "look them up" in the
> metadata.  So for example if your request referred to 100 different field
> names in the q, fq, and facet.field params there would be a small overhead
> for any of those 100 fields that existed because of <dynamicField>
> declarations, that would not exist for any of those fields that were
> declared using <field> -- but there would be no added overhead to that
> query if there were 999 other fields that existed in your index
> because of that same <dynamicField> declaration.
>
> But frankly: we're talking about seriously ridiculous
> "pico-optimizing" at this point ... if you find yourself with performance
> concerns there are probably 500 other things worth worrying about before
> this should ever cross your mind.
>

Thanks for the follow up.

I've converted our schema to required fields only with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


Re: performance question

2010-01-06 Thread Erik Hatcher
You don't lose copyField capability with dynamic fields. You can copy
dynamic fields into a fixed field name like *_s => text, or dynamic
fields into another dynamic field like *_s => *_t.
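
In schema.xml those are one-line declarations, e.g. <copyField source="*_s"
dest="text"/> or <copyField source="*_s" dest="*_t"/> (field names here are
just illustrative).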


Erik

On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote:

Strictly speaking there are some insignificant distinctions in performance
related to how a field name is resolved -- Grant alluded to this
earlier in this thread -- but it only comes into play when you actually
refer to that field by name and Solr has to "look them up" in the
metadata.  So for example if your request referred to 100 different field
names in the q, fq, and facet.field params there would be a small overhead
for any of those 100 fields that existed because of <dynamicField>
declarations, that would not exist for any of those fields that were
declared using <field> -- but there would be no added overhead to that
query if there were 999 other fields that existed in your index
because of that same <dynamicField> declaration.

But frankly: we're talking about seriously ridiculous
"pico-optimizing" at this point ... if you find yourself with performance
concerns there are probably 500 other things worth worrying about before
this should ever cross your mind.


Thanks for the follow up.

I've converted our schema to required fields only, with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

--
A. Steven Anderson
Independent Consultant
st...@asanderson.com




ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Hi everyone,

I've been trying to add a date-based boost to my queries. I have a field like:

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<field name="datetime" type="tdate" indexed="true" stored="true" required="true" />
When I look at the datetime field in the solr schema browser I can see that 
there are 9051 distinct dates.

When I try to add the parameter to my query like: bf=ord(datetime) (on a dismax 
query) I always get 9051 as the result of the function. I see this in the debug 
data:


1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:
  9051.0 = 9051
  1.0 = boost
  0.18767032 = queryNorm



It is exactly the same for every result, even though each result has a 
different value for datetime.



Does anyone have any suggestions as to why this could be happening? I have done 
extensive googling with no luck.



Thanks,

Kallin Nagelberg.



replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
I set up replication between 2 cores on one master and 2 cores on one slave. 
Before doing this the master was working without issues, and I stopped all 
indexing on the master.

Now that replication has synced the index files, an .fdt file is suddenly
missing on both the master and the slave. Pretty much every operation (core 
reload, commit, add document) fails with an error like the one posted below.

How could this happen? How can one recover from such an error? Is there any way 
to regenerate the FDT file without re-indexing everything?

This brings me to a question about backups. If I run the 
replication?command=backup command, where is this backup stored? I've tried 
this a few times and get an OK response from the machine, but I don't see the 
backup generated anywhere.

Thanks,
Gio.

org.apache.solr.common.SolrException: Error handling 'reload' action
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
   at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
   at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
   at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
   at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
   at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
specified)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
   at org.apache.solr.core.SolrCore.(SolrCore.java:579)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
   ... 18 more
Caused by: java.io.FileNotFoundException: 
Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
specified)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.(Unknown Source)
   at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
   at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
   at 
org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
   at 
org.apache.lucene.index.FieldsReader.(FieldsReader.java:104)
   at 
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
   at 
org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:103)
   at 
org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
   at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
   at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
   at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
   at org.apache.solr.core.SolrCore.getSearcher(So







Re: ord on TrieDateField always returning max

2010-01-06 Thread Yonik Seeley
Besides using up a lot more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms

-Yonik
http://www.lucidimagination.com



On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
 wrote:
> Hi everyone,
>
> I've been trying to add a date based boost to my queries. I have a field like:
>
> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
> <field name="datetime" type="tdate" indexed="true" stored="true" required="true" />
>
> When I look at the datetime field in the solr schema browser I can see that 
> there are 9051 distinct dates.
>
> When I try to add the parameter to my query like: bf=ord(datetime) (on a 
> dismax query) I always get 9051 as the result of the function. I see this in 
> the debug data:
>
>
> 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:
>
>    9051.0 = 9051
>
>    1.0 = boost
>
>    0.18767032 = queryNorm
>
>
>
> It is exactly the same for every result, even though each result has a 
> different value for datetime.
>
>
>
> Does anyone have any suggestions as to why this could be happening? I have 
> done extensive googling with no luck.
>
>
>
> Thanks,
>
> Kallin Nagelberg.
>
>


RE: ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Thanks Yonik, I was just looking at that actually.
Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10  now.
My 'inspiration' for the ord method was actually the Solr 1.4 Enterprise Search 
server book. Page 126 has a section 'using reciprocals and rord with dates'. 
You should let those guys know what's up!
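
(For reference, recip(x,m,a,b) computes a/(m*x + b), so with m=3.16e-11 --
roughly one over the number of milliseconds in a year -- a document dated a
year ago gets about half the boost of a brand-new one.)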

Thanks,
Kallin.

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, January 06, 2010 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: ord on TrieDateField always returning max

Besides using up a lot more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms

-Yonik
http://www.lucidimagination.com



On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
 wrote:
> Hi everyone,
>
> I've been trying to add a date based boost to my queries. I have a field like:
>
> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
> <field name="datetime" type="tdate" indexed="true" stored="true" required="true" />
>
> When I look at the datetime field in the solr schema browser I can see that 
> there are 9051 distinct dates.
>
> When I try to add the parameter to my query like: bf=ord(datetime) (on a 
> dismax query) I always get 9051 as the result of the function. I see this in 
> the debug data:
>
>
> 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:
>
>    9051.0 = 9051
>
>    1.0 = boost
>
>    0.18767032 = queryNorm
>
>
>
> It is exactly the same for every result, even though each result has a 
> different value for datetime.
>
>
>
> Does anyone have any suggestions as to why this could be happening? I have 
> done extensive googling with no luck.
>
>
>
> Thanks,
>
> Kallin Nagelberg.
>
>


Re: ord on TrieDateField always returning max

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 11:26 AM, Nagelberg, Kallin
 wrote:
> Thanks Yonik, I was just looking at that actually.
> Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10  now.

I'd also recommend looking into a multiplicative boost - IMO they
normally make more sense.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
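
For example, with the field name from this thread and an illustrative query
term, q={!boost b=recip(ms(NOW,datetime),3.16e-11,1,1)}ipod multiplies each
document's relevancy score by the recency function instead of adding to it,
as the additive bf parameter does.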

-Yonik
http://www.lucidimagination.com




> My 'inspiration' for the ord method was actually the Solr 1.4 Enterprise 
> Search server book. Page 126 has a section 'using reciprocals and rord with 
> dates'. You should let those guys know what's up!
>
> Thanks,
> Kallin.
>
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, January 06, 2010 11:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: ord on TrieDateField always returning max
>
> Besides using up a lot more memory, ord() isn't even going to work for
> a field with multiple tokens indexed per value (like tdate).
> I'd recommend using a function on the date value itself.
> http://wiki.apache.org/solr/FunctionQuery#ms
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
>  wrote:
>> Hi everyone,
>>
>> I've been trying to add a date based boost to my queries. I have a field 
>> like:
>>
>> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
>> <field name="datetime" type="tdate" indexed="true" stored="true" required="true" />
>>
>> When I look at the datetime field in the solr schema browser I can see that 
>> there are 9051 distinct dates.
>>
>> When I try to add the parameter to my query like: bf=ord(datetime) (on a 
>> dismax query) I always get 9051 as the result of the function. I see this in 
>> the debug data:
>>
>>
>> 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:
>>
>>    9051.0 = 9051
>>
>>    1.0 = boost
>>
>>    0.18767032 = queryNorm
>>
>>
>>
>> It is exactly the same for every result, even though each result has a 
>> different value for datetime.
>>
>>
>>
>> Does anyone have any suggestions as to why this could be happening? I have 
>> done extensive googling with no luck.
>>
>>
>>
>> Thanks,
>>
>> Kallin Nagelberg.
>>
>>
>


Re: performance question

2010-01-06 Thread A. Steven Anderson
> You don't lose copyField capability with dynamic fields.  You can copy
> dynamic fields into a fixed field name like *_s => text or dynamic fields
> into another dynamic field like  *_s => *_t


Ahhh...I missed that little detail.  Nice!

Ok, so there are no negatives to using dynamic fields then. ;-)

Thanks for all the info!

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


Re: replication --> missing field data file

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
 wrote:
> I set up replication between 2 cores on one master and 2 cores on one slave. 
> Before doing this the master was working without issues, and I stopped all 
> indexing on the master.
>
> Now that replication has synced the index files, an .fdt file is suddenly 
> missing on both the master and the slave. Pretty much every operation (core 
> reload, commit, add document) fails with an error like the one posted below.
>
> How could this happen? How can one recover from such an error? Is there any 
> way to regenerate the FDT file without re-indexing everything?
>
> This brings me to a question about backups. If I run the 
> replication?command=backup command, where is this backup stored? I've tried 
> this a few times and get an OK response from the machine, but I don't see the 
> backup generated anywhere.
The backup is done asynchronously, so it always gives an OK response immediately.
The backup is created in the data dir itself.
>
> Thanks,
> Gio.
>
> org.apache.solr.common.SolrException: Error handling 'reload' action
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
>       at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>       at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>       at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>       at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>       at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>       at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>       at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>       at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>       at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>       at 
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>       at 
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>       at 
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>       at 
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>       at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
> specified)
>       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>       at org.apache.solr.core.SolrCore.(SolrCore.java:579)
>       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
>       ... 18 more
> Caused by: java.io.FileNotFoundException: 
> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
> specified)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.(Unknown Source)
>       at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
>       at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
>       at 
> org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
>       at 
> org.apache.lucene.index.FieldsReader.(FieldsReader.java:104)
>       at 
> org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>       at 
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:103)
>       at 
> org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
>       at 
> org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
>       at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>       at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68)
>       at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>       at 

solr and patch - SOLR-64 SOLR-792

2010-01-06 Thread Thibaut Lassalle
hi,

I tried to apply patches to solr-1.4

Here is the result

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-64.patch
patching file src/java/org/apache/solr/schema/HierarchicalFacetField.java
patching file src/common/org/apache/solr/common/params/FacetParams.java
Hunk #1 FAILED at 108.
1 out of 1 hunk FAILED -- saving rejects to file
src/common/org/apache/solr/common/params/FacetParams.java.rej
patching file example/solr/conf/schema.xml
Hunk #1 FAILED at 144.
Hunk #2 FAILED at 417.
2 out of 2 hunks FAILED -- saving rejects to file
example/solr/conf/schema.xml.rej
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 227.
Hunk #3 FAILED at 238.
Hunk #4 FAILED at 484.
Hunk #5 FAILED at 541.
5 out of 5 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/request/SimpleFacets.java.rej
javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 

Re: solr and patch - SOLR-64 SOLR-792

2010-01-06 Thread Erik Hatcher
You probably aren't doing anything wrong, other than that those patches are
a bit out of date with trunk. You might have to fight through getting
them current a bit, or wait until I or someone else can get to
updating them.


Erik

On Jan 6, 2010, at 11:52 AM, Thibaut Lassalle wrote:


hi,

I tried to apply patches to solr-1.4

Here is the result

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-64.patch
patching file src/java/org/apache/solr/schema/HierarchicalFacetField.java
patching file src/common/org/apache/solr/common/params/FacetParams.java

Hunk #1 FAILED at 108.
1 out of 1 hunk FAILED -- saving rejects to file
src/common/org/apache/solr/common/params/FacetParams.java.rej
patching file example/solr/conf/schema.xml
Hunk #1 FAILED at 144.
Hunk #2 FAILED at 417.
2 out of 2 hunks FAILED -- saving rejects to file
example/solr/conf/schema.xml.rej
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 227.
Hunk #3 FAILED at 238.
Hunk #4 FAILED at 484.
Hunk #5 FAILED at 541.
5 out of 5 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/request/SimpleFacets.java.rej
javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 



RE: replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
How can you differentiate between the backup and the normal index files?

-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
Paul നോബിള്‍ नोब्ळ्
Sent: Wednesday, January 06, 2010 11:52 AM
To: solr-user
Subject: Re: replication --> missing field data file

On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
 wrote:
> I set up replication between 2 cores on one master and 2 cores on one slave. 
> Before doing this the master was working without issues, and I stopped all 
> indexing on the master.
>
> Now that replication has synced the index files, an .fdt file is suddenly 
> missing on both the master and the slave. Pretty much every operation (core 
> reload, commit, add document) fails with an error like the one posted below.
>
> How could this happen? How can one recover from such an error? Is there any 
> way to regenerate the FDT file without re-indexing everything?
>
> This brings me to a question about backups. If I run the 
> replication?command=backup command, where is this backup stored? I've tried 
> this a few times and get an OK response from the machine, but I don't see the 
> backup generated anywhere.
The backup is done asynchronously, so it always gives an OK response immediately.
The backup is created in the data dir itself.
>
> Thanks,
> Gio.
>
> org.apache.solr.common.SolrException: Error handling 'reload' action
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
>       at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>       at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>       at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>       at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>       at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>       at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>       at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>       at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>       at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>       at 
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>       at 
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>       at 
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>       at 
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>       at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
> specified)
>       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>       at org.apache.solr.core.SolrCore.(SolrCore.java:579)
>       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
>       ... 18 more
> Caused by: java.io.FileNotFoundException: 
> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
> specified)
>       at java.io.RandomAccessFile.open(Native Method)
>       at java.io.RandomAccessFile.(Unknown Source)
>       at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
>       at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
>       at 
> org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
>       at 
> org.apache.lucene.index.FieldsReader.(FieldsReader.java:104)
>       at 
> org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>       at 
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:103)
>       at 
> org.apache.lucene.index.ReadOnlyDirectoryReader.(ReadOnlyDirectoryReader.java:27)
>       at 
> org.apache.lucene.index.Di

Re: replication --> missing field data file

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
The index dir keeps the name "index"; backups will be stored as
index.<timestamp>.

On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade
 wrote:
> How can you differentiate between the backup and the normal index files?
>
> -Original Message-
> From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
> Paul നോബിള്‍ नोब्ळ्
> Sent: Wednesday, January 06, 2010 11:52 AM
> To: solr-user
> Subject: Re: replication --> missing field data file
>
> On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
>  wrote:
>> I set up replication between 2 cores on one master and 2 cores on one slave. 
>> Before doing this the master was working without issues, and I stopped all 
>> indexing on the master.
>>
>> Now that replication has synced the index files, an .fdt file is suddenly 
>> missing on both the master and the slave. Pretty much every operation (core 
>> reload, commit, add document) fails with an error like the one posted below.
>>
>> How could this happen? How can one recover from such an error? Is there any 
>> way to regenerate the FDT file without re-indexing everything?
>>
>> This brings me to a question about backups. If I run the 
>> replication?command=backup command, where is this backup stored? I've tried 
>> this a few times and get an OK response from the machine, but I don't see 
>> the backup generated anywhere.
> The backup is done asynchronously, so it always gives an OK response
> immediately.
> The backup is created in the data dir itself.
>>
>> Thanks,
>> Gio.
>>
>> org.apache.solr.common.SolrException: Error handling 'reload' action
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
>>       at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>       at 
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
>>       at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
>>       at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>>       at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>>       at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>       at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>>       at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>       at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>>       at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>>       at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>>       at 
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>>       at 
>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>>       at 
>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>>       at 
>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>>       at 
>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>>       at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
>> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
>> specified)
>>       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>>       at org.apache.solr.core.SolrCore.(SolrCore.java:579)
>>       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
>>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
>>       ... 18 more
>> Caused by: java.io.FileNotFoundException: 
>> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
>> specified)
>>       at java.io.RandomAccessFile.open(Native Method)
>>       at java.io.RandomAccessFile.(Unknown Source)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
>>       at 
>> org.apache.lucene.index.FieldsReader.(FieldsReader.java:104)
>>       at 
>> org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>>       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>>       at org.apache.lucene.index.SegmentReader.get(Segme

Re: Solr Cell - PDFs plus literal metadata - GET or POST ?

2010-01-06 Thread Ross
On Tue, Jan 5, 2010 at 2:25 PM, Giovanni Fernandez-Kincade
 wrote:
> Really? Doesn't it have to be delimited differently, if both the file 
> contents and the document metadata will be part of the POST data? How does 
> Solr Cell tell the difference between the literals and the start of the file? 
> I've tried this before and haven't had any luck with it.

Thanks Shalin.

And Giovanni, yes it definitely works.

This will set literal.mydata to the contents of mydata.txt:

curl
"http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true"
-F "myfile=@tutorial.html" -F "literal.mydata=<mydata.txt"
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
> Sent: Monday, January 04, 2010 4:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cell - PDFs plus literal metadata - GET or POST ?
>
> On Wed, Dec 30, 2009 at 7:49 AM, Ross  wrote:
>
>> Hi all
>>
>> I'm experimenting with Solr. I've successfully indexed some PDFs and
>> all looks good but now I want to index some PDFs with metadata pulled
>> from another source. I see this example in the docs.
>>
>> curl "
>> http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&boost.foo_t=3&literal.blah_s=Bah
>> "
>>  -F "tutorial=@tutorial.pdf"
>>
>> I can write code to generate a script with those commands substituting
>> my own literal.whatever.  My metadata could be up to a couple of KB in
>> size. Is there a way of making the literal a POST variable rather than
>> a GET?
>
>
> With Curl? Yes, see the man page.
>
>
>>  Will Solr Cell accept it as a POST?
>
>
> Yes, it will.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread danben

The problem:

Not all of the documents that I expect to be indexed are showing up in the
index.

The background:

I start off with an empty index based on a schema with a single field named
'query', marked as unique and using the following analyzer:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
My input is a utf-8 encoded file with one sentence per line.  Its total size
is about 60MB.  I would like each line of the file to correspond to a single
document in the solr index.  If I print the number of unique lines in the
file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl
'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is
what I find strange) and maxDocs is approximately 890k (which is fine since
I know I have around 700k duplicates).  Even more confusing is that if I run
this exact command a second time without performing any other operations,
numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input
file the first time, and why it would be able to index new documents the
second and third times?

I also have this line in solrconfig.xml, if it matters:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

Thanks,
Dan

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 - stats page slow

2010-01-06 Thread Stephen Weiss
Sorry, I know I'm a little late in replying, but the LukeRequestHandler
tip was just what I needed! Thank you so much.


--
Steve

On Dec 25, 2009, at 2:03 AM, Chris Hostetter wrote:



: I've noticed this as well, usually when working with a large field cache. I
: haven't done in-depth analysis of this yet, but it seems like when the stats
: page is trying to pull data from a large field cache it takes quite a long
: time.

In Solr 1.4, the stats page was modified to start reporting stats on the
FieldCache (using the new FieldCache introspection API added by Lucene
Java 2.9) so that may be what you are seeing.

: > more than 10 seconds.  We call this programmatically to retrieve the last
: > commit date so that we can keep users from committing too frequently.  This
: > means some of our administration pages are now taking a long time to load.

i'm not really following this ... what piece of data from the stats.jsp
are you using to compute/infer a commit date?

if you are looking at registration date of the SolrIndexSearcher you can
also get that from the LukeRequestHandler which is much more efficient
(it has options for limiting the work it does)...

http://localhost:8983/solr/admin/luke?numTerms=0&fl=BOGUS




-Hoss





How to ignore term frequency >1? Field-specific Similarity class?

2010-01-06 Thread Andreas Schwarz
Hi,

I want to modify scoring to ignore term frequency > 1. This is useful for short 
fields like titles or subjects, where the number of times a term appears does 
not correspond to relevancy. I found several discussions of this problem, and 
also an implementation that changes the Similarity class to achieve this 
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00672.html). 
However, this change is global, but I only need the behavior for some fields. 
What's the best way to do this? Is there a way to use a field-specific 
similarity class, or to evaluate field names/parameters inside a Similarity 
class?
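
For reference, the implementation linked above boils down to a tiny
DefaultSimilarity subclass; a minimal sketch (global, as noted --
Similarity.tf() in Lucene 2.9 receives no field name, which is exactly why
it can't be scoped per field):

import org.apache.lucene.search.DefaultSimilarity;

public class FlatTfSimilarity extends DefaultSimilarity {
    // Count a term at most once per field: any raw frequency > 0
    // contributes a tf factor of exactly 1 to the score.
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

It is registered globally via a <similarity class="..."/> element in
schema.xml, which is part of why the change applies to every field.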

Thanks!
Andreas

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Caleb Land
I've looked at the docs/source for WordDelimiterFilter, and I understand
what it does now.

Here is my configuration:

http://gist.github.com/270590

I've tried the StandardTokenizerFactory instead of the
WhitespaceTokenizerFactory, but I get the same problem as before: the
period from the previous sentence shows up and the period from the current
sentence is cut off of highlighter fragments.

I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda
works, but to match a word at the end of a sentence, you need to search for
the period at the end of the sentence (the period is being tokenized along
with the word).

In any case, if I use the WordDelimiterFilter with preserveOriginal="1",
everything seems to work. (If I remove the WordDelimiterFilter, the periods
are indexed with the word they're connected to, and searching for those
words doesn't match unless the user includes the period)
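
For reference, that is just the extra attribute on the existing filter
declaration, e.g. <filter class="solr.WordDelimiterFilterFactory"
preserveOriginal="1" .../> with the other attributes left as in the config
linked above.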

I'm trying to go through the code to understand how this works.

On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson wrote:

> Hmmm, the name WordDelimiterFilterFactory might be leading
> you astray. Its purpose isn't to break things up into "words"
> that have anything to do with grammatical rules. Rather, its
> purpose is to break up strings of funky characters into
> searchable stuff. see:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> In the grammatical sense, PowerShot should just be
> PowerShot, not power shot (which is what WordDelimiterFactory
> gives you, options permitting). So I think you probably want
> one of the other analyzers
>
> Have you tried any other analyzers? StandardAnalyzer might be
> more friendly
>
> HTH
> Erick
>
> On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land  wrote:
>
> > I've tracked this problem down to the fact that I'm using the
> > WordDelimiterFilter. I don't quite understand what's happening, but if I
> > add preserveOriginal="1" as an option, everything looks fine. I think it
> > has
> > to do with the period being stripped in the token stream.
> >
> > On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land  wrote:
> >
> > > Hello,
> > > I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse
> > > basic sentences, and I'm running into a problem.
> > >
> > > I'm using the default regex specified in the example solr
> configuration:
> > >
> > > [-\w ,/\n\"']{20,200}
> > >
> > > But I am using a larger fragment size (140) with a slop of 1.0.
> > >
> > > Given the passage:
> > >
> > > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque
> a
> > > ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue
> > > vitae, molestie quis nunc.
> > >
> > > When I search for "Nulla" (the first word of the second sentence) and
> > grab
> > > the first highlighted snippet, this is what I get:
> > >
> > > . Nulla a neque a ipsum accumsan iaculis at id lacus
> > >
> > > As you can see, there's a leading period from the previous sentence and
> > the
> > > period from the current sentence is missing.
> > >
> > > I understand this regex isn't that advanced, but I've tried everything
> I
> > > can think of, regex-wise, to get this to work, and I always end up with
> > this
> > > problem.
> > >
> > > For example, I've tried: \w[^.!?]{0,200}[.!?]
> > >
> > > Which seems like it should include the ending punctuation, but it
> > doesn't,
> > > so I think I'm missing something.
> > >
> > > Does anybody know a regex that works?
> > > --
> > > Caleb Land
> > >
> >
> >
> >
> > --
> > Caleb Land
> >
>



-- 
Caleb Land


No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

I have tested a lot, and the whole time I thought I had set wrong options for my
custom analyzer.
Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
It seems like it only stores the original input.

I am using the example-configuration of the current Solr 1.4 release.
What's wrong?

Thank you!
-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: replication --> missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
How can you tell when the backup is done? 

-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
Paul നോബിള്‍ नोब्ळ्
Sent: Wednesday, January 06, 2010 12:23 PM
To: solr-user
Subject: Re: replication --> missing field data file

The index dir keeps the name "index"; backups will be stored as
index.<timestamp>.

On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade
 wrote:
> How can you differentiate between the backup and the normal index files?
>
> -Original Message-
> From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
> Paul നോബിള്‍ नोब्ळ्
> Sent: Wednesday, January 06, 2010 11:52 AM
> To: solr-user
> Subject: Re: replication --> missing field data file
>
> On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
>  wrote:
>> I set up replication between 2 cores on one master and 2 cores on one slave. 
>> Before doing this the master was working without issues, and I stopped all 
>> indexing on the master.
>>
>> Now that replication has synced the index files, an .fdt file is suddenly 
>> missing on both the master and the slave. Pretty much every operation (core 
>> reload, commit, add document) fails with an error like the one posted below.
>>
>> How could this happen? How can one recover from such an error? Is there any 
>> way to regenerate the FDT file without re-indexing everything?
>>
>> This brings me to a question about backups. If I run the 
>> replication?command=backup command, where is this backup stored? I've tried 
>> this a few times and get an OK response from the machine, but I don't see 
>> the backup generated anywhere.
> The backup is done asynchronously, so it always gives an OK response
> immediately.
> The backup is created in the data dir itself.
>>
>> Thanks,
>> Gio.
>>
>> org.apache.solr.common.SolrException: Error handling 'reload' action
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
>>       at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>       at 
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
>>       at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
>>       at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>>       at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>>       at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>       at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>>       at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>       at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>>       at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>>       at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>>       at 
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>>       at 
>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>>       at 
>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>>       at 
>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>>       at 
>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>>       at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
>> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
>> specified)
>>       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>>       at org.apache.solr.core.SolrCore.(SolrCore.java:579)
>>       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
>>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
>>       at 
>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
>>       ... 18 more
>> Caused by: java.io.FileNotFoundException: 
>> Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
>> specified)
>>       at java.io.RandomAccessFile.open(Native Method)
>>       at java.io.RandomAccessFile.(Unknown Source)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:78)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:108)
>>       at 
>> org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
>>       at 
>> org.apache.lucene.index.Field

Re: Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread Erick Erickson
I think the root of your problem is that unique fields should NOT
be multivalued. See
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

In this case, since you're tokenizing, your "query" field is
implicitly multi-valued, and I don't know what the behavior will be.

But there's another problem:
All the filters in your analyzer definition will mess up the
correspondence between the Unix uniq and numDocs even
if you got by the above. I.e

StopFilter would make the lines "a problem" and "the problem" identical.
WordDelimiterFilter would do all kinds of interesting things.
LowerCaseFilter would make "Myproblem" and "myproblem" identical.
RemoveDuplicatesFilter would make "interesting interesting" and
"interesting" identical.

You could define a second field, make *that* one unique, and NOT analyze
it in any way...

You could hash your sentences and define the hash as your unique key.

You could ...
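For the hashing route, a minimal sketch (the class name is hypothetical, and MD5
is just one reasonable digest choice):

import java.math.BigInteger;
import java.security.MessageDigest;

// Turn each input line into a stable hex string that can serve as the
// Solr uniqueKey, so exact-duplicate lines map to the same document.
public class SentenceKey {
    public static String hash(String sentence) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(sentence.getBytes("UTF-8"));
        // Zero-pad to 32 hex chars so every key has the same length.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("a problem"));
        System.out.println(hash("the problem")); // a different key, so a separate doc
    }
}

You'd emit the hash as an extra column in the CSV and point uniqueKey at an
un-analyzed string field holding it.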

HTH
Erick

On Wed, Jan 6, 2010 at 1:06 PM, danben  wrote:

>
> The problem:
>
> Not all of the documents that I expect to be indexed are showing up in the
> index.
>
> The background:
>
> I start off with an empty index based on a schema with a single field named
> 'query', marked as unique and using the following analyzer:
>
> <analyzer>
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.StopFilterFactory"
>     words="stopwords.txt" enablePositionIncrements="true"/>
>   <filter class="solr.WordDelimiterFilterFactory"
>     generateWordParts="1" generateNumberParts="1" catenateWords="1"
>     catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
>
> My input is a utf-8 encoded file with one sentence per line.  Its total
> size
> is about 60MB.  I would like each line of the file to correspond to a
> single
> document in the solr index.  If I print the number of unique lines in the
> file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
> the total number of lines in the file gives me around 2.7M.
>
> I use the following to start indexing:
>
> curl
> '
> http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=
> \'
>
> When this command completes, I see numDocs is approximately 470k (which is
> what I find strange) and maxDocs is approximately 890k (which is fine since
> I know I have around 700k duplicates).  Even more confusing is that if I
> run
> this exact command a second time without performing any other operations,
> numDocs goes up to around 610k, and a third time brings it up to about
> 750k.
>
> Can anyone tell me what might cause Solr not to index everything in my
> input
> file the first time, and why it would be able to index new documents the
> second and third times?
>
> I also have this line in solrconfig.xml, if it matters:
>
> <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
>
> Thanks,
> Dan
>
> --
> View this message in context:
> http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson
Hmmm, I'll have to defer to the highlighter experts here

Erick

On Wed, Jan 6, 2010 at 3:23 PM, Caleb Land  wrote:

> I've looked at the docs/source for WordDelimiterFilter, and I understand
> what it does now.
>
> Here is my configuration:
>
> http://gist.github.com/270590
>
> I've tried the StandardTokenizerFactory instead of the
> WhitespaceTokenizerFactory, but I get the same problem as before: the
> period from the previous sentence shows up, and the period from the current
> sentence is cut off of the highlighter fragments.
>
> I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda
> works, but to match a word at the end of a sentence, you need to search for
> the period at the end of the sentence (the period is being tokenized along
> with the word).
>
> In any case, if I use the WordDelimiterFilter or add preserveOriginal="1",
> everything seems to work. (If I remove the WordDelimiterFilter, the periods
> are indexed with the word they're connected to, and searching for those
> words doesn't match unless the user includes the period)
>
> I'm trying to go through the code to understand how this works.
>
> On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson  >wrote:
>
> > Hmmm, the name WordDelimiterFilterFactory might be leading
> > you astray. Its purpose isn't to break things up into "words"
> > that have anything to do with grammatical rules. Rather, its
> > purpose is to break up strings of funky characters into
> > searchable stuff. see:
> >
> >
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> >
> > In the grammatical sense, PowerShot should just be
> > PowerShot, not power shot (which is what WordDelimiterFactory
> > gives you, options permitting). So I think you probably want
> > one of the other analyzers
> >
> > Have you tried any other analyzers? StandardAnalyzer might be
> > more friendly
> >
> > HTH
> > Erick
> >
> > On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land  wrote:
> >
> > > I've tracked this problem down to the fact that I'm using the
> > > WordDelimiterFilter. I don't quite understand what's happening, but if
> I
> > > add preserveOriginal="1" as an option, everything looks fine. I think
> it
> > > has
> > > to do with the period being stripped in the token stream.
> > >
> > > On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land 
> wrote:
> > >
> > > > Hello,
> > > > I'm using Solr 1.4, and I'm trying to get the regex fragmenter to
> parse
> > > > basic sentences, and I'm running into a problem.
> > > >
> > > > I'm using the default regex specified in the example solr
> > configuration:
> > > >
> > > > [-\w ,/\n\"']{20,200}
> > > >
> > > > But I am using a larger fragment size (140) with a slop of 1.0.
> > > >
> > > > Given the passage:
> > > >
> > > > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a
> neque
> > a
> > > > ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut
> congue
> > > > vitae, molestie quis nunc.
> > > >
> > > > When I search for "Nulla" (the first word of the second sentence) and
> > > grab
> > > > the first highlighted snippet, this is what I get:
> > > >
> > > > . Nulla a neque a ipsum accumsan iaculis at id lacus
> > > >
> > > > As you can see, there's a leading period from the previous sentence
> and
> > > the
> > > > period from the current sentence is missing.
> > > >
> > > > I understand this regex isn't that advanced, but I've tried
> everything
> > I
> > > > can think of, regex-wise, to get this to work, and I always end up
> with
> > > this
> > > > problem.
> > > >
> > > > For example, I've tried: \w[^.!?]{0,200}[.!?]
> > > >
> > > > Which seems like it should include the ending punctuation, but it
> > > doesn't,
> > > > so I think I'm missing something.
> > > >
> > > > Does anybody know a regex that works?
> > > > --
> > > > Caleb Land
> > > >
> > >
> > >
> > >
> > > --
> > > Caleb Land
> > >
> >
>
>
>
> --
> Caleb Land
>


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Erick Erickson
<<Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.>>

How do you know this? Because it's highly unlikely that SOLR
is completely broken on that level.

Erick

On Wed, Jan 6, 2010 at 3:48 PM, MitchK  wrote:

>
> I have tested a lot and all the time I thought I set wrong options for my
> custom analyzer.
> Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
> It seems like it only stores the original input.
>
> I am using the example-configuration of the current Solr 1.4 release.
> What's wrong?
>
> Thank you!
> --
> View this message in context:
> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Ryan McKinley


On Jan 6, 2010, at 3:48 PM, MitchK wrote:

> I have tested a lot and all the time I thought I set wrong options for my
> custom analyzer.
> Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
> It seems like it only stores the original input.


The stored value is always the original input.

The *indexed* values are transformed by analysis.

If you really need to store the analyzed fields, that may be possible  
with an UpdateRequestProcessor.  also see:

https://issues.apache.org/jira/browse/SOLR-314

ryan


How to set User.dir or CWD for Solr during Tomcat startup

2010-01-06 Thread Turner, Robbin J
Is there any way to force the cwd that Solr starts up in when using the standard 
startup scripts for Tomcat?  I'm working on Solaris, and using SMF to start and 
stop Tomcat sets the working directory to /root.  I've been doing a bunch of 
googling and haven't seen a parameter to set within Tomcat other than the 
solr/home, which is set up in the solr.xml under 
$CATALINA_HOME/conf/Catalina/localhost/.

I've had one person give me instructions using the GUI on Windows, but I'm at a 
loss as to which configuration file would set that, or which environment 
variable can or should be defined.

Any help would be appreciated.

Thanks
Robbin



Search query log using solr

2010-01-06 Thread Ravi Gidwani
Hi All:
 I am currently using Solr 1.4 as the search engine for my
application. I am planning to add a search query log that will capture all
the search queries (and more information like IP, user info, date/time, etc.).
I understand I can easily do this on the application side, capturing all the
search requests and logging them in a DB/file before sending them to Solr for
execution.
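
A minimal sketch of that application-side idea (the class name is hypothetical;
it logs to a file, though DB logging would have the same shape):

import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.Date;

// Hypothetical application-side query log: append one tab-separated line
// per search request before the query is forwarded to Solr.
public class SearchQueryLog {
    private final PrintWriter out;

    public SearchQueryLog(String path) throws Exception {
        this.out = new PrintWriter(new FileWriter(path, true), true); // append + autoflush
    }

    public synchronized void log(String userQuery, String clientIp, String userId) {
        out.printf("%tF %<tT\t%s\t%s\t%s%n", new Date(), clientIp, userId, userQuery);
    }
}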
 But I wanted to check with the forum whether there is a better
approach, best practices, or anything that has been added to Solr for such
a requirement.

The idea is then to use this search log for statistical analysis as well as for
improving the search results.

Please share your experience/ideas.

TIA
~Ravi.


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 2:43 AM, Andy  wrote:
> I'd like to boost every query using {!boost b=log(popularity)}. But I'd 
> rather not have to prepend that to every query. It'd be much cleaner for me 
> to configure Solr to use that as default.
>
> My plan is to make DisMaxRequestHandler the default handler and add the 
> following to solrconfig.xml:
>
> <requestHandler name="dismax" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="bf">
>       log(popularity)
>     </str>
>   </lst>
> </requestHandler>
>
> Is this the correct way to do it?

bf adds in the function query;
{!boost} multiplies by the function query.
In the new edismax (which may replace dismax soon) you can specify the
multiplicative boost via
&boost=log(popularity)


-Yonik
http://www.lucidimagination.com
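
For illustration, a sketch of the two styles via SolrJ (which ships with Solr
1.4); the server URL and the popularity field are assumptions, not from this
thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BoostStyles {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Additive: final score = dismax score + log(popularity)
        SolrQuery additive = new SolrQuery("foo");
        additive.set("defType", "dismax");
        additive.set("bf", "log(popularity)");

        // Multiplicative: final score = relevancy score * log(popularity)
        SolrQuery multiplicative = new SolrQuery("{!boost b=log(popularity)}foo");

        System.out.println(server.query(additive).getResults().getNumFound());
        System.out.println(server.query(multiplicative).getResults().getNumFound());
    }
}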


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
So if I want to configure Solr to turn every query q=foo into q={!boost 
b=log(popularity)}foo, dismax wouldn't work but edismax would?

If that's the case, can you tell me how to set up/use edismax? I can't find 
much documentation on it. Is it recommended for production use?


--- On Wed, 1/6/10, Yonik Seeley  wrote:

From: Yonik Seeley 
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 7:09 PM

On Wed, Jan 6, 2010 at 2:43 AM, Andy  wrote:
> I'd like to boost every query using {!boost b=log(popularity)}. But I'd 
> rather not have to prepend that to every query. It'd be much cleaner for me 
> to configure Solr to use that as default.
>
> My plan is to make DisMaxRequestHandler the default handler and add the 
> following to solrconfig.xml:
>
> <requestHandler name="dismax" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="bf">
>       log(popularity)
>     </str>
>   </lst>
> </requestHandler>
>
> Is this the correct way to do it?

bf adds in the function query;
{!boost} multiplies by the function query.
In the new edismax (which may replace dismax soon) you can specify the
multiplicative boost via
&boost=log(popularity)


-Yonik
http://www.lucidimagination.com



  

Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 7:43 PM, Andy  wrote:
> So if I want to configure Solr to turn every query q=foo into q={!boost 
> b=log(popularity)}foo, dismax wouldn't work but edismax would?

You can do it with dismax; it's just that the syntax is slightly
more convoluted.
Check out the section on boosting newer documents:
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

> If that's the case, can you tell me how to set up/use edismax? I can't find 
> much documentation on it. Is it recommended for production use?

It's in trunk (not 1.4).

-Yonik
http://www.lucidimagination.com


Re: SOLR or Hibernate Search?

2010-01-06 Thread Márcio Paulino
Hi!

Thanks for the answers. These were crucial to my decision. I've adopted Solr
in my application.

On Wed, Dec 30, 2009 at 2:00 AM, Ryan McKinley  wrote:

> If you need to search via the Hibernate API, then use hibernate search.
>
> If you need a scalable HTTP (REST) interface, then Solr may be the way to go.
>
> Also, I don't think Hibernate has anything like the faceting / complex
> query stuff, etc.
>
>
>
>
> On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote:
>
>  Hey Everyone!
>>
>> I was making a comparison of the two technologies (SOLR and Hibernate
>> Search) and I see many things are equal. Could anyone tell me when I
>> should use SOLR and when I should use Hibernate Search?
>>
>> In my project I will have:
>>
>> 1. Queries for indexed fields (Strings) and for non-indexed fields
>> (Integer, Float, Date). [In Hibernate Search or in SOLR, I must search
>> on the index and, with the results of that query, search on the database
>> (I can't search in both places at the same time).]
>> I will have searches like:
>> "Give me all records where Value < 190 and Name contains 'JAVA'"
>>
>> 2. My client needs to process a lot of email (20,000 per day) and I must
>> index all fields (excluding sentDate), including attachments, and
>> performance is a requirement of my system.
>>
>> 3. My application is multi-client, and I need to separate the index by
>> client.
>>
>> In this scenario, what's the best solution? SOLR or Hibernate Search?
>>
>> I see SOLR is a dedicated server and has good performance test results. I
>> don't see advantages to using Hibernate Search in comparison with SOLR
>> (except the fact that it integrates with my mapped objects).
>>
>> Thanks for Help
>>
>> --
>> att,
>>
>> **
>> Márcio Paulino
>> Campo Grande - MS
>> MSN / Gtalk: mcopaul...@gmail.com
>> ICQ: 155897898
>> **
>>
>
>


-- 
att,

**
Márcio Paulino
Campo Grande - MS
MSN / Gtalk: mcopaul...@gmail.com
ICQ: 155897898
**


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
I meant: can I do it with dismax without modifying every single query? I'm 
accessing Solr through Haystack, and all queries are generated by Haystack. I'd 
much rather not have to go under Haystack to modify the generated queries. 
Hence I'm trying to find a way to boost every query by default.

--- On Wed, 1/6/10, Yonik Seeley  wrote:

From: Yonik Seeley 
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 7:48 PM

On Wed, Jan 6, 2010 at 7:43 PM, Andy  wrote:
> So if I want to configure Solr to turn every query q=foo into q={!boost 
> b=log(popularity)}foo, dismax wouldn't work but edismax would?

You can do it with dismax; it's just that the syntax is slightly
more convoluted.
Check out the section on boosting newer documents:
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents




  

Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 8:24 PM, Andy  wrote:
> I meant: can I do it with dismax without modifying every single query? I'm 
> accessing Solr through Haystack, and all queries are generated by Haystack. 
> I'd much rather not have to go under Haystack to modify the generated 
> queries. Hence I'm trying to find a way to boost every query by default.

If you can get haystack to pass through the user query as something
like qq, then yes - just use something like the last link I showed at
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
and set defaults for everything except qq.

-Yonik
http://www.lucidimagination.com




> --- On Wed, 1/6/10, Yonik Seeley  wrote:
>
> From: Yonik Seeley 
> Subject: Re: DisMaxRequestHandler bf configuration
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 6, 2010, 7:48 PM
>
> On Wed, Jan 6, 2010 at 7:43 PM, Andy  wrote:
>> So if I want to configure Solr to turn every query q=foo into q={!boost 
>> b=log(popularity)}foo, dismax wouldn't work but edismax would?
>
> You can do it with dismax; it's just that the syntax is slightly
> more convoluted.
> Check out the section on boosting newer documents:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>
>
>
>
>


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
Let me make sure I understand you.

I'd get my regular query from haystack as qq=foo rather than q=foo.

Then I put in solrconfig within the dismax section:

    <lst name="defaults">
      <str name="q">{!boost b=$popularityboost v=$qq}</str>
      <str name="popularityboost">log(popularity)</str>
    </lst>


Is that what you meant?


--- On Wed, 1/6/10, Yonik Seeley  wrote:

From: Yonik Seeley 
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 8:42 PM

On Wed, Jan 6, 2010 at 8:24 PM, Andy  wrote:
> I meant: can I do it with dismax without modifying every single query? I'm 
> accessing Solr through Haystack, and all queries are generated by Haystack. 
> I'd much rather not have to go under Haystack to modify the generated 
> queries. Hence I'm trying to find a way to boost every query by default.

If you can get haystack to pass through the user query as something
like qq, then yes - just use something like the last link I showed at
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
and set defaults for everything except qq.

-Yonik
http://www.lucidimagination.com




> --- On Wed, 1/6/10, Yonik Seeley  wrote:
>
> From: Yonik Seeley 
> Subject: Re: DisMaxRequestHandler bf configuration
> To: solr-user@lucene.apache.org
> Date: Wednesday, January 6, 2010, 7:48 PM
>
> On Wed, Jan 6, 2010 at 7:43 PM, Andy  wrote:
>> So if I want to configure Solr to turn every query q=foo into q={!boost 
>> b=log(popularity)}foo, dismax wouldn't work but edismax would?
>
> You can do it with dismax; it's just that the syntax is slightly
> more convoluted.
> Check out the section on boosting newer documents:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
>
>
>
>
>



  

Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

Hello Erick,

thank you for answering.

I can do whatever I want - Solr does nothing.
For example: If I use the textgen fieldtype, which is predefined, nothing
happens to the text. Even the stopFilter is not working - no stopword from
stopwords.txt was removed. I think that this only affects the index,
because if I query for "for" it returns nothing, which is quite correct,
due to the work of the stopFilter.

Everything works fine on analysis.jsp, but not in "reality". 

If you have any test-case data you want me to add, please tell me and I
will show you the saved data afterwards.

Thank you.

Mitch


Erick Erickson wrote:
> 
> <<Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.>>
> 
> How do you know this? Because it's highly unlikely that SOLR
> is completely broken on that level.
> 
> Erick
> 
> On Wed, Jan 6, 2010 at 3:48 PM, MitchK  wrote:
> 
>>
>> I have tested a lot and all the time I thought I set wrong options for my
>> custom analyzer.
>> Well, I have noticed that Solr isn't using ANY analyzer, filter or
>> stemmer.
>> It seems like it only stores the original input.
>>
>> I am using the example-configuration of the current Solr 1.4 release.
>> What's wrong?
>>
>> Thank you!
>> --
>> View this message in context:
>> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

Hello Ryan,

thank you for answering.

In my schema.xml I am defining the field as "indexed = true".
The problem is: nothing works, not even the original predefined analyzers.
Please, have a look on my response to Erick.

Mitch

P.S.
Oh, I see what you mean. The field is indexed = true. My language was a
little bit tricky ;).


ryantxu wrote:
> 
> 
> On Jan 6, 2010, at 3:48 PM, MitchK wrote:
> 
>>
>> I have tested a lot and all the time I thought I set wrong options  
>> for my
>> custom analyzer.
>> Well, I have noticed that Solr isn't using ANY analyzer, filter or  
>> stemmer.
>> It seems like it only stores the original input.
> 
> The stored value is always the original input.
> 
> The *indexed* values are transformed by analysis.
> 
> If you really need to store the analyzed fields, that may be possible  
> with an UpdateRequestProcessor.  also see:
> https://issues.apache.org/jira/browse/SOLR-314
> 
> ryan
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055512.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Erik Hatcher

Mitch,

Again, I think you're misunderstanding what analysis does. You must
be expecting, we think (though you've not provided exact duplication
steps to be sure), that the value you get back from Solr is the
analyzer-processed output. It's not; it's exactly what you provide.
Internally, for searching, the analysis takes place and writes to the
index in an inverted fashion, but the stored stuff is left alone.

There's some thinking going on about implementing it such that analyzed
output is stored.

You can, however, use the analysis request handler componentry to get
analyzed stuff back, as you see it in analysis.jsp, on a per-document or
per-field-text basis - if you're looking to leverage the analyzer
output in that fashion from a client.
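
To make the stored-vs-indexed distinction concrete, a small sketch against
Lucene 2.9 (which Solr 1.4 bundles); the class name and sample text are made
up for illustration:

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

// Print the tokens analysis would put in the inverted index. The stored
// value Solr hands back is still the untouched original string.
public class ShowTokens {
    public static void main(String[] args) throws Exception {
        String original = "The Problems";
        TokenStream ts = new StandardAnalyzer(Version.LUCENE_29)
                .tokenStream("f", new StringReader(original));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.term()); // "problems": stopword gone, lowercased
        }
    }
}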


Erik

On Jan 7, 2010, at 1:21 AM, MitchK wrote:



> Hello Erick,
>
> thank you for answering.
>
> I can do whatever I want - Solr does nothing.
> For example: If I use the textgen fieldtype, which is predefined, nothing
> happens to the text. Even the stopFilter is not working - no stopword from
> stopwords.txt was removed. I think that this only affects the index,
> because if I query for "for" it returns nothing, which is quite correct,
> due to the work of the stopFilter.
>
> Everything works fine on analysis.jsp, but not in "reality".
>
> If you have any test-case data you want me to add, please tell me and I
> will show you the saved data afterwards.
>
> Thank you.
>
> Mitch
>
>
> Erick Erickson wrote:
>>
>> <<Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.>>
>>
>> How do you know this? Because it's highly unlikely that SOLR
>> is completely broken on that level.
>>
>> Erick
>>
>> On Wed, Jan 6, 2010 at 3:48 PM, MitchK  wrote:
>>
>>> I have tested a lot and all the time I thought I set wrong options for my
>>> custom analyzer.
>>> Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
>>> It seems like it only stores the original input.
>>>
>>> I am using the example-configuration of the current Solr 1.4 release.
>>> What's wrong?
>>>
>>> Thank you!
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html
Sent from the Solr - User mailing list archive at Nabble.com.