Re: Exact match search problem

2009-02-02 Thread Stephen Weiss
Try using fieldtype "string"  instead of "text" for the UserName  
field.  Then it will not be tokenized so it should only give exact  
matches.
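
For example (a minimal sketch, assuming the stock example schema's
"string" type; you will need to reindex after changing the type):

  <field name="UserName" type="string" indexed="true" stored="true"/>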


--
Steve


On Feb 2, 2009, at 2:27 AM, mahendra mahendra wrote:


Hi,

I have indexed my data as "custom123, customer, custom" in the
"UserName" field.
I need to search the records for an exact match, but when I search
with UserName:"customer" I am finding the records where UserName is
custom123 and custom.


As per my understanding, solr splits alphanumeric words into sub
words:

custom123 => "custom","123"

By that logic, when I search for UserName:"customer", it shouldn't
return custom123 and custom.


Could you please tell me why it is behaving like that, or how I can
search for an exact match?


I am using the following declaration for the text field in my schema file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="englishstopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="englishprotwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="englishsynonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="englishstopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="englishprotwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Thanks in advance!!

- Mahendra








rsyncd / snappuller / multicore: how to manage it

2009-02-02 Thread sunnyfr

Hi 

I started rsyncd from one core's bin folder.
But it doesn't work for the other cores; how can I do this without
opening a separate port for each of them?

Can I use the same port for my three cores?

If yes, how do I set it up?

Thanks a lot,
Sunny

Wish to everybody a very good week.
-- 
View this message in context: 
http://www.nabble.com/rsyncd---snappuller---multicore-how-manage-it.-tp21786231p21786231.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Jon Baer
Hi,

Sorry I know this exists ...

"If an API supports chunking (when the dataset is too large) multiple calls
need to be made to complete the process. XPathEntityprocessor supports this
with a transformer. If transformer returns a row which contains a field *
$hasMore* with a the value "true" the Processor makes another request with
the same url template (The actual value is recomputed before invoking ). A
transformer can pass a totally new url too for the next call by returning a
row which contains a field *$nextUrl* whose value must be the complete url
for the next call."

But is there a real example of its use somewhere?  I'm trying to figure out
how to set this up properly if I know before the import that I have 56
"pages" to index.  (And how to set it up if the pages need to be determined
by something in the feed, etc.)

Thanks.

- Jon


DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Fergus McMenemie
Hello

As per several postings I noted that I can define variables
inside an invariants list section of the DIH handler of
solrconfig.xml:-

  

   data-config.xml
   

   /Volumes/spare/ts
   
  


I can also reference these variables within data-config.xml. This
works, the solr field "test" is nicely populated. However, how do
I use this variable within my regex transformer? Here is my
data-config.xml:-

  <dataConfig>
    <dataSource type="FileDataSource" name="myfilereader"/>
    <document>
      <entity name="jc"
              processor="FileListEntityProcessor"
              fileName="^.*\.xml$"
              newerThan="'NOW-1000DAYS'"
              recursive="true"
              rootEntity="false"
              dataSource="null"
              baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
        <entity dataSource="myfilereader"
                processor="XPathEntityProcessor"
                url="${jc.fileAbsolutePath}"
                stream="false"
                forEach="/record"
                transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
          <field column="..." regex="${dataimporter.request.finstalldir}(.*)"
                 replaceWith="$1" sourceColName="fileAbsolutePath"/>
          <field column="test" template="${dataimporter.request.finstalldir}"/>
          <field column="..." stripHTML="true"/>
          <field column="..." xpath="/record/metadata/date[@qualifier='Date']"
                 dateTimeFormat="MMdd"/>
        </entity>
      </entity>
    </document>
  </dataConfig>


When indexing my content I get an error as follows:-


INFO: SolrDeletionPolicy.onInit: commits:num=2

commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_7,version=1233583868834,generation=7,filenames=[_7.frq,
 _4.fdt, _7.tii, _7.fnm, _4.fdx, _7.tis, segments_7, _7.nrm, _7.prx]

commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_8,version=1233583868835,generation=8,filenames=[segments_8]
Feb 2, 2009 5:00:50 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1233583868835
Feb 2, 2009 5:00:57 PM org.apache.solr.handler.dataimport.EntityProcessorBase 
applyTransformer
WARNING: transformer threw error
java.util.regex.PatternSyntaxException: Illegal repetition near index 0
${dataimporter.request.finstalldir}(.*)
^
at java.util.regex.Pattern.error(Pattern.java:1650)
at java.util.regex.Pattern.closure(Pattern.java:2706)
at java.util.regex.Pattern.sequence(Pattern.java:1798)
at java.util.regex.Pattern.expr(Pattern.java:1687)
at java.util.regex.Pattern.compile(Pattern.java:1397)
at java.util.regex.Pattern.<init>(Pattern.java:1124)
at java.util.regex.Pattern.compile(Pattern.java:817)
at 
org.apache.solr.handler.dataimport.RegexTransformer.getPattern(RegexTransformer.java:129)
at 
org.apache.solr.handler.dataimport.RegexTransformer.process(RegexTransformer.java:88)
at 
org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:74)
at 
org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:42)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:333)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:359)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)


Is there some simple escape or other syntax to be used or is
this an enhancement?

Regards Fergus.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
RegexTransformer does not replace the placeholders before processing the regex.
It has to be enhanced.



On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie  wrote:
> Hello
>
> As per several postings I noted that I can define variables
> inside an invariants list section of the DIH handler of
> solrconfig.xml:-
>
>  <requestHandler name="/dataimport"
>                  class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>      <str name="config">data-config.xml</str>
>    </lst>
>    <lst name="invariants">
>      <str name="finstalldir">/Volumes/spare/ts</str>
>    </lst>
>  </requestHandler>
>
>
>
> I can also reference these variables within data-config.xml. This
> works,  the solr field "test" is nicely populated. However how do
> I use this variable within my regex transformer? Here is my
> data-config.xml:-
>
>  <dataConfig>
>    <dataSource type="FileDataSource" name="myfilereader"/>
>    <document>
>      <entity name="jc"
>              processor="FileListEntityProcessor"
>              fileName="^.*\.xml$"
>              newerThan="'NOW-1000DAYS'"
>              recursive="true"
>              rootEntity="false"
>              dataSource="null"
>              baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
>        <entity dataSource="myfilereader"
>                processor="XPathEntityProcessor"
>                url="${jc.fileAbsolutePath}"
>                stream="false"
>                forEach="/record"
>                transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>          <field column="..." regex="${dataimporter.request.finstalldir}(.*)"
>                 replaceWith="$1" sourceColName="fileAbsolutePath"/>
>          <field column="test" template="${dataimporter.request.finstalldir}"/>
>          <field column="..." stripHTML="true"/>
>          <field column="..." xpath="/record/metadata/date[@qualifier='Date']"
>                 dateTimeFormat="MMdd"/>
>        </entity>
>      </entity>
>    </document>
>  </dataConfig>
>
>
> When indexing my content I get an error as follows:-
>
>
> INFO: SolrDeletionPolicy.onInit: commits:num=2
>
> commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_7,version=1233583868834,generation=7,filenames=[_7.frq,
>  _4.fdt, _7.tii, _7.fnm, _4.fdx, _7.tis, segments_7, _7.nrm, _7.prx]
>
> commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_8,version=1233583868835,generation=8,filenames=[segments_8]
> Feb 2, 2009 5:00:50 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: last commit = 1233583868835
> Feb 2, 2009 5:00:57 PM org.apache.solr.handler.dataimport.EntityProcessorBase 
> applyTransformer
> WARNING: transformer threw error
> java.util.regex.PatternSyntaxException: Illegal repetition near index 0
> ${dataimporter.request.finstalldir}(.*)
> ^
>at java.util.regex.Pattern.error(Pattern.java:1650)
>at java.util.regex.Pattern.closure(Pattern.java:2706)
>at java.util.regex.Pattern.sequence(Pattern.java:1798)
>at java.util.regex.Pattern.expr(Pattern.java:1687)
>at java.util.regex.Pattern.compile(Pattern.java:1397)
>at java.util.regex.Pattern.<init>(Pattern.java:1124)
>at java.util.regex.Pattern.compile(Pattern.java:817)
>at 
> org.apache.solr.handler.dataimport.RegexTransformer.getPattern(RegexTransformer.java:129)
>at 
> org.apache.solr.handler.dataimport.RegexTransformer.process(RegexTransformer.java:88)
>at 
> org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:74)
>at 
> org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:42)
>at 
> org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>at 
> org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>at 
> org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:333)
>at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:359)
>at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>
>
> Is there some simple escape or other syntax to be used or is
> this an enhancement?
>
> Regards Fergus.
> --
>
> ===
> Fergus McMenemie   Email:fer...@twig.me.uk
> Techmore Ltd   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===
>



-- 
--Noble Paul


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Shalin Shekhar Mangar
On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie  wrote:

>
> Is there some simple escape or other syntax to be used or is
> this an enhancement?
>

I guess the problem is that we are creating the regex Pattern without first
resolving the variable. So we need to call VariableResolver.resolve on the
'regex' attribute's value before creating the Pattern object.

Please raise an issue for this change. Nice use-case though. I guess we
never thought someone would need to use a variable in the regex attribute :)

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Jon Baer
Yes I think what Jared mentions in the JIRA is what I was thinking about
when it is recommended to always return true for $hasMore ...

"The transformer must know somehow when $hasMore should be true. If the
transformer always gives $hasMore the value "true", will there be infinite
requests made or will it stop on the first empty request? Using the
EnumeratedEntityTransformer, a user can specify from the config xml when
$hasMore should be true using the chunkSize attribute. This solves a general
case of "request N rows at a time until no more are available". I agree, a
combination of 'rowsFetchedCount' and a HasMoreUntilEmptyTransformer would
also make this doable from the configuration"

This makes sense.

- Jon

On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:
>
> > Hi,
> >
> > Sorry I know this exists ...
> >
> > "If an API supports chunking (when the dataset is too large) multiple
> calls
> > need to be made to complete the process. XPathEntityprocessor supports
> this
> > with a transformer. If transformer returns a row which contains a field *
> > $hasMore* with a the value "true" the Processor makes another request
> with
> > the same url template (The actual value is recomputed before invoking ).
> A
> > transformer can pass a totally new url too for the next call by returning
> a
> > row which contains a field *$nextUrl* whose value must be the complete
> url
> > for the next call."
> >
> > But is there a true example of it's use somewhere?  Im trying to figure
> out
> > if I know before import that I have 56 "pages" to index how to set this
> up
> > properly.  (And how to set it up if pages need to be determined by
> > something
> > in the feed, etc).
> >
>
> No, there is no example (yet). You'll put the url with variables for the
> corresponding 'start' and 'count' parameters and a custom transformer can
> specify if another request needs to be made. I know it's not much to go on.
> I'll try to write some documentation on the wiki.
>
> SOLR-994 might be interesting to you. I haven't been able to look at the
> patch though.
>
>  https://issues.apache.org/jira/browse/SOLR-994
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Feb 2, 2009 at 11:01 PM, Jon Baer  wrote:
> Yes I think what Jared mentions in the JIRA is what I was thinking about
> when it is recommended to always return true for $hasMore ...
>
> "The transformer must know somehow when $hasMore should be true. If the
> transformer always give $hasMore a value "true", will there be infinite
> requests made or will it stop on the first empty request? Using the
> EnumeratedEntityTransformer, a user can specify from the config xml when
> $hasMore should be true using the chunkSize attribute. This solves a general
> case of "request N rows at a time until no more are available". I agree, a
> combination of 'rowsFetchedCount' and a HasMoreUntilEmptyTransformer would
> also make this doable from the configuration"
why can't a Transformer put a $hasMore=false?
>
> This makes sense.
>
> - Jon
>
> On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:
>>
>> > Hi,
>> >
>> > Sorry I know this exists ...
>> >
>> > "If an API supports chunking (when the dataset is too large) multiple
>> calls
>> > need to be made to complete the process. XPathEntityprocessor supports
>> this
>> > with a transformer. If transformer returns a row which contains a field *
>> > $hasMore* with a the value "true" the Processor makes another request
>> with
>> > the same url template (The actual value is recomputed before invoking ).
>> A
>> > transformer can pass a totally new url too for the next call by returning
>> a
>> > row which contains a field *$nextUrl* whose value must be the complete
>> url
>> > for the next call."
>> >
>> > But is there a true example of it's use somewhere?  Im trying to figure
>> out
>> > if I know before import that I have 56 "pages" to index how to set this
>> up
>> > properly.  (And how to set it up if pages need to be determined by
>> > something
>> > in the feed, etc).
>> >
>>
>> No, there is no example (yet). You'll put the url with variables for the
>> corresponding 'start' and 'count' parameters and a custom transformer can
>> specify if another request needs to be made. I know it's not much to go on.
>> I'll try to write some documentation on the wiki.
>>
>> SOLR-994 might be interesting to you. I haven't been able to look at the
>> patch though.
>>
>>  https://issues.apache.org/jira/browse/SOLR-994
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>



-- 
--Noble Paul


Re: DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Jon Baer
See I think I'm just misunderstanding how this entity is supposed to be set
up ... for example, using the patch on 1.3 I ended up in a loop where 'n'
is never set ...

Feb 2, 2009 1:31:02 PM org.apache.solr.handler.dataimport.HttpDataSource
getData
INFO: Created URL to: http://subdomain.site.com/feed.rss?page=

<entity url="http://subdomain.site.com/boards.rss?page=${blogs.n}" chunkSize="50"
        name="docs" pk="link" processor="XPathEntityProcessor"
        forEach="/rss/channel/item" transformer="RegexTransformer,
        com.nhl.solr.DateFormatTransformer, TemplateTransformer,
        com.nhl.solr.EnumeratedEntityTransformer">

I guess what I'm looking for is the snippet which shows how it is set up (the
initial counter) ...

- Jon

On Mon, Feb 2, 2009 at 12:39 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> On Mon, Feb 2, 2009 at 11:01 PM, Jon Baer  wrote:
> > Yes I think what Jared mentions in the JIRA is what I was thinking about
> > when it is recommended to always return true for $hasMore ...
> >
> > "The transformer must know somehow when $hasMore should be true. If the
> > transformer always give $hasMore a value "true", will there be infinite
> > requests made or will it stop on the first empty request? Using the
> > EnumeratedEntityTransformer, a user can specify from the config xml when
> > $hasMore should be true using the chunkSize attribute. This solves a
> general
> > case of "request N rows at a time until no more are available". I agree,
> a
> > combination of 'rowsFetchedCount' and a HasMoreUntilEmptyTransformer
> would
> > also make this doable from the configuration"
> why can't a Transformer put a $hasMore=false?
> >
> > This makes sense.
> >
> > - Jon
> >
> > On Mon, Feb 2, 2009 at 11:53 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:
> >>
> >> > Hi,
> >> >
> >> > Sorry I know this exists ...
> >> >
> >> > "If an API supports chunking (when the dataset is too large) multiple
> >> calls
> >> > need to be made to complete the process. XPathEntityprocessor supports
> >> this
> >> > with a transformer. If transformer returns a row which contains a
> field *
> >> > $hasMore* with a the value "true" the Processor makes another request
> >> with
> >> > the same url template (The actual value is recomputed before invoking
> ).
> >> A
> >> > transformer can pass a totally new url too for the next call by
> returning
> >> a
> >> > row which contains a field *$nextUrl* whose value must be the complete
> >> url
> >> > for the next call."
> >> >
> >> > But is there a true example of it's use somewhere?  Im trying to
> figure
> >> out
> >> > if I know before import that I have 56 "pages" to index how to set
> this
> >> up
> >> > properly.  (And how to set it up if pages need to be determined by
> >> > something
> >> > in the feed, etc).
> >> >
> >>
> >> No, there is no example (yet). You'll put the url with variables for the
> >> corresponding 'start' and 'count' parameters and a custom transformer
> can
> >> specify if another request needs to be made. I know it's not much to go
> on.
> >> I'll try to write some documentation on the wiki.
> >>
> >> SOLR-994 might be interesting to you. I haven't been able to look at the
> >> patch though.
> >>
> >>  https://issues.apache.org/jira/browse/SOLR-994
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
>
>
>
> --
> --Noble Paul
>


Re: DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Shalin Shekhar Mangar
On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:

> Hi,
>
> Sorry I know this exists ...
>
> "If an API supports chunking (when the dataset is too large) multiple calls
> need to be made to complete the process. XPathEntityprocessor supports this
> with a transformer. If transformer returns a row which contains a field *
> $hasMore* with a the value "true" the Processor makes another request with
> the same url template (The actual value is recomputed before invoking ). A
> transformer can pass a totally new url too for the next call by returning a
> row which contains a field *$nextUrl* whose value must be the complete url
> for the next call."
>
> But is there a true example of it's use somewhere?  Im trying to figure out
> if I know before import that I have 56 "pages" to index how to set this up
> properly.  (And how to set it up if pages need to be determined by
> something
> in the feed, etc).
>

No, there is no example (yet). You'll put the url with variables for the
corresponding 'start' and 'count' parameters and a custom transformer can
specify if another request needs to be made. I know it's not much to go on.
I'll try to write some documentation on the wiki.
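
For illustration only, a rough sketch of the shape such an entity might
take (the URL, the 'start' request parameter, and the paging transformer
are all made up here; the transformer would be custom code that returns
rows containing $hasMore / $nextUrl):

  <entity name="page" pk="link"
          processor="XPathEntityProcessor"
          url="http://example.com/feed.rss?start=${dataimporter.request.start}&amp;rows=50"
          forEach="/rss/channel/item"
          transformer="com.example.PagingTransformer">
    <field column="title" xpath="/rss/channel/item/title"/>
  </entity>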

SOLR-994 might be interesting to you. I haven't been able to look at the
patch though.

 https://issues.apache.org/jira/browse/SOLR-994
-- 
Regards,
Shalin Shekhar Mangar.


Re: Tools for Managing Synonyms, Elevate, etc.

2009-02-02 Thread Vicky_Dev

Mark,

Use a GUI (maybe a custom-built one) to read the files which are present on
the Solr server. These files can be read using a webservice/RMI call.

Do all the manipulation on the synonyms.txt contents and then make a
webservice/RMI call to save that information. After saving the information,
just call RELOAD.


Check:
http://wiki.apache.org/solr/CoreAdmin#head-3f125034c6a64611779442539812067b8b430930

 http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0 

Hope this helps

~Vikrant




Cohen, Mark - IS&T wrote:
> 
> I'm considering building some tools for our internal non-technical staff
> to write to synonyms.txt, elevate.xml, spellings.txt, and protwords.txt
> so software developers don't have to maintain them.  Before my team
> starts building these tools, has anyone done this before?  If so, are
> these tools available as open source?  
> 
> Thanks,
> Mark Cohen
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Tools-for-Managing-Synonyms%2C-Elevate%2C-etc.-tp21696372p21796832.html
Sent from the Solr - User mailing list archive at Nabble.com.



500 Errors on update

2009-02-02 Thread Derek Springer
Hi all,
I recently created a Solr index to track some news articles that I follow
and I've noticed that I occasionally receive 500 errors when posting an
update. It doesn't happen every time and I can't seem to reproduce the
error. I should mention that I have another Solr index setup under the same
instance (configured via solr.xml) and I do not seem to be having the same
issue. Also, I can query the index without issue.

Does anyone know if this is an error with the Tomcat server I have set up,
or an issue with Solr itself? Has anyone else experienced a similar issue?

If it's any help, here's a dump of the xml that caused an error:

Pinging Solr Error: HTTP Error 500: Internal Server Error

  
'The day the music died'? Hardly

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/JBV2Hu7Pisg/index.html

The plane crash that killed Buddy
Holly, Ritchie Valens and The Big Bopper has echoed through rock 'n' roll
history for 50 years, representing, if not the end of rock 'n' roll itself,
the close of an era. On Monday night, the  anniversary of the trio's deaths,
a huge tribute concert is taking place.
2009-02-02T15:43:54Z
www.cnn.com
  

  
'867-5309' number for sale on
eBay

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/rxehPnDAe7Y/index.html

Jenny's phone number is for sale, but
not for a song.
2009-02-02T18:53:42Z
www.cnn.com
  

  
Porn airs during Super Bowl

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/pCTDvXLkyb4/index.html

Super Bowl fans in Tucson, Arizona,
caught a different kind of show during Sunday's big game.
2009-02-02T17:34:43Z
www.cnn.com
  

  
Gallery: Hayden Panettiere at the big
game

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/cygh8gfbXR0/index.html

Gallery: Hayden Panettiere at the big
game
2009-02-02T14:46:26Z
www.cnn.com
  

  
Former 'Homicide' star breaks
out

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/Uxic4SVAHVo/index.html

As the critics rave and the
nominations flow in for her latest role in "Frozen River," Melissa Leo, a
veteran of the independent film scene and shows such as "Homicide," has
managed to stay grounded in her work as an actress.
2009-02-02T13:19:10Z
www.cnn.com
  

  
Don McLean: Buddy Holly was a
genius

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/eBj6NfUFKzs/index.html

Of all the unique oddities of my
career, I am perhaps proudest of the fact that I am forever linked with
Buddy Holly.
2009-02-02T20:55:16Z
www.cnn.com
  

  
Sports attorney: Phelps could lose
endorsements

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/px0QszfYZ3Y/index.html

Olympic gold medalist Michael Phelps
has acknowledged he engaged in "regrettable" behavior and "demonstrated bad
judgment," after a British newspaper published a photograph of the swimmer
using a marijuana pipe.
2009-02-02T19:21:10Z
www.cnn.com
  

  
'Taken' steals No. 1 slot at box
office

http://rss.cnn.com/~r/rss/cnn_showbiz/~3/fEoXK9HMowc/index.html

With an unexpectedly big gross of
$24.6 million, according to Sunday's early estimates, Liam Neeson's
kidnapping thriller "Taken" was the easy victor at the box office on this
Super Bowl weekend.
2009-02-01T20:50:14Z
www.cnn.com
  
  


Re: 500 Errors on update

2009-02-02 Thread Matthew Runo

Could you also provide us with the error you were getting?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 2, 2009, at 1:46 PM, Derek Springer wrote:


Hi all,
I recently created a Solr index to track some news articles that I  
follow
and I've noticed that I occasionally receive 500 errors when posting  
an

update. It doesn't happen every time and I can't seem to reproduce the
error. I should mention that I have another Solr index setup under  
the same
instance (configured via solr.xml) and I do not seem to be having  
the same

issue. Also, I can query the index without issue.

Does anyone know if this is an error with the Tomcat server I have  
set up,
or an issue with Solr itself? Has anyone else experienced a similar  
issue?


If it's any help, here's a dump of the xml that caused an error:

Pinging Solr Error: HTTP Error 500: Internal Server Error

 
   'The day the music died'? Hardly

   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/JBV2Hu7Pisg/index.html


   The plane crash that killed Buddy
Holly, Ritchie Valens and The Big Bopper has echoed through rock 'n'  
roll
history for 50 years, representing, if not the end of rock 'n' roll  
itself,
the close of an era. On Monday night, the  anniversary of the trio's  
deaths,

a huge tribute concert is taking place.
   2009-02-02T15:43:54Z
   www.cnn.com
 

 
   '867-5309' number for sale on
eBay
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/rxehPnDAe7Y/index.html


   Jenny's phone number is for  
sale, but

not for a song.
   2009-02-02T18:53:42Z
   www.cnn.com
 

 
   Porn airs during Super Bowl
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/pCTDvXLkyb4/index.html


   Super Bowl fans in Tucson,  
Arizona,

caught a different kind of show during Sunday's big game.
   2009-02-02T17:34:43Z
   www.cnn.com
 

 
   Gallery: Hayden Panettiere at the  
big

game
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/cygh8gfbXR0/index.html


   Gallery: Hayden Panettiere at  
the big

game
   2009-02-02T14:46:26Z
   www.cnn.com
 

 
   Former 'Homicide' star breaks
out
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/Uxic4SVAHVo/index.html


   As the critics rave and the
nominations flow in for her latest role in "Frozen River," Melissa  
Leo, a
veteran of the independent film scene and shows such as "Homicide,"  
has

managed to stay grounded in her work as an actress.
   2009-02-02T13:19:10Z
   www.cnn.com
 

 
   Don McLean: Buddy Holly was a
genius
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/eBj6NfUFKzs/index.html


   Of all the unique oddities of my
career, I am perhaps proudest of the fact that I am forever linked  
with

Buddy Holly.
   2009-02-02T20:55:16Z
   www.cnn.com
 

 
   Sports attorney: Phelps could lose
endorsements
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/px0QszfYZ3Y/index.html


   Olympic gold medalist Michael  
Phelps
has acknowledged he engaged in "regrettable" behavior and  
"demonstrated bad
judgment," after a British newspaper published a photograph of the  
swimmer

using a marijuana pipe.
   2009-02-02T19:21:10Z
   www.cnn.com
 

 
   'Taken' steals No. 1 slot at box
office
   
http://rss.cnn.com/~r/rss/cnn_showbiz/~3/fEoXK9HMowc/index.html


   With an unexpectedly big gross of
$24.6 million, according to Sunday's early estimates, Liam Neeson's
kidnapping thriller "Taken" was the easy victor at the box office on  
this

Super Bowl weekend.
   2009-02-01T20:50:14Z
   www.cnn.com
 
 




Re: 500 Errors on update

2009-02-02 Thread Derek Springer
Der, certainly!

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
SingleInstanceLock: write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140)
at
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:595)
type: Status report
message: Lock obtain timed out: SingleInstanceLock: write.lock


On Mon, Feb 2, 2009 at 1:51 PM, Matthew Runo  wrote:

> Could you also provide us with the error you were getting?
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
> On Feb 2, 2009, at 1:46 PM, Derek Springer wrote:
>
>  Hi all,
>> I recently created a Solr index to track some news articles that I follow
>> and I've noticed that I occasionally receive 500 errors when posting an
>> update. It doesn't happen every time and I can't seem to reproduce the
>> error. I should mention that I have another Solr index setup under the
>> same
>> instance (configured via solr.xml) and I do not seem to be having the same
>> issue. Also, I can query the index without issue.
>>
>> Does anyone know if this is an error with the Tomcat server I have set up,
>> or an issue with Solr itself? Has anyone else experienced a similar issue?
>>
>> If it's any help, here's a dump of the xml that caused an error:
>>
>> Pinging Solr Error: HTTP Error 500: Internal Server Error
>> 
>> 
>>   'The day the music died'? Hardly
>>   
>> http://rss.cnn.com/~r/rss/cnn_showbiz/~3/JBV2Hu7Pisg/index.html
>> 
>> 
>>   The plane crash that killed Buddy
>> Holly, Ritchie Valens and The Big Bopper has echoed through rock 'n' roll
>> history for 50 years, representing, if not the end of rock 'n' roll
>> itself,
>> the close of an era. On Monday night, the  anniversary of the trio's
>> deaths,
>> a huge tribute concert is taking place.
>>   2009-02-02T15:43:54Z
>>   www.cnn.com
>> 
>>
>> 
>>   '867-5309' number for sale on
>> eBay
>>   
>> http://rss.cnn.com/~r/rss/cnn_showbiz/~3/rxehPnDAe7Y/index.html
>> 
>> 
>>   Jenny's phone number is for sale, but
>> not for a song.
>>   2009-02-02T18:53:42Z
>>   www.cnn.com
>> 
>>
>> 
>>   Porn airs during Super Bowl
>>   
>> http://rss.cnn.com/~r/rss/cnn_showbiz/~3/pCTDvXLkyb4/index.html
>> 
>> 
>>   Super Bowl fans in Tucson, Arizona,
>> caught a different kind of show during Sunday's big game.

Understanding Solr memory usage

2009-02-02 Thread Matthew A. Wagner
I apologize in advance for what's probably a foolish question, but I'm
trying to get a feel for how much memory a properly-configured Solr
instance should be using.

I have an index with 2.5 million documents. The documents aren't all that
large. Our index is 25GB, and optimized fairly often.

We're consistently running out of memory. Sometimes it's a heap space
error, and other times the machine will run into swap. (The latter may not
be directly related to Solr, but nothing else is running on the box.)

We have four dedicated servers for this, each a quad Xeon with 16GB RAM. We
have one master that receives all updates, and three slaves that handle
queries. The three slaves have Tomcat configured for a 14GB heap. There
really isn't a lot of disk activity.

The machines seem underloaded to me, receiving less than one query per
second on average. Requests are served in about 300ms average, so it's not
as if we have many concurrent queries backing up.

We do use multi-field faceting in some searches. I'm having a hard time
figuring out how big of an impact this may have.

None of our caches (filter, auto-warming, etc.) are set for more than 512
documents.

Obviously, memory usage is going to be very variable, but what I'm
wondering is:
a.) Does this sound like a sane configuration, or is something seriously
wrong? It seems that many people are able to run considerably larger
indexes with considerably less resources.
b.) Is there any documentation on how the memory is being used? Is Solr
attempting to cram as much of the 25GB index into memory as possible? Maybe
I just overlooked something, but I don't know how to begin calculating
Solr's memory requirements.
c.) Does anything in the description of my Solr setup jump out at you as a
potential source of memory problems? We've increased the heap space
considerably, up to the current 14GB, and we're still running out of heap
space periodically.

Thanks in advance for any help!
-- Matt Wagner



Re: Understanding Solr memory usage

2009-02-02 Thread Mark Miller
You shouldn't need, and don't want, to give Tomcat anywhere near 14 GB
of RAM. You also certainly should not be running out of memory with
that much RAM and that few documents. Not even close.


You want to leave plenty of RAM for the filesystem cache - so that a lot 
of that 25 gig can be cached in RAM - especially with indexes that large 
(25 gig is somewhat large by index size, 2.5 million documents is not). 
You are likely starving the filesystem cache and OS of RAM. And running 
into swap just because you have given the JVM so much RAM.


You probably do want to tune your cache sizes, but thats not your 
problem here.


Try giving Tomcat a few gigs rather than 14 - the rest won't go to waste.
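(For instance, something along the lines of -Xms2g -Xmx4g in CATALINA_OPTS
instead of -Xmx14g; the exact numbers are workload dependent and worth
experimenting with.)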

- Mark

Matthew A. Wagner wrote:

I apologize in advance for what's probably a foolish question, but I'm
trying to get a feel for how much memory a properly-configured Solr
instance should be using.

I have an index with 2.5 million documents. The documents aren't all that
large. Our index is 25GB, and optimized fairly often.

We're consistently running out of memory. Sometimes it's a heap space
error, and other times the machine will run into swap. (The latter may not
be directly related to Solr, but nothing else is running on the box.)

We have four dedicated servers for this, each a quad Xeon with 16GB RAM. We
have one master that receives all updates, and three slaves that handle
queries. The three slaves have Tomcat configured for a 14GB heap. There
really isn't a lot of disk activity.

The machines seem underloaded to me, receiving less than one query per
second on average. Requests are served in about 300ms average, so it's not
as if we have many concurrent queries backing up.

We do use multi-field faceting in some searches. I'm having a hard time
figuring out how big of an impact this may have.

None of our caches (filter, auto-warming, etc.) are set for more than 512
documents.

Obviously, memory usage is going to be very variable, but what I'm
wondering is:
a.) Does this sound like a sane configuration, or is something seriously
wrong? It seems that many people are able to run considerably larger
indexes with considerably less resources.
b.) Is there any documentation on how the memory is being used? Is Solr
attempting to cram as much of the 25GB index into memory as possible? Maybe
I just overlooked something, but I don't know how to begin calculating
Solr's memory requirements.
c.) Does anything in the description of my Solr setup jump out at you as a
potential source of memory problems? We've increased the heap space
considerably, up to the current 14GB, and we're still running out of heap
space periodically.

Thanks in advance for any help!
-- Matt Wagner

  




Re: How to modify the relevance sorting in solr?

2009-02-02 Thread Chris Hostetter

: 1 support a query language, "songname + artist " or "artist + album" or "
: artist + album + songname", some guys would like to query like "because of
: you ne-yo". So I need to cut words in the proper way. How to modify the way
: of cutting words in solr ( recognize the song name or album or artist)

take a look at the dismax queryparser ... it will let you search for all 
of the words across various fields, and will let you specify in your 
configs how "significant" various fields should be in the score 
calculation.
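
for example, a request along these lines (field names assumed, just a
sketch):

  /select?defType=dismax&q=because+of+you+ne-yo&qf=songname^3+artist^2+album

would let matches in songname count for more than matches in artist or
album.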

as for specific recognition of song name or album or artist -- that's a 
slightly harder problem.  if you can describe in words how you think a 
parser should go about figuring out which part of the query string 
corresponds to which field, then you can express it in code as well, but 
there's no magic in any of the existing solr query parsers to figure it 
out.

: stop words and cut words into "because", "you", then the results like
: "because I love you" , "because you loved me" are in the front. Another bad

stop words are a function of your analyzer -- customize the analyzer you 
use in your field type and you can prevent this from happening
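
for example, the stock "text" field type includes a line like:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

removing that filter (or trimming stopwords.txt) keeps words like "of" and
"you" from being discarded at index and query time.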





-Hoss



Re: Date range query where doc has more than one date field

2009-02-02 Thread Chris Hostetter

: i have a doc which has more than one datefield. they are start and end. now
: i need the user to specify a date range, and i need to find all docs which
: user range is between the docs start and end date fields.

Assuming i'm understanding the question...
http://www.lucidimagination.com/search/document/324344f13c4d34fa/date_range_query_fields

  +startDate:[* TO $user_low] +endDate:[$user_high TO *]

...ie: the start date must be before the low point of the range the user 
specified, and the end date must be after the high point of the range the 
user specified.
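
for example, if the user asks for the range 2009-02-01 through 2009-02-28:

  +startDate:[* TO 2009-02-01T00:00:00Z] +endDate:[2009-02-28T00:00:00Z TO *]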





-Hoss



Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
this patch must help

On Mon, Feb 2, 2009 at 10:49 PM, Shalin Shekhar Mangar
 wrote:
> On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie  wrote:
>
>>
>> Is there some simple escape or other syntax to be used or is
>> this an enhancement?
>>
>
> I guess the problem is that we are creating the regex Pattern without first
> resolving the variable. So we need to call VariableResolver.resolve on the
> 'regex' attribute's value before creating the Pattern object.
>
> Please raise an issue for this change. Nice use-case though. I guess we
> never thought someone would need to use a variable in the regex attribute :)
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
--Noble Paul
Index: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/RegexTransformer.java
===
--- contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/RegexTransformer.java	(revision 740022)
+++ contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/RegexTransformer.java	(working copy)
@@ -45,12 +45,16 @@
   @SuppressWarnings("unchecked")
   public Map<String, Object> transformRow(Map<String, Object> row,
   Context context) {
+VariableResolver vr = context.getVariableResolver();
 List<Map<String, String>> fields = context.getAllEntityFields();
 for (Map<String, String> field : fields) {
   String col = field.get(DataImporter.COLUMN);
   String reStr = field.get(REGEX);
+  reStr = vr.replaceTokens(reStr);
   String splitBy = field.get(SPLIT_BY);
+  splitBy =  vr.replaceTokens(splitBy);
   String replaceWith = field.get(REPLACE_WITH);
+  replaceWith = vr.replaceTokens(replaceWith);
   if (reStr != null || splitBy != null) {
 String srcColName = field.get(SRC_COL_NAME);
 if (srcColName == null) {


Re: DIH FileListEntityProcessor recursion and fileName clash

2009-02-02 Thread Fergus McMenemie
Shalin,

OK!

I got myself a JIRA account, opened SOLR-1000, and followed the
wiki instructions on creating a patch, which I have now uploaded! The only
problem is that while the fix seems fine, the test case I added to
TestFileListEntityProcessor.java fails. I need somebody who knows
what they are doing to point out what I am doing wrong and/or how
to debug test failures.

It would also be nice if I knew how to run or debug one Junit
test rather than all of them, which takes almost 8min.



  @Test
  public void testRECURSION() throws IOException {
long time = System.currentTimeMillis();
File childdir = new File("." + time + "/child" );
childdir.mkdirs();
childdir.deleteOnExit();
createFile(childdir, "a.xml", "a.xml".getBytes(), true);
createFile(childdir, "b.xml", "b.xml".getBytes(), true);
createFile(childdir, "c.props", "c.props".getBytes(), true);
Map attrs = AbstractDataImportHandlerTest.createMap(
FileListEntityProcessor.FILE_NAME, "^.*\\.xml$",
FileListEntityProcessor.BASE_DIR, childdir.getAbsolutePath(),
FileListEntityProcessor.RECURSIVE, true);
Context c = AbstractDataImportHandlerTest.getContext(null,
new VariableResolverImpl(), null, 0, Collections.EMPTY_LIST, attrs);
    FileListEntityProcessor fileListEntityProcessor = new FileListEntityProcessor();
fileListEntityProcessor.init(c);
    List<String> fList = new ArrayList<String>();
while (true) {
  // add the documents to the index
      Map<String, Object> f = fileListEntityProcessor.nextRow();
  if (f == null)
break;
  fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
}
System.out.println("List of files indexed -- " + fList);
Assert.assertEquals(3, fList.size());
  }

Regards Fergus.

>On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie  wrote:
>
>> Hello
>>
>> I have been trying to find out why DIH in FileListEntityProcessor
>> mode did not appear to be recursing into subdirectories. Going through
>> FileListEntityProcessor.java I eventually tumbled to the fact that my
>> filename filter setting from data-config.xml also applied to directory
>> names.
>
>
>Hmm, not good.
>
>
>>
>>
>> <entity processor="FileListEntityProcessor"
>>   fileName=".*\.xml"
>>   newerThan="'NOW-1000DAYS'"
>>   recursive="true"
>>   rootEntity="false"
>>   dataSource="null"
>>   baseDir="/Volumes/spare/ts/stuff/ford">
>>
>> Now, I feel that the fieldName filter should be applied to files fed
>> into the parser, it should not be applied to the directory names we are
>> recursing through. I bodged the code as follows to adjust the behavior
>> so  that the "FileName" and "excludes" attributes of "entity" only
>> apply to filenames and not directory names.
>
>
>I agree with you.
>
>Perhaps we can have separate filters for directories and files but let's
>hold on till the need comes up.
>
>>
>>
>> It now recurses though my directory tree only indexing the appropriate
>> files! I think the new behavior is more standard.
>>
>> Is this a change valid?
>
>
>Absolutely. Can you please create an issue and attach the patch? Thanks!
>
>-- 
>Regards,
>Shalin Shekhar Mangar.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: ERROR trying to just commit via /update

2009-02-02 Thread Chris Hostetter

: I am trying to do just a commit via  url:
: http://localhost:8084/nightly_web/es_jobs_core/update
: I have tryeid also:
: http://localhost:8084/nightly_web/es_jobs_core/update?commit=true
: And I am getting this error:
: 
: 2009-01-20 11:27:50,424 [http-8084-Processor25] ERROR
: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:
: missing content stream


: It looks like Solr is asking me a file with info to update (it does the
: commit after that). I just need to do a commit. The problem has appered
: because I am using the scripts of Solr Collection Distribution and when I
: try to do a snapinstaller it calls the commit script... and the commit script
: tries to do what I wrote above.
: Am I missing something or is there something wrong in there...?

the first URL you mentioned should in fact cause an exception like the one 
you mentioned -- but even though that's the URL the commit script hits with 
curl, you shouldn't see that exception from the commit script -- because 
the commit script does do an HTTP post with a "document" containing 
"<commit/>"...

rs=`curl ${curl_url} -s -H 'Content-type:text/xml; charset=utf-8' -d 
"<commit/>"`

...so i'm not sure how/why you would get that error from the commit 
script.

the second url you mentioned should work without an exception, even if 
you hit it using an http GET (i've confirmed this against the trunk, i'm 
not 100% certain that it worked that way in 1.3) ... so i'm not sure what 
exactly might be going on for you at the moment.


-Hoss



Re: DIH - Example of using $nextUrl and $hasMore

2009-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Feb 2, 2009 at 9:20 PM, Jon Baer  wrote:
> Hi,
>
> Sorry I know this exists ...
>
> "If an API supports chunking (when the dataset is too large) multiple calls
> need to be made to complete the process. XPathEntityprocessor supports this
> with a transformer. If transformer returns a row which contains a field *
> $hasMore* with a the value "true" the Processor makes another request with
> the same url template (The actual value is recomputed before invoking ). A
> transformer can pass a totally new url too for the next call by returning a
> row which contains a field *$nextUrl* whose value must be the complete url
> for the next call."
>
> But is there a true example of it's use somewhere?  Im trying to figure out
> if I know before import that I have 56 "pages" to index how to set this up
> properly.  (And how to set it up if pages need to be determined by something
> in the feed, etc).
Let us assume that we are working w/ the Solr xml interface as the
datasource; the url may contain start=x&rows=y. Assume that we have
hundreds of rows to be fetched and we wish to chunk it.

you can change the variable 'start' on each xml fetched (it does not
hurt even if it is set for each row) and you can compute $hasMore
from the xml itself.

Setting a variable can be done by putting it into the returned row
from a transformer.
>
> Thanks.
>
> - Jon
>



-- 
--Noble Paul


Re: Dynamic fields in schema.xml file

2009-02-02 Thread Vicky_Dev

Hi Sagar,

Change the dynamic field attributes (for C, D and E) to stored="true" and
validate.

If the above suggestion is not working, can you share your schema.xml and
solrconfig.xml contents?
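
For example, something along these lines in schema.xml (a sketch only;
adjust the pattern and type to your schema):

<dynamicField name="*" type="text" indexed="true" stored="true"/>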


~Vikrant


Sagar Khetkade-2 wrote:
> 
> 
> 
> Hi,
>  
> I am trying out the dynamic field in schema.xml with its attribute as
> true. Right now I am indexing 1 articles having five fields, in which
> two fields are explicitly mentioned as text fields and the others are
> dynamic fields. But while searching, if the query is fired on the last of
> the dynamic fields then the results are retrieved, but not for the earlier
> dynamic fields. 
>  
> Eg:  A, B, C, D and E  are the fields. A and B are the explicitly
> mentioned fields with A fields having attribute as indexed=true and
> stored=true and B is having indexed=true. Here A field is unique so is a
> required fields.
> Other fields coming (C, D and E) are considered as dynamic fields with
> attribute indexed=true. So if search is made on E the results are
> retrieved but results are not coming while search on C and D. 
> I have also cross verified with the document frequency count. The document
> frequency count is coming for the A, B and E fields but not for the C and D
> fields. 
> I am stuck up on this issue. Is the schema ( my thinking ) is wrong or
> something else. 
>  
> Regards,
> Sagar Khetkade
> _
> Plug in to the MSN Tech channel for a full update on the latest gizmos
> that made an impact.
> http://computing.in.msn.com/
> 

-- 
View this message in context: 
http://www.nabble.com/Dynamic-fields-in-schema.xml-file-tp21784970p21796922.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH FileListEntityProcessor recursion and fileName clash

2009-02-02 Thread Shalin Shekhar Mangar
On Mon, Feb 2, 2009 at 10:08 PM, Fergus McMenemie  wrote:

> Shalin,
>
> OK!
>
> I got myself a JIRA account and opened solr-1000 and followed the
> wiki instructions on creating a patch which I have now uploaded! Only
> problem is that while the fix seems fine the test case I added to
> TestFileListEntityProcessor.java fails. I need somebody who knows
> what they are doing to point out what I am doing wrong and/or how
> to debug test failures.
>

Thanks!

I'll take a look at the test.


>
> It would also be nice if I knew how to run or debug one Junit
> test rather than all of them, which takes almost 8min.
>

The following command can run a single test:
ant -Dtestcase=TestFileListEntityProcessor test

Also, since DIH is a contrib inside solr, you can execute the "test-contrib"
ant target to run only the tests included in contrib projects.

PS: Congratulations on being the lucky one to create the 1000th issue in Solr
:-)

-- 
Regards,
Shalin Shekhar Mangar.


Re: facet dates and distributed search

2009-02-02 Thread Chris Hostetter

: Hey there, I would like to understand why distributed search doesn't suport
: facet dates. As I understand it would have problems because if the time of
: the servers is not syncronized, the results would not be exact but... In
: case I wouldn't mind if results are completley exacts... would be possible
: to use facet dates on distributd search?

other than the clock sync issues you mentioned, i can't think of any reason 
why it wouldn't work -- that's typically not going to be a problem in the 
common case, because people usually specify a start/end that are rounded 
anyway, so even with a few seconds of clock drift the rounded values will 
probably be fine.
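
e.g. a typical single-node date facet request looks something like this 
(field name assumed):

  facet=true&facet.date=pubdate&facet.date.start=NOW/DAY-30DAYS
      &facet.date.end=NOW/DAY&facet.date.gap=%2B1DAY

with both start and end rounded to day boundaries (note the '+' in the gap 
has to be url-escaped as %2B).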

in general it should be safe to just merge the NamedLists, summing the 
values of any keys present in multiple lists

: In case I am completely wrong with this explanation... can someone explain
: me the reason why it's not suported? If I understand it maybe I could try to
: do a path ...

i suspect the reason it doesn't work is because no one's gotten around to 
it.  if you'd like to work on it take a look at the FacetComponent. on a 
single instance request the process method delegates to the SimpleFacets 
object to do all the work (including the date faceting) but on a 
distributed request the distributedProcess method deals with merging all of 
the data from the various shards ... how you go about getting the date 
faceting counts from the responses from all the other shards is something 
you'd have to ask the folks who worked on distributed searching ... it's 
still magic to me.




-Hoss



Re: search/query issue. sorting, match exact, match first etc

2009-02-02 Thread Chris Hostetter

Have you checked the archive for other discussions about implementing 
auto-complete functionality?  it's not something i deal with much, but i 
know it's been discussed.

your specific requirement that things starting with an exact match be 
ordered alphabetically seems odd to me ... i suspect sorting by some sort of 
"popularity" or document boost might be better ... but either way i think 
the exact behavior you're looking for is going to require a custom plugin 
-- i suspect using SpanStartQueries on a field type that uses ngram 
tokenization.  (looking into those concepts should help get you started)
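
for example, the ngram half of that might look something like this in 
schema.xml (an untested sketch -- solr.EdgeNGramFilterFactory may only be 
available in newer versions, and the ordering logic would still need the 
custom plugin work described above):

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>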


: I am trying to utilize solr into an autocomplete thingy.
: 
: Let's assume I query for 'foo'.
: Assuming we work with case insensitive here.
: 
: I would like to have records returned in specific order. First all that
: have exact match, then all that start with Foo in alphabetical order,
: then all that contain the exact word (but not necessarily first) and
: lastly all matches where foo is anywhere within words.
: Any pointers are more than welcome. I am trying to find something in
: archives as well but no luck so far.
: 
: Example response when searching 'foo' or 'Foo':
: 
: Foo
: Foo AAA
: Foo BBB
: Gooo Foo
: Moo Foo
: xxxfoox
: Boo Foos
: 



-Hoss



Best testing practices for an application using SOLR?

2009-02-02 Thread Bruno Aranda
Hi,

I am writing my first application using Solr and I was wondering if there is
any best practice or how are users implementing their JUnit or integration
tests.

Thanks!

Bruno


Re: Method toMultiMap(NamedList params) in SolrParams

2009-02-02 Thread Chris Hostetter

: I'm getting confused about the method Map<String,String[]>
: toMultiMap(NamedList params) in SolrParams class.

toMultiMap probably shouldn't have ever been made public -- it's really 
only meant to be used by toSolrParams (it's refactored out to make the code 
easier to read)

: When one of your parameters is instanceof String[] it's converted to
: String using the toString() method, which seems
: to me to be wrong. It is probably assuming, that the values in NamedList are
: all String, but when you look at the method

It's not assuming that the NamedList only contains Strings -- it's 
assuming that since it needs to produce a String[] then every object in 
the NamedList should be toString()ed to get a String.  in your case since 
the object is already String[] that seems silly -- but you have to keep in 
mind the whole point is to build a String[] out of *multiple* objects 
(that have the same key in the NamedList) ... so if your NamedLIst 
contained two String[]'s with the same key, what would you expect it to do 
if the behavior was different (union the two arrays?)
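
in other words, something like this (a standalone sketch of the behavior 
being described, not the actual Solr source -- the NamedList is simplified 
to parallel key/value arrays):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class MultiMapSketch {
      // each value is toString()ed, and values sharing a key are collected
      // into one String[]; a String[] value therefore ends up as its
      // toString(), e.g. "[Ljava.lang.String;@1a2b3c"
      public static Map<String,String[]> toMultiMap(String[] keys, Object[] vals) {
        Map<String,List<String>> tmp = new LinkedHashMap<String,List<String>>();
        for (int i = 0; i < keys.length; i++) {
          List<String> l = tmp.get(keys[i]);
          if (l == null) { l = new ArrayList<String>(); tmp.put(keys[i], l); }
          l.add(vals[i].toString());  // <-- the toString() in question
        }
        Map<String,String[]> out = new LinkedHashMap<String,String[]>();
        for (Map.Entry<String,List<String>> e : tmp.entrySet())
          out.put(e.getKey(), e.getValue().toArray(new String[e.getValue().size()]));
        return out;
      }
    }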





-Hoss



Re: faceting question

2009-02-02 Thread Chris Hostetter

: is there no other way then to use the patch?

the patch was committed a while back, but it will require experimenting 
with the trunk.

: > If I understand correctly,
: > 1. You want to query for tagList:a AND tagList:b AND tagList:c
: > 2. At the same time, you want to request facets for tagList but only for
: > tagList:a and tagList:b
: >
: > If that is correct, you can use the features introduced by
: > https://issues.apache.org/jira/browse/SOLR-911
: >
: > However you may need to put #1 as fq instead of q.



-Hoss



Re: question about dismax and parentheses

2009-02-02 Thread Chris Hostetter

: seems to be i can't do this. so my question transforms into the following:
: 
: can i join multiple dismax queries into one? for instance if i'm looking for
: +WORD1 +(WORD2 WORD3)
: it can be translated into +WORD1 +WORD2 and +WORD1 +WORD3 query

can it be done?  sure. you could do that in your client before sending the 
query to solr, or you could write a little SearchComponent to do it on the 
server side.


-Hoss



Re: URL-import field type?

2009-02-02 Thread Chris Hostetter

: But we do not have an inbuilt TokenFilter which does that. Nor does
: DIH support it now. I have opened an issue for DIH
: (https://issues.apache.org/jira/browse/SOLR-980)
: Is it desirable to have a TokenFilter which offers similar functionality?

Probably not (you would have to have a way of configuring what kind of 
analysis would be done on the file)

My point was specifically about the original poster's use case: he said he 
already had a TokenFilter that parsed the URL target the way he wanted -- 
in which case it's easy for him to keep using that TokenFilter by 
writing a factory for it.
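
such a factory is only a few lines -- a rough sketch, where 
UrlContentTokenFilter is a placeholder for whatever the existing custom 
filter is actually called:

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.BaseTokenFilterFactory;

    // wrap the existing custom TokenFilter (UrlContentTokenFilter is a
    // hypothetical name) so it can be declared in schema.xml
    public class UrlContentTokenFilterFactory extends BaseTokenFilterFactory {
      public TokenStream create(TokenStream input) {
        return new UrlContentTokenFilter(input);
      }
    }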



-Hoss



Unsubscribing

2009-02-02 Thread Ross MacKinnon
I've tried multiple times to unsubscribe from this list using the proper method 
(mailto:solr-user-unsubscr...@lucene.apache.org), but it's not working!  Can 
anyone help with that?
 


Re: SOLR - indexing Date/Time in local format?

2009-02-02 Thread Chris Hostetter

: We use Solr1.3 and indexed some of our date fields in the format
: '1995-12-31T23:59:59Z' and as we know this is a UTC date. But we do want to
: index the date in IST  which is +05:30hours so that extra conversion from
: UTC to IST across all our application is avoided.

There's no way to do this with Solr/DateField right now -- in general we 
don't want Solr to have to make any assumptions about what timezone the 
client is in, and we don't want the client to have to know what timezone 
the server is in -- that's why we deal with UTC.  The clients all only 
need to know their own timezone to parse/format the date properly.

if you really want this handled on the server side, you could write an 
UpdateProcessor to deal with this.
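
a very rough sketch of what such a processor might look like (the field 
name "mydate" is hypothetical, and this naive version assumes the value 
already arrives as a java.util.Date -- string dates would need parsing 
first; it also hard-codes the +05:30 offset, which works for IST only 
because IST has no daylight saving):

    import java.io.IOException;
    import java.util.Date;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    // shift a date field by +05:30 before indexing so the stored value
    // reads as IST rather than UTC (a hack -- see the caveats above)
    public class IstShiftProcessor extends UpdateRequestProcessor {
      static final long IST_OFFSET_MS = (5 * 60 + 30) * 60 * 1000L;

      public IstShiftProcessor(UpdateRequestProcessor next) { super(next); }

      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.solrDoc;
        Object v = doc.getFieldValue("mydate");   // hypothetical field name
        if (v instanceof Date) {
          doc.setField("mydate", new Date(((Date) v).getTime() + IST_OFFSET_MS));
        }
        super.processAdd(cmd);
      }
    }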

: 2) And we have some confusion on how the flexible search functions such as
: (NOW, NOW+1DAY etc) provided by DateMathParser works? Now() is being
: calculated upon considering the date indexed as  UTC or  Localtime? Can we
: have the NOW() results in IST if the date indexed is in IST?

"as  UTC or  Localtime" is meaningless in the context of when "NOW" is.  
TimeZones and Locale (UTC, IST, your localtimezone, etc...) only affect 
parsing, formatting, and rounding of date values. NOW means "this moment 
in time" which is agnostic to what you do with it.

When you start rounding date values (ie: NOW/DAY) that is relative to UTC.

(It would be nice if the client could specify timezone info in the request 
for the purposes of DateMath rounding, but there isn't an easy way for 
that info to be passed into the DateField methods)

Something i've seen people do to get dates rounded to a particular 
timezone is to add the offset for that timezone after rounding, but that 
doesn't take into account daylight savings time...

   start:[NOW/DAY+5HOUR+30MIN TO *]
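
(internally, DateMathParser itself can round in an arbitrary TimeZone if 
you construct it directly -- the hard part is only plumbing that choice 
through DateField.  a sketch, for anyone writing server-side code:)

    import java.text.ParseException;
    import java.util.Date;
    import java.util.Locale;
    import java.util.TimeZone;
    import org.apache.solr.util.DateMathParser;

    public class IstRounding {
      // round "now" to the start of the current day in IST rather than UTC
      public static Date startOfTodayIst() throws ParseException {
        DateMathParser p =
            new DateMathParser(TimeZone.getTimeZone("Asia/Calcutta"), Locale.US);
        return p.parseMath("/DAY");
      }
    }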




-Hoss



Re: Datemath Now is UST or IST?

2009-02-02 Thread Chris Hostetter

deja vu...

http://www.nabble.com/SOLR---indexing-Date-Time-in-local-format--to21663464.html


: We use Solr1.3 and indexed some of our date fields in the format
: '1995-12-31T23:59:59Z' and as we know this is a UTC date. But we do want to
: index the date in IST  which is +05:30hours so that extra conversion from
: UTC to IST across all our application is avoided. How to do that?
: 
: And we have some confusion on how the flexible search functions such as
: (NOW, NOW+1DAY etc) provided by DateMathParser works? Now() is being
: calculated upon considering the date indexed as  UTC or  Localtime? Can we
: have the NOW() results in IST if the date indexed is in IST?
: 
: Thanks,
: Kalidoss.m,
: 



-Hoss



Re: Solr configuration for queries

2009-02-02 Thread Chris Hostetter
: I need to configure solr, such that it doesn't do any fancy stuff like
: adding adding wildcard characters to normal query, check for existing
: fields, etc.
: 
: I've modified lucene code for Term queries(can be multiple terms) and I need
: to process only term queries. But solr modifies queries and converts them to
: range queries. I just need that solr simply pass the query to lucene
: IndexSearcher and do nothing else in between. Is it possible?

i'm not entirely sure i understand what you mean ("check for existing 
fields" ?!?!) but you may want to take a look at the FieldQParserPlugin, 
it sounds like it might meet your needs -- if not, it will probably serve 
as a good starting point for implementing your own QParser to generate 
exactly the type of query you want based on the input.
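
for the second option the skeleton is small -- a made-up sketch (the 
"body" field name is just for illustration) of a QParser that emits a 
single unanalyzed TermQuery and nothing else:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    // passes the raw query string straight through as one TermQuery,
    // with no analysis, no range parsing, no "fancy stuff"
    public class RawTermQParserPlugin extends QParserPlugin {
      public void init(NamedList args) {}

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          public Query parse() throws ParseException {
            return new TermQuery(new Term("body", getString())); // "body" is hypothetical
          }
        };
      }
    }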



-Hoss



Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Lance Norskog
A separate problem: when I used the DIH in December, the xpath
implementation had few features.  '[@qualifier='Date']' may not be
supported.

  


On Mon, Feb 2, 2009 at 9:24 AM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> this patch must help
>
> On Mon, Feb 2, 2009 at 10:49 PM, Shalin Shekhar Mangar
>  wrote:
> > On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie 
> wrote:
> >
> >>
> >> Is there some simple escape or other syntax to be used or is
> >> this an enhancement?
> >>
> >
> > I guess the problem is that we are creating the regex Pattern without
> first
> > resolving the variable. So we need to call VariableResolver.resolve on
> the
> > 'regex' attribute's value before creating the Pattern object.
> >
> > Please raise an issue for this change. Nice use-case though. I guess we
> > never thought someone would need to use a variable in the regex attribute
> :)
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> --Noble Paul
>



-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)


Re: Understanding Solr memory usage

2009-02-02 Thread Lance Norskog
How many total values are in the faceted fields? Not just in the faceted
query, but the entire index? A facet query builds a counter array for the
entire space of field values.  This can take much more RAM than normal
queries. Sorting is also a memory-eater.
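
(Rough, made-up arithmetic to illustrate: one int counter per unique value 
means a field with 10 million unique terms needs about 10,000,000 * 4 bytes 
= ~40MB of counters per faceted field, on top of the field cache itself.)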

On Mon, Feb 2, 2009 at 2:19 PM, Mark Miller  wrote:

> You shouldn't need, and don't want, to give tomcat anywhere near 14 GB of
> RAM. You also certainly should not be running out of memory with that
> much RAM and that few documents. Not even close.
>
> You want to leave plenty of RAM for the filesystem cache - so that a lot of
> that 25 gig can be cached in RAM - especially with indexes that large (25
> gig is somewhat large by index size, 2.5 million documents is not). You are
> likely starving the filesystem cache and OS of RAM. And running into swap
> just because you have given the JVM so much RAM.
>
> You probably do want to tune your cache sizes, but thats not your problem
> here.
>
> Try giving tomcat a few gig rather than 14 - the rest won't go to waste.
>
> - Mark
>
>
> Matthew A. Wagner wrote:
>
>> I apologize in advance for what's probably a foolish question, but I'm
>> trying to get a feel for how much memory a properly-configured Solr
>> instance should be using.
>>
>> I have an index with 2.5 million documents. The documents aren't all that
>> large. Our index is 25GB, and optimized fairly often.
>>
>> We're consistently running out of memory. Sometimes it's a heap space
>> error, and other times the machine will run into swap. (The latter may not
>> be directly related to Solr, but nothing else is running on the box.)
>>
>> We have four dedicated servers for this, each a quad Xeon with 16GB RAM.
>> We
>> have one master that receives all updates, and three slaves that handle
>> queries. The three slaves have Tomcat configured for a 14GB heap. There
>> really isn't a lot of disk activity.
>>
>> The machines seem underloaded to me, receiving less than one query per
>> second on average. Requests are served in about 300ms average, so it's not
>> as if we have many concurrent queries backing up.
>>
>> We do use multi-field faceting in some searches. I'm having a hard time
>> figuring out how big of an impact this may have.
>>
>> None of our caches (filter, auto-warming, etc.) are set for more than 512
>> documents.
>>
>> Obviously, memory usage is going to be very variable, but what I'm
>> wondering is:
>> a.) Does this sound like a sane configuration, or is something seriously
>> wrong? It seems that many people are able to run considerably larger
>> indexes with considerably less resources.
>> b.) Is there any documentation on how the memory is being used? Is Solr
>> attempting to cram as much of the 25GB index into memory as possible?
>> Maybe
>> I just overlooked something, but I don't know how to begin calculating
>> Solr's memory requirements.
>> c.) Does anything in the description of my Solr setup jump out at you as a
>> potential source of memory problems? We've increased the heap space
>> considerably, up to the current 14GB, and we're still running out of heap
>> space periodically.
>>
>> Thanks in advance for any help!
>> -- Matt Wagner
>>
>>
>>
>
>


-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)


Re: Unsubscribing

2009-02-02 Thread kirk beers
I am having the same issue, can't get unsubscribed !!

On Mon, Feb 2, 2009 at 8:45 PM, Ross MacKinnon  wrote:

> I've tried multiple times to unsubscribe from this list using the proper
> method (mailto:solr-user-unsubscr...@lucene.apache.org), but it's not
> working!  Can anyone help with that?
>
>


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
this syntax is supported: /record/metadata/date[@qualifier='Date'], if
I am not wrong, and there is a testcase for that as well

On Tue, Feb 3, 2009 at 7:20 AM, Lance Norskog  wrote:
> A separate problem: when I used the DIH in December, the xpath
> implementation had few features.  '[@qualifier='Date']' may not be
> supported.
>
>   dateTimeFormat="MMdd"   />
>
>
> On Mon, Feb 2, 2009 at 9:24 AM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.p...@gmail.com> wrote:
>
>> this patch must help
>>
>> On Mon, Feb 2, 2009 at 10:49 PM, Shalin Shekhar Mangar
>>  wrote:
>> > On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie 
>> wrote:
>> >
>> >>
>> >> Is there some simple escape or other syntax to be used or is
>> >> this an enhancement?
>> >>
>> >
>> > I guess the problem is that we are creating the regex Pattern without
>> first
>> > resolving the variable. So we need to call VariableResolver.resolve on
>> the
>> > 'regex' attribute's value before creating the Pattern object.
>> >
>> > Please raise an issue for this change. Nice use-case though. I guess we
>> > never thought someone would need to use a variable in the regex attribute
>> :)
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
> 650-922-8831 (US)
>



-- 
--Noble Paul


Re: Recent document boosting with dismax

2009-02-02 Thread James Brady
Hi, no, the date_added field was one per document.
2009/2/1 Erik Hatcher 

> Is your date_added field multiValued and you've assigned multiple values to
> some documents?
>
>Erik
>
>
> On Jan 31, 2009, at 4:12 PM, James Brady wrote:
>
>  Hi, I'm following the recipe here:
>>
>> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
>> for boosting recent documents: bf=recip(rord(date_added),1,1000,1000)
>>
>> On some of my servers I've started getting errors like this:
>>
>> SEVERE: java.lang.RuntimeException: there are more terms than documents in
>> field "date_added", but it's impossible to sort on tokenized fields
>> at
>>
>> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:379)
>> at
>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>> at
>>
>> org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
>> at
>>
>> org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:55)
>> at
>>
>> org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:56)
>> at
>>
>> org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:103)
>> at
>>
>> org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:81)
>> at
>>
>> org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:232)
>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:143)
>> at org.apache.lucene.search.Searcher.search(Searcher.java:118)
>> ...
>>
>> The date_added field is stored as a vanilla Solr date type:
>>   > omitNorms="true"/>
>>
>> I'm having lots of other problems (un-related) with corrupt indices -
>> could
>> it be that in running the org.apache.lucene.index.CheckIndex utility, and
>> losing some documents in the process, the ordinal part of my boost
>> function
>> is permanently broken?
>>
>> Thanks!
>> James
>>
>
>


New wiki pages

2009-02-02 Thread Lance Norskog
http://wiki.apache.org/solr/SchemaDesign
http://wiki.apache.org/solr/LargeIndexes
http://wiki.apache.org/solr/UniqueKey

These pages are based on my recent experience and some generalizations. They
are intended for new users who want to use Solr for a major project.  Please
review them and send me comments.

For example: "they are stupid",  "the wiki has no links to them and those
links should be here", etc.

-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Fergus McMenemie
The Solr date field is populated properly. So I guess that bit works. 
I really wish I could use xpath="//para"

>A separate problem: when I used the DIH in December, the xpath
>implementation had few features.  '[@qualifier='Date']' may not be
>supported.
>
>  dateTimeFormat="MMdd"   />
>
>
>On Mon, Feb 2, 2009 at 9:24 AM, Noble Paul നോബിള്‍ नोब्ळ् <
>noble.p...@gmail.com> wrote:
>
>> this patch must help
>>
>> On Mon, Feb 2, 2009 at 10:49 PM, Shalin Shekhar Mangar
>>  wrote:
>> > On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie 
>> wrote:
>> >
>> >>
>> >> Is there some simple escape or other syntax to be used or is
>> >> this an enhancement?
>> >>
>> >
>> > I guess the problem is that we are creating the regex Pattern without
>> first
>> > resolving the variable. So we need to call VariableResolver.resolve on
>> the
>> > 'regex' attribute's value before creating the Pattern object.
>> >
>> > Please raise an issue for this change. Nice use-case though. I guess we
>> > never thought someone would need to use a variable in the regex attribute
>> :)
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
>-- 
>Lance Norskog
>goks...@gmail.com
>650-922-8831 (US)

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-02 Thread Shalin Shekhar Mangar
On Tue, Feb 3, 2009 at 11:59 AM, Fergus McMenemie  wrote:

> The solr data field is populated properly. So I guess that bit works.
> I really wish I could use xpath="//para"
>
>
The limitation comes from streaming the XML instead of creating a DOM.
XPathRecordReader is a custom streaming XPath parser implementation and
streaming is easy only because we limit the syntax. You can use
PlainTextEntityProcessor which gives the XML as a string to a  custom
Transformer. This Transformer can create a DOM, run your XPath query and
populate the fields. It's more expensive but it is an option.
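
A rough sketch of that approach (it assumes PlainTextEntityProcessor has
put the whole XML document into a row field -- called "plainText" here --
and that every //para should be joined into one "para" field; the names
are illustrative):

    import java.io.StringReader;
    import java.util.Map;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    // build a DOM from the row's XML, run a full XPath over it, and put
    // the concatenated text of every matching node back into the row
    public class ParaXPathTransformer extends Transformer {
      public Object transformRow(Map<String, Object> row, Context ctx) {
        try {
          String xml = (String) row.get("plainText");   // field name assumed
          Document dom = DocumentBuilderFactory.newInstance()
              .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
          NodeList paras = (NodeList) XPathFactory.newInstance().newXPath()
              .evaluate("//para", dom, XPathConstants.NODESET);
          StringBuilder sb = new StringBuilder();
          for (int i = 0; i < paras.getLength(); i++)
            sb.append(paras.item(i).getTextContent()).append(' ');
          row.put("para", sb.toString().trim());
        } catch (Exception e) {
          throw new RuntimeException(e);  // keep the sketch short
        }
        return row;
      }
    }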
-- 
Regards,
Shalin Shekhar Mangar.


field range (min and max term)

2009-02-02 Thread Ben Incani
Hi Solr users,

Is there a method of retrieving a field range, i.e. the min and max
values of that field's term enum?

For example I would like to know the first and last date entry of N
documents.

Regards,

-Ben
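
(Not an answer from this thread, but one common way to get this: sort on
the field ascending and descending with rows=1.  A SolrJ sketch -- the
server URL and the "date" field name here are assumptions:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FieldRange {
      public static void main(String[] args) throws Exception {
        // min = first doc sorted ascending, max = first doc sorted descending
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr"); // assumed URL
        SolrQuery minQ = new SolrQuery("*:*").setRows(1)
            .addSortField("date", SolrQuery.ORDER.asc);  // "date" is hypothetical
        SolrQuery maxQ = new SolrQuery("*:*").setRows(1)
            .addSortField("date", SolrQuery.ORDER.desc);
        QueryResponse min = server.query(minQ);
        QueryResponse max = server.query(maxQ);
        System.out.println(min.getResults().get(0).getFieldValue("date"));
        System.out.println(max.getResults().get(0).getFieldValue("date"));
      }
    }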