Re: Problem adding new requesthandler to solr branch_3x

2011-03-05 Thread Paul Rogers
Koji

Many thanks for that.

Regards

Paul

On 5 March 2011 00:12, Koji Sekiguchi  wrote:

> 
>>
>> If this is amended to read:
>>
>> true
>>
>> the solr-example starts fine.
>>
>
> Paul,
>
> It should be true.
>
> Koji
> --
> http://www.rondhuit.com/en/
>


Re: Location of Main Class in Solr?

2011-03-05 Thread Koji Sekiguchi

(11/03/04 3:30), Anurag wrote:

I searched the SolrIndexSearcher.java file, but there is no main class. I wanted
to know where this class resides. Can I call this main class (if it
exists) using command-line options in a terminal, rather than through the WAR
file?


Kumar,

I think you may want to use EmbeddedSolrServer from your main method.
Please see:

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

Koji
--
http://www.rondhuit.com/en/


Re: Help please - recursively indexing lots and lots of text files

2011-03-05 Thread Estrada Groups
Nutch will also handle this, but I'd probably stick with the DIH, as Steve
suggested. On Windows it's pretty easy to get a list of all the .txt files by
using

dir /b/s *.txt > files.txt
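For anyone not on Windows, a rough cross-platform equivalent of that dir command is sketched here in Python. It only builds the file list (one absolute path per line, like dir /b/s); it does not send anything to Solr, and the function names, root directory and files.txt output name are placeholders. Unlike dir, the *.txt pattern is case-sensitive on most non-Windows filesystems.

```python
from pathlib import Path


def list_txt_files(root):
    """Recursively collect *.txt files under root, similar to `dir /b/s *.txt`."""
    # rglob walks every subdirectory; matching is case-sensitive on most non-Windows systems.
    return sorted(str(p.resolve()) for p in Path(root).rglob("*.txt") if p.is_file())


def write_file_list(root, out_path="files.txt"):
    """Write one absolute path per line to out_path and return how many were written."""
    paths = list_txt_files(root)  # collected before out_path is created
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)
```

The resulting list could then feed whatever does the actual indexing (a script, or the DIH approach Steve describes).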

Just my $0.02 ;-)

Adam


On Mar 4, 2011, at 5:52 PM, Steven A Rowe  wrote:

> Hi Colin,
> 
> Solr's DataImportHandler sounds like what you want:
> 
> http://wiki.apache.org/solr/DataImportHandler
> 
> In particular, take a look at FileListEntityProcessor:
> 
> http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
> 
> Steve
> 
>> -----Original Message-----
>> From: csm [mailto:cmcswig...@gmail.com]
>> Sent: Friday, March 04, 2011 5:50 PM
>> To: solr-user@lucene.apache.org
>> Subject: Help please - recursively indexing lots and lots of text files
>> 
>> Hi,
>> 
>> I'm new to Lucene/Solr and I'm trying to build an index of a large body of
>> plaintext files for some corpus research that I'm doing. There are about
>> 37,000 files of typically 50-100 lines each, and they're scattered
>> throughout a huge nested directory structure. I've worked through the
>> basic Solr tutorial and the text/html indexing tutorial at
>> http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407
>> but after some looking around, I haven't been able to find any resources
>> for indexing a large number of text files that aren't all sitting in the
>> same directory.
>> 
>> Is this simply a case of having to write a shell script to crawl through
>> the whole directory tree and call cURL for every single file, or is there
>> a library or utility that can do this, or just an easier way? Any help
>> would be greatly appreciated! Alternatively, if this is a solved problem
>> and I just need to RTFM, it'd be great if someone could point me in the
>> right direction.
>> 
>> Thanks a lot,
>> Colin
>> 
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Help-please-recursively-indexing-lots-and-lots-of-text-files-tp2635884p2635884.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Location of Main Class in Solr?

2011-03-05 Thread Anurag
Thanks.

On Sat, Mar 5, 2011 at 7:58 PM, Koji Sekiguchi [via Lucene] <
ml-node+2638122-740990375-146...@n3.nabble.com> wrote:

> (11/03/04 3:30), Anurag wrote:
> > I searched the SolrIndexSearcher.java file, but there is no main class. I
> > wanted to know where this class resides. Can I call this main class (if
> > it exists) using command-line options in a terminal, rather than through
> > the WAR file?
>
> Kumar,
>
> I think you may want to use EmbeddedSolrServer from your main method.
> Please see:
>
> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
>
> Koji
> --
> http://www.rondhuit.com/en/
>
>



-- 
Kumar Anurag



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Location-of-Main-Class-in-Solr-tp2627576p2638658.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Location of Main Class in Solr?

2011-03-05 Thread Anurag
Do you mean that a .class file is responsible for executing the Solr server,
and hence there is no public static void main anywhere?

Actually, I want to experiment with the Searcher.java file, but I am finding
it difficult because the Searcher file is very large and any change I make
will lead to incompatibilities with other files.

On Sat, Mar 5, 2011 at 11:00 PM, kumar anurag wrote:

> Thanks.
>
>
> On Sat, Mar 5, 2011 at 7:58 PM, Koji Sekiguchi [via Lucene] <
> ml-node+2638122-740990375-146...@n3.nabble.com> wrote:
>
>> (11/03/04 3:30), Anurag wrote:
>> > I searched the SolrIndexSearcher.java file, but there is no main class. I
>> > wanted to know where this class resides. Can I call this main class (if
>> > it exists) using command-line options in a terminal, rather than through
>> > the WAR file?
>>
>> Kumar,
>>
>> I think you may want to use EmbeddedSolrServer from your main method.
>> Please see:
>>
>> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>>
>
>
>
> --
> Kumar Anurag
>



-- 
Kumar Anurag



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Location-of-Main-Class-in-Solr-tp2627576p2638667.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: More Date Math: NOW/WEEK

2011-03-05 Thread Andreas Kemkes
Thank you for the clarification.

Personally, I believe it is correct for a week to start in a different 
month/year and it is certainly what I would expect.  As you pointed out, these 
time units don't form a strictly ordered set (...>year>month>day>..., 
week>day...).

Complications arise from the different notions of what the first day of the
week is (Sunday in the US and Canada, Monday in Europe and ISO 8601, Saturday
in the Middle East). This is handled by the locale, I think.

Further complications are introduced by week numbering, but I don't think this 
applies here (http://en.wikipedia.org/wiki/Seven-day_week#Week_numbering).

Both MySQL
(http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_yearweek)
and Postgres have the notion of weeks.

All this ignores the complications of 5-day or 6-day weeks, which were used in
Russia during certain parts of the last century. There might be other
historical cases or even current ones, but, like you, I believe a definition
like "A week is a time unit equal to seven days." is commonly accepted.

But maybe you are correct that this is special logic that belongs in the client.

Regards,

Andreas




From: Chris Hostetter 
To: solr-user@lucene.apache.org
Sent: Tue, March 1, 2011 6:30:26 PM
Subject: Re: More Date Math: NOW/WEEK

: Digging into the source code of DateMathParser.java, I found the following
: comment:
:
:   // NOTE: consciously choosing not to support WEEK at this time,
:   // because of complexity in rounding down to the nearest week
:   // arround a month/year boundry.
:   // (Not to mention: it's not clear what people would *expect*)
:
: I was able to implement a work-around in my Ruby client using the following
: pseudo code:
:   wd=NOW.wday; "NOW-#{wd}DAY/DAY"
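A minimal Python rendering of that Ruby pseudo code, assuming a Sunday-based week and that the client evaluates the weekday in UTC so it agrees with the server's NOW (both are assumptions; the function name is hypothetical). Ruby's Time#wday counts Sunday as 0, while Python's isoweekday() counts Monday as 1 through Sunday as 7, hence the % 7. If the request reaches Solr after midnight UTC has passed, the expression can be off by one day.

```python
from datetime import datetime, timezone


def start_of_week_expr(now=None):
    """Return a Solr date-math string that rounds NOW down to the start of a Sunday-based week."""
    if now is None:
        # Assumption: evaluate the weekday in UTC so it matches the server's NOW.
        now = datetime.now(timezone.utc)
    # isoweekday(): Monday=1 .. Sunday=7; % 7 gives Ruby's wday numbering (Sunday=0 .. Saturday=6).
    wd = now.isoweekday() % 7
    # Step back wd days, then round down to midnight with /DAY.
    return f"NOW-{wd}DAY/DAY"


print(start_of_week_expr(datetime(2009, 1, 2, tzinfo=timezone.utc)))  # NOW-5DAY/DAY
```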

The main issue that comment in DateMathParser.java is referring to is the
ambiguity of what should happen when you try to do something like
"2009-01-02T00:00:00Z/WEEK".

"WEEK" would be the only unit where rounding changes a unit *larger* than
the one you rounded on. Rounding on DAY only affects hours, minutes,
seconds, and millis; rounding on MONTH only affects days, hours, minutes,
seconds, and millis. But in an example like the one above, where Jan 2 2009
was a Friday, rounding down to the week (using logic similar to what you
have) would result in "2008-12-28T00:00:00Z", changing both the month and
the year.
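To make the example concrete, here is a small Python sketch of rounding a date down to the start of its week with a configurable first day (Python's weekday() numbering, Monday = 0 through Sunday = 6). It only illustrates the ambiguity; it is not how DateMathParser behaves, since per the comment quoted above it has no WEEK unit. The function name is hypothetical.

```python
from datetime import date, timedelta


def round_down_to_week(d, first_day=6):
    """Round d down to the most recent first_day (weekday() numbering: Monday=0 .. Sunday=6).

    first_day=6 gives a Sunday-based week (US/Canada); first_day=0 gives ISO 8601 weeks.
    """
    # How many days back to the most recent first_day (0 if d already is that day).
    offset = (d.weekday() - first_day) % 7
    return d - timedelta(days=offset)


# 2009-01-02 was a Friday: both conventions land in the previous month and year.
print(round_down_to_week(date(2009, 1, 2)))               # 2008-12-28
print(round_down_to_week(date(2009, 1, 2), first_day=0))  # 2008-12-29
```

Either way the result crosses the month and year boundary, which is exactly the behavior that may or may not match what a given user expects.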

It's not really clear that that is what people would expect; I'm
guessing at least a few people would expect it to stop at the 1st of the
month.

The ambiguity of what behavior makes the most sense is why I never got
around to implementing it. It's certainly possible, but the various
options seemed too confusing to be generally useful and easy to
understand.

As you point out, people who really want special logic like this (and know
how they want it to behave) have an easy workaround: evaluate "NOW" in the
client, since every week has exactly seven days.



-Hoss



  

-ignore words not working?

2011-03-05 Thread Andy Newby
Hi,

I'm trying to work out why this query won't work:

((title:"man")) OR ((keywords:"man")) OR ((description:"man")) AND
(has_gif:"1")
AND (category_name:"Clip Art")
AND ((-title:"men") AND (-keywords:"men") AND (-description:"men"))

If I remove the last bit, so we have:

((title:"man")) OR ((keywords:"man")) OR ((description:"man")) AND
(has_gif:"1")
AND (category_name:"Clip Art")

...and it works fine

As soon as I put in -field:"value" it yields no results, even though there
are a ton of results that match the criteria :/

Can anyone suggest what I'm doing wrong? Everything else is working
perfectly... would be a shame if I couldn't get this feature working
-- 
Andy Newby
a...@ultranerds.com


Re: Solr chained exclusion query

2011-03-05 Thread Michael Sokolov
It sounds as if what you have done is to index sales events (with fields 
customer, product, and date), and now you want to retrieve customers, 
which are not documents.  The most natural way to handle this is to 
index customers as documents (with fields cust id, last sale date). 
Whenever a new sale comes in for a customer, update that customer's most
recent sale date. Then the query side is simple. You might be able to
keep this up to date using some kind of copy-field (a bit like a database
trigger).
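A rough sketch of that suggestion, with a Python dict standing in for the customer documents. Keying by (customer, product) is an assumption here, since Peter's question is per product; in Solr each entry would be a document that gets re-indexed whenever a newer sale arrives. The class and method names are hypothetical.

```python
from datetime import date


class LastSaleIndex:
    """Hypothetical in-memory stand-in for one document per (customer, product) with its last sale date."""

    def __init__(self):
        self.last_sale = {}  # (customer, product) -> most recent sale date

    def record_sale(self, customer, product, sale_date):
        # Like re-indexing the customer document when a new sale comes in;
        # only a later date replaces the stored one.
        key = (customer, product)
        current = self.last_sale.get(key)
        if current is None or sale_date > current:
            self.last_sale[key] = sale_date

    def lapsed_customers(self, product, cutoff):
        # With only the latest date stored, "bought before, but not since cutoff"
        # becomes a single range test on one field.
        return sorted(
            customer
            for (customer, prod), last in self.last_sale.items()
            if prod == product and last < cutoff
        )


idx = LastSaleIndex()
idx.record_sale("alice", "Dog", date(2010, 11, 5))
idx.record_sale("bob", "Dog", date(2011, 2, 20))
print(idx.lapsed_customers("Dog", date(2011, 2, 4)))  # ['alice']
```

The cost moves to index time (one update per sale), which is what keeps the query side a plain range filter.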


-Mike

On 3/4/2011 10:31 AM, Peter Sturge wrote:

Hi,

Oh, how I wish it was as simple as that! :-)
The tricky ingredient in the use case is to exclude all documents (from any
'saledate') if there's a "recent" 'product' match (e.g. last month).
So, essentially you have to somehow build a query that applies 2 different
criteria to the same field ('saledate'). This requires the criteria to be
applied at the DocSet level rather than to each Document (or applied
sequentially, as in SOLR-2026).

I've been having a look at Karl's SOLR-2026, which looks very interesting,
but I've not got it working on trunk as yet.
The only other way I can see is to do multiple client-side round-trip
queries, using the results of the initial search as a filter for the
second.
It's a bit messy, and not a performance winner (especially with distributed
searches on large indexes), so hopefully a server-side solution is out there.
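The two-round-trip approach amounts to a set difference. Below is a minimal in-memory sketch, assuming sales are available as hypothetical (customer, product, saledate) tuples; against Solr, each pass would be a separate query and the exclusion set from pass 1 would have to be sent back as a filter, which is where the cost on large or distributed indexes comes from.

```python
from datetime import date


def lapsed_buyers(sales, product, cutoff):
    """Customers who bought product at some point, but not on or after cutoff.

    sales: iterable of hypothetical (customer, product, saledate) tuples.
    """
    sales = list(sales)  # allow two passes over a generator
    # Pass 1: customers with a "recent" match for the product.
    recent = {c for c, p, d in sales if p == product and d >= cutoff}
    # Pass 2: every buyer of the product, minus the pass-1 customers.
    return sorted({c for c, p, _ in sales if p == product} - recent)


sales = [
    ("alice", "Dog", date(2010, 11, 5)),
    ("bob", "Dog", date(2010, 12, 1)),
    ("bob", "Dog", date(2011, 2, 20)),
    ("carol", "Cat", date(2010, 10, 1)),
]
print(lapsed_buyers(sales, "Dog", date(2011, 2, 4)))  # ['alice']
```

Note that bob is dropped even though he also has an old 'Dog' sale, which is the "exclude *all* documents" behavior a single per-document query cannot express.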

Thanks!
Peter





On Fri, Mar 4, 2011 at 2:14 PM, Savvas-Andreas Moysidis<
savvas.andreas.moysi...@googlemail.com>  wrote:


Can you not calculate on the fly the date one month before the current
date and use that as your upper limit?

e.g. taking today as an example, your upper limit would be
2011-02-04T00:00:00Z
and so your query would be something like:
q=products:Dog AND saledate:[* TO 2011-02-04T00:00:00Z]
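A Python sketch of computing that upper limit on the client, assuming the products and saledate field names from Peter's example; the function names are hypothetical. Clamping month-end dates (March 31 becomes February 28) is a choice made here, not something the thread specifies. On its own this only bounds the date range; it does not exclude customers who also have a recent sale, which is the objection Peter raises.

```python
import calendar
from datetime import date


def one_month_before(d):
    """Same day one month earlier, clamped to the shorter month (e.g. 2011-03-31 -> 2011-02-28)."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(d.day, last_day))


def lapsed_query(product, today):
    # Solr date values use ISO 8601 in UTC with a trailing Z, e.g. 2011-02-04T00:00:00Z.
    cutoff = one_month_before(today)
    return f"products:{product} AND saledate:[* TO {cutoff.isoformat()}T00:00:00Z]"


print(lapsed_query("Dog", date(2011, 3, 4)))
# products:Dog AND saledate:[* TO 2011-02-04T00:00:00Z]
```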


On 4 March 2011 11:40, Peter Sturge  wrote:


Hello,

I've been wrestling with a query use case; perhaps someone has done this
already?
Is it possible to write a query that excludes results based on another
query?

Scenario:
I have an index that holds:
   'customer'  (textgen)
   'product'   (textgen)
   'saledate'   (date)

I'm looking to return documents for 'customer' entries who have bought a
'product' in the past, but haven't bought it in, say, the last month.
(i.e. need to exclude *all* 'customer' documents who have bought 'product'
in the last month, as well as those who have never bought 'product')

A very simple query like this:
 q=products:Dog AND -(products:Dog AND saledate:[2011-01-01T00:00:00Z TO *])
returns 'Dog' documents prior to 1 Jan, but these need to be excluded if
there are matches after 1 Jan.
I wasn't expecting the above query to do the extra exclusion; it's just to
illustrate the general problem that it operates at document level, not
query level (like a SQL subquery).
If I could pipe the results of the above to another query, that would
likely do the trick.
I've tried negative boosts, magic _query_, query() and such, but with no
luck.

Is this possible?
Any insight into how to write such a query would be much appreciated!

Thanks,
Peter





Re: memory leak during undeploying

2011-03-05 Thread Lance Norskog
Classes get saved in PermGen and are never freed. Apparently there are
JVM options to fix this.
I'm not sure if the old String.intern() use in Lucene had this problem.

Lance

On Wed, Mar 2, 2011 at 10:23 PM, Chris Hostetter wrote:
>
> : When I did heap analysis, the culprit always seems to
> : be the TimeLimitedCollector thread. Because of this, a considerable
> : number of classes are not getting unloaded.
>        ...
> : > > There are a couple of JIRAs related to this:
> : > > https://issues.apache.org/jira/browse/LUCENE-2237,
> : > > https://issues.apache.org/jira/browse/SOLR-1735. Even after applying
> : > > these patches, the issue still remains.
>
> Can you clarify what you mean by this? Are you still seeing that
> TimeLimitedCollector is the culprit in your heap analysis (even with the
> patches), or are you still getting problems with PermGen running out, but
> it's caused by other classes and TimeLimitedCollector is no longer the
> culprit? (And if so, which other classes?)
>
> FYI: LUCENE-2822 is related to LUCENE-2237 and has attracted some more
> attention/comments (I suspect largely because it was filed as a bug
> instead of an improvement).
>
>
> -Hoss
>



-- 
Lance Norskog
goks...@gmail.com