Re: Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-10 Thread Mohammad Shariq
I tried tweaking "WordDelimiterFactory" but it won't accept # or @ symbols
and ignores them entirely.
I need a solution; please suggest one.

On 4 August 2011 21:08, Jonathan Rochkind  wrote:

> It's the WordDelimiterFactory in your filter chain that's removing the
> punctuation entirely from your index, I think.
>
> Read up on what the WordDelimiter filter does and what its settings are;
> decide how you want things to be tokenized in your index to get the behavior
> you want; either get WordDelimiter to do it that way by passing it
> different arguments, or stop using WordDelimiter; come back with any
> questions after trying that!
>
>
>
> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
>
>> I have indexed around 1 million tweets (using the "text" dataType).
>> When I search the tweets with "#" or "@" I don't get the exact result,
>> e.g. when I search for "#ipad" OR "@ipad" I get results where ipad is
>> mentioned, skipping the "#" and "@".
>> Please suggest how to tune this, or which filter factories to use, to get
>> the desired result.
>> I am indexing the tweet as "text", below is "text" which is there in my
>> schema.xml.
>>
>>
>> 
>> 
>> 
>> > minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>> > generateWordParts="1"
>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt" language="English"/>
>> 
>> 
>> 
>> > words="stopwords.txt"
>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> 
>> > protected="protwords.txt" language="English"/>
>> 
>> 
>>
>>


-- 
Thanks and Regards
Mohammad Shariq


Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Bernd Fehling


From what I see on my slaves, yes.
After replication has finished, the new index is in place, and the new reader
has started, there is always a write.lock file in the index directory on my
slaves, even though the index on the master is optimized.

Regards
Bernd


Am 10.08.2011 09:12, schrieb Pranav Prakash:

Do slaves need a separate optimize command if they replicate from optimized
master?

*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google



Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Shalin Shekhar Mangar
On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

>
> From what I see on my slaves, yes.
> After replication has finished, the new index is in place, and the new
> reader has started, there is always a write.lock file in the index
> directory on my slaves, even though the index on the master is optimized.
>

That is not true. Replication is roughly a copy of the diff between the
master and the slave's index. An optimized index is a merged and re-written
index so replication from an optimized master will give an optimized copy on
the slave.

The write lock is due to the fact that an IndexWriter is always open in Solr
even on the slaves.
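For reference, replication in Solr 3.x is driven by the ReplicationHandler; a minimal master/slave sketch follows (the host, port, and poll interval are illustrative, not from this thread):

```xml
<!-- on the master: solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- ship new segments after each commit or optimize -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>

<!-- on the slave: solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>  <!-- check every 5 minutes -->
  </lst>
</requestHandler>
```

With this setup the slave pulls only the segment files it does not already have, which is why an optimized master yields an optimized copy on the slave without a separate optimize call.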

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Bernd Fehling


Sure, no optimize is actually needed on the slave,
but after calling optimize on the slave the write.lock file is removed.
So why doesn't the replication process do this?

Regards
Bernd


Am 10.08.2011 10:57, schrieb Shalin Shekhar Mangar:

On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling<
bernd.fehl...@uni-bielefeld.de>  wrote:



From what I see on my slaves, yes.
After replication has finished, the new index is in place, and the new reader
has started, there is always a write.lock file in the index directory on my
slaves, even though the index on the master is optimized.



That is not true. Replication is roughly a copy of the diff between the
master and the slave's index. An optimized index is a merged and re-written
index so replication from an optimized master will give an optimized copy on
the slave.

The write lock is due to the fact that an IndexWriter is always open in Solr
even on the slaves.



document indexing

2011-08-10 Thread directorscott
Hello,

First of all, I am a beginner and I am trying to develop a sample
application using SolrNet.

I am struggling with the schema definition I need to match my
requirements. In the database, I have Books(bookId, name) and Pages(pageId,
bookId, text) tables. They have a master-detail relationship. I want to be
able to search the Text field of Pages but list the books. Should I use a
schema for Pages (with pageId as unique key) or for Books (with bookId as
unique key) in this scenario?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Pranav Prakash
That is not true. Replication is roughly a copy of the diff between the
>> master and the slave's index.
>
>
In my case, during replication the entire index is copied from master to
slave, during which the size of the index grows to a little over double. Then
it shrinks back to its original size. Am I doing something wrong? How can I
get the master to serve only a delta of the index instead of the whole index,
with the slaves merging the new and old segments?

*Pranav Prakash*


Re: document indexing

2011-08-10 Thread lee carroll
It really does depend on what you want to do in your app, but from
the info given I'd go for denormalizing by repeating the smallest number
of values. So in your case that would be book:

PageID+BookID(uniqueKey), pageID, PageVal1, PageValn, BookID, BookName




On 10 August 2011 09:46, directorscott  wrote:
> Hello,
>
> First of all, I am a beginner and i am trying to develop a sample
> application using SolrNet.
>
> I am struggling about schema definition i need to use to correspond my
> needs. In database, i have Books(bookId, name) and Pages(pageId, bookId,
> text) tables. They have master-detail relationship. I want to be able to
> search in Text area of Pages but list the books. Should i use a schema for
> Pages (with pageid as unique key) or for Books (with bookId as unique key)
> in this scenario?
>
> Thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


frange not working in query

2011-08-10 Thread Amit Sawhney
Hi All,

I am trying to sort the results on a unix timestamp using this query. 

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

When I run this query, it says 'no field name specified in query and no 
defaultSearchField defined in schema.xml'

As soon as I remove the frange query and run this, it starts working fine. 

http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

Any pointers?


Thanks,
Amit

RE: Trying to index pdf docs - lazy loading error - ClassNotFoundException: solr.extraction.ExtractingRequestHandler

2011-08-10 Thread Rode González
I had a mistake in my config files. Everything works correctly from the
example directory. Thanks to all.

---
Rode González
Libnova, SL
Paseo de la Castellana, 153-Madrid
[t]91 449 08 94  [f]91 141 21 21
www.libnova.es

> -----Original Message-----
> From: Rode González [mailto:r...@libnova.es]
> Sent: Tuesday, 9 August 2011 13:04
> To: solr-user@lucene.apache.org
> CC: Leo
> Subject: Trying to index pdf docs - lazy loading error -
> ClassNotFoundException: solr.extraction.ExtractingRequestHandler
> 
> Hi all.
> 
> 
> 
> I've tried to index pdf documents using the libraries included in the
> example distribution of Solr 3.3.0.
> 
> 
> 
> I've copied all the jars included in the /dist and /contrib directories
> into a common /lib directory, and I've added this path to the
> solrconfig.xml file.
> 
> 
> 
> The request handler for binary docs has no changes from the example:
> 
> 
> 
>
>   startup="lazy"
> 
>   class="solr.extraction.ExtractingRequestHandler" >
> 
> 
> 
>   
> 
>   
> 
> 
> 
>   
> 
>   
> 
>   
> 
>   
> 
> 
> 
>   
> 
> 
> 
> I've commented out all subnodes except fmap.content because I don't use
> the rest of them.
> 
> 
> 
> 
> 
> ...BUT... :)
> 
> 
> 
> When I try :
> 
> 
> 
> curl
> "http://myserver:8080/solr/update/extract/?literal.id=1000&commit=true";
> -F "myfile=@myfile_.pdf"
> 
> 
> 
> I get:
> 
> 
> 
> Status HTTP 500 - lazy loading error
> org.apache.solr.common.SolrException:
> lazy loading error
> 
> ...
> 
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'solr.extraction.ExtractingRequestHandler'
> 
> ...
> 
> 
> 
> 
> 
> I've moved contrib/extraction/lib/* to my lib/*  .
> 
> Restarted the server, and I can see in the log that
> apache-solr-cell-3.3.0.jar was added to the classloader. But I get the same
> result :( ... lazy loading error, error loading class.
> 
> 
> 
> 
> 
> 
> 
> What am I forgetting? What am I missing?
> 
> 
> 
> Thanks
> 
> 
> 
> 
> 
> ---
> 
> Rode González
> 
> 
> 
>   _
> 
> No viruses were found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1392 / Virus database: 1520/3822 - Release date:
> 08/08/11
> 
> 





Re: Possible bug in FastVectorHighlighter

2011-08-10 Thread Massimo Schiavon

Worked fine. Thanks a lot!

Massimo

On 09/08/2011 11:58, Jayendra Patil wrote:

Try using -

  
  

Regards,
Jayendra


On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon  wrote:

In my Solr (3.3) configuration I specified these two params:




When I do a simple search I correctly obtain highlighted results where
matches are enclosed with the correct tag.
If I do the same request with hl.useFastVectorHighlighter=true in the http
query string (or specify the same parameter in the config file), the
matches are enclosed with the  tag (the default value).

Has anyone encountered the same?




Re: document indexing

2011-08-10 Thread directorscott
Could you please tell me the schema.xml "fields" tag content for such a case?
Currently the index data is something like this:

PageID  BookID  Text
1       1       "some text"
2       1       "some text"
3       1       "some text"
4       1       "some text"
5       2       "some text"
6       2       "some text"
7       2       "some text"
8       2       "some text"

When I make a simple query for the word "some" on the Text field, I will have
all 8 rows returned, but I want to list only 2 items (the Books with IDs 1
and 2).

I am also considering concatenating the Text columns and having the index
like this:

BookID  PageTexts
1       "some text some text some text"
2       "some text some text some text"

I wonder which index structure is better.


 

lee carroll wrote:
> 
> It really does depend upon what you want to do in your app but from
> the info given I'd go for denormalizing by repeating the least number
> of values. So in your case that would be book
> 
> PageID+BookID(uniqueKey), pageID, PageVal1, PageValn, BookID, BookName
> 
> 
> 
> 
> On 10 August 2011 09:46, directorscott  wrote:
>> Hello,
>>
>> First of all, I am a beginner and i am trying to develop a sample
>> application using SolrNet.
>>
>> I am struggling about schema definition i need to use to correspond my
>> needs. In database, i have Books(bookId, name) and Pages(pageId, bookId,
>> text) tables. They have master-detail relationship. I want to be able to
>> search in Text area of Pages but list the books. Should i use a schema
>> for
>> Pages (with pageid as unique key) or for Books (with bookId as unique
>> key)
>> in this scenario?
>>
>> Thanks.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3242219.html
Sent from the Solr - User mailing list archive at Nabble.com.


Date faceting per last hour, three days and last week

2011-08-10 Thread Joan
Hi,

I'm trying to do date faceting for the last 24 hours, the last three days,
and the last week, but I don't know how to do it.

I have a DateField and I want to set different ranges; is that possible?

I understand the example from the Solr wiki, but I want to define more
"gaps" on the same date field.

How do I do this?

Thanks,

Joan


paging size in SOLR

2011-08-10 Thread jame vaalet
hi,
I want to retrieve all the data from Solr (say 10,000 ids) and my page size
is 1000.
How do I get the data back page by page? Do I have to increment the "start"
value by the page size each time, starting from 0, and iterate?
In that case, am I querying the index 10 times instead of once, or will the
results be cached somewhere after the first query for the subsequent pages?


JAME VAALET


How come this query string starts with wildcard?

2011-08-10 Thread Pranav Prakash
While going through my Solr error logs, I found that a user had fired a
query - jawapan ujian bulanan thn 4 (bahasa melayu). This was converted to
the following for autosuggest purposes -
jawapan?ujian?bulanan?thn?4?(bahasa?melayu)* - by the javascript code. Solr
threw the exception

Cannot parse 'jawapan?ujian?bulanan?thn?4?(bahasa?melayu)*': '*' or
'?' not allowed as first character in WildcardQuery

How come this query string begins with a wildcard character?

When I changed the query to remove the brackets, everything went smoothly.
There were no results, probably because my search index didn't have any.


*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google 


Re: Date faceting per last hour, three days and last week

2011-08-10 Thread O. Klein
I would use facet queries:

facet.query=date:[NOW-1DAY TO NOW]
facet.query=date:[NOW-3DAY TO NOW]
facet.query=date:[NOW-7DAY TO NOW]
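Put together as a single request, this could look like the sketch below (the field name `date` and the host are placeholders, not from the thread):

```shell
curl -G 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.query=date:[NOW-1DAY TO NOW]' \
  --data-urlencode 'facet.query=date:[NOW-3DAY TO NOW]' \
  --data-urlencode 'facet.query=date:[NOW-7DAY TO NOW]'
```

Each facet.query comes back as its own count under facet_queries in the response, so one request covers all three windows; note that the windows overlap (the last week includes the last day).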

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-faceting-per-last-hour-three-days-and-last-week-tp3242364p3242574.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How come this query string starts with wildcard?

2011-08-10 Thread Michael Ryan
I think this is because ")" is treated as a token delimiter. So "(foo)bar" is 
treated the same as "(foo) bar" (that is, bar is treated as a separate word). 
So "(foo)*" is really parsed as "(foo) *" and thus the * is treated as the 
start of a new word.

-Michael
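One client-side fix is to build the autosuggest string from word characters only, so the trailing `*` can never end up at the start of a new token. A sketch of that idea (the function name and tokenization rule are assumptions, not the poster's actual javascript):

```python
import re

def autosuggest_query(user_input: str) -> str:
    """Join runs of word characters with '?' and append '*', dropping
    delimiters like parentheses that the query parser would treat as
    token boundaries (leaving '*' to start a fresh token)."""
    tokens = re.findall(r"\w+", user_input)
    if not tokens:
        return ""  # nothing safe to turn into a wildcard query
    return "?".join(tokens) + "*"

print(autosuggest_query("jawapan ujian bulanan thn 4 (bahasa melayu)"))
```

For the query in the thread this yields `jawapan?ujian?bulanan?thn?4?bahasa?melayu*`, where the `*` now follows a word character instead of a `)`.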


[Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread Grant Ingersoll
Hi,

We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance 
website.  You can see a preview at http://lucene.staging.apache.org/lucene/.  
It is more or less a look and feel copy of Mahout and Open For Biz websites.  
This new site, IMO, both looks better than the old one and will be a lot easier 
for us committers to maintain/update and for others to contribute to.

So, how can you help?  

0.  All of the code is at https://svn.apache.org/repos/asf/lucene/cms/trunk.  
Check it out the usual way using SVN.  If you want to build locally, see 
https://issues.apache.org/jira/browse/LUCENE-2748 and the links to the ASF CMS 
guide.

1. If you have any graphic design skills:
- I'd love to have some "mantle/slide" images along the lines of 
http://lucene.staging.apache.org/lucene/images/mantle-lucene-solr.png.  These 
are used in the slideshow at the top of the Lucene, Core and Solr pages and 
should be interesting, inviting, etc. and should give people warm fuzzy 
feelings about all of our software and the great community we have.  (Think 
Marketing!)
- Help us coordinate the color selection on the various pages, 
especially in the slides and especially on the Solr page, as I'm not sure I 
like the green and black background contrasted with the orange of the Solr logo.

2. In a few more days or maybe a week or so, patches to fix content errors, 
etc. will be welcome.  For now, we are still porting things, so I don't want to 
duplicate effort.

3. New, useful documentation is also, of course, always welcome.

4. Test with your favorite browser.  In particular, I don't have IE handy.  
I've checked the site in Chrome, Firefox and Safari.

If you come up with images (I won't guarantee they will be accepted, but I
am appreciative of the help) or other style fixes, etc., please submit all
content/patches to https://issues.apache.org/jira/browse/LUCENE-2748 and
please make sure to check the donation box when attaching the file.

-Grant

 

Re: unique terms and multi-valued fields

2011-08-10 Thread Erick Erickson
Well, it depends (tm).

If you're talking about *indexed* terms, then the value is stored only
once in both the cases you mentioned below. There's really very little
difference between a non-multi-valued field and a multi-valued field
in terms of how it's stored in the searchable portion of the index,
except for some position information.

So, having an XML doc with a single-valued field

<field name="category">computers laptops</field>

is almost identical (except for position info, via positionIncrementGap) to

<field name="category">computers</field>
<field name="category">laptops</field>

multiValued refers to the *input*, not whether more than one word is
allowed in that field.

Now, about *stored* fields. If you store the data, verbatim copies are
kept in the
storage-specific files in each segment, and the values will be on disk for
each document.

But you probably don't care much because this data is only referenced when you
assemble a document for return to the client, it's irrelevant for searching.

Best
Erick

On Tue, Aug 9, 2011 at 8:02 PM, Kevin Osborn  wrote:
> Please verify my understanding. I have a field called "category" and it has a 
> value "computers". If I use this same field and value for all of my 
> documents, it is really only stored on disk once because "category:computers" 
> is a unique term. Is this correct?
>
> But, what about multi-valued fields. So, I have a field called "category". 
> For 100 documents, it has the values "computers" and "laptops". For 100 other 
> documents, it has the values "computers" and "tablets". Is this stored as 
> "category:computers", "category:laptops", "category:tablets", meaning 3 
> unique terms. Or is it stored as "category:computers,laptops" and 
> "category:computers,tablets". I believe it is the first case (hopefully), but 
> I am not sure.
>
> Thanks.


Re: document indexing

2011-08-10 Thread lee carroll
With the first option you can be page-specific in your search results
and searches.
Field collapsing/grouping will help with the duplication that comes from
denormalising. (What you have listed is different from what I listed: you
don't have a unique key.)

Option 2 means you lose any ability to reference pages, but as you note,
your documents are at the level you want your search results returned at.

If you are not interested in pages, then option 2.
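A sketch of what the fields section could look like for option 1 (one document per page); the field names and types here are illustrative, not a tested schema:

```xml
<fields>
  <!-- composite unique key, e.g. "3_12" for page 12 of book 3 -->
  <field name="id"       type="string" indexed="true" stored="true" required="true"/>
  <field name="pageId"   type="int"    indexed="true" stored="true"/>
  <field name="bookId"   type="int"    indexed="true" stored="true"/>
  <field name="bookName" type="string" indexed="true" stored="true"/>
  <field name="text"     type="text"   indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

In Solr versions with result grouping, querying with something like `group=true&group.field=bookId` would then collapse the eight page hits down to the two books.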

On 10 August 2011 12:22, directorscott  wrote:
> Could you please tell me schema.xml "fields" tag content for such case?
> Currently index data is something like this:
>
> PageID BookID Text
> 1         1        "some text"
> 2         1        "some text"
> 3         1        "some text"
> 4         1        "some text"
> 5         2        "some text"
> 6         2        "some text"
> 7         2        "some text"
> 8         2        "some text"
>
> when i make a simple query for the word "some" on Text field, i will have
> all 8 rows returned. but i want to list only 2 items (Books with IDs 1 and
> 2)
>
> I am also considering to concatenate Text columns and have the index like
> this:
>
> BookID     PageTexts
> 1             "some text some text some text"
> 2             "some text some text some text"
>
> I wonder which index structure is better.
>
>
>
>
> lee carroll wrote:
>>
>> It really does depend upon what you want to do in your app but from
>> the info given I'd go for denormalizing by repeating the least number
>> of values. So in your case that would be book
>>
>> PageID+BookID(uniqueKey), pageID, PageVal1, PageValn, BookID, BookName
>>
>>
>>
>>
>> On 10 August 2011 09:46, directorscott  wrote:
>>> Hello,
>>>
>>> First of all, I am a beginner and i am trying to develop a sample
>>> application using SolrNet.
>>>
>>> I am struggling about schema definition i need to use to correspond my
>>> needs. In database, i have Books(bookId, name) and Pages(pageId, bookId,
>>> text) tables. They have master-detail relationship. I want to be able to
>>> search in Text area of Pages but list the books. Should i use a schema
>>> for
>>> Pages (with pageid as unique key) or for Books (with bookId as unique
>>> key)
>>> in this scenario?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3241832.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/document-indexing-tp3241832p3242219.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

2011-08-10 Thread Christian Bordis
Thanks, 
for this quick and enlightening answer! 

I didn't consider that a Transformer can create new columns. In combination 
with dynamic fields it is exactly what I was looking for.

Thanks James ^^

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Tuesday, 9 August 2011 16:03
To: solr-user@lucene.apache.org
Subject: RE: Problem with DIH: How to map key value pair stored in 1-N
relation from a JDBC Source?

Christian,

It looks like you should probably write a Transformer for your DIH script.  I 
assume you have a child entity set up for "PriceTable".  Add a Transformer to 
this entity that will look at the value of "currency" and "price", remove these 
from the row, then add them back in with "currency" as the field name and 
"price" as the column value.

By the way, it would likely be better if instead of field names like "EUR" and 
"CHF", you created a dynamic field entry in schema.xml with a dynamic field 
like this:



Then have your DIH Transformer prepend "CURRENCY_" in front of the field name.  
This way should your company ever add a new currency, you wouldn't need to 
change your schema.

For more information on writing a DIH Transformer, see 
http://wiki.apache.org/solr/DIHCustomTransformer

If you would rather use a scripting language such as javascript instead of 
writing your Transformer in java, see 
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer .

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
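A ScriptTransformer sketch of the approach James describes; the entity names, queries, and column names are assumptions based on the thread, not working config:

```xml
<dataConfig>
  <script><![CDATA[
    function mapCurrency(row) {
      var cur = row.get('currency');      // e.g. "EUR"
      var price = row.get('price');
      row.remove('currency');
      row.remove('price');
      // lands in a CURRENCY_* dynamic field, e.g. CURRENCY_EUR
      row.put('CURRENCY_' + cur, price);
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" query="select id from Item">
      <entity name="priceTable" transformer="script:mapCurrency"
              query="select currency, price from PriceTable where itemId='${item.id}'"/>
    </entity>
  </document>
</dataConfig>
```

The matching dynamic field in schema.xml would be something like `<dynamicField name="CURRENCY_*" type="float" indexed="true" stored="true"/>` (the type is a guess; use whatever fits the price data).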


Re: Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-10 Thread Erick Erickson
Please look more carefully at the documentation for WDDF,
specifically:

split on intra-word delimiters (all non alpha-numeric characters).

WordDelimiterFilterFactory will always throw away non-alphanumeric
characters; you can't tell it to do otherwise. Try some of the other
tokenizers/analyzers to get what you want, and also use the
admin/analysis page to see the exact effects of your
fieldType definitions.

Here's a great place to start:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

You probably want something like WhitespaceTokenizerFactory
followed by LowerCaseFilterFactory or some such...

But I really question whether this is what you want either. Do you
really want a search on "ipad" to *fail* to match input of "#ipad"? Or
vice-versa?

KeywordTokenizerFactory is probably not the place you want to start,
the tokenization process doesn't break anything up, you happen to be
getting separate tokens because of WDDF, which as you see can't
process things the way you want.


Best
Erick
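A minimal fieldType along the lines Erick suggests (the type name is made up): whitespace tokenization leaves "#ipad" and "@ipad" intact, and lowercasing keeps matching case-insensitive.

```xml
<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The admin/analysis page will show exactly which tokens this produces for a sample tweet.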

On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq  wrote:
> I tried tweaking "WordDelimiterFactory" but it won't accept # or @ symbols
> and ignores them entirely.
> I need a solution; please suggest one.
>
> On 4 August 2011 21:08, Jonathan Rochkind  wrote:
>
>> It's the WordDelimiterFactory in your filter chain that's removing the
>> punctuation entirely from your index, I think.
>>
>> Read up on what the WordDelimiter filter does and what its settings are;
>> decide how you want things to be tokenized in your index to get the behavior
>> you want; either get WordDelimiter to do it that way by passing it
>> different arguments, or stop using WordDelimiter; come back with any
>> questions after trying that!
>>
>>
>>
>> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
>>
>>> I have indexed around 1 million tweets (using the "text" dataType).
>>> When I search the tweets with "#" or "@" I don't get the exact result,
>>> e.g. when I search for "#ipad" OR "@ipad" I get results where ipad is
>>> mentioned, skipping the "#" and "@".
>>> Please suggest how to tune this, or which filter factories to use, to get
>>> the desired result.
>>> I am indexing the tweet as "text", below is "text" which is there in my
>>> schema.xml.
>>>
>>>
>>> 
>>> 
>>>     
>>>     >> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>>>     >> generateWordParts="1"
>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>> catenateAll="0" splitOnCaseChange="1"/>
>>>     
>>>     >> protected="protwords.txt" language="English"/>
>>> 
>>> 
>>>         
>>>         >> words="stopwords.txt"
>>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
>>>         >> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>         
>>>         >> protected="protwords.txt" language="English"/>
>>> 
>>> 
>>>
>>>
>
>
> --
> Thanks and Regards
> Mohammad Shariq
>


Re: frange not working in query

2011-08-10 Thread simon
Could you tell us what you're trying to achieve with the range query ?
It's not clear.

-Simon

On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney  wrote:
> Hi All,
>
> I am trying to sort the results on a unix timestamp using this query.
>
> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>
> When I run this query, it says 'no field name specified in query and no 
> defaultSearchField defined in schema.xml'
>
> As soon as I remove the frange query and run this, it starts working fine.
>
> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>
> Any pointers?
>
>
> Thanks,
> Amit


Re: frange not working in query

2011-08-10 Thread simon
I meant the frange query, of course

On Wed, Aug 10, 2011 at 10:21 AM, simon  wrote:
> Could you tell us what you're trying to achieve with the range query ?
> It's not clear.
>
> -Simon
>
> On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney  wrote:
>> Hi All,
>>
>> I am trying to sort the results on a unix timestamp using this query.
>>
>> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>>
>> When I run this query, it says 'no field name specified in query and no 
>> defaultSearchField defined in schema.xml'
>>
>> As soon as I remove the frange query and run this, it starts working fine.
>>
>> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>>
>> Any pointers?
>>
>>
>> Thanks,
>> Amit
>


Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Erick Erickson
This is expected behavior. You might be optimizing
your index on the master after every set of changes,
in which case the entire index is copied. During this
period, the space on disk will at least double, there's no
way around that.

If you do NOT optimize, then the slave will only copy changed
segments instead of the entire index. Optimizing isn't
usually necessary except periodically (daily, perhaps weekly,
perhaps never actually).

All that said, depending on how merging happens, you will always
have the possibility of the entire index being copied sometimes
because you'll happen to hit a merge that merges all segments
into one.

There are some advanced options that can control some parts
of merging, but you need to get to the bottom of why the whole
index is getting copied every time before you go there. I'd bet
you're issuing an optimize.

Best
Erick

On Wed, Aug 10, 2011 at 5:30 AM, Pranav Prakash  wrote:
> That is not true. Replication is roughly a copy of the diff between the
>>> master and the slave's index.
>>
>>
> In my case, during replication entire index is copied from master to slave,
> during which the size of index goes a little over double. Then it shrinks to
> its original size. Am I doing something wrong? How can I get the master to
> serve only delta index instead of serving whole index and the slaves merging
> the new and old index?
>
> *Pranav Prakash*
>


Re: paging size in SOLR

2011-08-10 Thread Erick Erickson
Well, if you really want to, you can specify start=0 and rows=10000 and
get them all back at once.

You can do page-by-page by incrementing the "start" parameter as you
indicated.

You can keep from re-executing the search by setting your queryResultCache
appropriately, but this affects all searches so might be an issue.

Best
Erick
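The page-by-page loop is just an increment of "start" by the page size; a sketch with a stub standing in for the actual Solr call (nothing here is real SolrJ or HTTP code):

```python
def fetch_all(search, total, page_size):
    """Collect all documents by paging: each request asks for
    rows=page_size starting at the next offset."""
    docs = []
    for start in range(0, total, page_size):
        docs.extend(search(start=start, rows=page_size))
    return docs

# Stub simulating an index of 10,000 ids queried with a page size of 1,000,
# i.e. ten "requests":
ids = list(range(10_000))
fake_search = lambda start, rows: ids[start:start + rows]
result = fetch_all(fake_search, 10_000, 1_000)
print(len(result))  # -> 10000
```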

On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet  wrote:
> hi,
> i want to retrieve all the data from solr (say 10,000 ids ) and my page size
> is 1000 .
> how do i get back the data (pages) one after other ?do i have to increment
> the "start" value each time by the page size from 0 and do the iteration ?
> In this case am i querying the index 10 time instead of one or after first
> query the result will be cached somewhere for the subsequent pages ?
>
>
> JAME VAALET
>


Re: paging size in SOLR

2011-08-10 Thread simon
Worth remembering there are some performance penalties with deep
paging if you use the page-by-page approach. May not be too much of a
problem if you really are only looking to retrieve 10K docs.

-Simon

On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
 wrote:
> Well, if you really want to, you can specify start=0 and rows=10000 and
> get them all back at once.
>
> You can do page-by-page by incrementing the "start" parameter as you
> indicated.
>
> You can keep from re-executing the search by setting your queryResultCache
> appropriately, but this affects all searches so might be an issue.
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet  wrote:
>> hi,
>> i want to retrieve all the data from solr (say 10,000 ids ) and my page size
>> is 1000 .
>> how do i get back the data (pages) one after other ?do i have to increment
>> the "start" value each time by the page size from 0 and do the iteration ?
>> In this case am i querying the index 10 time instead of one or after first
>> query the result will be cached somewhere for the subsequent pages ?
>>
>>
>> JAME VAALET
>>
>


RE: paging size in SOLR

2011-08-10 Thread Jonathan Rochkind
I would imagine the performance penalties with deep paging will ALSO be there
if you just ask for 10000 rows all at once, though, instead of in, say,
1000-row paged batches. Yes? No?

-Original Message-
From: simon [mailto:mtnes...@gmail.com] 
Sent: Wednesday, August 10, 2011 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: paging size in SOLR

Worth remembering there are some performance penalties with deep
paging, if you use the page-by-page approach. may not be too much of a
problem if you really are only looking to retrieve 10K docs.

-Simon

On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
 wrote:
> Well, if you really want to you can specify start=0 and rows=10000 and
> get them all back at once.
>
> You can do page-by-page by incrementing the "start" parameter as you
> indicated.
>
> You can keep from re-executing the search by setting your queryResultCache
> appropriately, but this affects all searches so might be an issue.
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet  wrote:
>> hi,
>> i want to retrieve all the data from solr (say 10,000 ids ) and my page size
>> is 1000 .
>> how do i get back the data (pages) one after other ?do i have to increment
>> the "start" value each time by the page size from 0 and do the iteration ?
>> In this case am i querying the index 10 time instead of one or after first
>> query the result will be cached somewhere for the subsequent pages ?
>>
>>
>> JAME VAALET
>>
>


Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
"ABC", "DEF", etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
  query.setStart( request.getStartIndex() );
  query.setRows( request.getMaxResults() );
  query.setFacet(true);
 query.setFacetMinCount(1);

  query.addFacetField(MYFIELD);

  for (String fieldValue : desiredFieldValues) {
   query.addFacetQuery(MYFIELD + ":" + fieldValue);
 }


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich
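
A side note on reading the results of code like the above: counts requested with addFacetQuery come back through QueryResponse.getFacetQuery(), which returns a Map from each facet.query string to its count, while getFacetFields() reflects the facet.field parameter and therefore lists every value of MyField. A sketch (field name and values taken from the example; the server call is assumed):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FacetQuerySketch {

    // Pure helper: build the facet.query strings for a set of field values.
    static List<String> facetQueries(String field, List<String> values) {
        List<String> out = new ArrayList<String>();
        for (String v : values) {
            out.add(field + ":" + v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fqs = facetQueries("MyField", Arrays.asList("ABC", "DEF"));
        System.out.println(fqs); // [MyField:ABC, MyField:DEF]

        // Hypothetical SolrJ usage:
        //   for (String fq : fqs) { query.addFacetQuery(fq); }
        //   QueryResponse rsp = server.query(query);
        //   Map<String, Integer> counts = rsp.getFacetQuery();
        // counts has one entry per facet.query (e.g. "MyField:ABC" -> count),
        // while getFacetFields() reflects facet.field and lists every value.
    }
}
```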






Re: paging size in SOLR

2011-08-10 Thread jame vaalet
when you say queryResultCache, does it cache only the results of the last
query, or the results of multiple queries?


On 10 August 2011 20:14, simon  wrote:

> Worth remembering there are some performance penalties with deep
> paging, if you use the page-by-page approach. may not be too much of a
> problem if you really are only looking to retrieve 10K docs.
>
> -Simon
>
> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
>  wrote:
> > Well, if you really want to you can specify start=0 and rows=10000 and
> > get them all back at once.
> >
> > You can do page-by-page by incrementing the "start" parameter as you
> > indicated.
> >
> > You can keep from re-executing the search by setting your
> queryResultCache
> > appropriately, but this affects all searches so might be an issue.
> >
> > Best
> > Erick
> >
> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
> wrote:
> >> hi,
> >> i want to retrieve all the data from solr (say 10,000 ids ) and my page
> size
> >> is 1000 .
> >> how do i get back the data (pages) one after other ?do i have to
> increment
> >> the "start" value each time by the page size from 0 and do the iteration
> ?
> >> In this case am i querying the index 10 time instead of one or after
> first
> >> query the result will be cached somewhere for the subsequent pages ?
> >>
> >>
> >> JAME VAALET
> >>
> >
>



-- 

-JAME
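
For what it's worth, the queryResultCache that Erick mentions lives server-side in each Solr core and holds ordered lists of matching document IDs for many recent queries (keyed by query, sort, and filters), not just the last one. It is configured in solrconfig.xml; the sizes below are illustrative, not recommendations:

```xml
<!-- solrconfig.xml: caches ordered doc-id windows for recent queries -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"/>

<!-- rounds cached result windows up, so requests for nearby pages hit the cache -->
<queryResultWindowSize>50</queryResultWindowSize>
```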


Re: Solr 3.3 crashes after ~18 hours?

2011-08-10 Thread alexander sulz

Okay, with this command it hangs.
Also: I managed to get a Thread Dump (attached).

regards

Am 05.08.2011 15:08, schrieb Yonik Seeley:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz  wrote:

Usually you get a XML-Response when doing commits or optimize, in this case
I get nothing
in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T
load forever or anything.
It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line?  It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com



I use the stuff in the example folder, the only changes i made was enable
logging and changing the port to 8985.
I'll try getting a thread dump if it happens again!
So far its looking good with having allocated more memory to it.

Am 04.08.2011 16:08, schrieb Yonik Seeley:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz
  wrote:

Thank you for the many replies!

Like I said, I couldn't find anything in logs created by solr.
I just had a look at the /var/logs/messages and there wasn't anything
either.

What I mean by crash is that the process is still there and http GET
pings
would return 200
but when i try visiting /solr/admin, I'd get a blank page! The server
ignores any incoming updates or commits,

"ignores" means what?  The request hangs?  If so, could you get a thread
dump?

Do queries work (like /solr/select?q=*:*) ?


thus throwing no errors, no 503's. It's like the server has a blackout
and
stares blankly into space.

Are you using a different servlet container than what is shipped with
solr?
If you did start with the solr "example" server, what jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com




Full thread dump Java HotSpot(TM) Server VM (19.1-b02 mixed mode):

"DestroyJavaVM" prio=10 tid=0x6e32e800 nid=0x5aeb waiting on condition 
[0x]
   java.lang.Thread.State: RUNNABLE

"Timer-2" daemon prio=10 tid=0x6e3ff800 nid=0x5b0b in Object.wait() [0x6e6e5000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb0260108> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0xb0260108> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)

"pool-1-thread-1" prio=10 tid=0x6e32dc00 nid=0x5b0a waiting on condition 
[0x6dae]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xb02680e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown
 Source)
at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

"Timer-1" daemon prio=10 tid=0x0874e000 nid=0x5b07 in Object.wait() [0x6eb6d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb02601c0> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0xb02601c0> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)

"8106640@qtp-25094328-9 - Acceptor0 SocketConnector@0.0.0.0:8985" prio=10 
tid=0x0832dc00 nid=0x5b06 runnable [0x6ecc7000]
   java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(Unknown Source)
- locked <0xb0260288> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(Unknown Source)
at java.net.ServerSocket.accept(Unknown Source)
at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:99)
at 
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"9097070@qtp-25094328-8" prio=10 tid=0x0832c400 nid=0x5b05 in Object.wait() 
[0x6ed18000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xb0264018> (a 
org.mortbay.thread.QueuedThreadPool$PoolThread)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
- locked <0xb0264018> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"4098499@qtp-25094328-7" prio=10 tid=0x0832ac00 nid=0x5b04 in Object.wait() 
[0x6ed69000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Meth

Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Hi,

Apologies if this is really basic. I'm trying to learn how to create a
custom request handler, so I wrote the minimal class (attached), compiled
and jar'd it, and placed it in example/lib. I added this to solrconfig.xml:



When I started Solr with java -jar start.jar, I got this:

...
SEVERE: java.lang.NoClassDefFoundError:
org/apache/solr/handler/RequestHandlerBase
at java.lang.ClassLoader.defineClass1(Native Method)
...

So I copied all the dist/*.jar files into lib and tried again. This time it
seemed to start ok, but browsing to http://localhost:8983/solr/ displayed
this:

org.apache.solr.common.SolrException: Error Instantiating Request
Handler, FlaxTestHandler is not a org.apache.solr.request.SolrRequestHandler

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410) ...


Any ideas?

thanks,
Tom


RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
there was something simple wrong.

From: Simon, Richard T
Sent: Wednesday, August 10, 2011 10:55 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: Building a facet query in SolrJ

Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
"ABC", "DEF", etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
  query.setStart( request.getStartIndex() );
  query.setRows( request.getMaxResults() );
  query.setFacet(true);
 query.setFacetMinCount(1);

  query.addFacetField(MYFIELD);

  for (String fieldValue : desiredFieldValues) {
   query.addFacetQuery(MYFIELD + ":" + fieldValue);
 }


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich






Re: Solr 3.3 crashes after ~18 hours?

2011-08-10 Thread Yonik Seeley
On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz  wrote:
> Okay, with this command it hangs.

It doesn't look like a hang from this thread dump.  It doesn't look
like any solr requests are executing at the time the dump was taken.

Did you do this from the command line?
curl "http://localhost:8983/solr/update?commit=true"

Are you saying that the curl command just hung and never returned?

-Yonik
http://www.lucidimagination.com

> Also: I managed to get a Thread Dump (attached).
>
> regards
>
> Am 05.08.2011 15:08, schrieb Yonik Seeley:
>>
>> On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz
>>  wrote:
>>>
>>> Usually you get a XML-Response when doing commits or optimize, in this
>>> case
>>> I get nothing
>>> in return, but the site ( http://[...]/solr/update?optimize=true )
>>> DOESN'T
>>> load forever or anything.
>>> It doesn't hang! I just get a blank page / empty response.
>>
>> Sounds like you are doing it from a browser?
>> Can you try it from the command line?  It should give back some sort
>> of response (or hang waiting for a response).
>>
>> curl "http://localhost:8983/solr/update?commit=true"
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>> I use the stuff in the example folder, the only changes i made was enable
>>> logging and changing the port to 8985.
>>> I'll try getting a thread dump if it happens again!
>>> So far its looking good with having allocated more memory to it.
>>>
>>> Am 04.08.2011 16:08, schrieb Yonik Seeley:

 On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz
  wrote:
>
> Thank you for the many replies!
>
> Like I said, I couldn't find anything in logs created by solr.
> I just had a look at the /var/logs/messages and there wasn't anything
> either.
>
> What I mean by crash is that the process is still there and http GET
> pings
> would return 200
> but when i try visiting /solr/admin, I'd get a blank page! The server
> ignores any incoming updates or commits,

 "ignores" means what?  The request hangs?  If so, could you get a thread
 dump?

 Do queries work (like /solr/select?q=*:*) ?

> thous throwing no errors, no 503's.. It's like the server has a
> blackout
> and
> stares blankly into space.

 Are you using a different servlet container than what is shipped with
 solr?
 If you did start with the solr "example" server, what jetty
 configuration changes have you made?

 -Yonik
 http://www.lucidimagination.com
>>>
>
>


Re: Cache replication

2011-08-10 Thread didier deshommes
Consider putting a cache (memcached, redis, etc) *in front* of your
solr slaves. Just make sure to update it when replication occurs.

didier

On Tue, Aug 9, 2011 at 6:07 PM, arian487  wrote:
> I'm wondering if the caches on all the slaves are replicated across (such as
> queryResultCache).  That is to say, if I hit one of my slaves and cache a
> result, and I make a search later and that search happens to hit a different
> slave, will that first cached result be available for use?
>
> This is pretty important because I'm going to have a lot of slaves and if
> this isn't done, then I'd have a high chance of running a lot uncached
> queries.
>
> Thanks :)
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240708.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Dates off by 1 day?

2011-08-10 Thread Olson, Ron
Hi all-

I apologize in advance if this turns out to be a problem between the keyboard 
and the chair, but I'm confused about why my date field is correct in the 
index, but wrong in SolrJ.

I have a field defined as a date in the index:



And if I use the admin site to query the data, I get the right date:

2002-05-13T00:00:00Z

But in my SolrJ code:

Iterator iter = queryResponse.getResults().iterator();

while (iter.hasNext())
{
SolrDocument resultDoc = iter.next();

System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));

}

I get:

--> Sun May 12 19:00:00 CDT 2002

I've been searching around through the wiki and other places, but can't seem to 
find anything that either mentions this problem or talks about date handling in 
Solr/SolrJ that might refer to something like this.

Thanks for any info,

Ron



DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: Dates off by 1 day?

2011-08-10 Thread Sethi, Parampreet

The date difference is caused by different time zones.

In Solr the date is stored in the Zulu (UTC) time zone, and SolrJ is returning
the date in the CDT time zone (the JVM picks up the system time zone).

> 2002-05-13T00:00:00Z

> I get:
> 
> --> Sun May 12 19:00:00 CDT 2002

You can convert Date in different time-zones using Java Util date functions
if required.
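
As a minimal sketch of that conversion (the millisecond value below is chosen to match the example date), formatting the Date in UTC reproduces the value Solr shows:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcPrintSketch {

    // Format a Date in UTC the way Solr displays it, regardless of the
    // JVM's default time zone.
    static String toSolrUtc(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // 1021248000000 ms = 2002-05-13T00:00:00Z, the date from the example
        Date d = new Date(1021248000000L);
        System.out.println(toSolrUtc(d)); // 2002-05-13T00:00:00Z
    }
}
```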

Hope it helps!

-param
On 8/10/11 11:20 AM, "Olson, Ron"  wrote:

> Hi all-
> 
> I apologize in advance if this turns out to be a problem between the keyboard
> and the chair, but I'm confused about why my date field is correct in the
> index, but wrong in SolrJ.
> 
> I have a field defined as a date in the index:
> 
> 
> 
> And if I use the admin site to query the data, I get the right date:
> 
> 2002-05-13T00:00:00Z
> 
> But in my SolrJ code:
> 
> Iterator iter = queryResponse.getResults().iterator();
> 
> while (iter.hasNext())
> {
> SolrDocument resultDoc = iter.next();
> 
> System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));
> 
> }
> 
> I get:
> 
> --> Sun May 12 19:00:00 CDT 2002
> 
> I've been searching around through the wiki and other places, but can't seem
> to find anything that either mentions this problem or talks about date
> handling in Solr/SolrJ that might refer to something like this.
> 
> Thanks for any info,
> 
> Ron
> 
> 
> 
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with it
> is  unauthorized and strictly prohibited.  If you have received this message
> in error, please notify the sender immediately by reply e-mail and permanently
> delete and destroy this message and its attachments, along with any copies
> thereof. This message does not create any contractual obligation on behalf of
> the sender or Law Bulletin Publishing Company.
> Thank you.



RE: Dates off by 1 day?

2011-08-10 Thread Olson, Ron
Ah, great! I knew the problem was between the keyboard and the chair. Thanks!

-Original Message-
From: Sethi, Parampreet [mailto:parampreet.se...@teamaol.com]
Sent: Wednesday, August 10, 2011 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Dates off by 1 day?


The date difference is caused by different time zones.

In Solr the date is stored in the Zulu (UTC) time zone, and SolrJ is returning
the date in the CDT time zone (the JVM picks up the system time zone).

> 2002-05-13T00:00:00Z

> I get:
>
> --> Sun May 12 19:00:00 CDT 2002

You can convert Date in different time-zones using Java Util date functions
if required.

Hope it helps!

-param
On 8/10/11 11:20 AM, "Olson, Ron"  wrote:

> Hi all-
>
> I apologize in advance if this turns out to be a problem between the keyboard
> and the chair, but I'm confused about why my date field is correct in the
> index, but wrong in SolrJ.
>
> I have a field defined as a date in the index:
>
> 
>
> And if I use the admin site to query the data, I get the right date:
>
> 2002-05-13T00:00:00Z
>
> But in my SolrJ code:
>
> Iterator iter = queryResponse.getResults().iterator();
>
> while (iter.hasNext())
> {
> SolrDocument resultDoc = iter.next();
>
> System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));
>
> }
>
> I get:
>
> --> Sun May 12 19:00:00 CDT 2002
>
> I've been searching around through the wiki and other places, but can't seem
> to find anything that either mentions this problem or talks about date
> handling in Solr/SolrJ that might refer to something like this.
>
> Thanks for any info,
>
> Ron
>
>
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with it
> is  unauthorized and strictly prohibited.  If you have received this message
> in error, please notify the sender immediately by reply e-mail and permanently
> delete and destroy this message and its attachments, along with any copies
> thereof. This message does not create any contractual obligation on behalf of
> the sender or Law Bulletin Publishing Company.
> Thank you.



DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread simon
The attachment isn't showing up (in Gmail, at least). Can you inline
the relevant bits of code?

On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer  wrote:
> Hi,
> Apologies if this is really basic. I'm trying to learn how to create a
> custom request handler, so I wrote the minimal class (attached), compiled
> and jar'd it, and placed it in example/lib. I added this to solrconfig.xml:
>     
> When I started Solr with java -jar start.jar, I got this:
>     ...
>     SEVERE: java.lang.NoClassDefFoundError:
> org/apache/solr/handler/RequestHandlerBase
> at java.lang.ClassLoader.defineClass1(Native Method)
>         ...
> So I copied all the dist/*.jar files into lib and tried again. This time it
> seemed to start ok, but browsing to http://localhost:8983/solr/ displayed
> this:
>     org.apache.solr.common.SolrException: Error Instantiating Request
> Handler, FlaxTestHandler is not a org.apache.solr.request.SolrRequestHandler
>
>   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410) ...
>
> Any ideas?
> thanks,
> Tom
>


Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Sure -

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.handler.RequestHandlerBase;

public class FlaxTestHandler extends RequestHandlerBase {

public FlaxTestHandler() { }

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception
{
rsp.add("FlaxTest", "Hello!");
}

public String getDescription() { return "Flax"; }
public String getSourceId() { return "Flax"; }
public String getSource() { return "Flax"; }
public String getVersion() { return "Flax"; }

}



On 10 August 2011 16:43, simon  wrote:

> Th attachment isn't showing up (in gmail, at least). Can you inline
> the relevant bits of code ?
>
> On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer  wrote:
> > Hi,
> > Apologies if this is really basic. I'm trying to learn how to create a
> > custom request handler, so I wrote the minimal class (attached), compiled
> > and jar'd it, and placed it in example/lib. I added this to
> solrconfig.xml:
> > 
> > When I started Solr with java -jar start.jar, I got this:
> > ...
> > SEVERE: java.lang.NoClassDefFoundError:
> > org/apache/solr/handler/RequestHandlerBase
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > ...
> > So I copied all the dist/*.jar files into lib and tried again. This time
> it
> > seemed to start ok, but browsing to http://localhost:8983/solr/ displayed
> > this:
> > org.apache.solr.common.SolrException: Error Instantiating Request
> > Handler, FlaxTestHandler is not a
> org.apache.solr.request.SolrRequestHandler
> >
> >   at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
> ...
> >
> > Any ideas?
> > thanks,
> > Tom
> >
>


Re: how to ignore case in solr search field?

2011-08-10 Thread Tom Mortimer
You can use solr.LowerCaseFilterFactory in an analyser chain for both
indexing and queries. The schema.xml supplied with example has several field
types using this (including "text_general").
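
A minimal case-insensitive field type might look like the following; the name is arbitrary and the tokenizer can be whatever suits your data. With a single <analyzer> element the chain applies at both index and query time:

```xml
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercases tokens so abc, ABC, aBc all match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```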

Tom


On 10 August 2011 16:42, nagarjuna  wrote:

> Hi please help me ..
>how to ignore case while searching in solr
>
>
> ex:i need same results for the keywords abc, ABC , aBc,AbC and all the
> cases.
>
>
>
>
> Thank u in advance
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-ignore-case-in-solr-search-field-tp3242967p3242967.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Pranav Prakash
Very well explained. Thanks. Yes, we do optimize the index before replication.
I am not particularly worried about disk space usage; I was more curious about
that behavior.

*Pranav Prakash*

"temet nosce"

Twitter  | Blog  |
Google 


On Wed, Aug 10, 2011 at 19:55, Erick Erickson wrote:

> This is expected behavior. You might be optimizing
> your index on the master after every set of changes,
> in which case the entire index is copied. During this
> period, the space on disk will at least double, there's no
> way around that.
>
> If you do NOT optimize, then the slave will only copy changed
> segments instead of the entire index. Optimizing isn't
> usually necessary except periodically (daily, perhaps weekly,
> perhaps never actually).
>
> All that said, depending on how merging happens, you will always
> have the possibility of the entire index being copied sometimes
> because you'll happen to hit a merge that merges all segments
> into one.
>
> There are some advanced options that can control some parts
> of merging, but you need to get to the bottom of why the whole
> index is getting copied every time before you go there. I'd bet
> you're issuing an optimize.
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 5:30 AM, Pranav Prakash  wrote:
> > That is not true. Replication is roughly a copy of the diff between the
> >>> master and the slave's index.
> >>
> >>
> > In my case, during replication entire index is copied from master to
> slave,
> > during which the size of index goes a little over double. Then it shrinks
> to
> > its original size. Am I doing something wrong? How can I get the master
> to
> > serve only delta index instead of serving whole index and the slaves
> merging
> > the new and old index?
> >
> > *Pranav Prakash*
> >
>
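
For reference, the "replicate changed segments only" behavior Erick describes goes with a master that replicates after commit rather than after optimize; a sketch of the master-side config (the confFiles list is illustrative):

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- ship new segments on commit; avoid triggering full copies via optimize -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```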


RE: [Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread karl.wright
The site looks great.  And thank you for including the ManifoldCF link. ;-)

Karl

-Original Message-
From: ext Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Wednesday, August 10, 2011 10:09 AM
To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
Subject: [Help Wanted] Graphics and other help for new Lucene/Solr website

Hi,

We are in the process of putting up a new Lucene/Solr/PyLucene/OpenRelevance 
website.  You can see a preview at http://lucene.staging.apache.org/lucene/.  
It is more or less a look and feel copy of Mahout and Open For Biz websites.  
This new site, IMO, both looks better than the old one and will be a lot easier 
for us committers to maintain/update and for others to contribute to.

So, how can you help?  

0.  All of the code is at https://svn.apache.org/repos/asf/lucene/cms/trunk.  
Check it out the usual way using SVN.  If you want to build locally, see 
https://issues.apache.org/jira/browse/LUCENE-2748 and the links to the ASF CMS 
guide.

1. If you have any graphic design skills:
- I'd love to have some "mantle/slide" images along the lines of 
http://lucene.staging.apache.org/lucene/images/mantle-lucene-solr.png.  These 
are used in the slideshow at the top of the Lucene, Core and Solr pages and 
should be interesting, inviting, etc. and should give people warm fuzzy 
feelings about all of our software and the great community we have.  (Think 
Marketing!)
- Help us coordinate the color selection on the various pages, 
especially in the slides and especially on the Solr page, as I'm not sure I 
like the green and black background contrasted with the orange of the Solr logo.

2. In a few more days or maybe a week or so, patches to fix content errors, 
etc. will be welcome.  For now, we are still porting things, so I don't want to 
duplicate effort.

3. New, useful documentation is also, of course, always welcome.

4. Test with your favorite browser.  In particular, I don't have IE handy.  
I've checked the site in Chrome, Firefox and Safari.

If you come up w/  images (I won't guarantee they will be accepted, but I am 
appreciative of the help) or other style fixes, etc., please submit all 
content/patches to https://issues.apache.org/jira/browse/LUCENE-2748 and please 
make sure to check the donation box when attaching the file. 

-Grant

 


Re: [Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread Markus Jelsma
Looks nice! Font seems too light to read with comfort though.

> Hi,
> 
> We are in the process of putting up a new
> Lucene/Solr/PyLucene/OpenRelevance website.  You can see a preview at
> http://lucene.staging.apache.org/lucene/.  It is more or less a look and
> feel copy of Mahout and Open For Biz websites.  This new site, IMO, both
> looks better than the old one and will be a lot easier for us committers
> to maintain/update and for others to contribute to.
> 
> So, how can you help?
> 
> 0.  All of the code is at
> https://svn.apache.org/repos/asf/lucene/cms/trunk.  Check it out the usual
> way using SVN.  If you want to build locally, see
> https://issues.apache.org/jira/browse/LUCENE-2748 and the links to the ASF
> CMS guide.
> 
> 1. If you have any graphic design skills:
>   - I'd love to have some "mantle/slide" images along the lines of
> http://lucene.staging.apache.org/lucene/images/mantle-lucene-solr.png. 
> These are used in the slideshow at the top of the Lucene, Core and Solr
> pages and should be interesting, inviting, etc. and should give people
> warm fuzzy feelings about all of our software and the great community we
> have.  (Think Marketing!) - Help us coordinate the color selection on the
> various pages, especially in the slides and especially on the Solr page,
> as I'm not sure I like the green and black background contrasted with the
> orange of the Solr logo.
> 
> 2. In a few more days or maybe a week or so, patches to fix content errors,
> etc. will be welcome.  For now, we are still porting things, so I don't
> want to duplicate effort.
> 
> 3. New, useful documentation is also, of course, always welcome.
> 
> 4. Test with your favorite browser.  In particular, I don't have IE handy. 
> I've checked the site in Chrome, Firefox and Safari.
> 
> If you come up w/  images (I won't guarantee they will be accepted, but I
> am appreciative of the help) or other style fixes, etc., please submit all
> content/patches to https://issues.apache.org/jira/browse/LUCENE-2748 and
> please make sure to check the donation box when attaching the file.
> 
> -Grant


Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread simon
It's working for me. Compiled, inserted in solr/lib, added the config
line to solrconfig.

  when I send a /flaxtest request i get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <str name="FlaxTest">Hello!</str>
</response>

I was doing this within a core defined in solr.xml

-Simon

On Wed, Aug 10, 2011 at 11:46 AM, Tom Mortimer  wrote:
> Sure -
>
> import org.apache.solr.request.SolrQueryRequest;
> import org.apache.solr.response.SolrQueryResponse;
> import org.apache.solr.handler.RequestHandlerBase;
>
> public class FlaxTestHandler extends RequestHandlerBase {
>
>    public FlaxTestHandler() { }
>
>    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse
> rsp)
>        throws Exception
>    {
>        rsp.add("FlaxTest", "Hello!");
>    }
>
>    public String getDescription() { return "Flax"; }
>    public String getSourceId() { return "Flax"; }
>    public String getSource() { return "Flax"; }
>    public String getVersion() { return "Flax"; }
>
> }
>
>
>
> On 10 August 2011 16:43, simon  wrote:
>
>> The attachment isn't showing up (in gmail, at least). Can you inline
>> the relevant bits of code ?
>>
>> On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer  wrote:
>> > Hi,
>> > Apologies if this is really basic. I'm trying to learn how to create a
>> > custom request handler, so I wrote the minimal class (attached), compiled
>> > and jar'd it, and placed it in example/lib. I added this to
>> solrconfig.xml:
>> >     
>> > When I started Solr with java -jar start.jar, I got this:
>> >     ...
>> >     SEVERE: java.lang.NoClassDefFoundError:
>> > org/apache/solr/handler/RequestHandlerBase
>> > at java.lang.ClassLoader.defineClass1(Native Method)
>> >         ...
>> > So I copied all the dist/*.jar files into lib and tried again. This time
>> it
>> > seemed to start ok, but browsing to http://localhost:8983/solr/ displayed
>> > this:
>> >     org.apache.solr.common.SolrException: Error Instantiating Request
>> > Handler, FlaxTestHandler is not a
>> org.apache.solr.request.SolrRequestHandler
>> >
>> >       at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
>> ...
>> >
>> > Any ideas?
>> > thanks,
>> > Tom
>> >
>>
>
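
The requestHandler registration stripped from the messages above was presumably along these lines (the /flaxtest name and the class name are taken from the thread itself; the package prefix and exact attributes are assumptions, as they are not visible here):

```xml
<!-- sketch of the stripped solrconfig.xml line -->
<requestHandler name="/flaxtest" class="FlaxTestHandler" />
```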


query time problem

2011-08-10 Thread Charles-Andre Martin
Hi,

 

I've noticed poor performance for my solr queries in the past few days.

 

Queries of that type :

 

http://server:5000/solr/select?q=story_search_field_en:(water boston) OR 
story_search_field_fr:(water boston)&rows=350&start=0&sort=r_modify_date 
desc&shards=shard1:5001/solr,shard2:5002/solr&fq=type:(cch_story OR 
cch_published_story)

 

Are slow (more than 10 seconds).

 

I would like to know if someone knows how I could investigate the problem ? I 
tried to specify the parameters &debugQuery=on&explainOther=on but this doesn't 
help much.

 

I also monitored the shards log. Sometimes, there is broken pipe in the shards 
logs.

 

Also, is there a way I could monitor the cache statistics ? 

 

For your information, every shards master and slaves computers have enough RAM 
and disk space.

 

 

Charles-André Martin

 

 



Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow!

What classpath did you use for compiling? And did you copy anything other
than the new jar into lib/ ?

thanks,
Tom


On 10 August 2011 18:07, simon  wrote:

> It's working for me. Compiled, inserted in solr/lib, added the config
> line to solrconfig.
>
>  when I send a /flaxtest request i get
>
> 
> 
> 0
> 16
> 
> Hello!
> 
>
> I was doing this within a core defined in solr.xml
>
> -Simon
>
> On Wed, Aug 10, 2011 at 11:46 AM, Tom Mortimer  wrote:
> > Sure -
> >
> > import org.apache.solr.request.SolrQueryRequest;
> > import org.apache.solr.response.SolrQueryResponse;
> > import org.apache.solr.handler.RequestHandlerBase;
> >
> > public class FlaxTestHandler extends RequestHandlerBase {
> >
> >public FlaxTestHandler() { }
> >
> >public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse
> > rsp)
> >throws Exception
> >{
> >rsp.add("FlaxTest", "Hello!");
> >}
> >
> >public String getDescription() { return "Flax"; }
> >public String getSourceId() { return "Flax"; }
> >public String getSource() { return "Flax"; }
> >public String getVersion() { return "Flax"; }
> >
> > }
> >
> >
> >
> > On 10 August 2011 16:43, simon  wrote:
> >
> >> Th attachment isn't showing up (in gmail, at least). Can you inline
> >> the relevant bits of code ?
> >>
> >> On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer  wrote:
> >> > Hi,
> >> > Apologies if this is really basic. I'm trying to learn how to create a
> >> > custom request handler, so I wrote the minimal class (attached),
> compiled
> >> > and jar'd it, and placed it in example/lib. I added this to
> >> solrconfig.xml:
> >> > 
> >> > When I started Solr with java -jar start.jar, I got this:
> >> > ...
> >> > SEVERE: java.lang.NoClassDefFoundError:
> >> > org/apache/solr/handler/RequestHandlerBase
> >> > at java.lang.ClassLoader.defineClass1(Native Method)
> >> > ...
> >> > So I copied all the dist/*.jar files into lib and tried again. This
> time
> >> it
> >> > seemed to start ok, but browsing to
> http://localhost:8983/solr/ displayed
> >> > this:
> >> > org.apache.solr.common.SolrException: Error Instantiating Request
> >> > Handler, FlaxTestHandler is not a
> >> org.apache.solr.request.SolrRequestHandler
> >> >
> >> >   at
> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
> >> ...
> >> >
> >> > Any ideas?
> >> > thanks,
> >> > Tom
> >> >
> >>
> >
>


RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
I take it back. I didn't find it. I corrected my values and the facet queries 
still don't find what I want.

The values I'm looking for are URIs, so they look like: http://place.org/abc/def

I add the facet query like so:

query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");


I print the query, just to see what it is:

Facet Query:  MyField:" : http://place.org/abc/def";

But when I examine queryResponse.getFacetFields, it's an empty list, if I do 
not set the facet field. If I set the facet field to MyField, then I get facets 
for ALL the values of MyField, not just the ones in the facet queries.

Can anyone help here?

Thanks.


From: Simon, Richard T
Sent: Wednesday, August 10, 2011 11:07 AM
To: Simon, Richard T; solr-user@lucene.apache.org
Subject: RE: Building a facet query in SolrJ

Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
there was something simple wrong.

From: Simon, Richard T
Sent: Wednesday, August 10, 2011 10:55 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: Building a facet query in SolrJ

Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
results I expect. I have a field, MyField, and I want to get facets for 
specific values of that field. That is, I want a FacetField if MyField is 
"ABC", "DEF", etc. (a specific list of values), but not if MyField is any other 
value.

If I build my query like this:

SolrQuery query = new SolrQuery( luceneQueryStr );
  query.setStart( request.getStartIndex() );
  query.setRows( request.getMaxResults() );
  query.setFacet(true);
 query.setFacetMinCount(1);

  query.addFacetField(MYFIELD);

  for (String fieldValue : desiredFieldValues) {
   query.addFacetQuery(MYFIELD + ":" + fieldValue);
 }


queryResponse.getFacetFields returns facets for ALL values of MyField. I 
figured that was because setting the facet field with addFacetField caused Solr 
to examine all values. But, if I take out that line, then getFacetFields 
returns an empty list.

I'm sure I'm doing something simple wrong, but I'm out of ideas right now.

-Rich






Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread simon
This is in trunk (up to date). Compiler is 1.6.0_26

classpath was  
dist/apache-solr-solrj-4.0-SNAPSHOT.jar:dist/apache-solr-core-4.0-SNAPSHOT.jar
built from trunk just prior by 'ant dist'

I'd try again with a clean trunk .

-Simon

On Wed, Aug 10, 2011 at 1:20 PM, Tom Mortimer  wrote:
> Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow!
>
> What classpath did you use for compiling? And did you copy anything other
> than the new jar into lib/ ?
>
> thanks,
> Tom
>
>
> On 10 August 2011 18:07, simon  wrote:
>
>> It's working for me. Compiled, inserted in solr/lib, added the config
>> line to solrconfig.
>>
>>  when I send a /flaxtest request i get
>>
>> 
>> 
>> 0
>> 16
>> 
>> Hello!
>> 
>>
>> I was doing this within a core defined in solr.xml
>>
>> -Simon
>>
>> On Wed, Aug 10, 2011 at 11:46 AM, Tom Mortimer  wrote:
>> > Sure -
>> >
>> > import org.apache.solr.request.SolrQueryRequest;
>> > import org.apache.solr.response.SolrQueryResponse;
>> > import org.apache.solr.handler.RequestHandlerBase;
>> >
>> > public class FlaxTestHandler extends RequestHandlerBase {
>> >
>> >    public FlaxTestHandler() { }
>> >
>> >    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse
>> > rsp)
>> >        throws Exception
>> >    {
>> >        rsp.add("FlaxTest", "Hello!");
>> >    }
>> >
>> >    public String getDescription() { return "Flax"; }
>> >    public String getSourceId() { return "Flax"; }
>> >    public String getSource() { return "Flax"; }
>> >    public String getVersion() { return "Flax"; }
>> >
>> > }
>> >
>> >
>> >
>> > On 10 August 2011 16:43, simon  wrote:
>> >
>> >> Th attachment isn't showing up (in gmail, at least). Can you inline
>> >> the relevant bits of code ?
>> >>
>> >> On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer  wrote:
>> >> > Hi,
>> >> > Apologies if this is really basic. I'm trying to learn how to create a
>> >> > custom request handler, so I wrote the minimal class (attached),
>> compiled
>> >> > and jar'd it, and placed it in example/lib. I added this to
>> >> solrconfig.xml:
>> >> >     
>> >> > When I started Solr with java -jar start.jar, I got this:
>> >> >     ...
>> >> >     SEVERE: java.lang.NoClassDefFoundError:
>> >> > org/apache/solr/handler/RequestHandlerBase
>> >> > at java.lang.ClassLoader.defineClass1(Native Method)
>> >> >         ...
>> >> > So I copied all the dist/*.jar files into lib and tried again. This
>> time
>> >> it
>> >> > seemed to start ok, but browsing to
>> http://localhost:8983/solr/ displayed
>> >> > this:
>> >> >     org.apache.solr.common.SolrException: Error Instantiating Request
>> >> > Handler, FlaxTestHandler is not a
>> >> org.apache.solr.request.SolrRequestHandler
>> >> >
>> >> >       at
>> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
>> >> ...
>> >> >
>> >> > Any ideas?
>> >> > thanks,
>> >> > Tom
>> >> >
>> >>
>> >
>>
>


Re: query time problem

2011-08-10 Thread simon
Off the top of my head ...

Can you tell if GC is happening more frequently than usual/expected  ?

Is the index optimized - if not, how many segments ?

It's possible that one of the shards is behind a flaky network connection.

Is the 10s performance just for the Solr query or wallclock time at
the browser ?

You can monitor cache statistics from the admin console 'statistics' page

Are you seeing anything untoward in the solr logs ?

-Simon

On Wed, Aug 10, 2011 at 1:11 PM, Charles-Andre Martin
 wrote:
> Hi,
>
>
>
> I've noticed poor performance for my solr queries in the past few days.
>
>
>
> Queries of that type :
>
>
>
> http://server:5000/solr/select?q=story_search_field_en:(water boston) OR 
> story_search_field_fr:(water boston)&rows=350&start=0&sort=r_modify_date 
> desc&shards=shard1:5001/solr,shard2:5002/solr&fq=type:(cch_story OR 
> cch_published_story)
>
>
>
> Are slow (more than 10 seconds).
>
>
>
> I would like to know if someone knows how I could investigate the problem ? I 
> tried to specify the parameters &debugQuery=on&explainOther=on but this 
> doesn't help much.
>
>
>
> I also monitored the shards log. Sometimes, there is broken pipe in the 
> shards logs.
>
>
>
> Also, is there a way I could monitor the cache statistics ?
>
>
>
> For your information, every shards master and slaves computers have enough 
> RAM and disk space.
>
>
>
>
>
> Charles-André Martin
>
>
>
>
>
>


How to start troubleshooting a content extraction issue

2011-08-10 Thread Tim AtLee
Hello

So, I'm a newbie to Solr and Tika and whatnot, so please use simple words
for me :P

I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the
search engine for a Drupal web site.

Up until recently, everything has been fine - searching works, faceting
works, etc.

Recently a user uploaded a 5mb xltm file, which seems to be causing Tomcat
to spike in CPU usage, and eventually error out.  When the documents are
submitted to be index, the tomcat process spikes up to use 100% of 1
available CPU, with the eventual error in Drupal of "Exception occured
sending *sites/default/files/nodefiles/533/June 30, 2011.xltm* to Solr "0"
Status: Communication Error".

I am looking for some help in figuring out where to troubleshoot this.  I
assume it's this file, but I guess I'd like to be sure - so how can I submit
this file for content extraction manually to see what happens?
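
One way to test extraction by hand (a sketch, assuming the stock ExtractingRequestHandler is mapped at /update/extract; the host, port and file path below are illustrative) is to post the file directly with extractOnly=true, which returns the Tika-extracted content without indexing it:

```shell
# hypothetical host/port and file path; extractOnly=true returns the
# extracted content in the response instead of indexing the document
curl "http://localhost:8080/solr/update/extract?extractOnly=true" \
     -F "myfile=@/path/to/June30.xltm"
```

If this call also pegs the CPU or errors out, the problem is in the extraction step itself rather than in Drupal's integration.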

Thanks,

Tim


Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Tom Mortimer
Thanks Simon. I'll try again tomorrow.

Tom

On 10 August 2011 18:46, simon  wrote:

> This is in trunk (up to date). Compiler is 1.6.0_26
>
> classpath was
>  
> dist/apache-solr-solrj-4.0-SNAPSHOT.jar:dist/apache-solr-core-4.0-SNAPSHOT.jar
> built from trunk just prior by 'ant dist'
>
> I'd try again with a clean trunk .
>
> -Simon
>
> On Wed, Aug 10, 2011 at 1:20 PM, Tom Mortimer  wrote:
> > Interesting.. is this in trunk (4.0)? Maybe I've broken mine somehow!
> >
> > What classpath did you use for compiling? And did you copy anything other
> > than the new jar into lib/ ?
> >
> > thanks,
> > Tom
> >
> >
> > On 10 August 2011 18:07, simon  wrote:
> >
> >> It's working for me. Compiled, inserted in solr/lib, added the config
> >> line to solrconfig.
> >>
> >>  when I send a /flaxtest request i get
> >>
> >> 
> >> 
> >> 0
> >> 16
> >> 
> >> Hello!
> >> 
> >>
> >> I was doing this within a core defined in solr.xml
> >>
> >> -Simon
> >>
> >> On Wed, Aug 10, 2011 at 11:46 AM, Tom Mortimer  wrote:
> >> > Sure -
> >> >
> >> > import org.apache.solr.request.SolrQueryRequest;
> >> > import org.apache.solr.response.SolrQueryResponse;
> >> > import org.apache.solr.handler.RequestHandlerBase;
> >> >
> >> > public class FlaxTestHandler extends RequestHandlerBase {
> >> >
> >> >public FlaxTestHandler() { }
> >> >
> >> >public void handleRequestBody(SolrQueryRequest req,
> SolrQueryResponse
> >> > rsp)
> >> >throws Exception
> >> >{
> >> >rsp.add("FlaxTest", "Hello!");
> >> >}
> >> >
> >> >public String getDescription() { return "Flax"; }
> >> >public String getSourceId() { return "Flax"; }
> >> >public String getSource() { return "Flax"; }
> >> >public String getVersion() { return "Flax"; }
> >> >
> >> > }
> >> >
> >> >
> >> >
> >> > On 10 August 2011 16:43, simon  wrote:
> >> >
> >> >> Th attachment isn't showing up (in gmail, at least). Can you inline
> >> >> the relevant bits of code ?
> >> >>
> >> >> On Wed, Aug 10, 2011 at 11:05 AM, Tom Mortimer 
> wrote:
> >> >> > Hi,
> >> >> > Apologies if this is really basic. I'm trying to learn how to
> create a
> >> >> > custom request handler, so I wrote the minimal class (attached),
> >> compiled
> >> >> > and jar'd it, and placed it in example/lib. I added this to
> >> >> solrconfig.xml:
> >> >> > 
> >> >> > When I started Solr with java -jar start.jar, I got this:
> >> >> > ...
> >> >> > SEVERE: java.lang.NoClassDefFoundError:
> >> >> > org/apache/solr/handler/RequestHandlerBase
> >> >> > at java.lang.ClassLoader.defineClass1(Native Method)
> >> >> > ...
> >> >> > So I copied all the dist/*.jar files into lib and tried again. This
> >> time
> >> >> it
> >> >> > seemed to start ok, but browsing to
> >> http://localhost:8983/solr/ displayed
> >> >> > this:
> >> >> > org.apache.solr.common.SolrException: Error Instantiating
> Request
> >> >> > Handler, FlaxTestHandler is not a
> >> >> org.apache.solr.request.SolrRequestHandler
> >> >> >
> >> >> >   at
> >> org.apache.solr.core.SolrCore.createInstance(SolrCore.java:410)
> >> >> ...
> >> >> >
> >> >> > Any ideas?
> >> >> > thanks,
> >> >> > Tom
> >> >> >
> >> >>
> >> >
> >>
> >
>


Re: Building a facet query in SolrJ

2011-08-10 Thread Erik Hatcher
Try making your queries manually, to see this closer in action... 
q=MyField:<uri> and see what you get.  In this case, because your URI contains 
characters that make the default query parser unhappy, do this sort of query 
instead:

{!term f=MyField}<uri>

That way the query is "parsed" properly into a single term query.
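
To see why the default parser chokes on a URI, here is a small Python sketch (the helper names are mine, not SolrJ's) contrasting backslash-escaping a value for the lucene parser with handing it verbatim to the {!term} parser:

```python
# Characters the default lucene query parser treats specially.
# A sketch: the two-character operators && and || are not handled here.
SPECIAL = set('+-!(){}[]^"~*?:\\/')

def escape_lucene(value):
    """Backslash-escape a value for the default query parser."""
    return "".join("\\" + c if c in SPECIAL else c for c in value)

def term_query(field, value):
    """Build a {!term} query; everything after the local params
    is taken as one literal term, so no escaping is needed."""
    return "{!term f=%s}%s" % (field, value)

print(escape_lucene("http://place.org/abc/def"))
print(term_query("MyField", "http://place.org/abc/def"))
```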

I am a little confused below since you're faceting on MyField entirely 
(addFacetField) where you'd get the values of each URI facet query in that list 
anyway.

Erik

On Aug 10, 2011, at 13:42 , Simon, Richard T wrote:

> I take it back. I didn't find it. I corrected my values and the facet queries 
> still don't find what I want.
> 
> The values I'm looking for are URIs, so they look like: 
> http://place.org/abc/def
> 
> I add the facet query like so:
> 
> query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
> 
> 
> I print the query, just to see what it is:
> 
> Facet Query:  MyField:" : http://place.org/abc/def";
> 
> But when I examine queryResponse.getFacetFields, it's an empty list, if I do 
> not set the facet field. If I set the facet field to MyField, then I get 
> facets for ALL the values of MyField, not just the ones in the facet queries.
> 
> Can anyone help here?
> 
> Thanks.
> 
> 
> From: Simon, Richard T
> Sent: Wednesday, August 10, 2011 11:07 AM
> To: Simon, Richard T; solr-user@lucene.apache.org
> Subject: RE: Building a facet query in SolrJ
> 
> Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
> there was something simple wrong.
> 
> From: Simon, Richard T
> Sent: Wednesday, August 10, 2011 10:55 AM
> To: solr-user@lucene.apache.org
> Cc: Simon, Richard T
> Subject: Building a facet query in SolrJ
> 
> Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
> results I expect. I have a field, MyField, and I want to get facets for 
> specific values of that field. That is, I want a FacetField if MyField is 
> "ABC", "DEF", etc. (a specific list of values), but not if MyField is any 
> other value.
> 
> If I build my query like this:
> 
> SolrQuery query = new SolrQuery( luceneQueryStr );
>  query.setStart( request.getStartIndex() );
>  query.setRows( request.getMaxResults() );
>  query.setFacet(true);
> query.setFacetMinCount(1);
> 
>  query.addFacetField(MYFIELD);
> 
>  for (String fieldValue : desiredFieldValues) {
>   query.addFacetQuery(MYFIELD + ":" + fieldValue);
> }
> 
> 
> queryResponse.getFacetFields returns facets for ALL values of MyField. I 
> figured that was because setting the facet field with addFacetField caused 
> Solr to examine all values. But, if I take out that line, then getFacetFields 
> returns an empty list.
> 
> I'm sure I'm doing something simple wrong, but I'm out of ideas right now.
> 
> -Rich
> 
> 
> 
> 



RE: Building a facet query in SolrJ

2011-08-10 Thread Simon, Richard T
Hi -- I do get facets for all the values of MyField when I specify the facet 
field, but that's not what I want. I just want facets for a subset of the 
values of MyField. That's why I'm trying to use the facet queries, to just get 
facets for those values.


-Rich

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Wednesday, August 10, 2011 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Building a facet query in SolrJ

Try making your queries manually, to see this closer in action... 
q=MyField:<uri> and see what you get.  In this case, because your URI contains 
characters that make the default query parser unhappy, do this sort of query 
instead:

{!term f=MyField}<uri>

That way the query is "parsed" properly into a single term query.

I am a little confused below since you're faceting on MyField entirely 
(addFacetField) where you'd get the values of each URI facet query in that list 
anyway.

Erik

On Aug 10, 2011, at 13:42 , Simon, Richard T wrote:

> I take it back. I didn't find it. I corrected my values and the facet queries 
> still don't find what I want.
> 
> The values I'm looking for are URIs, so they look like: 
> http://place.org/abc/def
> 
> I add the facet query like so:
> 
> query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
> 
> 
> I print the query, just to see what it is:
> 
> Facet Query:  MyField:" : http://place.org/abc/def";
> 
> But when I examine queryResponse.getFacetFields, it's an empty list, if I do 
> not set the facet field. If I set the facet field to MyField, then I get 
> facets for ALL the values of MyField, not just the ones in the facet queries.
> 
> Can anyone help here?
> 
> Thanks.
> 
> 
> From: Simon, Richard T
> Sent: Wednesday, August 10, 2011 11:07 AM
> To: Simon, Richard T; solr-user@lucene.apache.org
> Subject: RE: Building a facet query in SolrJ
> 
> Oops. I think I found it. My desiredFieldValues list has the wrong info. Knew 
> there was something simple wrong.
> 
> From: Simon, Richard T
> Sent: Wednesday, August 10, 2011 10:55 AM
> To: solr-user@lucene.apache.org
> Cc: Simon, Richard T
> Subject: Building a facet query in SolrJ
> 
> Hi - I'm trying to do a (I think) simple facet query, but I'm not getting the 
> results I expect. I have a field, MyField, and I want to get facets for 
> specific values of that field. That is, I want a FacetField if MyField is 
> "ABC", "DEF", etc. (a specific list of values), but not if MyField is any 
> other value.
> 
> If I build my query like this:
> 
> SolrQuery query = new SolrQuery( luceneQueryStr );
>  query.setStart( request.getStartIndex() );
>  query.setRows( request.getMaxResults() );
>  query.setFacet(true);
> query.setFacetMinCount(1);
> 
>  query.addFacetField(MYFIELD);
> 
>  for (String fieldValue : desiredFieldValues) {
>   query.addFacetQuery(MYFIELD + ":" + fieldValue);
> }
> 
> 
> queryResponse.getFacetFields returns facets for ALL values of MyField. I 
> figured that was because setting the facet field with addFacetField caused 
> Solr to examine all values. But, if I take out that line, then getFacetFields 
> returns an empty list.
> 
> I'm sure I'm doing something simple wrong, but I'm out of ideas right now.
> 
> -Rich
> 
> 
> 
> 



RE: query time problem

2011-08-10 Thread Charles-Andre Martin
Thanks Simon for these leads.

Here are my answers:

Can you tell if GC is happening more frequently than usual/expected  ?

GC is OK.

Is the index optimized - if not, how many segments ?

According to the statistics page from the admin :
One shard (master/slave) has 10 segments
The other shard (master/slave) has 13 segments

Is this ok ? The optimize job is running each day during the night.


It's possible that one of the shards is behind a flaky network connection.

Will check ...


Is the 10s performance just for the Solr query or wallclock time at
the browser ?

Both

You can monitor cache statistics from the admin console 'statistics' page

Thanks


Are you seeing anything untoward in the solr logs ?

I see this stack trace:

Aug 10, 2011 1:49:13 PM org.apache.solr.common.SolrException log
SEVERE: ClientAbortException:  java.net.SocketException: Broken pipe
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:325)
at 
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:381)
at 
org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:370)
at 
org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
at 
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:183)
at 
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at 
org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:740)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349)
at 
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:764)
at 
org.apache.coyote.http11.filters.IdentityOutputFilter.doWrite(IdentityOutputFilter.java:127)
at 
org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:573)
at org.apache.coyote.Response.doWrite(Response.java:560)
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
... 21 more

Charles-André Martin


800 Square Victoria
Montréal (Québec) H4Z 0A3
Tél : (514) 504-2703


-Message d'origine-
De : simon [mailto:mtnes...@gmail.com] 
Envoyé : August-10-11 1:52 PM
À : solr-user@lucene.apache.org
Objet : Re: query time problem

Off the top of my head ...

Can you tell if GC is happening more frequently than usual/expected  ?

Is the index optimized - if not, how many segments ?

It's possible that one of the shards is behind a flaky network connection.

Is the 10s performance just for the Solr query or wallclock time at
the browser ?

You can monitor cache statistics from the admin console 'statistics' page

Are you seeing anything untoward in the solr logs ?

-Simon

On Wed, Aug 10, 2011 at 1:11 PM, Charles-Andre Martin
 wrote:
> Hi,
>
>
>
> I've noticed poor performance for my solr queries in the past few days.
>
>
>
> Queries of that type :
>
>
>
> http://server:5000/solr/select?q=story_search_field_en:(water boston) OR 
> story_search_field_fr:(water boston)&rows=350&start=0&sort=r_modify_date 
> desc&shards=shard1:5001/solr,shard2:5002/solr&fq=type:(cch_s

Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes

I would like to combine the ShingleFilterFactory with a SynonymFilterFactory in 
a field type. 

I've looked at something like this using the analysis.jsp tool: 


  




...
  
  
  ...
  


However, when a ShingleFilterFactory is applied first, the SynonymFilterFactory 
appears to do nothing. 
I haven't found any documentation or other warnings against this combination, 
and I don't want to apply shingles after synonyms (this works) because 
multi-word synonyms then cause severe term expansion. I don't really mind if 
the synonyms fail to match shingles, (although I'd prefer they succeed) but I'd 
at least expect that synonyms would continue to match on the original tokens, 
as they do if I remove the ShingleFilterFactory.

I'm using Solr 3.3, any clarification would be appreciated.

Thanks,
  -Jeff Wartes



Re: Error loading a custom request handler in Solr 4.0

2011-08-10 Thread Chris Hostetter

: custom request handler, so I wrote the minimal class (attached), compiled
: and jar'd it, and placed it in example/lib. I added this to solrconfig.xml:

that's the crux of the issue.

example/lib is where the jetty libraries live -- not solr plugins.

you should either put your custom jar's in the "lib" dir of your solr home 
(ie: example/solr/lib) or put it in a directory of your choice that you 
refer to from your solrconfig.xml file using a <lib> directive.
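
A minimal sketch of the second option, assuming the plugin jar lives in a sibling directory (the path here is illustrative):

```xml
<!-- in solrconfig.xml; dir is resolved relative to the core's instanceDir -->
<lib dir="../../myplugins" regex=".*\.jar" />
```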

: So I copied all the dist/*.jar files into lib and tried again. This time it

ouch ... make sure you remove *all* of those, or you will have no end of 
random obscure classpath issues at random times as jars are sometimes 
loaded from the war and sometimes loaded from that directory.


-Hoss


RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Steven A Rowe
Hi Jeff,

You have configured ShingleFilterFactory with a token separator of "", so e.g. 
"International Corporation" will output the shingle "InternationalCorporation". 
 If this is the form you want to use for synonym matching, it must exist in 
your synonym file.  Does it?
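
A rough Python sketch of what ShingleFilter with tokenSeparator="" produces, assuming bigram shingles and the filter's default behavior of keeping the original unigrams (outputUnigrams=true); position interleaving is ignored here:

```python
def shingles(tokens, max_size=2, sep=""):
    """Sketch of ShingleFilter with tokenSeparator="": original unigrams
    are kept and n-grams are joined with no separator."""
    out = list(tokens)
    for n in range(2, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.append(sep.join(tokens[i:i + n]))
    return out

print(shingles(["International", "Corporation"]))
# -> ['International', 'Corporation', 'InternationalCorporation']
```

So a downstream SynonymFilter only matches if the shingled form ("InternationalCorporation") appears in the synonym file.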

Steve

> -Original Message-
> From: Jeff Wartes [mailto:jwar...@whitepages.com]
> Sent: Wednesday, August 10, 2011 3:43 PM
> To: solr-user@lucene.apache.org
> Subject: Can't mix Synonyms with Shingles?
> 
> 
> I would like to combine the ShingleFilterFactory with a
> SynonymFilterFactory in a field type.
> 
> I've looked at something like this using the analysis.jsp tool:
> 
>  positionIncrementGap="100">
>   
> 
>  generateWordParts="1" generateNumberParts="1" stemEnglishPosessive="1"/>
> 
>  synonyms="synonyms.BusinessNames.txt" ignoreCase="true" expand="true"/>
> ...
>   
>   
>   ...
>   
> 
> 
> However, when a ShingleFilterFactory is applied first, the
> SynonymFilterFactory appears to do nothing.
> I haven't found any documentation or other warnings against this
> combination, and I don't want to apply shingles after synonyms (this
> works) because multi-word synonyms then cause severe term expansion. I
> don't really mind if the synonyms fail to match shingles, (although I'd
> prefer they succeed) but I'd at least expect that synonyms would continue
> to match on the original tokens, as they do if I remove the
> ShingleFilterFactory.
> 
> I'm using Solr 3.3, any clarification would be appreciated.
> 
> Thanks,
>   -Jeff Wartes



RE: Building a facet query in SolrJ

2011-08-10 Thread Chris Hostetter

: query.addFacetQuery(MyField + ":" + "\"" + uri + "\"");
...
: But when I examine queryResponse.getFacetFields, it's an empty list, if 

"facet.query" constraints+counts do not come back in the "facet.field" 
section of the response.  They come back in the "facet.query" section of 
the response (look at the XML in your browser and you'll see what i 
mean)...

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html#getFacetQuery%28%29
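
A made-up response fragment (the counts are invented) illustrating that separation in Solr's JSON response format: facet.query counts sit under facet_queries, keyed by the literal query string, while facet.field counts sit under facet_fields:

```python
import json

# invented counts; the structure mirrors Solr's JSON facet response layout
raw = """{
  "facet_counts": {
    "facet_queries": {"MyField:\\"http://place.org/abc/def\\"": 12},
    "facet_fields": {"MyField": ["http://place.org/abc/def", 12,
                                 "http://place.org/xyz", 3]}
  }
}"""
rsp = json.loads(raw)
# facet.query results: keyed by the query string you sent
print(rsp["facet_counts"]["facet_queries"])
# facet.field results: a flat [value, count, value, count, ...] list
print(rsp["facet_counts"]["facet_fields"]["MyField"])
```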


-Hoss


Re: Example Solr Config on EC2

2011-08-10 Thread Matt Shields
If I were to build a master with multiple slaves, is it possible to promote
a slave to be the new master if the original master fails?  Will all the
slaves pick up right where they left off, or any time the master fails will
we need to completely regenerate all the data?

If this is possible, are there any examples of this being automated?
 Especially on Win2k3.

Matthew Shields
Owner
BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
Managed Services
www.beantownhost.com
www.sysadminvalley.com
www.jeeprally.com



On Mon, Aug 8, 2011 at 5:34 PM,  wrote:

> Matthew,
>
> Here's another resource:
>
> http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/
>
>
> Michael Bohlig
> Lucid Imagination
>
>
>
> - Original Message 
> From: Matt Shields 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 8, 2011 2:03:20 PM
> Subject: Example Solr Config on EC2
>
> I'm looking for some examples of how to setup Solr on EC2.  The
> configuration I'm looking for would have multiple nodes for redundancy.
> I've tested in-house with a single master and slave with replication
> running in Tomcat on Windows Server 2003, but even if I have multiple
> slaves
> the single master is a single point of failure.  Any suggestions or example
> configurations?  The project I'm working on is a .NET setup, so ideally I'd
> like to keep this search cluster on Windows Server, even though I prefer
> Linux.
>
> Matthew Shields
> Owner
> BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> Managed Services
> www.beantownhost.com
> www.sysadminvalley.com
> www.jeeprally.com
>
>


Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Hi, Guys,

Based on the document below, I should be able to include a file under the
same directory by specifying relative path via xinclude in solrconfig.xml:
http://wiki.apache.org/solr/SolrConfigXml

However I am getting the following error when I use relative path (absolute
path works fine though):
SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file

Any ideas?

Thanks,

YH


Re: Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Sorry for the spam. I just figured it out. Thanks.

On Wed, Aug 10, 2011 at 2:17 PM, Way Cool  wrote:

> Hi, Guys,
>
> Based on the document below, I should be able to include a file under the
> same directory by specifying relative path via xinclude in solrconfig.xml:
> http://wiki.apache.org/solr/SolrConfigXml
>
> However I am getting the following error when I use relative path (absolute
> path works fine though):
> SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file
>
> Any ideas?
>
> Thanks,
>
> YH
>
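For anyone who finds this thread later: the usual fix is to declare the XInclude namespace and make sure the relative href resolves against the right base directory. A hedged sketch (the file name extra-handlers.xml is illustrative, not from the thread):

```xml
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- the relative href is resolved against the base URI of solrconfig.xml -->
  <xi:include href="extra-handlers.xml"/>
</config>
```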


RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes

Hi Steven,

The token separator was certainly a deliberate choice; are you saying that 
after applying shingles, synonyms can only match shingled terms? The term 
analysis suggests the original tokens still exist. 
You've made me realize that only certain synonyms seem to have problems though, 
so it's not a blanket failure.

Take this synonym definition:
wamu, washington mutual bank, washington mutual

Indexing "wamu" looks like it'll work fine - there are no shingles, and all 
three synonym expansions appear to get indexed. (expand="true") However, 
indexing "washington mutual" applies the shingles correctly, (adds 
washingtonmutual to position 1) but the synonym expansion does not happen. I 
would still expect the synonym definition to match the original terms and index 
'wamu' along with the other stuff.

Thanks.



-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: Wednesday, August 10, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't mix Synonyms with Shingles?

Hi Jeff,

You have configured ShingleFilterFactory with a token separator of "", so e.g. 
"International Corporation" will output the shingle "InternationalCorporation". 
 If this is the form you want to use for synonym matching, it must exist in 
your synonym file.  Does it?

Steve

> -Original Message-
> From: Jeff Wartes [mailto:jwar...@whitepages.com]
> Sent: Wednesday, August 10, 2011 3:43 PM
> To: solr-user@lucene.apache.org
> Subject: Can't mix Synonyms with Shingles?
> 
> 
> I would like to combine the ShingleFilterFactory with a 
> SynonymFilterFactory in a field type.
> 
> I've looked at something like this using the analysis.jsp tool:
> 
>  positionIncrementGap="100">
>   
> 
>  generateWordParts="1" generateNumberParts="1" stemEnglishPosessive="1"/>
> 
>  synonyms="synonyms.BusinessNames.txt" ignoreCase="true" expand="true"/>
> ...
>   
>   
>   ...
>   
> 
> 
> However, when a ShingleFilterFactory is applied first, the 
> SynonymFilterFactory appears to do nothing.
> I haven't found any documentation or other warnings against this 
> combination, and I don't want to apply shingles after synonyms (this
> works) because multi-word synonyms then cause severe term expansion. I 
> don't really mind if the synonyms fail to match shingles, (although 
> I'd prefer they succeed) but I'd at least expect that synonyms would 
> continue to match on the original tokens, as they do if I remove the 
> ShingleFilterFactory.
> 
> I'm using Solr 3.3, any clarification would be appreciated.
> 
> Thanks,
>   -Jeff Wartes



Solr 3.3: DIH configuration for Oracle

2011-08-10 Thread Eugeny Balakhonov
Hello, all!

 

I want to create a good DIH configuration for my Oracle database with delta
support. Unfortunately I am not able to do it well, as DIH has some strange
restrictions.

I want to explain the problem with a simple example. In reality my database
has a very complex structure.

 

Initial conditions: Two tables with following easy structure:

 

Table1

-  ID_RECORD(Primary key)

-  DATA_FIELD1

-  ..

-  DATA_FIELD2

-  LAST_CHANGE_TIME

Table2

-  ID_RECORD(Primary key)

-  PARENT_ID_RECORD (Foreign key to Table1.ID_RECORD) 

-  DATA_FIELD1

-  ..

-  DATA_FIELD2

-  LAST_CHANGE_TIME

 

For performance reasons it is necessary to select from both tables with a
single request (via an inner join).

 

My db-data-config.xml file:

 















 

As a result I get the following error:

 

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='T1_ID_RECORD, T2_ID_RECORD'

 

I have analyzed the source code of DIH. I found that in the DocBuilder class
the collectDelta() method treats the value of the entity attribute "pk" as a
simple string. But in my case this is an array with two values: T1_ID_RECORD,
T2_ID_RECORD

 

What am I doing wrong?

 

Thanks,

Eugeny
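
A hedged sketch of a delta-capable entity with a single-column primary key, which sidesteps the multi-column pk limitation (the column aliases and the ${dih.delta.ID_RECORD} variable shown here are illustrative, based on the table structure above):

```xml
<entity name="item" pk="ID_RECORD"
        query="SELECT t1.ID_RECORD, t1.DATA_FIELD1, t2.DATA_FIELD1 AS T2_DATA_FIELD1
               FROM Table1 t1 INNER JOIN Table2 t2 ON t2.PARENT_ID_RECORD = t1.ID_RECORD"
        deltaQuery="SELECT ID_RECORD FROM Table1
                    WHERE LAST_CHANGE_TIME &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT t1.ID_RECORD, t1.DATA_FIELD1, t2.DATA_FIELD1 AS T2_DATA_FIELD1
                          FROM Table1 t1 INNER JOIN Table2 t2 ON t2.PARENT_ID_RECORD = t1.ID_RECORD
                          WHERE t1.ID_RECORD = '${dih.delta.ID_RECORD}'"/>
```

With one document per Table1 row, Table1.ID_RECORD alone can serve as the pk, and changes in Table2 can be picked up by extending deltaQuery to also select parent ids from Table2.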

 



Increasing the highlight snippet size

2011-08-10 Thread Sang Yum
Hi,

I have been trying to increase the size of the highlight snippets using
"hl.fragSize" parameter, without much success. It seems that hl.fragSize is
not making any difference at all in terms of snippet size.

For example, compare the following two set of query/results:

http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2

 to write a

http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2

 to write a

Because of our particular needs, the content has been "spanified", each word
with its own span id. I do apply HTMLStrip at index time.

What I would like to do is to increase the size of snippet so that the
highlighted snippets contain more surrounding words.

Although hl.fragSize went from 10 to 1000, the result is the same.
This leads me to believe that hl.fragSize might not be the correct parameter
to achieve the effect I am looking for. If so, what parameter should I use?

Thanks!


Re: Example Solr Config on EC2

2011-08-10 Thread Akshay
Yes, you can promote a slave to be the master; refer to
http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

In AWS one can use an elastic IP(http://aws.amazon.com/articles/1346) to
refer to the master and this can be assigned to slaves as they assume the
role of master(in case of failure). All slaves will then refer to this new
master and there will be no need to regenerate data.

Automation of this may be possible through CloudWatch alarm-actions. I don't
know of any available example automation scripts.

Cheers
Akshay.
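
The wiki page above describes toggling roles with `enable` flags; a hedged sketch of a solrconfig.xml replication handler that can be flipped via system properties (the property names follow the wiki pattern, the master URL is a placeholder):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- -Denable.master=true on the current master -->
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <!-- -Denable.slave=true on the slaves -->
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

On failover, restart the promoted slave with the master flag set and repoint the elastic IP (or masterUrl) at it.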

On Wed, Aug 10, 2011 at 9:08 PM, Matt Shields  wrote:

> If I were to build a master with multiple slaves, is it possible to promote
> a slave to be the new master if the original master fails?  Will all the
> slaves pickup right where they left off, or any time the master fails will
> we need to completely regenerate all the data?
>
> If this is possible, are there any examples of this being automated?
>  Especially on Win2k3.
>
> Matthew Shields
> Owner
> BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> Managed Services
> www.beantownhost.com
> www.sysadminvalley.com
> www.jeeprally.com
>
>
>
> On Mon, Aug 8, 2011 at 5:34 PM,  wrote:
>
> > Matthew,
> >
> > Here's another resource:
> >
> >
> http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/
> >
> >
> > Michael Bohlig
> > Lucid Imagination
> >
> >
> >
> > - Original Message 
> > From: Matt Shields 
> > To: solr-user@lucene.apache.org
> > Sent: Mon, August 8, 2011 2:03:20 PM
> > Subject: Example Solr Config on EC2
> >
> > I'm looking for some examples of how to setup Solr on EC2.  The
> > configuration I'm looking for would have multiple nodes for redundancy.
> > I've tested in-house with a single master and slave with replication
> > running in Tomcat on Windows Server 2003, but even if I have multiple
> > slaves
> > the single master is a single point of failure.  Any suggestions or
> example
> > configurations?  The project I'm working on is a .NET setup, so ideally
> I'd
> > like to keep this search cluster on Windows Server, even though I prefer
> > Linux.
> >
> > Matthew Shields
> > Owner
> > BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> > Managed Services
> > www.beantownhost.com
> > www.sysadminvalley.com
> > www.jeeprally.com
> >
> >
>


Re: Increasing the highlight snippet size

2011-08-10 Thread simon
an hl.fragsize of 1000 is problematical, as Solr parses that
parameter as a 32 bit int... that's several bits more.

-Simon

On Wed, Aug 10, 2011 at 4:59 PM, Sang Yum  wrote:
> Hi,
>
> I have been trying to increase the size of the highlight snippets using
> "hl.fragSize" parameter, without much success. It seems that hl.fragSize is
> not making any difference at all in terms of snippet size.
>
> For example, compare the following two set of query/results:
>
> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2
>
>  to class="werd"> write a
>
> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2
>
>  to class="werd"> write a
>
> Because of our particular needs, the content has been "spanified", each word
> with its own span id. I do apply HTMLStrip during the index time.
>
> What I would like to do is to increase the size of snippet so that the
> highlighted snippets contain more surrounding words.
>
> Although hl.fragSize went from 10 to 1000, the result is the same.
> This leads me to believe that hl.fragSize might not be the correct parameter
> to achieve the effect i am looking for. If so, what parameter should I use?
>
> Thanks!
>


Re: Increasing the highlight snippet size

2011-08-10 Thread Sang Yum
I was just trying to set it to a ridiculously large number to make it work.
What I am seeing is that hl.fragsize doesn't seem to make any difference in
terms of highlight snippet size... I just tried the query with hl.fragsize
set to 1000. Same result as with 10.

On Wed, Aug 10, 2011 at 2:20 PM, simon  wrote:

> an hl.fragsize of 1000 is problematical, as Solr parses that
> parameter as a 32 bit int... that's several bits more.
>
> -Simon
>
> On Wed, Aug 10, 2011 at 4:59 PM, Sang Yum  wrote:
> > Hi,
> >
> > I have been trying to increase the size of the highlight snippets using
> > "hl.fragSize" parameter, without much success. It seems that hl.fragSize
> is
> > not making any difference at all in terms of snippet size.
> >
> > For example, compare the following two set of query/results:
> >
> >
> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2
> >
> >  to > class="werd"> write a
> >
> >
> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2
> >
> >  to > class="werd"> write a
> >
> > Because of our particular needs, the content has been "spanified", each
> word
> > with its own span id. I do apply HTMLStrip during the index time.
> >
> > What I would like to do is to increase the size of snippet so that the
> > highlighted snippets contain more surrounding words.
> >
> > Although hl.fragSize went from 10 to 1000, the result is the
> same.
> > This leads me to believe that hl.fragSize might not be the correct
> parameter
> > to achieve the effect i am looking for. If so, what parameter should I
> use?
> >
> > Thanks!
> >
>



-- 
http://twitter.com/sangyum


Re: Cache replication

2011-08-10 Thread arian487
Thanks for the advice paul, but post processing is a must for me given the
nature of my application.  I haven't had problems yet though.  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3244202.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Can't mix Synonyms with Shingles?

2011-08-10 Thread Jeff Wartes

After some further playing around, I think I understand what's going on. 
Because the SynonymFilterFactory pays attention to term position when it 
inserts a multi-word synonym, I had assumed it scanned for matches in a way 
that respected term position as well. (ie, for a two-word synonym, I assumed it 
would try to find the second word in position n+1 if it found the first word in 
position n) 

This does not appear to be the case. It appears to find multi-word synonym 
matches by simply walking the list of terms, exhausting all the terms in 
position one before looking at any terms in position two. The ShingleFilter 
adds terms to most positions, so that throws off the 'adjacency' of the 
flattened list of terms. Meaning, a two-word synonym can only match if the 
synonym consists of the original term (position 1) followed by the added 
shingle (also in position 1).
Perhaps a better description: if you're looking at the analysis.jsp display, 
it does not scan for multi-word synonym tokens "across then down"; it scans 
"down then across".


It doesn't look like there's a way to do what I'm trying to do (index shingles 
AND multi-word synonyms in one field) without writing my own filter.
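
The effect is easy to reproduce outside Solr. A minimal sketch (plain Java, not Solr code) that mimics how a bigram ShingleFilter with an empty token separator flattens positions, showing why "washington" and "mutual" are no longer adjacent in the flattened token list:

```java
import java.util.ArrayList;
import java.util.List;

// Simulates ShingleFilter(minShingleSize=2, tokenSeparator="") output:
// each bigram shingle is emitted at the position of its first word
// (position increment 0), so the flattened stream interleaves
// originals and shingles.
public class ShingleSketch {
    static List<String> shingle(String[] words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add((i + 1) + ":" + words[i]);                     // original token
            if (i + 1 < words.length) {
                out.add((i + 1) + ":" + words[i] + words[i + 1]);  // shingle, same position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // flattened order: washington, washingtonmutual, mutual -
        // a filter that walks the list "down then across" never sees
        // "washington" immediately followed by "mutual".
        System.out.println(shingle(new String[]{"washington", "mutual"}));
    }
}
```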


-Original Message-
From: Jeff Wartes [mailto:jwar...@whitepages.com] 
Sent: Wednesday, August 10, 2011 1:27 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't mix Synonyms with Shingles?


Hi Steven,

The token separator was certainly a deliberate choice, are you saying that 
after applying shingles, synonyms can only match shingled terms? The term 
analysis suggests the original tokens still exist. 
You've made me realize that only certain synonyms seem to have problems though, 
so it's not a blanket failure.

Take this synonym definition:
wamu, washington mutual bank, washington mutual

Indexing "wamu" looks like it'll work fine - there are no shingles, and all 
three synonym expansions appear to get indexed. (expand="true") However, 
indexing "washington mutual" applies the shingles correctly, (adds 
washingtonmutual to position 1) but the synonym expansion does not happen. I 
would still expect the synonym definition to match the original terms and index 
'wamu' along with the other stuff.

Thanks.



-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Wednesday, August 10, 2011 12:54 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't mix Synonyms with Shingles?

Hi Jeff,

You have configured ShingleFilterFactory with a token separator of "", so e.g. 
"International Corporation" will output the shingle "InternationalCorporation". 
 If this is the form you want to use for synonym matching, it must exist in 
your synonym file.  Does it?

Steve

> -Original Message-
> From: Jeff Wartes [mailto:jwar...@whitepages.com]
> Sent: Wednesday, August 10, 2011 3:43 PM
> To: solr-user@lucene.apache.org
> Subject: Can't mix Synonyms with Shingles?
> 
> 
> I would like to combine the ShingleFilterFactory with a 
> SynonymFilterFactory in a field type.
> 
> I've looked at something like this using the analysis.jsp tool:
> 
>  positionIncrementGap="100">
>   
> 
>  generateWordParts="1" generateNumberParts="1" stemEnglishPosessive="1"/>
> 
>  synonyms="synonyms.BusinessNames.txt" ignoreCase="true" expand="true"/>
> ...
>   
>   
>   ...
>   
> 
> 
> However, when a ShingleFilterFactory is applied first, the 
> SynonymFilterFactory appears to do nothing.
> I haven't found any documentation or other warnings against this 
> combination, and I don't want to apply shingles after synonyms (this
> works) because multi-word synonyms then cause severe term expansion. I 
> don't really mind if the synonyms fail to match shingles, (although 
> I'd prefer they succeed) but I'd at least expect that synonyms would 
> continue to match on the original tokens, as they do if I remove the 
> ShingleFilterFactory.
> 
> I'm using Solr 3.3, any clarification would be appreciated.
> 
> Thanks,
>   -Jeff Wartes



Re: Increasing the highlight snippet size

2011-08-10 Thread Sang Yum
Well, only after I posted this question in a public forum, I found the cause
of my problem. I was using "hl.fragSize", instead of "hl.fragsize". After
correcting the case, it worked as expected.

Thanks.
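
For later readers, the working form of the request (parameter name lower-cased; host and query values are the thread's own examples, abbreviated):

```
http://localhost:8983/solr/select?q=content:writing&hl=true&hl.fl=content
    &hl.snippets=100&hl.fragsize=1000&hl.maxAnalyzedChars=-1
```

Note it is hl.fragsize, not hl.fragSize - Solr request parameter names are case-sensitive.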

On Wed, Aug 10, 2011 at 3:19 PM, Sang Yum  wrote:

> I was just trying to set it a ridiculously large number to make it work.
> What I am seeing is that hl.fragsize doesn't seem to make any difference in
> term of highlight snippet size... I just tried the query with hl.fragsize
> set to 1000. Same result as 10.
>
>
> On Wed, Aug 10, 2011 at 2:20 PM, simon  wrote:
>
>> an hl.fragsize of 1000 is problematical, as Solr parses that
>> parameter as a 32 bit int... that's several bits more.
>>
>> -Simon
>>
>> On Wed, Aug 10, 2011 at 4:59 PM, Sang Yum  wrote:
>> > Hi,
>> >
>> > I have been trying to increase the size of the highlight snippets using
>> > "hl.fragSize" parameter, without much success. It seems that hl.fragSize
>> is
>> > not making any difference at all in terms of snippet size.
>> >
>> > For example, compare the following two set of query/results:
>> >
>> >
>> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=10&hl.maxAnalyzedChars=-1&version=2.2
>> >
>> >  to> > class="werd"> write a
>> >
>> >
>> http://10.1.1.51:8983/solr/select?q=%28bookCode%3abarglewargle+AND+content%3awriting+AND+id:6970%29&rows=1&sort=id+asc&fl=id%2cbookCode%2cnavPointId%2csectionTitle&hl=true&hl.fl=content&hl.snippets=100&hl.fragSize=1000&hl.maxAnalyzedChars=-1&version=2.2
>> >
>> >  to> > class="werd"> write a
>> >
>> > Because of our particular needs, the content has been "spanified", each
>> word
>> > with its own span id. I do apply HTMLStrip during the index time.
>> >
>> > What I would like to do is to increase the size of snippet so that the
>> > highlighted snippets contain more surrounding words.
>> >
>> > Although hl.fragSize went from 10 to 1000, the result is the
>> same.
>> > This leads me to believe that hl.fragSize might not be the correct
>> parameter
>> > to achieve the effect i am looking for. If so, what parameter should I
>> use?
>> >
>> > Thanks!
>> >
>>
>
>
>
> --
> http://twitter.com/sangyum
>



-- 
http://twitter.com/sangyum


Re: Can't mix Synonyms with Shingles?

2011-08-10 Thread Robert Muir
On Wed, Aug 10, 2011 at 7:10 PM, Jeff Wartes  wrote:
>
> After some further playing around, I think I understand what's going on. 
> Because the SynonymFilterFactory pays attention to term position when it 
> inserts a multi-word synonym, I had assumed it scanned for matches in a way 
> that respected term position as well. (ie, for a two-word synonym, I assumed 
> it would try to find the second word in position n+1 if it found the first 
> word in position n)
>
> This does not appear to be the case. It appears to find multi-word synonym 
> matches by simply walking the list of terms, exhausting all the terms in 
> position one before looking at any terms in position two.

This is correct, and I think it would cause seriously bad performance
otherwise: if you have a token stream like (A B C) (D E F) (G H I) ... and
are matching multi-word synonyms, it could potentially explode, at least in
terms of CPU time and all the state saving/restoring/copying; it would need
to start treating the token stream as more of a token-confusion-network, and
it gets worse if you think about position increments > 1.

at least recently in svn, the limitation is documented:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymFilter.java

-- 
lucidimagination.com


Hudson build issues

2011-08-10 Thread arian487
Whenever I try to build this on our Hudson server it says it can't find
org.apache.lucene:lucene-xercesImpl:jar:4.0-SNAPSHOT.  Is the Apache repo
lacking this artifact?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hudson-build-issues-tp3244563p3244563.html
Sent from the Solr - User mailing list archive at Nabble.com.


LockObtainFailedException

2011-08-10 Thread Naveen Gupta
Hi,

We are doing streaming updates to Solr for multiple users,

We are getting


Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint

Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)

Aug 10, 2011 12:

Re: Indexing tweet and searching "@keyword" OR "#keyword"

2011-08-10 Thread Mohammad Shariq
> Do you really want a search on "ipad" to *fail* to match input of "#ipad"?
> Or vice-versa?
My requirement is: for q='ipad' I want to match both '#ipad' and 'ipad',
BUT for q='#ipad' I want to match ONLY '#ipad', excluding plain 'ipad'.
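
Per Erick's suggestion below, a hedged sketch of a field type that preserves '#' and '@' (the name "text_tweet" is assumed; whether to keep stemming and shingles depends on your needs):

```xml
<fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits on whitespace only, so "#ipad" and "@ipad" survive as whole tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, q=#ipad matches only documents containing the literal token "#ipad"; to make q=ipad also match "#ipad", one option is to copyField the tweet into a second, symbol-stripped field and search both.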


On 10 August 2011 19:49, Erick Erickson  wrote:

> Please look more carefully at the documentation for WDDF,
> specifically:
>
> split on intra-word delimiters (all non alpha-numeric characters).
>
> WordDelimiterFilterFactory will always throw away non alpha-numeric
> characters, you can't tell it do to otherwise. Try some of the other
> tokenizers/analyzers to get what you want, and also look at the
> admin/analysis page to see what the exact effects are of your
> fieldType definitions.
>
> Here's a great place to start:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> You probably want something like WhitespaceTokenizerFactory
> followed by LowerCaseFilterFactory or some such...
>
> But I really question whether this is what you want either. Do you
> really want a search on "ipad" to *fail* to match input of "#ipad"? Or
> vice-versa?
>
> KeywordTokenizerFactory is probably not the place you want to start,
> the tokenization process doesn't break anything up, you happen to be
> getting separate tokens because of WDDF, which as you see can't
> process things the way you want.
>
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 3:09 AM, Mohammad Shariq 
> wrote:
> > I tried tweaking "WordDelimiterFactory" but I won't accept # OR @ symbols
> > and it ignored totally.
> > I need solution plz suggest.
> >
> > On 4 August 2011 21:08, Jonathan Rochkind  wrote:
> >
> >> It's the WordDelimiterFactory in your filter chain that's removing the
> >> punctuation entirely from your index, I think.
> >>
> >> Read up on what the WordDelimiter filter does, and what it's settings
> are;
> >> decide how you want things to be tokenized in your index to get the
> behavior
> >> your want; either get WordDelimiter to do it that way by passing it
> >> different arguments, or stop using WordDelimiter; come back with any
> >> questions after trying that!
> >>
> >>
> >>
> >> On 8/4/2011 11:22 AM, Mohammad Shariq wrote:
> >>
> >>> I have indexed around 1 million tweets ( using  "text" dataType).
> >>> when I search the tweet with "#"  OR "@"  I dont get the exact result.
> >>> e.g.  when I search for "#ipad" OR "@ipad"   I get the result where
> ipad
> >>> is
> >>> mentioned skipping the "#" and "@".
> >>> please suggest me, how to tune or what are filterFactories to use to
> get
> >>> the
> >>> desired result.
> >>> I am indexing the tweet as "text", below is "text" which is there in my
> >>> schema.xml.
> >>>
> >>>
> >>>  positionIncrementGap="100">
> >>> 
> >>> 
> >>>  words="stopwords.txt"
> >>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
> >>>  >>> generateWordParts="1"
> >>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >>> catenateAll="0" splitOnCaseChange="1"/>
> >>> 
> >>>  >>> protected="protwords.txt" language="English"/>
> >>> 
> >>> 
> >>> 
> >>>  >>> words="stopwords.txt"
> >>> minShingleSize="3" maxShingleSize="3" ignoreCase="true"/>
> >>>  >>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>> 
> >>>  >>> protected="protwords.txt" language="English"/>
> >>> 
> >>> 
> >>>
> >>>
> >
> >
> > --
> > Thanks and Regards
> > Mohammad Shariq
> >
>



-- 
Thanks and Regards
Mohammad Shariq


Re: frange not working in query

2011-08-10 Thread Amit Sawhney
The default sort is on relevance. I want to give users an option to sort the 
results by date (latest on top). 
This works fine for queries with few results (up to 100). However, it returns 
inaccurate results as soon as the count reaches the thousands.
I am trying to limit the sorting to the top few results only, hoping that with 
frange I can set a lower bound on the relevance score and get better results 
for the date sort.

Is there any other way to do this?

Hope it's clear.
- Amit
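
The error itself means the inner query has no field to search: query($qq) parses $qq with the default (lucene) parser, which needs a defaultSearchField. A hedged sketch that routes the inner query through dismax explicitly (the qf list is a placeholder):

```
q={!frange l=0.25}query($qq)
&qq={!dismax qf='title description'}nokia
&sort=unix-timestamp desc
```

Note that frange selects by score but does not by itself restrict to a "top N": documents scoring below the cutoff are filtered out, and the survivors are then sorted by date.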

On 10-Aug-2011, at 7:52 PM, simon wrote:

> I meant the frange query, of course
> 
> On Wed, Aug 10, 2011 at 10:21 AM, simon  wrote:
>> Could you tell us what you're trying to achieve with the range query ?
>> It's not clear.
>> 
>> -Simon
>> 
>> On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney  wrote:
>>> Hi All,
>>> 
>>> I am trying to sort the results on a unix timestamp using this query.
>>> 
>>> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>>> 
>>> When I run this query, it says 'no field name specified in query and no 
>>> defaultSearchField defined in schema.xml'
>>> 
>>> As soon as I remove the frange query and run this, it starts working fine.
>>> 
>>> http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1
>>> 
>>> Any pointers?
>>> 
>>> 
>>> Thanks,
>>> Amit
>> 



Re: Solr 3.3 crashes after ~18 hours?

2011-08-10 Thread Bernd Fehling

Hi, googling "hotspot server 19.1-b02" shows that you are not alone
with hanging threads and crashes, and not only with Solr.
Maybe try another Java version?

Bernd



On 10.08.2011 17:00, alexander sulz wrote:

Okay, with this command it hangs.
Also: I managed to get a Thread Dump (attached).

regards

On 05.08.2011 15:08, Yonik Seeley wrote:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz wrote:

Usually you get a XML-Response when doing commits or optimize, in this case
I get nothing
in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T
load forever or anything.
It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line? It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true";

-Yonik
http://www.lucidimagination.com



I use the stuff in the example folder; the only changes I made were enabling
logging and changing the port to 8985.
I'll try getting a thread dump if it happens again!
So far it's looking good after allocating more memory to it.

Am 04.08.2011 16:08, schrieb Yonik Seeley:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz
wrote:

Thank you for the many replies!

Like I said, I couldn't find anything in the logs created by Solr.
I just had a look at /var/log/messages and there wasn't anything
either.

What I mean by crash is that the process is still there and HTTP GET pings
would return 200, but when I try visiting /solr/admin, I'd get a blank page!
The server ignores any incoming updates or commits,

"ignores" means what? The request hangs? If so, could you get a thread
dump?

Do queries work (like /solr/select?q=*:*) ?


though throwing no errors, no 503s. It's like the server has a blackout
and stares blankly into space.

Are you using a different servlet container than what is shipped with
Solr?
If you did start with the Solr "example" server, what Jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com






--
*
Bernd Fehling                   Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)              Universitätsstr. 25
Tel. +49 521 106-4060           Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de  33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: How to start troubleshooting a content extraction issue

2011-08-10 Thread Jayendra Patil
You can test standalone content extraction with the tika-app jar -

Command to output in text format -
java -jar tika-app-0.8.jar --text file_path

For more options: java -jar tika-app-0.8.jar --help

Use the tika-app jar version matching your Solr build.

Regards,
Jayendra

On Wed, Aug 10, 2011 at 1:53 PM, Tim AtLee  wrote:
> Hello
>
> So, I'm a newbie to Solr and Tika and whatnot, so please use simple words
> for me :P
>
> I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the
> search engine for a Drupal web site.
>
> Up until recently, everything has been fine - searching works, faceting
> works, etc.
>
> Recently a user uploaded a 5 MB xltm file, which seems to be causing Tomcat
> to spike in CPU usage and eventually error out.  When the documents are
> submitted to be indexed, the Tomcat process spikes up to use 100% of 1
> available CPU, with the eventual error in Drupal of "Exception occured
> sending *sites/default/files/nodefiles/533/June 30, 2011.xltm* to Solr "0"
> Status: Communication Error".
>
> I am looking for some help figuring out how to troubleshoot this.  I
> assume it's this file, but I'd like to be sure - so how can I submit
> this file for content extraction manually to see what happens?
>
> Thanks,
>
> Tim
>
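Building on the tika-app test suggested above: if the suspect file does peg a CPU, running the extraction under a hard timeout keeps the manual test from hanging the way it hangs Tomcat. A minimal sketch (the jar and file names echo this thread; adjust them to your install):

```python
import subprocess

def run_extraction(cmd, timeout_s=60):
    """Run an extraction command with a hard timeout so one bad file
    can't pin a CPU indefinitely.  Returns (ok, output): ok is False
    on a non-zero exit or a timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout

# Hypothetical invocation -- the jar and file names are from this thread:
# ok, text = run_extraction(
#     ["java", "-jar", "tika-app-0.8.jar", "--text", "June 30, 2011.xltm"],
#     timeout_s=120)
# print("OK" if ok else "FAILED", text[:200])
```

If the call times out or errors on this one file but succeeds on others, that confirms the file (rather than the Solr/Drupal integration) is the problem.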