Re: Is it possible to build Solr as a maven project?

2011-05-12 Thread Gabriele Kahlout
On Tue, May 10, 2011 at 3:56 PM, Gabriele Kahlout
wrote:

>
>
> On Tue, May 10, 2011 at 3:50 PM, Steven A Rowe  wrote:
>
>> Hi Gabriele,
>>
>> There are some Maven instructions here (not in Lucene/Solr 3.1 because I
>> just wrote the file a couple of days ago):
>> <
>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/dev-tools/maven/README.maven
>> >
>>
>> My recommendation, since the Solr 3.1 source tarball does not include
>> dev-tools/, is to check out the 3.1-tagged sources from Subversion:
>>
>> svn co http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1
>>
>> and then follow the instructions in the above-linked README.maven.  I did
>> that just now and it worked for me.  The results are in solr/package/maven/.
>>
>
> I did that and I think it worked for me, but I didn't get Nutch to work
> with it, so I preferred to revert to what is officially supported (not even,
> but...).
>
> I'll be trying and report back.
>

Everything worked! These are the revisions used:

$ svn co -r 1101526 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1 solr
$ svn co -r 1101540 http://svn.apache.org/repos/asf/nutch/branches/branch-1.3 nutch
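A sketch of what the build looked like after the checkouts, using the targets mentioned earlier in this thread (dev-tools/maven/README.maven is the authoritative reference, so treat this as an outline only):

$ cd solr
$ ant get-maven-poms            # copies the pom.xml templates into their target locations
$ ant generate-maven-artifacts  # builds the deployable Maven artifacts

As noted above, the resulting artifacts ended up under solr/package/maven/.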



> Thank you
>


>
>
>>
>> Please write back if you run into any problems.
>>
>> Steve
>>
>>
>> From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
>> Sent: Tuesday, May 10, 2011 8:37 AM
>> To: boutr...@gmail.com
>> Cc: solr-user@lucene.apache.org; Steven A Rowe; ryan...@gmail.com
>> Subject: Re: Is it possible to build Solr as a maven project?
>>
>>
>> sorry, this was not the target I used (this one should work too, but...),
>>
>> Can we expand on the but...?
>>
>> $ wget http://apache.panu.it//lucene/solr/3.1.0/apache-solr-3.1.0-src.tgz
>> 
>> $ tar xf apache-solr-3.1.0-src.tgz
>> $ cd apache-solr-3.1.0
>> $ ant generate-maven-artifacts
>> generate-maven-artifacts:
>>
>> get-maven-poms:
>>
>> BUILD FAILED
>> /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:59: The following
>> error occurred while executing this line:
>> /Users/simpatico/Downloads/apache-solr-3.1.0/lucene/build.xml:445: The
>> following error occurred while executing this line:
>> /Users/simpatico/Downloads/apache-solr-3.1.0/build.xml:45:
>> /Users/simpatico/Downloads/apache-solr-3.1.0/dev-tools/maven does not exist.
>>
>>
>>
>> Now, for those that build this, it must have worked at some point. How? Or is
>> this a bug in the release?
>> Looking at the revision history of the build script, I might be referring to
>> LUCENE-2490 but I'm
>> not sure I understand the solution. I've checked out dev-tools but even
>> with it things don't work (I tried the one from the 3.1.0 release).
>>
>>
>>
>>
>> the one I used is get-maven-poms. That will just create pom files and copy
>> them to their right target locations.
>>
>> I'm using netbeans and I'm using the plugin "Automatic Projects" to do
>> everything inside the IDE.
>>
>> Which version of Solr are you using ?
>>
>> Ludovic.
>>
>> 2011/5/4 Gabriele Kahlout [via Lucene] <
>> ml-node+2898211-2124746009-383...@n3.nabble.com>
>>
>> > generate-maven-artifacts:
>> >[mkdir] Created dir: /Users/simpatico/SOLR_HOME/build/maven
>> >[mkdir] Created dir: /Users/simpatico/SOLR_HOME/dist/maven
>> > [copy] Copying 1 file to
>> > /Users/simpatico/SOLR_HOME/build/maven/src/maven
>> > [artifact:install-provider] Installing provider:
>> > org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
>> >
>> > *BUILD FAILED*
>> > /Users/simpatico/SOLR_HOME/*build.xml:800*: The following error occurred
>> > while executing this line:
>> > /Users/simpatico/SOLR_HOME/common-build.xml:274: artifact:deploy doesn't
>> > support the "uniqueVersion" attribute
>> >
>> >
>> > *build.xml:800: *> > pom.xml="src/maven/solr-parent-pom.xml.template"/>
>> >
>> > removed "uniquVersion" attirubte:
>> >
>> > generate-maven-artifacts:
>> > [artifact:install-provider] Installing provider:
>> > org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
>> > [artifact:deploy] Deploying to
>> file:///Users/simpatico/SOLR_HOME/dist/maven
>> >
>> > [artifact:deploy] [INFO] Retrieving previous build number from remote
>> > [artifact:deploy] [INFO] Retrieving previous metadata from remote
>> > [artifact:deploy] [INFO] Uploading repository metadata for: 'artifact
>> > org.apache.solr:solr-parent'
>> > [artifact:deploy] [INFO] Retrieving previous metadata from remote
>> > [artifact:deploy] [INFO] Uploading repository metadata for: 'snapshot
>> > org.apache.solr:solr-parent:1.4.2-SNAPSHOT'
>> >  [copy] Copying 1 file to /Users/simpatico/SOLR_HOME/build/maven/lib
>> > [artifact:install-provider] Installing provider:
>> > org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
>> > [artifact:deploy] Deploying to
>> file:///Users/simpatico/SOLR_HOME/dist/maven
>> >
>> > [artifact:deploy] [INFO] Retrieving previous build num

Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
It works. Many thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930783.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
I have another facet that is of type integer and it gave an exception.

Is it true that the field has to be of type string or text for the OR
expression to work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930863.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal
But Pivot Faceting is a feature of Solr 4.0 and I am using 3.1, as that 
is a stable build and I can't use a Nightly Build.


The question was: -

I have a schema which has field Polarity which is of type "text" and it 
can have three values 0,1 or -1 and CreatedAt which is of type "date".


How can I get the count of Polarity based on dates? For example, the output 
should show that on 5/1/2011 there were 10 counts of 0, 10 counts of 1 
and 10 counts of -1.

If I use the facet query like this :-

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Polarity

Then I get the count of the complete database


531477
530682


The query : 
http://localhost:8983/solr/select/?q=*:*%20AND%20CreatedAt:[2011-03-10T00:00:00Z%20TO%202011-03-18T23:59:59Z]&facet=true&facet.date=CreatedAt&facet.date.start=2011-03-10T00:00:00Z&facet.date.end=2011-03-18T23:59:59Z&facet.date.gap=%2B1DAY 



Would give me the count of data per day, like this:


0
276262
183929
196853
2967
22762
11299
37433
14359
+1DAY
2011-03-10T00:00:00Z
2011-03-19T00:00:00Z


How will I be able to get the Polarity count for each date like:-

2011-03-10T00:00:00Z
Polarity
0 = 100
1 = 500
-1 = 200
2011-03-11T00:00:00Z
Polarity
0=100
1=500
-1=200

And so on till the date range ends.
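For what it's worth, one way to approximate this on 3.1 without the pivot patch is to run the date-facet request once per Polarity value, restricting each request with an fq (a sketch reusing the parameters above; the negative value needs quoting):

http://localhost:8983/solr/select/?q=*:*&rows=0&fq=Polarity:0&facet=true&facet.date=CreatedAt&facet.date.start=2011-03-10T00:00:00Z&facet.date.end=2011-03-18T23:59:59Z&facet.date.gap=%2B1DAY

Repeat with fq=Polarity:1 and fq=Polarity:"-1"; each response's facet_dates section then gives the per-day counts for that one Polarity value.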


On 10-05-2011 15:51, Grijesh wrote:

Have you looked at Pivot Faceting
http://wiki.apache.org/solr/HierarchicalFaceting
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting-1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2922541.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: Facet filter: how to specify OR expression?

2011-05-12 Thread Grijesh
No, OR operator should work for any data type

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Count Based on Dates

2011-05-12 Thread Grijesh
You can apply patch for Hierarchical faceting on Solr 3.1 

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial search - SOLR 3.1

2011-05-12 Thread roySolr
Hello David,

It's easy to calculate it myself, but it would be nice if Solr returned the distance
in the response. I can sort
on distance and calculate the distance with PHP to show it to the users.
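For reference, a common 3.1-era workaround is to make the distance the score, so it comes back with each document; a sketch assuming a LatLon field named "store" (field name and point are made up):

http://localhost:8983/solr/select?q={!func}geodist()&sfield=store&pt=51.5,-0.1&sort=score%20asc&fl=*,score

Here the score of each hit is its distance, so it can be used both for sorting and for display.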

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-search-SOLR-3-1-tp2927579p2930926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
The exception says:

java.lang.NumberFormatException: For input string: "or"

The field type is:




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931282.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread rajini maski
The input parameter being assigned to the "tint" field is the string "or". It
is trying to assign tint=or, which is not a valid integer, so the corresponding
exception occurred.

On Thu, May 12, 2011 at 4:10 PM, cnyee  wrote:

> The exception says:
>
> java.lang.NumberFormatExcepton: for input string "or"
>
> The field type is:
>  omitNorms="true" positionIncrementGap="0"/>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931282.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

Ok, here it is.  Please note that I had to type everything.  I did double
and triple check for typos.
I do not use term vectors.  I also left out the "timing" section.

Thanks for all the help.
P.

URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1

XML:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="hl.fl">DOC_TEXT</str>
      <str name="wt">standard</str>
      <str name="hl.maxAnalyzedChars">-1</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="fl">DOC_TEXT,score</str>
      <str name="start">0</str>
      <str name="q">DOC_TEXT:"3 1 15"</str>
      <str name="qt">standard</str>
    </lst>
  </lst>

Re: Facet filter: how to specify OR expression?

2011-05-12 Thread Grijesh
"or" is not any operator "OR", "AND", "NOT" all are caps should be used as
operator 

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931318.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
Oh, I see.

I was wrong in using (pdf or txt). It worked, but it has a different meaning
altogether from (pdf OR txt).
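To illustrate the distinction (the field name "format" here is made up):

fq=format:(pdf OR txt)   -- uppercase OR is the boolean operator: match pdf or txt
fq=format:(pdf or txt)   -- lowercase "or" is treated as a third search term, not an operator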

Thanks a lot for your help.

Best regards,
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spellcheck: Two dictionaries

2011-05-12 Thread roySolr
Hello,

I have 2 fields: what and where. For both of these fields I want some
spellchecking. I have 2
dictionaries in my config:


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">ws</str>
  <lst name="spellchecker">
    <str name="name">what</str>
    <str name="field">what</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">where</str>
    <str name="field">where</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>



I can search on a dictionary with spellcheck.dictionary=what in my url. How
can I set
some spellchecking for both fields? I see that SOLR 3.1 has a
spellcheck..key parameter. How can I use that in my url?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2931458.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to update database record after indexing

2011-05-12 Thread vrpar...@gmail.com
Actually, every hour some records are inserted into the database, so every hour
Solr indexing will be called with a delta import.

Note: the records and data are very large (in GBs),

so each time the process of finding all the Solr-indexed records and updating the
database records will be slow.

Is there any event listener or snapshooter that can help me solve this problem?
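For reference, DIH does have import-lifecycle hooks; a minimal sketch of wiring one up in data-config.xml (the listener class name here is hypothetical, and it would implement org.apache.solr.handler.dataimport.EventListener):

<document onImportEnd="com.example.MarkRowsIndexedListener">
  <!-- entities as usual; the listener's onEvent(Context) could run the
       UPDATE statement that flags the newly indexed rows in the database -->
</document>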


Thanks,

Vishal Parekh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p2931537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Coord in queryExplain

2011-05-12 Thread Gabriele Kahlout
Hello,

I'm wondering why the results of coord() are not displayed when debugging
query results, as described in the
wiki[1].
I'd like to see it.
Could someone point to how to make it appear with the debug fields?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Coord in queryExplain

2011-05-12 Thread Ahmet Arslan
> I'm wondering why the results of coord() are not displayed
> when debugging
> query results, as described in the
> wiki[1].
> I'd like to see it.
> Could someone point to how to make it appear with the debug
> fields?

Coord info is displayed; however, it seems that it is not displayed for a value of
1.0.
To see coord, issue a multi-word query, and advance to the end of the result list via
the start param.
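When it does appear, the relevant line in the debugQuery explanation looks like the fragment below (the numbers are purely illustrative):

0.4321 = (MATCH) product of:
  0.8642 = (MATCH) sum of:
    ...
  0.5 = coord(1/2)     <- only one of the two query terms matched this document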


Re: Coord in queryExplain

2011-05-12 Thread Gabriele Kahlout
You are right!

On Thu, May 12, 2011 at 2:54 PM, Ahmet Arslan  wrote:

> > I'm wondering why the results of coord() are not displayed
> > when debugging
> > query results, as described in the
> > wiki[1<
> http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22
> >].
> > I'd like to see it.
> > Could someone point to how to make it appear with the debug
> > fields?
>
> coord info displayed, however it seems that it is not displayed for value
> of 1.0 .
> To see coord, issue a multi-word query, and advance to the end of the list
> via start param.
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Document match with no highlight

2011-05-12 Thread Ahmet Arslan
> URL:
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> 
> XML:
> 
> 
>   
>     0
>     19
>     
>       
>        name="indent">on
>        name="hl.fl">DOC_TEXT
>        name="wt">standard
>        name="hl.maxAnalyzedChars">-1
>       on
>       10
>        name="version">2.2
>        name="debugQuery">on
>        name="fl">DOC_TEXT,score
>       0
>       DOC_TEXT:"3 1
> 15"
>        name="qt">standard
>       
>     
>   
>   

Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal

Is it possible to use the features of 3.1 by default for my query ?
On 12-05-2011 13:38, Grijesh wrote:

You can apply patch for Hierarchical faceting on Solr 3.1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: Facet Count Based on Dates

2011-05-12 Thread Jasneet Sabharwal

Or is it possible to use a Group By query in Solr 3.1 like we do in SQL ?
On 12-05-2011 19:37, Jasneet Sabharwal wrote:

Is it possible to use the features of 3.1 by default for my query ?
On 12-05-2011 13:38, Grijesh wrote:

You can apply patch for Hierarchical faceting on Solr 3.1

-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html

Sent from the Solr - User mailing list archive at Nabble.com.







--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: K-Stemmer for Solr 3.1

2011-05-12 Thread Mark
java.lang.AbstractMethodError: 
org.apache.lucene.analysis.TokenStream.incrementToken()Z


Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:


Am 12.05.2011 02:05, schrieb Mark:
It appears that the older version of the Lucid Works KStemmer is 
incompatible with Solr 3.1. Has anyone been able to get this to work? 
If not,

what are you using as an alternative?

Thanks


Lucid KStemmer works nicely with Solr 3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd


MoreLikeThis PDF search

2011-05-12 Thread Brian Lamb
Hi all,

I've become more and more familiar with the MoreLikeThis handler over the
last several months. I'm curious whether it is possible to do a MoreLikeThis
search by uploading a PDF? I looked at the ExtractingRequestHandler and that
looks like it that is used to process PDF files and the like but is it
possible to combine the two?

Just to be clear, I don't want to send a PDF and have that be a part of the
index. But rather, I'd like to be able to use the PDF as a MoreLikeThis
search.

Thanks,

Brian Lamb


RE: Document match with no highlight

2011-05-12 Thread Bob Sandiford
Don't you need to include your unique id field in your 'fl' parameter?  It will 
be needed anyways so you can match up the highlight fragments with the result 
docs once highlighting is working...

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 
Join the conversation - you may even get an iPad or Nook out of it!

Like us on Facebook!

Follow us on Twitter!



> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Thursday, May 12, 2011 7:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Document match with no highlight
> 
> > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%2
> 23+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&expl
> ainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML:
> > 
> > 
> >   
> >     0
> >     19
> >     
> >       
> >        > name="indent">on
> >        > name="hl.fl">DOC_TEXT
> >        > name="wt">standard
> >        > name="hl.maxAnalyzedChars">-1
> >       on
> >       10
> >        > name="version">2.2
> >        > name="debugQuery">on
> >        > name="fl">DOC_TEXT,score
> >       0
> >       DOC_TEXT:"3 1
> > 15"
> >        > name="qt">standard
> >       
> >     
> >   
> >   

TrieIntField for "short" values

2011-05-12 Thread Juan Antonio Farré Basurte
Hello,
I'm quite a beginner in solr and have many doubts while trying to learn how 
everything works.
I have only a slight idea on how TrieFields work.
The thing is I have an integer value that will always be in the range 0-1000. A 
short field would be enough for this, but there is no such TrieShortField (not 
even a SortableShortField). So, I used a TrieIntField.
My doubt is, in this case, what would be a suitable value for precisionStep. If 
the field had only 1000 distinct values, but they were more or less uniformly 
distributed in the 32-bit int range, probably a big precisionStep would be 
suitable. But as my values are in the range 0 to 1000, I think (without much 
knowledge) that a low precisionStep should be more adequate. For example, 2.
Can anybody, please, help me finding a good configuration for this type? And, 
if possible, can anybody explain in a brief and intuitive way what are the 
differences and tradeoffs of choosing smaller or bigger precisionSteps?
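A sketch of what such a definition looks like (the name and the value 4 are just an example, not a recommendation):

<fieldType name="tint_small" class="solr.TrieIntField" precisionStep="4" omitNorms="true" positionIncrementGap="0"/>

Roughly, a smaller precisionStep indexes more extra terms per value, which makes range queries faster at the cost of a slightly larger index; precisionStep="0" indexes a single term per value, which is the cheapest option if you only ever do exact matches and sorting. With only ~1000 distinct values in the range 0-1000, range queries are cheap either way, so the choice matters little here.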
Thanks a lot,

Juan

Re: Result docs missing only when shards parameter present in query?

2011-05-12 Thread mrw

Does this seem like it would be a configuration issue, an indexed data
issue, or something else?

Thanks


mrw wrote:
> 
> We have two Solr nodes, each with multiple shards.  If we query each shard
> directly (no shards parameter), we get the expected results:
> 
> response
>lst name="responseHeader"
>int name="status" 0
>int name="QTime"  22
>result name="response" numFound="100" start="0"
> doc
> doc
>   
> (^^^ hand-typed pseudo XML)
> 
> However, if we add the shards parameter and even supply one of the above
> shards, we get the same number of results, but all the doc elements under
> the result element are missing:
> 
> response
>lst name="responseHeader"
>int name="status" 0
>int name="QTime"  33
>result name="response" numFound="100" start="0"
>
> 
> (^^^ note missing doc elements)
> 
> It doesn't matter which shard is specified in the shards parameter;  if
> any or all of the shards are specified after the shards parameter, we see
> this behavior.
> 
> When we go to http://:8983/solr/  on either node, we see all the
> shards properly listed.  
> 
> So, the shards seem to be registered properly, and work individually, but
> not when the shards parameter is supplied.   Any ideas?
> 
> 
> Thanks!
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Result-docs-missing-only-when-shards-parameter-present-in-query-tp2928889p2932248.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,



The type "text" is the default one that came with the default solr 1.4
install w.o any modifications.

If I remove the quotes I do get snippets.  In fact, if I did "3 1 15"~1 I do
get a snippet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan  wrote:

>  > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML:
> > 
> > 
> >   
> > 0
> > 19
> > 
> >   
> >> name="indent">on
> >> name="hl.fl">DOC_TEXT
> >> name="wt">standard
> >> name="hl.maxAnalyzedChars">-1
> >   on
> >   10
> >> name="version">2.2
> >> name="debugQuery">on
> >> name="fl">DOC_TEXT,score
> >   0
> >   DOC_TEXT:"3 1
> > 15"
> >> name="qt">standard
> >   
> > 
> >   
> >   

Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I use DOC_ID in schema.xml.

I think this is the default unique id that is used for matching.  Someone
correct me if I am wrong.

P.



On Thu, May 12, 2011 at 11:01 AM, Bob Sandiford <
bob.sandif...@sirsidynix.com> wrote:

> Don't you need to include your unique id field in your 'fl' parameter?  It
> will be needed anyways so you can match up the highlight fragments with the
> result docs once highlighting is working...
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
> Join the conversation - you may even get an iPad or Nook out of it!
>
> Like us on Facebook!
>
> Follow us on Twitter!
>
>
>
> > -Original Message-
> > From: Ahmet Arslan [mailto:iori...@yahoo.com]
> > Sent: Thursday, May 12, 2011 7:10 AM
> > To: solr-user@lucene.apache.org
>  > Subject: Re: Document match with no highlight
> >
> > > URL:
> > >
> > http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%2
> > 23+1+15%22&fq=&start=0
> > >
> > &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&expl
> > ainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > >
> > > XML:
> > > 
> > > 
> > >   
> > > 0
> > > 19
> > > 
> > >   
> > >> > name="indent">on
> > >> > name="hl.fl">DOC_TEXT
> > >> > name="wt">standard
> > >> > name="hl.maxAnalyzedChars">-1
> > >   on
> > >   10
> > >> > name="version">2.2
> > >> > name="debugQuery">on
> > >> > name="fl">DOC_TEXT,score
> > >   0
> > >   DOC_TEXT:"3 1
> > > 15"
> > >> > name="qt">standard
> > >   
> > > 
> > >   
> > >   

Changing the schema

2011-05-12 Thread Brian Lamb
If I change the field type in my schema, do I need to rebuild the entire
index? I'm at a point now where it takes over a day to do a full import due
to the sheer size of my application and I would prefer not having to reindex
just because I want to make a change somewhere.

Thanks,

Brian Lamb


RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> In fact if I did "3 1 15"~1 I do get snipet also.

Strange, I had a very similar problem, but with overlapping tokens. Since
you're using the standard "text" field, this should be your case.

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-Message d'origine-
De : Phong Dais [mailto:phong.gd...@gmail.com] 
Envoyé : jeudi 12 mai 2011 17:26
À : solr-user@lucene.apache.org
Objet : Re: Document match with no highlight

Hi,



The type "text" is the default one that came with the default solr 1.4
install w.o any modifications.

If I remove the quotes I do get snipets.  In fact if I did "3 1 15"~1 I do
get snipet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan  wrote:

>  > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML:
> > 
> > 
> >   
> > 0
> > 19
> > 
> >   
> >> name="indent">on
> >> name="hl.fl">DOC_TEXT
> >> name="wt">standard
> >> name="hl.maxAnalyzedChars">-1
> >   on
> >   10
> >> name="version">2.2
> >> name="debugQuery">on
> >> name="fl">DOC_TEXT,score
> >   0
> >   DOC_TEXT:"3 1
> > 15"
> >> name="qt">standard
> >   
> > 
> >   
> >   

RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> Since you're using the standard "text" field, this should NOT be your case.

Sorry for the missing NOT in the previous phrase. You should have the same issue
given what you said, but still, it sounds very familiar.

Are you sure your fieldtype "text" has nothing special? A tokenizer or filter
that could add some token in your indexed text but not in your query, like for
example a WordDelimiterFilter present in the index analyzer and not the query analyzer?

Pierre

-Message d'origine-
De : Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Envoyé : jeudi 12 mai 2011 18:21
À : solr-user@lucene.apache.org
Objet : RE: Document match with no highlight

> In fact if I did "3 1 15"~1 I do get snipet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard "text" field, this should be you're case. 

Maybe you could have a look at this issue, since it sound very familiar to me :
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-Message d'origine-
De : Phong Dais [mailto:phong.gd...@gmail.com] 
Envoyé : jeudi 12 mai 2011 17:26
À : solr-user@lucene.apache.org
Objet : Re: Document match with no highlight

Hi,



The type "text" is the default one that came with the default solr 1.4
install w.o any modifications.

If I remove the quotes I do get snipets.  In fact if I did "3 1 15"~1 I do
get snipet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan  wrote:

>  > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML:
> > 
> > 
> >   
> > 0
> > 19
> > 
> >   
> >> name="indent">on
> >> name="hl.fl">DOC_TEXT
> >> name="wt">standard
> >> name="hl.maxAnalyzedChars">-1
> >   on
> >   10
> >> name="version">2.2
> >> name="debugQuery">on
> >> name="fl">DOC_TEXT,score
> >   0
> >   DOC_TEXT:"3 1
> > 15"
> >> name="qt">standard
> >   
> > 
> >   
> >   

Support for huge data set?

2011-05-12 Thread atreyu
Hi,

I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take.  The data mostly consists of Oracle records, webpage
files (HTML/XML, etc.) and office doc files.  There are b/t two and four
dozen concurrent users, typically.  The indexing server has > 27 GB of RAM,
but it still gets extremely taxed, and this will only get worse. 

Would Solr be able to efficiently deal with a load of this size?  I am
trying to avoid the heavy cost of GSA, etc...

Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for huge data set?

2011-05-12 Thread Darren Govoni
I have the same questions. 

But from your message, I couldn't tell. Are you using Solr now? Or some
other indexing server?

Darren

On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote:
> Hi,
> 
> I have about 300 million docs (or 10TB data) which is doubling every 3
> years, give or take.  The data mostly consists of Oracle records, webpage
> files (HTML/XML, etc.) and office doc files.  There are b/t two and four
> dozen concurrent users, typically.  The indexing server has > 27 GB of RAM,
> but it still gets extremely taxed, and this will only get worse. 
> 
> Would Solr be able to efficiently deal with a load of this size?  I am
> trying to avoid the heavy cost of GSA, etc...
> 
> Thanks.
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: Support for huge data set?

2011-05-12 Thread atreyu
Oh, my fault.  No, I am not using Solr yet - just evaluating it.  The current
implementation is a combination of Sphinx and Oracle Text, but I have not
been involved with any of the integration - I'm more of an outside analyst
looking in, but will probably be involved in the integration of any new
methods, particularly Open Source ones.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Support for huge data set?

2011-05-12 Thread Darren Govoni
Ok, thanks. Yeah, I'm in the same boat and want to know what others have
done with document numbers that large.

I know there is SolrCloud that can federate numerous solr instances and
query across them, so I suspect some solution with 100's of M's of docs
would require a federation.

If anyone has done this, some best practices would be great to know!

On Thu, 2011-05-12 at 10:10 -0700, atreyu wrote:
> Oh, my fault.  No, I am not using Solr yet - just evaluating it.  The current
> implementation is a combination of Sphinx and Oracle Text, but I have not
> been involved with any of the integration - I'm more of an outside analyst
> looking in, but will probably be involved in the integration of any new
> methods, particularly Open Source ones.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932704.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I read the link provided and I'll need some time to digest what it is
saying.

Here's my "text" fieldtype.


  





  
  






  

Also, I figured out what value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO "

Searching for "3 1 15" returns a match with "empty" highlight.
Searching for "3 1 15"~1 returns a match with highlight.

Can anyone see anything that I'm missing?

Thanks,
P.


On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE wrote:

> > Since you're using the standard "text" field, this should NOT be you're
> case.
>
> Sorry, for the missing NOT in previous phrase. You should have the same
> issue given what you said, but still, it sound very similar.
>
> Are you sure your fieldtype "text" has nothing special ? a tokenizer or
> filter that could add some token in your indexed text but not in your query,
> like for example a WordDelimiter present in  and not  ?
>
> Pierre
>
> -Message d'origine-
> De : Pierre GOSSE [mailto:pierre.go...@arisem.com]
> Envoyé : jeudi 12 mai 2011 18:21
> À : solr-user@lucene.apache.org
> Objet : RE: Document match with no highlight
>
> > In fact if I did "3 1 15"~1 I do get snipet also.
>
> Strange, I had a very similar problem, but with overlapping tokens. Since
> you're using the standard "text" field, this should be you're case.
>
> Maybe you could have a look at this issue, since it sound very familiar to
> me :
> https://issues.apache.org/jira/browse/LUCENE-3087
>
> Pierre
>
> -Message d'origine-
> De : Phong Dais [mailto:phong.gd...@gmail.com]
> Envoyé : jeudi 12 mai 2011 17:26
> À : solr-user@lucene.apache.org
> Objet : Re: Document match with no highlight
>
> Hi,
>
> 
>
> The type "text" is the default one that came with the default solr 1.4
> install w.o any modifications.
>
> If I remove the quotes I do get snipets.  In fact if I did "3 1 15"~1 I do
> get snipet also.
>
> Hope that helps.
>
> P.
>
> On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan  wrote:
>
> >  > URL:
> > >
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> > >
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > >
> > > XML:
> > > 
> > > 
> > >   
> > > 0
> > > 19
> > > 
> > >   
> > >> > name="indent">on
> > >> > name="hl.fl">DOC_TEXT
> > >> > name="wt">standard
> > >> > name="hl.maxAnalyzedChars">-1
> > >   on
> > >   10
> > >> > name="version">2.2
> > >> > name="debugQuery">on
> > >> > name="fl">DOC_TEXT,score
> > >   0
> > >   DOC_TEXT:"3 1
> > > 15"
> > >> > name="qt">standard
> > >   
> > > 
> > >   
> > >   

What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread nicksnels
Hi,

I recently upgraded from Solr 1.3 to Solr 3.1 in order to take advantage of
the HTMLStripCharFilter. But it isn't working as I expected.

I have a text field that may contain HTML tags. I however would like to
store it in Solr without the HTML tags. And retrieve the text field for
display and for highlighting without HTML tags.

I added <charFilter class="solr.HTMLStripCharFilterFactory"/> to the top of
the <fieldType ... positionIncrementGap="100" autoGeneratePhraseQueries="true"> in the schema.xml file of the solr
example, both in <analyzer type="index"> and in <analyzer type="query">.

And the text field is simply:



Now, when I do a search, the text field still has all the HTML tags in it,
and the highlighting is totally screwed up, with em tags around virtually
every word. What am I doing wrong?

Kind regards,

Nick

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-correct-use-of-HTMLStripCharFilter-in-Solr-3-1-tp2933021p2933021.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Anyone familiar with Solandra or Lucendra?

2011-05-12 Thread kenf_nc
I modified the subject to include Lucendra, in case anyone has heard of it by
that name. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Ahmet Arslan
> I recently upgraded from Solr 1.3 to Solr 3.1 in order to
> take advantage of
> the HTMLStripCharFilter. But it isn't working as I
> expected.
> 
> I have a text field that may contain HTML tags. I however
> would like to
> store it in Solr without the HTML tags. And retrieve the
> text field for
> display and for highlighting without HTML tags.
> 
> I added  class="solr.HTMLStripCharFilterFactory"/> to the top of
>  positionIncrementGap="100"
> autoGeneratePhraseQueries="true"> in the schema.xml file
> of the solr
> example, both in  and in
> .
> 
> And the text field is simply:
> 
>  stored="true"/>
> 
> Now, when I do a search. The text field still has all the
> HTML tags in them
> and the highlighting is totally screwed up with em tags
> around virtually
> every word. What am I doing wrong?

You need to strip the HTML tags before the analysis phase. If you are using DIH, you can
use the stripHTML="true" transformer.
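A sketch of what that looks like in data-config.xml (the entity and column names here are made up):

<entity name="page" transformer="HTMLStripTransformer" query="...">
  <field column="text" stripHTML="true"/>
</entity>

This removes the tags from the value before it is handed to Solr, so the stored field (and therefore highlighting) no longer contains HTML.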


Re: Anyone familiar with Solandra or Lucandra?

2011-05-12 Thread Smiley, David W.
The old name is "Lucandra" not Lucendra. I've changed the subject accordingly.

I'm looking forward to responses from people but I'm afraid it appears it has 
not yet gotten much uptake yet. I think it has enormous potential once it's 
hardened a bit and there's more documentation. Personally, I've been looking 
forward to kicking the tires a bit once I get some time.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On May 12, 2011, at 2:54 PM, kenf_nc wrote:

> I modified the subject to include Lucendra, in case anyone has heard of it by
> that name. 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2933051.html
> Sent from the Solr - User mailing list archive at Nabble.com.






Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Jonathan Rochkind

On 5/12/2011 2:55 PM, Ahmet Arslan wrote:

I recently upgraded from Solr 1.3 to Solr 3.1 in order to
take advantage of
the HTMLStripCharFilter. But it isn't working as I
expected.


You need to strip html tag before analysis phase. If you are using DIH, you can use 
stripHTML="true" transformer.




Wait, then what's the HTMLStripCharFilter for?


Re: Support for huge data set?

2011-05-12 Thread Jonathan Rochkind
If each document is VERY small, it's actually possible that one Solr 
server could handle it -- especially if you DON'T try to do facetting or 
other similar features, but stick to straight search and relevancy. 
There are other factors too. But # of documents is probably less 
important than total size of index, or number of unique terms -- of 
course # of documents often correlates to those too.


But if each document is largeish... yeah, I suspect that'll be too much 
for any one Solr server. You'll have to use some kind of distribution. 
Out of the box, Solr has a Distributed Search function meant for this 
use case. http://wiki.apache.org/solr/DistributedSearch  .   Some Solr 
features don't work under a Distributed setup, but the basic ones are 
there. There are some other add-ons not (yet anyway) part of Solr distro 
that try to solve this in even more sophisticated ways too, like SolrCloud.
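To give a concrete picture, a distributed query is just an ordinary request with a shards parameter listing the participating cores (host and core names below are made up):

http://solr1:8983/solr/select?q=some+query&shards=solr1:8983/solr,solr2:8983/solr,solr3:8983/solr

The node receiving the request fans it out to every shard in the list and merges the results.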


I don't personally know of anyone indexing that many documents, although 
it is probably done. But I do know of the HathiTrust project (not me 
personally) indexing fewer documents but still adding up to terabytes 
of total index (millions to tens of millions of documents, but each one 
is a digitized book that could be 100-400 pages), using the Distributed 
Search feature, successfully, although it required some care and 
maintenance; it wasn't just a "turn it on and it works" situation.


http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond

http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf

On 5/12/2011 1:06 PM, Darren Govoni wrote:

I have the same questions.

But from your message, I couldn't tell. Are you using Solr now? Or some
other indexing server?

Darren

On Thu, 2011-05-12 at 09:59 -0700, atreyu wrote:

Hi,

I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take.  The data mostly consists of Oracle records, webpage
files (HTML/XML, etc.) and office doc files.  There are b/t two and four
dozen concurrent users, typically.  The indexing server has>  27 GB of RAM,
but it still gets extremely taxed, and this will only get worse.

Would Solr be able to efficiently deal with a load of this size?  I am
trying to avoid the heavy cost of GSA, etc...

Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Mike Sokolov
It preserves the location of the terms in the original HTML document so 
that you can highlight terms in HTML.  This makes it possible (for 
instance) to display the entire document, with all the search terms 
highlighted, or (with some careful surgery) to display formatted HTML 
(bold, italic, etc) in your search results.


-Mike

On 05/12/2011 03:42 PM, Jonathan Rochkind wrote:

On 5/12/2011 2:55 PM, Ahmet Arslan wrote:

I recently upgraded from Solr 1.3 to Solr 3.1 in order to
take advantage of
the HTMLStripCharFilter. But it isn't working as I
expected.

You need to strip html tag before analysis phase. If you are using 
DIH, you can use stripHTML="true" transformer.





Wait, then what's the HTMLStripCharFilter for?


Re: What is correct use of HTMLStripCharFilter in Solr 3.1

2011-05-12 Thread Ahmet Arslan
> Wait, then what's the HTMLStripCharFilter for?

To remove html tags in the analysis phase. For instance it can be used to 
display original html documents with search terms highlighted. 


Re: Support for huge data set?

2011-05-12 Thread atreyu
Thanks for the detailed response, Jonathan.  I will look into the links and
check out SolrCloud and Distributed Search.  Load-sharing b/t 2 or 3 servers
should not pose a problem, so long as it is robust (or at least not slower),
fault-tolerant, and reliable.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2933367.html
Sent from the Solr - User mailing list archive at Nabble.com.


field type=string vs field type=text

2011-05-12 Thread chetan
What is the difference between setting a field's type to string vs setting it
to text?

e.g.

or




--
View this message in context: 
http://lucene.472066.n3.nabble.com/field-type-string-vs-field-type-text-tp2932083p2932083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field type=string vs field type=text

2011-05-12 Thread Gora Mohanty
On Thu, May 12, 2011 at 8:23 PM, chetan  wrote:
> What is the difference between setting a fields type to string vs setting it
> to text.
>
> e.g.
> 
> or
> 
[...]

Please take a closer look at the fieldType definitions towards the
beginning of the default schema.xml. The "text" type has tokenizers,
and analyzers applied to it, while the "string" type does no processing
of the input data.
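For illustration, abridged versions of the two definitions from the example schema (the text type's exact filter chain varies by version):

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- the example schema also applies stopword, word-delimiter and stemming filters -->
  </analyzer>
</fieldType>

With "string", the whole input is indexed as one exact term; with "text", it is split into lowercased (and stemmed) tokens.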

Regards,
Gora


A couple newbie questions

2011-05-12 Thread Stuart Smith
Hello!
  I just started using Solr. My general use case is pushing a lot of data from 
HBase to Solr via an M/R job using SolrJ. I have lots of questions, but the 
ones I'd like to start with are:

(1)
I noticed this:
http://lucene.472066.n3.nabble.com/what-happens-to-docsPending-if-stop-solr-before-commit-td2781493.html

This would seem to indicate that pending documents are committed on restart. This is 
great! I also noticed that while there is a lag on start-up if I have 
documents pending - it's only a few minutes or so. But if I issue a commit for 
the same number of files, the server stays blocked for 20 min or so. It almost 
seems like it would be faster to add all my documents and restart the server, 
rather than issuing a commit. Am I doing something strange? Is this a valid 
conclusion?

(2)
I'm also getting a lot of errors about invalid UTF-8:

SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 character 0x at 
char #2380289, byte #2378666)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)

It could be that the values I have in some of my document fields are indeed 
invalid. My question is what this means when I'm submitting a batch of 
documents (specifically I'm using SolrJ's StreamingUpdateSolrServer w/ a 
BinaryRequestWriter) - do I:

- lose the whole batch that has the bad document?
- lose the document?
- lose the one field?

I wish it was the third, hope it's the second, and I'm afraid it's the first...
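Whatever the answer turns out to be, a common defensive step is to scrub field values before adding them, since characters that are illegal in XML trigger exactly this kind of error; a minimal sketch (not SolrJ-specific):

public final class XmlSafe {
    // Keep only characters that are legal in XML 1.0; drop everything else.
    // Note: this simple version also drops supplementary characters outside the BMP.
    public static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == 0x9 || c == 0xA || c == 0xD
                    || (c >= 0x20 && c <= 0xD7FF)
                    || (c >= 0xE000 && c <= 0xFFFD)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}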

Ooo.. and I guess a third question - I'm having trouble finding a document that 
describes the overall design/functionality of Solr, something that would help 
me reason about stuff like "what happens to pending documents when the server 
restarts" or "does a commit in one indexing thread commit previously added 
documents from another indexing thread". Both of those I've answered to my 
satisfaction by looking over the Solr logs & mailing lists, but I'm wondering 
if there's some documentation I missed somehow..
For example, something like this:
http://hadoop.apache.org/common/docs/current/hdfs_design.html
http://hbase.apache.org/book.html#architecture

Thanks!

Take care,
  -stu


Re: field type=string vs field type=text

2011-05-12 Thread Tomás Fernández Löbbe
Hi, my recommendation: To quickly understand the difference between those
two different field types, index one document using string and text fields,
then facet on those fields and you will see how the terms were indexed.

Using one field type or the other will depend on what you want to do with
that field.

On Thu, May 12, 2011 at 5:18 PM, Gora Mohanty  wrote:

> On Thu, May 12, 2011 at 8:23 PM, chetan  wrote:
> > What is the difference between setting a fields type to string vs setting
> it
> > to text.
> >
> > e.g.
> > 
> > or
> > 
> [...]
>
> Please take a closer look at the fieldType definitions towards the
> beginning of the default schema.xml. The "text" type has tokenizers,
> and analyzers applied to it, while the "string" type does no processing
> of the input data.
>
> Regards,
> Gora
>


Re: Replication Clarification Please

2011-05-12 Thread Ravi Solr
Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved
from 1.4.1 to 3.1 and have made several changes to the configuration. The
configuration changes have worked nicely so far and the replication
is finishing within the interval and not backing up. The changes we
made are as follows:

1. Increased the mergeFactor from 10 to 15
2. Increased ramBufferSizeMB to 1024
3. Changed lockType to single (previously it was simple)
4. Set maxCommitsToKeep to 1 in the deletionPolicy
5. Set maxPendingDeletes to 0
6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
well over 75% to increase warming speed
7. Increased the poll interval to 6 minutes and re-indexed all content.
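Roughly, in solrconfig.xml those settings correspond to something like the sketch below (cache sizes and the master URL are placeholders, and element placement depends on the rest of the config):

<mainIndex>
  <mergeFactor>15</mergeFactor>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <lockType>single</lockType>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
  </deletionPolicy>
</mainIndex>

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:port/solr/core/replication</str>
    <str name="pollInterval">00:06:00</str>
  </lst>
</requestHandler>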

Thanks,

Ravi Kiran Bhaskar

On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
 wrote:
> Ravi,
>
> if you have what looks like a full replication each time even if the
> master generation is greater than slave, try to watch for the index on
> both master and slave the same time to see what files are getting
> replicated. You probably may need to adjust your merge factor, as Bill
> mentioned.
>
> -Alexander
>
>
>
> On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
>> Hello Mr. Kanarsky,
>>                 Thank you very much for the detailed explanation,
>> probably the best explanation I found regarding replication. Just to
>> be sure, I wanted to test solr 3.1 to see if it alleviates the
>> problems...I dont think it helped. The master index version and
>> generation are greater than the slave, still the slave replicates the
>> entire index form master (see replication admin screen output below).
>> Any idea why it would get the whole index everytime even in 3.1 or am
>> I misinterpreting the output ? However I must admit that 3.1 finished
>> the replication unlike 1.4.1 which would hang and be backed up for
>> ever.
>>
>> Master        http://masterurl:post/solr-admin/searchcore/replication
>>       Latest Index Version:null, Generation: null
>>       Replicatable Index Version:1296217097572, Generation: 12726
>>
>> Poll Interval         00:03:00
>>
>> Local Index   Index Version: 1296217097569, Generation: 12725
>>
>>       Location: /data/solr/core/search-data/index
>>       Size: 944.32 MB
>>       Times Replicated Since Startup: 148
>>       Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
>>       Config Files Replicated At: null
>>       Config Files Replicated: null
>>       Times Config Files Replicated Since Startup: null
>>       Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
>>
>> Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
>>       Files Downloaded: 18 / 108
>>       Downloaded: 317.48 KB / 436.24 MB [0.0%]
>>       Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
>>       Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
>>
>>
>> Thanks,
>> Ravi Kiran Bhaskar
>>
>> On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
>>  wrote:
>> > Ravi,
>> >
>> > as far as I remember, this is how the replication logic works (see
>> > SnapPuller class, fetchLatestIndex method):
>> >
>> >> 1. Does the Slave get the whole index every time during replication or
>> >> just the delta since the last replication happened ?
>> >
>> >
>> > It look at the index version AND the index generation. If both slave's
>> > version and generation are the same as on master, nothing gets
>> > replicated. if the master's generation is greater than on slave, the
>> > slave fetches the delta files only (even if the partial merge was done
>> > on the master) and put the new files from master to the same index
>> > folder on slave (either index or index., see further
>> > explanation). However, if the master's index generation is equals or
>> > less than one on slave, the slave does the full replication by
>> > fetching all files of the master's index and place them into a
>> > separate folder on slave (index.). Then, if the fetch is
>> > successfull, the slave updates (or creates) the index.properties file
>> > and puts there the name of the "current" index folder. The "old"
>> > index. folder(s) will be kept in 1.4.x - which was treated
>> > as a bug - see SOLR-2156 (and this was fixed in 3.1). After this, the
>> > slave does commit or reload core depending whether the config files
>> > were replicated. There is another bug in 1.4.x that fails replication
>> > if the slave need to do the full replication AND the config files were
>> > changed - also fixed in 3.1 (see SOLR-1983).
>> >
>> >> 2. If there are huge number of queries being done on slave will it
>> >> affect the replication ? How can I improve the performance ? (see the
>> >> replications details at he bottom of the page)
>> >
>> >
> > From my experience, about half of the replication time is the time when the
> > transferred data flushes to the disk. So the IO impact is important.
>> >
>> >> 3. Will the segment names be same be same on master and slave after
>> >> replication ? I see that they are di

DIH help request: nested xml entities and xpath

2011-05-12 Thread Weiss, Eric
Apologies in advance if this topic/question has been previously answered…I have 
scoured the docs, mail archives, web looking for an answer(s) with no luck.  I 
am sure I am just being dense or missing something obvious…please point out my 
stupidity as my head hurts trying to get this working.

Solr 3.1
Java 1.6
Eclipse/Tomcat 7/Maven 2.x

Goal: to extract manufacturer names from a repeating list of keywords each 
denoted by a Category, one of which is "Manufacturer", and load them into a 
MsgKeywordMF field  (see xml below)

I have xml files I am loading via DIH.  This is an abbreviated example of the xml data 
(each file has repeating "Report" items, each report has repeating MsgSet, Msg, 
MsgList, etc items).  Notice the nested repeating groups, namely MsgItems, 
within each document (Report):




  

02/22/2011

 …

  

  



  http://someurl.com/path/to/doc

   …

  blah blah

  



  SomeType

  Location

  USA





  AnotherType

  Manufacturer

  Apple



…

  



  



…


…

…

Here is my data-config.xml:




  


  



  

  

  

  

  

  

  …

  



  




As seen in my config and sample data above, I am extracting the repeating 
"Keywords" into the MsgKeyword field.  Also, and the part that does NOT 
work, I am trying to extract into a separate field just the keywords that have 
a "Category" of "Manufacturer" -->
xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword"

I have also tried:
xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword"
…after changing the "Category" to an attribute of MsgItem (<MsgItem Category="Location">) but it too fails to match.

I have tested my xpath notation against my xml data file using various xpath 
evaluator tools, like within Eclipse, and it matches perfectly…but I can't get 
it to match/work during import.

As I am able to understand it, DIH does not support nested/correlated entities, 
at least not with XML data sources using nested entity tags.  I've tried 
without success to nest entities but I can't "correlate" the nested entity with 
the parent.  I think the way I'm trying should work, but no luck so far….

BTW, I can't easily change the xml format, although it is possible with some 
pain…

Any ideas?

TIA,
-- Eric



solr velocity.log setting

2011-05-12 Thread Yuhan Zhang
hi all,

I'm new to Solr, and trying to install it on Tomcat. However, an exception
was thrown when
the page http://localhost/solr/browse was visited:

 FileNotFoundException: velocity.log (Permission denied)

It looks like Solr is trying to create a velocity.log file in the Tomcat root. So,
how should I set the configuration
file in Solr to change the location that velocity.log is written to?

Thank you.

Y


Re: DIH help request: nested xml entities and xpath

2011-05-12 Thread Ashique
Hi All,

I am a Java/J2EE programmer and very new to Solr. I would like to index a
table in a PostgreSQL database into Solr, then search the records from a
GUI (a JSP page) and show the results in tabular form. Could anyone help
me out with a simple sample code?
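A minimal DataImportHandler sketch for a JDBC source (all names - database, table, columns - are made up and would need to match your schema.xml fields):

<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="item" query="SELECT id, title, description FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>

A full import is then triggered with /dataimport?command=full-import, and the JSP side is just an HTTP query against /select with wt=xml or wt=json.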

Thank you.

Regards,
Ashique

On Fri, May 13, 2011 at 4:53 AM, Weiss, Eric  wrote:

> Apologies in advance if this topic/question has been previously answered…I
> have scoured the docs, mail archives, web looking for an answer(s) with no
> luck.  I am sure I am just being dense or missing something obvious…please
> point out my stupidity as my head hurts trying to get this working.
>
> Solr 3.1
> Java 1.6
> Eclipse/Tomcat 7/Maven 2.x
>
> Goal: to extract manufacturer names from a repeating list of keywords each
> denoted by a Category, one of which is "Manufacturer", and load them into a
> MsgKeywordMF field  (see xml below)
>
> I have xml files I am loading via DIH.  This an abbreviated example xml
> data (each file has repeating "Report" items, each report has repeating
> MsgSet, Msg, MsgList, etc items).  Notice the nested repeating groups,
> namely MsgItems, within each document (Report):
>
>
> 
>
>  
>
>02/22/2011
>
> …
>
>  
>
>  
>
>
>
>  http://someurl.com/path/to/doc
>
>   …
>
>  blah blah
>
>  
>
>
>
>  SomeType
>
>  Location
>
>  USA
>
>
>
>
>
>  AnotherType
>
>  Manufacturer
>
>  Apple
>
>
>
>…
>
>  
>
>
>
>  
>
> 
> 
> …
> 
> 
> …
> 
> …
>
> Here is my data-config.xml:
>
>
> 
>
>  
>
>
>  
>
>
>processor="FileListEntityProcessor" fileName="^.*\.xml$"
> recursive="false" baseDir="/files/xml/">
>
>  
>rootEntity="true" pk="id"
>
>  url="${fileload.fileAbsolutePath}"
> processor="XPathEntityProcessor"
>
>  forEach="/Report/MsgSet/Msg" onError="skip"
>
>  transformer="DateFormatTransformer,RegexTransformer">
>
>   xpath="/Report/MsgSet/Msg/DocumentText"/>
>
>  
>
>   xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category" />
>
>   xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword" />
>
>   xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword"
> />
>
>  …
>
>  
>
>
>
>  
>
> 
>
>
> As seen in my config and sample data above, I am extracting the repeating
> "Keywords" into the MsgKeyword field.  Also, and the part that does NOT
> work, I am trying to extract into a separate field just the keywords that
> have a "Category" of "Manufacturer" -->
> xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>
> I have also tried:
> xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword" />
> …after changing the "Category" to an attribute of MsgItem (<MsgItem Category="Location">) but it too fails to match.
>
> I have tested my xpath notation against my xml data file using various
> xpath evaluator tools, like within Eclipse, and it matches perfectly…but I
> can't get it to match/work during import.
>
> As I am able to understand it, DIH does not support nested/correlated
> entities, at least not with XML data sources using nested entity tags.  I've
> tried without success to nest entities but I can't "correlate" the nested
> entity with the parent.  I think the way I'm trying should work, but no luck
> so far….
>
> BTW, I can't easily change the xml format, although it is possible with
> some pain…
>
> Any ideas?
>
> TIA,
> -- Eric
>
>


Faceting question

2011-05-12 Thread Mark
Is there any way to perform a search that searches across 2 fields yet
only gives me facet counts for documents matching 1 field?


For example:

If I have fields A & B, I would like my query to match across either of
these two fields. I would then like facet counts for how many documents
matched in field A only.


Can this be accomplished? If not out of the box, what classes should I look
into to create this myself?


Thanks


Fieldcollapsing patch not applied properly

2011-05-12 Thread Isha Garg

Hi Kai,

  As per your previous mails, you have already applied the
patches with Solr 1.4. I followed the steps of your mail accordingly,
but during step 9 I got the error "1 out of 1 hunk FAILED". When I apply only
SOLR-236-1_4_1-paging-totals-working.patch it builds successfully, but the
changes do not get reflected in the Solr source.
Kindly tell me where I am going wrong.
The steps are:

1. Downloaded Solr
2. Downloaded SOLR-236-1_4_1-paging-totals-working.patch
3. Changed line 2837 of that patch to `@@ -0,0 +1,511 @@`
4. Downloaded SOLR-236-1_4_1-NPEfix.patch
5. Extracted the Solr archive
6. Applied both patches:
7. `cd apache-solr-1.4.1`
8. `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
9. `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
10. Built Solr:
11. `ant clean`
12. `ant example` ... tells me "BUILD SUCCESSFUL"

Thanks in advance!
Isha garg



Re: K-Stemmer for Solr 3.1

2011-05-12 Thread Bernd Fehling

I backported a Lucid KStemmer version from solr 4.0 which I found somewhere.
Just changed from
import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0
to
import org.apache.lucene.analysis.CharArraySet;  // solr3.1
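
For context on the AbstractMethodError quoted below: that error is what you typically see when a filter compiled against the old next()/next(Token) TokenStream API is dropped into Lucene/Solr 3.1, which expects incrementToken(). A ported filter needs roughly the following shape; this is only a sketch of the required structure, with stem() standing in for the real KStem logic rather than the actual Lucid source:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Minimal shape of a Lucene/Solr 3.1-compatible stemming filter.
// stem() is a placeholder for the real stemming logic.
public final class KStemFilterSketch extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public KStemFilterSketch(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;                       // end of the token stream
    }
    char[] stemmed = stem(termAtt.buffer(), termAtt.length());
    if (stemmed != null) {
      termAtt.copyBuffer(stemmed, 0, stemmed.length);  // replace the term with its stem
    }
    return true;
  }

  private char[] stem(char[] term, int len) {
    return null;                          // placeholder: return the stemmed chars, or null to keep the term
  }
}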

Bernd


On 12.05.2011 16:32, Mark wrote:

java.lang.AbstractMethodError: 
org.apache.lucene.analysis.TokenStream.incrementToken()Z

Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:


On 12.05.2011 02:05, Mark wrote:

It appears that the older version of the Lucid Works KStemmer is incompatible 
with Solr 3.1. Has anyone been able to get this to work? If not,
what are you using as an alternative?

Thanks


Lucid KStemmer works nicely with Solr 3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd


Re: Faceting question

2011-05-12 Thread Otis Gospodnetic
Hi,

I think there is a bit of a mixup here.  Facets are not about which field a 
match was on, but about what values hits have in one or more fields you facet 
on.
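
That said, one way to approximate what Mark is asking for is a facet.query restricted to one of the fields: run the main query across both fields, then facet on the field-A version of it. A rough SolrJ sketch, with the field names A/B and the query term purely as placeholders from Mark's example:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetOnMatchField {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Main query matches in either field A or field B.
    SolrQuery q = new SolrQuery("A:(iphone) OR B:(iphone)");
    q.setFacet(true);
    // Facet query: of the documents returned above, how many also match in field A?
    q.addFacetQuery("A:(iphone)");
    // Strictly "matched in field A only" would be: q.addFacetQuery("A:(iphone) -B:(iphone)");

    QueryResponse rsp = solr.query(q);
    // getFacetQuery() returns the counts keyed by the facet query string.
    Integer countInA = rsp.getFacetQuery().get("A:(iphone)");
    System.out.println("hits matching in field A: " + countInA);
  }
}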

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Mark 
> To: solr-user@lucene.apache.org
> Sent: Fri, May 13, 2011 1:19:10 AM
> Subject: Faceting question
> 
> Is there anyway to perform a search that searches across 2 fields yet only 
>gives  me facets accounts for documents matching 1 field?
> 
> For example
> 
> If  I have fields A & B and I perform a search across I would like to match 
> my  
>query across either of these two fields. I would then like facet counts for 
>how  
>many documents matched in field A only.
> 
> Can this accomplished? If not out  of the box what classes should I look into 
>to create this  myself?
> 
> Thanks
> 


Re: Support for huge data set?

2011-05-12 Thread Otis Gospodnetic
With that many documents, I think GSA cost might be in millions of USD.  Don't 
go there.

300M docs might be called medium these days.  Of course, if those documents 
themselves are huge, then it's more resource intensive.  10 TB sounds like a lot 
when it comes to search, but it's hard to tell what that represents (e.g. are 
those docs with lots of photos in them?  Presentations very light on text?  
Plain text documents with 300 words per page? etc.)

Anyhow, yes, Solr is a fine choice for this.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: atreyu 
> To: solr-user@lucene.apache.org
> Sent: Thu, May 12, 2011 12:59:28 PM
> Subject: Support for huge data set?
> 
> Hi,
> 
> I have about 300 million docs (or 10TB data) which is doubling every  3
> years, give or take.  The data mostly consists of Oracle records,  webpage
> files (HTML/XML, etc.) and office doc files.  There are b/t two  and four
> dozen concurrent users, typically.  The indexing server has  > 27 GB of RAM,
> but it still gets extremely taxed, and this will only get  worse. 
> 
> Would Solr be able to efficiently deal with a load of this  size?  I am
> trying to avoid the heavy cost of GSA,  etc...
> 
> Thanks.
> 
> 
> --
> View this message in context: 
>http://lucene.472066.n3.nabble.com/Support-for-huge-data-set-tp2932652p2932652.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: Changing the schema

2011-05-12 Thread Otis Gospodnetic
Brian,

Yes, you do need to reindex.  We've used Hadoop with Solr to speed up indexing 
by orders of magnitude for some of our customers.  Something to consider.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Brian Lamb 
> To: solr-user@lucene.apache.org
> Sent: Thu, May 12, 2011 11:53:27 AM
> Subject: Changing the schema
> 
> If I change the field type in my schema, do I need to rebuild the  entire
> index? I'm at a point now where it takes over a day to do a full  import due
> to the sheer size of my application and I would prefer not having  to reindex
> just because I want to make a change  somewhere.
> 
> Thanks,
> 
> Brian Lamb
> 


Re: Facet Count Based on Dates

2011-05-12 Thread Otis Gospodnetic
Jasneet,

Like in http://wiki.apache.org/solr/FieldCollapsing ?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Jasneet Sabharwal 
> To: solr-user@lucene.apache.org
> Sent: Thu, May 12, 2011 10:14:46 AM
> Subject: Re: Facet Count Based on Dates
> 
> Or is it possible to use a Group By query in Solr 3.1 like we do in SQL ?
> On  12-05-2011 19:37, Jasneet Sabharwal wrote:
> > Is it possible to use the  features of 3.1 by default for my query ?
> > On 12-05-2011 13:38, Grijesh  wrote:
> >> You can apply patch for Hierarchical faceting on Solr  3.1
> >> 
> >> -
> >> Thanx:
> >>  Grijesh
> >> www.gettinhahead.co.in
> >> -- View this message in  context: 
>http://lucene.472066.n3.nabble.com/Facet-Count-Based-on-Dates-tp2922371p2930924.html
>
> >>  Sent from the Solr - User mailing list archive at Nabble.com.
> >> 
> > 
> > 
> 
> 
> -- Regards
> 
> Jasneet Sabharwal
> Software  Developer
> NextGen Invent Corporation
> +91-9871228582
> 
> 


XML Update overwrite?

2011-05-12 Thread Denis Kuzmenok
Hi.

I am trying to understand the meaning of overwrite="false" in the XML that I
post with post.jar.

I can see two possible behaviours:
1) if a document with the specified uniqueKey exists, it is not updated
(even if some fields have changed)
2) if a document with the specified uniqueKey exists and all fields
are the same, it is not updated and is skipped

Which of these is how overwrite=false actually works?

Thanks