Re: Order of words in proximity search

2011-05-16 Thread Tor Henning Ueland
Hi,

The strange part is that I have actually tried a slop of 1000 (1K),
and the results are still different. This even though the test data
caps each sentence at 10K.
(This means that the sloppy phrase should match wherever the complete
sentence is found, yet that is not the result...)

Hope that explains the issue a bit better :)

Regards
Tor

On Mon, May 16, 2011 at 8:08 AM, lboutros  wrote:
> the key phrase was this one :) :
>
> "A sloppy phrase query specifies a maximum "slop", or the number of
> positions tokens need to be moved to get a match. "
>
> so you could search for "foo bar"~101 in your example.
>
> Ludovic.
>
>
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946620.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Mvh
Tor Henning Ueland


Re: Order of words in proximity search

2011-05-16 Thread lboutros
The analyzer of the field you are using could affect the phrase query slop.
Could you copy/paste the relevant part of the schema?

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Want to Delete Existing Index & create fresh index

2011-05-16 Thread Pawan Darira
It is commented out by default in solrconfig.xml

On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout  wrote:

> I guess you are having issues with the datadir. Did you set the datadir in
> solrconfig.xml?
>
> On Sat, May 14, 2011 at 4:10 PM, Pawan Darira  >wrote:
>
> > Hi
> >
> > I am using Solr 1.4 and had already changed the schema. When I created the
> > index for the first time, the directory was automatically created and the
> > index was built perfectly fine.
> >
> > Now, I want to create the index from scratch, so I deleted the whole
> > data/index directory and ran the script. Now it only creates empty
> > directories and NO index files inside.
> >
> > Thanks
> > Pawan
> >
> >
> > On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan 
> wrote:
> >
> > > Hi Pawan,
> > >
> > > Which SOLR version do you have installed?
> > >
> > > It should be absolutely normal for the data/ subdirectory to be created
> > > when starting up Solr.
> > >
> > > So just go ahead and post your data into SOLR, if you have changed the
> > > schema already.
> > >
> > > --
> > > Regards,
> > >
> > > Dmitry Kan
> > >
> > > On Sat, May 14, 2011 at 4:01 PM, Pawan Darira  > > >wrote:
> > >
> > > > I did that. Index directory is created but not contents in that
> > > >
> > > > 2011/5/14 François Schiettecatte 
> > > >
> > > > > You can also shut down solr/lucene, do:
> > > > >
> > > > >rm -rf /YourIndexName/data/index
> > > > >
> > > > > and restart, the index directory will be automatically recreated.
> > > > >
> > > > > François
> > > > >
> > > > > On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote:
> > > > >
> > > > > > "curl --fail $solrIndex/update?commit=true -d
> > > > > > '<delete><query>*:*</query></delete>'" # empty the index [1]
> > > > > >
> > > > > > [1] http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
> > > > > >
> > > > > > did you try it?
> > > > > >
> > > > > >
> > > > > > On Sat, May 14, 2011 at 7:26 AM, Pawan Darira <
> > > pawan.dar...@gmail.com
> > > > > >wrote:
> > > > > >
> > > > > >> Hi
> > > > > >>
> > > > > >> I had an existing index created months back. now my database
> > schema
> > > > has
> > > > > >> changed. i wanted to delete the current data/index directory &
> > > > re-create
> > > > > >> the
> > > > > >> fresh index
> > > > > >>
> > > > > >> but it is saying "segments" file not found & just creates a blank
> > > > > >> data/index directory. Please help
> > > > > >>
> > > > > >> --
> > > > > >> Thanks,
> > > > > >> Pawan Darira
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > K. Gabriele
> > > > > >
> > > > > > --- unchanged since 20/9/10 ---
> > > > > > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > > > > > receipt within 48 hours then I don't resend the email.
> > > > > > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> > > > > > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> > > > > >
> > > > > > If an email is sent by a sender that is not a trusted contact or the email
> > > > > > does not contain a valid code then the email is not received. A valid code
> > > > > > starts with a hyphen and ends with "X".
> > > > > > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > > > > > L(-[a-z]+[0-9]X)).
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Pawan Darira
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Pawan Darira
> >
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>



-- 
Thanks,
Pawan Darira


Can't seem to get External Field scoring running

2011-05-16 Thread karanveer singh
I want to be able to dynamically change scores without having to
update the entire document.
For this, I started using the External File Field.

I set a fieldType called idRankFile and a field called idRank in schema.xml.

Now I set the idRank for various id's in a file called
external_idRank.txt in dataDir :

F8V7067-APL-KIT = 1.0
IW-02 = 10.0
9885A004 = 100.0

Originally, the scores for these 3 ids (for my query) were in reverse order.
Now, I query using the following :

http://localhost:8983/solr/select?indent=on&q=car%20power%20adaptor&fl=id,name&_val_:idRank

However, the order of the results remains the same. It seems Solr
hasn't taken the external field into account.

Any ideas how to do this? Is my query correct?
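For reference, an ExternalFileField setup is usually declared along these lines (a sketch, since the schema snippets did not survive the archive; the keyField, defVal, and valType values are assumptions based on a typical configuration):

```xml
<!-- schema.xml: values come from dataDir/external_idRank.txt, keyed on the
     uniqueKey field; documents absent from the file get defVal. -->
<fieldType name="idRankFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" stored="false" indexed="false"
           valType="pfloat"/>

<field name="idRank" type="idRankFile"/>
```

Note that an external field can only affect ranking through a function query inside the q parameter (or bf with dismax); appending &_val_:idRank as a separate URL parameter has no effect. Something like q=car power adaptor _val_:"idRank" would be the usual form, and the external file is re-read when a new searcher is opened (i.e. on commit).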


Re: UIMA analysisEngine path

2011-05-16 Thread Tommaso Teofili
Hello,

if you want to take the descriptor from a jar, provided that you have
configured the jar inside a <lib> element in solrconfig.xml, then you just
need to write the correct classpath in the analysisEngine element.
For example, if your descriptor resides under the com/something/desc/ path
inside the jar, then you should set the analysisEngine element to
/com/something/desc/descriptorname.xml
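The leading slash is significant: Class.getResource treats a name starting with "/" as absolute, resolved from the root of every classpath entry (including jars pulled in via <lib>), while a name without it is resolved relative to the calling class's package. A standalone illustration, looking up the class's own .class file so the resource is guaranteed to exist:

```java
import java.net.URL;

public class DescriptorPath {
    public static void main(String[] args) {
        // Relative name: resolved against this class's package
        // (the default package here).
        URL relative = DescriptorPath.class.getResource("DescriptorPath.class");
        // Absolute name: resolved from the root of each classpath entry,
        // which is how /com/something/desc/descriptorname.xml would be
        // addressed inside a <lib> jar.
        URL absolute = DescriptorPath.class.getResource("/DescriptorPath.class");
        System.out.println(relative != null && absolute != null);
    }
}
```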
If you instead need to get the descriptor from filesystem try the patch in
SOLR-2501 [1].
Hope this helps,
Tommaso

[1] :  https://issues.apache.org/jira/browse/SOLR-2501

2011/5/13 chamara 

> Hi,
>  Does the code at line 57 need to be changed to the location where the jar
> files (library files) reside?
>  URL url = this.getClass().getResource(""); I
> did change it but no luck so far. Let me know what I am doing wrong.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Order of words in proximity search

2011-05-16 Thread Tor Henning Ueland
http://pastebin.com/svyefmM6

Pretty standard :)

/Tor

On Mon, May 16, 2011 at 9:18 AM, lboutros  wrote:
> The analyzer of the field you are using could affect the phrase query slop.
> Could you copy/paste the relevant part of the schema?
>
> Ludovic.
>
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Mvh
Tor Henning Ueland


[POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Jan Høydahl
Hi,

This poll is to investigate how you currently do or would like to do logging 
with Solr when deploying solr.war to a SEPARATE java application server (such 
as Tomcat, Resin etc) outside of the bundled "solr/example". For background on 
how things work in Solr now, see http://wiki.apache.org/solr/SolrLogging and 
for more info on the SLF4J framework, see http://www.slf4j.org/manual.html

Please tick one of the options below with an [X]:

[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with re-packaging 
solr.war
[ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using an 
ANT option
[ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!

Note that NOT bundling a logger binding with solr.war means defaulting to the 
NOP logger after outputting these lines to stderr:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Peter Sturge
 [X]  I always use the JDK logging as bundled in solr.war, that's perfect
 [ ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
 [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can
choose at deploy time
 [ ]  Let me choose whether to bundle a binding or not at build time,
using an ANT option
 [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
 [ ]  What? Solr can do logging? How cool!


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Markus Jelsma
 [ ]  I always use the JDK logging as bundled in solr.war, that's perfect
 [X ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
 [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can
choose at deploy time
 [ ]  Let me choose whether to bundle a binding or not at build time,
using an ANT option
 [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
 [ ]  What? Solr can do logging? How cool!

Setting up log4j is easy, but I encountered version issues when switching
to 3.1.


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Michael McCandless
On Sun, May 15, 2011 at 7:44 PM, Mark Miller  wrote:

>> Could you please revert your commit, until we've reached some
>> consensus on this discussion first?
>
> Let's reach some consensus, but why revert? This has been the behavior - 
> shouldn't the consensus onus be on changing it to begin with? That's how I 
> see it.

To be clear, I'm asking that Yonik revert his commit from yesterday
(rev 1103444), where he added "text_nwd" fieldType and dynamic fields
*_nwd to the example schema.xml.

I agree we should reach consensus before changing what's already
committed, that's exactly why I'm asking Yonik to revert -- we were in
the middle of discussing this, and I had posted a patch on SOLR-2519,
when he suddenly committed the text_nwd change, yesterday.

Does anyone disagree that Yonik's commit was inappropriate?  This is
not how we work at Apache.

> I'm going to need to get back up to speed on this issue before I can comment 
> more helpfully. Better out of the box support for other languages is 
> important - I think it makes sense to discuss this issue again myself.

+1

Solr, out of box, is just awful for non-whitespace languages (eg CJK,
and others).  And for every user who comes to the list asking for help
(thank you cyang2010!), I imagine there are many others who simply
gave up and walked away (from Solr) when they tried it on CJK
content.

Lucene has made awesome strides in having natural defaults that work
well across many languages, thanks to the hard work of Robert and
others (StandardAnalyzer now actually follows a standard (UAX #29 --
text segmentation), autophrase off in QP, etc.), and I think we should
take advantage of this in Solr, just like ElasticSearch does.

Really, the best solution (I think) would be to have language-specific
fieldTypes (text_en, text_zh, etc.), but I suspect there's a good
amount of work to reach that so in the meantime I think we should fix
the defaults for the "text" fieldType to work well across many
languages.

Mike

http://blog.mikemccandless.com


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Gora Mohanty
On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl  wrote:
[...]
> Please tick one of the options below with an [X]:
>
> [ X]  I always use the JDK logging as bundled in solr.war, that's perfect
> [ ]  I sometimes use log4j or another framework and am happy with 
> re-packaging solr.war
> [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
> deploy time
> [ ]  Let me choose whether to bundle a binding or not at build time, using an 
> ANT option
> [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
> [ ]  What? Solr can do logging? How cool!

Regards,
Gora


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Martijn v Groningen
[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
[X]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using
an ANT option
[ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!

On 16 May 2011 11:32, Gora Mohanty  wrote:

> On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl 
> wrote:
> [...]
> > Please tick one of the options below with an [X]:
> >
> > [ X]  I always use the JDK logging as bundled in solr.war, that's perfect
> > [ ]  I sometimes use log4j or another framework and am happy with
> re-packaging solr.war
> > [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at
> deploy time
> > [ ]  Let me choose whether to bundle a binding or not at build time,
> using an ANT option
> > [ ]  What's wrong with the "solr/example" Jetty? I never run Solr
> elsewhere!
> > [ ]  What? Solr can do logging? How cool!
>
> Regards,
> Gora
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Chantal Ackermann

> Please tick one of the options below with an [X]:
> 
> [ ]  I always use the JDK logging as bundled in solr.war, that's perfect
> [X]  I sometimes use log4j or another framework and am happy with 
> re-packaging solr.war

actually: not so happy, because our operations team has to repackage it.
But there is no option for:
 [X] add the logger configuration to the server's classpath, no
repackaging!

> [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
> deploy time
> [ ]  Let me choose whether to bundle a binding or not at build time, using an 
> ANT option
> [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
> [ ]  What? Solr can do logging? How cool!




Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Jan Høydahl
>> [X]  I sometimes use log4j or another framework and am happy with 
>> re-packaging solr.war
> 
> actually : not so happy because our operations team has to repackage it.
> But there is no option for
> [X] add the logger configuration to the server's classpath, no
> repackaging!

That's what happens if we ship solr.war without any pre-set logger binding - 
it's the binding provided in your app-server's classpath which will be used.


And now my vote:

[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with re-packaging 
solr.war
[X]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using an 
ANT option
[ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!


LockObtainFailedException on Solr update

2011-05-16 Thread nitesh nandy
My Solr index is updated simultaneously by multiple clients via REST. I use
the "commitWithin" attribute in the <add> command to trigger automatic commits.

I start getting this error after a couple of days of usage. How do I fix
this? Please find the error log below. Using Solr 3.1 with Tomcat. Thanks
--
HTTP Status 500 - Lock obtain timed out: NativeFSLock@
/var/lib/solr/data/index/write.lock

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
NativeFSLock@/var/lib/solr/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
  at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
  at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
  at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
  at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
  at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
  at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
  at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
  at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  at java.lang.Thread.run(Thread.java:662)
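For reference, commitWithin is an attribute on the update message itself; a sketch (field names illustrative):

```xml
<add commitWithin="10000">
  <!-- ask Solr to commit this document within 10 seconds -->
  <doc>
    <field name="id">doc-1</field>
  </doc>
</add>
```

As for the lock itself: Lucene permits only one open IndexWriter per index directory, so this exception typically means another process still holds write.lock (a previous Solr instance that crashed without releasing it, or two webapps/cores pointed at the same data directory).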


-- 
Regards,

Nitesh Nandy


Set Full-Import Clean=False

2011-05-16 Thread Jasneet Sabharwal

Hi

Where do I set the default value of clean=false for a full-import?


I did not find any such setting in solrconfig.xml. I am using Solr 3.1.

--
Regards

Jasneet Sabharwal




Re: Set Full-Import Clean=False

2011-05-16 Thread Gora Mohanty
On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal
 wrote:
> Hi
>
> Where do I set the default value of clean = false when a full-import is
> done.

Append it to the URL, e.g., dataimport?command=full-import&clean=false

Regards,
Gora


Re: Set Full-Import Clean=False

2011-05-16 Thread Jasneet Sabharwal
I have been doing that, but I want to set it to false by default, so
that even if the admin forgets to add clean=false to the URL, the import
does not clean the index on its own.

On 16-05-2011 17:38, Gora Mohanty wrote:

On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal
  wrote:

Hi

Where do I set the default value of clean = false when a full-import is
done.

Append it to the URL, e.g., dataimport?command=full-import&clean=false

Regards,
Gora



--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Solr Cell and operations on metadata extracted

2011-05-16 Thread Olivier Tavard
Hi,



I have a question about Solr Cell please.

I index some files. For example, if I want to extract the filename, apply
a hash function such as MD5 to it, and then store the result in Solr, is
the correct way to use Tika "manually" to extract the metadata I want, do
the transformations on it, and then send it to Solr?

I can't use Solr Cell directly in this case, because I can't modify the
extracted metadata, right?
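That is the usual pattern: run Tika yourself, transform the metadata, then send the documents to Solr (e.g. via SolrJ), since Solr Cell applies Tika's output directly (an UpdateRequestProcessor on the Solr side is the main alternative). The hashing step itself is plain Java; a minimal sketch (the filename is illustrative):

```java
import java.security.MessageDigest;

public class FilenameMd5 {
    // Hex-encoded MD5 of a string, e.g. a filename extracted with Tika;
    // the result would be stored in a regular Solr string field.
    static String md5Hex(String s) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(s.getBytes("UTF-8"))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("report.pdf"));
    }
}
```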





Thanks,



Olivier


Re: Set Full-Import Clean=False

2011-05-16 Thread Stefan Matheis
Jasneet,

what about defining the value as a default in the dataimport
request-handler? like the sample at
http://wiki.apache.org/solr/SolrRequestHandler does?

Regards
Stefan

On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal
 wrote:
> I have been doing that, but I want to set it as False by default, so that
> even if the admin forgets to set clean=false in the URL, it doesn't do it on
> its own.
> On 16-05-2011 17:38, Gora Mohanty wrote:
>>
>> On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal
>>   wrote:
>>>
>>> Hi
>>>
>>> Where do I set the default value of clean = false when a full-import is
>>> done.
>>
>> Append it to the URL, e.g., dataimport?command=full-import&clean=false
>>
>> Regards,
>> Gora
>
>
> --
> Regards
>
> Jasneet Sabharwal
> Software Developer
> NextGen Invent Corporation
> +91-9871228582
>
>


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Mark Miller

On May 16, 2011, at 5:30 AM, Michael McCandless wrote:

> Does anyone disagree that Yonik's commit was inappropriate?  This is
> not how we work at Apache.

Ah - dunno yet - I obviously missed part of the conversation here. I thought
you were talking about reversing 'autophrase off' as the default, not these
'quick' new field types.

Excuse me for a moment while I read...

Yeah - seems a little hasty. Not a fan of 'text_nwd' as a field name either. 
Didn't seem malicious to me, but it does seem we should probably work together 
in JIRA/discussion before just shotgunning changes...

Don't know that I care if it's reverted (if we fall back another 10 steps into 
that BS I quit everything and I'm moving to South America), but we should push 
on here either way.

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org







Re: Set Full-Import Clean=False

2011-05-16 Thread Jasneet Sabharwal

Stefan,

I have added the DIH request handler in solrconfig.xml. Do I have to
add clean=false there, or somewhere else?


Regards
Jasneet
On 16-05-2011 18:03, Stefan Matheis wrote:

Jasneet,

what about defining the value as a default in the dataimport
request-handler? like the sample at
http://wiki.apache.org/solr/SolrRequestHandler does?

Regards
Stefan

On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal
  wrote:

I have been doing that, but I want to set it as False by default, so that
even if the admin forgets to set clean=false in the URL, it doesn't do it on
its own.
On 16-05-2011 17:38, Gora Mohanty wrote:

On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal
wrote:

Hi

Where do I set the default value of clean = false when a full-import is
done.

Append it to the URL, e.g., dataimport?command=full-import&clean=false

Regards,
Gora


--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582





--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



Re: Set Full-Import Clean=False

2011-05-16 Thread Stefan Matheis
Jasneet

On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal
 wrote:
> I have added the DIH request handler in the solrconfig.xml.

Exactly there :)

Regards
Stefan


Getting Null pointer exception While doing a full import

2011-05-16 Thread mechravi25
Hi,
I am doing a full import in one of the cores, but I am getting a
NullPointerException and the import is failing again and again. I also tried
clearing the indexes and starting the full import again, but indexing still
failed.

The full-import request is perfect, and I verified it against other
full-import requests too. Any suggestion/solution will be of great help.
Thanks in advance. The exception is as follows:

May 14, 2011 5:06:56 AM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/dataimport params={wt=javabin&version=1}
status=0 QTime=0 
May 14, 2011 9:03:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:137)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Thanks & Regards,
Sivaganesh
Email id: sivaganesh_sel...@infosys.com

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-Null-pointer-exception-While-doing-a-full-import-tp2947854p2947854.html
Sent from the Solr - User mailing list archive at Nabble.com.


boolean versus non-boolean search

2011-05-16 Thread Dmitry Kan
Dear list,

Might have missed it from the literature and the list, sorry if so, but:

SOLR 1.4.1



Consider the query:

term1 term2 OR "term1 term2" OR "term1 term3"


Problem: The query produces a hit containing only term1.

Solution: Modified query, grouping with parentheses

(term1 term2) OR "term1 term2" OR "term1 term3"

produces hits with both term1 and term2 present, plus other hits matched
by the OR'ed clauses.


Problem 1. Another modified query, with AND instead of parentheses:

term1 AND term2 OR "term1 term2" OR "term1 term3"

produces same results as the original query and same debug output.

Why is that?

-- 
Regards,

Dmitry Kan


Re: Set Full-Import Clean=False

2011-05-16 Thread Jasneet Sabharwal

Stefan

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/jasneet/apache-solr-3.1.0/example/solr/conf/data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>

Should it be like this?
On 16-05-2011 18:48, Stefan Matheis wrote:

Jasneet

On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal
  wrote:

I have added the DIH request handler in the solrconfig.xml.

Exactly there :)

Regards
Stefan



--
Regards

Jasneet Sabharwal
Software Developer
NextGen Invent Corporation
+91-9871228582



RE: SolrDispatchFilter

2011-05-16 Thread Rod.Madden
Yep, that fixed my problem... many thanks!


 


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, May 13, 2011 6:37 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrDispatchFilter


: This problem is only occurring when using IE8 (Chrome & Firefox fine)

if it only happens when using the form on the admin screen (and not when
hitting the URL directly, via shift-reload for example), it may just be
a different manifestation of this silly javascript bug...

https://issues.apache.org/jira/browse/SOLR-2455


-Hoss


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Yonik Seeley
On Sun, May 15, 2011 at 1:48 PM, Michael McCandless
 wrote:
> Could you please revert your commit, until we've reached some
> consensus on this discussion first?

Huh?
I thought everyone was in agreement that we needed more field types
for different languages?
I added my best guess at what a generic field type for
non-whitespace-delimited languages might look like.
Since it's a new field type, it doesn't affect anything. Hopefully it
only improves the situation for someone trying to use one of these
languages.

The only negative would seem to be if it's worse than nothing (i.e. a
very bad example
because it actually doesn't work for non-whitespace-delimited languages).

The issue about changing defaults on TextField, and changing what "text"
does in the example schema by default, is not dependent on this. They are
only related by the fact that if another field is added/changed then _nwd
may become redundant and can be removed. For now, it only seems like an
improvement?

Anyway... the whole language of "revert" seems unnecessarily confrontational.
Feel free to improve what's there (or delete *_nwd if people really
feel it adds no/negative value)

-Yonik


How to index and query "C#" as whole term?

2011-05-16 Thread Gnanakumar
Hi,

I'm using Apache Solr v3.1.

How do I configure Solr to both index and query the term "c#" as a
whole word/term? From the "Analysis" page, I can see that the term "c#" is
being reduced to just "c" by solr.WordDelimiterFilterFactory.
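One option, assuming Solr 3.1's WordDelimiterFilterFactory: it accepts a types attribute naming a file that reclassifies individual characters, so # can be declared alphabetic and "c#" then survives analysis intact. A sketch (wdfftypes.txt is a file name of your choosing under conf/):

```xml
<!-- in the field type's analyzer chain, index and query side alike -->
<filter class="solr.WordDelimiterFilterFactory"
        types="wdfftypes.txt"
        generateWordParts="1"
        generateNumberParts="1"/>
```

with wdfftypes.txt containing a line such as \# => ALPHA (the backslash escapes the comment character). Reindex after changing the analyzer.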

Regards,
Gnanam



Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Péter Király
> [ ]  I always use the JDK logging as bundled in solr.war, that's perfect
> [x]  I sometimes use log4j or another framework and am happy with 
> re-packaging solr.war
> [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
> deploy time
> [ ]  Let me choose whether to bundle a binding or not at build time, using an 
> ANT option
> [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
> [ ]  What? Solr can do logging? How cool!

Péter


Re: Set Full-Import Clean=False

2011-05-16 Thread Stefan Matheis
On Mon, May 16, 2011 at 3:27 PM, Jasneet Sabharwal
 wrote:
> Should it be like this ?

Never tried it myself, but from what I gather from the wiki ... yes. Does it
not work for you, or did you just ask to be sure before integrating it?


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Yonik Seeley
On Mon, May 16, 2011 at 5:30 AM, Michael McCandless
 wrote:
> To be clear, I'm asking that Yonik revert his commit from yesterday
> (rev 1103444), where he added "text_nwd" fieldType and dynamic fields
> *_nwd to the example schema.xml.

So... your position is that until the "text" fieldType is changed to
support non-whitespace-delimited languages better, that
no other fieldType should be changed/added to better support
non-whitespace-delimited languages?
Man, that seems political, not technical.

Whatever... I'll "revert".

-Yonik


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Simon Willnauer
On Mon, May 16, 2011 at 3:51 PM, Yonik Seeley
 wrote:
> On Mon, May 16, 2011 at 5:30 AM, Michael McCandless
>  wrote:
>> To be clear, I'm asking that Yonik revert his commit from yesterday
>> (rev 1103444), where he added "text_nwd" fieldType and dynamic fields
>> *_nwd to the example schema.xml.
>
> So... your position is that until the "text" fieldType is changed to
> support non-whitespace-delimited languages better, that
> no other fieldType should be changed/added to better support
> non-whitespace-delimited languages?
> Man, that seems political, not technical.

To me it seems like neither. It's rather part of the process of improving,
aligned with outstanding issues.
It shouldn't feel wrong.

Simon
>
> Whatever... I'll "revert".
>
> -Yonik
>


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Michael McCandless
On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley
 wrote:

>> To be clear, I'm asking that Yonik revert his commit from yesterday
>> (rev 1103444), where he added "text_nwd" fieldType and dynamic fields
>> *_nwd to the example schema.xml.
>
> So... your position is that until the "text" fieldType is changed to
> support non-whitespace-delimited languages better, that
> no other fieldType should be changed/added to better support
> non-whitespace-delimited languages?

No, that's not my position at all.

My position is: please don't suddenly commit changes, with "your way",
while we're still discussing how to solve the issue.  That's not the
Apache way.

This applies in general, not just this case (fixing Solr's
out-of-the-box behavior with non-whitespace languages).

So, it could very well be, after we iterate on SOLR-2519, that we all
agree your baby step is great, in which case let's go forward with
that.  But we should all come to some consensus about that before you
suddenly commit.

> Man, that seems political, not technical.

I'm sorry you feel that way, but it's important to me that we all
follow the Apache way here.  I feel this will only make our community
stronger.

It's also important that any time another committer is uncomfortable
with what just got committed, and asks for a revert, that it *not* be
a big deal.  It's not political, it was just a mistake and the revert
is quick and painless.

We are commit-then-review here, and if someone is uncomfortable, they
should say so and whoever committed should simply revert it and
re-iterate.  This should be a simple & free tool for all of us to
use.

> Whatever... I'll "revert".

Thank you.

Mike


Re: Show filename in search result using a FileListEntityProcessor

2011-05-16 Thread Marcel Panse
Hi, thanks for the reply.

I tried a couple of things both in the tika-test entity and in the entity
named 'f'.
In the tika-test entity I tried:




even



I also tried doing things in the entity 'f' like:




None of it works. I also added fileName to the schema like:



in the fields section. Doesn't help.

Can anyone provide me with a working example? I'm pretty stuck here on
something that seems really trivial and simple :-(
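For what it's worth, the pattern usually suggested for this is a TemplateTransformer on the inner entity, pulling the outer entity's ${f.file} variable into a field. The following is a sketch only — paths, the file pattern and field names are examples, not a tested configuration:

```xml
<entity name="f" processor="FileListEntityProcessor"
        baseDir="/data/docs" fileName=".*\.pdf" recursive="true"
        rootEntity="false" dataSource="null">
  <entity name="tika-test" processor="TikaEntityProcessor"
          url="${f.fileAbsolutePath}" format="text"
          transformer="TemplateTransformer">
    <field column="text" name="text"/>
    <!-- copy the FileListEntityProcessor's file name into the document -->
    <field column="fileName" template="${f.file}"/>
  </entity>
</entity>
```

together with a matching field declaration such as `<field name="fileName" type="string" indexed="true" stored="true"/>` in schema.xml.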



On Sat, May 14, 2011 at 22:56, kbootz  wrote:

> There is a JIRA item(can't recall it atm) that addresses the issue with the
> docs. I'm running 3.1 and per your example you should be able to get it
> using ${f.file}. I think* it should also be in the entity desc. but I'm
> also
> new and that's just how I access it.
>
> GL
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Yonik Seeley
On Mon, May 16, 2011 at 10:06 AM, Michael McCandless
 wrote:
> On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley
>  wrote:
>
>>> To be clear, I'm asking that Yonik revert his commit from yesterday
>>> (rev 1103444), where he added "text_nwd" fieldType and dynamic fields
>>> *_nwd to the example schema.xml.
>>
>> So... your position is that until the "text" fieldType is changed to
>> support non-whitespace-delimited languages better, that
>> no other fieldType should be changed/added to better support
>> non-whitespace-delimited languages?
>
> No, that's not my position at all.
>
> My position is: please don't suddenly commit changes, with "your way",
> while we're still discussing how to solve the issue.  That's not the
> Apache way.

Dude... everyone has always agreed we need more fieldtypes to support
different languages (as you did earlier in this thread too).  There's been a
history of just adding stuff like that (half of the commits to the example
schema have no associated JIRA issue).

What happens to the default "text" field will have no bearing on that.
We will still need more field types to support more languages.
Would you be against me adding a text_cjk fieldtype too?

My position: it's silly for a lack of consensus on the "text" field to
block progress on any other fieldtype.

-Yonik


Re: How to index and query "C#" as whole term?

2011-05-16 Thread Gora Mohanty
On Mon, May 16, 2011 at 7:05 PM, Gnanakumar  wrote:
> Hi,
>
> I'm using Apache Solr v3.1.
>
> How do I configure/allow Solr to both index and query the term "c#" as a
> whole word/term?  From "Analysis" page, I could see that the term "c#" is
> being reduced/converted into just "c" by solr.WordDelimiterFilterFactory.
[...]

Yes, as you have discovered the analyzers for the field type in
question will affect the values indexed.

To index "c#" exactly as is, you can use the "string" type, instead
of the "text" type. However, you probably want some filters
to be applied, e.g., LowerCaseFilterFactory. Take a look at the
definition of the fieldType "text" in schema.xml, define a new field
type that has only the tokenizers and analyzers that you need, and
use that type for your field. This Wiki page should be helpful:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
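A minimal sketch of such a custom type — the name "text_code" and the exact filter list are illustrative, not a recommendation:

```xml
<!-- Keeps "c#" intact: tokenize on whitespace only and lower-case,
     skipping WordDelimiterFilterFactory so the '#' is never stripped. -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the same analyzer at index and query time, "C#" indexes and matches as the single token "c#" (though a trailing-punctuation variant like "c#." would not, so some input cleanup may still be needed).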

Regards,
Gora


Re: How to index and query "C#" as whole term?

2011-05-16 Thread Jonathan Rochkind
I don't think you'd want to use the string type here. String type is 
almost never appropriate for a field you want to actually search on (it 
is appropriate for fields to facet on).


But you may want to use Text type with different analyzers selected.  
You probably want Text type so the value is still split into different 
tokens on word boundaries; you just don't want an analyzer set that 
removes punctuation.


On 5/16/2011 10:46 AM, Gora Mohanty wrote:

On Mon, May 16, 2011 at 7:05 PM, Gnanakumar  wrote:

Hi,

I'm using Apache Solr v3.1.

How do I configure/allow Solr to both index and query the term "c#" as a
whole word/term?  From "Analysis" page, I could see that the term "c#" is
being reduced/converted into just "c" by solr.WordDelimiterFilterFactory.

[...]

Yes, as you have discovered the analyzers for the field type in
question will affect the values indexed.

To index "c#" exactly as is, you can use the "string" type, instead
of the "text" type. However, what you probably want some filters
to be applied, e.g., LowerCaseFilterFactory. Take a look at the
definition of the fieldType "text" in schema.xml, define a new field
type that has only the tokenizers and analyzers that you need, and
use that type for your field. This Wiki page should be helpful:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Regards,
Gora



Re: boolean versus non-boolean search

2011-05-16 Thread Jonathan Rochkind

Why? Because of how the solr/lucene query parser parses?

It parses into separate tokens/phrases, and then marks each unit as
mandatory or optional. The operators joining the tokens/phrases are
used to determine if a unit is mandatory or optional.


Since your defaultOperator="AND"

term1 term2 OR X

is the same as:

term1 AND term2 OR X

because it used the defaultOperator in between term1 and term2, since no 
explicit operator was provided.


Then we get to the one you specifically did add the AND in. I guess that 
it basically groups left-to-right. So:


term1 AND term2 OR X OR Y

is the same as:

term1 AND (term2 OR (X OR Y))

But I guess you already figured this all out, yeah?

On 5/16/2011 9:24 AM, Dmitry Kan wrote:

Dear list,

Might have missed it from the literature and the list, sorry if so, but:

SOLR 1.4.1



Consider the query:

term1 term2 OR "term1 term2" OR "term1 term3"


Problem: The query produces a hit containing only term1.

Solution: Modified query, grouping with parenthesis

(term1 term2) OR "term1 term2" OR "term1 term3"

produces hits with both term1 and term2 present and other hits that are hit
by OR'ed clauses.


Problem 1. Another modified query, AND instead of parenthesis:

term1 AND term2 OR "term1 term2" OR "term1 term3"

produces same results as the original query and same debug output.

Why is that?



RE: How to index and query "C#" as whole term?

2011-05-16 Thread Robert Petersen
I have always just converted terms like 'C#' or 'C++' into 'csharp' and
'cplusplus' before indexing them and similarly converted those terms if
someone searched on them.  That always has worked just fine for me...
:)

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Monday, May 16, 2011 8:28 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index and query "C#" as whole term?

I don't think you'd want to use the string type here. String type is 
almost never appropriate for a field you want to actually search on (it 
is appropriate for fields to facet on).

But you may want to use Text type with different analyzers selected.  
You probably want Text type so the value is still split into different 
tokens on word boundaries; you just don't want an analyzer set that 
removes punctuation.

On 5/16/2011 10:46 AM, Gora Mohanty wrote:
> On Mon, May 16, 2011 at 7:05 PM, Gnanakumar  wrote:
>> Hi,
>>
>> I'm using Apache Solr v3.1.
>>
>> How do I configure/allow Solr to both index and query the term "c#"
as a
>> whole word/term?  From "Analysis" page, I could see that the term
"c#" is
>> being reduced/converted into just "c" by
solr.WordDelimiterFilterFactory.
> [...]
>
> Yes, as you have discovered the analyzers for the field type in
> question will affect the values indexed.
>
> To index "c#" exactly as is, you can use the "string" type, instead
> of the "text" type. However, what you probably want some filters
> to be applied, e.g., LowerCaseFilterFactory. Take a look at the
> definition of the fieldType "text" in schema.xml, define a new field
> type that has only the tokenizers and analyzers that you need, and
> use that type for your field. This Wiki page should be helpful:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Regards,
> Gora
>


Re: How to index and query "C#" as whole term?

2011-05-16 Thread Markus Jelsma
Before indexing, so outside Solr? Using the SynonymFilter would be easier, I
guess.
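A sketch of the SynonymFilter route (file name and mappings are examples):

```xml
<!-- index_synonyms.txt (example contents):
       c# => csharp
       c++ => cplusplus
     The synonym filter must see the raw token, so it runs right after the
     tokenizer, before anything that strips '#' or '+'. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
          ignoreCase="true" expand="false"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The same filter (or the same pre-processing) would need to be applied on the query side so "c#" maps to "csharp" there as well.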

On Monday 16 May 2011 17:44:24 Robert Petersen wrote:
> I have always just converted terms like 'C#' or 'C++' into 'csharp' and
> 'cplusplus' before indexing them and similarly converted those terms if
> someone searched on them.  That always has worked just fine for me...
> 
> :)
> 
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Monday, May 16, 2011 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index and query "C#" as whole term?
> 
> I don't think you'd want to use the string type here. String type is
> almost never appropriate for a field you want to actually search on (it
> is appropriate for fields to facet on).
> 
> But you may want to use Text type with different analyzers selected.
> You probably want Text type so the value is still split into different
> tokens on word boundaries; you just don't want an analyzer set that
> removes punctuation.
> 
> On 5/16/2011 10:46 AM, Gora Mohanty wrote:
> > On Mon, May 16, 2011 at 7:05 PM, Gnanakumar  wrote:
> >> Hi,
> >> 
> >> I'm using Apache Solr v3.1.
> >> 
> >> How do I configure/allow Solr to both index and query the term "c#"
> 
> as a
> 
> >> whole word/term?  From "Analysis" page, I could see that the term
> 
> "c#" is
> 
> >> being reduced/converted into just "c" by
> 
> solr.WordDelimiterFilterFactory.
> 
> > [...]
> > 
> > Yes, as you have discovered the analyzers for the field type in
> > question will affect the values indexed.
> > 
> > To index "c#" exactly as is, you can use the "string" type, instead
> > of the "text" type. However, what you probably want some filters
> > to be applied, e.g., LowerCaseFilterFactory. Take a look at the
> > definition of the fieldType "text" in schema.xml, define a new field
> > type that has only the tokenizers and analyzers that you need, and
> > use that type for your field. This Wiki page should be helpful:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> > 
> > Regards,
> > Gora

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: boolean versus non-boolean search

2011-05-16 Thread Dmitry Kan
Hi Jonathan,

Well, I clearly understand, why 'term1 term2 OR ...' gives exactly same
results as 'term1 AND term2 OR ...', but what I do not get is, why grouping
with parentheses is required to have both term1 and term2 in the same hit
even though AND is the default operator and space between terms is expected
to be treated as AND.

Dmitry


On Mon, May 16, 2011 at 6:33 PM, Jonathan Rochkind  wrote:

> Why? Becuase of how the solr/lucene query parser parses?
>
> It parses into seperate tokens/phrases, and then marks each unit as
> mandatory or optional. The operator's joining the tokens/phrases are used to
> determine if a unit is mandatory or optional.
>
> Since your defaultOperator="AND"
>
> term1 term2 OR X
>
> is the same as:
>
> term1 AND term2 OR X
>
> because it used the defaultOperator in between term1 and term2, since no
> explicit operator was provided.
>
> Then we get to the one you specifically did add the AND in. I guess that it
> basically groups left-to-right. So:
>
> term1 AND term2 OR X OR Y
>
> is the same as:
>
> term1 AND (term2 OR (X OR Y))
>
> But I guess you already figured this all out, yeah?
>
>
> On 5/16/2011 9:24 AM, Dmitry Kan wrote:
>
>> Dear list,
>>
>> Might have missed it from the literature and the list, sorry if so, but:
>>
>> SOLR 1.4.1
>> 
>>
>>
>> Consider the query:
>>
>> term1 term2 OR "term1 term2" OR "term1 term3"
>>
>>
>> Problem: The query produces a hit containing only term1.
>>
>> Solution: Modified query, grouping with parenthesis
>>
>> (term1 term2) OR "term1 term2" OR "term1 term3"
>>
>> produces hits with both term1 and term2 present and other hits that are
>> hit
>> by OR'ed clauses.
>>
>>
>> Problem 1. Another modified query, AND instead of parenthesis:
>>
>> term1 AND term2 OR "term1 term2" OR "term1 term3"
>>
>> produces same results as the original query and same debug output.
>>
>> Why is that?
>>
>>


-- 
Regards,

Dmitry Kan


Re: document storage

2011-05-16 Thread Mike Sokolov

On 05/15/2011 11:48 AM, Erick Erickson wrote:

Where are the documents coming from? Because storing them ONLY in
Solr risks losing them if your index is somehow hosed.
   
In our case, we generally have source documents and can reproduce the 
index if need be, but that's a good point.

Storing them externally only has the advantage that your index will be
much smaller, which helps when replicating as you scale. The downside
here is that highlighting will be more resource-intensive since you're
re-analyzing text in order to highlight.
   
I had been imagining that the Highlighter could use stored term 
positions so as to avoid re-analysis.  Is this incompatible with 
external storage?
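For what it's worth, term vectors are the usual way to let highlighting skip re-analysis — a sketch of such a field declaration (name and type are examples):

```xml
<!-- termVectors/termPositions/termOffsets let a highlighter reuse stored
     analysis; note the highlighter still reads the stored field text. -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

This does grow the index, so it trades index size against query-time CPU.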


We might conceivably need to replicate the documents anyway, even if 
they are stored externally, in order to make them available to a farm of 
servers, although a SAN is another possibility here.


My main concern about storing internally was the cost of merging 
(optimizing) the index.  Presumably that would be increased if the docs 
are stored in it.

So, as usual, "it depends" (tm). What is the scale you need? What
is the QPS you're thinking of supporting?
   
Things are working well at a small scale, and in that environment I 
think all of these solutions work more or less equally well.  We're 
worrying about 10's of millions of documents and QPS around 50, so I 
expect we will have some significant challenges in coordinating a 
cluster of servers, and we're trying to plan as well as we can for 
that.  We expect updates to be performed in a "batch" mode - they don't 
have to be real-time, but they might need to be daily.


-Mike


Problem with custom Similarity class

2011-05-16 Thread Alex Grilo
Hi,
I'm new to Solr and I'm trying to use my custom Similarity class, but I've
not succeeded so far.

I added some debug information and my class is loaded, but it is not used
when queries are made.

Could someone help me? If any further information is relevant, I can
provide it.

Thanks in advance
--
Alex Bredariol Grilo
Developer - umamao.com


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Michael McCandless
On Mon, May 16, 2011 at 10:22 AM, Yonik Seeley
 wrote:

>> My position is: please don't suddenly commit changes, with "your way",
>> while we're still discussing how to solve the issue.  That's not the
>> Apache way.
>
> Dude... everyone has always agreed we need more fieldtypes to support
> different languages (as you did earlier in this thread too).

+1, and I still agree that'd be best.  In that ideal future we would
have no more "text" fieldType, only text_zh, text_en, etc.

> There's been a
> history of just adding stuff like that (half of the commits to the example
> schema have no associated JIRA issue).

I wasn't objecting to the lack of a referenced JIRA issue; I was
objecting to you suddenly committing 'your way" while we were still
discussing what to do.

> What happens to the default "text" field will have no bearing on that.

That's not really true?  I think any changes we make to any default
"text*" fieldTypes are strongly related.

For example, if we fix the "text" fieldType to have good all-around
defaults for all languages (ie, the patch on SOLR-2519) then we don't
need separate text_nwd/*_nwd field types.  Instead, maybe we could add
text_autophrase fieldTypes?  Or maybe text_en_autophrase?

> We will still need more field types to support more languages.

Right.

> Would you be against me adding a text_cjk fieldtype too?

text_cjk would be *awesome*, but text_zh, text_ja, text_ko would be
even better!

If we fix "text" fieldType to be generic for all languages (use
StandardAnalyzer, turn off autophrase), but then
go and add in specific languages over time (say text_en, text_cjk,
etc.), I think that's a great way to iterate towards the ideal future
where we have text_XX coverage for many languages.

> My position: it's silly for a lack of consensus on the "text" field to
> block progesss on any other fieldtype.

I disagree; I think changes to the "text" fieldType are very much tied up
with what other "text_*" fieldTypes we want to introduce.

This is a *really* important configuration file in Solr and we should
present good defaults with it.  People who first use Solr start with
the schema.xml as their starting point.

People who first start with ElasticSearch today get StandardAnalyzer
and no autophrase as the default, which is the best overall default
Lucene has to offer right now.  I think Solr should do the same.

So to sum up, I think we should:

  1) Fix "text" fieldType to stop destroying non-whitespace languages,
 and use the best "general" defaults we have to offer today
 (switch from WhitespaceTokenizer -> StandardTokenizer, and turn
 off autophrase); this is the patch on SOLR-2519.

  2) Add in text_XX specific language field types for as many as we
 can now, iterating over time to add more as we can / people get
 the itch.  We now have a fabulous analysis module (thank you
 Robert!), so we should take advantage of that and at least make
 text_XX for all the matching analyzers in there.
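Concretely, the generic default sketched in (1) could look roughly like this in schema.xml — an illustration of the direction of SOLR-2519, not the committed patch:

```xml
<!-- Generic, language-neutral text type: StandardTokenizer instead of
     WhitespaceTokenizer, and autoGeneratePhraseQueries turned off so
     multi-token terms in non-whitespace-delimited languages aren't
     forced into phrase queries. -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```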

Let's continue this on the issue...

Mike

http://blog.mikemccandless.com


Re: assit with the Clustering component in Solr/Lucene

2011-05-16 Thread Stanislaw Osinski
>
> Both of the clustering algorithms that ship with Solr (Lingo and STC) are
>> designed to allow one document to appear in more than one cluster, which
>> actually does make sense in many scenarios. There's no easy way to force
>> them to produce hard clusterings because this would require a complete
>> change in the way the algorithms work. If you need each document to belong
>> to exactly one cluster, you'd have to post-process the clusters to remove
>> the redundant document assignments.
>>
>
> On the second thought, I have a simple implementation of k-means clustering
> that could do hard clustering for you. It's not available yet, it will most
> probably be part of the next major release of Carrot2 (the package that does
> the clustering). Please watch this issue
> http://issues.carrot2.org/browse/CARROT-791 to get updates on this.
>

Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x,
so you can use the bisecting k-means clustering algorithm
(org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm) which
will produce non-overlapping clusters for you. The downside of this simple
implementation of k-means is that, for the time being, it produces one-word
cluster labels rather than phrases, as Lingo and STC do.
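For example, selecting it in the clustering engine configuration in solrconfig.xml should look something like this (the engine name is illustrative):

```xml
<lst name="engine">
  <str name="name">kmeans</str>
  <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
</lst>
```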

Cheers,

S.


Re: Debugging same SOLR installation on 2 different servers

2011-05-16 Thread Paul Michalet

Thanks Erick !

As I re-checked the configuration files, it turns out someone had 
modified the /solr/conf/*stopwords.txt* on the production server,
and now we know what problem we're dealing with, which seems to be 
related to:
 - 
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html#a493488
 - 
http://stackoverflow.com/questions/3635096/dismax-feat-stopwords-synonyms-etc


Now I've tried to get around that issue by changing the mm (minimum match)
value from 2<-35% to 1 in *solrconfig.xml*,
as suggested on http://drupal.org/node/1102646#comment-4249774 which
actually gets us results for the offending queries, but it adds way
too much *noise*...


So I tried to make sure all my field types were using our
StopFilterFactory (even the ones declared with sortMissingLast="true"
omitNorms="true"), with no luck.
I'll keep on looking for clues, meanwhile if there's a known way around 
that issue, I'd be really grateful to hear about it :)
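One thing worth double-checking with dismax and stopwords is that the stop filter is configured identically at index and query time for every field in qf — a mismatch is the classic trigger for the mm bug linked above. A sketch of a matched pair:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
          ignoreCase="true"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
          ignoreCase="true"/>
</analyzer>
```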


Cheers !
Paul


Le 15/05/2011 16:48, Erick Erickson a écrit :

What happens if you copy the index from one machine to the other? Probably from
prod to test. If your results stay the same, that'd eliminate index
differences as
the culprit.

What do you get by appending &debugQuery=on to the queries that differ?
Is the parsed query any different? I'm wondering here if you somehow
have a difference in the configuration, perhaps dismax? anyway, if the
parsed queries are identical, that eliminates that possibility.

Next, what about synonym files? Stopwords? Are you absolutely sure they're
identical?

If you're using dismax, is it possible that the mm (minimum should match)
is different?

Perhaps this is all stuff you've done already, but this would at least narrow
down where the problem might lie...

Best
Erick

On Wed, May 11, 2011 at 12:10 PM, Paul Michalet  wrote:

Thanks for the hint :)
We ruled that out after having tested special characters, and if it were an
application-level bug, it wouldn't work consistently like it currently does for
the majority of queries.
The only difference we noticed was in the HTTP headers in the SOLR response:
occasionally, the "Content-length" is present, but I've been told it was
probably not causing our bug:
  =>  dev:
headers = Array
(
[0] =>  HTTP/1.1 200 OK
[1] =>  Last-Modified: Fri, 29 Apr 2011 13:36:21 GMT
[2] =>  ETag: "MTFjZjU2MTgxNDgwMDAwMFNvbHI="
[3] =>  Content-Type: text/plain; charset=utf-8
[4] =>  Server: Jetty(6.1.3)
)

=>  production:
headers = Array
(
[0] =>  HTTP/1.1 200 OK
[1] =>  Last-Modified: Fri, 06 May 2011 14:18:36 GMT
[2] =>  ETag: "OGI3ZWYyZDUxNDgwMDAwMFNvbHI="
[3] =>  Content-Type: text/plain; charset=utf-8
[4] =>  Content-Length: 2558
[5] =>  Server: Jetty(6.1.3)
)

Paul Michalet

Le 11/05/2011 17:47, Paul Libbrecht a écrit :

Could it be something in the transmission of the query?
Or is it also identical?

paul


Le 11 mai 2011 à 17:19, Paul Michalet a écrit :


Hello everyone

We have successfully installed SOLR on 2 servers (development and
production), using the same configuration files and paths.
Both SOLR instances have indexed the same contents and most queries give
identical results, but there are a few exceptions where the production
instance returns 0 results (the development instance returns perfectly
valid results for the same query).
We checked the logs in both environments without finding anything
suspicious (the queries are rigorously identical, and the index is built in
the exact same way) and we've run out of options as to where to look for
debugging these cases.

Our development server is Debian and the production is CentOS;
the SOLR version installed in both environments is 1.4.0.

The weird thing is that the few queries failing in the production
instance contain very common terms (without quotes) which, when queried
individually, return valid results...
Any pointers would be greatly appreciated;
thanks in advance !

Paul


Re: boolean versus non-boolean search

2011-05-16 Thread Mike Sokolov


On 05/16/2011 09:24 AM, Dmitry Kan wrote:

Dear list,

Might have missed it from the literature and the list, sorry if so, but:

SOLR 1.4.1



Consider the query:

term1 term2 OR "term1 term2" OR "term1 term3"

   
I think what's happening is that your query gets rewritten into 
something like:


+term1 (term2? "term1 term2"? "term1 term3"?)

where in my notation "?" means the clause is "optional", and "+" means
required.  So any document would satisfy the second, entirely optional clause.
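The demotion behavior can be modelled in a few lines. The sketch below is a toy, not Lucene's actual parser: it assumes the classic-QueryParser rule that the default AND marks a bare term required, while an explicit OR makes the terms on both sides of it optional — even a term that was introduced with an explicit AND, which is why "term1 AND term2 OR ..." and "term1 term2 OR ..." come out the same.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of BooleanQuery clause marking (assumption: an OR demotes the
// clause to its left to optional even if AND introduced it).
public class ClauseMarking {
    static String mark(String query) {
        List<String> terms = new ArrayList<String>();
        List<Boolean> must = new ArrayList<Boolean>();
        boolean orBefore = false;
        for (String t : query.trim().split("\\s+")) {
            if (t.equals("OR")) {
                orBefore = true;
                // demote the clause on the left of the OR
                if (!must.isEmpty()) must.set(must.size() - 1, false);
            } else if (t.equals("AND")) {
                orBefore = false;             // same as the default operator here
            } else {
                terms.add(t);
                must.add(!orBefore);          // default AND => required unless OR precedes
                orBefore = false;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(' ');
            if (must.get(i)) sb.append('+');
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mark("term1 term2 OR X OR Y"));     // +term1 term2 X Y
        System.out.println(mark("term1 AND term2 OR X OR Y")); // +term1 term2 X Y
    }
}
```

Under this model only term1 stays mandatory in both queries, which matches the identical debug output Dmitry reported; parentheses force a fresh sub-query, which is why grouping changes the result.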


-Mike


Re: assit with the Clustering component in Solr/Lucene

2011-05-16 Thread ramdev.wudali
Thanks much Stan,


Ramdev

On May 16, 2011, at 11:38 AM, Stanislaw Osinski wrote:


Both of the clustering algorithms that ship with Solr 
(Lingo and STC) are designed to allow one document to appear in more than one 
cluster, which actually does make sense in many scenarios. There's no easy way 
to force them to produce hard clusterings because this would require a complete 
change in the way the algorithms work. If you need each document to belong to 
exactly one cluster, you'd have to post-process the clusters to remove the 
redundant document assignments.



On the second thought, I have a simple implementation of 
k-means clustering that could do hard clustering for you. It's not available 
yet, it will most probably be part of the next major release of Carrot2 (the 
package that does the clustering). Please watch this issue 
http://issues.carrot2.org/browse/CARROT-791 to get updates on this.



Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and 
branch_3x, so you can use the bisecting k-means clustering algorithm 
(org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm) which will 
produce non-overlapping clusters for you. The downside of this simple 
implementation of k-means is that, for the time being, it produces one-word 
cluster labels rather than phrases as Lingo and STC.

Cheers,

S.






Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Mike Sokolov
We use log4j explicitly and find it irritating to deal with the built-in 
JDK logging default.  We also have conflicts with other packages that 
have their own ideas about how to bind slf4j, so the less of this the 
better, IMO.  The 1.6.1 no-op default behavior seems a bit unfortunate 
as out-of-the-box behavior to me though. Not sure if there's anything to 
be done about that.  Can you log to stderr when there's no logger available?
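For the record, the log4j route means repackaging solr.war with slf4j-log4j12 in place of the bundled slf4j-jdk14 binding (plus log4j itself) and putting a log4j.properties on the container's classpath — roughly like this (paths are examples):

```properties
# log4j.properties (sketch)
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n
```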


-Mike

On 05/16/2011 04:43 AM, Jan Høydahl wrote:

Hi,

This poll is to investigate how you currently do or would like to do logging with Solr 
when deploying solr.war to a SEPARATE java application server (such as Tomcat, Resin etc) 
outside of the bundled "solr/example". For background on how things work in 
Solr now, see http://wiki.apache.org/solr/SolrLogging and for more info on the SLF4J 
framework, see http://www.slf4j.org/manual.html

Please tick one of the options below with an [X]:

[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with re-packaging 
solr.war
[X]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using an 
ANT option
[ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!

Note that NOT bundling a logger binding with solr.war means defaulting to the 
NOP logger after outputting these lines to stderr:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

   


Re: UIMA analysisEngine path

2011-05-16 Thread chamara
Hi Tommaso,
 Thanks for the quick reply. I had copied the lib files and
followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation.
However I get this error. The AnalysisEngine has the default classpath,
which is /org/apache/uima/desc/.

SEVERE: org.apache.solr.common.SolrException: Error Instantiating
UpdateRequestP
rocessorFactory,
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactor
y is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory


Regards,
Chamara


On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] <
ml-node+2946920-843126873-399...@n3.nabble.com> wrote:

> Hello,
>
> if you want to take the descriptor from a jar, provided that you configured
>
> the jar inside a <lib> element in solrconfig, then you just need to write
> the correct classpath in the analysisEngine element.
> For example if your descriptor resides in com/something/desc/ path inside
> the jar then you should set the analysisEngine element as
> /com/something/desc/descriptorname.xml
> If you instead need to get the descriptor from filesystem try the patch in
> SOLR-2501 [1].
> Hope this helps,
> Tommaso
>
> [1] :  https://issues.apache.org/jira/browse/SOLR-2501
>
> 2011/5/13 chamara <[hidden 
> email]>
>
>
> > Hi,
> >  Is this code line 57 needs to be changed to the location where the jar
> > files(library files) resides?
> >  URL url = this.getClass().getResource(""); I
> > did
> > change it but no luck so far. Let me know what i am doing wrong?
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html
>
>



-- 
--- Chamara 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html
Sent from the Solr - User mailing list archive at Nabble.com.
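
Putting Tommaso's two points together, the relevant solrconfig.xml fragments
might look like the following (a sketch; the lib dir and descriptor path are
placeholders for wherever your jar and descriptor actually live):

```xml
<!-- make the jar that contains the UIMA descriptor visible to Solr -->
<lib dir="../../contrib/uima/lib" />

<!-- inside the SolrUIMA update processor configuration: the descriptor,
     addressed by its classpath location inside that jar -->
<analysisEngine>/com/something/desc/descriptorname.xml</analysisEngine>
```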

Re: Problem with custom Similarity class

2011-05-16 Thread Gora Mohanty
On Mon, May 16, 2011 at 10:04 PM, Alex Grilo  wrote:
> Hi,
> I'm new to Solr and I'm trying to use my custom Similarity class but I've
> not succeeded on that.
>
> I added some debug information and my class is loaded, but it is not used
> when queries are made.
>
> Does someone could help me? If any further information is relevant, I can
> provide it.
[...]

Have you overriden the default similarity class in schema.xml?
Though, if your class is getting loaded, that should be the case.

The code for the class should be pretty small, right? Please post
it here, or better yet at pastebin.com, and send a link to this list.

Regards,
Gora


Re: solr velocity.log setting

2011-05-16 Thread Yuhan Zhang
I solved the problem of velocity.log following this tutorial:

http://kris-itproblems.blogspot.com/2010/11/velocitylog-permission-denied.html

On Thu, May 12, 2011 at 6:36 PM, Yuhan Zhang  wrote:

> hi all,
>
> I'm new to solr, and trying to install it on tomcat. however, an exception
> was reached when
> the page http://localhost/sorl/browse was visited:
>
>  *FileNotFoundException: velocity.log (Permission denied) *
>
> looks like solr is trying to create a velocity.log file to tomcat root. so,
> how should I set the configuration
> file on solr to change the location that velocity.log is logging to?
>
> Thank you.
>
> Y
>


Re: UIMA analysisEngine path

2011-05-16 Thread Tommaso Teofili
The error you pasted doesn't seem to be related to a (class)path issue, but more 
likely to a mismatch between a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA 
module at 3.1.0/4.0-SNAPSHOT (trunk); it seems the error arises from a change in 
the UpdateRequestProcessorFactory API.
Hope this helps,
Tommaso


On 16 May 2011, at 18:54, chamara wrote:

> Hi Tommaso,
> Thanks for the quick reply. I had copied the lib files and
> followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation.
> However i get this error. The AnalysisEngine has the default class path
> which is /org/apache/uima/desc/.
> 
> SEVERE: org.apache.solr.common.SolrException: Error Instantiating
> UpdateRequestP
> rocessorFactory,
> org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactor
> y is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory
> 
> 
> Regards,
> Chamara
> 
> 
> On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] <
> ml-node+2946920-843126873-399...@n3.nabble.com> wrote:
> 
>> Hello,
>> 
>> if you want to take the descriptor from a jar, provided that you configured
>> 
>> the jar inside a  element in solrconfig, then you just need to write
>> the correct classpath in the analysisEngine element.
>> For example if your descriptor resides in com/something/desc/ path inside
>> the jar then you should set the analysisEngine element as
>> /com/something/desc/descriptorname.xml
>> If you instead need to get the descriptor from filesystem try the patch in
>> SOLR-2501 [1].
>> Hope this helps,
>> Tommaso
>> 
>> [1] :  https://issues.apache.org/jira/browse/SOLR-2501
>> 
>> 2011/5/13 chamara <[hidden 
>> email]>
>> 
>> 
>>> Hi,
>>> Is this code line 57 needs to be changed to the location where the jar
>>> files(library files) resides?
>>> URL url = this.getClass().getResource(""); I
>>> did
>>> change it but no luck so far. Let me know what i am doing wrong?
>>> 
>>> --
>>> View this message in context:
>>> 
>> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html
>> To unsubscribe from UIMA analysisEngine path, click 
>> here.
>> 
>> 
> 
> 
> 
> -- 
> --- Chamara 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Problem indexing CSV

2011-05-16 Thread Dora Wuyts
I’m pretty new to Solr and I have a question about indexing data using CSV.

I have a Blacklight application running on my Mac 10.6.7 and I configured the
schema.xml and solrconfig.xml in the separate Apache Solr directory according
to the guidelines on the Blacklight website. I have added the RequestHandler
to solrconfig.xml as well. But when I try to index the exemplary document
books.csv (with Solr and the Blacklight script running in the background), I
get an error saying that it came across an undefined field, cat. I assume it’s
not just cat that isn’t recognised as a field.

What should I do to make indexing via CSV possible, both for the exemplary
document and for further documents to follow?

Kind regards

Re: Problem with custom Similarity class

2011-05-16 Thread Alex Grilo
The code is here: http://pastebin.com/50ugqRfA

and my schema.xml configuration entry for
similarity is:


Thanks

Alex
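
For reference, a global similarity override in schema.xml takes this general
form (the class name here is a placeholder standing in for Alex's stripped
entry):

```xml
<similarity class="com.example.MySimilarity"/>
```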

On Mon, May 16, 2011 at 2:01 PM, Gora Mohanty  wrote:

> On Mon, May 16, 2011 at 10:04 PM, Alex Grilo  wrote:
> > Hi,
> > I'm new to Solr and I'm trying to use my custom Similarity class but I've
> > not succeeded on that.
> >
> > I added some debug information and my class is loaded, but it is not used
> > when queries are made.
> >
> > Does someone could help me? If any further information is relevant, I can
> > provide it.
> [...]
>
> Have you overriden the default similarity class in schema.xml?
> Though, if your class is getting loaded, that should be the case.
>
> The code for the class should be pretty small, right? Please post
> it here, or better yet at pastebin.com, and send a link to this list.
>
> Regards,
> Gora
>


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Chris Hostetter

: Does anyone disagree that Yonik's commit was inappropriate?  This is
: not how we work at Apache.

FWIW: I don't see how Yonik's commit was inappropriate at all

He added some new example configuration to "trunk" that was unused, and in 
no way "un-did" or "blocked" any other attempts at improving the configs.

It had no impact on any existing usage, and only served as an example 
(which could be iterated forward)

I seriously don't see the problem here.

-Hoss


RE: How to index and query "C#" as whole term?

2011-05-16 Thread Robert Petersen
Sorry I am also using a synonyms.txt for this in the analysis stack.  I
was not clear, sorry for any confusion.  I am not doing it outside of
Solr but on the way into the index it is converted...  :)

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Monday, May 16, 2011 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index and query "C#" as whole term?

Before indexing so outside Solr? Using the SynonymFilter would be easier
i 
guess.

On Monday 16 May 2011 17:44:24 Robert Petersen wrote:
> I have always just converted terms like 'C#' or 'C++' into 'csharp'
and
> 'cplusplus' before indexing them and similarly converted those terms
if
> someone searched on them.  That always has worked just fine for me...
> 
> :)
> 
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Monday, May 16, 2011 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index and query "C#" as whole term?
> 
> I don't think you'd want to use the string type here. String type is
> almost never appropriate for a field you want to actually search on
(it
> is appropriate for fields to facet on).
> 
> But you may want to use Text type with different analyzers selected.
> You probably want Text type so the value is still split into different
> tokens on word boundaries; you just don't want an analyzer set that
> removes punctuation.
> 
> On 5/16/2011 10:46 AM, Gora Mohanty wrote:
> > On Mon, May 16, 2011 at 7:05 PM, Gnanakumar
wrote:
> >> Hi,
> >> 
> >> I'm using Apache Solr v3.1.
> >> 
> >> How do I configure/allow Solr to both index and query the term "c#"
> 
> as a
> 
> >> whole word/term?  From "Analysis" page, I could see that the term
> 
> "c#" is
> 
> >> being reduced/converted into just "c" by
> 
> solr.WordDelimiterFilterFactory.
> 
> > [...]
> > 
> > Yes, as you have discovered the analyzers for the field type in
> > question will affect the values indexed.
> > 
> > To index "c#" exactly as is, you can use the "string" type, instead
> > of the "text" type. However, what you probably want some filters
> > to be applied, e.g., LowerCaseFilterFactory. Take a look at the
> > definition of the fieldType "text" in schema.xml, define a new field
> > type that has only the tokenizers and analyzers that you need, and
> > use that type for your field. This Wiki page should be helpful:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> > 
> > Regards,
> > Gora

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Highlighting issue with Solr 3.1

2011-05-16 Thread Nemani, Raj
All,

 

I have just installed Solr 3.1 running on Tomcat 7.  I am noticing a possible 
issue with Highlighting.  I have a field in my index called "story".  The solr 
document that I am testing with the data in the story field starts with the 
following snippet (remaining data in the field is not shown to keep things 
simple)

 

EN AMÉRICA LATINA, 

 

When I search for "america" with the highlighting enabled on the "story' field, 
here is what I get in my "highlighting" section of the response.  I am using 
the "ASCIIFoldingFilterFactory" to make my searches accent insensitive.  

 

EN AMÉRICA LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCIÓN.

The problem is that the highlighter encodes the HTML tags, so they show up as
raw HTML tags (because of the encoding) on my search results page. Just to
make sure: I do want the HTML to be interpreted as HTML, not as text. In this
particular situation I am not worried about the dangers of allowing such
behavior. The same test performed on the same data running on a 1.4.1 index
does not exhibit this behavior.

Any help is appreciated. Please let me know if I need to post my field type
definitions (index and query) from the solrconfig.xml for the "story" field.

Thanks in advance
Raj


indexing directed graph

2011-05-16 Thread dani.b.angelov
Hello,
is it possible to index graph - named vertices and named edges? My target
is, with text search to find whether particular node is connected(direct or
indirect) with another. Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing directed graph

2011-05-16 Thread Gora Mohanty
On Tue, May 17, 2011 at 1:20 AM, dani.b.angelov
 wrote:
> Hello,
> is it possible to index graph - named vertices and named edges? My target
> is, with text search to find whether particular node is connected(direct or
> indirect) with another. Thank you.
[...]

There was a discussion earlier on this topic, i.e., how to efficiently
store graphs in Solr/Lucene, but IMHO, at the moment, you are
better off using a graph database that is specifically adapted for
that.

Regards,
Gora


Re: indexing directed graph

2011-05-16 Thread Stefan Matheis

Dani,

I'm actually playing with Neo4j .. they have Lucene indexing and plan to have 
Solr integration (no idea what the current state is).


http://lists.neo4j.org/pipermail/user/2010-January/002372.html

Regards
Stefan

Am 16.05.2011 21:50, schrieb dani.b.angelov:

Hello,
is it possible to index graph - named vertices and named edges? My target
is, with text search to find whether particular node is connected(direct or
indirect) with another. Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Stephen Duncan Jr
[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
[X]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using
an ANT option
[ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!


Actually, more specifically, the build distribution could build a war done
either way, but I'd most like to see the war file WITHOUT a binding be
deployed to Maven.

As it stands, I've done both 1) deploy solr without logging to Maven and use
it, and 2) deploy solr with jdk logging to Maven, then have a Maven build
repackage to remove jdk and use my preferred implementation (logback).  I've
only done 2) at the preference of others who don't want me to deploy a
"modified" war to our Maven repo.

Stephen Duncan Jr
www.stephenduncanjr.com


Re: indexing directed graph

2011-05-16 Thread dani.b.angelov
Thank you Gora,

1. Could you confirm that IMHO here means 'In My Humble Opinion'?
2. Could you point me to an example of a graph database?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949734.html
Sent from the Solr - User mailing list archive at Nabble.com.


indexing directed graph

2011-05-16 Thread dani.b.angelov
Hello,
is it possible to index graph - named vertices and named edges? My target
is, with text search to find whether particular node is connected(direct or
indirect) with another. Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing directed graph

2011-05-16 Thread Gora Mohanty
On Tue, May 17, 2011 at 1:55 AM, dani.b.angelov
 wrote:
> Thank you Gora,
>
> 1. Could you confirm, that the context of  IMHO is 'In My Humble Opinion'.

Yes, sorry. I must remember to be less lazy in typing.

> 2. Could you point example of graph database.
[...]

As Stefan points out above, I like Neo4j among the few that I have
played around with. Searching Google for "open source graph databases"
turns up several, and some reviews.

Regards,
Gora


Re: How to index and query "C#" as whole term?

2011-05-16 Thread Erick Erickson
The other advantage to the synonyms approach is it will be much less
of a headache down the road.

For instance, imagine you've defined "whitespacetokenizer" and
"lowercasefilter".
That'll fix your example just fine. It'll also cause all punctuation
to be included in
the tokens, so if you indexed "try to find me." (note the period) and
searched for
"me" (without the period) you'd not get a hit.

Then, let's say you get clever and do a regex manipulation via
PatternReplaceCharFilterFactory to leave in '#' but remove other
punctuation.
Then any miscellaneous stream that contains a # will give surprising
results. Consider 15# (for 15 pounds). Won't match 15 in a search now.

So whatever solution you choose, think about it pretty carefully before
you jump ..

Best
Erick
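
For reference, the synonyms.txt entries behind the approach Robert describes
might look like this (illustrative; an index-time mapping so the symbol form
and the spelled-out form end up as the same token):

```
c# => csharp
c++ => cplusplus
```

With a SynonymFilterFactory configured in the analysis chain, a search for
either "c#" or "csharp" then matches the same documents.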

On Mon, May 16, 2011 at 2:10 PM, Robert Petersen  wrote:
> Sorry I am also using a synonyms.txt for this in the analysis stack.  I
> was not clear, sorry for any confusion.  I am not doing it outside of
> Solr but on the way into the index it is converted...  :)
>
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Monday, May 16, 2011 8:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index and query "C#" as whole term?
>
> Before indexing so outside Solr? Using the SynonymFilter would be easier
> i
> guess.
>
> On Monday 16 May 2011 17:44:24 Robert Petersen wrote:
>> I have always just converted terms like 'C#' or 'C++' into 'csharp'
> and
>> 'cplusplus' before indexing them and similarly converted those terms
> if
>> someone searched on them.  That always has worked just fine for me...
>>
>> :)
>>
>> -Original Message-
>> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
>> Sent: Monday, May 16, 2011 8:28 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index and query "C#" as whole term?
>>
>> I don't think you'd want to use the string type here. String type is
>> almost never appropriate for a field you want to actually search on
> (it
>> is appropriate for fields to facet on).
>>
>> But you may want to use Text type with different analyzers selected.
>> You probably want Text type so the value is still split into different
>> tokens on word boundaries; you just don't want an analyzer set that
>> removes punctuation.
>>
>> On 5/16/2011 10:46 AM, Gora Mohanty wrote:
>> > On Mon, May 16, 2011 at 7:05 PM, Gnanakumar
> wrote:
>> >> Hi,
>> >>
>> >> I'm using Apache Solr v3.1.
>> >>
>> >> How do I configure/allow Solr to both index and query the term "c#"
>>
>> as a
>>
>> >> whole word/term?  From "Analysis" page, I could see that the term
>>
>> "c#" is
>>
>> >> being reduced/converted into just "c" by
>>
>> solr.WordDelimiterFilterFactory.
>>
>> > [...]
>> >
>> > Yes, as you have discovered the analyzers for the field type in
>> > question will affect the values indexed.
>> >
>> > To index "c#" exactly as is, you can use the "string" type, instead
>> > of the "text" type. However, what you probably want some filters
>> > to be applied, e.g., LowerCaseFilterFactory. Take a look at the
>> > definition of the fieldType "text" in schema.xml, define a new field
>> > type that has only the tokenizers and analyzers that you need, and
>> > use that type for your field. This Wiki page should be helpful:
>> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> >
>> > Regards,
>> > Gora
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Re: indexing directed graph

2011-05-16 Thread dani.b.angelov
I am wondering whether the following idea is worthwhile.
We can describe the graph with a series of triples. So we can create some bean
with fields, for example:
...
@Field
String[] subjects;
@Field
String[] predicates;
@Field
String[] objects;
@Field
int[] level;
...
or some other combination of metadata.
We can index/search this bean. Based on the content of the found bean, we
can infer interconnections between graph participants.
What do you think?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing directed graph

2011-05-16 Thread Jonathan Rochkind
You can certainly index it; the problem will be being able to make the 
kinds of queries you want to make on it once indexed -- that is, indexing it 
in a way that will let you do such queries.


The kind of typical queries I'd imagine you wanting to run on such a 
graph -- I can't think of any way to index it in Solr that would support 
them. But if you give examples of the sorts of queries you want to run, 
maybe someone else has an idea, or can give a definitive 'no'.


On 5/16/2011 3:49 PM, dani.b.angelov wrote:

Hello,
is it possible to index graph - named vertices and named edges? My target
is, with text search to find whether particular node is connected(direct or
indirect) with another. Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: indexing directed graph

2011-05-16 Thread Jonathan Rochkind
Absolutely you can index each point or line of the graph with its own 
document in Solr, perhaps as a triple. (Sounds like you are specifically 
talking about RDF-type data, huh?  Asking about that specifically might 
get you more useful ideas than asking about graphs in general.)


But if you want to then figure out whether two points are connected, or get 
the list of all points within X distance from a known point, or do other 
things you are likely to want to do with it... Solr's not going to 
give you the tools to do that, indexed like that.


On 5/16/2011 4:52 PM, dani.b.angelov wrote:

I am wondering whether the following idea is worthwhile.
We can describe the graph with a series of triples. So we can create some bean
with fields, for example:
...
@Field
String[] subjects;
@Field
String[] predicates;
@Field
String[] objects;
@Field
int[] level;
...
or some other combination of metadata.
We can index/search this bean. Based on the content of the found bean, we
can infer interconnections between graph participants.
What do you think?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html
Sent from the Solr - User mailing list archive at Nabble.com.
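
As an aside, the reachability question itself ("is X connected to Y, directly
or indirectly?") is a plain graph-traversal problem once the triples have been
fetched from wherever they are stored; a self-contained BFS sketch:

```java
import java.util.*;

public class GraphReach {
    // Directed reachability over an edge list of (subject, object) pairs --
    // the "is node X connected (directly or indirectly) to node Y?" question
    // from the thread, answered by breadth-first search.
    public static boolean reachable(List<String[]> edges, String from, String to) {
        // Build an adjacency list from the edge pairs.
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        // Standard BFS from the start node.
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(from);
        seen.add(from);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(to)) return true;
            for (String next : adj.getOrDefault(node, List.of())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[]{"a", "b"},
                new String[]{"b", "c"},
                new String[]{"d", "a"});
        System.out.println(reachable(edges, "a", "c")); // indirect link: true
        System.out.println(reachable(edges, "c", "a")); // edges are directed: false
    }
}
```

This is the kind of traversal a graph database does natively; with triples
stored as Solr documents you would have to issue one query per hop yourself.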



Re: How many UpdateHandlers can a Solr config have?

2011-05-16 Thread Chris Hostetter

: just a very basic question, but I haven't been able to find the answer in
: the Solr wiki: how many updateHandlers can one Solr config have? Just one?
: Or many?

There can only be one <updateHandler> declaration in solrconfig.xml; it's 
responsible for "owning" updates to the index.

But there can be any number of <requestHandler> declarations to 
configure request handlers that do updates, as well as any number of 
<updateRequestProcessorChain> declarations that can identify the 
processors used for dealing with updates (which can be referred to by name 
from the request handlers)
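
A sketch of how those three kinds of declarations relate in solrconfig.xml
(the chain name and handler/processor choices are illustrative, and the
parameter name for selecting a chain should be checked against your Solr
version's example config):

```xml
<!-- exactly one: owns updates to the index -->
<updateHandler class="solr.DirectUpdateHandler2" />

<!-- any number of processor chains, referred to by name -->
<updateRequestProcessorChain name="mychain">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- any number of request handlers that perform updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>
```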


-Hoss


Re: Highlighting does not work when using !boost as a nested query

2011-05-16 Thread Chris Hostetter

: As I said in my previous message, if I issue:
: q=+field1:range +field2:value +_query_:{!dismax v=$qq}
: highlighting works. I've just discovered the problem is not just with 
{!boost...}. If I just add a bf parameter to the previous query, highlighting 
also fails.
: Anybody knows what can be happening? I'm really stuck on this problem...

Just a hunch, but i suspect the problem has to do with the 
highlighter (or maybe it's the fragment generator?) trying to determine
matches from query types it doesn't understand 

I thought there was a query param you could use to tell the highlighter to 
use an "alternate" query string (that would be simpler) instead of the 
real query ... but i'm not seeing it in the docs.

hl.requireFieldMatch=false might also help (not sure)

In general it would probably be helpful for folks if you could post the 
*entire* request you are making (full query string and all request params) 
along with the solrconfig.xml sections that show how your request handler 
and highlighter are configured.



-Hoss


Re: K-Stemmer for Solr 3.1

2011-05-16 Thread Smiley, David W.
Lucid's KStemmer is LGPL and the Solr committers have shown that they don't 
want LGPL libraries shipping with Solr. If you are intent on releasing your 
changes, I suggest attaching both the modified source and the compiled jar onto 
Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.

~ David Smiley

On May 16, 2011, at 2:24 AM, Bernd Fehling wrote:

> I don't know if it is allowed to modify Lucid code and add it to jira.
> If someone from Lucid would give me the permission and the Solr developers
> have nothing against it I won't mind adding the Lucid KStemmer to jira
> for Solr 3.x and 4.x.
> 
> There are several Lucid KStemmer users which I can see from the many requests
> which I got. Also the Lucid KStemmer is faster than the standard KStemmer.
> 
> Bernd
> 
> Am 16.05.2011 06:33, schrieb Bill Bell:
>> Did you upload the code to Jira?
>> 
>> On 5/13/11 12:28 AM, "Bernd Fehling"
>> wrote:
>> 
>>> I backported a Lucid KStemmer version from solr 4.0 which I found
>>> somewhere.
>>> Just changed from
>>> import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0
>>> to
>>> import org.apache.lucene.analysis.CharArraySet;  // solr3.1
>>> 
>>> Bernd
>>> 
>>> 
>>> Am 12.05.2011 16:32, schrieb Mark:
 java.lang.AbstractMethodError:
 org.apache.lucene.analysis.TokenStream.incrementToken()Z
 
 Would you mind explaining your modifications? Thanks
 
 On 5/11/11 11:14 PM, Bernd Fehling wrote:
> 
> Am 12.05.2011 02:05, schrieb Mark:
>> It appears that the older version of the Lucid Works KStemmer is
>> incompatible with Solr 3.1. Has anyone been able to get this to work?
>> If not,
>> what are you using as an alternative?
>> 
>> Thanks
> 
> Lucid KStemmer works nice with Solr3.1 after some minor mods to
> KStemFilter.java and KStemFilterFactory.java.
> What problems do you have?
> 
> Bernd
>> 
>> 
> 
> -- 
> *
> Bernd Fehling                    Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)               Universitätsstr. 25
> Tel. +49 521 106-4060   Fax. +49 521 106-4052
> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
> 
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *



Re: Problem with custom Similarity class

2011-05-16 Thread Chris Hostetter

: The code is here: http://pastebin.com/50ugqRfA
: 
: and my schema.xml configuration entry for
: similarity is:
: 

exactly what version of Solr are you using?

what does the full field/fieldType declaration look like in your 
schema.xml for the field you are testing with?

what does your exact query request look like? 

The trunk branch of lucene/solr has made some changes to how similarity 
works (it's now very much per field) and how you declare your similarity 
in schema.xml ... if i remember correctly, the syntax from 3.1 to declare 
a "global" similarity *should* still work in 4.x as a way to declare the 
"default" used by fields that don't define a similarity, but there may be 
a bug (or i may be remembering incorrectly ... if the syntax really is no 
longer used at all then we should make sure it logs a nice fat error on 
startup)

: > > I added some debug information and my class is loaded, but it is not used
: > > when queries are made.

Please clarify exactly how you are testing this and what you mean by "is 
not used when queries are made" ... it's important to rule out the 
possibility that you are just misunderstanding how the similarity methods 
are used.


-Hoss


RE: K-Stemmer for Solr 3.1

2011-05-16 Thread Steven A Rowe
On 5/16/2011 at 5:33 PM, David W. Smiley wrote:
> Lucid's KStemmer is LGPL and the Solr committers have shown that they
> don't want LGPL libraries shipping with Solr. If you are intent on
> releasing your changes, I suggest attaching both the modified source and
> the compiled jar onto Solr's k-stemmer wiki page; and of course say that
> it's LGPL licensed.

AFAICT, all Apache MoinMoin wikis (at least Lucene's and Solr's) have disabled 
attachments - you can't retrieve existing attachments, and you can't create new 
ones.  (Spam, apparently, was the impetus for this change.)

Steve


Re: K-Stemmer for Solr 3.1

2011-05-16 Thread Robert Muir
On Mon, May 16, 2011 at 5:33 PM, Smiley, David W.  wrote:
> Lucid's KStemmer is LGPL and the Solr committers have shown that they don't 
> want LGPL libraries shipping with Solr. If you are intent on releasing your 
> changes, I suggest attaching both the modified source and the compiled jar 
> onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.
>
> ~ David Smiley

Hi David, I don't know much about this stemmer but the original
implementation is BSD-licensed
(http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi)


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Chris Hostetter

: This poll is to investigate how you currently do or would like to do 
: logging with Solr when deploying solr.war to a SEPARATE java application 
: server (such as Tomcat, Resin etc) outside of the bundled 

FWIW...

a) the context of this poll is SOLR-2487

b) this poll seems flawed to me, as it completely sidesteps what i 
consider the major crux of the issue: 


   If: You are someone who does not like (or has conflicts with) 
   the JDK logging binding currently included in the solr.war 
   that is built by default and included in the binary releases; 
 Then: Do you consider building solr.war from source difficult?


-Hoss


Re: [POLL] How do you (like to) do logging with Solr

2011-05-16 Thread Chris Hostetter

My answers...

: [X]  I always use the JDK logging as bundled in solr.war, that's perfect
: [X]  I sometimes use log4j or another framework and am happy with 
re-packaging solr.war
: [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at 
deploy time
: [X]  Let me choose whether to bundle a binding or not at build time, using an 
ANT option
: [ ]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
: [ ]  What? Solr can do logging? How cool!


-Hoss


Re: Boost newer documents only if date is different from timestamp

2011-05-16 Thread Chris Hostetter

The "map" function lets you replace an arbitrary range of values with a 
new value, so you could "map" any value greater than the ms that today 
started on to any other point in history...

http://wiki.apache.org/solr/FunctionQuery#map

An easier approach would probably be to apply some logic at index time: 
you can still index the Last-Modified date you are getting, 
but if you believe that date is artificial, you can index an alternate 
date (possibly based on some rules you know about the site, or reuse the 
"first" last-modified date you ever got for that URL, etc...) in a 
distinct field and use that value for date boosting.
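
Concretely, with such an alternate field (call it boost_date, a hypothetical
name) the recency boost from the linked SolrRelevancyFAQ page could then point
at the sanitized field rather than the raw Last-Modified value:

```
bf=recip(ms(NOW,boost_date),3.16e-11,1,1)
```

The recip/ms combination is the formula given on that FAQ page; the only
change here is which field it reads.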

: I am trying to boost newer documents in Solr queries. The ms function
: 
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
: seems to be the right way to go, but I need to add an additional
: condition:
: I am using the last-Modified-Date from crawled web pages as the date
: to consider, and that does not always provide a meaningful date.
: Therefore I would like the function to only boost documents where the
: date (not time) found in the last-Modified-Date is different from the
: timestamp, eliminating results that just return the current date as
: the last-Modified-Date. Suggestions are appreciated!
: 

-Hoss


Re: Embedded Solr Optimize under Windows

2011-05-16 Thread Chris Hostetter

: http://code.google.com/p/solr-geonames/wiki/DeveloperInstall
: "It's worth noting that the build has also been run on Mac and Solaris now,
: and the Solr index is about half the size. We suspect the optimize() call in
: Embedded Solr is not working correctly under Windows."
: 
: We've observed that Windows leaves lots of segments on disk and takes up
: twice the volume as the other OSs. Perhaps file locking or something

The problem isn't that "optimize" doesn't work on windows, the problem is 
that windows file semantics won't let files be deleted while there are 
open file handles -- so Lucene's Directory behavior is to leave the files 
on disk, and try to clean them up later.  (on the next write, or next 
optimize call)


-Hoss


Re: Embedded Solr Optimize under Windows

2011-05-16 Thread Greg Pendlebury
Thanks for the reply. I'm at home right now, or I'd try this myself, but is
the suggestion that two optimize() calls in a row would resolve the issue?
The process in question is a JVM devoted entirely to harvesting, calls
optimize() then shuts down.

The least processor intensive way of triggering this behaviour is
desirable... perhaps a commit()? But I wouldn't have expected that to
trigger a write.

On 17 May 2011 10:20, Chris Hostetter  wrote:

>
> : http://code.google.com/p/solr-geonames/wiki/DeveloperInstall
> : "It's worth noting that the build has also been run on Mac and Solaris
> now,
> : and the Solr index is about half the size. We suspect the optimize() call
> in
> : Embedded Solr is not working correctly under Windows."
> :
> : We've observed that Windows leaves lots of segments on disk and takes up
> : twice the volume as the other OSs. Perhaps file locking or something
>
> The problem isn't that "optimize" doesn't work on windows, the problem is
> that windows file semantics won't let files be deleted while there are
> open file handles -- so Lucene's Directory behavior is to leave the files
> on disk, and try to clean them up later.  (on the next write, or next
> optimize call)
>
>
> -Hoss
>


Re: Highlighting issue with Solr 3.1

2011-05-16 Thread Koji Sekiguchi

(11/05/17 3:27), Nemani, Raj wrote:

All,



I have just installed Solr 3.1 running on Tomcat 7.  I am noticing a possible issue with 
highlighting.  I have a field in my index called "story".  The Solr document 
that I am testing with has data in the story field that starts with the following snippet 
(the remaining data in the field is not shown, to keep things simple)



EN AMÉRICA LATINA,



When I search for "america" with highlighting enabled on the "story" field, here is what I get 
in the "highlighting" section of the response.  I am using the "ASCIIFoldingFilterFactory" to 
make my searches accent-insensitive.



EN <em>AMÉRICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCIÓN.

The problem is that the highlighter encodes the HTML tags, so they show up as raw HTML tags (because of the encoding) on my search results page. Just to make sure: I do want the HTML to be interpreted as HTML, not as text. In this particular situation I am not worried about the dangers of allowing such behavior. The same test performed on the same data against a 1.4.1 index does not exhibit this behavior.

Any help is appreciated. Please let me know if I need to post my field type definitions (index and query) from the solrconfig.xml for the "story" field.

Thanks in advance
Raj

I bet you have an encoder setting in your solrconfig.xml. If so, try to comment it out.

Koji
-- 
http://www.rondhuit.com/en/
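For reference, the setting Koji is hinting at is probably of this shape, under the highlighting configuration in solrconfig.xml (the exact attributes below are a guess based on the 3.1 example config, not taken from the thread):

```xml
<!-- HTML-escapes highlighted fragments; commenting it out leaves
     the <em>...</em> highlight markup un-escaped in the response -->
<encoder name="html" default="true"
         class="solr.highlight.HtmlEncoder"/>
```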


Structured fields and termVectors

2011-05-16 Thread Jack Repenning
How does MoreLikeThis use termVectors?

My documents (full sample at the bottom) frequently include lines more or less 
like this

   M /trunk/home/.Aquamacs/Preferences.el

I want to "MoreLikeThis" based on the full path, but not the "M". But what I 
actually display as a search result should include "M" (should look pretty much 
like the sample, below).

If I define a field to include that whole line, I can certainly search in ways 
that skip the "M", but how do I control the termVector and MoreLikeThis?  I 
think the answer is not to termVector the line as shown, but rather to index 
these lines twice, once whole (which is also copyFielded into the display 
text), and a second time with just the path (and termVectors="true"). Which is 
OK, but since these lines will represent most of my data, double-indexing seems 
to double my storage, which is ... oh, well ... not entirely optimal.

So is there some way I can index the full line, once, with "M" and path, and 
tell the termVector to include the whole path and nothing but the path?



-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep




r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line
Changed paths:
   M /trunk/home/.Aquamacs
   M /trunk/home/.Aquamacs/Preferences.el
   M /trunk/www/wynton-start-page.html

simplify the hijack of Aquamacs prefs storage, aufl




PGP.sig
Description: This is a digitally signed message part


Re: How to set a common field to several values types ?

2011-05-16 Thread habogay
I want to create a field by extracting a value from another field with some Java
code (using regular expressions). How do I do this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-set-a-common-field-to-several-values-types-tp2922192p2951036.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Want to Delete Existing Index & create fresh index

2011-05-16 Thread Pawan Darira
I set the dataDir in solrconfig.xml. Actually, I am using a core-based structure.
Could that be causing the problem?



On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout  wrote:

> I guess you are having issues with the datadir. Did you set the datadir in
> solrconfig.xml?
>
> On Sat, May 14, 2011 at 4:10 PM, Pawan Darira  >wrote:
>
> > Hi
> >
> > I am using Solr 1.4. & had changed schema already. When i created the
> index
> > for first time, the directory was automatically created & index made
> > perfectly fine.
> >
> > Now, i want to create the index from scratch, so I deleted the whole
> > data/index directory & ran the script. Now it is only creating empty
> > directories & NO index files inside that.
> >
> > Thanks
> > Pawan
> >
> >
> > On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan 
> wrote:
> >
> > > Hi Pawan,
> > >
> > > Which SOLR version do you have installed?
> > >
> > > It should be absolutely normal for the data/ sub directory to create
> when
> > > starting up SOLR.
> > >
> > > So just go ahead and post your data into SOLR, if you have changed the
> > > schema already.
> > >
> > > --
> > > Regards,
> > >
> > > Dmitry Kan
> > >
> > > On Sat, May 14, 2011 at 4:01 PM, Pawan Darira  > > >wrote:
> > >
> > > > I did that. Index directory is created but not contents in that
> > > >
> > > > 2011/5/14 François Schiettecatte 
> > > >
> > > > > You can also shut down solr/lucene, do:
> > > > >
> > > > >rm -rf /YourIndexName/data/index
> > > > >
> > > > > and restart, the index directory will be automatically recreated.
> > > > >
> > > > > François
> > > > >
> > > > > On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote:
> > > > >
> > > > > > "curl --fail $solrIndex/update?commit=true -d
> > > > > > '<delete><query>*:*</query></delete>'" #empty index [1
> > > > > > <
> > > > >
> > >
> http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
> > > > >]
> > > > > >
> > > > > > did u try?
> > > > > >
> > > > > >
> > > > > > On Sat, May 14, 2011 at 7:26 AM, Pawan Darira <
> > > pawan.dar...@gmail.com
> > > > > >wrote:
> > > > > >
> > > > > >> Hi
> > > > > >>
> > > > > >> I had an existing index created months back. now my database
> > schema
> > > > has
> > > > > >> changed. i wanted to delete the current data/index directory &
> > > > re-create
> > > > > >> the
> > > > > >> fresh index
> > > > > >>
> > > > > >> but it is saying that "segments" file not found & just create
> > blank
> > > > > >> data/index directory. Please help
> > > > > >>
> > > > > >> --
> > > > > >> Thanks,
> > > > > >> Pawan Darira
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > K. Gabriele
> > > > > >
> > > > > > --- unchanged since 20/9/10 ---
> > > > > > P.S. If the subject contains "[LON]" or the addressee
> acknowledges
> > > the
> > > > > > receipt within 48 hours then I don't resend the email.
> > > > > > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x,
> this)
> > ∧
> > > > > time(x)
> > > > > > < Now + 48h) ⇒ ¬resend(I, this).
> > > > > >
> > > > > > If an email is sent by a sender that is not a trusted contact or
> > the
> > > > > email
> > > > > > does not contain a valid code then the email is not received. A
> > valid
> > > > > code
> > > > > > starts with a hyphen and ends with "X".
> > > > > > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈
> subject(x)
> > ∧
> > > y
> > > > ∈
> > > > > > L(-[a-z]+[0-9]X)).
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Pawan Darira
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Pawan Darira
> >
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>



-- 
Thanks,
Pawan Darira


error while doing full import

2011-05-16 Thread deniz
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1
	at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp" at [row,col {unknown-source}]: [170,29]
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:181)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282)
	... 10 more
Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp" at [row,col {unknown-source}]: [170,29]
	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467)
	at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431)
	at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661)
	at com.ctc.wstx.sr.StreamScanner.expandEntity(StreamScanner.java:1555)
	at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1523)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2757)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:370)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:304)
	at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$200(XPathRecordReader.java:196)
	at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:178)
	... 11 more

May 17, 2011 10:51:51 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1

RE: How to index and query "C#" as whole term?

2011-05-16 Thread Gnanakumar
Thank you all for your valuable suggestions.  I'll set it up in
synonyms.txt using solr.SynonymFilterFactory. Hope this fits the bill.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, May 17, 2011 2:12 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index and query "C#" as whole term?

The other advantage to the synonyms approach is it will be much less
of a headache down the road.

For instance, imagine you've defined "whitespacetokenizer" and
"lowercasefilter".
That'll fix your example just fine. It'll also cause all punctuation
to be included in
the tokens, so if you indexed "try to find me." (note the period) and
searched for
"me" (without the period) you'd not get a hit.

Then, let's say you get clever and do a regex manipulation via
PatternReplaceCharFilterFactory to leave in '#' but remove other
punctuation.
Then any miscellaneous stream that contains a # will give surprising
results. Consider 15# (for 15 pounds). Won't match 15 in a search now.

So whatever solution you choose, think about it pretty carefully before
you jump ..

Best
Erick

On Mon, May 16, 2011 at 2:10 PM, Robert Petersen  wrote:
> Sorry I am also using a synonyms.txt for this in the analysis stack.  I
> was not clear, sorry for any confusion.  I am not doing it outside of
> Solr but on the way into the index it is converted...  :)
>
> -Original Message-
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Monday, May 16, 2011 8:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index and query "C#" as whole term?
>
> Before indexing so outside Solr? Using the SynonymFilter would be easier
> i
> guess.
>
> On Monday 16 May 2011 17:44:24 Robert Petersen wrote:
>> I have always just converted terms like 'C#' or 'C++' into 'csharp'
> and
>> 'cplusplus' before indexing them and similarly converted those terms
> if
>> someone searched on them.  That always has worked just fine for me...
>>
>> :)
>>
>> -Original Message-
>> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
>> Sent: Monday, May 16, 2011 8:28 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index and query "C#" as whole term?
>>
>> I don't think you'd want to use the string type here. String type is
>> almost never appropriate for a field you want to actually search on
> (it
>> is appropriate for fields to facet on).
>>
>> But you may want to use Text type with different analyzers selected.
>> You probably want Text type so the value is still split into different
>> tokens on word boundaries; you just don't want an analyzer set that
>> removes punctuation.
>>
>> On 5/16/2011 10:46 AM, Gora Mohanty wrote:
>> > On Mon, May 16, 2011 at 7:05 PM, Gnanakumar
> wrote:
>> >> Hi,
>> >>
>> >> I'm using Apache Solr v3.1.
>> >>
>> >> How do I configure/allow Solr to both index and query the term "c#"
>>
>> as a
>>
>> >> whole word/term?  From "Analysis" page, I could see that the term
>>
>> "c#" is
>>
>> >> being reduced/converted into just "c" by
>>
>> solr.WordDelimiterFilterFactory.
>>
>> > [...]
>> >
>> > Yes, as you have discovered the analyzers for the field type in
>> > question will affect the values indexed.
>> >
>> > To index "c#" exactly as is, you can use the "string" type, instead
>> > of the "text" type. However, what you probably want some filters
>> > to be applied, e.g., LowerCaseFilterFactory. Take a look at the
>> > definition of the fieldType "text" in schema.xml, define a new field
>> > type that has only the tokenizers and analyzers that you need, and
>> > use that type for your field. This Wiki page should be helpful:
>> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> >
>> > Regards,
>> > Gora
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>




how to use variable in solfconfig.xml?

2011-05-16 Thread deniz
 

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-config.xml</str>
    <lst name="datasource">
      <str name="name">database</str>
      <str name="type">JdbcDataSource</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://xxx/x</str>
      <str name="user">xxx</str>
      <str name="password">xxx</str>
    </lst>
    <lst name="datasource">
      <str name="name">url_data</str>
      <str name="type">URLDataSource</str>
      <str name="url">http://xxx.xxx.xxx/frontend_dev.php/x//${var}</str>
      <str name="processor">XPathEntityProcessor</str>
    </lst>
  </lst>
</requestHandler>





is this possible?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-use-variable-in-solfconfig-xml-tp2951337p2951337.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to use variable in solfconfig.xml?

2011-05-16 Thread Paul Libbrecht
Deniz,

you want the url to be parametrizable?
I use solr-packager to do that. And it works well.
The solrconfig and schema are all processed through the filter-resources maven 
process.

paul


Le 17 mai 2011 à 07:59, deniz a écrit :

>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>
>   db-config.xml
>  
> database
> JdbcDataSource
> com.mysql.jdbc.Driver
> jdbc:mysql://xxx/x
> xxx
> xxx
>  
>  
> url_data
> URLDataSource
>  name="url">http://xxx.xxx.xxx/frontend_dev.php/x//${var}
> XPathEntityProcessor
>  
>
>  
> 
> 
> 
> 
> 
> is this possible?
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-use-variable-in-solfconfig-xml-tp2951337p2951337.html
> Sent from the Solr - User mailing list archive at Nabble.com.



AW: Structured fields and termVectors

2011-05-16 Thread Martin Rödig
Hello,

I think you can use a field that is both stored and indexed. At index time you 
can use a KeywordTokenizer and a filter to reduce the line to the path (without the "M"). 
The displayed value (the stored field) is always the original value, i.e. the value as it 
came in, because it is stored before the tokenizer and filters run. The indexed values 
(and term vectors) are the terms left after the tokenizer and filters, so you end up with 
only the reduced path in the term vectors.
I hope this can help you.
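A minimal schema.xml sketch of this idea (the type name, field name, and exact pattern are illustrative, not from the thread): the stored value keeps the whole "M /path" line for display, while a char filter strips the leading status letter before indexing, so the term vector contains only the path.

```xml
<fieldType name="path_line" class="solr.TextField">
  <analyzer>
    <!-- drop the leading status letter ("M ", "A ", ...) before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^\s*[A-Z]\s+" replacement=""/>
    <!-- keep the remaining path as a single term -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="changed_path" type="path_line"
       indexed="true" stored="true" termVectors="true"/>
```

This avoids the double-indexing Jack describes: one field, stored once, with term vectors built only from the filtered path.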

Kind regards
M.Sc. Dipl.-Inf. (FH) Martin Rödig
 
SHI Elektronische Medien GmbH
Postadresse: Watzmannstr. 23, 86316 Friedberg
Besuchsadresse: Curt-Frenzel-Str. 12, 86167 Augsburg

Internet: http://www.shi-gmbh.com
Registergericht Augsburg HRB 17382
Geschäftsführer: Peter Spiske
Steuernummer: 103/137/30412

-Original Message-
From: Jack Repenning [mailto:jrepenn...@collab.net] 
Sent: Tuesday, May 17, 2011 03:30
To: solr-user@lucene.apache.org
Subject: Structured fields and termVectors

How does MoreLikeThis use termVectors?

My documents (full sample at the bottom) frequently include lines more or less 
like this

   M /trunk/home/.Aquamacs/Preferences.el

I want to "MoreLikeThis" based on the full path, but not the "M". But what I 
actually display as a search result should include "M" (should look pretty much 
like the sample, below).

If I define a field to include that whole line, I can certainly search in ways 
that skip the "M", but how do I control the termVector and MoreLikeThis?  I 
think the answer is not to termVector the line as shown, but rather to index 
these lines twice, once whole (which is also copyFielded into the display 
text), and a second time with just the path (and termVectors="true"). Which is 
OK, but since these lines will represent most of my data, double-indexing seems 
to double my storage, which is ... oh, well ... not entirely optimal.

So is there some way I can index the full line, once, with "M" and path, and 
tell the termVector to include the whole path and nothing but the path?



-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep




r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line
Changed paths:
   M /trunk/home/.Aquamacs
   M /trunk/home/.Aquamacs/Preferences.el
   M /trunk/www/wynton-start-page.html

simplify the hijack of Aquamacs prefs storage, aufl




Re: how to use variable in solfconfig.xml?

2011-05-16 Thread deniz
Well, the thing I want to do is something like this:

Let's say we have two users, with ids 1 and 2. The link /1 returns
user1's data in XML format and /2 returns user2's data, again in XML
format, and I want to use the resulting XML for indexing.

Or is it better to have only one page, like /users, that returns user1 and
user2 on the same page?


I will check for solr-packager too... 
thank you

deniz

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-use-variable-in-solfconfig-xml-tp2951337p2951395.html
Sent from the Solr - User mailing list archive at Nabble.com.
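For what it's worth, DIH can substitute request parameters into an entity's url at import time via ${dataimporter.request.<name>}, which covers the per-user-URL case described above. A hedged sketch (entity, field, and parameter names are illustrative):

```xml
<!-- data-config.xml: fetch one user's XML per import run -->
<entity name="user"
        processor="XPathEntityProcessor"
        dataSource="url_data"
        url="http://xxx.xxx.xxx/frontend_dev.php/users/${dataimporter.request.userId}"
        forEach="/user">
  <field column="id"   xpath="/user/id"/>
  <field column="name" xpath="/user/name"/>
</entity>
```

It would be triggered with, e.g., /solr/dataimport?command=full-import&userId=1, and again with userId=2&clean=false so the first user's documents are kept.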


Re: indexing directed graph

2011-05-16 Thread dani.b.angelov
Thank you for your reply!
My target is to create some kind of "relation" between the documents indexed
in the index. So, for performance reasons, I would like to index the
graph created from these relations.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2951413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing directed graph

2011-05-16 Thread dani.b.angelov
Gora, thank you for your reply!

Could you point me a link regarding "There was a discussion earlier on this
topic...".

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2951418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing directed graph

2011-05-16 Thread dani.b.angelov
Thank you for your reply!
My target is to create some kind of "relations" between the documents indexed
in the index. So, for performance reasons, I would like to index the
graph created from these relations. That way I can query/search the
subgraph documents by vertex names/text and/or edge names/text.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949759p2951426.html
Sent from the Solr - User mailing list archive at Nabble.com.
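One common way to model what is described above, sketched with illustrative field names (not from the thread), is to index each edge as its own Solr document so a subgraph can be assembled by querying on vertex and edge labels:

```xml
<doc>
  <field name="id">edge-42</field>
  <field name="from_vertex">Alice</field>
  <field name="to_vertex">Bob</field>
  <field name="edge_label">follows</field>
</doc>
```

A query such as q=from_vertex:Alice AND edge_label:follows then returns the outgoing "follows" edges, and repeated queries walk the graph one hop at a time; multi-hop traversal still has to be driven by the client.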


Set operations on multiple queries with different qf parameters

2011-05-16 Thread Nikhil Chhaochharia
Hi,

I am using Solr 3.1 with edismax.  My frontend allows the user to create 
arbitrarily complex queries by modifying q, fq, qf and mm (only 1 and 100% are 
allowed) parameters.  The queries can then be saved by the user.

The user should be able to perform set operations on the saved searches.  For 
example, the user may want to see all documents which are returned both by 
saved search 1 and saved search 2 (equivalent to intersection of the two).

If the saved searches contain q, fq and/or mm, then I can combine the saved 
searches to create a new query which will be equivalent to their intersection.  
However, I can't figure out how to handle qf?

For example,

Query 1 = q=abc def&fq=field1:xyz&mm=1&qf=p,q,r
Query 2 = q=jkl&mm=100%&qf=q,r,s

How do I get the list of common documents which are present in the result set 
of both queries?



Thanks,
Nikhil
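One way to express the intersection in a single request, assuming the default lucene query parser for q, is to nest each saved search via the _query_ hook with edismax LocalParams (each nested clause carrying its own qf and mm), and keep any saved fq as-is. A sketch for the two example queries:

```
q=_query_:"{!edismax qf='p q r' mm=1}abc def" AND _query_:"{!edismax qf='q r s' mm='100%'}jkl"
fq=field1:xyz
```

When sent over HTTP, the % in mm='100%' has to be URL-encoded as %25.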