Re: Storing Related Data - At Different Times

2008-01-21 Thread Gavin
Hi Otis,
Thanks. I was thinking along those lines, but having two indexes will
hurt my search.

1. Searching fields that belong only to the personal details should
result in 5 resumes being shown for the user (if he has 5). But now it
will only show 1 link to the personal details and no resumes.

2. Searching fields that belong to both the personal details and the
resume details will result in 2 sets of results, which I will have to
manually combine using text processing.

Can I avoid doing this?

Thanks,
Gavin
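
The two-index approach Otis suggests (quoted below) implies doing the join client-side after querying each index. A rough sketch of that merge step, with made-up field names and results already fetched as plain dicts:

```python
# Hypothetical client-side "join" of results from two Solr indices:
# personal details and resumes share a user_id field (FK->PK style).
# Field names (user_id, name, resume_id, title) are illustrative only.

def join_results(personal_hits, resume_hits):
    """Attach the matching personal-details doc to each resume doc."""
    by_user = {p["user_id"]: p for p in personal_hits}
    joined = []
    for r in resume_hits:
        merged = dict(r)
        merged.update(by_user.get(r["user_id"], {}))
        joined.append(merged)
    return joined

personal = [{"user_id": "42", "name": "Gavin"}]
resumes = [{"user_id": "42", "resume_id": "1", "title": "Developer"},
           {"user_id": "42", "resume_id": "2", "title": "Team Lead"}]
print(join_results(personal, resumes))
```

This keeps one resume row per hit, which addresses point 1 above, but point 2 (two result sets for mixed-field queries) still requires two queries.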



On Sun, 2008-01-20 at 22:52 -0800, Otis Gospodnetic wrote:
> You could have 2 separate indices tied with a common field (a la FK->PK).  
> Then you only need to change the item you are updating.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> - Original Message 
> From: Gavin <[EMAIL PROTECTED]>
> To: solr-user 
> Sent: Monday, January 21, 2008 12:09:23 AM
> Subject: Storing Related Data - At Different Times
> 
> Hi,
> In the web application we are developing we have two sets of
>  details.
> The personal details and the resume details. We allow 5 different
> resumes to be available for each user. But we want the personal details
> to remain same for each 5 resumes. Personal details are added at
> registration time. After than for each resume we want link personal
> details. This is a simple join in the db. But how do we achieve this in
> Solr. The problem is when personal details are changed we will have to
> update all 5 resumes. 
> 
> I read the thread "Some sort of join in SOLR?", but I am not sure it
> answers my problem. Would very much appreciate some help on this
> one.
> 
> Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen

Disclaimer: This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to which they
are addressed. The content and opinions contained in this email are not
necessarily those of hSenid Software International.
If you have received this email in error please contact the sender.



Re: Update the index

2008-01-21 Thread farhanali

Updating a document in the Solr index does not require any special tag; just
post the document with the same id and it will be updated.
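
As a sketch: an "update" is just re-posting an `<add>` with the same unique id, so the later document replaces the earlier one. The field names and id below are illustrative, not from the thread:

```python
# Build the update XML one would POST to Solr's /update handler.
# Re-adding a document with the same unique id replaces the old version.
from xml.sax.saxutils import escape

def add_doc_xml(fields):
    """Render a field dict as a Solr <add> message (illustrative)."""
    parts = ["<add><doc>"]
    for name, value in fields.items():
        parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
    parts.append("</doc></add>")
    return "".join(parts)

print(add_doc_xml({"id": "SOLR1000", "name": "Updated name"}))
```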

-- 
View this message in context: 
http://www.nabble.com/Update-the-index-tp14991443p14994095.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Term vector

2008-01-21 Thread Grant Ingersoll
Term vectors are, to some extent, the opposite of the inverted index.
They store term, position and offset (the latter two are optional) on
a per-document basis, such that you can say "give me the terms,
positions and offsets for document X". In terms of MLT, they are used
to figure out what the most "important" terms are in a document, such
that a new query can be formed to find other documents that are "more
like this" document. They are also useful for highlighting and other
non-search related activities like clustering, etc.


For more info, see my talk at ApacheCon: http://cnlp.org/presentations/slides/AdvancedLucene.pdf
Also, search for term vectors on the Lucene user mailing list (you
can do this via Nabble).
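
For reference, a field only carries term vectors if the schema asks for them. A hedged schema.xml sketch (the field name is illustrative):

```xml
<!-- schema.xml: store term vectors (positions and offsets are optional
     extras) so MoreLikeThis and highlighting can read per-document terms -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```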


-Grant

On Jan 20, 2008, at 10:04 PM, anuvenk wrote:



what are term vectors? How do they help with mlt?



--
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






Newbie with Java + typo

2008-01-21 Thread Daniel Andersson

Hi people

First the typo on http://wiki.apache.org/solr/mySolr:
"Production
Typically it's not recommended do have your front end"

it should probably be "..recommended To have.."



Second, I don't know much about Java, nor about Jetty/Resin/JBoss/ 
Tomcat. I went through the tutorial and was impressed with how easy  
it all seemed. Until the tutorial ended..


As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing  
that comes with the example (Jetty, or?)?


All the installation pages talk about this and that that doesn't make  
much sense to non-Java people like myself :-/


Would be MUCH appreciated with some after-tutorial page for us  
newbies. Right now I'm "just" looking for something that can be used  
on a production level machine. It doesn't have to be the fastest, as  
long as it's fairly easy to install.


Recommendations and pointers are very welcome :)



Thanks in advance!



/ d


Re: Newbie with Java + typo

2008-01-21 Thread Michael Kimsal
Daniel:

As a fellow 'non-java' person I feel your pain (well, felt it anyway).  A
lot depends on your load and the machine, but I successfully ran the stock
jetty system on a box last summer for work and didn't have performance
problems.  The bigger issue was from the other java people complaining that
I hadn't used the standard jboss setup they had already working.  However, I
didn't have access to that machine, nor would anyone give it to me at the
time, so it was a catch-22.  Performance-wise, the stock jetty will probably
do just fine for you.  Longer term, you may want to learn more about jboss
or tomcat or something else which can give you more application management
options and such.

But don't let those things stop you from running jetty/solr in production -
it's worked fine for me.


On Jan 21, 2008 10:48 AM, Daniel Andersson <[EMAIL PROTECTED]> wrote:

> Hi people
>
> First the typo on http://wiki.apache.org/solr/mySolr:
> "Production
> Typically it's not recommended do have your front end"
>
> it should probably be "..recommended To have.."
>
>
>
> Second, I don't know much about Java, nor about Jetty/Resin/JBoss/
> Tomcat. I went through the tutorial and was impressed with how easy
> it all seemed. Until the tutorial ended..
>
> As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing
> that comes with the example (Jetty, or?)?
>
> All the installation pages talk about this and that that doesn't make
> much sense to non-Java people like myself :-/
>
> Would be MUCH appreciated with some after-tutorial page for us
> newbies. Right now I'm "just" looking for something that can be used
> on a production level machine. It doesn't have to be the fastest, as
> long as it's fairly easy to install.
>
> Recommendations and pointers are very welcome :)
>
>
>
> Thanks in advance!
>
>
>
> / d
>



-- 
Michael Kimsal
http://webdevradio.com


Multisearching with Solr

2008-01-21 Thread David Pratt
Hi. I am checking out solr after having some experience with lucene 
using pyLucene. I am looking at the potential of solr to search over a 
large index divided over multiple servers to collect results, sort of 
what the parallel multisearcher does in Lucene on its own. From quick 
scan of archives it appears SOLR-303 may be the answer to this. Can this 
functionality be incorporated into 1.2 in a sandbox environment? Has 
anyone written a recipe that would be helpful in getting a sandbox up 
and running with SOLR-303?


It will most likely be a few months before needing to incorporate this 
type of functionality in production but hoping to begin experimenting as 
soon as possible. On that note, is it anticipated that 1.3 will be out 
in a few months? If so, will it include this functionality? Lastly, what 
sort of load balancing and replication potential is anticipated for the 
multisearching capability? Many thanks.


Regards,
David


Re: Newbie with Java + typo

2008-01-21 Thread Ryan McKinley

Daniel Andersson wrote:

Hi people

First the typo on http://wiki.apache.org/solr/mySolr:
"Production
Typically it's not recommended do have your front end"

it should probably be "..recommended To have.."



you can edit any of the wiki pages...  fixing typos is a great contribution!


As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing that 
comes with the example (Jetty, or?)?




Solr is servlet container agnostic -- it should run equally well on any 
of them.  Most people are constrained to use what they are already 
using.  If you really have no preference, perhaps stick with the jetty 
one included in the example.



Would be MUCH appreciated with some after-tutorial page for us newbies. 
Right now I'm "just" looking for something that can be used on a 
production level machine. It doesn't have to be the fastest, as long as 
it's fairly easy to install.


jetty is fine.  I think otis is using that in http://www.simpy.com/ -- I 
use resin.  Everyone you ask will give you a different answer ;) but the 
three containers that are most used by solr developers are jetty, resin 
and tomcat.


ryan


Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...

Here's a thread that discusses this...

http://mail.google.com/mail/?zx=wmtcqx3ngeup&shva=1#label/Solr/11799e3704804489

Best
Erick

On Jan 21, 2008 10:55 AM, David Pratt <[EMAIL PROTECTED]> wrote:

> Hi. I am checking out solr after having some experience with lucene
> using pyLucene. I am looking at the potential of solr to search over a
> large index divided over multiple servers to collect results, sort of
> what the parallel multisearcher does in Lucene on its own. From quick
> scan of archives it appears SOLR-303 may be the answer to this. Can this
> functionality be incorporated into 1.2 in a sandbox environment? Has
> anyone written a recipe that would be helpful in getting a sandbox up
> and running with SOLR-303?
>
> It will most likely be a few months before needing to incorporate this
> type of functionality in production but hoping to begin experimenting as
> soon as possible. On that note, is it anticipated that 1.3 will be out
> in a few months. If so, will it include this functionality? Lastly, what
> sort of load balancing and replication potential is anticipated for the
> multisearching capability? Many thanks.
>
> Regards,
> David
>


Re: Newbie with Java + typo

2008-01-21 Thread Daniel Andersson

On Jan 21, 2008, at 5:00 PM, Ryan McKinley wrote:


Daniel Andersson wrote:

Hi people
First the typo on http://wiki.apache.org/solr/mySolr:
"Production
Typically it's not recommended do have your front end"
it should probably be "..recommended To have.."


you can edit any of the wiki pages...  fixing typos is a great  
contribution!


Well, no. "Immutable Page", and as far as I know (English not being
my mother tongue), that means I can't edit the page.



Would be MUCH appreciated with some after-tutorial page for us  
newbies. Right now I'm "just" looking for something that can be  
used on a production level machine. It doesn't have to be the  
fastest, as long as it's fairly easy to install.


jetty is fine.  I think otis is using that in http://www.simpy.com/
-- I use resin.  Everyone you ask will give you a different
answer ;) but the three containers that are most used by solr
developers are jetty, resin and tomcat.


Yeah, that's what I kind of expected ;) Hence the question "for a  
total newbie who don't know.."


Will stick with the example jetty

/ d


Re: Newbie with Java + typo

2008-01-21 Thread Brian Whitman


On Jan 21, 2008, at 11:13 AM, Daniel Andersson wrote:
Well, no. "Immutable Page", and as far as I know (english not being  
my mother tongue), that means I can't edit the page



You need to create an account first.


Re: Newbie with Java + typo

2008-01-21 Thread Daniel Andersson

On Jan 21, 2008, at 4:53 PM, Michael Kimsal wrote:

As a fellow 'non-java' person I feel your pain (well, felt it anyway).  A
lot depends on your load and the machine, but I successfully ran the stock
jetty system on a box last summer for work and didn't have performance
problems.  Performance-wise, the stock jetty will probably do just fine
for you.  Longer term, you may want to learn more about jboss or tomcat
or something else which can give you more application management options
and such.

But don't let those things stop you from running jetty/solr in production -
it's worked fine for me.


Sounds good to me, thanks!

/ d


Re: Multisearching with Solr

2008-01-21 Thread David Pratt
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the 
link you provided. Is this message from the solr-user list? Many thanks.


Regards,
David

Erick Erickson wrote:

You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...

Here's a thread that discusses this...

http://mail.google.com/mail/?zx=wmtcqx3ngeup&shva=1#label/Solr/11799e3704804489

Best
Erick

On Jan 21, 2008 10:55 AM, David Pratt <[EMAIL PROTECTED]> wrote:


Hi. I am checking out solr after having some experience with lucene
using pyLucene. I am looking at the potential of solr to search over a
large index divided over multiple servers to collect results, sort of
what the parallel multisearcher does in Lucene on its own. From quick
scan of archives it appears SOLR-303 may be the answer to this. Can this
functionality be incorporated into 1.2 in a sandbox environment? Has
anyone written a recipe that would be helpful in getting a sandbox up
and running with SOLR-303?

It will most likely be a few months before needing to incorporate this
type of functionality in production but hoping to begin experimenting as
soon as possible. On that note, is it anticipated that 1.3 will be out
in a few months. If so, will it include this functionality? Lastly, what
sort of load balancing and replication potential is anticipated for the
multisearching capability? Many thanks.

Regards,
David





Re: spellcheckhandler

2008-01-21 Thread anuvenk

I did try with the latest nightly build. The problem still exists. 
I tested with the example data that comes with solr package.
1) with termSourceField set to 'word', which is a string fieldtype:
q=iped nano returns 'ipod nano', which is good.

2) with termSourceField set to 'spell' (which is the catchall field of
'spell' fieldtype according to the tutorial
http://wiki.apache.org/solr/SpellCheckerRequestHandler
that has my text fields copied into it at index time):
q=grapics returns 'graphics', which is good,
but q=grapics card returns nothing.

Not sure if I'm missing something. Please help!


Otis Gospodnetic wrote:
> 
> You don't need to wait for 1.3 to be released - you can simply use a
> recent nightly build.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> - Original Message 
> From: anuvenk <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, January 21, 2008 12:35:52 AM
> Subject: Re: spellcheckhandler
> 
> 
> I followed the steps outlined in 
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
> with regards to setting up of the schema with a new field 'spell' and
> copying other fields to this 'spell' field at index time.
> It works fine with single word queries but doesn't return anything for
> multi-word queries. I read previous posts where this has been
>  discussed. I
> read that some of the active members are in the process of releasing
>  patches
> that fixes this problem. I'm actually trying to implement this spell
>  check
> in the production set up. Is it absolutely not possible to get spell
>  check
> results back for multi-word queries, should i wait for 1.3 release. If
>  there
> is any other option please educate me. In case a patch was already
>  released,
> how to add it to the current 1.2 version that i'm using?
> 
> 
> 
> 
> 
> 




DisMax and Search Components

2008-01-21 Thread Doug Steigerwald
Is there any support for DisMax (or any search request handlers) in search components, or is that 
something that still needs to be done?  It seems like it isn't supported at the moment.


We want to be able to use a field collapsing component 
(https://issues.apache.org/jira/browse/SOLR-236), but still be able to use our DisMax handlers.


Right now it's one or the other, and we -need- both.

Thanks.
doug


Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
Yep, it's from the SOLR user list. Well, not really. I mistakenly copied
my gmail url when I was looking at the relevant post, which *of course*
you can't access.

http://svn.apache.org/repos/asf/lucene/solr/trunk
or
http://lucene.apache.org/solr/version_control.html


Sorry 'bout that.
Erick


On Jan 21, 2008 11:34 AM, David Pratt <[EMAIL PROTECTED]> wrote:

> Hi Erick. Thank you for your reply. Unfortunately, I cannot access the
> link you provided. It this message from the solr-user list? Many thanks.
>
> Regards,
> David
>
> Erick Erickson wrote:
> > You can always use the trunk build, but you'll have to check the
> > status of SOLR-303 to be sure it's in the trunk...
> >
> > Here's a thread that discusses this...
> >
> >
> http://mail.google.com/mail/?zx=wmtcqx3ngeup&shva=1#label/Solr/11799e3704804489
> >
> > Best
> > Erick
> >
> > On Jan 21, 2008 10:55 AM, David Pratt <[EMAIL PROTECTED]> wrote:
> >
> >> Hi. I am checking out solr after having some experience with lucene
> >> using pyLucene. I am looking at the potential of solr to search over a
> >> large index divided over multiple servers to collect results, sort of
> >> what the parallel multisearcher does in Lucene on its own. From quick
> >> scan of archives it appears SOLR-303 may be the answer to this. Can
> this
> >> functionality be incorporated into 1.2 in a sandbox environment? Has
> >> anyone written a recipe that would be helpful in getting a sandbox up
> >> and running with SOLR-303?
> >>
> >> It will most likely be a few months before needing to incorporate this
> >> type of functionality in production but hoping to begin experimenting
> as
> >> soon as possible. On that note, is it anticipated that 1.3 will be out
> >> in a few months. If so, will it include this functionality? Lastly,
> what
> >> sort of load balancing and replication potential is anticipated for the
> >> multisearching capability? Many thanks.
> >>
> >> Regards,
> >> David
> >>
> >
>


Re: solr 1.3

2008-01-21 Thread Mike Klaas


On 20-Jan-08, at 5:07 PM, anuvenk wrote:


when will this be released? where can i find the list of
improvements/enhancements in 1.3 if its been documented already?


see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup


We're not sure on a timeframe for release yet.

-Mike


Help - corrupted field in index

2008-01-21 Thread Lance Norskog
I have an 'integer' static field in my schema. Some of the index for this
field is corrupted. When I search on this field it works; when I use this
field to sort, I get this exception. Does this mean that there is a string
in one of my entries? It is possible the field was not required or defaulted
at some point and there are empty indexed fields for some records.
 
description: The server encountered an internal error (For input string: "")
that prevented it from fulfilling this request.

java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:468)
    at java.lang.Integer.parseInt(Integer.java:497)
    at org.apache.lucene.search.FieldCacheImpl$1.parseInt(FieldCacheImpl.java:136)
    at org.apache.lucene.search.FieldCacheImpl$3.createValue(FieldCacheImpl.java:171)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:154)
    at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:148)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorInt(FieldSortedHitQueue.java:204)
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:175)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:856)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:805)
    at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:698)
    at org.apache.solr.request.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:122)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
    at java.lang.Thread.run(Thread.java:595)




RE: solr 1.3

2008-01-21 Thread Lance Norskog
Would someone please consider marking a label on the Subversion repository
that says, "This is a clean version"? I only do HTTP requests and have no
custom software, so I don't care about internal interfaces changing.

Thanks,

Lance Norskog

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 1.3


On 20-Jan-08, at 5:07 PM, anuvenk wrote:
>
> when will this be released? where can i find the list of 
> improvements/enhancements in 1.3 if its been documented already?

see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup

We're not sure on a timeframe for release yet.

-Mike



Re: solr 1.3

2008-01-21 Thread Mike Klaas

Lance,

That is a murky area, legally.  Apache requires a considerable amount  
of auditing and process dedicated to anything called a "release".   
Nightly svn builds have a special exemption.  Creating an svn label  
"clean for general use" veers slightly in the direction of a  
"release".  If someone more knowledgeable than me says that this is  
okay, then I think that it is a reasonable thing to do before major  
changes in trunk (like adding a new lucene version, committing  
SearchComponents, etc).


Trunk, though, is quite stable (I'm using a version from ~1.5 months  
ago, though).


-Mike

On 21-Jan-08, at 12:10 PM, Lance Norskog wrote:

Would someone please consider marking a label on the Subversion repository
that says, "This is a clean version"? I only do HTTP requests and have no
custom software, so I don't care about internal interfaces changing.

Thanks,

Lance Norskog

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 1.3


On 20-Jan-08, at 5:07 PM, anuvenk wrote:


when will this be released? where can i find the list of
improvements/enhancements in 1.3 if its been documented already?


see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup

We're not sure on a timeframe for release yet.

-Mike





Solr Warm up on Tomcat

2008-01-21 Thread Jae Joo
Hi,

Does anyone have experience with, or a solution for, warming up a Solr
instance on Tomcat automatically?

I am using Apache 2 as a load balancer and 3 Tomcat machines running Solr.
If one of the Tomcats needs to be shut down and started up again, Solr
should be warmed up before serving requests.

Thanks,

Jae joo
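
One option may be Solr's built-in warm-up hooks in solrconfig.xml, which fire warm-up queries when a searcher is opened (e.g. at startup). A sketch; the queries themselves are placeholders you would replace with your own common queries and sorts:

```xml
<!-- solrconfig.xml: send warm-up queries when the first searcher is
     opened at startup, before external requests are served -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```

This does not make the load balancer wait by itself; the balancer still needs a health check that only passes once Solr responds.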


Re: Missing Content Stream

2008-01-21 Thread Ismail Siddiqui
I am trying solrj to index, using the following code:

 String url = "http://localhost:8080/solr";
  SolrServer server = new CommonsHttpSolrServer( url );

It's giving an "undefined symbol" error for the constructor taking a String.
Can someone tell me why this constructor is throwing an error when I can
clearly see it in the source file?
thanks



On 1/15/08, Ismail Siddiqui <[EMAIL PROTECTED]> wrote:
>
> thanks brian and otis,
> i will definitely try solrj.. but actually now the problem is resolved by
> setting content length in header i was missing it
> c.setRequestProperty("Content-Length", xmlText.length()+"");
> but now its not throwing any error but not indexing the document either..
> do I have to set autoCommit on in solrconfig.xml ???
>
>
> thanks
>
>
>  On 1/15/08, Brian Whitman <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Jan 15, 2008, at 1:50 PM, Ismail Siddiqui wrote:
> >
> > > Hi Everyone,
> > > I am new to solr. I am trying to index xml using http post as follows
> >
> >
> > Ismail, you seem to have a few spelling mistakes in your xml string.
> > "fiehld, nadme" etc. (a) try fixing them, (b) try solrj instead, I
> > agree w/ otis.
> >
> >
> >
> >
>


Is it possible to have "append" kind update operation?

2008-01-21 Thread zqzuk

Hi, is it possible to have "append"-like updates, where if two records with
the same id are posted to solr, the contents of the two are merged into a
single record with that id? I am asking because my program works in a
multi-threaded manner where several threads produce several parts of a final
record which is to be posted and indexed. Currently I have a preprocessing
program where the threads produce the parts, then a post-processing step
where the parts are merged into a single xml file and posted to solr. If
"append"-like updating were possible, each thread could post to solr
directly without writing temporary files.

For example, thread 1 produce an xml file like:
--
<add>
<doc>
<field name="id">198</field>
<field name="description">This is my short text. This is part 1 of the
record with id=198</field>
</doc>
</add>
--

thread 2 produces xml like

--
<add>
<doc>
<field name="id">198</field>
<field name="title">Title here. This is part 2 of record with id=198</field>
</doc>
</add>
--

Currently my program needs to produce the two separate files, then merge
them into

--
<add>
<doc>
<field name="id">198</field>
<field name="description">This is my short text. This is part 1 of the
record with id=198</field>
<field name="title">Title here. This is part 2 of record with id=198</field>
</doc>
</add>
--
 
Then post the final file. If I post the two separately, I get two separate
records with the same id=198, where one has only the field "description"
and the other has only the field "title".

Is it possible to append? Or is my "allowDup" setting incorrect?

Many thanks!
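
For what it's worth, since a later <add> with the same id replaces (rather than merges with) the earlier one, the parts have to be combined before posting. A minimal in-memory sketch of that merge, which could replace the temporary files (field names are from the example above; thread coordination is omitted):

```python
# Combine several partial field dicts for one id into one document
# before posting it to Solr. Sketch only: locking/synchronization
# between the producing threads is not shown.

def merge_parts(*parts):
    """Later parts win on conflicting field names."""
    doc = {}
    for part in parts:
        doc.update(part)
    return doc

part1 = {"id": "198", "description": "This is my short text."}
part2 = {"id": "198", "title": "Title here."}
print(merge_parts(part1, part2))
```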



illegal characters in xml file to be posted?

2008-01-21 Thread zqzuk

Hi, I am using the SimplePostTool to post files to solr. I have encountered
a problem with the content of xml files. I noticed that if my xml file has
fields whose values contain the character "&", "<" or ">", the post fails
and I get the exception:

"javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
Message: The entity name must immediately follow the '&' in the entity
reference"

It looks like these characters are illegal as embedded content in xml - but
I did extract them from xml in the first place. Is there a list of such
characters I need to deal with before I pass them to SimplePostTool?

Thanks!



RE: illegal characters in xml file to be posted?

2008-01-21 Thread Binkley, Peter
You should encode those three characters, and it doesn't hurt to encode
the apostrophe and double-quote characters too:
http://en.wikipedia.org/wiki/XML#Entity_references

Peter 
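
For example, if the posting script is in Python, the stdlib can do this escaping (the sample string is illustrative):

```python
# Escape the XML-reserved characters before embedding text in element
# content. escape() handles &, < and >; pass an entities map to also
# escape quotes, which matter inside attribute values.
from xml.sax.saxutils import escape

raw = 'AT&T <b>bold</b>'
print(escape(raw))                   # -> AT&amp;T &lt;b&gt;bold&lt;/b&gt;
print(escape(raw, {'"': "&quot;"}))  # additionally escape double quotes
```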

-Original Message-
From: zqzuk [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 2:24 PM
To: solr-user@lucene.apache.org
Subject: illegal characters in xml file to be posted?


Hi, I am using the SimplePostTool to post files to solr. I have
encoutered some problem with the content of xml files. I noticed that if
my xml file has fields whose values contain the character "&" or "<" or
">", the post fails and I get the exception :

"javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
Message: The entity name must immediately follow the '&' in the entity
reference"

Looks like these characters are illegal in xml as embedded contents -
but I did extract them from xml in the first place. Is there a list of
such characters I need to deal with before I pass that to
SimplePostTool?

Thanks!



Wildcards

2008-01-21 Thread dojolava
Hello,

I just started to use solr and I experience strange behaviour when it comes
to wildcards.

When I use the StandardRequestHandler queries like "eur?p?an" or "eur*an"
work fine.
But "garden?r" or "admini*tion" do not bring any results (without wildcards
there are some of course).

All affected fields are of type text, with the standard schema.xml from the
example.

Does anybody know how to fix this?


RE: illegal characters in xml file to be posted?

2008-01-21 Thread zqzuk

Thanks for the quick advice!


pbinkley wrote:
> 
> You should encode those three characters, and it doesn't hurt to encode
> the ampersand and double-quote characters too:
> http://en.wikipedia.org/wiki/XML#Entity_references
> 
> Peter 
> 
> -Original Message-
> From: zqzuk [mailto:[EMAIL PROTECTED] 
> Sent: Monday, January 21, 2008 2:24 PM
> To: solr-user@lucene.apache.org
> Subject: illegal characters in xml file to be posted?
> 
> 
> Hi, I am using the SimplePostTool to post files to solr. I have
> encoutered some problem with the content of xml files. I noticed that if
> my xml file has fields whose values contain the character "&" or "<" or
> ">", the post fails and I get the exception :
> 
> "javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
> Message: The entity name must immediately follow the '&' in the entity
> reference"
> 
> Looks like these characters are illegal in xml as embedded contents -
> but I did extract them from xml in the first place. Is there a list of
> such characters I need to deal with before I pass that to
> SimplePostTool?
> 
> Thanks!
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15007840.html
Sent from the Solr - User mailing list archive at Nabble.com.
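
For reference, a minimal escaper for the five predefined XML entities might look like the following (a sketch; the class and method names are made up):

```java
public class XmlEscape {
    // Replace the five characters XML reserves with their entity references.
    // The ampersand must be handled first, otherwise the entities produced
    // by the later replacements would themselves get re-escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("AT&T <b>\"bold\"</b>"));
        // AT&amp;T &lt;b&gt;&quot;bold&quot;&lt;/b&gt;
    }
}
```

Only & and < are strictly illegal in element content; escaping the other three as well is harmless and avoids trouble in attribute values.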



Re: Wildcards

2008-01-21 Thread Yonik Seeley
On Jan 21, 2008 5:18 PM, dojolava <[EMAIL PROTECTED]> wrote:
> I just started to use solr and I experience strange behaviour when it comes
> to wildcards.
>
> When I use the StandardRequestHandler queries like "eur?p?an" or "eur*an"
> work fine.
> But "garden?r" or "admini*tion" do not bring any results (without wildcards
> there are some of course).

It's probably stemming.  Something like "gardener" is probably stemmed
to "garden", so
a wildcard query that expects something longer than "garden" won't
find anything.

If you really need more accurate wildcard queries, do a copyField of
this field into another that does not have stemming (perhaps just
whitespace tokenizer and lowercase filter, and maybe stop filter).
Then use this alternate field for wildcard queries.

-Yonik
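
The setup described above might look roughly like this in schema.xml (the field and type names here are hypothetical; in a real schema the fieldType belongs under <types> and the fields under <fields>):

```xml
<!-- Unstemmed companion type: whitespace tokens, lowercased only. -->
<fieldType name="text_ws_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "body" stays stemmed; "body_wild" is the target for wildcard queries. -->
<field name="body_wild" type="text_ws_lc" indexed="true" stored="false"/>
<copyField source="body" dest="body_wild"/>
```

Queries like body_wild:garden?r then match the unstemmed tokens.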


Re: Wildcards

2008-01-21 Thread dojolava
Thanks a lot!

I checked it: when I search for "g?rden" it works; only "g?rdener" does
not...

I will try the copyField solution.

On Jan 21, 2008 11:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Jan 21, 2008 5:18 PM, dojolava <[EMAIL PROTECTED]> wrote:
> > I just started to use solr and I experience strange behaviour when it
> comes
> > to wildcards.
> >
> > When I use the StandardRequestHandler queries like "eur?p?an" or
> "eur*an"
> > work fine.
> > But "garden?r" or "admini*tion" do not bring any results (without
> wildcards
> > there are some of course).
>
> It's probably stemming.  Something like "gardener" is probably stemmed
> to "garden", so
> a wildcard query that expects something longer than "garden" won't
> find anything.
>
> If you really need more accurate wildcard queries, do a copyField of
> this field into another that does not have stemming (perhaps just
> whitespace tokenizer and lowercase filter, and maybe stop filter).
> Then use this alternate field for wildcard queries.
>
> -Yonik
>


Re: DisMax and Search Components

2008-01-21 Thread Charles Hornberger
On Jan 21, 2008 10:23 AM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:
> Is there any support for DisMax (or any search request handlers) in search 
> components, or is that
> something that still needs to be done?  It seems like it isn't supported at 
> the moment.

I was curious about this, too ... If it *is* something that needs to
be done, am happy to help w/ the coding. But I would need some
advice/guidance up front --  I'm new enough to Solr that the design
behind the SearchComponents refactoring is not immediately obvious to
me, either from the Jira comments or the code itself.

-Charlie


Re: DisMax and Search Components

2008-01-21 Thread Yonik Seeley
The QueryComponent supports both lucene queryparser syntax and dismax
query syntax.
The dismax request handler now simply sets defType (the default base
query type) to "dismax".

-Yonik

On Jan 21, 2008 1:23 PM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:
> Is there any support for DisMax (or any search request handlers) in search 
> components, or is that
> something that still needs to be done?  It seems like it isn't supported at 
> the moment.
>
> We want to be able to use a field collapsing component
> (https://issues.apache.org/jira/browse/SOLR-236), but still be able to use 
> our DisMax handlers.
>
> Right now it's one or the other, and we -need- both.
>
> Thanks.
> doug
>


Re: DisMax and Search Components

2008-01-21 Thread Doug Steigerwald

We've found a way to work around it.  In our search components, we're doing 
something like:

  defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

If you add &defType=dismax to the query string, it'll use the 
DisMaxQParserPlugin.

Unfortunately, I haven't been able to figure out an easy way to access the config for the different 
dismax handlers defined in the config. So on our service side (a Rails app) we're going to keep a 
configuration with all the params we need to pass (qf, pf, fl, etc.) and send them based on 
parameters coming into the service that tell us which dismax handler to use.


This may not be the best way to do it, but it will work fine for us until we can dedicate more time 
to it (we roll out Solr and our search service to QA next week).


Doug

Charles Hornberger wrote:

On Jan 21, 2008 10:23 AM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:

Is there any support for DisMax (or any search request handlers) in search 
components, or is that
something that still needs to be done?  It seems like it isn't supported at the 
moment.


I was curious about this, too ... If it *is* something that needs to
be done, am happy to help w/ the coding. But I would need some
advice/guidance up front --  I'm new enough to Solr that the design
behind the SearchComponents refactoring is not immediately obvious to
me, either from the Jira comments or the code itself.

-Charlie


Re: DisMax and Search Components

2008-01-21 Thread Yonik Seeley
On Jan 21, 2008 9:06 PM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:
> We've found a way to work around it.  In our search components, we're doing 
> something like:
>
>defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

Would it be easier to just add it as a default parameter in the request handler?

-Yonik
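
Concretely, that suggestion might look like this in solrconfig.xml (the handler name and qf boosts are placeholders):

```xml
<requestHandler name="/collapse" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Make dismax the default parser for this handler; a request
         can still override it with an explicit defType parameter. -->
    <str name="defType">dismax</str>
    <str name="qf">text^0.5 name^1.2</str>
  </lst>
</requestHandler>
```

That keeps the parser choice in configuration rather than hard-coded in a component.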


Re: DisMax and Search Components

2008-01-21 Thread Doug Steigerwald

We don't always want to use the dismax handler in our setup.

Doug

Yonik Seeley wrote:

On Jan 21, 2008 9:06 PM, Doug Steigerwald
<[EMAIL PROTECTED]> wrote:

We've found a way to work around it.  In our search components, we're doing 
something like:

   defType = defType == null ? DisMaxQParserPlugin.NAME : defType;


Would it be easier to just add it as a default parameter in the request handler?

-Yonik


RE: copyField limitation

2008-01-21 Thread Lance Norskog
Sorting on a non-integer has space problems. As I understand it, sorting
creates an array of integers the size of the number of records in the entire
index. Sorting on a non-integer type also creates a separate array of the
same size with the field data copied into it.  Thus sorting a non-integer
field can use several times as much memory.
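
A rough way to put numbers on that (a sketch with assumed sizes, not measured ones; Lucene's StringIndex actually holds one int ord per document plus the set of unique terms, so a full per-document copy of the field data is the worst case):

```java
public class SortMemEstimate {
    // Integer sort: one int per document in the index.
    static long intSortBytes(long numDocs) {
        return numDocs * 4;
    }

    // String sort: the ord array (one int per doc) plus the term data.
    // avgTermBytes is an assumed average per stored term, including object
    // overhead -- measure your own index before trusting this number.
    static long stringSortBytes(long numDocs, long uniqueTerms, long avgTermBytes) {
        return numDocs * 4 + uniqueTerms * avgTermBytes;
    }

    public static void main(String[] args) {
        long docs = 4000000L; // e.g. a 4M-doc index
        System.out.println(intSortBytes(docs));                    // 16000000
        System.out.println(stringSortBytes(docs, 1000000L, 48L));  // 64000000
    }
}
```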

We have a very large index with very small records. We are creating matching
integer fields for various fields just to have faster sorts, and we are
doing this after benchmarking our speed and space behaviours.

I filed a Jira issue:

https://issues.apache.org/jira/browse/SOLR-464

Thanks for your time,

Lance Norskog

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Thursday, January 17, 2008 2:53 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField limitation

On Jan 17, 2008 4:53 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Because sort works much faster on type 'integer', but range queries do 
> not work on type 'integer',

Really?  The sort speed should be identical.

-Yonik



OOE during indexing

2008-01-21 Thread Marcus Herou
Hi.

I get an OutOfMemoryError with Solr 1.3. Autowarming seems to be the villain, in
conjunction with the FieldCache somehow.
JVM args: -Xmx512m -Xms512m -Xss128k

Index size is ~4 Million docs, where I index text and store database primary
keys.
du /srv/solr/feedItem/data/index/
1.7G/srv/solr/feedItem/data/index/

To ensure that the docs I index do not swell too much, I only allow 5K per doc
over the wire, i.e. I substring(0, 5000) on the field "content".

I have removed "firstSearcher" and "newSearcher" since the queries I used
before killed performance on reindexing the whole index. I will add them
later again when I get into a delta update index state.

Stacktrace.
[06:25:53.122] [null] /update wt=xml&version=2.2 0 3165
[06:25:53.877] Error during auto-warming of key:
[EMAIL PROTECTED]:java.lang.OutOfMemoryError:
Java heap space
[06:25:53.877]  at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java
:104)
[06:25:53.877]  at org.apache.lucene.index.SegmentTermEnum.term(
SegmentTermEnum.java:159)
[06:25:53.877]  at org.apache.lucene.index.SegmentMergeInfo.next(
SegmentMergeInfo.java:66)
[06:25:53.877]  at org.apache.lucene.index.MultiTermEnum.next(
MultiReader.java:315)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$10.createValue(
FieldCacheImpl.java:388)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$Cache.get(
FieldCacheImpl.java:72)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl.getStringIndex(
FieldCacheImpl.java:350)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(
FieldSortedHitQueue.java:266)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(
FieldSortedHitQueue.java:182)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$Cache.get(
FieldCacheImpl.java:72)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(
FieldSortedHitQueue.java:155)
[06:25:53.877]  at org.apache.lucene.search.FieldSortedHitQueue.<init>(
FieldSortedHitQueue.java:56)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.getDocListNC(
SolrIndexSearcher.java:862)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.getDocListC(
SolrIndexSearcher.java:808)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.access$000(
SolrIndexSearcher.java:56)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem
(SolrIndexSearcher.java:254)
[06:25:53.877]  at org.apache.solr.search.LRUCache.warm(LRUCache.java:192)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.warm(
SolrIndexSearcher.java:1393)
[06:25:53.877]  at org.apache.solr.core.SolrCore$2.call(SolrCore.java:702)
[06:25:53.877]  at java.util.concurrent.FutureTask$Sync.innerRun(
FutureTask.java:269)
[06:25:53.877]  at java.util.concurrent.FutureTask.run(FutureTask.java:123)
[06:25:53.877]  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
ThreadPoolExecutor.java:650)
[06:25:53.877]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:675)
[06:25:53.877]  at java.lang.Thread.run(Thread.java:595)

Help anyone?

Attaching schema.xml and solrconfig.xml

Kindly

//Marcus Herou
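
For what it's worth, cache autowarming can be switched off entirely by setting autowarmCount to 0 in solrconfig.xml (a sketch based on the stock example config; the sizes are placeholders):

```xml
<!-- autowarmCount="0" disables warming for the cache, so no cached
     queries are replayed against a new searcher after a commit. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```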






  



















  

  



  




  
  




  


 

 


 
 	
	
	
	
	
	
	
	
	 
	
	



[The attached schema.xml and solrconfig.xml follow here, but the archive stripped their XML markup; only scattered element values survive, e.g. uniqueKey "uid", defaultSearchField "description", mergeFactor/maxBufferedDocs and cache settings (documentCache size 1024), and the stock dismax handler defaults such as qf "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4" and mm "2<-1 5<-2 6<90%".]