Re: search suggest
On Thu, Jul 30, 2009 at 4:52 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> I created an issue and have added some notes
> https://issues.apache.org/jira/browse/SOLR-1316
> Also see https://issues.apache.org/jira/browse/SOLR-706
-- Regards, Shalin Shekhar Mangar.
solr-user@lucene.apache.org
Good morning Solr :-) It's morning in Germany! I have a problem with the indexing... I often get an error. I think it is because the XML contains this character: "&". I need the character; what should I do?

SimplePostTool: FATAL: Solr returned an error: comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__comctcwstxexcWstxLazyException_comctcwstxexcWstxUnexpectedCharException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__at_comctcwstxexcWstxLazyExceptionthrowLazilyWstxLazyExceptionjava45__at_comctcwstxsrStreamScannerthrowLazyErrorStreamScannerjava729__at_comctcwstxsrBasicStreamReadersafeFinishTokenBasicStreamReaderjava3659__at_comctcwstxsrBasicStreamReadergetTextBasicStreamReaderjava809__at_orgapachesolrhandlerXMLLoaderreadDocXMLLoaderjava278__at_orgapachesolrhandlerXMLLoaderprocessUpdateXMLLoaderjava139__at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava69__at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1299__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava338__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava241__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHt _
Re: solr indexing on same set of records with different value of unique field, not working fine.
FYI: attached is the schema.xml file. And the add-doc XML snippet is: 501 ESQ.VISION.A72 201 CpuLoopEnd Process=$Z4B1 CpuPin=0,992 Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM CpuBusy=0 MemPage=24 User=50,10 \VEGAS.$QQDS PLGOVNPM 2008-10-07T03:00:30.0Z 2008-10-07T10:02:27.95Z 1247905648000. I just load the current timestamp's long value into the add-doc XML to load into Solr.

Chris Hostetter wrote:
I'm not really understanding how you could get the situation you describe ... which suggests that one (or both) of us don't understand exactly what happened. If you can post the actual schema.xml file you used and an example of the input you indexed, perhaps we can spot the discrepancy.

FWIW: using a timestamp as a uniqueKey doesn't make much sense ...
1) if you have heavy parallelization, two docs indexed at the exact same time might overwrite each other.
2) you have no way of ever replacing an existing doc (unless you roll the clock back), in which case there's no advantage to using a uniqueKey -- so you might as well leave it out of your schema (which makes indexing slightly faster).

: I need to index around 10 million records with Solr.
: I have nearly 2 lakh (200,000) records, so I made a program to loop over them until 10 million.
: I specified 20 fields in the schema.xml file. The unique field I set was the currentTimeStamp field.
: So, when I run the loader program (which loads XML data into Solr) it creates a currentTimestamp value... and loads it into Solr.
:
: For this situation:
: I stopped the loader program after 100 records were indexed into Solr.
: Then I ran the loader program again for the SAME 100 records,
: and Solr reports 100 results, rather than 200.
:
: Because I set the currentTimeStamp field as the unique field, I expected the result
: to be 200 after running the same 100 records again...
:
: Any suggestions please...

-Hoss
Re: solr indexing on same set of records with different value of unique field, not working fine.
Sorry, the schema.xml file is here in this mail... noor wrote: [previous message in this thread quoted in full, snipped]
Skipping fields from XML
Hi, I want to index a perfectly good Solr XML file into a Solr/Lucene instance. The problem is that the XML has many fields that I don't want to be indexed. I tried to index the file, but Solr gives me an error because the XML contains fields that I have not declared in my schema.xml. How can I tell Solr to skip unwanted fields and only index the fields that I have declared in my schema.xml? I know it must be something with a catch-all setting and/or copyFields, but I cannot get the configuration right. To be clear: I want Solr to index/store only a few fields from the XML file and skip all the other fields. An answer or a link to a good reference would help.
Re: Skipping fields from XML
I don't think there is a way to do that. On Thu, Jul 30, 2009 at 1:39 PM, Edwin Stauthamer wrote: > Hi, > > I want to index a perfectly good solr XML-file into an Solr/Lucene instance. > The problem is that the XML has many fields that I don't want to be indexed. > > I tried to index the file but Solr gives me an error because the XML > contains fields that I have not declared in my schema.xml > > How can I tell Solr to skip unwanted fields and only index the fields that I > have declared in my schema.xml? > > I know it must be something with a catchall setting and / or copyFields but > I can not get the configuration right. To be clear. I want Solr to index / > store only a few fields from the XML-file to be indexed and skip all the > other fields. > > An answer or a link to a good reference would help. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Skipping fields from XML
: I want Solr to index / store only a few fields from the XML-file to be : indexed and skip all the other fields. I think Dynamic fields [1] can help you. [1] http://wiki.apache.org/solr/SchemaXml#head-82dba16404c8e3318021320638b669b3a6d780d0
solr-user@lucene.apache.org
Any chance of getting that stack trace as more than one line? :) Also, where are you posting your documents from? (e.g. Java, PHP, command line etc.) It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.

On 30 Jul 2009, at 09:44, Jörg Agatz wrote:
> Good morning Solr :-) It's morning in Germany!
> I have a problem with the indexing...
> I often get an error.
> I think it is because the XML contains this character: "&"
> I need the character; what should I do?
> SimplePostTool: FATAL: Solr returned an error:
> comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__... [stack trace snipped]

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
Re: Skipping fields from XML
Edwin Stauthamer wrote:
> Hi, I want to index a perfectly good Solr XML file into a Solr/Lucene instance. The problem is that the XML has many fields that I don't want to be indexed. I tried to index the file but Solr gives me an error because the XML contains fields that I have not declared in my schema.xml. How can I tell Solr to skip unwanted fields and only index the fields that I have declared in my schema.xml?

How about using the "ignored" type for the fields which you don't want to be indexed:

    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />

Koji

> I know it must be something with a catch-all setting and/or copyFields but I cannot get the configuration right. To be clear: I want Solr to index/store only a few fields from the XML file and skip all the other fields. An answer or a link to a good reference would help.
solr-user@lucene.apache.org
Indeed, or enclose the text in CDATA tags, which should work as well.

On Thu, 2009-07-30 at 09:52 +0100, Toby Cole wrote:
> Any chance of getting that stack trace as more than one line? :)
> Also, where are you posting your documents from? (e.g. Java, PHP, command line etc.)
>
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML.
> These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
>
> On 30 Jul 2009, at 09:44, Jörg Agatz wrote:
> > Good morning Solr :-) It's morning in Germany!
> > I have a problem with the indexing...
> > I often get an error.
> > I think it is because the XML contains this character: "&"
> > I need the character; what should I do?
> > SimplePostTool: FATAL: Solr returned an error:
> > comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__... [stack trace snipped]
>
> --
> Toby Cole
> Software Engineer, Semantico Limited
Re: Skipping fields from XML
> How can I tell Solr to skip unwanted fields and only index > the fields that I have declared in my schema.xml? More precisely: (taken from schema.xml)
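For reference, the relevant lines from the stock example schema.xml look like this (the "ignored" field type plus a catch-all dynamic field that silently swallows anything not otherwise declared):

    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
    <dynamicField name="*" type="ignored" />

With these in place, any field in a posted document that doesn't match a declared field or another dynamicField pattern is neither indexed nor stored, instead of causing an "unknown field" error.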
Re: Skipping fields from XML
Perfect! That resolved my issue. BTW, this was my first posting on this list. I must say that the responses were quick and to the point! Good community help!

On Thu, Jul 30, 2009 at 10:58 AM, AHMET ARSLAN wrote:
> > How can I tell Solr to skip unwanted fields and only index
> > the fields that I have declared in my schema.xml?
>
> More precisely: (taken from schema.xml)

--
Met vriendelijke groet / Kind regards,
Edwin Stauthamer
Adviser Search & Collaboration
Emid Consult
T: +31 (0) 70 8870700 M: +31 (0) 6 4555 4994
E: estautha...@emidconsult.com I: http://www.emidconsult.com
solr-user@lucene.apache.org
Also, I use the command-line tool: "java -jar post.jar xyz.xml". I don't know what you mean with:

> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.

I don't understand this. Must I change every & to &amp;?
solr-user@lucene.apache.org
On Jul 30, 2009, at 6:17 AM, Jörg Agatz wrote:
> Also, I use the command-line tool "java -jar post.jar xyz.xml". I don't know what you mean with:
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
> I don't understand this. Must I change every & to &amp;?

Yes, if you need an ampersand in an XML element, it must be escaped: Harold &amp; Maude

Erik
solr-user@lucene.apache.org
On 30 Jul 2009, at 11:17, Jörg Agatz wrote:
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
> I don't understand this. Must I change every & to &amp;?

Yes, '&' characters aren't allowed in XML unless they are either in a CDATA section or part of an 'entity'. A good place to read up on this is: http://www.xml.com/pub/a/2001/01/31/qanda.html In short, replace all your & with &amp;

--
Toby Cole
Software Engineer, Semantico Limited
Re: Multi select faceting
On Jul 29, 2009, at 2:38 PM, Mike wrote: Hi, We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a requirement to implement multiple-select faceting where the facet cells show up as checkboxes and despite checked options, all of the options continue to persist with counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/ It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c ), but I was wondering if there was another way to accomplish this in 1.3? The only way I can think to do this is to backport the patch to 1.3. FWIW, we are running 1.4-dev at /search, which is where that functionality comes from. -Grant
Re: Question about formatting the results returned from Solr
Apparently all the data is going to one field, 'author'. Instead, the values should be sent to separate fields:

author_fname
author_lname
author_email

so you would get details like:

<author_fname>John</author_fname>
<author_lname>Doe</author_lname>
<author_email>j...@doe.com</author_email>

On Wed, Jul 29, 2009 at 7:39 PM, ahammad wrote:
> Hi all,
> Not sure how good my title is, but here is a (hopefully) better explanation of what I mean.
> I am indexing a set of articles from a DB. Each article has an author. The author is saved in the DB as an author ID, which is a number.
> There is another table in the DB with more relevant information about the author. Basically it has columns like: id, firstname, lastname, email, userid
> I set up the DIH so that it returns the userid, and it works fine:
> jdoe
> msmith
> Would it be possible to return all of the information about the author (first name, ...) as a subset of the results above?
> Here is what I mean:
> John
> Doe
> j...@doe.com
> ...
> Something similar to that at least...
> Not sure how descriptive I was, but any pointers would be highly appreciated.
> Cheers
> --
> View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
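A minimal DIH sketch of the layout Noble is describing; the table, column, and entity names here are illustrative guesses, not taken from the original config:

    <entity name="article" query="select id, title, author_id from articles">
      <entity name="author"
              query="select firstname, lastname, email from authors where id='${article.author_id}'">
        <field column="firstname" name="author_fname" />
        <field column="lastname" name="author_lname" />
        <field column="email" name="author_email" />
      </entity>
    </entity>

Each author column lands in its own Solr field, so the response carries the resolved name and email rather than just the user id.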
Range Query question
Hi, I have a set of XML data that holds minimum and maximum values, and I need to be able to do specific range queries against them. (Note that this is a contrived example, and that in reality the garage would probably hold all the individual prices of all its cars, but this is analogous to the problem we have, which is couched in terms that would obscure the problem.) For example, the following XML fragment is indexed so that each element becomes a Solr document: Ford Ka garage1 2000 4000 garage2 8000 1

I want to be able to do a range query where search min value = 2500 and search max value = 3500. This should return garage1 as potentially having cars in my price range, as the range of prices for the garage contains the range I have input. It's also worth noting that we can't simply look for min prices that fall inside our range or max prices that fall inside our range, as in the case outlined above none of the individual values fall inside our range, but there is overlap.

The problem is that the indexed form of this XML is flattened, so the entity has 2 garage names, 2 min values and 2 max values, but the grouping between the garage name and its min and max values is lost. The danger is that we end up doing a comparison of the min-of-the-mins and the max-of-the-maxes, which tells us that a car is available in the price range, which may not be true if garage1 has all cars below our search range and garage2 has all cars above our search range, e.g. if our search range is 5000-6000 then we should get no match.

We wanted to include the garage name as an attribute of the min/max values to maintain this link, but couldn't find a way to do this. Finally, it would be extremely difficult for us to modify the XML presented to our system, hence our approach to date. Has anyone had a similar problem, and if so how did you overcome it? Thanks for taking the time to look.

- Matt Beaumont mibe...@yahoo.co.uk

--
View this message in context: http://www.nabble.com/Range-Query-question-tp24737656p24737656.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about formatting the results returned from Solr
Yes, I get that. The problem arises when you have multiple authors. How can I know which first name goes with which user id etc...

Cheers

Noble Paul നോബിള് नोब्ळ्-2 wrote:
> Apparently all the data is going to one field, 'author'. Instead, the values should be sent to separate fields:
> author_fname
> author_lname
> author_email
> so you would get details like:
> <author_fname>John</author_fname>
> <author_lname>Doe</author_lname>
> <author_email>j...@doe.com</author_email>
> [original question quoted earlier in the thread, snipped]
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com

--
View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24737962.html
Sent from the Solr - User mailing list archive at Nabble.com.
How can i get lucene index format version information?
I want to get the Lucene index format version from the Solr web app (as Luke does). I've tried looking for the info in the Luke handler response, but I haven't found it. -- Lici
SOLR deleted almost everything?
Hello everyone :) I was trying to purge out older things... in this case, documents of a certain type that had an ID lower than 200. So I posted this delete query:

id:[0 TO 200] AND type:I

Now I have only 49 type "I" items total in my index (shown by /solr/select?q=type:I), when there should still be IDs up to about 2165000, which is far, far more than 49. I'm curious why this would be, as I'm trying to build in automatic purging of older things, but this obviously didn't work the way I thought. I'm on version 1.1, and my schema information for the fields is below:

Thanks for any insight into why I broke it! -Reece
Re: Multi select faceting
Grant, thanks for the reply. We tested our requirement against 1.4-dev and were able to achieve what we wanted. The site we're rebuilding has low traffic, so we're going to run with 1.4-dev. Cheers. - Original Message - From: "Grant Ingersoll" To: Sent: Thursday, July 30, 2009 8:05 AM Subject: Re: Multi select faceting On Jul 29, 2009, at 2:38 PM, Mike wrote: Hi, We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a requirement to implement multiple-select faceting where the facet cells show up as checkboxes and despite checked options, all of the options continue to persist with counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/ It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c ), but I was wondering if there was another way to accomplish this in 1.3? The only way I can think to do this is to backport the patch to 1.3. FWIW, we are running 1.4-dev at /search, which is where that functionality comes from. -Grant
RE: Boosting ('bq') on multi-valued fields
> Hey Ken,
> Thanks for your reply.
> When I wrote '5|6' I meant that this is a multiValued field with two values,
> '5' and '6', rather than the literal string '5|6' (and any Tokenizer). Does
> your reply still hold? That is, are multiValued fields dependent on the
> notion of tokenization to such a degree that I can't use the str type with
> them meaningfully? If so, it seems weird to me that I should be able to
> define a str multiValued field to begin with..

I'm pretty sure you can use multiValued string fields in the way you are describing. If you just do a query without the boost, do documents with multiple values come back? That would at least tell you whether the problem was matching on the term itself or something to do with your use of boosts.

-Ken
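For reference, the two requests Ken suggests comparing might look like this (the field name "myfield" is illustrative; bq is a DisMax parameter):

    http://localhost:8983/solr/select?q=myfield:5
    http://localhost:8983/solr/select?defType=dismax&q=foo&bq=myfield:5^2.0

If the first query matches documents whose multiValued str field contains '5' among other values, the field itself behaves as expected and the investigation can focus on the boost query.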
RE: Range Query question
> The problem is that the indexed form of this XML is flattened so the
> entity has 2 garage names, 2 min values and 2 max values, but the grouping
> between the garage name and its min and max values is lost. The danger is
> that we end up doing a comparison of the min-of-the-mins and the
> max-of-the-maxes, which tells us that a car is available in the price range,
> which may not be true if garage1 has all cars below our search range and
> garage2 has all cars above our search range, e.g. if our search range is
> 5000-6000 then we should get no match.

You could index each garage-car pairing as a separate document, embedding all the necessary information you need for searching; e.g. a document holding Ford, Ka, garage1, 2000, 4000 (see the sketch below).
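A sketch of what such a per-garage document might look like (the field names are illustrative):

    <doc>
      <field name="make">Ford</field>
      <field name="model">Ka</field>
      <field name="garage_name">garage1</field>
      <field name="min_price">2000</field>
      <field name="max_price">4000</field>
    </doc>

With one document per garage, the overlap test becomes a per-document query, e.g. min_price:[* TO 3500] AND max_price:[2500 TO *] (assuming sortable numeric field types so the ranges compare numerically). A search range of 5000-6000 then correctly matches neither the 2000-4000 garage nor one whose cheapest car costs 8000.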
Re: SOLR deleted almost everything?
On Jul 30, 2009, at 9:44 AM, Reece wrote: Hello everyone :) I was trying to purge out older things.. in this case of a certain type of document that had an ID lower than 200. So I posted this: id:[0 TO 200] AND type:I Now, I have only 49 type "I" items total in my index (shown by /solr/select?q=type:I), when there should be numbers still up to about 2165000 which is far far more than 49 I'm curious why this would be, as I'm trying to build it automatic purging of older things, but this obviously didn't work the way I thought. I'm on version 1.1, and my schema information for the fields is below: Use one of the sortable numeric types for your id field if you need to perform range queries on them. A string is sorted lexicographically: 1, 10, 11, 2, 3, 4, 5... and thus a range query won't work the way you might expect. Erik
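For illustration, a sketch of the schema change Erik suggests, using the sortable int type that ships in the example schema (the field name mirrors the thread; the attributes are illustrative):

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <field name="id" type="sint" indexed="true" stored="true" required="true"/>

With a sortable int type, id:[0 TO 200] matches numerically, avoiding surprises like "10" sorting before "2" under string comparison.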
NullPointerException in DataImportHandler
First of all, apologies if you get this twice. I posted it by email an hour ago but it hasn't appeared in any of the archives, so I'm worried it's got junked somewhere.

I'm trying to use a DataImportHandler to merge some data from a database with some other fields from a collection of XML files, rather like the example in the Architecture section here: http://wiki.apache.org/solr/DataImportHandler ... so a given document is built from some fields from the database and some from the XML. My dataconfig.xml looks like this:

This works if I comment out the inner entity, but when I uncomment it, I get this error:

30-Jul-2009 14:32:50 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1s32D00}, title=title(1.0)={PDB code 1s32, chain D, domain 00}, keywords=keywords(1.0)={some keywords go here}, pdb_code=pdb_code(1.0)={1s32}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1s32 1s32D}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:64)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:75)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
... 9 more

I have checked that the file /cath/people/cathdata/v3_3_0/pdb-XML-noatom/1s32-noatom.xml is readable, so maybe the full path to the file isn't being constructed properly or something? I also tried with the full path template for the file in the entity url attribute, instead of using a basePath in the dataSource, but I get exactly the same exception. This is with the 2009-07-30 nightly build. See attached for schema. http://www.nabble.com/file/p24739580/schema.xml schema.xml

Any ideas? Thanks in advance! Andrew.

--
:: http://biotext.org.uk/ ::

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24739580.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about formatting the results returned from Solr
> instead they should be sent to separate fields
> author_fname
> author_lname
> author_email

Or, a dynamic field called author_* (I am assuming all of the author fields to be of the same type). And if you use SolrJ, you can transform this info into a data structure like Map<String, String> authorInfo, where the keys would be "firstName", "lastName", "email" etc. Look for more here - http://issues.apache.org/jira/browse/SOLR-1129

Cheers
Avlesh

2009/7/30 Noble Paul നോബിള് नोब्ळ्
> [previous message in this thread quoted in full, snipped]
Posting data in JSON
Hi All, I'm wondering if it's possible to post documents to Solr in JSON format. JSON is much faster than XML for getting query results, so I think it'd be great to be able to post data in JSON to speed up indexing and lower the network load. All the best! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: NullPointerException in DataImportHandler
Hi Andrew,

your inner entity uses an XML type datasource. The default entity processor is the SQL one, however. For your inner entity, you have to specify the correct entity processor explicitly. You do that by adding the attribute "processor", and the value is the classname of the processor you want to use, e.g. processor="XPathEntityProcessor" (see the Wikipedia example on the DataImportHandler wiki page).

Cheers,
Chantal

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]
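A minimal sketch of an inner entity with the processor attribute set, as Chantal describes; the dataSource name, forEach, and XPaths are illustrative, not Andrew's actual config (the url pattern is quoted later in the thread):

    <entity name="xmlfile"
            processor="XPathEntityProcessor"
            dataSource="myXmlFiles"
            url="${domain.pdb_code}-noatom.xml"
            forEach="/datablock">
      <field column="title" xpath="/datablock/structCategory/struct/title" />
    </entity>

Without the processor attribute, DIH falls back to SqlEntityProcessor, which treats the entity as a SQL one and here ended in the NullPointerException above.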
Posting Word documents
I am trying to post a Word document using the Solr post.jar file. When I attempt this, using a command-line interface, I get a fatal error. I have looked at the following resources: the Solr tutorial, docs, FAQ, and ExtractingRequestHandler. As near as I can tell, I have all the files in the proper place. Following is a portion of the error displayed in the cmd window:

C:\Solr\Apache~1\example\exampledocs>java -jar post.jar *.doc
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file BadNews.doc
SimplePostTool: FATAL: Solr returned an error: Unexpected_character__code_65533__0xfffd_in_prolog_expected___at_rowcol_unknownsoruce_11_javaioIOException_Unexpected_charater__code65533__0xfffd_in_prolog_expected___at_rowcol_unknownsource_11___at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava73___at_orgapahcesolrhandlerContentStreamHandlerBasehandlerRequrestBodyContentStreamHandlerBasejava54___...

There is more, and if needed I will be happy to post all of it. Here is the information that posted into the log file:

127.0.0.1 - - [30/07/2009:15:20:09 +] "POST /solr/update HTTP/1.1" 500 4011

Kevin Miller
Web Services
Re: Posting Word documents
Look again at ExtractingRequestHandler. I haven't looked at what post.jar does internally, but it probably doesn't work with ExtractingRequestHandler unless you can send other params as well. I would use curl, as the examples in the docs for ExtractingRequestHandler do. Or figure out if post.jar will work for you and use it correctly. What handler is '/update' mapped to? If it's not mapped to ExtractingRequestHandler then you have no hope of this working in any case. It looks to me like it's trying to process the file as Solr XML - which means you are not submitting it to ExtractingRequestHandler.

--
- Mark
http://www.lucidimagination.com

Kevin Miller wrote: [previous message in this thread quoted in full, snipped]
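For reference, a curl invocation in the style of the ExtractingRequestHandler docs; this assumes the handler is mapped to /update/extract in solrconfig.xml, and the literal.id value is made up:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
         -F "myfile=@BadNews.doc"

Tika parses the Word document on the server side, and literal.id supplies the uniqueKey value that a binary file can't carry itself.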
Re: How can i get lucene index format version information?
On Jul 30, 2009, at 9:19 AM, Licinio Fernández Maurelo wrote: i want to get the lucene index format version from solr web app (as luke do), i've tried looking for the info at luke handler response, but i havn't found this info the Luke request handler writes it out: indexInfo.add("version", reader.getVersion()); It appears in the index section near the top of the response. Erik
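So hitting the Luke handler at its default mapping and looking near the top of the response should show it, e.g.:

    http://localhost:8983/solr/admin/luke?numTerms=0

    <lst name="index">
      ...
      <long name="version">...</long>
      ...
    </lst>

Note this is the index version (a counter that changes as the index changes), which is related to but not the same thing as the on-disk index format version that Luke's GUI also displays.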
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote:
> Hi Andrew,
> your inner entity uses an XML type datasource. The default entity processor is the SQL one, however.
> For your inner entity, you have to specify the correct entity processor explicitly. You do that by adding the attribute "processor", and the value is the classname of the processor you want to use.
> e.g. processor="XPathEntityProcessor"

Thanks -- I was also missing a forEach expression -- in my case, just "/" since each XML file contains the information for no more than one document. However, I'm now getting a different exception:

30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda, chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception while reading xpaths for fields Processing Document # 1
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.LinkedList.entry(LinkedList.java:365)
at java.util.LinkedList.get(LinkedList.java:315)
at org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
at org.apache.solr.handler.dataimport.XPathRecordReader.<init>(XPathRecordReader.java:50)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
... 9 more

My data config now looks like this:

Thanks in advance, again :-) Andrew.

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741292.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: NullPointerException in DataImportHandler
On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"

The XPathEntityProcessor doesn't support that fancy of an xpath - it supports only a limited subset. Try /structCategory/struct/title perhaps?

Erik
Re: NullPointerException in DataImportHandler
Hi Andrew,

my experience with XPathEntityProcessor is non-existent. ;-) Just after a quick look at the method that throws the exception:

private void addField0(String xpath, String name, boolean multiValued, boolean isRecord) {
  List<String> paths = new LinkedList<String>(Arrays.asList(xpath.split("/")));
  if ("".equals(paths.get(0).trim()))
    paths.remove(0);
  rootNode.build(paths, name, multiValued, isRecord);
}

and your forEach attribute value in combination with the xpath:

> forEach="/">
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />

I would guess that the double slash at the beginning is not working with your forEach regex. I don't know whether this is something the processor should expect and handle correctly or whether you have to take care of it in your configuration.

Cheers,
Chantal

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]
Re: SOLR deleted almost everything?
Right, I figured that that's how it should have sorted... which is why I did a range from 0 to 200. That should have worked for my example, but it removed things over 200, which even with lexical sorting seems invalid. What's left are things like: 998914. Now, obviously that is expected, as it starts with a number over 2, but why would things like 2165979 be deleted when that is lexically after 200?

Unless... oh man, I hope I didn't put an extra zero in there by accident!!

** checking .bash_history...

Oh crap... I ran it between 0 and 7 at some point. Sigh. Thanks for the help!

-Reece

On Thu, Jul 30, 2009 at 10:08 AM, Erik Hatcher wrote: [previous messages in this thread quoted in full, snipped]
Re: NullPointerException in DataImportHandler
Erik Hatcher wrote:
> The XPathEntityProcessor doesn't support that fancy of an xpath - it supports only a limited subset. Try /structCategory/struct/title perhaps?

Sadly not... I tried with: (full path from root) and the same IndexOutOfBounds error each time. Doesn't it use javax.xml then? I was using the complex local-name expressions to make it namespace-agnostic -- is it agnostic anyway?

Thanks, Andrew.

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741696.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote: > > > my experience with XPathEntityProcessor is non-existent. ;-) > > Don't worry -- your hints put me on the right track :-) I got it working with: Now, to get it to ignore missing files without an error... Hmm... Cheers, Andrew. -- View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741772.html Sent from the Solr - User mailing list archive at Nabble.com.
Minimum facet length?
Hi, I am exploring the faceted search results of Solr. My query is like this:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1, a, of, 2, 3, 4: one-letter/number occurrences in my documents. It's not really useful, since all the documents have some free-floating single-digit numbers. Is there a way to restrict the word frequency results for a facet based on the length, so I can set it to > 3, or is there a better way?

thanks, Darren
Re: NullPointerException in DataImportHandler
On Jul 30, 2009, at 12:19 PM, Andrew Clegg wrote: Don't worry -- your hints put me on the right track :-) I got it working with: Now, to get it to ignore missing files without an error... Hmm... onError="skip" or abort, or continue Erik
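That attribute goes on the entity element itself; mirroring the entity discussed earlier in the thread (the field column and xpath are illustrative), a sketch might be:

    <entity name="xmlfile" processor="XPathEntityProcessor"
            url="${domain.pdb_code}-noatom.xml" forEach="/"
            onError="skip">
      <field column="title" xpath="/structCategory/struct/title" />
    </entity>

Per the DIH docs, "skip" drops the document being built when the error occurs, "continue" proceeds as if the error had not happened, and "abort" (the default) stops the import.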
Re: NullPointerException in DataImportHandler
It's very easy to write your own entity processor. At least, that is my experience with extending the SqlEntityProcessor to my needs. So, maybe you'd be better off subclassing the XPath processor and handling the xpath in a way that lets you keep your configuration straightforward.

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]

--
Chantal Ackermann
Consultant, b.telligent GmbH & Co. KG
Re: update some index documents after indexing process is done with DIH
Hoss, I see what you mean. I am trying to implement a custom UpdateProcessor following: http://wiki.apache.org/solr/UpdateRequestProcessor

What is confusing me now is that I have to implement my logic in processCommit, as you said:

>> you'll still need the "double commit" (once so you can see the
>> main changes, and once so the rest of the world can see your
>> modifications) but you can execute them both directly in your
>> processCommit(CommitUpdateCommand)

I have noticed that in processAdd you have access to the concrete SolrInputDocument you are going to add:

SolrInputDocument doc = cmd.getSolrInputDocument();

But in processCommit, while I can get the IndexReader via the core, I still don't know how to get the IndexWriter and SolrInputDocuments in there. My idea is to do something like:

@Override
public void processCommit(CommitUpdateCommand cmd) throws IOException {
  // first commit, so I can see the modifications
  // open and iterate over the reader and create a list of SolrDocuments
  // close the reader
  // open a writer and update the docs in the list
  // close the writer; second commit shows my changes to the world
  if (next != null) next.processCommit(cmd);
}

As I understood the process, the commit command will be sent to DirectUpdateHandler2, which will actually perform the commit, via the UpdateRequestProcessor chain. Am I on the right track? I haven't dealt with a custom UpdateProcessor doing something after a commit is executed, so I am a bit confused... Thanks in advance.

hossman wrote:
> This thread all sounds really kludgy ... among other things the
> newSearcher listener is going to need to somehow keep track of when it
> was called as a result of a "real" commit, vs when it was called as the
> result of a commit it itself triggered to make changes.
>
> wouldn't an easier place to implement this logic be in an UpdateProcessor?
> you'll still need the "double commit" (once so you can see the
> main changes, and once so the rest of the world can see your
> modifications) but you can execute them both directly in your
> processCommit(CommitUpdateCommand) method (so you don't have to worry
> about being able to tell them apart)
>
> : Date: Thu, 30 Jul 2009 10:14:16 +0530
> : From: Noble Paul നോബിള് नोब्ळ्
> : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com
> : To: solr-user@lucene.apache.org
> : Subject: Re: update some index documents after indexing process is done with DIH
> :
> : If you make your EventListener implement SolrCoreAware you can get
> : hold of the core on inform(). Use that to get hold of the
> : SolrIndexWriter.
> :
> : On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese wrote:
> : > From the newSearcher(..) of a CustomEventListener which extends
> : > AbstractSolrEventListener I can access the SolrIndexSearcher and all core
> : > properties, but can't get a SolrIndexWriter. Do you know how I can get a
> : > SolrIndexWriter from there? This way I would be able to modify the documents
> : > (I need to modify them depending on values of other documents; that's why I
> : > can't do it with DIH delta-import).
> : > Thanks in advance
> : >
> : > Noble Paul നോബിള് नोब्ळ्-2 wrote:
> : >> On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese wrote:
> : >>> That really sounds the best way to reach my goal.
> : >>> How could I invoke a
> : >>> listener from the newSearcher? Would it be something like:
> : >>>
> : >>> <listener event="newSearcher" class="MyCustomListener">
> : >>>   <arr name="queries">
> : >>>     <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
> : >>>     <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
> : >>>     <lst> <str name="q">static newSearcher warming query from solrconfig.xml</str> </lst>
> : >>>   </arr>
> : >>> </listener>
> : >>>
> : >>> And MyCustomListener would be the class who opens the reader:
> : >>>
> : >>> RefCounted<SolrIndexSearcher> searchHolder = null;
> : >>> try {
> : >>>   searchHolder = dataImporter.getCore().getSearcher();
> : >>>   IndexReader reader = searchHolder.get().getReader();
> : >>>   // Here I iterate over the reader doing document modifications
> : >>> } finally {
> : >>>   if (searchHolder != null) searchHolder.decref();
> : >>> }
> : >>> } catch (Exception ex) {
> : >>>   LOG.info("error");
> : >>> }
> : >>
> : >> you may not be able to access the DIH API from a newSearcher event.
> : >> But the API would give you the searcher directly as a method
> : >> parameter.
> : >>>
> : >>> Finally, to access documents and add fields to some of them, I have
> : >>> thought of using the SolrDocument classes. Can you please point me to where
> : >>> something similar is done in the Solr source (I mean creation of
> : >>> SolrDocuments and conversion of them to proper Lucene documents)?
> : >>>
> : >>> Does this way of reaching the goal make sense?
> : >>>
> : >>> Thanks in advance
RE: Range Query question
Thanks for the reply; I had thought the solution would be altering the XML.

Ensdorf Ken wrote:
>
>> The problem is that the indexed form of this XML is flattened, so the
>> entity has 2 garage names, 2 min values and 2 max values, but the grouping
>> between the garage name and its min and max values is lost. The danger is
>> that we end up doing a comparison of the min-of-the-mins and the
>> max-of-the-maxes, which tells us that a car is available in the price range.
>> That may not be true if garage1 has all cars below our search range and
>> garage2 has all cars above our search range, e.g. if our search range is
>> 5000-6000 then we should get no match.
>
> You could index each garage-car pairing as a separate document, embedding
> all the necessary information you need for searching.
>
> e.g. a document carrying Ford / Ka / garage1 / 2000 / 4000
>
- Matt Beaumont mibe...@yahoo.co.uk
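To make the quoted suggestion concrete, here is a sketch of the per-pairing documents as Solr add XML; the field names (and the second garage's prices) are invented for illustration, since the markup of the original example was lost:

    <add>
      <doc>
        <field name="make">Ford</field>
        <field name="model">Ka</field>
        <field name="garage">garage1</field>
        <field name="min_price">2000</field>
        <field name="max_price">4000</field>
      </doc>
      <doc>
        <field name="make">Ford</field>
        <field name="model">Ka</field>
        <field name="garage">garage2</field>
        <field name="min_price">7000</field>
        <field name="max_price">9000</field>
      </doc>
    </add>

A range query such as min_price:[* TO 6000] AND max_price:[5000 TO *] then evaluates each garage's range independently, so the 5000-6000 search correctly matches neither document.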
Re: Posting data in JSON
On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé wrote:
> Hi All,
>
> I'm wondering if it's possible to post documents to solr in JSON format.
>
> JSON is much faster than XML to get the query results, so I think
> it'd be great to be able to post data in JSON to speed up the indexing
> and lower the network load.

If you are using Java/Solrj on 1.4 (trunk), you can use the binary format, which is extremely compact and efficient. Note that with Solr/Solrj 1.3, binary became the default response format for Solrj clients.

-- Regards, Shalin Shekhar Mangar.
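A sketch of what that might look like with SolrJ from 1.4 trunk; the URL and field names are illustrative, and BinaryRequestWriter is assumed to be on the classpath:

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexingExample {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Send update requests in the compact binary format instead of XML.
        server.setRequestWriter(new BinaryRequestWriter());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "example");
        server.add(doc);
        server.commit();
      }
    }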
Re: Minimum facet length?
On Thu, Jul 30, 2009 at 9:53 PM, wrote:
> Hi,
> I am exploring the faceted search results of Solr. My query is like this.
>
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>
> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
> 1-letter/number occurrences in my documents. It's not really useful since
> all the documents have some free-floating single-digit numbers.
>
> Is there a way to restrict the word frequency results for a facet based on
> the length, so I can set it to > 3, or is there a better way?

Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful and/or efficient. Typically faceting is done on untokenized fields (such as the string type).

-- Regards, Shalin Shekhar Mangar.
Re: Minimum facet length?
On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:

On Thu, Jul 30, 2009 at 9:53 PM, wrote:

Hi, I am exploring the faceted search results of Solr. My query is like this.

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 -- 1-letter/number occurrences in my documents. It's not really useful since all the documents have some free-floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet based on the length, so I can set it to > 3, or is there a better way?

Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful and/or efficient. Typically faceting is done on untokenized fields (such as the string type).

I think what was meant by > 3 was if faceting only returned terms of length greater than 3, not count.

You could copyField your text field to another field, set the analyzer to include a LengthFilterFactory with a minimum length specified, and also have other analysis tweaks to have numbers and other stop words removed.

Erik
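A sketch of that suggestion in schema.xml terms; the type and field names are invented, and the filter list would need to match the rest of the existing analysis chain:

    <fieldType name="textFacet" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- drop tokens shorter than 4 characters -->
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
      </analyzer>
    </fieldType>

    <field name="text_facet" type="textFacet" indexed="true" stored="false"/>
    <copyField source="text" dest="text_facet"/>

Faceting on text_facet instead of text would then skip the short tokens entirely.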
Re: Minimum facet length?
On Thu, Jul 30, 2009 at 10:35 PM, Erik Hatcher wrote:
>
> On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
>
>> On Thu, Jul 30, 2009 at 9:53 PM, wrote:
>>
>>> Hi,
>>> I am exploring the faceted search results of Solr. My query is like this.
>>>
>>> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>>>
>>> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
>>> 1-letter/number occurrences in my documents. It's not really useful since
>>> all the documents have some free-floating single-digit numbers.
>>>
>>> Is there a way to restrict the word frequency results for a facet based
>>> on the length, so I can set it to > 3, or is there a better way?
>>
>> Yes, you can specify facet.mincount=3 to return only those terms present
>> in at least 3 documents. On a related note, a tokenized field (such as the
>> text type in the example schema) will create a large number of unique terms.
>> Faceting on such a field may not be very useful and/or efficient. Typically
>> faceting is done on untokenized fields (such as the string type).
>
> I think what was meant by > 3 was if faceting only returned terms of length
> greater than 3, not count.

Ah, sorry. I was too fast to reply.

-- Regards, Shalin Shekhar Mangar.
Re: How can i get lucene index format version information?
: > i want to get the lucene index format version from solr web app (as
: the Luke request handler writes it out:
:
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in "i have added docs to the index, so the version number has changed"). The question is about the format version (as in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has changed")

I'm not sure how Luke gets that ... it's not exposed via a public API on an IndexReader.

Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do the trick; but i'm not sure how that would interact with customized IndexReader implementations. i suppose we could always make it non-fatal if it throws an exception (just print the exception message in place of the number)

anybody want to submit a patch to add this to the LukeRequestHandler?

-Hoss
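For reference, the non-fatal approach described above might look something like this; the helper name is invented, and how it behaves with customized IndexReader implementations is exactly the open question raised here:

    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;

    // Hypothetical helper for LukeRequestHandler: report the version read
    // from the segments file, or the exception message in place of the
    // number if reading it fails.
    static Object readFormatVersion(Directory dir) {
      try {
        return SegmentInfos.readCurrentVersion(dir);
      } catch (Exception e) {
        return e.toString();
      }
    }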
Re: How can i get lucene index format version information?
I think the properties page in the admin UI lists the Lucene version, but I don't have a live server to check that on at this instant.

wunder

On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:

: > i want to get the lucene index format version from solr web app (as
: the Luke request handler writes it out:
:
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in "i have added docs to the index, so the version number has changed"). The question is about the format version (as in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has changed")

I'm not sure how Luke gets that ... it's not exposed via a public API on an IndexReader.

Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do the trick; but i'm not sure how that would interact with customized IndexReader implementations. i suppose we could always make it non-fatal if it throws an exception (just print the exception message in place of the number)

anybody want to submit a patch to add this to the LukeRequestHandler?

-Hoss
Re: Posting data in JSON
Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar : > On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé > wrote: >> >> Hi All, >> >> I'm wondering if it's possible to post documents to solr in JSON format. >> >> JSON is much faster than XML to get the queries results, so I think >> it'd be great to be able to post data in JSON to speed up the indexing >> and lower the network load. > > If you are using Java,Solrj on 1.4 (trunk), you can use the binary format > which is extremely compact and efficient. Note that with Solr/Solrj 1.3, > binary became the default response format for Solrj clients. > > -- > Regards, > Shalin Shekhar Mangar. > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Mailing list: Change the reply-to?
Hi all,

I don't know if it's the same for everyone, but when I use the reply function of my mail agent, it sets the recipient to the user who sent the message, and not the mailing list.

So it's quite annoying because I have to change the recipient each time I reply to someone on the list.

Can the list admins fix this issue?

Cheers!

J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Mailing list: Change the reply-to?
On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote: Hi all, I don't know if it does the same from everyone, but when I use the reply function of my mail agent, it sets the recipient to the user who sent the message, and not the mailing list. So it's quite annoying cause I have to change the recipient each time I reply to someone on the list. Can the list admins fix this issue ? All my replies go to the list. From your message, the header says: Reply-To: solr-user@lucene.apache.org Erik
Re: Mailing list: Change the reply-to?
2009/7/30 Erik Hatcher :
>
> On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote:
>
>> Hi all,
>>
>> I don't know if it's the same for everyone, but when I use the
>> reply function of my mail agent, it sets the recipient to the user who
>> sent the message, and not the mailing list.
>>
>> So it's quite annoying because I have to change the recipient each time
>> I reply to someone on the list.
>>
>> Can the list admins fix this issue?
>
> All my replies go to the list.
>
> From your message, the header says:
>
> Reply-To: solr-user@lucene.apache.org
>
>    Erik

It works with your messages. It might depend on mail agents.

Jerome.

-- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Posting data in JSON
check: https://issues.apache.org/jira/browse/SOLR-945 this will not likely make it into 1.4 On Jul 30, 2009, at 1:41 PM, Jérôme Etévé wrote: Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar : On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé wrote: Hi All, I'm wondering if it's possible to post documents to solr in JSON format. JSON is much faster than XML to get the queries results, so I think it'd be great to be able to post data in JSON to speed up the indexing and lower the network load. If you are using Java,Solrj on 1.4 (trunk), you can use the binary format which is extremely compact and efficient. Note that with Solr/Solrj 1.3, binary became the default response format for Solrj clients. -- Regards, Shalin Shekhar Mangar. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Minimum facet length?
Hi Erik,

Thanks for the tip. Hmm, well that's a good point, or maybe I will just do the word filtering upfront and store it separately now that I think about it more.

Darren

On Thu, 2009-07-30 at 13:05 -0400, Erik Hatcher wrote:
> On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
>
> > On Thu, Jul 30, 2009 at 9:53 PM, wrote:
> >
> >> Hi,
> >> I am exploring the faceted search results of Solr. My query is like this.
> >>
> >> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
> >>
> >> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
> >> 1-letter/number occurrences in my documents. It's not really useful since
> >> all the documents have some free-floating single-digit numbers.
> >>
> >> Is there a way to restrict the word frequency results for a facet based
> >> on the length, so I can set it to > 3, or is there a better way?
> >
> > Yes, you can specify facet.mincount=3 to return only those terms present
> > in at least 3 documents. On a related note, a tokenized field (such as the
> > text type in the example schema) will create a large number of unique terms.
> > Faceting on such a field may not be very useful and/or efficient. Typically
> > faceting is done on untokenized fields (such as the string type).
>
> I think what was meant by > 3 was if faceting only returned terms of
> length greater than 3, not count.
>
> You could copyField your text field to another field, set the analyzer
> to include a LengthFilterFactory with a minimum length specified, and
> also have other analysis tweaks to have numbers and other stop words
> removed.
>
> Erik
Re: Mailing list: Change the reply-to?
: I don't know if it's the same for everyone, but when I use the
: reply function of my mail agent, it sets the recipient to the user who
: sent the message, and not the mailing list.
:
: So it's quite annoying because I have to change the recipient each time
: I reply to someone on the list.
:
: Can the list admins fix this issue?

The list software always adds a "Reply-To" header indicating that replies should be sent to the list. It does *not* remove any existing Reply-To headers that the original sender may have included -- it does this because it trusts that the original sender had a reason for putting it there (ie: when someone off list, like the apachecon coordinators, sends an announcement and the moderators let it through)

It's mail-client dependent as to what to do when you reply to a message like that -- yours apparently just picks one (and sometimes it's not the list); most either reply to both, or ask the user "do you want to reply to all"

-Hoss
Reasonable number of maxWarming searchers
Hi All,

I'm planning to have a certain number of processes posting independently to a solr instance. This instance will act solely as a master instance; no clients query it.

Is there a problem if I set maxWarmingSearchers to something like 30 or 40?

Also, how do I disable the cache warming? Is setting the autowarmCounts to 0 enough?

Regards, Jerome.

-- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Reasonable number of maxWarming searchers
I recommend, in this case, that you use Solr's autocommit feature (see solrconfig.xml) rather than having your indexing clients issue their own commits. Overlapped searcher warming is just going to be too much of a hit on RAM, and generally unnecessary with autocommit. Erik On Jul 30, 2009, at 2:28 PM, Jérôme Etévé wrote: Hi All, I'm planning to have a certain number of processes posting independently in a solr instance. This instance will solely act as a master instance. No clients queries on it. Is there a problem if i set maxWarmingSearchers to something like 30 or 40? Also, how do I disable the cache warming? Is setting autowarmCount's to 0 enough? Regards, Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
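A sketch of the solrconfig.xml autocommit setup being recommended; the thresholds here are illustrative only:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
        <maxTime>60000</maxTime>  <!-- or once the oldest pending doc is this many ms old -->
      </autoCommit>
    </updateHandler>

With this in place the indexing clients just add documents, and Solr decides when to commit, so searcher warming never piles up.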
What does "showItems" config mean on fieldValueCache mean?
What's the effect of showItems attribute on the fieldValueCache in Solr 1.4? -- Stephen Duncan Jr www.stephenduncanjr.com
How to get a stack trace
Hello, I'm a new user of solr but I have worked a bit with Lucene before. I get some out of memory exception when optimizing the index through Solr and I would like to find out why. However, the only message I get on standard output is: Jul 30, 2009 9:20:22 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space Is there a way to get a stack trace for this exception? I had a look into the java.util.logging options and didn't find anything. My solr runs in some standard configuration inside jetty. Any suggestion would be appreciated. Thanks, nicolae
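Heap-space errors often surface without a useful stack trace. One general way to investigate (standard HotSpot JVM flags, not Solr-specific, and not from this thread) is to give the optimize more headroom and have the JVM dump the heap when the error is thrown, e.g. with the stock Jetty setup:

    java -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar start.jar

The resulting dump can then be examined with a heap analyzer to see what was filling memory during the optimize.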
Problem with retrieving field from database using DIH
Hello all,

I've been having this issue for a while now. I am indexing a Sybase database. Everything is fantastic, except that there is 1 column that I can never get back. I don't have direct database access via a Sybase client, but I was able to extract the data using some Java code.

The field is essentially a Last Modified field. In the DB I believe that it is of type long. In the Java program that I have, I am able to retrieve the data that is in that column and put it in a variable of type Long. This is not the case in Solr, however. I set the field in the schema as required to see why the data is never stored. This is what I get in the Tomcat logs:

org.apache.solr.common.SolrException: Document [00069391] missing required field: lastModified
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:67)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:276)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:373)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)

From what I can gather, it is not finding the data and/or column, and thus cannot populate the required field. However, the data is there, which I was able to prove outside of Solr.

Is there a way to generate more descriptive logs for this? I am completely lost. I hit this problem a few months ago but I was never able to resolve it. Any help on this will be much appreciated. BTW, Solr was successful in retrieving data from other columns in the same table...

Thanks
Re: What does "showItems" config mean on fieldValueCache mean?
On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:

What's the effect of showItems attribute on the fieldValueCache in Solr 1.4?

Just outputs details of the last accessed items from the cache in the stats display.

    Erik

    if (showItems != 0) {
      Map items = cache.getLatestAccessedItems(showItems == -1 ? Integer.MAX_VALUE : showItems);
      for (Map.Entry e : (Set<Map.Entry>) items.entrySet()) {
        Object k = e.getKey();
        Object v = e.getValue();

        String ks = "item_" + k;
        String vs = v.toString();
        lst.add(ks, vs);
      }
    }
Solr/Lucene performance differences on Mac OS X running Tiger vs. Leopard ?
As far as our NOC guys know the machines are approximately the same, aside from the OS. The Leopard machine is running the default 1.5 JVM. And it's possible that some other application or config issues is to blame. Nobody's "blaming" the OS or Lucene, we're just asking around. Searches on Google haven't turned up any reports, so I'm suspecting the issue lies elsewhere. Also I've run on Leopard for months without any performance issues, though I really don't tax anything on my workstation. -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
Re: What does "showItems" config mean on fieldValueCache mean?
On Thu, Jul 30, 2009 at 4:18 PM, Erik Hatcher wrote:
>
> On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:
>
>> What's the effect of showItems attribute on the fieldValueCache in Solr
>> 1.4?
>
> Just outputs details of the last accessed items from the cache in the stats
> display.
>
>    Erik
>
>    if (showItems != 0) {
>      Map items = cache.getLatestAccessedItems(showItems == -1 ? Integer.MAX_VALUE : showItems);
>      for (Map.Entry e : (Set<Map.Entry>) items.entrySet()) {
>        Object k = e.getKey();
>        Object v = e.getValue();
>
>        String ks = "item_" + k;
>        String vs = v.toString();
>        lst.add(ks, vs);
>      }
>    }

Makes sense. Thanks!

-- Stephen Duncan Jr www.stephenduncanjr.com
Re: Problem with retrieving field from database using DIH
On Fri, Jul 31, 2009 at 1:43 AM, ahammad wrote: > From what I can gather, it is not finding the data and/or column, and thus > cannot populate the required field. However, the data is there, which I was > able to prove outside of Solr. > > Is there a way to generate more descriptive logs for this? I am completely > lost. I hit this problem a few months ago but I was never able to resolve > it. Any help on this will be much appreciated. > Can you try using the debug mode and see what your sql query is returning? You can either use the /admin/dataimport.jsp or add a debug=on&verbose=true parameter to the import. You should probably limit the number of documents to be indexed by adding rows=X to the full-import command otherwise the response would be huge. -- Regards, Shalin Shekhar Mangar.
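Concretely, assuming the handler is registered at /dataimport, such a debug run might be invoked as:

    http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=true&rows=10

The verbose response then shows, per entity, the SQL that was run and the rows it returned, which should reveal whether the lastModified column is coming back at all.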
Re: µTorrent indexed as µTorrent
Thanks, Robert. That's exactly what my problem was. Things work fine after I make sure that all my processing (index and query) is using UTF-8.

FYI, it took me a while to discover that SolrJ by default uses a GET request for query, which uses ISO-8859-1. I had to explicitly use a POST to do the query in SolrJ in order to get it to use UTF-8.

Bill

On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir wrote:
> Bill, somewhere in the process I think you might be treating your
> UTF-8 text as ISO-8859-1.
>
> Your character: 00B5 (µ)
> Bits: 10110101
>
> UTF8-encoded: 11000010 10110101
>
> If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
> file or wrong url encoding) then it looks like:
> 0xC2 (Â) followed by 0xB5 (µ)
>
> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au wrote:
> > I am using SolrJ to index the word µTorrent. After a commit I was not able
> > to query for it. It turns out that the document in my Solr index contains
> > the word µTorrent instead of µTorrent. Any one has any idea what's going
> > on???
> >
> > Bill
>
> --
> Robert Muir
> rcm...@gmail.com
Re: µTorrent indexed as µTorrent
On Thu, Jul 30, 2009 at 6:34 PM, Bill Au wrote:
> FYI, it took me a while to discover that SolrJ by default uses a GET request
> for query, which uses ISO-8859-1.

That depends on the servlet container. SolrJ GET requests are sent in UTF-8. Some servlet containers such as Tomcat need extra configuration to treat URLs as UTF-8 instead of latin-1, but the standard http://www.ietf.org/rfc/rfc3986.txt clearly specifies UTF-8.

To test the servlet container configuration, check out example/exampledocs/test_utf8.sh

-Yonik http://www.lucidimagination.com

> I had to explicitly use a POST to do the query in
> SolrJ in order to get it to use UTF-8.
>
> Bill
>
> On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir wrote:
>
>> Bill, somewhere in the process I think you might be treating your
>> UTF-8 text as ISO-8859-1.
>>
>> Your character: 00B5 (µ)
>> Bits: 10110101
>>
>> UTF8-encoded: 11000010 10110101
>>
>> If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
>> file or wrong url encoding) then it looks like:
>> 0xC2 (Â) followed by 0xB5 (µ)
>>
>> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au wrote:
>> > I am using SolrJ to index the word µTorrent. After a commit I was not able
>> > to query for it. It turns out that the document in my Solr index contains
>> > the word µTorrent instead of µTorrent. Any one has any idea what's going
>> > on???
>> >
>> > Bill
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
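For Tomcat, the extra configuration mentioned above is the URIEncoding attribute on the HTTP connector in server.xml, for example (other connector attributes elided):

    <Connector port="8080" URIEncoding="UTF-8" ... />

Without it, Tomcat decodes the query string of GET requests as latin-1, which produces exactly the doubled-byte symptom described in this thread.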
facet sorting by index on sint fields
Hi,

I have a field in my schema specified using

    <field name="wordCount" type="sint" ... />

where "sint" is specified as follows (the default from schema.xml):

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

When I do a facet on this field using sort=index I always get the values back in lexicographic order. E.g. adding this to a query string...

    facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

gives me 5 2 6 ...

Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this.

Cheers, Simon
Re: How can i get lucene index format version information?
Check the system request handler:

    http://localhost:8983/solr/admin/system

Should look something like this:

    <str name="solr-spec-version">1.3.0.2009.07.28.10.39.42</str>
    <str name="solr-impl-version">1.4-dev 797693M - jayhill - 2009-07-28 10:39:42</str>
    <str name="lucene-spec-version">2.9-dev</str>
    <str name="lucene-impl-version">2.9-dev 794238 - 2009-07-15 18:05:08</str>

-Jay

On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wrote:
> I think the properties page in the admin UI lists the Lucene version, but I
> don't have a live server to check that on at this instant.
>
> wunder
>
> On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:
>
>> : > i want to get the lucene index format version from solr web app (as
>> : the Luke request handler writes it out:
>> :
>> :    indexInfo.add("version", reader.getVersion());
>>
>> that's the index version (as in "i have added docs to the index, so the
>> version number has changed"). The question is about the format version (as
>> in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has
>> changed")
>>
>> I'm not sure how Luke gets that ... it's not exposed via a public API on
>> an IndexReader.
>>
>> Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do
>> the trick; but i'm not sure how that would interact with customized
>> IndexReader implementations. i suppose we could always make it non-fatal
>> if it throws an exception (just print the exception message in place of
>> the number)
>>
>> anybody want to submit a patch to add this to the LukeRequestHandler?
>>
>> -Hoss
Re: query in solr lucene
I tried this but it didn't work...

Regards,
Sushan

At 12:37 AM 7/30/2009, Avlesh Singh wrote:
> You may index your data using a delimiter, like $my-field-content$. While
> searching, perform a phrase query with the leading and trailing "$" appended
> to the query string.
>
> Cheers
> Avlesh
>
> On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta wrote:
>
> > I tried using AND, but it even provided me doc 3, which was not required.
> >
> > Hence my problem still persists...
> >
> > regards,
> > Sushan
> >
> > At 06:59 AM 7/29/2009, Avlesh Singh wrote:
> >
> >> >
> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2 as
> >> > I read it.
> >> >
> >> Sorry, my bad. I did not read properly before replying.
> >>
> >> Cheers
> >> Avlesh
> >>
> >> On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson wrote:
> >>
> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2 as
> >> > I read it.
> >> >
> >> > You might have some joy with KeywordAnalyzer, which does
> >> > not break the incoming stream up into tokens. You have to be
> >> > careful, though, because it also won't fold the case, so 'Hello'
> >> > would not match 'hello'.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh wrote:
> >> >
> >> > > You should perform a PhraseQuery on the required field.
> >> > > Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello
> >> > > how are you sushan" would work for you.
> >> > >
> >> > > Cheers
> >> > > Avlesh
> >> > >
> >> > > 2009/7/28 Gérard Dupont
> >> > >
> >> > > > Hi Sushan,
> >> > > >
> >> > > > I'm not an expert of Solr, just a beginner, but it appears to me that
> >> > > > you may have the default 'OR' combination of keywords, and that would
> >> > > > explain this behavior. Try to modify the configuration for an 'AND'
> >> > > > combination.
> >> > > >
> >> > > > cheers
> >> > > >
> >> > > > On Tue, Jul 28, 2009 at 16:49, Sushan Rungta wrote:
> >> > > >
> >> > > > > I am extremely sorry for responding late, as I was ill for the past
> >> > > > > few days.
> >> > > > >
> >> > > > > My problem is explained below with an example:
> >> > > > >
> >> > > > > I have three documents with the following content:
> >> > > > >
> >> > > > > 1. Hello how are you
> >> > > > > 2. Hello how are you sushan
> >> > > > > 3. Hello how are you sushan. I am fine.
> >> > > > >
> >> > > > > When I search for the query "Hello how are you sushan", I should
> >> > > > > only get document 2 in my result.
> >> > > > >
> >> > > > > I hope this will give you all a better insight into my problem.
> >> > > > >
> >> > > > > regards,
> >> > > > >
> >> > > > > Sushan Rungta
> >> > > >
> >> > > > --
> >> > > > Gérard Dupont
> >> > > > Information Processing Control and Cognition (IPCC) - EADS DS
> >> > > > http://weblab-project.org
> >> > > >
> >> > > > Document & Learning team - LITIS Laboratory
Using DIH for parallel indexing
I am using Solr 1.3 and have a few questions regarding DIH:

1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file?

2. I am indexing some 2 million database records using DIH with 4-5 nested entities (just one level). These subqueries are highly optimized and cannot be avoided. Since DIH processes records sequentially, it takes a lot of time (approximately 3 hours) to rebuild the indexes. My question is: can I use DIH in some way so that indexing can be carried out in parallel?

3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3 ...) with different data-config files inside the same core and run full-import on each of them at the same time? Are the indexes created by each of these (inside the same data directory) merged?

Due to my lack of knowledge of Lucene/Solr internals, some of these questions might be funny.

Cheers
Avlesh
Re: query in solr lucene
What field type are you using? What kind of filters have you applied on the field? The easiest way to make it work is to use a "string" field.

Cheers
Avlesh

On Fri, Jul 31, 2009 at 11:09 AM, Sushan Rungta wrote:

> I tried this but it didn't work...
>
> Regards,
> Sushan
>
> At 12:37 AM 7/30/2009, Avlesh Singh wrote:
>
>> You may index your data using a delimiter, like $my-field-content$. While
>> searching, perform a phrase query with the leading and trailing "$" appended
>> to the query string.
>>
>> Cheers
>> Avlesh
>>
>> On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta wrote:
>>
>> > I tried using AND, but it even provided me doc 3, which was not required.
>> >
>> > Hence my problem still persists...
>> >
>> > regards,
>> > Sushan
>> >
>> > At 06:59 AM 7/29/2009, Avlesh Singh wrote:
>> >
>> >> >
>> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>> >> > as I read it.
>> >> >
>> >> Sorry, my bad. I did not read properly before replying.
>> >>
>> >> Cheers
>> >> Avlesh
>> >>
>> >> On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson wrote:
>> >>
>> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>> >> > as I read it.
>> >> >
>> >> > You might have some joy with KeywordAnalyzer, which does
>> >> > not break the incoming stream up into tokens. You have to be
>> >> > careful, though, because it also won't fold the case, so 'Hello'
>> >> > would not match 'hello'.
>> >> >
>> >> > Best
>> >> > Erick
>> >> >
>> >> > On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh wrote:
>> >> >
>> >> > > You should perform a PhraseQuery on the required field.
>> >> > > Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello
>> >> > > how are you sushan" would work for you.
>> >> > >
>> >> > > Cheers
>> >> > > Avlesh
>> >> > >
>> >> > > 2009/7/28 Gérard Dupont
>> >> > >
>> >> > > > Hi Sushan,
>> >> > > >
>> >> > > > I'm not an expert of Solr, just a beginner, but it appears to me
>> >> > > > that you may have the default 'OR' combination of keywords, and that
>> >> > > > would explain this behavior. Try to modify the configuration for an
>> >> > > > 'AND' combination.
>> >> > > >
>> >> > > > cheers
>> >> > > >
>> >> > > > On Tue, Jul 28, 2009 at 16:49, Sushan Rungta wrote:
>> >> > > >
>> >> > > > > I am extremely sorry for responding late, as I was ill for the
>> >> > > > > past few days.
>> >> > > > >
>> >> > > > > My problem is explained below with an example:
>> >> > > > >
>> >> > > > > I have three documents with the following content:
>> >> > > > >
>> >> > > > > 1. Hello how are you
>> >> > > > > 2. Hello how are you sushan
>> >> > > > > 3. Hello how are you sushan. I am fine.
>> >> > > > >
>> >> > > > > When I search for the query "Hello how are you sushan", I should
>> >> > > > > only get document 2 in my result.
>> >> > > > >
>> >> > > > > I hope this will give you all a better insight into my problem.
>> >> > > > >
>> >> > > > > regards,
>> >> > > > >
>> >> > > > > Sushan Rungta
>> >> > > >
>> >> > > > --
>> >> > > > Gérard Dupont
>> >> > > > Information Processing Control and Cognition (IPCC) - EADS DS
>> >> > > > http://weblab-project.org
>> >> > > >
>> >> > > > Document & Learning team - LITIS Laboratory
Re: Using DIH for parallel indexing
On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote:
> I am using Solr 1.3 and have a few questions regarding DIH:
>
> 1. Can I pass parameters to DIH and be able to use them inside the
> "query" attribute of an entity inside the data-config file?
> 2. I am indexing some 2 million database records using DIH with 4-5
> nested entities (just one level). These subqueries are highly optimized and
> cannot be avoided. Since DIH processes records sequentially, it takes a lot
> of time (approximately 3 hours) to rebuild the indexes. My question is:
> can I use DIH in some way so that indexing can be carried out in parallel?
> 3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3
> ...) with different data-config files inside the same core and run
> full-import on each of them at the same time? Are the indexes created by
> each of these (inside the same data directory) merged?

yes, it is possible to create multiple instances of DIH as you mentioned. The only drawback is that it would result in multiple commits.

All the data will be written to the same index together.

> Due to my lack of knowledge of Lucene/Solr internals, some of these
> questions might be funny.
>
> Cheers
> Avlesh

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
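For reference, registering several DIH instances in solrconfig.xml might look like this; the handler names and config file names are made up:

    <requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config1.xml</str>
      </lst>
    </requestHandler>
    <requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config2.xml</str>
      </lst>
    </requestHandler>

Running full-import on each at the same time writes into the same index, with the multiple commits mentioned above.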
Re: NullPointerException in DataImportHandler
On Thu, Jul 30, 2009 at 9:45 PM, Andrew Clegg wrote:
>
> Erik Hatcher wrote:
>>
>> On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
>>>
>>> <entity ... url="${domain.pdb_code}-noatom.xml"
>>>   processor="XPathEntityProcessor" forEach="/">
>>>   <field ...
>>>     xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />
>>
>> The XPathEntityProcessor doesn't support that fancy of an xpath -- it
>> supports only a limited subset. Try /structCategory/struct/title perhaps?
>
> Sadly not...
>
> I tried with:
>
> <field ... xpath="/datablock/structCategory/struct/title" />
>
> (full path from root)
>
> and
>
> <field ... xpath="//structCategory/struct/title" />
>
> Same ArrayIndex error each time.
>
> Doesn't it use javax.xml then? I was using the complex local-name
> expressions to make it namespace-agnostic -- is it agnostic anyway?

It does not use javax.xml because that works on a DOM tree, which is not usable for large XML files. This only supports a subset of XPath. The supported syntax is given here:

http://wiki.apache.org/solr/DataImportHandler#head-5ced7c797f1014ef6e8326a34c23f541ebbaadf1-2

> Thanks,
>
> Andrew.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Using DIH for parallel indexing
Thanks for the reply, Noble. A few questions are still open:

1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file?
2. Can I use the same data-import-handler in some way so that indexing can be carried out in parallel?

Cheers
Avlesh

2009/7/31 Noble Paul നോബിള് नोब्ळ्
> On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote:
> > I am using Solr 1.3 and have a few questions regarding DIH:
> >
> > 1. Can I pass parameters to DIH and be able to use them inside the
> > "query" attribute of an entity inside the data-config file?
> > 2. I am indexing some 2 million database records using DIH with 4-5
> > nested entities (just one level). These subqueries are highly optimized
> > and cannot be avoided. Since DIH processes records sequentially, it takes
> > a lot of time (approximately 3 hours) to rebuild the indexes. My question
> > is: can I use DIH in some way so that indexing can be carried out in
> > parallel?
> > 3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3
> > ...) with different data-config files inside the same core and run
> > full-import on each of them at the same time? Are the indexes created by
> > each of these (inside the same data directory) merged?
>
> yes, it is possible to create multiple instances of DIH as you
> mentioned. The only drawback is that it would result in multiple commits.
>
> All the data will be written to the same index together.
>
> > Due to my lack of knowledge of Lucene/Solr internals, some of these
> > questions might be funny.
> >
> > Cheers
> > Avlesh
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
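On the first open question: DIH does expose request parameters to the data-config as ${dataimporter.request.*} variables (at least on 1.4/trunk). A sketch, with invented entity, table, and parameter names:

    <entity name="item"
            query="select id, name from item
                   where category = '${dataimporter.request.category}'">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
    </entity>

The import would then be invoked with the parameter appended to the command, e.g. /dataimport?command=full-import&category=books.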
Limiting facets for huge data - setting indexed=false in schema.xml
Hello,

We are trying to get Solr to work for a really huge parts database. Details of the database:

- 55 million parts
- A total of 3700 properties (facets), though each record will not have a value for every property
- Most of these facets are defined as dynamic fields within the Solr index

We were getting really unacceptable response times while doing faceting/searches on an index created with this database. With only one user using the system, query times are in excess of 1 minute. With more users concurrently using the system, the response times are even higher.

We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only those properties. All other properties, which are defined as dynamic properties, had indexed=false. The observations after this change:

- Index size reduced by a meagre 5% only
- Performance did not improve. In fact, during the PSR run we observed that it degraded.

My questions:

- Will reducing the number of facets improve faceting and search performance?
- Is there a better way to reduce the number of facets?
- Will having a large number of properties defined as dynamic fields reduce performance?

Thank you.

Regards
Rahul
Recreating SOLR index after a schema change - without having to re-post the data
Hi,

We are using solr-server for a large data-set. We need some changes in the Solr schema.xml (a datatype change from integer to sint for a few fields). It turns out that the two datatypes (integer and sint) are incompatible, and hence we need to re-index Solr.

My question is: is there any way by which I can just re-create the index files for the existing data/documents in Solr, without having to re-post the documents?

I searched through many forums and everything seems to say: "You have to re-post ALL documents to Solr for re-indexing". Please suggest a better alternative to achieve my schema change (I have a very large Solr index, around 10GB, and it will be tough to query the whole data-set, store it somewhere as XMLs and then re-post).

--
Thanks,
Vanniarajan