Re: search suggest
On Thu, Jul 30, 2009 at 4:52 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> I created an issue and have added some notes
> https://issues.apache.org/jira/browse/SOLR-1316
> Also see https://issues.apache.org/jira/browse/SOLR-706
-- Regards, Shalin Shekhar Mangar.
solr-user@lucene.apache.org
Good morning Solr :-) It's morning in Germany! I have a problem with the indexing... I often get an error. I think it is because the XML contains this character: "&". I need the character; what should I do?

SimplePostTool: FATAL: Solr returned an error: comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__comctcwstxexcWstxLazyException_comctcwstxexcWstxUnexpectedCharException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__at_comctcwstxexcWstxLazyExceptionthrowLazilyWstxLazyExceptionjava45__at_comctcwstxsrStreamScannerthrowLazyErrorStreamScannerjava729__at_comctcwstxsrBasicStreamReadersafeFinishTokenBasicStreamReaderjava3659__at_comctcwstxsrBasicStreamReadergetTextBasicStreamReaderjava809__at_orgapachesolrhandlerXMLLoaderreadDocXMLLoaderjava278__at_orgapachesolrhandlerXMLLoaderprocessUpdateXMLLoaderjava139__at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava69__at_orgapachesolrhandlerContentStreamHandlerBasehandleRequestBodyContentStreamHandlerBasejava54__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava131__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava1299__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava338__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava241__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHt _
Re: solr indexing on same set of records with different value of unique field, not working fine.
FYI: attached is the schema.xml file. And the add-doc XML snippet is: 501 ESQ.VISION.A72 201 CpuLoopEnd Process=$Z4B1 CpuPin=0,992 Program=\VEGAS.$SYSTEM.SYS00.MEASFH Terminal=\VEGAS.$TSPM.#TERM CpuBusy=0 MemPage=24 User=50,10 \VEGAS.$QQDS PLGOVNPM 2008-10-07T03:00:30.0Z 2008-10-07T10:02:27.95Z 1247905648000. I just load the current timestamp's long value into the add-doc XML to load into Solr.

Chris Hostetter wrote:
I'm not really understanding how you could get the situation you describe ... which suggests that one (or both) of us don't understand exactly what happened. If you can post the actual schema.xml file you used and an example of the input you indexed, perhaps we can spot the discrepancy.

FWIW: using a timestamp as a uniqueKey doesn't make much sense ...
1) if you have heavy parallelization, two docs indexed at the exact same time might overwrite each other.
2) you have no way of ever replacing an existing doc (unless you roll the clock back), in which case there's no advantage to using a uniqueKey -- so you might as well leave it out of your schema (which makes indexing slightly faster).

: I need to index around 10 million records with Solr.
: I have nearly 2 lakh (200,000) records, so I made a program to loop over them until 10 million.
: I specified 20 fields in the schema.xml file. The unique field I set was the currentTimeStamp field.
: So, when I run the loader program (which loads XML data into Solr) it creates a currentTimestamp value... and loads it into Solr.
:
: For this situation:
: I stopped the loader program after 100 records were indexed into Solr.
: Then I ran the loader program again for the SAME 100 records,
: and Solr reports 100 results, rather than 200.
:
: Because I set the currentTimeStamp field as the unique field, I expected the result
: to be 200 after running the same 100 records again...
:
: Any suggestions please...

-Hoss
Re: solr indexing on same set of records with different value of unique field, not working fine.
Sorry, the schema.xml file is here in this mail... noor wrote: [previous message in this thread quoted in full, snipped]
Skipping fields from XML
Hi, I want to index a perfectly good Solr XML file into a Solr/Lucene instance. The problem is that the XML has many fields that I don't want to be indexed. I tried to index the file, but Solr gives me an error because the XML contains fields that I have not declared in my schema.xml. How can I tell Solr to skip unwanted fields and only index the fields that I have declared in my schema.xml? I know it must be something with a catch-all setting and/or copyFields, but I cannot get the configuration right. To be clear: I want Solr to index/store only a few fields from the XML file and skip all the other fields. An answer or a link to a good reference would help.
Re: Skipping fields from XML
I don't think there is a way to do that. On Thu, Jul 30, 2009 at 1:39 PM, Edwin Stauthamer wrote: > Hi, > > I want to index a perfectly good solr XML-file into an Solr/Lucene instance. > The problem is that the XML has many fields that I don't want to be indexed. > > I tried to index the file but Solr gives me an error because the XML > contains fields that I have not declared in my schema.xml > > How can I tell Solr to skip unwanted fields and only index the fields that I > have declared in my schema.xml? > > I know it must be something with a catchall setting and / or copyFields but > I can not get the configuration right. To be clear. I want Solr to index / > store only a few fields from the XML-file to be indexed and skip all the > other fields. > > An answer or a link to a good reference would help. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Skipping fields from XML
: I want Solr to index / store only a few fields from the XML-file to be : indexed and skip all the other fields. I think Dynamic fields [1] can help you. [1] http://wiki.apache.org/solr/SchemaXml#head-82dba16404c8e3318021320638b669b3a6d780d0
solr-user@lucene.apache.org
Any chance of getting that stack trace as more than one line? :) Also, where are you posting your documents from? (e.g. Java, PHP, command line etc.) It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.

On 30 Jul 2009, at 09:44, Jörg Agatz wrote:
> Good morning Solr :-) It's morning in Germany!
> I have a problem with the indexing...
> I often get an error.
> I think it is because the XML contains this character: "&"
> I need the character; what should I do?
> SimplePostTool: FATAL: Solr returned an error:
> comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__... [stack trace snipped]

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334. Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
Check out all our latest news and thinking on the Discovery blog http://blogs.semantico.com/discovery-blog/
Re: Skipping fields from XML
Edwin Stauthamer wrote:
> Hi, I want to index a perfectly good Solr XML file into a Solr/Lucene instance. The problem is that the XML has many fields that I don't want to be indexed. I tried to index the file but Solr gives me an error because the XML contains fields that I have not declared in my schema.xml. How can I tell Solr to skip unwanted fields and only index the fields that I have declared in my schema.xml?

How about using the "ignored" type for the fields which you don't want to be indexed:

    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />

Koji

> I know it must be something with a catch-all setting and/or copyFields but I cannot get the configuration right. To be clear: I want Solr to index/store only a few fields from the XML file and skip all the other fields. An answer or a link to a good reference would help.
solr-user@lucene.apache.org
Indeed, or enclose the text in CDATA tags, which should work as well.

On Thu, 2009-07-30 at 09:52 +0100, Toby Cole wrote:
> Any chance of getting that stack trace as more than one line? :)
> Also, where are you posting your documents from? (e.g. Java, PHP, command line etc.)
>
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML.
> These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
>
> On 30 Jul 2009, at 09:44, Jörg Agatz wrote:
> > Good morning Solr :-) It's morning in Germany!
> > I have a problem with the indexing...
> > I often get an error.
> > I think it is because the XML contains this character: "&"
> > I need the character; what should I do?
> > SimplePostTool: FATAL: Solr returned an error:
> > comctcwstxexcWstxLazyException_Unexpected_character___code_32_missing_name__at_rowcol_unknownsource_1465__... [stack trace snipped]
>
> --
> Toby Cole
> Software Engineer, Semantico Limited
Re: Skipping fields from XML
> How can I tell Solr to skip unwanted fields and only index > the fields that I have declared in my schema.xml? More precisely: (taken from schema.xml)
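For reference, the relevant lines from the stock example schema.xml look like this (the "ignored" field type plus a catch-all dynamic field that silently swallows anything not otherwise declared):

    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
    <dynamicField name="*" type="ignored" />

With these in place, any field in a posted document that doesn't match a declared field or another dynamicField pattern is neither indexed nor stored, instead of causing an "unknown field" error.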
Re: Skipping fields from XML
Perfect! That resolved my issue. BTW, this was my first posting on this list. I must say that the responses were quick and to the point! Good community help!

On Thu, Jul 30, 2009 at 10:58 AM, AHMET ARSLAN wrote:
> > How can I tell Solr to skip unwanted fields and only index
> > the fields that I have declared in my schema.xml?
>
> More precisely: (taken from schema.xml)

--
Met vriendelijke groet / Kind regards,
Edwin Stauthamer
Adviser Search & Collaboration
Emid Consult
T: +31 (0) 70 8870700 M: +31 (0) 6 4555 4994
E: estautha...@emidconsult.com I: http://www.emidconsult.com
solr-user@lucene.apache.org
Also, I use the command-line tool: "java -jar post.jar xyz.xml". I don't know what you mean with:

> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.

I don't understand this. Must I change every & to &amp;?
solr-user@lucene.apache.org
On Jul 30, 2009, at 6:17 AM, Jörg Agatz wrote:
> Also, I use the command-line tool "java -jar post.jar xyz.xml". I don't know what you mean with:
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
> I don't understand this. Must I change every & to &amp;?

Yes, if you need an ampersand in an XML element, it must be escaped: Harold &amp; Maude

Erik
solr-user@lucene.apache.org
On 30 Jul 2009, at 11:17, Jörg Agatz wrote:
> It sounds like you're not using 'entities' for your '&' characters (ampersands) in your XML. These should be converted to "&amp;". This should look familiar if you've ever written any HTML.
> I don't understand this. Must I change every & to &amp;?

Yes, '&' characters aren't allowed in XML unless they are either in a CDATA section or part of an 'entity'. A good place to read up on this is: http://www.xml.com/pub/a/2001/01/31/qanda.html In short, replace all your & with &amp;

--
Toby Cole
Software Engineer, Semantico Limited
Re: Multi select faceting
On Jul 29, 2009, at 2:38 PM, Mike wrote: Hi, We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a requirement to implement multiple-select faceting where the facet cells show up as checkboxes and despite checked options, all of the options continue to persist with counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/ It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c ), but I was wondering if there was another way to accomplish this in 1.3? The only way I can think to do this is to backport the patch to 1.3. FWIW, we are running 1.4-dev at /search, which is where that functionality comes from. -Grant
Re: Question about formatting the results returned from Solr
Apparently all the data is going to one field, 'author'. Instead, the values should be sent to separate fields:

author_fname
author_lname
author_email

so you would get details like:

<author_fname>John</author_fname>
<author_lname>Doe</author_lname>
<author_email>j...@doe.com</author_email>

On Wed, Jul 29, 2009 at 7:39 PM, ahammad wrote:
> Hi all,
> Not sure how good my title is, but here is a (hopefully) better explanation of what I mean.
> I am indexing a set of articles from a DB. Each article has an author. The author is saved in the DB as an author ID, which is a number.
> There is another table in the DB with more relevant information about the author. Basically it has columns like: id, firstname, lastname, email, userid
> I set up the DIH so that it returns the userid, and it works fine:
> jdoe
> msmith
> Would it be possible to return all of the information about the author (first name, ...) as a subset of the results above?
> Here is what I mean:
> John
> Doe
> j...@doe.com
> ...
> Something similar to that at least...
> Not sure how descriptive I was, but any pointers would be highly appreciated.
> Cheers
> --
> View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24719831.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
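A minimal DIH sketch of the layout Noble is describing; the table, column, and entity names here are illustrative guesses, not taken from the original config:

    <entity name="article" query="select id, title, author_id from articles">
      <entity name="author"
              query="select firstname, lastname, email from authors where id='${article.author_id}'">
        <field column="firstname" name="author_fname" />
        <field column="lastname" name="author_lname" />
        <field column="email" name="author_email" />
      </entity>
    </entity>

Each author column lands in its own Solr field, so the response carries the resolved name and email rather than just the user id.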
Range Query question
Hi, I have a set of XML data that holds minimum and maximum values, and I need to be able to do specific range queries against them. (Note that this is a contrived example, and that in reality the garage would probably hold all the individual prices of all its cars, but this is analogous to the problem we have, which is couched in terms that would obscure the problem.) For example, the following XML fragment is indexed so that each element becomes a Solr document: Ford Ka garage1 2000 4000 garage2 8000 1

I want to be able to do a range query where search min value = 2500 and search max value = 3500. This should return garage1 as potentially having cars in my price range, as the range of prices for the garage contains the range I have input. It's also worth noting that we can't simply look for min prices that fall inside our range or max prices that fall inside our range, as in the case outlined above none of the individual values fall inside our range, but there is overlap.

The problem is that the indexed form of this XML is flattened, so the entity has 2 garage names, 2 min values and 2 max values, but the grouping between the garage name and its min and max values is lost. The danger is that we end up doing a comparison of the min-of-the-mins and the max-of-the-maxes, which tells us that a car is available in the price range, which may not be true if garage1 has all cars below our search range and garage2 has all cars above our search range, e.g. if our search range is 5000-6000 then we should get no match.

We wanted to include the garage name as an attribute of the min/max values to maintain this link, but couldn't find a way to do this. Finally, it would be extremely difficult for us to modify the XML presented to our system, hence our approach to date. Has anyone had a similar problem, and if so how did you overcome it? Thanks for taking the time to look.

- Matt Beaumont mibe...@yahoo.co.uk

--
View this message in context: http://www.nabble.com/Range-Query-question-tp24737656p24737656.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about formatting the results returned from Solr
Yes, I get that. The problem arises when you have multiple authors. How can I know which first name goes with which user id etc...

Cheers

Noble Paul നോബിള് नोब्ळ्-2 wrote:
> Apparently all the data is going to one field, 'author'. Instead, the values should be sent to separate fields:
> author_fname
> author_lname
> author_email
> so you would get details like:
> <author_fname>John</author_fname>
> <author_lname>Doe</author_lname>
> <author_email>j...@doe.com</author_email>
> [original question quoted earlier in the thread, snipped]
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com

--
View this message in context: http://www.nabble.com/Question-about-formatting-the-results-returned-from-Solr-tp24719831p24737962.html
Sent from the Solr - User mailing list archive at Nabble.com.
How can i get lucene index format version information?
I want to get the Lucene index format version from the Solr web app (as Luke does). I've tried looking for the info in the Luke handler response, but I haven't found it. -- Lici
SOLR deleted almost everything?
Hello everyone :) I was trying to purge out older things... in this case, documents of a certain type that had an ID lower than 200. So I posted this delete query:

id:[0 TO 200] AND type:I

Now I have only 49 type "I" items total in my index (shown by /solr/select?q=type:I), when there should still be IDs up to about 2165000, which is far, far more than 49. I'm curious why this would be, as I'm trying to build in automatic purging of older things, but this obviously didn't work the way I thought. I'm on version 1.1, and my schema information for the fields is below:

Thanks for any insight into why I broke it! -Reece
Re: Multi select faceting
Grant, thanks for the reply. We tested our requirement against 1.4-dev and were able to achieve what we wanted. The site we're rebuilding has low traffic, so we're going to run with 1.4-dev. Cheers. - Original Message - From: "Grant Ingersoll" To: Sent: Thursday, July 30, 2009 8:05 AM Subject: Re: Multi select faceting On Jul 29, 2009, at 2:38 PM, Mike wrote: Hi, We're using Lucid Imagination's LucidWorks Solr 1.3 and we have a requirement to implement multiple-select faceting where the facet cells show up as checkboxes and despite checked options, all of the options continue to persist with counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/ It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c ), but I was wondering if there was another way to accomplish this in 1.3? The only way I can think to do this is to backport the patch to 1.3. FWIW, we are running 1.4-dev at /search, which is where that functionality comes from. -Grant
RE: Boosting ('bq') on multi-valued fields
> Hey Ken,
> Thanks for your reply.
> When I wrote '5|6' I meant that this is a multiValued field with two values,
> '5' and '6', rather than the literal string '5|6' (and any Tokenizer). Does
> your reply still hold? That is, are multiValued fields dependent on the
> notion of tokenization to such a degree that I can't use the str type with
> them meaningfully? If so, it seems weird to me that I should be able to
> define a str multiValued field to begin with..

I'm pretty sure you can use multiValued string fields in the way you are describing. If you just do a query without the boost, do documents with multiple values come back? That would at least tell you whether the problem was matching on the term itself or something to do with your use of boosts.

-Ken
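For reference, the two requests Ken suggests comparing might look like this (the field name "myfield" is illustrative; bq is a DisMax parameter):

    http://localhost:8983/solr/select?q=myfield:5
    http://localhost:8983/solr/select?defType=dismax&q=foo&bq=myfield:5^2.0

If the first query matches documents whose multiValued str field contains '5' among other values, the field itself behaves as expected and the investigation can focus on the boost query.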
RE: Range Query question
> The problem is that the indexed form of this XML is flattened so the
> entity has 2 garage names, 2 min values and 2 max values, but the grouping
> between the garage name and its min and max values is lost. The danger is
> that we end up doing a comparison of the min-of-the-mins and the
> max-of-the-maxes, which tells us that a car is available in the price range,
> which may not be true if garage1 has all cars below our search range and
> garage2 has all cars above our search range, e.g. if our search range is
> 5000-6000 then we should get no match.

You could index each garage-car pairing as a separate document, embedding all the necessary information you need for searching; e.g. a document holding Ford, Ka, garage1, 2000, 4000 (see the sketch below).
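A sketch of what such a per-garage document might look like (the field names are illustrative):

    <doc>
      <field name="make">Ford</field>
      <field name="model">Ka</field>
      <field name="garage_name">garage1</field>
      <field name="min_price">2000</field>
      <field name="max_price">4000</field>
    </doc>

With one document per garage, the overlap test becomes a per-document query, e.g. min_price:[* TO 3500] AND max_price:[2500 TO *] (assuming sortable numeric field types so the ranges compare numerically). A search range of 5000-6000 then correctly matches neither the 2000-4000 garage nor one whose cheapest car costs 8000.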
Re: SOLR deleted almost everything?
On Jul 30, 2009, at 9:44 AM, Reece wrote: Hello everyone :) I was trying to purge out older things.. in this case of a certain type of document that had an ID lower than 200. So I posted this: id:[0 TO 200] AND type:I Now, I have only 49 type "I" items total in my index (shown by /solr/select?q=type:I), when there should be numbers still up to about 2165000 which is far far more than 49 I'm curious why this would be, as I'm trying to build it automatic purging of older things, but this obviously didn't work the way I thought. I'm on version 1.1, and my schema information for the fields is below: Use one of the sortable numeric types for your id field if you need to perform range queries on them. A string is sorted lexicographically: 1, 10, 11, 2, 3, 4, 5... and thus a range query won't work the way you might expect. Erik
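For illustration, a sketch of the schema change Erik suggests, using the sortable int type that ships in the example schema (the field name mirrors the thread; the attributes are illustrative):

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <field name="id" type="sint" indexed="true" stored="true" required="true"/>

With a sortable int type, id:[0 TO 200] matches numerically, avoiding surprises like "10" sorting before "2" under string comparison.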
NullPointerException in DataImportHandler
First of all, apologies if you get this twice. I posted it by email an hour ago but it hasn't appeared in any of the archives, so I'm worried it's got junked somewhere.

I'm trying to use a DataImportHandler to merge some data from a database with some other fields from a collection of XML files, rather like the example in the Architecture section here: http://wiki.apache.org/solr/DataImportHandler ... so a given document is built from some fields from the database and some from the XML. My dataconfig.xml looks like this:

This works if I comment out the inner entity, but when I uncomment it, I get this error:

30-Jul-2009 14:32:50 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1s32D00}, title=title(1.0)={PDB code 1s32, chain D, domain 00}, keywords=keywords(1.0)={some keywords go here}, pdb_code=pdb_code(1.0)={1s32}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1s32 1s32D}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:64)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:344)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:75)
at org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
... 9 more

I have checked that the file /cath/people/cathdata/v3_3_0/pdb-XML-noatom/1s32-noatom.xml is readable, so maybe the full path to the file isn't being constructed properly or something? I also tried with the full path template for the file in the entity url attribute, instead of using a basePath in the dataSource, but I get exactly the same exception. This is with the 2009-07-30 nightly build. See attached for schema. http://www.nabble.com/file/p24739580/schema.xml schema.xml

Any ideas? Thanks in advance! Andrew.

--
:: http://biotext.org.uk/ ::

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24739580.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about formatting the results returned from Solr
> instead they should be sent to separate fields
> author_fname
> author_lname
> author_email

Or, a dynamic field called author_* (I am assuming all of the author fields to be of the same type). And if you use SolrJ, you can transform this info into a data structure like Map<String, String> authorInfo, where the keys would be "firstName", "lastName", "email" etc. Look for more here - http://issues.apache.org/jira/browse/SOLR-1129

Cheers
Avlesh

2009/7/30 Noble Paul നോബിള് नोब्ळ्
> [previous message in this thread quoted in full, snipped]
Posting data in JSON
Hi All, I'm wondering if it's possible to post documents to Solr in JSON format. JSON is much faster than XML for getting query results, so I think it'd be great to be able to post data in JSON to speed up indexing and lower the network load. All the best! Jerome Eteve. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: NullPointerException in DataImportHandler
Hi Andrew,

your inner entity uses an XML type datasource. The default entity processor is the SQL one, however. For your inner entity, you have to specify the correct entity processor explicitly. You do that by adding the attribute "processor", and the value is the classname of the processor you want to use, e.g. processor="XPathEntityProcessor" (see the Wikipedia example on the DataImportHandler wiki page).

Cheers,
Chantal

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]
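A minimal sketch of an inner entity with the processor attribute set, as Chantal describes; the dataSource name, forEach, and XPaths are illustrative, not Andrew's actual config (the url pattern is quoted later in the thread):

    <entity name="xmlfile"
            processor="XPathEntityProcessor"
            dataSource="myXmlFiles"
            url="${domain.pdb_code}-noatom.xml"
            forEach="/datablock">
      <field column="title" xpath="/datablock/structCategory/struct/title" />
    </entity>

Without the processor attribute, DIH falls back to SqlEntityProcessor, which treats the entity as a SQL one and here ended in the NullPointerException above.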
Posting Word documents
I am trying to post a Word document using the Solr post.jar file. When I attempt this, using a command-line interface, I get a fatal error. I have looked at the following resources: the Solr tutorial, docs, FAQ, and ExtractingRequestHandler. As near as I can tell, I have all the files in the proper place. Following is a portion of the error displayed in the cmd window:

C:\Solr\Apache~1\example\exampledocs>java -jar post.jar *.doc
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file BadNews.doc
SimplePostTool: FATAL: Solr returned an error: Unexpected_character__code_65533__0xfffd_in_prolog_expected___at_rowcol_unknownsoruce_11_javaioIOException_Unexpected_charater__code65533__0xfffd_in_prolog_expected___at_rowcol_unknownsource_11___at_orgapachesolrhandlerXMLLoaderloadXMLLoaderjava73___at_orgapahcesolrhandlerContentStreamHandlerBasehandlerRequrestBodyContentStreamHandlerBasejava54___...

There is more, and if needed I will be happy to post all of it. Here is the information that posted into the log file:

127.0.0.1 - - [30/07/2009:15:20:09 +] "POST /solr/update HTTP/1.1" 500 4011

Kevin Miller
Web Services
Re: Posting Word documents
Look again at ExtractingRequestHandler. I haven't looked at what post.jar does internally, but it probably doesn't work with ExtractingRequestHandler unless you can send other params as well. I would use curl, as the examples in the docs for ExtractingRequestHandler do. Or figure out if post.jar will work for you and use it correctly. What handler is '/update' mapped to? If it's not mapped to ExtractingRequestHandler then you have no hope of this working in any case. It looks to me like it's trying to process the file as Solr XML - which means you are not submitting it to ExtractingRequestHandler.

--
- Mark
http://www.lucidimagination.com

Kevin Miller wrote: [previous message in this thread quoted in full, snipped]
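For reference, a curl invocation in the style of the ExtractingRequestHandler docs; this assumes the handler is mapped to /update/extract in solrconfig.xml, and the literal.id value is made up:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
         -F "myfile=@BadNews.doc"

Tika parses the Word document on the server side, and literal.id supplies the uniqueKey value that a binary file can't carry itself.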
Re: How can i get lucene index format version information?
On Jul 30, 2009, at 9:19 AM, Licinio Fernández Maurelo wrote: i want to get the lucene index format version from solr web app (as luke do), i've tried looking for the info at luke handler response, but i havn't found this info the Luke request handler writes it out: indexInfo.add("version", reader.getVersion()); It appears in the index section near the top of the response. Erik
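So hitting the Luke handler at its default mapping and looking near the top of the response should show it, e.g.:

    http://localhost:8983/solr/admin/luke?numTerms=0

    <lst name="index">
      ...
      <long name="version">...</long>
      ...
    </lst>

Note this is the index version (a counter that changes as the index changes), which is related to but not the same thing as the on-disk index format version that Luke's GUI also displays.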
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote:
> Hi Andrew,
> your inner entity uses an XML type datasource. The default entity processor is the SQL one, however.
> For your inner entity, you have to specify the correct entity processor explicitly. You do that by adding the attribute "processor", and the value is the classname of the processor you want to use.
> e.g. processor="XPathEntityProcessor"

Thanks -- I was also missing a forEach expression -- in my case, just "/" since each XML file contains the information for no more than one document. However, I'm now getting a different exception:

30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda, chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception while reading xpaths for fields Processing Document # 1
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.LinkedList.entry(LinkedList.java:365)
at java.util.LinkedList.get(LinkedList.java:315)
at org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
at org.apache.solr.handler.dataimport.XPathRecordReader.<init>(XPathRecordReader.java:50)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
... 9 more

My data config now looks like this:

Thanks in advance, again :-) Andrew.

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741292.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: NullPointerException in DataImportHandler
On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"

The XPathEntityProcessor doesn't support that fancy of an xpath - it supports only a limited subset. Try /structCategory/struct/title perhaps?

Erik
Re: NullPointerException in DataImportHandler
Hi Andrew,

my experience with XPathEntityProcessor is non-existent. ;-) Just after a quick look at the method that throws the exception:

private void addField0(String xpath, String name, boolean multiValued, boolean isRecord) {
  List<String> paths = new LinkedList<String>(Arrays.asList(xpath.split("/")));
  if ("".equals(paths.get(0).trim()))
    paths.remove(0);
  rootNode.build(paths, name, multiValued, isRecord);
}

and your forEach attribute value in combination with the xpath:

> forEach="/">
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />

I would guess that the double slash at the beginning is not working with your forEach regex. I don't know whether this is something the processor should expect and handle correctly or whether you have to take care of it in your configuration.

Cheers,
Chantal

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]
Re: SOLR deleted almost everything?
Right, I figured that that's how it should have sorted... which is why I did a range from 0 to 200. That should have worked for my example, but it removed things over 200, which even with lexical sorting seems invalid. What's left are things like: 998914. Now, obviously that is expected, as it starts with a number over 2, but why would things like 2165979 be deleted when that is lexically after 200?

Unless... oh man, I hope I didn't put an extra zero in there by accident!!

** checking .bash_history...

Oh crap... I ran it between 0 and 7 at some point. Sigh. Thanks for the help!

-Reece

On Thu, Jul 30, 2009 at 10:08 AM, Erik Hatcher wrote: [previous messages in this thread quoted in full, snipped]
Re: NullPointerException in DataImportHandler
Erik Hatcher wrote:
> The XPathEntityProcessor doesn't support that fancy of an xpath - it supports only a limited subset. Try /structCategory/struct/title perhaps?

Sadly not... I tried with: (full path from root) and the same IndexOutOfBounds error each time. Doesn't it use javax.xml then? I was using the complex local-name expressions to make it namespace-agnostic -- is it agnostic anyway?

Thanks, Andrew.

--
View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741696.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: NullPointerException in DataImportHandler
Chantal Ackermann wrote: > > > my experience with XPathEntityProcessor is non-existent. ;-) > > Don't worry -- your hints put me on the right track :-) I got it working with: Now, to get it to ignore missing files without an error... Hmm... Cheers, Andrew. -- View this message in context: http://www.nabble.com/NullPointerException-in-DataImportHandler-tp24739580p24741772.html Sent from the Solr - User mailing list archive at Nabble.com.
Minimum facet length?
Hi, I am exploring the faceted search results of Solr. My query is like this:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1, a, of, 2, 3, 4: one-letter/number occurrences in my documents. It's not really useful, since all the documents have some free-floating single-digit numbers. Is there a way to restrict the word frequency results for a facet based on the length, so I can set it to > 3, or is there a better way?

thanks, Darren
Re: NullPointerException in DataImportHandler
On Jul 30, 2009, at 12:19 PM, Andrew Clegg wrote: Don't worry -- your hints put me on the right track :-) I got it working with: Now, to get it to ignore missing files without an error... Hmm... onError="skip" or abort, or continue Erik
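That attribute goes on the entity element itself; mirroring the entity discussed earlier in the thread (the field column and xpath are illustrative), a sketch might be:

    <entity name="xmlfile" processor="XPathEntityProcessor"
            url="${domain.pdb_code}-noatom.xml" forEach="/"
            onError="skip">
      <field column="title" xpath="/structCategory/struct/title" />
    </entity>

Per the DIH docs, "skip" drops the document being built when the error occurs, "continue" proceeds as if the error had not happened, and "abort" (the default) stops the import.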
Re: NullPointerException in DataImportHandler
It's very easy to write your own entity processor. At least, that is my experience with extending the SqlEntityProcessor to my needs. So, maybe you'd be better off subclassing the XPath processor and handling the xpath in a way that lets you keep your configuration straightforward.

Andrew Clegg schrieb: [previous message in this thread quoted in full, snipped]

--
Chantal Ackermann
Consultant, b.telligent GmbH & Co. KG
Re: update some index documents after indexing process is done with DIH
Hoss, I see what you mean. I am trying to implement a custom UpdateProcessor following: http://wiki.apache.org/solr/UpdateRequestProcessor

What is confusing me now is that I have to implement my logic in processCommit, as you said:

>> you'll still need the "double commit" (once so you can see the
>> main changes, and once so the rest of the world can see your
>> modifications) but you can execute them both directly in your
>> processCommit(CommitUpdateCommand)

I have noticed that in processAdd you have access to the concrete SolrInputDocument you are going to add:

SolrInputDocument doc = cmd.getSolrInputDocument();

But in processCommit, while I can get the IndexReader via the core, I still don't know how to get the IndexWriter and SolrInputDocuments in there. My idea is to do something like:

@Override
public void processCommit(CommitUpdateCommand cmd) throws IOException {
  // first commit, so I can see the modifications
  // open and iterate over the reader and create a list of SolrDocuments
  // close the reader
  // open a writer and update the docs in the list
  // close the writer; second commit shows my changes to the world
  if (next != null) next.processCommit(cmd);
}

As I understood the process, the commit command will be sent to DirectUpdateHandler2, which will actually perform the commit, via the UpdateRequestProcessor chain. Am I on the right track? I haven't dealt with a custom UpdateProcessor doing something after a commit is executed, so I am a bit confused... Thanks in advance.

hossman wrote:
> This thread all sounds really kludgy ... among other things the
> newSearcher listener is going to need to somehow keep track of when it
> was called as a result of a "real" commit, vs when it was called as the
> result of a commit it itself triggered to make changes.
>
> wouldn't an easier place to implement this logic be in an UpdateProcessor?
> you'll still need the "double commit" (once so you can see the
> main changes, and once so the rest of the world can see your
> modifications) but you can execute them both directly in your
> processCommit(CommitUpdateCommand) method (so you don't have to worry
> about being able to tell them apart)
>
> : Date: Thu, 30 Jul 2009 10:14:16 +0530
> : From: Noble Paul നോബിള് नोब्ळ्
> : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com
> : To: solr-user@lucene.apache.org
> : Subject: Re: update some index documents after indexing process is done with DIH
> :
> : If you make your EventListener implement SolrCoreAware you can get
> : hold of the core on inform(). Use that to get hold of the
> : SolrIndexWriter.
> :
> : On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese wrote:
> : > From the newSearcher(..) of a CustomEventListener which extends
> : > AbstractSolrEventListener I can access the SolrIndexSearcher and all core
> : > properties, but can't get a SolrIndexWriter. Do you know how I can get a
> : > SolrIndexWriter from there? This way I would be able to modify the documents
> : > (I need to modify them depending on values of other documents; that's why I
> : > can't do it with DIH delta-import).
> : > Thanks in advance
> : >
> : > Noble Paul നോബിള് नोब्ळ्-2 wrote:
> : >> On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese wrote:
> : >>> That really sounds the best way to reach my goal.
> : >>> How could I invoke a
> : >>> listener from the newSearcher? Would it be something like:
> : >>>
> : >>> <listener event="newSearcher" class="MyCustomListener">
> : >>>   <arr name="queries">
> : >>>     <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
> : >>>     <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
> : >>>     <lst> <str name="q">static newSearcher warming query from solrconfig.xml</str> </lst>
> : >>>   </arr>
> : >>> </listener>
> : >>>
> : >>> And MyCustomListener would be the class who opens the reader:
> : >>>
> : >>> RefCounted<SolrIndexSearcher> searchHolder = null;
> : >>> try {
> : >>>   searchHolder = dataImporter.getCore().getSearcher();
> : >>>   IndexReader reader = searchHolder.get().getReader();
> : >>>   // Here I iterate over the reader doing document modifications
> : >>> } finally {
> : >>>   if (searchHolder != null) searchHolder.decref();
> : >>> }
> : >>> } catch (Exception ex) {
> : >>>   LOG.info("error");
> : >>> }
> : >>
> : >> you may not be able to access the DIH API from a newSearcher event.
> : >> But the API would give you the searcher directly as a method
> : >> parameter.
> : >>>
> : >>> Finally, to access documents and add fields to some of them, I have
> : >>> thought of using the SolrDocument classes. Can you please point me to where
> : >>> something similar is done in the Solr source (I mean creation of
> : >>> SolrDocuments and conversion of them to proper Lucene documents)?
> : >>>
> : >>> Does this way of reaching the goal make sense?
> : >>>
> : >>> Thanks in advance
RE: Range Query question
Thanks for the reply; I had thought the solution would be altering the XML.

Ensdorf Ken wrote:
>
>> The problem is that the indexed form of this XML is flattened, so the
>> entity has 2 garage names, 2 min values and 2 max values, but the grouping
>> between the garage name and its min and max values is lost. The danger is
>> that we end up doing a comparison of the min-of-the-mins and the
>> max-of-the-maxes, which tells us that a car is available in the price range.
>> That may not be true if garage1 has all cars below our search range and
>> garage2 has all cars above our search range, e.g. if our search range is
>> 5000-6000 then we should get no match.
>
> You could index each garage-car pairing as a separate document, embedding
> all the necessary information you need for searching.
>
> e.g. a document carrying Ford / Ka / garage1 / 2000 / 4000
>
- Matt Beaumont mibe...@yahoo.co.uk
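To make the quoted suggestion concrete, here is a sketch of the per-pairing documents as Solr add XML; the field names (and the second garage's prices) are invented for illustration, since the markup of the original example was lost:

    <add>
      <doc>
        <field name="make">Ford</field>
        <field name="model">Ka</field>
        <field name="garage">garage1</field>
        <field name="min_price">2000</field>
        <field name="max_price">4000</field>
      </doc>
      <doc>
        <field name="make">Ford</field>
        <field name="model">Ka</field>
        <field name="garage">garage2</field>
        <field name="min_price">7000</field>
        <field name="max_price">9000</field>
      </doc>
    </add>

A range query such as min_price:[* TO 6000] AND max_price:[5000 TO *] then evaluates each garage's range independently, so the 5000-6000 search correctly matches neither document.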
Re: Posting data in JSON
On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé wrote:
> Hi All,
>
> I'm wondering if it's possible to post documents to solr in JSON format.
>
> JSON is much faster than XML to get the query results, so I think
> it'd be great to be able to post data in JSON to speed up the indexing
> and lower the network load.

If you are using Java/Solrj on 1.4 (trunk), you can use the binary format, which is extremely compact and efficient. Note that with Solr/Solrj 1.3, binary became the default response format for Solrj clients.

-- Regards, Shalin Shekhar Mangar.
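A sketch of what that might look like with SolrJ from 1.4 trunk; the URL and field names are illustrative, and BinaryRequestWriter is assumed to be on the classpath:

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryIndexingExample {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Send update requests in the compact binary format instead of XML.
        server.setRequestWriter(new BinaryRequestWriter());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "example");
        server.add(doc);
        server.commit();
      }
    }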
Re: Minimum facet length?
On Thu, Jul 30, 2009 at 9:53 PM, wrote:
> Hi,
> I am exploring the faceted search results of Solr. My query is like this.
>
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>
> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
> 1-letter/number occurrences in my documents. It's not really useful since
> all the documents have some free-floating single-digit numbers.
>
> Is there a way to restrict the word frequency results for a facet based on
> the length, so I can set it to > 3, or is there a better way?

Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful and/or efficient. Typically faceting is done on untokenized fields (such as the string type).

-- Regards, Shalin Shekhar Mangar.
Re: Minimum facet length?
On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:

On Thu, Jul 30, 2009 at 9:53 PM, wrote:

Hi, I am exploring the faceted search results of Solr. My query is like this.

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 -- 1-letter/number occurrences in my documents. It's not really useful since all the documents have some free-floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet based on the length, so I can set it to > 3, or is there a better way?

Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful and/or efficient. Typically faceting is done on untokenized fields (such as the string type).

I think what was meant by > 3 was if faceting only returned terms of length greater than 3, not count.

You could copyField your text field to another field, set the analyzer to include a LengthFilterFactory with a minimum length specified, and also have other analysis tweaks to have numbers and other stop words removed.

Erik
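A sketch of that suggestion in schema.xml terms; the type and field names are invented, and the filter list would need to match the rest of the existing analysis chain:

    <fieldType name="textFacet" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- drop tokens shorter than 4 characters -->
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
      </analyzer>
    </fieldType>

    <field name="text_facet" type="textFacet" indexed="true" stored="false"/>
    <copyField source="text" dest="text_facet"/>

Faceting on text_facet instead of text would then skip the short tokens entirely.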
Re: Minimum facet length?
On Thu, Jul 30, 2009 at 10:35 PM, Erik Hatcher wrote:
>
> On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
>
>> On Thu, Jul 30, 2009 at 9:53 PM, wrote:
>>
>>> Hi,
>>> I am exploring the faceted search results of Solr. My query is like this.
>>>
>>> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
>>>
>>> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
>>> 1-letter/number occurrences in my documents. It's not really useful since
>>> all the documents have some free-floating single-digit numbers.
>>>
>>> Is there a way to restrict the word frequency results for a facet based
>>> on the length, so I can set it to > 3, or is there a better way?
>>
>> Yes, you can specify facet.mincount=3 to return only those terms present
>> in at least 3 documents. On a related note, a tokenized field (such as the
>> text type in the example schema) will create a large number of unique terms.
>> Faceting on such a field may not be very useful and/or efficient. Typically
>> faceting is done on untokenized fields (such as the string type).
>
> I think what was meant by > 3 was if faceting only returned terms of length
> greater than 3, not count.

Ah, sorry. I was too fast to reply.

-- Regards, Shalin Shekhar Mangar.
Re: How can i get lucene index format version information?
: > i want to get the lucene index format version from solr web app (as
: the Luke request handler writes it out:
:
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in "i have added docs to the index, so the version number has changed"). The question is about the format version (as in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has changed")

I'm not sure how Luke gets that ... it's not exposed via a public API on an IndexReader.

Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do the trick; but i'm not sure how that would interact with customized IndexReader implementations. i suppose we could always make it non-fatal if it throws an exception (just print the exception message in place of the number)

anybody want to submit a patch to add this to the LukeRequestHandler?

-Hoss
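For reference, the non-fatal approach described above might look something like this; the helper name is invented, and how it behaves with customized IndexReader implementations is exactly the open question raised here:

    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;

    // Hypothetical helper for LukeRequestHandler: report the version read
    // from the segments file, or the exception message in place of the
    // number if reading it fails.
    static Object readFormatVersion(Directory dir) {
      try {
        return SegmentInfos.readCurrentVersion(dir);
      } catch (Exception e) {
        return e.toString();
      }
    }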
Re: How can i get lucene index format version information?
I think the properties page in the admin UI lists the Lucene version, but I don't have a live server to check that on at this instant.

wunder

On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:

: > i want to get the lucene index format version from solr web app (as
: the Luke request handler writes it out:
:
:    indexInfo.add("version", reader.getVersion());

that's the index version (as in "i have added docs to the index, so the version number has changed"). The question is about the format version (as in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has changed")

I'm not sure how Luke gets that ... it's not exposed via a public API on an IndexReader.

Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do the trick; but i'm not sure how that would interact with customized IndexReader implementations. i suppose we could always make it non-fatal if it throws an exception (just print the exception message in place of the number)

anybody want to submit a patch to add this to the LukeRequestHandler?

-Hoss
Re: Posting data in JSON
Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar : > On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé > wrote: >> >> Hi All, >> >> I'm wondering if it's possible to post documents to solr in JSON format. >> >> JSON is much faster than XML to get the queries results, so I think >> it'd be great to be able to post data in JSON to speed up the indexing >> and lower the network load. > > If you are using Java,Solrj on 1.4 (trunk), you can use the binary format > which is extremely compact and efficient. Note that with Solr/Solrj 1.3, > binary became the default response format for Solrj clients. > > -- > Regards, > Shalin Shekhar Mangar. > -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Mailing list: Change the reply-to?
Hi all,

I don't know if it's the same for everyone, but when I use the reply function of my mail agent, it sets the recipient to the user who sent the message, and not the mailing list.

So it's quite annoying because I have to change the recipient each time I reply to someone on the list.

Can the list admins fix this issue?

Cheers!

J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Mailing list: Change the reply-to?
On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote: Hi all, I don't know if it does the same from everyone, but when I use the reply function of my mail agent, it sets the recipient to the user who sent the message, and not the mailing list. So it's quite annoying cause I have to change the recipient each time I reply to someone on the list. Can the list admins fix this issue ? All my replies go to the list. From your message, the header says: Reply-To: solr-user@lucene.apache.org Erik
Re: Mailing list: Change the reply-to?
2009/7/30 Erik Hatcher :
>
> On Jul 30, 2009, at 1:44 PM, Jérôme Etévé wrote:
>
>> Hi all,
>>
>> I don't know if it's the same for everyone, but when I use the
>> reply function of my mail agent, it sets the recipient to the user who
>> sent the message, and not the mailing list.
>>
>> So it's quite annoying because I have to change the recipient each time
>> I reply to someone on the list.
>>
>> Can the list admins fix this issue?
>
> All my replies go to the list.
>
> From your message, the header says:
>
> Reply-To: solr-user@lucene.apache.org
>
>    Erik

It works with your messages. It might depend on mail agents.

Jerome.

-- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Posting data in JSON
check: https://issues.apache.org/jira/browse/SOLR-945 this will not likely make it into 1.4 On Jul 30, 2009, at 1:41 PM, Jérôme Etévé wrote: Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar : On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé wrote: Hi All, I'm wondering if it's possible to post documents to solr in JSON format. JSON is much faster than XML to get the queries results, so I think it'd be great to be able to post data in JSON to speed up the indexing and lower the network load. If you are using Java,Solrj on 1.4 (trunk), you can use the binary format which is extremely compact and efficient. Note that with Solr/Solrj 1.3, binary became the default response format for Solrj clients. -- Regards, Shalin Shekhar Mangar. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Minimum facet length?
Hi Erik,

Thanks for the tip. Hmm, well that's a good point, or maybe I will just do the word filtering upfront and store it separately now that I think about it more.

Darren

On Thu, 2009-07-30 at 13:05 -0400, Erik Hatcher wrote:
> On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
>
> > On Thu, Jul 30, 2009 at 9:53 PM, wrote:
> >
> >> Hi,
> >> I am exploring the faceted search results of Solr. My query is like this.
> >>
> >> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
> >>
> >> If I don't use the prefix, I get back totals for words like 1,a,of,2,3,4 --
> >> 1-letter/number occurrences in my documents. It's not really useful since
> >> all the documents have some free-floating single-digit numbers.
> >>
> >> Is there a way to restrict the word frequency results for a facet based
> >> on the length, so I can set it to > 3, or is there a better way?
> >
> > Yes, you can specify facet.mincount=3 to return only those terms present
> > in at least 3 documents. On a related note, a tokenized field (such as the
> > text type in the example schema) will create a large number of unique terms.
> > Faceting on such a field may not be very useful and/or efficient. Typically
> > faceting is done on untokenized fields (such as the string type).
>
> I think what was meant by > 3 was if faceting only returned terms of
> length greater than 3, not count.
>
> You could copyField your text field to another field, set the analyzer
> to include a LengthFilterFactory with a minimum length specified, and
> also have other analysis tweaks to have numbers and other stop words
> removed.
>
> Erik
Re: Mailing list: Change the reply-to?
: I don't know if it's the same for everyone, but when I use the
: reply function of my mail agent, it sets the recipient to the user who
: sent the message, and not the mailing list.
:
: So it's quite annoying because I have to change the recipient each time
: I reply to someone on the list.
:
: Can the list admins fix this issue?

The list software always adds a "Reply-To" header indicating that replies should be sent to the list. It does *not* remove any existing Reply-To headers that the original sender may have included -- it does this because it trusts that the original sender had a reason for putting it there (ie: when someone off list, like the apachecon coordinators, sends an announcement and the moderators let it through)

It's mail-client dependent as to what to do when you reply to a message like that -- yours apparently just picks one (and sometimes it's not the list); most either reply to both, or ask the user "do you want to reply to all"

-Hoss
Reasonable number of maxWarming searchers
Hi All,

I'm planning to have a certain number of processes posting independently to a solr instance. This instance will act solely as a master instance; no clients query it.

Is there a problem if I set maxWarmingSearchers to something like 30 or 40?

Also, how do I disable the cache warming? Is setting the autowarmCounts to 0 enough?

Regards, Jerome.

-- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
Re: Reasonable number of maxWarming searchers
I recommend, in this case, that you use Solr's autocommit feature (see solrconfig.xml) rather than having your indexing clients issue their own commits. Overlapped searcher warming is just going to be too much of a hit on RAM, and generally unnecessary with autocommit. Erik On Jul 30, 2009, at 2:28 PM, Jérôme Etévé wrote: Hi All, I'm planning to have a certain number of processes posting independently in a solr instance. This instance will solely act as a master instance. No clients queries on it. Is there a problem if i set maxWarmingSearchers to something like 30 or 40? Also, how do I disable the cache warming? Is setting autowarmCount's to 0 enough? Regards, Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
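A sketch of the solrconfig.xml autocommit setup being recommended; the thresholds here are illustrative only:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
        <maxTime>60000</maxTime>  <!-- or once the oldest pending doc is this many ms old -->
      </autoCommit>
    </updateHandler>

With this in place the indexing clients just add documents, and Solr decides when to commit, so searcher warming never piles up.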
What does "showItems" config mean on fieldValueCache mean?
What's the effect of showItems attribute on the fieldValueCache in Solr 1.4? -- Stephen Duncan Jr www.stephenduncanjr.com
How to get a stack trace
Hello, I'm a new user of solr but I have worked a bit with Lucene before. I get some out of memory exception when optimizing the index through Solr and I would like to find out why. However, the only message I get on standard output is: Jul 30, 2009 9:20:22 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space Is there a way to get a stack trace for this exception? I had a look into the java.util.logging options and didn't find anything. My solr runs in some standard configuration inside jetty. Any suggestion would be appreciated. Thanks, nicolae
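Heap-space errors often surface without a useful stack trace. One general way to investigate (standard HotSpot JVM flags, not Solr-specific, and not from this thread) is to give the optimize more headroom and have the JVM dump the heap when the error is thrown, e.g. with the stock Jetty setup:

    java -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar start.jar

The resulting dump can then be examined with a heap analyzer to see what was filling memory during the optimize.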
Problem with retrieving field from database using DIH
Hello all,

I've been having this issue for a while now. I am indexing a Sybase database. Everything is fantastic, except that there is 1 column that I can never get back. I don't have direct database access via a Sybase client, but I was able to extract the data using some Java code.

The field is essentially a Last Modified field. In the DB I believe that it is of type long. In the Java program that I have, I am able to retrieve the data that is in that column and put it in a variable of type Long. This is not the case in Solr, however. I set the field in the schema as required to see why the data is never stored. This is what I get in the Tomcat logs:

org.apache.solr.common.SolrException: Document [00069391] missing required field: lastModified
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:292)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:67)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:276)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:373)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:374)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)

From what I can gather, it is not finding the data and/or column, and thus cannot populate the required field. However, the data is there, which I was able to prove outside of Solr.

Is there a way to generate more descriptive logs for this? I am completely lost. I hit this problem a few months ago but I was never able to resolve it. Any help on this will be much appreciated. BTW, Solr was successful in retrieving data from other columns in the same table...

Thanks
Re: What does "showItems" config mean on fieldValueCache mean?
On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:

What's the effect of showItems attribute on the fieldValueCache in Solr 1.4?

Just outputs details of the last accessed items from the cache in the stats display.

    Erik

    if (showItems != 0) {
      Map items = cache.getLatestAccessedItems(showItems == -1 ? Integer.MAX_VALUE : showItems);
      for (Map.Entry e : (Set<Map.Entry>) items.entrySet()) {
        Object k = e.getKey();
        Object v = e.getValue();

        String ks = "item_" + k;
        String vs = v.toString();
        lst.add(ks, vs);
      }
    }
Solr/Lucene performance differences on Mac OS X running Tiger vs. Leopard ?
As far as our NOC guys know the machines are approximately the same, aside from the OS. The Leopard machine is running the default 1.5 JVM. And it's possible that some other application or config issues is to blame. Nobody's "blaming" the OS or Lucene, we're just asking around. Searches on Google haven't turned up any reports, so I'm suspecting the issue lies elsewhere. Also I've run on Leopard for months without any performance issues, though I really don't tax anything on my workstation. -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
Re: What does "showItems" config mean on fieldValueCache mean?
On Thu, Jul 30, 2009 at 4:18 PM, Erik Hatcher wrote:
>
> On Jul 30, 2009, at 3:32 PM, Stephen Duncan Jr wrote:
>
>> What's the effect of showItems attribute on the fieldValueCache in Solr
>> 1.4?
>
> Just outputs details of the last accessed items from the cache in the stats
> display.
>
>    Erik
>
>    if (showItems != 0) {
>      Map items = cache.getLatestAccessedItems(showItems == -1 ? Integer.MAX_VALUE : showItems);
>      for (Map.Entry e : (Set<Map.Entry>) items.entrySet()) {
>        Object k = e.getKey();
>        Object v = e.getValue();
>
>        String ks = "item_" + k;
>        String vs = v.toString();
>        lst.add(ks, vs);
>      }
>    }

Makes sense. Thanks!

-- Stephen Duncan Jr www.stephenduncanjr.com
Re: Problem with retrieving field from database using DIH
On Fri, Jul 31, 2009 at 1:43 AM, ahammad wrote: > From what I can gather, it is not finding the data and/or column, and thus > cannot populate the required field. However, the data is there, which I was > able to prove outside of Solr. > > Is there a way to generate more descriptive logs for this? I am completely > lost. I hit this problem a few months ago but I was never able to resolve > it. Any help on this will be much appreciated. > Can you try using the debug mode and see what your sql query is returning? You can either use the /admin/dataimport.jsp or add a debug=on&verbose=true parameter to the import. You should probably limit the number of documents to be indexed by adding rows=X to the full-import command otherwise the response would be huge. -- Regards, Shalin Shekhar Mangar.
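Concretely, assuming the handler is registered at /dataimport, such a debug run might be invoked as:

    http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=true&rows=10

The verbose response then shows, per entity, the SQL that was run and the rows it returned, which should reveal whether the lastModified column is coming back at all.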
Re: µTorrent indexed as µTorrent
Thanks, Robert. That's exactly what my problem was. Things work fine after I make sure that all my processing (index and query) is using UTF-8.

FYI, it took me a while to discover that SolrJ by default uses a GET request for query, which uses ISO-8859-1. I had to explicitly use a POST to do the query in SolrJ in order to get it to use UTF-8.

Bill

On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir wrote:
> Bill, somewhere in the process I think you might be treating your
> UTF-8 text as ISO-8859-1.
>
> Your character: 00B5 (µ)
> Bits: 10110101
>
> UTF8-encoded: 11000010 10110101
>
> If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
> file or wrong url encoding) then it looks like:
> 0xC2 (Â) followed by 0xB5 (µ)
>
> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au wrote:
> > I am using SolrJ to index the word µTorrent. After a commit I was not able
> > to query for it. It turns out that the document in my Solr index contains
> > the word µTorrent instead of µTorrent. Any one has any idea what's going
> > on???
> >
> > Bill
>
> --
> Robert Muir
> rcm...@gmail.com
Re: µTorrent indexed as µTorrent
On Thu, Jul 30, 2009 at 6:34 PM, Bill Au wrote:
> FYI, it took me a while to discover that SolrJ by default uses a GET request
> for query, which uses ISO-8859-1.

That depends on the servlet container. SolrJ GET requests are sent in UTF-8. Some servlet containers such as Tomcat need extra configuration to treat URLs as UTF-8 instead of latin-1, but the standard http://www.ietf.org/rfc/rfc3986.txt clearly specifies UTF-8.

To test the servlet container configuration, check out example/exampledocs/test_utf8.sh

-Yonik http://www.lucidimagination.com

> I had to explicitly use a POST to do the query in
> SolrJ in order to get it to use UTF-8.
>
> Bill
>
> On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir wrote:
>
>> Bill, somewhere in the process I think you might be treating your
>> UTF-8 text as ISO-8859-1.
>>
>> Your character: 00B5 (µ)
>> Bits: 10110101
>>
>> UTF8-encoded: 11000010 10110101
>>
>> If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
>> file or wrong url encoding) then it looks like:
>> 0xC2 (Â) followed by 0xB5 (µ)
>>
>> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au wrote:
>> > I am using SolrJ to index the word µTorrent. After a commit I was not able
>> > to query for it. It turns out that the document in my Solr index contains
>> > the word µTorrent instead of µTorrent. Any one has any idea what's going
>> > on???
>> >
>> > Bill
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
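For Tomcat, the extra configuration mentioned above is the URIEncoding attribute on the HTTP connector in server.xml, for example (other connector attributes elided):

    <Connector port="8080" URIEncoding="UTF-8" ... />

Without it, Tomcat decodes the query string of GET requests as latin-1, which produces exactly the doubled-byte symptom described in this thread.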
facet sorting by index on sint fields
Hi,

I have a field in my schema specified using

    <field name="wordCount" type="sint" ... />

where "sint" is specified as follows (the default from schema.xml):

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

When I do a facet on this field using sort=index I always get the values back in lexicographic order. E.g. adding this to a query string...

    facet=true&facet.field=wordCount&f.wordCount.facet.sort=index

gives me 5 2 6 ...

Is this a current limitation of solr faceting or am I missing a configuration step somewhere? I couldn't find any notes in the docs about this.

Cheers, Simon
Re: How can i get lucene index format version information?
Check the system request handler:

    http://localhost:8983/solr/admin/system

Should look something like this:

    <str name="solr-spec-version">1.3.0.2009.07.28.10.39.42</str>
    <str name="solr-impl-version">1.4-dev 797693M - jayhill - 2009-07-28 10:39:42</str>
    <str name="lucene-spec-version">2.9-dev</str>
    <str name="lucene-impl-version">2.9-dev 794238 - 2009-07-15 18:05:08</str>

-Jay

On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wrote:
> I think the properties page in the admin UI lists the Lucene version, but I
> don't have a live server to check that on at this instant.
>
> wunder
>
> On Jul 30, 2009, at 10:26 AM, Chris Hostetter wrote:
>
>> : > i want to get the lucene index format version from solr web app (as
>> : the Luke request handler writes it out:
>> :
>> :    indexInfo.add("version", reader.getVersion());
>>
>> that's the index version (as in "i have added docs to the index, so the
>> version number has changed"). The question is about the format version (as
>> in: "i have upgraded Lucene from 2.1 to 2.3, so the index format has
>> changed")
>>
>> I'm not sure how Luke gets that ... it's not exposed via a public API on
>> an IndexReader.
>>
>> Hmm... SegmentInfos.readCurrentVersion(Directory) seems like it would do
>> the trick; but i'm not sure how that would interact with customized
>> IndexReader implementations. i suppose we could always make it non-fatal
>> if it throws an exception (just print the exception message in place of
>> the number)
>>
>> anybody want to submit a patch to add this to the LukeRequestHandler?
>>
>> -Hoss
Re: query in solr lucene
I tried this but it didn't work...

Regards,
Sushan

At 12:37 AM 7/30/2009, Avlesh Singh wrote:
> You may index your data using a delimiter, like $my-field-content$. While
> searching, perform a phrase query with the leading and trailing "$" appended
> to the query string.
>
> Cheers
> Avlesh
>
> On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta wrote:
>
> > I tried using AND, but it even provided me doc 3, which was not required.
> >
> > Hence my problem still persists...
> >
> > regards,
> > Sushan
> >
> > At 06:59 AM 7/29/2009, Avlesh Singh wrote:
> >
> >> >
> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2 as
> >> > I read it.
> >> >
> >> Sorry, my bad. I did not read properly before replying.
> >>
> >> Cheers
> >> Avlesh
> >>
> >> On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson wrote:
> >>
> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2 as
> >> > I read it.
> >> >
> >> > You might have some joy with KeywordAnalyzer, which does
> >> > not break the incoming stream up into tokens. You have to be
> >> > careful, though, because it also won't fold the case, so 'Hello'
> >> > would not match 'hello'.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh wrote:
> >> >
> >> > > You should perform a PhraseQuery on the required field.
> >> > > Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello
> >> > > how are you sushan" would work for you.
> >> > >
> >> > > Cheers
> >> > > Avlesh
> >> > >
> >> > > 2009/7/28 Gérard Dupont
> >> > >
> >> > > > Hi Sushan,
> >> > > >
> >> > > > I'm not an expert of Solr, just a beginner, but it appears to me that
> >> > > > you may have the default 'OR' combination of keywords, and that would
> >> > > > explain this behavior. Try to modify the configuration for an 'AND'
> >> > > > combination.
> >> > > >
> >> > > > cheers
> >> > > >
> >> > > > On Tue, Jul 28, 2009 at 16:49, Sushan Rungta wrote:
> >> > > >
> >> > > > > I am extremely sorry for responding late, as I was ill for the past
> >> > > > > few days.
> >> > > > >
> >> > > > > My problem is explained below with an example:
> >> > > > >
> >> > > > > I have three documents with the following content:
> >> > > > >
> >> > > > > 1. Hello how are you
> >> > > > > 2. Hello how are you sushan
> >> > > > > 3. Hello how are you sushan. I am fine.
> >> > > > >
> >> > > > > When I search for the query "Hello how are you sushan", I should
> >> > > > > only get document 2 in my result.
> >> > > > >
> >> > > > > I hope this will give you all a better insight into my problem.
> >> > > > >
> >> > > > > regards,
> >> > > > >
> >> > > > > Sushan Rungta
> >> > > >
> >> > > > --
> >> > > > Gérard Dupont
> >> > > > Information Processing Control and Cognition (IPCC) - EADS DS
> >> > > > http://weblab-project.org
> >> > > >
> >> > > > Document & Learning team - LITIS Laboratory
Using DIH for parallel indexing
I am using Solr 1.3 and have a few questions regarding DIH:

1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file?

2. I am indexing some 2 million database records using DIH with 4-5 nested entities (just one level). These subqueries are highly optimized and cannot be avoided. Since DIH processes records sequentially, it takes a lot of time (approximately 3 hours) to rebuild the indexes. My question is: can I use DIH in some way so that indexing can be carried out in parallel?

3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3 ...) with different data-config files inside the same core and run full-import on each of them at the same time? Are the indexes created by each of these (inside the same data directory) merged?

Due to my lack of knowledge of Lucene/Solr internals, some of these questions might be funny.

Cheers
Avlesh
Re: query in solr lucene
What field type are you using? What kind of filters have you applied on the field? The easiest way to make it work is to use a "string" field.

Cheers
Avlesh

On Fri, Jul 31, 2009 at 11:09 AM, Sushan Rungta wrote:

> I tried this but it didn't work...
>
> Regards,
> Sushan
>
> At 12:37 AM 7/30/2009, Avlesh Singh wrote:
>
>> You may index your data using a delimiter, like $my-field-content$. While
>> searching, perform a phrase query with the leading and trailing "$" appended
>> to the query string.
>>
>> Cheers
>> Avlesh
>>
>> On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta wrote:
>>
>> > I tried using AND, but it even provided me doc 3, which was not required.
>> >
>> > Hence my problem still persists...
>> >
>> > regards,
>> > Sushan
>> >
>> > At 06:59 AM 7/29/2009, Avlesh Singh wrote:
>> >
>> >> >
>> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>> >> > as I read it.
>> >> >
>> >> Sorry, my bad. I did not read properly before replying.
>> >>
>> >> Cheers
>> >> Avlesh
>> >>
>> >> On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson wrote:
>> >>
>> >> > No, a phrase query would match docs 2 and 3. Sushan only wants doc 2
>> >> > as I read it.
>> >> >
>> >> > You might have some joy with KeywordAnalyzer, which does
>> >> > not break the incoming stream up into tokens. You have to be
>> >> > careful, though, because it also won't fold the case, so 'Hello'
>> >> > would not match 'hello'.
>> >> >
>> >> > Best
>> >> > Erick
>> >> >
>> >> > On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh wrote:
>> >> >
>> >> > > You should perform a PhraseQuery on the required field.
>> >> > > Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello
>> >> > > how are you sushan" would work for you.
>> >> > >
>> >> > > Cheers
>> >> > > Avlesh
>> >> > >
>> >> > > 2009/7/28 Gérard Dupont
>> >> > >
>> >> > > > Hi Sushan,
>> >> > > >
>> >> > > > I'm not an expert of Solr, just a beginner, but it appears to me
>> >> > > > that you may have the default 'OR' combination of keywords, and that
>> >> > > > would explain this behavior. Try to modify the configuration for an
>> >> > > > 'AND' combination.
>> >> > > >
>> >> > > > cheers
>> >> > > >
>> >> > > > On Tue, Jul 28, 2009 at 16:49, Sushan Rungta wrote:
>> >> > > >
>> >> > > > > I am extremely sorry for responding late, as I was ill for the
>> >> > > > > past few days.
>> >> > > > >
>> >> > > > > My problem is explained below with an example:
>> >> > > > >
>> >> > > > > I have three documents with the following content:
>> >> > > > >
>> >> > > > > 1. Hello how are you
>> >> > > > > 2. Hello how are you sushan
>> >> > > > > 3. Hello how are you sushan. I am fine.
>> >> > > > >
>> >> > > > > When I search for the query "Hello how are you sushan", I should
>> >> > > > > only get document 2 in my result.
>> >> > > > >
>> >> > > > > I hope this will give you all a better insight into my problem.
>> >> > > > >
>> >> > > > > regards,
>> >> > > > >
>> >> > > > > Sushan Rungta
>> >> > > >
>> >> > > > --
>> >> > > > Gérard Dupont
>> >> > > > Information Processing Control and Cognition (IPCC) - EADS DS
>> >> > > > http://weblab-project.org
>> >> > > >
>> >> > > > Document & Learning team - LITIS Laboratory
Re: Using DIH for parallel indexing
On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote:
> I am using Solr 1.3 and have a few questions regarding DIH:
>
> 1. Can I pass parameters to DIH and be able to use them inside the
> "query" attribute of an entity inside the data-config file?
> 2. I am indexing some 2 million database records using DIH with 4-5
> nested entities (just one level). These subqueries are highly optimized and
> cannot be avoided. Since DIH processes records sequentially, it takes a lot
> of time (approximately 3 hours) to rebuild the indexes. My question is:
> can I use DIH in some way so that indexing can be carried out in parallel?
> 3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3
> ...) with different data-config files inside the same core and run
> full-import on each of them at the same time? Are the indexes created by
> each of these (inside the same data directory) merged?

yes, it is possible to create multiple instances of DIH as you mentioned. The only drawback is that it would result in multiple commits.

All the data will be written to the same index together.

> Due to my lack of knowledge of Lucene/Solr internals, some of these
> questions might be funny.
>
> Cheers
> Avlesh

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
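For reference, registering several DIH instances in solrconfig.xml might look like this; the handler names and config file names are made up:

    <requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config1.xml</str>
      </lst>
    </requestHandler>
    <requestHandler name="/dataimport2" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config2.xml</str>
      </lst>
    </requestHandler>

Running full-import on each at the same time writes into the same index, with the multiple commits mentioned above.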
Re: NullPointerException in DataImportHandler
On Thu, Jul 30, 2009 at 9:45 PM, Andrew Clegg wrote:
>
> Erik Hatcher wrote:
>>
>> On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
>>>
>>> <entity ... url="${domain.pdb_code}-noatom.xml"
>>>   processor="XPathEntityProcessor" forEach="/">
>>>   <field ...
>>>     xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']" />
>>
>> The XPathEntityProcessor doesn't support that fancy of an xpath -- it
>> supports only a limited subset. Try /structCategory/struct/title perhaps?
>
> Sadly not...
>
> I tried with:
>
> <field ... xpath="/datablock/structCategory/struct/title" />
>
> (full path from root)
>
> and
>
> <field ... xpath="//structCategory/struct/title" />
>
> Same ArrayIndex error each time.
>
> Doesn't it use javax.xml then? I was using the complex local-name
> expressions to make it namespace-agnostic -- is it agnostic anyway?

It does not use javax.xml because that works on a DOM tree, which is not usable for large XML files. This only supports a subset of XPath. The supported syntax is given here:

http://wiki.apache.org/solr/DataImportHandler#head-5ced7c797f1014ef6e8326a34c23f541ebbaadf1-2

> Thanks,
>
> Andrew.

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Using DIH for parallel indexing
Thanks for the reply, Noble. A few questions are still open:

1. Can I pass parameters to DIH and be able to use them inside the "query" attribute of an entity inside the data-config file?
2. Can I use the same data-import-handler in some way so that indexing can be carried out in parallel?

Cheers
Avlesh

2009/7/31 Noble Paul നോബിള് नोब्ळ्
> On Fri, Jul 31, 2009 at 11:11 AM, Avlesh Singh wrote:
> > I am using Solr 1.3 and have a few questions regarding DIH:
> >
> > 1. Can I pass parameters to DIH and be able to use them inside the
> > "query" attribute of an entity inside the data-config file?
> > 2. I am indexing some 2 million database records using DIH with 4-5
> > nested entities (just one level). These subqueries are highly optimized
> > and cannot be avoided. Since DIH processes records sequentially, it takes
> > a lot of time (approximately 3 hours) to rebuild the indexes. My question
> > is: can I use DIH in some way so that indexing can be carried out in
> > parallel?
> > 3. What happens if I "register" multiple DIHs (like dih1, dih2, dih3
> > ...) with different data-config files inside the same core and run
> > full-import on each of them at the same time? Are the indexes created by
> > each of these (inside the same data directory) merged?
>
> yes, it is possible to create multiple instances of DIH as you
> mentioned. The only drawback is that it would result in multiple commits.
>
> All the data will be written to the same index together.
>
> > Due to my lack of knowledge of Lucene/Solr internals, some of these
> > questions might be funny.
> >
> > Cheers
> > Avlesh
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
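On the first open question: DIH does expose request parameters to the data-config as ${dataimporter.request.*} variables (at least on 1.4/trunk). A sketch, with invented entity, table, and parameter names:

    <entity name="item"
            query="select id, name from item
                   where category = '${dataimporter.request.category}'">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
    </entity>

The import would then be invoked with the parameter appended to the command, e.g. /dataimport?command=full-import&category=books.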
Limiting facets for huge data - setting indexed=false in schema.xml
Hello,

We are trying to get Solr to work for a really huge parts database. Details of the database:

- 55 million parts
- A total of 3700 properties (facets), though each record will not have a value for every property
- Most of these facets are defined as dynamic fields within the Solr index

We were getting really unacceptable response times while doing faceting/searches on an index created with this database. With only one user using the system, query times are in excess of 1 minute. With more users concurrently using the system, the response times are even higher.

We thought that by limiting the number of properties that are available for faceting, the performance could be improved. To test this, we enabled only 6 properties for faceting by setting indexed=true (in schema.xml) for only those properties. All other properties, which are defined as dynamic properties, had indexed=false. The observations after this change:

- Index size reduced by a meagre 5% only
- Performance did not improve. In fact, during the PSR run we observed that it degraded.

My questions:

- Will reducing the number of facets improve faceting and search performance?
- Is there a better way to reduce the number of facets?
- Will having a large number of properties defined as dynamic fields reduce performance?

Thank you.

Regards
Rahul
Recreating SOLR index after a schema change - without having to re-post the data
Hi,

We are using solr-server for a large data-set. We need some changes in the Solr schema.xml (a datatype change from integer to sint for a few fields). It turns out that the two datatypes (integer and sint) are incompatible, and hence we need to re-index Solr.

My question is: is there any way by which I can just re-create the index files for the existing data/documents in Solr, without having to re-post the documents?

I searched through many forums and everything seems to say: "You have to re-post ALL documents to Solr for re-indexing". Please suggest a better alternative to achieve my schema change (I have a very large Solr index, around 10GB, and it will be tough to query the whole data-set, store it somewhere as XMLs and then re-post).

--
Thanks,
Vanniarajan