Re: multicore shards and relevancy score
On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen wrote:
> I've done a few experiments with searching two cores with the same schema
> using the shard syntax. (using solr 1.3)
>
> My use case is that I want to have multiple cores because a few different
> people will be managing the indexing, and that will happen at different
> times. The data, however, is homogeneous.

Multiple cores were not built for distributed search. It is inefficient as
compared to a single index. But if you want to use them that way, that's
your choice.

> I've noticed in my tests that the results are not interwoven, but it might
> just be my test data. In other words, all the results from one core
> appear, then all the results from the other core.
>
> In thinking about it, it would make sense if the relevancy scores for each
> core were completely independent of each other. And that would mean that
> there is no way to compare the relevancy scores between the cores.
>
> In other words, I'd like the following results:
>
> - really relevant hit from core0
> - pretty relevant hit from core1
> - kind of relevant hit from core0
> - not so relevant hit from core1
>
> but I get:
>
> - really relevant hit from core0
> - kind of relevant hit from core0
> - pretty relevant hit from core1
> - not so relevant hit from core1
>
> So, are the results supposed to be interwoven, and I need to study my data
> more, or is this just not something that is possible?

The only difference wrt relevancy between a distributed search and a
single-node search is that there is no distributed IDF and therefore a
distributed search assumes a random distribution of terms among shards.
I'm not sure if that is what you are seeing.

> Also, if this is insurmountable, I've discovered two show stoppers that
> will prevent using multicore in my project (counting the lack of support
> for faceting in multicore). Are these issues addressed in solr 1.4?

Can you give more details on what these two issues are?

--
Regards,
Shalin Shekhar Mangar.
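For reference, the shard syntax being discussed looks something like this
(host, port, and core names are placeholders):

  http://localhost:8983/solr/core0/select?q=solr&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

The shards parameter lists the host:port/path of every core to search; the
core receiving the request queries each shard and merges the responses.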
Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed
First of all let us confirm this issue is fixed in 1.4. 1.4 is stable and a lot of people are using it in production and it is going to be released pretty soon On Mon, Sep 14, 2009 at 8:05 PM, palexv wrote: > > I am using 1.3 > Do you suggest 1.4 from developer trunk? I am concern if it stable. Is it > safe to use it in big commerce app? > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> which version of Solr are you using. can you try with a recent one and >> confirm this? >> >> On Mon, Sep 14, 2009 at 7:45 PM, palexv wrote: >>> >>> I know that my issue is related to >>> http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129 >>> and https://issues.apache.org/jira/browse/SOLR-728 >>> but my case is quite different. >>> As I understand patch at https://issues.apache.org/jira/browse/SOLR-728 >>> prevents concurrent executing of import operation but does NOT put >>> command >>> in a queue. >>> >>> I have only few records to index. When run full reindex - it works very >>> fast. But when I try to rerun this even after a couple of seconds - I am >>> getting >>> Caused by: >>> com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: >>> No operations allowed after connection closed. >>> >>> At this time, when I check status - it says that status is idle and >>> everything was indexed success. >>> Second run of reindex without exception I can run only after 10 seconds. >>> It does not work for me! If I apply patch from >>> https://issues.apache.org/jira/browse/SOLR-728 - I will unable to reindex >>> in >>> next 10 seconds as well. >>> Any suggestions? >>> -- >>> View this message in context: >>> http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> > > -- > View this message in context: > http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436948.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Is it possible to query for "everything" ?
[* TO *] on the standard handler is an implicit query of
default_field_name:[* TO *], which matches only documents that have the
default field on them. So [* TO *] and *:* are two very different queries;
only the latter is guaranteed to match all documents.

	Erik

On Sep 14, 2009, at 9:39 PM, Bill Au wrote:
> For the standard query handler, try [* TO *].
>
> Bill
>
> On Mon, Sep 14, 2009 at 8:46 PM, Jay Hill wrote:
>> With dismax you can use q.alt when the q param is missing: q.alt=*:*
>> should work.
>>
>> -Jay
>>
>> On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco wrote:
>>> Thanks Jay & Matt
>>>
>>> I tried *:* on my app, and it didn't work. I tried it on the solr
>>> admin, and it did. I checked the solr config file, and realized that
>>> it works on standard, but not on dismax, queries.
>>>
>>> So i have my app checking *:* on a standard qt, and then filtering
>>> what I need on other qts!
>>>
>>> I would never have figured this out without you two!
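To make the difference concrete (standard handler, default example URLs,
URL-escaping omitted for readability):

  http://localhost:8983/solr/select?q=*:*
    (matches every document)

  http://localhost:8983/solr/select?q=[* TO *]
    (matches only documents that have a value in the default field)

and with dismax the catch-all goes through q.alt:

  http://localhost:8983/solr/select?qt=dismax&q.alt=*:*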
Re: Single Core or Multiple Core?
A large majority of users use a single core ONLY. It is hard to explain to
them the need for an extra component in the URL. I would say it is a
design problem which we should solve instead of asking users to change.

On Tue, Sep 15, 2009 at 3:12 AM, Uri Boness wrote:
> IMO forcing the users to do a configuration change in Solr or in their
> application is the same thing - it all boils down to configuration change
> (I'll be very surprised if someone is actually hardcoding the Solr URL in
> their system - most probably it is configurable, and if it's not, forcing
> them to change it is actually a good thing).
>
>> Besides, if there's only one core, why need a name?
>
> Consistency. Having a default core as Israel suggested can probably do
> the trick. But, at first it might seem that having a default core and not
> needing to specify the core name will make it easier for users to use.
> But I actually disagree - don't underestimate the power of being
> consistent. I'd rather have a manual telling me "this is how it works and
> it always works like that in all scenarios" than having something like
> "this is how it works but if you have scenario A then it works
> differently and you have to do this instead".
>
> Shalin Shekhar Mangar wrote:
>> On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness wrote:
>>> Is it really a problem? I mean, as I see it, cores are to Solr what
>>> databases are to an RDBMS. When you connect to a database you also
>>> need to specify the database name.
>>
>> The problem is compatibility. If we make solr.xml compulsory then we
>> only force people to do a configuration change. But if we make a core
>> name mandatory, then we force them to change their applications (or the
>> applications' configurations). It is better if we can avoid that.
>> Besides, if there's only one core, why need a name?

--
Noble Paul | Principal Engineer | AOL | http://aol.com
Dealing with term vectors
Hi there, I want to recover the term vectors from indexes - not
calculating them, but just recovering what is stored. Some questions about
this topic:

1. When I put the termVectors="true" option on a field, what's happening
   behind the scenes?
   1. Is Lucene storing the tv in the index?
   2. Is Lucene storing additional info to allow tv's calculation?
2. Reading the Solr 1.4 Enterprise Search book (amazing book!) I found
   this: "In Solr 1.4, it is now possible to tell Lucene that a field
   should store these for efficient retrieval. Without them, the same
   information can be derived at runtime but that's slower" (p. 286) -
   Does this mean that older Solr versions don't come with this
   functionality?
3. Can the tv component expose raw term vectors for fields not marked
   with termVectors="true"?

Thx

--
Lici
New to Solr : How to create solr index for rich documents especially .xls
Hi, I am a newbie to Solr. Right now I have the task of converting rich
documents into a Solr-readable index format so that I can use the index
for searching. I learnt about Solr and got a rough idea of what has to be
done.

Requirement 1:
1) I have to index rich document format files like .xls, .pdf, .doc, .ppt

Information that I know: As far as I searched on the Internet, I came to
know that we can use the Data Import Handler or Apache Tika (but how to do
that with these?). Should I code with the Data Import Handler?

So far I have downloaded a sample document from the net and tried running
that. The application runs on a Jetty web server and when I query it, I
get an xml file as output.

Problems faced: Since I am very new to Java I am not able to get a clear
picture of what has to be done and what the Ant tool is used for.

Requirement 2: I need to change the web server from Jetty to the JBoss
application server. What has to be done for this?

Solution tried: I tried copying solr.war into the webapp directory and
tried running the application. Since I am very new to Java I might have
made some basic mistake too.

Please guide me. Thanks in advance.
Best strategy to commit often under load.
Hi all,

I've got a solr server under significant load (~40/s) and a single process
which can potentially commit as often as possible. Typically, when it
commits every 5 or 10s, my solr server slows down quite a lot and this can
lead to congestion problems on my client side.

What would you recommend in this situation? Is it better to let solr
perform the commits automatically with reasonable autocommit parameters?

What are solr's best practices concerning this point?

Thanks for your help!

Jerome.

--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
Re: Dealing with term vectors
On Sep 15, 2009, at 5:31 AM, Licinio Fernández Maurelo wrote:
> Hi there, I want to recover the term vectors from indexes - not
> calculating them, but just recovering what is stored.

http://wiki.apache.org/solr/TermVectorComponent

> Some questions about this topic:
> 1. When I put the termVectors="true" option on a field, what's happening
>    behind the scenes?
>    1. Is Lucene storing the tv in the index?

Yes.

>    2. Is Lucene storing additional info to allow tv's calculation?
> 2. Reading the Solr 1.4 Enterprise Search book (amazing book!) I found
>    this: "In Solr 1.4, it is now possible to tell Lucene that a field
>    should store these for efficient retrieval. Without them, the same
>    information can be derived at runtime but that's slower" (p. 286) -
>    Does this mean that older Solr versions don't come with this
>    functionality?

I haven't gotten to that section yet, but I bet it's referring to
recreating them by analyzing the content.

> 3. Can the tv component expose raw term vectors for fields not marked
>    with termVectors="true"?

Not yet. You can use the FieldAnalysisRequestHandler (I think that's the
name, it used to be called the DocumentAnalysisRequestHandler) to do that,
but that would require two trips to the server.

-Grant

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
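For anyone following along, enabling stored term vectors and querying the
TermVectorComponent looks roughly like this (the field name and handler
setup here are assumptions, not taken from the thread):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

and then, against a handler with the component registered:

  http://localhost:8983/solr/select?q=content:lucene&tv=true&tv.tf=true&tv.df=true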
Retrieving a field from all result documents & a couple more queries
Hi,

I am familiar with Lucene and trying out Solr. I have an index which was
created outside solr. The index is fairly simple with two fields -
document_id & content. The query result needs to return all the document
IDs. The result need not be ordered by the score. For this, in Lucene, I
use a custom hit collector with search to get results quickly. The index
has a few million documents and queries returning hundreds of thousands of
documents are not uncommon. So, speed is crucial here.

Since retrieving the document_id for each document is slow, I am using a
FieldCache to store the values of document_id. For all the results
collected (in a bitset) with the hit collector, the document_id field is
retrieved from the fieldcache.

1. How can I effectively disable scoring? I have read that
ConstantScoreQuery is quite fast, but from the code, I see that it is used
only for wildcard queries. How can I use ConstantScoreQuery for all the
queries (boolean, term, phrase, ..)? Also, is ConstantScoreQuery as fast
as a custom hit collector?

2. How can Solr take advantage of the fieldcache while returning the field
document_id? The documentation says the fieldcache can be explicitly
auto-warmed with Solr. If the fieldcache is available and initialized at
the beginning, will solr look into the cache to retrieve the fields to be
returned?

3. If there is an additional field for stemmed_content on which search
needs to use a different analyzer, I suppose that could be specified by
the fieldType attribute in the schema.

Thank you,

--shashi
How to create a new index file automatically
Hi all,

I am a newbie to Solr. I have downloaded and used the solr example and I
have a basic doubt. There are some xml documents present in
apache-solr-1.3.0\example\exampledocs. These are the input files to the
solr index, and I found that they are indexed by giving this command:

  java -jar post.jar *.xml

All these xml documents have the same basic structure.

I want to index some more files. In that case, should I create a new xml
file manually, or what should I do to create it automatically?

Please give me a solution. I am very new to Solr, so please make it as
simple as possible.

Thanks a lot...
Re: multicore shards and relevancy score
Shalin Shekhar Mangar wrote:
> On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen wrote:
>> I've done a few experiments with searching two cores with the same
>> schema using the shard syntax. (using solr 1.3)
>>
>> My use case is that I want to have multiple cores because a few
>> different people will be managing the indexing, and that will happen at
>> different times. The data, however, is homogeneous.
>
> Multiple cores were not built for distributed search. It is inefficient
> as compared to a single index. But if you want to use them that way,
> that's your choice.

Well, I'm experimenting with them because it will simplify index
maintenance greatly. I am beginning to think that it won't work in my
case, though.

>> I've noticed in my tests that the results are not interwoven, but it
>> might just be my test data. In other words, all the results from one
>> core appear, then all the results from the other core.
>> [...]
>> So, are the results supposed to be interwoven, and I need to study my
>> data more, or is this just not something that is possible?
>
> The only difference wrt relevancy between a distributed search and a
> single-node search is that there is no distributed IDF and therefore a
> distributed search assumes a random distribution of terms among shards.
> I'm not sure if that is what you are seeing.
>
>> Also, if this is insurmountable, I've discovered two show stoppers that
>> will prevent using multicore in my project (counting the lack of
>> support for faceting in multicore). Are these issues addressed in solr
>> 1.4?
>
> Can you give more details on what these two issues are?

The first issue is detailed above, where the results from a search over
two shards don't appear to be returned in relevancy order.

The second issue was detailed in an email last week "shards and facet
count". The facet information is lost when doing a search over two shards,
so if I use multicore, I can no longer have facets.
RE: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed
Easy FIX: use autoReconnect=true for MySQL:

  jdbc:mysql://localhost:3306/?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true

Maybe it will help; the connection is auto-closed "after a couple of
seconds" (usually 10 seconds) by default for MySQL... connection pooling
won't help (their JDBC driver is already pool-based, and the server closes
connections after some delay).

-Fuad (MySQL contributor)

> -----Original Message-----
> From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of
> Noble Paul
> Sent: September-15-09 3:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Dataimport MySQLNonTransientConnectionException: No
> operations allowed after connection closed
>
> First of all let us confirm this issue is fixed in 1.4.
>
> 1.4 is stable and a lot of people are using it in production and it is
> going to be released pretty soon
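In DataImportHandler terms, that option goes on the dataSource definition
in data-config.xml, roughly like this (database name and credentials are
placeholders; note the &amp; escaping required inside XML attributes):

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/mydb?useUnicode=true&amp;characterEncoding=UTF-8&amp;autoReconnect=true"
      user="user" password="password"/>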
Re: Return one word - Auto Complete Request Handler
On Sep 14, 2009, at 2:06 PM, Mohamed Parvez wrote:
> I am trying to configure a request handler that will be used in the Auto
> Complete query. I am limiting the result to one field by using the "fl"
> parameter, which can be used to specify fields to return. How do I make
> the field return only one word, not full sentences?

Is http://wiki.apache.org/solr/TermsComponent helpful?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
do NOT want to stem plurals for a particular field, or words
I have a field where there are items that are plurals, and used as very
specific locators, so I do a solr search type:articles, and it translates
it into type:article, then into type:articl... Is there a way to stop it
from doing this on either the field "type" or on a list of words
("articles", "notes", etc.)?

I tried entering them into the protwords.txt file and don't seem to get
anywhere.
Re: do NOT want to stem plurals for a particular field, or words
Hi,

You can enable/disable stemming per field type in the schema.xml, by
removing the stemming filters from the type definition.

Basically, copy your preferred type, rename it to something like
'text_nostem', remove the stemming filter from the type, and use your
'text_nostem' type for your field 'type'.

From what you say, I guess your field 'type' would be even happier to
simply be of type 'string'.

Jerome.

2009/9/15 DHast:
> I have a field where there are items that are plurals, and used as very
> specific locators, so I do a solr search type:articles, and it
> translates it into type:article, then into type:articl... Is there a way
> to stop it from doing this on either the field "type" or on a list of
> words ("articles", "notes", etc.)?
>
> I tried entering them into the protwords.txt file and don't seem to get
> anywhere.

--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
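In schema.xml, that sketch would look something like this (the filter list
here is illustrative - mirror whatever your current text type uses, minus
the stemmer):

  <fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- no SnowballPorterFilterFactory / EnglishPorterFilterFactory here -->
    </analyzer>
  </fieldType>

  <field name="type" type="text_nostem" indexed="true" stored="true"/>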
Expected Approximate Release Date Solr 1.4
It's 15th-November-2009. It's been a year since Solr 1.3 was released.
Everyone is eagerly expecting that around this time Solr 1.4 will be
released. (Refer Book: Solr 1.4 Enterprise Search Server, by David Smiley
& Eric Pugh, page 11: "the latest official release. Solr 1.3 was released
on September 15th, 2008. Solr 1.4 is expected around the same time a year
later")

Is there any expected approximate release date for Solr 1.4?

Thanks/Regards,
Parvez
Re: stopfilterFactory isn't removing field name
Could this be related to SOLR-1423? On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley wrote: > Thanks, I'll see if I can reproduce... > > -Yonik > http://www.lucidimagination.com > > On Mon, Sep 14, 2009 at 2:10 AM, mike anderson > wrote: > > Yeah.. that was weird. removing the line "forever,for ever" from my > synonyms > > file fixed the problem. In fact, i was having the same problem for every > > double word like that. I decided I didn't really need the synonym filter > for > > that field so I just took it out, but I'd really like to know what the > > problem is. > > -mike > > > > On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley < > yo...@lucidimagination.com> > > wrote: > >> > >> That's pretty strange... perhaps something to do with your synonyms > >> file mapping "for" to a zero length token? > >> > >> -Yonik > >> http://www.lucidimagination.com > >> > >> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson > > >> wrote: > >> > I'm kind of stumped by this one.. is it something obvious? > >> > I'm running the latest trunk. In some cases the stopFilterFactory > isn't > >> > removing the field name. > >> > > >> > Thanks in advance, > >> > > >> > -mike > >> > > >> > From debugQuery (both words are in the stopwords file): > >> > > >> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true > >> > > >> > citations:for > >> > citations:for > >> > citations: > >> > citations: > >> > > >> > > >> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true > >> > > >> > citations:the > >> > citations:the > >> > > >> > > >> > > >> > > >> > > >> > > >> > schema analyzer for this field: > >> > > >> > >> > positionIncrementGap="100"> > >> > > >> > > >> > >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/> > >> > > >> > >> > words="citationstopwords.txt"/> > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> >>> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/> > >> > > >> > >> > words="citationstopwords.txt"/> > >> > > >> > > >> > > >> > > >> > > >> > > > > > >
Re: multicore shards and relevancy score
You can query multiple cores using MultiEmbeddedSearchHandler in SOLR-1431. Then the facet counts will be merged just like the current distributed requests. On Tue, Sep 15, 2009 at 7:41 AM, Paul Rosen wrote: > Shalin Shekhar Mangar wrote: >> >> On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen >> wrote: >> >>> I've done a few experiments with searching two cores with the same schema >>> using the shard syntax. (using solr 1.3) >>> >>> My use case is that I want to have multiple cores because a few different >>> people will be managing the indexing, and that will happen at different >>> times. The data, however, is homogeneous. >>> >>> >> Multiple cores were not built for distributed search. It is inefficient as >> compared to a single index. But if you want to use them that way, that's >> your choice. > > Well, I'm experimenting with them because it will simplify index maintenance > greatly. I am beginning to think that it won't work in my case, though. > >> >>> I've noticed in my tests that the results are not interwoven, but it >>> might >>> just be my test data. In other words, all the results from one core >>> appear, >>> then all the results from the other core. >>> >>> In thinking about it, it would make sense if the relevancy scores for >>> each >>> core were completely independent of each other. And that would mean that >>> there is no way to compare the relevancy scores between the cores. >>> >>> In other words, I'd like the following results: >>> >>> - really relevant hit from core0 >>> - pretty relevant hit from core1 >>> - kind of relevant hit from core0 >>> - not so relevant hit from core1 >>> >>> but I get: >>> >>> - really relevant hit from core0 >>> - kind of relevant hit from core0 >>> - pretty relevant hit from core1 >>> - not so relevant hit from core1 >>> >>> So, are the results supposed to be interwoven, and I need to study my >>> data >>> more, or is this just not something that is possible? >>> >>> >> The only difference wrt relevancy between a distributed search and a >> single-node search is that there is no distributed IDF and therefore a >> distributed search assumes a random distribution of terms among shards. >> I'm >> not sure if that is what you are seeing. >> >> >>> Also, if this is insurmountable, I've discovered two show stoppers that >>> will prevent using multicore in my project (counting the lack of support >>> for >>> faceting in multicore). Are these issues addressed in solr 1.4? >>> >>> >> Can you give more details on what these two issues are? >> > > The first issue is detailed above, where the results from a search over two > shards don't appear to be returned in relevancy order. > > The second issue was detailed in an email last week "shards and facet > count". The facet information is lost when doing a search over two shards, > so if I use multicore, I can no longer have facets. > > >
Re: Best strategy to commit often under load.
Hi Jerome,

5 seconds is too little using Solr 1.3 or 1.4 because of caching and
segment warming. If you turn off caching and segment warming, then you may
be able to do 5s latency using either a RAMDirectory or an SSD. In the
future these issues will be fixed and less than 1s will be possible.

-J

On Tue, Sep 15, 2009 at 3:07 AM, Jérôme Etévé wrote:
> Hi all,
>
> I've got a solr server under significant load (~40/s) and a single
> process which can potentially commit as often as possible. Typically,
> when it commits every 5 or 10s, my solr server slows down quite a lot
> and this can lead to congestion problems on my client side.
>
> What would you recommend in this situation? Is it better to let solr
> perform the commits automatically with reasonable autocommit parameters?
>
> What are solr's best practices concerning this point?
>
> Thanks for your help!
>
> Jerome.
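For reference, autocommit is configured in solrconfig.xml; the thresholds
below are only an example starting point to tune, not a recommendation:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime> <!-- milliseconds -->
    </autoCommit>
  </updateHandler>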
Re: stopfilterFactory isn't removing field name
On Tue, Sep 15, 2009 at 1:14 PM, mike anderson wrote: > Could this be related to SOLR-1423? Nope, and I haven't been able to reproduce the bug you saw either. -Yonik > On Mon, Sep 14, 2009 at 8:51 AM, Yonik Seeley > wrote: > >> Thanks, I'll see if I can reproduce... >> >> -Yonik >> http://www.lucidimagination.com >> >> On Mon, Sep 14, 2009 at 2:10 AM, mike anderson >> wrote: >> > Yeah.. that was weird. removing the line "forever,for ever" from my >> synonyms >> > file fixed the problem. In fact, i was having the same problem for every >> > double word like that. I decided I didn't really need the synonym filter >> for >> > that field so I just took it out, but I'd really like to know what the >> > problem is. >> > -mike >> > >> > On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley < >> yo...@lucidimagination.com> >> > wrote: >> >> >> >> That's pretty strange... perhaps something to do with your synonyms >> >> file mapping "for" to a zero length token? >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> On Mon, Sep 14, 2009 at 12:13 AM, mike anderson > > >> >> wrote: >> >> > I'm kind of stumped by this one.. is it something obvious? >> >> > I'm running the latest trunk. In some cases the stopFilterFactory >> isn't >> >> > removing the field name. >> >> > >> >> > Thanks in advance, >> >> > >> >> > -mike >> >> > >> >> > From debugQuery (both words are in the stopwords file): >> >> > >> >> > http://localhost:8983/solr/select?q=citations:for&debugQuery=true >> >> > >> >> > citations:for >> >> > citations:for >> >> > citations: >> >> > citations: >> >> > >> >> > >> >> > http://localhost:8983/solr/select?q=citations:the&debugQuery=true >> >> > >> >> > citations:the >> >> > citations:the >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > schema analyzer for this field: >> >> > >> >> > > >> > positionIncrementGap="100"> >> >> > >> >> > >> >> > > >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/> >> >> > >> >> > > >> > words="citationstopwords.txt"/> >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > > >> > synonyms="substitutions.txt" ignoreCase="true" expand="false"/> >> >> > >> >> > > >> > words="citationstopwords.txt"/> >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> > >> > >> >
Re: CSV Update - Need help mapping csv field to schema's ID
Bump. Can anyone help guide me in the right direction? I want to map each
sku field to the schema unique id field using update/csv.

Thanks. Dan.

Insight 49, LLC wrote:
> Using http://localhost:8983/solr/update/csv?stream.file, is there any
> way to map one of the csv fields to one's schema unique id?
>
> e.g. A file with 3 fields (sku, product, price):
>
> http://localhost:8983/solr/update/csv?stream.file=products.csv&stream.contentType=text/plain;charset=utf-8&header=true&separator=%2c&encapsulator=%22&escape=%5c&fieldnames=sku,product,price
>
> I would like to add an additional name:value pair for every line,
> mapping the sku field to my schema's id field:
>
> .map={sku.field}:{id}
>
> I would prefer NOT to change the schema by adding a <copyField
> source="sku" dest="id"/>. I read: http://wiki.apache.org/solr/UpdateCSV,
> but can't quite get it.
>
> Thanks! Dan
Re: CSV Update - Need help mapping csv field to schema's ID
On Tue, Sep 15, 2009 at 2:23 PM, Insight 49, LLC wrote:
> Want to map each sku field to the schema unique id field using
> update/csv.

You can set the sku field to be the uniqueKey field in the schema. See
http://wiki.apache.org/solr/SchemaXml#head-bec9b4f189d7f493c42f99b479ed0a8d0dd3d76e
for more info.

Mark Matienzo
Applications Developer, Digital Experience Group
The New York Public Library
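i.e. something along these lines in schema.xml:

  <field name="sku" type="string" indexed="true" stored="true" required="true"/>
  ...
  <uniqueKey>sku</uniqueKey>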
Re: "standard" requestHandler components
: I just copied this information to the wiki at
: http://wiki.apache.org/solr/SolrRequestHandler

FYI: All of this info is specific to SearchComponents, which are specific
to SearchHandler -- so that page is a misleading place to put this info
(plenty of other request handlers don't support components at all).

I've updated the wiki accordingly (most of this info was already on the
SearchComponent wiki page).

-Hoss
Re: How to create a new index file automatically
There are a few different ways to get data into Solr. XML is one way, and
probably the most common. As far as Solr is concerned it doesn't matter
whether you construct XML input by hand or write some kind of code to do
it. Solr won't automatically create any files like the example .xml files
for you, though, nor would it make all that much sense for it to do so.

For testing it's fine to use the post.jar script like you're doing, but
most people are probably not going to do this in production; rather
they'll submit the XML to Solr with an HTTP POST from some indexing
process. The format for the XML files is described at
http://wiki.apache.org/solr/UpdateXmlMessages

If you're doing an HTTP POST, the URL to post to will be something like
http://<host>:<port>/solr/update

Solr can also accept input in CSV format. Or it can import data from your
SQL database using http://wiki.apache.org/solr/DataImportHandler

It can import documents in certain other formats using the
http://wiki.apache.org/solr/ExtractingRequestHandler

Note: I'm not sure if you understand, from your message, that you're going
to have to create a schema for your data at some point. The "example"
directory contains an example schema, but it probably won't be suitable
for your application. See http://wiki.apache.org/solr/SchemaXml

2009/9/15 busbus:
> Hi all,
>
> I am a newbie to Solr. I have downloaded and used the solr example and I
> have a basic doubt.
> [...]
> I want to index some more files. In that case, should I create a new xml
> file manually, or what should I do to create it automatically?
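For example, a minimal add message (field names must match your schema)
looks like:

  <add>
    <doc>
      <field name="id">doc1</field>
      <field name="title">My first document</field>
    </doc>
  </add>

followed by a <commit/> message to make the new document visible to
searches.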
Solr exception with missing required field (meta_guid_s)
Hi, I have a data-config file where I map the fields of a very simple
table using dynamic field definitions, but when I run the dataimport I get
this error:

  WARNING: Error creating document : SolrInputDocument[{id_i=id_i(1.0)={2},
  name_s=name_s(1.0)={John Smith}, city_s=city_s(1.0)={Newark}}]
  org.apache.solr.common.SolrException: Document [null] missing required
  field: meta_guid_s

From the schema.xml I see that the meta_guid_s field is defined as a
"Global unique ID", but does this have to be set explicitly or mapped to a
particular field?

thanks.
faceted query not working as i expected
I'm trying to request documents that have "facet.venue_type" as "Private
Collection". Instead I'm also getting items where another field is marked
"Permanent Collection".

My schema has (roughly):

  <field name="venue_type" type="text" indexed="true" stored="true" required="false" />
  <field name="facet.venue_type" type="string" indexed="true" stored="true" required="false" />

My query is:

  q=*:*
  qt=standard
  facet=true
  facet.missing=true
  facet.field=facet.venue_type
  fq=venue_type:Private+Collection

Can anyone offer a suggestion as to what I'm doing wrong?
Re: Single Core or Multiple Core?
: A large majority of users use single core ONLY. It is hard to explain : them the need for an extra componentin the url. A majority use only a single core because that's all they know because it's what the default example and the tutorial use. Even when people have no have use for running multiple cores with differnet schemas *concurrently* the value of swapping out cores on config upgrade is certainly worth the inconvinince of needing to add "/corename" to the urls they connect from in their clients. : I would say it is a design problem which we should solve instead of : asking users to change the pros/cons of default core names were discussed at great length when multicore support was first added. Because of core swapping and path based requestHandler naming the confusion introduced by trying to have a default core winds up being *vastly* worse then the confusion of trying to explain why they should use "/solr/core/select" instead of "/solr/select" -Hoss
Re: faceted query not working as i expected
--- On Tue, 9/15/09, Jonathan Vanasco wrote:
> From: Jonathan Vanasco
> Subject: faceted query not working as i expected
> To: solr-user@lucene.apache.org
> Date: Tuesday, September 15, 2009, 10:54 PM
>
> I'm trying to request documents that have "facet.venue_type" as "Private
> Collection". Instead I'm also getting items where another field is
> marked "Permanent Collection".
> [...]
> My query is
>
> q=*:*
> qt=standard
> facet=true
> facet.missing=true
> facet.field=facet.venue_type
> fq=venue_type:Private+Collection
>
> Can anyone offer a suggestion as to what I'm doing wrong?

The filter query fq=venue_type:Private+Collection has a part that runs on
the default field. It is parsed to:

  venue_type:Private defaultField:Collection

You can use fq=venue_type:"Private+Collection" or fq=venue_type:(Private
AND Collection) instead. These will/may bring back documents having
"something Private Collection" in the venue_type field, since it is a
tokenized field.

If you want to retrieve documents that have "facet.venue_type" as "Private
Collection", you can use fq=facet.venue_type:"Private Collection", which
operates on a string (non-tokenized) field.

Hope this helps.
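Putting it together, the corrected request would be something like this
(phrase quotes URL-escaped as %22):

  q=*:*&qt=standard&facet=true&facet.missing=true&facet.field=facet.venue_type&fq=facet.venue_type:%22Private+Collection%22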
documentation deficiency : case sensitivity of boolean operators
I couldn't find this anywhere on solr's docs / faq i finally found a reference on lucene http://lucene.apache.org/java/2_4_0/queryparsersyntax.html this should really be added somewhere. i'm not sure where, but I thought this was worth bringing up to the list -- as it really confused the hell out of me :)
Re: documentation deficiency : case sensitivity of boolean operators
: Subject: documentation deficiency : case sensitivity of boolean operators
:
: I couldn't find this anywhere on solr's docs / faq

if you have suggestions on places to add it, feel free to update the wiki.
(most of the documentation is deliberately agnostic to the specifics of
the query parser syntax, instead relying on links to point you to the same
reference URL you found ... so i can't actually think of anywhere in the
Solr docs that mentions the AND/OR/NOT syntax where it would make sense to
clarify this)

-Hoss
Re: documentation deficiency : case sensitivity of boolean operators
That's already linked from http://wiki.apache.org/solr/SolrQuerySyntax -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 5:38 PM, Jonathan Vanasco wrote: > I couldn't find this anywhere on solr's docs / faq > > i finally found a reference on lucene > http://lucene.apache.org/java/2_4_0/queryparsersyntax.html > > this should really be added somewhere. i'm not sure where, but I thought > this was worth bringing up to the list -- as it really confused the hell out > of me :) >
Re: Automatically calculate boost factor
: http://wiki.apache.org/solr/FunctionQuery. Either that or roll it up
: into the document boost, but that loses some precision.

but if that's what you want to do then yes: solr can compute the document
boost on submission based on the field values ... *IF* you write an
UpdateProcessor to do that.

: > 1.2
: > 1.5
: > 0.8
: >
: > Document boost = 1.2*1.5*0.8
: >
: > Is it possible to get SOLr to calculate the boost automatically upon
: > submission based on field values?

-Hoss
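A rough, untested sketch of what such an UpdateProcessor could look like
(the package, class name, and boost-carrying field names are all
hypothetical):

  import java.io.IOException;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class FieldProductBoostFactory extends UpdateRequestProcessorFactory {
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
        SolrQueryResponse rsp, UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          // multiply together the values of the boost-carrying fields
          float boost = 1.0f;
          for (String name : new String[] { "boost_a", "boost_b", "boost_c" }) {
            Object v = doc.getFieldValue(name);
            if (v != null) {
              boost *= Float.parseFloat(v.toString());
            }
          }
          doc.setDocumentBoost(boost);
          super.processAdd(cmd);  // hand off to the rest of the chain
        }
      };
    }
  }

wired into a chain in solrconfig.xml:

  <updateRequestProcessorChain name="boosts">
    <processor class="com.example.FieldProductBoostFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>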
Re: Expected Approximate Release Date Solr 1.4
: Its 15th-November-2009. Its been a year since Solr 1.3 was released.

It's September, actually.

: Is there any expected approximate release date for Solr 1.4

there is no specific date, but the timeframe and what the release is
dependent on have been discussed in several threads...

http://wiki.apache.org/solr/Solr1.4

-Hoss
Re: CSV Update - Need help mapping csv field to schema's ID
: I would like to add an additional name:value pair for every line,
: mapping the sku field to my schema's id field:
:
: .map={sku.field}:{id}

the map param is for replacing a *value* with a different value ... it's
useful for things like numeric codes in CSV files that you want to replace
with strings in your index.

: I would prefer NOT to change the schema by adding a <copyField
: source="sku" dest="id"/>.

that's the only solution i can think of, unless you want to write an
UpdateProcessor.

-Hoss
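For the record, map is a from:to value substitution, applied globally or
per field, e.g. something like:

  http://localhost:8983/solr/update/csv?stream.file=products.csv&f.price.map=0.00:0.01

which would index a price of 0.00 as 0.01 - it never renames or copies a
field.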
Multiple parsedquery in the result set when debugQuery=true
Are there supposed to be multiple parsedquery entries for a distributed query when debugQuery=true?
Re: Retrieving a field from all result documents & a couple more queries
Hi,

1) Solr has various types of caches. We can specify how many documents a
cache can hold at a time, e.g. if windowSize=50, 50 results will be cached
in the queryResult cache. If a user makes a new request to the server for
results beyond those 50 documents, a new request will be sent to the
server and it will retrieve the next 50 results into the cache.
http://wiki.apache.org/solr/SolrCaching

Yes, solr looks into the cache to retrieve the fields to be returned.

2) Yes, we can have different tokenizers or filters for index & search. We
need not create a different fieldtype. We just configure the index &
search analyzer sections of the same fieldtype (datatype) differently, as
in the sketch below.

Regards,
Abhay

On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore wrote:
> Hi, I am familiar with Lucene and trying out Solr. [...]
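The analyzer configuration described in (2) would look roughly like this
in schema.xml (the tokenizer and filter choices here are placeholders -
use whatever your fields need):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>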
Re: How to create a new index file automatically
> It can import documents in certain other formats using the
> http://wiki.apache.org/solr/ExtractingRequestHandler

1) According to my inference, Solr uses Apache Tika to convert rich
document formats to text, so that the ExtractingRequestHandler can use the
extracted text to create the index.

2) If point 1 is correct, then I think this could suit my requirements,
since I need to index rich document files, especially the .xls format. But
I can't find the ExtractingRequestHandler class, which has to be
configured in the solrconfig.xml file so that I can import XLS documents
through http://localhost:8983/solr/update/extract
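Worth noting: the handler ships with Solr 1.4/trunk (in
contrib/extraction); it does not exist in the 1.3 release, which would
explain not finding the class. With 1.4 and the extraction jars on the
classpath, the wiring in solrconfig.xml looks roughly like this (treat it
as a sketch; parameter names have changed between revisions, so check the
wiki page for your version):

  <requestHandler name="/update/extract"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler"/>

and an .xls file could then be posted with something like:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@spreadsheet.xls"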
Re: Solr exception with missing required field (meta_guid_s)
On Wed, Sep 16, 2009 at 1:13 AM, kedardes wrote:
> Hi, I have a data-config file where I map the fields of a very simple
> table using dynamic field definitions, but when I run the dataimport I
> get this error:
>
>   WARNING: Error creating document : SolrInputDocument[{id_i=id_i(1.0)={2},
>   name_s=name_s(1.0)={John Smith}, city_s=city_s(1.0)={Newark}}]
>   org.apache.solr.common.SolrException: Document [null] missing required
>   field: meta_guid_s
>
> From the schema.xml I see that the meta_guid_s field is defined as a
> "Global unique ID", but does this have to be set explicitly or mapped to
> a particular field?

You have created that schema, so you are the better person to answer that
question. As far as a required field or uniqueKey is concerned, its value
has to be set or copied from another field.

--
Regards,
Shalin Shekhar Mangar.
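If meta_guid_s should simply mirror one of the existing columns, a
copyField in schema.xml would be enough, e.g. (using the id_i field from
the error message):

  <copyField source="id_i" dest="meta_guid_s"/>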
Re: Questions on copyField
Would appreciate any help on this. Thanks

Rahul

On Mon, Sep 14, 2009 at 5:12 PM, Rahul R wrote:
> Hello,
> I have a few questions regarding the copyField directive in schema.xml.
>
> 1. Does the destination field store a reference or the actual data?
> If I have something like this:
>
>   <copyField source="name" dest="text"/>
>
> then will the values in the 'name' field get copied into the 'text'
> field, or will the 'text' field only store a reference to the 'name'
> field? To put it more simply, if I later delete the 'name' field from
> the index, will I lose the corresponding data in the 'text' field?
>
> 2. Is there any inbuilt API which I can use to do the copyField action
> programmatically?
>
> 3. Can I do a copyField from the schema as well as programmatically for
> the same destination field?
> Suppose I want the 'text' field to contain values for name, age and
> location. In my index only 'name' and 'age' are defined as fields. So I
> can add directives like:
>
>   <copyField source="name" dest="text"/>
>   <copyField source="age" dest="text"/>
>
> The location, however, I want to add to the 'text' field
> programmatically. I don't want to store the location as a separate field
> in the index. Can I do this?
>
> Thank you.
>
> Regards
> Rahul