Re: spellcheck / too many open files
On Tue, Jun 9, 2009 at 11:15 AM, revas wrote:
> 1) Does the spell check component support all languages?

SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers, so if you can find an analyzer/tokenizer for your language, the spell checker can work.

> 2) I have a scenario where I have about 20 webapps in a single container. We
> get "too many open files" at index time / while restarting Tomcat.

Is that because of SpellCheckComponent?

> The mergeFactor is at the default.
>
> If I reduce the merge factor to 2 and optimize the index, will the open
> files be closed automatically, or would I have to reindex to close the open
> files, or how else do I close the already opened files? This is on Linux with
> Solr 1.3 and Tomcat 5.5.

Lucene/Solr does not keep any file open longer than necessary, but decreasing the merge factor should help. You can also increase the open file limit on your system.

--
Regards,
Shalin Shekhar Mangar.
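For reference, a minimal sketch of the two knobs mentioned above; the values are illustrative, not recommendations (in Solr 1.3 the merge factor is set in solrconfig.xml):

    <!-- solrconfig.xml: a lower mergeFactor keeps fewer segment files open -->
    <mainIndex>
      <mergeFactor>2</mergeFactor>
    </mainIndex>

and, at the OS level:

    # raise the per-process open-file limit before starting Tomcat
    ulimit -n 8192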
Re: Use the same SQL Field in DataImportHandler twice?
Ok, here it goes:
" "
The name of the database is "dbA" and the table name is "project".

Everything works out fine except the comment part highlighted (bold). It works, as I stated, if I change the phrase to:
" "
so that I don't use my primary key "id" twice, but the problem is I need to use "id" for the comment part too.

kind regards, Sebastian

Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
> On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote:
>> Thanks for your answer.
>>
>> "${db.tableA.id}" specifies that the SQL query the DataImportHandler runs
>> should use the SQL field "id" in table "tableA" located in database "db".
>
> The naming convention does not work like that.
>
> If the entity name is 'tableA' then the field 'id' is addressed as
> 'tableA.id'.
>
> As I said earlier, if you could provide me with the entire
> data-config.xml it would be more helpful.
>
>> Like in the example from the Solr Wiki:
>> " "
>>
>> It's strange, I know, but when I use something other than "id" as the
>> foreign key for the query, everything works! Like:
>> "${db.tableA.anotherid}"
>>
>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>> What is ${db.tableA.id}? I think there is something extra in that.
>>> Can you paste the whole data-config.xml?
>>>
>>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote:
>>>> Hi,
>>>>
>>>> I tried to do the following:
>>>> " "
>>>> So I use the SQL table field "id" twice: once for "db_id" in my index,
>>>> and for the SQL query as "fid=id".
>>>>
>>>> That doesn't work!
>>>>
>>>> But when I change the query from "fid=id" to something like
>>>> "fid=otherkey", it does work! Like:
>>>> " "
>>>> Is there any other kind of workaround so I can use the SQL field "id"
>>>> twice as I wanted to? Thanks
>>>>
>>>> kind regards, Sebastian
Re: Use the same SQL Field in DataImportHandler twice?
Can you avoid the "." dots in the entity name and try it out? Dots are special characters, and that could have caused the problem.

On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote:
> Ok, here it goes:
> "
> driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull"
> user="root" password=""/>
>
> transformer="TemplateTransformer" query="select *, 'dbA.project' from
> project">
>
> template="${dbA.project.dbA.project},id:${dbA.project.id}"/>
>
> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
> dateTimeFormat="-MM-dd'T'hh:mm:ss"/>
> "
> The name of the database is "dbA" and the table name is "project".
>
> Everything works out fine except the comment part highlighted (bold). It
> works, as I stated, if I change the phrase to:
> " "
> so that I don't use my primary key "id" twice, but the problem is I need
> to use "id" for the comment part too.
>
> kind regards, Sebastian

--
-
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Solr Multiple Queries?
Hi there Samnang! Please see inline for comments:

On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun wrote:
> Hi all,
> I just got started looking at using Solr as my search web service. But I
> don't know whether Solr has features for multiple kinds of queries:
> - Starts with

This is what we call prefix queries and wildcard queries. For instance, if you want something that starts with "man", you can search for man*.

> - Exact match

Exact matching is done with quotation marks: "Solr rocks".

> - Contains

Hmm, what do you mean by contains? Inside a given word? That might be a bit more tricky. We have an issue open at the moment for supporting leading wildcards, and that might allow you to search for *cogn* and match recognition, etc. If that was what you meant, you can look at the ongoing issue http://issues.apache.org/jira/browse/SOLR-218

> - Doesn't contain

NOT or - are keywords to exclude something (Solr supports all the boolean operators that Lucene supports).

> - In the range

Range queries in Solr are done by using brackets. For instance, price:[500 TO 1000] will return all results with prices ranging from 500 to 1000.

There is a lot of information on the wiki that you should check out: http://wiki.apache.org/solr/

> Could anyone guide me how to implement those features in Solr?
> Cheers, Samnang

Cheers,
Aleks

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
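To make the cases above concrete, a few illustrative raw query strings (the field name "name" is an assumption, and spaces would be URL-encoded in a real request):

    q=name:man*                 (prefix / "starts with")
    q=name:"Solr rocks"         (exact phrase match)
    q=name:ipod -name:nano      (contains ipod, does not contain nano)
    q=price:[500 TO 1000]       (inclusive range)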
solr in distributed mode
Hi,

I was looking for ways in which we can use Solr in distributed mode. Is there any way we can use Solr indexes across machines, or by using the Hadoop Distributed File System?

It has been mentioned in the wiki that when an index becomes too large to fit on a single system, or when a single query takes too long to execute, an index can be split into multiple shards, and Solr can query and merge results across those shards.

What I understand is that shards are a partition. Are shards on the same machine, or can they be on different machines? Do we have to manually split the indexes to store them in different shards? Do you have an example or some tutorial which demonstrates distributed index searching/storing using shards?

Regards,
Raakhi
Re: User Credentials for Solr Data Dir
nope

On Tue, Jun 9, 2009 at 4:59 AM, vaibhav joshi wrote:
> Hi,
>
> I am currently using Solr 1.3 and running Solr as an NT service. I need to
> store data indexes on a remote filer machine. The filer needs user
> credentials in order to access it. Is there a Solr configuration which
> I can use to pass these credentials?
>
> I was reading some blogs and they suggested running the NT service as a
> user who can access the needed resource. But I need to use the existing
> build and deploy tools in the company, and they always run the NT service
> as "LOCAL SYSTEM", which cannot access other resources.
>
> That's why I am trying to explore whether it's possible to pass these
> credentials via JNDI/system variables. Is it possible?
>
> Thanks
> Vaibhav

--
-
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Use the same SQL Field in DataImportHandler twice?
No, I changed the entity name to "dbA:project" but still the same problem.

Interesting side note: if I use my data-config as posted (with the "id" field in the comment section), none of the other entities work anymore. For example:
"
entity name="user" dataSource="dbA" query="select username from
ci_user where userid='${dbA.project.created_by}' ">
"
returns an empty result.

I still can't figure out why I can't use the (SQL) table's primary key
- once, to save it in the index directly, and
- twice, to query against my comment table.

Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
> Can you avoid the "." dots in the entity name and try it out? Dots are
> special characters, and that could have caused the problem.
Re: spellcheck / too many open files
But the spell check component uses the n-gram analyzer and hence should work for any language, is this correct? Also, we can refer to an external dictionary for suggestions; could this be in any language?

The open files problem is not because of spell check, as we have not implemented that yet. Every time we restart Solr we need to raise the ulimit, otherwise it does not work. So is there any workaround to permanently close these open files? Does optimizing the index close them?

Regards
Sujatha

On Tue, Jun 9, 2009 at 12:53 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> SpellCheckComponent relies on Lucene/Solr analyzers and tokenizers. So if
> you can find an analyzer/tokenizer for your language, the spell checker
> can work.
> [...]
> Lucene/Solr does not keep any file open longer than necessary, but
> decreasing the merge factor should help. You can also increase the open
> file limit on your system.
Lucene 2.9-dev version in Solr nightly build and FieldCache memory usage
Hey there,
Does the Lucene 2.9-dev used in the current Solr nightly build (9-6-2009) include the patch LUCENE-1662, to avoid doubling memory usage in the Lucene FieldCache?
Thanks in advance
Re: spellcheck / too many open files
On Tue, Jun 9, 2009 at 2:56 PM, revas wrote:
> But the spell check component uses the n-gram analyzer and hence should
> work for any language, is this correct? Also, we can refer to an external
> dictionary for suggestions; could this be in any language?

Yes, it does use n-grams, but there's an analysis step before the n-grams are created. For example, if you are creating your spell check index from a Solr field, SpellCheckComponent uses that field's index-time analyzer. So you should create your language-specific fields in such a way that the analysis works correctly for that language.

> The open files problem is not because of spell check, as we have not
> implemented that yet. Every time we restart Solr we need to raise the
> ulimit, otherwise it does not work. So is there any workaround to
> permanently close these open files? Does optimizing the index close them?

Optimization merges the segments of the index into one big segment, so it will reduce the number of files. However, during the merge it may create many more files. The old files are cleaned up by Lucene a while after the merge (unless you have changed the defaults in the IndexDeletionPolicy section in solrconfig.xml).

--
Regards,
Shalin Shekhar Mangar.
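If raising the limit after every restart gets tedious, it can be made persistent for the account running Tomcat; a sketch for Linux, assuming the service runs as a user named "tomcat" (the value is illustrative):

    # /etc/security/limits.conf
    tomcat  soft  nofile  8192
    tomcat  hard  nofile  8192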
Multiple queries in one, something similar to a SQL "union"
I have an index with two fields: name and type. I need to perform a search on the name field such that an equal number of results is fetched for each type.

Currently, I am achieving this by firing multiple queries with a different type and then merging the results. In my database-driven version, I used to do a "union" of multiple queries (and not separate SQL queries) to achieve this.

Can Solr do something similar? If not, can this be a possible enhancement?

Cheers
Avlesh
Re: Use the same SQL Field in DataImportHandler twice?
There should be no problem if you re-use the same variable.

Are you sure you removed the dots from everywhere?

On Tue, Jun 9, 2009 at 2:55 PM, gateway0 wrote:
> No, I changed the entity name to "dbA:project" but still the same problem.
>
> Interesting side note: if I use my data-config as posted (with the "id"
> field in the comment section), none of the other entities work anymore.
> For example:
> "
> entity name="user" dataSource="dbA" query="select username from
> ci_user where userid='${dbA.project.created_by}' ">
> "
> returns an empty result.
>
> I still can't figure out why I can't use the (SQL) table's primary key
> - once, to save it in the index directly, and
> - twice, to query against my comment table.

--
-
Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Multiple queries in one, something similar to a SQL "union"
I don't know if I follow you correctly, but you are saying that you want X results per type? So you do something like limit=X and query = type:Y etc. and merge the results?

- Aleks

On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh wrote:
> I have an index with two fields: name and type. I need to perform a
> search on the name field such that an equal number of results is fetched
> for each type.
> Currently, I am achieving this by firing multiple queries with a
> different type and then merging the results.
> In my database-driven version, I used to do a "union" of multiple queries
> (and not separate SQL queries) to achieve this.
>
> Can Solr do something similar? If not, can this be a possible enhancement?
>
> Cheers
> Avlesh

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Re: spellcheck / too many open files
Thanks Shalin. When we use the external file dictionary (if there is one), it should work fine for spell check, right? Also, is there any format for this file?

Regards
Sujatha

On Tue, Jun 9, 2009 at 3:03 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> Yes, it does use n-grams, but there's an analysis step before the n-grams
> are created. [...]
> Optimization merges the segments of the index into one big segment, so it
> will reduce the number of files. However, during the merge it may create
> many more files.
Sharding strategy
Hi all,

I'm trying to figure out how to shard our index, as it is growing rapidly and we want to make our solution scalable. We have documents that are most commonly sorted by their date, so my initial thought is to shard the index by date, but I wonder if you have any input on this and how best to solve it.

I know that the most frequent queries will be executed against the "latest" shard. But say we shard by year: how do we best handle the situation that arises at the beginning of a new year? (Some of the data will be in the last shard, but most of it will be in the second-to-last shard.) Would it be stupid to have a "latest" shard with duplicate data (always consisting of the last six months or so) and maintain that index in addition to the regular yearly shards?

Anyone else facing a similar situation with a good solution? Any input would be greatly appreciated :)

Cheers,
Aleksander

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Re: spellcheck / too many open files
On Tue, Jun 9, 2009 at 4:32 PM, revas wrote:
> Thanks Shalin. When we use the external file dictionary (if there is
> one), it should work fine for spell check, right? Also, is there any
> format for this file?

The external file should have one token per line. See http://wiki.apache.org/solr/FileBasedSpellChecker

The default analyzer is WhitespaceAnalyzer, so all tokens in the file will be split on whitespace and the resulting tokens will be used for giving suggestions. If you want to change the analyzer, specify fieldType in the spell checker configuration and the component will use the analyzer configured for that field type.

--
Regards,
Shalin Shekhar Mangar.
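For orientation, a minimal solrconfig.xml sketch of the setup being described, roughly following the wiki page cited above (the file name and paths are assumptions):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file</str>
        <!-- one token per line -->
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
        <!-- optional: analyze tokens with this field type instead of whitespace -->
        <str name="fieldType">text_ws</str>
      </lst>
    </searchComponent>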
Re: spellcheck / too many open files
Thanks

On Tue, Jun 9, 2009 at 5:14 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> The external file should have one token per line. See
> http://wiki.apache.org/solr/FileBasedSpellChecker
Re: Multiple queries in one, something similar to a SQL "union"
> I don't know if I follow you correctly, but you are saying that you want
> X results per type?

You are right. I need "X" number of results per type.

> So you do something like limit=X and query = type:Y etc. and merge the
> results?

That is what the question is! Which means, if I have 4 types, I am currently making 4 queries to Solr. The question is aimed at finding a possibility of doing it in a single query, and suggesting that the implementation could be more on the lines of a SQL union.

Cheers
Avlesh
Re: Multiple queries in one, something similar to a SQL "union"
On Tue, Jun 9, 2009 at 4:03 PM, Avlesh Singh wrote:
> I have an index with two fields: name and type. I need to perform a
> search on the name field such that an equal number of results is fetched
> for each type.
> Currently, I am achieving this by firing multiple queries with a
> different type and then merging the results.
> In my database-driven version, I used to do a "union" of multiple queries
> (and not separate SQL queries) to achieve this.
>
> Can Solr do something similar? If not, can this be a possible enhancement?

Not right now. There's an issue open: https://issues.apache.org/jira/browse/SOLR-1093

--
Regards,
Shalin Shekhar Mangar.
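Until that issue is resolved, the per-type requests stay separate; a sketch of the current workaround, assuming a field named "type" and X=10:

    q=name:foo&fq=type:book&rows=10
    q=name:foo&fq=type:dvd&rows=10
    q=name:foo&fq=type:cd&rows=10

The responses are then merged client-side; since the type restriction is an fq, each type filter is cached in the filterCache after its first use.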
Re: solr in distributed mode
Rakhi Khatwani wrote:
> Hi,
> I was looking for ways in which we can use Solr in distributed mode. [...]
> Do you have an example or some tutorial which demonstrates distributed
> index searching/storing using shards?

You might check out this article to get an idea of how Solr scales (there is a lot of extra Lucene material in there too; just skip around):
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

You can also check out the wiki:
http://wiki.apache.org/solr/DistributedSearch

Also see:
Solr 1.4: http://wiki.apache.org/solr/SolrReplication
Solr 1.3, 1.4: http://wiki.apache.org/solr/CollectionDistribution

--
- Mark
http://www.lucidimagination.com
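To answer the "same machine or different machines" part concretely: shards can live anywhere reachable over HTTP. A sketch of a distributed query against two assumed hosts:

    http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=name:ipod

The node receiving the request queries each shard listed in the shards parameter and merges the results. Splitting the documents across the shards (for example by hashing the unique key at index time) is currently up to the indexing application.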
Re: solr distributed search example - exception
Thanks for bringing closure to this Raakhi.

- Mark

Rakhi Khatwani wrote:

Hi Mark,
I actually got this error because I was using an old version of Java. Now the problem is solved.
Thanks anyways
Raakhi

On Tue, Jun 9, 2009 at 11:17 AM, Rakhi Khatwani wrote:

Hi Mark,
Yes, I would like to open a JIRA issue for it. How do I go about that?
Regards,
Raakhi

On Mon, Jun 8, 2009 at 7:58 PM, Mark Miller wrote:

That is a very odd cast exception to get. Do you want to open a JIRA issue for this?

It looks like an odd exception because the call is:

    NodeList nodes = (NodeList)solrConfig.evaluate(configPath, XPathConstants.NODESET);
    // cast exception if we get an ArrayList rather than a NodeList

Which leads to:

    Object o = xpath.evaluate(xstr, doc, type);

where type = XPathConstants.NODESET. So you get back an Object based on the XPathConstant passed. There does not appear to be a value that would return an ArrayList; using XPathConstants.NODESET gets you a NodeList according to the XPath API. I'm not sure what could cause this to happen.

- Mark

Rakhi Khatwani wrote:

Hi,
I was executing a simple example which demonstrates distributed search, the example provided in the following link: http://wiki.apache.org/solr/DistributedSearch
However, when I start up the server on both ports, 8983 and 7574, I get the following exception:

    SEVERE: Could not start SOLR. Check solr/home property
    java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.w3c.dom.NodeList
        at org.apache.solr.search.CacheConfig.getMultipleConfigs(CacheConfig.java:61)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:131)
        at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:70)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at java.lang.reflect.Method.invoke(libgcj.so.7rh)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)

    2009-06-08 18:36:28.016::WARN: failed SolrRequestFilter
    java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore
        at java.lang.Class.initializeClass(libgcj.so.7rh)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:77)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguratio
Re: Use the same SQL Field in DataImportHandler twice?
Noticed this "warning" in the log file: " Jun 9, 2009 2:53:35 PM org.apache.solr.handler.dataimport.TemplateTransformer transformRow WARNING: Unable to resolve variable: dbA.project.id while parsing expression: ${dbA.project.dbA.project},id:${dbA.project.id} " Ok? Whats that suppose to mean? gateway0 wrote: > > No I changed the entity name to "dbA:project" but still the same problem. > > Interesting sidenote If I use my Data-Config as posted (with the "id" > field in the comment section) none of the other entities works anymore > like for example: > " > entity name="user" dataSource="dbA" query="select username from > ci_user where userid='${dbA.project.created_by}' "> > > > " > returns an empty result. > > Still can´t figure it out why I cant use the (sql)tables primary key > - once to save it in the index directly and > - twice to query against my comment table > > > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> can you avoid "." dots in the entity name and try it out. dots are >> special characters and it should have caused some problem >> >> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote: >>> >>> Ok here it goes: >>> " >>> >>> >>> >> driver="com.mysql.jdbc.Driver" >>> url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull" >>> user="root" password=""/> >>> >>> >> transformer="TemplateTransformer" query="select *, 'dbA.project' from >>> project"> >>> >>> >> template="${dbA.project.dbA.project},id:${dbA.project.id}"/> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/> >>> >> dateTimeFormat="-MM-dd'T'hh:mm:ss"/> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> " >>> The name of the database is "dbA" and the table name is "project". >>> >>> Everything works out fine except the comment part highlighted (bold). >>> That >>> works to as I stated If I change the phrase to: >>> " >>> >>> >>> >>> " >>> so that I don´t use my primary key "id" twice but the problem is I need >>> to >>> use "id" for the comment part too. >>> >>> kind regards, Sebastian >>> >>> >>> Noble Paul നോബിള് नोब्ळ्-2 wrote: On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote: > > Thanks for your answer. > > "${db.tableA.id}" that specifies the sql query that the > Dataimporthandler > should Use the sql field "id" in table "tableA" located in Database > "db". The naming convention does not work like that. if the entity name is 'tableA' then the field 'id' is addressed as 'tableA.id' As I said earlier, if you could privide mw with the entire data-config.xml it would be more helpful > > like in the example from the Solr Wiki: > " > > " > > It´s strange I know but when I use something other than "id" as the > foreign > key for the query everything works! > > like: > "${db.tableA.anotherid}" > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> what is ${db.tableA.id} ? >> >> I think there is something extra in that >> >> can you paste the whole data-config.xml? >> >> can you paste >> >> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 wrote: >>> >>> Hi, >>> >>> I tried to do the following: >>> >>> " >>> >>> >>> >>> >>> >>> " >>> >>> So I use the SQL Table Field "id" twice once for "db_id" in my index >>> and >>> for >>> the sql query as "fid=id". >>> >>> That doesn´t work! >>> >>> But when I change the query from "fid=id" to like "fid=otherkey" it >>> does >>> work! >>> Like: >>> " >>> >>> >>> >>> >>> >>> " >>> >>> Is there any other kind of a workaround so I can use the SQL Field >>> "id" >>> twice as I wanted to? 
Thanks >>> >>> kind regards, Sebastian >>> -- >>> View this message in context: >>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> > > -- > View this message in context: > http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23930286.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p239382
Re: Use the same SQL Field in DataImportHandler twice?
Noticed this "warning" in the log file: " Jun 9, 2009 2:53:35 PM org.apache.solr.handler.dataimport.TemplateTransformer transformRow WARNING: Unable to resolve variable: dbA.project.id while parsing expression: ${dbA.project.dbA.project},id:${dbA.project.id} " Ok? Whats that suppose to mean? And yes I replaced the dots (with ":") from the entity names like you suggested. Still no change. Noble Paul നോബിള് नोब्ळ्-2 wrote: > > There should be no problem if you re-use the same variable > > are you sure you removed the dots from everywhere? > > > On Tue, Jun 9, 2009 at 2:55 PM, gateway0 wrote: >> >> No I changed the entity name to "dbA:project" but still the same problem. >> >> Interesting sidenote If I use my Data-Config as posted (with the "id" >> field >> in the comment section) none of the other entities works anymore like for >> example: >> " >> entity name="user" dataSource="dbA" query="select username from >> ci_user where userid='${dbA.project.created_by}' "> >> >> >> " >> returns an empty result. >> >> Still can´t figure it out why I cant use the (sql)tables primary key >> - once to save it in the index directly and >> - twice to query against my comment table >> >> >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> can you avoid "." dots in the entity name and try it out. dots are >>> special characters and it should have caused some problem >>> >>> On Tue, Jun 9, 2009 at 1:37 PM, gateway0 wrote: Ok here it goes: " >>> driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/dbA?zeroDateTimeBehavior=convertToNull" user="root" password=""/> >>> transformer="TemplateTransformer" query="select *, 'dbA.project' from project"> >>> template="${dbA.project.dbA.project},id:${dbA.project.id}"/> >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/> >>> dateTimeFormat="-MM-dd'T'hh:mm:ss"/> " The name of the database is "dbA" and the table name is "project". Everything works out fine except the comment part highlighted (bold). That works to as I stated If I change the phrase to: " " so that I don´t use my primary key "id" twice but the problem is I need to use "id" for the comment part too. kind regards, Sebastian Noble Paul നോബിള് नोब्ळ्-2 wrote: > > On Tue, Jun 9, 2009 at 12:41 AM, gateway0 wrote: >> >> Thanks for your answer. >> >> "${db.tableA.id}" that specifies the sql query that the >> Dataimporthandler >> should Use the sql field "id" in table "tableA" located in Database >> "db". > > The naming convention does not work like that. > > if the entity name is 'tableA' then the field 'id' is addressed as > 'tableA.id' > > As I said earlier, if you could privide mw with the entire > data-config.xml it would be more helpful > >> >> like in the example from the Solr Wiki: >> " >> >> " >> >> It´s strange I know but when I use something other than "id" as the >> foreign >> key for the query everything works! >> >> like: >> "${db.tableA.anotherid}" >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> what is ${db.tableA.id} ? >>> >>> I think there is something extra in that >>> >>> can you paste the whole data-config.xml? >>> >>> can you paste >>> >>> On Sun, Jun 7, 2009 at 1:09 AM, gateway0 >>> wrote: Hi, I tried to do the following: " " So I use the SQL Table Field "id" twice once for "db_id" in my index and for the sql query as "fid=id". That doesn´t work! But when I change the query from "fid=id" to like "fid=otherkey" it does work! Like: " " Is there any other kind of a workaround so I can use the SQL Field "id" twice as I wanted to? 
Thanks kind regards, Sebastian -- View this message in context: http://www.nabble.com/Use-the-same-SQL-Field-in-Dataimporthandler-twice--tp23904968p23904968.html Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >>> >>> -- >>> - >>> Noble Paul | Principal Engineer| AOL | http://aol.com >>> >>> >> >> -- >> View this mess
Re: Lucene 2.9-dev version in Solr nightly build and FieldCache memory usage
Yep. CHANGES.txt for Solr has this:

    34. Upgraded to Lucene 2.9-dev r779312 (yonik)

And if you click the "All" tab for LUCENE-1662, it says the committed revision was 779277.

-Yonik
http://www.lucidimagination.com

On Tue, Jun 9, 2009 at 5:32 AM, Marc Sturlese wrote:
> Hey there,
> Does the Lucene 2.9-dev used in the current Solr nightly build (9-6-2009)
> include the patch LUCENE-1662, to avoid doubling memory usage in the
> Lucene FieldCache?
> Thanks in advance
Re: Lucene 2.9-dev version in Solr nightly build and FieldCache memory usage
Thanks Yonik, I didn't know how to check for the last committed revision.

Yonik Seeley-2 wrote:
> Yep. CHANGES.txt for Solr has this:
>     34. Upgraded to Lucene 2.9-dev r779312 (yonik)
> And if you click the "All" tab for LUCENE-1662, it says the committed
> revision was 779277.
Re: fq vs. q
Martin Davidsson schrieb:
> I've tried to read up on how to decide, when writing a query, what
> criteria goes in the q parameter and what goes in the fq parameter, to
> achieve optimal performance. Is there [...] some kind of rule of thumb
> to help me decide how to split things up when querying against one or
> more fields.

This is a good question. I don't know if there is any such rule. I'm going to sum up my understanding of filter queries, hoping that the pros will point out any flaws in my assumptions.

http://wiki.apache.org/solr/SolrCaching - filterCache

A filter query is cached, which means that it is the more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data, so we know what might be candidates for filtering.

The result of a filter query is cached and then used to filter a primary query result using set intersection. If my filter query result comprises more than 50% of the entire document collection, its selectivity is poor. I might need it despite this fact, but it might also be worthwhile thinking about how to reframe the requirement, allowing for more efficient filters.

Memory consumption is probably not a great concern here, as the cache stores only document IDs. (And if those are integers, it's just 4 bytes each.) So having 100 filters containing 100,000 items on average, the memory consumption increase should be around 40 MB. By the way, are these document IDs (used in filterCache, documentCache, queryResultCache) the ones I configure in schema.xml, or does Solr map my IDs to integers in order to ensure efficiency?

A filter query should probably be orthogonal to the primary query, which means, in plain English: unrelated to the primary query. To give an example, I have a field "category", which is a required field. In the class of searches where I use a filter on that field, the primary search is for something entirely different, so in most cases it will not, or not necessarily, bias the primary result to any particular distribution of the category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess.

Michael Ludwig
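To make the split concrete, a small example (the field names are assumptions, not from the original question):

    q=title:ipod&fq=category:electronics&fq=inStock:true

The q part is scored for relevance; each fq is looked up or stored in the filterCache independently and intersected with the primary result, so a filter like category:electronics costs almost nothing once it has been cached.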
filterCache/@size, queryResultCache/@size, documentCache/@size
Common cache configuration parameters include @size (the "size" attribute).

http://wiki.apache.org/solr/SolrCaching

For each of the following, does this mean the maximum number of:

* filterCache/@size - filter query results?
* queryResultCache/@size - query results?
* documentCache/@size - documents?

So if I know my tiny documents don't take up much memory (just 500 bytes on average), I'd want to have very different settings for the documentCache than if I decided to store 10 KB per doc in Solr?

And if I know that only 100 filters are possible, there is no point raising filterCache/@size above that threshold?

Given the following three filtering scenarios of (a) x:bla, (b) y:blub, and (c) x:bla AND y:blub, will I end up with two or three distinct filters? In other words, may filters be composites, or are they decomposed as far as their number (relevant for @size) is concerned?

Michael Ludwig
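For reference, the attributes in question as they appear in a stock solrconfig.xml (the sizes here are illustrative, not recommendations):

    <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
    <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Each size is a number of cache entries, not bytes: filter results, result-list windows, and stored documents respectively.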
Re: Use the same SQL Field in DataImportHandler twice?
On Tue, Jun 9, 2009 at 6:39 PM, gateway0 wrote:
> Noticed this "warning" in the log file:
> "
> Jun 9, 2009 2:53:35 PM
> org.apache.solr.handler.dataimport.TemplateTransformer transformRow
> WARNING: Unable to resolve variable: dbA.project.id while parsing
> expression: ${dbA.project.dbA.project},id:${dbA.project.id}
> "
> OK? What is that supposed to mean?

This means that you still have dots in the entity name. ${dbA.project.id} does not get resolved correctly.

--
-
Noble Paul | Principal Engineer | AOL | http://aol.com
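A minimal data-config sketch of the naming convention Noble describes, with a dot-free entity name (the table and column names are assumptions, not the poster's real schema):

    <entity name="project" dataSource="dbA" query="select * from project">
      <field column="id" name="db_id"/>
      <!-- the outer entity is named "project", so its id is ${project.id} -->
      <entity name="comment"
              query="select comment from comments where fid='${project.id}'"/>
    </entity>

With an entity named "dbA.project", the resolver treats the dots as path separators and looks for an entity "dbA" containing "project.id", which is why the variable above failed to resolve.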
Re: Terms Component
I just got the nightly build, and the terms component works great!!! Thank you very much.

On Mon, Jun 8, 2009 at 8:00 PM, Aleksander M. Stensby <aleksander.sten...@integrasco.no> wrote:
> You can try out the nightly build of Solr (which is the Solr 1.4 dev
> version) containing all the new nice and shiny features of Solr 1.4 :)
> To use the terms component you simply need to configure the handler as
> explained in the documentation / wiki.
>
> Cheers,
> Aleksander
>
> On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur wrote:
>> While on the subject, can anybody tell me when Solr 1.4 might come out?
>>
>> Thanks
>> Anshuman Manur
>>
>> On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur wrote:
>>> I'm using Solr 1.3 apparently, and Solr 1.4 is not out yet.
>>> Sorry, my mistake!
>>>
>>> On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur wrote:
>>>> Hello,
>>>>
>>>> I want to use the terms component in Solr 1.4:
>>>>
>>>>     http://localhost:8983/solr/terms?terms.fl=name
>>>>
>>>> But I get the following error with the above query:
>>>>
>>>>     java.lang.NullPointerException
>>>>         at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
>>>>         at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
>>>>         at org.apache.solr.search.QParser.getQuery(QParser.java:88)
>>>>         at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
>>>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
>>>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>>>         at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84)
>>>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
>>>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>>>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
>>>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
>>>>         at org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44)
>>>>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>>>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> Any help would be great.
>>>>
>>>> Thanks
>>>> Anshuman Manur
>
> --
> Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
Initializing Solr Example
In trying to run the example distributed with Solr 1.3.0 from the command line, the process seems to stop at the following line: INFO: [] Registered new searcher searc...@147c1db main The searcher ID is not always the same, but it repeatedly gets caught at this line. Any suggestions?
Re: filter on millions of IDs from external query
Ryan McKinley wrote:

I am working with an index of ~10 million documents. The index does not change often. I need to perform some external search criteria that will return some number of results - this search could take up to 5 mins and return anywhere from 0-10M docs.

If it really takes so long, then something is likely wrong. You might be able to achieve a significant improvement by reframing your requirement.

I would like to use the output of this long-running query as a filter in solr. Any suggestions on how to wire this all together?

Just use it as a filter query. The result will be cached, and the query won't have to be executed again (if I'm not mistaken) until a new index searcher is opened (after an index update and a commit), or until the filter query result is evicted from the cache, which you should make sure won't happen if your query really is so terribly expensive.

Michael Ludwig
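For illustration, here is a minimal SolrJ sketch of sending such a criterion as a filter query so its result set lands in the filterCache; the endpoint URL and field name are invented for the example, and CommonsHttpSolrServer is the SolrJ client of the Solr 1.3/1.4 era.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("title:lucene"); // primary query
        // The expensive criterion goes into fq; its result set is cached
        // in the filterCache and reused until a new searcher is opened.
        query.addFilterQuery("external_flag:true"); // hypothetical field

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

The first request pays for the filter; subsequent requests with the same fq only pay for the set intersection.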
Re: Solr relevancy score - conversion
Solr does not support this. You can do it yourself by taking the highest score, using that as 100%, and calculating the other percentages from that number. For example, if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%.

Hope this helps.

Thanks,

Matt Weber eSr Technologies http://www.esr-technologies.com

On Jun 8, 2009, at 10:05 PM, Vijay_here wrote:

Hi,

I am using Solr to index some legal documents, and I need the search engine to return a relevancy ranking score for each search result. As of now I am getting scores like 3.12, 1.23, 0.23 and so on. I would need a more proportionate score scaled to 100% (95% relevant, 80% relevant and so on). Is there a way to make Solr return such relevance scores? Any other approach to arrive at these scores would also be appreciated.

thanks
vijay
-- View this message in context: http://www.nabble.com/Solr-relevancy-score---conversion-tp23936413p23936413.html Sent from the Solr - User mailing list archive at Nabble.com.
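The client-side arithmetic is trivial; here is a self-contained sketch with made-up scores (Solr returns hits sorted by score, so the first score is the maximum when no explicit sort is given):

public class ScorePercent {
    public static void main(String[] args) {
        float[] scores = {3.12f, 1.23f, 0.23f}; // raw scores, made up
        float max = scores[0]; // top hit's score when sorted by relevance
        for (float s : scores) {
            int percent = Math.round(s / max * 100);
            System.out.println(s + " -> " + percent + "% relevant");
        }
    }
}

Keep in mind that Lucene scores are not comparable across queries, so these percentages only rank results within a single result set.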
Re: Field Compression
Fer-Bj wrote:

For all the documents we have a field called "small_body", which is a 60-char max text field where we store the "abstract" for each article. We need to display this small_body, which we want to compress every time.

If this works like compressing individual files, the overhead for just 60 characters (which may be no more than 60 bytes) may mean that any attempt at compression results in inflation.

On the other hand, if lower-level units (pages) are compressed (as opposed to individual fields), then I don't know what sense a configurable compression threshold might make. Maybe one of the pros can clarify.

Last question: what's the best way to determine the compression threshold?

One fairly obvious way would be to index the same set of documents twice, with compression and then without, and then to compare the index size on disk. If you don't save, say, five or ten percent (YMMV), it might not be worth the effort.

Michael Ludwig
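The small-payload effect is easy to demonstrate outside Solr entirely; the sketch below deflates a roughly 60-character string with java.util.zip and prints both sizes. This only illustrates the general compression overhead, not Lucene's actual storage format:

import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;

public class CompressionOverhead {
    public static void main(String[] args) throws Exception {
        String smallBody = "A sixty character abstract for an article, roughly this size"; // ~60 chars
        byte[] raw = smallBody.getBytes("UTF-8");

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DeflaterOutputStream deflater = new DeflaterOutputStream(buf);
        deflater.write(raw); // compress the field value
        deflater.close();    // flush and finish the deflate stream

        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + buf.size());
    }
}

On inputs this small the compressed size is typically about the same as, or larger than, the input, which is exactly the inflation concern raised above.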
Re: Faceting on text fields
Yao Ge wrote:

The facet query is considerably slower compared to other facets from structured database fields (with highly repeated values). What I found interesting is that even after I constrained search results to just a few hundred hits using other facets, these text facets are still very slow.

I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. However, why is it still slow after my matching documents are reduced to hundreds? Is it because the whole filter is cached (regardless of the matching docs) and I don't have enough filter cache size to fit the whole list?

Very interesting questions! I think an answer would both require and further an understanding of how filters work, which might even lead to a more general guideline on when and how to use filters and facets.

Even though faceting appears to have changed in 1.4 vs 1.3, it would still be interesting to understand the 1.3 side of things.

Lastly, what I really want is to give the user a chance to visualize and filter on the top relevant words in the free-text fields. Are there alternatives to the facet field approach? Term vectors? I can do client-side processing based on the top N (say 100) hits for this, but it is my last option.

Also a very interesting data mining question! I'm sorry I don't have any answers for you. Maybe someone else does.

Best,

Michael Ludwig
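For reference, the kind of request under discussion, as a SolrJ sketch; the query string and field names are invented:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TextFacetExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("defect");   // hypothetical query
        query.addFilterQuery("status:closed");       // structured facet as a filter
        query.setFacet(true);
        query.addFacetField("comments_text");        // hypothetical free-text field
        query.setFacetLimit(20);                     // only the top 20 terms
        query.setFacetMinCount(1);

        QueryResponse rsp = server.query(query);
        FacetField ff = rsp.getFacetField("comments_text");
        if (ff != null && ff.getValues() != null) {
            for (FacetField.Count c : ff.getValues()) {
                System.out.println(c.getName() + ": " + c.getCount());
            }
        }
    }
}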
Solr update performance decrease after a while
Hello,

We are indexing approximately 500 documents per day. My benchmark says an update is done in 0.7 sec just after Solr has been started, but it quickly decreases to 2.2 secs per update!

I have just been focused on the schema until now, and haven't changed much in the solrconfig file. Maybe you have some tips which could help me keep this more linear?

Thanks a lot
Vincent
-- View this message in context: http://www.nabble.com/Solr-update-performance-decrease-after-a-while-tp23945947p23945947.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Trie Patches- Backportable?
I take it by the deafening silence that this is not possible? :-) On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian wrote: > Hi, > I am still using Solr 1.2 with the Lucene 2.2 that came with that version > of Solr. I am interested in taking advantage of the trie filtering to > alleviate some performance problems and was wondering how back-portable > these patches are? > > I am also trying to understand how the Trie algorithm cuts down the number > of term queries compared to a normal range query. I was at the recent Bay > Area lucene/solr meetup where this was covered but missed some of the > details. > > I know the ideal case is to upgrade to a newer Solr/Lucene but we are > resource constrained and can't devote the time right now to test and upgrade > our production systems to a newer Solr. > > Thanks! > Amit >
Re: Faceting on text fields
Yonik Seeley schrieb: Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot. I found two significant changes (but there may well be more): [#SOLR-911] multi-select facets - ASF JIRA https://issues.apache.org/jira/browse/SOLR-911 Yao, it sounds like the following (which is in 1.4) might have a chance of helping your faceting performance issue: [#SOLR-475] multi-valued faceting via un-inverted field - ASF JIRA https://issues.apache.org/jira/browse/SOLR-475 Yonik, from your initial comment for SOLR-475: | * To save space and speed up faceting, any term that matches enough | * documents will not be un-inverted... it will be skipped while | * building the un-inverted field structore, and will use a set | * intersection method during faceting. Does this mean that frequently occurring terms (which we can use for faceting in 1.3 without problems) are handled exactly as they were before, by allocating a slot in the filter cache upon request, while those zillions of pesky little fringe terms outside the mainstream, for which allocating a slot in the filter cache would be overkill (and possibly cause inefficient contention, eviction, and, hence, a performance penalty) are now handled by the new structure mapping documents to term numbers? So doing faceting for a given set of documents would result in (a) doing set intersection using those filter query results that have been set up (for the terms occurring in many documents), and (b) collecting all the pesky little terms from the new structure mapping documents to term numbers? So basically, depending on expediency, you (a) know the facets and count the documents which display them, or you (b) take the documents and see what facets they have? Michael Ludwig
Re: Trie Patches- Backportable?
On Tue, Jun 9, 2009 at 10:19 PM, Amit Nithian wrote: > I take it by the deafening silence that this is not possible? :-) > Anything is possible :) However, it might be easier to upgrade to 1.4 instead. > > On Mon, Jun 8, 2009 at 11:34 AM, Amit Nithian wrote: > > > Hi, > > I am still using Solr 1.2 with the Lucene 2.2 that came with that version > > of Solr. I am interested in taking advantage of the trie filtering to > > alleviate some performance problems and was wondering how back-portable > > these patches are? > > > Trie is a new functionality. It does have a few dependencies on new Lucene APIs (TokenStream/TermAttribute etc.). On the Solr side I think it'd be easier. > > > I am also trying to understand how the Trie algorithm cuts down the > number > > of term queries compared to a normal range query. I was at the recent Bay > > Area lucene/solr meetup where this was covered but missed some of the > > details. > > > See the javadocs. It has the link to the paper in which it is described in more detail. http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-queries/org/apache/lucene/search/trie/package-summary.html -- Regards, Shalin Shekhar Mangar.
Re: User Credentials for Solr Data Dir
I do not recommend using network storage for indexes. This is almost always extremely slow. When I tried it, indexing ran 100X slower.

If you don't mind terrible performance, configure your NT service to run as a specific user. The default user is one that has almost no privileges. Create a new user, perhaps "solr", give that user the desired privs, and configure the service to run as that user. But you should still use local disk.

wunder

On 6/9/09 1:55 AM, "Noble Paul നോബിള്‍ नोब्ळ्" wrote:

> nope
>
> On Tue, Jun 9, 2009 at 4:59 AM, vaibhav joshi wrote:
>>
>> Hi,
>>
>> I am currently using Solr 1.3 and running Solr as an NT service. I need to
>> store the data indexes on a remote filer machine. The filer needs user
>> credentials in order to access it. Is there a Solr configuration which
>> I can use to pass these credentials?
>>
>> I was reading some blogs and they suggested running the NT service as a user
>> who can access the resource needed. Since I need to use the existing build
>> and deploy tools in the company, they always run the NT service as "LOCAL
>> SYSTEM", which cannot access other resources.
>>
>> That's why I am trying to explore whether it's possible to pass these
>> credentials via JNDI/system variables. Is it possible?
>>
>> Thanks
>>
>> Vaibhav
Re: fq vs. q
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig wrote:
>
> http://wiki.apache.org/solr/SolrCaching - filterCache
>
> A filter query is cached, which means that the more often it is repeated,
> the more useful it is. We know how often certain queries arise, or
> at least have the means to collect that data - so we know what might be
> candidates for filtering.

Correct.

> The result of a filter query is cached and then used to filter a primary
> query result using set intersection. If my filter query result comprises
> more than 50% of the entire document collection, its selectivity is
> poor. I might need it despite this fact, but it might also be worthwhile
> thinking about how to reframe the requirement, allowing for more
> efficient filters.

Correct.

> Memory consumption is probably not a great concern here as the cache
> stores only document IDs. (And if those are integers, it's just 4 bytes
> each.) So having 100 filters containing 100,000 items on average, the
> memory consumption increase should be around 40 MB.

A lot of times it is stored as a bitset, so the memory requirements may be even lower.

> By the way, are these document IDs (used in the filterCache, documentCache,
> queryResultCache) the ones I configure in schema.xml, or does Solr map my
> IDs to integers in order to ensure efficiency?

These are internal doc ids assigned by Lucene.

> A filter query should probably be orthogonal to the primary query, which
> means in plain English: unrelated to the primary query. To give an
> example, I have a field "category", which is a required field. In the
> class of searches where I use a filter on that field, the primary search
> is for something entirely different, so in most cases, it will not, or
> not necessarily, bias the primary result to any particular distribution
> of the category values. I then allow the application to apply filtering
> by category, incidentally, using faceting, which is a typical usage
> pattern, I guess.

Yes and no. There are use-cases where the query is applicable only to the filtered set. For example, when the same index contains many different "types" of documents. It is just that the intersection may need to do more or less work.

-- Regards, Shalin Shekhar Mangar.
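A back-of-the-envelope comparison of the two representations, using the 100,000-hit filter from the mail above and a hypothetical 10-million-document index; a bitset costs maxDoc/8 bytes no matter how many documents match, so it only wins once a filter matches more than maxDoc/32 documents:

public class FilterCacheMath {
    public static void main(String[] args) {
        long maxDoc = 10000000L; // hypothetical index size
        long hits = 100000L;     // docs matched by one filter

        long idListBytes = hits * 4;   // 4-byte int per matching doc id
        long bitSetBytes = maxDoc / 8; // 1 bit per doc in the index

        System.out.println("id list per filter: " + idListBytes + " bytes"); // ~390 KB
        System.out.println("bitset per filter:  " + bitSetBytes + " bytes"); // ~1.2 MB
        System.out.println("bitset wins above " + (maxDoc / 32) + " hits");
    }
}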
Re: statistics about word distances in solr
Hello Jens,

Jens Fischer wrote:

I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents. The additional information I'm looking for is the average distance between these terms and my search term. So let's say I have two docs: "the house is red" and "I live in a red house". The search for "house" should also return the info the:1 is:1 red:1.5 I:5 live:4

Could you explain what the "distance" here is? Something like "edit distance"? Ah, I see: you want the textual distance between the search term and other terms in the document, and then you want that averaged, i.e. the cumulative distance divided by the number of occurrences.

No idea if that functionality is available. However, the sort of calculation you want to perform requires the engine to not only collect all the terms to present as facets (much improved in 1.4, as I've just learned), but also to analyze each document (if I'm not mistaken) to determine the distance of each facet term from your primary query term. (Or terms.) The number of lookup operations is likely to scale as the product of the number of your primary search results, the number of your search terms, and the number of your facets. I assume this is an expensive operation.

Michael Ludwig
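If one did it client-side, the calculation itself is simple. A toy sketch over the two example documents above, ignoring analysis and scale, reproduces the averages given (red: 1.5, the: 1, and so on):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermDistance {
    public static void main(String[] args) {
        String[] docs = {"the house is red", "I live in a red house"};
        String query = "house";
        Map<String, List<Integer>> distances = new HashMap<String, List<Integer>>();

        for (String doc : docs) {
            String[] terms = doc.split(" ");
            int queryPos = -1;
            for (int i = 0; i < terms.length; i++) {
                if (terms[i].equals(query)) queryPos = i; // position of search term
            }
            for (int i = 0; i < terms.length; i++) {
                if (i == queryPos) continue;
                List<Integer> list = distances.get(terms[i]);
                if (list == null) {
                    list = new ArrayList<Integer>();
                    distances.put(terms[i], list);
                }
                list.add(Math.abs(i - queryPos)); // textual distance in tokens
            }
        }
        for (Map.Entry<String, List<Integer>> e : distances.entrySet()) {
            double sum = 0;
            for (int d : e.getValue()) sum += d;
            System.out.println(e.getKey() + ": " + sum / e.getValue().size());
        }
    }
}

Doing this at scale inside the engine is the expensive part, as discussed above.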
Re: fq vs. q
Shalin Shekhar Mangar wrote:

On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig wrote:

A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field "category", which is a required field. In the class of searches where I use a filter on that field, the primary search is for something entirely different, so in most cases, it will not, or not necessarily, bias the primary result to any particular distribution of the category values. I then allow the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess.

Yes and no. There are use-cases where the query is applicable only to the filtered set. For example, when the same index contains many different "types" of documents. It is just that the intersection may need to do more or less work.

Sorry, I don't understand. I used to think that the engine applies the filter to the primary query result. What you're saying here sounds as if it could also pre-filter my document collection and then apply a query to it (which should yield the same result). What does it mean that "the query is applicable only to the filtered set"?

And thanks for having clarified the other points!

Michael Ludwig
Re: filterCache/@size, queryResultCache/@size, documentCache/@size
On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig wrote: > Common cache configuration parameters include @size ("size" attribute). > > http://wiki.apache.org/solr/SolrCaching > > For each of the following, does this mean the maximum size of: > > * filterCache/@size - filter query results? Maximum number of filters that can be cached. > * queryResultCache/@size - query results? Maximum number of queries (DocLists) that can be cached. > * documentCache/@size - documents? Correct. > So if I know my tiny documents don't take up much memory (just 500 > Bytes on average), I'd want to have very different settings for the > documentCache than if I decided to store 10 KB per doc in Solr? Correct. > And if I know that only 100 filters are possible, there is no point > raising the filterCache/@size above that threshold? Correct. Faceting also uses the filterCache so keep that in mind too. > Given the following three filtering scenarios of (a) x:bla, (b) y:blub, > and (c) x:bla AND y:blub, will I end up with two or three distinct > filters? In other words, may filters be composites or are they > decomposed as far as their number (relevant for @size) is concerned? > It will be three. If you want to cache separately, send them as separate fq parameters. -- Regards, Shalin Shekhar Mangar.
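In SolrJ terms, the difference looks like this (field values invented); the first query produces one composite filterCache entry, the second produces two entries that are each reusable on their own:

import org.apache.solr.client.solrj.SolrQuery;

public class FilterCacheEntries {
    public static void main(String[] args) {
        SolrQuery composite = new SolrQuery("*:*");
        composite.addFilterQuery("x:bla AND y:blub"); // one cache entry

        SolrQuery separate = new SolrQuery("*:*");
        separate.addFilterQuery("x:bla");  // cached independently
        separate.addFilterQuery("y:blub"); // cached independently

        System.out.println(composite); // roughly q=*:*&fq=x:bla AND y:blub
        System.out.println(separate);  // roughly q=*:*&fq=x:bla&fq=y:blub
    }
}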
Re: fq vs. q
On Tue, Jun 9, 2009 at 11:11 PM, Michael Ludwig wrote: > > Sorry, I don't understand. I used to think that the engine applies the > filter to the primary query result. What you're saying here sounds as if > it could also pre-filter my document collection to then apply a query to > it (which should yield the same result). What does it mean that "the > query is applicable only to the filtered set"? > Sorry for not being clear. No, both filters and queries are computed on the entire index. My comment was related to the "A filter query should probably be orthogonal to the primary query..." part. I meant that both kinds of use-cases are common. -- Regards, Shalin Shekhar Mangar.
Re: filterCache/@size, queryResultCache/@size, documentCache/@size
Shalin Shekhar Mangar wrote:

On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig wrote:

Given the following three filtering scenarios of (a) x:bla, (b) y:blub, and (c) x:bla AND y:blub, will I end up with two or three distinct filters? In other words, may filters be composites, or are they decomposed as far as their number (relevant for @size) is concerned?

It will be three. If you want to cache separately, send them as separate fq parameters.

Thanks a lot for clarifying all my questions.

Michael Ludwig
Re: fq vs. q
Shalin Shekhar Mangar wrote:

No, both filters and queries are computed on the entire index. My comment was related to the "A filter query should probably be orthogonal to the primary query..." part. I meant that both kinds of use-cases are common.

Got it. Thanks :-)

Michael Ludwig
ExtractingRequestHandler and local files
Hi,

I would greatly appreciate a quick response to this question.

Is there a means of passing a local file to the ExtractingRequestHandler (as the enableRemoteStreaming/stream.file option does with the other handlers) so the file contents can be read directly from the local disk instead of going over HTTP?

Per the Solr wiki entry for ExtractingRequestHandler, enableRemoteStreaming is not used?

This is also a tad confusing because the Ruby example at http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ explicitly recommends setting this parameter.

Thanks!
Re: ExtractingRequestHandler and local files
I haven't tried it, but I thought the enableRemoteStreaming stuff should work. That stuff is handled by Solr in other places, if I recall correctly. Have you tried it?

-Grant

On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:

Hi,

I would greatly appreciate a quick response to this question.

Is there a means of passing a local file to the ExtractingRequestHandler (as the enableRemoteStreaming/stream.file option does with the other handlers) so the file contents can be read directly from the local disk instead of going over HTTP?

Per the Solr wiki entry for ExtractingRequestHandler, enableRemoteStreaming is not used?

This is also a tad confusing because the Ruby example at http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ explicitly recommends setting this parameter.

Thanks!

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Configure Collection Distribution in Solr 1.3
Hi Aleksander,

I went through the links below and successfully configured rsync using Cygwin on Windows XP. The Solr documentation mentions many script files like rsyncd-enable, snapshooter, etc. These are all Unix-based scripts; where do I get these script files for a Windows OS?

Any help on this would be greatly appreciated.

Thanks
MaheshR.

Aleksander M. Stensby wrote:
>
> You'll find everything you need in the Wiki.
> http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline
>
> http://wiki.apache.org/solr/SolrCollectionDistributionScripts
>
> If things are still uncertain, I've written a guide from when we used the
> Solr distribution scripts on our Lucene index earlier. You can read that
> guide here:
> http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53
>
> Cheers,
> Aleksander
>
> On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR wrote:
>
>> Hi,
>>
>> we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet
>> container.
>> It's working great. Now I need to configure collection distribution to
>> replicate index data between a master and 2 slaves. Please provide me
>> step-by-step instructions to configure collection distribution between
>> master and slaves.
>>
>> Thanks in advance.
>>
>> Thanks
>> Mahesh.
>
> -- Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco

-- View this message in context: http://www.nabble.com/Configure-Collection-Distribution-in-Solr-1.3-tp23927332p23949324.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on text fields
Yep, all that sounds right. An additional optimization counts terms for the documents *not* in the set when the base set is over half the size of the index. -Yonik http://www.lucidimagination.com On Tue, Jun 9, 2009 at 1:01 PM, Michael Ludwig wrote: > Yonik, > > from your initial comment for SOLR-475: > > | * To save space and speed up faceting, any term that matches enough > | * documents will not be un-inverted... it will be skipped while > | * building the un-inverted field structore, and will use a set > | * intersection method during faceting. > > Does this mean that frequently occurring terms (which we can use for > faceting in 1.3 without problems) are handled exactly as they were > before, by allocating a slot in the filter cache upon request, while > those zillions of pesky little fringe terms outside the mainstream, > for which allocating a slot in the filter cache would be overkill > (and possibly cause inefficient contention, eviction, and, hence, > a performance penalty) are now handled by the new structure mapping > documents to term numbers? > > So doing faceting for a given set of documents would result in (a) doing > set intersection using those filter query results that have been set up > (for the terms occurring in many documents), and (b) collecting all the > pesky little terms from the new structure mapping documents to term > numbers? > > So basically, depending on expediency, you (a) know the facets and count > the documents which display them, or you (b) take the documents and see > what facets they have? > > Michael Ludwig >
Re: Initializing Solr Example
Define caught? When I start up Solr, here's what I see (and know it's working): 2009-06-09 15:18:33.726::INFO: Started SocketConnector @ 0.0.0.0:8983 Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=null path=null params={q=static+firstSearcher+warming +query+from+solrconfig.xml} hits=0 status=0 QTime=30 Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: default Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: jarowinkler Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: file Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@f7378ab main What happens if you browse to http://localhost:8983/solr/admin? Or, what happens if you index documents? Granted, the message could probably be clearer that "Solr is ready to go" HTH, Grant On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote: In trying to run the example distributed with Solr 1.3.0 from the command line, the process seems to stop at the following line: INFO: [] Registered new searcher searc...@147c1db main The searcher ID is not always the same, but it repeatedly gets caught at this line. Any suggestions? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Fetching Dynamic Fields
One option is to hit the Luke request handler (&numTerms=0 for best performance), grab all the field names there, then build the fl list (or facet.field in the cases I've used this trick for) from the fields with the prefix you desire. Erik On Jun 8, 2009, at 11:40 AM, Manepalli, Kalyan wrote: Hi all, Is there a way to select all the dynamic fields in the fl field without using *. Here is what I am looking for. Fields in the schema, locationName_*, locationId,description,content. I want to select just the locationName_* and locationId. How can I do this without using fl=*, coz I don't want to fetch all the other fields. Any suggestions in this regard will be helpful. Thanks, Kalyan Manepalli
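A sketch of that trick with SolrJ's Luke support; the prefix and fields come from the question above, but double-check the exact API against your SolrJ version:

import java.util.Map;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class DynamicFieldList {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        LukeRequest luke = new LukeRequest();
        luke.setNumTerms(0); // skip per-field term stats for speed
        LukeResponse rsp = luke.process(server);

        StringBuilder fl = new StringBuilder("locationId");
        for (Map.Entry<String, LukeResponse.FieldInfo> e : rsp.getFieldInfo().entrySet()) {
            if (e.getKey().startsWith("locationName_")) {
                fl.append(',').append(e.getKey()); // collect matching dynamic fields
            }
        }
        System.out.println("fl=" + fl); // pass as the fl parameter of the real query
    }
}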
Re: Faceting on text fields
Michael,

Thanks for the update! I definitely need to get a 1.4 build and see if it makes a difference.

BTW, maybe instead of using faceting for text mining/clustering/visualization purposes, we can build a separate feature in SOLR for this. Many of the commercial search engines I have experience with (Google Search Appliance, Vivisimo etc.) provide dynamic term clustering based on the top N ranked documents (N is a configurable parameter). When the facet field is highly fragmented (say a text field), the existing set-intersection-based approach might no longer be optimal. Aggregating term vectors over the top N docs might be more attractive. Another feature I would really appreciate is search-time n-gram term clustering. Maybe this is better suited for the "spell checker", as it is just a different way to display alternative search terms.

-Yao

Michael Ludwig-4 wrote:
>
> Yao Ge wrote:
>
>> The facet query is considerably slower compared to other facets from
>> structured database fields (with highly repeated values). What I found
>> interesting is that even after I constrained search results to just a
>> few hundred hits using other facets, these text facets are still very
>> slow.
>>
>> I understand that text fields are not good candidates for faceting, as
>> they can contain a very large number of unique values. However, why is it
>> still slow after my matching documents are reduced to hundreds? Is it
>> because the whole filter is cached (regardless of the matching docs) and
>> I don't have enough filter cache size to fit the whole list?
>
> Very interesting questions! I think an answer would both require and
> further an understanding of how filters work, which might even lead to
> a more general guideline on when and how to use filters and facets.
>
> Even though faceting appears to have changed in 1.4 vs 1.3, it would
> still be interesting to understand the 1.3 side of things.
>
>> Lastly, what I really want is to give the user a chance to visualize
>> and filter on the top relevant words in the free-text fields. Are there
>> alternatives to the facet field approach? Term vectors? I can do
>> client-side processing based on the top N (say 100) hits for this, but
>> it is my last option.
>
> Also a very interesting data mining question! I'm sorry I don't have any
> answers for you. Maybe someone else does.
>
> Best,
>
> Michael Ludwig

-- View this message in context: http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Initializing Solr Example
After that comes up in the command line, I can access the localhost address, but I can't enter anything on the command line. -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, June 09, 2009 3:20 PM To: solr-user@lucene.apache.org Subject: Re: Initializing Solr Example Define caught? When I start up Solr, here's what I see (and know it's working): 2009-06-09 15:18:33.726::INFO: Started SocketConnector @ 0.0.0.0:8983 Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=null path=null params={q=static+firstSearcher+warming +query+from+solrconfig.xml} hits=0 status=0 QTime=30 Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: default Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: jarowinkler Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: file Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@f7378ab main What happens if you browse to http://localhost:8983/solr/admin? Or, what happens if you index documents? Granted, the message could probably be clearer that "Solr is ready to go" HTH, Grant On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote: > In trying to run the example distributed with Solr 1.3.0 from the > command line, the process seems to stop at the following line: > INFO: [] Registered new searcher searc...@147c1db main > > The searcher ID is not always the same, but it repeatedly gets > caught at this line. Any suggestions? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Initializing Solr Example
Solr is a server running in the Jetty web container and accepting requests over HTTP. There is no command line tool, at least not in Solr itself, for interacting with Solr. Typically people interact with it programmatically or via a Web Browser. I'd start by walking through: http://lucene.apache.org/solr/tutorial.html to familiarize yourself with Solr. -Grant On Jun 9, 2009, at 3:55 PM, Mukerjee, Neiloy (Neil) wrote: After that comes up in the command line, I can access the localhost address, but I can't enter anything on the command line. -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, June 09, 2009 3:20 PM To: solr-user@lucene.apache.org Subject: Re: Initializing Solr Example Define caught? When I start up Solr, here's what I see (and know it's working): 2009-06-09 15:18:33.726::INFO: Started SocketConnector @ 0.0.0.0:8983 Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=null path=null params={q=static+firstSearcher+warming +query+from+solrconfig.xml} hits=0 status=0 QTime=30 Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: default Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: jarowinkler Jun 9, 2009 3:18:33 PM org.apache.solr.handler.component.SpellCheckComponent $SpellCheckerListener newSearcher INFO: Loading spell index for spellchecker: file Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@f7378ab main What happens if you browse to http://localhost:8983/solr/admin? Or, what happens if you index documents? Granted, the message could probably be clearer that "Solr is ready to go" HTH, Grant On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote: In trying to run the example distributed with Solr 1.3.0 from the command line, the process seems to stop at the following line: INFO: [] Registered new searcher searc...@147c1db main The searcher ID is not always the same, but it repeatedly gets caught at this line. Any suggestions? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
RE: ExtractingRequestHandler and local files
Thanks for the quick response, Grant. We tried it and it seems to work.

The confusion stemmed from the fact that the wiki states that the parameter is not used - there are also comments in the test cases for the handler that say:

//TODO: stop using locally defined fields once stream.file and stream.body start working everywhere

So we wanted to confirm.

> From: gsing...@apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: ExtractingRequestHandler and local files
> Date: Tue, 9 Jun 2009 14:50:43 -0400
>
> I haven't tried it, but I thought the enableRemoteStreaming stuff
> should work. That stuff is handled by Solr in other places, if I
> recall correctly. Have you tried it?
>
> -Grant
>
> On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:
>
> > Hi,
> >
> > I would greatly appreciate a quick response to this question.
> >
> > Is there a means of passing a local file to the
> > ExtractingRequestHandler (as the enableRemoteStreaming/stream.file
> > option does with the other handlers) so the file contents can
> > be read directly from the local disk instead of going over HTTP?
> >
> > Per the Solr wiki entry for ExtractingRequestHandler,
> > enableRemoteStreaming is not used?
> >
> > This is also a tad confusing because the Ruby example at
> > http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
> > explicitly recommends setting this parameter.
> >
> > Thanks!
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
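For the record, a bare-bones sketch of such a call from Java, assuming the handler is mapped at /update/extract as in the example solrconfig.xml and that remote streaming is enabled; the file path and id are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class ExtractLocalFile {
    public static void main(String[] args) throws Exception {
        String path = URLEncoder.encode("/data/docs/report.pdf", "UTF-8"); // placeholder
        URL url = new URL("http://localhost:8983/solr/update/extract"
                + "?stream.file=" + path
                + "&literal.id=report-1" // placeholder unique key
                + "&commit=true");

        // Solr reads the file from its local disk; only this small GET goes over HTTP.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line); // Solr's response
        }
        in.close();
    }
}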
Re: Initializing Solr Example
Neil - when started using the packaged start.jar, Solr runs in the foreground; that's why you can't type anything in the command line after starting it. Mat On Tue, Jun 9, 2009 at 15:55, Mukerjee, Neiloy (Neil) < neil.muker...@alcatel-lucent.com> wrote: > After that comes up in the command line, I can access the localhost > address, but I can't enter anything on the command line. > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Tuesday, June 09, 2009 3:20 PM > To: solr-user@lucene.apache.org > Subject: Re: Initializing Solr Example > > Define caught? When I start up Solr, here's what I see (and know it's > working): > 2009-06-09 15:18:33.726::INFO: Started SocketConnector @ 0.0.0.0:8983 > Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore execute > INFO: [] webapp=null path=null params={q=static+firstSearcher+warming > +query+from+solrconfig.xml} hits=0 status=0 QTime=30 > Jun 9, 2009 3:18:33 PM org.apache.solr.core.QuerySenderListener > newSearcher > INFO: QuerySenderListener done. > Jun 9, 2009 3:18:33 PM > org.apache.solr.handler.component.SpellCheckComponent > $SpellCheckerListener newSearcher > INFO: Loading spell index for spellchecker: default > Jun 9, 2009 3:18:33 PM > org.apache.solr.handler.component.SpellCheckComponent > $SpellCheckerListener newSearcher > INFO: Loading spell index for spellchecker: jarowinkler > Jun 9, 2009 3:18:33 PM > org.apache.solr.handler.component.SpellCheckComponent > $SpellCheckerListener newSearcher > INFO: Loading spell index for spellchecker: file > Jun 9, 2009 3:18:33 PM org.apache.solr.core.SolrCore registerSearcher > INFO: [] Registered new searcher searc...@f7378ab main > > What happens if you browse to http://localhost:8983/solr/admin? Or, > what happens if you index documents? > > Granted, the message could probably be clearer that "Solr is ready to > go" > > HTH, > Grant > > On Jun 9, 2009, at 10:51 AM, Mukerjee, Neiloy (Neil) wrote: > > > In trying to run the example distributed with Solr 1.3.0 from the > > command line, the process seems to stop at the following line: > > INFO: [] Registered new searcher searc...@147c1db main > > > > The searcher ID is not always the same, but it repeatedly gets > > caught at this line. Any suggestions? > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > >
Re: Refresh synonyms.txt file via replication
Shalin Shekhar Mangar wrote:
>
>> Second Question:
>> If I force an empty commit, like this:
>> curl http://localhost:8080/solr_rep_master/core/update?stream.body=%3Ccommit/%3E
>> Then the changed synonyms.txt config file is replicated to the slave.
>> Unfortunately now I need to do a core "RELOAD" on both the master and slave
>> to get them to see the updated synonyms.txt file.
>
> Calling RELOAD on slave should not be necessary. If a configuration file is
> replicated, the slave is always reloaded. Can you try using the analysis.txt
> on a field which has the SynonymFilterFactory enabled to see if the new file
> is indeed not getting used?

I'm a bit confused now. It's not doing what I saw before. Now I can't get it to replicate when I do an "empty" commit. Rather, I need to do a real data update and a commit; then any changes to the solr_rep_master's conf/synonyms.txt file get replicated to the slave, and the slave seems to pick up the change without reloading.

I'm not really sure what you mean by the analysis.txt file. Do you mean the /analysis request handler? I've been making synonyms for "solr", so it is pretty obvious if it was picked up.

Can you explain what you expect should happen? i.e.
1) Should the slave replicate when you do an empty commit on the master?
2) If you change a master config file and it is replicated to the slave, would you expect the slave to pick it up automatically, but the master to require a reload?

Thanks
--Matthias
-- View this message in context: http://www.nabble.com/Refresh-synonyms.txt-file-via-replication-tp23789187p23951978.html Sent from the Solr - User mailing list archive at Nabble.com.
qf boost Versus field boost for Dismax queries
When 'dismax' queries are used, where is the best place to apply boost values/factors? While indexing, by supplying the 'boost' attribute to the field, or in solrconfig.xml, by specifying the 'qf' parameter with the same boosts? What are the advantages/disadvantages of each? What happens if both boosts are present? Do they get multiplied?

Thanks

- ashok
-- View this message in context: http://www.nabble.com/qf-boost-Versus-field-boost-for-Dismax-queries-tp23952323p23952323.html Sent from the Solr - User mailing list archive at Nabble.com.
facets and stopwords
I have a text field from which I remove stop words. As a first approximation I use facets to see the most common words in the text, but the stopwords are there, and if I search for documents containing the stopwords, there are no documents in the answer.

You can test it at this address (using solrjs; the texts are in Spanish, but you can check in the top words that "que" or "en" are there, and if you click on them to perform the search, no results are given): http://projecte01.development.barcelonamedia.org/fonetic/ or the administrator at http://projecte01.development.barcelonamedia.org/solr/admin so you can check what's going on in the content field.

I use the DataImportHandler to import the data, and the Solr analyzer shows me how the stopwords are removed from both the query and the indexed text, so why do facets show me these words?

-- View this message in context: http://www.nabble.com/facets-and-stopwords-tp23952823p23952823.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem using db-data-config.xml
Hi All,

I am facing an issue when fetching records from the database by providing the value '${prod.prod_cd}' in db-data-config.xml. It works fine if I provide the exact value of the product code, i.e. '302437-413'.

[The db-data-config.xml was pasted here, but its XML markup was stripped by the archive; only fragments such as AND p.prod_cd = '302437-413' survive.]

The issue is: if I replace AND prod_cd = '${prod.prod_cd}' AND reg_id = '${prod_reg.reg_id}' with the exact value '302437-413', I get the result; otherwise it does not execute the prod_reg and prod_reg_cmrc_styl entities.

Please advise on anything I am missing in the above db-data-config.xml.

Thanks in advance.

Regards,
Jayakeerthi
Servlet filter for Solr
Hi,

I have to intercept every request to Solr (search and update) and log some performance numbers. In order to do so I tried a servlet filter and added it to Solr's web.xml.

[The web.xml fragment was pasted here, but its XML markup was stripped by the archive; the surviving fragments show a filter named IndexFilter with class com.xxx.index.filter.IndexRequestFilter, an init parameter for testing, and a filter-mapping referencing the SolrUpdate and SolrServer servlets.]

But this doesn't seem to be working. A couple of questions:

1) What's wrong with my web.xml setting?
2) Is there any easier way to intercept calls to Solr without changing its web.xml? Basically, can I just change solrconfig.xml to do so (besides request handlers), so I don't have to customize the solr.war?

Thanks,
-vivek
How to index data without token in Solr
Hi all, I am very new in Solr and I want to use Solr to index data without token to match with my search. Does anyone know how to index data without token in Solr? if possible, can you give me an example? Thanks in advance, LEE
Re: Refresh synonyms.txt file via replication
Hi,

Unfortunately, the problem is that an 'empty' commit does not really do anything; it is not a real commit. Solr checks whether the index has changed and, if not, the call is ignored.

When we designed it, one option was to also look at all the changed conf files to decide whether a replication is required. That proved to be expensive and error prone, so we relied on an index change to trigger this. Take the case of a schema.xml change: schema.xml is changed first and then indexing is done. If the schema.xml were replicated and the slave core reloaded at that point, it would cause an error.

You can raise an issue and we can find a better way to do this.

--Noble

On Wed, Jun 10, 2009 at 3:23 AM, mlathe wrote:
>
> Shalin Shekhar Mangar wrote:
>>
>>> Second Question:
>>> If I force an empty commit, like this:
>>> curl http://localhost:8080/solr_rep_master/core/update?stream.body=%3Ccommit/%3E
>>> Then the changed synonyms.txt config file is replicated to the slave.
>>> Unfortunately now I need to do a core "RELOAD" on both the master and slave
>>> to get them to see the updated synonyms.txt file.
>>
>> Calling RELOAD on slave should not be necessary. If a configuration file is
>> replicated, the slave is always reloaded. Can you try using the analysis.txt
>> on a field which has the SynonymFilterFactory enabled to see if the new file
>> is indeed not getting used?
>
> I'm a bit confused now. It's not doing what I saw before. Now I can't get it
> to replicate when I do an "empty" commit. Rather, I need to do a real data
> update and a commit; then any changes to the solr_rep_master's
> conf/synonyms.txt file get replicated to the slave, and the slave seems to
> pick up the change without reloading.
>
> I'm not really sure what you mean by the analysis.txt file. Do you mean the
> /analysis request handler? I've been making synonyms for "solr", so it is
> pretty obvious if it was picked up.
>
> Can you explain what you expect should happen? i.e.
> 1) Should the slave replicate when you do an empty commit on the master?
> 2) If you change a master config file and it is replicated to the slave,
> would you expect the slave to pick it up automatically, but the master to
> require a reload?
>
> Thanks
> --Matthias

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Servlet filter for Solr
If you wish to intercept "read" calls, a filter is the only way.

On Wed, Jun 10, 2009 at 6:35 AM, vivek sar wrote:
> Hi,
>
> I've to intercept every request to solr (search and update) and log
> some performance numbers. In order to do so I tried a Servlet filter
> and added this to Solr's web.xml,
>
> [stripped web.xml fragment: filter IndexFilter / com.xxx.index.filter.IndexRequestFilter, with a filter-mapping referencing the SolrUpdate and SolrServer servlets]

I guess you cannot put servlets in the filter mapping.

> but, this doesn't seem to be working. Couple of questions,
>
> 1) What's wrong with my web.xml setting?
> 2) Is there any easier way to intercept calls to Solr without changing
> its web.xml? Basically can I just change the solrconfig.xml to do so
> (beside requesthandlers) so I don't have to customize the solr.war?
>
> Thanks,
> -vivek

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
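For the interception itself, a minimal timing filter could look like the sketch below; the class name matches the web.xml fragment above, the log output is illustrative, and it would be mapped in web.xml with a url-pattern such as /* ahead of Solr's own dispatch filter:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class IndexRequestFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        long start = System.nanoTime();
        try {
            chain.doFilter(req, res); // let Solr handle the request
        } finally {
            long ms = (System.nanoTime() - start) / 1000000L;
            String uri = (req instanceof HttpServletRequest)
                    ? ((HttpServletRequest) req).getRequestURI() : "?";
            System.out.println("solr request " + uri + " took " + ms + " ms");
        }
    }

    public void destroy() {}
}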
Re: Problem using db-data-config.xml
Are you sure prod_cd and reg_id are emitted by the respective entities under those same names? If not, you may need to alias those fields (using 'as'). Keep in mind that the field names are case sensitive.

To see which values are emitted, use debug mode or use LogTransformer.

On Wed, Jun 10, 2009 at 4:55 AM, jayakeerthi s wrote:
> Hi All,
>
> I am facing an issue when fetching records from the database by providing
> the value '${prod.prod_cd}' in db-data-config.xml.
> It works fine if I provide the exact value of the product code, i.e.
> '302437-413'.
>
> [quoted db-data-config.xml omitted; its XML markup was stripped by the archive]
>
> Please advise on anything I am missing in the above db-data-config.xml.
>
> Thanks in advance.
>
> Regards,
> Jayakeerthi

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Upgrading 1.2.0 to 1.3.0 solr
Francis,

If you can wait another month or so, you could skip 1.3.0 and jump to 1.4, which will be released soon.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

> From: Francis Yakin
> To: "solr-user@lucene.apache.org"
> Sent: Wednesday, June 10, 2009 1:17:25 AM
> Subject: Upgrading 1.2.0 to 1.3.0 solr
>
> I am in the process of upgrading our Solr 1.2.0 to Solr 1.3.0.
>
> Our Solr 1.2.0 is working fine; we just want to upgrade because we have an
> application that requires a function from 1.3.0 (we call it autocomplete).
>
> Currently our config files on 1.2.0 are as follows:
>
> solrconfig.xml
> schema.xml (we wrote this in house)
> index_synonyms.txt (we also modified and wrote this in house)
> scripts.conf
> protwords.txt
> stopwords.txt
> synonyms.txt
>
> I understand that 1.3.0 has a new solrconfig.xml.
>
> My questions are:
>
> 1) Which config files can I reuse from 1.2.0 in 1.3.0? Can I use the same
> schema.xml?
> 2) solrconfig.xml: can I use the 1.2.0 version, or do I have to stick with
> the 1.3.0 one? If I need to stick with 1.3.0, what do I need to change?
>
> As of right now I am testing it in my sandbox, so it doesn't work.
>
> Please advise; if you have any docs for upgrading 1.2.0 to 1.3.0, let me know.
>
> Thanks in advance
>
> Francis
>
> Note: I attached my solrconfig and schema.xml in this email
>
> [Inline attachments followed here: schema.xml and solrconfig.xml. Their XML
> markup was stripped by the archive, leaving only unreadable fragments, so
> they are omitted.]
Re: How to index data without token in Solr
Hello, I don't follow the "index data without token to match with my search" part. Could you please give an example of what you mean? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: chem leakhina > To: solr-user@lucene.apache.org > Sent: Tuesday, June 9, 2009 10:06:35 PM > Subject: How to index data without token in Solr > > Hi all, > I am very new in Solr and I want to use Solr to index data without token to > match with my search. > Does anyone know how to index data without token in Solr? > if possible, can you give me an example? > > Thanks in advance, > LEE
Re: qf boost Versus field boost for Dismax queries
It's like cooking. If you put too much salt in your food, it's kind of hard to undo that, and you end up with a salty meal. Boosting at search time makes it easy to change boosts (e.g. when trying to find the best boost values), while boosting at index time "hard-codes" them. You can use both, and they should be multiplied.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: ashokc
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 6:17:37 PM
> Subject: qf boost Versus field boost for Dismax queries
>
> When 'dismax' queries are used, where is the best place to apply boost
> values/factors? While indexing, by supplying the 'boost' attribute to the
> field, or in solrconfig.xml, by specifying the 'qf' parameter with the same
> boosts? What are the advantages/disadvantages of each? What happens if both
> boosts are present? Do they get multiplied?
>
> Thanks
>
> - ashok
> -- View this message in context:
> http://www.nabble.com/qf-boost-Versus-field-boost-for-Dismax-queries-tp23952323p23952323.html
> Sent from the Solr - User mailing list archive at Nabble.com.
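As a concrete sketch of the search-time flavor: dismax boosts can be sent per request, or fixed in the handler defaults in solrconfig.xml. The field names and weights here are placeholders:

import org.apache.solr.client.solrj.SolrQuery;

public class DismaxBoosts {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("ipod charger");
        query.set("defType", "dismax");
        // search-time boosts: tweak freely, no reindexing required
        query.set("qf", "title^4.0 keywords^2.0 body^1.0");
        System.out.println(query);
    }
}

An index-time boost on the same field would multiply into the score on top of these.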
Re: Faceting on text fields
Yao, Solr can already cluster top N hits using Carrot2: http://wiki.apache.org/solr/ClusteringComponent I've also done ugly "manual counting" of terms in top N hits. For example, look at the right side of this: http://www.simpy.com/user/otis/tag/%22machine+learning%22 Something like http://www.sematext.com/product-key-phrase-extractor.html could also be used. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Yao Ge > To: solr-user@lucene.apache.org > Sent: Tuesday, June 9, 2009 3:46:13 PM > Subject: Re: Faceting on text fields > > > Michael, > > Thanks for the update! I definitely need to get a 1.4 build see if it makes > a difference. > > BTW, maybe instead of using faceting for text > mining/clustering/visualization purpose, we can build a separate feature in > SOLR for this. Many of commercial search engines I have experiences with > (Google Search Appliance, Vivisimo etc) provide dynamic term clustering > based on top N ranked documents (N is a parameter can be configured). When > facet field is highly fragmented (say a text field), the existing set > intersection based approach might no longer be optimum. Aggregating term > vectors over top N docs might be more attractive. Another features I can > really appreciate is to provide search time n-gram term clustering. Maybe > this might be better suited for "spell checker" as it just a different way > to display the alternative search terms. > > -Yao > > > Michael Ludwig-4 wrote: > > > > Yao Ge schrieb: > > > >> The facet query is considerably slower comparing to other facets from > >> structured database fields (with highly repeated values). What I found > >> interesting is that even after I constrained search results to just a > >> few hunderd hits using other facets, these text facets are still very > >> slow. > >> > >> I understand that text fields are not good candidate for faceting as > >> it can contain very large number of unique values. However why it is > >> still slow after my matching documents is reduced to hundreds? Is it > >> because the whole filter is cached (regardless the matching docs) and > >> I don't have enough filter cache size to fit the whole list? > > > > Very interesting questions! I think an answer would both require and > > further an understanding of how filters work, which might even lead to > > a more general guideline on when and how to use filters and facets. > > > > Even though faceting appears to have changed in 1.4 vs 1.3, it would > > still be interesting to understand the 1.3 side of things. > > > >> Lastly, what I really want to is to give user a chance to visualize > >> and filter on top relevant words in the free-text fields. Are there > >> alternative to facet field approach? term vectors? I can do client > >> side process based on top N (say 100) hits for this but it is my last > >> option. > > > > Also a very interesting data mining question! I'm sorry I don't have any > > answers for you. Maybe someone else does. > > > > Best, > > > > Michael Ludwig > > > > > > -- > View this message in context: > http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sharding strategy
Aleksander,

In a sense you are lucky you have time-ordered data. That makes it very easy to shard and cheaper to search - you know exactly which shards you need to query.

The beginning-of-the-year situation should also be easy. Do start with the latest shard for the current year, and go to the next shard only if you have to (e.g. if you don't get enough results from the first shard).

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Aleksander M. Stensby
> To: "solr-user@lucene.apache.org"
> Sent: Tuesday, June 9, 2009 7:07:47 AM
> Subject: Sharding strategy
>
> Hi all,
> I'm trying to figure out how to shard our index, as it is growing rapidly
> and we want to make our solution scalable.
> So, we have documents that are most commonly sorted by their date. My
> initial thought is to shard the index by date, but I wonder if you have any
> input on this and how best to solve it...
>
> I know that the most frequent queries will be executed against the "latest"
> shard, but then let's say we shard by year; how do we best solve the
> situation that will occur in the beginning of a new year? (Some of the data
> will be in the last shard, but most of it will be on the second-to-last
> shard.)
>
> Would it be stupid to have a "latest" shard with duplicate data (always
> consisting of the last 6 months or something like that) and maintain that
> index in addition to the regular yearly shards? Anyone else facing a
> similar situation with a good solution?
>
> Any input would be greatly appreciated :)
>
> Cheers,
> Aleksander
>
> -- Aleksander M. Stensby
> Lead software developer and system architect
> Integrasco A/S
> www.integrasco.no
> http://twitter.com/Integrasco
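A sketch of the query side of that layout, with completely made-up shard URLs; the shards parameter is chosen per request from the date range, newest first:

import org.apache.solr.client.solrj.SolrQuery;

public class DateShardRouting {
    public static void main(String[] args) {
        // hypothetical one-core-per-year layout
        String latest = "host1:8983/solr/2009";
        String older  = "host2:8983/solr/2008";

        SolrQuery recentOnly = new SolrQuery("some query");
        recentOnly.set("shards", latest); // most queries: newest shard only

        SolrQuery acrossBoundary = new SolrQuery("some query");
        acrossBoundary.set("shards", latest + "," + older); // early-year fallback

        System.out.println(recentOnly);
        System.out.println(acrossBoundary);
    }
}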
Re: Solr update performance decrease after a while
Vincent,

It's hard to tell, but some things to look at are your JVM heap size, the
status of the various generations in the JVM, the possibility of not having
enough memory, too-frequent GC, etc. All of this can be seen with jconsole.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Vincent Pérès
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 12:00:32 PM
> Subject: Solr update performance decrease after a while
>
> Hello,
>
> We are indexing approximately 500 documents per day. My benchmark says an
> update is done in 0.7 sec just after Solr has been started, but it quickly
> decreases to 2.2 secs per update!
> I have been focused on the schema until now, and haven't changed much in
> the solrconfig file. Maybe you have some tips which could help me keep the
> update time more linear?
>
> Thanks a lot
> Vincent
> --
> View this message in context:
> http://www.nabble.com/Solr-update-performance-decrease-after-a-while-tp23945947p23945947.html
> Sent from the Solr - User mailing list archive at Nabble.com.
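[Editor's note] For readers unfamiliar with the tools Otis mentions: the heap is sized with standard JVM flags, and jconsole ships with the JDK. A minimal sketch for the example Jetty setup (the heap sizes are placeholders, not recommendations):

  # Start Solr with a fixed 1 GB heap and GC logging
  java -Xms1024m -Xmx1024m -verbose:gc -jar start.jar

  # In another terminal, attach jconsole to the Solr JVM
  # to watch heap generations and GC frequency
  jconsole <pid-of-solr-jvm>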
Re: solr in distributed mode
Hello,

All of this is covered on the Wiki; search for: distributed search

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Rakhi Khatwani
> To: solr-user@lucene.apache.org
> Cc: ninad.r...@germinait.com; ranjit.n...@germinait.com;
> saurabh.maha...@germinait.com
> Sent: Tuesday, June 9, 2009 4:55:55 AM
> Subject: solr in distributed mode
>
> Hi,
> I was looking for ways in which we can use Solr in distributed mode.
> Is there any way we can use Solr indexes across machines, or by using the
> Hadoop Distributed File System?
>
> It has been mentioned in the wiki that when an index becomes too large to
> fit on a single system, or when a single query takes too long to execute,
> an index can be split into multiple shards, and Solr can query and merge
> results across those shards.
>
> What I understand is that shards are partitions. Are shards on the same
> machine, or can they be on different machines? Do we have to manually
> split the indexes to store them in different shards?
>
> Do you have an example or some tutorial which demonstrates distributed
> index searching/storing using shards?
>
> Regards,
> Raakhi
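[Editor's note] The wiki page Otis means is http://wiki.apache.org/solr/DistributedSearch. In short: shards can live on the same machine or on different machines, you currently split (or route) documents between shards yourself, and a query fans out over the shards parameter. A hedged example, with made-up host names:

  http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=ipod&rows=10

Any of the listed shards can receive the request; it queries all of them and merges the results.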
Re: Example folder - can we change it?
Francis,

But that really is an example. It's something that you can try, and
something that you can copy and base your own Solr setup on.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Francis Yakin
> To: "solr-user@lucene.apache.org"
> Sent: Monday, June 8, 2009 2:02:53 PM
> Subject: Example folder - can we change it?
>
> When I install Solr, by default it installs under
> /opt/apache-solr-1.3.0/
>
> The bin, config files and data are under /opt/apache-solr-1.3.0/example/solr
>
> Is there any way we can change "example" to something else? Because
> "example" can be interpreted wrongly (like a sample, i.e. not real).
>
> Francis
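[Editor's note] One way to avoid the "example" path, in line with Otis's "copy and base your own setup on it" advice, is to copy the example Solr home elsewhere and point Solr at it via the solr.solr.home system property. A sketch with assumed paths:

  # Copy the example Solr home to a permanent location
  cp -r /opt/apache-solr-1.3.0/example/solr /opt/solr-home

  # Start Jetty against the new Solr home
  cd /opt/apache-solr-1.3.0/example
  java -Dsolr.solr.home=/opt/solr-home -jar start.jar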
Re: creating new fields at index time - is it possible?
Hello,

It might be expensive/slow, but you could write a custom
UpdateRequestProcessor, "manually" run a field through the analyzer, and
then add/delete other fields right there, in the URP.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Kir4
> To: solr-user@lucene.apache.org
> Sent: Sunday, June 7, 2009 8:18:43 PM
> Subject: Re: creating new fields at index time - is it possible?
>
> Now that I plan on adding new fields based on the data already present, it
> would be best to read the existing field after it has been processed
> (cleaned up) by the other analyzers.
> I was therefore planning on creating a custom analyzer that runs after the
> default ones; said analyzer would read the field and add new ones based on
> several rules and some data.
>
> I have been told that UpdateRequestProcessor probably cannot be invoked
> from an analyzer.
> Is there any way for an analyzer to add new fields?
> It would be enough to just populate them: I could add empty fields to the
> original document, and define for them analyzers that read the data of
> other fields previously analyzed and populate the empty field.
>
> Thanks to anyone that may have answers to my questions. =)
> Best regards,
> G.
>
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
> >
> > If you wish to plug in your code, try this:
> > http://wiki.apache.org/solr/UpdateRequestProcessor
> >
>
> --
> View this message in context:
> http://www.nabble.com/creating-new-fields-at-index-time---is-it-possible--tp23741267p23916728.html
> Sent from the Solr - User mailing list archive at Nabble.com.
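[Editor's note] To make Otis's suggestion concrete, here is a minimal sketch of a custom UpdateRequestProcessor against the Solr 1.3-era API (the package of SolrQueryRequest/SolrQueryResponse shifted between releases, so check your version). The field names and the rule are invented for illustration; the factory is registered in an updateRequestProcessorChain in solrconfig.xml, as described on the wiki page Noble linked.

  import java.io.IOException;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class DerivedFieldsProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                                SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
          return new DerivedFieldsProcessor(next);
      }

      static class DerivedFieldsProcessor extends UpdateRequestProcessor {
          DerivedFieldsProcessor(UpdateRequestProcessor next) {
              super(next);
          }

          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
              SolrInputDocument doc = cmd.getSolrInputDocument();

              // Hypothetical rule: derive a flag field from an existing field.
              // To mimic "after analysis", one could instead run this value
              // through the field's analyzer obtained from the IndexSchema.
              Object body = doc.getFieldValue("body");
              if (body != null && body.toString().length() > 1000) {
                  doc.addField("is_long_document", true);
              }
              super.processAdd(cmd);   // continue the processor chain
          }
      }
  }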
Re: How to index data without token in Solr
That's fine, I've got a solution for this now. Thanks anyway.

On Wed, Jun 10, 2009 at 12:29 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> I don't follow the "index data without token to match with my search"
> part. Could you please give an example of what you mean?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> > From: chem leakhina
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, June 9, 2009 10:06:35 PM
> > Subject: How to index data without token in Solr
> >
> > Hi all,
> > I am very new to Solr and I want to use Solr to index data without
> > tokenizing it, so that it matches my searches exactly.
> > Does anyone know how to index data without tokens in Solr?
> > If possible, can you give me an example?
> >
> > Thanks in advance,
> > LEE
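[Editor's note] LEE never posted the solution, but the usual way to index a field without tokenizing it is to declare it with solr.StrField in schema.xml, which indexes the whole value as a single term for exact matching. A sketch (the field name is an assumption):

  <!-- schema.xml: untokenized field type and an exact-match field -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

  <field name="category" type="string" indexed="true" stored="true"/>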
How to search date in Solr
Hi,
Could you tell me how to make a query to search dates in Solr with these
conditions:

Before, After, Between, All

Could you please write some examples for me?

Regards,
LEE
Re: How to search date in Solr
Hello,

These are all done with range queries. They tend to look like this:

&q=add_date:[BeginDateHere TO EndDateHere]

You can use * for either BeginDateHere or EndDateHere to get the
"before/after" effect.

"All" is just q=*:*

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: chem leakhina
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 10, 2009 2:28:46 AM
> Subject: How to search date in Solr
>
> Hi,
> Could you tell me how to make a query to search dates in Solr with these
> conditions:
>
> Before, After, Between, All
>
> Could you please write some examples for me?
>
> Regards,
> LEE
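[Editor's note] Spelled out with concrete timestamps (the field name add_date and the dates are placeholders; Solr date fields expect full ISO-8601 values ending in Z):

  # Between
  q=add_date:[2009-01-01T00:00:00Z TO 2009-06-10T00:00:00Z]

  # Before
  q=add_date:[* TO 2009-01-01T00:00:00Z]

  # After
  q=add_date:[2009-01-01T00:00:00Z TO *]

  # All
  q=*:*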
Re: How to search date in Solr
Thanks Otis

On Wed, Jun 10, 2009 at 1:32 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hello,
>
> These are all done with range queries. They tend to look like this:
>
> &q=add_date:[BeginDateHere TO EndDateHere]
>
> You can use * for either BeginDateHere or EndDateHere to get the
> "before/after" effect.
>
> "All" is just q=*:*
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> > From: chem leakhina
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, June 10, 2009 2:28:46 AM
> > Subject: How to search date in Solr
> >
> > Hi,
> > Could you tell me how to make a query to search dates in Solr with
> > these conditions:
> >
> > Before, After, Between, All
> >
> > Could you please write some examples for me?
> >
> > Regards,
> > LEE
Re: Sharding strategy
Hi Otis, thanks for your reply!

You could say I'm lucky (and I totally agree, since I made the choice of
ordering the data that way :p). What you describe is what I had thought
about doing, and I'm happy to read that you approve. It is always nice to
know that you are not doing things completely off - that's what I love
about this mailing list!

I've implemented a sharded "yellow pages" that builds up the shard
parameter, and it will obviously be easy to search two shards to overcome
the beginning-of-the-year situation; I just thought it might be a bit
stupid to search for 1% of the data in the "latest" shard and the rest in
shard n-1. How much of a performance decrease do you reckon I will get from
searching two shards instead of one?

Anyways, thanks for confirming things, Otis!

Cheers,
Aleksander

On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic wrote:

> Aleksander,
>
> In a sense you are lucky you have time-ordered data. That makes it very
> easy to shard and cheaper to search - you know exactly which shards you
> need to query.
>
> The beginning-of-the-year situation should also be easy. Start with the
> latest shard for the current year, and go to the next shard only if you
> have to (e.g. if you don't get enough results from the first shard).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> > From: Aleksander M. Stensby
> > To: "solr-user@lucene.apache.org"
> > Sent: Tuesday, June 9, 2009 7:07:47 AM
> > Subject: Sharding strategy
> >
> > Hi all,
> > I'm trying to figure out how to shard our index, as it is growing
> > rapidly and we want to make our solution scalable.
> > So, we have documents that are most commonly sorted by their date. My
> > initial thought is to shard the index by date, but I wonder if you have
> > any input on this and how best to solve it...
> >
> > I know that the most frequent queries will be executed against the
> > "latest" shard, but then, let's say we shard by year: how do we best
> > handle the situation that will occur at the beginning of a new year?
> > (Some of the data will be in the last shard, but most of it will be in
> > the second-to-last shard.)
> >
> > Would it be stupid to have a "latest" shard with duplicate data (always
> > consisting of the last 6 months or something like that) and maintain
> > that index in addition to the regular yearly shards? Anyone else facing
> > a similar situation with a good solution?
> >
> > Any input would be greatly appreciated :)
> >
> > Cheers,
> > Aleksander

--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail
RE: ExtractingRequestHandler and local files
I had also been wondering about this, but was too lazy/busy to post a
question. Now that it is resolved, it would help a lot if you could post an
example of how you invoked enableRemoteStreaming for your document(s)?

Rgds Fergus.

> Thanks for the quick response, Grant.
>
> We tried it and it seems to work.
>
> The confusion stemmed from the fact that the wiki states that the
> parameter is not used - there are also comments in the test cases for the
> handler that say:
>
> //TODO: stop using locally defined fields once stream.file and stream.body
> start working everywhere
>
> So wanted to confirm.
>
>> From: gsing...@apache.org
>> To: solr-user@lucene.apache.org
>> Subject: Re: ExtractingRequestHandler and local files
>> Date: Tue, 9 Jun 2009 14:50:43 -0400
>>
>> I haven't tried it, but I thought the enableRemoteStreaming stuff
>> should work. That stuff is handled by Solr in other places, if I
>> recall correctly. Have you tried it?
>>
>> -Grant
>>
>> On Jun 9, 2009, at 2:28 PM, doraiswamy thirumalai wrote:
>>
>> > Hi,
>> >
>> > I would greatly appreciate a quick response to this question.
>> >
>> > Is there a means of passing a local file to the
>> > ExtractingRequestHandler (as the enableRemoteStreaming/stream.file
>> > option does with the other handlers) so the file contents can
>> > directly be read from the local disk versus going over HTTP?
>> >
>> > Per the Solr wiki entry for ExtractingRequestHandler,
>> > enableRemoteStreaming is not used?
>> >
>> > This is also a tad confusing because the Ruby example off:
>> > http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/
>> > explicitly recommends setting this parameter?
>> >
>> > Thanks!
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
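[Editor's note] To answer Fergus's request with a sketch: remote streaming is enabled in solrconfig.xml, after which a local file can be referenced with stream.file. Note that the literal-field parameter names for ExtractingRequestHandler changed between early trunk builds and the 1.4 release (ext.literal.* vs. literal.*), so treat the curl line as an assumption to check against your version. The paths and the id value are made up.

  <!-- solrconfig.xml: allow stream.file / stream.url / stream.body -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048"/>

  # Index a local PDF without sending its bytes over HTTP
  curl "http://localhost:8983/solr/update/extract?stream.file=/data/docs/report.pdf&literal.id=doc1&commit=true"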