Large number of collections in SolrCloud
Hi, I have a SolrCloud cluster with 3 nodes: 3 shards per node and a replication factor of 3. The number of collections is around 1000. All the collections use the same ZooKeeper configuration. So when I create each collection, the configuration is pulled from ZK and the configuration files are kept in the JVM. I thought that if the configuration was the same for each collection, the impact on the JVM would be insignificant because the configuration should be loaded only once. But that is not the case: for each collection created, the JVM size increases because the configuration is loaded again, am I correct? If the configuration folder is small, I have no problem: the folder is less than 500 KB, so with 1000 collections x 500 KB the JVM impact is 500 MB. But we manage a lot of languages with some dictionaries, so the configuration folder is about 6 MB. The JVM impact is now very significant because it can be more than 6 GB (1000 x 6 MB). So I would like feedback from people who also run a cluster with a large number of collections. Do I have to change some settings to handle this case better? What can I do to optimize this behaviour? For now we just increased the RAM per node to 16 GB, but we plan to increase the number of collections. Thanks, Olivier
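For illustration, a minimal sketch of sharing one configuration across all collections: upload it once as a named configset and reference it from every CREATE call, so only a single copy lives in ZooKeeper. Host, paths and names below are placeholders, and the zkcli.sh location varies by Solr version; also note that, as observed above, each loaded core still parses that shared tree into its own in-heap objects, so this alone does not remove the per-collection heap cost.

# upload the shared configuration once
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/shared/conf -confname shared_conf

# every collection references the same configset at creation time
curl "http://host:8983/solr/admin/collections?action=CREATE&name=customer_0001&numShards=3&replicationFactor=3&maxShardsPerNode=3&collection.configName=shared_conf"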
reload collections timeout
Hi everybody, I have about 1300 collections, 3 shards, replicationFactor=3, maxShardsPerNode=3. I have 3 boxes with 64 GB (32 GB JVM). When I want to reload all my collections I get a timeout error. Is there a way to run the reload asynchronously, as when creating collections (async=requestid)? I saw on this issue that it was done, but it did not seem to work: https://issues.apache.org/jira/browse/SOLR-5477 How do I use the async mode to reload collections? Thanks a lot, Olivier Damiot
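For illustration, the rough shape of an async Collections API call, assuming a Solr version where RELOAD actually honors the async parameter (SOLR-5477 is the relevant work; as noted above, whether it takes effect for RELOAD needs verifying on your version). Host and collection names are placeholders; in practice you would loop this over the 1300 collections and poll the request ids afterwards.

# submit the reload without waiting for it to finish
curl "http://host:8983/solr/admin/collections?action=RELOAD&name=collection42&async=reload-collection42"

# poll the status of that request id later
curl "http://host:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-collection42"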
Re: Large number of collections in SolrCloud
Hi, Thanks a lot Erick and Shawn for your answers. I am aware that this is a very particular issue and not a common use of Solr; I just wondered whether other people had a similar business case. For information, we need a very large number of collections with the same configuration for legal reasons. Each collection represents one of our customers, and by contract we have to separate the data of each of them. If we had the choice, we would just have one collection with a 'Customer' field and do filter queries on it, but we can't! Anyway, thanks again for your answers. For now, we finally did not add the different language dictionaries per collection, and it is fine for 1K+ customers with more resources added to the servers. Best, Olivier Tavard 2015-07-27 17:53 GMT+02:00 Shawn Heisey : > On 7/27/2015 9:16 AM, Olivier wrote: > > I have a SolrCloud cluster with 3 nodes : 3 shards per node and > > replication factor at 3. > > The collections number is around 1000. All the collections use the same > > Zookeeper configuration. > > So when I create each collection, the ZK configuration is pulled from ZK > > and the configuration files are stored in the JVM. > > I thought that if the configuration was the same for each collection, the > > impact on the JVM would be insignificant because the configuration should > be > > loaded only once. But it is not the case, for each collection created, > the > > JVM size increases because the configuration is loaded again, am I > correct ? > > > > If I have a small configuration folder size, I have no problem because > the > > folder size is less than 500 KB so if we count 1000 collections x 500 KB, > > the JVM impact is 500 MB. > > But we manage a lot of languages with some dictionaries so the > > configuration folder size is about 6 MB. The JVM impact is very important > > now because it can be more than 6 GB (1000 x 6 MB). > > > > So I would like to have the feedback of people who have a cluster with a > > large number of collections too. Do I have to change some settings to > > handle this case better ? What can I do to optimize this behaviour ? > > For now, we just increase the RAM size per node at 16 GB but we plan to > > increase the collections number. > > Severe issues were noticed when dealing with many collections, and this > was with a simple config, and completely empty indexes. A complex > config and actual index data would make it run that much more slowly. > > https://issues.apache.org/jira/browse/SOLR-7191 > > Memory usage for the config wasn't even considered when I was working on > reporting that issue. > > SolrCloud is highly optimized to work well when there are a relatively > small number of collections. I think there is work that we can do which > will optimize operations to the point where thousands of collections > will work well, especially if they all share the same config/schema ... > but this is likely to be a fair amount of work, which will only benefit > a handful of users who are pushing the boundaries of what Solr can do. > In the open source world, a problem like that doesn't normally receive a > lot of developer attention, and we rely much more on help from the > community, specifically from knowledgeable users who are having the > problem and know enough to try and fix it. > > FYI -- 16GB of RAM per machine is quite small for Solr, particularly > when pushing the envelope. My Solr machines are maxed at 64GB, and I > frequently wish I could install more.
> > https://wiki.apache.org/solr/SolrPerformanceProblems#RAM > > One possible solution for your dilemma is simply adding more machines > and spreading your collections out so each machine's memory requirements > go down. > > Thanks, > Shawn > >
Large multivalued field and overseer problem
Hi, We have a SolrCloud cluster with 3 nodes (4 processors, 24 GB RAM per node). We have 3 shards per node and the replication factor is 3. We host 3 collections; the biggest is only about 40K documents. The important point is a multivalued field with about 200K to 300K values per document (each value is a kind of product reference, of type String). We have some very big issues with our SolrCloud cluster: it crashes entirely, very frequently, at indexing time. It starts with an overseer issue, an overseer session expiry: KeeperErrorCode = Session expired for /overseer_elect/leader Then another node is elected overseer, but the recovery phase seems to fail indefinitely. It seems that communication between the overseer and ZK is impossible, and after a short period of time the whole cluster is unavailable (JVM out of memory error) and we have to restart it. So I wanted to know whether we can continue to use such a huge multivalued field with SolrCloud. We are on Solr 4.10.4 for now; do you think that upgrading to Solr 5, with an overseer per collection, could fix our issues? Or do we have to rethink the schema to avoid this very large multivalued field? Thanks, Best, Olivier
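For illustration only, two checks that are sometimes useful in this situation; the host is a placeholder, OVERSEERSTATUS is part of the Collections API from Solr 4.10 on, and the zkClientTimeout change assumes solr.xml still uses the stock ${zkClientTimeout:...} substitution. Neither addresses the root cause, which the description above points at the very large multivalued field and the resulting JVM pressure.

# see which node currently holds the overseer role and its queue statistics
curl "http://host:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json"

# if the session expirations line up with long GC pauses, a larger ZooKeeper
# session timeout can be passed as a system property at startup
# (run from the example directory, usual flags omitted)
java -DzkClientTimeout=60000 -jar start.jar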
Problems for indexing large documents on SolrCloud
Hi, I have some problems indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard, on Tomcat 7. For a specific document (with 300K values in a multivalued field), I couldn't index it on SolrCloud, but I could do it in a single Solr instance on my own PC. The indexing is done with Solarium from a database. The indexed data are e-commerce products with classic fields like name, price, description, instock, etc. The large field (type int) consists of the ids of other products. The only difference from the documents that index correctly is the size of that multivalued field: the documents that index correctly all have between 100K and 200K values for that field. The index size is 11 MB for 20 documents. To solve it, I tried to change several parameters, including the ZK timeouts in solr.xml (in the solrcloud section: 6 10 10, and in the shardHandlerFactory section: ${socketTimeout:10} ${connTimeout:10}). I also tried to increase these values in solrconfig.xml, and to increase the amount of RAM (these are VMs): each server has 4 GB of RAM with 3 GB for the JVM. Are there other settings I may have forgotten which could solve the problem? The error messages are: ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag Thanks, Olivier
Re: Problems for indexing large documents on SolrCloud
Hi, First, thanks for your advice. I did several tests and finally I could index all the data on my SolrCloud cluster. The error was client-side; it's documented in this post: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3ccfc09ae1.94f8%25rebecca.t...@ucsf.edu%3E "EofException from Jetty means one specific thing: The client software disconnected before Solr was finished with the request and sent its response. Chances are good that this is because of a configured socket timeout on your SolrJ client or its HttpClient. This might have been done with the setSoTimeout method on the server object." So I increased the Solarium timeout from 5 to 60 seconds and all the data is now indexed correctly. The error was not reproducible on my development PC because the database and Solr were on the same local virtual machine with plenty of available resources, so indexing was faster than in the SolrCloud cluster. Thanks, Olivier 2014-09-11 0:21 GMT+02:00 Shawn Heisey : > On 9/10/2014 2:05 PM, Erick Erickson wrote: > > bq: org.apache.solr.common.SolrException: Unexpected end of input > > block; expected an identifier > > > > This is very often an indication that your packets are being > > truncated by "something in the chain". In your case, make sure > > that Tomcat is configured to handle inputs of the size that you're > sending. > > > > This may be happening before things get to Solr, in which case your > settings > > in solrconfig.xml aren't germane, the problem is earlier than that. > > > > A "semi-smoking-gun" here is that there's a size of your multivalued > > field that seems to break things... That doesn't rule out time problems > > of course. > > > > But I'd look at the Tomcat settings for maximum packet size first. > > The maximum HTTP request size is actually controlled by Solr itself > since 4.1, with changes committed for SOLR-4265. Changing the setting > on Tomcat probably will not help. > > An example from my own config which sets this to 32MB - the default is > 2048, or 2MB: > > <requestParsers multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/> > > Thanks, > Shawn > >
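For illustration, roughly the same failure mode reproduced from the shell: if the client gives up before Solr answers, Solr logs the EofException / broken pipe errors seen earlier, and raising the client-side limit is what the Solarium change above amounts to. Host, core name and bigdoc.xml are placeholders.

# a 5-second client timeout: on a slow update this aborts the connection,
# producing the "broken pipe" errors on the Solr side
curl --max-time 5 "http://host:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary @bigdoc.xml

# the equivalent of the Solarium fix: give the client 60 seconds
curl --max-time 60 "http://host:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary @bigdoc.xml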
Leader election
Hello everybody, I use Solr 5.2.1 and am having a big problem. I have about 1200 collections, 3 shards, replicationFactor=3, maxShardsPerNode=3. I have 3 boxes with 64 GB (32 GB JVM). I have no problems with collection creation or indexing, but when I lose a node (VM crash or kill) and restart it, all my collections are down. Looking in the logs I can see leader election problems, e.g.: - Checking if I (core = test339_shard1_replica1, coreNodeName = core_node5) should try and be the leader. - Cloud says we are still state leader. I feel that all the servers pass the buck! I do not understand this error, especially since from reading the mailing list I have the impression that this bug was solved long ago. What should I do to start my collections properly? Could someone help me? Thank you a lot, Olivier
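For reference, a hedged way to see what ZooKeeper currently records for the affected shards while the nodes argue about the election; CLUSTERSTATUS is part of the Collections API in Solr 5.x, and the host and collection names are placeholders. It does not fix the election itself, but it shows, per shard, which replica (if any) is marked leader and which replicas are down.

curl "http://host:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test339&wt=json"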
Fast autocomplete for large dataset
Hi, I am looking for a fast and easy-to-maintain way to do autocomplete for a large dataset in Solr. I heard about Ternary Search Trees (TST) <https://en.wikipedia.org/wiki/Ternary_search_tree>. But I would like to know if there is something I missed, such as a best practice or a new Solr feature. Any suggestion is welcome. Thank you. Regards Olivier
Re: Fast autocomplete for large dataset
Thank you Erick for your reply. If I understand correctly, it seems that these approaches use the index to hold the terms. As the index grows bigger, that can become a performance issue. Is that right? Can you please check this article <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see what I mean? Thank you. Regards Olivier 2015-08-01 17:42 GMT+02:00 Erick Erickson : > Well, defining what you mean by "autocomplete" would be a start. If it's > just > a user types some letters and you suggest the next N terms in the list, > TermsComponent will fix you right up. > > If it's more complicated, the AutoSuggest functionality might help. > > If it's correcting spelling, there's the spellchecker. > > Best, > Erick > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > wrote: > > Hi, > > > > I am looking for a fast and easy to maintain way to do autocomplete for > > large dataset in solr. I heard about Ternary Search Tree (TST) > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > > But I would like to know if there is something I missed such as best > > practice, Solr new feature. Any suggestion is welcome. Thank you. > > > > Regards > > Olivier >
Re: Fast autocomplete for large dataset
Thank you Erick, I would like to implement autocomplete for a large dataset. The autocomplete should show the phrase or question the user wants as the user types. The requirement is that the autocomplete should be fast (not slowed down by the volume of data as the dataset becomes bigger) and easy to maintain. The autocomplete can have its own Solr server. It is an autocomplete like any other, but it must above all be fast and easy to maintain. What are the limitations of the suggesters mentioned in the article? Thank you. Regards Olivier 2015-08-01 19:41 GMT+02:00 Erick Erickson : > Not really. There's no need to use ngrams as the article suggests if the > terms component does what you need. Which is why I asked you about what > autocomplete means in your context. Which you have not clarified. Have you > even looked at terms component? Especially the terms.prefix option? > > Terms component has its limitations, but performance isn't one of them. > The suggesters mentioned in the article have other limitations. It's really > useless to discuss those limitations, though, until the problem you're > trying to solve is clearly stated. > On Aug 1, 2015 1:01 PM, "Olivier Austina" > wrote: > > > Thank you Eric for your reply. > > If I understand it seems that these approaches are using index to hold > > terms. As the index grows bigger, it can be a performance issues. > > Is it right? Please can you check this article > > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see > > what I mean? Thank you. > > > > Regards > > Olivier > > > > > > 2015-08-01 17:42 GMT+02:00 Erick Erickson : > > > > > Well, defining what you mean by "autocomplete" would be a start. If > it's > > > just > > > a user types some letters and you suggest the next N terms in the list, > > > TermsComponent will fix you right up. > > > > > > If it's more complicated, the AutoSuggest functionality might help. > > > > > > If it's correcting spelling, there's the spellchecker. > > > > > > Best, > > > Erick > > > > > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > > > wrote: > > > > Hi, > > > > > > > > I am looking for a fast and easy to maintain way to do autocomplete for > > > > large dataset in solr. I heard about Ternary Search Tree (TST) > > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > > > > But I would like to know if there is something I missed such as best > > > > practice, Solr new feature. Any suggestion is welcome. Thank you. > > > > > > > > Regards > > > > Olivier > > > > > >
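For illustration, what the TermsComponent lookup Erick mentions looks like, assuming the /terms handler from the example solrconfig.xml is enabled; host, collection and field names are placeholders.

curl "http://host:8983/solr/collection1/terms?terms=true&terms.fl=name&terms.prefix=ipo&terms.limit=10&wt=json"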
Re: Fast autocomplete for large dataset
Thank you Eric for your replies and the link. Regards Olivier 2015-08-02 3:47 GMT+02:00 Erick Erickson : > Here's some background: > > http://lucidworks.com/blog/solr-suggester/ > > Basically, the limitation is that to build the suggester all docs in > the index need to be read to pull out the stored field and build > either the FST or the sidecar Lucene index, which can be a _very_ > costly operation (as in minutes/hours for a large dataset). > > bq: The requirement is that the autocomplete should be fast (not > slowdown by the volume of data as dataset become bigger) > > Well, in some alternate universe this may be possible. But the larger > the corpus the slower the processing will be, there's just no way > around that. Whether it's fast enough for your application is a better > question ;). > > Best, > Erick > > > On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina > wrote: > > Thank you Eric, > > > > I would like to implement an autocomplete for large dataset. The > > autocomplete should show the phrase or the question the user want as the > > user types. The requirement is that the autocomplete should be fast (not > > slowdown by the volume of data as dataset become bigger), and easy to > > maintain. The autocomplete can have its own Solr server. It is an > > autocomplete like others but it should be only fast and easy to maintain. > > > > What is the limitations of suggesters mentioned in the article? Thank > you. > > > > Regards > > Olivier > > > > > > 2015-08-01 19:41 GMT+02:00 Erick Erickson : > > > >> Not really. There's no need to use ngrams as the article suggests if the > >> terms component does what you need. Which is why I asked you about what > >> autocomplete means in your context. Which you have not clarified. Have > you > >> even looked at terms component? Especially the terms.prefix option? > >> > >> Terms component has it's limitations, but performance isn't one of them. > >> The suggesters mentioned in the article have other limitations. It's > really > >> useless to discuss those limitations, though, until the problem you're > >> trying to solve is clearly stated. > >> On Aug 1, 2015 1:01 PM, "Olivier Austina" > >> wrote: > >> > >> > Thank you Eric for your reply. > >> > If I understand it seems that these approaches are using index to hold > >> > terms. As the index grows bigger, it can be a performance issues. > >> > Is it right? Please can you check this article > >> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to > see > >> > what I mean? Thank you. > >> > > >> > Regards > >> > Olivier > >> > > >> > > >> > 2015-08-01 17:42 GMT+02:00 Erick Erickson : > >> > > >> > > Well, defining what you mean by "autocomplete" would be a start. If > >> it's > >> > > just > >> > > a user types some letters and you suggest the next N terms in the > list, > >> > > TermsComponent will fix you right up. > >> > > > >> > > If it's more complicated, the AutoSuggest functionality might help. > >> > > > >> > > If it's correcting spelling, there's the spellchecker. > >> > > > >> > > Best, > >> > > Erick > >> > > > >> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > >> > > wrote: > >> > > > Hi, > >> > > > > >> > > > I am looking for a fast and easy to maintain way to do > autocomplete > >> for > >> > > > large dataset in solr. I heard about Ternary Search Tree (TST) > >> > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > >> > > > But I would like to know if there is something I missed such as > best > >> > > > practice, Solr new feature. 
Any suggestion is welcome. Thank you. > >> > > > > >> > > > Regards > >> > > > Olivier > >> > > > >> > > >> >
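For illustration, the rough shape of the suggester calls the blog post describes, assuming a /suggest request handler configured with a suggester named mySuggester (both names are placeholders); the one-off build call is the expensive step Erick refers to, since it reads the stored field for every document.

# build (or rebuild) the suggester structure - potentially slow on a large index
curl "http://host:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true"

# after that, lookups are cheap
curl "http://host:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=toy&wt=json"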
SOLR cloud (5.2.1) recovery
Hello, I'm a bit confused about how SolrCloud recovery is supposed to work exactly in the case of losing a single node completely. My 600 collections are created with numShards=3&replicationFactor=3&maxShardsPerNode=3. How do I configure a new node to take the place of the dead node, or recover if I accidentally delete the data dir? I bring up a new node which is completely empty (empty data dir), install Solr, and connect it to ZooKeeper. Is it supposed to work automatically from there? All my shards/replicas on this node show as down (I suppose because there are no cores in the data dir). Do I need to recreate the cores first? Can I copy/paste the data directory from another node to this one? I think not, because I would have to rename all the variables in core.properties which are specific to each node (like name or coreNodeName). Thanks, Olivier Damiot
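For illustration, one manual way this is often handled, assuming the Solr 5.2 Collections API; collection, shard, replica and node names are placeholders (the coreNodeName of the dead replica can be read from clusterstate.json in ZooKeeper). This has to be repeated for each affected shard of each of the 600 collections, so in practice it is scripted.

# 1) remove the registration of the replica that lived on the lost node
curl "http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node5"

# 2) create a fresh replica for that shard on the new, empty node;
#    it then syncs its index from the shard leader
curl "http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=coll1&shard=shard1&node=newnode:8983_solr"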
How to dereference boost values?
Is it possible to do something like this: bf=myfield^$myfactor Thanks, Olivier
Dereferencing boost values?
Is there a way to do something like this: " bf=myfield^$myfactor " ? (Doesn't work, the boost value has to be a direct number) Thanks, Olivier
Re: Dereferencing boost values?
Thanks guys... I'm using edismax, and I have a long bf field, that I want in a solr's requesthandler config as default, but customizable via query string, something like that: product(a,$a)^$fa sum(b,$b1,$b2)^$fb c^$fc ... where the caller would pass $a, $fa, $b1, $b2, $fb, $fc (and a, b, c are numeric fields) So my problem is with $fa, $fb, and $fc. Solr doesn't take that syntax. For numeric operands, is the dismax boost operator ^ just a pow()? If so, my problem is solved by doing that: pow(product(a,$a1),$fa) pow(sum(b,$b1,$b2),$fb) pow(c,$fc) Is a^b equiv to pow(a,b)? Thanks, Olivier On 7/14/2015 2:31 PM, Chris Hostetter wrote: To clarify the difference: - "bf" is a special param of the dismax parser, which does an *additive* boost function - that function can be something as simple as a numeric field - alternatively, you can use the "boost" parser in your main query string, to wrap any parser (dismax, edismax, standard, whatever) in a *multiplicitive* boost, where the boost function can be anything - multiplicitve boosts are almost always what people really want, additive boosts are a lot less useful. - when specifying any function, you can use variable derefrencing for any function params. So in the example Upayavira gave, you can use any arbitrary query param to specify the function to use as a multiplicitive boost arround an arbitrary query -- which could still use dismax if you want (just specify the neccessary parser "type" as a localparam on the inner query, or use a defType localparam on the original boost query). Or you could explicitly specify a function that incorporates a field value with some other dynamic params, and use that entire function as your multiplicitive boost. a more elaborate example using the "bin/solr -e techproducts" data... http://localhost:8983/solr/techproducts/query?debug=query&q={!boost%20b=$boost_func%20defType=dismax%20v=$qq}&qf=name+title&qq=apple%20ipod&boost_func=pow%28$boost_field,$boost_factor%29&boost_field=price&boost_factor=2 "params":{ "qq":"apple ipod", "q":"{!boost b=$boost_func defType=dismax v=$qq}", "debug":"query", "qf":"name title", "boost_func":"pow($boost_field,$boost_factor)", "boost_factor":"2", "boost_field":"price"}}, : Date: Tue, 14 Jul 2015 21:58:36 +0100 : From: Upayavira : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Dereferencing boost values? : : You could do : : q={!boost b=$b v=$qq} : qq=your query : b=YOUR-FACTOR : : If what you want is to provide a value outside. : : Also, with later Solrs, you can use ${whatever} syntax in your main : query, which might work for you too. : : Upayavira : : On Tue, Jul 14, 2015, at 09:28 PM, Olivier Lebra wrote: : > Is there a way to do something like this: " bf=myfield^$myfactor " ? : > (Doesn't work, the boost value has to be a direct number) : > : > Thanks, : > Olivier : -Hoss http://www.lucidworks.com/
Querying specific database attributes or table
Hi, I am new to Solr. I would like to index and query a relational database. Is it possible to query a specific table or attribute of the database? For example, if I have 2 tables A and B which both have the attribute "name", and I want only the results from table A and not from table B, is that possible? Can I restrict the query to only one table without getting results from other tables? Is it possible to query a specific attribute of a table? Is it possible to do join queries like in SQL? Any suggestion is welcome. Thank you. Regards Olivier
Topology of Solr use
Hi All, I would like to have an idea of Solr usage: number of users, industry, countries or any other helpful information. Thank you. Regards Olivier
Re: Topology of Solr use
Thank you Markus, the link is very useful. Regards Olivier 2014-04-17 18:24 GMT+02:00 Markus Jelsma : > This may help a bit: > > https://wiki.apache.org/solr/PublicServers > > -Original message- > From:Olivier Austina > Sent:Thu 17-04-2014 18:16 > Subject:Topology of Solr use > To:solr-user@lucene.apache.org; > Hi All, > I would to have an idea about Solr usage: number of users, industry, > countries or any helpful information. Thank you. > Regards > Olivier >
Problem indexing email attachments
Hello, I'm trying to index email files with Solr (4.7.2). The files have the extension .eml (message/rfc822). The mail body is correctly indexed, but attachments are not indexed if they are not .txt files. If attachments are .txt files it works, but if attachments are .pdf or .docx files they are not indexed. I checked the extracted text by calling: curl " http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&extractFormat=text " -F "myfile=@Test1.eml" The returned extracted text does not contain the content of the attachments if they are not .txt files. It is not a problem with the Apache Tika library not being able to process attachments, because running the standalone Apache Tika app by calling: java -jar tika-app-1.4.jar -t Test1.eml on my eml files correctly displays the attachments' text. Maybe it is a problem with how Tika is called by Solr? Is there something to modify in the default configuration? Thanks for any help ;) Olivier
Re: Problem indexing email attachments
As I said, it is not a problem in the Tika library ;) I have tried with Tika 1.5 jars and it gives the same results. Guido Medina wrote on 23/04/2014 16:15:11: > From: Guido Medina > To: solr-user@lucene.apache.org > Date: 23/04/2014 16:15 > Subject: Re: Problem indexing email attachments > > We particularly massage solr.war and put our own updated jars, maybe > this helps: > > http://www.apache.org/dist/tika/CHANGES-1.5.txt > > We using Tika 1.5 inside Solr with POI 3.10-Final, etc... > > Guido. > > On 23/04/14 14:38, olivier.mass...@real.lu wrote: > > Hello, > > > > I'm trying to index email files with Solr (4.7.2) > > > > The files have the extension .eml (message/rfc822) > > > > The mail body is correctly indexed but attachments are not indexed if they > > are not .txt files. > > > > If attachments are .txt files it works, but if attachment are .pdf of > > .docx files they are not indexed. > > > > > > > > I checked the extracted text by calling: > > > > curl " > > http://localhost:8983/solr/update/extract? > literal.id=doc1&commit=true&extractOnly=true&extractFormat=text > > " -F "myfile=@Test1.eml" > > > > The returned extracted text does not contain the content of the > > attachments if they are not .txt files. > > > > > > It is not a problem with the Apache Tika library not being able to process > > attachments, because running the standalone Apache Tika app by calling: > > > > > > java -jar tika-app-1.4.jar -t Test1.eml > > > > > > on my eml files correctly displays the attachments' text. > > > > > > > > Maybe is it a problem with how Tika is called by Solr ? > > > > Is there something to modify in the default configuration ? > > > > > > Thanx for any help ;) > > > > Olivier >
Website running Solr
Hi All, Is there a way to know if a website uses Solr? Thanks. Regards Olivier
How to Get Highlighting Working in Velocity (Solr 4.8.0)
Maybe you missed that your field "dom_title" should be indexed="true" termVectors="true" termPositions="true" termOffsets="true"
Re: feedback on Solr 4.x LotsOfCores feature
15K cores is around 4 minutes : no network drive, just a spinning disk. But, one important thing, to simulate a cold start or an empty Linux buffer cache, I used the following command to empty the buffer cache : sync && echo 3 > /proc/sys/vm/drop_caches Then I started Solr and found the result above. On 11/10/2013 13:06, Erick Erickson wrote: bq: sharing the underlying solrconfig object the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode SOLR-4478 will NOT share the underlying config objects, it simply shares the underlying directory. Each core will, at least as presently envisioned, simply read the files that exist there and create their own solrconfig object. Schema objects may be shared, but not config objects. It may turn out to be relatively easy to do in the configset situation, but last time I looked at sharing the underlying config object it was too fraught with problems. bq: 15K cores is around 4 minutes I find this very odd. On my laptop, spinning disk, I think I was seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I have no idea what's going on here. If this is just reading the files, you should be seeing horrible disk contention. Are you on some kind of networked drive? bq: To do that in background and to block on that request until core discovery is complete, should not work for us (due to the worst case). What other choices are there? Either you have to do it up front or with some kind of blocking. Hmmm, I suppose you could keep some kind of custom store (DB? File? ZooKeeper?) that would keep the last known layout. You'd still have some kind of worst-case situation where the core you were trying to load wouldn't be in your persistent store and you'd _still_ have to wait for the discovery process to complete. bq: and we will use the cores Auto option to create load or only load the core on Interesting. I can see how this could all work without any core discovery but it does require a very specific setup. On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier <mailto:olivier.so...@worldline.com> wrote: > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, > including the new Cores options : > - "numBuckets" to create a subdirectory based on a hash on the corename % > numBuckets in the core Datadir > - "Auto" with 3 differents values : > 1) false : default behaviour > 2) createLoad : create, if not exist, and load the core on the fly on the > first incoming request (update, select) > 3) onlyLoad : load the core on the fly on the first incoming request > (update, select), if exist on disk > > Concerning : > - sharing the underlying solrconfig object, the configset introduced in JIRA > SOLR-4478 seems to be the solution for non-SolrCloud mode. > We need to test it for our use case. If another solution exists, please tell > me. We are very interested in such functionality and to contribute, if we can. > > - the possibility of lotsOfCores in SolrCloud, we don't know in details how > SolrCloud is working. > But one possible limit is the maximum number of entries that can be added to > a zookeeper node. > Maybe, a solution will be just a kind of hashing in the zookeeper tree. > > - the time to discover cores in Solr 4.4 : with spinning disk under linux, > all cores with transient="true" and loadOnStartup="false", the linux buffer > cache empty before starting Solr : > 15K cores is around 4 minutes. It's linear in the cores number, so for 50K > it's more than 13 minutes.
In fact, it corresponding to the time to read all > core.properties files. > To do that in background and to block on that request until core discovery is > complete, should not work for us (due to the worst case). > So, we will just disable the core Discovery, because we don't need to know > all cores from the start. Start Solr without any core entries in solr.xml, > and we will use the cores Auto option to create load or only load the core on > the fly, based on the existence of the core on the disk (absolute path > calculated from the core name). > > Thanks for your interest, > > Olivier > > From: Erick Erickson [erickerick...@gmail.com<mailto:erickerick...@gmail.com>] > Sent: Monday, 7 October 2013 14:33 > To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org> > Subject: Re: feedback on Solr 4.x LotsOfCores feature > > Thanks for the great writeup! It's always interesting to see how > a feature plays out "in the real world". A couple of questions > though: > > bq: We added 2 Cores options : > Do you mean you patched Solr? If so are you willing to share the code >
Re: feedback on Solr 4.x LotsOfCores feature
Another way to "simulate" the core discovery is : time find $PATH_TO_CORES -name core.properties -type f -exec cat '{}' > /dev/null 2>&1 \; or just the core.properties read time : find $PATH_TO_CORES -name core.properties > cores.list time for i in `cat cores.list`; do cat $i > /dev/null 2>&1; done; Olivier On 19/10/2013 11:57, Erick Erickson wrote: For my quick-and-dirty test I just rebooted my machine totally and still had 1K/sec core discovery. So this still puzzles me greatly. The time to do this should be approximated by the time it takes to just walk your tree, find all the core.properties and read them. Is it possible to just write a tiny Java program to do that? Or rip off the core discovery code and use that for a small stand-alone program? Because this is quite a bit at odds with what I've seen. Although now that I think about it, the code has gone through some revisions since then, but I don't think they should have affected this... Best Erick On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier <mailto:olivier.so...@worldline.com> wrote: > 15K cores is around 4 minutes : no network drive, just a spinning disk > But, one important thing, to simulate a cold start or an empty linux > buffer cache, > I used the following command to empty the linux buffer cache : > sync && echo 3 > /proc/sys/vm/drop_caches > Then, I started Solr and I found the result above > > > On 11/10/2013 13:06, Erick Erickson wrote: > > > bq: sharing the underlying solrconfig object the configset introduced > in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode > > SOLR-4478 will NOT share the underlying config objects, it simply > shares the underlying directory. Each core will, at least as presently > envisioned, simply read the files that exist there and create their > own solrconfig object. Schema objects may be shared, but not config > objects. It may turn out to be relatively easy to do in the configset > situation, but last time I looked at sharing the underlying config > object it was too fraught with problems. > > bq: 15K cores is around 4 minutes > > I find this very odd. On my laptop, spinning disk, I think I was > seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I > have no idea what's going on here. If this is just reading the files, > you should be seeing horrible disk contention. Are you on some kind of > networked drive? > > bq: To do that in background and to block on that request until core > discovery is complete, should not work for us (due to the worst case). > What other choices are there? Either you have to do it up front or > with some kind of blocking. Hmmm, I suppose you could keep some kind > of custom store (DB? File? ZooKeeper?) that would keep the last known > layout. You'd still have some kind of worst-case situation where the > core you were trying to load wouldn't be in your persistent store and > you'd _still_ have to wait for the discovery process to complete. > > bq: and we will use the cores Auto option to create load or only load > the core on > Interesting. I can see how this could all work without any core > discovery but it does require a very specific setup.
> > On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier > <mailto:olivier.so...@worldline.com><mailto:olivier.so...@worldline.com> > wrote: > > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in > SOLR-5316, including the new Cores options : > > - "numBuckets" to create a subdirectory based on a hash on the corename > % numBuckets in the core Datadir > > - "Auto" with 3 differents values : > > 1) false : default behaviour > > 2) createLoad : create, if not exist, and load the core on the fly on > the first incoming request (update, select) > > 3) onlyLoad : load the core on the fly on the first incoming request > (update, select), if exist on disk > > > > Concerning : > > - sharing the underlying solrconfig object, the configset introduced in > JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. > > We need to test it for our use case. If another solution exists, please > tell me. We are very interested in such functionality and to contribute, if > we can. > > > > - the possibility of lotsOfCores in SolrCloud, we don't know in details > how SolrCloud is working. > > But one possible limit is the maximum number of entries that can be > added to a zookeeper node. > > Maybe, a solution will be just a kind of hashing in the zookeeper tree. > > > > - the time to discover cores in Solr 4.4 : with spinning disk under > linux, all cores with transient="true" and
Remove indexes of XML file
Hi, This is a newbie question. I have indexed some documents using some XML files, as indicated in the tutorial <http://lucene.apache.org/solr/4_10_1/tutorial.html>, with the command: java -jar post.jar *.xml I have seen how to delete one document from the index, but how do I delete all the documents that came from one XML file? For example, if I have indexed some files A, B, C, D, etc., how do I delete the documents that came from file C? Is there a command like the one above, or another solution that does not use individual IDs? Thank you. Regards Olivier
Re: Remove indexes of XML file
Thank you Alex, I think I can use the file to delete corresponding indexes. Regards Olivier 2014-10-24 21:51 GMT+02:00 Alexandre Rafalovitch : > You can delete individually, all (*:* query) or by specific query. So, > if there is no common query pattern you may need to do a multi-id > query - something like "id:(id1 id2 id3 id4)" which does require you > knowing the IDs. > > Regards, >Alex. > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On 24 October 2014 15:44, Olivier Austina > wrote: > > Hi, > > > > This is newbie question. I have indexed some documents using some XML > files > > as indicating in the tutorial > > <http://lucene.apache.org/solr/4_10_1/tutorial.html> with the command : > > > > java -jar post.jar *.xml > > > > I have seen how to delete an index for one document but how to delete > > all indexes > > for documents within an XML file. For example if I have indexed some > > files A, B, C, D etc., > > how to delete indexes of documents from file C. Is there a command > > like above or other > > solution without using individual ID? Thank you. > > > > > > Regards > > Olivier >
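For illustration, the delete-by-query form Alexandre mentions, using the same /update endpoint as the tutorial. This assumes the documents from file C share some identifiable value, here a hypothetical "source" field that the stock example schema does not have, so it would need to be added and populated at index time; otherwise the multi-id form applies.

# delete everything whose (hypothetical) source field says it came from file C
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>source:fileC</query></delete>"

# or, knowing the ids, the multi-id variant from the reply
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>id:(id1 id2 id3)</query></delete>"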
OpenExchangeRates.Org rates in solr
Hi, Is there a way to see the OpenExchangeRates.Org <http://www.OpenExchangeRates.Org> rates used in Solr somewhere? I have changed the configuration to use these rates. Thank you. Regards Olivier
Re: OpenExchangeRates.Org rates in solr
Hi Will, I am learning Solr now. I can use it later for business or for free access. Thank you. Regards Olivier 2014-10-26 17:32 GMT+01:00 Will Martin : > Hi Olivier: > > Can you clarify this message? Are you using Solr at the business? Or are > you giving free access to solr installations? > > Thanks, > Will > > > -Original Message- > From: Olivier Austina [mailto:olivier.aust...@gmail.com] > Sent: Sunday, October 26, 2014 10:57 AM > To: solr-user@lucene.apache.org > Subject: OpenExchangeRates.Org rates in solr > > Hi, > > There is a way to see the OpenExchangeRates.Org < > http://www.OpenExchangeRates.Org> rates used in Solr somewhere. I have > changed the configuration to use these rates. Thank you. > Regards > Olivier > >
Indexing documents/files for production use
Hi All, I am reading the Solr documentation. I have understood that post.jar <http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29> is not meant for production use, and that cURL <https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing> is not recommended. Is SolrJ better for production? Thank you. Regards Olivier
Re: Indexing documents/files for production use
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me. Regards Olivier 2014-10-28 23:35 GMT+01:00 Erick Erickson : > And one other consideration in addition to the two excellent responses > so far > > In a SolrCloud environment, SolrJ via CloudSolrServer will automatically > route the documents to the correct shard leader, saving some additional > overhead. Post.jar and cURL send the docs to a node, which in turn > forward the docs to the correct shard leader which lowers > throughput > > Best, > Erick > > On Tue, Oct 28, 2014 at 2:32 PM, "Jürgen Wagner (DVT)" > wrote: > > Hello Olivier, > > for real production use, you won't really want to use any toys like > > post.jar or curl. You want a decent connector to whatever data source > there > > is, that fetches data, possibly massages it a bit, and then feeds it into > > Solr - by means of SolrJ or directly into the web service of Solr via > binary > > protocols. This way, you can properly handle incremental feeding, > processing > > of data from remote locations (with the connector being closer to the > data > > source), and also source data security. Also think about what happens if > you > > do processing of incoming documents in Solr. What happens if Tika runs > out > > of memory because of PDF problems? What if this crashes your Solr node? > In > > our Solr projects, we generally do not do any sizable processing within > Solr > > as document processing and document indexing or querying have all > different > > scaling properties. > > > > "Production use" most typically is not achieved by deploying a vanilla > Solr, > > but rather having a bit more glue and wrappage, so the whole will fit > your > > requirements in terms of functionality, scaling, monitoring and > robustness. > > Some similar platforms like Elasticsearch try to alleviate these pains of > > going to a production-style infrastructure, but that's at the expense of > > flexibility and comes with limitations. > > > > For proof-of-concept or demonstrator-style applications, the plain tools > out > > of the box will be fine. For production applications, you want to have > more > > robust components. > > > > Best regards, > > --Jürgen > > > > > > On 28.10.2014 22:12, Olivier Austina wrote: > > > > Hi All, > > > > I am reading the solr documentation. I have understood that post.jar > > < > http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29 > > > > is not meant for production use, cURL > > < > https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing > > > > is not recommanded. Is SolrJ better for production? Thank you. > > Regards > > Olivier > > > > > > > > -- > > > > Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С > > уважением > > i.A. Jürgen Wagner > > Head of Competence Center "Intelligence" > > & Senior Cloud Consultant > > > > Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany > > Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 > 1543 > > E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de > > > > > > Managing Board: Jürgen Hatzipantelis (CEO) > > Address of Record: 64331 Weiterstadt, Germany; Commercial Register: > > Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071 > > > > >
UI for Solr
Hi, I would like to build a user interface on top of Solr for PC and mobile. I am wondering if there is a framework or best practice commonly used. I want Solr features such as suggestions, autocomplete and facets to be available in the UI. Any suggestion is welcome. Thank you. Regards Olivier
Re: UI for Solr
Hi Alex, Thank you for prompt reply. I am not aware of Spring.io's Spring Data Solr. Regards Olivier 2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch : > You don't expose Solr directly to the user, it is not setup for > full-proof security out of the box. So you would need a client to talk > to Solr. > > Something like Spring.io's Spring Data Solr could be one of the things > to check. You can see an auto-complete example for it at: > https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main > and embedded in action at > http://www.solr-start.com/javadoc/solr-lucene/index.html (search box > on the top) > > Regards, >Alex. > > Sign up for my Solr resources newsletter at http://www.solr-start.com/ > > > On 23 December 2014 at 10:45, Olivier Austina > wrote: > > Hi, > > > > I would like to build a User Interface on top of Solr for PC and mobile. > I > > am wondering if there is a framework, best practice commonly used. I want > > Solr features such as suggestion, auto complete, facet to be available > for > > UI. Any suggestion is welcome. Than you. > > > > Regards > > Olivier >
Architecture for PHP web site, Solr and an application
Hi, I would like to query only some fields in Solr, depending on the user input, as I know the fields. The user sends an HTML form to the PHP website. The application gets the fields and their content from the PHP website. The application then formulates a query to Solr based on these fields and other contextual information. Only fields from the HTML form are used. The forms don't all have the same fields. The application is not yet developed. It could be in C++, Java or another language using a database. It uses more resources. I am wondering which architecture is suitable for this case: - How to make the architecture scalable (to support more users) - How to make PHP communicate with the application if this application is not in PHP. Any suggestion is welcome. Thank you. Regards Olivier
How to implement Auto complete, suggestion client side
Hi All, I would say I am new to web technology. I would like to implement autocomplete/suggestions in the user search box as the user types (like Google, for example). I am using Solr as the database. Basically I am familiar with Solr and I can formulate suggestion queries. But now I don't know how to implement the suggestions in the user interface. Which technologies do I need? The website is in PHP. Any suggestions, examples or basic tutorials are welcome. Thank you. Regards Olivier
Re: How to implement Auto complete, suggestion client side
Hi, Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me. Regards Olivier 2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch : > You've got a lot of options depending on what you want. But since you > seem to just want _an_ example, you can use mine from > http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search > box there). > > You can see the source for the test screen (using Spring Boot and > Spring Data Solr as a middle-layer) and Select2 for the UI at: > https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer. > The Solr definition is at: > > https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf > > Other implementation pieces are in that (and another) public > repository as well, but it's all in Java. You'll probably want to do > something similar in PHP. > > Regards, >Alex. > > Sign up for my Solr resources newsletter at http://www.solr-start.com/ > > > On 26 January 2015 at 17:11, Olivier Austina > wrote: > > Hi All, > > > > I would say I am new to web technology. > > > > I would like to implement auto complete/suggestion in the user search box > > as the user type in the search box (like Google for example). I am using > > Solr as database. Basically I am familiar with Solr and I can formulate > > suggestion queries. > > > > But now I don't know how to implement suggestion in the User Interface. > > Which technologies should I need. The website is in PHP. Any suggestions, > > examples, basic tutorial is welcome. Thank you. > > > > > > > > Regards > > Olivier >
feedback on Solr 4.x LotsOfCores feature
Hello, In my company, we use Solr in production to offer full-text search on mailboxes. We host dozens of millions of mailboxes, but only webmail users have this feature (a few million). We have the following use case : - non-static indexes, with more updates (indexing and deleting) than select requests (ratio 7:1) - homogeneous configuration for all indexes - not many users at the same time We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users. - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr instance - we grew to 6000 users per Solr instance, 8 Solr per server, 60 GB per index (~2 million users) - we upgraded to Solr 3.5 in 2012 As indexes grew, IOPS and response times increased more and more. The index size was mainly due to stored fields (large .fdt files). Retrieving these fields from the index was costly, because of many seeks in large files, and no limit usage was possible. There is also an overhead on queries : too many results are filtered to find only the results concerning one user. For these reasons and others, like users not being pooled, hardware savings, better scoring, and some requests that do not support filtering, we decided to use the LotsOfCores feature. Our goal was to change the current I/O usage : from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it is not a big deal that the first query to a not-yet-loaded core will be slow. And we don't need to fit all the cores into memory at once. We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We no longer need to run so many Solr instances per node. We are now able to have around 5 cores per Solr and we plan to grow to 100,000 cores per instance. At first, we used the solr.xml persistence. All cores have loadOnStartup="false" and transient="true" attributes, so a cold start is very quick. The response times were better than ever, in comparison with the poor response times we had before using LotsOfCores. We added 2 Cores options : - "numBuckets" to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir, because all cores cannot live in the same directory - "Auto" with 3 different values : 1) false : default behaviour 2) createLoad : create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if it exists on disk Then, to improve performance and avoid synchronization in the solr.xml persistence, we disabled it. The drawback is that we can no longer see the list of all available cores with the admin core status command, only those warmed up. Finally, we achieve very good performance with Solr LotsOfCores : - Index 5 emails (avg) + commit + search : x4.9 faster response time (mean), x5.4 faster (95th percentile) - Delete 5 documents (avg) : x8.4 faster response time (mean), x7.4 faster (95th percentile) - Search : x3.7 faster response time (mean), x4 faster (95th percentile) In fact, the better performance is mainly due to the small size of each index, but also to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of : - the number of file descriptors, it uses a lot (you need to increase the global max and the per-process fd limit; see the sketch after this message) - the value of transientCacheSize, depending on the RAM size and the allocated PermGen size - ClassLoader leaks that increase minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled) - the overhead of parsing solrconfig.xml and loading dependencies to open each core - LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr. We have Solr proxies to route requests to the right instance. Outside of production, we tried the core discovery feature in Solr 4.4 with a lot of cores. When you start, it spends a lot of time discovering cores because of the large number of cores, and meanwhile all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, like we do in our use case. If someone is interested in these new options for the LotsOfCores feature, just tell me.
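For illustration of the file-descriptor point above, a few shell checks; the pgrep pattern assumes a Jetty start.jar launch, and the 65536 figure is only an example, not a tested recommendation.

ulimit -n                                    # per-process soft limit in the current shell
cat /proc/$(pgrep -f start.jar)/limits | grep "open files"
ls /proc/$(pgrep -f start.jar)/fd | wc -l    # descriptors Solr currently holds

# raising the limit is typically done in /etc/security/limits.conf, e.g.:
#   solr  soft  nofile  65536
#   solr  hard  nofile  65536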
Re: Re: feedback on Solr 4.x LotsOfCores feature
The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, including the new Cores options : - "numBuckets" to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir - "Auto" with 3 different values : 1) false : default behaviour 2) createLoad : create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if it exists on disk Concerning : - sharing the underlying solrconfig object, the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. We need to test it for our use case. If another solution exists, please tell me. We are very interested in such functionality and in contributing, if we can. - the possibility of LotsOfCores in SolrCloud, we don't know in detail how SolrCloud works. But one possible limit is the maximum number of entries that can be added to a ZooKeeper node. Maybe a solution would just be a kind of hashing in the ZooKeeper tree. - the time to discover cores in Solr 4.4 : with spinning disk under Linux, all cores with transient="true" and loadOnStartup="false", the Linux buffer cache empty before starting Solr : 15K cores is around 4 minutes. It's linear in the number of cores, so for 50K it's more than 13 minutes. In fact, it corresponds to the time to read all core.properties files. Doing that in the background and blocking on requests until core discovery is complete would not work for us (due to the worst case). So we will just disable core discovery, because we don't need to know all cores from the start. We start Solr without any core entries in solr.xml, and we will use the cores Auto option to create+load or only load the core on the fly, based on the existence of the core on the disk (absolute path calculated from the core name). Thanks for your interest, Olivier From: Erick Erickson [erickerick...@gmail.com] Sent: Monday, 7 October 2013 14:33 To: solr-user@lucene.apache.org Subject: Re: feedback on Solr 4.x LotsOfCores feature Thanks for the great writeup! It's always interesting to see how a feature plays out "in the real world". A couple of questions though: bq: We added 2 Cores options : Do you mean you patched Solr? If so are you willing to share the code back? If both are "yes", please open a JIRA, attach the patch and assign it to me. bq: the number of file descriptors, it used a lot (need to increase global max and per process fd) Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady. bq: the overhead to parse solrconfig.xml and load dependencies to open each core Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA of the problems I foresaw. There may be some action on this in the future. bq: lotsOfCores doesn’t work with SolrCloud Right, we haven't concentrated on that, it's an interesting problem. In particular it's not clear what happens when nodes go up/down, replicate, resynch, all that. bq: When you start, it spend a lot of times to discover cores due to a big How long? I tried 15K cores on my laptop and I think I was getting 15 second delays or roughly 1K cores discovered/second. Is your delay on the order of 50 seconds with 50K cores? I'm not sure how you could do that in the background, but I haven't thought about it much.
I tried multi-threading core discovery and that didn't help (SSD disk); I assumed that the problem was mostly I/O contention (but didn't prove it). What if a request came in for a core before you'd found it? I'm not sure what the right behavior would be, except perhaps to block on that request until core discovery was complete. Hm. How would that work for your case? That seems do-able. BTW, so far you get the prize for the most cores on a node, I think. Thanks again for the great feedback! Erick On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier wrote: > Hello, > > In my company, we use Solr in production to offer full text search on > mailboxes. > We host dozens of millions of mailboxes, but only webmail users have this > feature (a few million). > We have the following use case : > - non-static indexes with more updates (indexing and deleting) than > select requests (ratio 7:1) > - homogeneous configuration for all indexes > - not many concurrent users > > We started to index mailboxes with Solr 1.4 in 2010, on a subset of > 400,000 users. > - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr > instance > - we grow to 6000
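For readers trying to reproduce the lazy-loading setup discussed in this thread with stock Solr 4.x core discovery (the numBuckets/Auto options only exist in the SOLR-5316 patch and are not shown), a minimal sketch of the per-core configuration could look like the following; the core name, path and cache size are made-up examples, and whether transientCacheSize lives in old- or new-style solr.xml depends on your version.

  # core.properties, one file per core directory, found by core discovery
  name=customer_00042
  transient=true
  loadOnStartup=false

  <!-- new-style solr.xml: cap how many transient cores stay loaded at once -->
  <solr>
    <int name="transientCacheSize">1000</int>
  </solr>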
Re: solr distributed search don't work
explicit enum 1 10 192.168.1.6/solr/,192.168.1.7/solr/ 2011/8/19 Li Li > could you please show me your configuration in solrconfig.xml? > > On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou > wrote: > > Hi, > > I do not use spell but I use distributed search, using qt=spell is > correct, > > should not use qt=\spell. > > For "shards", I specify it in solrconfig directly, not in url, but should > > work the same. > > Maybe an issue in your spell request handler. > > > > > > 2011/8/19 Li Li > > > >> hi all, > >> I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent > >> but there is something wrong. > >> the url given my the wiki is > >> > >> > http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr > >> but it does not work. I trace the codes and find that > >> qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell > >> After modification of url, It return all documents but nothing > >> about spell check. > >> I debug it and find the > >> AbstractLuceneSpellChecker.getSuggestions() is called. > >> > > >
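The solrconfig.xml snippet quoted above lost its XML markup in the archive. A hedged reconstruction of the kind of handler it describes, a SearchHandler whose defaults carry the shards list, might look like this; the IP addresses come from the message, while the spellcheck wiring is an assumption based on the thread.

  <requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck">true</str>
      <str name="shards.qt">/spell</str>
      <str name="shards">192.168.1.6/solr/,192.168.1.7/solr/</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>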
Solr 3.5 MoreLikeThis on Date fields
Hi Everyone, Please help out if you know what is going on. We are upgrading to Solr 3.5 (from 1.4.1) and busy with a Re-Index and Test on our data. Everything seems OK, but Date Fields seem to be "broken" when using with the MoreLikeThis handler (I also saw the same error on Date Fields using the HighLighter in another forum post "Invalid Date String for highlighting any date field match @ Mon 2011/08/15 13:10 "). * I deleted the index/core and only loaded a few records and still get the error when using the MoreLikeThis using the "docdate" as part of the mlt.fl params. * I double checked all the data that was loaded and the dates parse 100% and can see no problems with any of the data loaded. Type: Definition: A sample result: 1999-06-28T00:00:00Z THE MLT QUERY: Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute INFO: [legal_spring] webapp=/solr path=/select params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996} status=400 QTime=1 THE ERROR: Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400' at org.apache.solr.schema.DateField.parseMath(DateField.java:165) at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106) at org.apache.solr.analysis.TrieTokenizer.(TrieTokenizerFactory.java:76) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41) at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385) at org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876) at org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820) at org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629) at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311) at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Sincerely, Jaco Olivier
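The field and type definitions pasted into the message above were stripped by the archive. Assuming the stock example schema of that era, the docdate field would typically be declared along these lines (only the field name comes from the query; the rest is an assumption):

  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
  <field name="docdate" type="tdate" indexed="true" stored="true"/>

The stack trace suggests MoreLikeThis is feeding the raw indexed trie term ('94046400') back through the date analyzer, which is what raises the Invalid Date String error; leaving docdate out of mlt.fl is one possible workaround until the underlying issue is resolved.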
Faceted search outofmemory
Hi, I am trying to run a faceted search on a very large index (around 200 GB with 200M docs) and I get an out-of-memory error. With no facets it works fine. There are quite a few questions around this, but I could not find the answer. How can I estimate the memory required when facets are used, so that I can scale my server/index correctly to handle it? Thanks Olivier
Re: Faceted search outofmemory
How do make paging over facets? 2010/6/29 Ankit Bhatnagar > > Did you trying paging them? > > > -Original Message- > From: olivier sallou [mailto:olivier.sal...@gmail.com] > Sent: Tuesday, June 29, 2010 2:04 PM > To: solr-user@lucene.apache.org > Subject: Faceted search outofmemory > > Hi, > I try to make a faceted search on a very large index (around 200GB with > 200M > doc). > I have an out of memory error. With no facet it works fine. > > There are quite many questions around this but I could not find the answer. > How can we know the required memory when facets are used so that I try to > scale my server/index correctly to handle it. > > Thanks > > Olivier >
Re: Re: Faceted search outofmemory
I already use facet.limit in my query. I tried however facet.method=enum and though it does not seem to fix everything, I have some requests without the outofmemory error. Best would be to have a calculation rule of required memory for such type of query. 2010/6/29 Markus Jelsma > http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit > > -Original message- > From: olivier sallou > Sent: Tue 29-06-2010 20:11 > To: solr-user@lucene.apache.org; > Subject: Re: Faceted search outofmemory > > How do make paging over facets? > > 2010/6/29 Ankit Bhatnagar > > > > > Did you trying paging them? > > > > > > -Original Message- > > From: olivier sallou [mailto:olivier.sal...@gmail.com] > > Sent: Tuesday, June 29, 2010 2:04 PM > > To: solr-user@lucene.apache.org > > Subject: Faceted search outofmemory > > > > Hi, > > I try to make a faceted search on a very large index (around 200GB with > > 200M > > doc). > > I have an out of memory error. With no facet it works fine. > > > > There are quite many questions around this but I could not find the > answer. > > How can we know the required memory when facets are used so that I try to > > scale my server/index correctly to handle it. > > > > Thanks > > > > Olivier > > >
Re: Faceted search outofmemory
I have given 6G to Tomcat. Using facet.method=enum and facet.limit seems to fix the issue with a few tests, but I do know that it is not a "final" solution. Will work under certain configurations. Real "issue" is to be able to know what is the required RAM for an index... 2010/6/29 Nagelberg, Kallin > How much memory have you given the solr jvm? Many servlet containers have > small amount by default. > > -Kal > > -Original Message- > From: olivier sallou [mailto:olivier.sal...@gmail.com] > Sent: Tuesday, June 29, 2010 2:04 PM > To: solr-user@lucene.apache.org > Subject: Faceted search outofmemory > > Hi, > I try to make a faceted search on a very large index (around 200GB with > 200M > doc). > I have an out of memory error. With no facet it works fine. > > There are quite many questions around this but I could not find the answer. > How can we know the required memory when facets are used so that I try to > scale my server/index correctly to handle it. > > Thanks > > Olivier >
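To make the suggestions in this thread concrete, here is a hedged example of a facet request that combines facet.method=enum with facet paging; the host and the field name "category" are placeholders, not taken from the original messages.

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum&facet.limit=100&facet.offset=0

facet.limit and facet.offset page through the facet values, and facet.method=enum trades the large per-field cache array for one filter per term, which can reduce heap usage for fields with a moderate number of distinct values.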
Re: Tag generation
On 15.07.2010 at 17:34, kenf_nc wrote: > A colleague mentioned that he knew of services where you pass some content > and it spits out some suggested Tags or Keywords that would be best suited > to associate with that content. > > Does anyone know if there is a contrib to Solr or Lucene that does something > like this? Or a third party tool that can be given a solr index or solr > query and it comes up with some good Tag suggestions? Hi there, there is something from http://www.zemanta.com/ and something from Basis Tech at http://www.basistech.com/, though I am not sure if this would help. You could also have a look at http://uima.apache.org/ Greetings, olivier -- Olivier Dobberkau
Spatial filtering
Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! :( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) This leads to 2713 results (which is smaller than 3859, good). But some (lots) of the results are once more out of the circle :( Can someone help me get spatial filtering working? I really don't understand the search results I'm getting. Cheers, Olivier -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org
How to get the list of all available fields in a (sharded) index
Hi, I cannot find any info on how to get the list of current fields in an index (possibly sharded). With dynamic fields, I cannot simply parse the schema to know which fields are available. Is there any way to get it via a request (or something easily programmable)? I know the information is available in one of the Lucene-generated files, but I'd like to get it via a query for my whole index. Thanks Olivier
Re: dismax request handler without q
Hi, this is not very clear: if you only need to query keyphrase, why don't you query it directly, e.g. q=keyphrase:hotel ? Furthermore, why dismax if only the keyphrase field is of interest? Dismax is used to query multiple fields automatically. Also, dismax does not appear in your query (via the query type); is it set in your config for your default request handler? 2010/7/20 Chamnap Chhorn > I wonder how could i make a query to return only *all books* that has > keyphrase "web development" using dismax handler? A book has multiple > keyphrases (keyphrase is multivalued column). Do I have to pass q > parameter? > > > Is it the correct one? > http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel > > -- > Chhorn Chamnap > http://chamnapchhorn.blogspot.com/ >
Re: Spatial filtering
Le 20/07/2010 04:18, Lance Norskog a écrit : Add the debugQuery=true parameter and it will show you the Lucene query tree, and how each document is evaluated. This can help with the more complex queries. Do you see something wrong? [debug] => Array ( [rawquerystring] => *:* [querystring] => *:* [parsedquery] => MatchAllDocsQuery(*:*) [parsedquery_toString] => *:* [explain] => Array ( [doc_45269] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50206] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50396] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_51199] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [] ) [QParser] => LuceneQParser [filter_queries] => Array ( [0] => +object_type:Concert +date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) [parsed_filter_queries] => Array ( [0] => +object_type:Concert +date:[127958400 TO 1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin} ) [...] I'm not sure about the "parsed_filter_queries" entry. It looks like the "+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not well interpreted (seems like it's interpreted as a range). Does anyone know what the right syntax? This is not documented... Cheers, Olivier On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau wrote: Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! :( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) This leads to 2713 results (which is smaller than 3859, good). But some (lots) of the results are once more out of the circle :( Can someone help me get spatial filtering working? I really don't understand the search results I'm getting. 
Cheers, Olivier -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org
Re: Spatial filtering
Ok, I have found a big bug in my indexing script. Things are getting better. I managed to have my parsed_filter_query to: +coords_lat_lon_0_latLon:[48.694179707855874 TO 49.01213545059667] +coords_lat_lon_1_latLon:[2.1079512793239767 TO 2.5911832073858765] For the record, here are the parameters which made it work: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 25 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => * score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => {!sfilt fl=coords_lat_lon} +object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] ) But I am facing one problem: the " +object_type:Concert + date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]" part of my fq parameter is not taken into account (see the parsed_filter_query above). So here is my question: How can I mix the "{!sfilt fl=coords_lat_lon}" part of the fq parameter with "usual" fq parameters (eg: "+object_type:Concert")? Can anyone help? Regards, Olivier Le 20/07/2010 09:53, Olivier Ricordeau a écrit : Le 20/07/2010 04:18, Lance Norskog a écrit : Add the debugQuery=true parameter and it will show you the Lucene query tree, and how each document is evaluated. This can help with the more complex queries. Do you see something wrong? [debug] => Array ( [rawquerystring] => *:* [querystring] => *:* [parsedquery] => MatchAllDocsQuery(*:*) [parsedquery_toString] => *:* [explain] => Array ( [doc_45269] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50206] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50396] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_51199] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [] ) [QParser] => LuceneQParser [filter_queries] => Array ( [0] => +object_type:Concert +date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) [parsed_filter_queries] => Array ( [0] => +object_type:Concert +date:[127958400 TO 1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin} ) [...] I'm not sure about the "parsed_filter_queries" entry. It looks like the "+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not well interpreted (seems like it's interpreted as a range). Does anyone know what the right syntax? This is not documented... Cheers, Olivier On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau wrote: Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! 
:( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] =>
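One approach that may answer the question above about mixing {!sfilt} with conventional filters (not verified against that pre-release spatial code) is to send them as separate fq parameters, so that the local-params prefix only applies to its own filter; the values below are taken from the messages, and the spaces and '+' signs would need URL-encoding in a real request.

  ...&pt=48.85341,2.3488&d=25
     &fq={!sfilt fl=coords_lat_lon}
     &fq=+object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]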
Re: dismax request handler without q
q will search in the defaultSearchField if no field name is given, but you can specify in your "q" param the fields you want to search. Dismax is a handler where you can configure a number of fields to look in for the input query: in that case you do not specify the fields in the query, and dismax looks in the fields specified in its configuration. However, by default dismax is not used; it needs to be enabled with the query type parameter (qt=dismax). With the default Solr config you can call ...solr/select?q=keyphrase:hotel if keyphrase is a declared field in your schema 2010/7/20 Chamnap Chhorn > I can't put q=keyphrase:hotel in my request using dismax handler. It > returns > no result. > > On Tue, Jul 20, 2010 at 1:19 PM, Chamnap Chhorn >wrote: > > > There are some default configuration on my solrconfig.xml that I didn't > > show you. I'm a little confused when reading > > http://wiki.apache.org/solr/DisMaxRequestHandler#q. I think q is for > plain > > user input query. > > > > > > On Tue, Jul 20, 2010 at 12:08 PM, olivier sallou < > olivier.sal...@gmail.com > > > wrote: > > > >> Hi, > >> this is not very clear, if you need to query only keyphrase, why don't > you > >> query directly it? e.g. q=keyphrase:hotel ? > >> Furthermore, why dismax if only keyphrase field is of interest? dismax > is > >> used to query multiple fields automatically. > >> > >> At least dismax do not appear in your query (using query type). It is > set > >> in > >> your config for your default request handler? > >> > >> 2010/7/20 Chamnap Chhorn > >> > >> > I wonder how could i make a query to return only *all books* that has > >> > keyphrase "web development" using dismax handler? A book has multiple > >> > keyphrases (keyphrase is multivalued column). Do I have to pass q > >> > parameter? > >> > > >> > > >> > Is it the correct one? > >> > http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel > >> > > >> > -- > >> > Chhorn Chamnap > >> > http://chamnapchhorn.blogspot.com/ > >> > > >> > > > > > > > > -- > > Chhorn Chamnap > > http://chamnapchhorn.blogspot.com/ > > > > > > -- > Chhorn Chamnap > http://chamnapchhorn.blogspot.com/ >
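As a hedged illustration of the two options described above (the port and field name are reused from the thread, and the spaces and quotes would need URL-encoding in practice):

  Standard handler, explicit field:  http://localhost:8081/solr/select?q=keyphrase:"web development"
  Dismax handler, fields set in qf:  http://localhost:8081/solr/select?qt=dismax&q=web development&qf=keyphrase

Note that dismax treats the whole q value as plain user input, so a fielded expression like keyphrase:hotel is not parsed as a field query there, which is consistent with the empty result reported earlier in the thread.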
Solr and Lucene in South Africa
Hi to all Solr/Lucene users... Our team had a discussion today regarding the Solr/Lucene community closer to home. I am hereby putting out an SOS to all Solr/Lucene users in the South African market and wish to organize a meet-up (or user support group) if at all possible. It would be great to share some triumphs and pitfalls that were experienced. * Sorry for hogging the user mailing list with a non-technical question, but I think this is the easiest way to get it done :) Jaco Olivier Web Specialist
Replication and CPU
Hello, I set up a server for Solr replication. I used 2 cores and configured replication for each one, following the tutorial at http://wiki.apache.org/solr/SolrReplication. Replication works for each core. However, CPU usage is at 100% on the slave. The master and slave are 2 servers with the same hardware configuration, and I don't understand what could cause the problem. The slave is launched with: java -Dsolr.solr.home=/solr/multicore -Denable.master=false -Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar start.jar If I comment out the replication config, the server is fine. Does anyone have an idea? Regards, Olivier
Re: Replication and CPU
Hello Peter, On the slave server http://slave/solr/core0/admin/replication/index.jsp Poll Interval00:30:00 Local Index Index Version: 1284026488242, Generation: 13102 Location: /solr/multicore/core0/data/index Size: 26.9 GB Times Replicated Since Startup: 289 Previous Replication Done At: Tue Oct 12 12:00:00 GMT+02:00 2010 Config Files Replicated At: 1286790818824 Config Files Replicated: [solrconfig_slave.xml] Times Config Files Replicated Since Startup: 1 Next Replication Cycle At: Tue Oct 12 12:30:00 GMT+02:00 2010 The request Handler on the slave : name="masterUrl">http://master/solr/${solr.core.name}/replication 00:30:00 I increased the poll interval because I thought that there were too many changes. Currently there are no changes on the master and the slave is always to 100% of cpu. On the master, I have startup commit name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt 00:00:10 Regards, Olivier Le 12/10/2010 12:11, Peter Karich a écrit : Hi Olivier, maybe the slave replicates after startup? check replication status here: http://localhost/solr/admin/replication/index.jsp what is your poll frequency (could you paste the replication part)? Regards, Peter. Hello, I setup a server for the replication of Solr. I used 2 cores and for each one I specified the replication. I followed the tutorial on http://wiki.apache.org/solr/SolrReplication. The replication is OK for each cores. However the CPU is used to 100% on the slave. The master and slave are 2 servers with the same hardware configuration. I don't understand which can cause the problem. The slave is launched by : java -Dsolr.solr.home=/solr/multicore -Denable.master=false -Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar start.jar If I comment the replication the server is OK. Anyone have an idea ? Regards, Olivier
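The replication configuration quoted above lost its XML tags in the archive. Reconstructing it from the values that survived (masterUrl, pollInterval, replicateAfter, confFiles), it would look roughly like this; exact details should be checked against the SolrReplication wiki page.

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master/solr/${solr.core.name}/replication</str>
      <str name="pollInterval">00:30:00</str>
    </lst>
  </requestHandler>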
Re: Can solr index folder can be moved from one system to another?
The index is not tied to its directory; there is no path information in the index. You can create an index and then move it anywhere (or merge it with another one). I often do this and there is no issue. Olivier 2012/3/22 ravicv > Hi Tomás, > > I can not use Solr replcation in my scenario. My requirement is to gzip the > solr index folder and send to dotnet system through webservice. > Then in dotnet the same index folder should be unzipped and same folder > should be used as an index folder through solrnet . > > Whether my requirement is possible? > > Thanks > Ravi > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Can-solr-index-folder-can-be-moved-from-one-system-to-another-tp3844919p3847725.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Solr Cell and operations on metadata extracted
Hi, I have a question about Solr Cell, please. I index some files. For example, if I want to extract the filename, apply a hash function such as MD5 to it and then store the result in Solr: is the correct way to use Tika "manually" to extract the metadata I want, do the transformations on it, and then send it to Solr? I can't use Solr Cell directly in this case because I can't modify the extracted metadata, right? Thanks, Olivier
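A minimal sketch of the "Tika manually" route described in the question, assuming Tika, commons-codec and SolrJ are on the classpath; the field names, the target URL and the HttpSolrServer class (which varies with the SolrJ version, older releases use CommonsHttpSolrServer) are illustrative, not taken from the original message.

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;
  import org.apache.commons.codec.digest.DigestUtils;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.parser.ParseContext;
  import org.apache.tika.sax.BodyContentHandler;

  public class ManualTikaIndexer {
    public static void main(String[] args) throws Exception {
      File file = new File(args[0]);
      Metadata metadata = new Metadata();
      BodyContentHandler text = new BodyContentHandler(-1); // no write limit
      InputStream in = new FileInputStream(file);
      try {
        new AutoDetectParser().parse(in, text, metadata, new ParseContext());
      } finally {
        in.close();
      }
      // Transform the extracted metadata before indexing, e.g. hash the filename.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", DigestUtils.md5Hex(file.getName()));
      doc.addField("filename", file.getName());
      doc.addField("content", text.toString());
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
      solr.add(doc);
      solr.commit();
    }
  }

If the transformed value can be computed on the client before the request, Solr Cell can also receive caller-supplied values through literal.<field> parameters, which avoids writing a custom indexer at all.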
Re: how to request for Json object
Ajax does not allow requests to another domain. The only way, unless you use server-side requests, is to go through a proxy that hides the host origin, so that the Ajax request thinks both servers are the same. 2011/6/2 Romi > How to parse Json through ajax when your ajax pager is on one > server(Tomcat)and Json object is of onther server(solr server). i mean i > have to make a request to another server, how can i do it . > > - > Thanks & Regards > Romi > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html > Sent from the Solr - User mailing list archive at Nabble.com. >
SOlr upgrade: Invalid version (expected 2, but 1) error when using shards
Hi, I just migrated to Solr 3.3 from 1.4.1. My index is still in the 1.4.1 format (it will be migrated soon). I get an error when I use sharding with the new version: org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format However, if I request each shard independently (/request), the answer is correct, so the error is triggered only by the shard mechanism. While I do plan to upgrade my indexes, I'd like to understand the issue, i.e. is it an "upgrade" issue, or do shards simply not support using an "old" format? Thanks Olivier
lucene 3 and merge/optimize
Hi, after an upgrade to solr/lucene 3, I tried to change the code to remove deprecated functions Though new MergePolicy etc... are not really clear. I have now issues with the merge and optimize functions. I have a command line application (Java/Lucene api) that merge multiple indexes in a single one, or optimize an existing index (this is done offline) When I execute my code, the merge creates a new index, but looks to contain more files than before (with solr 4.1), why not... When I try to optimize, code says OK, but I still have many files, segments : (below for a very small example) _0.fdt _0.tis _1.tii _2.prx _3.nrm _4.frq _5.fnm _6.fdx _7.fdt _7.tis _8.tii _9.prx _a.nrm _b.frq _0.fdx _1.fdt _1.tis _2.tii _3.prx _4.nrm _5.frq _6.fnm _7.fdx _8.fdt _8.tis _9.tii _a.prx _b.nrm _0.fnm _1.fdx _2.fdt _2.tis _3.tii _4.prx _5.nrm _6.frq _7.fnm _8.fdx _9.fdt _9.tis _a.tii _b.prx _0.frq _1.fnm _2.fdx _3.fdt _3.tis _4.tii _5.prx _6.nrm _7.frq _8.fnm _9.fdx _a.fdt _a.tis _b.tii _0.nrm _1.frq _2.fnm _3.fdx _4.fdt _4.tis _5.tii _6.prx _7.nrm _8.frq _9.fnm _a.fdx _b.fdt _b.tis _0.prx _1.nrm _2.frq _3.fnm _4.fdx _5.fdt _5.tis _6.tii _7.prx _8.nrm _9.frq _a.fnm _b.fdx segments_1 _0.tii _1.prx _2.nrm _3.frq _4.fnm _5.fdx _6.fdt _6.tis _7.tii _8.prx _9.nrm _a.frq _b.fnm segments.gen I'd like to reduce with the optimize or the merge to the minimum the number of files, my index is read only and does not change. Here is the code for optimize, am I doing something wrong? IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version. LUCENE_33)); conf.setRAMBufferSizeMB(50); LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy(); policy.setMaxMergeDocs(10); conf.setMergePolicy(policy); IndexWriter writer = newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() ); writer.optimize(); writer.close(); Thanks Olivier
Re: lucene 3 and merge/optimize
answer to myself, to be checked... I used policy.setMaxMergeDocs(10), limiting to small number of filesat least for merge. I gonna test. 2011/8/18 olivier sallou > Hi, > after an upgrade to solr/lucene 3, I tried to change the code to remove > deprecated functions Though new MergePolicy etc... are not really > clear. > > I have now issues with the merge and optimize functions. > > I have a command line application (Java/Lucene api) that merge multiple > indexes in a single one, or optimize an existing index (this is done > offline) > > When I execute my code, the merge creates a new index, but looks to contain > more files than before (with solr 4.1), why not... > When I try to optimize, code says OK, but I still have many files, segments > : (below for a very small example) > _0.fdt _0.tis _1.tii _2.prx _3.nrm _4.frq _5.fnm _6.fdx _7.fdt > _7.tis _8.tii _9.prx _a.nrm _b.frq > _0.fdx _1.fdt _1.tis _2.tii _3.prx _4.nrm _5.frq _6.fnm _7.fdx > _8.fdt _8.tis _9.tii _a.prx _b.nrm > _0.fnm _1.fdx _2.fdt _2.tis _3.tii _4.prx _5.nrm _6.frq _7.fnm > _8.fdx _9.fdt _9.tis _a.tii _b.prx > _0.frq _1.fnm _2.fdx _3.fdt _3.tis _4.tii _5.prx _6.nrm _7.frq > _8.fnm _9.fdx _a.fdt _a.tis _b.tii > _0.nrm _1.frq _2.fnm _3.fdx _4.fdt _4.tis _5.tii _6.prx _7.nrm > _8.frq _9.fnm _a.fdx _b.fdt _b.tis > _0.prx _1.nrm _2.frq _3.fnm _4.fdx _5.fdt _5.tis _6.tii _7.prx > _8.nrm _9.frq _a.fnm _b.fdx segments_1 > _0.tii _1.prx _2.nrm _3.frq _4.fnm _5.fdx _6.fdt _6.tis _7.tii > _8.prx _9.nrm _a.frq _b.fnm segments.gen > > I'd like to reduce with the optimize or the merge to the minimum the number > of files, my index is read only and does not change. > > Here is the code for optimize, am I doing something wrong? > > IndexWriterConfig conf = new > IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version. > LUCENE_33)); > > conf.setRAMBufferSizeMB(50); > > LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy(); > > policy.setMaxMergeDocs(10); > > conf.setMergePolicy(policy); > > IndexWriter writer = > newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() ); > > > writer.optimize(); > > writer.close(); > > > > Thanks > > > Olivier >
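For reference, a corrected sketch of the optimize path discussed above, against the Lucene 3.x API: it drops the setMaxMergeDocs(10) cap (which limits merged segments to ten documents and is what leaves so many segment files behind) and passes the IndexWriterConfig that was actually built. The index path argument stands in for the INDEX_DIR of the original snippet.

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.LogByteSizeMergePolicy;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class OptimizeIndex {
    public static void main(String[] args) throws Exception {
      File indexDir = new File(args[0]);
      IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
          new StandardAnalyzer(Version.LUCENE_33));
      conf.setRAMBufferSizeMB(50);
      conf.setMergePolicy(new LogByteSizeMergePolicy()); // no setMaxMergeDocs cap
      IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), conf);
      writer.optimize(1); // merge the read-only index down to a single segment
      writer.close();
    }
  }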
Re: solr distributed search don't work
Hi, I do not use spellcheck, but I do use distributed search; using qt=spell is correct, you should not have to use qt=\spell. For "shards", I specify it directly in solrconfig, not in the URL, but it should work the same. Maybe there is an issue in your spell request handler. 2011/8/19 Li Li > hi all, > I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent > but there is something wrong. > the url given my the wiki is > > http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr > but it does not work. I trace the codes and find that > qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell > After modification of url, It return all documents but nothing > about spell check. > I debug it and find the > AbstractLuceneSpellChecker.getSuggestions() is called. >
Re: Solr CMS Integration
Am 07.08.2009 um 19:01 schrieb wojtekpia: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination or articles, blogs, white papers, etc. Hi Wojtek, Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. Many greetings, Olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstrasse 73 D 60329 Frankfurt/Main Fon: +49 (0)69 - 247 52 18 - 0 Fax: +49 (0)69 - 247 52 18 - 99 Mail: olivier.dobber...@dkd.de Web: http://www.dkd.de Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast Aktuelle Projekte: http://bewegung.taz.de - Launch (Ruby on Rails) http://www.hans-im-glueck.de - Relaunch (TYPO3) http://www.proasyl.de - Relaunch (TYPO3)
Re: Showcase: Facetted Search for Wine using Solr
Marian Steinbach wrote: On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote: Have you seen this? It is another Solr/TYPO3 integration project. http://forge.typo3.org/projects/show/extension-solr Would you consider open-sourcing your Solr/Typo3 integration? Hi Lance! I wasn't aware of that extension. Having looked at the website, it does something very different from what we did. The solr extension mentioned above tries to provide a better website search for the Typo3 CMS on top of Solr. Our integration doesn't index web pages but product data from an XML file. I'd say the implementation is pretty much customer-specific, so I don't see a real benefit in making it open source. Regards, Marian Hi Marian. Our extension will be able to do that as well once we have set up the indexing queue for the TYPO3 backend. We have a concept called TYPO3 extension connectors, so that you will be able to add index documents to your index. Feel free to contact Ingo about the contribution possibilities in our Solr project. If you use open source software you should definitely contribute; this gives you great karma. Or, as we at TYPO3 say: inspire people to share! olivier
Re: i want to use something like *query* similar to database - %query% like search
On 02.12.2009 at 09:55, amittripathi wrote: > it's accepting the trailing wildcard character but solr is not accepting the > leading wildcard character The error message says it all: '*' or '?' not allowed as first character in WildcardQuery. Solr is not SQL. Olivier -- Olivier Dobberkau
RE: why no results?
Hi Regan, I am using STRING fields only for values that in most cases will be used to FACET on. I suggest using TEXT fields as per the default examples. ALSO, remember that if you do not specify solr.LowerCaseFilterFactory, your search has just become case sensitive. I struggled with that one before, so make sure what you are indexing is what you are searching for. * Stick to the default examples provided with the SOLR distro and you should be fine. Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 08 December 2009 06:15 To: solr-user@lucene.apache.org Subject: Re: why no results? Tom Hill-7 wrote: > > Try solr.TextField instead. > Thanks Tom, I've replaced the section above with... deleted my index, restarted Solr and re-indexed my documents - but the search still returns nothing. Do I need to change the type in the sections as well? regan -- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688469.html Sent from the Solr - User mailing list archive at Nabble.com.
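A hedged sketch of the kind of analyzed field type being recommended here, modelled on the stock example schema; the field names come from the original question, while the exact analyzer chain is an assumption.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="body"  type="text" indexed="true" stored="true"/>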
RE: why no results?
Hi, Try changing your TEXT field to type "text" (without the of course :)) That is your problem... also use the "text" type as per the default examples in the SOLR distro :) Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 08 December 2009 05:44 To: solr-user@lucene.apache.org Subject: why no results? hi all - newbie solr question - I've indexed some documents and can search / receive results using the following schema - BUT ONLY when searching on the "id" field. If I try searching on the title, subtitle, body or text field I receive NO results. Very confused. :confused: Can anyone see anything obvious I'm doing wrong? Regan. id text -- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688249.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: do copyField's need to exist as Fields?
Hi Regan, Something I noticed on your setup... The ID field in your setup I assume to be your unique ID for the book or journal (the ISSN or something). Try making this a string, as TEXT is not the ideal field type to use for unique IDs. Congrats on figuring out SOLR fields - I suggest getting the SOLR 1.4 book; it really saved me a thousand questions on this mailing list :) Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 09 December 2009 00:48 To: solr-user@lucene.apache.org Subject: Re: do copyField's need to exist as Fields? regany wrote: > > Is there a different way I should be setting it up to achieve the above?? > Think I figured it out. I set up the so they are present, but get ignored except for the "text" field which gets indexed... and then copyField the first 4 fields to the "text" field: Seems to be working!? :drunk: -- View this message in context: http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26702224.html Sent from the Solr - User mailing list archive at Nabble.com.
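A plausible reconstruction of the schema fragment Regan describes (fields kept but not indexed, with their content funnelled into one indexed catch-all field); only the field names and the copyField idea come from the messages, the attribute values are assumptions.

  <field name="id"       type="string" indexed="true"  stored="true"/>
  <field name="title"    type="text"   indexed="false" stored="true"/>
  <field name="subtitle" type="text"   indexed="false" stored="true"/>
  <field name="body"     type="text"   indexed="false" stored="true"/>
  <field name="text"     type="text"   indexed="true"  stored="false" multiValued="true"/>
  <copyField source="title"    dest="text"/>
  <copyField source="subtitle" dest="text"/>
  <copyField source="body"     dest="text"/>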
Re: Severe errors in solr configuration
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote: Hi, I am trying to configure solr on an ubuntu server and I am getting the following exception. I am able to get it working on a windows box. Hi Anto. Have you installed the solr 1.2 package from ubuntu? Or the 1.3 release as a war file? Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar: Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar OK. As far a i understood you need to make sure that your solr home is set. this needs to be done in Quting: http://wiki.apache.org/solr/SolrTomcat In addition to using the default behavior of relying on the Solr Home being in the current working directory (./solr) you can alternately add the solr.solr.home system property to your JVM settings before starting Tomcat... export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/" ...or use a Context file to configure the Solr Home using JNDI A Tomcat context fragments can be used to configure the JNDI property needed to specify your Solr Home directory. Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ localhost that looks something like this... $ cat /tomcat55/conf/Catalina/localhost/solr.xml Greetings, Olivier PS: May be it would be great if we could provide an ubuntu dpkg with 1.3 ? Any takers? -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
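The context fragment quoted from the wiki lost its XML in the archive; its usual shape is shown below. The docBase path is a placeholder, and the solr/home value reuses the directory mentioned later in this thread.

  <?xml version="1.0" encoding="utf-8"?>
  <Context docBase="/path/to/apache-solr-1.3.0.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/usr/local/solr/solr-1.3/solr" override="true"/>
  </Context>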
Re: Severe errors in solr configuration
A slash? Olivier Von meinem iPhone gesendet Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar : I am using Context file, here is my solr.xml $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml I change the ownership of the folder (usr/local/solr/solr-1.3/solr) to tomcat6:tomcat6 from root:root Anything I am missing? - Anto Binish Kaspar -Original Message- From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] Sent: Wednesday, February 04, 2009 6:30 PM To: solr-user@lucene.apache.org Subject: Re: Severe errors in solr configuration Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar: Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar OK. As far a i understood you need to make sure that your solr home is set. this needs to be done in Quting: http://wiki.apache.org/solr/SolrTomcat In addition to using the default behavior of relying on the Solr Home being in the current working directory (./solr) you can alternately add the solr.solr.home system property to your JVM settings before starting Tomcat... export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/ dir/" ...or use a Context file to configure the Solr Home using JNDI A Tomcat context fragments can be used to configure the JNDI property needed to specify your Solr Home directory. Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ localhost that looks something like this... $ cat /tomcat55/conf/Catalina/localhost/solr.xml Greetings, Olivier PS: May be it would be great if we could provide an ubuntu dpkg with 1.3 ? Any takers? -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 04.02.2009 um 15:50 schrieb Anto Binish Kaspar: Yes I removed, still I have the same issue. Any idea what may be cause of this issue? Have you solved your problem? Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 05.02.2009 um 12:07 schrieb Anto Binish Kaspar: Do I need to give some permissions to the folder? i would guess so. Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Apachecon 2009 Europe
Hi all, we came back with our heads full of impressions from ApacheCon Europe. Thanks a lot for the great speeches and the inspiring personal talks. I strongly believe that Solr will have a great future. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de
Re: indexing/crawling HTML + solr
Hi, have a look at the Droids project in the Incubator. Olivier Sent from my iPhone On 03.06.2009 at 12:09, Gena Batsyan wrote: Hi! to be short, where to start with the subject? Any pointers to some [semi-]functional solutions that crawl the web as a normal crawler, take care of HTML parsing, etc., and feed the crawled stuff as solr-documents per ? regards!
Re: Best approach to multiple languages
On 22.07.2009 at 18:31, Ed Summers wrote: In case you are curious I've attached a copy of our schema.xml to give you an idea of what we did. Thanks for sharing! -- Olivier Dobberkau
Re: How to set User.dir or CWD for Solr during Tomcat startup
Am 07.01.2010 um 00:07 schrieb Turner, Robbin J: > I've been doing a bunch of googling and haven't seen if there is a parameter > to set within Tomcat other than the solr/home which is setup in the solr.xml > under the $CATALINA_HOME/conf/Catalina/localhost/. Hi. We set this in solr.xml http://wiki.apache.org/solr/SolrTomcat#Simple_Example_Install hope this helps. olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d
Re: Interesting stuff; Solr as a syslog store.
On 13.02.2010 at 03:02, Antonio Lobato wrote: > Just thought this would be a neat story to share with you all. I've really > grown to love Solr, it's something else! Hi Antonio, Great. Would you also share the source code somewhere? May the Source be with you. Thanks. Olivier
Re: ubuntu lucid package
Am 30.04.2010 um 09:24 schrieb Gora Mohanty: > Also, the standard Debian/Ubuntu way of finding out what files a > package installed is: > dpkg -l > > Regards, > Gora You might try: # dpkg -L solr-common /. /etc /etc/solr /etc/solr/web.xml /etc/solr/conf /etc/solr/conf/admin-extra.html /etc/solr/conf/elevate.xml /etc/solr/conf/mapping-ISOLatin1Accent.txt /etc/solr/conf/protwords.txt /etc/solr/conf/schema.xml /etc/solr/conf/scripts.conf /etc/solr/conf/solrconfig.xml /etc/solr/conf/spellings.txt /etc/solr/conf/stopwords.txt /etc/solr/conf/synonyms.txt /etc/solr/conf/xslt /etc/solr/conf/xslt/example.xsl /etc/solr/conf/xslt/example_atom.xsl /etc/solr/conf/xslt/example_rss.xsl /etc/solr/conf/xslt/luke.xsl /usr /usr/share /usr/share/solr /usr/share/solr/WEB-INF /usr/share/solr/WEB-INF/lib /usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar /usr/share/solr/WEB-INF/weblogic.xml /usr/share/solr/scripts /usr/share/solr/scripts/abc /usr/share/solr/scripts/abo /usr/share/solr/scripts/backup /usr/share/solr/scripts/backupcleaner /usr/share/solr/scripts/commit /usr/share/solr/scripts/optimize /usr/share/solr/scripts/readercycle /usr/share/solr/scripts/rsyncd-disable /usr/share/solr/scripts/rsyncd-enable /usr/share/solr/scripts/rsyncd-start /usr/share/solr/scripts/rsyncd-stop /usr/share/solr/scripts/scripts-util /usr/share/solr/scripts/snapcleaner /usr/share/solr/scripts/snapinstaller /usr/share/solr/scripts/snappuller /usr/share/solr/scripts/snappuller-disable /usr/share/solr/scripts/snappuller-enable /usr/share/solr/scripts/snapshooter /usr/share/solr/admin /usr/share/solr/admin/_info.jsp /usr/share/solr/admin/action.jsp /usr/share/solr/admin/analysis.jsp /usr/share/solr/admin/analysis.xsl /usr/share/solr/admin/distributiondump.jsp /usr/share/solr/admin/favicon.ico /usr/share/solr/admin/form.jsp /usr/share/solr/admin/get-file.jsp /usr/share/solr/admin/get-properties.jsp /usr/share/solr/admin/header.jsp /usr/share/solr/admin/index.jsp /usr/share/solr/admin/jquery-1.2.3.min.js /usr/share/solr/admin/meta.xsl /usr/share/solr/admin/ping.jsp /usr/share/solr/admin/ping.xsl /usr/share/solr/admin/raw-schema.jsp /usr/share/solr/admin/registry.jsp /usr/share/solr/admin/registry.xsl /usr/share/solr/admin/replication /usr/share/solr/admin/replication/header.jsp /usr/share/solr/admin/replication/index.jsp /usr/share/solr/admin/schema.jsp /usr/share/solr/admin/solr-admin.css /usr/share/solr/admin/solr_small.png /usr/share/solr/admin/stats.jsp /usr/share/solr/admin/stats.xsl /usr/share/solr/admin/tabular.xsl /usr/share/solr/admin/threaddump.jsp /usr/share/solr/admin/threaddump.xsl /usr/share/solr/admin/debug.jsp /usr/share/solr/admin/dataimport.jsp /usr/share/solr/favicon.ico /usr/share/solr/index.jsp /usr/share/doc /usr/share/doc/solr-common /usr/share/doc/solr-common/changelog.Debian.gz /usr/share/doc/solr-common/README.Debian /usr/share/doc/solr-common/TODO.Debian /usr/share/doc/solr-common/copyright /usr/share/doc/solr-common/changelog.gz /usr/share/doc/solr-common/NOTICE.txt.gz /usr/share/doc/solr-common/README.txt.gz /var /var/lib /var/lib/solr /var/lib/solr/data /usr/share/solr/WEB-INF/lib/xml-apis.jar /usr/share/solr/WEB-INF/lib/xml-apis-ext.jar /usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar /usr/share/solr/WEB-INF/lib/slf4j-api.jar /usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar /usr/share/solr/WEB-INF/lib/lucene-snowball.jar /usr/share/solr/WEB-INF/lib/lucene-queries.jar 
/usr/share/solr/WEB-INF/lib/lucene-highlighter.jar /usr/share/solr/WEB-INF/lib/lucene-core.jar /usr/share/solr/WEB-INF/lib/lucene-analyzers.jar /usr/share/solr/WEB-INF/lib/jetty-util.jar /usr/share/solr/WEB-INF/lib/jetty.jar /usr/share/solr/WEB-INF/lib/commons-io.jar /usr/share/solr/WEB-INF/lib/commons-httpclient.jar /usr/share/solr/WEB-INF/lib/commons-fileupload.jar /usr/share/solr/WEB-INF/lib/commons-csv.jar /usr/share/solr/WEB-INF/lib/commons-codec.jar /usr/share/solr/WEB-INF/web.xml /usr/share/solr/conf If i reckon correctly some parts of apache solr will not work with the ubuntu lucid distribution. http://solr.dkd.local/update/extract throws an error: The server encountered an internal error (lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at Maybe someone from ubuntu reading this list can confirm this. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH Kaiserstraße 73 60329 Frankfurt/Main mail: olivier.dobber...@dkd.de web: http://www.dkd.de
Solr 1.4 query fails against all fields, but succeed if field is specified.
Hi, I have created an index with several fields. If I query my index in the admin section of Solr (or via an HTTP request), I get results for my search if I specify the requested field: Query: note:Aspergillus (look for "Aspergillus" in the field "note") However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I have no match in the response from Solr. Do you have any idea of what can be wrong with my index? Regards Olivier
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
OK, I use the default, i.e. the standard request handler. Using "*:Aspergillus" does not work either. I can try with DisMax, but that means I need to know all field names. My schema declares a number of them, but some other fields are defined via dynamic fields (I know the type, but I do not know their names). Is there any way to query all fields, including dynamic ones? thanks Olivier 2010/5/31 Michael Kuhlmann > On 31.05.2010 11:50, olivier sallou wrote: > > Hi, > > I have created an index with several fields. > > If I query my index in the admin section of solr (or via http request), I > > get results for my search if I specify the requested field: > > Query: note:Aspergillus (look for "Aspergillus" in field "note") > > However, if I query the same word against all fields ("Aspergillus" or > > "all:Aspergillus") , I have no match in response from Solr. > > Querying "Aspergillus" without a field does only work if you're using > DisMaxHandler. > > Do you have a field "all"? > > Try "*:Aspergillus" instead. >
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
I finally got a solution. As I use dynamic fields, I use copyField to copy everything to a global indexed field, and specify this field as the defaultSearchField in my schema. The *:term query with the "standard" query type fails without this... This solution roughly doubles the amount of indexed data but works in all cases... In my schema I have: Some other fields are of "lowercase" or "int" types. Regards 2010/5/31 Michael Kuhlmann > On 31.05.2010 12:36, olivier sallou wrote: > > Is there any way to query all fields including dynamic ones? > > Yes, using the *:term query. (Please note that the asterisk should not > be quoted.) > > To answer your question, we need more details on your Solr > configuration, esp. the part of schema.xml that defines your "note" field. > > Greetings, > Michael > > >
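The schema lines referenced by "In my schema I have:" were stripped by the archive. A hedged reconstruction of the catch-all approach described here (the field name "all" appears earlier in the thread, the rest is illustrative):

  <dynamicField name="*_txt" type="text" indexed="true" stored="true"/>
  <field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="all"/>
  <defaultSearchField>all</defaultSearchField>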
Re: newbie question on how to batch commit documents
I would additionally suggest using EmbeddedSolrServer for large uploads if possible; performance is better. 2010/5/31 Steve Kuo > I have a newbie question on what is the best way to batch add/commit a > large > collection of document data via solrj. My first attempt was to write a > multi-threaded application that did the following. > > Collection docs = new ArrayList(); > for (Widget w : widgets) { > doc.addField("id", w.getId()); > doc.addField("name", w.getName()); > doc.addField("price", w.getPrice()); > doc.addField("category", w.getCat()); > doc.addField("srcType", w.getSrcType()); > docs.add(doc); > > // commit docs to solr server > server.add(docs); > server.commit(); > } > > And I got this exception. > > org.apache.solr.common.SolrException: > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) > at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > at > org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86) > > The solrj wiki/documents seemed to indicate that this happened because multiple threads > were calling SolrServer.commit(), which in turn called > CommonsHttpSolrServer.request(), resulting in multiple searchers. My first > thought was to change the configs for autowarming. But after looking at > the > autowarm params, I am not sure what can be changed, or perhaps a different > approach is recommended. > > class="solr.FastLRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > class="solr.LRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > class="solr.LRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > Your help is much appreciated. >
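Beyond that, a rough sketch of how the loop can be restructured so that commit() is called only once at the end (Widget is the domain class from the post above; the batch size and Solr URL are placeholders):

  import java.util.ArrayList;
  import java.util.Collection;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
      // Widget is the caller's domain class from the original post
      public static void index(Iterable<Widget> widgets) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
          for (Widget w : widgets) {
              SolrInputDocument doc = new SolrInputDocument(); // a new document per widget
              doc.addField("id", w.getId());
              doc.addField("name", w.getName());
              doc.addField("price", w.getPrice());
              doc.addField("category", w.getCat());
              doc.addField("srcType", w.getSrcType());
              docs.add(doc);
              if (docs.size() >= 1000) { // send in batches to keep memory bounded
                  server.add(docs);
                  docs.clear();
              }
          }
          if (!docs.isEmpty()) {
              server.add(docs);
          }
          server.commit(); // a single commit, so only one warming searcher is opened
      }
  }

With a single commit at the end (or relying on autoCommit in solrconfig.xml), the maxWarmingSearchers limit should no longer be hit by the indexing loop itself.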
Re: solr itas
Did you update solrconfig.xml to add the /itas request handler? 2010/6/11 > Hi, > > When I type http://127.0.0.1:8080/solr/itas > > I receive this result in the web page instead of an HTML page. Does anyone > know the reason and/or have a suggestion to fix it. > > > - > - > 0 > 62 > > - > - > 1.0 > - > Lucid Imagination > > - > USA > > - > > > > > Thanks, > > >
Need help on Solr Cell usage with specific Tika parser
Hi, I use Solr Cell to send specific content files. I developed a dedicated parser for specific MIME types. However, I cannot get Solr to accept my new MIME types. In solrconfig.xml, in the update/extract requestHandler, I specified ./tika-config.xml as the tika.config value, where tika-config.xml is in the conf directory (same as solrconfig.xml). In tika-config I added my mimetypes: biosequence/document biosequence/embl biosequence/genbank I do not know whether the path to the Tika mimetypes file should be absolute or relative... or even whether this file needs to be redefined if "magic" is not used. When I run my update/extract request, I get an error that "biosequence/document" does not match any known parser. Thanks Olivier
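For reference, my tika-config.xml looks roughly like this (sketched from memory of the Tika 0.x configuration layout, which may differ in detail; only the parser class and MIME types are really mine):

  <properties>
    <!-- magic detection disabled; the MIME type is passed explicitly -->
    <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
    <parsers>
      <parser name="parse-readseq" class="org.irisa.genouest.tools.readseq.ReadSeqParser">
        <mime>biosequence/document</mime>
        <mime>biosequence/embl</mime>
        <mime>biosequence/genbank</mime>
      </parser>
    </parsers>
  </properties>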
Re: Need help on Solr Cell usage with specific Tika parser
Yes, I do. As magic is not set, that is why it looks for this specific mime-type. Unfortunately, it seems it either does not read my specific tika-config file or the mime-type file. But there is no error logged concerning those files... (is it not trying to load them?) 2010/6/14 Ken Krugler > Hi Olivier, > > Are you setting the mime type explicitly via the stream.type parameter? > > -- Ken > > > On Jun 14, 2010, at 9:14am, olivier sallou wrote: > > Hi, >> I use Solr Cell to send specific content files. I developed a dedicated >> parser for specific mime types. >> However I cannot get Solr to accept my new mime types. >> >> In solrconfig, in update/extract requesthandler I specified > name="tika.config">./tika-config.xml , where tika-config.xml is in >> conf directory (same as solrconfig). >> >> In tika-config I added my mimetypes: >> >> > class="org.irisa.genouest.tools.readseq.ReadSeqParser"> >> biosequence/document >> biosequence/embl >> biosequence/genbank >> >> >> I do not know for: >> >> >> whether path to tika mimetypes should be absolute or relative... and even >> if >> this file needs to be redefined if "magic" is not used. >> >> >> When I run my update/extract, I have an error that "biosequence/document" >> does not match any known parser. >> >> Thanks >> >> Olivier >> > > > Ken Krugler > +1 530-210-6378 > http://bixolabs.com > e l a s t i c w e b m i n i n g > > > > >
Re: Need help on Solr Cell usage with specific Tika parser
Thanks, moving it to a direct child worked. Olivier 2010/6/14 Chris Hostetter > > : In solrconfig, in update/extract requesthandler I specified : name="tika.config">./tika-config.xml , where tika-config.xml is in > : conf directory (same as solrconfig). > > Can you show us the full requestHandler declaration? ... tika.config needs > to be a direct child of the requestHandler (not in the defaults). > > I also don't know if using a "local" path like that will work -- depends > on how that file is loaded (if Solr loads it, then you might want to > remove the "./"; if Solr just gives the path to Tika, then you probably > need an absolute path). > > > -Hoss > >
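For anyone hitting the same problem, the shape of the working declaration is roughly the following (a sketch only: the handler name and defaults are the usual ones, and the path may need to be absolute as Hoss notes; the key point is that tika.config is a direct child of requestHandler):

  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <!-- direct child of requestHandler, not inside the defaults list -->
    <str name="tika.config">./tika-config.xml</str>
    <lst name="defaults">
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>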
ConfigSet API V2 issue with configSetProp.property present
Hi, I have an issue creating a configset with the V2 API when using a configset property. Indeed, if I enter the command: curl -X POST -H 'Content-type: application/json' -d '{ "create":{"name": "Test", "baseConfigSet": "myConfigSet","configSetProp.immutable": "false"}}' http://localhost:8983/api/cluster/configs?omitHeader=true (the same one as in the documentation: https://lucene.apache.org/solr/guide/7_5/configsets-api.html) it fails with the error: "errorMessages":["Unknown field 'configSetProp.immutable' in object : {\n \"name\":\"Test\",\n \"baseConfigSet\":\"myConfigSet\",\n \"configSetProp.immutable\":\"false\"}"]}], "msg":"Error in command payload", "code":400}} If I enter the same command, still with the V2 API but without the configSetProp.immutable property, it succeeds. With the V1 API, there is no problem with or without the configset property. The tests were done with Solr 7.4 and Solr 7.5. Did I miss something about the configset property usage? Thanks, Best regards, Olivier
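For comparison, the V1 call that works for me with the property looks something like this (same host and configset names as above):

  curl "http://localhost:8983/solr/admin/configs?action=CREATE&name=Test&baseConfigSet=myConfigSet&configSetProp.immutable=false&omitHeader=true"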
Backup collections using SolrJ
Hi, I have a question regarding the backup of a Solr collection using SolrJ. I use Solr 7. I want to build a JAR for that and launch it from a cron job. So far, no problem with the request: I use CollectionAdminRequest.backupCollection and then the processAsync method. The command is well transmitted to Solr. My problem is parsing the response and handling the different failure cases in the code. Let's say that the Solr response is the following after sending the asynchronous backup request (the request id is "solrbackup"): { "responseHeader": { "status": 0, "QTime": 1 }, "success": { "IP:8983_solr": { "responseHeader": { "status": 0, "QTime": 0 } }, "IP:8983_solr": { "responseHeader": { "status": 0, "QTime": 0 } } }, "solrbackup5704378348890743": { "responseHeader": { "status": 0, "QTime": 0 }, "STATUS": "failed", "Response": "Failed to backup core=Test_shard1_replica1 because java.io.IOException: Aucun espace disponible sur le périphérique" }, "status": { "state": "completed", "msg": "found [solrbackup] in completed tasks" } } If I use the code: System.out.println(CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus()); the output is "COMPLETED". But that is not enough to check whether the backup went well or not. For example, in this case the task is completed but the backup was not successful because there was no space left on the disk. So the interesting part is in the solrbackup5704378348890743 section of the response. My first question is: why are some numbers appended to the request-id name? Because if I write CollectionAdminRequest.requestStatus("solrbackup").getRequestId(), the response is "solrbackup" and not solrbackup5704378348890743. So retrieving the section related to solrbackup5704378348890743 in the response is not very easy. I cannot directly use (NamedList) CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse().get("solrbackup"), but instead I have to iterate over the entire Solr response and check the beginning of each key to retrieve the section that begins with solrbackup, and finally get the elements that I want. Is this the right approach, or is there a simpler way to do it? Thanks, Olivier Tavard
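For what it's worth, this is roughly what I do today (a sketch only: the backup location, collection name and polling are simplified, and the key-prefix scan is the workaround described above):

  import java.util.Map;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.CollectionAdminRequest;
  import org.apache.solr.client.solrj.response.RequestStatusState;
  import org.apache.solr.common.util.NamedList;

  public class BackupCheck {
      public static void main(String[] args) throws Exception {
          try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
              // trigger the asynchronous backup
              CollectionAdminRequest.backupCollection("Test", "Test_backup")
                      .setLocation("/backups")
                      .processAsync("solrbackup", solr);

              // ... poll until the task is no longer SUBMITTED/RUNNING ...
              RequestStatusState state =
                      CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus();
              NamedList<Object> response =
                      CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse();

              // the per-task section has a suffixed key, so scan the keys by prefix
              for (Map.Entry<String, Object> entry : response) {
                  if (entry.getKey() != null && entry.getKey().startsWith("solrbackup")) {
                      NamedList<?> details = (NamedList<?>) entry.getValue();
                      System.out.println(state + " STATUS=" + details.get("STATUS")
                              + " Response=" + details.get("Response"));
                  }
              }
          }
      }
  }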
Cannot find Solr 7.4.1 release
Hi, I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release at http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then directly 7.5. Of course I can build from the source code, but this is frustrating because I can see that in the 7_4 branch there is a fix that I need (SOLR-12594) with the status fixed in the 7.4.1 and 7.5 versions. Everything seems to have been prepared to release 7.4.1, but I cannot find it. Does this release exist? Thank you, Olivier
filtering facets
Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
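Concretely, the request looks something like this (host and handler path are placeholders; adding facet.prefix=http://en.wikipedia.org is the partial workaround mentioned above):

  curl "http://localhost:8983/solr/select?q=article_outlinks:http*wikipedia.org*&rows=0&facet=true&facet.field=article_outlinks&facet.mincount=1&facet.limit=100"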
Re: filtering facets
Hi Mike, No, my problem is that the field article_outlinks is multivalued, so it contains several URLs not related to my search. I would like to facet only on the URLs matching my query. For example (only on one document, but my search targets over 1M docs): Doc1: article_url: url1.com/1 url2.com/2 url1.com/1 url1.com/3 And my query is: article_url:url1.com* and I facet by article_url and I want it to give me: url1.com/1 (2) url1.com/3 (1) But right now, because url2.com/2 is contained in a multivalued field with the matching URLs, I get this: url1.com/1 (2) url1.com/3 (1) url2.com/2 (1) I can use facet.prefix to filter, but it's not very flexible if my URL contains a subdomain, as facet.prefix doesn't support wildcards. Thank you, Olivier Mike Topper wrote: Hi Olivier, Are the facet counts on the URLs you don't want 0? If so, you can use facet.mincount to only return results greater than 0. -Mike Olivier H. Beauchesne wrote: Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
Re: filtering facets
Yeah, but then I would have to retrieve *a lot* of facets. I think for now I'll retrieve all the subdomains with facet.prefix and then merge those queries. Not ideal, but when I have more motivation, I will submit a patch to Solr :-) Michael wrote: You could post-process the response and remove urls that don't match your domain pattern. On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne wrote: Hi Mike, No, my problem is that the field article_outlinks is multivalued, so it contains several URLs not related to my search. I would like to facet only on the URLs matching my query. For example (only on one document, but my search targets over 1M docs): Doc1: article_url: url1.com/1 url2.com/2 url1.com/1 url1.com/3 And my query is: article_url:url1.com* and I facet by article_url and I want it to give me: url1.com/1 (2) url1.com/3 (1) But right now, because url2.com/2 is contained in a multivalued field with the matching URLs, I get this: url1.com/1 (2) url1.com/3 (1) url2.com/2 (1) I can use facet.prefix to filter, but it's not very flexible if my URL contains a subdomain, as facet.prefix doesn't support wildcards. Thank you, Olivier Mike Topper wrote: Hi Olivier, Are the facet counts on the URLs you don't want 0? If so, you can use facet.mincount to only return results greater than 0. -Mike Olivier H. Beauchesne wrote: Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
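In code, the merging I have in mind looks roughly like this (a sketch only: the subdomain list, Solr URL and the SolrJ 1.4 client are assumptions on my side):

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class WikipediaFacets {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // one facet.prefix query per subdomain, merged client-side
          String[] prefixes = {"http://en.wikipedia.org", "http://fr.wikipedia.org"};
          Map<String, Long> merged = new HashMap<String, Long>();
          for (String prefix : prefixes) {
              SolrQuery query = new SolrQuery("article_outlinks:http*wikipedia.org*");
              query.setRows(0);
              query.setFacet(true);
              query.addFacetField("article_outlinks");
              query.setFacetPrefix(prefix);
              query.setFacetMinCount(1);
              query.setFacetLimit(-1);
              QueryResponse response = server.query(query);
              FacetField field = response.getFacetField("article_outlinks");
              if (field.getValues() != null) {
                  for (FacetField.Count count : field.getValues()) {
                      Long previous = merged.get(count.getName());
                      merged.put(count.getName(),
                              (previous == null ? 0L : previous) + count.getCount());
                  }
              }
          }
          System.out.println(merged); // url -> summed count across subdomains
      }
  }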